Debates are raging on social media over explainable AI in healthcare. Geoffrey Hinton, one of the 'godfathers of AI', recently tweeted: "Suppose you have cancer and you have to choose between a black box AI surgeon that cannot explain how it works but has a 90% cure rate and a human surgeon with an 80% cure rate. Do you want the AI surgeon to be illegal?" As you can imagine, the Twitterverse took sides. One camp argued that healthcare AI should be explainable; the other argued that we should not sacrifice usefulness for explainability. There are different approaches to explainability. In linear models, we can use the weight of each variable to determine how much it contributes to the prediction. In medical image classification, we can use tools such as ELI5, LIME, and SHAP to explain the predictions. This adds another layer of computational complexity and, in turn, requires more computing resources and time.
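The linear-model case is the simplest to see: each feature's contribution to the prediction is just its weight times its value. A minimal sketch, with hypothetical feature names and weights (not from AIBx or any real model):

```python
# Weight-based explanation of a linear model.
# Feature names, weights, and values are hypothetical, for illustration only.
weights = {"nodule_size_cm": 0.8, "irregular_margin": 1.5, "patient_age": 0.02}
bias = -2.0

def explain(features):
    """Return the raw score and each feature's contribution (weight * value)."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    score = bias + sum(contributions.values())
    return score, contributions

score, contributions = explain(
    {"nodule_size_cm": 2.5, "irregular_margin": 1.0, "patient_age": 60}
)
# Rank features by absolute contribution to see which drove the prediction.
ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Here the largest contribution (0.8 × 2.5 = 2.0 from nodule size) explains the score directly, with no extra tooling; it is image models, where no such per-feature weights exist, that need LIME- or SHAP-style post-hoc explanations.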
Why can't we combine good accuracy and explainability in the same software? When faced with a clinical problem, we decided to combine the best of both worlds. More than 50 percent of women over the age of 50 have thyroid nodules, and because of the increased use of imaging modalities, we are detecting more of them. Yet only about 5 to 10 percent of these nodules are cancerous. At present, the only way to determine whether a nodule is cancerous is through invasive procedures such as surgery and needle biopsy. We employed artificial intelligence to create a model that helps physicians choose the right nodules for biopsy. In our study published in the journal Thyroid, we showed that by employing this model (AIBx), unnecessary biopsies could be reduced by more than 50 percent. The probability that a nodule is actually benign when the model predicts it to be benign, namely the negative predictive value of AIBx, was 93.2 percent. AIBx finds images similar to the test image and displays them along with their actual diagnoses. The physician reviews these similar images and the corresponding diagnoses to make the final decision. Our model was created to enhance the physician's ability to choose the right nodules for biopsy rather than to replace the physician. Every step of this process requires the physician's input; hence we used the term Physician in Loop (PIL). The latest version of AIBx also overlays heat maps on the test image to show the areas of interest that led to the prediction.
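Negative predictive value is computed from the confusion matrix as true negatives over all predicted negatives. The counts below are hypothetical, chosen only so the arithmetic lands near the 93.2 percent figure; they are not the study's actual data:

```python
# Negative predictive value (NPV) = TN / (TN + FN).
# Counts are hypothetical, for illustration; not the study's actual data.
true_negatives = 137   # model predicted benign, nodule was benign
false_negatives = 10   # model predicted benign, nodule was malignant

npv = true_negatives / (true_negatives + false_negatives)
print(f"NPV = {npv:.1%}")  # -> NPV = 93.2%
```

A high NPV is the clinically important property here: when the model says "benign" and a biopsy is skipped, the physician needs confidence that the nodule really is benign.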
By combining image similarity and heat maps (class activation maps), we made the model explainable, which in turn increases physicians' trust in it. In conclusion, using an explainable artificial intelligence model helps increase trust in the model's predictions. A physician refers a patient to a surgeon based on his or her trust in that surgeon. Similarly, AI algorithms that build trust in their predictions will be used preferentially over black box algorithms. That is my answer to Geoffrey Hinton's question.
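The retrieval mechanics are not detailed here, but image-similarity systems of this kind typically embed each image as a feature vector (for example, from a CNN) and return the nearest reference images by cosine similarity. A toy sketch under that assumption, with hand-made 3-D vectors and hypothetical case labels standing in for real embeddings:

```python
import math

# Toy sketch of similarity-based retrieval. Vectors and labels are hypothetical;
# a real system would use learned image embeddings of much higher dimension.
reference_db = [
    ("case_A", [0.9, 0.1, 0.2], "benign"),
    ("case_B", [0.1, 0.9, 0.3], "malignant"),
    ("case_C", [0.8, 0.2, 0.1], "benign"),
]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(test_vec, db, k=2):
    """Rank reference images by cosine similarity to the test image."""
    scored = [(name, cosine(test_vec, vec), label) for name, vec, label in db]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

# Retrieve the k nearest reference cases for a new (hypothetical) test image.
neighbors = most_similar([0.85, 0.15, 0.15], reference_db)
```

The physician then reviews the retrieved cases alongside their confirmed diagnoses, which is what keeps the human in the loop rather than asking them to trust a bare probability.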
For more information about our research, please visit https://www.thyroidbx.com/
To read our clinical research article in Thyroid journal, please visit https://www.liebertpub.com/doi/abs/10.1089/thy.2019.0752