European Nuclear Medicine Guide
Chapter 20

Artificial Intelligence, Machine (Deep) Learning and Radiomics

For years, artificial intelligence (AI) has been regarded as a highly promising field in the future of medicine. Once considered an abstract concept confined to computer science research and digital mega-corporations, AI has rapidly evolved into a key driver of innovation in healthcare.

 

Between 2018 and 2023, at least 86 randomized clinical trials evaluated AI-guided clinical procedures; hundreds of AI-based medical products have received regulatory approval, and tens of thousands of patients have undergone AI-assisted medical interventions. Among the fields most profoundly impacted by AI, medical imaging stands out as a major beneficiary of this technological revolution. However, nuclear medicine has lagged behind other imaging disciplines in fully embracing AI. Several factors have contributed to this slower integration, including the relatively small size of nuclear medicine datasets compared to other imaging modalities, the complexity of multimodal data fusion and of the biology underlying molecular imaging, and the need for standardization across imaging protocols and radiotracer use. Additionally, regulatory hurdles, clinical acceptance, and the necessity of explainability in AI-driven decision-making have posed significant challenges.

 

Despite these obstacles, AI holds immense potential to transform nuclear medicine by enhancing image interpretation, improving diagnostic accuracy, and enabling more personalized treatment planning. As research in this field advances, overcoming these barriers will be crucial to unlocking AI’s full capabilities in nuclear medicine.

 

A primer to understand AI approaches

Over the last decade, we have observed rapid progress in the development of AI, which can be defined as the independent learning of a computational entity from available information of any kind. The amount of information obtained from patients or pre-clinical studies is too complex, vast and heterogeneous to be comprehensively interpretable by humans without technological support [1]. With the advent of AI, which encompasses the concepts of Machine Learning (ML) and Deep Learning (DL), a specific type of ML, the supportive analysis of high-dimensional, patient-specific information could enable clinicians to improve their diagnostic, prognostic, and therapeutic decisions.

 

Numerous AI architectures have been developed and are used to classify, impute, predict, and cluster datasets based on so-called features. Such features can include relevant patient-specific information such as medical traits and clinical measurements (e.g., blood test parameters), smart-watch sensor data, conventional imaging, or hybrid imaging data such as SPECT/CT or PET/MRI, which open the door to multi-parametric assessment of diseases [2]. One frequently used term is Radiomics, which refers to the high-throughput extraction of quantitative features from images to build diagnostic or predictive models through ML or DL [3]. The main difference between traditional ML and DL approaches lies in how features are obtained. In ML, features are predetermined: when imaging data are used for traditional ML, handcrafted (or engineered) features are extracted and then selected through ML algorithms or by manual pre-selection or labelling, often performed by a domain expert. As a result, traditional ML approaches integrate prior knowledge and thereby avoid the complex and computationally expensive task of determining potentially important features from a vast space of information. However, the selection of features may introduce bias and limit the approach to existing knowledge. In contrast, DL aims to directly associate patterns within (imaging) data with a given prediction target. This approach can lead to complex and robust models with less bias and less dependence on prior expert knowledge [4], but it is also more dependent on large amounts of high-quality training data.

 

The technical implementation of an AI algorithm utilizing features for prediction is called a model. The underlying mathematics of such a model seeks combinations of linear and non-linear decision boundaries or patterns in a given set of information that separate individual data points, such as patients or diseases. The model then either classifies or clusters similar objects into distinct groups.

 

An AI analysis is supervised if the training label (e.g., disease, treatment type, patient outcome) is provided to the algorithm, i.e., in the case of labelled data. Supervised learning algorithms differ in the way they compute the decision boundary, which maps an input data point to a specific class based on the previously learned pattern. An overview of these approaches can be found in [5]. The most common state-of-the-art supervised learning algorithms are Random Forest, Support Vector Machine, K-Nearest Neighbour, Logistic Regression, and Boosting methods such as Gradient Boosting, XGBoost, or AdaBoost, and most of these have been used for radiomics [6].
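
As an illustration only (not part of the guide), the following Python sketch trains several of the supervised classifiers named above on a synthetic, randomly generated feature table; with real radiomics or clinical features the same scikit-learn interface applies, although stratification and nested hyperparameter tuning would then be advisable.

```python
# Minimal sketch with synthetic data: comparing common supervised classifiers.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(120, 30))        # 120 patients, 30 features (placeholder values)
y = rng.integers(0, 2, size=120)      # binary label, e.g. responder vs. non-responder

models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "support vector machine": make_pipeline(StandardScaler(), SVC(probability=True)),
    "k-nearest neighbour": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")  # 5-fold cross-validated AUC
    print(f"{name}: AUC = {auc.mean():.2f} ± {auc.std():.2f}")
```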

 

Unsupervised learning is applied to learn from unlabelled data, in which each data point consists only of the features and the true labels are unknown. This is mainly utilized by clustering methods, which try to find common patterns to group the data points or patients into clusters based on their feature similarities. These clustering methods differ in how groups are determined and in the similarity measure used. The most commonly used clustering approaches include hierarchical clustering and k-means clustering [7]. After clustering patients into groups, these newly identified groups can be used as labels for the supervised approaches mentioned above. Further methods belonging to the domain of unsupervised learning include dimensionality reduction techniques such as principal component analysis (PCA), factor analysis, uniform manifold approximation and projection for dimension reduction (UMAP), and t-distributed stochastic neighbour embedding (t-SNE) [8]. These techniques are less frequently exploited in radiomic studies. Deep learning-based approaches to unsupervised learning include autoencoders and generative adversarial networks, which can be used for dimensionality reduction, denoising, standardization and data augmentation [9–11].
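
The sketch below (illustrative only, on synthetic unlabelled data) shows a typical unsupervised sequence as described above: PCA-based dimensionality reduction followed by k-means and hierarchical clustering; the resulting cluster labels could then feed the supervised methods discussed earlier.

```python
# Minimal sketch: dimensionality reduction and clustering of unlabelled patient features.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))                           # 100 patients, 50 unlabelled features

X_scaled = StandardScaler().fit_transform(X)             # put features on comparable scales
X_reduced = PCA(n_components=5).fit_transform(X_scaled)  # dimensionality reduction before clustering

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_reduced)
print("k-means cluster sizes:", np.bincount(kmeans.labels_))

# Hierarchical (agglomerative) clustering on the same reduced features, for comparison.
tree = linkage(X_reduced, method="ward")
hier_labels = fcluster(tree, t=3, criterion="maxclust")
print("hierarchical cluster sizes:", np.bincount(hier_labels)[1:])

# The cluster assignments could subsequently serve as training labels
# for the supervised approaches described above.
```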

 

Transferring AI approaches to nuclear medicine

Despite being quantitative by nature, nuclear medicine images are exploited in a very restrictive manner in most clinical publications, clinical trials and, of course, routine clinical practice (i.e., analysed mostly visually or semi-quantitatively) [12]. Whereas radiologists or nuclear medicine physicians mostly rely on the recognition of a handful of semantic features, e.g. to detect and describe tumours, thousands of agnostic features can potentially be extracted from medical images [13], including some not even visible to an expertly trained eye [14]. This complexity within medical images, extending beyond the scope of the human brain, is amenable to analysis by ML and DL approaches that may reveal the additional information the images hold. The field of Radiomics has progressed from the direct selection of predefined features, used alone or in combination as inputs to ML classifiers, to obtaining indirectly learned features without a priori definition using data-driven DL methodology [15].

 

The Pitfalls of AI in Nuclear Medicine

Data management

Good Clinical Practice (GCP) and Good Laboratory Practice (GLP) guidelines define how standardized clinical processes and high-level medical research shall be conducted to achieve high-quality, trustworthy, and reusable data [16]. Nevertheless, it is difficult to establish how, and to what extent, such guidelines have been followed by a particular research group. While high-impact journals require the reporting of certain processing steps in line with the above guidelines, practices regarding the publication of the research data themselves vary. Many journals do not make data publication mandatory, and even where data must be published, there is no guarantee that they were properly peer-reviewed [17]. This renders most AI-related medical studies challenging for other research groups to reproduce on their own datasets. Current recommendations aim to make data FAIR (i.e., findable, accessible, interoperable, and reusable) in such investigations [18, 87].

 

Properties of imaging data

 

Typical properties of imaging data further complicate their successful analysis with AI. Because imaging and clinical protocols change over time, even within a single centre, a retrospective patient cohort may contain missing, inhomogeneous or unstructured data records. Clinicians who supply data for AI analysis are therefore often required to delete incomplete cases from the collected database, which may dramatically reduce the amount of exploitable data to a level insufficient for training complex AI algorithms [19]. Furthermore, data in the field are often imbalanced, as subtypes of diseases or adverse outcome events typically do not occur with the same frequency in a given patient population [20]. Imbalanced data are one of the main reasons why AI-established predictive models may perform poorly on a minority disease subgroup [21], especially if adequate imbalance-management approaches were not applied [22]. The imbalanced nature of data subgroups is particularly pronounced for tumours, where hybrid imaging plays a prominent role in the detection and characterization phases [23]. Data may also vary between centres and over time, e.g. due to different metabolic processes of the human body presented in PET images, especially if patients have undergone different treatments prior to imaging [24].
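
As a hedged illustration of the imbalance-management approaches referred to above [22], the snippet below shows two simple options on synthetic data: cost-sensitive class weights and random oversampling of the minority class (which, in a real study, must be applied within the training folds only).

```python
# Minimal sketch: two simple ways to mitigate class imbalance (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.utils import resample

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))                      # 200 patients, 20 features (placeholder)
y = np.array([0] * 180 + [1] * 20)                  # 10% minority class, e.g. a rare subtype

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Option 1: cost-sensitive learning via class weights.
weighted = RandomForestClassifier(class_weight="balanced", random_state=0)
auc = cross_val_score(weighted, X, y, cv=cv, scoring="roc_auc")
print(f"class-weighted AUC: {auc.mean():.2f}")

# Option 2: random oversampling of the minority class.
# In a real study this must happen inside the training folds only, never on validation/test data.
minority = X[y == 1]
X_over = np.vstack([X, resample(minority, n_samples=160, random_state=0)])
y_over = np.concatenate([y, np.ones(160, dtype=int)])
print("class counts after oversampling:", np.bincount(y_over))
```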

 

Multi-centric data

 

Multi-centric data are generally hard to access and process in a normalized way. First, there is a certain reluctance among most clinicians and some imaging scientists to share data at all. Second, local hospital rules and sharing processes may be overcomplicated and time-consuming, which delays successful research built on multi-centric collaborations. Last, even if the willingness to share is present and the data have gone through local anonymization processes, imaging data may still reveal certain characteristics of individuals [25]. Together, and especially in light of the otherwise highly appreciated provisions of the General Data Protection Regulation, these factors challenge the establishment of publicly available multi-centric imaging datasets, which could boost AI-related research [2]. Although some databases, such as TCIA (https://www.cancerimagingarchive.net/), provide small multi-centric imaging datasets, the lack of multi-centric data is generally considered one of the major reasons why only few AI solutions have been integrated into routine clinical practice [26].

 

Evaluation

 

Existing AI solutions applied in functional and hybrid imaging research are either radiomics- or DL-based, with radiomics approaches currently predominating [24]. There are multiple reasons for this. First, radiomics models are simpler, being built on so-called engineered or manually handcrafted features [27], which makes them easier to apply and interpret than more complex DL frameworks. Second, given that most research groups only have access to small datasets, simple radiomics models - which have fewer unknown parameters to optimize - can be trained better on such small datasets. DL approaches are powerful alternatives to radiomics, but they have many more unknown parameters to identify and optimize during training and hence require larger data samples for proper training [28]. Irrespective of the choice of AI approach, functional and hybrid imaging AI studies operating with small, single-centre data are generally prone to producing overfitted models [29].

 

There is a certain element of bias in the selection of AI methods as well, typically driven by prior expertise with and familiarity of AI tools, or by the popularity of certain AI methods that may be sub-optimal for a given study. The „no free lunch theorem” states that no AI approach is superior in general; rather, the ideal AI approach is data- and application-specific [30,31]. This suggests that multiple AI models should be tested on the available data to understand the underlying characteristics of the data and the applicability of each method. Nevertheless, to date, this practice is rarely found in the corresponding literature [24]. Furthermore, the use of different performance metrics, such as the area under the receiver operating characteristic curve (AUC), the receiver operating characteristic curve (ROC) itself, the Matthews correlation coefficient (MCC), or the F1/F2-score, also makes established model performances difficult to compare among research groups, especially because different AI tools tend to utilize different metrics during the training process itself [32]. The lack of proper cross-validation in single-centre studies is one of the major concerns with AI-driven predictive models [24]. Even though today’s mainstream processing capacities allow advanced cross-validation, e.g. cross-validation of radiomics models with a high Monte Carlo fold count [33], this practice is rarely followed in the corresponding literature, potentially reducing most works to the level of advanced correlation analyses rather than clinically applicable predictive models [34]. Similarly, since DL training may be extremely time-consuming, the vast majority of studies utilizing DL either perform a one-fold training-validation approach or use a very low cross-validation count, which leaves room for selection bias and high variance of DL-related predictive performance. Due to the aforementioned challenges, the vast majority of PET and hybrid imaging research focusing on AI is single-centre only [35], which on its own potentially introduces an overestimation of predictive model performance [36].
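
For illustration, and assuming a simple scikit-learn pipeline rather than any specific published model, the following sketch shows Monte Carlo cross-validation with a high fold count: many random train/test splits yield a distribution of performance values rather than the single estimate of a one-fold design.

```python
# Minimal sketch: Monte Carlo cross-validation with many random splits (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = rng.normal(size=(150, 40))
y = rng.integers(0, 2, size=150)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 1000 Monte Carlo folds: each iteration holds out a random 20% of patients for testing.
mc_cv = ShuffleSplit(n_splits=1000, test_size=0.2, random_state=0)
scores = cross_val_score(model, X, y, cv=mc_cv, scoring="roc_auc")

# Reporting the spread (not just the mean) reveals the variance hidden by one-fold designs.
print(f"AUC: mean {scores.mean():.2f}, 2.5–97.5 percentile {np.percentile(scores, [2.5, 97.5]).round(2)}")
```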

 

Last, the lack of interpretability and explainability of predictive models is a general concern for clinicians, where interpretability refers to the transparency of the model itself and explainability to the post-hoc methods used to make complex models more understandable [88]. AI predictive models can be considered „black boxes”, from which understanding the basic underlying mechanisms and gathering new knowledge is rarely possible [37]. The same is true for the outputs of predictive models, which are typically probability-based and need further processing. At the same time, there is an inherent wish to simplify the results of otherwise complex predictive models to the level of „green-yellow-red” outputs, which may challenge the establishment of a truly personalized treatment decision process [88].

 

The promise of AI in Nuclear Medicine

 

Despite these challenges, AI will, without any doubt, transform healthcare. It has the potential to play a pivotal role in personalized/precision/systems medicine, where integrating and interpreting large amounts of multi-modal data within a single model or Clinical Decision Support System (CDS) might be central. AI shows great promise in the field of nuclear medicine and is already setting new standards. At this point, there is no universal nuclear medicine AI algorithm that can replace all parts of the medical imaging workflow. Research has therefore focused on developing specialized solutions for each task. A typical medical imaging workflow can be divided into planning, image acquisition, interpretation, and reporting [38]. AI has the potential to assist, guide and/or replace elements in all of these steps. In the following, we examine the areas of acquisition, interpretation, and reporting, where AI is already being utilized.

 

Image acquisition

 

Rather than focusing on replacing medical doctors by directly predicting a disease outcome, much attention has been paid to supportive approaches, such as utilizing AI to improve image quality [39]. This is typically an image-to-image task, with the advantage that training data are usually easily and widely accessible: given a high-quality image, a low-quality image can be simulated. Such a training scheme allows the generation of perfectly co-registered, paired data for training. It is therefore possible to build and train a model that predicts a high-quality image from a low-quality input, allowing faster image acquisition protocols, noise reduction, and lower radiation exposure, to the benefit of both patients and personnel.
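
A minimal, purely illustrative PyTorch sketch of this training scheme is shown below: a low-count image is simulated from a high-quality one (here with simple Poisson noise on synthetic tensors, which is a crude assumption), giving perfectly co-registered pairs on which a small denoising CNN is trained.

```python
# Purely illustrative: paired low-/high-quality images from simulation, and a toy denoising CNN.
import torch
import torch.nn as nn

def simulate_low_dose(high_quality: torch.Tensor, dose_fraction: float = 0.1) -> torch.Tensor:
    """Crude low-count simulation: scale the activity down and add Poisson noise."""
    scaled = torch.clamp(high_quality * dose_fraction, min=0.0)
    return torch.poisson(scaled) / dose_fraction

# Small fully convolutional network mapping the noisy input to the high-quality target.
denoiser = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

high_quality = torch.rand(8, 1, 128, 128) * 100.0   # placeholder "full-dose" batch
for step in range(20):                               # toy training loop
    low_dose = simulate_low_dose(high_quality)       # perfectly co-registered training pair
    loss = loss_fn(denoiser(low_dose), high_quality)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(f"final training loss: {loss.item():.2f}")
```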

 

Rather than focusing on quantitative image quality metrics, clinical accuracy needs to be demonstrated for these methods to be implemented in routine hospital settings. If such validation is achieved, we will enter a new era of low-dose PET imaging [40]. One domain where low-dose PET appears ready for clinical implementation is the assessment of dementia. Chen et al. showed that reading a noise-reduced image acquired with only 1% of the original radiotracer dose had high accuracy for amyloid status determination (89%), similar to the intra-reader reproducibility of the full-dose images (91%).

 

In hybrid PET imaging, one of the largest challenges has been to achieve accurate attenuation correction (AC) without CT. Several studies have demonstrated the ability of DL-based networks to generate artificial CT from MRI input alone, or even to map non-attenuation-corrected PET directly to attenuation- and scatter-corrected PET, bypassing the need for AC altogether [43].

 

Interpretation and reporting

 

A large part of AI research in nuclear medicine aims at replacing manual tasks, such as delineation. Automated delineation could free physicians for higher-value tasks [44] and, in research, allow more data to be collected. Several automatic segmentation challenges exist, e.g. for brain tumours [45], lung nodules [46], or ischemic stroke lesions [47,48]. Few of the reported methods have moved into clinical routine despite impressive results in some patient populations, probably due to the extremely diverse appearance of these diseases, which requires large amounts of labelled training data from various centres [38].
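
As a small illustration of how such delineation networks are typically trained, the following sketch implements a soft Dice loss in PyTorch; the tensor shapes and names are assumptions and do not correspond to any particular challenge submission.

```python
# Minimal sketch: soft Dice loss for training segmentation (delineation) networks.
import torch

def soft_dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """pred: sigmoid probabilities, target: binary mask; both shaped (batch, 1, H, W)."""
    intersection = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = (2 * intersection + eps) / (union + eps)
    return 1 - dice.mean()                      # loss = 1 - mean Dice coefficient

# Example: a perfect prediction yields a loss near 0, a completely wrong one a loss near 1.
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
print(soft_dice_loss(mask, mask))               # ~0.0
print(soft_dice_loss(1 - mask, mask))           # ~1.0
```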

 

Another large area of research is the early detection of Alzheimer’s disease and mild cognitive impairment using DL [49–51]. Ding et al. showed how DL was able to outperform human interpreters for the early diagnosis of  Alzheimer’s disease with 82% specificity and 100% sensitivity (AUC: 0.98) [52]. Similarly, Kim et al. used 54 normal and 54 abnormal 123I-ioflupane SPECT scans to train a network that predicts the diagnosis of Parkinson’s disease [53], with an achieved sensitivity of 96% at 67% specificity (AUC: 0.87). 

 

In oncology, there is a need to predict overall survival or response to therapy. This task is often not achievable with imaging alone, which is why several studies incorporate non-imaging features. Papp et al. combined PET features, histopathologic features, and patient characteristics in a ML model to predict 36-month survival in 70 patients with treatment-naïve gliomas (AUC: 0.9) [54]. Xiong et al. demonstrated the feasibility of predicting local disease control with chemoradiotherapy in patients with oesophageal cancer using radiomic features from 18F-FDG PET/CT [55], and Milgrom et al. found five features extracted from mediastinal sites to be highly predictive of primary refractory disease in 251 patients with stage I or II Hodgkin lymphoma [56].

The major drawback for networks that predict disease evolution is the amount of available training data. While image-to-image translation, e.g. MR-to-CT generation, essentially has one output value for each input value, disease prediction has only a single label for the entire data input. This increases the amount of required training data significantly, depending on the complexity of the disease, often to a level that a single department cannot provide. One way to overcome the lack of data is to generate shared databases with data from multiple hospitals. A standardization approach is essential for successful implementation, especially for MRI, where a vast number of sequences are in use and even variations between scanners exist for the same sequence. Work by Gao et al. has shown that this can be overcome, again by using DL to transform the MR input images into a standardized MR image [9]. Similarly, radiomic features themselves can be harmonized to achieve better cross-validation in a multi-centric setting [57].

 

 

Transformer Models and Large Language Models (LLMs)

 

Large language models (LLMs), built on transformer architectures, are increasingly explored in nuclear medicine research for tasks such as image interpretation, report drafting, and clinical decision support. While not yet used in clinical practice, studies have demonstrated their ability to analyze PET and SPECT imaging data, assist in correlating findings with patient histories, and suggest diagnostic insights. Recent research also shows that LLMs can successfully answer board exam questions in radiology and nuclear cardiology, highlighting their potential in medical education and decision support. Additionally, vision transformers (ViTs) are being investigated for nuclear imaging analysis, showing promise in improving automated lesion detection and classification. As development continues, transformers may contribute to reducing diagnostic variability and streamlining workflows in nuclear medicine. Transformers are also ideal candidates for building multimodal foundation models that, instead of making task-specific predictions, are capable of conducting a wide range of tasks, including object detection and classification across various disease types. While the promise of foundation models is clearly present [89], to date, their practical applicability in light of overall prediction performance and resource needs remains a subject of debate. Since the field of LLMs and other transformer architectures is evolving rapidly, it is currently unknown which particular architecture or model scheme is best suited to build high-performing, clinically applicable foundation models.
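
For readers unfamiliar with the transformer building block behind LLMs and ViTs, the sketch below implements plain scaled dot-product self-attention on a toy set of patch embeddings; real models stack many multi-head attention layers with normalization and feed-forward blocks, so this is a conceptual illustration only.

```python
# Minimal sketch: scaled dot-product self-attention, the core operation of transformer models.
import torch
import torch.nn.functional as F

def self_attention(x: torch.Tensor, w_q, w_k, w_v) -> torch.Tensor:
    """x: (sequence_length, embed_dim) token or image-patch embeddings."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / (k.shape[-1] ** 0.5)      # pairwise similarity between tokens/patches
    weights = F.softmax(scores, dim=-1)          # attention weights
    return weights @ v                           # weighted mixture of value vectors

embed_dim = 32
tokens = torch.rand(16, embed_dim)               # e.g. 16 image patches from a PET slice
w_q, w_k, w_v = (torch.rand(embed_dim, embed_dim) for _ in range(3))
print(self_attention(tokens, w_q, w_k, w_v).shape)   # torch.Size([16, 32])
```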

 

 

Best Practices

 

Standardized software tools

 

Standardized tools play an important role in facilitating the universal applicability of predictive models by promoting reproducibility. Even though custom frameworks are sometimes used for data analysis in the field of nuclear medicine, a broad range of free and open-source software is available that can help to standardize analysis workflows. The most commonly known AI frameworks for developing DL-based predictive models include TensorFlow [58], Keras [59] and PyTorch [60]. For radiomics-driven analysis, standardized frameworks include PyRadiomics [61], LIFEx [62], MITK [63], and MPRAD [64]. Additionally, there is a variety of tools and libraries for general-purpose ML, including Scikit-learn [65] for Python and rpart [66] as well as caret [67] for R. Oftentimes, custom code is required to use and extend pre-existing, standardized frameworks. To make maximum use of these implementations, they should be documented thoroughly and shared with the research community.
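
As an example of such a standardized workflow, the sketch below extracts handcrafted features with PyRadiomics [61]; the file paths are placeholders, and the settings shown (bin width, resampling) are assumptions that should be adapted to the imaging protocol of the study.

```python
# Minimal sketch: handcrafted radiomic feature extraction with PyRadiomics (placeholder paths).
from radiomics import featureextractor

settings = {
    "binWidth": 25,                      # intensity discretization (assumed value)
    "resampledPixelSpacing": [2, 2, 2],  # isotropic resampling in mm (assumed value)
}
extractor = featureextractor.RadiomicsFeatureExtractor(**settings)
extractor.disableAllFeatures()
extractor.enableFeatureClassByName("firstorder")     # intensity statistics
extractor.enableFeatureClassByName("glcm")           # texture features

# Image and delineation mask in any format readable by SimpleITK (e.g. NIfTI); paths are hypothetical.
features = extractor.execute("patient001_pet.nii.gz", "patient001_tumour_mask.nii.gz")
radiomic_values = {k: v for k, v in features.items() if not k.startswith("diagnostics")}
print(len(radiomic_values), "features extracted")
```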

 

Standardized imaging protocols

 

Another equally important target for standardization is (multi-centric) imaging protocols, as the repeatability of the extracted ML features can only be guaranteed if a unified and AI-friendly protocol is followed during image acquisition. As an example, optimal PET protocol settings that minimize multi-centric variations of radiomic features have been presented in [68]. Furthermore, ComBat feature-domain harmonization has been proposed to deal with multi-centric radiomic variations [69]. Besides embracing existing EANM guidelines and EANM Research Ltd. (EARL) accreditation programs, future EANM guidelines and EARL accreditation programs should also pursue such AI-driven requirements [99].
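
The snippet below gives a deliberately simplified, per-centre location/scale alignment of radiomic features to illustrate the idea of feature-domain harmonization; the ComBat method proposed in [69] additionally applies empirical Bayes estimation and can preserve biological covariates, so a dedicated implementation should be used in practice.

```python
# Simplified per-centre alignment of radiomic features (not the full ComBat method of [69]).
import pandas as pd

def naive_harmonize(features: pd.DataFrame, centre: pd.Series) -> pd.DataFrame:
    """Re-centre and re-scale each feature per centre towards the pooled distribution."""
    pooled_mean, pooled_std = features.mean(), features.std()
    harmonized = features.copy()
    for site in centre.unique():
        idx = centre == site
        site_mean, site_std = features.loc[idx].mean(), features.loc[idx].std()
        harmonized.loc[idx] = (features.loc[idx] - site_mean) / site_std * pooled_std + pooled_mean
    return harmonized

# Toy feature table with a per-patient acquisition-centre label (hypothetical values).
df = pd.DataFrame({"glcm_contrast": [1.0, 1.2, 0.9, 3.1, 3.4, 2.9],
                   "suv_max": [5.0, 6.1, 5.5, 9.8, 10.4, 9.1]})
centres = pd.Series(["A", "A", "A", "B", "B", "B"])
print(naive_harmonize(df, centres).round(2))
```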

 

 

Handling of Limited Data

 

Some tools focus on handling small amounts of data and improving the generalizability of the resulting predictive models. Data augmentation [22], for example, consists of generating additional synthetic data with the same patterns as the native images. Simple data augmentation techniques include procedures like flipping, rotation, and translation of the input images. More sophisticated techniques employ methods like generative adversarial networks (GANs) to create completely new synthetic images that preserve key patterns [10,70,71]. In nuclear medicine, synthetic imaging data have been shown to be indistinguishable from real images by expert readers and have been used to improve diagnostic models. It should be noted that data augmentation must be restricted to the images used to train a predictive model, not to those used for its validation or testing. Another technique often applied successfully to small amounts of data is transfer learning [72,73]. Transfer learning is a general ML concept; a particularly useful instance is the adaptation of a DL model that has previously been trained on a larger amount of data. The principle is to reuse the first layers of the network trained with a large amount of data, since the features extracted by these first layers capture generalizable patterns such as dots or edges, even across domains (including from non-medical to medical images). Shin et al. demonstrated the benefit of relying on transfer learning from non-medical images for computer-aided detection (CADe) problems, consistently achieving better performance compared to training the networks from scratch [74]. Depending on the similarity of the data and of the previous model, different numbers of layers may be transferred.
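
The following sketch illustrates both ideas under stated assumptions: simple geometric augmentation with torchvision transforms, and transfer learning that freezes the early layers of an ImageNet-pretrained backbone (here a torchvision ResNet-18, chosen only as an example) while training a new task-specific head.

```python
# Minimal sketch: simple augmentation and layer reuse for transfer learning.
import torch.nn as nn
from torchvision import models, transforms

# Simple data augmentation: random flips and small rotations, applied to training images only.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
])

# Transfer learning: reuse and freeze the early, general-purpose layers of a pretrained network.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pretrained on ImageNet
for param in backbone.parameters():
    param.requires_grad = False                     # freeze all pretrained layers

# Replace the classification head for a new binary task (e.g. normal vs. abnormal scan);
# only this layer is optimized when fine-tuning on the small medical dataset.
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
print(trainable, "trainable parameters out of", sum(p.numel() for p in backbone.parameters()))
```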

 

 

Explainable AI

 

Overall, the field of AI is currently shifting from the use of black-box models to interpretable analysis pipelines. Current techniques to uncover the features driving predictive models include activation maps, filter visualizations, maximum activation maps, and feature weighting [75]. However, care must be taken when interpreting the results of these techniques alone [75,76].

 

Explainable AI (XAI) aims to provide transparency in AI-driven decision-making, making it a valuable tool in medical contexts. However, current methods face significant challenges. Many existing XAI techniques, such as heat map-based techniques (e.g. class activation maps) and feature attributions (e.g. Shapley additive explanations), offer only superficial explanations that may not reliably reflect how a model reaches its conclusions. Research has shown that clinicians and AI users often misinterpret these explanations, potentially leading to misplaced trust or overconfidence in AI predictions. Instead of relying solely on explainability, some argue that rigorous validation should be prioritized to ensure AI safety and reliability. This may be achieved through external testing and real-world performance assessments. While XAI remains valuable for model auditing and bias detection, its role in clinical decision-making is still uncertain and requires further refinement before it can be effectively integrated into medical practice.
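
As a hedged example of the feature-attribution techniques mentioned above, the sketch below computes SHAP values for a gradient-boosting model trained on synthetic data (assuming the shap package is installed); as the text cautions, such post-hoc explanations describe the model, not the underlying biology, and do not replace rigorous external validation.

```python
# Minimal sketch: SHAP-based feature attribution for a tree-based model (synthetic data).
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)        # outcome driven mainly by the first two features

model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = np.asarray(explainer.shap_values(X))   # per-patient, per-feature contributions

# Mean absolute SHAP value per feature gives a global ranking of feature influence;
# this explains the model's behaviour, not the underlying biology.
ranking = np.abs(shap_values).mean(axis=0)
print("most influential feature indices:", np.argsort(ranking)[::-1][:3])
```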

 

 

Performance evaluation scheme

 

The choice of performance metrics is critical for communicating and comparing the outcomes of ML-based studies. Oftentimes, the most effective choice is to report multiple metrics, such as AUC, (balanced) accuracy, sensitivity, specificity, positive predictive value, and negative predictive value, to show the model’s capabilities from as many angles as possible. In addition, it should be reported how these metrics were obtained, for example the cross-validation scheme. In the ideal case of a sufficiently large dataset, the evaluation scheme requires the separation of the available dataset into three groups: a training set, a validation set, and an independent test set. The training set is used for building the model. Several models with distinct hyperparameters can be built using this training set, and the resulting models can be validated by obtaining their predictive performance on the validation set. However, as knowledge from the validation set is incorporated into the model selection, another independent test set has to be employed. Consequently, the performance of the model with the best validation performance is then evaluated with the independent test set. The resulting model must not be tuned any further based on its performance on the given test set, as this would lead to overfitting towards the test set and consequently to an overestimation of the model performance. If the model is to be improved any further, another dataset must be added for its evaluation.
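
A minimal sketch of this three-way scheme on synthetic data is given below: candidate models are compared on the validation set, the selected model is evaluated exactly once on the untouched test set, and several complementary metrics are reported.

```python
# Minimal sketch: train/validation/test evaluation with multiple reported metrics (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 25))
y = rng.integers(0, 2, size=300)

# 60% training, 20% validation, 20% independent test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

# Model selection: choose the hyperparameter with the best validation AUC.
candidates = {d: RandomForestClassifier(max_depth=d, random_state=0).fit(X_train, y_train)
              for d in (2, 5, 10)}
best_depth = max(candidates, key=lambda d: roc_auc_score(y_val, candidates[d].predict_proba(X_val)[:, 1]))

# Single, final evaluation on the untouched test set; no further tuning afterwards.
best_model = candidates[best_depth]
y_prob = best_model.predict_proba(X_test)[:, 1]
y_pred = best_model.predict(X_test)
print({
    "AUC": round(roc_auc_score(y_test, y_prob), 2),
    "balanced accuracy": round(balanced_accuracy_score(y_test, y_pred), 2),
    "sensitivity": round(recall_score(y_test, y_pred), 2),
    "PPV": round(precision_score(y_test, y_pred), 2),
})
```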

 

Social and Ethical Considerations

 

AI methods seek to solve individual problems within one specific task. While they may excel at interpreting image and contextual information, they are so far not able to make associations the way a human brain does and cannot replace clinicians in all the tasks they perform [12]. Visvikis et al., and also Bosbach et al., conclude that AI has not yet achieved the same level of performance as a human expert in all situations, and that a fully artificial nuclear medicine physician therefore still belongs to the domain of science fiction. However, the role of physicians, including nuclear medicine physicians, is likely to evolve as these new techniques are integrated into their practice [77].

 

 

Improved quality of diagnostics and therapy through CDS

 

AI models are increasingly being developed to be less of a black box lacking interpretability and transparency, which was formerly the most important reason for patients and clinicians to view this technology sceptically. It is indeed understandable to mistrust unfamiliar interfaces and to hesitate to give a machine or mathematical algorithm the responsibility for life-critical decisions [9,78]. This is also a reason why current research focuses on supportive systems rather than systems that make decisions autonomously, such as self-driving cars. Until many more comparisons between physicians and predictive machines have been made, medicine will remain a matter of the human plus the machine rather than the human versus the machine [79]. Moreover, uncertainty quantification is needed to bring confidence and credibility to the outputs of AI methods. For this reason, a CDS should be seen as an extended diagnostic tool, much like a stethoscope, that a clinician can utilize when judging a therapeutic decision. In parallel, the vast and heterogeneous data continuously generated in clinics represent a tremendous asset for both patient care and research. The rapidly evolving field of (bio)medical informatics has contributed a wealth of concepts, algorithms, and standards to harness this potential. However, the intricate relationships among various data sources, the specialized terminologies, and the myriad implementations across institutions pose significant hurdles for those seeking to engage with these data. Recent viewpoints from medical informatics research in Germany have outlined a set of 10 critical topics aimed at enhancing interdisciplinary communication among physicians, computational experts, experimentalists, students, and patient representatives [90]. This framework is designed to lower barriers to entry and catalyze collaborations across multiple levels.

 

Ribeiro et al. demonstrated that model explanations are very useful in trust-related tasks in the textual and image domains for both expert and non-expert users (e.g. deciding between models, assessing trust, improving or rejecting untrustworthy models, and getting meaningful insights into predictions) [80]. However, interpreting a model solely on a technical level is not the same as interpreting its decision in terms of the underlying biology and therapeutic consequences. It is nonetheless a good start for clinicians and patients to find explanations and gain trust in the almost inevitable AI model predictions [81]. An important aspect that might be considered as an extension is the combination of AI approaches with traditional, research-oriented mechanistic models (e.g. in vivo mouse models and in vitro cell experiments such as spatial transcriptomics) that are used to identify the origin of a disease and not only to predict its outcome, because reliable decisions require a proper investigation of its causes [82].

 

Changing the physician-patient relationship with AI supported decisions

 

The great aim is that integrative AI will allow clinicians to spend more time on personal discussions with patients, while leaving time-consuming statistical calculations and predictions to the CDS [83]. Having more time at the patient’s side could therefore lead to better care, which enhances the patient trust that is foundational to the relationship between medical practitioners and patients [84]. However, physicians also need to ensure that an AI-assisted CDS does not obstruct the patient-physician relationship, as they have to realize that the legal and moral responsibility for the decisions made still lies with them. Thus, implementers may need to ensure that physicians are adequately trained on the benefits and pitfalls of AI-assisted CDS and apply them in practice to augment, rather than replace, their clinical decision-making capabilities and duties to patients [84].

 

Figure 1: A suitable AI-assisted decision-making process based on the combined predictive performance of the clinician and the Clinical Decision Support System (CDS) to evaluate and integrate heterogeneous patient data. The key parameters range between 0 (no or low information yield) and + (high information yield), representing the varying performance of the individual clinician and the CDS. EHR: Electronic Health Record.

 

To be successful and accepted, a full degree of information transparency should be provided to patients about the features involved, the limitations, and the suggestions of the AI systems that assist clinicians with their decision-making [85]. This would be an extension of the current classic formulation of informed consent, which reflects a disclosure of all relevant information during the decision process (e.g. the information at hand to accept or reject a diagnosis and to consent to a proposed therapy plan).

 

However, realizing these benefits will require a free and rapid flow of information from the Electronic Health Record (EHR) to the CDS platform and into reportable outputs that can be validated and disseminated to others outside the patient-physician relationship. This will require fundamental trade-offs with the control and supervision that patients have over the information contained in the EHR [84]. To circumvent this, researchers and administrators could use aggregated, de-identified data for their analyses. However, it must be noted that no data can be truly de-identified, especially in the era of high-quality imaging and molecular deep sequencing [86].

 

 

References

 

1. Dilsizian SE, Siegel EL. Artificial Intelligence in Medicine and Cardiac Imaging: Harnessing Big Data and Advanced Computing to Provide Personalized Medical Diagnosis and Treatment. Curr Cardiol Rep 2013;16:441. https://doi.org/10.1007/s11886-013-0441-8.

 

2. Cal-Gonzalez J, Rausch I, Shiyam Sundar LK, Lassen ML, Muzik O, Moser E, et al. Hybrid Imaging: Instrumentation and Data Processing. Front Phys 2018;6. https://doi.org/10.3389/fphy.2018.00047.

 

3. Wissing MD, van Leeuwen FWB, van der Pluijm G, Gelderblom H. Radium-223 chloride: Extending life in prostate cancer patients by treating bone metastases. Clin Cancer Res 2013;19:5822–7. https://doi.org/10.1158/1078-0432.CCR-13-1896.

 

4.   Wang F, Casalino LP, Khullar D. Deep Learning in Medicine-Promise, Progress, and Challenges. JAMA Intern Med 2019;179:293–4. https://doi.org/10.1001/jamainternmed.2018.7117.

 

5.   Kotsiantis SB. Supervised Machine Learning: A Review of Classification Techniques n.d.:20.

 

6.   Olson RS, Cava WL, Mustahsan Z, Varik A, Moore JH. Data-driven advice for applying machine learning to bioinformatics problems. Biocomputing 2018, WORLD SCIENTIFIC; 2017, p. 192–203. https://doi.org/10.1142/9789813235533_0018.

 

7.   Tarca AL, Carey VJ, Chen X, Romero R, Drăghici S. Machine Learning and Its Applications to Biology. PLOS Computational Biology 2007;3:e116. https://doi.org/10.1371/journal.pcbi.0030116.

 

8.   Maaten L van der, Hinton G. Visualizing Data using t-SNE. Journal of Machine Learning Research 2008;9:2579–605.

 

9.   Gao Y, Liu Y, Wang Y, Shi Z, Yu J. A Universal Intensity Standardization Method Based on a Many-to-One Weak-Paired Cycle Generative Adversarial Network for Magnetic Resonance Images. IEEE Transactions on Medical Imaging 2019;38:2059–69. https://doi.org/10.1109/TMI.2019.2894692.

 

10. Frid-Adar M, Klang E, Amitai M, Goldberger J, Greenspan H. Synthetic data augmentation using GAN for improved liver lesion classification. 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), 2018, p. 289–93. https://doi.org/10.1109/ISBI.2018.8363576.

11.   Chen M, Shi X, Zhang Y, Wu D, Guizani M. Deep Features Learning for Medical Image Analysis with Convolutional Autoencoder Neural Network. IEEE Transactions on Big Data 2017:1–1. https://doi.org/10.1109/TBDATA.2017.2717439.

 

12.   Visvikis D, Cheze Le Rest C, Jaouen V, Hatt M. Artificial intelligence, machine (deep) learning and radio(geno)mics: definitions and nuclear medicine imaging applications. Eur J Nucl Med Mol Imaging 2019;46:2630–7. https://doi.org/10.1007/s00259-019-04373-w.

 

13.   Cook GJR, Goh V. What can artificial intelligence teach us about the molecular mechanisms underlying disease? Eur J Nucl Med Mol Imaging 2019;46:2715–21. https://doi.org/10.1007/s00259-019-04370-z

 

14.   Hatt M, Tixier F, Visvikis D, Rest CCL. Radiomics in PET/CT: More Than Meets the Eye? J Nucl Med 2017;58:365–6. https://doi.org/10.2967/jnumed.116.184655.

 

15.   Lambin P, Leijenaar RTH, Deist TM, Peerlings J, de Jong EEC, van Timmeren J, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nature Reviews Clinical Oncology 2017;14:749–762. https://doi.org/10.1038/nrclinonc.2017.141.

 

16. Valentin, J. ICRP Publication 103 The 2007 Recommendations of the International Commission on Radiological Protection. ICRP 103 2007;37.

17.   Kratz J, Strasser C. Data publication consensus and controversies. F1000Res 2014;3:94. https://doi.org/10.12688/f1000research.3979.3.

 

18.   Kalendralis P, Shi Z, Traverso A, Choudhury A, Sloep M, Zhovannik I, et al. FAIR-compliant clinical, radiomics and DICOM metadata of RIDER, interobserver, Lung1 and head-Neck1 TCIA collections. Med Phys 2020. https://doi.org/10.1002/mp.14322.

 

19.   Panch T, Mattie H, Celi LA. The “inconvenient truth” about AI in healthcare. Npj Digital Medicine 2019;2:1–3. https://doi.org/10.1038/s41746-019-0155-4.

 

20.   Zhang L, Yang H, Jiang Z. Imbalanced biomedical data classification using self-adaptive multilayer ELM combined with dynamic GAN. BioMedical Engineering OnLine 2018;17:181. https://doi.org/10.1186/s12938-018-0604-3.

 

21.   Yu H, Hong S, Yang X, Ni J, Dan Y, Qin B. Recognition of Multiple Imbalanced Cancer Types Based on DNA Microarray Data Using Ensemble Classifiers. BioMed Research International 2013;2013:e239628. https://doi.org/10.1155/2013/239628.

 

22.   Shorten C, Khoshgoftaar TM. A survey on Image Data Augmentation for Deep Learning. Journal of Big Data 2019;6:60. https://doi.org/10.1186/s40537-019-0197-0.

 

23.   Wibmer AG, Hricak H, Ulaner GA, Weber W. Trends in oncologic hybrid imaging. European Journal of Hybrid Imaging 2018;2:1. https://doi.org/10.1186/s41824-017-0019-6.

 

24.   Papp L, Spielvogel CP, Rausch I, Hacker M, Beyer T. Personalizing Medicine Through Hybrid Imaging and Medical Big Data Analysis. Front Phys 2018;6. https://doi.org/10.3389/fphy.2018.00051.

 

25.   Milchenko M, Marcus D. Obscuring Surface Anatomy in Volumetric Imaging Data. Neuroinform 2013;11:65–75. https://doi.org/10.1007/s12021-012-9160-3.

 

26.   Allen B, Seltzer SE, Langlotz CP, Dreyer KP, Summers RM, Petrick N, et al. A Road Map for Translational Research on Artificial Intelligence in Medical Imaging: From the 2018 National Institutes of Health/RSNA/ACR/The Academy Workshop. Journal of the American College of Radiology 2019;16:1179–89. https://doi.org/10.1016/j.jacr.2019.04.014.

 

27.   Avanzo M, Stancanello J, Naqa IE. Beyond imaging: The promise of radiomics. Physica Medica: European Journal of Medical Physics 2017;38:122–39. https://doi.org/10.1016/j.ejmp.2017.05.071.

 

28.   Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Medical Image Analysis 2017;42:60–88. https://doi.org/10.1016/j.media.2017.07.005.

 

29.   Zhang C, Bengio S, Hardt M, Recht B, Vinyals O. Understanding deep learning requires rethinking generalization. ArXiv:161103530 [Cs] 2017.

 

30.   Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1997;1:67–82. https://doi.org/10.1109/4235.585893.

 

31.   Yu-Chi Ho, Pepyne DL. Simple explanation of the no free lunch theorem of optimization. Proceedings of the 40th IEEE Conference on Decision and Control (Cat. No.01CH37228), vol. 5, 2001, p. 4409–14 vol.5. https://doi.org/10.1109/CDC.2001.980896.

 

32.   Mohseni S, Zarei N, Ragan ED. A Multidisciplinary Survey and Framework for Design and Evaluation of Explainable AI Systems. ArXiv:181111839 [Cs] 2020. https://doi.org/10.1145/3387166.

 

33.   Li T, Tang W, Zhang L. Monte Carlo cross-validation analysis screens pathway cross-talk associated with Parkinson’s disease. Neurol Sci 2016;37:1327–33. https://doi.org/10.1007/s10072-016-2595-9.

 

34.   Sollini M, Antunovic L, Chiti A, Kirienko M. Towards clinical application of image mining: a systematic review on artificial intelligence and radiomics. Eur J Nucl Med Mol Imaging 2019;46:2656–72. https://doi.org/10.1007/s00259-019-04372-x.

 

35.   Hatt M, Lucia F, Schick U, Visvikis D. Multicentric validation of radiomics findings: challenges and opportunities. EBioMedicine 2019;47:20–1. https://doi.org/10.1016/j.ebiom.2019.08.054.

 

36.   Park SH, Han K. Methodologic Guide for Evaluating Clinical Performance and Effect of Artificial Intelligence Technology for Medical Diagnosis and Prediction. Radiology 2018;286:800–9. https://doi.org/10.1148/radiol.2017171920.

 

37.   Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 2019;1:206–15. https://doi.org/10.1038/s42256-019-0048-x.

 

38.   Nensa F, Demircioglu A, Rischpler C. Artificial Intelligence in Nuclear Medicine. J Nucl Med 2019;60:29S-37S. https://doi.org/10.2967/jnumed.118.220590.

 

39.   Zaharchuk G, Gong E, Wintermark M, Rubin D, Langlotz CP. Deep Learning in Neuroradiology. AJNR Am J Neuroradiol 2018;39:1776–84. https://doi.org/10.3174/ajnr.A5543.

 

40.   Catana C. The Dawn of a New Era in Low-Dose PET Imaging. Radiology 2019;290:657–8. https://doi.org/10.1148/radiol.2018182573.

 

41.   Guo J, Gong E, Fan AP, Goubran M, Khalighi MM, Zaharchuk G. Predicting 15O-Water PET cerebral blood flow maps from multi-contrast MRI using a deep convolutional neural network with evaluation of training cohort bias: Journal of Cerebral Blood Flow & Metabolism 2019. https://doi.org/10.1177/0271678X19888123.

 

42.   Wei W, Poirion E, Bodini B, Durrleman S, Ayache N, Stankoff B, et al. Learning Myelin Content in Multiple Sclerosis from Multimodal MRI through Adversarial Training. ArXiv:180408039 [Cs] 2018;11072:514–22. https://doi.org/10.1007/978-3-030-00931-1_59.

 

43.   Hemmen HV, Massa H, Hurley S, Cho S, Bradshaw T, McMillan A. A deep learning-based approach for direct whole-body PET attenuation correction. J Nucl Med 2019; 60:569–569.

 

44.   Hainc N, Federau C, Stieltjes B, Blatow M, Bink A, Stippich C. The Bright, Artificial Intelligence-Augmented Future of Neuroimaging Reading. Front Neurol 2017;8. https://doi.org/10.3389/fneur.2017.00489

 

45.   Menze BH, Jakab A, Bauer S, Kalpathy-Cramer J, Farahani K, Kirby J, et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Transactions on Medical Imaging 2015;34:1993–2024. https://doi.org/10.1109/TMI.2014.2377694.

 

46.   Armato SG, McLennan G, Bidaut L, McNitt-Gray MF, Meyer CR, Reeves AP, et al. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans. Med Phys 2011;38:915–31. https://doi.org/10.1118/1.3528204.

 

47.   Maier O, Menze BH, von der Gablentz J, Ḧani L, Heinrich MP, Liebrand M, et al. ISLES 2015 - A public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Med Image Anal 2017;35:250–69. https://doi.org/10.1016/j.media.2016.07.009.

 

48.   Kistler M, Bonaretti S, Pfahrer M, Niklaus R, Büchler P. The Virtual Skeleton Database: An Open Access Repository for Biomedical Research and Collaboration. J Med Internet Res 2013;15. https://doi.org/10.2196/jmir.2930.

 

49.   Kim J, Lee B. Identification of Alzheimer’s disease and mild cognitive impairment using multimodal sparse hierarchical extreme learning machine. Human Brain Mapping 2018;39:3728–41. https://doi.org/10.1002/hbm.24207.

 

50.   Katako A, Shelton P, Goertzen AL, Levin D, Bybel B, Aljuaid M, et al. Machine learning identified an Alzheimer’s disease-related FDG-PET pattern which is also expressed in Lewy body dementia and Parkinson’s disease dementia. Scientific Reports 2018;8:13236. https://doi.org/10.1038/s41598-018-31653-6.

 

51.   Liu M, Cheng D, Yan W, Alzheimer’s Disease Neuroimaging Initiative. Classification of Alzheimer’s Disease by Combination of Convolutional and Recurrent Neural Networks Using FDG-PET Images. Front Neuroinform 2018;12:35. https://doi.org/10.3389/fninf.2018.00035.

 

52.   Ding Y, Sohn JH, Kawczynski MG, Trivedi H, Harnish R, Jenkins NW, et al. A Deep Learning Model to Predict a Diagnosis of Alzheimer Disease by Using 18F-FDG PET of the Brain. Radiology 2019;290:456–64. https://doi.org/10.1148/radiol.2018180958.

 

53.   Kim DH, Wit H, Thurston M. Artificial intelligence in the diagnosis of Parkinson’s disease from ioflupane-123 single-photon emission computed tomography dopamine transporter scans using transfer learning. Nucl Med Commun 2018;39:887–93. https://doi.org/10.1097/MNM.0000000000000890.

 

54.   Papp L, Pötsch N, Grahovac M, Schmidbauer V, Woehrer A, Preusser M, et al. Glioma Survival Prediction with Combined Analysis of In Vivo 11C-MET PET Features, Ex Vivo Features, and Patient Features by Supervised Machine Learning. J Nucl Med 2018;59:892–9. https://doi.org/10.2967/jnumed.117.202267.

 

55.   Xiong J, Yu W, Ma J, Ren Y, Fu X, Zhao J. The Role of PET-Based Radiomic Features in Predicting Local Control of Esophageal Cancer Treated with Concurrent Chemoradiotherapy. Scientific Reports 2018;8:9902. https://doi.org/10.1038/s41598-018-28243-x.

 

56.   Milgrom SA, Elhalawani H, Lee J, Wang Q, Mohamed ASR, Dabaja BS, et al. A PET Radiomics Model to Predict Refractory Mediastinal Hodgkin Lymphoma. Sci Rep 2019;9:1322. https://doi.org/10.1038/s41598-018-37197-z.

 

57.   Lucia F, Visvikis D, Vallières M, Desseroit M-C, Miranda O, Robin P, et al. External validation of a combined PET and MRI radiomics model for prediction of recurrence in cervical cancer patients treated with chemoradiotherapy. Eur J Nucl Med Mol Imaging 2019;46:864–77. https://doi.org/10.1007/s00259-018-4231-9.

 

58.   Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, et al. TensorFlow: A system for large-scale machine learning. n.d. https://doi.org/10.48550/arXiv.1605.08695.

 

59.   Team K. Keras documentation: Keras FAQ 2015. https://keras.io/getting_started/faq/#how-should-i-cite-keras (accessed August 10, 2020).

 

60.   Ketkar N. Deep Learning with Python: A Hands-on Introduction. Apress; 2017. https://doi.org/10.1007/978-1-4842-2766-4.

 

61.   Griethuysen JJM van, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, et al. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res 2017;77:e104–7. https://doi.org/10.1158/0008-5472.CAN-17-0339.

 

62.   Nioche C, Orlhac F, Boughdad S, Reuzé S, Goya-Outi J, Robert C, et al. LIFEx: A Freeware for Radiomic Feature Calculation in Multimodality Imaging to Accelerate Advances in the Characterization of Tumor Heterogeneity. Cancer Res 2018;78:4786–9. https://doi.org/10.1158/0008-5472.CAN-18-0125.

 

63.   Götz M, Nolden M, Maier-Hein K. MITK Phenotyping: An open-source toolchain for image-based personalized medicine with radiomics. Radiother Oncol 2019;131:108–11. https://doi.org/10.1016/j.radonc.2018.11.021.

 

64.   Parekh VS, Jacobs MA. MPRAD: A Multiparametric Radiomics Framework. Breast Cancer Res Treat 2020;180:407–21. https://doi.org/10.1007/s10549-020-05533-5.

 

65.   Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 2011;12:2825–30.

 

66.   Therneau T, Atkinson B, port BR (producer of the initial R, maintainer 1999-2017). rpart: Recursive Partitioning and Regression Trees. 2019.

 

67.   Kuhn M. Building Predictive Models in R Using the caret Package. Journal of Statistical Software 2008;28:1–26. https://doi.org/10.18637/jss.v028.i05.

 

68.   Papp L, Rausch I, Grahovac M, Hacker M, Beyer T. Optimized Feature Extraction for Radiomics Analysis of 18F-FDG PET Imaging. J Nucl Med 2019;60:864–72. https://doi.org/10.2967/jnumed.118.217612.

 

69.   Orlhac F, Boughdad S, Philippe C, Stalla-Bourdillon H, Nioche C, Champion L, et al. A Postreconstruction Harmonization Method for Multicenter Radiomic Studies in PET. J Nucl Med 2018;59:1321–8. https://doi.org/10.2967/jnumed.117.199935.

70.   Zhu J-Y, Park T, Isola P, Efros AA. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. 2017 IEEE International Conference on Computer Vision (ICCV), Venice: IEEE; 2017, p. 2242–51. https://doi.org/10.1109/ICCV.2017.244.

 

71.   Shin H-C, Tenenholtz NA, Rogers JK, Schwarz CG, Senjem ML, Gunter JL, et al. Medical Image Synthesis for Data Augmentation and Anonymization using Generative Adversarial Networks. ArXiv:180710225 [Cs, Stat] 2018. https://doi.org/10.48550/arXiv.1807.10225.

 

72.   Pan SJ, Yang Q. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering 2010;22:1345–59. https://doi.org/10.1109/TKDE.2009.191.

 

73.   Shin H-C, Roth HR, Gao M, Lu L, Xu Z, Nogues I, et al. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning. IEEE Trans Med Imaging 2016;35:1285–98. https://doi.org/10.1109/TMI.2016.2528162.

 

74. McCollough CH, Leng S, Yu L, Cody DD, Boone JM, McNitt-Gray MF. CT dose index and patient dose: they are not the same thing. Radiology 2011;259:311–6. https://doi.org/10.1148/radiol.11101800 

 

75.   Zhang Q, Zhu S. Visual interpretability for deep learning: a survey. Frontiers Inf Technol Electronic Eng 2018;19:27–39. https://doi.org/10.1631/FITEE.1700808.

 

76.   Gilpin LH, Bau D, Yuan BZ, Bajwa A, Specter M, Kagal L. Explaining Explanations: An Overview of Interpretability of Machine Learning. ArXiv:180600069 [Cs, Stat] 2019. https://doi.org/10.1109/DSAA.2018.00018.

 

77.   Hustinx R. Physician centred imaging interpretation is dying out — why should I be a nuclear medicine physician? Eur J Nucl Med Mol Imaging 2019;46:2708–14. https://doi.org/10.1007/s00259-019-04371-y.

 

78.   Begoli E, Bhattacharya T, Kusnezov D. The need for uncertainty quantification in machine-assisted medical decision making. Nature Machine Intelligence 2019;1:20–3. https://doi.org/10.1038/s42256-018-0004-1.

 

79.   Liu X, Faes L, Kale AU, Wagner SK, Fu DJ, Bruynseels A, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. The Lancet Digital Health 2019;1:e271–97. https://doi.org/10.1016/S2589-7500(19)30123-2.

 

80.   Ribeiro MT, Singh S, Guestrin C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. ArXiv:160204938 [Cs, Stat] 2016. https://doi.org/10.48550/arXiv.1602.04938 


81.   Avati A, Jung K, Harman S, Downing L, Ng A, Shah NH. Improving Palliative Care with Deep Learning. ArXiv:171106402 [Cs, Stat] 2017. https://doi.org/10.1186/s12911-018-0677-8.