ABSTRACT
As an umbrella term, artificial intelligence (AI) covers machine learning and deep learning. This review aimed to elaborate on these terms to act as a primer for radiologists to learn more about the algorithms commonly used in musculoskeletal radiology. It also aimed to familiarize them with the common practices and issues in the use of AI in this domain.
Main points
• Proficiency in data interpretation and validation will ensure the accuracy and reliability of artificial intelligence (AI) algorithms, benefiting radiologists in their clinical practice.
• Understanding the underlying principles of machine learning (ML) models, such as neural networks and deep learning architectures, is essential for critical appraisal and informed decision-making and has been covered in this article.
• This article also discusses the limitations and potential biases inherent in AI systems, emphasizing the importance of human oversight in clinical decision-making.
• Furthermore, knowledge of regulatory frameworks and ethical considerations surrounding AI adoption in healthcare is imperative to navigate legal and ethical challenges.
• Continuous learning and collaboration with data scientists and AI experts are essential for radiologists to harness the full potential of AI and ML in improving diagnostic accuracy, efficiency, and patient care while upholding professional standards and ethical principles.
Approximately 1.71 billion people worldwide have musculoskeletal (MSK) conditions.1 The need for imaging of MSK disorders is rising in parallel with a growing and progressively aging global population,2 raising the risk of radiologist fatigue and of unmet needs for patients.3, 4 The evolution of MSK radiology traces back to the inception of the field of radiology itself with the discovery of X-rays in 1895. On a separate trajectory, the 1950s witnessed the introduction of the first programming languages and software, prompted by Turing’s5 question, “can machines think?”. However, it was not until 1992, nearly a century later, that these two fields merged, culminating in the first research into artificial intelligence (AI) in radiology.6 Today, AI has become an ever-growing field and is reshaping the world, including medicine, with radiology at the forefront, as evidenced by Food and Drug Administration (FDA)-approved AI-based tools. The first AI-based algorithm was approved by the FDA in 2017. By 2022, radiology accounted for a striking 87% of all FDA-authorized AI-based devices.7 In 2017, MSK applications were the second most common subject of AI-related publications in radiology, second only to neuroradiology.8
Thus far, AI research in radiology has primarily focused on interpretive tasks, including fracture detection, osteoarthritis detection and grading (cartilage and meniscal lesions), bone age determination, osteoporosis and bone quality assessment, tissue/region identification and segmentation, radiographic angle and bone measurements, clinical decision-making on various bone and ligament anomalies, lesion characterization and diagnosis of infectious, oncological, or rheumatological diseases, quantitative analysis and radiomics, and estimation of patient demographics.9 However, AI also offers promising solutions for non-interpretive tasks, which aim to ensure high-quality care and time-efficient outputs for the rising demands on imaging.10, 11 Indeed, non-interpretive tasks, such as protocoling, quality control, and overseeing imaging studies, comprise 44% of a radiologist’s daily workload.12 Yet most of these tasks are overlooked in settings where productivity is assessed mainly by the number of reports produced. Research in emergency radiology shows that for every 1 minute radiologists spend on the phone, report turnaround time increases by approximately 4 minutes.12 It is therefore imperative to create time-efficient solutions to meet the rising demand in the field, and here AI offers revolutionary possibilities.
Therefore, radiologists must embrace a comprehensive understanding of AI and machine learning (ML) to integrate these technologies into their practice effectively, as described in Figure 1. Proficiency in data interpretation and validation will ensure the accuracy and reliability of AI algorithms, benefiting clinical practice. Understanding the underlying principles of ML models, such as neural networks and deep learning (DL) architectures, is essential for critical appraisal and informed decision-making. Radiologists must also grasp the limitations and potential biases inherent in AI systems, which underscores the importance of human oversight in clinical decision-making. Furthermore, knowledge of the regulatory frameworks and ethical considerations surrounding AI adoption in healthcare is imperative to navigate legal and ethical challenges.
Algorithms
Alongside advancements in computational power, computer algorithms, and data availability, AI has gained popularity as a rapidly developing tool that can transform industries. Broadly defined, AI refers to computer systems that can perform assigned tasks, such as learning, decision-making, and problem-solving, with satisfactory or better-than-expected performance within a given context. Subsets of AI include the following: artificial narrow intelligence, which can perform specific tasks well but cannot transfer knowledge; artificial general intelligence, which can transfer knowledge across systems or tasks; and artificial superintelligence, which functions beyond human capability and currently remains largely conceptual.13 Commonly used AI concepts and descriptions are listed in Table 1.
ML essentially entails all techniques that can be employed to train a machine to mimic human performance. In the current context, it refers to the development of algorithms14 that predict discrete labels (classification), continuous quantities (regression), data subgroups (clustering), or important features (dimensionality reduction) based on previous experiences using probability, statistics, and linear algebra. Traditional ML algorithms include linear classifiers, logistic regression, decision trees, and nearest-neighbor searches. Each of these algorithms seeks to learn a mapping between input and output variables by defining decision boundaries between labeled data or by clustering the data.
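As an illustrative sketch only (with made-up coordinates, not drawn from any cited study), a 1-nearest-neighbor classifier shows how such an algorithm maps inputs to labels by comparing a new point against labeled training data:

```python
import numpy as np

# Toy 1-nearest-neighbor classifier: a new point receives the label of
# its closest training example, implicitly defining a decision boundary
# between the two labeled clusters.
train_X = np.array([[0.0, 0.0], [0.2, 0.1],   # cluster labeled 0
                    [1.0, 1.0], [0.9, 1.1]])  # cluster labeled 1
train_y = np.array([0, 0, 1, 1])

def predict(x):
    dists = np.linalg.norm(train_X - x, axis=1)  # distance to each example
    return train_y[np.argmin(dists)]             # label of nearest neighbor
```

A query near the first cluster is assigned label 0, and one near the second cluster label 1.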
DL refers to a subset of ML that utilizes neural networks to learn new high-level feature representations of data for computer vision tasks, such as object segmentation, classification, and detection, with high efficiency.15 Neural networks are composed of multiple layers of interconnected nodes with internal weights, modeled after biological neural systems. The network learns to perform tasks through iterative, complex, non-linear transforms: input data is passed forward through the network to predict a desired output, and the discrepancy between the predicted and expected output is then used to update the internal weights of the nodes so that task performance improves.
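This forward-pass/weight-update cycle can be sketched minimally as follows (a synthetic toy regression problem, not a clinical model; all sizes and learning rates are arbitrary choices for illustration):

```python
import numpy as np

# One-hidden-layer network trained by gradient descent on synthetic data:
# forward pass -> measure discrepancy -> update weights -> repeat.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                   # 32 samples, 4 features
y = X @ np.array([1.0, -2.0, 0.5, 3.0])        # target outputs

W1 = rng.normal(scale=0.1, size=(4, 8))        # internal weights, layer 1
W2 = rng.normal(scale=0.1, size=(8,))          # internal weights, layer 2

losses, lr = [], 0.05
for _ in range(200):
    h = np.tanh(X @ W1)                        # non-linear hidden layer
    pred = h @ W2                              # forward pass: prediction
    err = pred - y                             # discrepancy vs. expected
    losses.append(float(np.mean(err ** 2)))
    # Backward pass: propagate the error to update the weights
    gW2 = h.T @ err / len(y)
    gh = np.outer(err, W2) * (1 - h ** 2)      # through the tanh
    gW1 = X.T @ gh / len(y)
    W2 -= lr * gW2
    W1 -= lr * gW1
```

After iterating, the mean squared discrepancy is lower than at the start, which is exactly the sense in which the network "learns."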
Convolutional neural networks (CNNs) perform convolution operations over local regions using shared convolution weights such that networks achieve translational invariance (i.e., objects can be detected regardless of location). Additional pooling operations down-sample data representations, automatically extracting relevant spatial hierarchical features. Variants of the CNN have modified the underlying network structure to improve versatility and effectiveness. The two-dimensional (2D) U-Net was a significant breakthrough for medical imaging tasks, particularly segmentation. In 2015, Ronneberger et al.16 proposed a unique U-shaped architecture (Figure 2), which down-sampled and up-sampled input images of varying modalities to predict regions of interest with “very good performance,” even after training with a very limited amount of data. Despite their successes, CNNs are prone to overfitting, meaning that CNN-based models do not perform as well on new, unseen data. They also require large amounts of training data and lack interpretability due to their architectural complexity.
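The two building blocks named above, shared-weight convolution and pooling, can be demonstrated on a synthetic image (purely illustrative; the 8x8 image and edge kernel are invented for this sketch). Because the same weights slide over every location, shifting the object simply shifts the response:

```python
import numpy as np

def conv2d(img, kernel):
    # "Valid" 2-D convolution: the same (shared) weights slide over the image
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, s=2):
    # Down-sample by taking the maximum in each s x s block
    H, W = x.shape
    return x[:H - H % s, :W - W % s].reshape(H // s, s, W // s, s).max(axis=(1, 3))

img = np.zeros((8, 8))
img[2:4, 2:4] = 1.0                      # a small bright "object"
shifted = np.roll(img, (2, 2), (0, 1))   # same object, different location
edge = np.array([[1., -1.], [1., -1.]])  # a simple edge-detecting kernel

r1, r2 = conv2d(img, edge), conv2d(shifted, edge)
pooled = max_pool(r1)                    # coarser spatial representation
```

The feature map of the shifted image equals the original map shifted by the same amount, which is the translational property the text describes; pooling then halves each spatial dimension.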
Federated learning proposes a framework to address challenges with model generalizability, with special benefits when using medical data. An aggregate model encapsulates shared model weights from multiple collaborators, each of whom trained the model on a private dataset.17
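The aggregation step can be sketched as a FedAvg-style weighted average (the site weights and dataset sizes below are hypothetical; only weights, never the private images, leave each site):

```python
import numpy as np

# Each collaborating site trains locally and shares only its model weights;
# the server averages them, weighted by local dataset size.
site_weights = [np.array([1.0, 2.0]),
                np.array([3.0, 4.0]),
                np.array([5.0, 0.0])]
site_sizes = [100, 300, 100]            # hypothetical local dataset sizes

total = sum(site_sizes)
aggregate = sum(w * (n / total) for w, n in zip(site_weights, site_sizes))
```

The aggregate model thus reflects every site's data distribution in proportion to its contribution, without the private datasets ever being pooled.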
Generative adversarial networks (GANs) are popular for image-to-image translation and consist of two opposing networks: a generator and a discriminator.18 The generator creates an image to fool the discriminator, while the discriminator attempts to distinguish real from synthetic images.19 Due to the oppositional nature of the network, GANs can be challenging to train and often require careful consideration of hyperparameters. Mode collapse occurs when the generator produces similar images that fail to capture the full distribution of the training data and the discriminator is unable to provide useful feedback to guide training.
Recently, large language models and vision transformers (ViTs)20 have spurred a new wave of innovation. Both of these DL architectures are based on transformers, which consist of an encoder, which extracts meaningful features from input data, and a decoder, which uses those features to generate outputs. Transformers process data as a sequence of tokens, enabling the model to capture global relationships within the data (Figure 3). For ViTs, images are vectorized into tokens, which can be combined with text.21
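The token-to-token mechanism at the heart of a transformer is scaled dot-product attention, sketched here on random toy embeddings (the dimensions and weight matrices are arbitrary illustrative choices, not any published model):

```python
import numpy as np

# Minimal scaled dot-product attention: every token attends to every other
# token, which is how transformers capture global relationships in the data.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))                 # 5 tokens, 8-dim embeddings

Wq, Wk, Wv = (rng.normal(scale=0.3, size=(8, 8)) for _ in range(3))
Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv  # queries, keys, values

scores = Q @ K.T / np.sqrt(K.shape[1])           # token-token similarity
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)    # softmax per token
out = weights @ V                                # each output mixes all tokens
```

Each row of `weights` sums to 1, so every output token is a convex combination of information from the entire sequence rather than a local neighborhood, in contrast to the convolutions described earlier.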
A typical workflow for developing an ML algorithm involves several distinct stages. It begins with problem definition and data collection, where a specific objective is identified and relevant data is gathered. Subsequently, data preprocessing involves cleaning, transforming, and processing the dataset for training. Common preprocessing techniques include image normalization and clipping to achieve favorable image intensity ranges and contrast for ML models. Before model development, data is split into training, validation, and testing subsets, often with balanced distributions of relevant metadata, such as age, for proper evaluation of model performance. During training, models may be prone to overfitting if they are highly sensitive to patterns in the training dataset. The validation dataset allows for the evaluation of model performance during training, while the test set is used only to assess the performance of the final selected model for an unbiased assessment. Next, model selection and training occur, where various algorithms are evaluated and a suitable model is chosen. Existing models may offer excellent zero-shot capabilities such that no modification of model weights is needed. Alternatively, models may be trained for a specific use case by fine-tuning, which involves further training a pre-trained model on a smaller, targeted dataset. After training, the model is evaluated on the test dataset using metrics appropriate to the objectives of the model. Finally, the model is deployed and undergoes monitoring and maintenance to ensure optimal performance over time. This iterative process requires collaboration between domain experts, data scientists, and computer programmers to achieve successful outcomes. Some of the crucial technical terms and metrics used in everyday ML, and what they mean, are listed in Table 2.
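The data-splitting step above can be sketched as follows (the cohort size and 70/15/15 proportions are illustrative assumptions; real splits would also balance metadata such as age):

```python
import numpy as np

# Shuffle once, then split into training/validation/test subsets so the
# test set remains untouched until the final, unbiased evaluation.
rng = np.random.default_rng(42)
n = 200                                  # e.g., 200 imaging studies
idx = rng.permutation(n)                 # shuffle before splitting

n_train, n_val = int(0.7 * n), int(0.15 * n)
train_idx = idx[:n_train]                # used to fit model weights
val_idx = idx[n_train:n_train + n_val]   # monitors overfitting during training
test_idx = idx[n_train + n_val:]         # reserved for the final model only
```

Because the three index sets are disjoint, performance on `test_idx` cannot be inflated by patterns the model memorized from the training data.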
Although AI seems to be an omnipresent tool in current radiology practice, many users remain unfamiliar with the basic concepts, utilities, challenges, processes, and biases associated with it. We aim to provide comprehensive starting content that prepares the community of medical experts to become attuned to the vocabulary and its nuances and to get a sense of how AI can be integrated into their daily MSK radiology practice.
Applications in musculoskeletal radiology
Image acquisition
Imaging acceleration
Extensive research dedicated to reducing the time required to acquire medical images has led to the development of unique data sampling and reconstruction techniques in MSK radiology, primarily for computed tomography (CT) and magnetic resonance imaging (MRI). In particular, MRI is an important modality for diagnosing many MSK conditions, but it suffers from higher cost and longer acquisition times than other modalities. AI-based acceleration techniques aim to acquire fewer samples than the Nyquist criterion would require, though this must be done while accounting for in-domain and domain-shift artifacts. Reconstruction, therefore, is equally essential to ensure that image quality is clinically preserved in rapidly acquired MRI. AI researchers have developed algorithms that achieve both high acceleration for faster imaging and excellent reconstruction with comparable or improved image resolution. Such methodologies have been developed using data-driven guidance, such as compressed sensing or dictionary learning, or physics-guided networks combined with artifact removal.22 These techniques are often modified for problem-specific solutions, including accelerating higher-dimensional 2D or 3D MRI scans, such as dynamic (temporal) MRI.23 AI techniques for the joint optimization of a non-Cartesian k-space sampling trajectory and an image-reconstruction network have been rising in popularity. For example, one such framework, PROJECTOR,24 jointly learns non-Cartesian sampling trajectories while optimizing the reconstruction network. It also ensures that the learned trajectories are compatible with gradient-related hardware constraints. Previous techniques enforced these constraints via penalty terms, but PROJECTOR enforces them via embedded steps that project the learned trajectory onto a feasible set.
Synthesis of images and parametric maps
Another exciting application of AI is the characterization of meaningful tissue maps or images from raw data (Figures 4 and 5). Wu et al.25 proposed CNNs for synthesizing water/fat images from only two echoes instead of multiple. The method achieved high-fidelity output images, a 10-fold acceleration in computation time, and generalizability to unseen organ images and metal artifacts. Zou et al.26 have also proposed reconstructing free-breathing cardiac MRI data and synthesizing cardiac cine movies using manifold learning networks. This enables the generation of synthetic breath-hold cine movies on demand, specifically movies with different inversion contrasts, as well as the estimation of T1 maps at specific respiratory phases. Until recently, the derivation of tissue parameter maps required repeated acquisitions under steady-state conditions and longer scan times.22 However, rapid extraction of such parameters is no longer a challenge thanks to AI-based solutions, such as synthetic mapping of T1, T1ρ, R2*, and T2 relaxation, chemical exchange saturation transfer proton volume fraction and exchange rate, magnetization transfer, and susceptibility. Conventional magnetic resonance fingerprinting (MRF) is regularly used for quantitative parameter estimation. However, it suffers from the computational burden of dictionary generation and pattern matching, a burden that grows exponentially with the number of fitting parameters considered. ML has also been utilized to accelerate both acquisition and reconstruction and thus optimize MRF sequences.22
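To make the idea of parameter mapping concrete, the classical (non-AI) baseline for a single voxel is a mono-exponential fit of the decay S(TE) = S0·exp(-TE/T2); the AI methods above learn to bypass or accelerate exactly this kind of per-voxel estimation. The echo times and "true" T2 below are synthetic, noise-free values chosen purely for illustration:

```python
import numpy as np

# Conventional per-voxel T2 estimation via a log-linear least-squares fit
# of the mono-exponential decay S(TE) = S0 * exp(-TE / T2).
TE = np.array([10.0, 20.0, 40.0, 60.0, 80.0])  # echo times in ms (synthetic)
true_T2, S0 = 40.0, 1000.0
signal = S0 * np.exp(-TE / true_T2)            # simulated voxel signal

# log S = log S0 - TE / T2  ->  ordinary least squares on (TE, log S)
slope, intercept = np.polyfit(TE, np.log(signal), 1)
est_T2 = -1.0 / slope
est_S0 = np.exp(intercept)
```

Repeating this fit for every voxel across many acquisitions is what makes conventional mapping slow, which is the bottleneck the learned approaches address.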
End-to-end design
End-to-end design of reconstruction and segmentation techniques has recently been a major focus in the medical imaging community. Although often addressed separately, these two tasks can benefit from being handled in tandem. Tolpadi et al.27 recently hosted and summarized a challenge entitled “K2S” at the 25th International Conference on Medical Image Computing and Computer-Assisted Intervention (Singapore, 2022). Eight-fold under-sampled raw MRI measurements were provided as training data together with their fully sampled counterparts and segmentation masks (a unique dataset consisting of 300 knee MRI scans accompanied by radiologist-approved tissue segmentation labels). In the testing phase, challenge participants submitted DL models that generated segmentation masks directly from the under-sampled raw data. No correlations were found between the reconstruction and segmentation metrics (Figure 6). Some researchers suggest pre-training segmentation models on “pretext tasks,” in which the model is trained to restore distorted images. Context prediction and context restoration challenges demonstrate that segmentation models can be made robust with such pre-training, particularly when labeled data availability is limited.22
Image post-processing
Registration
Image registration is a critical process in imaging that focuses on the accurate alignment of images, which is necessary for the diagnosis, treatment planning, and monitoring of diseases. However, it is difficult to develop robust algorithms that efficiently and accurately register images of varying resolution and from different modalities. This is particularly challenging in the presence of the significant anatomical variation seen in MSK disease. Conventional registration methods often rely on solving pairwise optimization problems, which can be time-consuming and computationally expensive.28 Recent literature has demonstrated the growing application of AI, in particular DL models, to image registration. CNNs, for instance, have been employed to predict the transformation required to align images; a study by Sokooti et al.29 proposed a CNN-based method for non-rigid registration of 3D chest CT follow-up data. Another novel approach involves spatial transformer networks (STNs), DL models that can learn spatial transformations to align images; an STN has been used for image registration, showing that such models can learn complex transformations from training data.30 Models such as VoxelMorph, a CNN-based unsupervised framework for image registration,31 have also shown promising results. Although VoxelMorph was trained on 3D brain MRI, its architecture can be trained on specific MSK datasets owing to the unsupervised and generalizable nature of such models.
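The conventional pairwise-optimization approach mentioned above can be illustrated with a deliberately tiny example: an exhaustive search over integer translations that minimizes a sum-of-squared-differences (SSD) cost. The images are synthetic; learned methods such as VoxelMorph replace this brute-force search with a network that predicts the transformation directly.

```python
import numpy as np

def register_translation(fixed, moving, max_shift=4):
    # Exhaustively try integer shifts and keep the one that minimizes SSD;
    # this is the "pairwise optimization problem" in its simplest form.
    best, best_shift = np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            candidate = np.roll(moving, (dy, dx), (0, 1))
            ssd = np.sum((fixed - candidate) ** 2)
            if ssd < best:
                best, best_shift = ssd, (dy, dx)
    return best_shift

fixed = np.zeros((16, 16))
fixed[5:8, 6:9] = 1.0                        # a synthetic "structure"
moving = np.roll(fixed, (-3, 2), (0, 1))     # the same structure, misaligned
```

The search recovers the shift (3, -2) that undoes the misalignment; even this toy version scales poorly, which motivates learned, one-shot prediction of the transform.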
Segmentation
Image segmentation is a well-defined problem that involves the delineation of specific regions of interest. As manual image segmentation is both time-consuming and repetitive, the research community has explored AI to improve medical image segmentation workflows with great interest.16 Over the years, various network architectures have been developed to segment MSK structures. One of the most popular CNN models is the U-Net, discussed earlier. It is often utilized for 2D or 3D segmentation tasks, such as identifying muscles, bones, cartilage, menisci, femoral and acetabular regions, and shoulder structures in knee, spine, hip, thigh, and wrist anatomy.32, 33 Usually, the performance of existing segmentation algorithms can only be fairly compared on a case-specific basis, such as by anatomical region, medical imaging acquisition setting, or study population.34
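When such comparisons are made, the standard overlap metric is the Dice similarity coefficient, sketched here on two invented binary masks:

```python
import numpy as np

def dice(pred, truth):
    # Dice = 2 * |intersection| / (|pred| + |truth|); 1.0 = perfect overlap
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * inter / denom if denom else 1.0

truth = np.zeros((8, 8), bool)
truth[2:6, 2:6] = True        # reference mask (e.g., expert annotation)
pred = np.zeros((8, 8), bool)
pred[3:7, 2:6] = True         # model prediction, shifted by one row
```

Here the one-row misalignment yields a Dice of 0.75, while a mask compared with itself scores 1.0; reporting Dice per anatomical region and acquisition setting enables the case-specific comparisons described above.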
DL can establish a useful representation of any object without the prior imposition of user-designed features. This is why the performance of a vertebral body segmentation algorithm relies on the integrity of intervertebral discs and is compromised when disc pathologies are present, unless the model is trained with sufficiently varied data. Identification of a thoracic vertebral body is achieved using both intrinsic features and its proximity to a disc. The disc serves as an extrinsic feature for the vertebral body; in other words, it becomes the landmark that the network learns in the context of spine segmentation (Figure 7a-c). This also explains failures of patch-based approaches: only limited contextual information is passed to the network, which limits performance.
On the positive side, network learning from diverse data may often learn how the images, anatomies, and pathologies are integrated beyond visual perception, suggest new biomarkers as predictors of MSK diseases through image analysis, and potentially overcome the limitations of human perception.
Anomaly detection
Anomaly detection involves identifying abnormal structures or pathologies, such as fractures, tumors, or degenerative diseases, amidst a wide range of normal anatomical variations. To accurately distinguish between benign variants and clinically significant abnormalities, DL models, particularly CNNs, have been implemented due to their ability to learn hierarchical feature representations.35, 36 Autoencoders have also been used for unsupervised anomaly detection: trained to reconstruct their input data, they learn to encode “normal” data patterns and thus produce a markedly different output when they encounter an anomalous data point, highlighting deviations from the norm.37 These models can assist in identifying subtle or complex anomalies that may be missed by the human eye while providing consistent performance, thus reducing variability between different radiologists’ interpretations. Workflow efficiency can be improved by prioritizing cases with potential anomalies identified by AI. However, there is a risk of generating false positives, false negatives, or model hallucinations, leading to unnecessary interventions or missed diagnoses. Radiologists should seek AI tools that balance sensitivity and specificity to minimize false positive and false negative rates.
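The reconstruction-error principle behind autoencoder anomaly detection can be sketched with a PCA projection standing in for a trained autoencoder (PCA is its linear special case; all data here is synthetic). The model is fit only on "normal" samples, so an off-distribution input reconstructs poorly:

```python
import numpy as np

# Reconstruction-based anomaly detection sketch: fit on "normal" data only,
# then flag inputs whose reconstruction error is abnormally large.
rng = np.random.default_rng(1)
basis = rng.normal(size=(2, 10))              # normal data lies near a 2-D subspace
normal = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 10))

mean = normal.mean(axis=0)
_, _, Vt = np.linalg.svd(normal - mean, full_matrices=False)
components = Vt[:2]                           # top-2 principal components

def recon_error(x):
    code = (x - mean) @ components.T          # encode to low-dimensional code
    recon = code @ components + mean          # decode back to input space
    return float(np.sum((x - recon) ** 2))    # reconstruction error

normal_err = recon_error(normal[0])           # small: fits learned patterns
anomaly = rng.normal(size=10) * 5.0           # sample off the normal subspace
anomaly_err = recon_error(anomaly)            # large: deviates from the norm
```

Thresholding the reconstruction error then separates in-distribution inputs from anomalous ones, which is the mechanism the paragraph describes for highlighting deviations.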
Shape modeling
Shape modeling focuses on the accurate representation and analysis of the anatomical structures of the MSK system, with the challenge of capturing the complex geometry and variability of bones and soft tissues; this is essential for surgical pre-operative planning, prosthesis design, and the study of biomechanical properties. Active shape models and statistical shape modeling are common statistical methods to capture the variability of shape across a population and can be used for tasks such as segmentation.38 However, they require a large amount of representative data for accurate modeling and can be sensitive to outliers with large shape deviations (Figure 8).
DL-based methods have been increasingly utilized for shape modeling due to their ability to learn complex, non-linear relationships. CNNs are commonly used due to their ability to process hierarchical features from image data directly. For instance, the U-Net architecture16 and its variants have been extensively used for biomedical image segmentation tasks, providing detailed shape models of various anatomical structures. U-Net’s strength lies in its symmetric expanding path, which allows precise localization, a key factor in accurate shape modeling. Another DL model, V-Net,39 is a 3D variant of U-Net and is used for volumetric medical image segmentation, providing 3D shape models. Both U-Net and V-Net have shown competitive performance compared with traditional methods, with the added advantage of handling large datasets and capturing fine-grained details. DL models have recently been used for shape prediction and generation. For instance, GANs have been employed to generate realistic 3D shapes to synthesize anatomical structures for augmentation and analysis.40 One hidden benefit of an AI-based shape model is the ability to predict changes in MSK structures over time, aiding in prognostic assessments.35
Radiomics
Radiomics, merging the word “radiology” with “-omics” to describe a high-throughput, data-driven approach to characterizing radiological images, involves computer-assisted image analysis in which many quantitative “features,” not readily appreciable to the human eye, are extracted from images. Radiomic features have historically involved mathematical operations on the voxels of an image, converting morphological information about anatomical structure into quantitative values. Over time, the number of features has grown exponentially, making ML techniques, or classifiers, increasingly popular over the past few years for identifying and analyzing radiomic features.41 Support vector machines, random forests, and neural networks have been used to identify and analyze the features most predictive of disease presence, severity, progression, and response to treatment. CNNs are also increasingly being applied to automate feature extraction. However, the clinical utility of radiomics is still being established, and integration into clinical workflows remains a challenge.
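A few first-order radiomic features can be computed directly from the voxel intensities in a region of interest, as in this sketch (the eight intensity values are invented for illustration; real pipelines use standardized feature definitions and many more voxels):

```python
import numpy as np

# First-order radiomic features from the intensities in a region of interest:
# simple mathematical operations on voxels that quantify the histogram.
roi = np.array([10., 12., 12., 14., 15., 15., 15., 60.])  # one outlier voxel

mean = roi.mean()                                # average intensity
std = roi.std()                                  # intensity spread
skewness = np.mean(((roi - mean) / std) ** 3)    # asymmetry of the histogram

# Shannon entropy over a small number of intensity bins (texture heterogeneity)
counts, _ = np.histogram(roi, bins=4)
p = counts[counts > 0] / roi.size
entropy = -np.sum(p * np.log2(p))
```

The single bright outlier voxel drives the skewness positive, an asymmetry a classifier could exploit even though the mean alone looks unremarkable.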
Metal artifact reduction
AI, particularly DL algorithms, is increasingly applied to mitigate metal artifacts in MSK imaging. Metal implants or instruments introduce significant artifacts, particularly in MRI, which can impair diagnostic accuracy and limit the utility of these scans. Current literature points to the use of AI in CT and radiography, but its application in MRI is less explored.42 In the context of MRI, the integration of AI for metal artifact reduction is still in its infancy. Existing techniques without the use of AI, such as multi-acquisition variable-resonance image combination and slice encoding for metal artifact correction (SEMAC), have limitations in their application and efficacy. Studies have used neural networks to accelerate SEMAC MRI while maintaining comparable metal artifact suppression,43 as well as using unsupervised learning or attention maps from deep neural networks to guide correction.44 However, most of these studies rely on phantom data or MRIs of other organs of interest. There is a need for more research and development, including robust validation studies, to explore the full potential of AI in MSK MRI specifically.
Report generation
Generating accurate and informative reports is a crucial task for radiologists to convey their findings and interpretations to the referring physician in a clear, concise, and clinically relevant manner. To reduce the reporting burden on radiologists, natural language processing (NLP) techniques, such as recurrent neural networks, long short-term memory networks, and, more recently, transformer-based models, such as bidirectional encoder representations from transformers (BERT) and generative pre-trained transformers (GPT), can be utilized for generating radiological reports. These models are trained on a large body of annotated radiological reports to learn the language and structure of report writing, as well as the relationships between imaging findings and clinical diagnoses. An additional speech recognition step can further automate the report generation process,45 creating a text output that can be considered a “preliminary report.” As radiology reports traditionally lack standardized structure and content, NLP can then be used to extract meaningful or contextual information46 from the preliminary radiology report, whether traditional text or text from speech recognition. Applications range from the extraction of specific MSK data or follow-up recommendations47 to the generation of a final report covering classification, diagnostic criteria, disease probability, or follow-up recommendations. However, AI may not capture the subtleties of human language, leading to reports that lack the nuanced communication often necessary between radiologists and referring physicians. Radiologists should view AI in report generation as a complementary tool that can assist with the reporting process, not as a replacement for the expert interpretation provided by a trained radiologist.
Considerations
Challenges defining ground truth data, benchmarks, and radiologists’ availabilities
To achieve the highest yield from AI technologies, it is imperative to have large and reliable ground truth datasets for training, validation, and testing. Ideally, these should come from several different sources, be representative of diverse communities, and be accessible to non-radiologists, such as AI researchers, engineers, and data scientists.48 The recent increase in the availability of publicly available medical image banks and large-scale international AI challenges has catalyzed progress in the field, leading to the development of AI algorithms capable of handling different tasks, such as classification, detection, or segmentation, across different modalities.49-51 Curating the ground truth required by current supervised AI models is a labor- and time-intensive process, necessary for an ideal workflow and to ensure the generalizability of a model. Moreover, this process is subject to regulatory constraints, commercial and operational pressures, and the epistemic differences and limits of labeling.52, 53 Annotated images and their respective radiology reports are available in hospital databases but, for ethical reasons, are not readily available to developers. It is important to follow regulatory procedures and obtain approval from the responsible committees to ensure an ethical approach when accessing and sharing these data between developers.52
Radiologists rely on visual detection, pattern recognition, memory, and cognitive reasoning to consolidate a final interpretation while making decisions.4 Radiologists’ errors contribute substantially to medical errors, which constitute the third most common cause of death in the USA, following cancer and heart disease.54, 55 The error rate is approximately 4% in clinical radiology practice, which translates into 40 million errors out of the 1 billion radiographs obtained worldwide annually.4 Of particular importance, the distinction between an “error” and “observer variation” is highly relevant when creating such datasets. Imaging findings alone, without clinical information, are frequently not enough to definitively indicate a specific diagnosis. Consequently, interpreting radiologic studies is typically not a straightforward binary process of discriminating normal from pathologic entities. Professional acceptability lies on an arbitrary scale between an obvious error and the unavoidable difference of opinion in interpretation.56 This is of particular concern given that most clinical AI applications are developed using data generated by “expert radiologists.” Thus, these models are subject to many kinds of human error and bias, and it falls on us to remain cognizant of the inequality, data availability, privacy, ethical, and medicolegal concerns associated with these rapidly evolving technologies.57, 58
The top five most influential radiology societies from the USA, Canada, Europe, Australia, and New Zealand recently released a joint statement on potential practical and ethical concerns in deploying and integrating AI in radiology practices. The key take-home statements, which also apply specifically to MSK radiology, include a strong recommendation for rigorous monitoring of its uses and safety in clinical practice, close collaboration between developers, end-users, and regulators, and strict adherence to all the regulatory steps from the development to deployment and integration in the clinical workflow.59 Radiologists in particular should be aware of automation bias as a potential source of error when working with AI tools in decision making.60
Model deployment
Deploying and maintaining AI models requires a robust infrastructure that addresses computational needs for both initial deployments using off-the-shelf pre-trained models and more advanced adaptations through fine-tuning. Most radiologists and clinical departments start with off-the-shelf pre-trained AI models. These models are developed on large, general datasets and can be used directly for common imaging tasks with minimal setup and without extensive customization. Standard computing hardware, including central processing units or modest graphics processing units (GPUs), can be used to run these models, making them accessible to most clinical environments.
Fine-tuning is necessary when adapting a pre-trained model to specific datasets or unique clinical scenarios in MSK radiology. This involves modifying the pre-trained model’s parameters to better fit the particular characteristics of the new data, for example, when working with custom protocols for rare conditions, specific patient demographics, or unique imaging modalities or contrasts, thereby improving the model’s performance and relevance. From a computational perspective, fine-tuning is less resource-intensive than training a model from scratch, as the model has already learned useful features from the initial large-scale dataset. This can be particularly beneficial in medical imaging, where annotated datasets are often limited and expensive to acquire. For instance, a model initially trained on a large dataset of general MRI images can be fine-tuned on a smaller dataset of specific MSK conditions. Studies using this approach have been reviewed by Cheplygina et al.61, demonstrating improved performance on the tasks of interest. However, fine-tuning still requires greater computational resources than inference-only deployment to handle the training workload. High-performance GPUs or tensor processing units can accelerate the processing of large datasets and complex model architectures during the training phase of fine-tuning. Cloud-based solutions with an environment that is secure and compliant with the Health Insurance Portability and Accountability Act also offer scalable resources that can be dynamically adjusted based on the computational load, making them ideal for training and deploying models without the need for local high-performance hardware.
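As a deliberately simplified illustration of this fine-tuning workflow, the sketch below freezes a stand-in “pre-trained” feature extractor and trains only a lightweight classification head on a small task-specific dataset. The backbone, data, and dimensions are all synthetic assumptions for demonstration, not a clinical model; in practice the frozen backbone would be a network trained on a large general imaging dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- "Pre-trained" feature extractor with frozen weights ---
# In practice this would be a CNN backbone trained on a large general
# dataset; a fixed random projection stands in for it here (assumption).
W_backbone = rng.normal(size=(64, 16))

def extract_features(x):
    """Frozen backbone: maps raw inputs to feature vectors (never updated)."""
    return np.tanh(x @ W_backbone)

# --- Small task-specific dataset (e.g., a rare MSK finding), synthetic ---
X = rng.normal(size=(200, 64))
true_w = rng.normal(size=16)
y = (extract_features(X) @ true_w > 0).astype(float)

# --- Fine-tuning: gradient descent on the classification head only ---
w_head = np.zeros(16)
lr = 0.5
for _ in range(300):
    feats = extract_features(X)                   # backbone stays frozen
    p = 1.0 / (1.0 + np.exp(-(feats @ w_head)))   # sigmoid head
    grad = feats.T @ (p - y) / len(y)             # logistic-loss gradient
    w_head -= lr * grad                           # update head weights only

acc = ((1.0 / (1.0 + np.exp(-(extract_features(X) @ w_head))) > 0.5) == y).mean()
print(f"training accuracy after fine-tuning the head: {acc:.2f}")
```

Because only the small head is updated, the training workload is far lighter than retraining the full model, which is the computational advantage the paragraph above describes.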
Successful deployment of AI tools requires seamless integration into clinical workflows, which may involve adherence to the Digital Imaging and Communications in Medicine (DICOM) standard and interoperability with various Picture Archiving and Communication System software. It must also be supported by robust infrastructure capable of ongoing model monitoring and updates to ensure sustained performance over time, adjust for any data shifts, incorporate new data, and maintain model relevance.
Equitable medical artificial intelligence
The development and deployment of AI technologies in MSK radiology must prioritize fairness and justice. Algorithms should aim to mitigate biases, ensure accessibility for all demographic groups, and deliver personalized care tailored to individual needs, irrespective of socio-economic status or background. Doo and McGinty62 argue that bias in radiology AI stems from various stages of model design, encompassing the selection of training data, algorithm development, deployment, and performance assessment. These biases, in turn, have repercussions on patient care and health outcomes. Notably, there is a lack of standardized protocols for demographic labeling in AI. Existing datasets often blur distinctions between crucial identifiers, such as sex and gender, or oversimplify complex racial categories, leading to distorted outcomes and predictive inaccuracies. Consequently, AI models trained on such biased datasets tend to reinforce preexisting biases, contributing to unintended consequences.
When contemplating advanced healthcare imaging within the AI landscape, a fundamental query arises: Is it possible to completely anonymize (de-identify without any possibility of reidentification) data?63 At first glance, the task appears simple: selectively erase or encode identifiers within the metadata headers of images. Despite the widespread use of the DICOM standard for radiologic data, an increasing number of exceptions complicate efforts to establish standardized procedures. Recently, progress in facial recognition technology has raised concerns about the potential for matching images from CT or MRI scans with individuals’ photographs. Consequently, it has become standard practice in medical imaging research to alter images using defacing or skull-stripping algorithms to eliminate facial features. Unfortunately, such alterations can undermine the generalizability of ML models developed using such data.64 The topic of bias is extremely complicated, and its many types and remedies are impossible to cover comprehensively within the scope of this article. However, it is important to introduce the concepts of bias and equitable medical AI in MSK radiology, which radiologists should remain conscious of while using AI tools.64 Some of the most common issues with MSK imaging in AI, and potential solutions, are listed in Table 3.
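A minimal sketch of the header-level de-identification step mentioned above, assuming a plain dictionary stands in for DICOM metadata; the field names and hashing policy are illustrative, and real pipelines should use dedicated DICOM tooling and follow the standard’s de-identification profiles. Note that the salted-hash pseudonym preserves linkability across studies, underscoring the article’s point that clearing headers alone does not guarantee true anonymization.

```python
import hashlib

# Illustrative policy (assumption): which header fields to erase or pseudonymize.
IDENTIFIERS_TO_REMOVE = {"PatientName", "PatientBirthDate", "PatientAddress"}
IDENTIFIERS_TO_PSEUDONYMIZE = {"PatientID"}  # keep linkability, drop identity

def deidentify(header: dict, salt: str = "site-secret") -> dict:
    """Blank direct identifiers; replace linkable IDs with salted one-way hashes."""
    clean = {}
    for tag, value in header.items():
        if tag in IDENTIFIERS_TO_REMOVE:
            clean[tag] = ""                       # erase direct identifier
        elif tag in IDENTIFIERS_TO_PSEUDONYMIZE:  # stable pseudonym, not reversible
            clean[tag] = hashlib.sha256((salt + str(value)).encode()).hexdigest()[:12]
        else:
            clean[tag] = value                    # keep clinically useful fields
    return clean

header = {"PatientName": "DOE^JANE", "PatientID": "12345",
          "PatientBirthDate": "19700101", "Modality": "MR",
          "StudyDescription": "KNEE MRI"}
clean = deidentify(header)
print(clean)
```

Even with such header scrubbing, pixel data itself (e.g., reconstructable facial features) can remain identifying, which is why defacing and skull-stripping are applied in addition.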
Conclusion: current trends and future directions
Integration of AI with other emerging technologies, such as augmented reality and virtual reality, is enabling more immersive and interactive visualization of medical images. New tools may facilitate better surgical planning, training, and intraoperative guidance. Additionally, AI-assisted tools have a niche role in aiding radiologists in training and provide an avenue for an additional diagnostic opinion where reading by multiple radiologists is not feasible. Protocolling, which involves choosing the right imaging protocol to obtain the most diagnostic images for each patient, is supervised by a radiologist and is particularly important in MSK MRI applications, where imaging protocols frequently require patient-specific tailoring. A limited number of research reports, using CNN- and natural language classifier-based algorithms, have demonstrated encouraging outcomes.65-67 Nevertheless, it is important to acknowledge the diversity of MSK imaging protocols across a wide spectrum of clinical scenarios; these tools should be fine-tuned and advanced by taking medical history, prior imaging studies, scanner-specific data, contrast information, and radiation exposure dose into account.68 AI can also offer dual benefits for scheduling, reducing both MRI times and waiting times by identifying no-shows or canceled appointments ahead of time.69 Finally, radiology reports are the final product of radiologists and are the means of communicating findings between physicians. ML can help generate decision-making algorithms as a support system based on the available information on the patient’s medical background.68, 70 In turn, ML-based natural language processing (NLP) can be a powerful tool to harness data from radiology reports and is currently being investigated.9
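As a deliberately simplified illustration of the protocolling support described above, the sketch below maps a free-text clinical indication to a candidate protocol by keyword overlap. The protocol names and keyword lists are hypothetical; a deployed system would rely on validated NLP models and site-specific protocol libraries rather than hand-written rules.

```python
# Hypothetical keyword rules per candidate protocol (illustrative only).
PROTOCOL_KEYWORDS = {
    "knee_mri_arthrogram": {"labral", "arthrogram", "instability"},
    "knee_mri_routine": {"meniscus", "acl", "pain"},
    "knee_mri_tumor": {"mass", "lesion", "tumor"},
}

def suggest_protocol(indication: str) -> str:
    """Score each candidate protocol by keyword overlap with the indication."""
    tokens = set(indication.lower().replace(",", " ").split())
    scores = {name: len(tokens & kws) for name, kws in PROTOCOL_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to radiologist review when nothing matches: the radiologist
    # remains the supervisor of the protocolling decision.
    return best if scores[best] > 0 else "radiologist_review"

print(suggest_protocol("Suspected meniscus tear, knee pain after twisting"))
print(suggest_protocol("Follow-up of distal femoral mass"))
```

A learned classifier would replace the keyword sets with weights estimated from labeled indications, and, as the text notes, would need medical history, prior imaging, and scanner-specific data as additional inputs.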
Conflict of interest disclosure
The authors declared no conflicts of interest.