ABSTRACT
PURPOSE
Non-invasive assessment of iron deposition is the standard of care for guiding chelation therapy in patients with iron overload. Several magnetic resonance imaging (MRI)-based techniques have been developed. This study compares the MRI-based R2* method with the standard R2-based method for quantifying iron levels in the liver and heart in children and young adults with secondary iron overload.
METHODS
A single-center prospective study was conducted over 2.5 years involving 14 patients aged 4–22 years with secondary iron overload. These patients underwent 40 MRI scans using both the R2 and R2* methods at the same time. A total of 36 scans were analyzed, comparing the two methods using linear regression analysis and Bland–Altman plots.
RESULTS
The study showed a significant correlation between liver iron concentration measurements obtained using the R2* method and those obtained using the R2-based method (adjusted R² = 0.77128). The agreement was even stronger for R2* values in the cardiac septum (adjusted R² = 0.93483).
CONCLUSION
The R2* method for assessing iron deposition in the liver and cardiac septum is comparable to the R2-based method and is suitable for clinical use. However, due to slight differences in measurements between the two techniques, it is advisable to consistently use one method for monitoring treatment in each patient. Further research is needed to refine the calibration equations.
CLINICAL SIGNIFICANCE
This study highlights the MRI-based R2* method as a reliable, non-invasive, and cost-effective alternative to the R2-based method for monitoring iron overload in pediatric patients, with no additional costs for institutions or third parties.
Main points
• Magnetic resonance imaging is the standard clinical practice for assessing tissue iron deposition, particularly in children.
• The R2* method is comparable to the standard R2-based method for quantifying iron overload in children.
• Adherence to one method is crucial to avoid underestimation or overestimation of iron deposition.
Quantitative assessment of iron deposition in body organs is critical for the clinical management of patients with iron overload. Excessive iron in primary and secondary hemochromatosis is absorbed through the intestine and accumulates in various body parts. The liver is the main storage site for excess iron. However, when the liver storage capacity is exceeded, iron overflows to other organs (including the reticuloendothelial system), such as the spleen, bone marrow, cardiac septum, and pancreas, leading to various complications (e.g., diabetes, cardiomyopathy, and, more commonly, liver fibrosis and cirrhosis). Only a small portion of iron is eliminated through the sloughed mucosa, feces, and menstruation. There is little correlation between ferritin levels and iron deposition, particularly in the heart.1, 2 Liver iron concentration (LIC) reflects body iron content. Therefore, LIC assessment is used as a surrogate to indicate total body iron deposition.2, 3 Liver biopsy is the gold standard for quantifying LIC. However, it has certain limitations: it is invasive and inconvenient, requires sedation, and samples a relatively small volume of the hepatic parenchyma, which can lead to inaccurate results, particularly if the iron deposition is not homogeneous.2 Inaccurate LIC measurements can lead to the progression of liver fibrosis and cirrhosis and to undesired side effects of medications and chelation therapy.
In the last few decades, various non-invasive magnetic resonance imaging (MRI)-based techniques, both qualitative and quantitative, for assessing LIC have emerged, gained popularity, and are now used in clinical practice. Although they cannot provide objective measurements or detect slight iron deposition, qualitative methods, such as gradient echo (in-phase and out-of-phase) sequences, can signal possible iron overload. For example, iron deposition results in a decreased signal on the in-phase sequence. Quantitative methods include R2-based and R2*-based relaxometry techniques and the signal intensity ratio (SIR) method.4 Relaxometry techniques depend on the shortening of MRI relaxation times with increased iron deposition, whereas the SIR method is based on the drop in liver signal intensity with increased iron deposition relative to reference tissues such as the paraspinal muscles. Developed by Clark and St Pierre5, the United States Food and Drug Administration-validated, widely accepted, and standardized method is based on R2 measurement and is commercially known as FerriScan® (Resonance Health, Australia). This method requires validation with an external phantom, involves additional cost, and takes several days to return quantitative results to the medical team. More importantly, it involves a longer scanning time (approx. 15 min), which makes it susceptible to breathing and motion artifacts. The R2* (R2* = 1/T2*) method has been proposed as a reliable alternative for assessing LIC. This method is faster (approx. 15 s), and R2* values can be converted to tissue iron content using appropriate calibration curves.
Research groups have published calibration equations based on the T2*/R2* technique; however, there has been little independent validation of these calibration equations in clinical settings, with no consensus on the ideal acquisition protocol. It is noteworthy that LIC measurements vary between institutions due to differences in acquisition parameters and post-processing fitting algorithms. Therefore, internal validation of these parameters is necessary before providing clinical results.
Although images for the R2* relaxometry technique are easy to acquire, the need for post-processing steps limits their adoption in clinical practice.6 Several MRI vendors have attempted to promote the use of this method by providing software packages that help analyze the images and convert R2* values into clinically relevant LIC values. This is performed at the scanner console or a local workstation by drawing a few regions of interest (ROIs) over the right lobe of the liver and the cardiac septum. This software varies according to the vendor and the sequences acquired.7 However, there are limited data on the validation of these potentially useful products. At IWK Health, we use a 1.5 T GE magnet with STARmap analysis via CardiacVX software (Wipro GE Healthcare Pvt. Ltd.). In this study, we aimed to validate the results of this software in comparison to the standard FerriScan® method to eliminate the need for third-party processing. We expect that this will benefit our institution, as well as other facilities with a similar setup.
Methods
Patient selection and study design
This prospective study is compliant with Health Insurance Portability and Accountability Act standards and the principles of the Declaration of Helsinki, and was approved by the IWK Institutional Research Ethics Board (# 1021517), with approval obtained in April 2018 and renewed annually until completion of the study. All patients with known secondary iron overload referred by their primary hematologist, gastroenterologist, or treating physician for MRI evaluation of iron overload at our institution were recruited over 2.5 years, and consent was obtained at the time of the scan. A total of 40 scans were performed for 14 patients between October 2020 and April 2023. Of these, four scans were excluded from the analysis as they were considered technically inadequate, either due to motion artifacts or lack of specific liver or cardiac sequences.
Magnetic resonance imaging studies
FerriScan® studies were performed according to the specified protocol. This included five sequences of axial spin-echo images with variable time to echo (TE) of 6, 9, 12, 15, and 18 ms. The field of view (FOV) was 34 cm, with a 5-mm slice thickness and 5-mm spacing, covering 11 slices. The non-breath-hold (BH) scan duration was 2 min and 28 s for each sequence. Other parameters included repetition time (TR) = 1,000 ms, matrix = 256 × 192, bandwidth (BW) = 62.5 kHz, and number of excitations (NEX) = 1. A bag of normal saline was used as a phantom for imaging to provide a reference signal intensity for measurement correction purposes in case of any potential machine drift.
A single BH technique was used to obtain non-contrast-enhanced T2* axial gradient-echo images at the level of the main portal vein, with increasing TE sequences. Noise correction was applied, but fat correction was not performed. The parameters for this sequence were TR = 170 ms, multi-echo TE = 0.9–7.7 ms at regular intervals of approximately 0.97 ms, flip angle (FA) = 10°, BW = 50 kHz, and matrix = 220 × 220. The inherent phased-array body coil was used as the transmitter coil, and a receive-only coil was used for signal collection. Furthermore, a single mid-ventricular short axis 2D GE (multi-echo fast gradient-recalled echo) cardiac gated slice was imaged at eight TEs ranging from 2.09 to 19.9 ms, with increments of 2.53 ms. The FOV was 40 cm, and the slice thickness was 10 mm (no gap). Other parameters included FA = 20°, frequency matrix = 224, phase matrix = 128, NEX = 1, and BW = 83.33 kHz. Fast imaging employing steady-state acquisition sequences were used to obtain axial, two-chamber, four-chamber, and then short-axis views. All scans were performed on a single 1.5 T MRI scanner (Signa HD TwinSpeed, 2002/hardware update in 2012; GE Healthcare, Milwaukee, WI, USA).
Image analysis
The FerriScan® sequences were sent to the primary company, Resonance Health Center in Australia, for quantitative analysis as per routine clinical practice. The multi-echo gradient images were transferred to the onsite Advantage Workstation (AW, GE HealthCare) for diagnostic imaging processing. Three small ROIs, each at least 1 cm in diameter, were manually drawn in the right lobe of the liver on areas that appeared homogeneous in signal intensity, devoid of vessels or biliary trees, and away from the diaphragm (Supplementary Figure 1a). The mean value of these ROIs was used for analysis. The fitted curve was evaluated, and truncation was occasionally used to remove late outlier points to account for the plateau observed due to the low signal-to-noise ratio at later TE values.7-9 The R2* value was converted to LIC using the vendor-provided STARmap analysis via CardiacVX software, which is based on Dr. Wood’s calibration formula [Fe (mg/g) = 0.0254 × R2* + 0.202].10
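The liver analysis chain described above (signal-decay fit across echo times, optional truncation of late echoes, averaging of three ROIs, and conversion with Wood's formula) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the vendor's implementation: the mono-exponential log-linear fit, the `noise_floor` truncation rule, the function names, and the synthetic signal are all hypothetical. Only the calibration coefficients and the approximate echo spacing are taken from the text.

```python
import math


def fit_r2star_hz(te_ms, signal, noise_floor=0.0):
    """Log-linear least-squares fit of S(TE) = S0 * exp(-R2* * TE).

    Echoes whose signal is at or below `noise_floor` are discarded
    (a simple, assumed form of truncation). Returns R2* in Hz (1/s).
    """
    points = [(t, math.log(s)) for t, s in zip(te_ms, signal) if s > noise_floor]
    if len(points) < 2:
        raise ValueError("need at least two echoes above the noise floor")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_log = sum(v for _, v in points) / n
    slope = (sum((t - mean_t) * (v - mean_log) for t, v in points)
             / sum((t - mean_t) ** 2 for t, _ in points))
    return -slope * 1000.0  # slope is per ms; convert to per second


def r2star_to_lic(r2star_hz):
    """Calibration quoted in the text: Fe (mg/g) = 0.0254 * R2* + 0.202."""
    return 0.0254 * r2star_hz + 0.202


def lic_from_rois(roi_r2star_hz):
    """Average the ROI R2* values first, then convert the mean to LIC."""
    return r2star_to_lic(sum(roi_r2star_hz) / len(roi_r2star_hz))


# Synthetic, noise-free example (hypothetical values): true R2* = 300 Hz,
# with the late echoes flattened onto a plateau to mimic a noise floor.
te_ms = [0.9 + 0.97 * k for k in range(8)]
plateau = 150.0
signal = [max(1000.0 * math.exp(-0.3 * t), plateau) for t in te_ms]

r2star_raw = fit_r2star_hz(te_ms, signal)                       # plateau included
r2star_truncated = fit_r2star_hz(te_ms, signal, noise_floor=plateau)
lic = r2star_to_lic(r2star_truncated)
print(f"raw fit: {r2star_raw:.1f} Hz, truncated fit: {r2star_truncated:.1f} Hz")
print(f"LIC from truncated fit: {lic:.2f} mg/g")
```

In this synthetic case, leaving the plateau echoes in the fit lowers the estimated R2*, which is the effect truncation is meant to avoid. Applied to the extreme liver R2* values reported in the Results (96.4 and 894.5 Hz), `r2star_to_lic` returns about 2.65 and 22.92 mg/g, in line with the reported LIC range.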
For cardiac analysis, a homogeneous full-thickness ROI was drawn on the cardiac septum to avoid the epicardium and blood pool (Supplementary Figure 1b). Studies have shown that mid-ventricular septal iron correlates well with global left ventricular iron concentration.7, 11
Magnetic resonance imaging-based quantification of liver iron concentration and T2* relaxometry
The severity of liver iron overload using MRI techniques is categorized as follows according to the literature:
• Normal: <1.8 mg/g (<32 μmol/g)
• Borderline: 1.8–3.2 mg/g (32–57 μmol/g)
• Mild: 3.2–7.0 mg/g (57–125 μmol/g)
• Moderate: 7.0–15.0 mg/g (125–269 μmol/g) (increased risk of complications)
• Severe: >15.0 mg/g (>269 μmol/g) (high mortality risk)12
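The severity categories above, and the conversion between mg/g and μmol/g, can be expressed as a short Python sketch. The function names are hypothetical, and the handling of values that fall exactly on a boundary (for example, 3.2 mg/g) is an assumption, since the list does not specify it. The conversion uses the molar mass of iron (55.845 g/mol).

```python
IRON_MOLAR_MASS_G_PER_MOL = 55.845


def lic_mg_to_umol(lic_mg_per_g):
    """Convert LIC from mg Fe/g to micromol Fe/g."""
    return lic_mg_per_g / IRON_MOLAR_MASS_G_PER_MOL * 1000.0


def classify_lic(lic_mg_per_g):
    """Map LIC (mg/g) to the severity categories listed in the text.

    Assumption: a value on a boundary is assigned to the higher category,
    except 15.0 mg/g, which is kept in "moderate" because "severe" is >15.0.
    """
    if lic_mg_per_g < 1.8:
        return "normal"
    if lic_mg_per_g < 3.2:
        return "borderline"
    if lic_mg_per_g < 7.0:
        return "mild"
    if lic_mg_per_g <= 15.0:
        return "moderate"
    return "severe"


# Reproduce the micromol/g equivalents of the category cut-offs.
for cutoff in (1.8, 3.2, 7.0, 15.0):
    print(f"{cutoff} mg/g = {lic_mg_to_umol(cutoff):.0f} umol/g")
```

The printed values match the μmol/g cut-offs given in the list (32, 57, 125, and 269 μmol/g).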
The T2* thresholds for iron overload severity are as follows:
• T2* > 20 ms: normal
• T2* 10–20 ms: mild iron overload
• T2* 5–10 ms: moderate iron overload
• T2* < 5 ms: severe iron overload13, 14
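Because the analysis is reported in R2* rather than T2*, it can help to restate these thresholds in Hz using R2* = 1,000/T2*. The sketch below does this in Python. The function names are hypothetical, and the assignment of values that fall exactly on 10 or 20 ms is an assumption, as the thresholds do not specify it.

```python
def t2star_ms_to_r2star_hz(t2star_ms):
    """R2* (Hz) = 1,000 / T2* (ms)."""
    if t2star_ms <= 0:
        raise ValueError("T2* must be positive")
    return 1000.0 / t2star_ms


def classify_t2star(t2star_ms):
    """Map T2* (ms) to the severity categories listed in the text.

    Assumption: boundary values (20, 10, 5 ms) fall in the milder of the
    two adjacent overload categories.
    """
    if t2star_ms > 20.0:
        return "normal"
    if t2star_ms >= 10.0:
        return "mild"
    if t2star_ms >= 5.0:
        return "moderate"
    return "severe"


# The T2* cut-offs expressed as R2* values.
for cutoff_ms in (20.0, 10.0, 5.0):
    print(f"T2* = {cutoff_ms} ms  <->  R2* = {t2star_ms_to_r2star_hz(cutoff_ms)} Hz")
```

The cut-offs of 20, 10, and 5 ms correspond to R2* values of 50, 100, and 200 Hz, respectively; a shorter T2* (higher R2*) indicates heavier iron loading.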
Statistical analysis
The T2* values were automatically transformed into reciprocal R2* values for analysis to obtain a positive linear correlation: R2* (Hz) = 1,000/T2* (ms). The mean R2* value from the three hepatic ROIs was used for analysis. The R2* was converted to LIC using the vendor-specific software, in line with Dr. Wood’s calibration-based formula. The FerriScan® LIC was considered the “gold standard” for comparison. The agreement between LICs calculated by the R2* and FerriScan® methods was assessed using linear regression analysis and Bland–Altman analysis, the latter of which characterizes both systematic differences (bias) and random fluctuations (variance).
The data followed a normal distribution. Because of the limited sample size and the difficulty of collecting enough pediatric cases within a reasonable time frame, and because the primary focus of the study was to assess the characteristics of the images rather than longitudinal changes or patient-specific effects, we assumed independence of the images and thus used linear regression analysis and Bland–Altman analysis.
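The two agreement analyses named above can be sketched with the Python standard library. This is a minimal illustration, not the software used in the study. The function names and the paired LIC values are hypothetical, the limits of agreement use the conventional bias ± 1.96 SD, and the adjusted R² formula assumes a single predictor.

```python
from statistics import mean, stdev


def bland_altman(reference, test, z=1.96):
    """Return bias, SD of the differences, and lower/upper limits of agreement."""
    diffs = [r - t for r, t in zip(reference, test)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, sd, bias - z * sd, bias + z * sd


def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns slope, intercept, R^2, and adjusted R^2 (one predictor).
    """
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return slope, intercept, r2, adj_r2


# Made-up paired LIC values (mg/g DW) for illustration only; not study data.
ferriscan_lic = [2.4, 5.0, 9.8, 14.1, 20.3, 31.0]
r2star_lic = [2.6, 4.9, 8.7, 10.2, 13.5, 18.8]

bias, sd, loa_low, loa_high = bland_altman(ferriscan_lic, r2star_lic)
slope, intercept, r2, adj_r2 = linear_fit(ferriscan_lic, r2star_lic)
print(f"bias={bias:.2f}, SD={sd:.2f}, limits=({loa_low:.2f}, {loa_high:.2f})")
print(f"slope={slope:.3f}, intercept={intercept:.3f}, adjusted R^2={adj_r2:.3f}")
```

Both functions treat every scan as an independent observation, which mirrors the independence assumption stated above; repeated scans from the same patient would otherwise call for a repeated-measures variant.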
Results
Overall, 36 MRI scans from 14 patients were included in the analysis (Figure 1). Each scan included both FerriScan® and STARmap sequences and analyses (Supplementary Figure 2). The age range of our cohort was 4.0–22.0 years, with mean and median ages of 14.56 and 17 years, respectively. The group consisted of nine male patients and five female patients (Table 1). However, the female group underwent more MRI follow-up scans during this period, resulting in an equal distribution of the MRI studies between male and female patients, with each group representing 50% of the total scans analyzed. Among these patients, two had sideroblastic anemia, one had Diamond–Blackfan anemia, one had pyruvate kinase deficiency anemia, and the remainder had β-thalassemia major.
The LIC measured by FerriScan® ranged from 2.4 to 39.5 mg/g tissue dry weight (DW), with an average of 19.44 mg/g DW. In comparison, R2* values ranged from 96.4 to 894.5 Hz, corresponding to LIC values of 2.6–22.9 mg/g DW, with an average of 12.9 mg/g DW. Cardiac R2* values assessed by FerriScan® ranged from 19.0 to 369.0 Hz, whereas cardiac R2* values assessed locally ranged from 20.0 to 282.0 Hz (Table 2).
The regression analysis showed a substantial correlation between LIC values calculated by the R2* method and those measured by FerriScan®, with an adjusted R² of 0.77128. This indicates that approximately 77% of the variance in the LIC values obtained using the FerriScan® method is accounted for by the values obtained using the R2* method. The R2* method is therefore a reliable predictor of LIC values, although not a perfect match with FerriScan® measurements. The analysis yielded a slope of 0.43054 and a y-intercept of 4.539, suggesting a consistent relationship between the two measurement techniques.
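As a quick consistency check on the reported coefficients, a least-squares line always passes through the point formed by the two sample means. The Python sketch below applies this to the reported averages, assuming that FerriScan® LIC is the predictor (x) and the R2*-derived LIC is the response (y). The text does not state this orientation explicitly, so it is an assumption, and the names are hypothetical.

```python
SLOPE = 0.43054             # reported regression slope
INTERCEPT = 4.539           # reported y-intercept (mg/g DW)
MEAN_FERRISCAN_LIC = 19.44  # reported average FerriScan LIC (mg/g DW)
MEAN_R2STAR_LIC = 12.9      # reported average R2*-derived LIC (mg/g DW)


def predict_r2star_lic(ferriscan_lic):
    """R2*-derived LIC predicted from FerriScan LIC (assumed orientation)."""
    return SLOPE * ferriscan_lic + INTERCEPT


predicted_mean = predict_r2star_lic(MEAN_FERRISCAN_LIC)
print(f"predicted mean R2*-LIC: {predicted_mean:.2f} mg/g DW")
print(f"reported mean R2*-LIC:  {MEAN_R2STAR_LIC:.2f} mg/g DW")
```

Under this assumed orientation the predicted mean is about 12.91 mg/g DW, which agrees with the reported average of 12.9 mg/g DW. A slope below 1 on this axis is also consistent with the R2*-derived values increasing more slowly than the FerriScan® values at higher iron loads.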
The Bland–Altman analysis revealed a bias of 6.53, meaning FerriScan® values were, on average, 6.53 mg/g DW higher than those estimated by the R2* method. The standard deviation of the differences was 7.239, with limits of agreement between −7.658 and 20.719. This means that in most cases, the difference between the two methods would fall within this range. The confidence intervals for the lower and upper limits of agreement ranged from −11.8 to −3.43 and from 16.49 to 24.944, respectively. The highest agreement was observed for LICs below 12 mg/g DW, as shown in the scattergram and regression line in Figure 2a.
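The limits of agreement follow directly from the reported bias and standard deviation, as the Python sketch below shows. It assumes the conventional bias ± 1.96 SD definition. The confidence intervals around each limit are approximated with the standard error sqrt(3·SD²/n) and a t quantile of about 2.03 (35 degrees of freedom, two-sided 95%). Both the 95% level and this approximation are assumptions, as the text does not state how the intervals were computed.

```python
import math

BIAS = 6.53    # reported mean difference (mg/g DW)
SD = 7.239     # reported SD of the differences (mg/g DW)
N_SCANS = 36   # scans included in the analysis
T_CRIT = 2.03  # approx. t quantile, 35 df, two-sided 95% (assumed level)

# Limits of agreement: bias +/- 1.96 SD.
loa_lower = BIAS - 1.96 * SD
loa_upper = BIAS + 1.96 * SD

# Approximate standard error of each limit and resulting interval half-width.
se_limit = math.sqrt(3.0 * SD ** 2 / N_SCANS)
half_width = T_CRIT * se_limit

ci_lower_limit = (loa_lower - half_width, loa_lower + half_width)
ci_upper_limit = (loa_upper - half_width, loa_upper + half_width)

print(f"limits of agreement: {loa_lower:.3f} to {loa_upper:.3f}")
print(f"CI of lower limit: {ci_lower_limit[0]:.2f} to {ci_lower_limit[1]:.2f}")
print(f"CI of upper limit: {ci_upper_limit[0]:.2f} to {ci_upper_limit[1]:.2f}")
```

The computed limits reproduce the reported values to within rounding, and the approximate intervals fall close to the reported ones; small residual differences are expected because the approximation used here may not be the exact formula applied in the study.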
The cardiac analysis indicated an excellent correlation between the two methods, with an adjusted R² of 0.93483. This suggests that most of the variance in the cardiac R2* values reported by FerriScan® is accounted for by the values measured locally. As such, the local R2* method is almost as good as FerriScan® in measuring these values. The slope was 0.763, with an intercept of 11.05, indicating a strong relationship between the two sets of measurements (Figure 2b).
Discussion
This project evaluates the reliability of the local R2* method for the assessment of iron overload in the liver and heart compared with the gold standard FerriScan® method. Our findings demonstrated substantial agreement between the R2* method and FerriScan® in estimating LIC, particularly in cases of mild to moderate iron overload (LIC < 12 mg/g DW). This is consistent with the findings obtained by Meloni et al.16, who concluded that signal decay models result in clinically acceptable estimations of LIC provided the ROIs are correctly drawn and the proper calibration curve is used to correct for any systematic differences in R2* estimation.14, 15 The agreement between the two methods was highest at LICs below 12 mg/g DW, in line with the findings by Abou Zahr et al.17, where the best agreement was observed at LICs below 15 mg/g DW in their cohort. This suggests that both methods perform similarly in the range of mild to moderate liver iron deposition.
In our cohort, FerriScan® values were consistently higher than those obtained with the R2* method, with a mean positive bias of 6.53 mg/g DW. This overestimation by FerriScan® was previously reported in Reeder et al.’s12 study. We believe that this difference could be partially attributed to the calibration equation used in the CardiacVX software, which is based on Wood et al.’s10 2005 study. Although this equation was confirmed by Hankins et al.18 in 2009, Meloni et al.16 later demonstrated a 15% lower scaling coefficient between R2* and LIC. It is worth noting that Dr. Wood’s equation was originally formulated on a single-echo sequence, whereas our images were based on multi-echo sequences. However, this may have had a limited contribution, as cross-validation of single-center and multicenter R2* relaxometry methods, including both single- and multi-echo sequences, has been performed in earlier studies and has demonstrated no significant difference.3 Since FerriScan® is considered the gold standard, the observed overestimation by this method should be considered when interpreting results. Although R2* values might underestimate iron content compared with FerriScan®, this discrepancy could be due to the calibration model used by the latter, rather than an inherent inaccuracy in the R2* method itself. Therefore, while FerriScan® may provide more accurate results in the context of our study, both methods are valuable for assessing LIC; however, discrepancies in calibration should be considered.
For validation, we plan to reanalyze the measured ROIs using another calibration equation, such as Garbowski et al.’s19 2014 equation, which follows parameters closer to ours, to determine whether better agreement can be reached.20 This would require an update of the software to set the latter equation as the default one. The wide range of agreement between the two methods has been previously explained by both Wood et al.10 and Clark and St Pierre5, and may be attributed to the spatial variability of iron concentration within the liver. Other confounding factors that affect signal relaxation, such as iron particle size, shape, and local metabolites (fat and fibrosis), might also contribute to this wide range.
The cardiac analysis demonstrated a substantial agreement between our analysis and that of FerriScan®. The difference in measurement is not surprising, as variations in R2* values with different ROI sizes and measurements have been reported. Thus, we opted to use the mean values of three ROIs for the analysis.18-20 The intra- and inter-rater, as well as inter-scanner reproducibility of R2* analysis, was assessed in previous work by Hernando et al.3, who further demonstrated that the calibration from different studies can be translated, improving the utilization of R2* mapping. Kirk et al.21 also tested the reproducibility of the R2* technique among five different international centers on different scanners and concluded that the measurement of tissue T2* (heart and liver) can be achieved reproducibly between centers across the world, provided appropriate vendor-specified sequences are followed, along with appropriate software analysis packages and calibration curves.1, 22 The inter-scanner and inter-center reproducibility and transferability were also evaluated and confirmed in multiple other studies.8, 9
The strength of our study lies in the prospective collection of cases and the simultaneous acquisition of sequences for both methods, specifically focusing on pediatric patients, which minimizes confounding factors, such as fatty infiltration and cirrhosis, that could contribute to signal alterations.22 However, the study also has several limitations. First, the study has a small sample size, a common challenge in pediatric research. Second, although Dr. Wood’s 2005 equation is the first published calibration formula and still the most widely used, it may not be the best match for our acquisition parameters; in future studies, we aim to analyze ROIs with a calibration equation more aligned with our parameters to potentially improve agreement.
In conclusion, this study successfully validated the use of R2* MRI for assessing iron overload in the heart and liver of children with secondary iron overload. While R2* measurements do not perfectly align with FerriScan® results, the R2* is a reliable predictor of LIC. This emphasizes the importance of using a consistent method for assessment and follow-up. Monitoring trends in iron concentration is crucial for adjusting clinical management. A larger multicenter validation of the R2* method is necessary to establish its reliability across various settings and populations.