This work aimed to analyse the differences between 99mTc-MAA-SPECT/CT and 90Y-microsphere PET/CT dosimetries at different levels, including the introduction of a voxel-to-voxel comparison using the QVH concept. DVHs are extremely useful for comparing dose plans; however, other evaluation criteria can complement them. A major drawback of the DVH method is the lack of spatial information on dose distributions, i.e. DVHs do not show where within a specific volume a dose is received [29]. Therefore, comparing post-treatment and predictive DVHs of the same VOI, to assess the agreement between DPredictive-D and DPost-treatment-R heterogeneity, may be insufficient: two similar DVHs could correspond to different spatial dose distributions. QVHs, in contrast, are based on a voxel-by-voxel comparison and deal with dose differences directly. Thus, even though a QVH does not give complete spatial information either, it compares voxels at the same location and is therefore suited to assessing differences in heterogeneity between DPredictive-D and DPost-treatment-R (see Fig. 3). A future perspective would be to display the Qi value of each voxel as a Qi map, which would show the location of the differences. Recently, Ferreira et al. investigated the value of 99mTc-MAA-SPECT/CT-based predictive dosimetry using the gamma-index (γ-index) analytical method [30]. The γ-index combines an evaluation of geometrical (distance to agreement) and dosimetric (dose difference) accuracy. In their paper, the acceptance criteria generally used in EBRT were adapted to radioembolization, and passing rates > 90% were achieved for 15 mm/15% tolerance criteria. Importantly, several studies in EBRT demonstrated that gamma analysis was not correlated with the clinical impact of a dose discrepancy [31, 32].
Furthermore, the γ-index accounts for spatial and dose discrepancies at the same time; thus, the outcome of the evaluation does not provide sufficient precision for very small lesions. In our case, we overcame the spatial mismatch by performing an oriented DIR based on the CT and liver delineation information. In this way, our main investigation could focus solely on the dose differences. The proposed QVH method is not intended to replace the DVH and γ-index analyses, but to bring additional information for assessing the agreement between the predictive and post-treatment radioembolization dose distributions.
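The point that two identical DVHs can hide different spatial dose distributions, while a voxel-wise comparison reveals the mismatch, can be illustrated with a toy example. This is only a sketch under stated assumptions: the four-voxel dose values are invented for illustration, voxels are taken to have equal volume, Qi is taken as the log10 of the post-treatment/predictive dose ratio, and the QF is assumed here to be the unweighted mean of |Qi| (no weighting factors applied). The function names are hypothetical.

```python
import math

def dvh_profile(doses):
    """Sorted voxel doses. For equal-volume voxels, two dose maps with the
    same sorted doses have the same cumulative DVH."""
    return sorted(doses)

def q_values(predictive, post_treatment):
    """Voxel-wise Qi, assumed here to be log10(post-treatment / predictive)."""
    return [math.log10(post / pred) for pred, post in zip(predictive, post_treatment)]

def quality_factor(q):
    """Assumed QF: unweighted mean of |Qi| over the VOI."""
    return sum(abs(v) for v in q) / len(q)

# Hypothetical doses (Gy) for four co-registered voxels: the same values,
# but located in different voxels on the two dose maps.
predictive = [20.0, 40.0, 80.0, 160.0]
post_treatment = [160.0, 80.0, 40.0, 20.0]

same_dvh = dvh_profile(predictive) == dvh_profile(post_treatment)
qf = quality_factor(q_values(predictive, post_treatment))
print(same_dvh, round(qf, 2))
```

Here the two DVHs coincide exactly, yet the assumed QF is about 0.6, because every voxel differs from its counterpart at the same location.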
Our results confirmed that the implementation of QVH from EBRT into radioembolization is feasible and gives complementary information to the DVH-based analysis for the comparison of predictive and post-treatment 90Y-microsphere radioembolization dosimetry. The QF provides a rapid and easy interpretation of the agreement between predictive and post-treatment dosimetries.
In contrast to dose painting in EBRT for target volumes, absorbed dose ranges encountered in radioembolization are much wider, with voxel doses ranging from 0 to 250 Gy or higher, including target and normal tissue VOIs. Therefore, the QVH model from EBRT must be adapted to the context of radioembolization. Notably, the QVH used in EBRT is calculated from the direct dose ratio, whereas we introduced the log of the ratio. Since dose differences can be much higher in radioembolization than in EBRT, the ratio alone would have produced very asymmetrical QVHs. In particular, doses infinitely higher in the predictive dose matrix than in the post-treatment dose matrix would produce a QVH tending to zero and a QF tending to one. Conversely, doses infinitely higher in the post-treatment dose matrix than in the predictive dose matrix would lead to both the QVH and the QF tending to infinity. Two such extreme scenarios should produce the same outcome: a QF demonstrating poor concordance. The introduction of the log of the ratio makes the QVH symmetrical with respect to zero, and both scenarios then yield a very large QF.
Also, dose differences between very low doses (< 10 Gy) or between high doses (> 200 Gy) would have led to high Qi (e.g. a voxel receiving 1 Gy at the predictive dosimetry and 3 Gy at the post-treatment dosimetry has a Qi of log10(3) ≈ 0.48) and would therefore highlight a large discrepancy between predictive and post-treatment dosimetry, although it would not be considered as such in clinical practice. Moreover, this could hinder accurate assessment of the agreement between predictive and post-treatment dosimetry, as a discrepancy between two very low doses (< 10 Gy) or two high doses (> 200 Gy) leads to a Qi equivalent to, or even higher than, that of two different doses between 40 and 120 Gy. In our previously published report, post-treatment absorbed dose cut-offs of 60 Gy and 40 Gy were defined for predicting metabolic response and non-response, respectively [8]. Similarly, absorbed doses > 50 Gy and > 40–60 Gy provided better metabolic response in two studies [6, 9]. In these trials, lesions that received more than 100–120 Gy had a higher probability of complete metabolic response. Therefore, weighting factors (Wi) were defined for each voxel according to clinical data in mCRC patients, to limit the influence of low and extremely high doses on the QVH shape [6, 8, 9]. This parameter should be adapted if glass spheres are used for the radioembolization (as their specific activity differs from that of resin spheres) and/or for other types of liver cancer [33]. The Wi was applied to the voxel volume contribution to the QVH. Other options would have been to apply the Wi directly to the Qi or not to apply Wi factors at all. Further research using patient clinical outcome is needed to decide on the best approach.
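The effect of moving from the direct ratio to its logarithm, and the sensitivity of Qi to mismatches between very low doses, can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dose pairs are invented, the function names are hypothetical, and the per-voxel deviation for the direct ratio is assumed to be |Q − 1| (consistent with a QF tending to one when the ratio tends to zero), while for the log ratio it is assumed to be |Qi|. No weighting factors are applied.

```python
import math

def q_ratio(pred_gy, post_gy):
    """Direct dose ratio (post-treatment / predictive)."""
    return post_gy / pred_gy

def q_log(pred_gy, post_gy):
    """Log10 of the dose ratio."""
    return math.log10(post_gy / pred_gy)

# Two mirrored tenfold mismatches, as (predictive Gy, post-treatment Gy).
over = (10.0, 100.0)    # post-treatment ten times higher
under = (100.0, 10.0)   # predictive ten times higher

# Assumed per-voxel deviation for each definition.
ratio_dev = [abs(q_ratio(*over) - 1.0), abs(q_ratio(*under) - 1.0)]
log_dev = [abs(q_log(*over)), abs(q_log(*under))]

# A 1 Gy vs. 3 Gy mismatch scores the same as a 40 Gy vs. 120 Gy mismatch.
low_dose_q = q_log(1.0, 3.0)
mid_dose_q = q_log(40.0, 120.0)

print(ratio_dev, log_dev, round(low_dose_q, 2), round(mid_dose_q, 2))
```

With the direct ratio, the two mirrored mismatches give deviations of about 9 and 0.9, whereas the log ratio gives 1 in both cases. The last two values are both about 0.48, which is the kind of equivalence that motivates down-weighting clinically irrelevant dose ranges.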
To facilitate interpretation of QVHs and QFs, we introduced cut-offs to classify QVHs into good (QF < 0.18), acceptable (0.18 ≤ QF < 0.3) and poor (QF ≥ 0.3). Admittedly, these cut-offs were arbitrarily defined as a first-estimate analysis. Cut-offs of 0.18 and 0.3 correspond to cases where one of the dose maps is systematically 33% and 50% lower than the other, respectively. Further research is needed to define cut-offs linked with clinical outcomes (prediction of treatment failure and disease-free survival).
Therefore, because of the assumption made on the weighting factors and the arbitrary categorization of QFs, the clinical conclusion derived from the QVH analysis is limited.
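The link between the QF cut-offs and a systematic dose deficit can be verified numerically. The following is a minimal sketch, assuming that Qi is the log10 of the dose ratio and that the QF is a normalised (possibly weighted) mean of |Qi|; if every voxel of one dose map is lower than the other by the same fraction, every |Qi| is identical and the QF equals that common value. The function and constant names are hypothetical, and "33%" is taken as one third.

```python
import math

GOOD_CUTOFF = 0.18   # QF below this value: good agreement
POOR_CUTOFF = 0.30   # QF at or above this value: poor agreement

def qf_for_systematic_deficit(fraction_lower):
    """QF obtained when one dose map is lower than the other by the same
    fraction in every voxel (all |Qi| are then equal)."""
    return math.log10(1.0 / (1.0 - fraction_lower))

def classify_qf(qf):
    """Classify a QF using the cut-offs quoted in the text."""
    if qf < GOOD_CUTOFF:
        return "good"
    if qf < POOR_CUTOFF:
        return "acceptable"
    return "poor"

one_third_lower = qf_for_systematic_deficit(1.0 / 3.0)
half_lower = qf_for_systematic_deficit(0.5)
print(round(one_third_lower, 2), round(half_lower, 2))
```

Under these assumptions, a deficit of one third gives a QF of about 0.18 and a deficit of one half gives about 0.30, matching the two cut-offs.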
DVH results showed good agreement between DPredictive-D and DPost-treatment-R for individual lesions, whole tumoural liver (TL) and non-tumoural liver (NTL). Notably, the DVH analysis showed no significant difference in terms of Dmean, confirming results from previous studies [10, 11]. Hence, previously defined 90Y-PET/CT-based Dmean cut-offs may be used at the predictive dosimetry for determining the activity of 90Y-microspheres to administer [6, 8, 9]. Statistically significant dose differences between predictive and post-treatment dosimetries were found in NTL for D90 and D70 (10 vs. 5 Gy, p < 0.0001 and 20 vs. 16 Gy, p = 0.005, respectively), but these can be considered not clinically significant.
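For readers less familiar with the DVH metrics compared above, the sketch below shows one common way of extracting Dmean, D90 and D70 from a list of voxel doses. It assumes equal-volume voxels and the usual convention that Dx is the minimum dose received by the x% of the volume with the highest doses; the exact convention used by the clinical software in this study is not stated here, and the dose values and function names are hypothetical.

```python
def d_mean(doses):
    """Mean absorbed dose over the VOI (equal-volume voxels assumed)."""
    return sum(doses) / len(doses)

def d_x(doses, percent):
    """Dx: minimum dose received by the hottest `percent`% of the voxels.
    `percent` is an integer between 1 and 100."""
    ranked = sorted(doses, reverse=True)
    # Number of voxels needed to cover `percent`% of the volume (rounded up),
    # computed with integer arithmetic to avoid floating-point surprises.
    n_covered = (percent * len(ranked) + 99) // 100
    return ranked[n_covered - 1]

# Hypothetical voxel doses (Gy) for a ten-voxel VOI.
voi_doses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

print(d_mean(voi_doses), d_x(voi_doses, 90), d_x(voi_doses, 70))
```

In this toy VOI, 90% of the voxels receive at least 2 Gy and 70% receive at least 4 Gy, while the mean dose is 5.5 Gy.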
For lesions and TL, DVH and QVH results were mostly concordant. In terms of dose distribution correspondence assessed with QVH, 69% of lesions had a QF < 0.3 (40% < 0.18) and 65% of TL had a QF < 0.3 (23% < 0.18). These results suggest that dose heterogeneity in lesions and TL could be reasonably predicted by the 99mTc-MAA predictive dosimetry. Interestingly, several studies suggested that the minimal dose (Dmin) to the lesions/TL would be a useful parameter to take into consideration for activity prescription, to ensure that the entire volume receives at least Dmin [8, 34]. Our results, principally those of the DVH analysis, suggest that this would be possible.
On the other hand, DVH and QVH analyses of the NTL showed mixed results. In the QVH analysis, only 12%/40% of NTL comparisons resulted in good/acceptable agreement, while 48% showed poor agreement between DPost-treatment and DPredictive. Therefore, QVH findings brought additional information to the DVH analysis and highlighted that NTL dose heterogeneity on the post-treatment dosimetry might differ from the one predicted by DPredictive. This should be taken into consideration in ongoing and future clinical trials aiming to define new NTL dose cut-offs and/or to combine DPredictive heterogeneity with liver function, e.g. assessed with 3D hepatobiliary scintigraphy, to predict treatment toxicity [35, 36]. These differences in dose heterogeneity can be partly explained by the difference between the number of administered 90Y-microspheres and 99mTc-MAA particles. Recently, Walrand et al. showed that physical embolization redirected a part of the resin microspheres to other parts of the arterial tree because of the high number of microspheres (40–80 million). As the capillaries of the tumours are progressively embolized during the administration (due to the high number of resin microspheres), the redirection increases and transports more resin 90Y-microspheres into the NTL. In contrast, physical embolization and redirection with glass spheres were negligible because of their lower number (1.2 million), which is comparable with the MAA particle number (2–4.5 million) [33, 37]. Therefore, differences in dose distribution in the NTL could be explained by the much higher number of resin 90Y-microspheres within the NTL compared to 99mTc-MAA particles.
Other possible sources of difference in dose distribution between predictive and post-treatment dosimetries are that some of the MAA particles are smaller than the microspheres (10–70 μm and 20–60 μm, respectively), which can lead to different distribution patterns and shunting to extrahepatic organs, and consequently to different dose deposition [37]. Also, because of the degradation of MAA in the liver, the dissociated 99mTc-pertechnetate can hinder an accurate evaluation of the dose distribution [38]. Importantly, all patients received 1 mL of sodium perchlorate orally (Irenat, Alliance Pharmaceutical®, Chippenham, UK) to avoid uptake of dissociated 99mTc-pertechnetate in non-targeted organs (gastric region, thyroid gland). These effects will also impact the dose distribution within the lesions, especially the larger ones. However, because of the vascular properties and the smaller volume of the tumours in mCRC patients, the impact of these effects on the differences between predictive and post-treatment dosimetry will be lower than for the NTL.
QVHs could also be used to identify predictive factors for the differences between DPredictive and DPost-treatment. In our study, a delay between predictive and post-treatment dosimetry > 9 days was associated with a significantly higher QF across all VOIs. Importantly, the 9-day cut-off used in this study was the median value and was not derived through a method that optimizes the predictive power, such as a receiver operating characteristic curve, which could be part of further work. Radinsky et al. demonstrated that mCRC expresses high levels of vascular endothelial growth factor, promoting angiogenesis and tumour growth and contributing to its relatively poor prognosis [25, 39]. QVHs could be used to define a maximal delay between predictive and post-treatment dosimetry, to limit anatomical/vascularization modifications caused by disease progression and thereby to maximize the conformity of dose distributions. The catheter tip position variation between predictive and post-treatment dosimetry has also been reported as a critical variable predictive of dose differences [11]. In this report, only patients with the same catheter position between predictive and post-treatment dosimetry (assessed by an interventional radiologist) were included.
The combination of DVH and QVH analyses allows a more extensive assessment of radioembolization. This quality assurance process could have two different impacts:
Firstly, even if the evaluation per patient is de facto performed after the treatment, the results of this study support that optimizing radioembolization activity prescription using 99mTc-MAA dose heterogeneity is feasible in patients with liver mCRC. Notably, Dmin could be used instead of Dmean, to ensure that the entire volume will be sufficiently treated. Additionally, QVH analysis could be used to identify factors impacting the agreement between predictive and post-treatment dosimetry, such as the delay between the two dosimetries. Nevertheless, this would need to be investigated in future trials. Therefore, the results of this quality assurance process could benefit future patients by assessing the full predictive value of pre-treatment dosimetry, which could allow it to be used to its full potential for personalizing the activity of 90Y-microspheres to administer.
Secondly, it can be used to determine, for a specific patient, whether the post-treatment dosimetry was in concordance with the predictive dosimetry and therefore with the therapeutic intent (pre-operative setting, bridging to surgery or palliative setting). Performing this quality assurance process compels clinicians to compare predictive and post-treatment dosimetry; in the case of large discrepancies, it can alert them and could streamline the decision to retreat. For example, if post-SIRT dosimetry shows that lesions were completely or partially underdosed in comparison with what was planned based on the predictive dosimetry, clinicians could then identify the possible reason for the discrepancy (e.g. different catheter positioning). A new treatment could then be proposed and adapted using this information. Alternatively, in the case of an NTL overdose, clinicians can decide to monitor liver toxicity more closely.
Ultimately, both the quality assurance process, based on combined DVH and QVH analyses, and an enhanced personalized radioembolization could contribute to a better patient outcome.
The end-to-end processing time for QVH calculation and evaluation is around 20 min. Several software packages were used to implement the QVH method (Fig. 1) because, at the time of this investigation, no single solution was able to cover the entire process. Importantly, dose-matrix computation and DIR were performed using clinically available software. Performance and quality of the DIR algorithm are essential, as QVH computation requires voxel-to-voxel association. Therefore, the DIR was performed using the HybridReg solution, which was validated for several regions (including the torso region) [27]. To obtain the same voxel size for both dose matrices, DPost-treatment (pixels of 2.73 × 2.73 mm with a slice thickness of 3.27 mm) was resampled to the grid of DPredictive (pixels of 4.79 × 4.79 mm with a slice thickness of 4.79 mm), which consequently reduced misregistration errors. Also, we chose the largest available (5 mm) isotropic knob size for the deformation, to avoid any overfitting and limit the freedom of the DIR. Finally, the overall induced differences between original and processed dose matrices were within clinically acceptable limits (Fig. 4). Notably, dose differences before vs. after DIR of the predictive dose matrix were up to 10 Gy for TL. To increase the speed and improve the reproducibility of the method, the entire process would need to be implemented in a single software package and the DIR performance analysed extensively. This is recommended before using QVH in clinical routine and/or in multicentre trials.
Several limitations of this study should be noted. The study is subject to bias due to its retrospective character. Also, the small sample size limits the generalizability of our results to other datasets. The number of lesions per patient was not restricted. The impact of acquisition and reconstruction parameters on QVH results was not tested but should be an important point to investigate in a future trial. Even though several actions were undertaken to maximize the performance of the DIR, its influence on QVH results should be further evaluated. Weighting factors should be adapted if glass spheres are used and/or for other types of liver cancer. This study did not include any clinical outcome from which to conclude on the real influence on patients of a good or poor match between predictive and post-treatment dosimetry. Notably, cut-offs for classifying QFs into good, acceptable and poor categories were defined arbitrarily, which influences our results. Further studies should aim to define them better. Thus, the clinical conclusions derived from the QVH analysis are limited. Finally, our method and results must be validated in prospective multicentre studies.