EORTC PET response criteria are more influenced by reconstruction inconsistencies than PERCIST but both benefit from the EARL harmonization program

Background: This study evaluates the consistency of PET evaluation response criteria in solid tumours (PERCIST) and European Organisation for Research and Treatment of Cancer (EORTC) classification across different reconstruction algorithms, and whether aligning standardized uptake values (SUVs) to the European Association of Nuclear Medicine (EANM)/EARL standards provides more consistent response classification. Materials and methods: Baseline (PET1) and response assessment (PET2) scans in 61 patients with non-small cell lung cancer were acquired with protocols compliant with the EANM guidelines. They were reconstructed both with point-spread function (PSF) or PSF + time-of-flight (TOF) reconstruction for optimal tumour detection and with a standardized ordered subset expectation maximization (OSEM) reconstruction known to fulfil EANM harmonizing standards. Patients were recruited in three centres. Following reconstruction, EQ.PET, a proprietary software solution, was applied to the PSF ± TOF data (PSF ± TOF.EQ) to harmonize SUVs to the EANM standards. The impact of differing reconstructions on PERCIST and EORTC classification was evaluated using standardized uptake values corrected for lean body mass (SUL). Results: Using OSEMPET1/OSEMPET2 (standard scenario), responding tumours displayed a reduction of −57.5% ± 23.4 and −63.9% ± 22.4 in SULmax and SULpeak, respectively, while progressing tumours showed an increase of +63.4% ± 26.5 and +60.7% ± 19.6 in SULmax and SULpeak, respectively. The use of PSF ± TOF reconstruction affected the classification of tumour response. For example, the OSEMPET1/PSF ± TOFPET2 scenario reduced the apparent decrease in SUL in responding tumours (−39.7% ± 31.3 and −55.5% ± 26.3 for SULmax and SULpeak, respectively) but amplified the apparent increase in SUL in progressing tumours (+130.0% ± 50.7 and +91.1% ± 39.6 for SULmax and SULpeak, respectively).
Consequently, inconsistent reconstruction (PSF ± TOFPET1/OSEMPET2 or OSEMPET1/PSF ± TOFPET2) led, respectively, to 11/61 (18.0%) and 10/61 (16.4%) PERCIST classification discordances and to 17/61 (27.9%) and 19/61 (31.1%) EORTC classification discordances. Agreement in these scenarios was better after application of the proprietary filter, with kappa values of 1.00 and 0.95, compared with 0.75 and 0.77, for PERCIST and kappa values of 0.93 and 0.95, compared with 0.61 and 0.55, for EORTC, respectively. Conclusion: PERCIST classification is less sensitive to reconstruction algorithm-dependent variability than EORTC classification, but harmonizing SULs within the EARL program is equally effective with either. Electronic supplementary material: The online version of this article (doi:10.1186/s40658-017-0185-4) contains supplementary material, which is available to authorized users.

Keywords: PET, 18F-FDG, Therapy response, PERCIST, EORTC, Harmonization

Background
18F-FDG PET is increasingly used for response evaluation in cancer patients, both in clinical routine and in clinical trials [1][2][3][4][5][6]. Two main schemas based on the degree of standardized uptake value (SUV) change following treatment are currently used: the European Organisation for Research and Treatment of Cancer (EORTC) criteria [7] and the PET evaluation response criteria in solid tumours (PERCIST) [8]. However, SUV measurement is subject to many sources of error [9][10][11]. In particular, technological improvements can lead to significant device-dependent and reconstruction-dependent variations in quantitative values [12][13][14]. Such variations could cause classification errors by pushing SUV changes across the thresholds used to discriminate between responding and non-responding tumours, unless the pre- and post-treatment scans are acquired on the same scanner and processed identically.
The EANM Research Ltd (EARL) accreditation program [15] is an SUV harmonization strategy that aims to minimize variability in SUV measurements by harmonizing patient preparation, scan acquisition and processing [16]. While many sources of error in SUV measurements are overcome by complying with the EANM guidelines for PET tumour imaging [17][18][19], reconstruction-dependent variations require either an additional filtering step [20] or the generation of two sets of images: one providing optimal diagnostic quality and another meeting quantitative harmonization standards [21]. Previous research by the collaborators in this study has shown that SUVmax is more sensitive to reconstruction inconsistency than SUVpeak [20] and that reconstruction inconsistencies may affect PERCIST classification [22]. Consequently, one could expect these inconsistencies to have a greater impact on EORTC classification, which is based on SUVmax variation, than on PERCIST, which is based on SUVpeak.
The aim of this study was to evaluate the impact of the reconstruction dependency of SUVs on PERCIST and EORTC classification, and the ability of the EARL program to minimize variability in response assessment. To this end, we reconstructed the same raw PET data with an OSEM algorithm known to meet EANM requirements and with PSF reconstruction with or without TOF (PSF ± TOF). Post-reconstruction filtering was then applied to the PSF ± TOF reconstructions with EQ.PET (Siemens Medical Solutions), a proprietary software solution that allows visualization of optimized images while simultaneously providing harmonized SUVs [20,23].

Patients
Sixty-one patients with non-small cell lung cancer (NSCLC) who were scanned to monitor the efficacy of chemotherapy, molecularly targeted therapy or radiotherapy were included. The cohort comprised 51 patients prospectively included in a multicentre study involving three PET centres and 10 patients included in a single-centre prospective study. The local ethics committee (Ref A12-D24-VOL13, Comité de protection des personnes Nord-Ouest III) waived informed consent for this type of study, since the scans were performed for clinical indications and the study procedures were performed independently without influencing clinical reporting.

PET systems
Data from the following three PET systems were used in this study: a Biograph 6 TrueV with PSF reconstruction, an mCT with PSF + TOF reconstruction, and a Biograph 64 TrueV with PSF reconstruction (Siemens Medical Solutions). Both Biograph systems were equipped with an extended axial field of view.

Patient preparation, PET acquisition and reconstruction parameters
All patients were asked to fast for 6 h before 18F-FDG injection. Patient height, weight and blood glucose levels were recorded. Patients were injected intravenously with 18F-FDG, followed by a 60-min rest in a warm room.
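The recorded height and weight are what the lean body mass normalization (SUL) used throughout this study requires. The Python sketch below assumes the James lean body mass formula, one formula commonly used with PERCIST; the text does not say which formula was applied here. It also assumes an injected activity already decay-corrected to scan time and a tissue density of 1 g/mL. The patient values are hypothetical.

```python
def lean_body_mass_james(weight_kg: float, height_cm: float, sex: str) -> float:
    """Lean body mass in kg using the James formula (assumed; weight in kg, height in cm)."""
    ratio = weight_kg / height_cm
    if sex == "M":
        return 1.10 * weight_kg - 128.0 * ratio ** 2
    if sex == "F":
        return 1.07 * weight_kg - 148.0 * ratio ** 2
    raise ValueError("sex must be 'M' or 'F'")


def sul(activity_kbq_per_ml: float, injected_mbq: float, lbm_kg: float) -> float:
    """SUV normalized to lean body mass (SUL).

    Assumes the injected activity is decay-corrected to scan time and tissue density
    is 1 g/mL, so kBq/mL divided by MBq/kg (= kBq/g) is dimensionless.
    """
    return activity_kbq_per_ml / (injected_mbq / lbm_kg)


# Hypothetical male patient: 70 kg, 175 cm, 250 MBq injected, lesion at 20 kBq/mL.
lbm = lean_body_mass_james(70.0, 175.0, "M")  # 77.0 - 128 * 0.4**2 = 56.52 kg
print(f"LBM = {lbm:.2f} kg, SUL = {sul(20.0, 250.0, lbm):.2f}")
```

Normalizing by lean body mass rather than total body weight reduces the dependence of SUV on body fat, which takes up little 18F-FDG.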
Each PET system was calibrated daily with a 68Ge source according to the manufacturer's protocol. A quarterly cross-calibration of each PET system was performed according to the EANM guidelines, as described elsewhere [17,18], and workstation clocks were synchronized weekly.
Patients were scanned from the skull vertex or skull base to the mid-thighs. All raw PET data were reconstructed both with the local PSF ± TOF settings for optimal lesion detection and with an OSEM-3D algorithm fulfilling the EANM guidelines regarding recovery coefficients (Table 1). Scatter and attenuation corrections were applied to all PET acquisitions.

EQ.PET methodology
For each PET system, the EQ.PET filter was calculated from phantom data for each PSF ± TOF reconstruction, as described in detail elsewhere [21]. Briefly, the recovery coefficients (RCs; the ratio between the measured and true activity concentration in each sphere) of a National Electrical Manufacturers Association (NEMA) NU 2 phantom scanned according to the EANM guidelines were aligned to the EANM reference RCs by applying a Gaussian filter.
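The filter-matching idea can be sketched under strong simplifying assumptions. Here the scanner's effective resolution is modelled as an isotropic 3D Gaussian, so the maximum recovery coefficient of a hot sphere in a cold background equals the fraction of Gaussian mass inside the sphere, and an extra Gaussian filter adds to the system blur in quadrature. The 4 mm system FWHM and the "reference" RCs generated from a 7 mm FWHM are hypothetical placeholders. They are not EANM reference values, and this is not EQ.PET's actual procedure, which works from measured phantom RCs.

```python
import math

FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # ~2.355
NEMA_DIAMETERS_MM = [10.0, 13.0, 17.0, 22.0, 28.0, 37.0]  # NEMA NU 2 sphere sizes


def rc_max(diameter_mm: float, fwhm_mm: float) -> float:
    """RCmax of a uniform sphere blurred by an isotropic 3D Gaussian (no noise, no voxelization).

    The blurred maximum sits at the sphere centre and equals the Gaussian probability
    mass inside the sphere radius (Maxwell-Boltzmann CDF).
    """
    if fwhm_mm <= 0.0:
        return 1.0
    x = (diameter_mm / 2.0) / (fwhm_mm / FWHM_PER_SIGMA)  # radius in units of sigma
    return math.erf(x / math.sqrt(2.0)) - math.sqrt(2.0 / math.pi) * x * math.exp(-x * x / 2.0)


def fit_filter_fwhm(system_fwhm_mm, diameters_mm, target_rcs, step_mm=0.01, max_mm=15.0):
    """Grid-search the post-reconstruction Gaussian FWHM whose RCs best match target_rcs
    in the least-squares sense. Gaussian blurs combine in quadrature."""
    best_fwhm, best_err = 0.0, math.inf
    for i in range(int(round(max_mm / step_mm)) + 1):
        filter_fwhm = i * step_mm
        total_fwhm = math.hypot(system_fwhm_mm, filter_fwhm)
        err = sum((rc_max(d, total_fwhm) - t) ** 2 for d, t in zip(diameters_mm, target_rcs))
        if err < best_err:
            best_fwhm, best_err = filter_fwhm, err
    return best_fwhm


# Hypothetical: sharp PSF-like reconstruction (4 mm) vs a reference behaving like 7 mm.
target = [rc_max(d, 7.0) for d in NEMA_DIAMETERS_MM]
print(f"filter FWHM ~ {fit_filter_fwhm(4.0, NEMA_DIAMETERS_MM, target):.2f} mm")
```

With these made-up inputs, the search should land close to sqrt(7² − 4²) ≈ 5.74 mm, as expected when Gaussian widths add in quadrature.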

PERCIST and EORTC evaluation
All PET examinations were analyzed on Syngo.via software equipped with EQ.PET (Siemens Medical Solutions). For interpretation purposes, both the reconstruction for optimal lesion detection (PSF ± TOF) and the OSEM reconstruction were displayed on screen, together with the EQ.PET-filtered harmonized SUV results for the tumour region(s) of interest. The EQ.PET-filtered images themselves were not displayed. Under PERCIST [8], the measurable target lesion is the single most intense tumour site on each of the pre- and post-treatment scans, so the target lesion is not necessarily the same lesion before and after treatment. Under the EORTC PET response criteria, in contrast, the volumes of interest (VOIs) should cover the same tumour lesion on the pre- and post-treatment scans.
Table 1 PET/CT acquisition and reconstruction parameters for the three participating centres
In practice, the target lesion on the baseline scan was chosen as the most intense lesion and located by scaling the 3D MIP view on both the OSEM and PSF ± TOF reconstructions. VOIs were drawn on one reconstruction and automatically propagated to the other (from OSEM to PSF ± TOF and vice versa). Within these VOIs, SUVpeak and SUVmax normalized to lean body mass (SULpeak and SULmax) were measured.
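To illustrate why SULmax, a single voxel, reacts more strongly than SULpeak to resolution-dependent effects, here is a minimal pure-Python sketch that computes both within a VOI. It assumes the image is already in SUL units and stored as image[z][y][x]. It also uses a simplified SULpeak: the highest mean over image voxels whose centres lie within a 1 mL sphere centred on each VOI voxel in turn. Clinical implementations differ in how they place and sample the peak sphere, and the toy image is hypothetical.

```python
import math


def sul_max_and_peak(image, voi, voxel_mm, peak_volume_ml=1.0):
    """Return (SULmax, SULpeak) for a VOI on an image stored as image[z][y][x] in SUL units.

    voi: iterable of (z, y, x) indices inside the lesion VOI.
    voxel_mm: (dz, dy, dx) voxel size in mm.
    SULpeak (simplified): highest mean over image voxels whose centres fall inside a
    sphere of peak_volume_ml, with the sphere centred on each VOI voxel in turn.
    """
    radius = (3.0 * peak_volume_ml * 1000.0 / (4.0 * math.pi)) ** (1.0 / 3.0)  # ~6.2 mm for 1 mL
    dz, dy, dx = voxel_mm
    nz, ny, nx = len(image), len(image[0]), len(image[0][0])
    rz, ry, rx = int(radius // dz), int(radius // dy), int(radius // dx)
    voi = list(voi)
    sul_max = max(image[z][y][x] for z, y, x in voi)
    sul_peak = -math.inf
    for cz, cy, cx in voi:
        total, count = 0.0, 0
        for z in range(max(0, cz - rz), min(nz, cz + rz + 1)):
            for y in range(max(0, cy - ry), min(ny, cy + ry + 1)):
                for x in range(max(0, cx - rx), min(nx, cx + rx + 1)):
                    dist2 = ((z - cz) * dz) ** 2 + ((y - cy) * dy) ** 2 + ((x - cx) * dx) ** 2
                    if dist2 <= radius * radius:
                        total += image[z][y][x]
                        count += 1
        sul_peak = max(sul_peak, total / count)
    return sul_max, sul_peak


# Hypothetical 5x5x5 image, 4 mm voxels: background SUL 2.0 with one hot voxel of 10.0.
img = [[[2.0] * 5 for _ in range(5)] for _ in range(5)]
img[2][2][2] = 10.0
every_voxel = [(z, y, x) for z in range(5) for y in range(5) for x in range(5)]
print(sul_max_and_peak(img, every_voxel, (4.0, 4.0, 4.0)))
```

In this toy case, SULmax is 10.0, while SULpeak averages the hot voxel with its 18 background neighbours inside the sphere (46/19 ≈ 2.42). A reconstruction that sharpens or blurs that one voxel therefore moves SULmax far more than SULpeak.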
The same VOI methodology was used on the post-treatment scan, where the target lesion was chosen as the most intense lesion for PERCIST, while the same target lesion for baseline and post-treatment scans was used for EORTC classification.
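The difference in target-lesion handling between the two criteria fits in a few lines. In this hypothetical sketch, each scan is a dict mapping a lesion label to its SUL metric (SULpeak for PERCIST, SULmax for EORTC; the example reuses one set of values for brevity). The EORTC fallback to the hottest remaining lesion mirrors how one patient in this study was handled when the target disappeared. Lesion names and values are made up.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from baseline, in percent."""
    return 100.0 * (after - before) / before


def percist_change(baseline: dict[str, float], followup: dict[str, float]) -> float:
    """PERCIST: compare the hottest lesion on each scan, even if they are different lesions."""
    return percent_change(max(baseline.values()), max(followup.values()))


def eortc_change(baseline: dict[str, float], followup: dict[str, float]) -> float:
    """EORTC (as applied here): follow the baseline target lesion on the post-treatment scan."""
    target = max(baseline, key=baseline.get)
    if target in followup:
        return percent_change(baseline[target], followup[target])
    # Target vanished but other lesions remain: fall back to the hottest remaining lesion.
    return percent_change(baseline[target], max(followup.values()))


# Hypothetical SUL values: the primary responds while a node becomes the hottest lesion.
pet1 = {"primary": 12.0, "node": 8.0}
pet2 = {"primary": 6.0, "node": 9.0}
print(percist_change(pet1, pet2), eortc_change(pet1, pet2))  # -25.0 -50.0
```

The same pair of scans yields −25% under PERCIST-style lesion selection and −50% under EORTC-style tracking, which is why the two schemas can disagree even with identical reconstructions.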
Based on the SULpeak and SULmax variations between the pre- and post-treatment scans, patients were classified according to PERCIST and EORTC as follows:
- Complete metabolic response (CMR): complete resolution of 18F-FDG uptake in the tumour volume, with tumour SUL lower than liver SUL and the blood-pool background, and disappearance of all lesions if multiple.
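A simplified four-category call can be sketched from the thresholds this study uses elsewhere in the text: a change of 25% in SULmax for EORTC, 30% in SULpeak for PERCIST, and new lesions counted as PMD. The rule order and the handling of values exactly at a boundary are assumptions. The sketch omits PERCIST's additional requirement of an absolute change of at least 0.8 SUL units, and the CMR test is reduced to a single flag.

```python
EORTC_THRESHOLD_PCT = 25.0    # change in SULmax, as used in this study
PERCIST_THRESHOLD_PCT = 30.0  # change in SULpeak, as used in this study


def classify(change_pct: float, threshold_pct: float, *,
             new_lesions: bool = False, below_background: bool = False) -> str:
    """Simplified response call (assumed rule order and boundary handling).

    below_background: target uptake has fallen below liver and blood-pool SUL (CMR criterion).
    """
    if new_lesions:
        return "PMD"
    if below_background:
        return "CMR"
    if change_pct > threshold_pct:
        return "PMD"
    if change_pct <= -threshold_pct:
        return "PMR"
    return "SMD"


# A -27% change is PMR under the 25% EORTC cut-off but SMD under the 30% PERCIST cut-off.
print(classify(-27.0, EORTC_THRESHOLD_PCT), classify(-27.0, PERCIST_THRESHOLD_PCT))
```

Because the categories are separated by hard cut-offs, any systematic shift in measured SUL matters most for tumours whose change lies near ±25% or ±30%.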

Statistical analysis
Quantitative data from clinical PET/CT examinations are presented as mean ± standard deviation (SD). The relationships between PSF ± TOF, PSF ± TOF.EQ and OSEM quantitative values were assessed with Bland-Altman plots. Agreement between the response classifications obtained with the different reconstruction scenarios was evaluated using the kappa statistic. The scenario using OSEM reconstruction for both pre- and post-treatment PET examinations (OSEMPET1/OSEMPET2) served as the "current standard" for classifying the therapeutic response of each lesion and was compared with the other scenarios. Kappa values were interpreted using the benchmarks of Landis and Koch [24].
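The percentage-difference form of the Bland-Altman analysis (bias and 95% limits of agreement, as summarized in Fig. 1) can be sketched as follows. The text does not state which denominator was used for the percentage difference. This sketch assumes the mean of each pair, a common choice, and the paired SULmax values are hypothetical.

```python
import statistics


def bland_altman_percent(reference, test):
    """Return (bias, lower LoA, upper LoA) for percentage differences.

    Assumed definition: 100 * (test - reference) / mean(test, reference), with 95% limits
    of agreement at bias +/- 1.96 SD. Needs at least two paired measurements.
    """
    if len(reference) != len(test) or len(reference) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [100.0 * (t - r) / ((t + r) / 2.0) for r, t in zip(reference, test)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical SULmax pairs: OSEM (reference) vs PSF +/- TOF (test).
osem = [4.1, 7.9, 12.3, 5.6]
psf = [5.2, 9.4, 14.0, 6.9]
bias, lower, upper = bland_altman_percent(osem, psf)
print(f"bias {bias:.1f}%, 95% LoA [{lower:.1f}%, {upper:.1f}%]")
```

Comparing the bias and limits against the 25% and 30% response thresholds shows whether reconstruction differences alone could move a tumour across a category boundary.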
Graphs and analyses were produced with GraphPad Prism and the VassarStats website for statistical computation (http://vassarstats.net).

Impact of reconstruction-dependent variation on SUL changes between baseline and post-treatment scans
For EORTC classification, the same target lesion was used on the baseline and post-treatment scans in all but two patients. The first patient had a large tumoural and nodal complex in which the EQ.PET software could not differentiate the nodes from the tumour on the post-treatment scan. The second patient had multiple tumour lesions, and the initial target lesion had completely disappeared, so the hottest remaining lesion on the post-treatment scan was used.
The variations in SULmax and SULpeak between the pre- and post-treatment scans are shown in Fig. 2. For the OSEMPET1/OSEMPET2 scenario, taken as the reference standard, the change in SULmax was −57.5% ± 23.4 and +63.4% ± 26.5 in the groups of tumours showing a decrease and an increase in 18F-FDG uptake, respectively. For SULpeak, the corresponding changes were −63.9% ± 22.4 and +60.7% ± 19.6.
The use of PSF reconstruction affected SULs differently depending on whether it was used for the pre- or post-treatment scan. For example, the OSEMPET1/PSF ± TOFPET2 scenario reduced the apparent decrease in SUL in responding tumours (−39.7% ± 31.3 and −55.5% ± 26.3 for SULmax and SULpeak, respectively) but amplified the apparent increase in SUL in progressing tumours (+130.0% ± 50.7 and +91.1% ± 39.6 for SULmax and SULpeak, respectively) compared with the OSEMPET1/OSEMPET2 scenario described above. Accordingly, inconsistent reconstructions led to discordant response classifications across scenarios, as described in the next section.
Fig. 1 Relationship between SULmax and SULpeak in lesions extracted from PSF ± TOF or PSF ± TOF.EQ and OSEM images, assessed using Bland-Altman plots. Mean percentage differences between SULmax (a) and SULpeak (b) obtained with a conventional OSEM algorithm and those obtained with PSF ± TOF reconstructions are shown before and after application of the EQ.PET methodology. The red lines denote the 25% and 30% thresholds used to discriminate between stable metabolic disease and progressive metabolic disease with EORTC classification and PERCIST, respectively.
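A simple multiplicative model reproduces the direction of the effects described above. If a reconstruction scales a scan's SUL by (1 + b) relative to OSEM, the observed PET2/PET1 ratio is multiplied by (1 + b2)/(1 + b1). The sketch assumes a single uniform bias per scan, which is a simplification because the actual gain varies from lesion to lesion. The bias values are hypothetical.

```python
def apparent_change(true_change_pct: float, pet1_bias_pct: float = 0.0,
                    pet2_bias_pct: float = 0.0) -> float:
    """Observed % change when each scan's SUL is scaled by (1 + bias) relative to OSEM.

    Assumes a single multiplicative bias per scan (a simplification).
    """
    ratio = (1.0 + true_change_pct / 100.0) * (1.0 + pet2_bias_pct / 100.0)
    ratio /= 1.0 + pet1_bias_pct / 100.0
    return 100.0 * (ratio - 1.0)


# Hypothetical +20% PSF bias on the post-treatment scan only (OSEM PET1 / PSF PET2):
print(round(apparent_change(-50.0, pet2_bias_pct=20.0), 1))  # -40.0: response looks smaller
print(round(apparent_change(60.0, pet2_bias_pct=20.0), 1))   # 92.0: progression looks larger
# Same bias on the baseline scan only (PSF PET1 / OSEM PET2):
print(round(apparent_change(-20.0, pet1_bias_pct=20.0), 1))  # -33.3: crosses a -30% cut-off
```

A post-treatment-only bias shrinks apparent responses and inflates apparent progression. A baseline-only bias does the opposite, which can turn SMD into PMR.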

Impact of reconstruction-dependent variation of SUL on PERCIST and EORTC evaluation
Using OSEM for both the pre- and post-treatment scans, PET classified 7 patients as CMR, 18 as PMR, 14 as SMD and 22 as PMD according to EORTC classification (Fig. 3) and 7 patients as CMR, 14 as PMR, 17 as SMD and 23 as PMD according to PERCIST (Fig. 4). With EORTC evaluation, CMR was due to a decrease in SULmax below the liver and blood-pool background in five patients and to complete disappearance of the target lesions in two patients. PMD was due to an increase in tumour SULmax greater than 25% in four patients and to new lesions on the post-treatment scan in 18 patients. With PERCIST, CMR was due to a decrease in SULpeak below the liver and blood-pool background in five patients and to complete disappearance of the target lesions in two patients. PMD was due to an increase in tumour SULpeak greater than 30% in five patients and to new lesions on the post-treatment scan in 18 patients.
Agreement between the EORTC and PERCIST therapeutic evaluations was almost perfect, with a kappa value of 0.84 (0.73-0.95). Seven discordances (11%) occurred: one patient was classified as CMR with EORTC and PMR with PERCIST, one as PMR with EORTC and CMR with PERCIST, four as PMR with EORTC and SMD with PERCIST, and one as SMD with EORTC and PMD with PERCIST.
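Cohen's kappa and the Landis and Koch labels are simple to compute. As a consistency check, the sketch below rebuilds the EORTC-vs-PERCIST cross-tabulation from the category totals and the discordant pairs listed above (1 + 1 + 4 + 1), filling in the concordant counts so that the totals match. It reproduces the reported kappa of 0.84. The confidence interval is not computed, since the text does not specify the method used.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two paired lists of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2  # chance agreement
    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa: float) -> str:
    """Landis and Koch benchmark labels."""
    if kappa < 0.0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# (EORTC, PERCIST) pairs under OSEMPET1/OSEMPET2: concordant counts implied by the
# category totals, plus the discordant pairs listed in the text.
pairs = {("CMR", "CMR"): 6, ("PMR", "PMR"): 13, ("SMD", "SMD"): 13, ("PMD", "PMD"): 22,
         ("CMR", "PMR"): 1, ("PMR", "CMR"): 1, ("PMR", "SMD"): 4, ("SMD", "PMD"): 1}
eortc = [e for (e, p), count in pairs.items() for _ in range(count)]
percist = [p for (e, p), count in pairs.items() for _ in range(count)]
kappa = cohens_kappa(eortc, percist)
print(len(eortc), round(kappa, 2), landis_koch(kappa))  # 61 0.84 almost perfect
```

The same two functions apply unchanged to any scenario comparison, for example OSEMPET1/OSEMPET2 against OSEMPET1/PSF ± TOFPET2.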
Agreement between the OSEMPET1/OSEMPET2 scenario and the other scenarios was almost perfect, with narrow confidence intervals, both for the scenarios using EQ.PET-filtered data for either the pre- or post-treatment scan and for the reconstruction-consistent scenario, for both EORTC and PERCIST classifications (Table 2). For EORTC and PERCIST evaluations, agreement was moderate to substantial, with wide confidence intervals, for the OSEMPET1/PSF ± TOFPET2 and PSF ± TOFPET1/OSEMPET2 scenarios. Notably, kappa values were lower for EORTC classification than for PERCIST, especially for the OSEMPET1/PSF ± TOFPET2 scenario (0.55, rated moderate, vs 0.77, rated substantial).
Table 3 and Figs. 3 and 4 show the number of discordances in the EORTC and PERCIST classifications for the different scenarios tested. EORTC classification produced more discordances than PERCIST in all scenarios. For example, with EORTC classification the OSEMPET1/PSF ± TOFPET2 scenario led to three patients being classified as PMR instead of CMR, seven as SMD instead of PMR and nine as PMD instead of SMD, whereas the same changes occurred in two, five and three cases, respectively, with PERCIST. Figure 5 illustrates a patient classified as SMD with both EORTC classification and PERCIST under the OSEMPET1/OSEMPET2 standard of reference, whereas PSF + TOFPET1/OSEMPET2 led to PMR with both classifications and OSEMPET1/PSF + TOFPET2 led to PMD with EORTC classification.

Discussion
In the framework of therapy monitoring with PET, pre- and post-treatment scans should ideally involve identical scan acquisition and image processing. However, this is often impractical in busy PET centres, especially those running several scanners. Consistency can also be compromised by a scanner upgrade during a trial or when a patient relocates. Previous studies aimed at validating the EARL harmonization strategy in the clinical setting have shown that SUVmax and its lean body mass equivalent, SULmax, are more sensitive to reconstruction inconsistency than SUVpeak and SULpeak. Consequently, one could expect reconstruction inconsistencies to have a greater impact on EORTC classification than on PERCIST.
In the present study, we evaluated the impact of inconsistent reconstruction on both EORTC and PERCIST response classifications and found classification changes in up to 31% of cases for EORTC vs up to 18% for PERCIST. We further showed that applying the EARL harmonization strategy yielded more consistent response classification, with kappa values of at least 0.93 for all scenarios involving harmonized SULs, compared with the OSEMPET1/OSEMPET2 scenario used as the standard of reference. In line with its greater sensitivity to reconstruction inconsistencies, EORTC classification benefited more from the EARL harmonization strategy, with kappa increasing from 0.55 to 0.95 in the worst-case scenario (OSEMPET1/PSF ± TOFPET2), compared with an improvement from 0.77 to 0.95 for PERCIST (Table 2). This has practical advantages when acquisition or reconstruction settings vary. Such variation seems relatively common even among centres running the same PET system, as recently described by Sunderland and colleagues [25] in a survey of 237 PET/CT systems in 170 international imaging centres, covering technology that spans more than a decade. They reported that site-specific reconstruction parameters increased the quantitative variability of similar scanners, with post-reconstruction smoothing filters being the most influential parameter. Harmonization also has practical advantages when using the same scanner for both scans is impractical, for instance in centres running two or more PET systems. This is illustrated by the study of Skougaard et al. [26], in which 12 of 81 (15%) patients undergoing pre- and post-treatment PET in the same department were excluded from analysis because they were scanned on two different generations of PET systems.
Taking the scenario of a system upgrade during a trial as an example, using OSEM for the pre-treatment scan and PSF ± TOF for the post-treatment scan led to discordant response assessments in 19/61 (31%) patients with EORTC classification and 10/61 (16%) with PERCIST (Table 3). Using a harmonization strategy for either the pre- or post-treatment scan gave almost perfect agreement with the OSEMPET1/OSEMPET2 reference standard, with narrow confidence intervals. Here, this strategy consisted of aligning quantitative values to the EARL/EANM harmonizing standards with a proprietary filter, the EQ.PET methodology. Compared with OSEMPET1/OSEMPET2, we observed only two discordances for the OSEMPET1/PSF ± TOF.EQPET2 scenario with both EORTC and PERCIST classifications, and three discordances for the PSF ± TOF.EQPET1/OSEMPET2 scenario with EORTC classification. No discordance occurred for the PSF ± TOF.EQPET1/OSEMPET2 scenario with PERCIST. The three discordances that occurred only with EORTC classification for PSF ± TOF.EQPET1/OSEMPET2 arose because, in the standard OSEMPET1/OSEMPET2 scenario, the SULmax changes between the pre- and post-treatment scans were very close to the ±25% cut-off. Small shifts therefore moved these tumours from SMD to PMR or PMD, or vice versa, in the other scenarios.
It is noteworthy that consistent reconstruction (i.e. the PSF ± TOFPET1/PSF ± TOFPET2 and PSF ± TOF.EQPET1/PSF ± TOF.EQPET2 scenarios) did not give perfect agreement with the OSEMPET1/OSEMPET2 standard of reference. Some of these discordances arose because PSF reconstruction increases SUV metrics in tumours without affecting the background (blood pool and liver) [27,28]. As a result, tumours no longer fell below background, and some CMR classifications became PMR. In addition, both EORTC and PERCIST classifications were affected by percentage changes in SUL close to ±25% (EORTC) or ±30% (PERCIST) in the OSEMPET1/OSEMPET2 scenario, which moved tumours from SMD to PMR or PMD, or vice versa, in the other scenarios.
A limitation of this study is that we used EQ.PET, a software solution developed for, and applied only to, the scanners and reconstruction algorithms of the company that developed it. EQ.PET has not been validated for equipment from other manufacturers. However, it has been shown to be as effective as the alternative approach of obtaining a second reconstruction dataset, as recommended by the EARL accreditation program for quantitation [29,30]. The ability of this algorithm to correct for scans performed on different scanners and then processed with different reconstruction methods was not tested.

Conclusions
PERCIST classification is less sensitive to reconstruction algorithm-dependent variability than EORTC classification. Both EORTC and PERCIST classifications would benefit from harmonization strategies such as the EARL accreditation program in multicentre studies or at sites equipped with multiple PET systems.
Fig. 5 Representative images of a 66-year-old woman with NSCLC staged T1N2M0 (stage III according to the AJCC staging system) treated with chemotherapy. The patient was classified as SMD with both EORTC classification and PERCIST under the OSEMPET1/OSEMPET2 standard of reference. In contrast, OSEMPET1/PSF ± TOFPET2, a scenario mimicking a system upgrade during a trial, led to PMD with EORTC classification. Use of the EQ.PET methodology correctly classified the patient as SMD. a MIP images and transverse slices at the level of a mediastinal nodal involvement on OSEM and PSF ± TOF reconstructions for the baseline scan. b MIP images and transverse slices at the level of a mediastinal nodal involvement on OSEM and PSF ± TOF reconstructions for the post-treatment scan. c Percentage change in SULmax and SULpeak for EORTC classification and PERCIST according to the different scenarios.