Signal separation of simultaneous dual-tracer PET imaging based on global spatial information and channel attention

Background
Simultaneous dual-tracer positron emission tomography (PET) imaging efficiently provides more complete information for disease diagnosis. Signal separation has long been a challenge in dual-tracer PET imaging. To predict the single-tracer images, we proposed a separation network based on global spatial information and channel attention, and connected it to FBP-Net to form the FBPnet-Sep model.
Results
Experiments using simulated dynamic PET data were conducted to: (1) compare the proposed FBPnet-Sep model to the Sep-FBPnet model and the existing Multi-task CNN, (2) verify the effectiveness of the modules incorporated in the FBPnet-Sep model, (3) investigate the generalization of the FBPnet-Sep model to low-dose data, and (4) investigate the application of the FBPnet-Sep model to multiple tracer combinations with decay corrections. Compared to the Sep-FBPnet model and the Multi-task CNN, the FBPnet-Sep model reconstructed single-tracer images with higher structural similarity and peak signal-to-noise ratio and lower mean squared error, and reconstructed time-activity curves with lower bias and variation in most regions. Excluding the Inception or channel attention module degraded image quality. The FBPnet-Sep model showed acceptable performance when applied to low-dose data. Additionally, it could deal with multiple tracer combinations. The quality of the predicted images, as well as the accuracy of the derived time-activity curves and macro-parameters, was slightly improved by incorporating a decay correction module.
Conclusions
The proposed FBPnet-Sep model is a potential method for the reconstruction and signal separation of simultaneous dual-tracer PET imaging.
Supplementary Information
The online version contains supplementary material available at 10.1186/s40658-024-00649-9.


Introduction
Positron emission tomography (PET) is a molecular imaging technology that measures metabolic functions in vivo using radionuclide-labeled tracers. Benefiting from the development of various types of PET tracers, a growing number of biomarkers can be measured to aid in the early detection and diagnosis of diseases [1][2][3].
For diseases with complex pathological characteristics, measurements of more than one biomarker are needed, e.g., amyloid-β and tau in Alzheimer's disease [4]. While this is currently accomplished by performing separate PET scans with different tracers [5], it is natural to consider using multiple tracers in a single scan for efficiency. However, the difficulty of signal separation in multi-tracer PET imaging has been an obstacle to its application, since all tracers emit 511-keV photons generated from positron-electron annihilation, making it impossible to distinguish different tracers during detection.
Signal separation, or reconstruction, of rapid dual-tracer PET imaging has been studied extensively. Except for a small portion of studies using special tracers that emit additional prompt γ-rays [6][7][8][9], the main idea of most separation approaches is to exploit the distinct temporal characteristics of different tracers to separate the mixed dual-tracer signals, as reviewed in [10]. In that case, a dynamic PET scan is necessary.
Before the rise of deep learning, methods based on the difference in tracer half-lives were first proposed [11] and applied [12]. Although these methods established the foundation of using temporal characteristics to separate dual-tracer signals, they could only be applied once tracer concentrations reached equilibrium. Later methods were mainly based on the kinetic modeling of dual-tracer time-activity curves (TACs) by parallel compartment models [13][14][15][16][17][18][19][20]. After estimating the model parameters, the single-tracer TACs could easily be recovered. Other approaches free of compartment models included principal component analysis [21], generalized factor analysis [22], basis function fitting [23], spectral analysis [24] and recurrent extreme gradient boosting [25]. Nevertheless, these methods were limited by the requirement of arterial blood sampling or staggered injection, or could be sensitive to the tracer pair and the order of injection.
Deep learning methods do not require arterial blood sampling and can be applied to simultaneously injected tracers. These methods can be divided into two categories. One is the indirect reconstruction methods, which first reconstruct the dual-tracer images by traditional algorithms and then separate the images by neural networks [26][27][28][29][30][31][32]. The separation can be performed on either the voxel TACs [26][27][28][29][30] or the dynamic image as a whole [31,32], with the latter utilizing spatial information in addition to temporal information.
The other category is the direct reconstruction methods, which reconstruct the single-tracer images from the dual-tracer sinogram, usually by convolutional neural networks (CNNs), such as FBP-CNN [33] and Multi-task CNN [34,35]. The reconstruction part of FBP-CNN adopted a two-dimensional convolution layer to approximate the spatial filter and a fully-connected layer to approximate the back-projection in the traditional filtered back-projection (FBP) algorithm, while the separation part used three-dimensional (3D) convolution kernels to learn spatiotemporal features. In the Multi-task CNN, an encoder-decoder framework was adopted instead of the reconstruction-separation framework, and a multi-task learning mechanism was incorporated. The network was fully composed of 3D convolution and deconvolution layers. However, the performance of FBP-CNN was limited by its large network scale, mainly caused by the fully-connected layer. Although the Multi-task CNN contained far fewer parameters and was therefore easier to train, it was less interpretable. Considering the limitations of FBP-CNN and Multi-task CNN, this study aimed to propose a reconstruction-separation neural network with a limited number of parameters. Besides, the 3D convolution or deconvolution kernels used in both methods process spatiotemporal information locally. To expand the receptive field, introducing global feature processing might be more effective than cascading layers with small kernels, which was also considered in the current study.
We first proposed a CNN-based model for the signal separation of simultaneous dual-tracer PET imaging. To extract and process global features, an Inception-like module and channel attention are applied to the spatial and temporal dimensions respectively. The Inception module extracts and concatenates features of different scales in parallel to expand the receptive field [36]. In its advanced version, 3 × 3 kernels were replaced by a combination of 1 × 3 and 3 × 1 kernels to save memory [37]. Motivated by these works, we included an Inception-like module to extract global spatial features by H × 1 and 1 × W kernels, with H and W representing the height and width of the images. The attention mechanism lets neural networks automatically learn to assign different weights to features, i.e., to focus on and select important information. It can be applied to different dimensions [38][39][40][41]. Inspired by the Squeeze-and-Excitation Networks [39], we used channel attention for the time dimension of dynamic PET data.
The proposed separation network was then cascaded with FBP-Net [42], a deep learning implementation of the FBP algorithm, generating a direct reconstruction model named FBPnet-Sep. The FBPnet-Sep model was verified on simulated dynamic PET data and compared to the Multi-task CNN [34]. Moreover, experiments were performed to verify the superiority of image separation over sinogram separation, the effectiveness of using global spatial information and channel attention, as well as the application to low-dose data and to different tracer combinations, as illustrated in the following sections.

Model of simultaneous dual-tracer PET imaging
The simultaneous dual-tracer PET imaging is modeled as follows:

S_dual(t) = G[I_1(t) + I_2(t)] + r_1(t) + r_2(t), (1)

where S_dual(t) is the dual-tracer sinogram at time t, I_1(t) and I_2(t) are the activity images of Tracer 1 and Tracer 2, G is the system matrix, and r_1(t) and r_2(t) are the random coincidence events. Scatter coincidence events are not considered in this study.
The value of the j-th pixel in a dynamic activity image I depends on the regional physiological parameters, the activity in the blood, and the radioactive decay of the tracer:

I^(j)(t) = C_T(t; k^(j), C_P) · 2^(−t/T), (2)

where C_T and C_P represent the undecayed radioactivity concentrations in tissue and plasma, k^(j) is the set of physiology-related parameters in the region corresponding to the j-th pixel, and T is the half-life of radioactive decay.
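The dual-tracer forward model above can be sketched numerically. This is a minimal illustration with a toy system matrix, not the toolbox used in the study:

```python
import numpy as np

def dual_tracer_sinogram(G, i1, i2, r1, r2):
    """Dual-tracer forward model: project the summed activity images and add randoms.

    G: system matrix of shape (n_detector_bins, n_pixels);
    i1, i2: flattened single-tracer activity images; r1, r2: random-event vectors.
    """
    return G @ (i1 + i2) + r1 + r2
```

Because both tracers share the same system matrix G and emit indistinguishable 511-keV photons, only the sum I_1 + I_2 is observable, which is exactly why temporal information is needed for separation.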

Network architecture
In this study, we proposed a novel CNN for the signal separation of simultaneous dual-tracer PET imaging, incorporating the use of global spatial information and an attention mechanism. The structure of the separation network is displayed in Fig. 1a. The input of the network is the dynamic dual-tracer image, which is in the shape of N_b × C × H × W, where N_b is the batch size, C is the number of channels (i.e., frames), and H and W are the height and width of the image.
The network is made up of several basic convolution modules, an Inception-like module, and a channel attention module. The basic convolution module consists of a convolution layer, a batch normalization layer, and the rectified linear unit. The Inception-like module includes two basic convolution modules, which use H × 1 and 1 × W kernels in parallel to extract features by rows and by columns. The features are concatenated along the channel dimension. Although different from the original Inception that used small-size kernels [37], it is still referred to as Inception in the current study. The channel attention module is composed of max pooling, average pooling, fully-connected layers with 2C neurons, and a sigmoid activation function. At last, the output is split along the channel dimension to obtain the two single-tracer dynamic images.
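The two global-feature modules can be sketched in PyTorch as follows. The exact channel counts, paddings and layer wiring are assumptions for illustration; only the kernel shapes (H × 1, 1 × W), the max/average pooling, and the 2C fully-connected width follow the description above:

```python
import torch
import torch.nn as nn

class InceptionLike(nn.Module):
    """Global spatial features via full-row (1 x W) and full-column (H x 1) kernels."""
    def __init__(self, channels, height, width):
        super().__init__()
        self.row_branch = nn.Sequential(  # 1 x W kernel sees an entire image row
            nn.Conv2d(channels, channels, kernel_size=(1, width), padding="same"),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.col_branch = nn.Sequential(  # H x 1 kernel sees an entire image column
            nn.Conv2d(channels, channels, kernel_size=(height, 1), padding="same"),
            nn.BatchNorm2d(channels), nn.ReLU())

    def forward(self, x):
        # concatenate the two branches along the channel (frame) dimension
        return torch.cat([self.row_branch(x), self.col_branch(x)], dim=1)

class ChannelAttention(nn.Module):
    """SE-style attention over the time (frame) dimension of the dynamic image."""
    def __init__(self, channels):
        super().__init__()
        # max- and average-pooled descriptors are concatenated into 2C features,
        # matching the fully-connected layers with 2C neurons described in the text
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, 2 * channels), nn.ReLU(),
            nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, x):
        s = torch.cat([x.amax(dim=(2, 3)), x.mean(dim=(2, 3))], dim=1)
        w = self.fc(s)                   # one weight in (0, 1) per frame
        return x * w[:, :, None, None]   # reweight the frames
```

Because each kernel spans a full row or column, one layer already has a global receptive field along that axis, avoiding deep stacks of small kernels.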
We further cascaded the separation network with FBP-Net to form the FBPnet-Sep model (Fig. 1b). This model first reconstructs and denoises the dual-tracer image by FBP-Net, then separates the image by the separation network in Fig. 1a. The FBP-Net adopts a learnable filter in the frequency domain, and maps the filtered sinogram to an image by traditional back-projection without learnable parameters. Following the reconstruction, it denoises the images by a residual CNN. Details of FBP-Net can be found in the previous work of Wang and Liu [42]. For comparison, the Sep-FBPnet model (Fig. 1c) was also proposed, in which the sinogram separation is conducted before image reconstruction.
For the application to multiple tracer combinations, especially when the two tracers have different half-lives, we improved the FBPnet-Sep model by adding decay corrections before the separation part, forming the FBPnet-DC-Sep model (Fig. 1d). As shown in Fig. 1d, the reconstructed dual-tracer image is decay-corrected according to the half-lives of the two tracers respectively. The two decay-corrected images, together with the uncorrected image, are input to the separation network and processed by different Inception modules.
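A frame-wise decay correction of a dynamic image stack might look like the sketch below. The function signature and the use of frame midpoints are assumptions; the half-life in the test uses the known 109.8 min of 18F:

```python
import numpy as np

def decay_correct(frames, frame_mid_times_min, half_life_min):
    """Decay-correct a dynamic image stack back to injection time (t = 0).

    frames: array of shape (T, H, W);
    frame_mid_times_min: midpoint of each frame in minutes;
    half_life_min: half-life of the assumed tracer in minutes.
    """
    factors = 2.0 ** (np.asarray(frame_mid_times_min) / half_life_min)
    return frames * factors[:, None, None]
```

In the FBPnet-DC-Sep model this correction is applied twice, once per candidate half-life, since the true per-voxel tracer mixture is unknown; both corrected images are therefore biased but still carry decay information.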

Loss functions
The loss function of a single output is a weighted summation of the mean squared error (MSE) and the structural similarity (SSIM) index [43] between the output and the label:

L(x̂, x) = MSE(x̂, x) + β · (1 − SSIM(x̂, x)), (3)

where x̂ is the predicted dynamic output and x is the label. The coefficient β is used to balance the MSE and SSIM terms.
The total loss of the entire model is composed of the loss of the FBP-Net and the loss of the separation part:

L_total = λ_FBP · L_FBP + λ_Sep · L_Sep,

where λ_FBP and λ_Sep are the weights of the two losses. The above-mentioned parameter β is set differently in the reconstruction part (β_FBP) and the separation part (β_Sep).
For the FBPnet-Sep (Fig. 1b) and FBPnet-DC-Sep (Fig. 1d) models, L_FBP serves as the auxiliary loss and L_Sep as the main loss:

L_FBP = L(Î_dual, I_dual), L_Sep = L(Î_1, I_1) + L(Î_2, I_2),

where I_dual, I_1 and I_2 are the images of the dual-tracer mixture, Tracer 1 and Tracer 2 respectively. As for the Sep-FBPnet model (Fig. 1c), L_Sep serves as the auxiliary loss and L_FBP as the main loss:

L_Sep = L(Ŝ_1, S_1) + L(Ŝ_2, S_2), L_FBP = L(Î_1, I_1) + L(Î_2, I_2),

where S_1 and S_2 are the sinograms of Tracer 1 and Tracer 2. The coefficients λ_FBP, λ_Sep, β_FBP and β_Sep are set separately for each model.
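The single-output loss, a weighted sum of MSE and an SSIM penalty, can be sketched as follows. For brevity this uses a simplified single-window SSIM rather than the windowed version of [43], and beta=0.5 is an arbitrary placeholder, not the value used in the study:

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Simplified SSIM computed over the whole image (no Gaussian windows)."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def single_output_loss(pred, label, beta=0.5):
    """Weighted sum of MSE and (1 - SSIM); beta balances the two terms."""
    mse = np.mean((pred - label) ** 2)
    return mse + beta * (1.0 - ssim_global(pred, label))
```

Using 1 − SSIM keeps the loss non-negative and zero for a perfect prediction, so both terms pull in the same direction during minimization.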

Experimental settings
In this study, we conducted four experiments to verify the capability of the proposed method on multiple simulated datasets. The experimental settings and tracer combinations are listed in Tables 1 and 2.
In Experiment 1, we compared the FBPnet-Sep model, which separates in the image domain, with two contrast methods. One was the Sep-FBPnet model, which separates in the sinogram domain, and the other was our previously proposed Multi-task CNN, which directly maps the dual-tracer sinogram to two single-tracer images. The FBPnet-Sep model, Sep-FBPnet model and Multi-task CNN contain 0.64 M, 0.66 M and 1.31 M parameters respectively. All three models were trained and tested on simulated 18F-FDG/11C-FMZ dynamic PET data. As shown in Table 2, 18F-FDG and 11C-FMZ have different half-lives and different types of kinetic characteristics [44].
Single-tracer images inferred by these three methods were also compared to those reconstructed from noisy single-tracer sinograms by maximum likelihood expectation maximization (MLEM) with 50 iterations.
In Experiment 2, we conducted the ablation study of the FBPnet-Sep model using the same 18F-FDG/11C-FMZ dataset. The proposed FBPnet-Sep model included basic convolution modules (Conv), the Inception module (Inc) and the channel attention module (CA) in its separation part. To investigate the contributions of the Inception and channel attention modules, we compared the original FBPnet-Sep(Conv+Inc+CA) model with the FBPnet-Sep(Conv), FBPnet-Sep(Conv+Inc) and FBPnet-Sep(Conv+CA) models. In addition, to confirm the superiority of deep learning-based reconstruction over traditional iterative reconstruction, we further compared the FBPnet-Sep(Conv+Inc+CA) model with the OSEM-Sep(Conv+Inc+CA) approach, which reconstructed the dual-tracer image by the ordered-subset expectation maximization (OSEM) algorithm [45] with 6 iterations and 5 subsets.
In Experiment 3, we investigated the generalization of the FBPnet-Sep model to several dose levels. The model trained in Experiment 1 was tested on 18F-FDG/11C-FMZ data of 1/2, 1/3 and 1/5 standard dose. The low-dose PET data were obtained by event reduction of the test data used in Experiment 1.
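One common way to implement such event reduction is binomial thinning of the detected counts; the sketch below assumes this approach, which the text does not specify:

```python
import numpy as np

def reduce_dose(sinogram_counts, fraction, rng=None):
    """Simulate a low-dose scan by binomial thinning of detected counts.

    Each recorded event is kept independently with probability `fraction`,
    which preserves Poisson statistics at the reduced count level.
    """
    rng = np.random.default_rng(rng)
    return rng.binomial(np.asarray(sinogram_counts, dtype=np.int64), fraction)
```

For example, `reduce_dose(sino, 0.5)` yields data statistically equivalent to a scan at half the injected dose, without re-simulating the acquisition.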
In Experiment 4, data of four tracer combinations were used together to train the models, representing different relationships between the two tracers. As listed in Table 2, 11C-FMZ differs from 18F-FDG in both half-life and kinetic type; 11C-MET differs from 18F-FDG in half-life, while 18F-AV45 differs in kinetic type; and 18F-FLT has the same half-life and kinetic type as 18F-FDG. To deal with multiple tracer combinations, we improved the original FBPnet-Sep model by taking decay correction into account, obtaining the FBPnet-DC-Sep model. The two models were tested and compared.

Phantoms
The two-dimensional brain phantoms used for data simulation were modified from the 3D Zubal brain phantom [46]. We chose 40 slices having different structures. The 40 phantoms are sized 128 pixels × 128 pixels, and each contains up to five regions of interest (ROIs). The average sizes of ROI 1 to ROI 5 are 1393, 1427, 78, 110 and 130 pixels. Representative phantoms are shown in Fig. 2.

Generation of dynamic activity images
The dynamic PET images were generated based on the two-tissue compartment model [44], a widely used kinetic model of PET tracers. It is described as:

dC_1(t)/dt = K_1 C_P(t) − (k_2 + k_3) C_1(t) + k_4 C_2(t), (9)
dC_2(t)/dt = k_3 C_1(t) − k_4 C_2(t), (10)
C_T(t) = C_1(t) + C_2(t), (11)

where C_P(t) is the plasma TAC, i.e., the plasma input function, and C_T(t) is the tissue TAC, which is the summation of the TACs of the two compartments, C_1(t) and C_2(t). The two compartments represent different metabolic states of the tracer in tissue. The rates of exchange between compartments (including the blood vessel) depend on the rate constants K_1, k_2, k_3 and k_4, also known as kinetic parameters. The values of these parameters are related to the pharmacokinetics of the tracer and the physiological characteristics of its molecular target. When k_3 ≫ k_4, the transfer from Compartment 1 to Compartment 2 is regarded as irreversible, and the tracer is therefore considered to have irreversible kinetics. Otherwise, the binding between the tracer and its target is reversible.
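The two-tissue compartment ODEs can be integrated numerically; the minimal Euler scheme below illustrates the model's behavior (COMKAT, used in the study, employs more accurate solvers):

```python
import numpy as np

def two_tissue_tac(t, cp, K1, k2, k3, k4):
    """Integrate the two-tissue compartment ODEs with forward Euler steps.

    t: sample times (min); cp: plasma input C_P(t) at those times.
    Returns the tissue TAC C_T = C_1 + C_2.
    """
    c1 = np.zeros_like(t, dtype=float)
    c2 = np.zeros_like(t, dtype=float)
    for i in range(1, len(t)):
        dt = t[i] - t[i - 1]
        dc1 = K1 * cp[i - 1] - (k2 + k3) * c1[i - 1] + k4 * c2[i - 1]
        dc2 = k3 * c1[i - 1] - k4 * c2[i - 1]
        c1[i] = c1[i - 1] + dt * dc1
        c2[i] = c2[i - 1] + dt * dc2
    return c1 + c2
```

Setting k_4 = 0 gives purely irreversible kinetics (C_2 only accumulates), while k_4 > 0 lets tracer wash back out of the bound compartment, matching the reversible case described above.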
The Compartment Model Kinetic Analysis Tool (COMKAT) [47] provides numerical solutions of compartment models. Before solving Eqs. (9)-(11), the scanning protocol, input function and kinetic parameters of each pixel were determined. As an example, the 18F-FDG dynamic scan was designed with a duration of 60 min divided into 26 frames (15 s × 8, 60 s × 8, 300 s × 10). The input function of 18F-FDG was modeled according to experimental data of human subjects [48]. To mimic individual differences in physiological states, Gaussian randomization was applied to the parameters of the input function as well as the kinetic parameters of each ROI. The mean values of the parameters were taken from those reported in previous studies [18,48], and the standard deviations were set to 10% of the mean values. In total, 22 groups of physiological parameters were simulated. The output of the compartment model was computed by COMKAT for each pixel to form the dynamic activity image of 18F-FDG.
Images of 11C-FMZ, 11C-MET, 18F-AV45 and 18F-FLT were generated in the same way, following the same 60-min, 26-frame protocol. The input functions and the empirical values of the kinetic parameters differed among tracers and were determined according to the corresponding studies [18,[48][49][50][51][52]. The dual-tracer activity images were obtained by adding the two single-tracer images. All images were sized 128 pixels × 128 pixels × 26 frames.

Generation of dynamic sinograms
The dynamic single-tracer sinograms were obtained by projecting the single-tracer activity images using the Michigan Image Reconstruction Toolbox [53]. The geometry of the Inveon PET/CT scanner (Siemens) was simulated. Subsequently, 20% random coincidence events were added to the projections, and Poisson noise was simulated. The dual-tracer sinograms were the summation of the two single-tracer sinograms. All sinograms were sized 128 bins × 160 angles × 26 frames.
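The noise model can be sketched as follows. The uniform randoms background is an assumption for illustration; the study's exact randoms distribution is not specified:

```python
import numpy as np

def noisy_dual_sinogram(s1, s2, randoms_fraction=0.2, rng=None):
    """Form a noisy dual-tracer sinogram: sum the true projections of both
    tracers, add a uniform randoms background, then apply Poisson noise."""
    rng = np.random.default_rng(rng)
    trues = np.asarray(s1) + np.asarray(s2)
    randoms = randoms_fraction * trues.mean() * np.ones_like(trues)
    return rng.poisson(trues + randoms).astype(float)
```

Because Poisson noise is applied after summation, the noise realizations of the two tracers are not independent, which is one reason separating noisy sinograms directly (as in Sep-FBPnet) is hard.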

Data preprocessing
For each tracer combination, a total of 880 groups (22 sets of physiological parameters × 40 phantoms) of dynamic PET data were generated. Each group of matched dynamic data consisted of the noisy dual-tracer sinogram (S_dual), two noisy single-tracer sinograms (S_1, S_2), the noise-free dual-tracer activity image (I_dual) and two noise-free single-tracer activity images (I_1, I_2). In each group, sinograms were scaled by dividing by max(S_dual)/3, and images were scaled by dividing by max(S_dual)/150.

Training details
The 880 groups of data were randomly split into training and test datasets at a ratio of 10:1 according to the parameter sets. In other words, all 40 phantom slices were included in both datasets, while the physiological parameters were not repeated. The sample sizes are listed in Table 1. Note that in the training of the FBPnet-DC-Sep model in Experiment 4, the data were augmented by including the 11C-FMZ/18F-FDG and 11C-MET/18F-FDG combinations, which were obtained by switching the two tracers of the 18F-FDG/11C-FMZ and 18F-FDG/11C-MET combinations.
Network training was performed with PyTorch 1.11 on an NVIDIA TITAN RTX graphics card. In the training session of each experiment, the FBP-Net and the separation network were separately pretrained for 300 epochs and 100 epochs respectively, using the training dataset listed in Table 1, both with a batch size of 4 and a learning rate of 0.0001. The entire models were then trained for 100 epochs, starting from the pretrained network parameters and using the same dataset, with a batch size of 16 and a learning rate of 0.0001. No early stopping or regularization was used during training.

Evaluative metrics
The performance of different methods was quantitatively evaluated by the MSE, SSIM and peak signal-to-noise ratio (PSNR) between the predictions and the labels. In Experiment 1, the bias and standard deviation of the ROI TACs and images were also evaluated.
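For reference, PSNR follows directly from the MSE; a minimal implementation (the peak value convention is an assumption, here taken as the label's maximum):

```python
import numpy as np

def psnr(pred, label, data_range=None):
    """Peak signal-to-noise ratio in dB; data_range defaults to the label's maximum."""
    mse = np.mean((pred - label) ** 2)
    if data_range is None:
        data_range = label.max()
    return 10.0 * np.log10(data_range ** 2 / mse)
```

Higher PSNR and SSIM and lower MSE all indicate predictions closer to the noise-free label images.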
In Experiment 4, we also estimated the macro-parameters from the predicted ROI TACs and compared them to those derived from the label TACs. For irreversible tracers (18F-FDG, 11C-MET, 18F-FLT), the net uptake rate K_i was estimated by the Patlak plot [54]. For reversible tracers (11C-FMZ, 18F-AV45), the total distribution volume V_T was estimated by the Logan plot [54]. The parameters were computed in COMKAT using data of the last 10 frames, and the relative errors of the estimated macro-parameters were calculated.

Results

Experiment 1
Table 3 lists the mean values of the metrics. For all methods, the predictions of 18F-FDG images are more accurate than those of 11C-FMZ images. For both tracers, the FBPnet-Sep model has the highest SSIM and PSNR and the lowest MSE compared to the other methods. Since the noise-free single-tracer images were used as labels, all three deep-learning methods show superior performance to MLEM. In the following comparisons, MLEM was not included. Figure 4 displays the frame-wise metrics. For both 18F-FDG and 11C-FMZ, the superiority of the FBPnet-Sep model lasts throughout the scan.

Experiment 2
Quantitative evaluations are listed in Table 5. For all models, the predicted 18F-FDG images have higher SSIM and PSNR than the 11C-FMZ images, and lower MSE except for the FBPnet-Sep(Conv) model. For 18F-FDG, the FBPnet-Sep(Conv+Inc+CA) model shows the best performance, which is consistent with the observations in Fig. 6. For 11C-FMZ, the FBPnet-Sep(Conv+Inc+CA) and FBPnet-Sep(Conv+Inc) models show better performance than the other models.
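The Patlak estimate of K_i for an irreversible tracer can be sketched as a linear fit over the late frames. This is a generic illustration of the graphical method, not the COMKAT implementation used in the study:

```python
import numpy as np

def patlak_ki(t, ct, cp, n_late=10):
    """Estimate the net uptake rate K_i as the Patlak plot slope.

    t: frame times (min); ct: tissue TAC; cp: plasma input TAC.
    The fit uses the last `n_late` frames, mirroring the last-10-frames choice here.
    """
    # cumulative integral of the input function (trapezoid rule)
    cum_cp = np.concatenate([[0.0], np.cumsum(np.diff(t) * 0.5 * (cp[1:] + cp[:-1]))])
    x = (cum_cp / cp)[-n_late:]  # "normalized time" axis
    y = (ct / cp)[-n_late:]
    slope, _intercept = np.polyfit(x, y, 1)
    return slope
```

On the Patlak axes an irreversible tracer becomes linear at late times, so the slope of the late-frame fit is K_i and the intercept absorbs the initial distribution volume.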

Experiment 3
Table 6 records the quantitative evaluations of the predicted images under different dose levels. The counts of the standard dose were around 10^7. As in the former experiments, the reconstructed 18F-FDG images are more accurate than the 11C-FMZ images. Although quality slightly degrades with decreasing dose, the reconstructed images of both tracers remain acceptable even under low-dose conditions.

Experiment 4
Table 7 displays the metrics of the predicted images for the four tracer combinations. Both the FBPnet-Sep and FBPnet-DC-Sep models were capable of dealing with multiple tracer combinations. In most cases, the FBPnet-DC-Sep model outperformed the FBPnet-Sep model. Figure 7 further plots the metrics of the FBPnet-DC-Sep predictions. In each tracer combination, the image quality of the 18F-FDG predictions was better than that of Tracer 2, except for the PSNR of the 18F-AV45 images. Among the four tracers used as Tracer 2, 18F-FLT had the lowest SSIM and PSNR and the highest MSE. Macro-parameters of all tracers and all ROIs were estimated using the TACs derived from the label images and the predicted images. Figure 9 plots the average parameters over the test dataset. For 18F-FDG, the K_i derived from the predicted images were close to the ground truth, and no obvious difference was found between the FBPnet-DC-Sep and FBPnet-Sep models. Similar results were found for the K_i of 18F-FLT. In the subplots of 11C-FMZ and 18F-AV45, the V_T derived from the FBPnet-DC-Sep model were more accurate than those from the FBPnet-Sep model. However, the 11C-MET K_i derived from the FBPnet-DC-Sep model were more biased. Table 8 lists the average relative errors of the estimated macro-parameters. The results were in general consistent with Fig. 9, showing that the FBPnet-DC-Sep model was comparable to or better than the FBPnet-Sep model for all tracers but 11C-MET. However, the accuracy of the macro-parameters was sensitive to tracers and ROIs.

Discussion
In this study, we proposed a CNN-based approach for the separation of simultaneously injected dual-tracer PET imaging. The network incorporated an Inception module and a channel attention module. The Inception module used H × 1 and 1 × W kernels to extract global spatial features. When applied to sinograms, it extracted features by projection angles and by distance bins off the center of the field of view; when applied to images, it extracted features by rows and by columns. The attention vector learned by channel attention can also be regarded as a global filter in the time dimension. The separation network was connected to FBP-Net to predict the single-tracer images from the dual-tracer sinogram.
In Experiment 1, the proposed FBPnet-Sep model was first compared with the Multi-task CNN. The Multi-task CNN had an encoder-decoder framework, while the FBPnet-Sep model adopted a more interpretable reconstruction-separation framework. The results in Figs. 3, 4, 5 and Tables 3 and 4 showed that images predicted by the FBPnet-Sep model had better quality than those predicted by the Multi-task CNN, which might be attributed to using Inception and channel attention to process information globally rather than relying entirely on small convolution kernels.
Additionally, we compared image separation (FBPnet-Sep model) with sinogram separation (Sep-FBPnet model). The results showed that the FBPnet-Sep model performed better than the Sep-FBPnet model. This might be due to differences in tasks and data distributions. In the Sep-FBPnet model, the dual-tracer sinogram S_dual was separated to obtain the single-tracer sinograms S_1 and S_2, both containing noise; separating the random noise was beyond the capability of the model. Apart from that, S_1 and S_2 came from different distributions. When they were simultaneously fed into the FBP-Net, the distribution of the input became decentralized, making reconstruction more difficult. Conversely, in the FBPnet-Sep model, the FBP-Net reconstructed the dual-tracer image I_dual from S_dual, whose distribution was relatively centralized. Besides, the reconstructed I_dual had already been denoised, making it easier for the separation network to process.
In Experiment 2, the ablation study of the FBPnet-Sep model was conducted. We first studied the separation part. From the results in Fig. 6 and Table 5, adding the Inception module or channel attention to the basic convolutional separation network improved the model performance, and incorporating both modules achieved the best performance. These results demonstrated the effectiveness of the Inception and channel attention modules. In addition, we compared the FBPnet-Sep and OSEM-Sep approaches. The obvious superiority of the former indicated the necessity of deep learning-based reconstruction instead of a traditional reconstruction algorithm.
Radiation safety and restrictions on injected doses are among the concerns in dual-tracer PET imaging. In Experiment 3, we investigated the generalization of the FBPnet-Sep model to low-dose data. As shown in Table 6, the FBPnet-Sep model could directly generalize to data of 1/2, 1/3 and 1/5 standard dose without retraining.
According to Eq. (2), temporal changes in radioactivity concentrations depend on both the physical radioactive decay of the tracer, characterized by its half-life, and the physiological process the tracer is involved in, characterized by the kinetic parameters. In single-tracer PET imaging, decay correction is performed during reconstruction. In dual-tracer PET imaging, however, decay correction cannot be conducted when the two tracers have different half-lives. In that case, deep learning methods for dual-tracer PET reconstruction need to learn the hidden information about radioactive decay in addition to the kinetic characteristics.
In Experiment 4, we investigated the capability of the proposed method to deal with multiple tracer combinations. To cope with tracer pairs having different half-lives, we also proposed and tested the FBPnet-DC-Sep model. As displayed in Table 7 and Fig. 7, both methods could predict single-tracer images well for multiple tracer combinations, with the FBPnet-DC-Sep model showing better performance in most cases. According to Figs. 8 and 9 and Table 8, both methods could derive ROI TACs and macro-parameters with acceptable accuracy, with the FBPnet-DC-Sep model performing slightly better. These results indicated the effectiveness of the decay correction module. Although the corrections were biased, the processed images provided hints of decay information, which might make it easier for the separation network to extract useful features.
The current study still has several limitations. As shown in the results, even though the method could separate the two tracers, the derived TACs and macro-parameters were not accurate enough, which limits the application of dual-tracer PET imaging in further quantitative analysis. In addition, due to the lack of data from real simultaneous dual-tracer PET scans, the proposed method was verified only on simulated data. The scarcity of training data from real PET scans also restricts the current study, like other deep learning-based separation methods, to separating images slice by slice. Another limitation of the proposed method is that it does not generalize to other phantoms, as shown in the supplementary material.
The acquisition and utilization of training data from real dual-tracer PET scans face many challenges, such as the inadequacy of paired data, image misalignment and image noise. In future studies, incorporating the kinetic model into deep learning methods for dual-tracer reconstruction and separation should be considered. A hybrid of data-driven and model-driven methods can increase network interpretability, reduce the need for training data and be less sensitive to noise. It can also easily be applied to training with ROI TACs, to increase the accuracy of the recovered ROI TACs and macro-parameters. Furthermore, the generalization of the reconstruction methods to data from new distributions should be investigated.

Conclusions
We proposed a CNN-based model incorporating global spatial information and channel attention for the signal separation of dual-tracer PET. The separation network was extended to the FBPnet-Sep model, which predicts the single-tracer images from the dual-tracer sinogram. The FBPnet-Sep model was shown to be superior to the previously proposed Multi-task CNN, and the effectiveness of the Inception and channel attention modules was verified. Moreover, the FBPnet-Sep model could be applied to low-dose data and to multiple tracer combinations. Therefore, the FBPnet-Sep model can be considered a potential method for dual-tracer PET reconstruction.

Fig. 1
Fig. 1 Architecture of a separation network, b FBPnet-Sep model, c Sep-FBPnet model and d FBPnet-DC-Sep model. The sizes and numbers of convolutional kernels are noted in the figure. Conv: convolutional layer, BN: batch normalization layer, ReLU: rectified linear unit, FC: fully-connected layer, FBP: filtered back-projection. S_dual: dual-tracer sinogram, S_1: sinogram of Tracer 1, S_2: sinogram of Tracer 2, I_dual: dual-tracer image, I_1: image of Tracer 1, I_2: image of Tracer 2, DC: decay correction, Î_dual-DC1: dual-tracer image decay-corrected as Tracer 1, Î_dual-DC2: dual-tracer image decay-corrected as Tracer 2. Î_dual, Î_1, Î_2, Ŝ_1, Ŝ_2 are the predictions of I_dual, I_1, I_2, S_1, S_2 respectively. L_FBP: loss of FBP-Net, L_Sep: loss of the separation part, L_total: total loss

Fig. 2
Fig. 2 Representative brain phantoms from different slices

Figure 3
Figure 3 displays the single-tracer images reconstructed by MLEM, the FBPnet-Sep model, the Sep-FBPnet model and the Multi-task CNN, among which the results of the proposed FBPnet-Sep model show the clearest details at the boundaries of the ROIs.

Fig. 3
Fig. 3 Single-tracer images predicted by different methods. All images are from Slice 30 and Frame 20

Figure 6
Figure 6 displays representative single-tracer images predicted by the FBPnet-Sep model and its ablation models. The full FBPnet-Sep(Conv+Inc+CA) model reconstructs images with the best quality, followed by FBPnet-Sep(Conv+Inc) and FBPnet-Sep(Conv+CA), while the basic FBPnet-Sep(Conv) model reconstructs images with poorer quality. Images predicted by the OSEM-Sep(Conv+Inc+CA) approach are severely blurred.

Figure 8
Figure 8 plots representative ROI TACs extracted from the predicted images. ROI 1 and ROI 5 were chosen to represent large and small ROIs respectively. In all tracer combinations, the TACs predicted by the FBPnet-DC-Sep model fitted the ground truth better than those predicted by the FBPnet-Sep model.

Fig. 7
Fig. 7 SSIM, PSNR and MSE of single-tracer images predicted by the FBPnet-DC-Sep model. Metrics are averaged over the test dataset

Table 1
Experimental settings
1 Generated by down-sampling the test data of Experiment 1
2 The FBPnet-Sep model trained in Experiment 1
3 4800:480 for the FBPnet-DC-Sep model, including the 11C-FMZ/18F-FDG and 11C-MET/18F-FDG combinations

Table 2
Tracer combinations

Table 4
lists the mean bias and standard deviations of the ROI TACs predicted by different models. For the 18F-FDG images, the FBPnet-Sep model shows the lowest bias in all ROIs except ROI 1. The other two models show high bias in the small ROIs (ROI 3-5). The FBPnet-Sep model reconstructs images with the lowest variation in ROI 1, 2 and 4.

Table 3
Quantitative evaluations of the predicted images in Experiment 1

Fig. 4 Frame-wise metrics of single-tracer images predicted by different methods. Metrics are averaged over the test dataset

Table 5
Quantitative evaluations of the predicted images in Experiment 2

Table 6
Quantitative evaluations of the predicted images in Experiment 3

Table 7
Quantitative evaluations of the predicted images in Experiment 4

Table 8
Mean relative errors (%) of estimated macro-parameters