Skip to main content
  • Original research
  • Open access
  • Published:

Deep learning based bilateral filtering for edge-preserving denoising of respiratory-gated PET



Residual image noise is substantial in positron emission tomography (PET) and one of the factors limiting lesion detection, quantification, and overall image quality. Thus, improving noise reduction remains of considerable interest. This is especially true for respiratory-gated PET investigations. The only broadly used approach for noise reduction in PET imaging has been the application of low-pass filters, usually Gaussians, which however leads to loss of spatial resolution and increased partial volume effects affecting detectability of small lesions and quantitative data evaluation. The bilateral filter (BF) — a locally adaptive image filter — allows to reduce image noise while preserving well defined object edges but manual optimization of the filter parameters for a given PET scan can be tedious and time-consuming, hampering its clinical use. In this work we have investigated to what extent a suitable deep learning based approach can resolve this issue by training a suitable network with the target of reproducing the results of manually adjusted case-specific bilateral filtering.


Altogether, 69 respiratory-gated clinical PET/CT scans with three different tracers (\([^{18}\text {F}]\)FDG, \([^{18}\text {F}]\)L-DOPA, \([^{68}\text {Ga}]\)DOTATATE) were used for the present investigation. Prior to data processing, the gated data sets were split, resulting in a total of 552 single-gate image volumes. For each of these image volumes, four 3D ROIs were delineated: one ROI for image noise assessment and three ROIs for focal uptake (e.g. tumor lesions) measurements at different target/background contrast levels. An automated procedure was used to perform a brute force search of the two-dimensional BF parameter space for each data set to identify the “optimal” filter parameters to generate user-approved ground truth input data consisting of pairs of original and optimally BF filtered images. For reproducing the optimal BF filtering, we employed a modified 3D U-Net CNN incorporating residual learning principle. The network training and evaluation was performed using a 5-fold cross-validation scheme. The influence of filtering on lesion SUV quantification and image noise level was assessed by calculating absolute and fractional differences between the CNN, manual BF, or original (STD) data sets in the previously defined ROIs.


The automated procedure used for filter parameter determination chose adequate filter parameters for the majority of the data sets with only 19 patient data sets requiring manual tuning. Evaluation of the focal uptake ROIs revealed that CNN as well as BF based filtering essentially maintain the focal \(\text {SUV}_\text {max}\) values of the unfiltered images with a low mean ± SD difference of \(\delta \text {SUV}_\text {max}^{\text {CNN},\text {STD}}\) = (−3.9 ± 5.2)% and \(\delta \text {SUV}_\text {max}^{\text {BF},\text {STD}}\) = (−4.4 ± 5.3)%. Regarding relative performance of CNN versus BF, both methods lead to very similar \(\text {SUV}_\text {max}\) values in the vast majority of cases with an overall average difference of \(\delta \text {SUV}_\text {max}^{\text {CNN},\text {BF}}\) = (0.5 ± 4.8)%. Evaluation of the noise properties showed that CNN filtering mostly satisfactorily reproduces the noise level and characteristics of BF with \(\delta \text {Noise}^{\text {CNN},\text {BF}}\) = (5.6 ± 10.5)%. No significant tracer dependent differences between CNN and BF were observed.


Our results show that a neural network based denoising can reproduce the results of a case by case optimized BF in a fully automated way. Apart from rare cases it led to images of practically identical quality regarding noise level, edge preservation, and signal recovery. We believe such a network might proof especially useful in the context of improved motion correction of respiratory-gated PET studies but could also help to establish BF-equivalent edge-preserving CNN filtering in clinical PET since it obviates time consuming manual BF parameter tuning.


Even for state of the art systems, image noise in positron emission tomography (PET) remains inherently much higher than in computed tomography (CT) or magnetic resonance imaging (MRI), thus improving noise reduction techniques remains of considerable interest. This is especially true for respiratory-gated investigations where the available acquisition time is partitioned across multiple so-called gates (typically about 6–10), which substantially increases the noise level in the individual gates compared to that of an ungated acquisition with the same overall duration.

For many years, the only broadly used approach to noise reduction in PET imaging has been the application of low-pass filters — usually Gaussians — which leads to loss of spatial resolution and increased partial volume effects that adversely affect detectability of small lesions as well as quantitative data evaluation.

A way to overcome these problems is the use of locally adaptive edge-preserving filters. One such filter, which exhibits interesting properties and has attracted considerable interest in diverse applications, is the so-called bilateral filter (BF) [1]. It acts combinedly in the spatial and in the intensity domain, which allows to reduce image noise while preserving well defined object edges. In the context of PET imaging, the principle usefulness of BF has been previously demonstrated [2, 3]. However, application of this filter requires case-specific tuning of its two free parameters, especially the one acting in the intensity domain. This process can be tedious and might require non-negligible time from the user. The inability to use fixed filter settings and the absence of efficient algorithmic approaches to automatically adjust these filter parameters has made bilateral filtering so far ill-suited for routine clinical use. The present investigation addresses this deficit by proposing to emulate case-specific bilateral filtering with a suitable neural network based approach.

Recently, much work has been done on Deep Learning (DL) based denoisers. In particular, convolutional neural networks (CNN) have been investigated thoroughly in the context of natural and medical image denoising [4]. The special properties of PET imaging, namely a combination of high correlated noise and high dynamic range and the fact that relevant quantitative information is contained in the images that needs to be preserved, makes PET noise reduction a challenging task requiring special care and attention. There are two principally different approaches to DL-based PET denoising [5]: supervised and unsupervised methods. Supervised DL methods rely on availability of a substantial number of training data sets consisting of pairs of low and high noise PET images. The employed neural networks architectures usually feature various forms of U-Nets [6,7,8,9,10] and generative adversarial networks (GANs) [6, 10, 11]. However, other architectures have been suggested as well [6, 10,11,12,13,14,15]. As is well known, acquiring a large number of high quality training examples is difficult in medical imaging in general and in PET in particular. Therefore, non-supervised approaches are gaining increased attention. One example is the deep image prior technique [16], which was already successfully applied to static [17] and dynamic [18, 19] PET imaging and has demonstrated improvements over conventional denoising methods. Another family of unsupervised methods relies on training using noisy data only. Examples are the so-called Noise2Noise [20], Noisier2Noise [21], and Noise2Void [22] approaches. However, applied to PET, these methods perform generally worse than commonly used supervised approaches [23, 24], with the modified Noise2Noise being the closest contender [25].

Considering the bilateral filter, it has in fact been previously investigated in the context of DL. Specifically, bilateral filters as part of the network design have shown to improve natural image processing tasks [26,27,28]. Medical imaging applications, however, are so far restricted to denoising of CT and X-Ray data. In this context, bilateral filtering solutions with static learnable kernels [29] and adaptive kernels [30,31,32,33] were proposed. However, the possibility of reproducing manually optimized bilateral filtering with a general purpose CNN (e.g. U-Net) has not been investigated before. Moreover, so far no DL-based bilateral filters have been developed explicitly for PET noise reduction in a clinical setting.

In this work, we therefore have investigated to what extent a suitably trained CNN can reproduce the results of manually adjusted case-specific bilateral filtering for single-gate denoising of respiratory-gated PET data with the ultimate aim to use such a filter as an integral preprocessing step in motion correction of such PET investigations. For this purpose, we trained a dedicated U-Net based CNN on a large number of clinical respiratory-gated PET/CT investigations. The training data consisted of pairs of unfiltered and BF processed single-gate images where the filter parameters were optimized separately for each gate. The trained CNN was then applied to separate test data, i.e. image volumes unseen during network training. The resulting CNN-filtered image volumes were quantitatively analysed regarding image noise and signal recovery (standardized uptake value, SUV) in suitable three-dimensional regions of interest (ROIs) and compared to the corresponding results obtained in image volumes resulting from individually optimized bilateral filtering of the respective original data set.


Bilateral filter

A bilateral filter operates in two different domains: the filter kernel is defined as the product of a spatial filter \(W_S\) and an intensity filter \(W_I\) [1]. If both filters have a Gaussian shape and assuming that the three-dimensional spatial filter is isotropic the BF kernel is given by

$$\begin{aligned} W(m,n) = \underbrace{\exp {\left[ -\frac{(\textbf{P}_m-\textbf{P}_n)^2}{2\sigma _S^2}\right] }}_{\mathrm{spatial\ domain}\ (W_S)} ~\cdot ~ \underbrace{\exp {\left[ -\frac{(I_m-I_{n})^2}{2\sigma _I^2}\right] }}_{\mathrm{intensity\ domain}\ (W_I)}\, \end{aligned}$$

where m is the index of the target voxel at position \(\textbf{P}_m\) with intensity \(I_m\) and n enumerates the neighbouring voxels included in the weighted average computation. \(\sigma _S\) and \(\sigma _I\) are the free parameters of the filter, representing the standard deviations of the two Gaussian components. In the present study an anisotropic spatial filter was used with reduced width along the axial direction (scale factor: \(0.76\)) to accommodate for the slightly higher axial resolution of the reconstructed images. In the remainder of this paper, \(\sigma _S\) denotes the arithmetic mean \((2 \sigma _{xy} + \sigma _z)/3\) of the actually applied spatial filter. The parameters \(\sigma _S\) and \(\sigma _I\) determine which distance to the target voxel is considered to be close in the respective domain: only voxels that are close to the target voxel simultaneously in the spatial as well as the intensity domain will substantially contribute to the value assigned to that target voxel in the resulting, filtered image.

Patient group and ground truth definition

Altogether, 70 respiratory-gated PET data sets from clinical routine investigations acquired on a Biograph Vision 600 PET/CT system (Siemens Medical Solutions, Knoxville, TN, USA) with different tracers (\([^{18}\text {F}]\)FDG: \(N = 34\), \([^{18}\text {F}]\)L-DOPA: \(N = 25\), \([^{68}\text {Ga}]\)DOTATATE: \(N = 11\)) were available for the present investigation. Table 1 provides further information regarding the study cohort. The following clinical vendor-provided gating procedure was used: during PET/CT data acquisition the respiratory signal was recorded using the pressure sensitive belt sensor version of an external respiratory gating system (AZ-733VI; Anzai Medical Co. Ltd., Tokyo, Japan). Using this signal, the PET data was subsequently divided via a manufacturer-specific algorithm resulting in 8 phase-based gates with similar noise characteristics. For each PET gate, the corresponding CT gate was used for attenuation correction. Visual inspection of all data sets led to exclusion of a single \([^{18}\text {F}]\)FDG scan with untypical excessively high image noise due to a very low injected dose (132 MBq) and high post-injection scan delay (2.5 h). See Figure S2 in the supplemental material for more details.

Table 1 Compilation of patient data, acquisition and image reconstruction details and summary statistics of the performed data analysis

Prior to further data processing, the gated data sets were split into single gates, resulting in a total of 552 single-gate image volumes. For each of these image volumes four three-dimensional ROIs were delineated by two experienced observers using the ROVER software package (version 3.0.73; ABX GmbH, Radeberg, Germany). For image noise assessment, one ROI was placed in a visually homogeneous liver area. The remaining three ROIs in each image volume were positioned on different target structures with elevated focal uptake at different target to background contrast levels. These ROIs mostly correspond to tumor lesions but partly also to areas in the myocardium or excretory system. Wherever possible, the three ROIs were furthermore selected in such a way that their \(\text {SUV}_\text {max}\) differed by at least a factor of 1.5–2 in order to achieve adequate coverage of dynamic range.

Due to the large number of 552 image volumes for which case specific user-approved BF filter parameter settings are required, it was deemed necessary to facilitate this process by utilizing a dedicated automated optimization procedure that we implemented specifically for this project. The procedure performs a brute force grid search of the two-dimensional BF parameter space for each patient, minimizing the function

$$\begin{aligned} S(\sigma _S,\sigma _I) = \sum _{i=1}^{g}\sum _{j=1}^{l} {\left[ \delta {\text {SUV}_\text {max}^{i,j}(\sigma _S,\sigma _I)}\right] ^2} + \sum _{i=1}^{g} {\left[ \delta {\text {Noise}^{i}(\sigma _S,\sigma _I)}\right] ^2} \end{aligned}$$

where \(\delta {\text {SUV}_\text {max}^{i,j}(\sigma _S,\sigma _I)}\) and \(\delta {\text {Noise}^{i}(\sigma _S,\sigma _I)}\) are the fractional differences between original (unfiltered) and BF-processed gate-specific \(\text {SUV}_\text {max}\) and noise level, respectively, for gate i, focal uptake ROI j, and filter parameters \(\sigma _S\) and \(\sigma _I\) (see Eqs. 4 and 5 for definitions of \(\delta {\text {SUV}_\text {max}}\) and noise level). Summation extends over the given \(g = 8\) gates and \(l = 3\) focal “lesion” ROIs per patient. The optimization is performed jointly for all gates per patient to improve stability of the process and with the rationale that all gates within a single gated study exhibit essentially identical image properties (noise, spatial resolution, contrast) so that the same filter settings should be suitable/optimal for all gates. Note that the gate-specific unfiltered \(\text {SUV}_\text {max}\) and the noise-level of the ungated data (summed over all gates) serve as a reference in the above equation.

The objective thus can be paraphrased as striving to make the BF-processed single gates match the noise level of the ungated summed image without causing relevant signal loss, thus \(\text {SUV}_\text {max}\) reduction, in hot spots.

Optimization was performed by scanning a 11 × 15 grid of \(\sigma _S\) and \(\sigma _I\) values covering a range from 3.1 to 5.9 mm for \(\sigma _S\) and from 0.1 to 10.0 SUV for \(\sigma _I\). This process generates 165 bilaterally filtered image volumes for each of the 552 single-gate data sets, thus altogether 91080 individually BF-processed PET image volumes. Given the nature of bilateral filters — locally variant, inability to factor 3D filtering into successive 1D filtering operations as is possible with spatial Gaussians — the described grid search can be computationally intensive and very time consuming. To reduce processing time with the used bilateral filter solution at hand, computations were distributed over a high performance cluster of 12 nodes with 560 CPU cores and 3.8 TiB available memory.

Finally, the — according to the above explained criterion — optimally BF-processed image volumes underwent visual inspection by two experienced observers and filter parameters were adjusted in cases where this was deemed necessary.

Data preprocessing and neural network training

As described above, input for network training consisted of pairs of original and optimally bilaterally filtered single gate image volumes. All image volumes had a common voxel size of 3.3 × 3.3 × 2.0 mm\(^3\) and transaxial matrix size of 220 × 220 and extended axially over 132 to 153 slices. Image intensities in each image volume were normalized to the [0, 1] range. Image patches of 32 × 32 × 32 voxels were extracted with a partial overlap of at least 25%. The whole process resulted in a total of 276048 pairs of volumes used for neural network input. Additional run-time data augmentation was applied in the form of random flips in all 3 dimensions and a gamma correction of both input and output, simultaneously, with \(\gamma = (1 + |G|)^{\textrm{sign}(G)}\), where G is a normally distributed random variable with zero mean and standard deviation of 0.5.

For the network, we developed a modified residual 3D U-Net CNN architecture (see Fig. 1). The key modification consists of an additional connection bypassing the whole CNN in a way that only minimal corrections to the input image have to be learned, which corresponds to residual learning similar to [6, 8, 34]. In particular, the input image was transformed using an inverse sigmoid (logit) function and then added to the output of the U-Net before the final sigmoid activation layer. This way, the output intensities are limited to a [0, 1] range and zero output of the U-Net (i.e. no corrections proposed) translates to an unaltered resulting image. Additionally, the proposed design of the bypass connection makes it difficult for the CNN to change low and high image intensity values, which is beneficial for the task at hand. The Apache MXNet (version 1.9.1) package for the R language and environment for statistical computing (version 4.3.1) [35] was used for implementation. Training was performed on a Ubuntu Linux 22.04.3 system on four NVIDIA V100S Tesla graphic processing units (CUDA: 11.7; driver: 515.65.01) equipped with 32GiB of graphics memory and available 512 GiB of system memory for the two 2.6 GHz Intel Xeon Gold 6132 CPUs, providing a total of 56 processing cores for the preprocessing.

Fig. 1
figure 1

Architecture of the U-Net based CNN. Numbers above and beside each block designate the number of feature channels and matrix size at the given state, respectively. Separate blocks show the structure of residual and bypass connections. Ellipsis in the bypass block designate the main body of the U-Net. The image on the left indicates an example input PET volume and the image on the right is the corresponding output PET volume

A 5-fold cross-validation scheme with five equally sized folds was employed to assess the network performance in all data. For each of the five training runs, data from 64% of the patients were assigned for training, 16% for validation, and 20% for testing data sets, respectively while ensuring that all single-gate volumes of selected patient data sets were not split throughout these partitions. Network training was performed for a maximum of 400 epochs with the Adadelta optimizer, a batch size of 128 and using a Mean Absolute Error loss function. The training process was monitored by calculating the Mean Squared Error (MSE) metric in the validation phase. Training was stopped if no improvement in the evaluation metric was recorded for 50 epochs. The five models achieving the lowest MSE in the respective validation data were selected for further evaluation.

Data analysis and data evaluation

Influence of filtering on lesion SUV quantification was assessed by considering the absolute differences

$$\begin{aligned} \Delta \text {SUV}_\text {max}^{A,B} = \text {SUV}_\text {max}^A - \text {SUV}_\text {max}^B\, \end{aligned}$$

as well as symmetrical fractional differences

$$\begin{aligned} \delta \text {SUV}_\text {max}^{A,B} = \frac{\Delta \text {SUV}_\text {max}^{A,B}}{\left[ \text {SUV}_\text {max}^A+\text {SUV}_\text {max}^B\right] / 2} \times 100\%, \end{aligned}$$

where superscripts A and B denote the image volume types to be compared (two out of STD: vendor-provided/unfiltered, BF: BF-processed, CNN: CNN-processed). Note, that all possible combinations, i.e. (STD,BF), (STD,CNN), (BF,CNN), were considered.

Furthermore, the image noise level was quantified by

$$\begin{aligned} \text {Noise}^A = \frac{\text {SUV}_{\text {sd},\text {liver}}^A}{\text {SUV}_{\text {mean},\text {liver}}^A} \times 100\% \, \end{aligned}$$

where \(\text {SUV}_{\text {sd},\text {liver}}^A\) and \(\text {SUV}_{\text {mean},\text {liver}}^A\) are the standard deviation and the mean of the SUV values in the liver ROI, respectively. The absolute and fractional noise level differences between image types A and B, \(\Delta \text {Noise}^{A,B}\) and \(\delta \text {Noise}^{A,B}\), are defined analogously to the respective \(\text {SUV}_\text {max}\) differences.


The automated procedure used for optimal filter parameter determination successfully chose adequate filter parameters for 50 out of the 69 patient data sets according to consensus of two experienced observers. The other 19 data sets where automatic filter parameter selection did not produce optimal results were mostly considered to be the result of over-smoothing but partly also exhibiting obvious artifacts such as too abrupt intensity changes at lesion boundaries. For these data sets, the BF parameters where adjusted manually according to the stated objective of maximizing overall noise-reduction while avoiding or minimizing resolution loss in hot lesions without causing too sharp edges. The manual optimization of BF parameter settings took between one and five minutes per patient underlining the desirability of a fully-automated approach for clinical use. After this manual intervention the overall distribution of filter parameter values displayed in Fig. 2 was obtained.

Fig. 2
figure 2

Scatter plot of the bilateral filter parameters \(\sigma _I\) and \(\sigma _S\) derived for (and applied for filtering of) the 69 gated studies used for neural network training. Note the significant inter-study and inter-tracer variability of the filter parameters

All five neural network training runs finished successfully. The early stop criterion was met in all folds after \(286 \pm 53\) epochs. Total training time was approximately 11 days with the aforementioned computational hardware.

Computation time required for processing a single image volume with the final trained CNN is about 2–4 s on standard computing hardware, which is comparable to the time required for bilateral filtering of the same data. A summary of the quantitative evaluation can be found in Table 1. Evaluation of the focal uptake ROIs revealed that CNN as well as BF processing essentially maintain the focal \(\text {SUV}_\text {max}\) values of the unfiltered images with a \(\text {mean} \pm \text {SD}\) of \(\delta \text {SUV}_\text {max}^{\text {CNN},\text {STD}}=(-3.9\pm 5.2)\%\) and \(\delta \text {SUV}_\text {max}^{\text {BF},\text {STD}}= (-4.4\pm 5.3)\%\), respectively. Overall, there is a slight tendency for reduction of \(\text {SUV}_\text {max}\) in the CNN and BF images compared to STD, which is to be expected due to the effect of smoothing in the vicinity of the maximum voxel reducing the known noise-induced systematic positive bias exhibited by the \(\text {SUV}_\text {max}\) metric. However, in some cases the reduction exceeded 15%, which hints at actual moderate signal recovery loss. Figure 3 illustrates these results. On the left, \(\delta \text {SUV}_\text {max}\) is plotted against \(\text {SUV}_\text {max}\) determined in the vendor-provided images. The plots on the right show \(\delta \text {SUV}_\text {max}\) as a function of the ratio of \(\text {SUV}_\text {max}\) and \(\text {SUV}_\text {mean}\) in the liver (focal uptake to liver ratio, FLR), again determined in the vendor-provided STD images. Results for BF versus STD are shown in the top row and results for CNN versus STD in the middle row. Obviously, \(\text {SUV}_\text {max}\) of the ROIs showing large \(\delta \text {SUV}_\text {max}\) exhibit a tracer accumulation comparable to the liver uptake.

Regarding the relative performance of CNN and BF, the bottom row in Fig. 3 demonstrates that both methods lead to very similar \(\text {SUV}_\text {max}\) values in the vast majority of cases with an overall average difference of \(\delta \text {SUV}_\text {max}^{\text {CNN},\text {BF}}=(0.5\pm 4.8)\%\). Still, there are isolated instances with distinctly higher fractional deviations between both methods in the presence of reduced tracer uptake as can be seen in Fig. 3. The boxplots in Fig. 4 show \(\delta \text {SUV}_\text {max}\) for the three investigated tracers separately. For \([^{68}\text {Ga}]\)DOTATATE there appears to be a small bias towards larger \(\text {SUV}_\text {max}\) values after CNN filtering but overall the effect is statistically not significant (P = 0.689). Representative examples of BF and CNN processed images are compared to the respective unprocessed images in Fig. 5. Further examples including some corner cases (Figure S1) as well as additional combined violin and boxplots (Figure S3) are provided in the supplementary material.

Fig. 3
figure 3

Relative \(\delta \text {SUV}_\text {max}\) differences as a function of absolute \(\text {SUV}_\text {max}\) (left) and as a function of focal uptake to liver ratio (FLR) (right) in the unprocessed STD images. Top row: BF versus STD, middle row: CNN versus STD, bottom row: CNN versus BF. Note the logarithmic scale of the abscissa

Fig. 4
figure 4

Boxplots of the relative \(\delta \text {SUV}_\text {max}\) differences in \(\text {SUV}_\text {max}\) separately for the three included tracers A: BF versus STD, B: CNN versus STD, C: CNN versus BF

Evaluation of the noise properties of the differently processed images showed that CNN filtering mostly satisfactorily reproduces the noise level and characteristics of bilateral filtering. \(\delta \text {Noise}^{\text {CNN},\text {BF}}\) was on average \((5.6\pm 10.5)\%\). Both, CNN and BF reduced noise compared to their STD image counterparts distinctly, on average by −79.7% and −84.3%, respectively. Overall, noise in the CNN-filtered data was on average only slightly larger than in the BF-filtered images (P = 0.006). However, in rare cases \(\delta \text {Noise}^{\text {CNN},\text {BF}}\) turned out to exceeded 20%. Visual inspection of the corresponding image data revealed two main reasons for this phenomenon: (a) spurious, noise-induced hotspots, which were removed by BF but not by CNN leading to higher residual noise in the CNN-processed image in comparison to BF, (b) inability to reliably determine the noise metric in the liver due to absence of sufficiently large homogeneous regions within the liver. While (a) is a real effect, (b) rather points to limitations of the available noise metric.

Fig. 5
figure 5

Examples of the differently processed data. Left: vendor-provided/unprocessed; middle: BF processed; right: CNN processed. The top row A shows an example (\([^{18}\text {F}]\)L-DOPA) of nearly identical performance of BF and CNN. The middle row B shows an example (\([^{18}\text {F}]\)FDG) with notable reduction of \(\text {SUV}_\text {max}\) after CNN processing, which is not present after BF processing. The bottom row C shows an example (\([^{18}\text {F}]\)FDG) where CNN performs notably better then BF regarding \(\text {SUV}_\text {max}\)


In the present study we have investigated to what extent edge preserving noise reduction in PET, as achieved by case-specific manually tuned (thus, time-consuming) bilateral filtering, can be mimicked by a neural network. Our results clearly show that this is possible. The difference in \(\text {SUV}_\text {max}\) between BF and CNN processing was below 10% in the majority of the cases and tracer independent in the considered group of tracers (\([^{18}\text {F}]\)FDG, \([^{18}\text {F}]\)L-DOPA, \([^{68}\text {Ga}]\)DOTATATE), notably including \([^{68}\text {Ga}]\)DOTATATE with its extremely high target to background ratios. Nevertheless, in certain situations notable differences between CNN and BF were observed with \(\text {SUV}_\text {max}\) differences up to 30%. On closer inspection, all affected ROIs were, however, those with the lowest tracer uptake in the respective image volume and had — intentionally as per the chosen ROI selection strategy — an uptake similar to the liver (see Fig. 3F). The ground truth BF parameters were optimized in such a way as to reduce noise measured in the liver ROI while maintaining \(\text {SUV}_\text {max}\) in all other ROIs including the low uptake ROIs. This is challenging when both, focal uptake ROI and liver ROI, have a comparable uptake. Nevertheless, the objective — noise reduction without signal recovery loss — was possible to achieve for most of the low uptake ROIs but not for the above mentioned extreme cases where application of BF leads to a sizable unwanted reduction of \(\text {SUV}_\text {max}\) compared to the unfiltered STD data, as can be seen in Fig. 3A, B.

This behaviour might be considered a general limitation of bilateral filtering: even though the filter is locally adaptive, the filter parameters apply globally. They, therefore, cannot be simultaneously chosen in such a way that the filter reduces noise in one region and preserves \(\text {SUV}_\text {max}\) in another region when both regions have similar uptake. The root cause is the fact that bilateral filters typically contain a rather broad spatial component characterised by \(\sigma _S\), which achieves satisfactory spatial smoothing — and thus noise reduction — in homogeneous regions like the liver. The filter’s intensity component suppresses certain parts of the spatial filter kernel wherever a sizable — as determined by \(\sigma _I\) — intensity change (e.g. at object edges) occurs. The filter parameters together thus define, in a certain sense, which intensity changes across the spatial extension of the filter kernel are to be considered as noise and which intensity changes are to be considered as “object edges”. This distinction becomes increasingly less clear-cut when the target’s uptake approaches the general noise level in the image. It thus might be concluded that for very low uptake lesions the bilateral filter starts to act too similar to a purely spatial filter. It is then interesting to note that in our study CNN processing did in fact not over-smooth these low uptake ROIs as severely as BF (see Fig. 3E, F; positive values). On the other hand, CNN processing also exhibited issues in certain corner cases (see Fig. 3C, D) which were better handled by BF. Altogether, it can be stated that both, BF and CNN, can yield unsatisfactory results for target regions with very low uptake. But whenever the uptake clearly exceeds the liver uptake both filters result in comparable \(\text {SUV}_\text {max}\) values, with only minor signal recovery loss compared to the original data (see Fig. 3, left column).

Noise after CNN processing was slightly larger than after BF processing with average relative difference of 5.6%. In a few cases, CNN seemingly performed much worse than BF, but closer inspection revealed that this was often a consequence of shortcomings of the noise metric that relies on presence of sufficiently large homogeneous liver regions: in the mentioned cases the liver exhibited unusually heterogeneous tracer distribution and/or multiple lesions, which made determination of the noise metric unreliable.

It should be noted, that the CNN training takes as input only the original data and the BF processed data as the targeted “ground truth”. Information on how the target data are created is not provided. The network does not “know” about the ROIs which were used to optimize the bilateral filter and the network is not trained to reproduce the parameters of the bilateral filter but to reproduce the resulting filtered denoised image. In principle, one might argue, then, that it should have been possible, maybe preferable, to use the low noise sum of all 8 gates as target image in the respective patient instead of generating a target image individually for each gate by application of an optimized BF. In fact, using the summed low noise image would correspond to the usual strategy for training denoisers in PET. Here, too, the usual prerequisite to training of such denoisers is availability of high-quality (low noise) ground truth data. More often than not “high quality” just denotes the typical clinically achievable quality of (ungated) investigations performed with standard protocols regarding scan duration and injected dose. The approach mostly is to just train the network to reproduce that given standard quality when fed with subsets — emulating shortened scan times — of the full data in order to allow dose or scan time reduction in future applications. In the presence of breathing induced motion blurring, however, the training will obviously strive at denoising a motion blurred noisy image volume in such a way as to create a low noise motion blurred image: the denoising cannot be really separated from the general texture properties (including quality of object edge definition, i.e. spatial resolution). Gated images, however, differ in obvious ways from motion blurred images with comparable noise. We thus believe that application of conventionally trained denoisers using ungated training data to gated studies would require further evaluation regarding validity of the results and extent of loss in spatial resolution (as we surmise). On the other hand, using the ungated/summed data in our present study as ground truth instead of the BF-processed single gates would amount to training the network to produce a low noise motion blurred image from a noisy motion-free image. This would obviously be rather pointless in our view.

We therefore believe that — for the special case of respiratory-gated studies — the approach proposed in the present study is offering an interesting, probably superior, alternative to the conventional approach of training the denoiser on subsets of the available full data while using the full data as ground truth. In our view, single-gate denoising will also offer benefits in the context of respiratory motion correction.

Regarding application of DL based denoising in PET, only a few DL studies report on quantitative accuracy of their methods. In [11] noise reduction was performed with a GAN. There, the authors reported a \(\text {SUV}_\text {mean}\) bias of \((0.45 \pm 5.59)\%\) in normal tissue and \(\text {SUV}_\text {max}\) bias of \((-0.84 \pm 6.94)\%\) in lesions. The same architecture was used in [10] to denoise PET images. Here, a \(\text {SUV}_\text {max}\) bias of up to 35% in lung nodules was found. Finally, in [14] a semi-supervised learning approach was used featuring two autoencoders extracting image features at two noise levels with a translation layer in between. For the six evaluated lung nodules \(\text {SUV}_\text {max}\) bias was approximately (15–20)%. In our study the \(\text {SUV}_\text {max}\) bias was \((0.5\pm 4.8)\%\), which is in line with the result reported in [11]. Also, similar outliers as reported in [10, 14] were present in our investigation.

A limitation of the current study is the above-mentioned partly unreliable determination of the noise level using the liver. Another limitation is the rather low number of included patients (N = 69). Altogether 552 image volumes were included, but each patient contributed 8 respiratory gates representing individual respiratory states. Even though these images are far from being identical, they are of course rather similar, apart from breathing induced relative deformation. This low number of independent datasets could lead to a potentially unbalanced training data set. This is, for example, very likely the cause for the small positive bias of \(\text {SUV}_\text {max}\) in CNN filtered \([^{68}\text {Ga}]\)DOTATATE data (Fig. 4C). Such data have typically much higher SUV values than measurements with the other two tracers, but were underrepresented in the current data sample (N = 11 versus N = 58). It is also worth noting that the performance of the network was evaluated via the mentioned 5-fold cross-validation scheme only. Further testing with dedicated external data from unseen sources is necessary for better understanding of the CNN performance especially with respect to its generalization capabilities. Thus, inclusion of more data will potentially remove these limitations. Another potential shortcoming is the fact that the bilateral filter was restricted to three spatial dimensions and not extended to 4D by including the temporal dimension (the gates). We believe, however, that it is by no means obvious whether this would have been beneficial since it would potentially require to sacrifice the already low temporal resolution (due to the rather low number of gates) by smoothing along the temporal dimension. Therefore, a separate investigation of the usefulness of 4D bilateral smoothing of gated studies — and, by extension, the corresponding 4D CNNs — would be valuable.


Our results show that a DL-based denoising utilizing a 3D U-Net CNN architecture can reproduce the results of a case by case optimized BF in a fully automated way. Apart from rare cases, CNN and BF processing lead to images of practically identical quality regarding noise level, edge preservation, and signal recovery. We believe such a network might proof especially useful in the context of improved motion correction of respiratory-gated PET studies but can also help to establish BF-equivalent edge-preserving CNN filtering in clinical PET since it obviates time consuming manual BF parameter tuning.

Availability of data and materials

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.


  1. Tomasi C, Manduchi R. Bilateral filtering for gray and color images. In: Sixth international conference on computer vision (IEEE Cat. No.98CH36271), Narosa Publishing House, Bombay, India; 1998, pp. 839–846.

  2. Hofheinz F, Langner J, Beuthien-Baumann B, Oehme L, Steinbach J, Kotzerke J, van den Hoff J. Suitability of bilateral filtering for edge-preserving noise reduction in PET. EJNMMI Res. 2011;1(1):23.

    Article  PubMed Central  PubMed  Google Scholar 

  3. Langner J, Hofheinz F, Beuthien-Baumann B, van den Hoff J. Locally adaptive filtering for edge preserving noise reduction on images with low SNR in PET. In: 2011 IEEE nuclear science symposium conference record, IEEE, Valencia, Spain; 2011, pp. 3102–3104.

  4. Izadi S, Sutton D, Hamarneh G. Image denoising in the deep learning era. Artif Intell Rev. 2022;56(7):5929–74.

    Article  Google Scholar 

  5. Matsubara K, Ibaraki M, Nemoto M, Watabe H, Kimura Y. A review on AI in PET imaging. Ann Nucl Med. 2022;36(2):133–43.

    Article  PubMed  Google Scholar 

  6. Wang Y-R, Wang P, Adams LC, Sheybani ND, Qu L, Sarrami AH, Theruvath AJ, Gatidis S, Ho T, Zhou Q, Pribnow A, Thakor AS, Rubin D, Daldrup-Link HE. Low-count whole-body PET/MRI restoration: an evaluation of dose reduction spectrum and five state-of-the-art artificial intelligence models. Eur J Nucl Med Mol Imag. 2023;50(5):1337–50.

    Article  Google Scholar 

  7. Schaefferkoetter J, Yan J, Ortega C, Sertic A, Lechtman E, Eshet Y, Metser U, Veit-Haibach P. Convolutional neural networks for improving image quality with noisy PET data. EJNMMI Res. 2020.

    Article  PubMed Central  PubMed  Google Scholar 

  8. Chen KT, Schürer M, Ouyang J, Koran MEI, Davidzon G, Mormino E, Tiepolt S, Hoffmann K-T, Sabri O, Zaharchuk G, Barthel H. Generalization of deep learning models for ultra-low-count amyloid PET/MRI using transfer learning. Eur J Nucl Med Mol Imag. 2020;47(13):2998–3007.

    Article  CAS  Google Scholar 

  9. Liu H, Wu J, Lu W, Onofrey JA, Liu Y-H, Liu C. Noise reduction with cross-tracer and cross-protocol deep transfer learning for low-dose PET. Phys Med Biol. 2020;65(18):185006.

    Article  CAS  PubMed  Google Scholar 

  10. Lu W, Onofrey JA, Lu Y, Shi L, Ma T, Liu Y, Liu C. An investigation of quantitative accuracy for deep learning based denoising in oncological PET. Phys Med Biol. 2019;64(16):165019.

    Article  CAS  PubMed  Google Scholar 

  11. Zhou L, Schaefferkoetter JD, Tham IWK, Huang G, Yan J. Supervised learning with cyclegan for low-dose FDG PET image denoising. Med Image Anal. 2020;65:101770.

    Article  PubMed  Google Scholar 

  12. Xiang L, Qiao Y, Nie D, An L, Lin W, Wang Q, Shen D. Deep auto-context convolutional neural networks for standard-dose PET image estimation from low-dose PET/MRI. Neurocomputing. 2017;267:406–16.

    Article  PubMed Central  PubMed  Google Scholar 

  13. Gong K, Guan J, Liu C-C, Qi J. PET image denoising using a deep neural network through fine tuning. IEEE Trans Radiat Plasma Med Sci. 2019;3(2):153–61.

    Article  PubMed  Google Scholar 

  14. Jiang C, Pan Y, Cui Z, Nie D, Shen D. Semi-supervised standard-dose PET image generation via region-adaptive normalization and structural consistency constraint. IEEE Trans Med Imag. 2023.

    Article  Google Scholar 

  15. Costa-Luis CO, Reader AJ. Micro-networks for robust MR-guided low count PET imaging. IEEE Trans Radiat Plasma Med Sci. 2021;5(2):202–12.

    Article  PubMed  Google Scholar 

  16. Ulyanov D, Vedaldi A, Lempitsky V. Deep image prior. Int J Comput Vis. 2020;128(7):1867–88.

    Article  Google Scholar 

  17. Cui J, Gong K, Guo N, Wu C, Meng X, Kim K, Zheng K, Wu Z, Fu L, Xu B, Zhu Z, Tian J, Liu H, Li Q. PET image denoising using unsupervised deep learning. Eur J Nucl Med Mol Imag. 2019;46(13):2780–9.

    Article  Google Scholar 

  18. Hashimoto F, Ohba H, Ote K, Teramoto A, Tsukada H. Dynamic PET image denoising using deep convolutional neural networks without prior training datasets. IEEE Access. 2019;7:96594–603.

    Article  Google Scholar 

  19. Hashimoto F, Ohba H, Ote K, Kakimoto A, Tsukada H, Ouchi Y. 4d deep image prior: dynamic PET image denoising using an unsupervised four-dimensional branch convolutional neural network. Phys Med Biol. 2021;66(1):015006.

    Article  CAS  PubMed  Google Scholar 

  20. Lehtinen J, Munkberg J, Hasselgren J, Laine S, Karras T, Aittala M, Aila T. Noise2Noise: learning image restoration without clean data. arXiv (2018).

  21. Moran N, Schmidt D, Zhong Y, Coady P. Noisier2noise: learning to denoise from unpaired noisy data. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), IEEE, Seattle; 2020. pp. 12064–12072.

  22. Krull A, Buchholz T-O, Jug F. Noise2void: learning denoising from single noisy images. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), IEEE, Long Beach; 2019, pp. 2129–2137.

  23. Yie SY, Kang SK, Hwang D, Lee JS. Self-supervised PET denoising. Nucl Med Mol Imag. 2020;54(6):299–304.

    Article  Google Scholar 

  24. Song T-A, Dutta J. Noise2void denoising of PET images. In: 2020 IEEE nuclear science symposium and medical imaging conference (NSS/MIC), IEEE, Boston; 2020, pp. 1–2.

  25. Kang S-K, Yie S-Y, Lee J-S. Noise2noise improved by trainable wavelet coefficients for PET denoising. Electronics. 2021;10(13):1529.

    Article  Google Scholar 

  26. Gadde R, Jampani V, Kiefel M, Kappler D, Gehler PV. Superpixel convolutional networks using bilateral inceptions. In: Computer vision: ECCV 2016, Springer, Amsterdam, The Netherlands; 2016, pp. 597–613.

  27. Jampani V, Kiefel M, Gehler PV. Learning sparse high dimensional filters: image filtering, dense CRFs and bilateral neural networks. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), IEEE, Las Vegas; 2016, pp. 4452–4461.

  28. Xu Q, Wang L, Wang Y, Sheng W, Deng X. Deep bilateral learning for stereo image super-resolution. IEEE Sign Process Lett. 2021;28:613–7.

    Article  Google Scholar 

  29. Wagner F, Thies M, Gu M, Huang Y, Pechmann S, Patwari M, Ploner S, Aust O, Uderhardt S, Schett G, Christiansen S, Maier A. Ultralow-parameter denoising: trainable bilateral filter layers in computed tomography. Med Phys. 2022;49(8):5107–20.

    Article  PubMed  Google Scholar 

  30. Wagner F, Thies M, Denzinger F, Gu M, Patwari M, Ploner S, Maul N, Pfaff L, Huang Y, Maier A. Trainable joint bilateral filters for enhanced prediction stability in low-dose CT. Sci Rep. 2022.

    Article  PubMed Central  PubMed  Google Scholar 

  31. Patwari M, Gutjahr R, Raupach R, Maier A. JBFnet: low dose CT denoising by trainable joint bilateral filtering. In: Medical image computing and computer assisted intervention—MICCAI 2020, Springer, Lima, Peru; 2020, pp. 506–515.

  32. Patwari M, Gutjahr R, Raupach R, Maier A. Limited parameter denoising for low-dose x-ray computed tomography using deep reinforcement learning. Med Phys. 2022;49(7):4540–53.

    Article  PubMed  Google Scholar 

  33. Wu Q, Tang H, Liu H, Chen YC. Masked joint bilateral filtering via deep image prior for digital x-ray image denoising. IEEE J Biomed Health Inform. 2022;26(8):4008–19.

    Article  PubMed  Google Scholar 

  34. Spuhler K, Serrano-Sosa M, Cattell R, DeLorenzo C, Huang C. Full-count PET recovery from low-count image using a dilated convolutional neural network. Med Phys. 2020;47(10):4928–38.

    Article  PubMed  Google Scholar 

  35. R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2022. R Foundation for Statistical Computing.

Download references


Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations



JM contributed to the conception of the study, writing of software, data analysis, interpretation of the data, writing of the manuscript and final approval. PN contributed to the neural network design, data analysis, network training and writing of the manuscript. FH contributed to the selection of the data, data analysis and writing of the manuscript. JP provided intellectual input. AB and JK contributed the respiratory-gated data and provided intellectual input. JVDH contributed to the conception of the study, interpretation of the data and writing of the manuscript. All authors have contributed to the final revision of the manuscript and have approved it.

Corresponding author

Correspondence to Jens Maus.

Ethics declarations

Ethics approval and consent to participate

The study was approved by the institutional ethics committee of the University Hospital Carl Gustav Carus of the Technische Universität Dresden (BO-EK-513122023) and written informed consent was obtained from all included individual participants. Retrospective analysis was conducted in accordance with the guidelines of the International Conference on Harmonization/Good Clinical Practice and the principles of the 1964 Helsinki Declaration and its later amendments.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Maus, J., Nikulin, P., Hofheinz, F. et al. Deep learning based bilateral filtering for edge-preserving denoising of respiratory-gated PET. EJNMMI Phys 11, 58 (2024).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: