This section is structured as follows: the mathematical formalism and the necessary preconditions of the approach are presented in the “Foundation” section. As the hardware assembly process inevitably introduces rotations, tilts, and small offsets in the alignment of the optical cameras with respect to their harboring pinholes, a calibration scheme, given in the “Calibration” section, is needed to assess these axis alignment differences. To further improve the augmentation, a depth prior of the gamma activity has to be determined, i.e., the working distance of the detector to the activity inside the patient’s lymphatics needs to be estimated. For this to be valid, we assume isotropic radiation emission. The pose parameters (the position and orientation of a camera with respect to the origin) from the calibration are reused to obviate the need for a new ad hoc pose estimation during each intervention; such an estimation is generally hard to accomplish with sufficient accuracy in a brightly lit surgical setting, especially for cameras with low-resolution sensors. Supportive error minimization schemes are therefore needed and are discussed in the “Error minimization schemes” section. The technical specifications of the collimator, the miniature endoscopic CMOS cameras, and the detector are given in the “Hardware” section. The augmentation algorithm that overlays the optical camera image with gamma information, based on these principles, is explained in the Appendix.
Foundation
The goal is to overlay a point that is seen by one pinhole camera, in our case the optical camera, with information from the other pinhole camera, in our case the gamma camera, when the configurations of the two cameras are known but not their distances to the projected world point, whose coordinates are given in relation to the origin. First, some definitions (cf. Fig. 2):
-
Let {C} and {C′} be the coordinate systems or frames of two cameras.
-
Let X be a point in 3D space, given by \(\mathbf{x}=(x_{1},x_{2},x_{3})^{T}\) in {C} and
by \(\mathbf{x}^{\prime}=(x_{1}^{\prime},x_{2}^{\prime},x_{3}^{\prime})^{T}\) in {C′}, respectively, where \(x_{3},x_{3}^{\prime}\) give the values along the optical axes of the respective cameras (i.e., the distance or “depth”).
-
Let T be the transformation from {C} to {C′}, that is
$$ \left(\begin{array}{c} \mathbf{x}^{\prime}\\ 1 \end{array}\right)=T\,\left(\begin{array}{c} \mathbf{x}\\ 1 \end{array}\right), $$
(1)
where
$$ T=\left[\begin{array}{cc} R & \mathbf{t}\\ 0 & 1 \end{array}\right] $$
(2)
with the 3×3 rotational part \(R=[\boldsymbol{\rho}_{1}\,|\,\boldsymbol{\rho}_{2}\,|\,\boldsymbol{\rho}_{3}]^{T}\) (i.e., the \(\boldsymbol{\rho}_{i}\) form the rows (!) of R) and the translational part \(\mathbf{t}=(t_{1},t_{2},t_{3})^{T}\), both known from the calibration step (“Calibration” section).
-
Let \(\mathbf{y}=(y_{1},y_{2})^{T}\) and \(\mathbf{y}^{\prime}=(y_{1}^{\prime},y_{2}^{\prime})^{T}\) be the points on the respective image planes in the units of the world coordinates (i.e., mm, the units of the calibration target) and let
$$ \tilde{\mathbf{y}}=\frac{1}{f}\thinspace\mathbf{y}\quad\text{and}\quad\tilde{\mathbf{y}}^{\prime}=\frac{1}{f^{\prime}}\thinspace\mathbf{y}^{\prime} $$
(3)
with \(\tilde{\mathbf{y}}=(\tilde{y}_{1},\tilde{y}_{2})^{T}\) and \(\tilde{\mathbf{y}}^{\prime}=(\tilde{y}_{1}^{\prime},\tilde{y}_{2}^{\prime})^{T}\) be these very plane coordinates, normalized by the respective focal lengths f,f′, both known intrinsic parameters.
-
Let Id be the d×d identity matrix.
Making use of the intercept theorem, we observe that for camera {C}, the following equalities hold (assuming \(f>0, x_{3}>0\)):
$$\begin{array}{*{20}l} \left(\begin{array}{c} y_{1}\\ y_{2}\\ f \end{array}\right) & =\frac{f}{x_{3}}\,\mathbf{x} \end{array} $$
(4)
$$\begin{array}{*{20}l} \Longleftrightarrow\left(\begin{array}{c} \tilde{y}_{1}\\ \tilde{y}_{2}\\ 1 \end{array}\right) & =\frac{1}{x_{3}}\,\mathbf{x}, \end{array} $$
(5)
$$\begin{array}{*{20}l} \Longleftrightarrow x_{3}\,\left(\begin{array}{c} \tilde{y}_{1}\\ \tilde{y}_{2}\\ 1 \end{array}\right) & =\mathbf{x}, \end{array} $$
(6)
and likewise for {C′} (assuming \(f^{\prime }>0,x_{3}^{\prime }>0\)):
$$ \left(\begin{array}{c} \tilde{y}_{1}^{\prime}\\ \tilde{y}_{2}^{\prime}\\ 1 \end{array}\right)=\frac{1}{x_{3}^{\prime}}\,\mathbf{x}^{\prime}. $$
(7)
Let us now assume that we measure \(\tilde{\mathbf{y}}\) (or rather: y, which immediately gives us \(\tilde{\mathbf{y}}\)) for the point X in {C}’s camera plane and that we want to determine the respective \(\tilde{\mathbf{y}}^{\prime}\) in {C′}’s camera plane, solely based on that measurement \(\tilde{\mathbf{y}}\). Let us therefore express {C′}’s world coordinates in terms of {C} (which we can do by making use of Eq. (1)) and insert into Eq. (7), which gives us
$$ \left(\begin{array}{c} \tilde{y}_{1}^{\prime}\\ \tilde{y}_{2}^{\prime}\\ 1\\ 1 \end{array}\right)=\left[\begin{array}{cc} \frac{1}{x_{3}^{\prime}}\,I_{3} & 0\\ 0 & 1 \end{array}\right]\,T\,\left(\begin{array}{c} \mathbf{x}\\ 1 \end{array}\right). $$
(8)
Now let us insert Eq. (6), that is, {C}’s image plane coordinates, on the right:
$$\begin{array}{*{20}l} \left(\begin{array}{c} \tilde{y}_{1}^{\prime}\\ \tilde{y}_{2}^{\prime}\\ 1\\ 1 \end{array}\right) & = S_{c^{\prime}}\, T\, S_{c}\, \left(\begin{array}{c} \tilde{y}_{1}\\ \tilde{y}_{2}\\ 1\\ 1 \end{array}\right) \end{array} $$
(9)
with
$$\begin{array}{*{20}l} S_{c^{\prime}} = \left[\begin{array}{cc} \frac{1}{x_{3}^{\prime}}\,I_{3} & 0\\ 0 & 1 \end{array}\right], \ S_{c} = \left[\begin{array}{cc} x_{3}\,I_{3} & 0\\ 0 & 1 \end{array}\right]. \end{array} $$
(10)
We denote \(S_{c^{\prime}}\) and \(S_{c}\) as scaling matrices (cf. Eq. (7)), and the above Eq. (9) can equivalently be written as
$$ \tilde{\mathbf{y}}^{\prime}=\frac{x_{3}}{x_{3}^{\prime}}\,R\,\tilde{\mathbf{y}}+\frac{\mathbf{t}}{x_{3}^{\prime}}. $$
(11)
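Eq. (11) lends itself to a quick numerical sanity check. The following sketch is our own illustration with arbitrary example values (it uses OpenCV’s Rodrigues helper only to build a valid rotation matrix); it verifies that the mapping agrees with a direct projection through both cameras:

```python
# Numerical sanity check of Eq. (11), using homogeneous image coordinates
# (y1~, y2~, 1)^T as in Eqs. (5)-(7). All example values are arbitrary.
import numpy as np
import cv2

R, _ = cv2.Rodrigues(np.array([0.05, -0.02, 0.10]))  # small relative rotation
t = np.array([12.0, -3.0, 1.5])                      # relative translation (mm)
x = np.array([4.0, -7.0, 110.0])                     # world point in {C}, x3 > 0

x_p = R @ x + t              # the same point in {C'} (Eq. (1))
y_h = x / x[2]               # (y1~, y2~, 1)^T in {C} (Eq. (5))
y_p_h = x_p / x_p[2]         # ground-truth (y1~', y2~', 1)^T in {C'} (Eq. (7))

# Eq. (11), read with homogeneous 3-vectors on both sides:
y_p_eq11 = (x[2] / x_p[2]) * (R @ y_h) + t / x_p[2]
assert np.allclose(y_p_eq11, y_p_h)
```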
From Eq. (11), we realize that we can express \(x_{3}\) in terms of \(x_{3}^{\prime}\) via Eqs. (1)+(6). In particular, we may write
$$\begin{array}{*{20}l} x_{3}^{\prime} & =x_{3}\,\rho_{3}\cdot\tilde{\mathbf{y}}+t_{3} \end{array} $$
(12)
$$\begin{array}{*{20}l} \Longleftrightarrow x_{3} & =\frac{x_{3}^{\prime}-t_{3}}{\rho_{3}\cdot\tilde{\mathbf{y}}}, \end{array} $$
(13)
which enables us to rewrite Eq. (11) as
$$\begin{array}{*{20}l} \tilde{\mathbf{y}}^{\prime} & =\frac{1-\nicefrac{t_{3}}{x_{3}^{\prime}}}{\rho_{3}\cdot\tilde{\mathbf{y}}}\,R\,\tilde{\mathbf{y}}+\frac{\mathbf{t}}{x_{3}^{\prime}} \end{array} $$
(14)
$$\begin{array}{*{20}l} \Longleftrightarrow\tilde{\mathbf{y}}^{\prime} & =\frac{1}{\rho_{3}\cdot\tilde{\mathbf{y}}}\,R\,\tilde{\mathbf{y}}+\frac{1}{x_{3}^{\prime}}\,\left(\frac{-t_{3}}{\rho_{3}\cdot\tilde{\mathbf{y}}} \,R\,\tilde{\mathbf{y}}+\mathbf{t}\right). \end{array} $$
(15)
Approximating \(\tilde {\mathbf {y}}^{\prime }\) without knowing \(x_{3}^{\prime }\)
In Eq. (15), only \(x_{3}^{\prime}\) remains unknown, which, in turn, only affects the right term of the equation. If the right term were zero, we could express \(\tilde{\mathbf{y}}^{\prime}\) exactly in terms of \(\tilde{\mathbf{y}}\), namely
$$ \tilde{\mathbf{y}}^{\prime}=\frac{1}{\rho_{3}\cdot\tilde{\mathbf{y}}}\,R\,\tilde{\mathbf{y}}. $$
(16)
For approximating \(\tilde {\mathbf {y}}^{\prime }\) without knowing \(x_{3}^{\prime }\), we can thus try to change our camera setup so that we minimize the right term of Eq. (15), making Eq. (16) a good approximation of \(\tilde {\mathbf {y}}^{\prime }\). We have two options for this:
-
1.
Simultaneously moving the cameras away from the imaged point X in the direction of {C′}’s optical axis or vice versa, leaving T unchanged, thus
$$ x_{3}^{\prime}\rightarrow\infty\Longleftrightarrow\nicefrac{1}{x_{3}^{\prime}}\rightarrow0\Longrightarrow\frac{1}{x_{3}^{\prime}}\, \left(\frac{-t_{3}}{\rho_{3}\cdot\tilde{\mathbf{y}}}\,R\,\tilde{\mathbf{y}}+\mathbf{t}\right)\rightarrow\mathbf{0}. $$
(17)
-
2.
Minimizing the distance |t| between the cameras while keeping the distance \(x_{3}^{\prime}\) fixed, by moving {C} as close as possible to {C′}, thus
$$\left|\mathbf{t}\right|\rightarrow0\Longrightarrow\left[-t_{3}\rightarrow0\text{ and }\mathbf{t}\rightarrow\mathbf{0}\right]\Longrightarrow\frac{1}{x_{3}^{\prime}}\,\left(\frac{-t_{3}}{\rho_{3}\cdot \tilde{\mathbf{y}}}\,R\,\tilde{\mathbf{y}}+\mathbf{t}\right)\rightarrow\mathbf{0}. $$
We have a third option to obtain a good approximation for \(\tilde{\mathbf{y}}^{\prime}\) without knowing \(x_{3}^{\prime}\), even without changing our camera setup: in Eq. (15), we can use a reasonable estimate \(\hat{x}_{3}^{\prime}\) in place of \(x_{3}^{\prime}\) (i.e., \(\hat{x}_{3}^{\prime}\approx x_{3}^{\prime}\)), giving us an estimate \(\hat{\mathbf{y}}^{\prime}\) for \(\tilde{\mathbf{y}}^{\prime}\), namely
$$ \hat{\mathbf{y}}^{\prime}=\frac{1}{\rho_{3}\cdot\tilde{\mathbf{y}}}\,R\,\tilde{\mathbf{y}}+\frac{1}{\hat{x}_{3}^{\prime}}\, \left(\frac{-t_{3}}{\rho_{3}\cdot\tilde{\mathbf{y}}}\,R\,\tilde{\mathbf{y}}+\mathbf{t}\right)\approx\tilde{\mathbf{y}}^{\prime}. $$
(18)
Note that all three options may be combined.
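The third option translates directly into a small routine. The following sketch (our own naming, not the authors’ implementation) evaluates Eq. (18); passing an infinite depth estimate reproduces the pure approximation of Eq. (16):

```python
# Sketch of the augmentation estimate of Eq. (18). R and t stem from the
# calibration; x3p_hat is the depth prior. Names are illustrative.
import numpy as np

def augment(y_tilde, R, t, x3p_hat=np.inf):
    """Estimate y~' from y~ = (y1~, y2~) per Eq. (18).

    x3p_hat=np.inf zeroes the correction term, i.e., yields Eq. (16).
    Returns the homogeneous 3-vector (y1~', y2~', 1).
    """
    y_h = np.array([y_tilde[0], y_tilde[1], 1.0])
    rho3_y = R[2] @ y_h                    # rho_3 . y~ (third row of R)
    main = (R @ y_h) / rho3_y              # left term, cf. Eq. (16)
    corr = (-t[2] / rho3_y * (R @ y_h) + t) / x3p_hat  # right term of Eq. (18)
    return main + corr
```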
Estimation error
The estimation error \(\mathbf {\epsilon }=\hat {\mathbf {y}}^{\prime }-\tilde {\mathbf {y}}^{\prime }\) is given by
$$ \mathbf{\epsilon}=\left(\frac{1}{\hat{x}_{3}^{\prime}}-\frac{1}{x_{3}^{\prime}}\right)\,\left(\frac{-t_{3}}{\rho_{3}\cdot \tilde{\mathbf{y}}}\,R\,\tilde{\mathbf{y}}+\mathbf{t}\right) $$
(19)
or, written for its components \(\epsilon_{i}\) (i=1,2):
$$ \epsilon_{i}=\left(\frac{1}{\hat{x}_{3}^{\prime}}-\frac{1}{x_{3}^{\prime}}\right)\,\left(\frac{-t_{3}\,\rho_{i}\cdot \tilde{\mathbf{y}}}{\rho_{3}\cdot\tilde{\mathbf{y}}}+t_{i}\right). $$
(20)
Note that using options 1 and 2 only or simply ignoring the right term of Eq. (15) corresponds to estimating \(\hat {x}_{3}^{\prime }\) at infinity; thus, \(\nicefrac {1}{\hat {x}_{3}^{\prime }}=0\) in Eqs. (19)+(20).
From Eqs. (19)+(20), we can see that the alignment of the cameras’ optical axes is critical for a small error: in the extreme case, when the optical axes are perpendicular, \(\rho _{3}\cdot \tilde {\mathbf {y}}=0\), the error becomes unbounded.
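The error of Eqs. (19)+(20) can be evaluated in the same fashion. The sketch below (again our own illustration) returns the two components ε<sub>i</sub> and makes the unbounded behavior directly visible as a division by a vanishing denominator:

```python
# Sketch of the estimation error, Eqs. (19)+(20). Passing x3p_hat=np.inf
# corresponds to options 1 and 2 (estimate at infinity). Illustrative only.
import numpy as np

def estimation_error(y_tilde, R, t, x3p_hat, x3p_true):
    y_h = np.array([y_tilde[0], y_tilde[1], 1.0])
    rho3_y = R[2] @ y_h                     # rho_3 . y~; error blows up near 0
    correction = -t[2] / rho3_y * (R @ y_h) + t
    eps = (1.0 / x3p_hat - 1.0 / x3p_true) * correction
    return eps[:2]                          # components eps_1, eps_2 (Eq. (20))
```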
Calibration
We now define one pinhole camera as a purely optical camera, in the following called camera, and the other as a fixed pinhole of our multi-pinhole collimator, referred to simply as a pinhole; the two form a pair, as they are physically brought together. In order to virtually realign the optical axes of such a pinhole/camera ensemble, their relative rotation R and translation t need to be determined to obtain T (Eq. (2)). We introduce definitions based on the formalism of the “Foundation” section and the calibration setup shown in Fig. 3. The actual implementation of the calibration setup can be seen in Fig. 4:
-
Let {O} be the world coordinate system’s origin defined by a planar target. This target is the frontal plane of a Cerrobend block with exit pupils for gamma radiation to escape in a directed manner (point source).
-
Let {P} be the coordinate system of the pinhole.
-
Let {C} be the coordinate system of the camera.
-
Let \(X_{o}\) be a list of known world points of the target, given by \(\mathbf{x}=(x_{1},x_{2},x_{3})^{T}\) in world coordinates, with \(x_{3}=0\) for all \(\mathbf{x}\in X_{o}\).
-
Let \(Y_{c}\) be the projections of \(X_{o}\), as detected by the optical camera, in camera image (buffer) coordinates.
-
Let \(K_{c}\) be the given internal parameters of the optical camera (e.g., focal length).
-
Let \(R_{o},\mathbf{t}_{o}=\text{pose}(X_{o},Y_{c},K_{c})\) be a pose estimation algorithm of {O} in {C}.
This algorithm is based on the function solvePnP of the computer vision framework OpenCV [11] (a minimal usage sketch is given after Eq. (23)).
Let us now assume that we spatially measure {P} solely in terms of {O}. This can be done as we know the relative position \(\mathbf{t}_{p}\) of the pinhole with respect to the target origin {O} during calibration. As the pinhole’s image plane is perpendicularly aligned with the target (cf. Fig. 4), and a pinhole is part of the rigid structure of the collimator frame and thus rotation free, we are allowed to set \(R_{p}=I_{3}\) and form the matrix P as
$$ P=\left[\begin{array}{cc} I_{3} & \mathbf{t_{p}}\\ 0 & 1 \end{array}\right], $$
(21)
which transforms from {O} to {P}.
This is in contrast to the unknown pose (i.e., position, orientation) of the optical camera with respect to {O}, owing to rotations, tilts, and small offsets from the manufacturing process of the device. Let us assume that we measure all \(X_{o}\) and match them with the projections \(Y_{c}\). The function \(\text{pose}(X_{o},Y_{c},K_{c})\) then yields the relative measurements \(R_{c},\mathbf{t}_{c}\) of the target in {C}, i.e., the mapping from {O} to {C}. In analogy to Eq. (21), we obtain the matrix C as
$$ C=\left[\begin{array}{cc} R_{c} & \mathbf{t_{c}}\\ 0 & 1 \end{array}\right]. $$
(22)
To get the complete transformation from {P} to {C}, we simply write
$$ T=\left[\begin{array}{cc} R & \mathbf{t}\\ 0 & 1 \end{array}\right]=C\,P^{-1}, $$
(23)
With T, we have determined the necessary transformation to map a world point from the pinhole coordinate system to the camera coordinate system.
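The following sketch condenses this calibration chain. For the sake of a runnable example, it simulates the target detections through a known pose, whereas in practice \(Y_{c}\) comes from the checkerboard detector; the intrinsics, distortion, and the pinhole offset t_p are placeholders, not our device’s values:

```python
# Sketch of the calibration chain, Eqs. (21)-(23): pose of {O} in {C} via
# OpenCV's solvePnP, then T = C P^{-1}. All numeric values are placeholders.
import numpy as np
import cv2

X_o = np.array([[x, y, 0.0] for x in range(0, 36, 6)
                for y in range(0, 36, 6)], dtype=np.float64)  # target grid (mm)
K_c = np.array([[220.0, 0.0, 125.0],
                [0.0, 220.0, 125.0],
                [0.0, 0.0, 1.0]])          # placeholder intrinsics K_c
dist = np.zeros(5)                         # assume negligible lens distortion

# Simulate detections Y_c by projecting X_o through a known pose (demo only).
rvec_true = np.array([0.03, -0.01, 0.02])
t_true = np.array([5.0, -2.0, 110.0])
Y_c, _ = cv2.projectPoints(X_o, rvec_true, t_true, K_c, dist)

ok, rvec, t_c = cv2.solvePnP(X_o, Y_c, K_c, dist)  # pose(X_o, Y_c, K_c)
R_c, _ = cv2.Rodrigues(rvec)

C = np.eye(4); C[:3, :3] = R_c; C[:3, 3] = t_c.ravel()   # Eq. (22)
t_p = np.array([40.0, 18.0, 0.0])          # pinhole offset in {O} (placeholder)
P = np.eye(4); P[:3, 3] = t_p                            # Eq. (21), R_p = I_3
T = C @ np.linalg.inv(P)                                 # Eq. (23)
R, t = T[:3, :3], T[:3, 3]
```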
In order to validate the correct transformation and mapping of gamma activity onto the optical camera image, the calibration target is equipped with bore holes for inserting vials with tracer material. As the target block is based on a Cerrobend alloy, horizontal exit pupils allow the gamma radiation from the tracer to escape towards the detector in a point-like fashion (cf. Fig. 3). The generated point source activity images or patches can then be compared with the known locations of the exit pupils in the optical image to visually assess the augmentation and therefore the quality of the calibration. In the “Error quantification” section, we propose a quantitative approach whose results are shown in the “Results” section.
Error minimization schemes
Recall the goal of overlaying optical camera images with gamma activity from the detector; the two modalities must therefore be joined. If the spatial relation between the camera and the pinhole is known and both identify the same projected world point (target), we know how to transfer information from one to the other (Fig. 2); this corresponds to a homography formulation between the two sensors. On the other hand, the placement of the sensors can be chosen freely, given again that their relative spatial relation is determined, if and only if we knew the distance(s) from either sensor to the target exactly. This corresponds to solving Eqs. (12) and (13), and by consequence Eq. (11), and is a more restrictive formulation than the above general homography. Neither of the two methods can be fully applied in our case, by nature of the different modalities of the sensors. We therefore introduce axis alignment as a restriction.
Axis alignment
In the “Foundation” section, we propose three possible options to control the augmentation error ε, and we conclude that axis alignment is the most viable one: moving the camera/pinhole pair too far away from the target decreases the probability that gamma photons, governed by the inverse square law, reach the collimator and trigger a signal on the detector. Furthermore, the focal length of the camera only allows for a certain distance range to produce reasonably sharp images. As neither a unique target identification (for a potential homography) is possible, given the different modalities of the sensors, nor an exact distance measurement is available, axis alignment helps to reduce the augmentation error. This is even more important if the activity does not coincide with the optical axes. A good depth estimate compensates for small misalignments of the axes (Fig. 5).
New scaling matrices
Our approach needs to be flexible enough to handle variations in the placement of the device, thus also the camera/pinhole pairs, in relation to the radioactive source.
In case the distance to the target is identical to that of the calibration setup, no new depth estimation is required. However, in a real-world scenario, the distance will most likely differ; it also depends on the depth of the lymph nodes to be detected. From the calibration step (“Calibration” section), we get a relative transform T from {P} to {C}, which remains valid as long as the camera and the pinhole are not moved or rotated with respect to each other; this is prevented by mechanical constraints. For the augmentation to remain valid in a new setup, we must move the pinhole and the camera such that T stays the same. We then adjust the scaling matrices of Eq. (9) with the depth estimate \(x_{3}\) and the corresponding \(x_{3}^{\prime}\) determined via Eq. (12).
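A short sketch of this adjustment follows (our own formulation; the function name is illustrative). Note that \(S_{c^{\prime}}\) is point-specific, since \(x_{3}^{\prime}\) depends on \(\tilde{\mathbf{y}}\) via Eq. (12):

```python
# Sketch: rebuild the scaling matrices of Eq. (10) for a new working distance.
# x3 is the depth prior in {C}; x3' then follows from Eq. (12).
import numpy as np

def scaling_chain(T, y_tilde, x3):
    R, t = T[:3, :3], T[:3, 3]
    y_h = np.array([y_tilde[0], y_tilde[1], 1.0])
    x3_p = x3 * (R[2] @ y_h) + t[2]            # Eq. (12)
    S_c = np.diag([x3, x3, x3, 1.0])           # Eq. (10)
    S_cp = np.diag([1.0 / x3_p] * 3 + [1.0])   # Eq. (10)
    return S_cp @ T @ S_c                      # mapping chain of Eq. (9)
```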
We consider an operative distance range of 90–130 mm reasonable. Closer distances increase the photon load on the collimator, while targets farther away no longer provide enough gamma radiation and degrade the performance of the optical camera due to its fixed focal length. These distances also allow the surgeon to operate their biopsy tools without further disturbing the confined workspace of the surgical scene. In the “Results” section, we also show results for distances of 150 mm, which are used to test the performance of the method at the system boundaries.
Error quantification
To be able to give a quantitative measure, we first modeled the effect of axis misalignment in the presence of a depth estimate (cf. Fig. 6). Depending on the pose and location of the camera with respect to the pinhole, and on the accuracy of the estimate, the differences between the known projection of the world point and its estimated projection and reprojection are each evaluated using the L2 norm, giving rise to the metrics (errors) ε(px) and ε(mm) (Fig. 7). The error ε(px) is calculated, in pixel space, as the disparity (Eq. (19)) between the known projection of a world point (i.e., the center of the exit pupil) onto the camera sensor and the transformation of that world point (i.e., the center of the activity blob) from the pinhole sensor onto the camera sensor (Eq. (18)). Furthermore, as it is relevant for the surgeon to know how far off from the actual lymph node they will initially place and drive the biopsy tools, the estimated augmentation is reprojected and compared with the world point in world units (mm), and its error is expressed as ε(mm). Note that in Fig. 6 we omit the drawing of the pinhole and show the already transformed activities. To calculate this reprojection, we proceed as follows, using the notation from the sections above.
-
Let X′ be the known world point (true source location).
-
Let \(\tilde {\mathbf {y}^{\prime }}\) be the projection of the world point onto the image plane.
-
Let \(\hat {\mathbf {y}^{\prime }}\) be the augmentation, based on a depth estimate.
-
Let \(\hat {\hat {\mathbf {X}}}^{\prime }\) be the reprojection of \(\hat {\mathbf {y}^{\prime }}\) to yield the virtual source location.
-
Let {C′} be the coordinate system of the camera.
The augmentation \(\hat{\mathbf{y}^{\prime}}\) is calculated according to Eq. (18). To get \(\hat{\hat{\mathbf{X}}}^{\prime}\), we construct the plane Q through X′ whose normal vector \(\mathbf{n}=(n_{1},n_{2},1)^{T}\) is parallel to the line l (the reprojection) through the origin of {C′} and \(\hat{\mathbf{y}^{\prime}}\). We then find the intersection of this line with the plane to get the virtual source location in the coordinate system defined by {C′}. The plane Q is defined as
$$ Q: \left(\begin{array}{c} p_{1} - x_{1}^{\prime}\\ p_{2} - x_{2}^{\prime}\\ p_{3} - x_{3}^{\prime} \end{array}\right) \cdot \left(\begin{array}{c} n_{1}\\ n_{2}\\ 1 \end{array}\right) = \left(\begin{array}{c} p_{1} - x_{1}^{\prime}\\ p_{2} - x_{2}^{\prime}\\ p_{3} - x_{3}^{\prime} \end{array}\right) \cdot \left(\begin{array}{c} \hat{y}_{1}^{\prime}\\ \hat{y}_{2}^{\prime}\\ 1 \end{array}\right) = 0 $$
(24)
where \(\mathbf{p}=(p_{1},p_{2},p_{3})^{T}\) are points on the plane.
The line l through the origin \(\mathrm{C'}_{o}=\mathbf{0}\) of {C′} and \(\hat{\mathbf{y}^{\prime}}\) is given by
$$ l: \mathrm{C'}_{o} + \lambda \left(\begin{array}{c} \hat{y}_{1}^{\prime}\\ \hat{y}_{2}^{\prime}\\ 1 \end{array}\right) = \lambda \left(\begin{array}{c} \hat{y}_{1}^{\prime}\\ \hat{y}_{2}^{\prime}\\ 1 \end{array}\right). $$
(25)
To get \(\hat {\hat {\mathbf {X}}}^{\prime }\), we need to find λ such that
$$ \left(\begin{array}{c} \lambda\hat{y}_{1}^{\prime} - x_{1}^{\prime}\\ \lambda\hat{y}_{2}^{\prime} - x_{2}^{\prime}\\ \lambda - x_{3}^{\prime} \end{array}\right) \cdot \left(\begin{array}{c} \hat{y}_{1}^{\prime}\\ \hat{y}_{2}^{\prime}\\ 1 \end{array}\right) = 0 \Longleftrightarrow \lambda \left\lVert\left(\begin{array}{c} \hat{y}_{1}^{\prime}\\ \hat{y}_{2}^{\prime}\\ 1 \end{array}\right)\right\rVert^{2} = \left(\begin{array}{c} x_{1}^{\prime}\\ x_{2}^{\prime}\\ x_{3}^{\prime} \end{array}\right) \cdot \left(\begin{array}{c} \hat{y}_{1}^{\prime}\\ \hat{y}_{2}^{\prime}\\ 1 \end{array}\right) \Longleftrightarrow \lambda = \frac{\mathbf{X^{\prime}} \cdot \hat{\mathbf{Y}^{\prime}} }{\left\lVert\hat{\mathbf{Y}^{\prime}}\right\rVert^{2}} $$
(26)
with \(\hat {\mathbf {Y}^{\prime }} = ({\hat {y}_{1}^{\prime }}, {\hat {y}_{2}^{\prime }}, 1)^{T}\).
The coordinates of \(\hat {\hat {\mathbf {X}}}^{\prime }\) in the coordinate system of {C′} are thus given by
$$ \hat{\hat{\mathbf{X}}}^{\prime}=\lambda \hat{\mathbf{Y}^{\prime}} = \frac{\mathbf{X^{\prime}} \cdot \hat{\mathbf{Y}^{\prime}}}{\left\lVert\hat{\mathbf{Y}^{\prime}}\right\rVert^{2}}\hat{\mathbf{Y}^{\prime}}. $$
(27)
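The reprojection condenses into a few lines. The sketch below (illustrative values, not the authors’ code) implements Eqs. (26)+(27) and derives ε(mm) as the L2 distance between the virtual and the true source location:

```python
# Sketch of the reprojection, Eqs. (26)+(27), and the metric eps(mm).
import numpy as np

def reproject(X_p, y_hat):
    """Virtual source location from the augmentation y_hat = (y1^', y2^')."""
    Y_hat = np.array([y_hat[0], y_hat[1], 1.0])
    lam = (X_p @ Y_hat) / (Y_hat @ Y_hat)      # Eq. (26)
    return lam * Y_hat                         # Eq. (27)

X_p = np.array([3.0, -4.0, 115.0])             # true source in {C'} (mm)
X_hat = reproject(X_p, y_hat=(0.02, -0.04))    # virtual source location
eps_mm = np.linalg.norm(X_hat - X_p)           # eps(mm), cf. Fig. 7
```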
Error curves for varying depth estimates \(\hat{x}_{3}^{\prime}\) can then be constructed, as sketched below. We also introduce an upper error bound to assess ε. As the resulting augmentation needs to be as close as possible to the center of the true source location (i.e., a lymph node, diameter ≈ 5 mm), we do not tolerate augmentation errors larger than 5 mm (Fig. 7). Furthermore, as the error can only be properly evaluated during calibration, the curves need to be read as an expected augmentation error with respect to the estimated (unknown) depth.
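Such a curve can, for instance, be generated by sweeping the depth estimate, reusing augment() and reproject() from the sketches above together with the example pose of the Eq. (11) check (all values assumed):

```python
# Sketch: expected eps(mm) over a range of depth estimates, to be compared
# against the 5 mm bound. Reuses R, t, x from the Eq. (11) check and the
# augment()/reproject() sketches; all values are examples.
import numpy as np

X_p = R @ x + t                                # true source in {C'} (Eq. (1))
y_t = (x[0] / x[2], x[1] / x[2])               # measured y~ in {C}
depths = np.linspace(80.0, 160.0, 81)          # candidate estimates (mm)
curve = [np.linalg.norm(reproject(X_p, augment(y_t, R, t, d)[:2]) - X_p)
         for d in depths]                      # eps(mm) per depth estimate
```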
Hardware
The design and layout of our multi-pinhole collimator are shown in Figs. 8 and 9. The collimator is a tungsten-based device with a specific field of view, given the width and length of its pinhole compartments (Fig. 18). Each evaluated compartment of a camera/pinhole pair is marked accordingly on the gamma sensor image in the “Results” section. The thickness of the front plate (1 mm) and the length of the septa (compartment walls, 35 mm) are calculated such that the probability of background photons penetrating the shields is at most 5%. The collimator frame measures 86 mm in width, 36 mm in height, and 37 mm in depth; it weighs 300 g.
In this study, we used endoscopic cameras (model NanEye, AMS AG, Premstätten, Austria) measuring 1 mm × 1 mm × 1.7 mm in width, depth, and height, respectively. Their resolution is 250×250 pixels with a pixel size of 3 μm × 3 μm and thus an aspect ratio of 1:1. The effective focal length is 660 μm. The built-in optics are wide-angle lenses with an f-number of 2.7, an aperture of 244 μm, and an optimal focus range of 8–75 mm. The cameras are mounted and fixed on a 3D-printed frame (ABS) matching the pinholes of the collimator. Thanks to this mounting, the cameras are mechanically constrained such that their lateral movement cannot exceed 0.5 mm and their distance to the pinholes is kept at a maximum of 3.0 mm (cf. Fig. 4). As the optical cameras are exposed to high-energy photons, a deteriorating effect might be expected; however, we did not observe any negative impact on the performance of the sensors over the course of our experiments.
Our industrial collaborator DECTRIS (Baden-Dättwil, Switzerland) provided us with a 2D gamma detector prototype with a native resolution of 487×195 pixels and a pixel size of 172 μm×172 μm. The detector technology of DECTRIS is based on Hybrid Photon Counting (HPC) and cadmium telluride (CdTe) sensor material [12]. Its quantum efficiency (QE) at 140.5 keV is 31% [13]. A high QE is crucial as only ≈ 1% or less of the injected tracer activity arrives in the lymphatics. Its dimensions are 110 mm in width, 83 mm in height, and 109 mm in depth. Its weight is 1494 g.
The overall dimensions of the combined device (collimator, gamma detector) are 110 mm in width, 83 mm in height, and 126 mm in depth.
For the experiments (cf. “Results” section), the cameras and pinholes are initially co-calibrated at a distance of 110 mm to yield T according to Eq. (23), using a checkerboard with 6 mm × 6 mm tiles for automatic pose estimation (“Calibration” section). Each gamma activity blob is pre-processed using a truncated Gaussian blur filter (σ=2, kernel size 3×3) and a minimum threshold (both operator-adjustable parameters) to improve the visualization, as sketched below. This step is needed as the background signal from unwanted gamma photons (scattered photons) is spread over the sensor area and impairs the augmentation.
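A sketch of this pre-processing step, under the assumption that the raw detector frame is available as a count image, could look as follows (the default threshold value is an arbitrary example):

```python
# Sketch of the blob pre-processing: truncated Gaussian blur (sigma = 2 on a
# 3x3 kernel) followed by a minimum threshold to suppress scatter background.
import numpy as np
import cv2

def preprocess(gamma_img, min_counts=5.0):
    blurred = cv2.GaussianBlur(gamma_img.astype(np.float32), (3, 3), sigmaX=2)
    blurred[blurred < min_counts] = 0.0        # operator-adjustable threshold
    return blurred
```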