Measurement error and reliability of three available 3D superimposition methods in growing patients

Introduction: Cone-Beam Computed Tomography (CBCT) images can be superimposed, allowing three-dimensional (3D) evaluation of craniofacial growth/treatment effects. Limitations of 3D superimposition techniques are related to imaging quality, software/hardware performance, reference areas chosen, and landmark points/volumes identification errors. The aims of this research are to determine/compare the intra-rater reliability generated by three 3D superimposition methods using CBCT images, and compare the changes observed in treated cases by these methods.
Methods: Thirty-six growing individuals (11–14 years old) were selected from patients that received orthodontic treatment. Before and after treatment (average 24 months apart) CBCTs were analyzed using three superimposition methods. The superimposed scans with the two voxel-based methods were used to construct surface models and quantify differences using SlicerCMF software, while distances in the landmark-derived method were calculated using Excel. 3D linear measurements of the models superimposed with each method were then compared.
Results: Repeated measurements with each method separately presented good to excellent intraclass correlation coefficient (ICC ≥ 0.825). ICC values were the lowest when comparing the landmark-based method and both voxel-based methods. Moderate to excellent agreement was observed when comparing the voxel-based methods against each other. The landmark-based method generated the highest measurement error.
Conclusions: Findings indicate good to excellent intra-examiner reliability of the three 3D superimposition methods when assessed individually. However, when assessing reliability among the three methods, ICC demonstrated weaker agreement. The measurements with two of the three methods (CMFreg/Slicer and Dolphin) showed similar mean differences; however, the accuracy of the results could not be determined.


Introduction
Monitoring treatment progress and outcomes is pivotal to patient care [1]. Therefore, an important part of orthodontic treatment involves the study of longitudinal changes induced by growth and treatment in the dentofacial complex in individual patients [2][3][4][5]. Superimposing tracings of serial lateral cephalograms has facilitated knowledge about normal craniofacial growth and development as well as knowledge about the treatment effects produced by various orthodontic, orthopedic, and surgical procedures [3,6]. A reference system is required for a superimposition to be able to determine exactly what and where changes occurred. Such references must be consistently visible in the cephalograms of the individual, and they must be stable within the time frame of the observation period [3,7].
Several studies [8][9][10][11][12][13][14] have proposed the use of the anterior cranial base as reference for superimposition since there is little or no growth after 7-8 years of age when the spheno-ethmoidal synchondrosis ceases to grow. After that time a number of structures especially those associated with neural tissues remain stable and can be relied upon for superimposition [1].
Many types of superimposing methods have been used for 2D lateral cephalograms. However, 2D imaging does not fully represent a 3D structure, because much of the information is lost when 3D structures are depicted as 2D images [15][16][17]. Thus, while 2D cephalometric superimposition is the conventional method used to evaluate craniofacial growth and treatment outcomes, superimposition of CBCT scans, nowadays, allows a 3D visualization of these effects. Similar to cephalometric tracings, 3D models constructed from CBCT scans can be superimposed manually by registering common stable landmarks or by best fit of stable anatomical regions [18][19][20].
Three general methods of 3D cephalometric superimposition are well published and used for clinical diagnosis and assessment of orthodontic treatment outcomes: (1) voxel-based, (2) point/landmark-based, and (3) surface-based. For overall superimposition, these methods use parts of the anterior cranial base as the reference structure. The anterior cranial base is known to have completed most of its growth before the adolescent growth spurt, which makes it a fairly stable reference for CBCT superimposition [14,21].
Most of the limitations of 3D superimposition techniques are related to variability in imaging, landmark identification errors, and software/hardware-related errors. In addition, most of the methods currently proposed [22][23][24][25] for clinical settings are quite time-consuming. Thus, a precise, reliable and efficient system to analyze images produced by 3D imaging is needed. Therefore, this study analyzed two voxel-based [CMFreg (Craniomaxillofacial registration) and Dolphin] and one point/landmark-based (LMD) superimposition methods. The voxel-based and the landmark-based methods have been previously validated; hence, this study evaluated the reliability and measurement error of the three methods when aligning the pre- and post-growth/treatment images, to provide clinicians with information about the reproducibility of the structural changes produced by growth and treatment effects in children and adolescents.

Material and methods
A retrospective, observational longitudinal study was carried out on individuals who received comprehensive orthodontic treatment at the University of Alberta. Thirty-six patients with available pre- and post-treatment CBCTs were selected from a population of 11- to 14-year-old teenagers. The mean age of patients at the time of the initial CBCT was 12.4 ± 0.9 years (Cervical Vertebrae Maturation index [CVM] stage 3-4).
The mean age at final CBCT was 14.3 ± 0.8 years. The sample included seventeen males and nineteen females.
The interval between the pre-treatment (T1) and post-treatment (T2) scans ranged from 22 to 25 months. Fourteen patients presented Class I malocclusion, eight mild Class II malocclusion and fourteen mild Class III malocclusion. All patients received non-extraction treatment that included rapid maxillary expansion, full fixed appliances, and intermaxillary elastics.
This study only analyzed previously gathered data from patients who participated in randomized clinical trials. No additional imaging was requested for these patients. Ethics approval for secondary data analysis was obtained from the Institutional Health Research Ethics Board at the University of Alberta.
CBCT volumetric data were taken using the iCAT New Generation Volumetric Scanner at 120 kV, 5 mA, and 8.9 s. Images were obtained and converted to Digital Imaging and Communications in Medicine (DICOM) format using the iCAT software with a voxel size of 0.3 mm.
Analysis of the images was carried out by one researcher using the respective superimposition techniques (CMFreg/Slicer, Dolphin and landmark-derived). Extensive training was required prior to superimposing with each method. Intra-observer reliability within each method was assessed using ten images and two repetitions each, with the measurement trials at least 1 week apart. For the voxel-based methods, reliability was tested twice, with ten cases each: once by performing a second superimposition with registration at the cranial base, and once by retracing the landmarks only.
Reliability among the three methods was performed using the complete sample; the first trial of thirty-six cases of each method was used. Ten landmarks, used in previous studies [7,23,26,27,28,29,30], were marked on three-dimensional images at T1 and T2 with each of the three methods to assess reliability (Table 1).

Table 1 Landmark definition
Maxilla
ANS: The tip of the bony anterior nasal spine, in the median plane
A-Point: The point at the deepest midline concavity on the maxilla between the anterior nasal spine and prosthion

Voxel-based CMFreg/slicer method
This method uses two different open-source programs ITK-Snap (http://www.itksnap.org) and 3D Slicer (http://www.slicer.org). Using ITK-Snap software program (version 2.0.0) T1 and T2 DICOM files were opened and converted to GIPL (Guys Imaging Processing Lab) format for easy processing. Segmentations then were created using the GIPL.GZ files for both pre and post treatment scans using the 3D Slicer software program (version 4.7.0) to construct 3D volumetric label maps. Then, surface models were created using the T1 segmentation in 3D Slicer to re-orient the head to establish a common coordinate system across subjects for group comparisons [31]. Once the head orientation step was completed, the T2 image was manually approximated in relation to T1 image using 3D Slicer. ITK-Snap was used to segment the area of the cranial base to be used as a reference for the superimposition using semi-automatic segmentation.
The registration (superimposition) of the T2 image upon the T1 image was carried out on the segmented cranial base, using the craniomaxillofacial tool and the setting growing rigid automatic registration in 3D Slicer. During the superimposition, T2 was reoriented guided by the best fit of the outlines of the anterior cranial base and automatically superimposed on a static T1, creating a registered T2 surface model. Once the superimposition was completed, the T1 scan and segmentation, as well as the registered T2 scan and segmentation, were landmarked using ITK-Snap. Ten 3D landmarks were identified using the three views (axial, sagittal and coronal) for consistency of landmark location. After placing the defined landmarks to T1 and T2 images, 3D surface models were created using 3D Slicer. These models were utilized to measure the absolute differences between the pre and post-treatment images by applying the Q3DC module (Quantification in 3D and directional changes in each plane of the three planes of space). 3D linear distances between T1 and T2 of corresponding landmarks were quantified in the transversal (x-axis), antero-posterior (y-axis) and vertical (z-axis) direction (Figs. 1, 2, 3, and 4).

Flow Diagram CMFreg/slicer Method. This method uses two different programs ITK-Snap and 3D Slicer. T1 and T2 DICOM files are initially opened and converted to GIPL using ITK-Snap. Segmentations then are created using the GIPL.GZ files for both pre and post treatment scans using the 3D Slicer to construct 3D volumetric label maps. Surface models are created after using the T1 scan and segmentation in 3D Slicer to re-orient the head [1]. Once the T1 scan has been reoriented, the registration (superimposition) of the T2 image upon the T1 image is carried out on the segmented cranial base. Then T1 and T2 images are landmarked using ITK-Snap and new models are created to measure the absolute differences between the pre and post-treatment images.
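The per-axis and overall 3D differences described for this method reduce to simple coordinate arithmetic. The following Python sketch is only an illustration and is not the Q3DC implementation: the function name `displacement` and the ANS coordinates are hypothetical, and the axis convention (x transverse, y antero-posterior, z vertical) is taken from the text.

```python
import math


def displacement(t1, t2):
    """Per-axis differences (T2 - T1) and 3D Euclidean distance, in mm.

    t1, t2: (x, y, z) coordinates of the same landmark on the T1 model and
    on the registered T2 model, expressed in a common coordinate system.
    """
    dx, dy, dz = (b - a for a, b in zip(t1, t2))
    return {
        "x_transverse": dx,
        "y_antero_posterior": dy,
        "z_vertical": dz,
        "distance_3d": math.sqrt(dx * dx + dy * dy + dz * dz),
    }


# Hypothetical ANS coordinates (mm); not data from the study.
ans_t1 = (0.0, 70.0, -40.0)
ans_t2 = (1.0, 72.0, -42.0)

change = displacement(ans_t1, ans_t2)
print(change)
```

The signed components keep the direction of change in each plane, while `distance_3d` is always non-negative, which is why the two are usually reported side by side.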

Landmark-derived method
Using AVIZO software, the DICOM files were rendered into a volumetric image using 512 × 512 matrices giving a range of 400-420 DICOM slices. Sagittal, axial and coronal multiplanar slices, as well as the 3D image reconstructions, were used to determine the position of the seven landmarks used to superimpose the T1 and T2 images.
Given the coordinates of three reference landmarks for a plane, 3D visualization software can compute the plane; however, entering the three-point coordinates is usually a time-consuming, repetitive manual process. The same applies to determining a perpendicular distance. To resolve this issue, this study reproduced the mathematical procedure in Microsoft Excel, so that the reference planes and perpendicular distances were recalculated automatically whenever the landmark coordinates were updated. The seven landmarks used to superimpose the T1 and T2 images were the left and right external auditory meatus, left and right foramen spinosum, left and right foramen ovale, and dorsum foramen magnum; the ten landmarks used to assess reliability and measurement error were located with the same AVIZO views. Once the data were optimized in Matlab, linear distances between the 3D coordinates were calculated using the Euclidean distance formula in Excel.
Four landmarks were required to define a 3D anatomical reference co-ordinate system. The left and right external auditory meatus (EAML and EAMR, respectively) and the dorsum foramen magnum (DFM) were selected as suggested by previous research. The fourth point, ELSA, defined as the midpoint between the left and right foramen spinosum [32], was selected as the origin of the new Cartesian co-ordinate system. From the origin, 3D positional co-ordinates for the EAML, EAMR and DFM were determined [7].
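The source does not reproduce its spreadsheet formulas. As a rough illustration of the vector arithmetic such a sheet would need, the Python sketch below computes a plane from three landmarks, the perpendicular distance from a fourth point to that plane, and an ELSA-style origin as the midpoint of two landmarks. All function names and coordinates are hypothetical, and the optimization step performed in Matlab is not covered.

```python
import math


def sub(a, b):
    """Component-wise a - b for 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def midpoint(a, b):
    """Midpoint of two landmarks, e.g. left and right foramen spinosum."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def plane_from_points(p1, p2, p3):
    """Return (unit normal, point on plane) for the plane through p1, p2, p3."""
    normal = cross(sub(p2, p1), sub(p3, p1))
    length = math.sqrt(dot(normal, normal))
    if length == 0:
        raise ValueError("the three landmarks are collinear")
    return tuple(c / length for c in normal), p1


def signed_distance(point, plane):
    """Perpendicular distance from point to plane; the sign gives the side."""
    unit_normal, p0 = plane
    return dot(sub(point, p0), unit_normal)


# Hypothetical coordinates (mm); not data from the study.
fs_left, fs_right = (-28.0, 4.0, 2.0), (30.0, 6.0, 4.0)
elsa = midpoint(fs_left, fs_right)  # origin of the new coordinate system

plane = plane_from_points((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
dist = signed_distance((2.0, 3.0, 5.0), plane)
print(elsa, dist)
```

Subtracting `elsa` from every landmark with `sub` expresses the coordinates relative to the new origin; orienting the axes would additionally require the remaining reference landmarks.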
The optimization formulation used in this study was the 6-point algorithm, which not only optimizes the location of the same three points (i.e. EAML, EAMR and DFM) used in the 4-point algorithm but also includes both foramen ovale [right and left (FOR and FOL)] in each image [33,34]. The addition of two extra landmarks (FOR and FOL) in the optimization analysis was shown to reduce the envelope of error when determining the co-ordinate system [7]. Once the data were optimized, linear distances between the 3D coordinates were calculated using the Euclidean distance formula. Each landmark was included in multiple linear measurements of different orientations so that all dimensions (superior-inferior, anterior-posterior, right-left) could be assessed (Figs. 5 and 6).

Voxel-based Dolphin method
For each patient, T1 and T2 CBCT images were approximated using four landmarks located at the right and left frontozygomatic sutures and the right and left mental foramen, and superimposed on the cranial base using the voxel-based superimposition tool in Dolphin 3D (Chatsworth, CA; version 11.8.06.15 premium). The area of the cranial base used for superimposition was defined by a red box in the three different multiplanar views (axial, sagittal and coronal). The superimposition was achieved by moving the T2 image in relation to the T1 image, creating a registered T2 image. No head orientation procedure was performed, as Dolphin software does not have such a tool.
Then the slice views (axial, sagittal and coronal) were used to confirm the precision of the Dolphin 3D superimposition. Once this step was completed, the registered post-treatment scans were exported as DICOM files and opened in ITK-Snap software to convert them into GIPL format, similar to the procedure done with the CMFreg/Slicer method. 3D Slicer was then used to segment the whole skull using the Intensity Segmenter tool, with the same intensity level for all cases to remove any potential error due to the segmentation process. Thus, a surface model of the post-treatment segmentation was created for each patient. Then T1 and T2 images were ready for landmarking using ITK-Snap.
After placing the defined landmarks on the pre- and post-treatment images, 3D surface models were created using 3D Slicer for all the levels used in ITK-Snap. These models were utilized to measure the absolute differences between the pre- and post-treatment images using the Q3DC tool in 3D Slicer.

Statistical analysis
For all tests, the statistical significance was set at P-value of 0.05.

Intra-examiner reliability of 3D superimposition per method
Intraclass Correlation Coefficient (ICC) was used to measure the level of agreement between the two repeated measurements of 3D linear distances (difference between T2-T1) within each method by the principal investigator. Paired-sample T-test was performed to compare the means of corresponding measurements following the first and second superimpositions with registrations at the anterior cranial base and the first superimposition with registration at the cranial base and the landmark retracing.

Flow Diagram Dolphin Method. T1 and T2 CBCT images are approximated using 4 landmarks located at the right and left frontozygomatic sutures and the right and left mental foramen and superimposed on the cranial base. Then the slice views (axial, sagittal and coronal) are used to confirm the precision of Dolphin 3D superimposition. Once this step is completed, the registered post-treatment scans are exported as DICOM files and opened in ITK-Snap software to convert them into GIPL format. After placing the defined landmarks to pre and post-treatment images, 3D surface models were created using 3D Slicer. 3D linear distances between T1 and T2 of corresponding landmarks are then quantified and color-coded maps are created only for both voxel-based methods (CMFreg/Slicer and Dolphin).
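The two statistics named in this subsection can be sketched in a few lines of Python. The source does not state which ICC form was used, so the example assumes a two-way random-effects, absolute-agreement, single-measurement coefficient, often written ICC(2,1); the function names and the sample values are hypothetical. The paired comparison returns only the mean difference and the t statistic; a P-value would additionally need the Student t distribution, for example from a statistics package.

```python
import math


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: one row per case, one column per repeated trial.
    Assumption: the source does not specify its ICC model; this is one
    common choice for test-retest agreement.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two cases and two trials")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-case mean square
    msc = ss_cols / (k - 1)              # between-trial mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def paired_t(first, second):
    """Mean of (second - first) and the paired-sample t statistic."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean, mean / math.sqrt(var / n)


# Hypothetical T2-T1 distances (mm) for four cases, trial 1 vs trial 2.
trial_1 = [2.0, 3.1, 4.2, 1.9]
trial_2 = [2.2, 3.0, 4.5, 2.0]

icc = icc_2_1([list(pair) for pair in zip(trial_1, trial_2)])
mean_diff, t_stat = paired_t(trial_1, trial_2)
print(round(icc, 3), round(mean_diff, 3), round(t_stat, 3))
```

Because this form measures absolute agreement, a constant offset between trials lowers the coefficient even when the two trials rank the cases identically.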

Intra-examiner reliability of 3D superimposition among methods
ICC was used to assess the level of agreement between the measurements of 3D linear distances (difference between T2-T1) among all the three methods. 3D changes in the craniofacial complex with each method were assessed by one-way repeated measures analysis of variance (ANOVA) followed by post-hoc analysis.

Results
A summary of results is presented in Tables 2, 3.

Intra-examiner reliability of 3D superimposition per method
Voxel-based CMFreg/slicer method: first and second Cranial Base superimposition
Using ten pre-determined 3D linear distances, good to excellent agreement for intra-examiner reliability was found on all skeletal landmarks as indicated by an ICC ≥ 0.904. All these ICC values were considered acceptable; however, lower bound of CI of two landmarks (APoint and OrR) were below 0.50 (Table 2).

Voxel-based CMFreg/slicer method: first Cranial Base superimposition and landmark retracing only
Good to excellent agreement for intra-examiner reliability was found on all skeletal landmarks in the 3D measurements as indicated by an ICC ≥ 0.900. All lower bound of CI were above 0.50 (Table 2).
Table 3 shows the differences between the first and second superimposition with registration at the anterior cranial base. Mean differences between both superimpositions were less than 0.67 mm. No statistically significant differences were found at any landmark (P-values > 0.05). Table 3 also shows the differences between the first superimposition with registration at the anterior cranial base and the landmark retracing. Mean differences between both trials were less than 0.74 mm. No statistically significant differences were found at any landmark (P-values > 0.05).

Landmark-derived method
Excellent agreement for intra-examiner reliability was found on eight skeletal landmarks in the 3D measurements as indicated by an ICC ≥ 0.913. OrL and PNS showed good and moderate intra-examiner reliability respectively (ICC ≥ 0.712). All these ICC values are considered acceptable; however, the lower bounds of the CI of two landmarks (OrL and PNS) were below 0.50 (Table 4). Mean differences between the first and second superimpositions were as high as 1.168 mm. Statistically significant differences were found at five skeletal landmarks: PNS, OrL, Menton, BPoint, and GoL (P-values < 0.05) (Table 5).

Voxel-based dolphin method: first and second Cranial Base superimposition
Excellent agreement for intra-examiner reliability was found on all skeletal landmarks in the 3D measurements as indicated by an ICC ≥ 0.905 (Table 6). Excellent agreement for the intra-examiner reliability was observed on all skeletal landmarks in the 3D measurements as indicated by an ICC ≥ 0.916, when only landmarks were retraced (Table 6). Table 7 shows the differences between the first and second superimposition with registration at the anterior cranial base. Mean differences between both superimpositions were less than 0.4 mm. No statistically significant differences were found at any skeletal landmark (P-values > 0.05). Table 7 also shows the differences between the first superimposition with registration at the anterior cranial base and the landmark retracing. Mean differences between both trials were less than 0.26 mm. No statistically significant differences were found at any skeletal landmark (P-values > 0.05).

Intra-examiner reliability of 3D superimposition among methods
Good agreement for the intra-examiner reliability was observed only at GoL (ICC = 0.759) when the three 3D superimposition methods were evaluated. Menton, BPoint and GoR showed moderate agreement as indicated by an ICC ≥ 0.549 (Table 8).
When assessing both voxel-based methods (CMFreg/Slicer and Dolphin), excellent agreement for intra-examiner reliability was noted on four skeletal landmarks (Me, BPoint, GoR and Pg) in the 3D measurements as indicated by an ICC ≥ 0.904 (Table 8). However, when assessing the voxel-based CMFreg/Slicer and the landmark-derived methods, moderate agreement was found only at GoL (ICC = 0.538). The rest of the skeletal landmarks showed poor agreement, with ICC values as low as −0.137 (Table 9). A similar trend was observed when assessing the voxel-based Dolphin and the landmark-derived methods, with moderate agreement for the intra-examiner reliability only at GoL (ICC = 0.717). The rest of the skeletal landmarks showed poor agreement, with ICC values as low as −0.081 (Table 9).
The one-way repeated measures ANOVA revealed a statistically significant difference between the mean T2-T1 distances when comparing the CMFreg/Slicer method to the landmark-derived method, and when comparing the Dolphin method to the landmark-derived method, in the overall 3D assessment for all dependent variables (Table 10).
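The agreement labels used in this section (poor, moderate, good, excellent) can be reproduced from the reported coefficients with a small helper. The source does not state its cut-offs; the sketch below assumes the commonly used thresholds of 0.50, 0.75 and 0.90, which are consistent with every label quoted above, and the handling of values exactly on a boundary is a further assumption. The function name `icc_label` is hypothetical.

```python
def icc_label(icc):
    """Map an ICC estimate to a qualitative agreement label.

    Assumed cut-offs (not stated in the source): below 0.50 poor,
    0.50 to below 0.75 moderate, 0.75 to 0.90 good, above 0.90 excellent.
    """
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# ICC values quoted in the text for comparisons among methods.
reported = {
    "GoL, three methods": 0.759,
    "voxel-based methods, lowest of four landmarks": 0.904,
    "GoL, CMFreg/Slicer vs landmark-derived": 0.538,
    "GoL, Dolphin vs landmark-derived": 0.717,
    "lowest, CMFreg/Slicer vs landmark-derived": -0.137,
}
labels = {name: icc_label(value) for name, value in reported.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Labels of this kind are usually read together with the confidence interval, since a point estimate in one band can have a lower bound in another.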

Discussion
Historically, cranial base superimposition of serial lateral cephalograms has provided clinicians with a visual assessment of overall hard and soft tissue changes resulting from treatment, either orthodontic, orthopedic or orthognathic surgery; and/or growth during a time frame. One of the major disadvantages of using a conventional cephalometric analysis is that 3D information is depicted as 2D data and often limited to midline structures. Improvements in image registration algorithms have led to the development of new methods for CBCT volume superimposition to overcome the issues faced with generated 2D images. The challenge of image registration is to superimpose CBCT volumes of patients with craniofacial changes due to the normal growth and/or treatment response at different time-points. In these situations, the different CBCT volumes may have dissimilar imaging acquisition, field of view, and dental/skeletal components modified by growth and/or treatments, making the registration process more difficult and prone to failure. Therefore, this study aimed to compare three commonly used 3D superimposition methods and determine if they can reliably be used to superimpose T1 and T2 CBCT images of growing patients registered at the anterior cranial base and if there is any difference among them.
The reliability of the three 3D superimposition methods was tested in this study by calculating the mean linear distances between the two models (T2-T1) at ten different anatomic regions. When the methods were analyzed individually, the ICC results showed good to excellent agreement for the intra-examiner reliability with the CMFreg/Slicer and landmark-derived methods, and excellent intra-examiner reliability when CBCT images were superimposed with the Dolphin method. The slightly higher agreement observed with the Dolphin method could simply reflect the examiner's accumulated expertise, since this was the last method assessed. Similar, although weaker, results were reported by Nada et al. [35], who tested the reproducibility of CBCT superimposition on the anterior cranial base and the zygomatic arches using voxel-based image registration of 3D CBCT scans from sixteen adult patients who underwent combined surgical orthodontic treatment. When the models were registered at the anterior cranial base, intra-observer reliability was reported to be moderate to good between the repeated superimpositions: the ICC ranged between 0.53 and 0.94 and the mean distances between the two models registered on the zygomatic arch remained within 0.5 mm. Likewise, Cevidanes et al. [22] studied the variability between observers in quantification of treatment outcome only using color-coded distance maps for different anatomic regions on 3D CBCT models registered on the anterior cranial base using a voxel-based method. They reported an inter-examiner range of measurements across anatomic regions equal to or less than 0.5 mm, which they considered to be clinically insignificant.
The reproducibility of the registration was also tested on both voxel-based (CMFreg/Slicer and Dolphin) methods. No evident differences were found between the first and second cranial base registrations, or between the first registration and the landmark retracing only, as demonstrated by an excellent agreement for the intra-examiner reliability. In addition, paired t-tests showed no statistically significant mean differences for either the repeated superimposition or the landmark retracing. Since differences ≤0.4 mm are not likely clinically significant, the registration process of the CMFreg/Slicer and Dolphin methods can be considered clinically reproducible. These results are in agreement with the reports from Cevidanes et al. [22], who assessed cranial base superimposition in growing patients, and from Nguyen et al. [36] and Ruellas et al. [30], who tested regional superimpositions and demonstrated a similar range in their findings.
On the other hand, when assessing reliability among the three methods, the ICC demonstrated weaker agreement with a wide confidence interval. ICC values were the lowest when comparing the landmark-derived method and the voxel-based (CMFreg/Slicer and Dolphin) methods. However, moderate to excellent agreement was observed for the intra-examiner reliability when comparing the voxel-based methods against each other, even though the head orientation procedure was not performed with the Dolphin method. Ruellas et al. [31] have shown that the amount of directional change in each plane of 3D space is strongly influenced by head orientation, and the precise assessment of direction of change requires a common 3D coordinate system.
From the results of this study, the three 3D superimposition methods demonstrated an overall 3D change in the craniofacial complex during an average of 24 months of evaluation (mean age of 12.4 years, CVM 3-4, at initial records). Both voxel-based methods (CMFreg/Slicer and Dolphin) showed similar mean differences between T1 and T2 images, with no statistically significant difference between them. On the other hand, the landmark-derived method exhibited mean differences up to twice those obtained with either of the voxel-based methods in the overall 3D assessment. When the methods assessed the changes at each landmark per component, eight skeletal landmarks (ANS, APoint, PNS, Menton, BPoint, GoR, GoL and Pg) showed the highest variation in the superior-inferior component, in the inferior direction, and two skeletal landmarks (OrR and OrL) in the antero-posterior component, with anterior drift. Similar to the overall 3D evaluation, the landmark-derived method exhibited the highest mean differences when assessed per component, with the superior-inferior component demonstrating the most substantial variation (Appendices I-II).
According to the present study, the landmark-derived method generated magnified errors, since the 3D linear distances were higher than those of the other two methods at all the defined landmarks. Although the method showed moderate to excellent agreement for the intra-examiner reliability when assessed individually, poor to moderate agreement was observed when all the methods were evaluated simultaneously. These results contradict the findings of DeCesare [7], who reported a reduced envelope of error when the co-ordinate system was determined using the 6-point optimization algorithm instead of the 4-point algorithm. Although the landmark-derived registration method uses a number of landmarks as reference, which could be susceptible to landmark identification errors, reliability in landmark identification was determined to be adequate. Therefore, a potential reason for the reduced reliability and increased measurement error may be the lack of stability of the reference areas, as the landmarks used to superimpose the pre- and post-treatment images are located in the medial and posterior cranial base, which are known to be unstable areas due to the growth and remodeling that occur during childhood and adolescence [1,14,37,38].
The magnitude of variation obtained with both voxel-based methods (CMFreg/Slicer and Dolphin) appears to be within the range of change observed by previous research [39][40][41][42][43][44][45]. However, as none of these methods is considered the gold standard for 3D superimposition (the validity standard against which results would be compared), the accuracy of the results cannot be determined. Therefore, it is unknown whether the amount of change generated by the two voxel-based (CMFreg/Slicer and Dolphin) methods is closer to the real value, or whether the landmark method is the one closer to the truth. Nevertheless, it is a good start to know that two similar computing-based superimposition methods generated quite similar measurements (Table 11). In addition, as the included individuals had orthodontic treatment, it is not possible to verify whether the amount of change seen at the specific landmarks in the maxilla and mandible was due to growth only, or to a combination of growth and treatment effects. Consequently, even with the availability of 3D imaging, quantification of growth/treatment is still an area for research.

Limitations
The biggest limitation of this study is the lack of a gold standard (ground truth) for 3D superimposition. Thus, although two out of the three methods tested in this study showed very minor differences between them and the mean differences were not statistically significant, it is not possible to determine the accuracy of the results.
Another important limitation is the use of a single investigator and the significant learning curve that all three 3D superimposition methods used in this study required. The CMFreg/Slicer method had the highest level of complexity among the three methods and used two different software programs (3D Slicer and ITK-Snap) throughout the process. Although it includes systematic steps to obtain a high level of precision, it is highly time-consuming. The Dolphin method, on the other hand, is faster and more user-friendly; however, to quantify changes, scans must be loaded in ITK-Snap for landmark placement and then measured using the Q3DC tool in 3D Slicer. These additional steps increase the working time and process complexity. The landmark-derived method appears to be simpler, since it only requires landmark placement similar to that in a 2D cephalometric analysis, although on a 3D image. However, the software requires some expertise and does not allow viewing the landmarks in all three planes at the same time, so the researcher must change planes continuously to check the landmark position in each plane. The possible effect of the segmentation process, the different software programs used for the superimposition, and the landmark identification are sources of measurement error in 3D radiographic imaging.
The surface model construction in CBCT is based on the voxel-based data. A threshold value determines whether each structure is classified as bone or soft tissue. The threshold value and gray value entered by the operator into the CBCT machine determine the image accuracy. Also, CBCT imaging lacks beam homogeneity, which means that the gray values of the voxels in CBCT scans of the same individual at different time points differ [46,47].
The potential impact due to limited resolution of the CBCT data (0.3 mm) on the overall precision is not possible to quantify in this study as all three methods used the same data set. However, increasing imaging resolution and maintaining size of the scan would increase the radiation dose.
Finally, due to the lack of a control group, differentiation between treatment and normal growth changes was not possible.

Conclusions
Findings of the research indicate good to excellent intra-examiner reliability of the three 3D superimposition methods when assessed individually. However, when assessing reliability among the three methods, the ICC demonstrated weaker agreement with a wide confidence interval. ICC values were the lowest when comparing the landmark-based method and the voxel-based (CMFreg/Slicer and Dolphin) methods. Moderate to excellent agreement was observed for the intra-examiner reliability when comparing the voxel-based methods against each other. Two of the three methods (CMFreg/Slicer and Dolphin) used in this study showed similar mean differences; however, the accuracy of the results could not be determined, since none of them has been considered the gold standard for 3D superimposition in growing patients. The landmark-based method generated the highest measurement error among the three methods.