Reliable classification of facial phenotypic variation in craniofacial microsomia: a comparison of physical exam and photographs
Head & Face Medicine volume 12, Article number: 14 (2016)
Craniofacial microsomia is a common congenital condition for which children receive longitudinal, multidisciplinary team care. However, little is known about the etiology of craniofacial microsomia and few outcome studies have been published. In order to facilitate large, multicenter studies in craniofacial microsomia, we assessed the reliability of phenotypic classification based on photographs by comparison with direct physical examination.
Thirty-nine children with craniofacial microsomia underwent a physical examination and photographs according to a standardized protocol. Three clinicians completed ratings during the physical examination and, at least a month later, using respective photographs for each participant. We used descriptive statistics for participant characteristics and intraclass correlation coefficients (ICCs) to assess reliability.
The agreement between ratings on photographs and physical exam was greater than 80 % for all 15 categories included in the analysis. The ICC estimates were higher than 0.6 for most features. Features with the highest ICC included presence of epibulbar dermoids, ear abnormalities, and colobomas (ICC 0.85, 0.81, and 0.80, respectively). Orbital size, presence of pits, tongue abnormalities, and strabismus had the lowest ICC values (0.17 or less). There was not a strong tendency for either type of rating, physical exam or photograph, to be more likely to designate a feature as abnormal. The agreement between photographs and physical exam regarding the presence of a prior surgery was greater than 90 % for most features.
Our results suggest that categorization of facial phenotype in children with CFM based on photographs is reliable relative to physical examination for most facial features.
Craniofacial microsomia (CFM) is a congenital condition occurring in 1 in 3000 to 1 in 5000 live births [1, 2], and it is the second most common congenital facial condition after cleft lip and palate [2–4]. CFM variably affects derivatives of the first and second pharyngeal arches [5, 6], thereby creating a wide spectrum of phenotypic severity. Established diagnostic criteria for CFM do not exist, and the etiology of CFM is unknown for most patients. The combination of a lack of knowledge about the etiology and the wide variability in clinical presentation of CFM has hampered our ability to evaluate immediate and long-term treatment outcomes; thus, we lack sufficient evidence to establish standardized treatment protocols. The published literature on CFM has primarily been limited to clinical reviews, case series, and reports of clinical classification systems. Due to the heterogeneous nature of this condition, multicenter studies are required for sufficiently powered analyses comparing treatment outcomes within phenotypic subgroups.
There are numerous inherent challenges in conducting a multicenter research study in CFM, not least of which is the need to establish a standardized phenotypic assessment of study participants. An accurate assessment of craniofacial features is imperative for studies reliant on meaningful comparisons among individuals with CFM. Historically, a direct physical exam has been considered the gold standard for such an appraisal. Performing reliable direct physical examinations can be challenging by virtue of the distances between study sites, variation in classification between clinicians, and challenges inherent in completing a thorough exam in person (such as the impact on the child and the ability to dedicate and coordinate time between the participant and a qualified rater). In contrast, photographs are relatively easy to obtain, are often included in clinical visits, and have been incorporated into studies in similar craniofacial conditions [7–9]. Therefore, we sought to compare assessments on photographs and physical examination.
Classification systems such as Pruzansky, SAT, OMENS, and OMENS+ can facilitate standardized coding of specific features by one or more raters. Some studies have used such systems on images of individuals with CFM obtained using 2- and 3-dimensional imaging [14–16]. However, to our knowledge, a comparison of classifications based on facial photographs with those carried out through an in-person physical examination has not been published.
The purpose of the current study is to measure the reliability of classification of the common facial features associated with CFM based on an in-person facial examination compared with an assessment based on a standardized set of two-dimensional images.
We enrolled children ages 0–21 years who met the research eligibility criteria previously established by the Facial Asymmetry for Interdisciplinary Assessment and Learning (FACIAL) network (Table 1) and were consecutively evaluated in 2014 at a single tertiary care craniofacial center. Study procedures included an interview to collect demographic and clinical history, 16 standardized photos (Fig. 1), and an in-person facial exam.
In-person exams and analysis of photos were sequentially completed by at least one of three clinicians [two pediatricians (CH, KE) and a plastic and reconstructive surgeon (CB)], each of whom has over 5 years of experience providing craniofacial care. Clinicians were asked to rate features on each side of the face. In-person exams were performed during the clinic visits, and clinicians recorded the phenotype directly onto an enhanced paper pictorial OMENS data collection form (Fig. 2). Additional features were added and modifications were made in order to increase the likelihood of complete data collection (image not shown).
The photographic assessment was performed at least 30 days following the in-person evaluation, using the 16 standardized photos taken on the day of the in-person exam.
Our photographer cropped the photos and created high-resolution PDF files (Fig. 1) that included the standardized views and seven additional images to allow for quick assessment of detail of the eyes, ears, and tongue according to our previously published protocol. Raters viewed the contact sheets on computers with large monitors and were able to zoom in as needed to assess each feature. Clinicians completed ratings on photographs for each individual they examined in person and used the same OMENS data collection template, which included all variables present in the OMENS form except for cleft palate and bifid uvula, as our photo protocol did not include intraoral images. In addition to phenotypic ratings, clinicians were asked to describe their subjective experience of collecting these data through both modalities.
All data were entered into an Access database by the same individual. All study procedures were approved by our Institutional Review Board (IRB Study #: 14853).
Descriptive statistics were generated using Stata® (StataCorp LP, College Station, TX, USA).
We used a dataset that combined the 39 left and 39 right sides to create 78 possible ratings for each feature. In addition, we combined data for some of the features related to the same pathological process, such as (1) any tags, (2) any pits, (3) eyelid or iris coloboma, and (4) any facial nerve palsy. We also combined the categories of soft tissue deficiency and mandibular deficiency given the inherent challenges of distinguishing between the causes of lower facial deficiency on photos.
The intraclass correlation coefficient (ICC) was used to measure correlation between ratings of individual features based on the physical exam and the photographic evaluation. ICCs were estimated by fitting random effects models using the ‘lmer’ function in R, with random effects for subject and for side (left or right) as a nested factor within subject. A main effect for side was not included because there was no indication of systematic differences between results for left and right sides. For combined features related to a common pathological process, a single model was fit to the combined data with a random effect to account for the individual features as well as side (left or right). The ICC estimate was calculated as the variance attributed to the random effects divided by the total variance (which also includes the error variance). A 95 % confidence interval for the ICC was calculated using the jackknifed estimate of the standard error. For some features, meaningful ICC estimates and/or confidence intervals could not be obtained because of an insufficient number of cases with positive findings.
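The variance-ratio definition of the ICC and the jackknifed confidence interval described above can be illustrated with a minimal Python sketch. It is not the model used in this study: it swaps in a simpler one-way random-effects ICC computed from ANOVA mean squares (rather than a fitted 'lmer' model with side nested within subject), treats each rated unit as having exactly two ratings (exam, photo), and uses made-up binary ratings. The function names `icc_oneway` and `jackknife_ci` and the example data are assumptions introduced only for illustration.

```python
import math

def icc_oneway(pairs):
    """One-way random-effects ICC for (exam, photo) rating pairs.

    Simplified stand-in for the mixed model in the text: between-unit
    variance divided by total (between-unit + error) variance.
    """
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    means = [(a + b) / k for a, b in pairs]
    # Between-unit and within-unit mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for (a, b), m in zip(pairs, means)) / (n * (k - 1))
    var_unit = (msb - msw) / k          # variance attributed to the unit
    total = var_unit + msw              # total variance includes error
    return var_unit / total if total > 0 else float("nan")

def jackknife_ci(pairs, z=1.96):
    """ICC with a normal-approximation CI from the jackknife standard error."""
    n = len(pairs)
    full = icc_oneway(pairs)
    # Leave-one-unit-out estimates.
    loo = [icc_oneway(pairs[:i] + pairs[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo))
    return full, full - z * se, full + z * se

# Hypothetical ratings (1 = feature present, 0 = absent); not study data.
ratings = [(1, 1)] * 6 + [(0, 0)] * 12 + [(1, 0), (0, 1)]
estimate, lower, upper = jackknife_ci(ratings)
print(f"ICC = {estimate:.2f}, 95 % CI {lower:.2f} to {upper:.2f}")
```

When a leave-one-out subset contains no variation at all (for example, a feature with a single positive finding), the estimate is undefined, which mirrors the situation in which meaningful ICC estimates or confidence intervals could not be obtained.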
We dichotomized the degrees of lower facial hypoplasia by establishing a cutoff for “affected” at a rating of two or greater (on a scale between 0 and 4) for either soft tissue and/or mandibular deficiency. Similarly, we considered a rating of zero or one of the occlusal plane or ear to be “unaffected” and ratings of two or greater were considered to be “affected”.
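The dichotomization rule can be written out as a short Python sketch. The function names are hypothetical, and the range check assumes the 0 to 4 scale stated above for lower facial hypoplasia; no range is enforced for the occlusal plane or ear because their scales are not given here.

```python
def is_affected(rating, cutoff=2):
    """Return True when an ordinal rating is at or above the cutoff."""
    return rating >= cutoff

def lower_face_affected(soft_tissue, mandible, cutoff=2):
    """Combined lower facial category: affected if either rating meets the cutoff."""
    for value in (soft_tissue, mandible):
        if not 0 <= value <= 4:
            raise ValueError("lower facial ratings are expected on a 0 to 4 scale")
    return is_affected(soft_tissue, cutoff) or is_affected(mandible, cutoff)

# Hypothetical ratings, for illustration only.
print(lower_face_affected(soft_tissue=1, mandible=3))  # mandible meets the cutoff
print(is_affected(1))                                  # a rating of 1 is "unaffected"
```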
In order to account for the impact of prior surgery on the ability of raters to accurately identify features that may no longer reflect their presurgical state (e.g. mandibular advancement) or may no longer be present (e.g. preauricular tags), raters were instructed to indicate “history of surgery” for any feature for which a surgical incision was identified in the region. If the rater could determine that the incision was directly related to a specific feature, such as a scar in the precise location where a participant likely had a lateral cleft, then the rater would indicate “presence of lateral cleft, status post surgery”. However, raters were asked to record “unable to rate, status post surgery” for participants who had undergone ear reconstruction. In these cases, the pre-surgical microtia was recoded as “affected” (i.e., a rating greater than or equal to two).
Thirty-nine participants were enrolled into the study. The average age was 8.4 years (range 3 months to 21 years), and approximately 33 % of children were less than 6 years of age. Participants were most commonly white (46 %), and most were non-Hispanic (82 %). Participants had a wide range of clinical presentations, which included isolated microtia or anotia (n = 8), facial asymmetry with microtia (n = 15), facial asymmetry without microtia but with other features from the FACIAL inclusion criteria (n = 5), or at least two other features from the FACIAL inclusion criteria without facial asymmetry or microtia (n = 5) (Table 2).
Clinician subjective impressions
Overall, the raters found the 1-page format of the OMENS form to be useful and efficient for both the in-person and photo evaluations.
For the physical exam rating, raters noted that the length of time and attention to detail required to complete study ratings using the OMENS data collection form was greater than the time typically required for a clinical exam. Clinicians also noted that the recording of some features, such as the presence of small pits, was often not directly relevant to patient care and added to the burden of physical examination for the provider and participant. They noted that it was difficult to fully assess all features in young, mobile toddlers, particularly those related to symmetry and nerve function. Raters commented on the difficulty of ensuring that all data were collected before exiting the exam room, and they frequently returned to complete missing data. They reported greater ease of assessment of soft tissue deficiency, mandibular hypoplasia, and ear canal patency with the in-person exam as compared to photos. Surgical scars were also easier to assess in person, particularly when combined with a parent and/or participant interview at the time of the exam.
The raters found the photos easy to assess when the acquisition protocol was followed. However, for younger children who were often not able to comply with the photo protocol, clinicians noted that many features could not be assessed on the incomplete image set. For example, children for whom an adequate frontal photo was not available routinely had missing data for mandible, soft tissue, and orbital placement.
Descriptive characteristics of the photographic and in-person ratings
Table 3 includes the distribution of phenotypic characteristics among all participants, as designated from both the in-person physical exam (PE) and the photographic ratings. Photographic ratings were more likely than PE ratings to be missing for the tongue (12 vs. 2) and cleft lip (6 vs. 0); however, orbital displacement was more likely to be missing on in-person ratings than on photographic ratings (5 vs. 0). Nerve palsies were more likely to be rated as present during in-person exams and as unable to rate on the photographic assessments. There was not a clear pattern of either type of rating, physical exam or photograph, being more likely to designate a feature as abnormal. The rates of surgery were very low for most features (less than 5 %). Features with the highest surgical rates included tags (28 %) and ear (6 %) (Table 3).
Reliability of phenotypic classification
The percent agreement between ratings on photographs and physical exam was greater than 80 % for all 15 categories included in the analysis. The ICCs were higher than 0.6 for most features, although the confidence intervals were wide. Features with the highest ICCs included epibulbar dermoids, ear abnormalities, and colobomas (ICCs 0.85, 0.81, and 0.80, respectively). Orbital size, presence of pits, tongue abnormalities, and strabismus had the lowest ICCs, with values less than 0.17. The percent agreement for surgical history by feature was greater than 90 % for most categories (Table 4).
Analysis by age less than or greater than 6 years showed similar results for the percent agreement and ICCs between the two subgroups. Ear canal was the only feature for which a meaningful difference was observed (data not shown). Upon further review of the underlying data, most discrepancies were attributable to misclassification with atresia rated as ‘unaffected’ on physical exam ratings, and the lack of concordance did not seem to be related to an effect of age.
Successful multi-center clinical research depends on reliable classification of study data. Digital photography offers the opportunity to easily share facial phenotype data among centers around the world. Until now, however, it has not been clear whether evaluation of the facial features of individuals with CFM based on photographs could replace ratings based on an in-person physical exam, which many clinicians consider to be the gold standard.
The percent agreement between ratings on physical examination and photographs was quite high for all features evaluated in this study. However, the ICC values for some features were relatively low, despite high agreement percentages. This discrepancy between high agreement and low ICC appears to be a result of the relatively low prevalence of many of the anomalies included in this study; for example, if there is only one positive finding for a feature by physical exam in 100 cases and this feature is not identified on photo assessments, then the agreement would be 99 % but the lack of agreement on the one positive case has a large negative influence on the ICC. Some of our ICC estimates were impacted by low prevalence, such as epibulbar dermoids (n = 4, based on physical examination). The ICC is also affected by the number of cases with both types of ratings available. Confidence intervals are relatively wide in this study because of the limited number of positive findings for some features.
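The 100-case example can be checked numerically with a minimal Python sketch. It assumes a simple one-way random-effects ICC computed from ANOVA mean squares, which is a simplification of the mixed model used in this study, and the helper names `percent_agreement` and `icc_oneway` are introduced here only for illustration.

```python
def percent_agreement(pairs):
    """Share of cases in which the exam and photo ratings are identical."""
    return sum(a == b for a, b in pairs) / len(pairs)

def icc_oneway(pairs):
    """One-way random-effects ICC for (exam, photo) pairs (simplified model)."""
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    means = [(a + b) / k for a, b in pairs]
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for (a, b), m in zip(pairs, means)) / (n * (k - 1))
    var_case = (msb - msw) / k
    return var_case / (var_case + msw)

# 99 cases rated absent by both methods, plus one case rated present on
# physical exam (1) but absent on photographs (0).
cases = [(0, 0)] * 99 + [(1, 0)]
agreement = percent_agreement(cases)
icc = icc_oneway(cases)
print(f"agreement = {agreement:.0%}, ICC = {icc:.3f}")
```

Under these assumptions the between-case and within-case mean squares are equal, so the estimated ICC is essentially zero even though the two methods agree on 99 of 100 cases.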
Advantages of photos for evaluating craniofacial characteristics
Some features were considered to be easier to assess on photos. Our researchers found it easier to accurately classify degrees of microtia using photos compared to in-person exams. It may be that examiners felt more comfortable taking the time to analyze the features in more detail on photos rather than in a clinical setting.
Our study also found advantages of photographic assessment with regard to time limitations encountered in the clinical setting. It is difficult to set aside more than a few minutes for a focused clinical exam in a busy outpatient clinic. These pressures may create a rushed environment, and it may be difficult to return to a component of the exam for re-evaluation if the provider was unsure of the phenotypic findings. With photos, on the other hand, these time limitations are lifted and flexibility for accurately completing the ratings is improved. Acquisition of the photos takes only a few minutes of the patient’s time. Most importantly, the provider does not need to be present. Once the photos are acquired, the reviewer can take as much time as needed to accurately classify the phenotype, and do so without the pressures inherent in the clinical environment.
Additionally, phenotypic assessment from the photographic protocol avoids some of the awkwardness that can be inherent in completing a comprehensive standardized phenotype protocol through an in-person evaluation. For instance, it can be uncomfortable for the patient, especially a teenager, to have a provider carefully analyze their facial differences while marking down their “anomalies” one by one on a score sheet. As clinician raters, we felt that this was contradictory to the goals of our practice, which focus on the child’s or teen’s function and positive self-perception.
Clinicians noted that the OMENS data collection form used for the study contained 78 features, many of which required detailed examination for accurate coding. Though physicians typically develop a preference for the order in which they complete the assessment, this often varies to accommodate the patient’s needs during the clinic visit. For example, systematic examinations are much easier to complete in a cooperative teenager than a mobile toddler. However, infants and children seem much more eager to sit for a camera, particularly with the help of an engaged photographer.
Although the form was designed to minimize the amount of missing data, the clinicians noted that it still required attention to detail to ensure all fields were complete. In future studies, we will create an electronic version of the rating form to facilitate real-time identification of missing data fields.
Finally, phenotypic assessment using the photographic protocol allows for centralized rating of images. Digital photos are easily shared amongst researchers at various sites, who can remotely access the files from a central repository, which eliminates the need to have an examiner at each site.
Advantages of in-person examination
Despite the aforementioned benefits of the photographic assessment, the in-person examination also has a number of advantages. The clinicians found the in-person examination to be more accurate for evaluating soft tissue deficiency and mandibular hypoplasia. Both features are best appreciated through three-dimensional assessment and palpation. Lighting and shadows can affect the degree of perceived asymmetry based on images, and the in-person exam allows the patient to move so the examiner can assess the quality and quantity of tissue from different angles. Moreover, both soft tissue and mandibular asymmetry can be enhanced or mitigated with animation. The in-person examination allows the evaluator to interact with the patient and ask them to move their face or jaw in various ways to better assess the degree of asymmetry.
The photographic protocol allows for assessment of dynamic facial nerve function by capturing a series of images of facial expressions; however, the degree of strength or weakness of each facial nerve branch is optimally classified during an in-person examination. In addition, the presence of synkinesis can be difficult to diagnose on photos alone. For individuals with atypical expressions on the photo protocol, it may be difficult for the rater to differentiate between synkinesis, facial palsy, and lack of adherence to the photo protocol. Augmenting the current photographic protocol with video capture of an individual may improve the assessments of facial asymmetry and facial movement and allow for more complete assessment of CFM features.
Finally, there are aspects of the facial phenotype in CFM that are not captured by the photographic protocol used in this study. We did not attempt to obtain intraoral images of the soft palate, and images with adequate angles to assess the external auditory canal were difficult to obtain. Future studies requiring accurate data on such features could incorporate intraoral or intra-aural photographs into the current photo protocol to improve accuracy of the phenotypic assessment of these areas.
This study focused on the reliability of the OMENS tool, and on specific craniofacial features that would be relevant to multicenter research in CFM. Our goal was to determine whether phenotypic classification based on ratings from facial photos would be reliable for evaluation of cohort differences in future research; these ratings are not intended to be used for individual patient care or surgical planning. We have previously evaluated the reliability of the OMENS rating scale using 2D vs. 3D images. We did not test the reliability of in-person exams due to our desire to limit the burden on the study participants.
Confidence intervals for ICC were wide due to the limited sample size, which was typical for a reliability study. We also recognize that the phenotypic variability is high in CFM and not entirely represented by this cohort. Estimates of ICC could be different in other populations with a higher or lower prevalence of these features and for different age groups, although no clear differences were observed based on age in this study.
In the next phases of this work, we plan to develop resources to facilitate electronic data capture to allow for real-time identification of missing data and to minimize the likelihood of errors in data entry. We will also continue to optimize the protocol for completing ratings. We have summarized some considerations to enhance the reliability for raters using the modified OMENS form in Table 5. We will also continue to explore various combinations of two-dimensional photos, three-dimensional surface images, video, and physical examination to capture more comprehensive data for specific research questions designed to improve craniofacial care for children with CFM.
Written informed consent was obtained from the participant’s parent for the publication of this report and any accompanying images.
Use of the Phenotypic Assessment Tool for Craniofacial Microsomia (PAT-CFM), which combines a pictorial OMENS tool and a standardized set of 2-dimensional photographs, is equivalent to in-person exam for most phenotypic features and better for some aspects of assessment of patients with CFM. Thus, the PAT-CFM can be used for multi-center research studies in CFM in lieu of in-person exams.
FACIAL: facial asymmetry collaborative for interdisciplinary assessment and learning
OMENS: classification system for orbital anomalies in size and position, mandibular hypoplasia, ear malformations (microtia), facial nerve palsy, and facial soft tissue deficiency
Poswillo D. The aetiology and pathogenesis of craniofacial deformity. Development. 1988;103(Suppl):207–12.
Grabb WC. The first and second branchial arch syndrome. Plast Reconstr Surg. 1965;36:485–508.
Bennun RD, Mulliken JB, Kaban LB, Murray JE. Microtia: a microform of hemifacial microsomia. Plast Reconstr Surg. 1985;76:859–65.
Rollnick BR, Kaye CI. Hemifacial microsomia and variants: pedigree data. Am J Med Genet. 1983;15:233–53. doi:10.1002/ajmg.1320150207.
Stark RB, Saunders DE. The first branchial syndrome. The oral-mandibular-auricular syndrome. Plast Reconstr Surg Transplant Bull. 1962;29:229–39.
Converse JM, Coccaro PJ, Becker M, Wood-Smith D. On hemifacial microsomia. The first and second branchial arch syndrome. Plast Reconstr Surg. 1973;51:268–79.
Sabitha S, Veerabahu M, Vikraman B. Esthetic evaluation of the treated unilateral cleft lip using photographs and image analysis software: a retrospective study. J Maxillofac Oral Surg. 2011;10:225–9. doi:10.1007/s12663-011-0238-5.
Schaaf H, Wilbrand JF, Boedeker RH, Howaldt HP. Accuracy of photographic assessment compared with standard anthropometric measurements in nonsynostotic cranial deformities. Cleft Palate Craniofac J. 2010;47:447–53. doi:10.1597/09-026.
Ort R et al. The Reliability of a Three-Dimensional Photo System- (3dMDface-) based evaluation of the face in cleft lip infants. Plast Surg Int. 2012;2012:138090. doi:10.1155/2012/138090.
Pruzansky S. Not all dwarfed mandibles are alike. Birth Defects. 1969;5:120.
David DJ, Mahatumarat C, Cooter RD. Hemifacial microsomia: a multisystem classification. Plast Reconstr Surg. 1987;80:525–35.
Vento AR, LaBrie RA, Mulliken JB. The O.M.E.N.S. classification of hemifacial microsomia. Cleft Palate Craniofac J. 1991;28:68–76. doi:10.1597/1545-1569(1991)028<0068:TOMENS>2.3.CO;2. discussion 77.
Horgan JE, Padwa BL, LaBrie RA, Mulliken JB. OMENS-Plus: analysis of craniofacial and extracraniofacial anomalies in hemifacial microsomia. Cleft Palate Craniofac J. 1995;32:405–12. doi:10.1597/1545-1569(1995)032<0405:OPAOCA>2.3.CO;2.
Birgfeld CB et al. A phenotypic assessment tool for craniofacial microsomia. Plast Reconstr Surg. 2011;127:313–20. doi:10.1097/PRS.0b013e3181f95d15.
Birgfeld CB et al. Comparison of two-dimensional and three-dimensional images for phenotypic assessment of craniofacial microsomia. Cleft Palate Craniofac J. 2013;50:305–14. doi:10.1597/11-173.
Heike CL et al. Photographic protocol for image acquisition in craniofacial microsomia. Head Face Med. 2011;7:25. doi:10.1186/1746-160X-7-25.
Gougoutas AJ, Singh DJ, Low DW, Bartlett SP. Hemifacial microsomia: clinical features and pictographic representations of the OMENS classification system. Plast Reconstr Surg. 2007;120:112e–20e. doi:10.1097/01.prs.0000287383.35963.5e.
The authors would like to thank the participants and their parents. We appreciate the contributions of additional study team members: Erik Stuhaug (medical photographer), Linda Peters and Laura Stueckle (clinical research assistants). This work was supported by the following grants: Academic Enrichment Fund Program from Seattle Children’s Hospital and discretionary support from Seattle Children’s Research Institute for design and conduct of the study; collection, management, analysis, and interpretation of the data, and Institute of Translational Health Science (ITHS) grant UL1TR000423 from NCRR/NIH for management of the data using REDCap.
The authors declare that they have no competing interests. The authors of this work do not have any financial disclosures or commercial associations with any imaging device/company that might pose or create a conflict of interest with the information in this manuscript.
CH, CB, DL, and BS conceptualized the paper. All authors contributed to content of the manuscript and development of the protocol. All authors drafted and edited the manuscript. All authors have read and approved the final manuscript.
Cite this article
Birgfeld, C.B., Heike, C.L., Saltzman, B.S. et al. Reliable classification of facial phenotypic variation in craniofacial microsomia: a comparison of physical exam and photographs. Head Face Med 12, 14 (2016). https://doi.org/10.1186/s13005-016-0109-x