Postnatal Prediction of Gestational Age Using Newborn Fetal Hemoglobin Levels

Introduction In many parts of the developing world procurement of antenatal gestational age estimates is not possible, challenging provision of appropriate perinatal care. This study aimed to develop a model for postnatal gestational age estimation utilizing measures of newborn hemoglobin levels and other metabolic analyte data derived from newborn blood spot samples. Methods We conducted a retrospective cohort analysis of 159,215 infants born January 2012–December 2014 in Ontario, Canada. Multivariable linear and logistic regression analyses were used to evaluate the precision of developed models. Results Models derived from a combination of hemoglobin ratios and birthweight were more precise at predicting gestational age (RMSE 1·23 weeks) than models limited to birthweight (RMSE 1·34). Models including birthweight, hemoglobin, TSH and 17-OHP levels were able to accurately estimate gestational age to ± 2 weeks in 95·3% of the cohort and discriminate ≤ 34 versus > 34 (c-statistic, 0·98). This model also performed well in small for gestational age infants (c-statistic, 0·998). Discussion The development of a point-of-care mechanism to allow widespread implementation of postnatal gestational age prediction tools that make use of hemoglobin or non-mass spectrometry-derived metabolites could serve areas where antenatal gestational age dating is not routinely available.


Introduction
Preterm birth affects over 15 million newborns each year, is the leading cause of neonatal mortality and morbidity worldwide, and contributes to 40% of all deaths under the age of five (Lawn et al., 2012; Nour, 2012). The burden of preterm birth is particularly high in resource-poor settings where major risk factors including infection, inadequate nutrition, and poor socioeconomic circumstances are common (Beck et al., 2010). Knowledge of gestational age at the time of birth is critical for population-level surveillance and for guiding postnatal care, both by facilitating identification of infants with immediate high-resource needs and by guiding developmental assessments (Dosman et al., 2012; DiPietro and Allen, 1991; Bonhoeffer et al., 2006). Differentiation of infants born preterm from those who are small for gestational age is important to further distinguish infant medical requirements. Unfortunately, in many low-resource environments limited access to prenatal ultrasound dating services and poor recall of self-reported menstrual histories impair accurate and timely gestational age assessment (Rijken et al., 2009; The Partnership for Maternal, Newborn and Child Health, 2006).
We and others have recently developed prediction models based on routinely collected newborn metabolic screening profiles that provide accurate estimates of gestational age (Jelliffe-Pawlowski et al., 2015; Ryckman et al., 2015; Wilson et al., 2016). Many newborn screening analytes used to identify rare metabolic conditions may only be reliably ascertained using tandem mass spectrometry, a technology requiring significant financial resources and technical expertise. Hemoglobin (Hb) screening for inherited blood disorders such as sickle cell disease and β-thalassemia includes measurement of fetal (HbF) and adult (HbA) Hb levels. HbF is the primary protein for oxygen transport in the developing fetus. Hemoglobin production naturally shifts with advancing gestation from HbF to HbA such that HbF reserves are typically depleted by six months of age (Bank, 2006; Stamatoyannopoulos, 2005), and while residual amounts of HbF continue to be synthesized in adult erythropoiesis, the majority of adults have < 1% HbF (Thein et al., 2009). Unlike the majority of metabolic analytes used in newborn screening programs, which are measured by mass spectrometry, Hb may be measured using less technically demanding approaches including high performance liquid chromatography (HPLC) or gel electrophoresis (Association of Public Health Laboratories, 2015; Clarke and Higgins, 2000).
Given the known relationship between HbF, HbA and gestational age, we sought to examine the effectiveness of Hb ratios in predicting gestational age at birth. The utility of Hb levels in a gestational age prediction model was compared to pre-existing prediction models incorporating newborn screening metabolites. We also evaluated models incorporating thyroid stimulating hormone and 17-hydroxyprogesterone, other non-mass spectrometry derived analytes.

Materials and Methods
A retrospective cohort study design was used to evaluate the precision of postnatal gestational age prediction models derived from fetal and adult Hb levels, other newborn screening analyte data and readily available perinatal characteristics obtained from infants born in Ontario, Canada. The study was approved by the Ottawa Hospital Science Network Research Ethics Board (20140724-01H) and the Children's Hospital of Eastern Ontario Research Ethics Board (15/143X).

The Better Outcomes Registry and Network (BORN)
BORN is an Ontario maternal-child registry that includes a broad collection of prenatal and perinatal data obtained from clinics, hospitals, labs, and midwifery practice groups. As a secondary use, data within the BORN Information System (BIS) are available to researchers.

Newborn Screening Ontario (NSO)
Using heel-prick samples drawn from infants, usually within the first 72 h after birth, NSO screens virtually all infants born in the province for over 40 analytes to identify 29 rare conditions including metabolic disorders, endocrine disorders, hemoglobinopathies, immune deficiencies and other genetic disorders. Available screening analytes include markers of fatty acid oxidation, protein metabolism, endocrine function, immune function, and quantitative fetal and adult Hb levels. A summary of the newborn screening markers included in this study is provided in Table 1.
All live births captured in BIS between January 2012 and December 2014 were eligible for inclusion in the analysis. From this cohort, infants whose gestational age was determined by 1st trimester gestational dating ultrasound (from prenatal screening records), and who had complete newborn screening data were included. Infants who were positive for any of the conditions screened for by NSO were excluded, as were infants whose newborn screening samples were found to be of unsatisfactory quality by the screening laboratory. Finally, only infants whose screening samples were collected within 48 h of birth were included in the final analysis cohort, as the majority of infants born in low-resource settings are likely to be discharged from hospital within this time period.

Analysis
Four models were developed using the processes described below to assess the predictive utility of HbA and HbF values alone, and in concert with other newborn screening analytes: (1) birthweight alone, (2) combination of birthweight and HbF and HbA levels, (3) combination of birthweight, hemoglobin levels, TSH, and 17-OHP (all non-mass spectrometry-derived analytes), and (4) full model including birthweight and all newborn screening analytes including hemoglobin levels. Sex and multiple birth (yes, no) were included in all models. All analyses were conducted using SAS 9.4 and R v3.1.2.

Database Partition for Modelling
We used the same data partitioning strategy described previously (Wilson et al., 2016). In brief, the newborn cohort was divided into three dataset subsamples: one for model development, one to independently validate the choice of terms included in the final model, and one to independently test performance of the final model. Randomization was achieved using a stratified random sample approach, with stratification by sex and gestational age in weeks to ensure the same incidence of increasingly preterm birth was preserved in all subset data. Subsamples were generated using PROC SURVEYSELECT in SAS 9.4. Specifically, the newborn sample sets were partitioned according to a 2:1:1 ratio, distributing prematurity status (term, ≥37 weeks; near-term, 33–36 weeks; very preterm, 28–32 weeks; and extremely preterm, <28 weeks) and sex evenly to ensure balance across the 3 datasets. The final analytical dataset was partitioned as follows: model development (n = 79,620), validation (n = 39,785) and test (n = 39,810) samples.
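The stratified 2:1:1 partition described above can be illustrated with a minimal Python sketch. This is not the PROC SURVEYSELECT procedure used in the study; the function name, the record fields `sex` and `ga_weeks`, and the rounding rule are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def stratified_partition(records, ratio=(2, 1, 1), seed=0):
    """Split records into development/validation/test sets within each
    (sex, gestational-age-week) stratum, approximately in the given ratio."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        # Hypothetical field names; each record is a dict describing one infant.
        strata[(rec["sex"], rec["ga_weeks"])].append(rec)

    total = sum(ratio)
    splits = {"development": [], "validation": [], "test": []}
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        n = len(members)
        n_dev = round(n * ratio[0] / total)
        # The remainder is divided between validation and test.
        n_val = round((n - n_dev) * ratio[1] / (ratio[1] + ratio[2]))
        splits["development"].extend(members[:n_dev])
        splits["validation"].extend(members[n_dev:n_dev + n_val])
        splits["test"].extend(members[n_dev + n_val:])
    return splits
```

Because the split is made inside every stratum, the proportion of each sex and gestational week is approximately the same in all three subsets, which is the property the stratification is meant to guarantee.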

Predictive Modelling
Predictive modelling was performed using a multivariable linear regression model of continuous gestational age in weeks versus newborn screening analytes, sex, multiple birth status (yes, no) and birthweight. Continuous analyte and birthweight values were modelled using restricted cubic splines with four knots placed at quintile cut-points (20th, 40th, 60th and 80th percentiles). Fetal (F, F1) and adult (A) Hb levels were modelled as (F + F1)/(A + F + F1), referred to as the Hb ratio throughout the remainder of this article. For restricted models including only birthweight and/or Hb ratio, nine knots were placed at decile cut-points.
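As a small illustration of the two definitions above, the following Python sketch computes the Hb ratio and the knot locations for a continuous predictor. The function names are hypothetical, and the percentile interpolation rule (`method="inclusive"`) is an assumption; the source does not state which rule SAS or R applied.

```python
import statistics

def hb_ratio(f, f1, a):
    """Hb ratio as defined in the text: (F + F1) / (A + F + F1)."""
    total = a + f + f1
    if total <= 0:
        raise ValueError("total hemoglobin must be positive")
    return (f + f1) / total

def spline_knots(values, restricted_model=False):
    """Knot locations for a restricted cubic spline.

    Four knots at quintile cut-points (20th to 80th percentiles) by default,
    or nine knots at decile cut-points for the restricted models that use
    only birthweight and/or Hb ratio.
    """
    n_groups = 10 if restricted_model else 5
    return statistics.quantiles(values, n=n_groups, method="inclusive")
```

For example, a sample with F = 80, F1 = 5 and A = 15 (arbitrary units) has an Hb ratio of 0·85.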
A weighted regression approach was employed in order to compensate for the smaller sample size and thus contribution to parameter estimation of preterm infants. Infants with lower gestational ages were weighted more heavily in model development to ensure that the fit was driven by both term and preterm infants.
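The text does not report the weights that were used, so the sketch below should be read only as one possible weighting scheme: each infant is weighted by the inverse of the number of infants sharing its gestational week, and a weighted least-squares line is fitted for a single predictor. Both function names and the inverse-frequency choice are assumptions, not the study's method.

```python
from collections import Counter

def inverse_frequency_weights(ga_weeks):
    """Weight each infant by 1 / (number of infants with the same GA week),
    so that sparsely populated preterm weeks contribute as much as term weeks.
    This is an assumed scheme, not the one reported by the authors."""
    counts = Counter(ga_weeks)
    return [1.0 / counts[g] for g in ga_weeks]

def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x.
    Returns (intercept, slope)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

With four infants born at 40 weeks and one at 30 weeks, the preterm infant receives a weight of 1·0 and each term infant 0·25, so the two gestational weeks carry equal total weight in the fit.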
Model building was conducted using the model development sample. A forward step-wise variable selection procedure was conducted using the Schwarz Bayesian Criterion to guide the selection of covariates retained in the final model. Pairwise interactions were evaluated as part of stepwise variable selection. For interactions to be included in the model, contributing main effects had to be in the model. When no more terms could enter or leave the model, the stepwise procedure was terminated, and mean square error (MSE) was calculated by fitting the model from each iteration of the stepwise procedure to the independent validation data subset. The model generating the lowest MSE among all stepwise models was selected as the final model. Final model performance was then evaluated using the third test data subset, which had no role in model fitting or validation. This process provided maximum protection from over-fitting. The relative predictive power and precision of progressively more complex models were formally compared using both likelihood ratio tests (LRT) and performance metrics such as the MSE and AUC. Performance of models in sensitivity analyses was compared descriptively.
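The validation step described above, in which the model from each stepwise iteration is scored on the independent validation subset and the one with the lowest MSE is kept, can be sketched as follows. Representing each candidate model as a Python callable is an assumption made for illustration; the study fitted these models in SAS and R.

```python
def mean_squared_error(observed, predicted):
    """Mean of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def select_final_model(stepwise_models, x_validation, y_validation):
    """Return (step, model, mse) for the stepwise iteration whose model has
    the lowest MSE on the validation data.

    stepwise_models: list of callables, one per stepwise iteration, each
    mapping a predictor record to a predicted gestational age in weeks.
    """
    best = None
    for step, model in enumerate(stepwise_models):
        predictions = [model(x) for x in x_validation]
        mse = mean_squared_error(y_validation, predictions)
        if best is None or mse < best[2]:
            best = (step, model, mse)
    return best
```

The test subset plays no part in this selection, so the performance later reported on it is not biased by the choice of model.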

Model Performance for Classification as ≤ 34 weeks or > 34 weeks Gestational Age
In the current analysis, logistic regression models were also developed to distinguish between dichotomous categories of preterm birth (< 37 versus ≥ 37 weeks; and ≤ 34 versus > 34 weeks). Thirty-seven weeks represents the distinction between pre-term and term birth. Thirty-four weeks gestational age is an important clinical threshold as it represents the lower limit of the late preterm period (Kugelman and Colin, 2013; Bakewell-Sachs, 2007). Predictors of gestational age identified in the multiple linear regressions were used as independent variables in logistic regressions. Logistic regressions were fit to the model development sample, and evaluated in the independent test dataset.

Sensitivity Analyses
Model performance in terms of root mean squared error (RMSE), absolute prediction within ± 1 week, c-statistic (area under receiver operator curve, AUC), and positive predictive value (PPV) was evaluated overall, and in small for gestational age infants (infants in the lowest decile of birthweight given gestational age, SGA10) as well as in those infants from multiple births to investigate whether model prediction varied in quality across these subgroups. Lastly, model performance was also compared in heterozygotic carriers of sickle (HbS) and other hemoglobinopathy alleles (HbC, D, E, F) versus non-carriers (homozygotic HbA). Infants with two disease alleles were excluded during cohort creation as screen positives (HbS/S, HbS/C, HbS/β-thal).
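The performance measures named above can be written out in a few lines of Python. This is an illustrative sketch with hypothetical function names; in particular, the rule used here to find the cutpoint for "PPV at 80% sensitivity" (the highest score threshold at which sensitivity reaches the target) is an assumption about how that cutpoint was set.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error of predicted versus observed gestational age."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )

def proportion_within(observed, predicted, weeks):
    """Share of infants whose prediction falls within +/- `weeks` of truth."""
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) <= weeks)
    return hits / len(observed)

def c_statistic(labels, scores):
    """AUC as the probability that a randomly chosen case (label 1) scores
    higher than a randomly chosen non-case (label 0); ties count one half.
    Quadratic in sample size, which is acceptable for a small illustration."""
    cases = [s for l, s in zip(labels, scores) if l == 1]
    controls = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))

def ppv_at_sensitivity(labels, scores, target=0.80):
    """PPV at the highest score cutpoint whose sensitivity reaches `target`."""
    n_cases = sum(labels)
    for cut in sorted(set(scores), reverse=True):
        flagged = [l for l, s in zip(labels, scores) if s >= cut]
        true_pos = sum(flagged)
        if true_pos / n_cases >= target:
            return true_pos / len(flagged)
    return None
```

Here a label of 1 would denote the preterm category of interest (for example, ≤ 34 weeks) and the score would be the predicted probability from the logistic model.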

Sample Characteristics
Complete newborn screening records including all study analytes, sex and birth weight were available for 159,215 infants born between January 2012 and December 2014 (Fig. 1). A summary of the cohort characteristics is provided in Table 2. As expected, Hb ratio decreased with advancing gestational age at birth. Relative levels of HbF and HbA in infants born at varying gestational ages are represented in Fig. 2.
Fig. 1. Cohort creation. Infants registered in the BORN Information System (BIS) from January 2012–December 2014 who were negative for the conditions screened by Newborn Screening Ontario (NSO) were used for analysis. Infants with incomplete essential demographic data or newborn screening profiles were excluded from the cohort, as were those without first trimester ultrasound data and those whose samples were collected >48 h after birth.

Overall Model Performance
Linear regression performance characteristics demonstrated that the model restricted to newborn birthweight, sex, and multiple birth status had an RMSE of 1·34 weeks in the overall cohort, and correctly classified the gestational age to ±1 week in 55·2% of infants and to ±2 weeks in 88·4% of infants. Addition of Hb ratio improved model performance with an RMSE of 1·23 weeks (LRT p < 0·0001), and accurately predicted gestational age to ±1 or ±2 weeks in 60·4% and 90·9% of the cohort, respectively. Model performance was further improved by addition of TSH and 17-OHP levels (RMSE 1·16 weeks, LRT p < 0·0001). In this model, gestational age was correctly classified to ±1 week in 62·8% of infants, and to ±2 weeks in 92·5% of infants. Optimal model performance was achieved by the full analyte model incorporating birthweight, sex, multiple birth status and all newborn screening analytes including Hb ratio. The full prediction model had an overall RMSE of 1·04 weeks, and was capable of providing accurate estimations of gestational age to ±1 or ±2 weeks of true gestational age in 68·7% and 95·3% of the cohort, respectively. Consistent with our previous findings (Wilson et al., 2016), performance of all linear regression models was diminished in SGA10 infants. Comparison of linear regression model performance characteristics by gestational age is provided in Table 3. The proportions of infants correctly classified by gestational age are summarized in Table 4.

Model Performance in Dichotomous Pre-term Birth Categories
Dichotomization of preterm birth using thresholds of 34 or 37 weeks gestational age was used to assess the performance of all models in distinguishing between preterm birth categories. By logistic regression, AUC and PPV at 80% sensitivity demonstrated robust model performance overall. For all models, performance was similar or more robust in SGA10 infants compared to the overall cohort (Table 5). Gestational age prediction models were more accurate at discriminating ≤34 versus >34 weeks gestational age compared to < 37 versus ≥37 weeks gestational age. As with linear regression results, logistic regression prediction models derived from a combination of birthweight and Hb ratio had higher predictive capacity than models derived from birthweight alone. Inclusion of TSH and 17-OHP levels produced better performance characteristics relative to the full analyte model for discriminating infants ≤ 34 versus >34 weeks gestational age (AUC 0·981 vs 0·975; PPV at 80% sensitivity, 0·675 vs 0·53, LRT p < 0·0001). Limited Hb models (Hb ratio and Hb ratio + TSH + 17-OHP) better discriminated ≤34 versus >34 weeks gestational age among SGA10 infants (AUC, all >0·998 vs 0·997; PPV with 80% sensitivity 0·860 and 0·831 vs 0·710, respectively).

Sensitivity Analysis in Carriers of Disease-causing Hb Variants
Although infants who screened positive for hemoglobinopathies were excluded from our analysis, heterozygotic carriers of disease causing alleles without the disease phenotype were not excluded. In the test data where model performance was evaluated, there were 39,251 noncarriers, 515 sickle carriers, 44 other carriers (

Discussion
Reliable methods for postnatal gestational age dating are urgently required. In jurisdictions where access to ultrasound dating is limited, postnatal estimations would improve population surveillance in local areas to ultimately address issues of preterm birth prevention, and help target service delivery to high-risk mothers and preterm infants (Dosman et al., 2012; DiPietro and Allen, 1991; Bonhoeffer et al., 2006). Implementation of postnatal estimation tools would also directly benefit affected infants, guiding allocation of services necessary for improving outcome, including kangaroo mother care and appropriate respiratory management (World Health Organization, 2003). In this study, we have demonstrated that prediction models using relative fetal and adult Hb levels at birth, in combination with birthweight, sex and multiple birth data, can provide accurate postnatal gestational age estimation. The addition of other non-mass spectrometry derived newborn screening analytes, TSH and 17-OHP, to multivariable regression models further improves their predictive power. Logistic regression analyses demonstrate that our hemoglobin-based prediction algorithms discriminate between ≤34 and >34 weeks gestational age overall and in SGA10 infants with excellent precision.
The human β-globin locus on chromosome 11 houses ε-, γ-, δ- and β-globin genes that regulate human HbF and HbA expression. While the ε-globin gene is active in early fetal life, γ-globin genes are predominantly expressed for the production of HbF (α2γ2) during the fetal period (Bank, 2006). The predominance of HbF during fetal life has been attributed to its increased oxygen affinity compared to other Hb variants. As pregnancy progresses, δ- and β-globin genes are activated, with the β-globin gene being the most highly expressed in adult erythrocytes. Indeed, Hb newborn screening levels for our cohort confirmed that Hb ratio, defined by Hb(F + F1)/Hb(A + F + F1), varies by gestational age at birth, with term infants exhibiting the lowest Hb ratios and increasingly preterm infants having a consistently higher Hb ratio.
There is considerable potential value in using metabolic markers as a measure of gestational age after birth. In particular, the use of Hb ratio in such a prediction model would provide substantial advantages over models derived from other traditionally screened analytes. Heel prick blood spots for expanded newborn screening are typically collected between 24 and 72 h after birth, the timing of which is critical for accurate interpretation of screening results. In low-resource settings, mothers and infants are often discharged within 48 h of birth, thus limiting the opportunity for connecting with infants after discharge. Hemoglobin analysis, however, is not limited to heel prick sampling, and cord blood samples may be used reliably for analysis, providing results immediately after birth (Lobel et al., 1989). In addition, Hb measurements are traditionally taken by electrophoresis or HPLC in many laboratories (Association of Public Health Laboratories, 2015; Clarke and Higgins, 2000), and thus are more amenable to measurement in settings where the equipment and expertise required for mass spectrometry dependent analyses may be limited. Field-portable HPLC systems are now commercially available, and in time could be harnessed for newborn screening applications. Lastly, the prevalence of sickle cell traits and rates of hemoglobinopathies are high in nations of the Middle East, northern Africa and Southeast Asia (Modell and Darlison, 2008; Piel et al., 2010; Williams and Weatherall, 2012). Penicillin prophylaxis for the first year of life is a simple, inexpensive treatment for infants affected by sickle cell disease who are at increased risk of life-threatening pneumococcal infections (Association of Public Health Laboratories, 2015). Thus introduction of Hb testing in areas without established practices would provide dual benefits of gestational age prediction and identification of vulnerable children with hemoglobinopathy conditions.
The utility of newborn TSH and 17-OHP levels in addition to Hb ratios in model performance was also explored. Similar to hemoglobin, TSH and 17-OHP offer practical advantages as they may be routinely obtained by fluorometric or colorimetric assay rather than by mass spectrometry analysis. These analytes are also likely to be captured in existing newborn screening programs (Therrell et al., 2015) due to the frequency of related disorders and effectiveness of treatment. In our study, addition of TSH and 17-OHP improved model accuracy over and above that of the simple Hb model and demonstrated excellent predictive ability in SGA10 infants. The latter is particularly important in low resource settings where it may be difficult to distinguish infants who are small as a result of preterm birth from those who are small as a result of placental insufficiency. Although the effect of SGA10 on Hb ratio was not explored in the current study, preliminary examination of metabolic profiles derived from infants born at a tertiary care hospital with a diagnostic code of 'placental insufficiency' revealed no significant difference in Hb ratio. The disadvantages of relying on TSH and 17-OHP for gestational age prediction must also be considered. TSH and 17-OHP are subject to rapid postnatal change, and are thus typically sampled after analyte levels stabilize (Newborn Screening Ontario - Dépistage Néonatal Ontario, 2013). Thus it is unlikely that prediction models requiring such analyte measurements would be useful in infants who are discharged prior to 24 h, nor would the model be appropriate for cord blood-derived samples for the same reason.
While we found that models utilizing a full panel of newborn screening analytes were more precise at predicting continuous GA than limited Hb models, the performance characteristics of our Hb models to discriminate ≥ 34 versus < 34 weeks gestational age are promising. Thirty-four weeks gestational age is an important clinical threshold as it represents the lower limit of the late preterm period (Kugelman and Colin, 2013; Bakewell-Sachs, 2007). It is above this threshold that the health risks of preterm infants are reduced, although still remaining elevated compared to their term counterparts (Nold et al., 2011). Thus, the trade-off between a minor reduction in model accuracy and the reduced cost and expertise required to obtain model variables may make hemoglobin-based prediction models for gestational age suitable substitutes for full postnatal metabolic gestational dating in jurisdictions where full mass spectrometry screening is not available.
The strengths of these analyses include use of a large multi-ethnic cohort and our population-based approach. The large sample size enabled us to partition our data into independent derivation, validation and test sample sets, which permitted unbiased variable selection and avoided overfitting of the data. In addition, the use of gold standard first trimester ultrasound for gestational age strengthens the reliability of model performance. Potential limitations include the specificity of our model to the population from which it was derived. The majority of the infants included in model development were born at term and were of average size for gestational age. Weighting of preterm and SGA infants in our model served to adjust for this, and we believe that a trade-off to favor accurate identification of infants at higher clinical risk is beneficial. Importantly, although derived from a multi-ethnic population, the performance of Hb models (Hb and birthweight alone, or the full model) in international populations is as yet uncertain. Preliminary validation of our original gestational age model, however, suggests robust performance across ethnic subgroups in the province of Ontario, Canada.

Conclusions
Methods to predict gestational age based on newborn screening markers are poised to provide accurate postnatal assessments of gestational age in settings where access to gold standard first trimester ultrasound is limited. Here, we have built upon our existing postnatal gestational age prediction algorithm to demonstrate both the stand-alone and additive predictive potential of newborn Hb levels in the model. The clinical value of the prediction model limited to birthweight and Hb while excluding other newborn screening analytes would depend on the acceptability of the prediction error for gestational age. To further validate our Hb-based algorithm, an evaluation of the model in cord blood would be of benefit. Validation of our model in low-resource settings is also warranted to determine its utility in international settings, as global or region-specific hemoglobin-based algorithms may be of particular use in low resource settings where mass spectrometry analysis for traditional screening markers is not available. The incremental benefits of this approach over standard gestational age assessment should also be evaluated in a low resource setting.

Funding Sources
This study was funded by a Phase II Grand Challenges Exploration grant from the Bill & Melinda Gates Foundation (OPP1141535). The funding agency had no role in the study design, data collection, analyses, interpretation or writing of the report. No payments were received to write this article by a pharmaceutical company or other agency.

Conflict of Interest
The authors report no conflict of interest.