 Research article
 Open Access
 Open Peer Review
 Published:
Bayesian hierarchical piecewise regression models: a tool to detect trajectory divergence between groups in longterm observational studies
BMC Medical Research Methodology volume 17, Article number: 86 (2017)
Abstract
Background
Bayesian hierarchical piecewise regression (BHPR) modeling has not been previously formulated to detect and characterise the mechanism of trajectory divergence between groups of participants that have longitudinal responses with distinct developmental phases. These models are useful when participants in a prospective cohort study are grouped according to a distal dichotomous health outcome. Indeed, a refined understanding of how deleterious risk factor profiles develop across the lifecourse may help inform earlylife interventions. Previous techniques to determine betweengroup differences in risk factors at each age may result in biased estimate of the age at divergence.
Methods
We demonstrate the use of Bayesian hierarchical piecewise regression (BHPR) to generate a point estimate and credible interval for the age at which trajectories diverge between groups for continuous outcome measures that exhibit nonlinear withinperson response profiles over time. We illustrate our approach by modeling the divergence in childhoodtoadulthood body mass index (BMI) trajectories between two groups of adults with/without type 2 diabetes mellitus (T2DM) in the Cardiovascular Risk in Young Finns Study (YFS).
Results
Using the proposed BHPR approach, we estimated the BMI profiles of participants with T2DM diverged from healthy participants at age 16 years for males (95% credible interval (CI):13.5–18 years) and 21 years for females (95% CI: 19.5–23 years). These data suggest that a critical window for weight management intervention in preventing T2DM might exist before the age when BMI growth rate is naturally expected to decrease. Simulation showed that when using pairwise comparison of leastsquare means from categorical mixed models, smaller sample sizes tended to conclude a later age of divergence. In contrast, the point estimate of the divergence time is not biased by sample size when using the proposed BHPR method.
Conclusions
BHPR is a powerful analytic tool to model longterm nonlinear longitudinal outcomes, enabling the identification of the age at which risk factor trajectories diverge between groups of participants. The method is suitable for the analysis of unbalanced longitudinal data, with only a limited number of repeated measures per participants and where the timerelated outcome is typically marked by transitional changes or by distinct phases of change over time.
Background
Child toadult trajectories of health markers are likely to have implications for the risk of chronic diseases in later life, such as obesity, type 2 diabetes mellitus (T2DM) and cardiovascular diseases; it is therefore important to understand their development throughout the lifecourse [1,2,3,4].
Longrunning observational studies that follow the same subjects participants across the lifecourse are especially suited to studying adult onset disorders, such as cardiometabolic disease, since they allow characterizing the development of normal vs. pathological processes overtime. A goal of such studies is often to determine how a number of patient characteristics, modifiable risk factors profiles [1, 5], their interactions and normal aging may impact the onset and progression of disease over time [6,7,8] in order to identify time periods of divergence in these factors [9–11].
A key statistical issue in these studies is often to determine whether the risk factor levels vary over time between and within groups of participants, and whether different groups are changing in a similar or different fashion over time [12, 13]. Depending on the study, the stratification of participants into groups can relate to participants’ characteristics or exposure (i.e. smoking status), intervention arm (i.e. control vs. medication), or it could be a later health outcome (i.e. disease status in midadulthood). When participants are grouped according to a distal dichotomous health outcome, longitudinal data provide the foundation to understand pathways to deleterious risk factor profiles, which may help inform the timing of interventions [8, 14, 15].
When it is established that groups of interest start with similar initial outcome levels, but do not change similarly overtime, it is often of interest to determine the point in time or age at which they start diverging in their trajectories [16,17,18,19,20]. Being able to determine how and when the change manifests between groups of participants is important, since it can help pinpoint periods in the life course that are critical in the development of abnormal risk factor profiles [21]. However, there is little methodological guidance in the literature on statistical techniques to achieve this, and several studies have noted a lack of relevant methods to investigate trajectory divergence between groups [20,21,22].
A common attempt is to fit a mixed model with time (or age) treated as categorical variable (i.e. non timeordered/ordinated [23]) to retrieve linear predictions at each age for each group of interest from this model (i.e. means of least squares predictions, aka LSmeans [24,25,26]), and to test for a group difference in these predictions using a number of contrasts (i.e. posthoc pairwise comparisons). In this case, the age at which the difference betweengroups emerge is often the age at which a significant betweengroup difference materializes in the LSmeans [23, 27, 28]. Several studies have used this approach to determine at “what times the groups means are different” (e.g. betweensubject effect or posthoc pairwise group comparison, if there are more than two groups) and/or ‘at what times the means differ’ within each group (withinsubject effect testing) [27, 29, 30]. However, even when adjustments are applied for multiple tests [27, 31,32,33], many authors advise against the unrestricted use of multiple comparisons among marginal means due to welldocumented multiple testing issues, especially the increase in false positive rate as the number of hypothesis tests increases [30, 34–37]. Mixed models that assume an unstructured mean response by treating age or time as categorical variables tend to be over parameterized and may be inefficient at detecting main effects [38]. Another crucial disadvantage of this approach, is that it only tests for the difference in means between groups at each time point and does not provide any information on subjectspecific response evolution in time [13, 39], so that the age (or point in time) at which the group difference manifests is ultimately a question of sample size and statistical power.
In contrast, continuous time models such as individualbased trajectory modeling methods, including mixed effect [12], hierarchical [40], multilevel [41] and the closely related structural equation and Latent Growth Curve models [42], have become invaluable tools to understand the natural history of health outcome as well as risk factor/determinant trajectories [14, 43,44,45]. They have advantages over traditional approaches to repeatedmeasure data analysis; their main feature being that they allow summarizing each participant’s outcome trajectory with a few trajectory parameters [39, 46]. In addition, they permit the explicitly modeling of interindividual differences in intraindividual change, permitting inference regarding the average response trajectory over time and how this evolution may vary with participant characteristics (i.e. participantlevel predictors) [47,48,49,50].
Despite their flexibility, these models are not often used to analyse sparse longterm observational data since accelerated longitudinal designs [14, 22, 51] and nonlinear response overtime [44, 52,53,54] both introduce significant complexity into the growth curve modeling approach [55,56,57,58]. Indeed, being able to represent nonlinear patterns with a relatively small number of measurement occasions per participants (often <10 time points) and be specific about where betweenparticipant heterogeneity appears in those patterns is a statistical challenge.
Many applications often relied on higher order time (or age) polynomials or latent basis coefficients [14, 20, 44, 59,60,61], which strengths and limitations have been described elsewhere [9, 46, 62,63,64,65]. In the context of our study the polynomial parameterisation of the growth model does not specifically yield an age or point in time when the growth pattern is changing withinand betweengroups. Alternatively, piecewise models, also known as linear splines or broken stick models, can be used to break up a nonlinear or curvilinear growth trajectory into several separate linear components [66]. They are particularly useful to compare growth rates in different periods over time if the functional form of the response is characterised by different phases of development, or if there is a shift in the outcome trajectory at some point in the event window (i.e. an acceleration or a deceleration in the response change rate from one point in time (or age)) [67,68,69,70,71,72,73]. Piecewise linear trajectory models have been used to model ‘multiphase’ developmental processes primarily with ‘fixed’ transition points in a variety of applications in the frequentist multilevel [40, 45, 69, 74, 75] and structural equation modeling framework [42, 76]. Bayesian applications of these processes are often referred to as ‘random change point model’ where the position of individual breakpoints is also estimated, allowing for betweenperson variability in the transition points [77,78,79,80,81,82,83,84,85,86].
Few studies have, however, investigated the inclusion of categorical covariates or grouping variables as level 2 predictors of the variability in the change point, and the random Bayesian change point model has not, to our knowledge, been formulated to test specifically for the existence of a ‘trajectory divergence’ between two (or more) known groups of participants that have longitudinal responses characterised by distinct developmental phases. In this paper we illustrate the use of Bayesian hierarchical piecewise regression modeling to detect trajectory divergence between groups of participants using longitudinal BMI data from the Cardiovascular Risk in Young Finns (YFS) Study, a well phenotyped prospective cohort with measures from multiple timepoints. Previously published work on this data, based on categorical mixed modeling, suggested that BMI levels became statistically different between those who develop T2DM in adulthood and those who did not from the age of 15 years [87]. We reanalyse this data set to demonstrate how the Bayesian method can be used to (1) model the BMI profiles to better understand the natural history of the BMI trajectories in those who do and do not develop T2DM in adulthood while controlling for potential cohort effects, and (2) obtain a refined estimate and confidence interval of the age at which the two groups start diverging from one another, translating into significantly different BMI from a certain age onwards. In addition, we conduct a series of short simulations to illustrate the difference in the estimates of age at divergence when using the traditional approach (i.e. pairwise comparisons of marginal means from a categorical mixed model) vs. the proposed trajectory modeling approach.
Methods
Statistical model
Nocovariate model
To accommodate the curvilinear developmental pattern in an individual continuous response over time while providing an adequate representation of its developmental theory, we consider a linearlinear piecewise regression model as the functional form of change in the trajectory model. The ‘change point’ (CP) represents the age (or time) at which the transition to a different growth rate occurs. We consider the following unconditional (no covariates) multilevel model:
Level 1 model:
Level 2 model:
Where at age j for participant i, Response _{ ij } is the repeated continuous outcome measures, and age _{ ij } is the corresponding time related variables centered around its grand mean. \( {u}_{C{ P}_i}\left( ag{e}_{i j}\right) \) is a unit heavyside step function where \( {u}_{C{ P}_i}\left( ag{e}_{i j}\right) \) =1 if age _{ ij } ≥ CP _{ i } and \( {u}_{C{ P}_i}\left( ag{e}_{i j}\right)=0 \) if age _{ ij } < CP _{ i }. The random trajectory parameters b _{0i }, b _{1i } and b _{2i } correspond to the individual intercept, slope before and slope after the personspecific change point CP _{ i }, respectively. For each person i, b _{0i } controls the individual baseline level (or initial status) for the response and its interpretation depends on the centering of the age variable (e.g. if age is centered around 25 years, b _{0i } will be the expected participantlevel response at 25 years of age given they are in the first phase of growth b _{1i }). b _{1i }, b _{2i } and CP _{ i }, are the expected linear increase per year of age in the first phase of growth, the expected linear rate of increase after the change point, and age at which the linear perturbation to the initial trend occurs, respectively. ε _{ ij } is the level1 residual (i.e. random withinperson error for person i at age j) and is independent and normally distributed (i.e. ε _{ ij } ~ iid N(0, σ _{ e } ^{2} )). v _{0i }, v _{1i }, v _{2i } and v _{ CPi } are the level2 random effects, multivariate normally distributed with zero mean and variances σ _{ v0} ^{2} , σ _{ v1} ^{2} , σ _{ v2} ^{2} and σ _{ CP } ^{2} respectively and full covariance matrix as shown in 1.3 . β _{00}, β _{10}, β _{20} and CP are the fixed effects (i.e. population average of each trajectory parameter). In this model, the level 1 residual variance σ _{ e } ^{2} can be interpreted as the deviations around an individual’s trajectory and level2 residuals as betweenparticipant variability in the overall intercept (σ _{ v0} ^{2} ), in the rate of change before and after the change point CP _{ i } (σ _{ v1} ^{2} and σ _{ v2} ^{2} respectively), and in the change point itself (σ _{ CP } ^{2} ), respectively.
Model with groupeffect
To explore heterogeneity in individual trajectories between groups of interests, the unconditional segmented growth model can be expanded by including timevarying covariates (TVCs) at level1 and time invariant covariates (TICs) at level 2, while simultaneously adjusting for the effects of variables measured on participants at all time points. Whereas TICs directly predict the growth parameters, TVCs directly predict the repeated measures while controlling for the influence of the growth parameters [43, 88]. If the TIC variable is a binary dummy grouping factor (“GRP_{ i }”), identifying participants coming from two identified groups, the model can be rewritten as follows:
Level 1 model:
Level 2 model:
Where β _{00}, β _{10}, β _{20} and CP are the expected trajectory parameters for the reference group (at zero values for other potential covariates); β _{0grp }, β _{1grp }, β _{2grp } and β _{ CP } are the expected intergroup variations in these parameters for participants in the second group (i.e. respectively, in the mean response, in the linear age effect, in the deviation from linear rate after the CP and in the CP timing); and v _{0i }, v _{1i }, v _{2i } and v _{ CPi } are the level2 residuals person i for intercept, slopes, and age at the change point after controlling for group differences. To test for a betweengroup difference in one trajectory parameter only, ‘GRP’ can be included as a level2 predictor for the parameter of interest, and model all other growth parameters as random effects only (as in 1.2). For each of the p + 1 individual growth parameters, additional participantspecific covariates (TICs) can be included in a similar fashion to have multiple predictors at level 2 as follows: \( {b}_{p i}={\beta}_{p0}+{\displaystyle \sum_{q=1}^{Qp}}{\beta}_{p q}{x}_{q i}+{u}_{p i} \), with x _{ qi }, the q^{th} measured TIC; β _{ pq },the effect of the TIC x _{ qi } on the (p + 1) ^{th} trajectory parameter; and u _{ pi }, the (p + 1) ^{th} random effect. The set of p + 1 random effects for person i assumed to be multivariate normally distributed with covariance matrix of dimension (p + 1) * (p + 1), although simpler variancecovariance structures of the random effects can be considered during model building (i.e. mutual independence of the random effects). It is advisory to standardize TVCs in order to stabilize the variance, improve normality of errors and linearity of the mean [88]. The common assumption for the error structure is ε _{ ij } ~ iid N(0, σ _{ e } ^{2} ) but it can be relaxed to include time specific variances or residual error correlation such as AR1 errors.
The same approach can be used to expand the hierarchical piecewise trajectory model with grouping factors that have more than 2 levels. This is one of the possible approaches to test for a cohorteffect on the development of curvilinear responses over time when data arises from multicohort or accelerated longitudinal designs [89, 90]. If study participants belong to one of k possible birth cohorts, k1 binary dummy variables are created to identify observations coming from people born in the same calendar year, and as in 2.2, these new k1 grouping variables are introduced as level 2 predictors of the different trajectory parameters in the model. The binary dummy variables are introduced to sequentially shift the conditional means of each of the different trajectory parameters. The fixed effects will be the average trajectory parameters for the cohort chosen as the reference cohort in the study sample, and each (β _{ cohort })_{1.. k − 1} coefficient will thus be interpreted as the variation in growth parameters in the corresponding k1th cohort compared to the reference cohort.
Trajectory divergence mechanisms
The equation 2.2 above, allows for betweengroup difference in each of the 4 trajectory parameters of the piecewise model. (i.e. intercept, slope before and after the change point (CP), and the change point itself). If the focus is to determine and model the divergence in the trajectories between group, then model 2.2 can be modified by forcing the intercept and slope before the CP to be invariant across groups by setting β _{0grp } and β _{1grp } to zero at level 2 in equation 2.2. As illustrated in Fig. 1, we identify 3 possible ways in which continuous outcomes trajectories can diverge over time between groups: (1) type 1: the two groups have different slope after the CP, (2) Type 2: the two groups have different change points, and (3) Type 3: the two groups have different CP and postCP slopes. To test for groupdifference at different stages of the outcome development, our approach consists in fitting these 3 possible conditional Bayesian hierarchical models to the data and comparing model fit to determine which mechanisms provides the best representation of the underlying development of the outcome between groups of participants.
Bayesian estimation of the hierarchical piecewise model
We used a Bayesian approach to estimate and summarize the parameters of interest in the conditional multilevel piecewise model (formula 2.2) [86, 91]. In our illustrative example, all models were fit in RJAGS and R2JAGS in R.. In combined Bayesian notation, the trajectory model with a binary grouping status ‘GRP’ as the TIC covariate interacting with all 4 trajectory parameters can be written as follows:
To ensure that the effect of ‘group’ on each trajectory parameter can be either positive or negative and that the prior information does not dominate the likelihood, uninformative priors for the fixed group effects β _{0grp }, β _{1grp }, β _{2grp }, CP _{ grp } can be set as N ~ (0, 10^{4}). In vector notation, the random effects v _{ i } = (v _{ oi }, v _{1i } , v _{2i } , v _{ cpi } )^{T} are assumed to follow a multivariate normal distribution with mean β and unstructured 4 x4 variancecovariance matrix φ as in 1.3, where β = (β _{0}, β _{1}, β _{2}, CP)^{T}, the vector of population means. Traditionally in Bayesian analysis for random effects, InvWishart(Σ,k) is used as a conjugate prior to the unknown variancecovariance matrix of multivariate normal distributions, where Σ is a positive definite inverse scale matrix of degree of freedom k [93]. InverseGamma (λ_{1}, λ_{2}) is often used as the conjugate prior to the variance of univariate normal distribution (i.e. for mutually independent random effects, and model error variance σ ^{2}). Alternative prior distributions may be used for level 2 variances of independent random effects or for the variance components of multivariate normal distributions [41, 92, 93].
Significance of groupdifferences in trajectory parameters
Testing for groupdifferences in trajectory parameters is equivalent to investigating the significance of the grouping covariates parameters at level 2 in the hierarchical change point model. In the Bayesian context, this is done by looking at the posterior probability density for the " β _{ grp }” parameters in 2.2. (i.e. β _{0grp }, β _{1grp }., β _{2grp }.and β _{ CP }) of the estimated covariate parameters. For example, the effect of ‘GRP’ on each trajectory parameter is significant if the 95% Bayesian credible intervals (CI) of the estimated regressors (i.e. each “β _{ grp }”) exclude zero, in which case, the estimated “β _{ grp }” can be interpreted as the shifts in each trajectory parameter in one group compared to the other group [77, 92, 94].
Model convergence, fit and adequacy
The choice of the best model among the suite of candidate (conditional) Bayesian hierarchical models can be based on two criteria: (1) the deviance information criterion DIC [95, 96], an index of quality of fit that is commonly used for Bayesian model comparison [97], and (2), the Bayesian posterior predictive pvalue (PP pvalue), obtained through posterior predictive checking of the likelihood of each potential model (the sum of residuals was used as a as a discrepancy measure) [41].
Illustrative data
We illustrate the application of the proposed Bayesian piecewise modeling approach by using it to investigate the divergence in childto adult trajectories of BMI between participants who do and do not develop adult T2DM in a wellstudied ongoing populationbased prospective cohort, the Cardiovascular Risk in YFS [15]. Details on study design and on the collection of cardiovascular risk factors between 1980 and 2011 are published elsewhere [98] and summarized in Additional file 1.
In a previously published work on the YFS cohort, elevated BMI in children between 9 and 18 years was associated with an increased risk of developing T2DM in adulthood [87]. Additionally, a sex and insulinadjusted mixed model incorporating participants ages as a categorical variable, suggested that differences in average BMI values between those who do and those who do not develop adult T2DM tended to emerge during adolescence, becoming marginally significant from the age of 15 years onwards. In this approach, the betweengroup difference at each age groups was assessed by pairwise comparisons of the predicted marginal means (i.e. LSmeans), and did not incorporate BMI trajectory information at the individual or population level. In contrast, the proposed hierarchical piecewise regression approach considers and makes full use of individual trajectory information to test for groupdifferences at specific stages of BMI development from childhood to adulthood. Unlike categorical approaches, the proposed growth model provides a clearer representation of the underlying pathological BMI development among those who develop T2DM in adulthood.
In our illustration, we include 2540 YFS participants (1401 females and 1139 males) followedup a maximum of six times between 1980 and 2011 (Additional file 2: Table S1). Information on adult T2DM status was collected on participants at their latest individual adult followup (i.e. dichotomous outcome coded 0 for participants without T2DM, and 1 for those with T2DM in 2001, 2007, or 2011). Included participants had at least one BMI measure available in childhood (i.e. in 1980, 1983 or 1986 between age 3 and 18 years). Participants had on average 4.98 repeated measures of BMI over the study period, with 90.7% of participants having 4 or more BMI measures (Additional file 2: Figure S1). 88 included participants (3.5%, 44 females and 44 males) had T2DM in adulthood. We excluded BMI observations made among those aged 3 years in 1980 so that the ages of participants considered in the trajectory analysis ranged from 6 to 49 years. 3 year olds were not included in the analysis since only 3 participants in this birth cohort developed T2DM in adulthood. Furthermore, the lack of BMI measures between 3 and 6 years, prevented modeling the downwards slope from infancy peak, nor the age at adiposity rebound, which usually occurs before age 6 years in normal weight children [99, 100]. Using BMI data collected on participants aged 6 years and over, we expect that most participants had reached this important childhood milestone, and that a linear trend was thus an appropriate functional form to approximate childhood BMI growth from that age (Additional file 3: Figure S1).
Since sex differences in childhood growth and pubertal timing have been demonstrated [101, 102], subsequent BMI trajectory modeling between age 6 and 49 years was conducted among males and females separately [103]. BMI, especially in adulthood, is slightly right skewed, but using log10 transformed BMI in the modeling approach presented below did not alter our conclusions. For ease of interpretation, we thus present results using untransformed BMI only.
Visual inspection of the sexspecific smoothed BMI trajectories confirms the presence of a divergence between the two groups in adolescence (Additional file 3: Figure S2). Compared to participants who remain healthy, those who develop T2DM seem to have greater average BMI levels by the time they are young adults, although it is unclear whether this divergence results from a groupdifference in the timing at which the transition to a slower BMI growth rate happens (Type II divergence) from a groupdifference in rate itself after puberty (Type I divergence), or from both (Type III divergence).
Although the distal outcome of ‘adult T2DM’ is the grouping factor of interest in our illustrative trajectory divergence analyses, we also demonstrate how the same modeling approach can be used to investigate potential intercohort variation in childhood to adulthood BMI trajectories by considering models with ‘year of birth’ as a categorical level 2 predictor of each of the 4 trajectory parameters. Individual age and sexspecific BMI Zscores at the first clinic (in 1980) were also included as level 2 predictors of each BMI trajectory parameters to investigate if systematic deviation from participants of comparable age and sex at baseline had any influence on the development of BMI trajectories later in life. All continuous covariates used in the analyses were standardized in order to stabilize the variance, improve normality of errors and linearity of the mean.
Specific values for the hyperparameters used in our illustrative analyses are given in Additional file 4. While in principle φ can be unstructured, in our application, convergence for some parameters could not be reached when considering an unrestricted covariance structure between all four random effects in the unconditional change point model (equations 1.1 and 1.2), probably due to over parameterisation. Because initial analyses suggested a correlation between the slopes before the change point (b_{ 1i } ) and the difference in slopes after the change point (b_{ 2i }), we constrained the model by including a nonzero correlation between these two random effects but setting independence for all other random effects, leading to a block diagonal structure of φ (Additional file 4). Based on DIC, this covariance structure was preferred over mutually independent random effects for both males and females (Additional file 4), and used when expending the trajectory models with level 2 predictors. In our application, we investigated prior sensitivity by fitting the unconditional BMI trajectory model using three sets of priors for the hyperparameters (Additional file 4). Because we found that the choice of hyperparameters had a minor influence on the marginal posterior distributions, for subsequent (conditional) analyses, we chose to report posterior estimates of parameters estimated from the set of priors that yielded the lowest DIC in the sensitivity analyses (Additional file 4). In this set the priors for the means of the change points were based on the sexspecific estimates that maximized the profile log likelihood for the fixed (populationaverage) breakpoints in the unconditional model (estimated at 16 years for females and 22 for males, see estimation method in Additional file 5). Using these priors for the change point means also kept computation running times reasonable.
For the other participants varying variable included in the analysis, sex and agespecific BMI zscores at the first visit and birth cohort, priors were set to N ~ (0,0.001) for all corresponding parameters (i.e. all β _{ cohort } priors and β _{ initialBMI − z − score }). To remain consistent with previous analyses of this data set [87], timevarying measures of fasting insulin were logtransformed and standardized before being included as a level1 predictor in the Bayesian hierarchical models to improve right skewedness and to linearize its relationship with BMI About 17% of the insulin measures were not available in the data. The missing data mechanism for ‘insulin’ was considered noninformative, as we have no reason to believe that the probability of an individual insulin measure being missing depends on the true value of this missing insulin observation (although it may be related to other observed variables for that individual). We thus consider that insulin is missing at random (MAR), and we specify a prior for this covariate [104]. Since log (insulin) is approximately normally distributed, we specify a N ~ (μ _{log(insulin)}, τ _{log(insulin)} ) likelihood for log(insulin) _{ i } and place a vague prior on its variance (i.e. τ _{log(insulin) ~ Gamma(0.001,0,001)} ). Under this parametrization, the posterior predictive distribution for μ _{log(insulin)} and τ _{log(insulin)} will be informed by the observed part of the data only. Although individual insulin measurements change at each data collection point, by adding log(insulin) as a level 1 covariate in the multilevel model, the estimated relationship between insulin and BMI development remains constant across time [45]. This is a reasonable assumption in our application, since data exploration did not suggest any systematic patterns of change in insulin levels at the intraindividual level as people age. That is, the age smoother estimate obtained by fitting a generalized additive mixed model had an estimated degree of freedom (edf) close to 1 and was not significant (pvalue >0.3), which did not suggest a nonlinear relationship between log(insulin) and age [105].
Approximate posterior distributions of the parameters of the models considered throughout the analyses are obtained via MCMC simulations. Each model ran with 4 independent parallel chains of the Gibbs sampler (see Appendix 3 for an example of code). For each model, the first 50000 iterations were discarded in a burnin run, and the draws from the posterior were thinned by a factor of 10 to reduce serial correlation of the chains. The following 20000 iterations were used to obtain posterior distributions of model parameters by mixing the 4 sequences. Model convergence was assessed through MCMC iterations traceplots and GelmanRubin diagnostic [92], and residual errors were plotted to confirm they approximately followed a normal distribution.
Results
Divergence of BMI profiles in T2DM and nonT2DM YFS participants
Following the modeling approach presented in the Methods and the priors and their corresponding hyperparameters (Additional file 4: Table S1) we fitted the following set of conditional Bayesian hierarchical piecewise models for each sex: unconditional (Model A), adult T2DM status adjusted intercept (Model B), adult T2DM status adjusted childhood slope (Model C), adult T2DM status adjusted adult slope (Model D), adult T2DM status adjusted change point (CP) (Model E), adult T2DM status adjusted CP and adult slope (Model F), adult T2DM status adjusted change point, childhood and adult slopes (Model G), adult T2DM status adjusted intercept, and change point (Model G), and a model with all four parameters adjusted for adult T2DM status (Model H). As mentioned above, previous research on this data set suggested BMI levels were not significantly different between the two groups in childhood [87]. Models C (i.e. group difference in childhood slopes) and B (i.e. BMI response consistently higher in one group across the life course) were thus fitted to demonstrate our modeling approach. An annotated extract showing the RJAGS code syntax used to fit Model E is available in Additional file 6.
For both sex, the lowest DIC was obtained when fitting model E, which was also the best fitting model with PP pvalues close to 0.5 (Table 1). This supported the type II divergence mechanism where a difference in BMI levels emerged between the two groups due to a group difference in the change point timing. BMI growth rate in adulthood for both sexes was decreased by twothirds compared to childhood (i.e. 0.67 vs. 0.18 , and 0.61 vs. 0.15 kg/m^{2} per year in childhood and adulthood for females and males respectively), and participants who developed T2DM had similar BMI yearly rates in adulthood compared to those who remained healthy (β_{2T2DM} effect not significant in model F for both sex Table 2). However, females who developed T2DM reached their developmental transition in BMI rate on average 12.37 years later (Table 2).
Similarly for males, estimated BMI growth rates were not markedly different between the two T2DM groups in childhood or in adulthood, and comparable to those estimated in females (Table 2). But again, compared with healthy adults, those who developed T2DM reached their slower BMI growth rate on average 6.47 years later.
The effect of the timevarying covariate of insulin at level 1 was significant for both males and females, with a 1sd increase in log(Insulin) resulting in a BMI observation increased by 2.6 and 2.8 kg/m^{2} respectively (i.e. exp(β _{log(insulin)}), Table 2). To assess potential differences in the magnitude of the insulin effect as a function of betweenperson characteristics, we expanded model E by including an interaction between ‘adult T2DM status’ and log(insulin). For each sex, the estimated parameters were not significant (95% CI included 0), suggesting that the effect of insulin on BMI was homogenous between the two groups and across genders.
The estimates of the variancecovariance parameters of model E showed that the correlation between an individual’s BMI growth rate in childhood and adulthood is equal to 0.61 for females and 0.47 for males, suggesting that children who have greater yearly BMI increase rates also have greater adult rates of increase (correlation estimated as: \( \raisebox{1ex}{${\sigma}_{\beta 1\beta 2}$}\!\left/ \!\raisebox{1ex}{$\sqrt{\sigma_{\beta 1}^2*{\sigma}_{\beta 2}^2}$}\right. \), Table 2). The betweenparticipant variation around the change point σ _{ CP } was comparable between males and females (Table 2).
Figure 2 shows the estimated populationaverage prototypical trajectories for each sex and T2DM group obtained from the estimated parameters for Model E, along with 100 trajectories predicted for each sex and T2DM group from Model E by Monte Carlo simulation. This illustrates a range of credible individual profiles generated under this model (see Appendix for code). For each sex and adult T2DM status group, Fig. 3. shows a box and whiskers plot of the estimated individual BMI slopes obtained from Model E after the average change point in the healthy group and before the T2DM groups reach their average CP (i.e. slopes between 16.02 and 28.4 years in females, and slopes between 21.62 and 28.09 years in males). Figure 3 illustrates that individual rates of change after puberty provides better discrimination of participants who went on to develop T2DM from those who did not, compared to punctual individual BMI levels at age 15 or 18 for females, and ages 21 and 24 for males (Additional file 2: Figure S2). While the distribution of BMI levels at age groups surrounding the age at divergence overlaps considerably (Additional file 2: Figure S2), individual slopes allow to differentiate participants who have switched to a rate consistent with a normal slowing down of BMI development after puberty, from those who are still on the trajectory of increasing BMI development consistent with the rate from childhood.
Effect of ageand sexspecific childhood Zscore on BMI trajectories
When including individual ageand sexspecific BMI zscores at the first clinic as continuous level 2 predictors of each of the four growth parameters in sexspecific Models E, the only significant effect observed was for the childhood BMI slope, with a 1sd increase in BMI zscore associated with a 0.056 (sd = 0.012) and a 0.038 (sd =0.009) increase in childhood (in kg/m^{2} per year) for male, and females respectively. This suggests that in the YFS sample, higher age and sexadjusted BMI at first visit in childhood were associated with faster BMI increase in childhood, but not with the age at transition in BMI development nor the change rate in adulthood.
Between cohort heterogeneity in BMI trajectories
To test whether ‘year of birth’ was associated with betweenparticipant heterogeneity in the development of BMI from age 6 to 49 years, five binary dummy variables identifying BMI observations of people born in different years (i.e. 62, 65, 71, 74 and 77) were introduced as level 2 predictors of BMI growth parameters in sexspecific models (with year 1971 as the reference level) (Table 3). Increasing the complexity of the model did not improve model fit for males, and the lowest DIC was obtained for the unconditional model (Model A, Table 3) suggesting that their lifecourse BMI trajectory is more stable across cohorts. For females, model E marks a significant improvement in model fit, suggesting that the most significant predictor of betweencohort variations resides in the timing of the CP, although the best model was obtained when adjusting for a cohort effect on both the adult BMI growth rate and change point. For each sex, the posterior mean parameter estimates of the best fitting model are presented in Additional file 7. The results show that most of the between cohort variation for females is due to slight trajectory differences in two specific birth cohorts: those born in 1968, who reached the transition to adult BMI growth rate on average 2.89 years later than the cohort (year of birth 1971), and those born in 1974, who had adult BMI yearly rates increased by 0.06 (e.g. adult slopes of 0.24 compared to 0.18 kg/m^{2} per year on average for the other 5 cohorts) (Additional file 7).
Simulations
A short series of simulations was conducted to compare difference in estimates of the age at which the groups diverge when using the proposed Bayesian piecewise growth modeling approach compared to a more traditional approach based on pairwise comparison of LSmeans estimated from a categorical mixed model. We simulated repeated measure data from a Type II divergence model (i.e. groupdifference in the change point timing only), using the posterior estimates of mean growth parameters for the model fitted for females (average parameters are set to: β _{ O } = 26.5, β _{1} = 0.67, CP = 16.02, = β _{ GroupCP } =12.37, β _{2} = −0.49, matching Model E posterior estimates for females in Table 2), and both a participantlevel random effect (σ _{ error } ^{2} = 2.77) and an observationlevel residual error (σ _{ error } ^{2} = 2.47). Under this model, “CP”, the change point for the first group to depart from the populationaverage childhood slope represents the age at which the two groups of participants diverge in their outcome trajectories (i.e. the second group maintains this rate of change for 12.30 years longer). To resemble the YFS BMI data, we randomly sampled baseline ages from the YFS cohort subtracted by 25 years as ages at the first visit for each participant, with 6 nonmissing repeated measures 3, 6, 9, 21, 27 and 31 years later for each participant. We considered 3 scenarios of sample sizes for the number of participants in each group (group 1/group 2): (1) 100/100, (2) 50/100, and (3) 30/100. For each of the three scenarios, we simulated 100 datasets and fitted both a mixed model with age as a categorical variable and the Type II divergence Bayesian Hierarchical piecewise model using the set of priors defined in Additional file 4. For each piecewise model, we recorded the posterior estimate for the “CP” parameter, and for each fitted categorical mixed model, we applied pairwise comparison of the leastsquare means (LSmeans) with Tukey adjustment for multiplicity to retrieve: (1) the earliest age at which the groupdifference in means was found significant (p < 0.05), and (2) the midway point between two consecutive ages that had a minimum number of nonsignificant differences in means before and significant differences in means after the “midway point” method (2) is a potential alternative definition of age at which the groupdifference appears in the LS means. Compared to the “earliest age with pval <0.05” method (1), the “midway” point definition minimises the impact of simulations where some tests show significance at a young age, even though tests for the surrounding ages are not. For each scenario, estimated ages at divergence using the 3 methods were averaged across the 100 simulations. Figure 4 presents the simulation results in term of the quartiles distribution and means of these estimates of age at divergence across the 100 simulations. When sample size decreases for one group of participants, the pairwise LS mean comparison method will tend to overestimate the age at divergence, with significant variability in the estimates arising due to random variation, especially when age at significance is determined using the first age at which a pvalue <0.05 occurs (Fig. 4.). In contrast, the hierarchical Bayesian piecewise model was less sensitive to sample size, and the true age at divergence was consistently within the estimated interquartile range of the produced estimates, indicating that the Bayesian trajectory divergence model outperforms the LS mean method in both accuracy and precision, regardless of the way “age at divergence” is defined from the model output.
Discussion
Using the repeated BMI data from the YFS study, we demonstrated how Bayesian hierarchical piecewise regression (BHPR) modeling may be used to investigate betweengroup trajectory divergence in nonlinear longitudinal outcomes.
The nonlinearity in BMI development across the lifecourse is well documented in the literature, with changes in BMI corresponding to a number of identified developmental phases [101, 106, 107]. In particular, BMI rate decelerates after puberty once people reach their adult height, translating to a levelingoff of the BMI trajectory in adulthood [108, 109]. Although many recent applications have relied on such approaches [99, 102, 110,111,112,113], traditional polynomial parameterizations of growth curve models are not well suited to analyse BMI development [114, 115], especially if the focus is to identify transitional changes or determine divergence between groups.
In contrast, piecewise regression is particularly suited to model BMI across different lifestages as its parameters map onto what is known about the natural development of BMI over time [116]. Since ‘change points’ (i.e. milestones in the case of BMI) are model parameters in the piecewise model, there is no need to use elaborate techniques to retrieve these points of interest [59, 112, 117]. Piecewise models are also often preferable to more general continuous nonlinear models if the number of repeated measurements per participant is small (i.e. 3 to 6 data points each as in [16, 109]) as is often the case in longrunning observational prospective studies [99, 112]. Moreover piecewise multilevel regression models may be used to characterize the divergence mechanisms in nonlinear responses between groups by modeling change points as random parameters and introducing grouping factors as predictors of the betweenperson heterogeneity in responses over time.
Although our main goal was to characterize how and when the developmental patterns of BMI diverged between those who did and did not develop T2DM in the YFS, we also demonstrated the utility of the method to investigate cohort effects in the outcome response. Previous analyses of the YFS BMI and T2DM data considered categorical mixed models and tested for differences in the estimated BMI levels between the two T2DM groups at different ages by pairwise comparisons of the BMI predicted marginal means (i.e. LeastSquare means) averaged over sex while adjusting for multiple testing (i.e. Tukey adjustment). This approach suggested that from age 15 years, the T2DM group had significantly higher BMI levels than those without T2DM. However, these analyses ignored the potential confounding effect of birth cohorts, and each existing “age” was treated as a non timeordered categorical variable so that no inference could be made on individual or groupspecific agerelated BMI trajectories. Some age groups comprised those from up to five separate birth cohorts, while others only comprised those from a unique birth cohort (i.e. those aged 3 and 27 years). Having substantially fewer participants in one or both T2DM status groups at some age points results in a decreased power to detect a significant difference between groups (i.e. the observed difference at age 27 years was not significant in either sexaveraged or sexadjusted LSmeans, Additional file 8: Tables S1, S2 and Figure S1, S2). Because BMI development is known to progress differently in males and females, and the oldest and youngest cohorts in the YFS sample are almost a generation apart (~15 years), not taking these confounders into account may result in biased inferences. In fact, when estimating the LSmeans separately for each sex, the age at which the difference between T2DM groups becomes significant is not as clear since in males the difference is not significant at age 21 and 24 years, suggesting the true divergence in BMI between T2DM groups for males occurs more around those ages (Additional file 8: Table S2 and Figure S2).
In contrast, the method we illustrate here is not sensitive to sample size and uses developmental theory to inform a model that allows betweengroup differences in withinperson BMI trajectories at four possible levels for males and females to be examined (i.e. the overall BMI level, the childhood BMI growth rate, the adult BMI growth rate, and the age at which the transition between the two phases of change occurs).
Applied to the example data set, our method allowed us to characterise group differences in the nonlinear development of BMI and to identify a critical age window at which weight intervention programs might be best applied to help reduce or delay the incidence of T2DM in adulthood. Our findings support the theory that girls who keep on gaining weight at the same rate they did in childhood past the age of 16 years are more likely to develop T2DM in adulthood. Similarly, for males, the natural deceleration in BMI velocity occurs, on average, at 21 years of age. Those who stay on their childhood BMI trajectory past that age may be at increased risk of developing T2DM.
Longitudinal studies often aim to make inferences on differences among average population health marker trajectories. Typically, this involves comparing change rates (or slope differences) in healthy participants vs. those with pathological development, specific treatment conditions, or groups following certain lifestyle patterns [118–120]. Using our Bayesian hierarchical piecewise regression approach, serial measures of patient’s weight and height, often routinely collected in paediatric, general practice, and healthy or well child clinics, could be used to determine if an individual is on a path to an healthy adult weight status, or if their BMI trajectory places them in a category more susceptible to develop adult metabolic conditions such as T2DM.
Conclusions
Studying withinperson and betweenperson differences in the development of continuous outcomes as a function of age in longrunning multicohort observational studies is crucial to better understand the natural history of healthy vs. pathological risk factor profiles. Due to the typically unbalanced data designs, loss to followup and expected nonlinear responses, it remains methodologically challenging to analyse such data. When the substantial focus is on when and how two or more groups of participants grouped according to a distal dichotomous health outcome have diverged in their response trajectories, traditional parameterisations of curvilinear growth model do not allow to identify an age at which the group that developed the condition moved onto a different path compared to the group that remained healthy. In contrast, the hierarchical piecewise multilevel modeling enables the separation of multiple aspects of change in complex developmental processes such as individual and group differences in the rates of change at different periods, and potential heterogeneity in the timing at which individuals from identified groups enter each developmental phase, providing a powerful tool to help inform intervention The methodology we illustrate here focuses on a response with only one developmental change point, but it could easily be extended to more complex nonlinear responses with multiple transitions.
Abbreviations
 BHPR:

Bayesian hierarchical piecewise regression
 BMI:

Body mass index
 CI:

Credible interval
 CP:

Change point
 CVD:

Cardiovascular disease
 DIC:

Deviance information criteria
 MCMC:

Markov Chain Monte Carlo
 PP pvalue:

Posterior predictive pvalue
 T2DM:

Type 2 Diabetes mellitus
 TIC:

Timeindependent covariate
 TVC:

Timevarying covariate
 YFS:

Cardiovascular risk in Young Finn study
References
 1.
Power C, Kuh D, Morton S. From Developmental Origins of Adult Disease to Life Course Research on Adult Disease and Aging: Insights from Birth Cohort Studies. Annu Rev Public Health. 2013;34:7–28.
 2.
Færch K, Witte DR, Tabák AG, Perreault L, Herder C, Brunner EJ, Kivimäki M, Vistisen D. Trajectories of cardiometabolic risk factors before diagnosis of three subtypes of type 2 diabetes: a posthoc analysis of the longitudinal Whitehall II cohort study. Lancet Diabetes Endocrinol.1:43–51.
 3.
Kuh D, BenShlomo Y, Lynch J, Hallqvist J, Power C. Life course epidemiology. J Epidemiol Community Health. 2003;57:778–83.
 4.
Narayan KMV, Boyle JP, Thompson TJ, Gregg EW, Williamson DF. Effect of BMI on Lifetime Risk for Diabetes in the U.S. Diabetes Care. 2007;30:1562–6.
 5.
BenShlomo Y, Kuh D. A life course approach to chronic disease epidemiology: conceptual models, empirical challenges and interdisciplinary perspectives. Int J Epidemiol. 2002;31:285–93.
 6.
Dudina A, Cooney MT, Bacquer DD, Backer GD, Ducimetiere P, Jousilahti P, Keil U, Menotti A, Njolstad I, Oganov R, et al. Relationships between body mass index, cardiovascular mortality, and risk factors: a report from the SCORE investigators. Eur J Cardiovasc Prev Rehabil. 2011;18:731–42.
 7.
Chen Y, Copeland WK, Vedanthan R. Association between body mass index and cardiovascular disease mortality in east Asians and south Asians: pooled analysis of prospective data from the Asia Cohort Consortium. BMJ. 2013;347:f5446.
 8.
Freedman DS, Khan LK, Dietz WH, Srinivasan SR, Berenson GS. Relationship of childhood obesity to coronary heart disease risk factors in adulthood: the Bogalusa Heart Study. Pediatrics. 2001;108:712–8.
 9.
Twisk JWR. Applied Longitudinal Data Analysis for Epidemiology: A Practical Guide. 2nd ed. Cambridge: Cambridge medicine; 2013.
 10.
Diggle PJ, Heagerty P, Liang KY, Zeger SL. Analysis of Longitudinal Data, Oxford Statistical Science Series. 2002. ISBN 9780198524847.
 11.
Mattsson N, Ronnemaa T, Juonala M, Viikari JS, Raitakari OT. Childhood predictors of the metabolic syndrome in adulthood. The Cardiovascular Risk in Young Finns Study. Ann Med. 2008;40:542–52.
 12.
Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. New York: Springer; 2000.
 13.
Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G. Longitudinal data analysis. Boca Raton: Chapman and Hall/CRC; 2008.
 14.
Tu YK, Tilling K, Sterne JA, Gilthorpe MS. A critical evaluation of statistical approaches to examining the role of growth trajectories in the developmental origins of health and disease. Int J Epidemiol. 2013;42:1327–39.
 15.
Raitakari OT, Juonala M, Ronnemaa T, KeltikangasJarvinen L, Rasanen L, Pietikainen M, HutriKahonen N, Taittonen L, Jokinen E, Marniemi J, et al. Cohort profile: the cardiovascular risk in Young Finns Study. Int J Epidemiol. 2008;37:1220–6.
 16.
Li L, Hardy R, Kuh D, Lo Conte R, Power C. Childtoadult body mass index and height trajectories: a comparison of 2 British birth cohorts. Am J Epidemiol. 2008;168:1008–15.
 17.
Stuart B, Panico L. Earlychildhood BMI trajectories: evidence from a prospective, nationally representative British cohort study. Nutr Diabetes. 2016;6:e198.
 18.
Finkel D, Reynolds CA, McArdle JJ, Pedersen NL. Cohort differences in trajectories of cognitive aging. J Gerontol B Psychol Sci Soc Sci. 2007;62:P286–294.
 19.
Gimeno D, Ferrie JE, Elovainio M, PulkkiRaback L, KeltikangasJarvinen L, Eklund C, Hurme M, Lehtimaki T, Marniemi J, Viikari JS, et al. When do social inequalities in Creactive protein start? A life course perspective from conception to adulthood in the Cardiovascular Risk in Young Finns Study. Int J Epidemiol. 2008;37:290–8.
 20.
Howe LD, Tilling K, Galobardes B, Smith GD, Gunnell D, Lawlor DA. Socioeconomic differences in childhood growth trajectories: at what age do height inequalities emerge? J Epidemiol Community Health. 2012;66:143–8.
 21.
Carles S, Charles MA, Forhan A, Slama R, Heude B, Botton J, group Emcs. A Novel Method to Describe Early Offspring Body Mass Index (BMI) Trajectories and to Study Its Determinants. PLoS One. 2016;11:e0157766.
 22.
Grajeda LM, Ivanescu A, Saito M, Crainiceanu C, Jaganath D, Gilman RH, Crabtree JE, Kelleher D, Cabrera L, Cama V, Checkley W. Modeling subjectspecific childhood growth using linear mixedeffect models with cubic regression splines. Emerging Themes in Epidemiology. 2016;13:1.
 23.
Locascio JJ, Atri A. An overview of longitudinal data analysis methods for neurological research. Dement Geriatr Cogn Dis Extra. 2011;1:330–57.
 24.
Lenth RV. LeastSquares Means.R package version 2.22. In: Book LeastSquares Means.R package version 2.22. 2016.
 25.
Goodnight JH, Harvey WR. LeastSquares Means in the FixedEffects General Linear Models.Technical Report R103, Book LeastSquares Means in the FixedEffects General Linear Models.Technical Report R103. NC: SAS Institute Inc; 1978.
 26.
Harvey WR. Use of the HARVEY Procedure, Book Use of the HARVEY Procedure. NC: SAS Institute Inc; 1976.
 27.
Liu C, Cripe TP, Kim MO. Statistical issues in longitudinal data analysis for treatment efficacy studies in the biomedical sciences. Mol Ther. 2010;18:1724–30.
 28.
Diggle PJ, Heagerty P, L KY, Zeger SLO. Analysis of Longitudinal Data. 2nd ed. Oxford: Oxford University Press; 2002.
 29.
Kreft IGG. Are multilevel techniques necessary? An overview, including simulation studies, Book Are multilevel techniques necessary? An overview, including simulation studies. Los Angeles: California State University at Los Angeles; 1996.
 30.
Kowalchuk RK, Keselman HJ. Mixedmodel pairwise multiple comparisons of repeated measures means. Psychol Methods. 2001;6:282–96.
 31.
Sullivan LM. Repeated measures. Circulation. 2008;117:1238–43.
 32.
Greenhouse SW, Geisser S. On methods in the analysis of profile data. Psychometrika. 1959;24:95–112.
 33.
Huynh H, Feldt LS. Estimation of the Box Correction for Degrees of Freedom from Sample Data in Randomized Block and SplitPlot Designs. J Educ Behav Stat. 1976;1:69–82.
 34.
Li D, Dye TD. Power and stability properties of resamplingbased multiple testing procedures with applications to gene oncology studies. Comput Math Methods Med. 2013;2013:610297.
 35.
Westfall PJ, Young SS. ResamplingBased Multiple Testing. New York: Willey; 1993.
 36.
Pe'er I, Yelensky R, Altshuler D, Daly MJ. Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol. 2008;32:381–5.
 37.
Westfall PH, Tobias RD, Rom D, Wolfinger RD, Hochberg Y. Multiple Comparisons and Multiple Tests Using the SAS System, Book Multiple Comparisons and Multiple Tests Using the SAS System. NC: SAS Institute Inc; 1999.
 38.
Donohue MC, Aisen PS. Mixed model of repeated measures versus slope models in Alzheimer's disease clinical trials. J Nutr Health Aging. 2012;16:360–4.
 39.
Goldstein H, Browne W, Rabash J. Multilevel modeling of medical data. Stat Med 2002: 21:3291–315. Stat Med. 2002;21:3291–315.
 40.
Raudenbush SW, Bryk AS. Hierarchical linear models: Applications and data analysis methods. 2. Thousand Oaks: Sage Publications; 2002.
 41.
Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press; 2007.
 42.
Duncan TE, Duncan SC, Strycker LA. An Introduction to Latent Variable Growth Curve Modeling: Concepts, Issues, and Application, Second Edition. Mahwah: Lawrence Erlbaum Associates; 2006.
 43.
Curran PJ, Obeidat K, Losardo D. Twelve Frequently Asked Questions About Growth Curve Modeling. J Cogn Dev. 2010;11:121–36.
 44.
Hox J, Stoel RD. Multilevel and SEM approaches to Growth Curve modeling. In: Everitt BS, Howell DC, editors. Encyclopedia of Statistics in Behavioral Science, vol. 3. Chichester: John Wiley & Sons; 2005.
 45.
Singer JD, Willett JB. Applied Longitudinal Data Analysis: Modeling change and event occurence. New York: Oxford University Press; 2003.
 46.
Fitzmaurice G, Laird NM, Ware JH. Applied Longitudinal Analysis. 2nd ed. New Jersey: John Wiley & Sons; 2011.
 47.
Preacher KJ. Latent growth curve models. In: Muelle GRHRO, editor. The reviewer's guide to quantitative methods in the social sciences (pp 185–198). London: Routledge; 2010. p. 185–98.
 48.
Mirman D. Growth Curve Analysis and Visualization Using R. Boca Raton: Chapman and Hall/CRC; 2014.
 49.
Mirman D, Dixon JA, Magnuson JS. Statistical and computational models of the visual world paradigm: Growth curves and individual differences. J Mem Lang. 2008;59:475–94.
 50.
Heo M, Faith MS, Mott JW, Gorman BS, Redden DT, Allison DB. Hierarchical linear models for the development of growth curves: an example with body mass index in overweight/obese adults. Stat Med. 2003;22:1911–42.
 51.
Tilling K, MacdonaldWallis C, Lawlor DA, Hughes RA, Howe LD. Modeling Childhood Growth Using Fractional Polynomials and Linear Splines. Ann Nutr Metab. 2014;65:129–38.
 52.
Sterba S. Fitting Non linear Latent growth Curve models with Inidvidually varying Time point. Struct Equ Model Multidiscip J. 2014;21:630–47.
 53.
Cudeck R, du Toit SHC. A nonlinear form of quadratic regression with interpretable parameters. Multivar Behav Res. 2002;37:501–19.
 54.
Harring JR, Cudeck R, du Toit SH. Fitting Partially Nonlinear Random Coefficient Models as SEMs. Multivariate Behav Res. 2006;41:579–96.
 55.
Frenk SM, Yang YC, Land KC. Assessing the Significance of Cohort and Period Effects in Hierarchical AgePeriodCohort Models: Applications to Verbal Test Scores and Voter Turnout in U.S. Presidential Elections. Soc Forces. 2013;92:221–48.
 56.
Raudenbush SW, Chan WS. Application of a hierarchical linear model to the study of adolescent deviance in an overlapping cohort design. J Clin Consulting Psychol. 1993;61:941–51.
 57.
Yang Y, Land KC. A mixed models approach to the ageperiodcohort analysis of repeated crosssection surveys, with an application to data on trends in verbal test scores. Sociol Methodol. 2006;36:75–97.
 58.
Raudenbush SW, Chan WS. Growth Curve Analysis in Accelerated Longitudinal Designs. Criminology Penology. 1992;29:387–411.
 59.
Howe LD, Tilling K, Benfield L, Logue J, Sattar N, Ness AR, Smith GD, Lawlor DA. Changes in ponderal index and body mass index across childhood and their associations with fat mass and cardiovascular risk factors at age 15. PLoS One. 2010;5(12):e15186.
 60.
Long J, Ryoo J. Using fractional polynomials to model nonlinear trends in longitudinal data. Br J Math Stat Psychol. 2010;63:177–203.
 61.
Tilling K, Sterne JAC, Wolfe CDA. Multilevel growth curve models with covariate effects: application to recovery after stroke. Stat Med. 2001;20:685–704.
 62.
Liu X, Engel CE. Methods and Applications of Longitudinal Data Analysis. Elsevier Science Publishing Company Incorporated; 2015
 63.
Royston P, Altman DG. Regression using fractional polynomials of continuous covariates: parsimonious parametric modeling. Appl StatJ R Stat Soc. 1994;43:29–67.
 64.
Royston P, Ambler G, Sauerbrei W. The use of fractional polynomials to model continuous risk variables in epidemiology. Int J Epidemiol. 1999;28(5):964–74.
 65.
Verbeke G. Linear mixed models for longitudinal data, Book Linear mixed models in practice. New York: Springer; 1997.
 66.
Lawrence FR, Blair C. Factorial invariance in preventive intervention: modeling the development of intelligence in low birth weight, preterm infants. Prev Sci. 2003;4:249–61.
 67.
Howe LD, Tilling K, Matijasevich A, Petherick ES, Santos AC, Fairley L, Wright J, Santos IS, Barros AJ, Martin RM, et al. Linear spline multilevel models for summarising childhood growth trajectories: A guide to their application using examples from five birth cohorts. Stat Methods Med Res. 2013;4:249–61.
 68.
Cudeck R, Klebe KJ. Multiphase mixedeffects models for repeated measures data. Psychol Methods. 2002;7:41–63.
 69.
Muggeo MR, Atkins DC, Gallop RJ, Dimidjan S. Segmented mixed models with random changepoints: a maximum likelihood approach with application to treatment for depression study. Stat Model. 2014;14(4):293–313.
 70.
Flora DB. Specifying piecewise latent trajectory models for longitudinal data. Struct Equ Model. 2008;15:513–33.
 71.
Kohli N, Harring JR. Modeling growth in latent variables using a piecewise function. Multivar Behav Res. 2013;48:370–97.
 72.
Willett JB, Singer JD, Martin NC. The design and analysis of longtiduinal studies of development and psychopathology in context: statistical models and methodological recommendations. Dev Psychopathol. 1998;10:395–426.
 73.
Li L, Hardy R, Kuh D, Lo Conte R, Power C. ChildtoAdult body mass index and height ttrajectories : A comparison of 2 british cohorts. 2008.
 74.
Muggeo MR. Segmented mixed models with random changepoints in R. Working Paper. 2016. https://www.researchgate.net/publication/292629179_Segmented_mixed_models_with_random_changepoints_in_R.
 75.
Muggeo MR. Modeling temperature effects on mortality: multiple segmented relationships with common break points. Biostatistics. 2008;9:613–20.
 76.
Muthen B. Secondgeneration structural equation modeling with a combination of categorical and continuous latent variables: New opportunities for latent class–latent growth modeling. Washington: American Psychological Association; 2001.
 77.
Congdon P. Bayesian statistical modeling. New York: Wiley; 2001.
 78.
JacqminGadda H, Commenges D, Dartigues JF. Random changepoint model for joint modeling of cognitive decline and dementia. Biometrics. 2006;62:254–60.
 79.
Lange N, Carlin BP, Gelfand AE. Hierarchical Bayes models for the progression of HIV infection using longitudinal CD4 Tcell numbers. J Am Stat Assoc. 1992;87:615–26.
 80.
Slate EH, Turnbull BW. Statistical models for longitudinal biomarkers of disease onset. Stat Med. 2000;19:617–37.
 81.
Ghosh P, Vaida F. Random changepoint modeling of HIV immunologic responses. Stat Med. 2007;26:2074–87.
 82.
Dominicus A, Ripatti S, Pedersen NL, Palmgren J. A random change point model for assessing variability in repeated measures of cognitive function. Stat Med. 2008;27:5786–98.
 83.
Hall CB, Lipton RB, Sliwinski M, Stewart WF. A change point model for estimating the onset of cognitive decline in preclinical Alzheimer's disease. Stat Med. 2000;19:1555–66.
 84.
Hall CB, Ying J, Kuo L, Lipton RB. Bayesian and profile likelihood change point methods for modeling cognitive function over time. Comput Stat Data Anal. 2003;42:91–109.
 85.
Kiuchi A, Hartigan JA, Holford TR. Change points in the series of T4 counts prior to AIDS. Biometrics. 1995;51:236–48.
 86.
MunizTerrera G, Van den Hout A, Matthews FE. Random change point models: investigating cognitive decline in the presence of missing data. J Appl Stat. 2011;38:705–16.
 87.
Sabin MA, Magnussen CG, Juonala M, Shield JPH, Kähönen M, Lehtimäki T, Rönnemaa T, Koskinen J, Loo BM, Knip M, et al. Insulin and BMI as Predictors of Adult Type 2 Diabetes Mellitus. Pediatrics. 2015;135:e144–51.
 88.
McCoach DB, Kaniskan B. Using TimeVarying Covariates in Multilevel Growth Models. Front Psychol. 2010;1(17):1–12.
 89.
Miyazaki Y, Raudenbush SW. Tests for linkage of multiple cohorts in an accelerated longitudinal design. Psychol Methods. 2000;5:44–63.
 90.
Galbraith S, Bowden J, Mander A. Accelerated longitudinal designs: An overview of modeling, power, costs and handling missing data. Stat Methods Med Res. 2014;92:221–48.
 91.
Mostafa AA, Ghorbal A. Bayesian and nonBayesian analysis for random changepoint problem using standard computer packages. Int Mathematical Archive. 2011;2:1963–79.
 92.
Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2. Boca Raton: Chapman & Hall/CRC; 2004.
 93.
Gelman A. Prior distributions for variance parameters in hierarchical models. Bayesian Anal. 2006;1:515–33.
 94.
Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press; 2006.
 95.
Spiegelhalter DJ, Best NG, Carlin BP. Bayesian measure of model complexity and fit. J R Stat Soc Ser B. 2002;64:583–639.
 96.
Plummer M. Penalized loss functions for Bayesian model comparison. Biostatistics. 2008;9:523–39.
 97.
Burnham K, Anderson D. Model selection and multimodel inference: a practical informationtheoretic approach. New York: Springer; 2002.
 98.
Juonala M, Viikari JS, Raitakari OT. Main findings from the prospective Cardiovascular Risk in Young Finns Study. Curr Opin Lipidol. 2013;24:57–64.
 99.
Chivers P, Hands BP, Parker H, Beilin L, Kendall G, Bulsara M. Longitudinal modeling of body mass index from birth to 14 years. Obes Facts. 2009;2:302–10.
 100.
RollandCachera MF, Deheeger M, Maillot M, Bellisle F. Early adiposity rebound: causes and consequences for obesity in children and adults. Int J Obes (Lond). 2006;30 Suppl 4:S11–17.
 101.
Karlberg J. On the modeling of human growth. Stat Med. 1987;6:185–92.
 102.
Silverwood RJ, De Stavola BL, Cole TJ, Leon DA. BMI peak in infancy as a predictor for later BMI in the Uppsala Family Study. Int J Obes (Lond). 2009;33:929–37.
 103.
Kuczmarski RJ. 2000 CDC Growth Charts for the United States: methods and development. Vital Health Statistics1. 2002;11:1–190.
 104.
Lunn D, Jackson C, Best N, Thomas A, Spiegelhalter D. The BUGS Book: A Practical Introduction to Bayesian Analysis. Boca Raton: CRC press; 2012.
 105.
Wood S. Generalized additive models: an introduction with R. Boca Raton: CRC press; 2006.
 106.
Dietz WH. Critical periods in childhood for the development of obesity. Am J Clin Nutr. 1994;59:955–9.
 107.
Rogol AD, Clark PA, Roemmich JN. Growth and pubertal development in children and adolescents: effects of diet and physical activity. Am J Clin Nutr. 2000;72:521–8.
 108.
Ostbye T, Malhotra R, Landerman LR. Body mass trajectories through adulthood: results from the National Longitudinal Survey of Youth 1979 Cohort (1981–2006). Int J Epidemiol. 2011;40:240–50.
 109.
Li L, Hardy R, Kuh D, Power C. Lifecourse body mass index trajectories and blood pressure in mid life in two British birth cohorts: stronger associations in the laterborn generation. Int J Epidemiol. 2015;44:1018–26.
 110.
Guo SS, Huang C, Maynard LM, Demerath E, Towne B, Chumlea WC, Siervogel RM. Body mass index during childhood, adolescence and young adulthood in relation to adult overweight and adiposity: the Fels Longitudinal Study. Int J Obes Relat Metab Disord. 2000;24:1628–35.
 111.
Williams S, Davie G, Lam F. Predicting BMI in young adults from childhood data using two approaches to modeling adiposity rebound. Int J Obes Relat Metab Disord. 1999;23:348–54.
 112.
Wen X, Kleinman K, Gillman MW, RifasShiman SL, Taveras EM. Childhood body mass index trajectories: modeling, characterizing, pairwise correlations and sociodemographic predictors of trajectory characteristics. BMC Med Res Methodol. 2012;29:12–38.
 113.
Besharat Pour M, Bergstrom A, Bottai M, Magnusson J, Kull I, Moradi T. Age at adiposity rebound and body mass index trajectory from early childhood to adolescence; differences by breastfeeding and maternal immigration background. Pediatr Obes. 2016;30 Suppl 4:S11–17.
 114.
Royston P, Altman DG. Regression using fractional polynomials of continuous covariates: parsimonious parametric modeling. J R Stat Soc, Appl Stat. 1994;43:429–67.
 115.
McArdle JJ, Wang L. Modeling agebased turning points in longitudinal lifespan growth curves of cognition. New York: Routledge/Taylor & Francis Group edn; 2008.
 116.
Ram N, Grimm K. Using simple and complex growth models to articulate developmental change: Matching theory to method. Int J Behav Dev. 2007;31:303–16.
 117.
Tilling K, Davies NM, Nicoli E. Associations of growth trajectories in infancy and early childhood with later childhood outcomes. Am J Clin Nutr 2011;94 Suppl 6:1808s13s. Am J Clin Nutr. 2011;94:1808s–13s.
 118.
Magee CA, Caputi P, Iverson DC. Identification of distinct body mass index trajectories in Australian children. Pediatr Obes. 2013;8:189–98.
 119.
Ekberg J, Angbratt M, Valter L, Nordvall M, Timpka T. History matters: childhood weight trajectories as a basis for planning communitybased obesity prevention to adolescents. Int J Obes. 2012;36:524–8.
 120.
Huang DY, Lanza HI, WrightVolel K, Anglin MD. Developmental trajectories of childhood obesity and risk behaviors in adolescence. J Adolesc. 2013;36:139–48.
Acknowledgements
We thank the clinic and administrative staff for their contribution to data collection. Above all, we thank the participants of the Cardiovascular Risk in Young Finns Study.
Funding
The Young Finn study was financially supported by the Academy of Finland (grants 126925, 121584, 124282, 129378, 117797, 265877 and 41071), the Social Insurance Institution of Finland, the Turku University Foundation, Paavo Nurmi Foundation, Juho Vainio Foundation, Sigrid Juselius Foundation, Maud Kuistila Foundation, Research funds from the Kuopio, Turku and Tampere University Hospitals, the Finnish Foundation of Cardiovascular Research, the Finnish Medical Foundation, the OrionFarmos Research Foundation, and the Finnish Cultural Foundation. This work was partly funded by the National Health and Medical Research Council Project Grant (APP1098369). CGM was supported by a National Heart Foundation of Australia Future Leader Fellowship (100849).
Availability of data and materials
The dataset supporting the conclusions of this article were obtained from the Cardiovascular Risk In Young Finns study (YFS) after submission and approval of our study plan by the Young Finns Study coordinators. The YFS dataset comprises health related participant data and their use is therefore restricted under the regulations on professional secrecy (Act on the Openness of Government Activities, 612/1999) and on sensitive personal data (Personal Data Act, 523/1999, implementing the EU data protection directive 95/46/EC). Due to these ethical restrictions, the data from this study cannot be stored in public repositories or otherwise made publicly available. However, data access may be permitted on a case to case basis upon request only. Data requests should be addressed to the project coordinator, Dr. Olli Raitakari (Email:olli.raitakari@utu.fi), Tel: +358 2 333 7220 Fax: +358 2 333 7270. Written requests should be sent to the following postal address: Research Centre of Applied and Preventive Cardiovascular, Medicine University of Turku, Kiinamyllynkatu 10, FIN20520 Turku, FINLAND.
Authors’ contributions
Analyzed the data: MJB SSW. Wrote the paper: MJB CGM RJT. Designed the study: MJB CGM RJT OTR. Collected the clinical data: JSAV TL MJ NHK OTR. Revised several drafts of the manuscript for critically for important intellectual content: DPB MAS JSAV, NHK, TL MJ CGM RJT OTR. Reviewed and approved the final manuscript: MJB SSW CGM RJT JSAV TL MJ NHK DPB MAS OTR.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
YFS Participants or their parents provided written informed consent, and the study was approved by local ethics committees (The Ethics Committee of the Hospital District of Southwest Finland) in agreement with the Declaration of Helsinki.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Author information
Additional files
Additional file 1:
Additional information on BMI, T2DM status and fasting insulin information collection in the YFS subset used in the illustrative analyses. (DOCX 17 kb)
Additional file 2:
Subset of the YFS cohort used for the BMI trajectory analysis. Reported are the total number (No.) of participants seen at each clinic year and their ages (Figure S1.) Density plot of the number of BMI measures per YFS participants in the subset of the cohort used for the BMI trajectory analysis (Figure S2.) and average BMI values in kg/m^{2} at each age stratified by T2DM group (pink, no adult T2DM; blue, Adult T2DM), with error bars representing the mean BMI ± SD (standard deviation) (Figure S3.) (DOCX 180 kb)
Additional file 3:
Spaghetti plot of the individual trajectories of those with T2DM in adulthood (N = 88) and those who did not develop T2DM in adulthood (N = 2452). Red solid line: loess smoother curve indicating the average longitudinal trend in each group (Figure S1.) and scatterplot of the lifecourse BMI data (in kg/m^{2}) stratified by sex. Solid lines and gray bands: loess smoothed average trajectories and confidence intervals for each group (adult T2DM vs. nonT2DM group); dashed lines: agespecific averages of BMI levels (Figure S2.) (DOCX 1918 kb)
Additional file 4:
Prior sensitivity analyses methods and results. (DOCX 25 kb)
Additional file 5:
Loglikelihood profiling method and Rcode for the choice of priors of the changepoint mean. (μ_{cp). (DOCX 14 kb)}
Additional file 6:
Annotated RJAGS sample code to fit a type 1 trajectory divergence model with a fully unstructured 4 by 4 covariance matrix for the random growth parameters. (DOCX 14 kb)
Additional file 7:
Posterior mean parameter estimates for best fitting birth cohortadjusted trajectory model for each sex (Model E for females and Model A for males). (DOCX 16 kb)
Additional file 8:
Results of mixed models with age as a categorical predictor and loginsulin as a continuous predictor: LS means contrasts (No adult T2DM vs. adult T2DM) and significance at each age averaged over (Table S1.) or adjusted for the levels of sex (Table S2.) and pairwise comparisons of Leastsquare means of BMI and 95% CIs at each age in each T2DM status group averaged over levels of sex (Figure S1.) at each age in each T2DM status group and sex group combination (M = males, Ffemales, 1 = No adult T2DM, 2 = adult T2DM) (Figure S2.) and adjusted for log(insulin). (DOCX 549 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Buscot, M., Wotherspoon, S.S., Magnussen, C.G. et al. Bayesian hierarchical piecewise regression models: a tool to detect trajectory divergence between groups in longterm observational studies. BMC Med Res Methodol 17, 86 (2017) doi:10.1186/s1287401703589
Received
Accepted
Published
DOI
Keywords
 Piecewise model
 Hierarchical regression
 Nonlinear trajectory model
 Accelerated longitudinal design
 Cohort effect
 Group divergence