 Research article
 Open Access
 Published:
Clusterbased exposure variation analysis
BMC Medical Research Methodology volume 13, Article number: 54 (2013)
Abstract
Background
Static posture, repetitive movements and lack of physical variation are known risk factors for workrelated musculoskeletal disorders, and thus needs to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new clusterbased method for analysis of exposure variation.
Methods
For this purpose, we simulated a repeated cyclic exposure varying within each cycle between “low” and “high” exposure levels in a “near” or “far” range, and with “low” or “high” velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a “small” or “large” standard deviation of the cycle time. Theses parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity.
Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA, and a novel clusterbased EVA (CEVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) CEVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined.
Results
CEVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate and multivariate), and CEVA, respectively (p < 0.001). All three methods performed poorly in discriminating exposure patterns differing with respect to the variability in cycle time duration.
Conclusion
While CEVA had a higher accuracy than conventional EVA, both failed to detect differences in temporal similarity. The datadriven optimality of data reduction and the capability of handling multiple exposure time lines in a single analysis are the advantages of the CEVA.
Background
Workrelated musculoskeletal disorders (WMSD) are major problems in the industrialized world, at the individual, company and societal levels [1]. Risk factors of WMSD are multidimensional in the sense that individual, physical as well as psychosocial factors play a role in the development of these disorders [2]. Constrained postures, repetitive movements, heavy manual handling, lack of variation and insufficient recovery, are often cited as physical risk factors [3]. The extent and structure of variation of biomechanical exposure across time is generally accepted to be an important determinant of the risk of contracting WMSD [4]. Thus, it has been hypothesized that increased exposure variation may prevent WMSD development in jobs characterized by longterm exposure to constrained postures and/or repetitive movements [5–7]. This calls for the development of methods that can quantify biomechanical exposure variation within and between subjects at work [7, 8].
Biomechanical exposure in occupational settings has been described to comprise three basic, conceptual dimensions, i.e., level (amplitude), duration and repetitiveness (frequency), the latter being closely associated with velocity and acceleration when postural exposure is of interest [9, 10]. Exposure “variation”, in turn, has been defined as the change in the exposure over time [7], including three basic aspects: the range of exposure changes, the repetitiveness (frequency) of those changes, and the extent of temporal similarities, or recurring patterns, in the exposure time line.
A computational framework, the exposure variation analysis (EVA), has been suggested to quantify variation [11]. EVA quantifies the accumulated proportion of recorded time that the exposure level remains uninterruptedly within predetermined limits (“exposure level” categories) for predetermined periods of time (“sequence duration” categories). In ergonomics studies, EVA has mainly been applied with recordings of postures (e.g. [6, 12, 13]) and electromyography (e.g. [14–16]). A variety of metrics have been suggested to assess differences in EVA results between groups or conditions and to extract what is believed to be important properties of exposure. Some approaches are based on an aggregation of EVA cells below or above a certain threshold in exposure level and/or sequence duration [12, 14, 17], while others derive variables describing the centroid or standard deviation of the EVA cells [13, 16, 18, 19], or suggest statistical analysis procedures using principal component analysis of the EVA marginal distribution [15] and hierarchical regression of exposure level, frequency and duration simultaneously [20].
In most studies using EVA, the boundaries of the exposure level categories are based on a logarithmic layout [11], but equidistant [20] and nonuniform layouts [12] have also been suggested. A nonequidistant layout of EVA may cause a correlation between exposure level and “sequence duration” categories in some cases [21], which may violate the intention of EVA to state the “true” interaction between the independent aspects of level and frequency in an exposure time line. A datadriven approach for classifying exposures may organize the data in the EVA array preserving the aforementioned independence optimally. This may, in turn, increase the ability of the approach to discriminate different time lines of exposure compared to a conventional EVA, and thus, eventually, classify exposure patterns according to variation and, possibly, risk.
The conventional EVA is univariate, in the sense that it addresses one exposure time line at a time. If a set of outcomes is correlated, for instance a number of EVA analyses of different exposure variables during the same work task, repeated univariate analyses may increase the risk of type I error in a statistical inference [22].
Using statistical simulations, it is possible to generate synthetic exposure time lines mimicking lack or excess of exposure variation. This allows for a comparison of different methods for exposure variation quantification in terms of their ability to pick up aspects of the variation. This approach was applied in the present study, in which we develop a data driven analysis approach based on conventional EVA to quantify variation in exposure and investigate whether this new approach can lead to a better performance in discriminating between different time lines of exposure compared with a univariate and multivariate conventional EVA.
Methods
We developed a novel datadriven exposure analysis approach based on data clustering techniques (CEVA), and investigated its ability to discriminate between different simulated time lines of exposure compared with a conventional EVA using both univariate and multivariate approaches. Particularly, the effectiveness of the different approaches was compared in terms of their capability to handle correlated exposure realizations.
Exposure simulation
An effective method for quantifying exposure variation should be able to discriminate exposure time lines with different properties along all three fundamental aspects (dimensions) of variation: (i) range, (ii) frequency and (iii) temporal similarity [7]. Thus we simulated the extent of exposure variation in a cyclic movement by two sets of parameters representing “small” and “large” exposure variation along each of these dimensions to investigate the discriminative ability.
Cyclic movements represent a repetitive exposure pattern which offers a conceptual base for investigating metrics for exposure variation. In our simulation, a cycle was composed of two successive levels of exposure, one “high” and one “low” (see Figure 1). The high exposure level was obtained by adding a simulated “range” to the low exposure level. The simulation design rendered 2x2x2 exposure groups representing different sizes of range (“far”, “near”), velocity (“high”, “low”) of exposure shifts around the average level, and temporal similarity (“large”, “small”) of exposure sequences between cycles.
Parameter settings were retrieved from literature on right upper arm postural exposure in cyclic occupational work due to its relevance to WMSD in the neck and shoulder region [23]. For all exposure groups, the ratio of low to high level duration (duty cycle) was randomly selected within a narrow range (0.60.7). The mean cycle time was set to be equal in all groups but its standard deviation was allowed to change, with a large and a small variability representing different extents of “temporal similarity” between the cycles. Table 1 summarizes the chosen parameter settings in the simulation.
Exposure level and cycle time duration were assumed to be normally distributed whereas a lognormal distribution was assumed for the velocity (see below). Both normal and lognormal distributions can be fully described by two parameters, the mean and the standard deviation. Thus, if statistical descriptors such as the median and the 90^{th} percentile are available, the parameters of the distribution can be derived numerically.
We simulated 100 sequential cycles and assumed that exposure was recorded at a virtual sampling frequency of 20 Hz [24]. The whole sequence composed of concatenation of100 simulated cycles was termed an “exposure realization”. An “exposure trace” was defined as an entity consisting of two exposure realizations. For each exposure group, 30 simulated traces were obtained at each of three levels of crosscorrelation coefficient between the two exposure realizations in an exposure trace: low (ρ = 0.1), medium (ρ = 0.5) and high (ρ = 0.9) [25].
Range
As illustrated in Figure 1, the range corresponded to the difference between the “low” and “high” exposure levels within a cycle. Thus, we estimated the “high” level of each simulated cycle as the sum of the “low” level and the exposure range corresponding to the simulated scenario: “far” or “near”. We also assumed the exposure range and the low level of the cycle to be independent, so that ${\sigma}_{\mathit{High}}^{2}={\sigma}_{\mathit{Low}}^{2}+{\sigma}_{\mathit{Range}}^{2}$. The “near” range was assumed to have a mean and standard deviation roughly one third of that of the “far” range. Thus, the distribution of “near” and “far” range were obtained normally distributed as N(7,2^{2}) and N(20,6^{2}) respectively.
Velocity/frequency
Since the reported descriptors in the literature indicate a skewed velocity distribution, we assumed that the velocity distribution can be approximated by a lognormal distribution. In general, the literature reports absolute values of angular velocity [24, 26]. Thus, the extracted parameters correspond to the distribution of absolute value of angular velocity.
To control the velocity direction (up/down), half of the randomly generated samples were randomly selected and their signs were toggled (if y = x then f _{ x }(x) = 0.5(f _{ y }(x) + f _{ y }(−x)) where f _{ x } (x) represents the probability density function of x.). Only half of the samples were toggled to generate a sequence close to a zeromean process and not to introduce any net displacement after the numerical integration. The velocity traces were filtered, i.e. F_{cutoff} = 5 Hz, Butterworth second order in order to keep a sensible spectrum content for simulated realizations [27]. The maximum allowed velocity was set to 300°/s [28]. The velocity traces were numerically integrated and hypothetical limits (−35° to180° mimicking anatomical constraints) were imposed to the output of integration in order to preserve the realism of resulting postures. The mean of the integration output was subtracted from the output to assure that the integration of the velocity traces would result in a zeromean process.
Temporal similarity
A study by Möller et al. [6] reported the mean cycle time in an assembly task to be 216 s. While both higher and lower values for cycle durations have been reported in the literature, we chose the values from this paper [6] as a realistic estimation of cycle time. The standard deviation of the cycle time varied from “small” to “large” values according to Table 1. “Small” standard deviation of cycle time reflects a cyclic task in which the cycles temporal similarity is high whereas “large” values reflect a cyclic task with low temporal similarity. Within an exposure trace, temporal similarity was kept identical between the two exposure realizations.
Exposure variation analysis
Exposure traces were analyzed using the new CEVA (see below), and the results were compared to those from the conventional EVA with predetermined exposure categories using both univariate and multivariate approaches.
Conventional EVA with predetermined exposure categories
EVA categories were constructed using intervals of {<−20 10 10 20 45 75 105 135°<} and {<0.7 1.5 3.1 6.3 s<} along the exposure axis (exposure level categories) and frequency axis (sequence duration categories), respectively. The exposure level categories in the EVA were set arbitrarily to avoid a too coarse or a too fine EVA array, and are similar to the setup in several studies of working postures [12, 20, 29]. The “sequence duration” categories were set using an increasing width of the sequence with increasing average duration as also used in several posture studies [29]. The EVA array of each exposure realization was denoted as {E _{ i } : n × m}, i = 1, 2, with n and m set to 9 and 5, respectively. The marginal distributions of EVA with respect to level and frequency were computed by adding up cell values along both dimensions [14, 15]. Thus, the marginal distribution of EVA had 9 and 5 entries, corresponding to the number of categories. In total, this resulted in a 14dimensional space as the basis for a PCA analysis. Thus, the marginal distribution was denoted as {M _{ i } : 1 × (n + m)}, i = 1, 2.
Clusterbased EVA
A number of methods (e.g. Hartigan and Silhouette statistics) are available for identifying the optimal number of clusters in a general multidimensional space [30]. Conventionally, the optimal number of clusters is defined on the basis of the withincluster dispersion index (sum of squared deviations within clusters) versus the number of clusters [31, 32]. When the number of clusters is increased, the dispersion index first decreases and then flattens markedly. The least number of clusters which leads to a flattened dispersion index defines the optimal number of clusters. “Gap analysis” is a formalized way to implement this procedure. “Gap analysis” can be applied to any clustering method such as Kmeans [33] to obtain the number and centers of optimal clusters [30].
A “Gap analysis” was performed on the exposure level of an arbitrarily selected exposure trace (representative exposure trace) from each of the exposure groups (n = 8) combined with each of the crosscorrelation levels (n = 3), i.e., for 24 representative traces. Subsequently, the following procedure was carried out only once to obtain the optimal cluster centers. Thus, in terms of computational complexity, the following procedure adds just a onetime overhead to the conventional EVA.

i)
Using the gap analysis, the optimal number and centers of clusters was obtained for each of the 24 representative traces.

ii)
Once the centers of clusters were defined for a trace, the samples of the representative exposure traces were appointed to their closest cluster. Each sample (a 2 x1 vector consisting to two individual points across the time axis from the two exposure realizations) of an exposure trace would be appointed to the cluster whose center had the minimum Euclidean distance to that sample.

iii)
Once the steps (i and ii) were done for all eight exposure groups at three crosscorrelation level, the obtained centers of the clusters were compared and nearby clusters were merged into one cluster. Nearby clusters were defined as those having a centertocenter distance less than the 10th percentile of the centertocenter distances of all possible cluster pairs.

iv)
The center of a merged cluster was defined by taking a weighted average of the centers of the constituent clusters. The weighting factor for each cluster was defined by the number of samples appointed to that cluster divided by the total number of samples appointed to the merged cluster. This procedure was repeated for all merged clusters.

v)
The centers of optimal clusters for exposure level were obtained after finishing step (iv). The number of optimal clusters was denoted as n _{ c }. The Euclidean distance of all samples of a representative exposure trace to the optimal cluster centers was computed. The samples of a representative exposure trace were appointed to the cluster with the closest center. Thus, each sample of a representative exposure trace was tagged with its closest cluster. This step was repeated for all representative exposure traces.

vi)
For each of the representative exposure traces in step (v), a sequence of cluster tags was obtained. We registered the duration of uninterrupted period in which the samples of a representative exposure trace tagged with one cluster. This made a sequence including durations of such periods. This was repeated for all representative exposure traces.

vii)
As for exposure level, the gap analysis algorithm was applied to the duration of uninterrupted periods registered in the previous step. The optimal cluster centers along the sequence duration were found (in this case, the clusters are defined in 1D space  centers of an interval). The steps (iiiv) were repeated for sequence duration to find the optimal setup of categories. The number of optimal categories was denoted by m _{ c }.
Thus, the output of CEVA quantifies the proportion of total recording time that an exposure trace remains uninterruptedly close to one of the optimal clusters for a certain period of time. Similar to conventional EVA, this period was partitioned into sequence duration categories in step (vii). The marginal distributions of CEVA were also computed similar to those of conventional EVA and denoted by $\left\{\mathit{Mc}:1\times \left({n}_{c}+{m}_{c}\right)\right\}$.
Statistics
The performances of conventional EVA and CEVA to discriminate between the exposure groups were assessed. The performance of conventional EVA was tested using multivariate and univariate approaches as described below.
Each exposure trace was represented by two sets of 14 variable vectors {M _{ i }} i = 1, 2 expressing the marginal distributions of EVA. In the multivariate approach, these two vectors of the parallel traces were represented by a 28 dimensional vector {M = [M _{1} M _{2}]}. From each exposure group and each level of crosscorrelation, 20 exposure traces were randomly selected and used as a training set. The remaining 10 exposure traces constituted the test set (see below). Thus, the training set consisted of 8 × 3 × 20 exposure traces and the test set consisted of 8 × 3 × 10 exposure traces. The training set mean along each of the dimensions of {M _{ i }} was subtracted from both the training and test sets and the result was normalized to the standard deviation of the training set along the same dimension (taking the zscore without using the information from the test set). A principal component analysis (PCA) was performed on the training set. Each principal component describes a percentage of the total variance in the data set. The least number of principal components leading to a total sum of explained variance above 90% were kept as significant components (denoted as {PC _{ M }}).
In the univariate approach, a similar procedure was performed for the marginal distribution of EVA from each of the exposure realization. Finally, each exposure trace was represented by using the significant components taken from each of the exposure traces {[PC_{M1} PC_{M2}]}. For the CEVA, each exposure trace was represented by a n _{ c } + m _{ c } dimensional vector {Mc}. Thus, PCA was applied on the training set with n _{ c } + m _{ c } dimension.
For all approaches, PCA only applied on the training set and the test set was projected along significant components. This was to avoid the test set from affecting the significant components.
A linear classifier was trained using the significant components from the training set and the projected test set was utilized to validate the performance of the classification. The accuracy of the classification was averaged across 30 repetitions of randomly chosen training and test sets. The accuracy was defined as the percentage of exposure traces in the test data set which were classified to their correct exposure groups. A oneway analysis of variance (ANOVA) was used to assess the accuracy of the classification using the three approaches, i.e., EVA (univariate and multivariate) and CEVA. Tukey's honestly significant difference criterion was applied as a post hoc test in case of the oneway ANOVA showing a significant main effect. The classification accuracy is reported with the mean (standard deviation) of accuracy in percentage. The level of significance was set to P < 0.05.
Results
The accuracy of classifications was 47% (SD = 4.4%), 49% (4.3%) and 52% (4.9%) for the marginal distributions of EVA (multivariate and univariate) and CEVA, respectively. Thus, the CEVA improved the classification accuracy slightly, but significantly (P < 0.001). Figure 2a shows an example of CEVA performed on an exposure trace from an exposure group with high temporal similarity, near range and low velocity. Figure 2b illustrates the location of optimal cluster center in terms of exposure level of realization one and two. Table 2 reports in details the misclassification rates for each exposure group when using the marginal distributions of the three analysis approaches. Note that the gross misclassification rate is equal to the summation of the reported rates in Table 2 divided by number of groups (in this case 8). The most pronounced misclassification for all the three approaches occurred along the aspect of temporal similarity, i.e., the cycle time standard deviation.
The accuracy of the classification was also checked at different levels of crosscorrelation (e.g. low, medium and high) between the exposure realizations. Figure 3 depicts the misclassification rate at different crosscorrelation level. CEVA slightly outperformed both approaches (p < 0.001 where significant). However at low crosscorrelation level, CEVA only outperformed the multivariate approach (49.2 (3.7) % versus 46.4 (3.9) %). The univariate approach outperformed the multivariate approach at the low crosscorrelation level (49.3 (3.8) % versus 46.4 (3.9) %).
Discussion
In this study, we developed a new method, clusterbased exposure variation analysis (CEVA) to analyze exposure variation during repeated cyclic movements. The method is based on the conventional EVA but improved its performance in identifying exposure groups that were known to differ with respect to exposure variation. We found that discriminating between exposures differing only in temporal similarity was the main challenge of all the applied methods.
Simulation
The current simulations imitate the distributional properties of reported exposure traces from different repetitive occupational tasks. Since the simulations are, by necessity, simplified representations of the exposure structure of real work, they will not fully reflect all aspects of human movements during repetitive tasks. For simulating exposure velocities (change rates), lognormal distributions were selected, inspired by kinematic theory of rapid human movement [34, 35], which suggest a logarithmic mathematical model for describing the velocity profile of human ballistic movements. Moreover, the lognormal distribution also matches with a positive skewness (“long tails”) of velocity profiles (10^{th}, 50^{th} and 90^{th} percentiles) often reported in the literature on movements in occupational tasks, see e.g., [36, 37].
Applying Bayesian networks seems, at a first glance, to be a viable approach for estimating the parameter of simulation, but it requires that the observations are delineated by a directed acyclic model [38]. Even with a successful formulating of such a model, this approach will only be applicable to model specific problems and does not provide a general computational framework.
For the ease of simulation, a few assumptions were made that may have introduced some bias. For example, the “low” and “high” exposure levels of each simulated cycle were considered to be independent, and the velocity distributions were assumed to be identical for these two levels. Additionally, the extent of exposure level were in effect smaller in “low” than in “high” velocity groups due to the combined effect of a slower changing exposure level and a similar cycle time duration. As a remedy, a correlation coefficient could be assumed between the “low” level of the cycles and their ranges, and the sensitivity of the performance of the approaches could have been assessed with respect to this parameter as well. Generally, since CEVA is a data driven approach, assessment of its sensitivity to the simulation parameters settings and/or the basic simulation models is of utmost importance. Addressing these issues is an important challenge in future studies of CEVA.
The simulation consisted of two correlated realizations of an exposure trace. This was a measure to decrease the risk of type I error in a statistical inference as a result of multiple univariate analyses [22]. Nevertheless, all three applied methods for analyzing variation were handling the realizations using a multivariate framework (PCA and the linear classifier). Otherwise, a decline in the rate of classification accuracy would have been trivial considering that the classification had been applied for exposure realizations separately (an accurate classification in this case requires two separate classification procedure for each of the exposure realizations and both procedures must identify each of the realizations in the true exposure group).
Our results showed a better performance for the univariate EVA compared with the multivariate EVA at low crosscorrelation between the exposure realizations. This can be explained by the fact that $P{C}_{M}{}_{{}_{1}}$ and $P{C}_{{M}_{2}}$ are not necessarily orthogonal, so that their combination may have described more than 90% of total variance in the data set.
Applying classification analysis on EVA and CEVA outcomes failed to discriminate simulated exposure groups differing in the temporal similarity dimension of exposure variation. Probably, more advanced methods are needed to handle this aspect of exposure variation; for example, recurrent map analysis [39] may provide a basis for performing additional analysis. Future studies are needed, devoted to effective procedures for discrimination of temporal similarity, including validating these procedures on experimental data sets.
CEVA advantages and drawbacks
A general issue in density estimation using constant intervals (categories) is that some intervals may contain very little data while others are well represented in the analyzed data set [40]. To compensate for this imbalance and create a more homogeneous classification, exposure levels with sparse presence of data should be represented by a wide interval of density estimation and vice versa. Ignoring this fact may mask possible contrasts between different exposure groups. In occupational literature using conventional EVA, exposure level categories have not been constructed with this purpose in mind [12, 13, 29, 41], probably due to a concern for exploring exposure at suspected more hazardous levels rather than to optimize statistical performance [17]. However, a datadriven optimal exposure classification may reveal subtle changes in an exposure pattern, for instance due to an intervention, that will be left undetected by conventional methods [42]. The detection of such subtle changes may be relevant in studies of WMSD [8]. In particular for occupational work at low intensities and/or with repeated operations, minute changes in the exposure time pattern may be relevant. This wish to detect small changes in exposure, which would call for optimized data analysis procedures, is at conflict with the wish to keep the EVA layout constant across and within subjects so that inter and intrasubject comparisons are possible. If data analysis is guided by the latter aim, EVA optimality cannot be achieved for all subjects and exposure traces. Securing commonality between subjects, e.g., in intervention studies, while still optimizing the classification performance of an analysis of exposure variation is a challenge for further research. A similar point can be raised if the results of CEVA are compared between different studies because the optimal cluster centers may not match between the studies. However, if a major goal is to compare different groups or conditions, the gap analysis can be done for one of the conditions first, and the resulting cluster centers kept identical across the rest of the conditions.
Extending the number of representative exposure traces may improve the optimality of the method because it will improve the identification of the underlying exposure structure, but it may also result in lower generalizability of CEVA as it may lead to overfitting of the clusters to this particular structure.
In cases where exposure data are obtained from different sources in parallel, for instance in our simulations of one exposure trace comprising two exposure realizations, CEVA provides one single outcome describing overall exposure variation whereas the conventional EVA would provide one result per source of data. This is an attractive property if data from different sources are correlated because a biased statistical analysis may be avoided by using a multivariate approach [43].
PCA was performed to identify a few components describing most of the variability in the data set. Thus, the classification was performed in a space, which had a lower dimension than the original one. An alternative approach could have been to implement a hierarchical regression model to assess exposure level, frequency and duration simultaneously, as suggested in a study by Jansen et al. [20].
In addition, the ratio between the number of observations and the number of variables is most likely larger in CEVA than in a multivariate EVA, i.e., number of exposure traces in the training set divided by dimensionality of CEVA and multivariate EVA marginal distributions ,i.e., (n _{ c } + m _{ c } ) and 2(n + m), respectively. Thus, the PCA analysis of the output of CEVA leads to more reliable outputs [44].
Conclusion
The present study compared the abilities of a newly developed method, CEVA, and the conventional EVA to discriminate patterns of exposure variation in simulated cyclic movements. The CEVA has specific properties such as (i) a datadriven optimality of data reduction and (ii) the capability of handling multiple exposure time lines in one comprehensive analysis. The CEVA slightly outperformed the conventional EVA in discriminating simulated exposure traces known to differ in variation, but both methods failed to detect differences in temporal similarity. The developed approach is promising in the sense that it furnishes a framework for assessing exposure variation, which is considered to be relevant in relation to development of workrelated musculoskeletal disorders. Further research is, however, needed to develop methods that can sufficiently capture the similarity aspect of exposure variation and disentangle the tradeoff between maximizing the information retained in the exposure data resulting from using an EVAinspired approach and classifying exposure in a standardized setup of categories assumed to be indicative of risks for musculoskeletal disorders.
Authors’ information
AS has a background in biomedical engineering and science and now he is working as an assistant professor at the department of health science and technology in Aalborg University in Denmark. PM is the head of research interest group “Physical Activity and Human Performance” and director of the laboratory for Ergonomics and Workrelated Disorders at the same institution. SEM is the research director of the Centre for Musculoskeletal research at the University of Gävle, Sweden.
Abbreviations
 ANOVA:

Analysis of variance
 CEVA:

Clusterbased exposure variation analysis
 EVA:

Exposure variation analysis
 PCA:

Principal component analysis
 WMSD:

Workrelated musculoskeletal disorder.
References
 1.
Bevan S, Quadrello T, McGee R, Mahdon M, Vavrovsky A, Barham L: Fit for work? Musculoskeletal disorders in the European workforce. 2009, London: The Work Foundation, 2009
 2.
Madeleine P: On functional motor adaptations: from the quantification of motor strategies to the prevention of musculoskeletal disorders in the neck–shoulder region. Acta Physiologica. 2010, 199: 146.
 3.
Sommerich CM, McGlothlin JD, Marras WS: Occupational risk factors associated with soft tissue disorders of the shoulder: a review of recent investigations in the literature. Ergonomics. 1993, 36 (6): 697717. 10.1080/00140139308967931.
 4.
Wells R, Mathiassen SE, Medbo L, Winkel J: Time—a key issue for musculoskeletal health and manufacturing. Appl Ergon. 2007, 38 (6): 733744. 10.1016/j.apergo.2006.12.003.
 5.
Madeleine P, Lundager B, Voigt M, ArendtNielsen L: Standardized lowload repetitive work: evidence of different motor control strategies between experienced workers and a reference group. Appl Ergon. 2003, 34 (6): 533542. 10.1016/S00036870(03)000838.
 6.
Möller T, Mathiassen SE, Franzon H, Kihlberg S: Job enlargement and mechanical exposure variability in cyclic assembly work. Ergonomics. 2004, 47 (1): 1940. 10.1080/0014013032000121651.
 7.
Mathiassen SE: Diversity and variation in biomechanical exposure: What is it, and why would we like to know?. Appl Ergon. 2006, 37 (4): 419427. 10.1016/j.apergo.2006.04.006.
 8.
Punnett L, Wegman DH: Workrelated musculoskeletal disorders: the epidemiologic evidence and the debate. J Electromyogr Kinesiol. 2004, 14 (1): 1323. 10.1016/j.jelekin.2003.09.015.
 9.
Winkel J, Mathiassen SE: Assessment of physical work load in epidemiologic studies: concepts, issues and operational considerations. Ergonomics. 1994, 37 (6): 979988. 10.1080/00140139408963711.
 10.
van der Beek AJ, FringsDresen MH: Assessment of mechanical exposure in ergonomic epidemiology. Occup Environ Med. 1998, 55 (5): 29110.1136/oem.55.5.291.
 11.
Mathiassen SE, Winkel J: Quantifying variation in physical load using exposurevstime data. Ergonomics. 1991, 34 (12): 14551468. 10.1080/00140139108964889.
 12.
Bao S, Mathiassen SE, Winkel J: Ergonomic effects of a managementbased rationalization in assembly work—a case study. Appl Ergon. 1996, 27 (2): 8999. 10.1016/00036870(95)000631.
 13.
Straker LM, Coleman J, Skoss R, Maslen BA, BurgessLimerick R, Pollock CM: A comparison of posture and muscle activity during tablet computer, desktop computer and paper use by young children. Ergonomics. 2008, 51 (4): 540555. 10.1080/00140130701711000.
 14.
Mathiassen SE, Winkel J: Physiological comparison of three interventions in light assembly work: reduced work pace, increased break allowance and shortened working days. Int Arch Occup Environ Health. 1996, 68 (2): 94108. 10.1007/BF00381241.
 15.
FjellmanWiklund A, Grip H, Karlsson JS, Sundelin G: EMG trapezius muscle activity pattern in string players: Part I—is there variability in the playing technique?. Int J Ind Ergonomics. 2004, 33 (4): 347356. 10.1016/j.ergon.2003.10.007.
 16.
Samani A, Holtermann A, Søgaard K, Madeleine P: Active pauses induces more variable electromyographic pattern of the trapezius muscle activity during computer work. J Electromyogr Kinesiol. 2009, 19 (6): e430437. 10.1016/j.jelekin.2008.11.011.
 17.
Anton D, Cook TM, Rosecrance JC, Merlino LA: Method for quantitatively assessing physical risk factors during variable noncyclic work. Scand J Work Environ Health. 2003, 29 (5): 354362. 10.5271/sjweh.742.
 18.
Larivière C, Delisle A, Plamondon A: The effect of sampling frequency on EMG measures of occupational mechanical exposure. J Electromyogr Kinesiol. 2005, 15 (2): 200209. 10.1016/j.jelekin.2004.08.009.
 19.
Ciccarelli M, Straker L, Mathiassen SE, Pollock C: Variation in muscle activity among office workers when using different information technologies at work and awayfromwork. Human factors and ergonomics. In press
 20.
Jansen JP, Burdorf A, Steyerberg E: A novel approach for evaluating level, frequency and duration of lumbar posture simultaneously during work. Scand J Work Environ Health. 2001, 27 (6): 373380. 10.5271/sjweh.629.
 21.
Mathiassen SE: PhD. thesis. Variation in Shoulderneck Activity  Physiological, Psychophysical and Methodological Studies of Isometric Exercise and Light Assembly Work. 1993, Stockholm, Sweden: Karolinska Institute
 22.
Leary MR, Altmaier EM: Type I error in counseling research: A plea for multivariate analyses. J Couns Psychol. 1980, 27 (6): 611
 23.
Bernard BP: Musculoskeletal disorders and workplace factors. NIOSH Publication: second ed. US Department of Health and Human Services, Cincinnati, OH. 1997, 1: 97141.
 24.
Hansson GÅ, Arvidsson I, Ohlsson K, Nordander C, Mathiassen SE, Skerfving S, Balogh I: Precision of measurements of physical workload during standardised manual handling. Part II: Inclinometry of head, upper back, neck and upper arms. J Electromyogr Kinesiol. 2006, 16 (2): 125136. 10.1016/j.jelekin.2005.06.009.
 25.
Iman RL, Conover W: A distributionfree approach to inducing rank correlation among input variables. Communications in StatisticsSimulation and Computation. 1982, 11 (3): 311334. 10.1080/03610918208812265.
 26.
Arvidsson I, Hansson GÅ, Mathiassen SE, Skerfving S: Changes in physical workload with implementation of mousebased information technology in air traffic control. Int J Ind Ergonomics. 2006, 36 (7): 613622. 10.1016/j.ergon.2006.03.002.
 27.
Winter DA: Biomechanics and motor control of human movement. 2009, New York: John Wiley & Sons Inc
 28.
Veiersted KB, Gould KS, Østerås N, Hansson GÅ: Effect of an intervention addressing working technique on the biomechanical load of the neck and shoulders among hairdressers. Appl Ergon. 2008, 39 (2): 183190. 10.1016/j.apergo.2007.05.007.
 29.
Plamondon A, Delisle A, Larue C, Brouillette D, McFadden D, Desjardins P, Lariviere C: Evaluation of a hybrid system for threedimensional measurement of trunk posture in motion. Appl Ergon. 2007, 38 (6): 697712. 10.1016/j.apergo.2006.12.006.
 30.
Tibshirani R, Walther G, Hastie T: Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Series B Stat Methodol. 2001, 63 (2): 411423. 10.1111/14679868.00293.
 31.
Geva AB, Kerem DH: Forecasting generalized epileptic seizures from the EEG signal by wavelet analysis and dynamic unsupervised fuzzy clustering. IEEE Trans Biomed Eng. 1998, 45 (10): 12051216. 10.1109/10.720198.
 32.
Nieminen H, Niemi J, Takala EP, ViikariJuntura E: Loadsharing patterns in the shoulder during isometric flexion tasks. J Biomech. 1995, 28 (5): 555566. 10.1016/00219290(94)00107F.
 33.
Seber GAF: Multivariate observations. 1984, New York: Wiley Online Library
 34.
Plamondon R: A kinematic theory of rapid human movements. Biol Cybern. 1995, 72 (4): 295307. 10.1007/BF00202785.
 35.
Plamondon R: A kinematic theory of rapid human movements: Part III. Kinetic outcomes. Biol Cybern. 1998, 78 (2): 133145. 10.1007/s004220050420.
 36.
Hansson GÅ, Balogh I, Ohlsson K, Granqvist L, Nordander C, Arvidsson I, Åkesson I, Unge J, Rittner R, Strömberg U: Physical workload in various types of work: Part II. Neck, shoulder and upper arm. Int J Ind. Ergonomics. 2010, 40 (3): 267281.
 37.
Wahlström J, Mathiassen SE, Liv P, Hedlund P, Ahlgren C, Forsman M: Upper arm postures and movements in female hairdressers across four full working days. Ann Occup Hyg. 2010, 54 (5): 584594. 10.1093/annhyg/meq028.
 38.
Wu D, Butz C: On the complexity of probabilistic inference in singly connected bayesian networks. Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing Lecture Notes in Computer Science. 2005, 3641: 581590. 10.1007/11548669_60.
 39.
Webber CL, Zbilut JP: Tutorials in contemporary nonlinear methods for the behavioral sciences. Recurrence quantification analysis of nonlinear dynamical systems. 2005, 2694.
 40.
Bowman AW, Azzalini A: Applied smoothing techniques for data analysis: the kernel approach with SPlus illustrations. 1997, USA: Oxford University Press
 41.
Ciccarelli M, Straker L, Mathiassen SE, Pollock C: ITKids Part II: Variation of postures and muscle activity in children using different information and communication technologies. Work: A Journal of Prevention, Assessment and Rehabilitation. 2011, 38 (4): 413427.
 42.
Samani A, Holtermann A, Søgaard K, Madeleine P: Following ergonomics guidelines decreases physical and cardiovascular workload during cleaning tasks. Ergonomics. 2012, 55 (3): 295307. 10.1080/00140139.2011.640945.
 43.
van Dieën JH, Koppes LLJ, Twisk JWR: Postural sway parameters in seated balancing; their reliability and relationship with balancing performance. Gait Posture. 2010, 31 (1): 4246. 10.1016/j.gaitpost.2009.08.242.
 44.
Hair JF, Anderson RE, Tatham RL, Black WC: Multivariate data analysis: 5th ed. 1998, Englewood Cliffs, NJ: Prentice Hall
Prepublication history
The prepublication history for this paper can be accessed here:http://0www.biomedcentral.com.brum.beds.ac.uk/14712288/13/54/prepub
Acknowledgements
This work was financially supported by the Danish Council for Independent Research  Technology and Production Sciences (FTP) Grant number: 10092821.
Author information
Affiliations
Corresponding author
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
All authors made substantial contributions to the conceptualization and writing of this manuscript. AS has implemented the method and is in charge of the integrity of the results. All authors read and approved the final manuscript.
Svend Erik Mathiassen and Pascal Madeleine contributed equally to this work.
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Samani, A., Mathiassen, S.E. & Madeleine, P. Clusterbased exposure variation analysis. BMC Med Res Methodol 13, 54 (2013). https://0doiorg.brum.beds.ac.uk/10.1186/147122881354
Received:
Accepted:
Published:
DOI: https://0doiorg.brum.beds.ac.uk/10.1186/147122881354
Keywords
 Ergonomics
 Physical work load
 Linear discriminant analysis
 Workrelated musculoskeletal disorders
 Principle component analysis