Box A. Internal consistency (n = 195)b

| Item | Question | n | % agreement | n | κ |
|---|---|---|---|---|---|
| A1 | Does the scale consist of effect indicators, i.e. is it based on a reflective model? | 185 | 82 | 193 | 0.06 |
| Design requirements | | | | | |
| A2c | Was the percentage of missing items given? | 183 | 87 | 190 | 0.48 |
| A3c | Was there a description of how missing items were handled? | 180 | 90 | 187 | 0.54 |
| A4 | Was the sample size included in the internal consistency analysis adequate? | 177 | 87 | 185 | 0.06d |
| A5c | Was the unidimensionality of the scale checked? i.e. was factor analysis or IRT model applied? | 180 | 92 | 187 | 0.69 |
| A6 | Was the sample size included in the unidimensionality analysis adequate? | 166 | 79 | 178 | 0.27 |
| A7 | Was an internal consistency statistic calculated for each (unidimensional) (sub)scale separately? | 179 | 85 | 187 | 0.31d |
| A8c | Were there any important flaws in the design or methods of the study? | 174 | 86 | 179 | 0.22d |
| Statistical methods | | | | | |
| A9 | for Classical Test Theory (CTT): Was Cronbach's alpha calculated? | 179 | 93 | 187 | 0.27d,e |
| A10 | for dichotomous scores: Was Cronbach's alpha or KR-20 calculated? | 151 | 91 | 165 | 0.17d,e |
| A11 | for IRT: Was a goodness of fit statistic at a global level calculated? e.g. χ², reliability coefficient of estimated latent trait value (index of (subject or item) separation) | 154 | 93 | 167 | 0.46d,e |
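The "% agreement" and "κ" columns in each box give the percentage of agreement between raters and the corresponding kappa coefficient for that checklist item. As a point of reference (standard definitions in generic notation, not reproduced from the table itself): for a pair of raters scoring the same items, the proportion of observed agreement $p_o$ and Cohen's kappa, which corrects $p_o$ for the proportion of agreement $p_e$ expected by chance, are

$$p_o = \frac{\text{number of ratings on which both raters agree}}{\text{number of ratings completed by both raters}}, \qquad \kappa = \frac{p_o - p_e}{1 - p_e}.$$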
Box B. Reliability (n = 141)b

| Item | Question | n | % agreement | n | κ |
|---|---|---|---|---|---|
| Design requirements | | | | | |
| B1c | Was the percentage of missing items given? | 129 | 87 | 140 | 0.39 |
| B2c | Was there a description of how missing items were handled? | 125 | 91 | 137 | 0.43d |
| B3 | Was the sample size included in the analysis adequate? | 127 | 77 | 139 | 0.35 |
| B4c | Were at least two measurements available? | 129 | 98 | 140 | 0.72d |
| B5 | Were the administrations independent? | 129 | 73 | 139 | 0.18 |
| B6c | Was the time interval stated? | 125 | 94 | 136 | 0.50d |
| B7 | Were patients stable in the interim period on the construct to be measured? | 126 | 75 | 138 | 0.24 |
| B8 | Was the time interval appropriate? | 125 | 84 | 137 | 0.45 |
| B9 | Were the test conditions similar for both measurements? e.g. type of administration, environment, instructions | 127 | 83 | 138 | 0.30 |
| B10c | Were there any important flaws in the design or methods of the study? | 117 | 77 | 129 | 0.08 |
| Statistical methods | | | | | |
| B11 | for continuous scores: Was an intraclass correlation coefficient (ICC) calculated? | 119 | 86 | 133 | 0.59e |
| B12 | for dichotomous/nominal/ordinal scores: Was kappa calculated? | 111 | 81 | 127 | 0.32e |
| B13 | for ordinal scores: Was a weighted kappa calculated? | 111 | 83 | 127 | 0.42e |
| B14 | for ordinal scores: Was the weighting scheme described? e.g. linear, quadratic | 108 | 81 | 124 | 0.35e |
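Items B13 and B14 ask whether a weighted kappa was calculated and whether the weighting scheme (e.g. linear, quadratic) was described. As an illustrative sketch in standard notation (not taken from the source tables): for a rating scale with $k$ ordered categories, with observed cell proportions $p_{ij}$, chance-expected proportions $e_{ij}$, and disagreement weights $v_{ij}$, the weighted kappa is

$$\kappa_w = 1 - \frac{\sum_{i,j} v_{ij}\, p_{ij}}{\sum_{i,j} v_{ij}\, e_{ij}}, \qquad v_{ij} = \frac{|i-j|}{k-1}\ \text{(linear)} \quad\text{or}\quad v_{ij} = \frac{(i-j)^2}{(k-1)^2}\ \text{(quadratic)},$$

so that disagreements between distant categories count more heavily than disagreements between adjacent ones.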
Box D. Content validity (n = 83)b

| Item | Question | n | % agreement | n | κ |
|---|---|---|---|---|---|
| Design requirements | | | | | |
| D1 | Was there an assessment of whether all items refer to relevant aspects of the construct to be measured? | 62 | 79 | 83 | 0.33 |
| D2 | Was there an assessment of whether all items are relevant for the study population? (e.g. age, gender, disease characteristics, country, setting) | 62 | 76 | 83 | 0.46 |
| D3 | Was there an assessment of whether all items are relevant for the purpose of the measurement instrument? (discriminative, evaluative, and/or predictive) | 62 | 66 | 83 | 0.21 |
| D4 | Was there an assessment of whether all items together comprehensively reflect the construct to be measured? | 62 | 66 | 83 | 0.15 |
| D5c | Were there any important flaws in the design or methods of the study? | 58 | 76 | 78 | 0.13 |
Box E. Structural validity (n = 118)b

| Item | Question | n | % agreement | n | κ |
|---|---|---|---|---|---|
| E1 | Does the scale consist of effect indicators, i.e. is it based on a reflective model? | 99 | 78 | 116 | 0f |
| Design requirements | | | | | |
| E2c | Was the percentage of missing items given? | 95 | 87 | 110 | 0.41 |
| E3c | Was there a description of how missing items were handled? | 93 | 91 | 109 | 0.55 |
| E4 | Was the sample size included in the analysis adequate? | 94 | 87 | 109 | 0.56d |
| E5c | Were there any important flaws in the design or methods of the study? | 89 | 84 | 103 | 0.27 |
| Statistical methods | | | | | |
| E6 | for CTT: Was exploratory or confirmatory factor analysis performed? | 92 | 90 | 106 | 0.51d,e |
| E7 | for IRT: Were IRT tests for determining the (uni-) dimensionality of the items performed? | 62 | 87 | 80 | 0.39e,f |
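Items A1 and E1 ask whether the scale is based on a reflective model (effect indicators), and items A5 and E6 ask whether its structure was checked with factor analysis. As an illustrative sketch in standard notation (not taken from the source): in a reflective one-factor model each observed item score $X_j$ is modelled as an effect of a single latent construct $\eta$,

$$X_j = \tau_j + \lambda_j \eta + \varepsilon_j, \qquad j = 1, \dots, k,$$

with item intercept $\tau_j$, loading $\lambda_j$, and item-specific error $\varepsilon_j$; a (confirmatory) factor analysis tests whether this unidimensional structure is consistent with the observed item covariances.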
Box F. Hypotheses testing (n = 170)b

| Item | Question | n | % agreement | n | κ |
|---|---|---|---|---|---|
| Design requirements | | | | | |
| F1c | Was the percentage of missing items given? | 158 | 87 | 168 | 0.41 |
| F2c | Was there a description of how missing items were handled? | 159 | 92 | 169 | 0.60d |
| F3 | Was the sample size included in the analysis adequate? | 157 | 84 | 167 | 0.12d |
| F4 | Were hypotheses regarding correlations or mean differences formulated a priori (i.e. before data collection)? | 158 | 74 | 168 | 0.42 |
| F5 | Was the expected direction of correlations or mean differences included in the hypotheses? | 159 | 75 | 169 | 0.26e |
| F6 | Was the expected absolute or relative magnitude of correlations or mean differences included in the hypotheses? | 159 | 82 | 168 | 0.29e |
| F7c | for convergent validity: Was an adequate description provided of the comparator instrument(s)? | 125 | 83 | 136 | 0.30 |
| F8c | for convergent validity: Were the measurement properties of the comparator instrument(s) adequately described? | 124 | 81 | 135 | 0.35 |
| F9c | Were there any important flaws in the design or methods of the study? | 131 | 81 | 145 | 0.17 |
| Statistical methods | | | | | |
| F10 | Were design and statistical methods adequate for the hypotheses to be tested? | 150 | 78 | 161 | 0.00d,e,f |
Box G. Cross-cultural validity (n = 33)b

| Item | Question | n | % agreement | n | κ |
|---|---|---|---|---|---|
| Design requirements | | | | | |
| G1c | Was the percentage of missing items given? | 25 | 88 | 32 | 0.52 |
| G2c | Was there a description of how missing items were handled? | 22 | 82 | 30 | 0.32 |
| G3 | Was the sample size included in the analysis adequate? | 26 | 81 | 33 | 0.23 |
| G4c | Were both the original language in which the HR-PRO instrument was developed, and the language in which the HR-PRO instrument was translated described? | 28 | 89 | 33 | 0.34d |
| G5c | Was the expertise of the people involved in the translation process adequately described? e.g. expertise in the disease(s) involved, expertise in the construct to be measured, expertise in both languages | 28 | 86 | 33 | 0.46 |
| G6 | Did the translators work independently from each other? | 28 | 89 | 33 | 0.61 |
| G7 | Were items translated forward and backward? | 28 | 100 | 33 | 1.00 |
| G8c | Was there an adequate description of how differences between the original and translated versions were resolved? | 28 | 86 | 33 | 0.50 |
| G9c | Was the translation reviewed by a committee (e.g. original developers)? | 25 | 88 | 31 | 0.56 |
| G10c | Was the HR-PRO instrument pre-tested (e.g. cognitive interviews) to check interpretation, cultural relevance of the translation, and ease of comprehension? | 21 | 90 | 29 | 0.61 |
| G11c | Was the sample used in the pre-test adequately described? | 28 | 79 | 32 | 0f |
| G12 | Were the samples similar for all characteristics except language and/or cultural background? | 26 | 81 | 31 | 0.41 |
| G13c | Were there any important flaws in the design or methods of the study? | 26 | 85 | 31 | 0.42 |
| Statistical methods | | | | | |
| G14 | for CTT: Was confirmatory factor analysis performed? | 27 | 74 | 32 | 0.03e,f |
| G15 | for IRT: Was differential item function (DIF) between language groups assessed? | 13 | 77 | 23 | 0.28e,f |
Box H. Criterion validity (n = 57)b

| Item | Question | n | % agreement | n | κ |
|---|---|---|---|---|---|
| Design requirements | | | | | |
| H1c | Was the percentage of missing items given? | 35 | 91 | 56 | 0.59d |
| H2c | Was there a description of how missing items were handled? | 35 | 97 | 56 | 0.79d |
| H3 | Was the sample size included in the analysis adequate? | 35 | 69 | 54 | 0.06 |
| H4 | Can the criterion used or employed be considered as a reasonable 'gold standard'? | 37 | 62 | 57 | 0f |
| H5c | Were there any important flaws in the design or methods of the study? | 33 | 79 | 54 | 0.10 |
| Statistical methods | | | | | |
| H6 | for continuous scores: Were correlations, or the area under the receiver operating curve calculated? | 37 | 78 | 56 | 0.16e |
| H7 | for dichotomous scores: Were sensitivity and specificity determined? | 29 | 83 | 47 | 0.28e,f |
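Items H7 and I18 ask whether sensitivity and specificity were determined against the gold standard. For completeness (standard definitions, not taken from the source): with true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) relative to the gold standard,

$$\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP}.$$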
Box I. Responsiveness (n = 79)b

| Item | Question | n | % agreement | n | κ |
|---|---|---|---|---|---|
| Design requirements | | | | | |
| I1c | Was the percentage of missing items given? | 71 | 82 | 76 | 0.14d |
| I2c | Was there a description of how missing items were handled? | 73 | 92 | 77 | 0.36d |
| I3 | Was the sample size included in the analysis adequate? | 72 | 72 | 76 | 0.40 |
| I4c | Was a longitudinal design with at least two measurements used? | 73 | 100 | 78 | 1.00d |
| I5c | Was the time interval stated? | 73 | 89 | 78 | 0.25d |
| I6c | If anything occurred in the interim period (e.g. intervention, other relevant events), was it adequately described? | 72 | 78 | 75 | 0.17 |
| I7c | Was a proportion of the patients changed (i.e. improvement or deterioration)? | 70 | 97 | 73 | 0.32d |
| Design requirements for hypotheses testing (for constructs for which a gold standard was not available) | | | | | |
| I8 | Were hypotheses about changes in scores formulated a priori (i.e. before data collection)? | 65 | 69 | 72 | 0.35 |
| I9 | Was the expected direction of correlations or mean differences of the change scores of HR-PRO instruments included in these hypotheses? | 60 | 78 | 65 | 0.19e |
| I10 | Was the expected absolute or relative magnitude of correlations or mean differences of the change scores of HR-PRO instruments included in these hypotheses? | 61 | 90 | 66 | 0.05d,e |
| I11c | Was an adequate description provided of the comparator instrument(s)? | 56 | 70 | 63 | 0f |
| I12c | Were the measurement properties of the comparator instrument(s) adequately described? | 56 | 80 | 63 | 0.06 |
| I13c | Were there any important flaws in the design or methods of the study? | 63 | 71 | 68 | 0.03 |
| Statistical methods | | | | | |
| I14 | Were design and statistical methods adequate for the hypotheses to be tested? | 63 | 73 | 67 | 0.21e,f |
| Design requirements for comparison to a gold standard (for constructs for which a gold standard was available) | | | | | |
| I15 | Can the criterion for change be considered as a reasonable 'gold standard'? | 21 | 67 | 28 | 0f |
| I16c | Were there any important flaws in the design or methods of the study? | 12 | 67 | 21 | 0f |
| Statistical methods | | | | | |
| I17 | for continuous scores: Were correlations between change scores, or the area under the receiver operating characteristic (ROC) curve calculated? | 28 | 79 | 39 | 0.47e,f |
| I18 | for dichotomous scales: Were sensitivity and specificity (changed versus not changed) determined? | 28 | 79 | 37 | 0.15e |
Box J. Interpretability (n = 42)b

| Item | Question | n | % agreement | n | κ |
|---|---|---|---|---|---|
| J1c | Was the percentage of missing items given? | 22 | 95 | 41 | 0.80 |
| J2c | Was there a description of how missing items were handled? | 21 | 76 | 41 | 0.19 |
| J3 | Was the sample size included in the analysis adequate? | 23 | 74 | 41 | 0f |
| J4c | Was the distribution of the (total) scores in the study sample described? | 23 | 74 | 41 | 0.08 |
| J5c | Was the percentage of the respondents who had the lowest possible (total) score described? | 20 | 95 | 40 | 0.84 |
| J6c | Was the percentage of the respondents who had the highest possible (total) score described? | 21 | 90 | 41 | 0.70 |
| J7c | Were scores and change scores (i.e. means and SD) presented for relevant (sub)groups? e.g. for normative groups, subgroups of patients, or the general population | 21 | 76 | 41 | 0.05 |
| J8c | Was the minimal important change (MIC) or the minimal important difference (MID) determined? | 19 | 89 | 40 | 0.26d |
| J9c | Were there any important flaws in the design or methods of the study? | 21 | 71 | 41 | 0f |