Inter-rater agreement and reliability of the COSMIN (COnsensus-based Standards for the selection of health status Measurement Instruments) Checklist

Table 3 Inter-rater agreement (percentage agreement) and reliability (kappa coefficients) of the items from the COSMIN checklist (COSMIN step 4)

Item no.	Item	N for % agreement (excluding articles rated only once)^a	% agreement	N for kappa	Kappa
Generalisability Box (n = 866) ^b		^c
Was the sample in which the HR-PRO instrument was evaluated adequately described? In terms of:
1^d	median or mean age (with standard deviation or range)?	733	86	865	0.36
2^d	distribution of sex?	735	88	863	0.38^e
3	important disease characteristics (e.g. severity, status, duration) and description of treatment?	746	80	862	0.39^f
4^d	setting(s) in which the study was conducted? e.g. general population, primary care or hospital/rehabilitation care	735	89	863	0.30^e
5^d	countries in which the study was conducted?	733	90	861	0.40^e
6^d	language in which the HR-PRO instrument was evaluated?	733	86	861	0.41^e
7^d	Was the method used to select patients adequately described? e.g. convenience, consecutive, or random	729	81	857	0.40
8	Was the percentage of missing responses (response rate) acceptable?	724	82	849	0.48

^a Articles that were scored only once on a particular item were excluded when calculating percentage agreement.
^b Number of times a box was evaluated.
^c Sample sizes for the Generalisability box are much larger than for the other items because the scores on the Generalisability box items were combined across all measurement properties.
^d Dichotomous item.
^e Item with low dispersion, i.e. more than 75% of the raters who responded to the item chose the same response category.
^f Combined kappa coefficient, calculated because of the nominal response scale in a one-way design.
Values printed in bold indicate kappa > 0.70 or % agreement > 80%.
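The pattern flagged by footnote e (high percentage agreement alongside a low kappa) follows from how the two statistics are defined. The sketch below assumes a one-way design in which each article may be rated by a different set and a different number of raters, and computes both statistics for one item. Observed agreement is taken as the proportion of agreeing rater pairs within each article, averaged over articles, and kappa uses one common generalization of Fleiss' kappa to varying numbers of raters. The function name `agreement_and_kappa` is hypothetical, and whether this matches the exact computation (including the "combined kappa" of footnote f) used for the table is an assumption, not something stated in the source.

```python
from collections import Counter
from typing import Hashable, Sequence


def agreement_and_kappa(ratings: Sequence[Sequence[Hashable]]) -> tuple[float, float]:
    """Return (percentage agreement, kappa) for one item in a one-way design.

    Hypothetical sketch: each inner sequence holds the ratings one article
    received on the item. Articles rated only once are skipped (cf. footnote a).
    Kappa is a Fleiss-type kappa generalized to varying numbers of raters.
    """
    rated = [r for r in ratings if len(r) >= 2]  # drop articles rated only once
    if not rated:
        raise ValueError("need at least one article with two or more ratings")

    per_article = []  # proportion of agreeing (ordered) rater pairs per article
    category_totals: Counter = Counter()  # pooled counts per response category
    for r in rated:
        counts = Counter(r)
        n = len(r)
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_article.append(agreeing_pairs / (n * (n - 1)))
        category_totals.update(counts)
    p_obs = sum(per_article) / len(per_article)

    # Chance agreement from the pooled distribution of response categories.
    total = sum(category_totals.values())
    p_exp = sum((c / total) ** 2 for c in category_totals.values())

    if p_exp == 1.0:
        # Every rating fell in one category: kappa is undefined.
        return 100 * p_obs, float("nan")
    return 100 * p_obs, (p_obs - p_exp) / (1 - p_exp)
```

With invented ratings (not data from the study) for ten articles, each rated twice on a dichotomous item, where eight pairs agree on "yes", one pair agrees on "no" and one pair disagrees, `agreement_and_kappa([["yes", "yes"]] * 8 + [["no", "no"], ["yes", "no"]])` gives 90% agreement but a kappa of about 0.61. Because 85% of all ratings are "yes", the expected chance agreement is already 0.745, which leaves little room for kappa to rise above chance.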

ISSN: 1471-2288