Table 1 Data Quality Dimensions and Domains

From: Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R

| Dimension | Domain | Definition | Primary reference objects to detect data quality issues | Primary reporting metrics of indicators |
|---|---|---|---|---|
| Integrity | | The degree to which the data conforms to structural and technical requirements. | | |
| | Structural data set error | The observed structure of a data set differs from the expected structure. | Data elements, data records | N |
| | Relational data set error | The observed correspondence between different data sets differs from the expected correspondence. | Data sets | N |
| | Value format error | The technical representation of data values within a data set does not conform to the expected representation. | Data fields | N, % |
| Completeness | | The degree to which expected data values are present. | | |
| | Crude missingness | Metrics of missing data values that ignore the underlying reasons for missing data. | Data fields | N, % |
| | Qualified missingness | Metrics of missing data values that use reasons underlying missing data. | Data fields, data elements, data records | N, % |
| Consistency | | The degree to which data values are free of breaks in conventions or contradictions. | | |
| | Range and value violations | Observed data values do not comply with admissible data values or value ranges. | Data fields | N, % |
| | Contradictions | Observed data values appear in impossible or improbable combinations. | Data fields | N, % |
| Accuracy | | The degree of agreement between observed and expected distributions and associations. | | |
| | Unexpected distributions | Observed distributional characteristics differ from expected distributional characteristics. | Data elements, data records | Diverse statistical measures (a) |
| | Unexpected associations | Observed associations differ from expected associations. | Data elements, data records | Diverse statistical measures (a) |
| | Disagreement of repeated measurements | Disagreement between repeated measurements of the same or similar objects under specified conditions. | Data elements, data records | Diverse statistical measures (a) |

N: number of issues; %: percentage of issues relative to the number of assessed elements in a data structure.

a: A wide range of statistical metrics may apply, such as location, scale, or shape parameters, correlation coefficients, and measures of agreement.
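
To make the "N" and "%" reporting metrics concrete, the following minimal Python sketch counts issues for three of the domains listed in the table (crude missingness, range and value violations, contradictions) and relates them to the number of assessed elements. It is an illustration only and not the framework's R implementation; the function names, the `IndicatorResult` class, the example records, the admissible age range of 0 to 120, and the contradiction rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class IndicatorResult:
    """Hypothetical container for one indicator's N and % metrics."""
    name: str
    n_issues: int    # "N": number of issues
    n_assessed: int  # number of assessed elements

    @property
    def percent(self) -> float:
        # "%": issues relative to the number of assessed elements
        if self.n_assessed == 0:
            return 0.0
        return 100.0 * self.n_issues / self.n_assessed


def crude_missingness(values: Sequence[Optional[float]]) -> IndicatorResult:
    """Count missing data fields, ignoring the reasons for missingness."""
    n = sum(1 for v in values if v is None)
    return IndicatorResult("Crude missingness", n, len(values))


def range_violations(values: Sequence[Optional[float]],
                     low: float, high: float) -> IndicatorResult:
    """Count observed values outside [low, high]; missing values are not assessed."""
    observed = [v for v in values if v is not None]
    n = sum(1 for v in observed if not (low <= v <= high))
    return IndicatorResult("Range and value violations", n, len(observed))


def contradictions(records: Sequence[dict],
                   rule: Callable[[dict], bool]) -> IndicatorResult:
    """Count records for which rule(record) flags an impossible combination."""
    n = sum(1 for r in records if rule(r))
    return IndicatorResult("Contradictions", n, len(records))


# Hypothetical example data, invented for illustration.
records = [
    {"age": 34, "sex": "f", "pregnant": True},
    {"age": 51, "sex": "m", "pregnant": True},     # contradiction
    {"age": None, "sex": "f", "pregnant": False},  # missing age
    {"age": 212, "sex": "m", "pregnant": False},   # age out of range
]
ages = [r["age"] for r in records]

results = [
    crude_missingness(ages),
    range_violations(ages, low=0, high=120),  # assumed admissible range
    contradictions(records, lambda r: r["sex"] == "m" and r["pregnant"]),
]

for res in results:
    print(f"{res.name}: N = {res.n_issues}, % = {res.percent:.1f}")
```

With this example data, the sketch reports one issue per indicator: 25.0% for crude missingness (1 of 4 fields), 33.3% for range violations (1 of 3 observed values), and 25.0% for contradictions (1 of 4 records). Note that the denominator differs between indicators, which is why the footnote defines % relative to the number of assessed elements.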