Four strategies can be used to identify PE patients in administrative data.
Strategy (A) uses ICD-10 codes to identify PE patients, with no verification. The assumed validity shows how the statistical values would appear to an investigator who used our database and assumed that the ICD-10 codes were correct. The actual validity shows the true values for our database, reflecting the coding errors that we identified. An investigator using this strategy would unknowingly miss both false positives and false negatives in our database.
Strategy (B) uses ICD-10 codes, with the additional step of identifying the false-negative population and moving it to the PE-positive population. The assumed validity shows the statistical values when the investigator assumes that the strategy has captured all PEs in the data and that every patient assigned a PE ICD-10 code was coded correctly. The actual validity shows the true values for the same strategy; the investigator would unknowingly miss false positives in the data set.
Strategy (C) uses ICD-10 codes, with the additional step of identifying the false-positive PE patients and moving them to the PE-negative group. The assumed validity shows the statistical values when the investigator assumes that the strategy has removed all patients who were incorrectly assigned a PE diagnostic code, and that no PE patients were missed for lack of a PE diagnostic code. The actual validity shows the true values for the same strategy; an investigator who assumes that every patient diagnosed with PE received the appropriate ICD-10 code would unknowingly miss false negatives in the data set.
Strategy (D) uses ICD-10 codes to identify PE patients and takes the further steps of identifying the false-positive and false-negative populations, moving them to the PE-negative and PE-positive populations, respectively; this strategy ensures that all patients’ true diagnoses are known.

SN sensitivity, SP specificity, PPV positive predictive value, NPV negative predictive value
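
The relationship between the four strategies and their actual validity can be illustrated with a minimal Python sketch. It is not the analysis used in this study: the names `Counts`, `apply_strategy`, and `validity` are hypothetical, the counts are invented for illustration only, and the sketch assumes that every denominator is non-zero. Each strategy is modelled as moving the verified false negatives and/or false positives into the correct group before SN, SP, PPV, and NPV are computed against the true diagnoses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # PE ICD-10 code assigned, PE truly present
    fp: int  # PE ICD-10 code assigned, PE truly absent
    fn: int  # no PE ICD-10 code, PE truly present
    tn: int  # no PE ICD-10 code, PE truly absent


def apply_strategy(raw: Counts, strategy: str) -> Counts:
    """Return the counts that remain after a strategy's verification steps."""
    tp, fp, fn, tn = raw.tp, raw.fp, raw.fn, raw.tn
    if strategy in ("B", "D"):
        # False negatives are identified and moved to the PE-positive group.
        tp, fn = tp + fn, 0
    if strategy in ("C", "D"):
        # False positives are identified and moved to the PE-negative group.
        tn, fp = tn + fp, 0
    return Counts(tp, fp, fn, tn)


def validity(c: Counts) -> dict:
    """Actual validity of the classification, measured against true diagnoses."""
    return {
        "SN": c.tp / (c.tp + c.fn),
        "SP": c.tn / (c.tn + c.fp),
        "PPV": c.tp / (c.tp + c.fp),
        "NPV": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
raw = Counts(tp=80, fp=20, fn=10, tn=890)
results = {s: validity(apply_strategy(raw, s)) for s in "ABCD"}

for name, stats in results.items():
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

In this sketch, strategy (B) raises SN and NPV to 1 but leaves the false positives in place, so PPV and SP stay below 1; strategy (C) does the reverse; only strategy (D) brings all four measures to 1.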