Text mining of electronic health records can validate a register-based diagnosis of epilepsy and subgroup into focal and generalized epilepsy

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Text mining of electronic health records can validate a register-based diagnosis of epilepsy and subgroup into focal and generalized epilepsy. / Vulpius, Siri A.; Werge, Sebastian; Jørgensen, Isabella Friis; Siggaard, Troels; Hernansanz Biel, Jorge; Knudsen, Gitte M.; Brunak, Søren; Pinborg, Lars H.

In: Epilepsia, Vol. 64, No. 10, 2024, p. 2750-2760.

Harvard

Vulpius, SA, Werge, S, Jørgensen, IF, Siggaard, T, Hernansanz Biel, J, Knudsen, GM, Brunak, S & Pinborg, LH 2024, 'Text mining of electronic health records can validate a register-based diagnosis of epilepsy and subgroup into focal and generalized epilepsy', Epilepsia, vol. 64, no. 10, pp. 2750-2760. https://doi.org/10.1111/epi.17734

APA

Vulpius, S. A., Werge, S., Jørgensen, I. F., Siggaard, T., Hernansanz Biel, J., Knudsen, G. M., Brunak, S., & Pinborg, L. H. (2024). Text mining of electronic health records can validate a register-based diagnosis of epilepsy and subgroup into focal and generalized epilepsy. Epilepsia, 64(10), 2750-2760. https://doi.org/10.1111/epi.17734

Vancouver

Vulpius SA, Werge S, Jørgensen IF, Siggaard T, Hernansanz Biel J, Knudsen GM, et al. Text mining of electronic health records can validate a register-based diagnosis of epilepsy and subgroup into focal and generalized epilepsy. Epilepsia. 2024;64(10):2750-2760. https://doi.org/10.1111/epi.17734

Author

Vulpius, Siri A. ; Werge, Sebastian ; Jørgensen, Isabella Friis ; Siggaard, Troels ; Hernansanz Biel, Jorge ; Knudsen, Gitte M. ; Brunak, Søren ; Pinborg, Lars H. / Text mining of electronic health records can validate a register-based diagnosis of epilepsy and subgroup into focal and generalized epilepsy. In: Epilepsia. 2024 ; Vol. 64, No. 10. pp. 2750-2760.

Bibtex

@article{6654f45775904b92898a7e21283e5211,
title = "Text mining of electronic health records can validate a register-based diagnosis of epilepsy and subgroup into focal and generalized epilepsy",
abstract = "Objective: Combining population-based health registries and electronic health records offers the opportunity to create large, phenotypically detailed patient cohorts of high quality. In this study, we used text mining of clinical notes to confirm International Classification of Diseases, 10th Revision (ICD-10)-registered epilepsy diagnoses and classify patients according to focal and generalized epilepsy types. Methods: Using the Danish National Patient Registry, we identified patients who between 2006 and 2016 received an ICD-10 diagnosis of epilepsy. To validate the epilepsy diagnosis and stratify patients into focal and generalized epilepsy types, we constructed dictionaries for text mining-based extraction of clinical notes. Two physicians manually reviewed the clinical notes for a total of 527 patients and assigned epilepsy diagnoses, which were compared with the text-mined diagnoses. Results: We identified 23 632 patients with an ICD-10 diagnosis of epilepsy, of whom 50% were registered with an unspecified epilepsy diagnosis. In total, 11 211 patients were considered likely to have epilepsy by text mining, with an F1 measure ranging from 82% to 90%. Manual review of the electronic health records for 310 patients revealed a false discovery rate of 29%. This rate was decreased to 4% by the text mining algorithm. The weighted average F1 measure for text mining-assigned epilepsy types was 79% (82% for focal and 76% for generalized epilepsy). Text mining successfully assigned a focal or generalized epilepsy type to 92% of the text mining-eligible patients registered with unspecified epilepsy. Significance: Text mining of electronic health records can be used to establish a patient cohort with much higher likelihood of having a diagnosis of epilepsy and a focal or generalized epilepsy type compared to the cohort created from ICD-10 epilepsy codes alone. 
We believe the concept will be essential for future genome-wide and phenome-wide association studies and subsequently the development of precision medicine for epilepsy patients.",
keywords = "electronic health records, entity recognition, epidemiology, epilepsy phenotypes, text mining",
author = "Vulpius, {Siri A.} and Sebastian Werge and J{\o}rgensen, {Isabella Friis} and Troels Siggaard and Jorge Hernansanz Biel and Knudsen, {Gitte M.} and S{\o}ren Brunak and Pinborg, {Lars H.}",
note = "Publisher Copyright: {\textcopyright} 2023 The Authors. Epilepsia published by Wiley Periodicals LLC on behalf of International League Against Epilepsy.",
year = "2024",
doi = "10.1111/epi.17734",
language = "English",
volume = "64",
pages = "2750--2760",
journal = "Epilepsia",
issn = "0013-9580",
publisher = "Wiley-Blackwell",
number = "10",

}

RIS

TY  - JOUR
T1  - Text mining of electronic health records can validate a register-based diagnosis of epilepsy and subgroup into focal and generalized epilepsy
AU  - Vulpius, Siri A.
AU  - Werge, Sebastian
AU  - Jørgensen, Isabella Friis
AU  - Siggaard, Troels
AU  - Hernansanz Biel, Jorge
AU  - Knudsen, Gitte M.
AU  - Brunak, Søren
AU  - Pinborg, Lars H.
N1  - Publisher Copyright: © 2023 The Authors. Epilepsia published by Wiley Periodicals LLC on behalf of International League Against Epilepsy.
PY  - 2024
Y1  - 2024
N2  - Objective: Combining population-based health registries and electronic health records offers the opportunity to create large, phenotypically detailed patient cohorts of high quality. In this study, we used text mining of clinical notes to confirm International Classification of Diseases, 10th Revision (ICD-10)-registered epilepsy diagnoses and classify patients according to focal and generalized epilepsy types. Methods: Using the Danish National Patient Registry, we identified patients who between 2006 and 2016 received an ICD-10 diagnosis of epilepsy. To validate the epilepsy diagnosis and stratify patients into focal and generalized epilepsy types, we constructed dictionaries for text mining-based extraction of clinical notes. Two physicians manually reviewed the clinical notes for a total of 527 patients and assigned epilepsy diagnoses, which were compared with the text-mined diagnoses. Results: We identified 23 632 patients with an ICD-10 diagnosis of epilepsy, of whom 50% were registered with an unspecified epilepsy diagnosis. In total, 11 211 patients were considered likely to have epilepsy by text mining, with an F1 measure ranging from 82% to 90%. Manual review of the electronic health records for 310 patients revealed a false discovery rate of 29%. This rate was decreased to 4% by the text mining algorithm. The weighted average F1 measure for text mining-assigned epilepsy types was 79% (82% for focal and 76% for generalized epilepsy). Text mining successfully assigned a focal or generalized epilepsy type to 92% of the text mining-eligible patients registered with unspecified epilepsy. Significance: Text mining of electronic health records can be used to establish a patient cohort with much higher likelihood of having a diagnosis of epilepsy and a focal or generalized epilepsy type compared to the cohort created from ICD-10 epilepsy codes alone. We believe the concept will be essential for future genome-wide and phenome-wide association studies and subsequently the development of precision medicine for epilepsy patients.
AB  - Objective: Combining population-based health registries and electronic health records offers the opportunity to create large, phenotypically detailed patient cohorts of high quality. In this study, we used text mining of clinical notes to confirm International Classification of Diseases, 10th Revision (ICD-10)-registered epilepsy diagnoses and classify patients according to focal and generalized epilepsy types. Methods: Using the Danish National Patient Registry, we identified patients who between 2006 and 2016 received an ICD-10 diagnosis of epilepsy. To validate the epilepsy diagnosis and stratify patients into focal and generalized epilepsy types, we constructed dictionaries for text mining-based extraction of clinical notes. Two physicians manually reviewed the clinical notes for a total of 527 patients and assigned epilepsy diagnoses, which were compared with the text-mined diagnoses. Results: We identified 23 632 patients with an ICD-10 diagnosis of epilepsy, of whom 50% were registered with an unspecified epilepsy diagnosis. In total, 11 211 patients were considered likely to have epilepsy by text mining, with an F1 measure ranging from 82% to 90%. Manual review of the electronic health records for 310 patients revealed a false discovery rate of 29%. This rate was decreased to 4% by the text mining algorithm. The weighted average F1 measure for text mining-assigned epilepsy types was 79% (82% for focal and 76% for generalized epilepsy). Text mining successfully assigned a focal or generalized epilepsy type to 92% of the text mining-eligible patients registered with unspecified epilepsy. Significance: Text mining of electronic health records can be used to establish a patient cohort with much higher likelihood of having a diagnosis of epilepsy and a focal or generalized epilepsy type compared to the cohort created from ICD-10 epilepsy codes alone. We believe the concept will be essential for future genome-wide and phenome-wide association studies and subsequently the development of precision medicine for epilepsy patients.
KW  - electronic health records
KW  - entity recognition
KW  - epidemiology
KW  - epilepsy phenotypes
KW  - text mining
U2  - 10.1111/epi.17734
DO  - 10.1111/epi.17734
M3  - Journal article
C2  - 37548470
AN  - SCOPUS:85168579788
VL  - 64
SP  - 2750
EP  - 2760
JO  - Epilepsia
JF  - Epilepsia
SN  - 0013-9580
IS  - 10
ER  - 

ID: 364546372