Experimental Evaluation and Development of a Silver-Standard for the MIMIC-III Clinical Coding Dataset

Thomas Searle, Zina Ibrahim, Richard Dobson


Abstract
Clinical coding is currently a labour-intensive, error-prone, yet critical administrative process whereby hospital patient episodes are manually assigned codes by qualified staff from large, standardised taxonomic hierarchies of codes. Automating clinical coding has a long history in NLP research and has recently seen novel developments that set new benchmark results. A popular dataset for this task is MIMIC-III, a large database of clinical free-text notes and their associated codes, amongst other data. We argue for reconsideration of the validity of MIMIC-III's assigned codes, as MIMIC-III has not undergone secondary validation. This work presents an open-source, reproducible experimental methodology for assessing the validity of EHR discharge summaries. We exemplify the methodology with MIMIC-III discharge summaries and show that the most frequently assigned codes in MIMIC-III are undercoded by up to 35%.
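To make the reported figure concrete, an undercoding rate can be read as the fraction of codes a secondary review finds clinically indicated that are absent from the originally assigned set. The function and ICD-9 codes below are an illustrative sketch under that assumption, not the paper's implementation.

```python
def undercoding_rate(original, reviewed):
    """Fraction of reviewed (gold-standard) code assignments missing
    from the originally assigned codes. Both arguments are sets of codes."""
    if not reviewed:
        return 0.0
    missed = reviewed - original  # codes the reviewer found but the original coders did not assign
    return len(missed) / len(reviewed)

# Toy example (hypothetical codes): a reviewer validates three ICD-9 codes,
# but the original episode was assigned only two of them.
original = {"401.9", "428.0"}
reviewed = {"401.9", "428.0", "584.9"}
print(round(undercoding_rate(original, reviewed), 2))  # prints 0.33
```

An undercoding rate of 0.35 for a given code would thus mean roughly one in three of its indicated assignments is missing from the dataset's labels.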
Anthology ID: 2020.bionlp-1.8
Volume: Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing
Month: July
Year: 2020
Address: Online
Venues: ACL | BioNLP | WS
SIG: SIGBIOMED
Publisher: Association for Computational Linguistics
Pages: 76–85
URL: https://www.aclweb.org/anthology/2020.bionlp-1.8
PDF: https://www.aclweb.org/anthology/2020.bionlp-1.8.pdf
