Transformer-based Automated ICD Coding

Final project for the course CSC2541 - Topics in Machine Learning: Machine Learning for Health at the University of Toronto.

Abstract

Automated ICD coding, which aims to predict International Classification of Diseases (ICD) codes from clinical discharge summaries, has received widespread attention from machine learning researchers due to its potential to save the massive time and labour required of human coders. Unlike in most NLP tasks, state-of-the-art ICD coding models have been based on convolutional neural networks (CNNs) or recurrent neural networks (RNNs), while the popular transformer architecture has not performed well on ICD coding. In this paper, we investigate the challenges and pitfalls of transformer-based ICD coding by experimenting with different transformer models within an encoder-decoder framework. We then present our solution to transformer-based ICD coding: our Longformer-based model significantly outperforms the CNN-based baseline models on five out of six metrics on the MIMIC-III full code dataset, and on all evaluation metrics on the MIMIC-III top-50 code dataset. The source code for this project can be found at https://github.com/wren93/CSC2541-repo.
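Since ICD coding assigns multiple codes to each discharge summary, it is naturally framed as multi-label classification over the encoder's output. The sketch below illustrates this framing only: a sigmoid multi-label head pooled over token embeddings from a long-document encoder such as Longformer. The class name, pooling choice, and dimensions are illustrative assumptions, not the model described in the report.

```python
import torch
import torch.nn as nn

class ICDCodingHead(nn.Module):
    """Illustrative multi-label classification head over a long-document
    encoder's hidden states (e.g. Longformer). Names and pooling strategy
    are assumptions for this sketch, not the paper's exact architecture."""

    def __init__(self, hidden_size: int, num_codes: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_codes)

    def forward(self, hidden_states: torch.Tensor,
                attention_mask: torch.Tensor) -> torch.Tensor:
        # Mean-pool hidden states over non-padding tokens.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
        # One logit per ICD code; train with nn.BCEWithLogitsLoss.
        return self.classifier(pooled)

# Usage with dummy encoder outputs (batch of 2, 128 tokens, hidden size 768,
# 50 codes as in the MIMIC-III top-50 setting):
head = ICDCodingHead(hidden_size=768, num_codes=50)
hidden = torch.randn(2, 128, 768)
mask = torch.ones(2, 128, dtype=torch.long)
logits = head(hidden, mask)
probs = torch.sigmoid(logits)  # independent per-code probabilities
```

Thresholding each sigmoid probability (commonly at 0.5) yields the predicted code set for a document.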

Full report / Code