Automated medical coding - Part 1
The importance of medical coding
In the high-stakes world of healthcare, precision in medical coding is not just a matter of administrative necessity: it is the backbone of a thriving insurance ecosystem. Medical codes, such as ICD and CPT, are standardized codes that healthcare professionals use to categorize diagnoses and procedures. These codes are the cornerstone of billing, claims, and analytics, yet the process of assigning them is fraught with complexity and cost. For health insurance companies, the message is clear: coding efficiency is a key determinant of consistent operational performance.
Key challenges in the current medical coding landscape include:
- Time Constraints: Professional coders in Scotland average 7–8 minutes per case, leading to extensive backlogs that can last months or even years. [1]
- Error Susceptibility: Manual coding accuracy in the UK averages around 83%, due to factors like incomplete data, subjective code selection, lack of expertise, and data entry mistakes. [2]
- Financial Impact: In the U.S., coding inaccuracies cost the healthcare industry a staggering $25 billion annually. [3]
These statistics highlight the critical need for improved accuracy and efficiency. In response, automated clinical coding powered by AI offers a transformative solution, turning a laborious and error-prone process into a streamlined, cost-effective operation.
As the industry pivots towards AI, deep learning stands as a pillar of progress, providing a robust solution to the coding dilemma. For forward-thinking health insurance companies, the adoption of automated coding is not merely an operational decision but a strategic investment in accuracy and efficiency, positioning these companies at the forefront of healthcare innovation.
At Qantev, we are acutely aware of the urgent need for innovation in medical coding and the complexities that come with it. Our team is dedicated to refining state-of-the-art approaches to address these challenges effectively. This series begins with an in-depth look at the PLM-ICD framework [4], a pioneering solution that leverages the encoding capabilities of pre-trained language models to interpret the complexity of clinical text.
Before delving into the solution itself, let's try to decompose and understand the challenges of this task.
Why is it so hard?
Recent studies have introduced a variety of solutions to automate medical coding, employing techniques like Recurrent Neural Networks (RNNs) [5][6], Long Short-Term Memory (LSTM) networks [7], and label attention mechanisms [8], among others. However, these models primarily emphasize building an effective interaction between note representations and code representations, an approach that centers predominantly on the translation aspect of the problem. In contrast, PLM-ICD prioritizes a deeper understanding of the clinical text, ensuring a more complete comprehension before proceeding to the translation itself.
Using pre-trained language models offers a significant head start, yet some critical issues remain to be tackled. To fully grasp the PLM-ICD architecture, it is useful to review the three main challenges that persist in this context:
- Domain Mismatch: Pre-trained language models are typically trained on billions of tokens from a wide variety of general-domain texts, such as Wikipedia, novels, web pages, and forums. This broad spectrum often leads to a domain mismatch when the models are applied to specialized fields, which has been shown to undermine performance on downstream tasks.
- Long Input Text: Pre-trained language models typically have a maximum sequence length, imposed by the size of their positional encodings, and are often capped at 512 tokens. Clinical notes, however, can be much longer: documents in the MIMIC-III dataset average around 1,500 words, or roughly 2,000 tokens after subword tokenization (a short sketch of this length problem follows this list).
- Large Label Set: Automatic ICD coding is a large-scale multi-label text classification problem: a document must be matched against a vast array of possible labels. With roughly 17,000 codes in ICD-9-CM and an overwhelming 140,000 in ICD-10-CM/PCS, the challenge is not just the sheer volume of labels but ensuring precise matches within an extensive and granular set.
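To make the length problem concrete, here is a minimal Python sketch using the Hugging Face transformers library. The checkpoint name and the example note are illustrative only, not taken from the paper:

```python
from transformers import AutoTokenizer

# A general-domain PLM with the usual 512-token limit; the checkpoint
# name is just an example.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# A stand-in for a discharge summary; real MIMIC-III notes are far longer.
note = "Patient admitted with acute exacerbation of congestive heart failure ..."

token_ids = tokenizer(note, add_special_tokens=False)["input_ids"]
print(len(token_ids), "tokens")  # often exceeds 2,000 for real notes

# Naive remedy: split the token stream into 512-token chunks.
# PLM-ICD's segment pooling (sketched later) builds on this idea.
segments = [token_ids[i : i + 512] for i in range(0, len(token_ids), 512)]
print(len(segments), "segment(s)")
```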
The PLM-ICD Framework
Predicting ICD codes is a task we frame as multi-label classification. Imagine a clinical note as a long sequence of words, or "tokens", whose length is the total number of tokens in the document. Our objective is to assign an appropriate set of ICD codes to this note, drawn from a much larger set of all possible codes, denoted Y.
For each ICD code in Y, a 0 or a 1 is assigned to indicate its absence or presence in relation to the clinical note, creating what is called a binary vector. This vector contains as many elements as there are codes in Y, and each element flags whether the corresponding code is relevant to the note. This process translates the detailed textual data of a clinical note into a streamlined, binary format that reflects which ICD codes apply to it.
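As a toy illustration, here is how that binary (multi-hot) target vector might be built in Python. The four codes below are placeholders standing in for the full ICD label set:

```python
# Y: the full label set (tiny and illustrative here; ~17,000 codes in ICD-9-CM).
all_codes = ["401.9", "250.00", "414.01", "428.0"]

# Codes a coder assigned to one particular clinical note.
note_codes = {"401.9", "428.0"}

# Multi-hot target vector: one 0/1 entry per code in Y.
y = [1 if code in note_codes else 0 for code in all_codes]
print(y)  # [1, 0, 0, 1]
```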
Having established the nature of the challenge, let’s delve into the PLM-ICD architecture and examine its approach to overcoming the aforementioned challenges:
- Domain-Specific Pre-training: To address the domain mismatch in automatic ICD coding, the strategy is to employ language models pre-trained on biomedical and clinical texts, such as BioBERT, PubMedBERT, and RoBERTa-PM. These specialized models have a better grasp of the nuanced biomedical terminology prevalent in clinical notes, which is crucial for accurate ICD code assignment. Fine-tuning these domain-specific PLMs on the ICD coding task builds directly on their pre-training, just as general-domain models are fine-tuned on general downstream tasks.
- Segment Pooling: To overcome the challenge of long input texts, the segment pooling method is proposed. It divides a lengthy clinical document into smaller segments, each fitting within the maximum token limit of the PLM. Each segment is encoded independently, and the resulting representations are concatenated to form a comprehensive representation of the entire document (H). By breaking down and then aggregating these segment representations, the model can process the full extent of a clinical note, bypassing the token-length constraints of standard PLMs (see the sketch after this list, which also covers the next component).
- Label-Aware Attention: To tackle the large label space, Vu et al. [8] designed a mechanism that enhances the translation task by focusing on the parts of the text that are most relevant to specific labels. After the text is processed into hidden representations (H), label-aware attention selectively emphasizes the information in those representations that is most pertinent to each label. Essentially, it acts as a filter that highlights the phrases or terms in the clinical note that are crucial for determining the correct ICD codes. Even when faced with a large number of possible labels, the model can thus identify and focus on the text segments that contribute most meaningfully to the coding task, enabling more precise predictions.
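To tie the last two components together, below is a minimal PyTorch sketch of how segment pooling and label-aware attention could be combined. This is an illustration under our own naming (PLMICDHead, label_queries, etc.), not the authors' implementation; it assumes a Hugging Face-style encoder whose output exposes last_hidden_state, and it omits details such as attention masks and padding:

```python
import torch
import torch.nn as nn


class PLMICDHead(nn.Module):
    """Sketch of segment pooling followed by label-aware attention.

    `encoder` is assumed to be a Hugging Face-style model (e.g., a
    RoBERTa-PM checkpoint) returning `last_hidden_state` of shape
    (batch, seq_len, hidden). All names here are illustrative.
    """

    def __init__(self, encoder, hidden_size: int, num_labels: int, seg_len: int = 512):
        super().__init__()
        self.encoder = encoder
        self.seg_len = seg_len
        # One learnable query per ICD code: scores every token for every label.
        self.label_queries = nn.Linear(hidden_size, num_labels, bias=False)
        # Per-label scoring of the label-specific document representation.
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # Segment pooling: split the long note into PLM-sized chunks,
        # encode each independently, and concatenate along the time axis.
        segments = input_ids.split(self.seg_len, dim=1)
        H = torch.cat(
            [self.encoder(seg).last_hidden_state for seg in segments], dim=1
        )  # (batch, total_len, hidden)

        # Label-aware attention: one distribution over tokens per label.
        attn = torch.softmax(self.label_queries(H), dim=1)  # (batch, total_len, num_labels)
        V = attn.transpose(1, 2) @ H  # (batch, num_labels, hidden)

        # One logit per label for multi-label prediction.
        logits = (V * self.classifier.weight).sum(dim=-1) + self.classifier.bias
        return logits


# Hypothetical usage:
#   encoder = AutoModel.from_pretrained(<a RoBERTa-PM checkpoint>)
#   head = PLMICDHead(encoder, hidden_size=768, num_labels=17000)
#   probs = torch.sigmoid(head(input_ids))
```

In training, a binary cross-entropy loss is applied per code to these sigmoid outputs, the standard objective for multi-label classification.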
When the paper was published (12 July 2022), PLM-ICD [4] achieved state-of-the-art performance among all models in terms of micro F1 and all precision@k measures. Here are some of the results on the MIMIC-III dataset:
The authors also performed an ablation study to demonstrate the influence of each technique that composes the PLM-ICD framework.
We can see that the label-attention mechanism is indeed the most pivotal component in bridging clinical text and medical codes, as it is the core translator in the PLM-ICD framework. In contrast, domain-specific pre-training and segment pooling serve as strategic adjustments that allow pre-trained language models to encode clinical narratives more effectively. These components are essential adaptations that enable the application of large language models in medical coding.
Conclusion
In conclusion, this model marks a significant advancement in the field of automated medical coding, providing a sophisticated solution by leveraging pre-trained language models for this task. Combined with the ingenious translation component — the label-aware attention mechanism — it sets a new benchmark for precision and efficiency, as demonstrated by its state-of-the-art performance.
PLM-ICD showcases both the potential of AI in healthcare and the transformative impact that such technologies can have on operational excellence and patient care.
In the forthcoming Part II of our series, we will continue to explore cutting-edge solutions for automated medical coding. Our focus will turn to the use of readily available large language models such as GPT-4, GPT-3.5, and Llama-2, as presented in the study “Automated Clinical Coding with Off-the-Shelf Large Language Models” by Boyle et al. [9]. Their solution surpasses PLM-ICD in macro F1 score, setting a new precedent for accurately classifying rarer codes. Despite a distinct methodology, it shares a fundamental reliance on the advanced textual comprehension afforded by large language models.
Bibliography:
[1] H. Dong, M. Falis, W. Whiteley, et al., “Automated clinical coding: What, why, and where we are?” NPJ digital medicine, vol. 5, no. 1, p. 159, 2022.
[2] E. M. Burns, E. Rigby, R. Mamidanna, et al., “Systematic review of discharge coding accuracy,” Journal of public health, vol. 34, no. 1, pp. 138–148, 2012.
[3] D. Lang, “Consultant report-natural language processing in the health care industry,” Cincinnati Children’s Hospital Medical Center, Winter, vol. 6, 2007.
[4] C.-W. Huang, S.-C. Tsai, and Y.-N. Chen, “PLM-ICD: Automatic ICD coding with pretrained language models,” arXiv preprint arXiv:2207.05289, 2022.
[5] E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart, and J. Sun, “Doctor AI: Predicting clinical events via recurrent neural networks,” in Machine Learning for Healthcare Conference, PMLR, 2016, pp. 301–318.
[6] T. Baumel, J. Nassour-Kassis, R. Cohen, M. Elhadad, and N. Elhadad, “Multi-label classification of patient notes: A case study on ICD code assignment,” in Workshops of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[7] H. Shi, P. Xie, Z. Hu, M. Zhang, and E. P. Xing, “Towards automated ICD coding using deep learning,” arXiv preprint arXiv:1711.04075, 2017.
[8] T. Vu, D. Q. Nguyen, and A. Nguyen, “A label attention model for ICD coding from clinical text,” arXiv preprint arXiv:2007.06351, 2020.
[9] J. S. Boyle, A. Kascenas, P. Lok, M. Liakata, and A. Q. O’Neil, “Automated clinical coding using off-the-shelf large language models,” arXiv preprint arXiv:2310.06552, 2023.