Facilitated Access to Drug Safety Information via the Use of Generative Artificial Intelligence

 

 

Written by Paniz Ghavimi, Conway lab 


 

Figure 5. Performance by SPL section.

Have you ever wondered how healthcare providers choose the most suitable medication for a patient while accounting for that patient's current medication list?

Adverse drug reactions (ADRs) are a leading cause of poor health outcomes and death. Gisladottir et al. (2025) led a research project examining the use of large language models (LLMs) in drug safety research. LLMs are artificial intelligence systems trained on vast amounts of text to generate human-like language. ChatGPT (Generative Pre-trained Transformer), Llama (Large Language Model Meta AI), and Mixtral are examples of LLMs tested in this study. Gisladottir et al. compared these LLMs with pre-existing technologies for retrieving drug safety information from Structured Product Labeling (SPL) documents.

SPL refers to a standardized document format developed by Health Level Seven (HL7), an international healthcare standards organization, and adopted by the U.S. Food and Drug Administration (FDA) to facilitate the exchange of information among healthcare providers and systems. SPL conveys essential information, such as the "Drug Interactions", "Adverse Reactions" (ARs), and "Warnings and Precautions" sections, in a computer-readable format (XML files). SPL documents are updated regularly and are publicly available as XML files on the National Library of Medicine's (NLM) DailyMed website.
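To give a concrete sense of what "computer-readable" means here, the sketch below parses a heavily simplified, invented SPL fragment with Python's standard library and pulls out the Adverse Reactions section by its coded section identifier. The XML structure and the LOINC section code shown reflect the SPL standard as generally described, and are assumptions for illustration rather than details taken from the study.

```python
import xml.etree.ElementTree as ET

# A heavily simplified, invented SPL fragment. Real DailyMed files are
# far larger; the urn:hl7-org:v3 namespace and the LOINC section code
# 34084-4 ("Adverse Reactions") are assumptions for illustration,
# not details reported in the study.
SPL_SNIPPET = """\
<document xmlns="urn:hl7-org:v3">
  <component><structuredBody>
    <component><section>
      <code code="34084-4" codeSystem="2.16.840.1.113883.6.1"/>
      <title>ADVERSE REACTIONS</title>
      <text>Common adverse reactions include headache and nausea.</text>
    </section></component>
  </structuredBody></component>
</document>
"""

NS = {"hl7": "urn:hl7-org:v3"}

def adverse_reactions_text(xml_string):
    """Return the text of the section coded 34084-4, or None if absent."""
    root = ET.fromstring(xml_string)
    for section in root.iter("{urn:hl7-org:v3}section"):
        code = section.find("hl7:code", NS)
        if code is not None and code.get("code") == "34084-4":
            text = section.find("hl7:text", NS)
            return text.text if text is not None else None
    return None

print(adverse_reactions_text(SPL_SNIPPET))
```

Because each section carries a machine-readable code rather than relying on a human reading the heading, tools (whether BERT-based models or LLM pipelines) can locate the relevant text automatically across thousands of labels.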

Currently, Bidirectional Encoder Representations from Transformers (BERT)-based models, a type of natural language processing (NLP) method, are used to obtain drug safety information. While BERT models are well suited to NLP, they are designed for specific tasks and often require extensive, time-consuming training to categorize information in SPL documents, such as ARs. Additionally, existing NLP databases do not include all relevant ADRs, limiting the drug safety information available. To address these limitations, Gisladottir et al. evaluated generative LLMs, which have been increasingly used in the medical field for their ability to efficiently generate comprehensive text from many sources.

Gisladottir et al. (2025) compared the performance of the LLMs listed above with two commonly applied methods: Medical Dictionary for Regulatory Activities Preferred Terms (MedDRA PTs) and the state-of-the-art DeepCADRME, a BERT-based model specifically designed to capture ARs. For their assessment, the researchers used a dataset of over 1000 SPLs. They noted that factors such as SPL text length, word choice, and context could influence and limit the generated safety information.
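Extraction performance in this kind of comparison is typically scored with precision, recall, and F1 over the set of AR terms a model returns versus a manually annotated gold standard. The sketch below uses invented terms, not data from the paper, purely to show the arithmetic:

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 for extracted AR terms."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # terms the model got right
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical example: terms a model extracted from one SPL's
# Adverse Reactions section vs. a manually annotated gold standard.
extracted = {"headache", "nausea", "dizziness", "rash"}
gold = {"headache", "nausea", "rash", "fatigue"}

p, r, f1 = precision_recall_f1(extracted, gold)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.75 0.75 0.75
```

Set-based scoring rewards a model both for finding the annotated reactions (recall) and for not inventing extra ones (precision), which matters for generative models that can produce irrelevant output.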

An important characteristic of an LLM is its input limit: the maximum number of characters (including spaces) that can be included in a prompt and response. AR sections vary in length, reaching up to roughly 10,000 characters. One model, GPT-4-1106-preview, has a character limit approximately 30 times greater than that of the currently used approaches. This allows drug safety information databases to be updated with a more comprehensive list of adverse reactions.
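When an AR section exceeds a smaller model's input limit, a common workaround (not necessarily the one used in this study) is to split the text into chunks that each fit under the character budget and process them separately. A minimal sketch:

```python
def chunk_text(text, max_chars):
    """Split text into chunks of at most max_chars characters,
    breaking on sentence boundaries where possible. A single sentence
    longer than max_chars becomes its own oversized chunk."""
    sentences = text.split(". ")
    chunks, current = [], ""
    for s in sentences:
        s = s if s.endswith(".") else s + "."
        candidate = (current + " " + s).strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = s
    if current:
        chunks.append(current)
    return chunks

section = "Headache was reported. Nausea occurred. Rash was rare."
for chunk in chunk_text(section, 40):
    print(chunk)
```

Chunking keeps each request within the limit, but it also fragments context, which is one reason a model with a 30-fold larger input limit can extract a more complete reaction list in a single pass.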

Generative LLMs, especially GPT-4, are equipped to handle larger volumes of text. However, this capability also means that the models may struggle with varied language and occasionally generate irrelevant responses. Despite the absence of task-specific training, the study showed that LLMs outperformed BERT-based models in processing complex text. Gisladottir et al. also found that GPT-4 produced impressive, precise outputs on multiple tasks, such as extracting drug interactions.

This study demonstrated that generative models, especially GPT-4, can effectively extract computer-readable lists of adverse reactions from SPLs. The information obtained could inform the development of drug safety databases, reducing adverse drug reactions and improving health outcomes. With further validation, training, improved context-specific prompting, and human oversight, LLMs hold significant promise for future implementation in patient safety.

 

Source Publication: 

  • Gisladottir, U., Zietz, M., Kivelson, S., Tanaka, Y., Sirdeshmukh, G., Brown, K. L., & Tatonetti, N. P. (2025). Leveraging large language models in extracting drug safety information from prescription drug labels. Drug Safety, 49(2), 177–193. https://doi.org/10.1007/s40264-025-01594-x