Meeting Report
PIDS: Data Sciences & Imaging Informatics
Kenji Hirata, Akie Katsuki, Masatoyo Nakajo, Shiro Watanabe, Junki Takenaka, Naoto Wakabayashi, Takaaki Yoshimura, Minghui Tang, Kazutaka Minami, Nozomu Uetake and Kohsuke Kudo
Journal of Nuclear Medicine June 2025, 66 (supplement 1) 251030;
Abstract
251030
Introduction: SUVmax is the de facto standard for representing uptake intensity in FDG-PET/CT reports. We previously demonstrated that SUVmax values documented in FDG-PET/CT reports are useful for locating lesions within the images (Hirata et al. Front Med 2021). Building on this finding, we aim to develop systems for treatment response evaluation and automated report generation. Although SUVmax values are typically written as decimal numbers such as "3.14," rule-based approaches such as regular expressions often fail because of the many exceptions to this format. Since the emergence of ChatGPT in 2022, large language models (LLMs) have become valuable tools for analyzing medical texts such as radiology reports. However, cloud-based systems such as ChatGPT and Gemini are not permitted to directly process reports containing sensitive information. To address this, we deployed an open-source LLM locally and used it to extract and structure "location and SUVmax" information from FDG-PET/CT reports.
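To illustrate why a purely rule-based approach is brittle, here is a minimal sketch of a naive regular-expression extractor. The pattern, the function name `extract_suvmax_regex`, and the English example strings are all hypothetical stand-ins (the study's reports are in Japanese); this is not the authors' baseline, only an example of the kind of rule that breaks on irregular phrasing.

```python
import re
from typing import List

# Hypothetical rule: the literal "SUVmax", an optional ":" or "=",
# then a number that must contain a decimal point (e.g. "3.14").
SUV_PATTERN = re.compile(r"SUVmax\s*[:=]?\s*(\d+\.\d+)")


def extract_suvmax_regex(report: str) -> List[float]:
    """Naive rule-based extraction of decimal SUVmax values from report text."""
    # findall returns the captured group for each match, as strings.
    return [float(value) for value in SUV_PATTERN.findall(report)]


# Well-formed text is handled...
print(extract_suvmax_regex("Pancreatic head mass, SUVmax 3.14."))
# ...but an integer value, a spaced "SUV max", or the upper end of a
# range is silently missed.
print(extract_suvmax_regex("Liver lesion, SUVmax=12"))
print(extract_suvmax_regex("Rib uptake, SUV max 2.8"))
print(extract_suvmax_regex("Right hilar node, SUVmax 4.2-5.1"))
```

Each miss could be patched with another rule, but every patch adds a new special case, which is the motivation for delegating extraction to an LLM instead.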
Methods: The Institutional Review Board approved this retrospective study (#23-0128). We reviewed 949 patients who underwent FDG-PET/CT at our institute from the beginning of 2017. All reports, written in Japanese, were authored by certified nuclear medicine specialists. The LLM used was "Llama-3-ELYZA-JP-8B." Each report was input to the LLM together with a prompt instructing it to generate JSON-format text such as {"site": "pancreas", "SUVmax": "3.141"}. To mitigate hallucinations, the prompt additionally instructed the LLM not to output an answer when the SUVmax was unclear or unavailable. For organs such as the lung and liver, the LLM was instructed to include laterality and the specific lobe in its output. The ground truth for all cases was determined by an experienced nuclear medicine physician. The accuracy of the LLM was evaluated using the Dice similarity coefficient (DSC).
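Because the model returns free text that is only expected to contain JSON, some post-processing is needed before scoring. The sketch below assumes, hypothetically, that the model emits a JSON array of objects with "site" and "SUVmax" keys, possibly surrounded by extra prose; the function name `parse_llm_output` and the skip-on-invalid policy are illustrative choices, not the authors' pipeline.

```python
import json
from typing import List, Tuple


def parse_llm_output(raw: str) -> List[Tuple[str, float]]:
    """Parse a (hypothetical) JSON array of {"site": ..., "SUVmax": ...} records.

    Records with a missing, empty, or non-numeric SUVmax are dropped,
    mirroring the prompt instruction not to guess unclear values.
    """
    # LLMs sometimes wrap JSON in prose; keep only the outermost [...] span.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        records = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(records, list):
        return []

    results: List[Tuple[str, float]] = []
    for rec in records:
        if not isinstance(rec, dict):
            continue
        site = rec.get("site")
        if not isinstance(site, str) or not site.strip():
            continue  # no usable location (which may carry laterality/lobe)
        try:
            suv = float(rec.get("SUVmax"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric SUVmax: skip rather than guess
        results.append((site.strip(), suv))
    return results


raw = ('Result: [{"site": "pancreas", "SUVmax": "3.141"}, '
       '{"site": "liver", "SUVmax": ""}]')
print(parse_llm_output(raw))
```

Silently discarding malformed records keeps the scoring step simple; a production pipeline might instead log them so that prompt failures can be reviewed.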
Results: Among the 949 cases reviewed, 591 reports (62%) contained at least one SUVmax description. In total, 1,135 SUVmax values were documented, comprising 614 single-digit, 25 double-digit, and 496 triple-digit values. Applying the criterion of SUVmax > 5 or triple-digit values, 842 lesions (74%) met the specified conditions. With respect to laterality, 354 lesions were on the left side and 411 on the right. By major anatomical region, the thorax accounted for the largest proportion (276 lesions), followed by the abdomen (199 lesions) and the head (97 lesions). At the organ level, the most frequently identified sites were the lung (208 lesions), bone (198 lesions), and pharynx (54 lesions). The ground truth dataset included 1,135 SUVmax values, whereas the LLM output contained 1,353 values, 19% more than the ground truth, suggesting that the LLM hallucinated non-existent SUVmax values. The patient-based DSC was 0.792. In 479 of the 591 cases (81%), sensitivity was 100%, indicating that the LLM identified every SUVmax value present in the ground truth. Perfect matches were achieved in 407 of the 591 cases (69%). The overall sensitivity was 83.6%.
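The reported metrics can be computed from per-patient lists of SUVmax values, with the DSC defined as 2|A∩B| / (|A| + |B|) for ground-truth values A and LLM values B. The sketch below rests on assumptions the abstract does not specify: values are matched by number alone (matching on (site, SUVmax) pairs would be a stricter alternative), the patient-based DSC is taken as the mean of per-patient DSCs, and the overall sensitivity is pooled across all ground-truth values. The names `patient_metrics` and `summarize` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def patient_metrics(truth: List[float], pred: List[float]) -> Tuple[int, float, float]:
    """Return (true positives, DSC, sensitivity) for one patient.

    Values are compared as multisets, so a duplicated prediction counts
    only as often as the value appears in the ground truth.
    """
    tp = sum((Counter(truth) & Counter(pred)).values())
    total = len(truth) + len(pred)
    dsc = 2 * tp / total if total else 1.0
    sensitivity = tp / len(truth) if truth else 1.0
    return tp, dsc, sensitivity


def summarize(cases: Dict[str, Tuple[List[float], List[float]]]) -> Dict[str, float]:
    """Aggregate per-patient results into cohort-level metrics."""
    if not cases:
        raise ValueError("no cases to summarize")
    total_tp = total_truth = full_sens = perfect = 0
    dscs: List[float] = []
    for truth, pred in cases.values():
        tp, dsc, sens = patient_metrics(truth, pred)
        total_tp += tp
        total_truth += len(truth)
        dscs.append(dsc)
        full_sens += int(sens == 1.0)                       # all true values found
        perfect += int(Counter(truth) == Counter(pred))     # no misses, no extras
    n = len(cases)
    return {
        "mean_dsc": sum(dscs) / n,
        "frac_full_sensitivity": full_sens / n,
        "frac_perfect_match": perfect / n,
        "pooled_sensitivity": total_tp / total_truth if total_truth else 1.0,
    }


# Toy cohort (made-up values): exact match, one hallucinated extra, one miss.
cases = {
    "A": ([3.1, 5.2], [3.1, 5.2]),
    "B": ([2.0, 7.5], [2.0, 9.9, 7.5]),
    "C": ([4.4, 6.0], [4.4]),
}
print(summarize(cases))
```

Patient B shows how hallucinated extras lower the DSC without affecting sensitivity, the same pattern the results describe.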
Conclusions: Although the LLM tended to output non-existent SUVmax values as a result of hallucination, it achieved high sensitivity. Unlike cloud-based systems such as ChatGPT, a local LLM can be operated securely, making it a viable tool for efficiently extracting SUVmax values. Further improvements in accuracy will likely require refinement of the prompts.