Applying Artificial Intelligence and Chatbots to Enhance Collection Development in Health Sciences Libraries
Ivan Portillo, MLIS, AHIP Health Sciences Librarian & Director of Rinker Campus Library Services Leatherby Libraries Chapman University
David Carson, MM, MLIS Health Sciences Education & Research Librarian Oregon Health & Science University
Introduction
The continuous advancement of artificial intelligence (AI) and large language models (LLMs) has opened new opportunities for librarians. AI chatbots previously generated responses by relying on pre-trained, large-scale data. Newer versions have made substantial strides in their capabilities: they can now search online and supplement their outputs using Retrieval-Augmented Generation (RAG) to provide more accurate and relevant responses. These advancements present a promising opportunity for librarians to reduce workload and increase efficiency [1, 2].
To explore the potential of generative AI chatbots in assisting health sciences librarians with collection development, we evaluated five chatbots using two prompts designed to aid librarians. The five chatbots assessed were ChatGPT o3, NotebookLM, Google Gemini 2.5, Perplexity, and Microsoft Copilot.
Prompt 1: ebook Recommendations
For the first task, we developed the following prompt for each generative AI chatbot to generate a list of recent ebook titles published in the last two years focused on physical therapy, physician assistant, communication sciences and disorders, and pharmacy:
“I am responsible for updating the ebook collection of a health sciences library that supports academic programs in physical therapy, physician assistant, communication sciences and disorders, and pharmacy. I am particularly interested in titles published in 2023-2025 that are critical for these fields. Could you help me identify the top recent ebooks published between 2023 and 2025 for each of these academic programs? Please list up to five essential titles for each program. Provide the information in APA citation format, including the author(s), publication year, title, publisher, and any DOI or URL if available. This will assist in quick integration into our library management system and ensure our collection is relevant and up-to-date.”
The results from the first prompt were assessed based on quality, accuracy, the presence of fabricated titles (often referred to as “hallucinations”), whether references were provided, correct citation details, and accurate Library of Congress (LC) call numbers. ChatGPT, Copilot, and Gemini provided five titles per subject as requested, while Perplexity and NotebookLM provided fewer than five or none for specific subjects. Each AI chatbot successfully produced relevant book titles, but all chatbots also hallucinated inaccurate information about each book, including incorrect editions, publication years, links, and APA citations. All five chatbots also provided titles outside of the date range requested. For example, the following book was suggested by Google Gemini:
Blosser, J. L. (2023). School Programs in Speech-Language Pathology: Organization and Service Delivery (5th ed.). Plural Publishing.
The suggested book title does exist, is written by the author provided, and is relevant to communication sciences and disorders. However, the fifth edition of the book was published in 2012. The most recent edition is the seventh edition, published in 2025, which would have been a better response to our prompt.
Copilot and ChatGPT were the most accurate, as they offered accurate authors and titles while completing the task as requested. We would not recommend any generative AI chatbots for recommendations on recently published titles due to inaccuracies and inconsistencies, but we did find them helpful for discoverability.
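The edition and year discrepancies described above can be caught with a simple cross-check once a librarian has verified the edition history in a trusted source such as a publisher site or catalog record. The following is a minimal sketch under that assumption, not the method we used in our evaluation. The names `EditionRecord`, `parse_claim`, and `check_citation` are hypothetical, and the regular expressions only handle the basic APA pattern shown in the Gemini example.

```python
import re
from dataclasses import dataclass


@dataclass
class EditionRecord:
    """One verified edition of a title (entered by the librarian, not the chatbot)."""
    edition: int
    year: int


def parse_claim(citation):
    """Extract the claimed publication year and edition number from an APA-style citation."""
    year_match = re.search(r"\((\d{4})\)", citation)
    edition_match = re.search(r"\((\d+)(?:st|nd|rd|th) ed\.\)", citation)
    year = int(year_match.group(1)) if year_match else None
    edition = int(edition_match.group(1)) if edition_match else None
    return year, edition


def check_citation(citation, verified, start_year=2023, end_year=2025):
    """Return a list of problems found when comparing a chatbot citation to verified records."""
    problems = []
    year, edition = parse_claim(citation)
    if year is None:
        problems.append("no publication year found")
    elif not start_year <= year <= end_year:
        problems.append(f"year {year} is outside {start_year}-{end_year}")

    known = {record.edition: record.year for record in verified}
    if edition in known and year is not None and known[edition] != year:
        problems.append(f"edition {edition} was published in {known[edition]}, not {year}")
    if known and edition is not None and edition < max(known):
        latest = max(known)
        problems.append(f"a newer edition exists: edition {latest} ({known[latest]})")
    return problems


# The Gemini example from the text, checked against the edition facts we verified.
claim = ("Blosser, J. L. (2023). School Programs in Speech-Language Pathology: "
         "Organization and Service Delivery (5th ed.). Plural Publishing.")
verified_editions = [EditionRecord(edition=5, year=2012), EditionRecord(edition=7, year=2025)]
for problem in check_citation(claim, verified_editions):
    print(problem)
```

A date-range filter alone would not flag this citation, because the claimed year (2023) falls inside the requested range. The mismatch only surfaces when the claim is compared with independently verified edition data.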
Prompt 2: Collection Gap Analysis
We asked each AI chatbot to complete three steps for the second task. The first step was to analyze the curriculum of Chapman University’s Physician Assistant program directly from the program’s webpage. The second step was to upload a list of the library’s collection into each generative AI chatbot. We exported a list of the library’s physical titles as an Excel file from the Leatherby Libraries’ integrated library system, Sierra from Innovative. The Excel file contained fields for title, Library of Congress call number, location, and item status. NotebookLM and Perplexity were unable to accept Excel files. As an alternative, we pasted the titles and call numbers into Perplexity’s prompt field and saved them to a .txt file for NotebookLM. For the third step, we developed the following prompt asking each AI chatbot to compare the library’s collection against the curriculum and determine whether the entire Physician Assistant curriculum was represented:
“As a health sciences librarian, your task is to analyze the provided library collection to identify any subject gaps that could hinder the support of Chapman University's Physician Assistant curriculum. Please follow these steps: Review the file attached containing the library’s current collection. Assess the subject areas covered by the collection, focusing on their relevance to the Chapman University's Physician assistant curriculum. Identify any significant gaps or missing subjects within the collection that are critical for the physician assistant curriculum. Provide a list of the subject gaps found with their associated Library of Congress call number ranges and explain why each is important for the program.”
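For chatbots that cannot accept Excel files, the plain-text alternative described above can be scripted rather than copied by hand. The sketch below assumes the export has first been saved as CSV and that its column headers are `Title` and `Call Number`. Those header names and the function name `holdings_to_text` are assumptions for illustration, and the sample row is invented.

```python
import csv
import io


def holdings_to_text(csv_text, title_field="Title", call_number_field="Call Number"):
    """Reduce a CSV export to one 'title<TAB>call number' line per item."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        title = (row.get(title_field) or "").strip()
        call_number = (row.get(call_number_field) or "").strip()
        if title:  # skip rows that have no title
            lines.append(f"{title}\t{call_number}")
    return "\n".join(lines)


# Invented sample export; a real file would come from the library system.
sample_export = (
    "Title,Call Number,Location,Status\n"
    "Example Title A,RC86.7 .E9 2020,Stacks,Available\n"
)
print(holdings_to_text(sample_export))
```

The resulting text can be pasted into a prompt field or written to a .txt file for upload.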
All five AI chatbots completed the tasks for the second prompt, but we found inconsistencies among the analyses they provided. Each AI chatbot was able to compare the provided curriculum with the list of physical items. The subject gaps identified differed slightly from one chatbot to another, but every chatbot explained why each recommended subject area was important, as in the following example from ChatGPT:
| Discipline / topic (linked PA course[s]) | LC call-number span | Why the gap matters | Current holdings* |
| --- | --- | --- | --- |
| Ophthalmology / Eye disorders (PAS 507 EENT) | RE 1–994 | EENT begins with a detailed ocular exam and the management of red-flag eye emergencies (e.g., acute angle-closure glaucoma, central retinal artery occlusion). Students need atlases, slit-lamp & fundoscopy guides, and ocular-pharmacology references. | 0 titles located (no “RE” shelf marks) |
| Dermatology (PAS 517) | RL 1–803 | The PA dermatology block is image-heavy and requires current visual diagnostics, dermoscopy, and procedural dermatology manuals. | 7 titles |
| Emergency medicine (PAS 520 didactic; PAS 605 rotation) | RC 86–88 | Core to quick-decision algorithms, toxicology, and advanced life-support procedures; point-of-care e-reference texts are essential for students on call. | 3 titles |
Figure 1: Table of collections gap analysis responses from ChatGPT
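Holdings counts like those in the ChatGPT response can be checked against the exported call numbers. The following is a minimal sketch of such a check, assuming call numbers begin with one to three class letters followed by a whole number (for example, `RC86.7`). The function names and the sample shelf marks are hypothetical, and only the three LC ranges come from the ChatGPT output.

```python
import re

# Leading class letters and the whole-number part of an LC call number.
CLASS_PATTERN = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+)")


def parse_class(call_number):
    """Return (class letters, whole class number), or None if the pattern does not match."""
    match = CLASS_PATTERN.match(call_number.upper())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def count_in_range(call_numbers, letters, low, high):
    """Count call numbers in class `letters` whose whole number lies in [low, high]."""
    count = 0
    for call_number in call_numbers:
        parsed = parse_class(call_number)
        if parsed is not None and parsed[0] == letters and low <= parsed[1] <= high:
            count += 1
    return count


# LC ranges taken from the ChatGPT response.
ranges = {
    "Ophthalmology": ("RE", 1, 994),
    "Dermatology": ("RL", 1, 803),
    "Emergency medicine": ("RC", 86, 88),
}

# Invented shelf marks for illustration only; real input would be the exported list.
sample_call_numbers = [
    "RL71 .H33 2019",
    "RC86.7 .T56 2021",
    "RC88.9 .M5 2018",
    "RC280 .B8 2020",
    "QM23.2 .N48 2019",
]

for subject, (letters, low, high) in ranges.items():
    print(subject, count_in_range(sample_call_numbers, letters, low, high))
```

Matching on the whole-number part means a decimal extension such as `RC88.9` is counted within RC 86–88, which follows how LC ranges are normally read. Call numbers that do not fit the assumed pattern are skipped rather than guessed at.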
Only Google Gemini provided inaccurate LC call numbers in its recommendations, while ChatGPT provided additional details and accurate LC call numbers that we found most useful and beneficial to our current collection development cycle.
Conclusion
We found that AI chatbots can assist librarians in collection development, although they are still prone to inaccuracies. The hallucinations produced in response to the first prompt indicate that information retrieval in generative AI chatbots still needs improvement. When the chatbots were used in a RAG-style workflow, with specific sources and data provided for analysis, the results were more promising and practical, as the second task showed. Overall, our findings suggest that generative AI chatbots can serve as a supplementary tool, and that future improvements may make them more helpful to librarians.
References
1. Brzustowicz R. From ChatGPT to CatGPT: The Implications of Artificial Intelligence on Library Cataloging. Information Technology and Libraries. 2023;42(3). DOI: https://doi.org/10.5860/ital.v42i3.16295
2. Yamson GC. Immediacy as a better service: Analysis of limitations of the use of ChatGPT in library services. Information Development. 2023;0(0):02666669231206762. DOI: https://doi.org/10.1177/02666669231206762