Author ORCID Identifier
https://orcid.org/0000-0003-3781-5812
https://orcid.org/0000-0001-8791-5964
Document Type
Book Chapter
Publication Date
5-2022
Abstract
Academic and cultural institutions are grappling with problems of how to organize, label, and search disparate bodies of texts. As aggregators, preservers, and disseminators of substantial repositories of digital texts, research libraries are naturally situated at the heart of these problems. This chapter explores how unsupervised machine learning may be used to capture and simplify the complexity and nuances of text. Traditional approaches to improving discoverability and accessibility of text through metadata and controlled vocabularies have time-tested strengths. As the volume of digital data explodes, the obstacles and limitations of traditional approaches become more pronounced, and machine learning “show(s) the potential to create efficiencies that smooth the path to access, enhancing description and expanding forms of discovery along the way.” [1] In light of the need for new approaches to metadata generation to facilitate discovery, the authors look at Doc2Vec and topic modelling with Latent Dirichlet Allocation (LDA) to explore their utility as assistive tools for authors, librarians, and readers. The authors apply the two approaches to a corpus of electronic theses and dissertations (ETDs) completed at Ohio universities and colleges.
Publication Title
The Rise of AI: Implications of Artificial Intelligence in Academic Libraries
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Harper, Charlie; Kumer, Anne; Stuart, Shelby; and Meszaros, Evan, "AI-Informed Approaches to Keyword Generation, Text Summarization, and Document Clustering for Improved Resource Discovery" (2022). Researchers, Instructors, & Staff Scholarship. 1.
https://commons.case.edu/staffworks/1
Comments
This study’s data sets, Python notebooks, and trained models are provided on OSF (https://osf.io/r6yhp/) and are licensed under Creative Commons Attribution-ShareAlike 4.0.