Researchers, Instructors, & Staff Scholarship

AI-Informed Approaches to Keyword Generation, Text Summarization, and Document Clustering for Improved Resource Discovery

Charlie Harper, Case Western Reserve University
Anne Kumer, Case Western Reserve University
Shelby Stuart, Case Western Reserve University
Evan Meszaros, Case Western Reserve University

Author ORCID Identifier

https://orcid.org/0000-0003-3781-5812

https://orcid.org/0000-0001-8791-5964

https://orcid.org/0000-0003-1946-6233

https://orcid.org/0000-0002-9500-0294

Document Type

Book Chapter

Publication Date

5-2022

Abstract

Academic and cultural institutions are grappling with problems of how to organize, label, and search disparate bodies of texts. As aggregators, preservers, and disseminators of substantial repositories of digital texts, research libraries are naturally situated at the heart of these problems. This chapter explores how unsupervised machine learning may be used to capture and simplify the complexity and nuances of text. Traditional approaches to improving discoverability and accessibility of text through metadata and controlled vocabularies have time-tested strengths. As the volume of digital data explodes, the obstacles and limitations of traditional approaches become more pronounced, and machine learning “show(s) the potential to create efficiencies that smooth the path to access, enhancing description and expanding forms of discovery along the way.”1 In light of the need for new approaches to metadata generation to facilitate discovery, the authors look at Doc2Vec and topic modelling with Latent Dirichlet Allocation (LDA) to explore their utility as assistive tools for authors, librarians, and readers. The authors apply the two approaches to a corpus of electronic theses and dissertations (ETDs) completed at Ohio universities and colleges.

Publication Title

The Rise of AI: Implications of Artificial Intelligence in Academic Libraries

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 4.0 International License.

Comments

This study’s data sets, Python notebooks, and trained models are provided on OSF (https://osf.io/r6yhp/) and are licensed under Creative Commons Attribution-ShareAlike 4.0.

Recommended Citation

Harper, Charlie; Kumer, Anne; Stuart, Shelby; and Meszaros, Evan, "AI-Informed Approaches to Keyword Generation, Text Summarization, and Document Clustering for Improved Resource Discovery" (2022). Researchers, Instructors, & Staff Scholarship. 1.
https://commons.case.edu/staffworks/1

Included in

Cataloging and Metadata Commons
