Lessons from African Low-Resource NLP for Humanities Research

Keynote by Seid Muhie Yimam, University of Hamburg


Abstract: In the digital humanities landscape, natural language processing (NLP) holds transformative potential for low-resource languages, such as those of Africa and other under-resourced regions. Based on our work on African languages, this keynote explores how NLP techniques have been adapted to settings with limited resources, focusing on sentiment analysis, hate speech detection, emotion analysis, speech-to-text, and machine translation. Using large language models (LLMs), our projects have advanced research in low-resource languages, offering unique insights into both challenges and opportunities.

Drawing from these experiences, my talk will demonstrate how the collaboration between NLP and humanities research can be applied to broader humanities data in low-resource domains, particularly emphasizing the analysis of historical materials such as manuscripts and religious texts. I will explore how AI can enhance our understanding of cultural narratives and historical contexts. 

At the Hub of Computing and Data Science (HCDS) and the Language Technology Research Group at the University of Hamburg (UHH), we developed the Sense Clustering over Time (SCoT) tool, which illustrates how similar methodologies can extend to various humanities data. The SCoT project provides dynamic visualizations of word meanings as they evolve, enhancing diachronic semantic understanding. 

This keynote emphasizes the critical role of interdisciplinary collaboration and AI methodologies in advancing our understanding of humanities data in low-resource settings. By integrating technological innovation (mostly LLMs) with cultural heritage studies, we can open new research avenues and gain insights through temporal and linguistic contexts.

Section 3: Modeling Language and Low-Resource Humanities Data