In a May 9, 2024 paper, Juri Opitz from the University of Zurich, along with Shira Wein and Nathan Schneider form Georgetown University, discussed the importance of linguistic expertise in natural language processing (NLP) in an era dominated by large language models (LLMs).
The authors explained that while machine translation (MT) previously relied heavily on linguists, the landscape has shifted. “Linguistics is no longer front and center in the way we build NLP systems,” they said. With the emergence of LLMs, which can generate fluent text without the need for specialized modules to handle grammar or semantic coherence, the need for linguistic expertise in NLP is being questioned.
The authors do not think the rise of LLMs spell the end of linguistics’s relevance to NLP. They emphasized that designing how a system functions is just one part of the overall process involved in the research, development, and deployment of NLP technologies. There are “several aspects in which NLP (still) relies on linguistics, or where linguistic thinking can illuminate new directions,” they said.
The authors identified six major facets where linguistics contributes to NLP, encapsulated in the acronym RELIES.
Slator 2024 Language Industry Market Report — Language AI Edition
The 140-page flagship report features in-depth market analysis, language AI opportunities, survey results, and much more.
Linguistic expertise helps develop resources for NLP tasks through data selection and curation, data annotation (gold standard annotations), and corpus creation, ensuring the quality and diversity of datasets and, therefore, the sound behavior of systems.
“Linguistic expertise is relevant for structurally collecting and documenting language data,” they said.
This linguistic knowledge is also important when building parallel corpora for MT. The authors also emphasized the importance of advanced linguistic knowledge of annotators, noting that “trained linguists are needed for maximizing the quality of MT references.”
Linguistic knowledge is essential for designing effective human evaluations, judging the quality of automatic metrics by correlating agreement between the human judgments and the automatic metric scores (“meta-evaluation”), and identifying linguistic phenomena challenging systems (such as anaphora or dialect variation).
Crucial Role for Human Evaluation
The authors highlighted the “crucial role” human evaluations play in reliably assessing the state of the field — especially as systems continue to improve — with metalinguistic knowledge being a must in human evaluation studies to ensure effective error analysis and quality assessment.
Additionally, expertise in linguistic theories is necessary because the specific linguistic phenomena which may be more challenging for the models need to be understood in order to be identified.
”Linguistics help take a system’s fingerprint, evaluate a system in particular categories, and foster understanding of complex models by binding observed behavior to interpretable linguistic categories,” said the authors.
Low-Resource Settings
Linguistic expertise is essential not only for collecting data to preserve low-resource languages but also for effectively developing technologies for these languages. The authors highlighted the importance of linguistically sensitive supervision in developing language technologies for under-resourced languages.
Linguistically sensitive supervision involves overseeing and guiding the development of language technologies with a deep understanding of linguistic principles and cultural contexts. This ensures that technologies are developed in a way that respects and aligns with the linguistic and cultural norms of the target community.
Interpretability and Explainability
Linguistics provides NLP with an appropriate metalanguage, serving as a common language for expressing observations and formulating explanations. This common language facilitates streamlined discussions on complex NLP processes.
“Linguistics offers NLP an important metalanguage for expressing observations, such as about model predictions, and hypothesizing explanations,” they said.
Study of Language
Linguistics and related areas serve as application domains for NLP. Language researchers, even those who aren’t computational linguists, form a “user base” that drives the development of NLP tasks and tools.
Unlike the previous categories, which showed how linguistics contributes to NLP, this category goes both ways: studying language motivates the development of NLP tools, and those tools then help further language study.
The authors noted that the list is not exhaustive. Their aim was to provide a general overview rather than a detailed analysis. They highlighted that linguistic expertise is valuable but not the sole or most critical aspect in working with language data and systems. They demonstrated how linguistics can contribute to specific projects and the broader field, working in conjunction with other forms of expertise.
“We hope that this study will promote future work that leverages collaboration and connections between linguistics and computer scientists with the aim of NLP progress in diverse domains,” they concluded.