Linguistics

How Big Data Is Transforming Linguistics

The University of Zurich is investing in research into human languages. In the next few years a great deal of equipment will be bought and labs built as part of the LiRI project. With the help of IT specialists, it will be possible to process and analyze large volumes of data.

Andres Eberhard

This is how a future laboratory might look, in which EEG recordings will be made in order to understand speech processing in the brain more precisely. (Image: Marc Latzel)


Researchers used to devise their theories at their desks. In language research this type of science even got its own name – armchair linguistics. This has changed: Now work in the field and in the lab, in other words controlled data collection and analysis, is an increasingly important component of research. Things that could once only be conjectured can now be confirmed or refuted with data – and most recently with huge volumes of data. For example, people often complain about the barbarization of language in the digital age. “We can only find out whether this is true, and what aspects of it are true, if we analyze millions of words in WhatsApp and text messages,” says Elisabeth Stark, professor of Romance linguistics at UZH. While it is possible to analyze big data from an armchair, processing such huge volumes of information requires a lot of storage and specialized software.

Research infrastructure of national significance

The University of Zurich will invest around eight million francs by 2025 to set up the infrastructure necessary to carry out this new linguistic research. The plan is to build laboratories, procure modern technology for experimentation and research, and hire a number of IT specialists and data scientists. The importance and urgency of the project, which goes under the name of Linguistic Research Infrastructure (LiRI), are underscored among other things by its inclusion in the federal government’s Swiss Roadmap of Research Infrastructures 2021-2024 at the end of 2018.

It’s an investment in big data processing. But not just that. The labs will also have equipment enabling massive improvements in the quality of sound and video recordings. The shopping list includes devices for recording voices and lip movements, infrared cameras, an ultrasound machine and eye-tracking systems. Heavy investment is also planned in neurolinguistic equipment such as EEG systems that will facilitate a better understanding of language processing in the brain.

Kitted out for complex experiments

“We need this basic high-end equipment as a field for experimentation,” says Stark, who heads the LiRI project. She explains that larger, more complex experiments simply weren’t possible with the present infrastructure. Many of the devices to be acquired are also mobile, meaning they can gather complex data from natural language production in the field. There are many socially relevant fields of application, from research into the way young children acquire language to analyzing hearing impairments and loss in old age. 

Ultimately it’s not just research that will benefit from all these new labs and equipment, but teaching as well. New methods of data collection and processing and quantitative analysis are key components of a new single-major Master’s study program in linguistics and the joint doctoral training program offered by the Zurich Linguistics Center.

Elisabeth Stark and Volker Dellwo present the sketch of their future research laboratory. (Image: Andres Eberhard)

Still on the drawing board

LiRI still only exists on paper. Forensic voice expert Volker Dellwo at the Institute of Computational Linguistics, also a member of the project team, has produced a sketch showing a potential lab set-up consisting of multiple interconnected soundproofed booths and video and EEG cabins.

It’s still not clear where the new labs will be located or precisely what form they’ll take. There are much firmer ideas of the personnel required: A system administrator, a lab technician, and around five IT specialists or scientists specialized in data processing and analysis. The first member of the new IT staff will be starting work soon. 

Bringing linguistic research together

The plan is for the infrastructure to be used on an interdisciplinary, nationwide basis. The Zurich Linguistics Center spans more than 20 professorships located in various departments of the UZH Faculty of Arts and Social Sciences. Outside UZH, there are many research projects running all over the country that have already expressed an interest in collaborating with LiRI. Along with people doing research in the humanities and social sciences, researchers in psychology, medicine, geography and biology are also interested in human language. The plan is to also let companies – potential customers could include hearing aid manufacturers – use the labs and equipment in return for a fee.

Andres Eberhard is a freelance journalist; English translation by Michael Craig
