Field matters

The first workshop on NLP applications to field linguistics


Workshop description

Field linguistics plays a crucial role in the development of linguistic theory and universal language modelling, as it is indisputably the only way to obtain structural data about the rapidly diminishing diversity of natural languages.

The Field matters workshop aims to bring together field linguists, with their urgent needs, and the vast community of NLP practitioners in order to develop up-to-date NLP tools for easier, faster, and more reliable data collection and annotation.

The workshop will take place on **16 October at COLING 2022**.

Invited speakers

Antonios Anastasopoulos (George Mason University)

Antonis Anastasopoulos is an assistant professor in the George Mason Computer Science Natural Language Processing Group. His interests include various aspects of multilingual Natural Language Processing and Machine Learning, with the main focus being Machine Translation and Speech Recognition for endangered languages and low-resource settings in general. He completed his Computer Science PhD at the University of Notre Dame, with a dissertation on “Computational Tools for Endangered Language Documentation”. He has been involved with documentation efforts on Griko, an endangered Greek dialect spoken in southern Italy. He co-organized the workshop on Language Technology for Language Documentation and Revitalization, hosted in Pittsburgh in August 2019.

Steven Bird (Charles Darwin University)

Steven Bird is conducting social and technological experiments in the future evolution of the world’s languages. Together with his students and colleagues, he is developing scalable methods for preserving disappearing words and worldviews for future generations of speakers and scholars. He is collaborating with speech communities in diasporas and ancestral homelands to design new approaches to language maintenance and revitalisation.

Steven studied computer science at the University of Melbourne before completing a PhD in computational linguistics at the University of Edinburgh. He has conducted fieldwork on endangered languages in West Africa, South America, Central Asia, Melanesia, and Australia. He has held academic positions at the Universities of Edinburgh, Pennsylvania, Melbourne, and UC Berkeley. He holds a secondary appointment as Senior Research Scientist at the International Computer Science Institute, UC Berkeley. He serves as Linguist at the Nawarddeken Academy in West Arnhem.

Steven leads the Top End Language Lab.

Program Committee

Oleg Serikov (HSE University, AIRI, MIPT, RAS Linguistics, oserikov@hse.ru)

is an NLP Researcher at AIR Institute. Oleg is currently writing his PhD thesis at HSE University. His main interests are ASR for under-resourced languages, modelling of under-resourced languages, and linguistic interpretation of language models. He co-organized the SIGTYP 2021 and LowResourceEval 2021 shared tasks on ASR for under-resourced languages.

Elena Klyachko (Institute of Linguistics RAS, HSE University, elenaklyachko@gmail.com)

is a PhD student at HSE. Her main interests are Tungusic languages, which she has been studying during her fieldwork, and low-resource NLP. She co-organized the SIGMORPHON 2020-2021 shared tasks on morphological reinflection, the LowResourceEval 2019 and 2021 shared tasks on NLP for field linguistic data, and the SIGTYP 2021 shared task on ASR for under-resourced languages.

Francis Tyers (Indiana University, HSE University, ftyers@iu.edu)

has extensive experience with the processing of under-resourced languages. His research interests include modelling the grammar of polysynthetic languages and the application of finite-state methods to NLP. Francis is one of the core contributors to the Apertium machine translation project. He has experience co-organizing workshops and shared tasks, including SIGMORPHON 2020-2021 and CoNLL 2018.

Valentin Malykh (Huawei, valentin.malykh@huawei.com)

works as a senior research scientist at Huawei Noah’s Ark laboratory. Dr Malykh has more than 20 papers in the NLP field, including publications at conferences such as NeurIPS, ACL, and WSDM. Valentin previously co-organized the LoResMT 2018-2019 workshops. He also twice co-organized the NeurIPS Challenge on Conversational Artificial Intelligence, in 2017 and 2018.

Timofey Arkhangelskiy (University of Hamburg, timarkh@gmail.com)

is a researcher at the Institute for Finno-Ugric/Uralic Studies, where he is working on a language documentation project that involves linguistic fieldwork. His interests in NLP are morphology and the application of existing techniques to under-resourced languages to facilitate the processing of fieldwork data. He has developed a number of general-purpose corpora of minority languages, as well as software for processing and publishing linguistic corpora.

Tatiana Shavrina (AIRI, SberDevices, shavrina@airi.net)

is a Research Project Manager in NLP at AIRI and the Chief Technology Expert in the Department of Experimental ML at SberDevices. Her research focuses on the evaluation of language models.

Ekaterina Vylomova (University of Melbourne, evylomova@gmail.com)

is a Lecturer and a Postdoctoral Fellow at the University of Melbourne. Her research is focused on the modelling of morphology and computational approaches to linguistic typology. She is the president of SIGTYP and co-organized the SIGTYP 2019-2021 workshops and the SIGMORPHON 2017-2021 shared tasks on morphological reinflection.

Ekaterina Voloshina (HSE University, AIRI, evoloshina@hse.ru)

is a researcher at HSE University and AIR Institute. Her research interests are mainly computational and quantitative approaches to language description and modelling.

Anna Postnikova (HSE University, apostnikova@hse.ru)

is a researcher at HSE University. Her research interests focus on field linguistics and the documentation of under-resourced languages, particularly Tungusic languages.

Ekaterina Neminova (HSE University, esneminova@edu.hse.ru)

is a student-researcher at HSE University. Her research interests are approaches to the automatic processing of speech, especially field speech, and the evaluation of the quality of such processing.

Alena Fenogenova (SberDevices) is the NLP Prototyping & Research Team Lead in the Department of Experimental ML at SberDevices.

Vladislav Mikhailov (SberDevices, HSE) is an R&D NLP Engineer in the Department of Experimental ML at SberDevices and works as an invited lecturer at the Big Data and IR School (HSE).

Paper submission

We invite both archival and non-archival submissions. Non-archival submissions are 2-page abstracts that may present already published work or work in progress. Archival submissions should be either 4 or 8 pages long.

Dual submissions and preprints

Dual submissions with the main conference are allowed, but authors must declare dual submission by entering the paper’s main conference submission ID. The reviews of the main conference submission will be automatically forwarded to the workshop and taken into consideration when your paper is evaluated. Authors of dual-submission papers accepted to the main conference should withdraw them from the workshop by September 20.

Papers posted to preprint servers such as arXiv can be submitted without any restrictions on when they were posted.

Camera-ready information

Authors of accepted archival papers should upload the final version of their paper to the submission system by the camera-ready deadline. Authors may use one extra page to address reviewer comments, for a total of nine pages.

Anti-Harassment Policy

Field matters 2022 adheres to the ACL Anti-Harassment Policy.

Demographic Diversity

We encourage diversity in all forms. The workshop organizers make freedom of thought and expression, as well as respectful scientific debate, their top priority. As an organizing team, we are committed to the principles of gender and sociodemographic diversity and are guided by these principles in forming the workshop team, including the selection of invited speakers and the PC. We will also make sure that the ACL Anti-Harassment Policy is respected during the organization and execution of the event.