“Participatory Research For Low-resourced Machine Translation: A Case Study In African Languages” has won the Wikimedia Foundation Research Award of the Year 2021.
The award was presented by Jimmy Wales, founder of Wikipedia and the Wikimedia Foundation.
The 18-page paper, published in November 2020 by the Association for Computational Linguistics, was written by 47 authors together with the Masakhane community, with about 18 Nigerians directly involved.
The research focused on the crucial role that Machine Translation (MT) plays in information accessibility and communication worldwide. It argues that, despite immense improvements in MT over the past decade, the field remains centered on a few high-resourced languages, and that participatory research, which involves all necessary agents in the MT development process, is therefore required.
The research demonstrated the feasibility and scalability of participatory research through a case study on MT for African languages including Afrikaans, Amharic, Arabic, Efik, Edo, Esan, Hausa, Igbo, Isoko, Fon, Kikuyu, Luo, Nigerian Pidgin, Tiv, Kiswahili, Yoruba, Urhobo, Shona, Kamba, isiZulu, Lingala, etc.
The authors stated that the approach has produced a collection of novel translation datasets and MT benchmarks for over 30 languages, with human evaluations for a third of them, and has enabled participants without formal training to make unique scientific contributions. Benchmarks, models, data, code, and evaluation results are released at https://github.com/masakhane-io/masakhane-mt.
For context, Ricky Macharm, one of the authors of the paper, stated:
“In Europe, we have languages with very few speakers, yet they have so many Wikipedia articles, but in Africa, languages like Yoruba and Hausa, with far more speakers, have few or no articles for native speakers to read.”
“Impressive work, also this table from the paper is heartbreaking. For reference, Irish (spoken daily by < 100K?) has 53,000 Wikipedia articles, topping Yoruba, Shona, Zulu and Igbo’s combined (with a total 100 million speakers).”
“Describes a novel strategy for participatory research around machine translation.
Shows how this approach can overcome challenges for a range of low-resourced languages.
Presents an inspiring case study of a vibrant community working toward building machine translation for African languages spoken by millions.
An inspiring example of work around knowledge equity.
A range of potential applications for Wikimedia projects.”
The authors of the “Participatory Research For Low-resourced Machine Translation: A Case Study In African Languages” include: Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Muhammad, Salomon Kabongo Kabenamualu, Salomey Osei, Freshia Sackey, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa Berhe, Mofetoluwa Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, Jason Webster, Jamiil Toure Ali, Jade Abbott, Iroro Orife, Ignatius Ezeani, Idris Abdulkadir Dangana, Herman Kamper, Hady Elsahar, Goodness Duru, Ghollah Kioko, Murhabazi Espoir, Elan van Biljon, Daniel Whitenack, Christopher Onyefuluchi, Chris Chinenye Emezue, Bonaventure F. P. Dossou, Blessing Sibanda, Blessing Bassey, Ayodele Olabiyi, Arshath Ramkilowan, Alp Öktem, Adewale Akinfaderin, Abdallah Bashir.