The partner organizations behind Lacuna Fund -- Canada’s International Development Research Centre (IDRC), Google.org, the Rockefeller Foundation, and the initiative “FAIR Forward: Artificial Intelligence for All”, implemented by Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) GmbH (German Development Cooperation, www.GIZ.de) -- announce the second cohort of supported projects.
Funding recipients will create openly accessible text and speech datasets that will fuel natural language processing (NLP) technologies in 29 languages across Africa.
The teams will produce training datasets in Eastern, Western, and Southern Africa that will support a range of needs for low-resource languages, including machine translation, speech recognition, named entity recognition, part-of-speech tagging, and sentiment analysis, as well as multimodal datasets.
All datasets produced will be locally developed and owned, and will be openly accessible to the international data community.
High-quality data is crucial, and its limited availability is one of the main barriers to developing local AI-based solutions, especially in the Global South, where resources to acquire data are scarce.
Both training data and AI-based solutions can play a major role in addressing current inequalities in access to knowledge, services, and the diversity of cultural expressions.
An example of impact-driven AI-based solutions is voice interaction: it has the potential to give millions of people access to information and services, preserve cultural heritage, make technology more inclusive, and ultimately foster social and economic development and local value creation.
Lacuna Fund received over 50 outstanding applications from, or in partnership with, organizations across Africa. While each of them, and many others, is poised for impact, the selected projects include, among others:
Building an Annotated Spoken Corpus for Igbo NLP Tasks — University of Ibadan/Nweya
Entity Recognition and Parts of Speech Datasets for African Languages — K4A/Nabende
Open Source Datasets for Local Ghanaian Languages: A Case for Twi and Ga — Ashesi University/Boateng
Masakhane MT: Decolonizing Scientific Writing for Africa — K4A/Abbott
Building NLP Text and Speech Datasets for Low Resourced Languages in East Africa — Makerere/Katumba
Multimodal Datasets for Bemba — University of Zambia/Sikasote
“If we want to seriously level the playing field, we not only need to invest in open training data, computing power and machine learning expertise but raise attention and bring visibility to the growing African technology ecosystems”, said FAIR Forward’s Balthas Seibold.
“The Lacuna Fund builds on a recent groundswell of momentum to create better and more open NLP tools in African languages from machine learning community members, including academic workshops and programs, volunteer collaborations, startup projects, and other efforts. The exceptional quality and variety of all submissions is testament to this and the new funding round supports and amplifies the broad, creative, and impactful work that is already happening across the continent.”
Participating in the Lacuna Fund complements FAIR Forward’s activities to build the skills and capacities needed to use these datasets directly in creating AI-based solutions.
In addition, FAIR Forward creates open AI training datasets, especially for voice recognition in low-resource languages, which also contribute to the work of the Lacuna Fund projects.