IISc researchers receive USD 2 million to democratise voice technology development in nine Indian languages

Researchers led by Prasanta Kumar Ghosh, Associate Professor at the Department of Electrical Engineering, Indian Institute of Science (IISc), have received grants worth USD 2 million from the Bill and Melinda Gates Foundation and the German Development Cooperation initiative (FAIR Forward – Artificial Intelligence for All) to support open-source voice technologies in Indian languages.

In the era of mobile phones and connected devices, human-machine interaction has become critical for gaining access to information, services, and facilities. However, large numbers of low-income people, who face barriers of literacy, skills, poverty, gender, and other socio-economic biases, are unable to efficiently utilise digital technologies. Even though voice technologies have seen remarkable progress with the advent of digital assistants like Alexa, Cortana, Siri, Google Assistant and others, people in low-income areas continue to be excluded from the benefits of this technological revolution. This is especially true for low-income women as existing gender gaps in access, education, rights and empowerment worsen the digital divide for them even further. The languages, dialects and accents of these excluded groups are largely ignored in the creation of new artificial intelligence and machine learning models. Voice technology in local languages, targeted at these marginalised populations, can help bridge this digital divide for the next wave of internet users.

With this goal in mind, the researchers at IISc have embarked on a project to create and make available datasets for the development of voice technologies in nine Indian languages: Bhojpuri, Maithili, Magahi, Hindi, Chhattisgarhi, Bengali, Kannada, Telugu and Marathi. A majority of the existing training datasets required for building such voice technologies in Indian languages are not in the public domain and lack local innovation. They also focus on the languages and accents used in highly profitable, economically developed markets, and are biased towards urban and educated users. The collection of open voice data, particularly for less literate and marginalised populations, will strengthen local AI ecosystems and enable millions of people to access services they are not yet able to use, be it in agriculture, education, health, or other sectors.

The initiative targets open voice datasets that can be freely used to train machine learning algorithms and enable the creation of open-source AI-based solutions. The work will also support the National Language Translation Mission (NLTM) under the Ministry of Electronics and Information Technology (MeitY) and help unlock the potential of voice technology, which until now remains largely untapped.

The project aims to collect more than 11,000 hours of high-quality, gender-balanced voice datasets for automatic speech recognition in nine Indian languages in the domains of agriculture and finance, which are highly relevant for poor farmers and women. The investment from FAIR Forward will support the collection of nearly 1,000 hours of gender-balanced, high-quality speech recordings from voice artists for the development of text-to-speech applications in the same nine Indian languages. The datasets will be made openly and freely available to Indian academics, start-ups, researchers, and developers to spur innovation and academic activity in the development of regional voice technologies in India. This, in turn, will play an important role in enabling a technology ecosystem in India by democratising the creation and use of voice technologies, making life-changing digital services accessible to millions of users.

