Koki: Revolutionizing Speech Tech for Long Tail Languages (Full Transcript)

Koki takes an open-core approach to speech technology for languages with under 50 million speakers, targeting the long tail of languages spoken in newly industrialized countries.

Speaker 1: I'm Kelly, co-founder of Koki, a startup bringing speech technology to languages' long tail. Pictured here is a pie chart of the world's languages. Each slice of the chart corresponds to a language, and the size of a slice corresponds to the number of native speakers. Here, languages with under 50 million speakers are all grouped together under the category "Others". As you can see, there are over 3.1 billion native speakers of these other languages. 3.1 billion people. Any way you slice it, that's a lot. And it's likely an undercount of the number of speakers, as this includes only native speakers.

The problem we're addressing at Koki is exactly this: the long tail of languages has no speech technology, but the majority of the world speaks long tail languages. Our solution is this: specializable speech technology that can target the long tail of languages, as well as specialized domains.

Now, this problem has been around for decades, and one may wonder, why now? Why try to solve this problem now? Well, technology is changing. Traditional approaches to speech technology are becoming less effective compared to newer deep learning based approaches. We're sitting at the inflection point of this technological S-curve, and change is happening. And even within this deep learning paradigm, research advances over the last year or so have further paved the way for tackling the long tail of languages. So simply put, the time is ripe.

Not only is technology changing, but the world is changing. Newly industrialized countries, such as Brazil, India, and China, are economic powerhouses. These newly industrialized countries are speaking long tail languages. Over 400 languages are spoken in newly industrialized countries. 400 languages.

To reach this long tail, Koki takes an open core approach, open sourcing our core tech while monetizing a closed crust. The core of our tech is speech recognition, speech synthesis, and fully trained models. All are open source. Our closed crust centers on deployment, allowing the enterprise to scale out to the long tail.

As to our core, our open core speech recognition engine is conveniently packaged and available on all major package repositories. It's extensively documented, allowing you to get started in your programming language of choice. It's able to run on even the smallest of platforms. It's language agnostic, supporting any writing system, and it comes with a model zoo with support for numerous languages. And also, almost forgot to mention, it has superhuman accuracy. Our open core speech synthesis engine is also conveniently packaged, well documented, language agnostic, and comes with a model zoo with many languages. And it too has human-level fluency.

Many of the models in the model zoo were trained using resources provided by NVIDIA's Inception program. Using Inception cloud credits allowed us to scale out our training pipeline, increasing tenfold the amount of data used to train our models. Also, Inception hardware discounts allowed us to purchase a DGX A100, a monster machine. So we'll save on future training costs too.

Now, there's one more thing. Today, OVH and Koki are launching a long tail language challenge. OVH is generously donating compute time to anyone who wants to train a speech-to-text model using Koki's code and training data. Furthermore, OVH will donate more compute if you're training a long tail language. This challenge will have numerous leaderboards, judged on anything from model size to accuracy. So there's something for everyone. Also, you can open source your models, uploading them to the model zoo, bringing speech technology to languages' long tail. You can find all the details by following the link below, koki.ai forward slash OVH. I want to thank you for listening, and have fun with the long tail language challenge.
