Speaker 1: Hello, I think we are now live. LinkedIn Live number three, we are. Good, I can see myself. So yes, for everybody, welcome to our third LinkedIn Live. We're very excited about this. Giving educational content providers the best speech-to-text tools possible. I think we need to wait for a few people to come online before we kick off exactly into all the juicy stuff, but I'll probably get told when we've got critical mass. We're still in the tens of thousands at the moment. But I guess, well, Ellie, we could do intros yet, but I just just need the sign first. Also, you've not been unmuted yet either, so I can unmute. Hi, everyone. Great, and I've been told we're good to go as well. So once again, giving educational content providers the best speech-to-text tools from Speechmatics and Udemy. It's going to be a blast. It's our third LinkedIn Live. We should be getting good at it by now. Our last one was in person. That was wonderful and echoey. Now we're back in the Speechmatics office and wherever great place Ellie has found to situate herself. Very quickly, from all the Speechmatics viewers, you'll know exactly who we are. We are a speech-to-text company. We convert audio into text. We do it quickly. We do it accurately. We do it inclusively. We want to understand every single voice possible. But you know about us, so let's find out about somebody new. Over to you, Ellie. Who are you? What are Udemy? What do they do? Thanks, Ricardo. So I'm
Speaker 2: Ellie. I'm a principal product manager at Udemy. I've been with the company about two, just over two years now. I'm based in the Dublin office, so that's where I am right now, in the Udemy Dublin office. And my role is primarily, well I work on a few of the teams, but I'm the main accessibility point of contact within Udemy at the moment. So we're a global company, but our team is split across all the different offices. So I'm based out of Dublin, but that's my main role and why I'm here speaking with you today.
Speaker 1: Awesome stuff. And so what does Udemy do, for those few people who don't know yet? Obviously your marketing's great. But yeah, what do you do? What's special about you? What's special about your model? Et cetera, et cetera.
Speaker 2: Yeah, so we always think about Udemy in kind of three aspects. So we have like the three main areas we're trying to target. So we have our instructors. So the thing that makes Udemy different from any other learning provider on the market is the freshness of our content. And obviously the fact that anyone can go and teach. So we don't expect anyone to have PhDs. We think anyone is able to teach. So it's allowing people and enabling instructors.
Speaker 1: So who wants to learn on the platform? Do you vet them at all? Is there anything they need to do? Do they just get on and have fun?
Speaker 2: Anyone can go and teach anything. Whether you'll be watched at all is questionable. It's a marketplace model. So basically based on course reviews, based on the feedback that you receive and the ratings on your courses, that's when it gets pushed up. So it works similar to like a YouTube or recommendation engine or Amazon or anything like that in which your course is popular. And then it rises to the top of our recommendation engine, search engines and various other things. And obviously the instructors do do their own publicizing, but a lot of that's on us in terms of promotions and things like that. So it's very much a promotional platform for the best instructors. And the more successful you are at teaching a particular topic, the higher you'll rise within the charts.
Speaker 1: And have you got any sort of new Twitter type way to sneak yourself to the top? Can you pay $8?
Speaker 2: No, not right now. No, no, no. No, no, not at the moment. There's a few things you can do to improve your ratings, such as getting course badges and things like that. But no, no. We have bestseller badges and things like that that mark instructors out. But no, at the moment it's fine. Whether it will be long term.
Speaker 1: The top instructors on Udemy, sort of how often would their videos be? How widely viewed are these videos?
Speaker 2: So these are global audiences. So the other piece that obviously is our learners. So that's the massive appeal of Udemy is that you can create a course and then that's available globally. So obviously, if you have a PhD and you're doing like LinkedIn Learning, Coursera or whoever you're doing it with, you're going to have a very select audience. With Udemy it's a marketplace, so you have access instantly to like everyone in the globe. And that's also why captioning is obviously very important so we can translate all of those languages. But yeah, as soon as you upload your course and you publish it, it's available in the marketplace and it's available globally. So our instructors have millions of viewers and a few of them are millionaires from Udemy. So they're making a lot of money. They put a lot of investment, a lot of time into these courses and they're watched globally by people learning.
Speaker 1: OK, so we've got a way to go before our LinkedIn Lives get to quite that level, but hopefully this will kick things up. So if we're looking at sort of how the landscape's evolved, obviously I assume that the pandemic was probably quite positive in one aspect for Udemy, but has that changed now that people are not at home as much? Are they not watching as many instructional videos? How's that evolved for you?
Speaker 2: So because we have the two sides of the house, we have DTC, so a direct-to-consumer market, and we have a B2B side. We've seen the direct-to-consumer market slightly dip in terms of revenue, but actually we have still more traffic. So I think, obviously, we hit a huge peak just after everyone went home. There was a lot more learning going on at home. People were a lot more self-driven in terms of their learning, but it hasn't actually dipped as much as we were expecting. We were anticipating a huge dip and that didn't happen. And then because we've got the B2B side of the market with Udemy Business, that has just taken off completely. So that's growing at a 69% rate year-on-year and continues to grow. So yeah, like we've kind of got the two wheelhouses now at the moment and they are kind of trading off against each other.
Speaker 1: And do you have different requirements if it's sort of B2B or DTC sort of in terms of the quality of the captions or sort of essentially any part of it, right? Do you differentiate?
Speaker 2: Yeah. So I work primarily on the Udemy Business side and it's always harder doing B2B in terms of product marketing. I've worked on DTC in the past with various other providers, but B2B is always more complicated. It's the compliance legislation that you have to, I know we're going to be talking about accessibility, so I won't go into that too much, but there's a lot of compliance legislation you have to meet. But in terms of the actual captions quality, the expectation is just higher. You know, if your organization's paying for something and there is a certain cost associated with it, obviously the model you're paying is very much subscription-based. There's certain standards and quality expected on the B2B side, it tends to be just higher. Whereas on the DTC side, we model ourselves as very much an affordable learning platform, you know, so there's more leeway there, there's more leeway for sure.
Speaker 1: Awesome. And so you mentioned the accessibility, the captioning side, that's what we're here, right? Sort of, well, that's why we're here anyway, we like to sort of help out with the captions. So
Speaker 2: big question, why does captioning matter? Yeah. So captioning matters, mainly it's all about trust really for us. So it's trust for our organization. So for the captioning, we need to have really good, we need good captions to meet accessibility requirements, essentially for our organizational customers. So that's complying with certain legislation in America, it's section 508. In the EU, it's the Web Content Accessibility Guidelines, which is also called WCAG. So they matter for our organizations in terms of accessibility, but our instructors and our
Speaker 1: learners... What does good actually mean here?
Speaker 2: Okay, so well, there's a 99% accuracy rating that needs to be associated with those compliance legislations. So that's what good means for accessibility. But there's also has to be closed captions, not just just subtitling, which is slightly different. And the terminology is often confused when you're talking about accessibility there. Closed captions are including environmental cues. So actually 80% of people, I think we said this in when we were doing the marketing for this, but they watch all of their TV shows and everything with captions on. And that's traditionally, it's actually closed captions they're watching. So everyone will be familiar with on Netflix or on Disney+, when there's an explosion in the background or a plane flying overhead or some sort of action sequence, they'll see the brackets of the environmental cues. So that's the piece as well, that is absolutely essential, because it means that people who are actually trying to learn, we want to make sure that they're having the same learning experience as everyone else. And the instructor's delivery and how, if they're laughing, if there's some music playing in the background, that's all part of it. So
Speaker 1: it's important that that's included. Which is interesting as well, because obviously music and laughter, sort of the old foes, the old enemies of speech to text, trying to transcribe words in some manner, and it doesn't always work. Okay, so obviously, captions are hugely important. And which markets does Udemy tend to cover with these captions?
Speaker 2: Yeah, yeah. So that's what I was going to go on to next. So we have obviously a huge global market. In terms of the ones that we're really big in, it's India. It's huge for learning in terms of actually how many learning hours are on the platform. India is our biggest. Japan is massively growing. So I was there a couple of weeks ago, run 65% of the Nikkei Index. We were running a conference, the Prime Minister came. Reskilling is a big agenda in Japan. So yeah, globally, there's a shift towards learning, and there's a shift towards skills. And just building that trust as well, in terms of actually providing accurate captions, in terms of actually providing accessible compliance across your platforms as well. That is just so essential to us, and for our global markets. So obviously, we have a global presence. We're in North America, we're in APAC, we're in EMEA, we're everywhere. And we're getting much larger as well in Latin America. So we have a lot of Udemy business presence there now as well. So Mexico, and Brazil, and Argentina. And we have offices now globally. So yeah, we're pretty big everywhere. Obviously, because of the marketplace model and our instructor model, it slightly differs for us than it does for other people. So our competitors could go and just launch there. But we like having both our courses that are available in our English collection. So that's about, I think it's 8,600. Yeah, it's many courses on our English collection, on our Udemy business catalogue, but also on our marketplace model. We want to make them available both in all the languages that are in English, but we also want to have local language instructors who are teaching courses. 
And then eventually, once we get big locally in those markets, translate their courses back to English, if they happen to be the best learners at a particular topic, and it's impossible to say where these trends are going to pop up globally, then we want to make those instructors' voices heard globally, no matter what language they're speaking in. Okay, so essentially,
Speaker 1: to summarize, it's instructors across different languages, but maybe get, again, English-speaking instructors into foreign languages, but also vice versa as the market gets large enough. And you're saying in India as well, you see at the moment, is that a real spread across India, hundreds of different languages? Yeah. Is that a spread across the types of languages, or is that primarily English, or a real mixed bag, or how do you see that? It's mixed, it's mixed.
Speaker 2: We do see primarily in English courses, because our course, our content collection is primarily in English, so it's half, it's about half at the moment, I think. So, but any, we have over 70, I think it's 75 languages. We have 75,000 instructors in over 75 languages on our platform. So yeah, there's a big mix of languages. I don't have the list of Indian languages that are supported, unfortunately. I'm glad you've got a cheat sheet there, because I've got plenty
Speaker 1: of difficult questions that we didn't prepare, so I can throw a few more numbers at you. But just coming back to the compliance piece, there are certain things that need to be available with captioning, which is obviously getting the words right, the punctuation's important, as you've mentioned, the description of the various sounds that are happening too. But what else are you looking for? How do you deal with things like profanity, for example?
Speaker 2: So, we use our transcripts. So, once, obviously, you push, I'm going to just go start at the beginning, because I don't know how familiar people are with it. But once you actually transcribe with Speechmatics, we run our courses through. Our courses are really large, by the way, they can be up to 60 or 70 hours long. So, we run them through Speechmatics, and we get a transcript produced out of that, which is an automatic speech recognition transcript, and that gives you a really high accuracy. We do testing, and one of our engineers actually wrote a really good article about this with Speechmatics, but we do testing on that. I'm going to plug it, yeah, I'm going to plug it, it's on Medium. So, but it basically checks the accuracy rate of the wording. So, we can ensure, and we do regular vendor evaluations to ensure that we are using the best vendors for our captions in order to ensure the most accurate keywords, especially with, we have a lot more content and it's getting a lot more specific as well. So, there's STEM content, there's technical content. So, getting those keywords right and it being up to date quickly, considering our content is so fresh and the market is changing so quickly, that's really, really important for us. So, making sure that terminology, that dictionary is there and is supported is key for us. So, that's an important piece.
Speaker 1: And so, and do you, so one of the things that we provide in that area is obviously the ability to add a custom dictionary. So, you can take a glossary of terms that might not fit in a traditional sort of, a traditional vocabulary of a speech recognition engine. You can add those using what we term as our fun phonetics, so for something like IoT, it would be EYE for I, O as in oh, oh my word, and then T as in I'll have a cup of tea, or a golfing tee, and then that would give you IoT. So, you have that side. That's something that you use at the moment, is that right?
Speaker 2: Yeah, it's, we can use it more. We could use it more. So, there is definitely more we can do there to make it more accurate, but it's something we need to get better at. Obviously, the speed, it's the speed that these things change at that is the most challenging, like RISP is also the things that make this so, so hard. So, yeah, it's something we need to definitely improve on.
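The custom dictionary described above can be sketched as a job configuration. This is a minimal illustration, not Udemy's or Speechmatics' actual code: the helper name `build_transcription_config` and the example glossary are hypothetical, and the field names (`additional_vocab`, `content`, `sounds_like`) are written as they are commonly documented for the Speechmatics custom dictionary, so they should be checked against the current API reference before use.

```python
import json


def build_transcription_config(language, glossary):
    """Build a job config carrying a custom dictionary.

    `glossary` maps each written term to a list of phonetic spellings
    (an empty list means "add the term with no pronunciation hint").
    Field names are assumed from the vendor's public documentation.
    """
    additional_vocab = []
    for term, sounds_like in glossary.items():
        entry = {"content": term}
        if sounds_like:
            # Phonetic hints, e.g. "eye oh tee" for IoT as in the conversation.
            entry["sounds_like"] = list(sounds_like)
        additional_vocab.append(entry)
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "additional_vocab": additional_vocab,
        },
    }


# Hypothetical glossary: one term with a phonetic hint, one without.
config = build_transcription_config("en", {"IoT": ["eye oh tee"], "WCAG": []})
print(json.dumps(config, indent=2))
```

Keeping the glossary as plain data means it could be refreshed per course or per subject area without touching the code that submits jobs, which matters when terminology changes as quickly as described here.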
Speaker 1: Is that something you'd ever consider sort of exposing to your, to your users themselves, the ability to?
Speaker 2: Not our users, but maybe, but definitely our instructors. So, that's who it matters to the most. So, like when we're talking about captioning and accuracy, it's actually the course reviews, and the instructors are impacted directly by this. This is why I keep coming back to the trust thing, because it's eroding that relationship. So, where these captions aren't accurate, it's eroding away the relationship with our instructors. Like they, they need their captions to be accurate, otherwise they get complaints directly from the learners, they get impacted on their course reviews, and this is a benefit that we're offering them. So, we're directly responsible for that and ensuring that we're providing the best service to them in whatever way possible. That's, that's where it would be most helpful.
Speaker 1: Essentially, they're driving you to be better in every way possible.
Speaker 2: Yes.
Speaker 1: So, it sounds like accuracy is pretty important for you, because we tend to see that across our range of partners and customers, accuracy tends to fall into two sort of buckets. One bucket is, I need speech to text just for this piece of compliance, this piece of regulation. I just need, I need it to be good enough, I need it to be fast enough, and cheap enough to get something done. And then we've got the other end of the spectrum, where I think Udemy comes in, where actually every single iota of accuracy matters more. I'm assuming you fall on that end of the spectrum. If you could talk through what accuracy really means for you, that'd be, that'd be really helpful.
Speaker 2: Yes. So, every percentage counts with us, essentially. So, it's the impact to our, like, we publicise that it's Udemy provided captions, you know, and that they're automatically provided on our courses. So, we publicise that, we want to publicise that in order to get that supply in and show the investment into our platform and that we're supporting our instructors globally. So, that's really important. So, if we're then not providing accurate captions, and we need it to be 100%, then it's on us, you know, it's on Udemy, it impacts the brand. If it's not accurate from an accessibility point of view, it's, like, it's totally opposite to what we're advertising ourselves as. You know, we're advertising ourselves as someone who makes learning accessible globally for everyone. You know, that's why we have the price point we have, that's why we have the brand we have, that's why we have the people we have. So, if we don't have accurate captions, then we're failing to provide exactly what we said we're going to be offering. So, yeah, every percentage counts at Udemy for captions.
Speaker 1: Now, that's useful feedback for us, because we're sort of always there thinking, well, do we focus on the stuff that is low enough quality audio that actually a jump there will be from 80% to 85% accuracy, or does anyone care about going from 95% to 96% on higher quality audio, and it sounds like actually that's right within your sweet spot.
Speaker 2: Yes, yeah, we absolutely do. And, like I said, it's, well, there's SLAs related to compliance on the B2B side, where it has to be high, it has to be 99. So, yeah, I mean, if we could get 100%, we'd want 100%.
Speaker 1: We've got a question from the audience, which is actually very exciting, because this is our first genuine question from the audience. In our previous two LinkedIn Lives, we had to can them, which was very sad, but this is genuine. So, drumroll. What is the format of the subtitle that the platform provides as an output? Is it SRT? Also, does it provide diarisation with speaker names?
Speaker 2: So, we don't use SRT. I know that for a fact, we don't have SRT. I can't remember the, I can go and find out the format right now, but I don't have the exact format. Do we use diarisation with speaker names? Currently, no, we definitely don't for speaker names at the moment. We do provide a captions reviewing tool within our platform for our instructors, though. So, this is where I was saying about, like, having the dictionary and the keywords and things like that, as well. And we also have a lot of issues and error rates with dialects and accents, too. But we do provide a reviewing tool and an editing tool within our instructor platform, where they can go and review them and make tweaks. And it's synchronous to their media content, as well. So, they can go and view that and edit it. So, any little names like that, we do guide them towards doing that within the instructor tooling, is to say, this is automated. There's probably a few things, including names, that were a little bit off. You should go and just check where a few names or brand names or, like, any technical terminology might be mentioned. So, we do direct our instructors in order to do that. I can't remember the name of the file, and I'm kicking myself. It's not SRT, though.
Speaker 1: You could be asking from a Speechmatics point of view, and the answer to that is, so, we provide in a JSON format, but that can also be provided as an SRT, if you're really excited about that. And we do provide diarization, as well. So, the separation of the speakers, both in the file-based system and the live streaming system, too. So, just a little sales pitch there, if you're interested. And two last things I want to touch on. One of them is, sort of, just, we've talked about different languages, but I think also a range of, sort of, accents and, sort of, the fact that nowadays, the conversations we have, there are so many different accents, voice backgrounds we have. And I just wondered if that's something that comes up a lot, sort of, when you're analyzing, sort of, providers for speech-to-text. Is that something that's key for you? Because I assume, as you've mentioned, you have a really broad base of users across, even across English, as well.
Speaker 2: Yeah. So, we don't have that many. So, our content is primarily the instructor talking, like, actually, similar to this, but usually one person. And there doesn't tend to be an awful lot of background noise or anything like that. So, we do provide tooling and testing and stuff for our instructors, as well. So, they can actually, like, make sure their recording equipment's right. We give them a lot of educational tooling around that. But it tends to be, like, someone talking, scripted or non-scripted (it tends to depend), into a camera. So, there's not an awful lot. But we've done a little bit of testing, and the main one is actually jazzy music in the background. But, like, that's usually at the beginning of videos and things like that. So, not too much. It's not an action movie sequence or anything like that, but it is there sometimes.
Speaker 1: We've had another question, as well, which, a second question. This is really exciting. We get a three. Someone put in a third question, and, sort of, this will break all records. How does the platform, I think this is probably one for us, actually, how does the platform handle multi-person dialogue? Because I assume for Udemy, it's almost just one instructor. Would you get multiple instructors on the same video? Does that happen?
Speaker 2: Rarely. Rarely. Multi-person dialogue, though, I guess we could. I mean, there's presentation things, and there's interviewing things, and there's, like, role-playing kind of exercises within them. But it's very rare.
Speaker 1: So, at Speechmatics, we do two things with multi-person dialogues. If they're coming from different channels, we can separate those channels out, transcribe as if it were one, and then, sort of, have them, essentially, with perfectly described speaker change. But also, we make our best attempt. We use, sort of, like, voice activity detection to work out what is and, sort of, what isn't speech. And we, essentially, give as good as a go that a human would be able to give in terms of separating those out. And, of course, yeah, overlapped speech is always difficult, no doubt. But as I say, we've done a lot of work on our diarization, the speaker separation, over the last year. So, if you've tried it in the past, I would very much recommend trying it again. And then, finally, is that an issue for academic content? Again, I'm thinking of the scenarios, sort of, we do have a lot of, sort of, partners in the academic space. Generally, there's one person doing most of the talking there. And in that situation, it depends, again, where the microphone feeds are coming from, if it's a single microphone or multiple ones. And then, finally, we have had this third question. I'm glad I didn't promise anything if we got a third question. Otherwise, that would have gone wrong. But can the platform handle more than one language in one speech file? And how does it process it? So, we have recently released something called Language ID, which gives you the ability to essentially detect what language is being spoken in an audio file. That is evolving to be able to provide multilingual ASR, as we would term it. So, being able to say this file has a number of languages in it, can you transcribe them specifically? English is interesting in and of itself, because actually, if you look at the training data for a lot of non-English languages, there will be an amount of English in there. 
So, actually, if you have a small section of English content within another language being used, then that would usually transcribe correctly as well. But we are working on the multilingual models specifically for that in the future. I don't know if that's something that comes up with the videos at Udemy.
Speaker 2: It does. We do have foreign languages being taught on Udemy. A lot of English courses. That's actually one of the most popular things in some of our foreign language countries. It's obviously English courses. But the way we break down our courses is on a per-asset video basis. So, we will always primarily have one language or another. So, we can have that on our side before we send over to Speechmatics.
Speaker 1: So, how do you do this? If I was teaching French today, and so, my name is Ricardo, and then move on to the next bit, would you be chunking that up and sending that to Speechmatics, or is it just a bit of a headache?
Speaker 2: No, we do. Yeah, we chunk them up. We send them over to Speechmatics. We do have detection. We ask our instructors, obviously, which language they're primarily speaking in and various things like that. So, there are lots of ways for them to tag particular assets and things like that for us to identify them.
Speaker 1: Great. Awesome stuff. And given that this is, again, for anyone looking at the captions right now, I think we're still doing the LinkedIn captions, which is not Speechmatics captions, which is very depressing. So, that's why the quality isn't high. So, we'll just see if we can break it by giving a final sentence. Merci à vous, Ellie Good, de Udemy. C'était un plaisir de te voir. Thank you for everything. And yeah, any other questions? We will see you next time for our next session. Ellie, any final words?
Speaker 2: No, just thanks so much for having me today. And let's continue doing good work together. Thank you.
Speaker 1: Brilliant. Thanks a lot. Cheers. Bye-bye.