Azure Speech Service: Transforming Transcriptions
Explore Azure's enhanced speech recognition, offering custom language models and accurate transcription, even with basic devices, for various industries.
Azure Speech Service Vision Keynote Demo Microsoft Build 2019
Added on 01/29/2025

Speaker 1: Now, one service I wanted to showcase today is the Azure Speech Service. The Speech Service keeps getting better and better at speech recognition; in fact, what you'll see in this demo is that even commodity hardware can replace a complex microphone-array setup while keeping your speech recognition world class. But the most interesting thing is that when you combine speech recognition with language models that are specific to your organizational data, you can start picking up all the jargon. So imagine a transcript that understands the local jargon specific to your organization and your industry, making the transcript that much more useful. So let's throw it to our team out in the gallery to show you speech translation and transcription.

[Music]

Speaker 3: Last year, we showed a prototype device connected to a cloud service that provided live transcription and translation. We're proud to announce today that we're making the conversation transcription capability within Azure Speech Services available as a preview release. Come on, let me show you. You might also remember this hardware from last year, which we're also making available as a developer device kit. But today, my colleagues and I are going to give you a demo of new research that we believe will make meeting transcription more easily available to everyone in the future.

Speaker 2: We are going to show you this demo using just the microphones built into this laptop and these two smartphones we have in front of us. With these, we create a microphone array in the cloud that enables Azure Speech Services to provide accurate in-person meeting transcription even without a special meeting device.
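The speaker-diarized transcription shown in the demo has a batch analogue in the Azure speech-to-text REST API. As a rough sketch (field names follow the v3.0 REST API as I understand it; the audio URL and display name are placeholders, not values from the demo), this builds the JSON body for a transcription request with speaker diarization enabled:

```python
import json

# Sketch of a request body for the Azure Speech batch transcription
# REST API (v3.0), with diarization enabled so the service labels
# who said what -- the batch counterpart of the live conversation
# transcription shown on stage.
def build_transcription_request(audio_url: str, locale: str = "en-US") -> str:
    body = {
        "contentUrls": [audio_url],          # audio files to transcribe
        "locale": locale,
        "displayName": "meeting transcription demo",
        "properties": {
            "diarizationEnabled": True,      # separate speakers in the output
            "punctuationMode": "DictatedAndAutomatic",
        },
    }
    return json.dumps(body, indent=2)

if __name__ == "__main__":
    print(build_transcription_request("https://example.com/meeting.wav"))
```

The body would be POSTed to the service's `transcriptions` endpoint with a subscription key; this sketch stops at constructing it.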

Speaker 4: Also, you will notice that I didn't bring my phone. But the service can still recognize my voice and correctly identify me because I've given it permission to use my voice print to transcribe what I say. Now the second thing we're going to show you is that the language model of Azure Speech Service can be trained on the data in your company's Microsoft 365 tenant. So it can learn the unique vocabulary of your industry or company. This is available in private preview.
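The tenant-trained language model described here was a private preview, but the generally available Custom Speech flow works the same way: you point a model at a dataset of your own domain text so transcription picks up your jargon. A minimal sketch, assuming the shape of the speech-to-text v3.0 model-creation body (the dataset URL and model name are placeholders):

```python
import json

# Hedged sketch: creating a Custom Speech language model by referencing
# a training dataset of domain-specific text (your company's vocabulary).
# Field names follow the speech-to-text v3.0 REST API; values are made up.
def build_custom_model_request(dataset_url: str, locale: str = "en-US") -> str:
    body = {
        "displayName": "org-jargon-model",       # placeholder model name
        "locale": locale,
        "datasets": [{"self": dataset_url}],     # text corpus with your terms
    }
    return json.dumps(body)
```

Once trained, the custom model's endpoint is used in place of the base model when starting recognition.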

Speaker 3: Okay, so basically for the next two minutes, we're going to have a rap battle of sorts, but for all of us geeks here in the room. So Heiko is a principal PM on the speech team, and he's going to give us an example of some dev speak. And Yousef is in healthcare marketing, and he's going to dazzle us with a little bit of healthcare tech jargon. So while they speak, I encourage you to follow along with the transcript on the screen so you can see just how powerful this service is. Heiko, take us away.

Speaker 2: Azure Speech Services are built with VMs running on Azure hypervisors, using Ubuntu-based Docker containers that are orchestrated with Azure Kubernetes Service. Azure Speech Services enable a variety of technical capabilities, including ASR, neural TTS, Microsoft Translator, and related custom services. You can access these using your favorite programming language, such as Java, JavaScript, Node.js, C++, or C#, among others.
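Whichever language you use, the capabilities Heiko lists sit behind the same authentication scheme: you exchange a subscription key for a short-lived access token via the `issueToken` endpoint. A minimal sketch that builds the request without sending it (region and key here are placeholders):

```python
import urllib.request

# Hedged sketch: constructing the token-exchange request for Azure
# Cognitive Services. The subscription key goes in the
# Ocp-Apim-Subscription-Key header; the POST body is empty.
def build_token_request(region: str, subscription_key: str) -> urllib.request.Request:
    url = f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
    return urllib.request.Request(
        url,
        data=b"",                                         # empty POST body
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
        method="POST",
    )
```

The returned token is then passed as a bearer token to the speech endpoints, regardless of which SDK or language binding you use.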

Speaker 3: The bar has been set. Okay, now it's your turn to give us a bit of this healthcare jargon.

Speaker 4: Microsoft Teams can provide EHR integration through ISV vendors, including Infor Cloverleaf, Redox, and others, via the HL7 FHIR standard. HL7 FHIR is HIPAA, MARS-E, and GDPR compliant, and is based on modern technology, including HTTPS and RESTful protocols, as well as extensible APIs. The FHIR open-source community makes its source available on GitHub, and the Microsoft Teams FHIR implementation is also aligned with Project Argonaut and follows the US Core profiles for all the FHIR resources it consumes.
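To make "RESTful and extensible" concrete: FHIR resources are JSON documents with a declared `resourceType`, fetched from predictable URLs of the form `{base}/{type}/{id}` via plain HTTP GET. A minimal sketch (the base URL and patient data below are invented for illustration, not from any real server):

```python
# Hedged sketch of the FHIR REST convention: a "read" interaction is
# an HTTP GET on {base}/{resourceType}/{id}, returning a JSON resource.
FHIR_BASE = "https://example.org/fhir"   # placeholder FHIR server base URL

def read_url(resource_type: str, resource_id: str) -> str:
    return f"{FHIR_BASE}/{resource_type}/{resource_id}"

# A minimal FHIR Patient resource: every resource declares its type.
patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Chalmers", "given": ["Peter"]}],
}
```

The US Core profiles mentioned in the talk constrain which fields resources like this Patient must populate for US interoperability.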

Speaker 3: Well, that was fun. I'm going to call that a draw. While we went a bit overboard there, this is incredibly important: every company in every industry, with its own specific jargon, should be able to get accurate transcriptions. We're really excited about where this work will take us, and our future ambition is to enable conversation transcription for anyone, anywhere, at any time. Thank you.
