Speaker 1: Ever wish you could instantly make sense of the audio data you collect: phone calls, interviews, and customer feedback? Audio transcription and subsequent analysis have traditionally required external tooling, and integrating the outputs with your existing data has been difficult. This kept valuable audio insights separate from the rest of your structured data. BigQuery integrates with the Cloud Speech-to-Text API, letting you unlock the knowledge trapped in your audio files directly from your data warehouse using simple SQL commands. You can join the transcribed audio with your existing structured data and generate new insights. In this video, we'll explore how Cloud Speech-to-Text integrates with BigQuery. You'll learn to transcribe audio at scale and enrich your existing datasets with audio insights, driving better decision-making and a deeper understanding of your customers.

Let's first understand how BigQuery sets the stage for analyzing all sorts of data, including the unstructured kind. BigQuery, Google Cloud's data warehouse, is powerful for structured data use cases, but it also handles unstructured data such as audio, images, and video with ease. BigQuery accesses these assets using object tables, which provide a structured interface to your unstructured data in Google Cloud Storage. To extract value from audio objects, you employ Cloud Speech-to-Text. This specialized AI service transcribes your audio with impressive accuracy, even in noisy environments and in conversations with multiple speakers. You can choose from a variety of audio models, including Chirp, Telephony, and others, depending on your input audio source and languages.

Now, the magic happens with BigQuery's direct integration with Cloud Speech-to-Text. You can use the SQL you already know to tap into its transcription power. You simply invoke the ML.TRANSCRIBE function over an object table, which points to your audio in Google Cloud Storage. The results are precise transcripts of your audio, landed right inside your BigQuery environment. Now you can search call recordings for keywords, analyze customer sentiment over time, or gain insights from interviews, all within your data warehouse and connected to your other essential business data. This capability removes the need to set up and maintain complex pipelines just to analyze audio and join the results back to your business data. Instead, your audio insights flow smoothly into BigQuery, available for analysis at scale.
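To make that flow concrete, here is a minimal sketch of the two pieces involved: an object table over audio files in Cloud Storage, and an ML.TRANSCRIBE call against it. Every project, dataset, connection, bucket, and model name below is a placeholder rather than anything specific from the video; the walkthrough that follows shows how each piece is actually created.

```sql
-- Sketch only: an object table gives BigQuery a structured view of
-- audio files sitting in Cloud Storage (all names are placeholders).
CREATE OR REPLACE EXTERNAL TABLE `my_project.audio_demo.call_recordings`
WITH CONNECTION `us.audio_conn`
OPTIONS (
  object_metadata = 'SIMPLE',
  uris = ['gs://my-audio-bucket/support-calls/*']
);

-- Transcribe every file the object table references, using a remote
-- model (defined later in the walkthrough) that wraps a
-- Speech-to-Text recognizer.
SELECT *
FROM ML.TRANSCRIBE(
  MODEL `my_project.audio_demo.speech_model`,
  TABLE `my_project.audio_demo.call_recordings`
);
```

The walkthrough below covers the connection, permissions, recognizer, and remote model that make this call work.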
Alright, let's hop into the Google Cloud Console and walk through an example of transcribing audio. Imagine that I work at an online e-commerce retailer, and customers call our support line when they need help with their orders. I've assembled some short clips of these customer support calls, and you can see a few audio files in this Cloud Storage bucket. Let's see how we can turn them into actionable data in BigQuery.

I'll first navigate to Cloud Speech-to-Text within the console and select Recognizers. Recognizers are stored, reusable recognition configurations for audio data. I'll click Create to configure a new recognizer. This menu walks us through the setup: I'll give it a descriptive name, Telephony Recognizer, set the location to US, the model to Telephony, and the language code to English (US). I'll also tick the Enable Automatic Punctuation box so that my results include punctuation. There are a number of additional options available too; feel free to check them out for your own use case, and remember that these are just the default settings for your recognizer. We can always change or override them later if needed. I'll click Save, and we're brought back to the list of recognizers. I'll note the recognizer ID and location, since I'll need them in a minute when I reference the recognizer from BigQuery.

Now I'll navigate over to BigQuery. I first need to create a cloud resource connection, which allows BigQuery to access the audio files in Cloud Storage and invoke Speech-to-Text jobs. To do so, I'll click Add, choose Connection to External Source, set the connection type to Vertex AI Remote Model, name it audio_con, and click Create Connection. I can then open the cloud resource connection and copy its service account ID. That service account needs a few permissions, so I'll open IAM in a separate tab and assign it two roles: Storage Object Viewer and Cloud Speech Client. Then I'll move back to BigQuery.

With permissions taken care of, I'll now create a model within BigQuery. I've already filled in each of the required variables, but I'll highlight them: first the project ID, a dataset, and a model name; then the cloud resource connection information; and lastly the recognizer, for which I enter the project number, the recognizer location, and the recognizer name. This model setup is a one-time step for this audio recognizer, and I can continue to use the model in this project going forward.

My last step is to create an object table, which gives a structured interface to objects in Cloud Storage. The object table takes a couple of inputs: the cloud resource connection we created, and the Google Cloud Storage location where our audio files reside. Note that object tables store metadata only; they don't move objects out of Cloud Storage.

Now that I've completed all of the one-time setup, I can use the ML.TRANSCRIBE function. The function takes two inputs: the remote model I defined earlier, which points to the Speech-to-Text recognizer, and the object table that references the audio files in Cloud Storage. Next I'll run the query. Behind the scenes, BigQuery asks Speech-to-Text to transcribe the audio files using the recognizer I created and returns the results in a structured format. Once the query finishes, I can query the table that contains our transcriptions. Here I see a number of useful fields: the transcripts field contains the transcription, the ML.TRANSCRIBE result field contains a JSON blob with information about the transcription job, and the status column returns empty values, which indicates success for each of these records. The results also include object table columns, like the Cloud Storage location of each file, and more. I can also override the recognizer's default configuration by explicitly specifying other inputs in the recognition config. Here I use a different transcription model and a different language.
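Expressed as SQL, the one-time remote model setup and the configuration override from this walkthrough look roughly like the sketch below. The project number, recognizer path, dataset, connection name, and the specific model and language used in the override are placeholders, and the option and argument spellings should be checked against the current CREATE MODEL and ML.TRANSCRIBE reference.

```sql
-- Remote model wrapping the Speech-to-Text recognizer (one-time setup).
-- The recognizer path uses the project number, location, and recognizer
-- ID noted earlier; every value here is a placeholder.
CREATE OR REPLACE MODEL `my_project.audio_demo.speech_model`
REMOTE WITH CONNECTION `us.audio_conn`
OPTIONS (
  REMOTE_SERVICE_TYPE = 'CLOUD_AI_SPEECH_TO_TEXT_V2',
  SPEECH_RECOGNIZER = 'projects/123456789012/locations/us/recognizers/telephony-recognizer'
);

-- Transcribe while overriding the recognizer's defaults, for example to
-- switch to a different model and language for another batch of audio.
SELECT *
FROM ML.TRANSCRIBE(
  MODEL `my_project.audio_demo.speech_model`,
  TABLE `my_project.audio_demo.call_recordings`,
  recognition_config => (JSON '{"model": "latest_long", "language_codes": ["es-ES"], "features": {"enable_automatic_punctuation": true}}')
);
```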
And now, with the transcriptions in BigQuery, I can use this data for analytics. I can join it to other tables in BigQuery to augment my existing data, or I could even run a simple translation to turn it into another language. Let's give that a shot. Here I have a query that uses the ML.TRANSLATE function to turn this English text into Japanese, which is useful for our staff in the Japanese office who also work with customer feedback. I'll run it, and the results come back in Japanese, all within my data warehouse.

Audio data holds a wealth of knowledge, but its value often goes untapped. Unlocking these insights should not be a complex process, and with BigQuery's Speech-to-Text integration you can seamlessly incorporate them into your data-driven decision-making. With transcribed audio in BigQuery, you can take on new analytical use cases. Examples include joining transcribed audio data with other structured data in your BigQuery tables; using other BigQuery machine learning functions to translate transcribed audio into a different language or summarize a long transcript into a single sentence; or even calling a large language model in Vertex AI through BigQuery to summarize, classify, perform sentiment analysis, or answer questions about that data. Give it a try. Check out the description below for a step-by-step guide and resources to get started.
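As a closing reference, a translation step along these lines is one way to sketch what the demo showed. The model, dataset, connection, and table names are placeholders, and the text_content input column and option names reflect a reading of the ML.TRANSLATE documentation rather than the exact query from the video, so verify them against the current reference.

```sql
-- Remote model backed by the Cloud Translation API (placeholder names).
CREATE OR REPLACE MODEL `my_project.audio_demo.translate_model`
REMOTE WITH CONNECTION `us.audio_conn`
OPTIONS (REMOTE_SERVICE_TYPE = 'CLOUD_AI_TRANSLATE_V3');

-- Translate the English transcripts into Japanese. The inner query reads
-- from a placeholder table where the ML.TRANSCRIBE output was saved, and
-- ML.TRANSLATE expects the input text in a column named text_content.
SELECT *
FROM ML.TRANSLATE(
  MODEL `my_project.audio_demo.translate_model`,
  (
    SELECT transcripts AS text_content
    FROM `my_project.audio_demo.call_transcripts`
  ),
  STRUCT('TRANSLATE_TEXT' AS translate_mode, 'ja' AS target_language_code)
);
```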