Transforming Audio to Musical Score: BPM Detection and MIDI Conversion
Learn how to convert audio files to musical scores and MIDI using Librosa and Music21. Discover BPM detection, pitch estimation, and auxiliary functions.
Python Tutorial - Basics of Automatic Music Transcription - IV Wrapping Up
Added on 09/06/2024

Speaker 1: So, let's review a bit. What's our problem? We want to take Sweet Child o' Mine, which is an audio file. We load it and apply the transformation, so from the time signal we now have time-frequency information. We calculated the onsets, where the notes start, so we have something like this. To go from audio to this, to the musical score, and to a MIDI file, we still need some steps, and things get a bit complicated. So what do we need when we want to define a musical score? We need to know the key of the song, the BPM at which it should be performed, the notes themselves, and the duration of the notes. Bear in mind that even though Librosa is a very powerful library, it's not perfect. Some tasks are very difficult, and people are still researching how to improve and optimize them, so we will not achieve 100% success, but we can get some good results. So with this information, I would now like to calculate the tempo, the BPM of the song. With some internet research, I found in many places that Guns N' Roses' Sweet Child o' Mine has a BPM of 126, so it's 126 beats per minute. I've seen a bit more or less, but in this region. Librosa also has this very nice function, beat_track, so you should read the documentation here. I believe that at Music Information Retrieval there's also a tutorial on beat tracking that explains how some things work. My goal is to detect the tempo, the BPM, so when I go to the score I can set it and execute the notes according to a certain BPM. I'm defining here a function called estimate tempo. I'm passing the onset envelope that I calculated before, and this function will give me the tempo. In this case, it calculated 130. Considering that I saw 126, 125, and 128 in some places, 130 is not so bad. And here I'm also introducing the music21 library, which is the library that will allow me to construct the score itself.
So I already estimated the tempo, and I set this metronome mark to that tempo. Okay, now come some auxiliary functions. For MIDI, sometimes you need to give the duration not in seconds but in quarter notes. So this is a function that converts from time to beats, that is, from seconds to quarter notes. Another auxiliary function is here because in MIDI, velocity levels go from 0 to 127, and I'm getting an amplitude. It can be in dB, so from minus 120 to 0 dB for different magnitudes, and I want to remap this to MIDI. And just as a demonstration, I am also generating the notes with sine waves. So we have here the formula of a sine wave. We can synthesize a note by constructing a sine wave, which needs an amplitude, a duration, and a frequency to play. So these are auxiliary functions that will be called inside other functions, so I can translate from time to quarter notes and from magnitude to MIDI velocity. So what was my goal? Now I have this time-frequency representation, and I want to detect for each segment which note it is. So for example, in the beginning I have that many seconds of silence because the song hasn't started yet. Then during this interval, I want to execute this note, and I want to have a transcription, and I want to have a MIDI file, and I also want to generate a sine wave. So I will create what I'm calling here a music information array. It will have the sine wave information, then the MIDI note information, and then the music21 information, because you need to construct this. You need to go to the music21 documentation and see how to create a note and give it a duration, so that you are able to have the transcription.
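The three auxiliary functions mentioned above could look something like this. The function names, the linear dB mapping, and the default sample rate are assumptions for illustration, not the code shown in the video. The only fixed facts used are that a quarter note lasts 60 / BPM seconds, that MIDI velocity spans 0 to 127, and that a sine wave is amplitude * sin(2 * pi * f * t).

```python
import numpy as np


def seconds_to_quarter_notes(seconds, bpm):
    """At `bpm` beats per minute, one quarter note lasts 60 / bpm seconds."""
    return seconds * bpm / 60.0


def db_to_midi_velocity(level_db, db_min=-120.0, db_max=0.0):
    """Linearly remap a level in dB onto the MIDI velocity range 0..127.

    The -120..0 dB range follows the transcript; the linear mapping and
    the clipping of out-of-range values are assumptions.
    """
    clipped = min(max(level_db, db_min), db_max)
    fraction = (clipped - db_min) / (db_max - db_min)
    return int(round(fraction * 127))


def synth_sine(frequency_hz, duration_s, amplitude, sr=22050):
    """Sample amplitude * sin(2*pi*f*t) for duration_s seconds."""
    t = np.arange(int(round(duration_s * sr))) / sr
    return amplitude * np.sin(2.0 * np.pi * frequency_hz * t)


bpm = 130.0
print(seconds_to_quarter_notes(1.5, bpm))  # 1.5 s at 130 BPM, in quarter notes
print(db_to_midi_velocity(-30.0))          # a fairly loud note
tone = synth_sine(440.0, 0.5, 0.8)         # half a second of A4
print(len(tone))
```

Keeping these as small pure functions makes them easy to call from inside the per-segment pitch code.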
By the way, to have this picture here, you need to have MuseScore installed, because MuseScore is what music21 uses to render this picture. Unfortunately, if you're running it remotely, I could not set up MuseScore to run on the Linux server, so this picture will not be shown. What you can have instead is text information. So for example, it's a music21 note C sharp, and these are the times at which the notes need to be played, and this information can also be converted to MIDI or to the transcription itself. So let's go back. I need the sine wave information: amplitude, duration, frequency. I need the music21 information, and I need the MIDI information. So what I'm doing here is creating a function called estimate pitch and notes. I give it an array with the onset boundaries, which delimit each segment, and we go through each segment. It estimates the pitch and returns, for example, the sine wave and MIDI note information. Here is the function that estimates the pitch itself. It goes into each segment and detects which note it is from the frequency. This function here is the most complex. It takes the estimated pitch and constructs my information: for the sine wave, it calculates the amplitude and the fundamental frequency, and then it also converts to the MIDI information. In MIDI, we need the MIDI velocity and the MIDI duration. So we need all these functions to calculate the MIDI, the sine wave, and the music21 notes. And finally, I'm creating the music information array, which puts together in one array the sine wave, MIDI, and music21 notes. So you should go through the code. It's a bit complicated, but just look at the nested functions. We go through each segment of time. We try to estimate the fundamental frequency. We're just taking averages to decide which note we should play. We have the segment, which gives the duration.
We convert from dB to amplitude to give the sine wave its amplitude, and to give a MIDI velocity. We have these auxiliary functions to convert time to quarter notes, to estimate the tempo, and to remap values. And we end up with something like this.
