Stanford's Neural Network Transcribes Songs to MIDI and Sheet Music (Full Transcript)

Discover how Stanford researchers' open-source neural network transcribes songs into MIDI files and sheet music, excelling with pop songs.


Speaker 1: Apparently, it's not only artists and programmers who are going to get automated in the near future, but also musicians. Three Stanford researchers, to whom I'm incredibly grateful, just dropped an incredible paper: a neural network, completely free and open source, that can transcribe any song into a MIDI file or even sheet music. I played with it for a little bit, and it does incredible stuff; it gets most things right. It's been trained mostly on pop songs, so it does better with those. For those interested, I'll leave all the links in the description. Let's give it a try, just to demonstrate how it works. First, I need to run the program here. It's called Sheet Sage, and if I want to use YouTube as a source, I just copy the link. Let's try "Miss You." So: copy link address, paste it here as an argument to the program, and it starts fetching, retrieving the audio. You can also pass an MP3 file, of course. After that it starts detecting beats, and you can run it on a potato, on a calculator, anything. There's also a much more demanding version of it that requires an NVIDIA GPU with 10-plus gigs of VRAM, I think, or 12-plus. It's supposed to work better, so try it if you're getting unsatisfactory results with the regular one. Okay, it's done transcribing; let's see what it gave us. This is a new folder, and by default it outputs a PDF. This is supposed to be "Miss You," I think. Yeah, with harmonies, and there's a MIDI file too. I'm going to play the MIDI file; let's see how close it is to the original. You know, it doesn't sound close. Which one did I transcribe? I don't remember. Which one is that? "Miss You," yeah. This one it didn't get. It did get the harmonies, though; the melody, not so much. Let's try something else, something I'm actually familiar with. This one, this one's nice. I'll have to do that one. In general, on the ones I tried, it did a fantastic job.
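The speaker plays back the MIDI file that Sheet Sage writes next to the PDF. As a rough illustration of what such a file contains (this is not Sheet Sage's own code), here is a minimal sketch that builds a format-0 Standard MIDI File from notes timed in beats, using only the Python standard library. The 480 ticks-per-beat resolution, the default tempo and velocity, and the helper names `vlq` and `write_midi` are assumptions chosen for the example.

```python
import struct

TICKS_PER_BEAT = 480  # assumed resolution: ticks per quarter-note beat


def vlq(value: int) -> bytes:
    """Encode a non-negative int as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # continuation bit on all but the last byte
        value >>= 7
    return bytes(reversed(out))


def write_midi(notes, bpm=120.0, channel=0, velocity=80) -> bytes:
    """Build a format-0 MIDI file from (pitch, start_beat, length_beats) tuples."""
    events = []  # (tick, order, message); order 0 = note-off, so offs sort before ons at a tie
    for pitch, start, length in notes:
        on = round(start * TICKS_PER_BEAT)
        off = round((start + length) * TICKS_PER_BEAT)
        events.append((on, 1, bytes([0x90 | channel, pitch, velocity])))
        events.append((off, 0, bytes([0x80 | channel, pitch, 0])))
    events.sort(key=lambda e: (e[0], e[1]))

    # Set Tempo meta event: microseconds per quarter note, stored in 3 bytes.
    us_per_beat = round(60_000_000 / bpm)
    track = b"\x00\xff\x51\x03" + us_per_beat.to_bytes(3, "big")
    last_tick = 0
    for tick, _, msg in events:
        track += vlq(tick - last_tick) + msg  # delta time, then the channel message
        last_tick = tick
    track += b"\x00\xff\x2f\x00"  # End of Track meta event

    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, TICKS_PER_BEAT)
    return header + b"MTrk" + struct.pack(">I", len(track)) + track


# Hypothetical three-note melody: C4, D4 for one beat each, then E4 for two beats.
melody = [(60, 0, 1), (62, 1, 1), (64, 2, 2)]
data = write_midi(melody, bpm=120)
```

Writing `data` to a file with a `.mid` extension should give something any MIDI player can open. Timing is stored in beats (ticks) plus a tempo event rather than in seconds, which is why beat detection comes before transcription in a pipeline like the one shown.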
Sometimes it refuses to run, especially if a song has multiple key changes, tempo changes, changes in the time signature (the 4/4 part), and all of that modulation. It has some trouble getting those right, but don't consider it a final product. Consider it a stepping stone, maybe even a foundation for what's yet to come, because this is incredible. It's maybe 20 to 30% better and more accurate than previous models. And the fact that you can just run it like that... Okay, which song is this? Oh, "Stay Together for the Kids." Let's listen to the MIDI file it produced. Whoa. Wow. You can sing along to that, because the harmonies are right. At least for a moment. Whoa, this part is not perfect. Oh no, actually, I lost it. Okay. Well, let's try something else, because this one got it partially right, and I want to see whether it can do even better. "All the Small Things," it's incredible. Let's try this one. You can try a bunch of others, various genres, not just pop songs, but I personally haven't tried that yet; I'm curious how it's going to turn out. Let me know if you have any trouble installing it. For now, you need Ubuntu, Debian, or another Linux distro for it. Maybe you can run it with Docker; I'm not that sure. Let me know how it goes. I'm more excited about all this than I probably should be. Just imagine what's yet to come. Transcribing, almost done. That fast! It would take a musician much longer than that, at least one without perfect pitch. "All the Small Things," let's give it a listen. That's the intro, I think. Where is it? Da-da-da.
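The speaker notes that songs with tempo changes and shifting time signatures give the model trouble. A minimal sketch of why those two things matter once notes are positioned in beats: converting a beat to clock time depends on the tempo in effect, and grouping beats into bars depends on the time signature. The tempo-map format and the helpers `beat_to_seconds` and `measure_of` are hypothetical illustrations, not anything taken from Sheet Sage.

```python
def beat_to_seconds(beat: float, tempo_map: list[tuple[float, float]]) -> float:
    """Convert a 0-based beat position to seconds.

    tempo_map: (start_beat, bpm) pairs sorted by start_beat; the first starts at beat 0.
    """
    seconds = 0.0
    for i, (start, bpm) in enumerate(tempo_map):
        end = tempo_map[i + 1][0] if i + 1 < len(tempo_map) else float("inf")
        if beat <= start:
            break
        # Time spent inside this tempo segment, up to the requested beat.
        seconds += (min(beat, end) - start) * 60.0 / bpm
    return seconds


def measure_of(beat: float, beats_per_measure: int) -> int:
    """1-based measure number for a 0-based beat, assuming one beat per time-signature unit."""
    return int(beat // beats_per_measure) + 1


steady = [(0, 120)]
slows_down = [(0, 120), (4, 60)]  # hypothetical: tempo halves at beat 4

print(beat_to_seconds(6, steady))      # 3.0
print(beat_to_seconds(6, slows_down))  # 4.0
print(measure_of(6, 4), measure_of(6, 3))  # 2 3
```

The same beat lands a full second later once the tempo drops, and a 4/4 versus 3/4 reading puts it in a different bar, so a wrong guess about either shifts everything that follows in the transcription.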
