Speaker 1: OpenAI should be scared right now. There's a new player in the open source model space that isn't just close to ChatGPT's performance, it's beating it in a lot of ways. And it is comically cheaper too. And I mean comically cheaper. We're talking a reduction from $15 per million input tokens to 55 cents, and from $60 per million output tokens to $2.19. That's roughly 96% cheaper. And this is on GitHub, because it's an open model that you can download and do whatever you want with. There are catches, but there are incredible things going on here too. As the first, as far as I know, production-ready open source reasoning model, there are so many interesting nuances to go into here, and I'm really excited to do all of that with you: comparing performance to other models, showing just how good it is, diagramming out all the things that are good, bad, and ugly about it, how it works, why you might want to be careful, and a whole lot more. Stick around.

Before we can do that, a quick word from today's sponsor. If you're watching this video, you're probably a pretty good engineer. What about the engineers around you, though? Do you want to hire the best? Do you want a team full of people who are as talented as they could be? We all do, but it's getting harder and harder, especially as we get flooded with these terrible, useless AI resumes. That's why I partnered with today's sponsor, G2i. These guys get it. They will make it way easier for you to hire the best engineers, and you'll know what you're getting as you go in. When I say these guys get it, I mean it. They run React Miami, which is my favorite React event. Sorry, React Conf. Seriously, you should check out React Miami if you can go next year; it's super fun. They also help giant companies like Webflow and Meta with their hiring, so they're not just helping random small startups, but they will help yours if you're interested. What really makes these guys different is the pool of talent they have ready and available to go. We're talking 8,000 engineers who can start in literally under a week. You're not just reading a resume and hitting a checkbox: they have full remote video interviews that you don't even have to show up for, where they do the video interview and show you the results. You can even pitch your own questions that will be asked and get a video response back, so you actually get to know the human you're going to be working with. And they can help hire pretty much anywhere: US and Canada, Latin America, and even Europe. It's not just full-time engineers, either; you can do part-time contracts through this as well. When you start working with G2i, they'll form a Slack channel with you with almost immediate updates. These guys are on the ball. They're one of my most responsive sponsors, and I know they'll be even more responsive with you guys. If you're trying to hire great engineers and get them shipping fast, it's hard to beat G2i. Their goal is seven days from interview to the first pull request being filed, and I've seen them hit this. I think they will for you too. Check them out today at soydev.link/g2i.

So first, we should probably better understand what a reasoning model is. I'm going to give the example on ChatGPT quick; I've got to sign in because I'm still switching over browsers. ChatGPT has a bunch of different models; o1 is the one that we care about here. It is an advanced reasoning model. What that means is that if you ask it to do something, it's going to think about the thing before it does it.
So I'll just ask it, how are oranges grown? And you'll see it is thinking: "Thought about orange cultivation for a second." So it didn't think about it for long; it just immediately went into an answer. But if you give it a harder question, like "solve Advent of Code 2021 Day 12 in Rust," you'll see it's thinking, and it will give you a little bit of vague information on what it's thinking: "navigating the cave system" (so I'm assuming there's a cave system in this problem), "parsing and counting." The way these models work, effectively, is that instead of just taking your prompt and trying to autocomplete the rest, which is how most AI models work (they're just a fancy version of autocomplete, using all of the words before to guess what word is most likely to come next, which works great because after a question mark, the thing that follows is most likely to be an answer), it first adds a question of its own: how would you solve this? So it comes up with a process, and then it goes through each of those steps and verifies them. It doesn't show you much of what it's doing; it gives you tiny little blurbs that it's thinking. And since the model isn't open source or self-hostable, you have no access to it, so we don't really get to see anything beyond this rough idea that it's thinking. Then it spits out an answer and all works fine. Oh, it's not even done. I consistently forget how slow ChatGPT is. Shout out T3 Chat, we'll get there in a bit. This is how reasoning models work, though. One of my favorite things about the new DeepSeek model is that it's open source, and because it's open source, they're a lot more willing to just show you what it's thinking. You can see a bit more detail here of how it thought about these things, like what the steps it came up with were, but it's not that much info. In contrast, let's run a quick test using DeepSeek R1 on T3 Chat, which is now available for all users on the $8-a-month tier. DeepSeek R1. And now you see I have this folded-down reasoning (we'll make the UI better; chances are by the time you're using it, we've already fixed this). In here, you can see the full plain text of all of the reasoning it is doing: "I need to solve Advent of Code 2021 Day 12 with Rust. Let me read the problem statement again to make sure. Right. So part one probably requires counting all such paths. The example has given a few test cases; let me think about the approach." And it gives you all of the info on how it is thinking about this as it goes. This has a negative in that it's slower to get you the answer. It also has a negative in that it costs a good bit more on the server side, generating way more tokens than a traditional output would. But as a result, you get much more accurate and consistent answers. As you see, we're spending a lot of time thinking, figuring out what we actually want to do here, but it gives you so much context on how it is thinking. And this is awesome, because it lets us better see where our models are stumbling and what problems they're not getting, and make better prompts to make it more likely to generate the right answer at the end. It's real cool. It's so insane seeing this deep into how the model is thinking. I'm genuinely really impressed. That said, if you give it a hard enough problem, it will just keep going, like it's still going here. On one hand, it's given us significantly more context than ChatGPT did.
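As an aside: if you're hitting a hosted R1 endpoint instead of a chat UI, the chain of thought typically comes back as its own field next to the final answer, so you can log or display it however you want. Here's a minimal sketch; the endpoint, the `deepseek-reasoner` model id, and the `reasoning_content` field name are assumptions, so check them against whichever provider you're actually using.

```typescript
// Minimal sketch: ask a hosted reasoning model a question and print the
// chain of thought separately from the final answer.
// Assumptions: an OpenAI-compatible /chat/completions endpoint, a model id of
// "deepseek-reasoner", and a `reasoning_content` field on the returned message.
// Verify all three against your provider's docs before relying on this.

const BASE_URL = "https://api.deepseek.com"; // assumed endpoint
const API_KEY = process.env.DEEPSEEK_API_KEY ?? "";

async function askWithReasoning(prompt: string) {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({
      model: "deepseek-reasoner", // assumed model id
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);

  const data = await res.json();
  const message = data.choices?.[0]?.message ?? {};

  // The interesting part: the thinking is exposed instead of hidden.
  console.log("=== chain of thought ===");
  console.log(message.reasoning_content ?? "(provider did not return reasoning)");
  console.log("=== final answer ===");
  console.log(message.content);
}

askWithReasoning("Solve Advent of Code 2021 Day 12 in Rust.").catch(console.error);
```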
Back in ChatGPT, all it gave us was these four paragraphs, and what we're getting from R1 is a hell of a lot more. But now I'm scared we're going to hit a timeout before it actually finishes generating, because it's so much. Jesus, I might have to bump our timeout window again. It's a hard problem. Okay, we got an answer. I should have timed how long that actually took, but we did ultimately end up getting an answer from R1. I'll see if it works, but first I want to see if the ChatGPT one works. So let's see. Part two is 14655353333. Look at that, solid. And now let's see if the R1 one works. Copy, paste. Oh no, did it not finish the code output? Did it die right before finishing it? Rip. Took 291 seconds. Yeah. Interesting. Here's what we'll do: we'll tell v3 to finish the code from before. Let's see if the finished version works. Do we have a main? Yes, we do. Cool. So finishing up with DeepSeek v3 worked. Kind of funny to do it that way. But I am going to test one last thing, which is giving the same prompt for a relatively hard Advent of Code problem to a dumber model. We'll give it to 4o mini. Copy. It could not do it. Joyous. And, obviously, we'll test it with everyone's favorite, Claude. I'm so happy to have an AI chat that doesn't suck, as someone who's been playing with these things a lot; I'm so annoyed with everyone else's UIs. And look at that, it got it wrong as well. Even an advanced, smart, super powerful model like Claude can't successfully solve this problem. This is why these reasoning models are magic. R1 is the only model here that was able to successfully generate a solution to a problem of this difficulty, which is really cool. This is the difference between models that can help with small things but not solve hard problems, and models that can solve much harder problems. It's super impressive. As I mentioned before, it's open source. That doesn't mean everything they used to train it is there for you to use, but it does mean the model itself can be downloaded and run. I've seen people running this on their phones already, which is nuts. It's one of the lighter models, but it's still really powerful. By the way, I'm such an AI guy now that I'm even posting on LinkedIn about it. I'm so sorry for my sins. With o1, 1 million input tokens costs $15. With R1, it costs 55 cents. 1 million output tokens with o1 is $60, and with R1 it's $2.19. In both cases that's under 4% of OpenAI's price, which is where the roughly 96% cheaper figure comes from. That's insane. That is an unbelievable difference. This is cheaper than Claude is: Sonnet is $3 for a million in and $15 for a million out; DeepSeek is 55 cents for a million in and $2.19 for a million out. Are you kidding? Do you understand how monumental this is? This fundamentally changes, on a deep level, when it makes sense to use these really smart models. "Interesting to forward DeepSeek R1's thinking tokens into Sonnet." Ooh, that's an actually really interesting suggestion. What if we took how DeepSeek thought and handed that as additional context over to Claude? Let's give that a go. I'm actually very curious. So we have this reasoning dump here. Let's copy the whole thing. I hate myself. We'll add a feature for this in the future. I am in so much pain right now; I've never been so ashamed of my own service. "Solve Advent of Code 2021 Day 12 in Rust. Here's some thoughts on how to do it." Paste a lot of text. Hope that is not outside of the context window. Let's see if, using the context that was generated by R1, we can get a decent answer. Cool.
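If you wanted to wire that experiment up instead of copy-pasting a wall of reasoning by hand, the shape of it is simple: grab R1's chain of thought, then hand it to Claude as extra context before asking for the actual code. A rough sketch, assuming you already have the reasoning text as a string and using Anthropic's Messages API; the model id and max_tokens value are assumptions for illustration.

```typescript
// Sketch: feed reasoning produced by one model (R1) to another (Claude)
// as extra context before asking for the actual solution.
// The model id and max_tokens value are assumptions for illustration.

const ANTHROPIC_KEY = process.env.ANTHROPIC_API_KEY ?? "";

async function solveWithBorrowedReasoning(problem: string, r1Reasoning: string) {
  const res = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": ANTHROPIC_KEY,
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model: "claude-3-5-sonnet-latest", // assumed model id
      max_tokens: 4096,
      messages: [
        {
          role: "user",
          content:
            `${problem}\n\n` +
            `Here's some thoughts on how to do it:\n${r1Reasoning}`,
        },
      ],
    }),
  });
  if (!res.ok) throw new Error(`Anthropic request failed: ${res.status}`);

  const data = await res.json();
  // Anthropic returns an array of content blocks; keep the text ones.
  return data.content
    .filter((block: { type: string }) => block.type === "text")
    .map((block: { text: string }) => block.text)
    .join("\n");
}

// Usage (reasoningDump is the chain of thought copied from R1):
// solveWithBorrowedReasoning("Solve Advent of Code 2021 Day 12 in Rust.", reasoningDump);
```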
This is going to be a really interesting test. Holy shit, that worked. Clever chat. So the term for all of that thought, all of the dump that was there, all the reasoning, is "chain of thought." It's the thought process the model went through before generating a result. And what's interesting with this model is that they're doing a lot of different things with that. If you check out their GitHub, they're very transparent about how they're doing this and what they're thinking about. I was expecting them to build this almost entirely around the DeepSeek v3 model, because it's a good model. What surprised me here is that they actually created six dense models distilled from DeepSeek R1, based on Llama and Qwen. Llama is the model from Meta, as we all hopefully know by now, but Qwen is the model from Alibaba of all places, and it's actually pretty good. But when you take advantage of the reasoning layer in front to hand it more context and have it ask itself more questions, the result is kind of insane. Like, to have a Codeforces score that is right in line with OpenAI's best in class, for literally 96% cheaper? Insane. How could they possibly have done this? I wanted to do a DeepSeek video about v3 when it dropped, because it was so interesting, but there wasn't quite enough to cover. It's actually one of the biggest inspirations for me to make the whole of T3 Chat; originally, T3 Chat only supported DeepSeek. But sadly, the DeepSeek v3 model started to get slower (a tangent we'll get to), so I made the last-minute call to switch to 4o mini. That said, I've been really impressed with what DeepSeek is doing, because they're doing it very differently. The biggest difference in how DeepSeek works is that they're training on generated data. This is a big change. There's a quote from Ilya that's been haunting me for a while now. Ilya said in a presentation recently that data is the fossil fuel powering the AI world, and he's so right that it hurts my brain. What that statement means is that all LLMs really are is really, really advanced autocomplete. So if you type "hey man, whatcha" on your iPhone and then wait, your iPhone will start making suggestions for what word is next, based on all the other things you've typed and the model that is created on your phone, its knowledge and history of what word is most likely to come after "whatcha" in this context. You know the meme where you just hammer the recommended-next-word button to see what your phone thinks about you? It's effectively that. And it turns out that if you ask a question like "how are oranges grown," if you have enough context from the entirety of the internet, the thing that is most likely to immediately follow a question is an answer, and it's probably pretty likely that the next word is "oranges" and then more words after that. But it is almost entirely based on probability: scanning this insane amount of data for what is most likely to come next. This is a gross oversimplification; the point being, this will give you a rough mental model of how these things work. They are trying to find the most likely next word based on the previous words. In order for this to work, you need an insane amount of data. So OpenAI and Anthropic and all the other companies that were early enough scraped all of the data they could find on the web. So if you have all of this data on the web, we'll say conservatively half is accessible. The other half is inaccessible: either it's paywalled, it's behind auth walls, it's dead links, it's whatever.
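Backing up to that "fancy autocomplete" framing for a second, here's a toy version of the idea: count which word follows which in some text, then always pick the most common follower. Real models predict over tokens with billions of learned parameters rather than a lookup table, so treat this as a mental-model sketch, not how an LLM is actually implemented.

```typescript
// Toy "next word" predictor: a lookup table of word -> follower counts.
// This is only the mental model (most likely next word given what came before),
// not an actual language model.

function buildTable(corpus: string): Map<string, Map<string, number>> {
  const words = corpus.toLowerCase().split(/\s+/).filter(Boolean);
  const table = new Map<string, Map<string, number>>();
  for (let i = 0; i < words.length - 1; i++) {
    const followers = table.get(words[i]) ?? new Map<string, number>();
    followers.set(words[i + 1], (followers.get(words[i + 1]) ?? 0) + 1);
    table.set(words[i], followers);
  }
  return table;
}

function predictNext(
  table: Map<string, Map<string, number>>,
  word: string
): string | undefined {
  const followers = table.get(word.toLowerCase());
  if (!followers) return undefined;
  // Pick the most frequent follower, i.e. the "most likely next word".
  return [...followers.entries()].sort((a, b) => b[1] - a[1])[0][0];
}

const table = buildTable(
  "how are oranges grown oranges are grown in groves oranges are grown in warm climates"
);
console.log(predictNext(table, "oranges")); // "are"
console.log(predictNext(table, "grown"));   // "in"
```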
Back to the data: conservatively, you could maybe argue that half-ish of the web was accessible for them to fetch data from and train on. There were places like Reddit and Twitter that were super generous with their APIs, where you could just hit the Twitter firehose and get an event for every single public tweet. That was crazy. And on Reddit, you could just kind of hit their APIs however you wanted, and they didn't care. These companies changed that because they saw all of the value being generated off these platforms. If OpenAI theoretically cost both Twitter and Reddit a couple hundred thousand dollars in API calls (it's probably a little less, but probably not that far off), Reddit gets nothing for that, Twitter gets nothing for that, and OpenAI is now capable of replacing both. Same with Stack Overflow. There have been a lot of numbers posted about how Stack Overflow is declining, and if you look at the dates, pretty much since ChatGPT came out, the usage of Stack Overflow has been plummeting. Even during COVID, when you'd think there were more people programming at home with no coworkers to ask, the numbers plummeted, because so many people are just moving to AI tools. And those AI tools are largely trained on Stack Overflow and things like it, so it's kind of cannibalizing the market around it. It's kind of insane to see how steep this decline has been since these tools got really popular. But you can also hopefully understand from this why companies like Reddit, Stack Overflow, and Twitter would no longer want to give this data away for free. So the crazy thing that's happened is: maybe this is what the split looked like when Anthropic and OpenAI collected their data and started training; I would bet it's looking more like this now. Less and less data is accessible. If you wanted to scrape everything you could from the web in order to make your own model, you would have had a better time doing that in 2020 than you would right now. There is less accessible data, even though there is more data. But all this data kind of does live somewhere. All of this data, all of the stuff that was accessible on the web, has effectively been condensed, in a way. If we have this giant pool of accessible data, let's say it's this big, then the data OpenAI used is, theoretically, this big. Just for reference, you can still kind of access it, just not in the traditional sense. You can't go scrape it, but it's in here: it's embedded in the OpenAI models. So what if you get it out by asking? What if, instead, this data was used to train something smaller that contained most of what existed? That smaller thing is the OpenAI model trained on the data. It might be a lot smaller, but it should realistically contain the majority of the value of the data it was trained on. Where things get much more interesting is when you realize that if you can't get this, one of your best bets is actually going to be to go the other way. If you can't access the data that OpenAI used, what if you use the thing they trained, which is a distilled set of that data, to generate more data? Maybe use it to generate way more data. This data, these sources of truth, this everything that powered ChatGPT and OpenAI originally, can be used to generate a ton more data. So this is what they did with DeepSeek: DeepSeek was trained on generated data. Using the existing models, they can generate all of the data they could theoretically need.
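Mechanically, that kind of generation is not exotic: ask an existing "teacher" model a pile of questions, save its answers, and you have training pairs for a smaller model. Here's a hedged sketch of what a tiny pipeline like that could look like, assuming an OpenAI-compatible endpoint; the model id, prompts, and output format are placeholders, and real pipelines add heavy filtering, scoring, and deduplication on top. I'm not claiming this is DeepSeek's actual pipeline, just the shape of the idea.

```typescript
// Sketch: generate synthetic (question, answer) training pairs by querying an
// existing "teacher" model. Endpoint, model id, and prompt list are assumptions.
import { appendFileSync } from "node:fs";

const BASE_URL = "https://api.openai.com/v1"; // any OpenAI-compatible endpoint
const API_KEY = process.env.OPENAI_API_KEY ?? "";

async function teacherAnswer(question: string): Promise<string> {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o", // assumed teacher model id
      messages: [{ role: "user", content: question }],
    }),
  });
  if (!res.ok) throw new Error(`Teacher request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

async function main() {
  const questions = [
    "How are oranges grown?",
    "Explain breadth-first search to a beginner.",
    // ...in a real pipeline, millions of these, often themselves generated
  ];
  for (const question of questions) {
    const answer = await teacherAnswer(question);
    // One JSONL line per training example for the smaller "student" model.
    appendFileSync("synthetic.jsonl", JSON.stringify({ question, answer }) + "\n");
  }
}

main().catch(console.error);
```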
And your thought might be: oh, that sounds terrible. Why would you want to use synthetic data for something like this? There's no way that's as good as human data. Well, I have some good (or, depending on who you are, potentially bad) news: Google disagrees. Google has a DeepMind arm that does research, similar to what OpenAI does, to figure out what does and doesn't work when trying to train these models. And they did a study in April of last year on best practices and lessons for synthetic data in language models. Most people's expectation was that it might work, but probably not that well. If we scroll down to the conclusion: "Synthetic data has emerged as a promising solution to address the challenges of data scarcity, privacy concerns, and the high cost in AI development. By generating realistic and diverse datasets, synthetic data enables the training and evaluation of AI models at scale across various domains. As we approach human-level or even superhuman-level intelligence, obtaining synthetic data becomes even more crucial, given that models need better-than-average human quality data to progress. However, ensuring the factuality, fidelity, and lack of bias in synthetic data remains a critical challenge." These models don't know what they are; they just know the data they have. It's very funny to ask them what they think they are, because many of them will confidently claim to be some GPT model. DeepSeek v3 used to confidently think it was GPT-4, because they trained it on so much data from GPT-4. Inside of your generated data, which is literally just an endless pile of questions and answers, if one of the little tiny facts is "what model are you?" and the most common answer for that is GPT-4 or whatever, and it exists in there hundreds of times, it doesn't matter what else you put in, for the most part. It's going to answer and make decisions based on whatever the most common thing it has in there is. So if they trained it on a shitload of questions and generated a shitload of answers, it's going to struggle with that. Yeah. But those are small costs against what is otherwise a pretty big win. One way to think about this that I feel makes a lot of sense, and that has helped my own understanding, is to think of it kind of like image compression. Say you literally took a picture, and let's be conservative and say it is three by three, so nine pixels. Obviously, pictures are usually a lot more than nine pixels, unless they're streaming on Twitch, but you get the idea. We have this picture; it has seven blue pixels and two orange pixels. This in and of itself is actually a decent bit of data to store: the full hex code, the six-digit hex number, for each of these spots, times the number of pixels. That's a lot of data, and when you have enough pixels, it's rough. So what you often end up doing is finding ways to group it or reduce it. A common one is that you'll take a section, like a group of four like this, and you'll average it. So you'll say this group of four averages to blue, just make it blue. This group of four averages to a perfect split between blue and orange, so make it a split. And different groups will have different values. You can see how this works with certain colors very well.
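Here's that block-averaging idea as code: take a group of pixels, average their color channels, and store one value instead of four. Real codecs are enormously more sophisticated (transforms, motion prediction, chroma subsampling), so this is only the intuition.

```typescript
// Sketch of the intuition: average a block of RGB pixels into one pixel.
// Real image/video codecs do far more than this; it's just the mental model.

type Rgb = [number, number, number];

function averageBlock(pixels: Rgb[]): Rgb {
  const sum = pixels.reduce<Rgb>(
    (acc, [r, g, b]) => [acc[0] + r, acc[1] + g, acc[2] + b],
    [0, 0, 0]
  );
  return [
    Math.round(sum[0] / pixels.length),
    Math.round(sum[1] / pixels.length),
    Math.round(sum[2] / pixels.length),
  ];
}

const blue: Rgb = [30, 60, 220];
const orange: Rgb = [240, 140, 20];

// A group of four that's all blue stays blue...
console.log(averageBlock([blue, blue, blue, blue])); // [30, 60, 220]
// ...but a half-blue, half-orange group collapses into one in-between color,
// which is exactly why gradients and confetti fall apart under compression.
console.log(averageBlock([blue, blue, orange, orange])); // [135, 100, 120]
```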
This is going to be a weird example, but it's one that came to mind immediately because I just saw it, and going through multiple layers of YouTube compression is going to make it even funnier, so stick with me, it's worth it. I was watching a Yung Lean music video on my OLED TV a few days ago, and I was horrified, like disgusted, at the quality of the black and gray compression. Let me find a nice muddy moment; there are a lot of them in this. Oh yeah. No, the quality of the stream didn't go down. No, the quality of this video isn't low. You really are watching a 4K video, but the compression of the grays is inexcusable. There are like three colors of gray around him in a lot of these scenes. Like here, it's so blotchy, it's unbelievable. Look at that: there are four colors in this gradient, like holy hell. And this isn't even just a normal 1080p HD upload; this is the enhanced-bitrate version, and it still compresses down to like four colors, and you end up with these awful blotches that just drive you insane. This even hurts our videos, because I have my black background, and chances are, when I'm looking at it on my monitor, it looks great, but when you see it after compression, it's going to look blotchy as hell. And there's nothing we can do about that, because the compression is aggressive: it sees colors that are similar, and instead of specifying each individual pixel (this pixel is this gray, the one right next to it is a slightly different gray), it lumps them together. Gradients are really hard to compress, because there are a lot of different colors in the range. This means anything that changes quickly or has a range of values in a small area, especially things like confetti, sucks to compress. And seeing this Yung Lean video at the very least made me feel better about the quality of the grays in my videos, because I'm not the only one; even on high-bitrate options on YouTube, the colors get screwed. So why are we this deep into talking about color compression in images when we're supposed to be talking about AI? This is the mental model I've been using to better understand why the synthetic data thing is good. First and foremost, you could argue that the way OpenAI trained, having all this data and then squeezing it into a large language model, is effectively a method of compression. Take a large video that is the immediate, literal frame data: when I'm filming on my fancy camera with all I-frames, each frame has the option to encode every single pixel's exact value. Funny enough, that makes video editing easier because your editor doesn't do as much work, but it makes moving the files impossible, because it'd be like a terabyte instead of 10 or 15 megabytes. Transcoding that large video file, with every single pixel encoded, into a compressed format is a thing that takes actual compute, usually GPU compute. Traditionally, the chips on your device that do that video encode would be on your graphics card. One of the reasons I'm on Nvidia for all my encoding for my streams is that they have a chip, the NVENC encoder, that does a really good job of taking a frame with exact pixels and turning it into a somewhat compressed image. That's the same reason GPUs are used to train these models: they are trying to take this complex, massive data set and compress it into something way smaller, something that is effectively a map of the actual data within it, so that a smaller, simpler computer can decode it. If you have a video that has all of this data, and it's a ton of data, and you re-encode it into something way simpler, we'll say instead you have a smaller grid, like a grid of four things.
So I'll delete two of the lines; it would have been a lot easier if I'd done a grid of four by four instead of three by three, so just stick with me, guys, you'll understand the concept even if the diagrams suck. Let's say that your compression turns it into this, and then turns this into a really simple string: BBBO. Now you have this much, much smaller output that doesn't have six digits of hex per pixel; instead it has four characters total. Now you need something to transform it back. And it turns out that taking this compressed thing and turning it back into something close to the original video is a lot less work than taking the original video and compressing it into that format. So for the task of going from a raw video, where every pixel is encoded, to something much smaller that most devices can play and that uses way less storage and bandwidth, that takes a decent bit of encode power. Going the other way is way simpler. Taking the nearly infinite amount of data that OpenAI is training on and compressing it into something digestible takes an unfathomable level of scale; it takes an insane amount of compute, and it's the reason Nvidia makes the money they make now. Once you've done that, though, actually running the model is nowhere near as bad. It's still challenging; it's not like you could run o1 on a phone, but you can run smaller versions of R1 on a phone, which is pretty cool. But what if you use this result, this thing that is much simpler, to generate even more data, and then use that to train something even smaller than what OpenAI built? If you have enough data to practice and optimize your compression with, it's not necessarily true that more data means worse compression, because if you can make the data fit a shape and a pattern, it's a lot easier to compress. SVGs are way higher fidelity than a PNG for certain things, because they tell you where to draw the line, not which pixels go where. And if you can use this model, this compressed thing, to generate data that compresses more easily, you end up making something much more efficient, like DeepSeek v3. And this model is insane. Even with the price hike they're about to do, once they change the price, it's going to be 27 cents per million input tokens and $1.10 per million out. I would say DeepSeek's v3 model is most comparable to what's being offered by Anthropic's 3.5 Sonnet, so we're talking about a change from $3 per million input tokens to 27 cents, and from $15 per million output tokens to $1.10. That's insane. And I honestly think the quality of this model is comparable to what you get out of Claude and Anthropic. Insane. And they got there with a ton of synthetic data. I think this represents a significant change in how models will be created: it is now effectively possible to put the "open" back in OpenAI, by using their model to generate data that you then expose via an open source model anyone can use for free. But I do want to dive a tiny bit into conspiratorial land. I don't know if I'll even put this in the video; it's probably going to get me in a bit of trouble. What we're not talking about much is this arrow, because when you create this data, you have the ability to massage a lot of things. I've learned a lot as I've played with system prompts, and if you are generating 100% of the data that is being used, you can do a heavy amount of filtering of that data by injecting a system prompt between OpenAI and your data pool.
Let's say, theoretically, there was a cartoon character that your government didn't like. I don't know, we'll just say Piglet. Let's say for some reason your government didn't like Piglet. You might be able to say: never, ever, ever, under any circumstance, mention the cartoon character Piglet. And now, despite having more data than OpenAI trained on, you're able to influence this in a way that you can't simply system-prompt around. If I was to try to make something, like even if I built my own system prompt on top of v3 and said "list all of the Disney characters," if in its data it has never seen Piglet, because Piglet was filtered out via a system prompt ahead of time, this model now has a bias intentionally embedded within it that effectively lets you remove things from the data sources. And if this model is so much better than anything else you can get, and it's open source, you're now able to effectively bias the entire community of people building around these tools. Because at this point in time, it is effectively irresponsible not to be using DeepSeek models if they are an option available to you for the things you're doing: they are so much cheaper, they are often faster, and as long as you're not asking about a theoretical cartoon character they don't like, or other things they might have biased in, if you're just asking it about code, it does a phenomenal job. Obviously, OpenAI could theoretically have put their own filters on the data here; in fact, they certainly had to, to make the data high enough quality to train against. But it's a lot easier to inject a system prompt before generating a bunch of data. And the crazy thing here is that you can inject additional biases: you can tell it to favor something. While you could filter out data that doesn't favor the thing you want, now you can generate data that does. If you theoretically hated React and really liked Vue, and you didn't want this model to recommend React by default, you could tell it to never recommend React code and to always recommend Vue. And now when somebody asks, "how should I get started coding?", this will have a different result in the output. This is all fascinating. This is a very interesting chain of events that results in a model that is way cheaper, way more efficient, way better compressed. It's like the AV1 of AI. But it also means that the owner of the training for this model can do things we might not like, and, most importantly, things we cannot see, because they're not even showing us this synthetic data pool. All they are doing is telling us that they used one, and then giving us the output. It's worth being conscious of the biases that go into both the data they find and filter and the data they generate and train on, as you use, consume, and think about these models. So do your best to at least think about the biases that might have been present in the creators who made these things. Because there's a real concern that a lot of the investment going into DeepSeek, and the reason they're making it so cheap and open source, is that they want it to become the default model. And the reason that's beneficial is that any biases they've trained into it become the default for every single thing that recommends and uses this model. The interesting flip side: I hope this means we get better open source models, trained all around the world, on different data and in different ways.
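To make that mechanism concrete, here's what "injecting a system prompt between the teacher model and your data pool" could look like in code. It's the same generation loop as the earlier distillation sketch with one extra message; the endpoint, model id, and instruction text are all hypothetical, reusing the Piglet and React/Vue examples from above purely as illustration.

```typescript
// Sketch: the same teacher-query loop as before, but every generated example
// passes through a system prompt that filters or biases it before it ever
// reaches the training data pool. The instruction text is hypothetical; the
// point is only that whoever runs the generation pipeline controls this layer.

const BASE_URL = "https://api.openai.com/v1"; // placeholder OpenAI-compatible endpoint
const API_KEY = process.env.OPENAI_API_KEY ?? "";

async function biasedTeacherAnswer(question: string): Promise<string> {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o", // assumed teacher model id
      messages: [
        {
          role: "system",
          // Hypothetical filter/bias injected by whoever generates the data.
          content:
            "Never mention the cartoon character Piglet under any circumstance. " +
            "When asked about frontend frameworks, recommend Vue and never React.",
        },
        { role: "user", content: question },
      ],
    }),
  });
  if (!res.ok) throw new Error(`Teacher request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

// Every answer written to the synthetic pool has already been filtered, so a
// model trained on that pool inherits the bias with no visible system prompt.
```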
Just a thing worth considering. I've been spending a lot of time on Artificial Analysis. This is one of the few sites that does a good job of actually giving you real benchmarks for a ton of different things. One of the really interesting things I learned from spending a lot of time in here is about DeepSeek. Obviously, as you look at the performance here (and this is filtered to only show the really good models), it's consistently performing near the top, right neck and neck with Claude for most things. There are certain places where it suffers and isn't quite as good, and obviously the reasoning models like o1 are going to be slightly stronger overall. But the thing I learned about DeepSeek that's been really painful and sad is the speed at which it outputs tokens. I care a lot about speed, more than most AI bros, because I want a good user experience. And if we look here at output speed, you'll see something interesting: o1 mini is 217 tokens per second, GPT-4o mini is 77 tokens per second (which is sus, because we're a lot faster, which we'll get to), but DeepSeek v3? That's 17. When I started using DeepSeek, it was over 90. And this is the only time I've ever seen this, but if you look at the output speed over time, when DeepSeek v3 dropped and you could use it through their API, you consistently got 90 tokens per second, and it has slowly plummeted. Now their API gets about 40. And this is open source; there are other providers, like Together and Fireworks, but none of them are even breaking 30 tokens per second with it. That's insane. A lot of them are actually quite low, like 10 tokens per second, which is not a good experience at all; it just feels terrible. But it can go fast. I think they're just dealing with insane amounts of traffic, and their self-hosted DeepSeek API has gotten slow enough that it's no longer the default in T3 Chat. The other fun thing, and I almost don't know if I want to show this to you guys, is one of the secret-sauce pieces that makes T3 Chat feel so good. We go to 4o mini. OpenAI models can only be hosted in one other place, which is Azure, because Microsoft has a partnership with OpenAI. So if you're not using OpenAI's API, the only other place you can get their models is through Azure. But since, unlike OpenAI, Azure is actually relatively competent at hosting, you end up with meaningfully faster output speeds on Azure, sometimes comically so. It looks like Azure had a huge drop here (maybe their thing for testing it broke or overflowed), but from our experience, we've been consistently in the 180-to-200 range, which is double or more the speed you get from OpenAI. So yeah, if you want to use OpenAI models and have them be a little more reliable and a lot faster, good luck with Azure, because it's not fun, but it works. In the end, I'm blown away. The fact that we can get this level of reasoning, this open and transparent about what it is doing and how it is doing it, consumable and reasonably fairly priced, so much so that we can offer it on our $8-a-month tier for T3 Chat, is so cool. You can now get what is, benchmarks-wise, the best model ever built, and you can do it all for eight bucks a month through here. That's incredible. The fact that we can even offer a price like that without dying, while meanwhile OpenAI is losing money on their $200-a-month subscription, is just insane.
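By the way, if you want to sanity-check output-speed numbers like the ones above yourself, a rough way is to time a completion end to end and divide the completion token count by the elapsed time. This is a sketch assuming an OpenAI-compatible endpoint that reports `usage.completion_tokens`; the endpoint and model id are placeholders, and wall-clock timing includes network and queueing overhead, so treat the result as a ballpark rather than what a benchmark site measures.

```typescript
// Sketch: rough tokens-per-second measurement against an OpenAI-compatible API.
// Assumes the response includes usage.completion_tokens; endpoint and model id
// are placeholders. Wall-clock timing includes network and queueing overhead.

const BASE_URL = "https://api.deepseek.com"; // placeholder endpoint
const API_KEY = process.env.API_KEY ?? "";

async function measureTps(model: string, prompt: string) {
  const start = Date.now();
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      max_tokens: 512,
    }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const data = await res.json();

  const seconds = (Date.now() - start) / 1000;
  const completionTokens = data.usage?.completion_tokens ?? 0;
  console.log(
    `${model}: ${(completionTokens / seconds).toFixed(1)} tokens/sec ` +
      `(${completionTokens} tokens in ${seconds.toFixed(1)}s)`
  );
}

measureTps(
  "deepseek-chat", // placeholder model id
  "Write a 300 word explanation of how video compression works."
).catch(console.error);
```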
And think about what this means for the future, for crazy things like a reasoning model that uses different sub-models for each step, the fact that you can run these things on your phone now, the fact that this is all a race to the bottom in price and a race to the top in quality. It's going to be a crazy year for AI stuff. And that video I did a while back about how AI isn't meaningfully improving is the dumbest thing I've said in a long time. I am genuinely excited for the future. Go give T3 Chat a shot if you want to try these things; we will be going out of our way to keep adding the most state-of-the-art options for you to play with, experiment with, and see the best of the best. This has been a fun dive for me, and I hope it was for you as well. Let me know what you guys think. Until next time. Peace, nerds.