Speaker 1: Hey, I'm Deirdre Bosa, and here on Tech Check, we've heard you loud and clear over the past few days. You watched our Tech Check analysis and our extended interview with Perplexity CEO Aravind Srinivas. And so, as DeepSeek now roils the public markets and rises to the top of the App Store with consumers, we're releasing more content from our investigation into the AI lab and its implications. Here's our extended interview with Benchmark partner Chetan Puttagunta. Keep in mind, it was recorded on January 7th, so that was before DeepSeek released its R1 reasoning model and before it dropped its multimodal model today. But like Aravind, Chetan was extremely intuitive, and his insights are just as valuable today. So, in really simple terms, Chetan, describe the race between China and the US when it comes to AI.
Speaker 2: So this is probably the biggest technology shift that we've seen since the internet. And it's by far the most important technology shift we've seen in Silicon Valley since the early 90s. And to date, Silicon Valley LLM companies have been the leaders here. They've done the most important foundational work. They've done the most important leading frontier work. And it was thought, until recently, that there was a significant advantage to the progress being made here in Silicon Valley versus anywhere else in the world. I think the release of DeepSeek v3 has opened a lot of eyes to what is actually happening in AI in China. I think the algorithmic advances the DeepSeek team has made, and what they've shown with their papers and their technology and their open source model, is that they're making big leaps, especially on the algorithmic side, trying techniques that haven't been tried here in the US. It's really remarkable the amount of progress they've made with as little capital as it's taken them to make that progress.
Speaker 1: Right. I want to get into capital distillation, but what's at stake in the broadest terms? What does it mean to win or dominate the AI race? What does it mean for the economies and the countries at large?
Speaker 2: So just the amount of efficiency and the amount of economic gain that's available if you're leading in the AI race is truly remarkable. The amount of efficiency it brings to corporations is significant. The amount of efficiency it brings to daily life is significant. If you own the models, you own the data, you own the compliance around it, you own the protections around it, you own the privacy around it. As US-based investors, as US-based entrepreneurs, we want the technology, and we want the best technology to originate from here and to be stationed here so that it can reflect the values of Silicon Valley, reflect the values of the United States of America, and how our system likes to conduct things. And it's really, really important that AI is based here for those reasons.
Speaker 1: So when you look at the last few years, how big a moment is DeepSeek and what they were able to do? Like, could you even call it China's ChatGPT moment?
Speaker 2: It could be called that, and the reason is because they were able to basically release what's called a GPT-4 class model, which isn't exactly the frontier model that we're on here in Silicon Valley. It's maybe a half generation behind, but they were able to get to that, based on the research papers and the models they've put out, with basically less than $10 million of spend. That's an extraordinarily efficient run to get to that class of model that we simply haven't seen anybody else do before. It also means that the team that DeepSeek has assembled is extraordinarily talented. The data that they have access to must be really, really good and really clean and really deep. And they've clearly done, based on the research papers they've put out, some really innovative things on the algorithmic side that are frankly world-leading in some ways.
Speaker 1: Right. And when you say they did it for a lot less money, we're talking like pennies on the dollar, right?
Speaker 2: It's about 10x cheaper than what was thought to be possible.
Speaker 1: Do we believe that? You were telling me a little bit about the history, and let's go there, too. What is DeepSeek? It's a Chinese research lab, right? Yes. But where does it come from? How's it funded?
Speaker 2: So the funding of it, the foundation of it, the founding of it has largely not been reported in the press. And everything that you read about it is through forums and maybe Reddit, and so it's all sort of secondhand knowledge. And the secondhand knowledge is that there was a hedge fund with really talented people, and then the hedge fund transitioned to an AI research lab. And it's primarily composed of very talented AI researchers based in China. And all of the researchers there are local talent. And so that's the origin story of DeepSeek. And what has been a benefit for the larger AI community is that DeepSeek has put out all of their research papers and their models as open source. And so we're all able to look at it, we're all able to play with it, consume it, see what kind of techniques they're using. And then perhaps most importantly, we're able to evaluate their results and benchmark them against all the other models that we have. And so how did they actually assemble this talent? How did they assemble all the hardware? How did they assemble the data to do all this? We don't know. It's never been publicized. And hopefully we can learn that. And we just don't know a lot.
Speaker 1: What we do see is the product, the final model, which, as you said, is really competitive. First, how do we know it's competitive? Benchmarks? It's open source, so you can trust... Can you trust the benchmarks? I mean, I had someone say to me recently that everyone kind of fudges their benchmarks.
Speaker 2: Yeah, you can run the benchmarks yourself. So any developer can download the model, host the model. There are a number of providers that provide APIs to DeepSeek, so you can just use the APIs and then evaluate the results of those APIs against the APIs of all the other models that are out there. And so if there's any skepticism about how good the model is, you can benchmark them yourself, because it's freely available today, no matter where you are. And so I think there may be doubts about its proficiency in the generalized sense that these large language models have gotten really good at. And so perhaps it could be deficient in certain areas. But the reason it's so eye-opening and the reason that a lot of researchers have been compelled by what DeepSeek has achieved is that it is a really interesting step in terms of how fast you can get to the frontier, or close to the frontier, with as little capital as has been claimed here. That's an eye-opening equation. And that's why it's perhaps most important. And then because it's open source, we can all look at it, and we can all evaluate these claims. And to date, the claims seem to be validated, like everything that we've seen over the last week or so. Again, this model is new. It's very new. But every data point we've seen in the last week seems to validate that this is a really good model that's jumped in a way that's surprising.
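The do-it-yourself benchmarking described here can be sketched as a tiny eval harness. Everything below is a stand-in: the two "models" are toy functions, whereas in practice each would be a call to a hosted API (DeepSeek's or any other provider's), and the eval set would be a real benchmark rather than three toy questions. The comparison logic, though, is exactly what's described: same questions, same references, score each model independently.

```python
# Minimal sketch of benchmarking two models yourself: score each model's
# answers against a shared reference set and compare accuracy head to head.

def exact_match_accuracy(model, eval_set):
    """Fraction of prompts where the model's answer matches the reference."""
    correct = sum(1 for prompt, ref in eval_set if model(prompt).strip() == ref)
    return correct / len(eval_set)

# Toy eval set: (prompt, reference answer) pairs.
EVAL_SET = [
    ("2 + 2 = ?", "4"),
    ("Capital of France?", "Paris"),
    ("5 * 6 = ?", "30"),
]

# Stand-ins for API calls to two different hosted models.
def model_a(prompt):
    answers = {"2 + 2 = ?": "4", "Capital of France?": "Paris", "5 * 6 = ?": "30"}
    return answers[prompt]

def model_b(prompt):
    answers = {"2 + 2 = ?": "4", "Capital of France?": "Lyon", "5 * 6 = ?": "30"}
    return answers[prompt]

for name, model in [("model_a", model_a), ("model_b", model_b)]:
    print(name, exact_match_accuracy(model, EVAL_SET))
```

Because the weights and APIs are publicly available, anyone can swap the stand-ins above for real API calls and check a provider's claims directly.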
Speaker 1: Right. And just the cost of developing is just totally, again, another paradigm shift. That's right. Explain how they were able to develop it at such a lower cost, 10 times less than our frontier models, and why we believe that too. Can we trust that?
Speaker 2: So what we have seen in the U.S. with open source models, whether it's Llama models or other open source models based here in the U.S., is that developing vertical-specific use cases to frontier capabilities is actually very efficient. So small teams with limited budgets can do it. So they can take a really good big model and use a process called distillation. And what distillation is, is basically you use a very large model to help your small model get smart at the thing that you want it to get smart at. And that's actually very cost efficient, because you can just consume it through APIs. You can just consume it by downloading an open source model that's really good and make your small model very good. And so jumping to the frontier on very specific use cases has been proven to be efficient here in Silicon Valley. And we have a number of teams, and we see a number of teams pursuing those angles already. What DeepSeek did was do it in a very generalized way. That is what was unique and what was a big jump in algorithmic advancement. What they seem to have done, based on their research papers, is used a new architecture on what's called a mixture of experts. One way to build a really big large language model is to use a dense architecture, and you just have one big model doing all the tasks. Another approach is to use mixture of experts, where you break down the model into sub-expert models, basically. And then you create intelligent routing systems, and then you create intelligent evaluation methods, and then iterate on this process. And these expert models produce great answers. What DeepSeek has done, or seems to have done, is they've taken this mixture of experts architecture and really blown it out to limits that other people hadn't tried before, at least haven't tried and succeeded at. And they have. And that's the algorithmic breakthrough that's pretty unique.
And the model is quite performant and has been able to generalize in a way that it was thought would start to break down at that scale, with that many expert models running it.
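The mixture-of-experts routing described above can be sketched in a few lines. This is a toy illustration, not DeepSeek's actual architecture: the gate scores and experts below are fixed toy functions rather than learned networks. But it shows the routing pattern being described, where only the top-k experts run for a given input, which is where the efficiency comes from.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    """Route input x to the top-k experts by gate score, mix their outputs."""
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in top])
    # Only the selected experts actually run -- this sparsity is the
    # efficiency win over a dense model that runs everything for every input.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy "experts": each is specialized as a simple scalar function.
experts = [lambda x: 2 * x, lambda x: x + 10, lambda x: -x]
gate_scores = [1.0, 0.5, -3.0]  # the gate strongly prefers experts 0 and 1

y = moe_forward(3.0, experts, gate_scores, k=2)
print(round(y, 3))
```

In a real MoE transformer, the gate is a learned network producing per-token scores, and the "blown out" scaling the interview refers to means many fine-grained experts with only a small fraction active per token.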
Speaker 1: Is what you're saying at its core that DeepSeek basically trained the model on ChatGPT outputs and copied it, essentially? It seems like it didn't create any of the technology itself.
Speaker 2: It's unclear that that's what they did. It's possible that they used other models to train this model. But it's unclear that was the primary thing that they did. In fact, it looks like they had access to really good data. And they did a bunch of their own training with their own data.
Speaker 1: Data from where?
Speaker 2: Don't know. We have no idea what their data sources were. We don't know how they got all their data. We don't know where the data came from.
Speaker 1: You're saying it was good enough that they didn't just copy something.
Speaker 2: Yeah. It's totally reasonable for AI researchers to take existing models to train new models. So I would assume that DeepSeek probably did some of that.
Speaker 1: I mean, when you ask it, what model are you? It says, I'm ChatGPT-4. Yes.
Speaker 2: Yes.
Speaker 1: What does that tell us?
Speaker 2: It could tell us a lot of things. And it could also tell us that they used parts of ChatGPT to train certain English outputs of it. What's also really interesting is that in the benchmarks, you can see that the DeepSeek model does behave differently than ChatGPT. And so to assume that it's a pure distillation of ChatGPT, I think, is too far of a stretch. I think they've done something that's pretty unique.
Speaker 1: Because they surpassed some of those benchmarks.
Speaker 2: That's right. OK. And I think that ultimately what we don't know is what were their data sources, how did they use that data, and how did they iterate on that data? If an AI lab had done this in the US, we would just ask them. And they would likely publish papers to tell us.
Speaker 1: Does it matter that there's so much mystery and so many questions about who trained it? Or does it really just matter for the generative AI race that what's out there, what it was able to produce?
Speaker 2: I think it's really important, because this technology is so foundational, that broadly the technology, more of this technology is open source than not. Because transparency is really important. We need to understand the weights of these models. How do these models actually weigh different things?
Speaker 1: But that's all public, right?
Speaker 2: Yes. And then the other part of it that's been really important to date is to understand who are the people behind these models. And how do these teams get regulated? How do these teams sit into compliance frameworks of how we run the internet in the United States? And what is allowed, what's not allowed? What kind of outputs will they allow? What kind of outputs will they not allow? What kind of inputs do they allow? What kind of inputs do they not allow? And so the whole regulatory framework we have in the United States around technology and the internet, as it applies to AI, has been quite effective. And there's a lot of discussion of what needs to be modified or improved or anything like that. And that's a really great discussion to have. What becomes challenging is when you don't know the people behind the technology.
Speaker 1: And we know nothing, right? We know nothing. About the people behind what the data was trained on.
Speaker 2: That's right. And we don't know anything. That is part of the race here, which is that as these labs in other parts of the world start to make big gains, we also need to learn from those gains and then explore those techniques. I think what has been eye-opening about DeepSeek and their techniques is that those techniques worked. And what happens in AI is that if you actually establish a proof of existence across a technique, it actually motivates the next generation of AI researchers to go pursue those techniques because now you know it'll work. And it actually makes catching up far more efficient. So what I expect now that we now know that the DeepSeek techniques work is that a lot of the open source companies based here are going to be able to implement those same techniques and catch up very quickly.
Speaker 1: So with the way this is all being developed open source, you no longer have that edge that you can protect, that moat, right? But going back to your point, and I think it's such a good one, part of the race is knowing who's developing it and the way in which it's developed, because there are so many risks, the kind that Ilya and Elon Musk and so on talk about, that are less talked about today, it feels like. But why is that important?
Speaker 2: I think ultimately the way technology has worked in the United States is that as an investor in early stage companies, we sell a lot of software to big corporations. We sell a lot of software to the US government. And there's an element of trust that happens between the customer and the vendor and the entrepreneur and the teams that are consuming the software. When things run into problems, when things become unexplained, you want the team that's standing behind the software to be there and to represent a set of core principles and be pretty aligned in the business objectives of that vendor-customer relationship. AI introduces a whole other layer of complexity because these systems can be unpredictable. They can create lots of systemic errors, and when used and implemented in these corporate settings in ways that aren't supported properly or installed properly, they can cause a lot of issues.
Speaker 1: And so you want the team... Bigger than at the enterprise level, right? There's like humanity level questions.
Speaker 2: Absolutely. And that's why you want to know who are the commercial vendors behind these open source technologies, who are the companies behind them, what are sort of the commercial aspirations here. And a lot of this is today unknown. Pure open source research projects are fantastic. We have a lot of precedent of great technology that's been Apache-based, that's been community-based. And then commercial vendors have taken those open source projects and then packaged them and installed guarantees around them and stuff like that. And so it's possible that with these DeepSeek models as open source, we can have US-based companies wrap sort of the security and compliance around these open source models to help serve companies here. And that's a totally reasonable process, and it could be the path that the DeepSeek team decides to pursue, and that would be great. But it's still TBD on how this is all going to play out.
Speaker 1: Okay. I want to get to what this means for closed source models, but first, how do you describe the difference between H800s and H100s?
Speaker 2: So there are lots of NVIDIA GPUs. In the United States, we get access to the best available GPUs from NVIDIA. Based on regulations, there are different GPUs that are available in China. And there are different GPUs available to certain regions of the world. And some regions of the world don't have access to GPUs, of course. And what is clear is that DeepSeek has trained on a set of GPUs that are different from the GPUs that are available here.
Speaker 1: Do we know that this was trained on less capable GPUs? Because we also know that the AI research lab may at some point have had access to the most advanced chips, even though they shouldn't have; they could have stashed them leading up to the export ban. Can we be confident that they weren't trained on blacklisted H100s?
Speaker 2: I don't know enough to know what they were trained on. I do think that what they have done with the mixture of experts model architecture is that they were able to take whatever hardware they were trained on and use it way more efficiently.
Speaker 1: Let's assume the media reports are correct and they were trained on a less advanced version of GPUs. What does that mean? What are the implications?
Speaker 2: I think it means that as researchers in the US and in Silicon Valley look at what DeepSeek was able to do with DeepSeek v3, it gives us a lot more clarity that we can make a lot more advances with a lot less capital. And so that's now the thing that all the entrepreneurs and VCs are now talking about. I would say that in the last two weeks, AI research teams have really opened their eyes and have become way more ambitious on what's possible with a lot less capital. So previously, to get to the frontier, you'd have to think about hundreds of millions of dollars of investment and perhaps a billion dollars of investment to get there. With DeepSeek v3, and prior to even DeepSeek, what we were seeing with open source models using things like distillation, it was clear that it was going to be more efficient to get to the frontier than that huge hundreds of millions of dollars or a billion dollars number. And then everything that's going on with reasoning and inference that we can get into is also another path where it could become capital efficient to get to the frontier. What DeepSeek has now done here in Silicon Valley is it's opened our eyes to what you can actually accomplish with $10, $15, $20, $30 million. And that actually fits very well into the classic Silicon Valley venture capital model, which is that you try to get a group of really extraordinary people together, put a small amount of capital together, and then go after some innovation that has IP differentiation from an algorithmic perspective versus a dollars perspective. And so what is now shown is that it's possible. It's possible to take GPUs, use really great algorithms, and then create really capable generalized models. We had an instinct of this already. We had seen small teams do stuff like this already, but now this is just a way more ambitious proof point that it's possible and it's a realistic thing to pursue.
Speaker 1: That is a huge shift in the investment paradigm of the last few years, which was: who has the most money to develop the next advanced model? Each model is going to be bigger and better. So is it still a good investment to invest in OpenAI?
Speaker 2: I think OpenAI and the work that they're doing is remarkable. What they've shown with o3 and the benchmarks they put out just a few weeks ago is amazing, especially what they're doing around the benchmarks around software engineering. I think a lot of people, including myself, are very excited to get access to those APIs and to get access to the o3 models through ChatGPT. They are by far, in terms of just developing great products, they've done an extraordinary job. And so obviously their financials aren't public, it's just all rumors, but the numbers seem quite extraordinary. We don't know what the underlying margin structure and all that stuff is, but hopefully OpenAI at some point will disclose those numbers because I think it's an incredible product.
Speaker 1: Sure, and it's been ahead and it's the most technologically advanced, but it feels like the race has shifted a little bit. If you can just go deep at the frontier, when it does put out o3, what's stopping DeepSeek or another Chinese company from replicating that?
Speaker 2: I think what's clear is that AI is way more competitive today than it was even two years ago. Than it was even a few months ago. Sure, absolutely. If you just look at the labs that are based here in Silicon Valley, so if you look at OpenAI, Anthropic, Google, Meta, and the technologies they're putting out and the AI models that they're putting out, they're all extraordinary. If you look at what Google released with Gemini models and their APIs, the cost efficiency of those APIs is amazing. And what that means for developer unlock is access to these intelligent APIs for really pennies on the dollar. And you get to access it on top of Google infrastructure, which is robust and is up all the time. That's a huge unlock for developers. So just in terms of-
Speaker 1: Right. And Google has that whole ecosystem. They're developing their own TPUs, they've got the distribution. OpenAI, it feels like, just has the models, closed source models. What happens to its moat going forward?
Speaker 2: There was a report in Bloomberg, I think it was just yesterday or two days ago, that said that OpenAI is thinking about developing their own silicon. I imagine that OpenAI will become more vertically integrated as things develop.
Speaker 1: I mean, they're looking to develop their own silicon. I mean, talk about a late start compared to Google or Amazon.
Speaker 2: But it's a balance, right? If you just lay out all the players side by side, some people are advantaged in some areas and some people are advantaged in other areas.
Speaker 1: And if this whole model is changing and you can do more with less money, I guess OpenAI can use the billions of dollars it raised to develop silicon versus developing a large language model. So do you think that slowdown is going to happen at all, even with that trade-off? Investors know, like you said, that the venture capital model didn't really work here, and that's why you saw so many of the big mega caps investing in the frontier model companies. So can OpenAI and Anthropic, as two examples, continue? There was just a headline about Anthropic raising, I think, another $2 billion. Can they continue to raise money at the speed that they have when it feels like the competitive edge of their closed source models is eroding?
Speaker 2: I would say that the progress of AI hasn't slowed down. So if you just look at the leaps in intelligence, if you will, as a measurement, if you just look at the functionality that Anthropic, OpenAI, Google, xAI, Meta are releasing, it's clear that the advancement has not slowed down. What is also clear is that-
Speaker 1: Advancement taking on a new kind of definition though, right? Like reasoning and then the inference side versus purely training side?
Speaker 2: So this entire move from o1 to o3, and the chain of thought models inside of Claude, that's all on reasoning and inference time and test time versus pre-training, which is also more capital efficient and better for developers also. Each of these labs is pushing the frontier and gaining advantage there, but what's also clear is that you can catch up to the frontier probably in a couple of months. So you define the frontier and you catch up to the frontier in a quarter or two, but what's been happening to date is that while you catch up to the frontier six months later, the frontier then gets pushed out further. And that's been really cool to watch, which is that every time the frontier seems catchable, the frontier gets pushed out a little bit more. And so we don't see any slowdown of that. And so I think this is where the advantage from the big labs continues to persist, which is that they continue to maintain an advantage on the frontier. And so it's been pretty amazing to watch and they're doing really great work.
Speaker 1: That's interesting. What does it mean for NVIDIA, though, if you see this model that's come out using less advanced chips?
Speaker 2: Well, I think Jensen did his keynote at CES yesterday and it was a spectacular keynote, of course.
Speaker 1: The alligator leather jacket.
Speaker 2: It's amazing. And they have by far the best GPU technology in the market today. The advancements they've made are really spectacular. I think everybody in the ecosystem is really excited for Blackwell to come off the production lines and to get our hands on these systems. We have a couple of preview systems out in the wild in the ecosystem that people have tried and are quite impressed with. And so what we still don't know is what does Blackwell enable for AI research and what does it unlock? We just don't know yet.
Speaker 1: So what you're saying is because of Blackwell, this whole era where we thought pre-training advancements were kind of slowing down or plateaued, if Blackwell is really amazing, we could make advancements we can't predict right now.
Speaker 2: We could. Right now, the plateauing on pre-training is because we've sort of run out of data. And so the next hypothesis some people are working on is can you use the models to generate synthetic data, and that synthetic data goes back into pre-training? We haven't made the necessary advancements to go bet on that yet. And we're still working on it. We have a lot of smart teams across the valley here working on it. And with the number of people that are working on it and the amount of capital against that effort, it's possible that we could get a breakthrough there. If we get a breakthrough there, all these Blackwell systems are going to really push pre-training up again. Now where we are today in models is we're in test time and inference time reasoning. And so that's where you take something that's called a verifier and you pass through the solution over and over again iteratively. And this is actually very powerful for coding. So for coding, reasoning models have proven to be an extraordinary unlock in terms of intelligence capabilities. You just have to look at what AI has been able to progress on coding as a use case. We don't see any slowdown because of reasoning or inference time relative to pre-training in coding, for example. Now there could be other use cases that we unlock where things don't slow down in this intelligence paradigm, but that's still TBD.
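The verifier loop described here can be sketched as a toy program. Everything below is a stand-in: the "verifier" is a simple numeric check (standing in for, say, running unit tests on generated code), and the refinement step is a binary search standing in for a model revising its answer based on verifier feedback. The shape of the loop, propose, verify, refine, repeat, is the test-time iteration being described.

```python
def verifier(candidate, target=144):
    """Stand-in checker: reports whether the candidate passes, and if not,
    in which direction it fails. A real verifier might run a test suite."""
    sq = candidate * candidate
    if sq == target:
        return "pass"
    return "too low" if sq < target else "too high"

def refine_until_pass(lo=0, hi=100, max_iters=20):
    """Iteratively propose a candidate, run the verifier, and use its
    feedback to refine the next proposal until it passes."""
    for _ in range(max_iters):
        candidate = (lo + hi) // 2
        verdict = verifier(candidate)
        if verdict == "pass":
            return candidate
        if verdict == "too low":
            lo = candidate + 1
        else:
            hi = candidate - 1
    return None  # verifier never accepted within the compute budget

print(refine_until_pass())
```

The key property this illustrates is why the approach works so well for coding: when a cheap, reliable verifier exists, you can spend inference-time compute on many propose-and-check passes instead of relying on a single forward pass being right.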
Speaker 1: Let's talk back to this, just to the DeepSeek idea. I like what you said about now that they've been able to do it, you'll have U.S. companies try to replicate that. Has the race just started anew because of DeepSeek? Put it in terms of biggest moments of the last two years for AI.
Speaker 2: Well, ChatGPT is by far the biggest moment. I think it opened up for everybody, when ChatGPT came out, what's essentially possible with these new transformer models. Up till that point, transformer models were largely consumed by developers through APIs. And so we started to see early AI companies in 2022 that were using OpenAI APIs and they were doing really magical things. But it wasn't until ChatGPT that you really started to see amazing consumer applications start to show up. So I would say that is by far the biggest moment in AI development in recent memory. Since that time, there have been key moments where there have been big unlocks. So the launch of GPT-4 was a big step function in intelligence and what these models were capable of. Then again, when we started to go into this reasoning paradigm and when the o-series models came out, and then o3 of course was a really big deal. Everything that Anthropic has been doing with their models has been pretty foundational. What they did with computer use, as an example, to show that these models can actually interact with tools outside of the model to make changes, to help with coding, etc. All of that stuff has been really remarkable moments in AI. And then DeepSeek v3, I think, is also among the list of things that you will think about as important moments in AI. It's certainly one of them because it has shown that algorithms and algorithmic technology in AI can unlock a ton of efficiency. And in AI, when you have a proof point of existence for something, it actually informs all of the researchers that something is possible. And just having the knowledge that something is possible allows you to pursue that path of research because you know you can succeed if you just figure out the right techniques. It's actually part of why the o3 benchmarks are so powerful. You actually now know that, using reasoning, AI models can actually accomplish a lot when it comes to coding.
If you know that's possible, you will pursue a whole lot of techniques to try to match that performance.
Speaker 1: Chetan, thank you so much.
Speaker 2: Yeah, absolutely.