Speaker 1: First time on the show, Scale AI founder and CEO Alexandr Wang. His company provides accurately labeled data to help companies train their AI tools. And back in 2022, he became the youngest self-made billionaire in the world. Pretty amazing.
Speaker 2: Thanks for having me on.
Speaker 1: I want to go straight to what we were just talking about off camera, which is the idea of where the US is on AI versus China, because you have some very surprising statistics that I think will probably, frankly, freak out some of the viewers.
Speaker 2: So, yeah, first of all, the AI race, the AI war between the US and China, is, I think, one of the most important issues of today. We took out a full-page ad in The Washington Post on Tuesday saying that America must win the AI war. And so this relative race in AI between the US and China is critical. Today, we released Humanity's Last Exam, which is a new evaluation, or benchmark, of AI models that we produced by getting math, physics, biology, and chemistry professors to provide the hardest questions they could possibly imagine that are relevant to their recent research, to really put the models to the test. To give you a sense, no model is getting above 10% on this test. That being said, what we found is that DeepSeek, which is the leading Chinese AI lab, their model is actually the top performing, or roughly on par with, the best American models, which are o1 from-
Speaker 1: Okay, so I think we have all been under the impression that the US was way ahead of China as it relates to AI, in large part because we have access to NVIDIA GPUs and chips and other things that supposedly the Chinese do not have. I keep hearing all week from Chinese AI executives who say, well, we're so close. And by the way, we're doing it with one hand tied behind our back. Our algos are better. We're actually going to figure out how to do this better than the US, and in an even more energy-efficient way, because we don't need these super powerful chips. Do they happen to be right?
Speaker 2: There are two things happening. First, it is true, and it has been true for a long time, that the United States has been ahead. That's been true for maybe the past decade. That being said, in a very recent event on Christmas Day, about a month ago, DeepSeek released a model, which, by the way, I think is symbolic: the Chinese lab releases an earth-shattering model on Christmas Day when the rest of us are celebrating the holiday. And they released it to much fanfare. And then they followed up with their reasoning model, DeepSeek R1, which is the one that we evaluated as top of the leaderboard. The reality is yes and no. The Chinese labs have more H100s than people think.
Speaker 1: And these are the highest-powered NVIDIA chips that they were not supposed to have.
Speaker 2: Yes. My understanding is that DeepSeek has about 50,000 H100s, which they can't talk about, obviously, because it is against the export controls that the United States has put in place. I think it is true that they have more chips than people expect, but on a go-forward basis, they are going to be limited by the chip controls and export controls that we have in place.
Speaker 1: How do you, I mean, you work with everybody. So I don't know if it's fair or unfair, but how do you stack rank these large language models? And who ultimately is going to be a winner? Or are they all so close that it gets commoditized?
Speaker 2: The interesting thing that we see right now, and we actually specialize in this, is that we've produced our SEAL evaluations, from our Safety, Evaluations, and Alignment Lab, which measure across many different dimensions: math capabilities, coding capabilities, multilingual capabilities, reasoning capabilities, and many others, including tool use and agent capabilities. And what we see is that different models are better at different things, so it's hard to put a clear stack ranking among all the models. For example, the OpenAI models are extremely good at reasoning, but the Anthropic models might be really good at code, and so there's a diversity of capabilities across the models. That being said, I think what we're seeing in general is that the space is becoming more competitive, not less competitive.
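[Editor's note: the "no clear stack ranking" point above can be made concrete. A minimal sketch, with entirely made-up model names and scores (not real SEAL results): when each model leads on a different dimension, neither Pareto-dominates the other, so no single ranking falls out of the per-dimension scores.

```python
# Hypothetical per-dimension evaluation scores (illustrative numbers
# only, not actual leaderboard data).
scores = {
    "model_a": {"reasoning": 0.92, "coding": 0.78, "multilingual": 0.85},
    "model_b": {"reasoning": 0.81, "coding": 0.90, "multilingual": 0.88},
}

def dominates(a: dict, b: dict) -> bool:
    """True if a is at least as good as b on every dimension
    and strictly better on at least one."""
    return all(a[d] >= b[d] for d in a) and any(a[d] > b[d] for d in a)

a, b = scores["model_a"], scores["model_b"]
# Neither model dominates the other, so any "stack ranking" requires
# choosing weights for the dimensions.
print(dominates(a, b), dominates(b, a))  # False False
```

Model A leads on reasoning while model B leads on coding and multilingual, so a total ordering only exists once you decide how much each capability matters.]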
Speaker 1: I keep hearing from business leaders here that they're all playing around with OpenAI, or they're playing around with Claude, which is the Anthropic model, or they're playing around with Gemini, et cetera. And then they're going and using Llama. They're going to find some open-source version to try to get close to what these other guys are doing, because of the different price points of these things. Do you think that's the future of this? Are we in, like, a Linux world?
Speaker 2: There's definitely a dimension of that. It ultimately comes down to the level of capabilities and intelligence required for your use case. I think what we're going to see, and what we do with all the leading labs, including OpenAI and Google DeepMind and Meta and many others, is continuing to push the frontier and push the boundaries. And so how do we leverage data? Given that, as an industry, we've essentially run out of publicly available data, how do we generate new data to keep pushing the frontier? Our belief is that advanced capabilities are going to enable incredible use cases where you're going to be willing to pay for those increased capabilities, but the more simplistic use cases will probably go more toward open-source or more basic models.
Speaker 1: We've been talking all morning about Stargate and the debate happening on Twitter between Sam Altman and Elon Musk about whether they really have $100 billion or $500 billion. Satya Nadella was sitting in your chair just yesterday saying he's got $80 billion, his money's real. He took to Twitter. What do you make of all this, all these players?
Speaker 2: You know, so much is on Twitter anyway. So I'm not sure I have...
Speaker 1: Or X, we should say.
Speaker 2: Or yeah, X. But I mean, I think one thing that is very real, regardless of Stargate specifically as a program, is that the United States is going to need a huge amount of computational capacity and a huge amount of infrastructure. We actually wrote a letter to the Trump administration with recommendations on how to ensure that the U.S. stays ahead, and one of them was really around infrastructure: we need to unleash U.S. energy to enable this AI boom. And that's clearly what we're seeing right now, which is, in addition to the Stargate program, many of the major AI companies and major clouds are going to be looking to build giant data centers. So...
Speaker 1: The reason I ask this, though, about the different companies doing this: do you ultimately think we need five, six, seven companies all trying to build frontier models? I mean, there's been talk forever that, in a different world, if Lina Khan hadn't been running the FTC, what if Amazon had wanted to buy Anthropic, for example? Or what if Microsoft had bought OpenAI? Then there wouldn't be as many players all competing against each other in the same way. I don't know, maybe you think the competition's great. I just don't know, long-term, how many models there ultimately will be.
Speaker 2: I mean, our view is actually that this is potentially going to be one of the greatest markets, one of the greatest industries, ever. Right now, let's say there's between $10 and $20 billion of LLM-based revenue. If you believe that we're actually on a track toward superintelligence or AGI, then it stands to reason that that's going to go to a trillion dollars or more of revenue. And so you're looking at a market that's going to go from, let's say, $10 billion to $1 trillion over who knows how many years. I tend to believe it's a smaller number of years. I think we're in the two-to-four range. Two to four years to get to AGI.
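[Editor's note: the growth claim above can be sanity-checked with back-of-the-envelope arithmetic. Going from roughly $10 billion to $1 trillion is a 100x increase, so over two to four years revenue would need to multiply by about 10x or about 3.2x per year, respectively. This calculation is the editor's, not a figure from the interview.

```python
# Annual growth multiple needed to go from ~$10B of LLM revenue to
# ~$1T over n years: (end / start) ** (1 / n).
start_billions = 10
end_billions = 1000

for years in (2, 3, 4):
    multiple = (end_billions / start_billions) ** (1 / years)
    print(f"{years} years -> {multiple:.2f}x per year")
# 2 years -> 10.00x per year
# 3 years -> 4.64x per year
# 4 years -> 3.16x per year
```
]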
Speaker 1: And what's your version of AGI?
Speaker 2: I think, obviously, there are many definitions. The definition I believe in is powerful AI systems that are able to use a computer just like you or I can, that can use all the tools a computer offers, and that can basically be a remote worker in the most capable way.