Daily AI News: Google DeepMind's Gemini Updates, NVIDIA's Llama-3.1 Nemotron, and More
Join Paul Baier and team as they discuss Google DeepMind's updated Gemini models, NVIDIA's Llama-3.1-based Nemotron model, and other significant AI developments. Essential insights for AI leaders.
092524 NVIDIAs new Llama-3.1-based model pushes the limits - AGAIN AI News by GAI Insights
Added on 10/01/2024

Speaker 1: Yes, this is coming up. Hello everyone, I'm Paul Baier and welcome to our daily AI news. We are in a skeleton-crew situation today, but we soldier on, because the news never stops in AI land. So today we have Luda and Ankaj holding down the fort. Our colleague John Sviokla and others are on planes. Adam is flying from Australia to the U.S. to prepare for a conference coming up in a week or two and to see some family. And our colleague John is in Detroit with some customers. So lots of news as usual. Ankaj, do you want to kick us off with the first one, please?

Speaker 2: Yeah, sure. So the first one we have for today comes from Google DeepMind, and it's around their updated production-ready Gemini models. They have reduced the pricing for 1.5 Pro, increased rate limits, and more. So Luda, what do you think? No one's increasing pricing.

Speaker 3: I know. This is fascinating, isn't it? It is the second price decrease that we're seeing in the last couple of weeks, right? So I actually think it is important news. The announcement has a number of features that Google is introducing, and Gemini is now almost competitive with everything else. So I feel that this news is important. What do you think, Paul?

Speaker 1: Yeah, I mean, the thing I don't have enough data on: I would like to understand and ask more companies how much of this is a factor of Llama 3 being out there and putting pressure on pricing, or is it really just bellwether OpenAI continuing to lower their prices, and as they do, everyone else lowers as well. As we saw in our briefing with OpenAI and some of their announcements, they're lowering prices as well. But yeah, I think it's a new and improved model and it's cheaper, so I think it's great for the AI leader, and important just because of who they are. I think it's important for people to understand the trends, because the trends as we move into 2025 budgeting are going to be material. We're dealing with more companies starting to think about total cost of ownership. We're seeing more companies seeing pickup on usage, so inference costs are coming up, and all these trends are going in a way that benefits AI leaders.

Speaker 3: Yeah. So the notable features of this are 2x faster output and 3x lower latency. And we will be talking about the NVIDIA model later on, right? That's another pressure point as well. So to stay competitive, you need to be looking at inference costs and latency; all of those factors need to be considered. Also, as NVIDIA is demonstrating, there is this router feature being used at inference time, so you could use whichever model is generating the better quality and cost combination. And as a result, Gemini 1.5 Pro will now be in the running, right, for the inference side. So I think all of those factors are putting pressure on the providers to get the cost down. Great. So, important. Next one, Ankaj, please.
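A minimal sketch of the inference-time router idea described above: route each request to whichever model currently offers acceptable quality at the lowest cost. The model names, prices, and quality scores are illustrative assumptions, not any provider's published figures or API.

```python
from dataclasses import dataclass

@dataclass
class ModelOption:
    name: str                    # hypothetical model label, not a real endpoint
    cost_per_1k_tokens: float    # assumed price, for illustration only
    quality_score: float         # e.g. an offline eval score for this task family

def route(options: list[ModelOption], quality_floor: float = 0.85) -> ModelOption:
    """Pick the cheapest model whose measured quality clears the floor;
    if none does, fall back to the highest-quality model."""
    eligible = [m for m in options if m.quality_score >= quality_floor]
    if not eligible:
        return max(options, key=lambda m: m.quality_score)
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)

# Hypothetical catalog: the numbers are placeholders.
catalog = [
    ModelOption("frontier-model", cost_per_1k_tokens=0.00500, quality_score=0.95),
    ModelOption("mid-size-model", cost_per_1k_tokens=0.00125, quality_score=0.88),
    ModelOption("small-open-model", cost_per_1k_tokens=0.00020, quality_score=0.74),
]

print(route(catalog).name)  # -> "mid-size-model"
```

The same shape applies whatever sits in the catalog; the point is that cheaper, faster models win routing share as soon as their quality scores clear the bar.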

Speaker 2: The next one comes from Ben Thompson, and he talks about enterprise philosophy and the first wave of AI.

Speaker 1: Yeah. So this is a long article. If you scroll down... I have this as optional, Luda, but I do think there are some interesting things. Ben, if you scroll all the way down, kind of walks through manual work, automation, green screens. I think near the end, he starts talking about agents and... keep going. This one right here: what will this intelligence infrastructure stack look like? I think that's a really interesting one. We've been calling it the enterprise intelligence application, and I'm personally starting to believe that, just as we saw with client-server and mainframe, we're starting to enter a 20-year period of redoing the entire corporate stack with AI-native things. It's going to take a long time. But I think that's the interesting discussion. Still, it's a long article, it's speculative, we don't know where it goes, and I don't think it's worth the time for our AI leader. So, Luda, I had it as optional.

Speaker 3: Yeah, I think so. It's an interesting read. It puts historical shifts and the current wave of AI in perspective. So for people who are interested in these mega movements and want to step back, this is an interesting way to step back and think about it. It's an interesting article, but I agree, optional. It is interesting looking at the old

Speaker 1: pictures, because automation did happen, and we kind of forget that world. We're so enamored with our current technology that we don't realize we're all sitting on aging technology every single day, because this stuff just keeps getting faster, cheaper, better every single day. Yeah. So, an optional. Next one, Ankaj.

Speaker 2: The next one is a research paper. It's a survey on retrieval-augmented generation, and it also talks about how LLMs can use external data more wisely. Luda, what are your thoughts?

Speaker 3: I actually thought that we should put it as important. It's the type of survey that gives you advice on what is working and what is not with data. The authors take the various queries and put them into four blocks: explicit facts, implicit facts, and the like. And then they talk about how the various models and techniques actually work at getting the answer better, with quality. I think this is very useful. So if you go to the PDF, Ankaj, and scroll to page 16, I think, at the top, that gives you a visual picture of those various methods. Yeah, right here. Right. So it breaks it down by whether you are after explicit facts, implicit facts, interpretable rationales, or hidden rationales, and it proposes, or assesses, what you can use and what is better. So obviously, when you're looking for a rationale, you can use prompt tuning, or you can build in more information about the knowledge, and the like. So it is an interesting article, and I think people will get value out of it. So I rated it as important.
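As a rough illustration of the survey's framing, here is a hedged sketch that maps the four query categories Luda mentions (explicit facts, implicit facts, interpretable rationales, hidden rationales) to a handling strategy. The strategy descriptions are a loose paraphrase for illustration, not the paper's implementation.

```python
from enum import Enum

class QueryType(Enum):
    EXPLICIT_FACT = "explicit fact"            # answer is stated directly in the data
    IMPLICIT_FACT = "implicit fact"            # answer requires combining several pieces of data
    INTERPRETABLE_RATIONALE = "interpretable"  # answer requires domain rules that are written down
    HIDDEN_RATIONALE = "hidden"                # answer requires reasoning patterns not written down

# Illustrative mapping from query type to technique; the labels are placeholders,
# loosely following the survey's discussion.
STRATEGY = {
    QueryType.EXPLICIT_FACT: "basic RAG: chunk, embed, retrieve top-k passages",
    QueryType.IMPLICIT_FACT: "multi-hop / iterative retrieval across several chunks",
    QueryType.INTERPRETABLE_RATIONALE: "prompt tuning with the domain guidelines in context",
    QueryType.HIDDEN_RATIONALE: "fine-tuning so the model absorbs the reasoning itself",
}

def plan(query_type: QueryType) -> str:
    """Return the suggested handling strategy for an already-classified query."""
    return STRATEGY[query_type]

print(plan(QueryType.IMPLICIT_FACT))
```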

Speaker 1: Yeah, it's interesting. I mean, we've said since the very beginning that companies want "my ChatGPT with my data." And it's interesting that we're going through these product briefings with vendors for a buyer's guide that's coming out in November, and just about every single one of them has, or has on the roadmap, easier and better ways for companies to get their own data, PDFs, PowerPoints, into some type of intelligence application to get more value out of it. And I think they're all struggling with the same things. How do you get value out of embedded images that have numbers inside them? How do you get value out of tables? And I think this actually takes it to the next level of really understanding some of the nuances of extracting insights out of that, not just data.

Speaker 3: Right. And it provides the various companies and methods that are out there, with references and the like. So if you are optimizing something in your company, this is a very useful article to read.

Speaker 1: Great. So important it is. We talk about 85 to 90 percent of all corporate data being unstructured, and how Gen AI lights up value and allows you to have a conversation with unstructured data. And this is an improved technique for that. So, important. Next one, Ankaj.

Speaker 2: The next one comes from the MIT Sloan School of Management, and it talks about how different companies are using Gen AI to execute with speed.

Speaker 1: Yeah, this was assigned to me. I thought it was an interesting summary of a lot of use cases we've talked about. I didn't find it particularly novel, and I had it as optional. I'm sure the AI leader who's paying attention isn't going to see anything particularly new here.

Speaker 3: It's short and sweet, though, Paul, and it's very large companies: what was done, how much data and how many documents were loaded into the tool. I don't know, I was thinking maybe important, and a back-pocket item for supporting whatever you're doing, because the message is essentially that anybody and everybody could use it. That's kind of the message, I think. Well, and I think the

Speaker 1: back-pocket view: the AI leaders need to be aware of the stories of what other big companies are doing, and this fits in with that. And it is just case-study centric and not speeds-and-feeds centric here. So great, let's go with important there. Next one, Ankaj.

Speaker 2: The next one comes from Microsoft. In one of its new releases, Microsoft is claiming that it has a new tool which can correct AI hallucinations.

Speaker 1: Before you jump in there, Luda, Vivek is agreeing with you on the last article. So thank you, Vivek. Yeah. So how about this one, Luda? Is this one yours or mine?

Speaker 3: Yeah, it's mine. So this is Microsoft, and they released a tool that's called Correction. This is nothing new; we've talked about having another model evaluate what the LLM gave you as an answer and then either correct it or mark the answer as not quite right. So I rate it as optional, and I agree with the TechCrunch piece here, which raises all kinds of opinions questioning whether this will work or not, because essentially, to do the correction, they are asking the authors to put in the body of knowledge to check against, or Google, or whatever. And some people point out that the correction model can also hallucinate. It is, unfortunately, not a tool that is going to be used very much. So I put this as optional.
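To picture the pattern being discussed, here is a minimal, hypothetical sketch of such a correction loop: a second checker model judges each sentence of a draft answer against the grounding documents the author supplies and flags what it cannot support. The `supported_by` function below is a naive stand-in so the sketch runs; it is not Microsoft's Correction API, and, as noted above, a real checker model can itself hallucinate.

```python
def supported_by(claim: str, grounding: str) -> bool:
    """Naive stand-in for a checker model (in practice an LLM or NLI-style classifier)
    that judges whether `claim` is supported by `grounding`."""
    return all(word.lower() in grounding.lower() for word in claim.split() if len(word) > 4)

def check_answer(draft_answer: str, grounding_docs: list[str]) -> dict:
    """Flag any sentence of the draft that the grounding documents do not support."""
    grounding = " ".join(grounding_docs)
    flagged = [s.strip() for s in draft_answer.split(".")
               if s.strip() and not supported_by(s.strip(), grounding)]
    # A real tool might rewrite flagged sentences; here we only report them.
    return {"ok": not flagged, "flagged_sentences": flagged}

docs = ["Revenue for fiscal 2024 was four billion dollars."]
draft = "Revenue for fiscal 2024 was four billion dollars. Headcount doubled last year."
print(check_answer(draft, docs))
# -> {'ok': False, 'flagged_sentences': ['Headcount doubled last year']}
```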

Speaker 1: So do three hallucinators working together get to the truth? Maybe. All right, optional. Hello, John, from Michigan. Welcome. OK, optional on this one. Last one, Ankaj, please.

Speaker 2: The last one comes from NVIDIA, and it talks about how NVIDIA has fine-tuned Llama 3.1 to make the Llama-3.1-Nemotron 51-billion-parameter model.

Speaker 1: This one was mine to analyze. I mean, I find it really interesting. As John alluded to, this verticalization war that's going on is massive. We have watched Meta invest a billion-plus in coming out with, you know, quote, open-source or free-download models that other parts of the ecosystem are taking and optimizing. And this is what we're watching: the number one chip maker, by far, taking it, sticking it on their own chips, optimizing it, and giving AI leaders something that could be way more efficient on throughput than what they might get from some of the other competitors out there. So I had it as optional, but I think it fits in with the speeds-and-feeds trend we're seeing out there. Maybe it's floating up to important.

Speaker 3: But I almost think it might be essential, Paul, because this can fit on one chip, right? So they're using this notion of the router they introduced, right? And they're changing the architecture, optimizing for memory footprint, for speed of inference, for accuracy. This is what is going to change the dynamics of all these hallucinations and the like over time, right? Because they can now affect all of those parameters in real time. So I think important to essential in my book. John will probably...

Speaker 4: Yeah, I think I lean more toward essential, you know, because if you go back to the article for a second and look at the performance curve, this is basically off the performance curve in a good way. Early in the article, it shows the tradeoff between size and performance. I think it's earlier, before this. Anyway, I think their optimization... The other thing is that, you know, they've got a business model here where they're making a ton selling chips, so they can give the model away. Yeah, that one. Sorry, the efficiency frontier. So you look at accuracy and throughput. I mean, they're creating a new performance curve with a smaller model and a new training method. So I think it's essential, especially given how powerful it is.
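A small sketch of the "efficiency frontier" idea John points to: given models plotted by accuracy and throughput, the frontier is the set of models no other model matches or beats on both axes at once. All numbers are made-up placeholders, not NVIDIA's published benchmarks.

```python
# Hypothetical models with made-up accuracy and throughput numbers.
models = {
    "model-A-70B": {"accuracy": 0.86, "throughput_tok_s": 900},
    "model-B-51B": {"accuracy": 0.85, "throughput_tok_s": 2000},
    "model-C-8B": {"accuracy": 0.74, "throughput_tok_s": 5200},
    "model-D-34B": {"accuracy": 0.80, "throughput_tok_s": 1500},
}

def efficiency_frontier(candidates: dict) -> list[str]:
    """Return the models that no other model matches or beats on BOTH axes."""
    frontier = []
    for name, m in candidates.items():
        dominated = any(
            other is not m
            and other["accuracy"] >= m["accuracy"]
            and other["throughput_tok_s"] >= m["throughput_tok_s"]
            for other in candidates.values()
        )
        if not dominated:
            frontier.append(name)
    return frontier

# model-D-34B is dominated by model-B-51B (worse on both axes), so it falls off the frontier.
print(efficiency_frontier(models))  # -> ['model-A-70B', 'model-B-51B', 'model-C-8B']
```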

Speaker 1: Yeah, I think it's really interesting, John. I mean, we haven't done a briefing with them; I know we've got one of their representatives speaking at our conference. These guys are absolutely going for a big part of the stack. You know, they're not saying, we're just going to be chip players and stay in our layer and stay in our box and not provide wildly innovative, high-value solutions for AI leaders. And I think we're going to see it. I think we're going to see a market-share picture that might be very different three years from now in terms of what types of LLMs are driving a lot of the share out there.

Speaker 3: Right. And they are working on, they have the money, they're working on vertical integration, right, with all of those open-source models that are out there. And if you scroll down to the accuracy, that is quite stunning, actually, the results that they achieved. Keep going, keep going... here. So on all these benchmarks, with a smaller model, they're beating the results of other models. So this is quite significant. And they're not done yet; it's a work in progress, it's optimization. Here's Microsoft coming out with Correction, right, versus this; this is way, way more advanced, and they're doing it by changing the real-time architecture, the real-time inferencing model architecture. And they can, because they're in control and they're optimizing it to the hardware. So this is actually going to get even better. That's great. So, Luda, you're arguing essential with John. I put it as important, but I can certainly support essential.

Speaker 1: John, are you an essential or an important? Essential. OK, essential it is. NVIDIA continues to be such an important one. And the other thing that is becoming really clear to me, as we spend more time thinking about the whole total-cost-of-ownership stuff here: it's not clear to me at all that the basic use cases for employees three years from now are going to be much different than they are now. Employees are going to need to summarize PDFs and, you know, create job descriptions and everything else three years from now. And if there's a way for me to do that in a model that's good enough, that's way more efficient and cost-effective. This fits the performance-price theme here, the good enough, if not superior. The commoditization of the tools that hit basic use cases is happening at a breathtaking pace.

Speaker 4: Absolutely, yes, I agree with that. Yeah, the deflation of cost is going to continue. Great. We had it. Sorry.

Speaker 3: And it's coming. I keep harping on this inference side, right, at runtime. I think we have not seen anything yet from NVIDIA and other players, because that's where the game will be played, I think, with the router and the like. So other models will play at real time.

Speaker 1: They will compete on cost and accuracy. One of the things that... go ahead, John.

Speaker 4: I was going to say, it reminded me of... if you think of this whole activity as a semantic operating system: back when operating systems were going back and forth, you had a lot of investment in the speed of the chips. Then all of a sudden it went to the bus architecture. Then it went to DASD, you know, how much solid-state memory you have versus disk drives, and then to making the disk drives faster. So if you think of training and inference as two parts of a semantic operating system, basically we're shifting over to optimizing inference now.

Speaker 1: Yeah. One of the things that we're doing on the buyer's guide is the team is looking at overall VC investment in AI in 2023 and how that compares with investment now. It looks like institutional investment was about 20 billion last year, and it might be about 30 billion this year, so it's still increasing. We're also trying to get estimates of what the big tech firms are doing, and that's probably another 10 to 20 billion; I imagine that's also increasing. So we've got a massive amount of capital continuing to go into this kind of experimentation and innovation, at all layers of the stack. So one essential and a couple of importants, and that's the news for the day. John and Luda, any other highlights? No, just apologies for getting in late. No problem, John, safe travels. So welcome. We'll see a number of you in the next two weeks at our conference. Tickets are selling fast; we may actually sell out this week, and if not, we'll quite surely sell out early next week. A lot of excitement around that. And we continue to learn a lot on all the product briefings we're doing for our buyer's guide, which is slated right now to ship at 9 a.m. on November 12th, a couple of months from now. So thanks, everyone. Have a wonderful day.

Speaker 3: I want to thank the people who are listening. And I would like to encourage them to use the comments and tell us what they think of what they're listening to, whether it's right or wrong, whether they agree or disagree. That would be very helpful.

Speaker 1: Thank you. Yeah, Vivek, Irem, and others, thank you for joining. And we continue to find other listeners who don't comment, but do tell us when we meet in person that they listen to this, either synchronously or asynchronously. So it's always great to get feedback. But thanks, everyone. Have a fabulous, fabulous day. Thanks, everybody. Thank you.
