DeepSeek vs Gemini and GPT-4: Search Capabilities Compared
Explore how DeepSeek's search features stack up against Gemini and GPT-4 in accessing and processing real-time information efficiently.
DeepSeek vs. Gemini Flash vs. GPT-4 AI Search Showdown
Added on 01/29/2025
Speaker 1: DeepSeek added search capabilities to R1, so now it can access real-time information. But how good is this search compared to the other offerings? Gemini can access the internet through its grounding feature, and ChatGPT can use search with GPT-4o. Let's see how DeepSeek's search compares. For Gemini, I'm going to be using Gemini Flash Experimental, because it can ground its answers in Google Search; unfortunately, the thinking model does not have search access yet. Similarly, for ChatGPT, the o1 reasoning model can't do search yet. If you select o1, search is unavailable. That's why we're comparing against GPT-4o.

First, I'm going to ask about something pretty new: let's see if the models can find information about Trae AI. We're going to run the same query through all three models. If you're not familiar with Trae AI, this is a new AI IDE from ByteDance, the parent company of TikTok. I recently created a video on it; the link is in the video description. You can use it for free, and it's a really good alternative to something like Cursor or Windsurf.

First, let's look at GPT-4o. It says Trae AI is not a widely recognized term or entity, and that if I'm referring to a specific concept, organization, or technology named Trae AI, it might be a new or niche product, service, or framework. Since ChatGPT uses Bing for search, it seems Bing hasn't indexed Trae AI yet. Google Gemini, on the other hand, says Trae AI is a new adaptive AI integrated development environment created by ByteDance, the parent company of TikTok, designed to help developers write code faster and more efficiently through an AI-powered assistant, and it lists the other features. It is based on VS Code, so this is pretty great. It also says it's multi-platform; I have only tested it on macOS, and it says the Windows version is under development, which seems to be true.

Now let's look at DeepSeek. It took about seven seconds, and for some reason the internal thinking is in Chinese, but the answer is: Trae AI is an AI-powered integrated development environment launched by ByteDance, designed to enhance programming efficiency through advanced AI models like Claude 3.5 and GPT-4o. Those are the only two models it currently supports, so this is pretty good. It mentions the Builder mode, which I tested in my video, and all the other features it describes seem to be accurate. It even listed the potential competitors, Cursor and Windsurf, and traditional IDEs like VS Code (it is based on VS Code). So it seems like both Gemini and DeepSeek are pretty good when it comes to web search.

Next, I'm going to run this query: how much VRAM do the new RTX GPUs have? Create a table and compare it with the previous generation. Notice that I am not telling it what the current generation is, so it has to actually look on the web to figure out what the current generation is and then work out the previous one. Let's see if it can figure this out. I'm going to run the same query through all three LLMs. We're going to start with Gemini this time. It says: here's a table comparing the VRAM of the current-generation RTX 40 series and the previous-generation RTX 30 series. This is pretty surprising, because I was expecting Gemini to have up-to-date information.
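[Editor's aside: if you want to reproduce this kind of side-by-side comparison programmatically, here is a minimal sketch. It assumes the openai and google-genai Python packages, API keys set in the environment or filled into the placeholders, and model names that were current at the time; none of these raw API calls perform web search on their own (more on API-side search at the end).]

```python
# Minimal sketch: send the same prompt to all three models for a
# side-by-side comparison. Model IDs are assumptions and may change.
from openai import OpenAI
from google import genai

PROMPT = "What is Trae AI?"

# GPT-4o via the OpenAI API (reads OPENAI_API_KEY from the environment).
# Note: web search happens in the ChatGPT app, not through this endpoint.
openai_client = OpenAI()
gpt = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": PROMPT}],
)
print("GPT-4o:", gpt.choices[0].message.content)

# Gemini Flash via the google-genai SDK (reads GEMINI_API_KEY).
gemini_client = genai.Client()
gem = gemini_client.models.generate_content(
    model="gemini-2.0-flash-exp", contents=PROMPT
)
print("Gemini:", gem.text)

# DeepSeek exposes an OpenAI-compatible endpoint; its web search is
# only available in the web UI, not through this API.
deepseek_client = OpenAI(
    api_key="YOUR_DEEPSEEK_KEY", base_url="https://api.deepseek.com"
)
ds = deepseek_client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": PROMPT}],
)
print("DeepSeek:", ds.choices[0].message.content)
```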
Back to the results: Gemini is saying the current generation is the 40 series instead of the 50 series. This is extremely surprising and, I would say, a bit concerning. Not a good start. ChatGPT, on the other hand, says: here's a comparison of the VRAM specifications of Nvidia's latest RTX 50 series GPUs. Let's see: 5090, 5080, 5070, and it also lists the 5060, which I don't think has been released yet, but the VRAM figures it lists seem to be correct. There is a 40 series table too, and those values also seem correct. In terms of actual sources, though, it's just using The Verge.

And here are the results from DeepSeek. It says: here's a detailed comparison of the VRAM specifications of Nvidia's RTX 50 series. This one provides a lot more detail: the memory bus, the memory type, the bandwidth, and the values seem to be correct. It also lists up to the 5060, and the same information is given for the RTX 40 series as well. So this is pretty neat. Here's the actual internal thought process. It says the user is asking about the VRAM of the new RTX GPUs and wants a table comparing them to the previous generation, so first it will look for mentions of the RTX 50 series and their VRAM specifications. Then it actually visits individual websites and extracts information from them. Let's see the references: there's gadgets.com, ndkings.com, and an official page from Nvidia.

I think one way to compare these different systems is the quality of the websites they use. It's using pretty good references here, but if you were to ask something more technical, it becomes very important to look at the references being used. For that, I'm going to ask each model to write a report on test-time scaling and the rise of reasoning models, including references. Let's see whether it picks academic references or something like medium.com.

We're going to start with Gemini, and the first thing I want to look at is the references it's using. It's mostly using academic papers, which is pretty good. For example, it cites "Attention Is All You Need", which is where everything started, and "Language Models are Few-Shot Learners", which I think is from OpenAI. One problem is that it's using pretty old papers, nothing recent at all. And I think it's important to look at the report itself: first the introduction, then test-time scaling challenges, computation costs, latency, energy consumption, and scalability. Most of these are really old approaches, so it's not really covering the latest techniques, for example what DeepSeek R1 is using or what o1 is supposedly using.

For ChatGPT, again, it's mainly relying on academic papers and blog posts. For example, here's a blog post from OpenAI and another one from Anthropic, and these two are pretty recent, so this is really good. Now let's look at the references used by DeepSeek. Again, these are mostly academic papers, but if you look at the dates, they are much more recent; I think one of them is an ACL paper on benchmarking temporal reasoning for large language models. So, a lot more recent work. It's also looking at a couple of blog posts; one is hosted on GitHub, but I still think it's really high quality. But let's actually look at the reports themselves.
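[Editor's aside: before the reports are compared, it may help to see the core idea of test-time scaling in miniature. Instead of spending compute only at training time, you spend extra compute at inference, for example by sampling several independent reasoning paths and taking a majority vote over the final answers (self-consistency). The sketch below stubs out the model call with a hypothetical sample_answer function; in practice it would be a temperature-sampled LLM call.]

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical stand-in for one sampled chain-of-thought completion.

    In practice this would call an LLM with temperature > 0 and extract
    the final answer from the sampled reasoning trace.
    """
    return random.choice(["42", "42", "41"])  # individual samples are noisy

def self_consistency(question: str, n_samples: int = 16) -> str:
    """More samples = more inference-time compute = (usually) higher accuracy."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    winner, count = Counter(answers).most_common(1)[0]
    print(f"majority answer: {winner} ({count}/{n_samples} votes)")
    return winner

self_consistency("What is 6 * 7?")
```

[The vote only helps on tasks with a single verifiable final answer; reasoning models like o1 and R1 go further by refining a single long reasoning trace rather than just voting.]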
For the reports, I'm going to just compare ChatGPT against the DeepSeek model, because Gemini, for some reason, seems to be drawing on very old information. Under "understanding test-time scaling", ChatGPT says test-time scaling refers to the ability of models to dynamically adjust their computational resources during inference, and that this scalability allows for enhanced performance, flexibility, and efficiency. It doesn't really explain how this is done; there's some information related to RAG and chain-of-thought prompting, but I think it's missing the main techniques that are actually used.

DeepSeek, on the other hand, says test-time scaling, also termed inference-time scaling, represents a paradigm shift in AI where computational resources are allocated during problem-solving rather than solely during training. This approach enables large language models to "think longer" by iteratively refining their reasoning process, leading to improved accuracy on complex tasks like mathematics, coding, and logical problems, and the rise of reasoning models exemplifies this trend, combining system-two-style deliberate reasoning with advanced strategies. This definitely seems a lot more accurate than what the other models were suggesting. I can actually see myself using DeepSeek for tasks like these.

There is one surprising thing, though: the references listed here are not in order, and it seems like it uses some references during the thinking process and then discards them when generating the final response. Here's the thought process. It says: okay, I'll write a report on test-time scaling and the rise of reasoning models, including references; let me start by going through the provided search results to gather relevant information. And you can see that it actually refers to individual web pages, extracts information from them, and combines it to generate the final result. GPT-4o cannot do that, because it doesn't use test-time scaling the way o1 does; but unfortunately, o1 doesn't have internet access yet, and Gemini's thinking model has the same limitation. So they cannot really reason about the information they're retrieving, and I think that's why you sometimes see that the responses are not that great.

Next, I asked: what is the difference between system one and system two thinking, and at what level does OpenAI's Operator sit? DeepSeek found about 46 results during the web search and then went through its thinking process to generate the response. ChatGPT gave me a definition of system one and system two thinking, but I don't think it's aware of the OpenAI Operator feature: it simply says OpenAI models like GPT-4o operate closer to system one thinking, though they can mimic aspects of system two thinking. So it doesn't really know about OpenAI Operator yet. Gemini, I think, has a similar issue. It gives me the definitions of both and a pretty nice summary in the form of a table, which is neat. But then, under "where does OpenAI's Operator fit?", it considers where an OpenAI model, like a large language model, fits into this framework, and I think it assumes we're referring to the o1 models. So it doesn't know about the Operator feature yet either, and it's not using web search here. Now here's the response from DeepSeek; again, it's citing the same reference.
That's because Daniel Kahneman came up with the concept of system one and system two thinking. We also get a pretty nice, concise table summarizing the differences between the two. For some reason, though, it also thinks Operator refers to the o1 series. It does use the internet, but specifically for system one and system two thinking rather than for OpenAI Operator, so I might actually have to tell it explicitly to look up Operator.

In a couple of cases, I saw that Gemini and even GPT-4o just decided to use their internal knowledge rather than looking up information on the web. DeepSeek, on the other hand, consistently searches the internet and then formulates a response based on that. So from my experiments, DeepSeek seems to have the most up-to-date knowledge, and the results it generates are much better than Gemini's and GPT-4o's. Unfortunately, DeepSeek's search feature is not available through the API; you can only access it on their web interface. In contrast, you can enable grounding through Google Search on the Gemini API, which is pretty neat. GPT-4o also doesn't give you web search through the API. So as a developer, if you need web search, the Gemini API is still a good option; if you want to use GPT-4o or DeepSeek, you will need an external API provider to do the web search for you. You can also upload multiple files to DeepSeek, and it seems to be able to do visual understanding, which is pretty neat; I'll test that in another video. Anyways, I hope you found this video useful. Thanks for watching and, as always, see you in the next one.
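[Editor's addendum: for reference, here is a minimal sketch of the Google Search grounding on the Gemini API mentioned above, using the google-genai Python SDK. The model ID and tool spelling are assumptions that may change between SDK versions; check the current Gemini API docs.]

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_KEY")

# Enable grounding with Google Search so answers can draw on fresh web results.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="How much VRAM do the new RTX GPUs have?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)

# Grounding metadata (the source URLs backing the answer) is attached
# to the response candidates, which is how you audit reference quality.
print(response.candidates[0].grounding_metadata)
```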
