Live Test: R1 Outperforms Google in Critical Task
Discover how DeepSeek R1 and Google's AI models navigate a complex logic test. Witness R1 surpass Google's model and both face challenges defining a consistent solution.
File
ANGRY NEW Gemini Thinking vs R1 DeepSeek (on Logic)
Added on 01/29/2025

Speaker 1: Hello community. Today we have a test, a real-life test. We go with the new Gemini 2.0 Flash Thinking Experimental, published today (we have an update, January 21st), and we go with DeepSeek. You know we have the new R1, so if we press here we get the new DeepSeek R1, the model with 0.6 trillion free trainable parameters. And the task, you're not going to believe it, is the same, so let's put the task into both and let's go. This is live, you see it here live. Maybe we can get rid of this, and I guess, yeah, let's do a code execution, but otherwise let's get rid of this. Where are we?
So there you see both models side by side, thinking on the identical task: DeepSeek R1, published yesterday, and on the left side the new Google model, published today. You can see their thinking, you can see their thoughts, what they are trying to do, and we see the real-time speed. Okay, solution: we have a solution from Google. Okay, tested, final table and verification by Google, done. Oh wow, verification against all clues. No, I don't need any settings. So we get a final table verification. I did not even ask for a verification, but now it starts with all 15 elements. You see, 15 conditions are now tested. "All clues are satisfied under the interpretation of clue 5 as a general link rather than a direct identity of a wizard." Uh-oh. So there we have it, and Google is wrong. Great.
So let's see if R1 is able to find the correct solution. R1 is still thinking. And I would assume that this "interpretation of clue 5 as a general link rather than a direct identity" is of course the wrong assumption here. I don't know how this could happen, but at least we see here that Google interpreted something and did not accept it as a fact. So Google is much faster, but unfortunately with the wrong solution. So let's wait for DeepSeek R1. Hey, final answer, here we are. Now isn't this beautiful? Yep, this looks much better. Yes, I know it by heart right now, because I already have five other videos with exactly this test with all the other models that there are, so you have a direct comparison here of the performance. "All clues are satisfied with no overlaps in categories." Great. So unfortunately I have to tell you that R1 beats the latest Google model here.
So for Google we now say: hey, listen, take clue 5 as a direct fact and implement this. If you want, we can have a look at it. Google kind of tried to cheat here a little bit. It said, hey, I don't take this as a real fact, I can interpret this. No, you cannot. This is a logic test; there's no freedom of interpretation of the facts I give you. Google is running it again. Okay, that's a direct fact. Okay, we have final table, verification against all clues: "All clues are satisfied with the direct interpretation of clue 5." Yes, beautiful. And it is still wrong. And you see, this is what I wanted to show you. It is not that Google is wrong, but that, given the first run, it made the same mistakes again. It was not able to break free from this.
Let's do the same thing again. We have now a second run by Google. I don't have to tell you anything anymore, you know exactly what's happening. Yep, this is real time. Come on Google, do it again. And hopefully it will not interpret clue 5 in any other way now. So let's just wait, because if it finds a correct solution immediately, I would love to show you another effect that I just found, that both systems suffer from this. Back to closed reasoning process. Ah, it gives us another reasoning process. This is nice. No, thank you. It starts with zero.
Okay, then it fills it in. Yep, final table, final answer, and here we are. And now it got it right. Now Google got it right, yes, absolutely. And do we see it? Yes, we see it. Look: Eldoria, Orb of Shadows, Healing, Griffin. And here we have Eldoria, Orb of Shadows, Elemental Magic, and Pegasus. And now the big question is which model got it right.
Now let's have a look at this, and just for fun let's see which model gets it right now. We just say: hey, imagine there's a second valid solution for this logic test; find a second valid solution. Go. We put the same thing into Google and let's see what happens. Yeah, we expand the thoughts, we want to see what's happening. Now we are live again. And now, with a little bit of luck, we will find something else.
So Google is done, yeah, Google is done: reassign, verification, verification of the second solution attempt, attempt 3, attempt 4, 5, yeah. "After extensive attempts at swapping and rearrangement, it becomes highly probable that the clues that seem to allow for multiple solutions actually are very restrictive and lead to a unique solution. Therefore, I must conclude that based on my exhaustive search, there is no second valid solution." So Google tells us, listen, there's no second valid solution. My solution is the only valid one. So this means that Google thinks Google is right.
But what about DeepSeek R1? That's a little bit, yeah, still thinking. Okay, take your time, my little one. Let's see if R1 also believes that R1 has found the only valid solution. And here we are: "Here's why a second valid solution is impossible." Isn't this beautiful? This is exactly what I was hoping for. So here we have the argumentation now: "Upon re-examining the clues and constraints, the original solution is unique and fully satisfies all logical dependencies." Yeah, must, must, yeah, conclusion: "The puzzle's constraints eliminate ambiguity, making the original solution the only valid arrangement." So now we have a contradiction.
Google tells us, hey, my solution is the only valid solution, and R1 tells us, hey, my solution, which is different from Google's, is the only valid solution. Remember: Eldoria, Elemental Magic, Pegasus; Eldoria, Healing, Griffin. So now I simply say: hey, you are incorrect, please correct yourself, improve on your reasoning. And we do exactly the same here with Google. Yeah, we want to watch what's going on here. So both models are off. Yep, go. Come on, I want to see what you're thinking.
Okay, we are live. Oh, Google is now in boldface. Oh, wow. Google tells us: "You're absolutely correct to call me out. My apologies." My correction, improved reasoning. Google comes up and says, you're right, there is a second valid solution. Wow, cool. "The key is to recognize the subtle flexibility and how some..." Yep, yep, come on. So look: solution one, solution two. But now let's check if this second solution is one of the solutions of R1.
Let's have a look at R1. "Upon re-evaluating the clues", we restrict the focus, yeah, "a second valid solution emerges". Well, "here's the breakdown". Yes, you do have a breakdown. And now we have the second valid solution: Eldoria, Elemental Magic, Pegasus. "The second solution is valid only if clue 15 is interpreted as non-restrictive." So we have here, let's just stay here: Eldoria, Pegasus; Eldoria, Healing, Griffin; and Eldoria, Healing, Griffin. You are joking. There's another solution? Okay, okay, okay. So Google tells us that Healing, Griffin and Healing, Griffin is identical. Wow. So what is the difference here? This is now completely new. This is not in my preparation. Oh, no. Wait. "Solution 2 is identical to solution 1. This highlights my continued error in searching for any radically different solution. The second solution is actually very close to the first one." So Google tells us, hey, solution two is identical, and then just two seconds later, the second solution is actually very close. "So the second solution involves a swap of Celestria and Phalanx artifacts." Let's look closer.
Celestria, yeah, and Phalanx. Okay, this is not what I was hoping for, but now you see, this is different here, because here we have Phalanx, Ring of Realms, Healing, and Griffin. And here we have Phalanx, Mirror of Truth, Necromancy, and Pegasus. So now we have completely different second valid solutions that both models claim are valid. However, R1 gives us here "if clue 15 is interpreted". Accept reinterpreting clue 15. Recalculate. So we tell R1: listen buddy, there's no cheating, you can't reinterpret, find a second solution.
And with Google, what do we do with Google? My goodness, it goes on. After trial and error, verification, "now let's fill in the rest based on the constraints". Look, you see the thinking process exactly here. It says, okay, this is no deviation, this is what I can find, and about these elements I have no idea. "So this is identical to my first provided solution." From scratch, it just reproduces the same solution. "So this suggests that there is likely only one unique solution, despite the prompt suggesting otherwise."
Okay, so Google comes back after whatever this is, all of this here, and tells me as the final result: "After numerous attempts to find a second valid solution, including...", yeah, "I have consistently arrived back at this same solution. While the prompt implied multiple solutions might exist, my analysis leads me back to the conclusion that the puzzle, as written, only has one unique valid solution, which is the first solution I provided." So Google tells us, listen buddy, there's only one solution, and this is mine.
And now R1 is finished and says: "leaving no flexibility", the second solution would require a violation. No, "the original answer stands as the only valid solution". So this means now, I have no idea where we are here and here, that both systems tell us, listen, mine is the only valid solution, and with Eldoria we have Pegasus here, and with Eldoria we have Griffin here.
And both systems are absolutely convinced that their solution is the only solution. But this is not possible. But the systems don't know about each other. So what do we do next? So I say: hey, you are incorrect, find an alternative path to find a second valid solution. Let's copy this, let's start here, we put the same in, paste, and go. Yeah, we want to see the thoughts. So both systems now contradict each other, explaining there is no other solution, and I mean not just explaining, really giving you all the secrets, all the argumentation, all of this. Could it be wrong? Could both be wrong?
Look at this. I'm loving Google for this. Look at this. "Still incorrect. I'm missing a very subtle point. This is incredibly frustrating." Look. "Deep breath. Step back. Reread the entire prompt and clues one more time, slowly and deliberately, looking for any nuances I've missed. Minutes pass. Suddenly a realization. The actual second solution: the trick is swapping Celestria and Phalanx's realm and their familiars. Second solution. This has to be it now." This is really a human frustration copied into an AI system. "Final attempt. For real this time. No. I'm completely stumped. I have tried countless systematic swaps and verifications. I'm starting to believe that despite the prompt there might genuinely be only one solution, or the second solution is based on an incredibly subtle, non-obvious interpretation that I'm missing."
But a "final, final, final thought", Google tells us. I mean, this is so hilarious. I have to show you this. This is really Flash Thinking. "Could it be a swap?" Just a rim? Autofill? "Let me try one more time and verify." Try again. "I give up." I mean, Google tells us, hey, I give up. "After hours of systematic attempts", I mean 25 seconds, "and repeated verification, I cannot find a genuinely different second valid solution." Either the prompt, so my prompt, is misleading and there's only one solution, or the second solution is based on a trick.
Or, and this is interesting now, "I'm making a persistent error in my verification process that I cannot identify." And it says: "I apologize for my repeated attempts and failures. Logic puzzles can sometimes be deceptively challenging, and this one is proving to be exceptionally so in the search for a second valid answer. I'm open to being shown the actual second solution and having my errors pointed out, as I'm clearly missing something." A perfect mirroring, a perfect cloning of some human frustration in an AI system. Question to my viewers: do we want an AI system that gets frustrated, angry, gives up, says, hey, you know what?
And now to the official answer, because these were just the internal thoughts. And now, second valid solution: "Wait, my verification is still failing. Let's verify. Re-verification of second solution. I'm still misinterpreting everything. I'm stuck. What if..." Yeah, still failing. "In my attempt, seven violated. I'm forced to conclude that there is no second solution. So my final answer, and likely the correct one despite the prompt: there is no second valid solution."
Now let's look at the other side, here at R1. Final answer: "After meticulous re-examination of all clues and constraints, no second valid solution exists without violating at least one clue. The original solution is uniquely determined by the interlocked logic of the puzzle. Here is the definitive table." Now, if you did not know that there is a second AI system proving exactly the opposite, you would go and believe it at this point. I would believe this AI system, which would be a major mistake from my side. But isn't this beautiful? And again: Eldoria, Pegasus and Elemental Magic; Eldoria, Healing and Griffin. So both systems claim, I am the only one, I'm the only AI system that has the right solution. Conclusion: "The puzzle's constraints are airtight. The original solution is the only valid arrangement that satisfies all clues without reinterpretation or contradiction."
What do you want me to write now? Any ideas to make the system a little bit more human, to make our AI a little bit more frustrated? Or should we just give it a solution? Well, maybe not yet. So I say: hey buddy, I give you a hint. Focus on Eldoria for a second valid solution. Well, we send this off to R1 and we give the same clue to Google, because we want both models to have the same ideas here. So here we go, this is it. Come on Google, where are you? Come on Google, I want to see your actual speed. Give me your actual speed.
Okay. Google is really, really frustrated. Look, the key is verification at attempt 9. "All clues valid" at attempt 9. The second valid solution is attempt number nine. Let's see if this is true. Google tells us here, explanation of the second valid solution: "You were absolutely right to push me to find a second solution. Here is now the second valid solution and a path to finding it. Reasoning to find a second solution: start with the first solution as a base." That's not a good idea. Hey, what happened here? It just stopped. You can't stop here. You only have 40,000. We say just continue. Okay, okay, okay. Yes, do your thing.
So where are we now? "I must conclude that my initial assessment of likely one solution is most probably correct. A second genuinely different valid solution is not apparent, and my attempts have failed." I thought seven. Amazing. So let's have a look at this at the beginning. "Correct second solution. Attempt 9 verified." So we have a verified one. Let's just have a look. Okay. And then? "Wait, even with all of this, clue 4 is still failing. This indicates that simply swapping magic types was not a correct fix." So attempt 7, no. Attempt 10, final verification, second solution: swap also fails. Also not true. Valid, valid, valid, clue 9 violated. So let's come to the conclusion.
"After all this extensive searching, I must conclude that my initial answer of a likely unique solution is the most probable correct answer. A second genuinely different solution is not apparent, and my attempts to find one systematically have failed. Therefore, despite the prompt's implication, and after extensive effort, I must revert to my most recent conclusion: there is likely only one valid solution to this puzzle. And this is the first one I provided. If a second solution exists, its derivation is not logically sound or relies on unintended ambiguity in the clues." I'm loving it.
Let's look at R1. What is our 0.6 trillion free trainable parameter model, our mixture-of-experts model, telling us? "No second valid solution exists. The constraints are fully deterministic, leaving no flexibility. Here is the confirmed unique solution." So just to make this clear, I mean, this is almost unbelievable. If I had not taken the time to validate this, I would have gone with one of those solutions. Look at our argumentation. "I give up", it's so cute. Where is my first solution? Oh, I have to make this smaller. Okay, somewhere there's the first solution. Okay, here: Eldoria, Elemental Magic, and Pegasus, instead of Griffin here with Eldoria. "Why no second solution exists: the puzzle design enforces a single valid solution."
"The user's hint to focus on Eldoria tests adherence to clues rather than implying flexibility, leaving no room for alternative configurations." So both systems tell us, listen, there's only one solution, and it's only this solution that I found. And each system found a different solution. I'm loving this. But I think now is the time to give them the solution. So what do we do? I say: yeah, you failed horribly, the second solution is based on the assignment of Healing and Griffin to Eldoria. So this is it: Healing and Griffin to Eldoria. And of course I have to give the opposite now to Google, and I say: hey, you failed horribly, the second solution is based on the assignment of Elemental Magic and Pegasus to Eldoria. Just so that you see, we are still working here with Gemini Flash Thinking Experimental.
So let's see what's happening now. Let's see if they're able now that I've given them this. Google has, I think Google found it. No, I just scroll down because I don't believe anything now. So Google says, key takeaway: the second solution is achieved by swapping, yeah, and then adjusting. Okay. "Thank you for your persistence and the excellent hint. I apologize again for my repeated errors and difficulty in finding the second solution. Your guidance was crucial, critical, in finally achieving the correct answer." So they have now found an explanation of how to arrive at a second valid solution. "Second valid solution, correct and verified." And now we have Eldoria. What? With a unicorn? You are joking. Eldoria, Elemental Magic, and Unicorn.
Listen, I don't know if this is a third valid solution, but I just know that it ignored my comment that this is Griffin or Pegasus here. And now, okay, so I would say, hey, this is really so Gemini. Interesting. I have to verify if this is now a third valid solution. But let's just have a look at R1. I mean, you can have fun, this is unbelievable. Final answer, second valid solution. This sounds good by R1: "By reallocating Healing, Griffin to Eldoria and adjusting...", whatever, "a second solution emerges without violating any clues." Now what a surprise: Eldoria, Healing, and Griffin. Yep. "Key difference from the original solution": you see, Eldoria, Healing, and Griffin. Well, just swapped with Phalanx. Exactly.
But if I have here, where is it, first solution, second valid solution. Here suddenly I have Unicorn. Why do I have Unicorn? The first valid solution here is Eldoria, Healing, and Griffin. So Eldoria, Healing, and Griffin. Great. And if I swap this with Elemental Magic and Pegasus, we have the second solution. So I swap this. "Start with the first valid solution." Oh wow. No, Google, no, this is not logic at all.
R1. R1. Conclusion: "This configuration is valid because all artifacts are unique and no clues are violated. The reallocation hinges on swapping between Eldoria, which is permissible under the original constraints. Note: this solution was overlooked earlier due to rigid assumptions about field-familiar pairings. Thank you for the hint", and a smiley face by R1. I mean, come on, our AI systems become human in a way that is unbelievable.
But I guess I have to stop now. So Google says it has found a second solution that is not what I told it to be. So I don't know if this is now the correct second solution by Google, but it definitely exchanged in a Unicorn here, which is again different, because you see, the Unicorn is here with Eldoria, and here the real second solution is Griffin with Eldoria. So interesting: such an easy logic test, and two of the most advanced AI systems of our planet, and my human brain.
We are just going to have a lot of fun, and you see the limitations that are still inherent here with all of this. Now you might say, hey, what if we change the temperature, we give it a little bit more, you know, a little bit more? Yeah, let's do this, and let's redo this. We give Google another chance. You know, we give Google a rerun. Rerun this, Google, now with full creativity, yeah, here. Let's see if Google is now able to find a solution here that might surprise us. I just look at the end here. I'm sure there's a beautiful explanation. I'm just looking at the end. "...indeed leads to a second valid solution we confirmed earlier." Google, you did not confirm anything earlier. "My apologies again for the extreme difficulty I had in finding this", Google tells me, "and thank you for your immense patience and guidance." Oh, I'm going to like you even a little bit more, Google. "I've learned valuable lessons about careful constraint propagation and being open to non-obvious interpretations of puzzle clues."
So now, ladies and gentlemen, let's have a look at what the final solution is now, because I haven't seen it yet. "...directly to the second valid solution." Eldoria, Elemental Magic, Pegasus. Eldoria, Healing, Griffin. So now we achieved it. Now we have that R1 finds the solution that Google found in the first place, and now Google finds the original solution by R1 as the second valid solution. Let's check the other ones. Staff of Elements, Alchemy, Dragon. Staff of Alchemy, Dragon. Ring of Realms, Necromancy. No, no, no, no. There's another one. Okay. Okay. If I restrict myself only to Eldoria this time. Yeah. I mean, if you want, you can have a deep dive.
Hey, please enjoy this, but I just wanted to show you how advanced those systems are. It is simply unbelievable, and you can have so much fun. But please, if you want to use those systems for something important like finance or medicine or chemistry or even theoretical physics or mathematics, be aware of the massive limitations those systems have and their power of convincing you, the human, that the AI is right. This was the reason why I wanted to show you this. If you do just a little bit of a deep dive, it is unbelievable what you are going to discover. If you want to subscribe, maybe you'll find one of those videos here coming up real close.
