Mar 11 · 5 min read

Claude 3 Opus Destroys ChatGPT on Paper but Won’t Be Enough to Overturn OpenAI’s Market Dominance

In the ever-turbulent world of generative AI, Anthropic has been stirring the pot quite vigorously. Picture this: it's the autumn of 2023, leaves are turning, and Anthropic, fresh from securing a hefty bag of cash in its latest funding escapade in October, hits the ground running. Without missing a beat, they unveil Claude 2.1 to the world by the end of November. But why stop when you're on a roll? Roughly three months down the line, they're at it again, this time with Claude 3, a trio of cunningly crafted artificial intellects that go by the names Haiku, Sonnet, and Opus. It's like choosing your poison, but instead of toxins, you're picking between varying degrees of brainpower, each with a price tag that scales with its capabilities. The catch, though, is a classic case of patience versus pace: the smarter they get, the longer they ponder, leaving Haiku to win the race in speed, if not in wisdom.

Claude 3 Model Family Performance to Cost Chart

Now, let's talk about the heavyweight of the family, Claude 3 Opus. This model isn't just smart; it's like the valedictorian of AI, with Anthropic claiming it leaves giants like OpenAI's GPT-4 and Google's Gemini 1.0 Ultra in the dust across the board. We're talking about a machine that can breeze through undergraduate tests, solve math problems with ease, and hold more common knowledge than your average quiz night champion. On paper, it's the first of its kind to outdo GPT-4 on every benchmark Anthropic published, setting a new bar in the AI rat race.

Benchmarks of Claude 3 Model Family VS. Competitors

Vision Benchmarks of Claude 3 Model Family VS. Competitors

Now, amid all this fanfare and the spectacle of benchmarks that Anthropic has been flaunting, one might wonder if this is the beginning of the end of OpenAI's reign. However, we're not the sort to take things at face value, entranced by mere numbers and claims. So, in the spirit of good old-fashioned skepticism, we're rolling up our sleeves and putting Claude 3 Opus and GPT-4 through the wringer with two tests.

"This is the Rolls-Royce of models, at least at this point in time"

Dario Amodei, CEO, Anthropic

Does Claude 3 Live Up to its Hype?

The first test we threw at the LLMs was the infamous logic puzzle rumored to be asked in Amazon interviews, commonly known as the Two Poles and a Cable Puzzle. The prompt goes like this:

A cable of 80 meters (m) is hanging from the top of two poles that are both 50 m from the ground. What is the distance between the two poles, to one decimal place, if the center of the cable is 10m above the ground?

Visual Illustration

Before you dive into solving this riddle, here's a hint: it's less about the numbers and more about seeing the big picture. If you split the cable in half, you're left with 40 meters dangling from a 50-meter pole. The drop from the top of the pole to the center of the cable is 50 m minus 10 m, which is also 40 meters, so each half has to hang straight down to just reach that 10-meter mark above the ground. The twist? The cable isn't stretched between the poles at all; it's folded upon itself, meaning the poles are standing right up against each other, zero meters apart.
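For readers who would rather check the arithmetic than take the hint on faith, here is a minimal Python sketch. It assumes the cable hangs as an ideal catenary; the function name pole_distance and its parameters are our own illustration, not something either chatbot produced. The second call uses a hypothetical variant of the puzzle (center 20 m above the ground) that is not part of our test, included only to show that the same formula gives a non-zero answer when the cable has slack to spare.

```python
import math


def pole_distance(cable_length, pole_height, center_height):
    """Distance between two equal poles joined by a hanging cable.

    Assumes an ideal catenary y = a*cosh(x/a). With s = half the cable
    length and h = the sag, the identity s^2 = h^2 + 2*a*h gives a, and
    the span follows as d = 2*a*asinh(s/a).
    """
    half = cable_length / 2               # cable from one pole to the low point
    sag = pole_height - center_height     # vertical drop to the low point

    if sag <= 0:
        raise ValueError("the cable center must sit below the pole tops")
    if math.isclose(half, sag):
        # Each half hangs straight down: the poles must be touching.
        return 0.0
    if half < sag:
        raise ValueError("the cable is too short to reach that low")

    a = (half ** 2 - sag ** 2) / (2 * sag)  # catenary parameter
    return 2 * a * math.asinh(half / a)


print(round(pole_distance(80, 50, 10), 1))  # the puzzle as posed: 0.0
print(round(pole_distance(80, 50, 20), 1))  # hypothetical variant: 45.4
```

The special case is checked first because the catenary parameter collapses to zero when half the cable equals the sag, which is exactly the trap the puzzle sets.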

Armed with this solution, we pitched the puzzle to both ChatGPT and Claude 3 Opus, giving them a single attempt each to crack it. Here is ChatGPT’s result:

ChatGPT, bless its digital heart, went around in circles but couldn't pin down the solution, proving that even AI can get tied up in knots. ChatGPT’s final words in its answer were as follows:

Here’s the link to the full thread with ChatGPT if you’re curious. On the flip side, Claude 3 Opus cut through the complexity like a hot knife through butter, delivering a precise and swift solution, showcasing its superior math skills and perhaps a sharper logical edge.

Claude 3 also solved it in a fraction of the time ChatGPT spent on its attempt. Note that Claude 3 Sonnet (the free version) could not solve this puzzle. So far, it looks like there is indeed some substance behind the hype. But we are not done with the tests yet.

The next test mirrors a real-world use case many of us encounter: extracting data from a dense PDF. We chose McKinsey's latest 126-page tome on AI for this purpose.

This was a test of the LLMs' ability to navigate through a sea of information and fish out the precise data points we needed without missing data from diagrams or illustrations. The task was straightforward: find the estimated demand change for computer systems analysts as depicted in the diagram below, taken from page 72 of the PDF.

Behold the results by ChatGPT and Claude 3 Opus below:

ChatGPT

Claude 3 Opus

The outcome, however, was a reality check. Both ChatGPT and Claude 3 Opus came up short, latching onto data about electrical engineers from the text but missing the target data from the diagram. This round served as a reminder that, despite their advancements, LLMs might still require a human touch to ensure no detail is overlooked whenever they are working on documents.

Who Is the Winner: Claude or ChatGPT?

The question of supremacy between Claude 3 and ChatGPT is a tale of two cities. On one hand, Claude 3 emerges as a paragon of logic, a mathematical maestro, and a beacon of technical prowess. Yet, on the other, ChatGPT shines brighter with its treasure trove of features like the GPT store, the magic of image generation, collaborative team workspaces, and its ability to tap into the flowing rivers of live internet data. While both are juggernauts in their own right, capable of fulfilling our linguistic and literary quests, they show their limits when pushed beyond, such as when asked to extract information from documents or, perhaps, to tackle tasks that require non-linear thinking.

The saga unfolding before us is reminiscent of the eternal dance of rivalry, much like the performance tango between NVIDIA and AMD in the realm of graphics processing units. Each iteration leaps forward, pushing the boundaries with new features, capabilities, and benchmarks that leave enthusiasts and professionals alike eagerly awaiting the next move. NVIDIA, in this dance, has carved out a lead, a trailblazer setting the pace. Similarly, OpenAI has etched its name in the annals of AI history, commanding the stage with its innovations.

Yet, as the clock ticks and pages of the calendar turn, Anthropic's Claude 3, a year younger and brimming with ambition, has started to outpace ChatGPT in several respects. This burgeoning rivalry leaves us pondering the future. Should GPT-5 enter the arena without a decisive edge over its competitors, or should OpenAI linger too long in the wings without unveiling its next act, the throne upon which it sits might begin to wobble. In the end, the race for AI supremacy is not a sprint but a marathon, with twists and turns that could see leaders and challengers exchange places in the blink of an eye.
