xAI’s Grok-3 Jumps to the Top of AI Leaderboards

Alex Heath, writing for The Verge:

Just a few weeks after everyone freaked out about DeepSeek, Elon Musk’s Grok-3 has again shaken up the fast-moving AI race. The new model is ending the week at the top of the Chatbot Arena leaderboard, while the Grok iOS app is at the top of the App Store, just above ChatGPT. Even as Musk appears to be crashing out from his newfound political power, his xAI team has managed to deploy a leading foundational model in record time. [...]

While its Deep Research reports are nowhere near as in depth as OpenAI’s, Grok-3’s “thinking” capabilities appear to be roughly on par with o1, according to Andrej Karpathy, who noted in his deep dive comparison that “this timescale to state of the art territory is unprecedented.”

Benedict Evans, back in 2021, observed:

Elon Musk is a bullshitter who delivers. This breaks a lot of people’s pattern-matching, in both directions.

This summation of Musk is more apt, and more useful, today than it was four years ago. The Boring Company is seemingly a complete fraud, and he’s been making unfulfilled promises about Tesla “full self-driving” for over a decade. But Tesla has done more to make electric cars mainstream than all other automakers combined. Starlink delivers extraordinary satellite Internet service, with no real competitors. SpaceX has rejuvenated the rocket industry. xAI seems to fall on the “actually delivers” side.

Twitter/X seems to fall squarely in the middle. It’s a mess in many ways, and seems not one iota closer to Musk’s promised vision of an “everything app”, but under Musk’s ownership it has been transformed, and while it isn’t more popular than it used to be, it also isn’t less (or much less) popular. It’s just a different, somewhat scummier audience and vibe.

My betting money says the whole DOGE thing is very much on the bullshit side, but Musk’s overall track record runs the gamut from outright scams to extraordinary historic accomplishments. He’s such a prolific and shameless bullshitter that I wouldn’t take Musk at his word about anything, even what he had for lunch. But I’d be loath to bet against him on an engineering endeavor.

Wednesday, 26 February 2025