OpenAI Debuts GPT-4o ‘Omni’ Model

Kyle Wiggers, reporting for TechCrunch:

OpenAI announced a new flagship generative AI model on Monday that it calls GPT-4o — the “o” stands for “omni,” referring to the model’s ability to handle text, speech, and video. GPT-4o is set to roll out “iteratively” across the company’s developer and consumer-facing products over the next few weeks.

OpenAI CTO Mira Murati said that GPT-4o provides “GPT-4-level” intelligence but improves on GPT-4’s capabilities across multiple modalities and media.

You should watch at least some of the 25-minute live-streamed announcement today, to see — and especially hear — some of the demos. GPT-4o is extraordinarily conversational, and the female voice they used is remarkably emotive. Response times seem very impressive, and in conversation GPT-4o allows you to interrupt it when it’s going in the wrong direction or just blathering.

But my first impression is that it’s too emotive — too cloying, too saccharine. It comes across as condescending, like the voice of a kindergarten teacher addressing her students. I suspect, though, that they turned that dial up for the demo, and that it could easily be dialed back. And it really is impressive that I can complain that it might be too emotive. Also impressive: GPT-4o will be made available to all users, including those on the free tier.

OpenAI also announced a ChatGPT Mac app, a sort of Spotlight / LaunchBar / Alfred / Raycast-type thing that they’re even calling a “launcher”. It’s supposedly available now to a limited number of users, and rolling out to everyone in the coming months.

Monday, 13 May 2024