By John Gruber
I got to spend about 30 minutes Monday afternoon using a Vision Pro and VisionOS at Apple Park, in a temporary “field house” building Apple constructed specifically for this experience. This was a one-on-one guided tour with someone from the Vision product marketing team, with the ground rules that I’m allowed to talk about everything I experienced, but photos and videos were not permitted.
It was a very fast 30 minutes, and the experience was, in a word, immersive. I’d pay good money just to run through the exact same 30 minutes again.
It was nowhere near enough time, nor was I able to wander far off the rails of the prepared demos. It’s very clear that the OS and apps are far from finished. But even given the brevity of the demo and constraints of the current state of the software, there are a few things I feel confident about describing.
First: the overall technology is extraordinary, and far better than I expected. And like my friend and Dithering co-host Ben Thompson, my expectations were high. Apple exceeded them. Vision Pro and VisionOS feel like they’ve been pulled forward in time from the future. I haven’t had that feeling about a new product since the original iPhone in 2007. There are several aspects of the experience that felt impossible.
Is it a compelling product, though? It’s a famous Steve Jobs axiom that technology is not enough, that you don’t make compelling products — let alone entire platforms — starting from advanced technology and working backward. You start with a vision for the product and platform experience and then create new technology to make it real. I simply can’t say whether Vision Pro is going to be a compelling product. I spent too little time with it, the software as of today is too far from complete, and, most importantly, the whole experience is too entirely new and mind-bending to render any such conclusion.
But the potential for Vision Pro to be a compelling product, across several use cases, is obvious. This might be great. And without question it is interesting, and I think the fundamental conceptual bones Apple has designed for VisionOS lay the groundwork for a long future. The first generation Vision Pro may or may not be a successful product — I simply don’t want to speculate on that front yet. But even just a small taste of VisionOS made me feel confident that it is going to be the next major platform for Apple and Apple developers, alongside MacOS and iOS/iPadOS.1
A few years ago I stopped wearing contact lenses and have since worn corrective eyeglasses full-time. Before my demo, a rep from Apple took my glasses and used a device to measure my lenses to provide corrective lens inserts for my demo unit. It only took about 3 or 4 minutes. When Vision Pro goes on sale next year, buying corrective lens inserts will probably work like buying eyeglasses online from a retailer like Warby Parker — you’ll provide a copy of your prescription from your eye doctor. It’ll be a slight hassle for us glasses wearers, but not much, and it’s unavoidable. (I see pretty well close up without glasses, so I’m curious how well I might see using Vision Pro without corrective lenses.) After the lens measurement I used an iPhone to do a brief facial scan akin to registering for Face ID (look at the phone, turn your head in a circle), and an ear scan. This test was for identifying a light seal that would best block light, and calibrating the speakers.
My demo was conducted in a relatively spacious living room. I sat on a couch. My tour guide, as it were, from product marketing sat in an armchair next to the couch. We took a few minutes to go through the strap adjustments to make it comfortable on my face. My light seal was good, but not perfect. The single biggest downside from my demo experience is that Vision Pro feels heavier on your face than I had hoped it would. It’s not uncomfortable, at least not for 30 minutes, but I never forgot it was there, and it just makes your head feel a bit front-heavy.
The first-run experience wearing Vision Pro requires a few minutes of very simple eye-tracking calibration exercises. You hold your head still, and VisionOS shows a series of dots in front of you, on a pure black background. Moving only your eyes, not your head, you simply look at each dot as it appears. After a minute or two of that, boom, you’re in.
What you see at first is just ... your world. You just see the room you’re in. There’s no status information, no metadata, no indicators. Just input from the cameras on the front of the headset, presented on the displays in front of your eyes. It does not magically look indistinguishable from real life, but it does not feel like looking at a screen at all. I perceived absolutely no latency. I definitely could not see pixels — the experience is “retina” quality. Apple is not yet stating what frame rate Vision Pro runs at, but I’m guessing it runs at 90 frames per second, if not higher.
Again, it doesn’t look at all like looking at screens inside a headset. It looks like reality, albeit through something like a pair of safety glasses or a large face-covering clear shield. There is no border in the field of vision — your field of view through Vision Pro exactly matches what you see through your eyes without it. Most impressively, and uncannily, the field of view seemingly exactly matches what you see naturally. It’s not even slightly wider angle, or even slightly more telephoto. There is no fisheye effect and no aberrations or distortion in your peripheral vision. What you see in front of your face exactly matches what your own eyes see when you lift the Vision Pro up over your eyes. Imagine a set of safety glasses that used a glass treatment that gives the world a slight bit of a “film look”. A slight tint (that tint might get dialed in closer to reality by next year — to me it felt ever so slightly warm, color-wise), and a slight bit of visually flattering smoothness to everything.
To do anything in VisionOS, you start by single-pressing the digital crown button above your right eye. (The other top button, on the left, will be used to capture spatial photos and videos, but that button was disabled on our demo units yesterday.) Press the digital crown button and boom, your home screen of app icons appears in a plane in front of you. Long-press the digital crown button and it re-centers your virtual world in front of whatever you’re now looking at.
The navigation model of VisionOS is a breakthrough of conceptual simplicity. You simply look at something and tap your index finger and thumb, on either of your hands. I’ll call this “tapping” for lack of a better verb. When you look at any tappable UI element in VisionOS, it is visually highlighted. This highlighting is extremely similar to the hover effect in iPadOS when using a trackpad or mouse. In iPadOS, if you move the mouse pointer circle to a button or icon, that button/icon pops a bit, visually. If you look at a button or icon in VisionOS, that button/icon pops a bit. Some apps have toolbars with a dark background and gray button icons. Look at a button and that button will turn white. Tap your finger and thumb while looking at it and it activates. That’s it. Incredibly simple, surprisingly effective.
At first I found myself reaching out to pinch the icons and buttons I saw in front of me. Like if I wanted to activate Safari I’d reach my hand toward the Safari icon and pinch it. That works fine. But it’s completely unnecessary. You really can just leave your hands on your lap. It works when you try to pinch the actual virtual icon in your field of vision because you are looking at the icon, and your finger and thumb do tap, but your hands can be anywhere. All that matters is what you’re looking at. It started feeling natural within just a minute or two, and I could feel myself navigating the interface faster as the demo went on.
The next most striking aspect of the experience is that everything in VisionOS is completely stable in space. Let’s say you open a Safari window, then open a Messages window next to it. (You can move windows around the room by pinching and dragging on a “Window Bar” that hovers underneath every window. You can move windows left/right, up/down, and closer/further away from you.) Then you turn your head 90 degrees to your side, then turn your head to the other side, and then return your gaze to the center. Those windows for Safari and Messages do not move at all, relative to the real world. Not even a little. You know how when you use AR on an iPhone or iPad (the built-in Measure app is a perfect example), virtual UI elements or items move around a little bit relative to reality as you pan and rotate the iPhone or iPad? There’s nothing like that in VisionOS. Virtual elements are utterly stable. Obviously that’s what you want in an XR experience, but to my knowledge no other headset can achieve this stability. My understanding is that this profound stability, this palpable realness of virtual elements, is thanks to the extraordinarily precise eye-tracking that Vision Pro achieves. Uncanny is often a pejorative (e.g. with the uncanny valley), but in this case I use it as strong praise. It is simply uncanny how Vision Pro makes virtual elements as spatially stable as the walls and furniture and people in the room around you.
There’s also a top-level “Environments” tab on the left side of the home screen. Everything I’ve described thus far took place in the de facto default “environment” which is your actual surroundings. But the Environments tab allows you to switch from pass-through camera reality to a virtual reality. One of the locations I experienced was Mount Hood. Choose it and it’s like you’re there. These environments are incredibly immersive, 360 degrees. And so instead of catching up on Mail and Messages in floating windows in your living room, you can do it while it appears like you’re atop a mountain, or on a beach, or, well, anywhere. It’s beautiful, and every bit as stable spatially as pass-through reality. When you’re in an immersive environment, you can turn the digital crown to adjust how far it wraps around your field of vision. Fully immersive means you don’t see anything from your actual physical environment; partial immersion dials it back from your periphery to the center, with blurred dream-like edges.
Lastly, “breakthrough” is Apple’s term for a feature that identifies real human beings who approach you while you’re immersed. They sort of fade in, as transparent ghosts. But not spooky ghosts. It’s a seamless feature, and given my experience with other VR headsets, a necessary one to avoid being isolated from the people around you once you’re immersed. You see someone fade into view, and you can turn the digital crown to dial back the degree of immersion and truly look at them.
Apple is promoting the Vision Pro announcement as the launch of “the era of spatial computing”. That term feels perfect. It’s not AR, VR, or XR. It’s spatial computing, and some aspects of spatial computing are AR or VR.
To me the Macintosh has always felt more like a place than a thing. Not a place I go physically, but a place my mind goes intellectually. When I’m working or playing and in the flow, it has always felt like MacOS is where I am. I’m in the Mac. Interruptions — say, the doorbell or my phone ringing — are momentarily disorienting when I’m in the flow on the Mac, because I’m pulled out of that world and into the physical one. There’s a similar effect with iOS too, but I’ve always found it less profound. Partly that’s the nature of iOS, which doesn’t speak to me, idiomatically, like MacOS does. I think in many ways that explains why I never feel in the flow on an iPad like I can on a Mac, even with the same size display. But with the iPhone in particular screen size is an important factor. I don’t think any hypothetical phone OS could be as immersive as I find MacOS, simply because even the largest phone display is so small. Watching a movie on a phone is a lesser experience than watching on a big TV set, and watching a movie on even a huge TV is a lesser experience than watching a movie in a nice theater. We humans are visual creatures and our field of view affects our sense of importance. Size matters.
The worlds, as it were, of MacOS and iOS (or Windows, or Android, or whatever) are defined and limited by the displays on which they run. If MacOS is a place I go mentally when working, that place is manifested physically by the Mac’s display. It’s like the playing field, or the court, in sports — it has very clear, hard and fast, rectangular bounds. It is of fixed size and shape, and everything I do in that world takes place in the confines of those display boundaries.
VisionOS is very much going to be a conceptual place like that for work. But there is no display. There are no boundaries. The intellectual “place” where the apps of VisionOS are presented is the real-world place in which you use the device, or the expansive virtual environment you choose. The room in which you’re sitting is the canvas. The whole room. The display on a Mac or iOS device is to me like a portal, a rectangular window into a well-defined virtual world. With VisionOS the virtual world is the actual world around you.
In the same way that the introduction of multitouch with the iPhone removed a layer of conceptual abstraction — instead of touching a mouse or trackpad to move an on-screen pointer to an object on screen, you simply touch the object on screen — VisionOS removes a layer of abstraction spatially. Using a Mac, you are in a physical place, there is a display in front of you in that place, and on that display are application windows. Using VisionOS, there are just application windows in the physical place in which you are. On Monday I had Safari and Messages and Photos open, side by side, each in a window that seemed the size of a movie poster — that is to say, each app in a window that appeared larger than any actual computer display I’ve ever used. All side by side. Some of the videos in Apple’s Newsroom post introducing Vision Pro illustrate this. But seeing a picture of an actor in this environment doesn’t do justice to experiencing it firsthand, because a photo showing this environment itself has defined rectangular borders.
This is not confusing or complex, but it feels profound. Last night I chatted with a friend who, I found out only then, has been using Vision Pro for months inside Apple. While talking about this “your real world room is your canvas for arranging your application windows” aspect of the experience, he said that he spent weeks feeling a bit constrained, keeping his open VisionOS windows all in front of him as though on a virtual display, before a colleague opened his mind to spreading out and making application windows much larger and arranging them in a wider carousel not merely in front of him but around him. The constraints of even the largest physical display simply do not exist with VisionOS.
I’d group everything I experienced Monday into three loose categories.
The first, for lack of a better term, is simply “computing”. Work, as it were. Reading web pages, talking via FaceTime, reading email or messages. Using apps. You start with an iOS-like grid of app icons, tap on apps to open them in windows, and arrange and resize those windows however you want. Text is very readable in windows. My brief 30-minute demo covered a lot, so I have zero perspective on how pleasant or tolerable it is to read for long stretches of time, but my impression is that it’s more than fine. Windows look solid, text looks crisp, and you can make windows really big in virtual space. If anything is going to seem weird or wrong about long-form reading in VisionOS, it’s not the visual fidelity, but rather the fact that I’ve never once in my life read an article or email message in a window that appeared 4 or 5 feet tall.
The second type of experience is the consumption of 2D content, like photos and videos. Watching a regular movie on a virtual huge screen is incredible. It’s way more like watching a movie in a real cinema than like watching on a TV. One of the movies Apple had us watch was James Cameron’s Avatar: The Way of Water, both in a window floating in front of us, and then in “theater mode”, which immersively removes your actual physical surroundings. Cameron shot Avatar 2 with state-of-the-art 3D cameras, and the 3D effect was, as promised, better than anything I’ve ever seen in a theater or theme park. I don’t generally like 3D feature-length movies at all — I find myself not remembering them afterwards — but I might watch movies like Avatar this way with Vision Pro. But even though Avatar is 3D, it’s still a rectangular movie. It’s just presented as a very large rectangle with very compelling 3D depth inside that rectangle.
The third type of experience is fully immersive. Content Apple commissioned and created that is inherently only consumable in three dimensions. Full immersion, like transporting you to a cliff’s edge atop a mountain, or lakeside on a beautiful spring day. Some of these immersive environments surround you completely — you can turn around 360 degrees, and look up at the sky or down at the ground (or, dizzyingly, down over the cliff’s edge). Another custom experience involved a portal opening on the wall across the room, out of which first flew a small butterfly that landed on my extended finger. This is so compelling that some of the media people who experienced it had their minds tricked into feeling the butterfly land. I did not, but I can see why they did. Then, a dinosaur — a velociraptor-looking thing, seemingly about 9 or 10 feet tall — approached the “portal” in the wall and came halfway through into the room. I was invited to stand up from the couch and approach it. There was a coffee table in front of the couch, a possible shin-banging accident waiting to happen, but the pass-through video experience is so seamless, so natural, so much like just looking through glasses, not looking at a screen inside a headset, that it took no concentration or carefulness at all on my part to stand up and walk around the coffee table and approach the dinosaur. The dinosaur was not pre-recorded. It reacted, live, to me, keeping eye contact with me at all times. It was spooky, and a significant part of my own lizard brain was instinctively very alarmed. I got extremely close to the dinosaur’s head, and the illusion that it was real never broke down. Even up close, there was no sign that it was composed of polygons stitched together or something. It looked like a 10-foot tall dinosaur that could kill me with a snap of its jaws, right there in the room with me, as close to my face as the MacBook Pro display on which I’m writing these words is right now.
Spatial photos and videos — photos and videos shot with the Vision Pro itself — are viewed as a sort of hybrid between 2D content and fully immersive 3D content. They don’t appear in a crisply defined rectangle. Rather, they appear with a hazy dream-like border around them. Like some sort of teleportation magic spell in a Harry Potter movie or something. The effect reminded me very much of Steven Spielberg’s Minority Report, in the way that Tom Cruise’s character could obsessively watch “memories” of his son, and the way the psychic “precogs” perceive their visions of murders about to occur. It’s like watching a dream, but through a portal opened into another world.
Lastly, we saw two sports demos: an at-bat from a baseball game at Fenway Park (Phil Schiller’s hands are all over that one), and a scoring play from a Nuggets-Suns NBA basketball game. For the baseball game, the perspective wasn’t even from the stands, but rather from the home team’s dugout, ground level, right behind first base. It’s not quite just like being there, but it’s a lot like being there. It’s more realistic than seems possible. You choose where to direct your gaze: at the batter at home plate, at the pitcher, or out in the outfield. Or above the outfielders, at the scoreboard. For the NBA game, the perspective was courtside, right behind the basket. But better than the actual courtside perspective, because the perspective was slightly elevated above seating level. Fully immersive, fully three-dimensional, and seemingly perfectly to scale. Kevin Durant looked about 6'10", right in front of me. Getting the scale just right is obviously the correct way to present this, but it seems devilishly tricky to actually pull off. Apple has pulled it off. These baseball and basketball scenes were shot by Apple using entirely custom camera rigs, and stored in altogether new file formats. This is nothing at all like 2D footage extrapolated into 3D, or just painted on a virtual circular wall around you. It looks real. It seems as profoundly different from watching regular TV telecasts of sports as TV telecasts are from audio-only radio broadcasts.2 It was incredible. I would genuinely consider buying a Vision Pro if the one and only thing it did was show entire sporting events like this.
The first two types of experiences, doing computer “work” and watching 2D and 2D-ish 3D content, have analogs to existing experiences. Using Safari in VisionOS is like using Safari on a Mac or iPad, but with a different presentation. Watching a movie in VisionOS is just watching a movie, albeit with the completely convincing illusion that you’re looking at an enormous room-filling cinema screen. But the third type, these original immersive experiences, has no analog except the real world. It’s extraordinary, and only possible because Apple has gotten so many little things so exactly right.
I walked away from my demo more than a bit discombobulated. Not because it was disorienting or even the least bit nauseating, but because it was so unlike anything I’ve ever experienced. It strikes me that in some ways Bob Iger’s cameo during the keynote had the relationship between Apple and Disney backwards. Iger spent his keynote cameo talking about Disney creating original new content for VisionOS’s new medium. But after experiencing it, it felt more like what Disney should want is Apple providing this technology for Disney to use in their theme parks. The sports and dinosaur demos I experienced using Vision Pro were in many ways more immersive and thrilling than tentpole major attractions I’ve experienced at Disney World.
Are you going to want to buy a Vision Pro for $3,500? That price is high enough that the answer is probably not, for most of you, no matter how compelling it is. But are you going to want to try one out for an hour or two, and find yourself craving another hour or two? I guarantee it. You need to see it.3
1. I mean no disrespect to WatchOS or even tvOS by leaving them off that list. But what I mean here by “major platform” is a platform that could serve as a serious user’s primary or sole computing device. Something you use for any combination of work, entertainment, and content creation. Apple Watch isn’t that, but of course isn’t meant to be that. Apple Watch is sort of a hybrid between a platform (like iPhone) and a peripheral (like AirPods).
2. These sports examples were so exciting, so viscerally compelling, that it makes me wonder whether Apple demoed this, or even described it, in confidence, with NFL executives while negotiating for the rights to Sunday Ticket that the NFL wound up selling to Google for YouTube TV. It suddenly strikes me as a colossal mistake on the NFL’s part that they might be the last major sport to provide their games through this technology that is, for now, exclusive to Apple.
3. And hear it. I neglected to mention it until this footnote but the audio experience is quite remarkable as well. Immersive and spatial, yes, but also: Apple isn’t kidding about the way that the audio dynamically adjusts to the acoustics of your real-world surroundings.