In 2022 there were two credible options: ship a Chrome extension and have the user install it, or run the browser yourself somewhere else and hand them a window into it. Most of the competitors in the space did the first. A couple of the tools we'd demoed against shipped extensions, and so did most of the recorder startups we read about. The extension route is cheap, and the user gets something that feels like their own Chrome, because it is.
We went the other way, and the rest of this post is the argument for why that was worth it, plus the part where I admit what it cost.
Our runtime is a container that launches a real headed browser with Playwright: Chromium by default, Firefox and WebKit as options, and Edge wired in through Playwright's msedge channel. The browser runs as a non-root user in the container, and the user never touches the process directly. They see it through noVNC, streamed into the dashboard. The vanilla noVNC client handled reconnects and clipboard events in ways that didn't survive contact with a real product, so a few months in we ended up rewriting a small part of the RFB layer ourselves.
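As a minimal sketch of how a per-session browser choice might map onto Playwright launch settings: BrowserChoice, LaunchPlan, and launchPlanFor are hypothetical names, not our actual code. Playwright isn't imported so the sketch runs on its own, but the browser-type names and the headless/channel options correspond to what Playwright's launch() accepts.

```typescript
// Hypothetical: the browsers a user can pick for a recording session.
type BrowserChoice = "chromium" | "firefox" | "webkit" | "edge";

// Hypothetical plan object; browserType mirrors Playwright's exported
// browser types, and options mirrors a subset of launch() options.
interface LaunchPlan {
  browserType: "chromium" | "firefox" | "webkit";
  options: { headless: boolean; channel?: string };
}

function launchPlanFor(choice: BrowserChoice = "chromium"): LaunchPlan {
  if (choice === "edge") {
    // Edge is Chromium-based, so it goes through the chromium browser type
    // with the "msedge" channel rather than being a separate engine.
    return { browserType: "chromium", options: { headless: false, channel: "msedge" } };
  }
  // Always headed: the user watches this window over the noVNC stream.
  return { browserType: choice, options: { headless: false } };
}

// Usage (assumed worker code, not shown running here):
//   const plan = launchPlanFor("edge");
//   const browser = await chromium.launch(plan.options);
```

Routing Edge through the chromium type plus a channel, instead of treating it as a fourth engine, is what lets it be wired in with almost no extra code.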
Every page the browser loads has our event interceptor injected into it. An extension could do that too, since installing one means consenting to exactly that kind of access. What an extension can't escape is everything else in the user's Chrome on any given day: their cookies, their extensions, their autofill, their profile. An ad-blocker swallows a request the recorder needed to see. A grammar checker rewrites a text field mid-keystroke, or a password manager fires an input event the user didn't type. You can tell users to record in a fresh Chrome profile, but nobody does, so the test they just recorded is pinned to whatever their browser looked like that morning.
Clicks and inputs are easy for the interceptor. Everything else involves some amount of stitching. Iframes come back from the browser in fragments with no ancestor chain attached, and rebuilding that chain is up to you. Focus-and-input events can land as three when the user meant one. An extension has to do most of this work too, with a bit less of the browser surface to reach for. Because our interceptor runs inside the same Playwright page we already hold a handle on, we can pull a screenshot and a DOM bounding box from the same frame and attach both to the same event. Visual validations ride on that: the pixels and the DOM come from the same frame the click happened in.
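To make the stitching concrete, here is a minimal sketch of the two chores described above, under loud assumptions: the record shapes (FrameRecord, RawEvent, Step) and the functions framePath and toSteps are hypothetical, since the interceptor's real payloads aren't described in this post. It rebuilds a top-down iframe path from parent links, then folds a focus/keydown/input burst on one field into a single fill step that keeps the final value.

```typescript
// Hypothetical per-frame record: each fragment knows only its parent.
interface FrameRecord {
  frameId: string;
  parentFrameId: string | null; // null for the top-level document
  selectorInParent?: string;    // how to locate this <iframe> in its parent
}

// Hypothetical raw event as the interceptor might report it.
interface RawEvent {
  kind: "focus" | "keydown" | "input" | "click";
  frameId: string;
  selector: string;
  value?: string;
}

// Hypothetical recorded step after stitching.
interface Step {
  action: "click" | "fill";
  framePath: string[]; // iframe selectors from the top document down
  selector: string;
  value?: string;
}

// Walk parent links up to the top document, then reverse so the path reads
// top-down. The seen set guards against a malformed parent cycle.
function framePath(frameId: string, frames: Map<string, FrameRecord>): string[] {
  const path: string[] = [];
  const seen = new Set<string>();
  let current = frames.get(frameId);
  while (current !== undefined && !seen.has(current.frameId)) {
    const parentId = current.parentFrameId;
    if (parentId === null) break; // reached the top document
    seen.add(current.frameId);
    path.push(current.selectorInParent ?? "iframe");
    current = frames.get(parentId);
  }
  return path.reverse();
}

function samePath(a: string[], b: string[]): boolean {
  return a.length === b.length && a.every((s, i) => s === b[i]);
}

// Clicks pass through; focus and keydown are dropped; consecutive inputs on
// the same element in the same frame collapse into one fill step.
function toSteps(events: RawEvent[], frames: Map<string, FrameRecord>): Step[] {
  const steps: Step[] = [];
  for (const e of events) {
    const path = framePath(e.frameId, frames);
    if (e.kind === "click") {
      steps.push({ action: "click", framePath: path, selector: e.selector });
      continue;
    }
    if (e.kind !== "input") continue; // focus/keydown alone aren't steps
    const last = steps[steps.length - 1];
    if (last && last.action === "fill" && last.selector === e.selector && samePath(last.framePath, path)) {
      last.value = e.value; // keep only the field's latest value
    } else {
      steps.push({ action: "fill", framePath: path, selector: e.selector, value: e.value });
    }
  }
  return steps;
}
```

The sketch leaves out the screenshot and bounding-box capture; in Playwright those would come from calls such as page.screenshot() and locator.boundingBox() on the page the interceptor already lives in.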
Running a browser per session costs us real money, in a way an extension running on the user's laptop does not. Cold start is a budget we fight with, because a container with a Playwright browser is not instant and users notice. The stream adds latency too, since the browser sits in some region that isn't their couch. It doesn't feel exactly like your own Chrome, and anyone who's used a cloud IDE has had that feeling before. I'd be lying if I said those weren't fair things to hold against us.
The reason we took the trade anyway is that almost everything we want to build next is easier on a browser we own. Step replay and assertions, say, or the runs we'd want to do without a human watching. Those are harder when you're a guest in the user's Chrome. An extension is the cheap answer to where the browser should live, and that's enough for tools where the browser is the whole product. We weren't building one of those.
It's more to build and run than an extension would be, and we decided that was the trade we wanted.