Niranjan
Essay · IV · Jan 2022 · 4 min read

A bet against test code

Why e2e test codebases rot, and the founding bet behind Mockingjay.

Before Mockingjay, I worked at a fintech. .NET on the product side, some Ember.js too. The tests were Java, with Selenium. I didn't work in that repo much, but everyone knew it as the thing nobody wanted to own. Someone told me once that the test codebase was bigger than the app, and I assumed they were exaggerating. When I actually checked, it was about twice the size.

The number I remember most clearly is coverage. It started somewhere around 70% and was at 40% by the time I left. It slid a little every quarter. Not because people got lazy, though you could tell the story that way if you wanted to. It slid because everything else always mattered more, and the tests got whatever a team could scrape together on a Friday afternoon.

Our Selenium dependency was pinned two majors behind what the rest of the world was on. Upgrading would have broken half the suite. Upgrading was also impossible to prioritise, because nobody ships a release because of a Selenium upgrade, so the upgrade never happened and the tests got flakier. When a test flaked, fixing it was maybe half a day if you were lucky. More often it got quietly moved to "we'll just check that manually from now on." Nobody decided that in a meeting. It was just what happened. Once enough flows had gone that way, the automation engineers were doing manual QA for more than half their week. What the team actually had was a coverage number nobody trusted, and a repository nobody wanted to touch.

I don't think this was unusual. Almost every e2e codebase I've heard about is somewhere in this state, past a certain age. It's what test code does when nobody's looking after it, which is basically always. Nobody picks up a backlog ticket to refactor the tests, because nobody's job gets easier when the tests get cleaner. They only ever cost time.

So that's the problem. Mockingjay's pitch is that you don't have that code in the first place, so there is nothing to generate and nothing to maintain.

Before getting to the bet, I should be honest about what that trade costs. A well-factored e2e codebase, with a team that actually cares about it, is a better product than whatever we ship. If you know your selectors and you know your page objects, a tweak is one line. Doing that same tweak through a GUI is clicks. For complex flows you are going to feel the lack of a programming language, and probably curse us for it. SaaS lock-in is real. None of that is in dispute.

What I'd push back on is the idea that the well-factored version is the thing most teams actually live with. It is a ceiling, and the economics of test code don't let most teams get anywhere near it. Most teams live on the floor. The floor is the 40% coverage and the two-majors-behind dependency. The pitch is about the floor.

No-code e2e has been tried before. Selenium IDE was the obvious one, and there was TestComplete, plus various startups that didn't make it. They all fell into the same trap. The recorder captured user actions, and the tool exported Java or Python. The output of the tool was a codebase, which means the output rotted the same way the hand-written version rotted. It was a generator sitting on top of the same problem.

Mockingjay does not export code. The recorder captures what a user does as a sequence of intents (for example, "Enter xyz in the Password textbox" and "Click on the Login Button"), and at replay time our runtime is responsible for figuring out how to actually carry them out on whatever version of the browser a customer is testing against. If we are right, a test recorded today will still work when the site's markup drifts, because it was never bound to a locator string or a particular browser API in the first place.
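To make the distinction concrete, here is a minimal sketch of what "bound to an intent, not a locator" could look like. Everything in it is hypothetical: `Element`, `Intent`, `role_of`, `resolve`, and the matching rule (role plus visible label) are illustrative assumptions, not Mockingjay's design, and the page is modelled as a plain list rather than a real browser DOM.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model for illustration only; not Mockingjay's actual runtime.

@dataclass
class Element:
    """A simplified stand-in for a DOM element seen at replay time."""
    tag: str
    attrs: dict
    label: str  # visible or accessible label, e.g. "Password"

@dataclass
class Intent:
    """A recorded step, stored as what the user meant rather than a selector."""
    action: str                  # "enter" or "click"
    target_label: str            # e.g. "Password"
    target_role: str             # e.g. "textbox" or "button"
    value: Optional[str] = None  # text to type, for "enter" steps

def role_of(el: Element) -> str:
    """Derive a coarse role from the tag and type, ignoring ids and classes."""
    if el.tag == "button":
        return "button"
    if el.tag == "input" and el.attrs.get("type") in ("submit", "button"):
        return "button"
    if el.tag in ("input", "textarea"):
        return "textbox"
    return el.tag

def resolve(intent: Intent, page: list) -> Element:
    """Find the single element on the page that matches the intent."""
    wanted = intent.target_label.strip().lower()
    matches = [
        el for el in page
        if role_of(el) == intent.target_role and el.label.strip().lower() == wanted
    ]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} candidates for {intent!r}")
    return matches[0]

def find_by_id(page: list, element_id: str) -> Optional[Element]:
    """The locator-bound alternative: look an element up by its id attribute."""
    for el in page:
        if el.attrs.get("id") == element_id:
            return el
    return None

# The same login form before and after a markup change.
page_v1 = [
    Element("input", {"id": "pwd", "type": "password"}, "Password"),
    Element("button", {"id": "login-btn"}, "Login"),
]
page_v2 = [
    Element("input", {"data-testid": "password-field", "type": "password"}, "Password"),
    Element("input", {"type": "submit", "class": "btn primary"}, "Login"),
]

steps = [
    Intent("enter", "Password", "textbox", value="xyz"),
    Intent("click", "Login", "button"),
]
```

In this toy version, `resolve(steps[0], page_v2)` still finds the password field after the ids have changed, while `find_by_id(page_v2, "pwd")` returns `None`. A real runtime would need far more than an exact label match, and the sketch says nothing about how ambiguous or renamed labels would be handled.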

That is the bet. Browsers and the tooling around them have moved enough that this is tractable in a way it wasn't when those earlier attempts were built.

It's day one. The product does not exist yet. I will probably re-read this in a year and wince. Anyway, we're building it.

Thanks for reading. Questions, disagreements, or corrections are welcome.