Validation by Description: How We Made Regex Disappear

Walk into any major engineering firm and ask the question: who in this office can write a regular expression from memory?

You'll get one or two hands. Maybe a senior systems engineer. Maybe the developer who maintains the document control sheet. Almost certainly not the project manager who actually knows what a valid asset ID looks like, or the commissioning lead who can recite the equipment numbering convention from a dozen recent jobs.

This is the quiet failure at the heart of most data-validation systems. The people who understand the rules are not the people who can write the rules. So validation either gets delegated to a small group of technical specialists — slow, expensive, a bottleneck — or it doesn't get done at all, and bad data flows downstream until somebody has to clean it up six months later.

We've spent the last several weeks building a validation register inside the Oestler platform. Today we shipped the part that we think actually changes the economics of the problem: you describe the rule in plain English, and Janus AI writes the regex for you.

The Cost of Regex as an Authoring Surface

Regular expressions are extraordinary technology. A single short string can encode a precise pattern that would take pages of conditional logic to express otherwise. Every engineer who has ever needed to validate, parse, or extract structured text has eventually been forced to learn at least the basics.

But the syntax was designed for compilers, not humans. Anchors, character classes, quantifiers, lookaheads, escaping rules, capture groups: each is a small puzzle that has to be solved correctly, in sequence, with no feedback until the final string runs against a test case. Get one character wrong and the whole expression silently rejects the data you wanted to accept, or accepts the data you wanted to reject.
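Both failure directions show up in a small TypeScript illustration built around a hypothetical asset-ID rule. Dropping the anchors lets an invalid value through. Losing a single backslash, which is exactly what happens when "\d" is written inside an ordinary string literal, makes the pattern reject a valid one.

```typescript
// Hypothetical rule: "AST-" followed by 3 to 6 digits.
const correct = new RegExp("^AST-\\d{3,6}$");

// Missing anchors: the pattern may match anywhere inside the value.
const noAnchors = new RegExp("AST-\\d{3,6}");

// Lost backslash: "\d" inside a plain string literal collapses to "d",
// so this pattern means "AST-" followed by 3 to 6 letter d's.
const lostEscape = new RegExp("^AST-d{3,6}$");

const results = {
  correctAcceptsValid: correct.test("AST-1234"), // true
  correctRejectsInvalid: correct.test("XAST-1234567"), // false
  noAnchorsAcceptsInvalid: noAnchors.test("XAST-1234567"), // true: wrongly accepted
  lostEscapeRejectsValid: lostEscape.test("AST-1234"), // false: wrongly rejected
};

console.log(results);
```

Neither broken pattern throws an error when compiled, which is why these mistakes tend to surface only when real data hits them.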

Regex is one of the most powerful tools in software engineering and one of the worst user experiences ever shipped to a non-developer audience.

For an enterprise platform, this matters more than it might in a consumer product. The cost of a single mis-authored validation rule isn't a minor inconvenience — it's an entire site team locked out of saving their shift reports for an afternoon while someone hunts for the engineer who originally wrote the rule and can debug it.

Describe What Valid Looks Like

Open the validation register, click Create rule, and the first thing you'll see in the dialog is a small composer with a wand icon and a single instruction: "Describe the rule and let Janus AI write the regex."

Type a sentence. Anything that captures what a valid value should look like:

"Asset ID must start with AST- followed by between 3 and 6 digits."
"Australian mobile phone number, optional +61 prefix."
"ISO date in YYYY-MM-DD format."
"Inspection report number: two letters, dash, four digits, dash, year."
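For illustration, here are hand-written patterns that each of those descriptions could plausibly map to, each checked against one valid and one invalid value. They are not actual Janus output, and they rest on assumptions the descriptions leave open: the phone pattern accepts no spaces, the date pattern checks shape only (so 2024-13-45 would pass), and the report year is assumed to be four digits.

```typescript
// Hand-written illustrations, not Janus output. Assumptions are noted per entry.
interface Example {
  description: string;
  pattern: string; // body for new RegExp(...): no slashes, no flags
  valid: string;
  invalid: string;
}

const examples: Example[] = [
  {
    description: "Asset ID must start with AST- followed by between 3 and 6 digits.",
    pattern: "^AST-\\d{3,6}$",
    valid: "AST-4821",
    invalid: "AST-12",
  },
  {
    description: "Australian mobile phone number, optional +61 prefix.",
    // Assumes no spaces: either 04xxxxxxxx or +614xxxxxxxx.
    pattern: "^(?:\\+61|0)4\\d{8}$",
    valid: "+61412345678",
    invalid: "0212345678",
  },
  {
    description: "ISO date in YYYY-MM-DD format.",
    // Shape only: does not check that the month or day actually exists.
    pattern: "^\\d{4}-\\d{2}-\\d{2}$",
    valid: "2024-03-15",
    invalid: "15/03/2024",
  },
  {
    description: "Inspection report number: two letters, dash, four digits, dash, year.",
    // Assumes uppercase letters and a four-digit year.
    pattern: "^[A-Z]{2}-\\d{4}-\\d{4}$",
    valid: "IR-0042-2024",
    invalid: "IR-42-2024",
  },
];

for (const ex of examples) {
  const regex = new RegExp(ex.pattern);
  console.log(ex.pattern, regex.test(ex.valid), regex.test(ex.invalid));
}
```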

Click Generate regex. A few seconds later, four things appear:

The regex pattern, autofilled into the form field, ready to edit if you want to.
A draft error message for end users, also autofilled — only if you haven't already written one yourself.
A valid sample value, automatically tested against the new pattern, with a green tick if it passes.
An invalid sample value, automatically tested, with a green tick if it is correctly rejected.

The two sample tests are the part that quietly does most of the work. Generative models are very good at producing plausible regex, but they are not perfect. By having the model also produce its own examples, and then running those examples against the pattern in the browser before showing them to you, we turn an opaque AI output into a self-checking artefact. If both ticks are green, the pattern accepts at least one value it should and rejects at least one it shouldn't, which is good evidence that it does what you described. If either is red, you know to look more carefully.
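A minimal sketch of that flow in TypeScript. The RuleDraft, RuleForm, and SampleCheck shapes and the applyDraft function are hypothetical names for illustration, not Oestler's code. The sketch compiles the pattern, fills the message only when the user has left it blank, and records a tick for each sample.

```typescript
// Hypothetical shapes: field names are illustrative, not Oestler's schema.
interface RuleDraft {
  pattern: string;
  message: string;
  validSample: string;
  invalidSample: string;
}

interface RuleForm {
  pattern: string;
  message: string;
}

interface SampleCheck {
  value: string;
  passed: boolean; // rendered as a green tick when true
}

function applyDraft(form: RuleForm, draft: RuleDraft) {
  // Throws a SyntaxError if the pattern does not compile, so nothing is shown.
  const regex = new RegExp(draft.pattern);

  const next: RuleForm = {
    pattern: draft.pattern,
    // Only autofill the message if the user has not already written one.
    message: form.message.trim() === "" ? draft.message : form.message,
  };

  // The valid sample passes when the pattern accepts it...
  const validCheck: SampleCheck = {
    value: draft.validSample,
    passed: regex.test(draft.validSample),
  };
  // ...and the invalid sample passes when the pattern correctly rejects it.
  const invalidCheck: SampleCheck = {
    value: draft.invalidSample,
    passed: !regex.test(draft.invalidSample),
  };

  return { form: next, validCheck, invalidCheck };
}
```

The pattern is compiled without the g flag, so repeated test calls are stateless. With g or y, RegExp.prototype.test advances lastIndex between calls and can return alternating results for the same input.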

How It Works Underneath

The composer sends a single, tightly constrained prompt to Janus. The prompt instructs the model to respond with a structured JSON object containing exactly four fields: the pattern, a user-facing message, a valid sample, and an invalid sample. It is given two short worked examples, phone numbers and ISO dates, to anchor the format. It is also told explicitly that the pattern body must be a valid argument to JavaScript's new RegExp(...): no surrounding slashes, no flags, and proper escaping.
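A sketch of what such a prompt and a shape check on the reply might look like. The prompt wording and the field names (pattern, message, validSample, invalidSample) are assumptions; the post describes the prompt's constraints, not its text. The worked example also shows why escaping is easy to get wrong: the regex \d must appear as \\d inside JSON, and as \\\\d inside a TypeScript string literal that contains that JSON.

```typescript
// Hypothetical field names: the post says there are four fields, not what they are called.
const FIELDS = ["pattern", "message", "validSample", "invalidSample"] as const;

type RuleDraft = { [K in (typeof FIELDS)[number]]: string };

// Illustrative prompt wording only; it encodes the constraints the post lists.
function buildPrompt(description: string): string {
  return [
    "Write a regular expression for the validation rule below.",
    `Respond with a single JSON object with exactly these string fields: ${FIELDS.join(", ")}.`,
    "The pattern must be a valid argument to JavaScript's new RegExp(...):",
    "no surrounding slashes, no flags, and proper escaping.",
    // One worked example (the post mentions two). Inside JSON the regex \d is written \\d.
    'Example: {"pattern": "^\\\\d{4}-\\\\d{2}-\\\\d{2}$", "message": "Use the format YYYY-MM-DD.", "validSample": "2024-03-15", "invalidSample": "15/03/2024"}',
    `Rule: ${description}`,
  ].join("\n");
}

// Accept only a plain object with exactly the four expected string fields.
function isRuleDraft(value: unknown): value is RuleDraft {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const record = value as Record<string, unknown>;
  return (
    Object.keys(record).length === FIELDS.length &&
    FIELDS.every((field) => typeof record[field] === "string")
  );
}

// Heuristic: flag a regex *literal* such as "/^abc$/i" where a bare body was requested.
function looksLikeRegexLiteral(pattern: string): boolean {
  return /^\/[\s\S]*\/[a-z]*$/.test(pattern);
}
```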

Behind the scenes the request is raced across two Google models, Gemini Flash Lite and Gemma, and whichever returns first wins. The response is unwrapped from any markdown code fences the model might have added and parsed as JSON; if that parse fails, we defensively re-extract the first balanced JSON object from the text. The pattern is then compiled with new RegExp in the browser, and only if it compiles do we show it to you.
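The unwrap, parse, and fallback steps might look something like the sketch below; the function names are hypothetical. The balanced-object scan tracks JSON string literals and escapes, which matters here: a pattern such as \{ or [}] puts an unbalanced brace inside a JSON string, and a naive brace counter would end the object in the wrong place.

```typescript
// Remove one surrounding markdown fence (three backticks, optional language tag).
function stripFences(text: string): string {
  const trimmed = text.trim();
  const match = trimmed.match(/^`{3}[a-zA-Z]*\s*\n?([\s\S]*?)\n?\s*`{3}$/);
  const inner = match?.[1];
  return inner !== undefined ? inner : trimmed;
}

// Return the first brace-balanced {...} span, ignoring braces inside JSON strings.
function firstBalancedObject(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // unbalanced: no complete object found
}

// Parse strictly first, then fall back to extracting an embedded object.
function parseModelResponse(raw: string): unknown {
  const body = stripFences(raw);
  try {
    return JSON.parse(body);
  } catch {
    const candidate = firstBalancedObject(body);
    if (candidate === null) throw new Error("No JSON object found in model response");
    return JSON.parse(candidate); // may still throw if the object itself is malformed
  }
}

// Compile the pattern body; null means "do not show this to the user".
function compilePattern(pattern: string): RegExp | null {
  try {
    return new RegExp(pattern);
  } catch {
    return null;
  }
}
```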

None of this is exotic. What's interesting is the layering: an LLM produces the candidate regex, the model also produces its own test cases, and the browser executes both before any human sees the result. The AI is treated as a fast first-pass author, not as the source of truth.
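The race described above can be sketched as follows, assuming each model call is wrapped as a function that returns a promise of the raw response text (ModelCall, firstSuccess, and raceModels are hypothetical names). The post says whichever model returns first wins; this sketch reads that as the first *successful* response, in the style of Promise.any, so a fast error from one model does not end the race. Promise.race would instead settle on whichever call finishes first, success or failure.

```typescript
// Hypothetical signature for one model call; the post does not show the real client.
type ModelCall = (prompt: string) => Promise<string>;

// Resolve with the first promise to fulfil; reject only if every one rejects.
// Mirrors Promise.any, written out so it does not depend on the ES2021 lib.
function firstSuccess<T>(tasks: Promise<T>[]): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    if (tasks.length === 0) {
      reject(new Error("No model calls to race"));
      return;
    }
    let failures = 0;
    const errors: unknown[] = [];
    tasks.forEach((task, index) => {
      // Later resolve calls are no-ops once the first success has settled the promise.
      task.then(resolve, (error: unknown) => {
        errors[index] = error;
        failures += 1;
        if (failures === tasks.length) {
          reject(new Error(`All ${tasks.length} model calls failed: ${errors.map(String).join("; ")}`));
        }
      });
    });
  });
}

// Send the same prompt to every model and keep the first good answer.
function raceModels(prompt: string, models: ModelCall[]): Promise<string> {
  return firstSuccess(models.map((call) => call(prompt)));
}
```

The slower call is left running in this sketch; a production client would probably cancel it, for example with an AbortController, to avoid paying for a response nobody reads.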

Why a Single Rule Type Is Now Enough

This change is also why we've been comfortable consolidating string validation around a single primitive. Earlier drafts of the validation register included separate rule types for length checks, range checks, enumerated values, prefix checks, and so on — each with its own form, its own UI, its own edge cases.

Every one of those collapses into a regex. "Between 5 and 12 characters" is a regex. "Must end in .pdf" is a regex. "One of HV, MV, or LV" is a regex. The reason they normally aren't expressed as regex is the authoring cost — and that's the cost we just removed.
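The collapse can be sketched with a few hypothetical helpers that turn the older rule types into regex bodies. This is illustrative only; in the product the pattern comes from Janus rather than from helpers like these. Literal text is escaped so that a dot in ".pdf" means a dot, not "any character".

```typescript
// Escape characters that have special meaning in a regex body.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// "Between 5 and 12 characters" (. excludes line breaks; length is in UTF-16 code units).
const lengthBetween = (min: number, max: number): string => `^.{${min},${max}}$`;

// "Must end in .pdf"
const endsWith = (suffix: string): string => `${escapeRegExp(suffix)}$`;

// "Must start with AST-"
const startsWith = (prefix: string): string => `^${escapeRegExp(prefix)}`;

// "One of HV, MV, or LV"
const oneOf = (options: string[]): string => `^(?:${options.map(escapeRegExp).join("|")})$`;

console.log(lengthBetween(5, 12)); // ^.{5,12}$
console.log(endsWith(".pdf")); // \.pdf$
console.log(oneOf(["HV", "MV", "LV"])); // ^(?:HV|MV|LV)$
```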

The validation register now ships with two rule kinds, and only two: regex, which covers the entire surface of string-shape validation, and unique, which covers cross-asset collision detection. Two primitives. One AI authoring layer. Every other rule a customer might want is just a description away.
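A sketch of how the two kinds might be modelled as a discriminated union and checked across a set of assets. The field names, the Asset shape, and the validate function are assumptions for illustration; the post names the rule kinds but not their structure. Uniqueness is checked by grouping asset IDs by value and flagging every value held by more than one asset.

```typescript
// Hypothetical rule model: the post names the two kinds, not their fields.
type ValidationRule =
  | { kind: "regex"; field: string; pattern: string; message: string }
  | { kind: "unique"; field: string; message: string };

interface Asset {
  id: string;
  values: Record<string, string | undefined>;
}

interface Violation {
  assetId: string;
  field: string;
  message: string;
}

function validate(assets: Asset[], rules: ValidationRule[]): Violation[] {
  const violations: Violation[] = [];
  for (const rule of rules) {
    if (rule.kind === "regex") {
      // String-shape check: each value is tested on its own.
      const regex = new RegExp(rule.pattern);
      for (const asset of assets) {
        const value = asset.values[rule.field];
        if (value !== undefined && !regex.test(value)) {
          violations.push({ assetId: asset.id, field: rule.field, message: rule.message });
        }
      }
    } else {
      // Collision check: group asset IDs by value across the whole set.
      const holders = new Map<string, string[]>();
      for (const asset of assets) {
        const value = asset.values[rule.field];
        if (value === undefined) continue;
        const ids = holders.get(value) ?? [];
        ids.push(asset.id);
        holders.set(value, ids);
      }
      holders.forEach((ids) => {
        if (ids.length > 1) {
          for (const assetId of ids) {
            violations.push({ assetId, field: rule.field, message: rule.message });
          }
        }
      });
    }
  }
  return violations;
}
```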

The Wider Pattern

We think this is going to keep happening. There are dozens of features inside enterprise software whose entire user experience is dictated by the fact that the underlying primitive (a query, a transform, a permission, a regex) is too technical for the person who knows the business rule. So the feature either gets gated behind a specialist or gets watered down into a simplified form that doesn't actually solve the problem.

Generative AI is, among other things, a translation layer. It can turn the plain-English statement of a rule into the formal expression a system needs in order to enforce it. When you combine that with cheap, deterministic verification — running the AI's output against the AI's own test cases — you get something genuinely useful: a tool a non-specialist can use, with safety properties a specialist would respect.

The validation register is the first place we've shipped this pattern in Oestler. It will not be the last.

The new authoring composer is live for every project today. Open the validation register, hit Create rule, and start describing.
