
How we use AI to drive Playwright sessions

Ze Atalaya · Feb 20, 2026

The core of Zantiq by Nomina is an AI orchestrator that pairs an OpenAI model with Playwright. The model decides what to do. Playwright does it. Here's how the loop works.

When you kick off a test, you give it a target URL and an instruction — something like 'Sign up for the app using email, verify your phone number, complete onboarding, and report what happens.' That instruction goes to the model along with context about the tester: their email, phone number, browser profile, and any previous test history.
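To make the inputs concrete, here is a minimal sketch of how the instruction and tester context could be bundled into one message for the model. The class names, field names, and JSON layout are assumptions for illustration, not Zantiq's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TesterContext:
    # Field names are hypothetical; the post only lists the kinds of data.
    email: str
    phone_number: str
    browser_profile_id: str
    previous_runs: list[str] = field(default_factory=list)


@dataclass
class RunRequest:
    target_url: str
    instruction: str
    tester: TesterContext


def build_user_message(request: RunRequest) -> str:
    """Serialize the request as JSON text for the model's first message."""
    return json.dumps(asdict(request), indent=2)
```

Keeping the tester context in a typed structure makes it easy to log exactly what the model was told at the start of a run.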

The model generates a step-by-step plan. It is not a fixed script but an adaptive plan that adjusts based on what the model sees. Step one might be 'navigate to the signup page.' After each step, the model receives a screenshot and the page's accessibility tree, reads what is on screen, and decides the next action.
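The observe, decide, act cycle described above can be sketched as a small loop. This is an illustration under stated assumptions: `observe`, `decide`, and `act` are hypothetical callables standing in for the browser capture, the model call, and the Playwright action, and the `"done"` action kind and `max_steps` cap are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Observation:
    screenshot: bytes
    accessibility_tree: str


@dataclass
class Action:
    kind: str  # hypothetical kinds: "navigate", "click", "fill", "done"
    target: str = ""
    value: str = ""


def run_loop(
    observe: Callable[[], Observation],
    decide: Callable[[Observation, list[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 25,
) -> list[Action]:
    """Observe the page, ask the model for the next action, perform it."""
    history: list[Action] = []
    for _ in range(max_steps):
        observation = observe()                # fresh screenshot + tree
        action = decide(observation, history)  # model picks the next step
        if action.kind == "done":
            break
        act(action)                            # browser performs the step
        history.append(action)
    return history
```

In a real run, `observe` would wrap a capture call such as Playwright's `page.screenshot()` plus an accessibility snapshot, and `decide` would call the model. The step cap is a guard against a plan that never terminates.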

This is where it gets interesting. If the app shows a phone verification modal that wasn't in the plan, the model recognizes it, checks the tester's phone inbox for an SMS code, and enters the code once it arrives. If a login screen appears, it can use saved credentials attached to the tester. If something unexpected happens, such as an error banner, a redirect, or a manual challenge, it requests help or documents the blocker in the QA report.
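The SMS step is essentially a poll with a deadline. Here is a rough sketch, assuming a hypothetical `fetch_messages` callable that returns the text of recent messages in the tester's inbox. The 4 to 8 digit pattern is a heuristic chosen for the example and would also match other digit runs, so a real implementation would likely need something stricter.

```python
import re
import time
from typing import Callable, Iterable, Optional

# Heuristic: the first standalone run of 4 to 8 digits is treated as the code.
CODE_PATTERN = re.compile(r"\b(\d{4,8})\b")


def extract_code(message: str) -> Optional[str]:
    match = CODE_PATTERN.search(message)
    return match.group(1) if match else None


def wait_for_sms_code(
    fetch_messages: Callable[[], Iterable[str]],
    timeout_s: float = 60.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll the inbox until a code appears or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        for message in fetch_messages():
            code = extract_code(message)
            if code:
                return code
        if time.monotonic() >= deadline:
            return None  # caller can document the blocker in the report
        sleep(poll_interval_s)
```

Returning `None` on timeout, rather than raising, lets the caller decide whether to request help or record the blocker.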

Each step produces a record: what the model intended to do, what it actually did, a screenshot, the page state, and how long it took. These records become the test report.
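A step record like the one described maps naturally onto a small data class. The sketch below is illustrative only: the field names, the `status` field, and the report layout are assumptions rather than the real Zantiq format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class StepRecord:
    intended: str         # what the model intended to do
    performed: str        # what it actually did
    screenshot_path: str  # where the screenshot was saved
    page_state: str       # e.g. a serialized accessibility tree
    duration_ms: int      # how long the step took
    status: str = "passed"  # assumed field: "passed" or "failed"


def build_report(records: list[StepRecord]) -> str:
    """Fold the per-step records into one JSON report."""
    return json.dumps(
        {
            "steps": [asdict(r) for r in records],
            "total_duration_ms": sum(r.duration_ms for r in records),
            # 1-based step numbers, to match how people read a report
            "failed_steps": [
                i for i, r in enumerate(records, start=1) if r.status == "failed"
            ],
        },
        indent=2,
    )
```

Storing intent and outcome side by side is what makes a report useful for debugging: a mismatch between the two usually points at the interesting step.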

The prompt engineering matters a lot. We've iterated on the system prompt extensively to handle edge cases: apps that use iframes for auth, multi-step wizards, and dynamic form validation that blocks submission. The model needs specific instructions about retry logic, timeout handling, and when to mark a step as failed versus when to try an alternative approach.
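The post describes these rules as instructions in the system prompt. Written out as plain code for clarity, one possible version of the same policy looks like the sketch below. The `StepTimeout` exception, the retry count, and the status strings are all assumptions made for the example.

```python
from typing import Callable, Sequence


class StepTimeout(Exception):
    """Hypothetical error raised when an attempt runs out of time."""


def run_step(
    approaches: Sequence[Callable[[], None]],
    retries_per_approach: int = 2,
) -> str:
    """Try each approach in order; retry timeouts, then fall through."""
    for index, attempt in enumerate(approaches):
        for _ in range(retries_per_approach):
            try:
                attempt()
                return "passed" if index == 0 else "passed_with_alternative"
            except StepTimeout:
                continue  # treated as transient: retry the same approach
            except Exception:
                break     # treated as non-transient: try the next approach
    return "failed"       # every approach was exhausted
```

The distinction that matters is between a timeout, which is worth retrying as-is, and any other error, where repeating the same action is unlikely to help.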

We use Playwright's browser contexts with GoLogin profiles for persistent browser state. Each tester gets its own browser context with a consistent fingerprint configuration — canvas hash, WebGL renderer, timezone, language, and screen resolution. The goal is not to promise that every anti-bot system will allow every run; the goal is to avoid throwing away useful QA context between sessions.
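As a rough illustration, a tester's fingerprint configuration could be kept in one immutable record, with the parts Playwright understands mapped onto context options. The names here are hypothetical. Canvas hash and WebGL renderer are not standard Playwright context options, so this sketch assumes the GoLogin profile handles them and only carries them along as data.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FingerprintConfig:
    canvas_hash: str      # assumed to be applied by the GoLogin profile
    webgl_renderer: str   # assumed to be applied by the GoLogin profile
    timezone: str         # IANA name, e.g. "Europe/Lisbon"
    language: str         # locale tag, e.g. "pt-PT"
    screen_width: int
    screen_height: int


def context_options(fp: FingerprintConfig, storage_state_path: str) -> dict[str, Any]:
    """Build keyword arguments for a Playwright browser context."""
    return {
        "locale": fp.language,
        "timezone_id": fp.timezone,
        "viewport": {"width": fp.screen_width, "height": fp.screen_height},
        # Saved cookies and local storage, so state survives between sessions
        "storage_state": storage_state_path,
    }
```

With Playwright for Python, the result could be passed as `browser.new_context(**context_options(fp, path))`, provided the storage state file already exists. Freezing the record keeps a tester's fingerprint from drifting mid-run.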

Simple flows can finish quickly; complex onboarding, external verification, and multi-user flows take longer. We're working on parallelism — running multiple testers through different flows simultaneously. That's coming in the group testing update.
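One plausible shape for that parallelism is a bounded pool of concurrent flows. This is a speculative sketch, not the planned design: `run_group`, the `max_parallel` limit, and the error string format are all invented for the example.

```python
import asyncio
from typing import Awaitable, Callable


async def run_group(
    flows: dict[str, Callable[[], Awaitable[str]]],
    max_parallel: int = 3,
) -> dict[str, str]:
    """Run one flow per tester, with at most max_parallel at a time."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(name: str, flow: Callable[[], Awaitable[str]]) -> tuple[str, str]:
        async with semaphore:
            try:
                return name, await flow()
            except Exception as exc:
                # One tester's failure should not cancel the others
                return name, f"error: {exc}"

    pairs = await asyncio.gather(*(guarded(n, f) for n, f in flows.items()))
    return dict(pairs)
```

The semaphore caps how many browsers are open at once, and catching errors per flow keeps one blocked tester from taking down the whole group.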
