Project · AI Tooling

UX Gap Detection

A multi-bot pipeline that finds the UX issues automated testing misses — running continuously on every test pass, deduplicating against known bugs, and filing verified issues automatically.

QA · AI Tooling · Enterprise

Role: Senior QA Analyst

Started: Mar 2026

Status: Active · 2 apps

Stack: Python, Anthropic API, GitHub Actions

Pipeline: Auto runs → AI analysis → Dedup filter → Simulator verify → Auto filed

The problem

When the offshore manual testing team was replaced by automated test passes, the coverage gap wasn't obvious at first. Automated tests are written to verify behavior — they pass as long as the app does what it's programmed to do. They don't notice when something looks off: a misaligned UI element, a flow that's technically correct but confusing, a rendering issue that only appears in certain states.

Real users notice these things. There was no systematic way to catch them without reintroducing manual review, which was exactly what automation was supposed to eliminate. The question became: how do you get the coverage of a manual tester without the manual tester?

The challenge

The biggest risk was noise. A pipeline that files inaccurate or duplicate bugs erodes engineering trust fast, and a system engineers learn to ignore is worse than no system at all. Getting it right required iteration: bugs had to be reproduced in a simulator before filing, not just theorized; correct details (description, build info, reproduction steps) had to be consistently attached; and the pipeline had to coexist with existing automated workflows without disruption. That calibration happened through review cycles with engineering and cross-functional partners, adjusting the detection logic until accuracy was high enough to trust.

Key decisions

Decision 01

Bot-per-concern, not one monolith

Each stage of the pipeline runs as a separate bot — ingestion, analysis, dedup, filing. This means any stage can be updated, retrained, or replaced independently without rebuilding the whole pipeline. When the analysis bot needed refinement, the filing and dedup logic stayed untouched.
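The bot-per-concern idea can be illustrated with a minimal sketch. It assumes each bot is a plain function with one shared signature; the `Candidate` shape, the stage names, and the keyword-based "analysis" are hypothetical stand-ins, not the production bots.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """A potential UX issue handed from one bot to the next (hypothetical shape)."""
    title: str
    app: str


# Every stage shares one signature, so any stage can be swapped on its own.
Stage = Callable[[List[Candidate]], List[Candidate]]


def run_pipeline(stages: Iterable[Stage], candidates: List[Candidate]) -> List[Candidate]:
    """Feed the output of each stage into the next."""
    for stage in stages:
        candidates = stage(candidates)
    return candidates


def analysis_v1(candidates: List[Candidate]) -> List[Candidate]:
    """Toy analysis: keep only overlap issues."""
    return [c for c in candidates if "overlap" in c.title.lower()]


def analysis_v2(candidates: List[Candidate]) -> List[Candidate]:
    """Refined toy analysis: also keep truncation issues."""
    keywords = ("overlap", "truncated")
    return [c for c in candidates if any(k in c.title.lower() for k in keywords)]


def dedup(candidates: List[Candidate]) -> List[Candidate]:
    """Toy dedup: drop repeated candidates, keeping the first occurrence."""
    return list(dict.fromkeys(candidates))


raw = [
    Candidate("Button overlaps keyboard", "app_a"),
    Candidate("Price label truncated", "app_b"),
    Candidate("Button overlaps keyboard", "app_a"),
]

# Swapping the analysis stage leaves the dedup stage untouched.
before = run_pipeline([analysis_v1, dedup], raw)
after = run_pipeline([analysis_v2, dedup], raw)
```

Only the first element of the stage list changes between the two runs, which is the property the decision is meant to protect.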

Decision 02

Dedup before testing, not after

The dedup check runs before simulator testing, not after. Running a simulator test only to find out the issue is already filed wastes compute and time. Checking the bug database first means only net-new issues ever reach the testing stage.
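To make the cost argument concrete, here is a small sketch that counts simulator runs under each ordering. The candidate titles and the exact-match lookup are illustrative assumptions only.

```python
def simulator_runs(candidates, known_titles, dedup_first):
    """Count how many simulator runs one batch of candidates triggers."""
    if dedup_first:
        # Known issues are dropped before anything reaches the simulator.
        to_test = [c for c in candidates if c not in known_titles]
    else:
        # Every candidate is tested; duplicates are only discovered afterwards.
        to_test = list(candidates)
    return len(to_test)


candidates = [
    "overlap on login",
    "truncated price",
    "blank cart state",
    "clipped header",
    "stuck spinner",
]
known = {"overlap on login", "truncated price", "clipped header"}

runs_dedup_first = simulator_runs(candidates, known, dedup_first=True)
runs_dedup_last = simulator_runs(candidates, known, dedup_first=False)
```

With three of the five candidates already filed, checking first triggers 2 simulator runs instead of 5.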

Decision 03

Auto-file with a human review gate

Issues are filed automatically but reviewed before being assigned. This keeps the output trustworthy — a fully autonomous pipeline that silently files noise is worse than no pipeline. The review gate stays until confidence is high enough to remove it for specific issue types.
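One way to decide when the gate can come off for an issue type is to track reviewer verdicts per type. The sketch below is an assumption about how that could work, not the current implementation; the `ReviewGate` class and both thresholds are hypothetical.

```python
from collections import defaultdict

# Hypothetical thresholds: how much review history is needed, and how
# accurate it must be, before an issue type can skip review.
MIN_REVIEWS = 20
MIN_ACCEPT_RATE = 0.95


class ReviewGate:
    """Tracks reviewer verdicts per issue type."""

    def __init__(self):
        self._accepted = defaultdict(int)
        self._total = defaultdict(int)

    def record(self, issue_type, accepted):
        """Store one reviewer verdict for a filed issue."""
        self._total[issue_type] += 1
        if accepted:
            self._accepted[issue_type] += 1

    def needs_review(self, issue_type):
        """Keep the gate closed until there is enough accurate history."""
        total = self._total.get(issue_type, 0)
        if total < MIN_REVIEWS:
            return True
        return self._accepted.get(issue_type, 0) / total < MIN_ACCEPT_RATE


gate = ReviewGate()
for _ in range(20):
    gate.record("rendering", accepted=True)
for i in range(20):
    gate.record("broken_flow", accepted=(i >= 2))  # 18 of 20 accepted
```

New or rarely seen issue types always stay behind the gate, so the default is the cautious one.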

How it works

Step 01

Ingest automated runs

Bots pull the latest automated test results (job outputs and post-run reports) across both apps.
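A minimal ingestion sketch, assuming the post-run report is JUnit-style XML (the actual report format is not stated here). The `RunResult` shape and the app label are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class RunResult:
    """One test case from a run report (hypothetical shape)."""
    app: str
    name: str
    passed: bool
    output: str


def parse_junit(xml_text, app):
    """Turn a JUnit-style report into a flat list of results."""
    root = ET.fromstring(xml_text)
    results = []
    for case in root.iter("testcase"):
        failed = case.find("failure") is not None or case.find("error") is not None
        output = (case.findtext("system-out") or "").strip()
        results.append(RunResult(app, case.get("name", ""), not failed, output))
    return results


SAMPLE = """<testsuite name="checkout">
  <testcase name="test_add_to_cart"><system-out>rendered cart in 1.2s</system-out></testcase>
  <testcase name="test_pay"><failure message="timeout"/></testcase>
</testsuite>"""

results = parse_junit(SAMPLE, app="app_a")
```

Passing cases are kept along with their output, because the gaps this pipeline looks for are the ones automation passes.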

Step 02

Identify UX gaps

An AI analysis bot reviews the run output and flags potential UX issues: rendering problems, broken flows, and functional anomalies that automation passes but users would notice.
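The analysis step can be sketched as a function that sends run output to a model and validates what comes back. The model call is injected as a plain `complete(prompt) -> str` callable so the sketch stays runnable offline; in practice it would presumably wrap an Anthropic API request. The prompt text and the JSON response shape are assumptions.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt; the production prompt is not shown on this page.
PROMPT = (
    "Review the automated test run output below. List UX issues a user would "
    "notice even though the tests passed. Respond with a JSON array of objects "
    "with 'title' and 'evidence' keys.\n\n"
)


def find_ux_gaps(run_output: str, complete: Callable[[str], str]) -> List[Dict[str, str]]:
    """Ask the model for candidate issues and keep only well-formed ones."""
    raw = complete(PROMPT + run_output)
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # Unparseable output yields no candidates rather than noise.
    if not isinstance(items, list):
        return []
    return [
        item for item in items
        if isinstance(item, dict) and item.get("title") and item.get("evidence")
    ]


def fake_complete(prompt: str) -> str:
    """Stand-in for the real model call, used only for this example."""
    return json.dumps([
        {"title": "Label truncated on checkout", "evidence": "screenshot diff at step 3"},
        {"title": "", "evidence": "missing title, dropped"},
    ])


gaps = find_ux_gaps("run output goes here", fake_complete)
```

Dropping malformed or incomplete items at this stage keeps low-quality candidates from reaching dedup and filing.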

Step 03

Dedup against existing bugs

Candidate issues are checked against the open bug database. Known issues are filtered out, and only net-new gaps proceed.

Known issue: already filed, so it is skipped. No duplicate noise.

Net new: proceeds to simulator testing.
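How a candidate is matched against open bugs is not described here. As one plausible approach, the sketch below compares normalized titles with Python's `difflib.SequenceMatcher`; the 0.85 threshold and the sample titles are assumptions.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(title.lower().split())


def is_known(candidate_title, open_bug_titles, threshold=0.85):
    """True if the candidate closely matches any open bug title."""
    cand = normalize(candidate_title)
    return any(
        SequenceMatcher(None, cand, normalize(existing)).ratio() >= threshold
        for existing in open_bug_titles
    )


def split_candidates(candidate_titles, open_bug_titles):
    """Separate candidates into known issues and net-new issues."""
    known, net_new = [], []
    for title in candidate_titles:
        (known if is_known(title, open_bug_titles) else net_new).append(title)
    return known, net_new


open_bugs = ["Login button overlaps keyboard"]
known, net_new = split_candidates(
    ["login  button overlaps KEYBOARD", "Cart total truncated on small screens"],
    open_bugs,
)
```

Title similarity alone will miss duplicates that are worded differently, so a real matcher would likely also compare affected screens or components.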

Step 04

Simulator testing

Net-new issues are reproduced in device simulators and emulators. Flows with account limitations or hardware dependencies are verified manually instead.
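The split between simulator and manual verification can be expressed as a small routing rule. The `Flow` fields below are hypothetical flags standing in for whatever metadata marks a flow as account-gated or hardware-dependent.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """A user flow tied to a candidate issue (hypothetical shape)."""
    name: str
    needs_account: bool = False
    needs_hardware: bool = False


def verification_route(flow):
    """Send a flow to manual verification when a simulator cannot cover it."""
    if flow.needs_account or flow.needs_hardware:
        return "manual"
    return "simulator"


routes = {
    flow.name: verification_route(flow)
    for flow in [
        Flow("browse catalog"),
        Flow("paid subscription upgrade", needs_account=True),
        Flow("camera scan", needs_hardware=True),
    ]
}
```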

Step 05

Auto-file & triage

Confirmed issues are filed automatically, with title, reproduction steps, and severity pre-populated. Each is filed to the correct product area on my behalf.
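A sketch of how the pre-populated issue could be assembled before it is sent to the bug tracker. The field names, severity labels, and `needs_review` status are assumptions, and the tracker call itself is deliberately left out.

```python
# Hypothetical severity labels; the tracker's real values may differ.
SEVERITIES = ("low", "medium", "high")


def build_issue(title, steps, severity, product_area, build):
    """Assemble a filing payload with the required details attached."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    if not steps:
        raise ValueError("reproduction steps are required")
    body_lines = [f"Build: {build}", "", "Steps to reproduce:"]
    body_lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return {
        "title": title,
        "body": "\n".join(body_lines),
        "severity": severity,
        "product_area": product_area,
        "status": "needs_review",  # Filed automatically, reviewed before assignment.
    }


issue = build_issue(
    title="Price label truncated on checkout",
    steps=["Open the cart", "Add an item with a long name", "Go to checkout"],
    severity="medium",
    product_area="checkout",
    build="example-build-id",
)
```

Rejecting a payload that lacks reproduction steps enforces the rule that nothing is filed without the details engineering needs.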

Step 06

Review & assign

Filed issues are reviewed and assigned for prioritization. The goal is fully autonomous triage; for now, manual review is still required for edge cases and account-gated flows.

Where it stands

Started with one app and a few features. Now covers two apps with a full feature suite. The pipeline has been running for around two months — long enough to validate the approach and start tuning accuracy.

2 apps covered

6 pipeline steps

~2 months in production
