Project · AI Tooling
A multi-bot pipeline that finds the UX issues automated testing misses. It runs on every test pass, deduplicates against known bugs, and files verified issues automatically.
When the offshore manual testing team was replaced by automated test passes, the coverage gap wasn't obvious at first. Automated tests are written to verify behavior — they pass as long as the app does what it's programmed to do. They don't notice when something looks off: a misaligned UI element, a flow that's technically correct but confusing, a rendering issue that only appears in certain states.
Real users notice these things. There was no systematic way to catch them without reintroducing manual review, which was exactly what automation was supposed to eliminate. The question became: how do you get the coverage of a manual tester without the manual tester?
The biggest risk was noise. A pipeline that files inaccurate or duplicate bugs erodes engineering trust fast, and a system engineers learn to ignore is worse than no system at all. Getting it right required iteration: bugs had to be reproduced in a simulator before filing, not just theorized; correct details (description, build info, reproduction steps) had to be consistently attached; and the pipeline had to coexist with existing automated workflows without disruption. That calibration happened through review cycles with engineering and cross-functional partners, adjusting the detection logic until accuracy was high enough to trust.
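The filing requirements described above can be expressed as a small pre-filing check. This is a minimal sketch rather than the pipeline's actual code: the `BugReport` shape, its field names, and `missing_requirements` are hypothetical, chosen only to mirror the details listed in the text (description, build info, reproduction steps, and a confirmed simulator reproduction).

```python
from dataclasses import dataclass, field


@dataclass
class BugReport:
    # Hypothetical report shape; field names are illustrative assumptions.
    description: str = ""
    build_info: str = ""
    repro_steps: list[str] = field(default_factory=list)
    reproduced_in_simulator: bool = False


def missing_requirements(report: BugReport) -> list[str]:
    """Return the reasons a report is not ready to file (empty means ready)."""
    problems = []
    if not report.description.strip():
        problems.append("description")
    if not report.build_info.strip():
        problems.append("build_info")
    if not report.repro_steps:
        problems.append("repro_steps")
    if not report.reproduced_in_simulator:
        # A theorized bug is not enough; it must have been reproduced.
        problems.append("not reproduced in simulator")
    return problems
```

A report would only move on to filing when `missing_requirements(report)` returns an empty list. Returning the reasons, rather than a bare boolean, makes it easier to see which requirement is failing most often during tuning.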
Decision 01
Bot-per-concern, not one monolith
Each stage of the pipeline runs as a separate bot: ingestion, analysis, dedup, filing. This means any stage can be updated, retrained, or replaced independently without rebuilding the whole pipeline. When the analysis bot needed refinement, the filing and dedup logic stayed untouched.
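One way to picture the stage independence is a list of interchangeable stage functions. The sketch below is an in-process simplification under stated assumptions: the real bots presumably run as separate services, and `Candidate`, `run_pipeline`, and the stage functions are hypothetical names that only record which stage ran.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Candidate:
    # Hypothetical unit of work passed between stages.
    summary: str
    trail: list[str] = field(default_factory=list)


# A stage returns the (possibly enriched) candidate, or None to drop it.
Stage = Callable[[Candidate], Optional[Candidate]]


def ingest(c: Candidate) -> Optional[Candidate]:
    c.trail.append("ingest")
    return c


def analyze(c: Candidate) -> Optional[Candidate]:
    c.trail.append("analyze")
    return c


def dedup(c: Candidate) -> Optional[Candidate]:
    c.trail.append("dedup")
    return c


def file_issue(c: Candidate) -> Optional[Candidate]:
    c.trail.append("file")
    return c


def run_pipeline(candidate: Candidate, stages: list[Stage]) -> Optional[Candidate]:
    """Pass the candidate through each stage, stopping if a stage drops it."""
    for stage in stages:
        result = stage(candidate)
        if result is None:
            return None
        candidate = result
    return candidate


def analyze_v2(c: Candidate) -> Optional[Candidate]:
    # Stand-in for a refined analysis stage.
    c.trail.append("analyze_v2")
    return c


default_stages: list[Stage] = [ingest, analyze, dedup, file_issue]
# Swap only the analysis stage; the other stages are reused unchanged.
refined_stages: list[Stage] = [analyze_v2 if s is analyze else s for s in default_stages]
```

Because every stage shares the same signature, replacing the analysis stage is a one-line change to the stage list, and the dedup and filing code is never touched.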
Decision 02
Dedup before testing, not after
The dedup check runs before simulator testing, not after. Running a simulator test only to find out the issue is already filed wastes compute and time. Checking the bug database first means only net-new issues ever reach the testing stage.
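The ordering can be sketched as a cheap lookup that guards the expensive step. This is an illustration under simplifying assumptions, not the pipeline's actual matching logic: duplicates are detected here by comparing normalized summary strings, `known_summaries` stands in for a query against the bug database, and `reproduce` is a hypothetical callable representing the simulator run.

```python
from typing import Callable, Iterable


def normalize(summary: str) -> str:
    """Lowercase and collapse whitespace so trivially different summaries match."""
    return " ".join(summary.lower().split())


def triage(
    candidates: Iterable[str],
    known_summaries: Iterable[str],
    reproduce: Callable[[str], bool],
) -> list[str]:
    """Return the net-new candidates that reproduced in the simulator."""
    known = {normalize(s) for s in known_summaries}
    to_file = []
    for summary in candidates:
        key = normalize(summary)
        if key in known:
            continue  # Already known: skip the expensive simulator run.
        known.add(key)  # Also collapses repeats within the same batch.
        if reproduce(summary):
            to_file.append(summary)
    return to_file
```

With this ordering, the number of simulator runs is bounded by the number of net-new candidates rather than by the total number of candidates the analysis stage produces.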
Decision 03
Auto-file with a human review gate
Issues are filed automatically but reviewed before being assigned. This keeps the output trustworthy: a fully autonomous pipeline that silently files noise is worse than no pipeline. The review gate stays until confidence is high enough to remove it for specific issue types.
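A review gate that can later be lifted per issue type might look like the following sketch. The status names, the `FiledIssue` shape, and the `trusted_types` set are all hypothetical; the source only states that issues are reviewed before assignment and that the gate may be removed for specific issue types once confidence is high enough.

```python
from dataclasses import dataclass

NEEDS_REVIEW = "needs_review"
READY_TO_ASSIGN = "ready_to_assign"


@dataclass
class FiledIssue:
    # Hypothetical record for an auto-filed issue.
    title: str
    issue_type: str
    status: str = NEEDS_REVIEW


def file_with_gate(
    title: str,
    issue_type: str,
    trusted_types: frozenset[str] = frozenset(),
) -> FiledIssue:
    """File an issue; it skips review only if its type has been marked trusted."""
    status = READY_TO_ASSIGN if issue_type in trusted_types else NEEDS_REVIEW
    return FiledIssue(title=title, issue_type=issue_type, status=status)


def approve(issue: FiledIssue) -> FiledIssue:
    """Record a human reviewer's approval, making the issue assignable."""
    issue.status = READY_TO_ASSIGN
    return issue
```

Starting with an empty `trusted_types` set keeps every issue behind the gate by default; adding a type to the set is then an explicit, reversible decision instead of a code change.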
The pipeline started with one app and a few features. It now covers two apps with a full feature suite. It has been running for around two months, long enough to validate the approach and start tuning accuracy.