Available for AI QA & automation roles

Steven Leon

AI-Focused QA Automation Engineer & Builder

I test, evaluate, and ship AI-powered products — from LLM regression suites to full-stack tools built solo.

0+ years
QA at Vox Media (The Verge)
150+
LLM regression tests shipped
0
Production tools built solo
43
Passing tests in one suite alone
01 — Selected work

What I've built

DropLinx

Project

Drop a link. Get a week of content.

A multi-channel AI content engine that converts any URL into a Social Post Pack, Blog Post, LinkedIn Carousel, UGC Script, YouTube Short, and Newsletter — generated in seconds.

TypeScript • Claude API • Lovable • Supabase

Global History Bot — LLM Eval Suite

Project

Benchmarking 4 frontier LLMs on accuracy, tone, latency, and cost.

A promptfoo-based evaluation framework (built and run in VS Code) that tests GPT-4o, GPT-4o-mini, Claude Sonnet 4, and Claude Opus 4 against a history-bot use case. Uses model-graded closed QA and LLM rubric assertions to score factual grounding and tone, plus hard thresholds on latency (<12s) and cost (<$0.08/request).

4 models • 3 eval methods • pass/fail dashboard
promptfoo • OpenAI • Anthropic • YAML • VS Code
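
A minimal sketch of how the latency and cost thresholds above could be applied to a batch of results. The `EvalResult` shape, `passesGate`, and `summarize` are hypothetical names used for illustration; they are not promptfoo's schema or the project's actual code.

```typescript
// Hypothetical shape of one eval result; field names are assumptions,
// not promptfoo's output format.
interface EvalResult {
  model: string;
  latencyMs: number;
  costUsd: number;
  rubricPassed: boolean; // outcome of the model-graded checks
}

// Thresholds taken from the project description: <12s and <$0.08 per request.
const MAX_LATENCY_MS = 12_000;
const MAX_COST_USD = 0.08;

// A result passes only if the graded checks pass AND both hard limits hold.
function passesGate(r: EvalResult): boolean {
  return r.rubricPassed && r.latencyMs < MAX_LATENCY_MS && r.costUsd < MAX_COST_USD;
}

// Counts passed and total results per model, the raw data for a pass/fail view.
function summarize(results: EvalResult[]): Map<string, { passed: number; total: number }> {
  const byModel = new Map<string, { passed: number; total: number }>();
  for (const r of results) {
    const entry = byModel.get(r.model) ?? { passed: 0, total: 0 };
    entry.total += 1;
    if (passesGate(r)) entry.passed += 1;
    byModel.set(r.model, entry);
  }
  return byModel;
}
```

Both limits are strict here, so a request costing exactly $0.08 fails the gate.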

QA Practice Projects

Project

Three intentionally planted bugs, built to teach by breaking.

Self-contained Playwright projects reproducing real production bug patterns: a UTM parameter lost mid-redirect, a legally required disclosure stripped by a templating bug, and a flaky test pattern shown side by side with its fix.

Playwright • TypeScript • GitHub Actions
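
As an illustration of the UTM case, here is a minimal sketch of a check for parameters lost between the first and last URL of a redirect. The `lostUtmParams` helper and the example URLs are hypothetical, not taken from the project.

```typescript
// Returns the utm_* keys present on the starting URL that are missing
// or changed on the final URL after redirects.
function lostUtmParams(startUrl: string, finalUrl: string): string[] {
  const start = new URL(startUrl).searchParams;
  const end = new URL(finalUrl).searchParams;
  const lost: string[] = [];
  start.forEach((value, key) => {
    if (key.startsWith("utm_") && end.get(key) !== value) {
      lost.push(key);
    }
  });
  return lost;
}

// Hypothetical example: utm_campaign is dropped somewhere in the redirect.
const lost = lostUtmParams(
  "https://example.com/promo?utm_source=newsletter&utm_campaign=spring&id=1",
  "https://example.com/landing?utm_source=newsletter&id=1",
);
console.log(lost);
```

In a Playwright test, the final URL could come from `page.url()` after `page.goto(startUrl)` resolves, with the assertion being that the returned list is empty.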

TokenSavr

Project

A token efficiency dashboard for vibe coders.

Tracks burn rate, compares models, and scores prompt efficiency — built for developers who build with AI tools daily and want to know what they're actually spending.

Lovable • Supabase • TypeScript
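
A minimal sketch of the arithmetic behind per-request cost and burn rate, assuming per-million-token pricing. The `Usage` and `Pricing` types, both function names, and any prices passed in are hypothetical; this is not TokenSavr's actual code or any provider's real rates.

```typescript
// Token counts for one request.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Prices in USD per one million tokens (caller-supplied, hypothetical).
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Cost of one request: tokens times price, scaled from per-million to per-token.
function requestCostUsd(u: Usage, p: Pricing): number {
  return (u.inputTokens * p.inputPerMTok + u.outputTokens * p.outputPerMTok) / 1_000_000;
}

// Average spend per hour over a window, given the cost of each request in it.
function burnRatePerHour(costsUsd: number[], windowHours: number): number {
  if (windowHours <= 0) {
    throw new RangeError("windowHours must be positive");
  }
  const total = costsUsd.reduce((sum, c) => sum + c, 0);
  return total / windowHours;
}
```

Comparing models then amounts to calling `requestCostUsd` with the same `Usage` and a different `Pricing` for each model.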

Seed & Berri Social Content OS

Client project — code private

A fully autonomous social content engine for a real fashion brand.

Scrapes trending topics daily, classifies relevance against brand voice, generates captions and imagery with Claude, plans a 7-day calendar, and publishes to Instagram and Threads through the Meta Graph API. Built first for Seed & Berri, architected to scale.

Lovable • Supabase • Claude • Firecrawl • Meta API

MockBook QA Suite

Project

A full sportsbook testing sandbox, built to prove automation depth.

A complete Playwright + TypeScript suite simulating a real sportsbook platform, covering API, SQL, and end-to-end layers with a custom test harness. Built specifically to target iGaming QA roles.

43 passing tests across 3 layers
Playwright • TypeScript • GitHub Actions
02 — Tooling

What I work with

AI & LLM

Claude • OpenAI • Promptfoo • Prompt Engineering

Automation & Testing

Playwright • GitHub Actions • BrowserStack • Jira

Building

Lovable • Supabase • TypeScript • JavaScript • React

DevOps

CI/CD • GitHub • Release Gating
03 — Track record

Where I've worked

  1. Oct 2025 — Present

    Boyce Technologies

    Quality Technician

    Manufacturing QA for MTA transit infrastructure projects. Inspect laser-cut and fabricated metal components using precision tools.

  2. Mar 2025 — Jun 2025

    Creative Circle

    Quality Assurance Analyst

    Executed test cases across devices, conducted cross-browser testing with BrowserStack, and tracked bugs in GitHub with reproduction steps.

  3. 2025 — 2026

    Engenious University

    AI Reliability & Conversational QA Engineer

    Built Promptfoo pipelines and ran 150+ structured LLM regression tests with rubric-based scoring and release thresholds.

  4. Oct 2016 — Dec 2023

    Vox Media / The Verge

    Senior QA Analyst

    Over seven years of embedded QA at one of tech media's most-read publications.

  5. Mar 2015 — Feb 2016

    Thrillist Media Group

    QA Coordinator

    QA lead on editorial and digital product releases.

Full resume available on request
04 — Live signal

Still building

Latest open-source activity from my GitHub.

05 — Inbox open

Let's talk

Open to AI QA, automation engineering, and AI product roles. Always happy to talk shop.