Available for AI QA & automation roles

Steven Leon

AI-Focused QA Automation Engineer & Builder

I test, evaluate, and ship AI-powered products — from LLM regression suites to full-stack tools built solo.


Years of QA at Vox Media (The Verge)

150+ LLM regression tests shipped

Production tools built solo

43 passing tests in one suite alone

01 — Selected work

What I've built

DropLinx

Project

Drop a link. Get a week of content.

A multi-channel AI content engine that converts any URL into a Social Post Pack, Blog Post, LinkedIn Carousel, UGC Script, YouTube Short, and Newsletter — generated in seconds.

TypeScript • Claude API • Lovable • Supabase

Live Demo

Global History Bot — LLM Eval Suite

Project

Benchmarking 4 frontier LLMs on accuracy, tone, latency, and cost.

A promptfoo-based evaluation framework (built and run in VS Code) that tests GPT-4o, GPT-4o-mini, Claude Sonnet 4, and Claude Opus 4 against a history-bot use case. Uses model-graded closed QA and LLM rubric assertions to score factual grounding and tone, plus hard thresholds on latency (<12s) and cost (<$0.08/request).
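A minimal sketch of how hard thresholds like the ones above could be checked once results are collected. This is plain TypeScript and not the project's actual promptfoo configuration; the `EvalResult` shape, the `failures` helper, and the sample values are all hypothetical.

```typescript
// Hypothetical shape for one eval result. Field names are assumptions,
// not promptfoo's output schema.
interface EvalResult {
  model: string;
  latencyMs: number;
  costUsd: number;
  rubricPassed: boolean; // outcome of the model-graded checks
}

// Hard limits quoted in the project description: <12s and <$0.08 per request.
const MAX_LATENCY_MS = 12_000;
const MAX_COST_USD = 0.08;

// Returns the reasons a result fails; an empty list means it passes.
function failures(r: EvalResult): string[] {
  const reasons: string[] = [];
  if (!r.rubricPassed) reasons.push("rubric");
  if (r.latencyMs >= MAX_LATENCY_MS) reasons.push("latency");
  if (r.costUsd >= MAX_COST_USD) reasons.push("cost");
  return reasons;
}

// Made-up sample data for illustration only.
const sample: EvalResult[] = [
  { model: "model-a", latencyMs: 3_200, costUsd: 0.01, rubricPassed: true },
  { model: "model-b", latencyMs: 14_500, costUsd: 0.09, rubricPassed: true },
];

for (const r of sample) {
  const why = failures(r);
  const verdict = why.length === 0 ? "PASS" : "FAIL (" + why.join(", ") + ")";
  console.log(r.model + ": " + verdict);
}
```

Returning the list of failed checks, rather than a single boolean, keeps a pass/fail dashboard able to show why a model missed the bar.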

4 models • 3 eval methods • pass/fail dashboard

promptfoo • OpenAI • Anthropic • YAML • VS Code

GitHub

QA Practice Projects

Project

Three intentionally planted bugs, built to teach by breaking.

Self-contained Playwright projects reproducing real production bug patterns: a UTM parameter lost mid-redirect, a legally required disclosure stripped by a templating bug, and a flaky test pattern shown side by side with its fix.
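To illustrate the first pattern, here is a hedged sketch of one way a redirect can keep UTM parameters instead of dropping them. The `carryUtmParams` helper and the example URLs are hypothetical and are not taken from the practice projects.

```typescript
// Copies utm_* query parameters from the incoming URL onto the redirect
// target, without overwriting values the target already sets.
function carryUtmParams(incoming: string, target: string): string {
  const from = new URL(incoming);
  const to = new URL(target);
  from.searchParams.forEach((value, key) => {
    if (key.startsWith("utm_") && !to.searchParams.has(key)) {
      to.searchParams.set(key, value);
    }
  });
  return to.toString();
}

// Example: only the utm_* keys travel with the redirect; "ref" is left behind.
const redirected = carryUtmParams(
  "https://example.com/old?utm_source=newsletter&utm_campaign=spring&ref=1",
  "https://example.com/new?page=2",
);
console.log(redirected);
```

A test for this bug class would load the original URL, follow the redirect, and assert that every `utm_*` key is still present on the final URL.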

Playwright • TypeScript • GitHub Actions

GitHub

TokenSavr

Project

A token efficiency dashboard for vibe coders.

Tracks burn rate, compares models, and scores prompt efficiency — built for developers who build with AI tools daily and want to know what they're actually spending.
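As a rough illustration of the arithmetic behind a burn-rate view, the sketch below computes per-request cost and hourly spend. It is not TokenSavr's code; the `Usage` and `Pricing` types are hypothetical, and the prices are placeholders rather than any provider's real rates.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Prices in USD per million tokens. The values used below are placeholders.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Cost of one request: tokens times the per-million price, for each direction.
function requestCostUsd(u: Usage, p: Pricing): number {
  return (u.inputTokens * p.inputPerMTok + u.outputTokens * p.outputPerMTok) / 1_000_000;
}

// Burn rate: total spend divided by elapsed hours (0 if no time has elapsed).
function burnRatePerHour(costsUsd: number[], elapsedHours: number): number {
  const total = costsUsd.reduce((sum, c) => sum + c, 0);
  return elapsedHours > 0 ? total / elapsedHours : 0;
}

const placeholderPricing: Pricing = { inputPerMTok: 2, outputPerMTok: 8 };
const cost = requestCostUsd({ inputTokens: 1_000_000, outputTokens: 500_000 }, placeholderPricing);
console.log(cost, burnRatePerHour([cost, cost], 4));
```

Comparing models then amounts to running the same `Usage` through each model's `Pricing` entry.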

Lovable • Supabase • TypeScript

Live Demo

Seed & Berri Social Content OS

Client project — code private

A fully autonomous social content engine for a real fashion brand.

Scrapes trending topics daily, classifies relevance against brand voice, generates captions and imagery with Claude, plans a 7-day calendar, and publishes to Instagram and Threads through the Meta Graph API. Built first for Seed & Berri, architected to scale.
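The client code is private, so the following is only a sketch of what the classify-then-plan step could look like: filter topics by a relevance score, rank them, and assign one per day for a week. The `Topic` shape, the `planWeek` helper, the 0.6 cutoff, and the sample topics are all assumptions made for illustration.

```typescript
interface Topic {
  title: string;
  relevance: number; // hypothetical brand-fit score in [0, 1]
}

interface Slot {
  date: string; // ISO date, YYYY-MM-DD (UTC)
  title: string;
}

// Keeps topics at or above the cutoff, ranks them by relevance, and assigns
// up to seven of them to consecutive days starting at `start`.
function planWeek(topics: Topic[], start: Date, minRelevance = 0.6): Slot[] {
  const ranked = topics
    .filter((t) => t.relevance >= minRelevance)
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, 7);
  return ranked.map((t, i) => {
    const day = new Date(start.getTime());
    day.setUTCDate(day.getUTCDate() + i);
    return { date: day.toISOString().slice(0, 10), title: t.title };
  });
}

// Made-up topics for illustration only.
const week = planWeek(
  [
    { title: "linen sets", relevance: 0.9 },
    { title: "crypto news", relevance: 0.1 },
    { title: "thrifted denim", relevance: 0.7 },
  ],
  new Date("2025-01-30T00:00:00Z"),
);
console.log(week);
```

Working in UTC avoids a calendar that shifts by a day depending on where the job runs.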

Lovable • Supabase • Claude • Firecrawl • Meta API

MockBook QA Suite

Project

A full sportsbook testing sandbox, built to prove automation depth.

A complete Playwright + TypeScript suite simulating a real sportsbook platform, covering API, SQL, and end-to-end layers with a custom test harness. Built specifically to target iGaming QA roles.

43 passing tests across 3 layers

Playwright • TypeScript • GitHub Actions

GitHub

02 — Tooling

What I work with

AI & LLM

Claude • OpenAI • Promptfoo • Prompt Engineering

Automation & Testing

Playwright • GitHub Actions • BrowserStack • Jira

Building

Lovable • Supabase • TypeScript • JavaScript • React

DevOps

CI/CD • GitHub • Release Gating

03 — Track record

Where I've worked

Oct 2025 — Present
Boyce Technologies
Quality Technician
Manufacturing QA for MTA transit infrastructure projects. Inspect laser-cut and fabricated metal components using precision tools.
Mar 2025 — Jun 2025
Creative Circle
Quality Assurance Analyst
Executed test cases across devices, conducted cross-browser testing with BrowserStack, and tracked bugs in GitHub with reproduction steps.
2025 — 2026
Engenious University
AI Reliability & Conversational QA Engineer
Built Promptfoo pipelines and ran 150+ structured LLM regression tests with rubric-based scoring and release thresholds.
Oct 2016 — Dec 2023
Vox Media / The Verge
Senior QA Analyst
10 years of embedded QA at one of tech media's most-read publications.
Mar 2015 — Feb 2016
Thrillist Media Group
QA Coordinator
QA lead on editorial and digital product releases.
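The Engenious University entry above mentions rubric-based scoring with release thresholds. A minimal sketch of that kind of gate follows, assuming each regression test yields a rubric score between 0 and 1. The `TestOutcome` shape, the `releaseGate` helper, and the threshold values are hypothetical and do not describe the actual pipeline.

```typescript
interface TestOutcome {
  id: string;
  score: number; // rubric score in [0, 1], assumed scale
}

interface GateResult {
  passed: number;
  total: number;
  passRate: number;
  release: boolean;
}

// A test passes when its score meets minScore; the release is allowed only
// when the share of passing tests meets minPassRate.
function releaseGate(outcomes: TestOutcome[], minScore: number, minPassRate: number): GateResult {
  const passed = outcomes.filter((o) => o.score >= minScore).length;
  const total = outcomes.length;
  const passRate = total === 0 ? 0 : passed / total;
  return { passed, total, passRate, release: passRate >= minPassRate };
}

// Made-up outcomes for illustration only.
const outcomes: TestOutcome[] = [
  { id: "t1", score: 0.9 },
  { id: "t2", score: 0.8 },
  { id: "t3", score: 0.4 },
  { id: "t4", score: 1.0 },
];
console.log(releaseGate(outcomes, 0.7, 0.9));
```

An empty run is treated as a failed gate here, on the assumption that shipping with no evidence is riskier than blocking.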

Full resume available on request

04 — Live signal

Still building

Latest open-source activity from my GitHub.

05 — Inbox open

Let's talk

Open to AI QA, automation engineering, and AI product roles. Always happy to talk shop.

Contact LinkedIn GitHub
