Tyrien Jones

I run AI agents like a crew. I’m the foreman. They do the work — I call the shots and catch the mistakes.

Geraldton, Western Australia · tyrienjones@gmail.com · GitHub: tyrienjones-tech

Who I am, and how I work

I started writing code in February 2026. Before that I’d never written a line.

I didn’t come up through a computer science degree, a bootcamp, or a tech job. I came up on the tools — mining, fishing, hospitality, retail, farming. Years on the frontline. I’ve seen how real businesses actually run, how a crew actually works, how a foreman keeps a job moving, and how things break when nobody’s paying attention.

Three months after my first line of code, here’s where I am: a working hardware-health tool I use on my own machine every single day, a multi-agent AI safety system, a desktop control-plane app, an audit on someone else’s codebase, and a stack of earlier projects behind it — some finished, some half-built, all dated. I’ll show you the dates. Nothing here is dressed up.

Here’s the part that actually matters, and it’s the thing I’m selling.

I run AI agents the way I used to run crews. I treat them like employees. They’re accelerators, not replacements. I’m still the one steering, still the one catching their mistakes, still the one signing off before anything ships. The agents themselves are off the shelf — Anthropic’s. What’s mine is the way I run them: I give each one a name, a narrow job, a written role prompt, and a checkpoint it can’t skip, and I keep myself in the loop as the final call. I’ve done that across six different projects. It’s not a slogan — it’s prompt files and workflow rules sitting in folders on my machine. I’ll show you those too.

A client hiring me isn’t hiring a junior coder catching up to other juniors. They’re hiring an operator who knows how to direct AI work and has three months of receipts.

Yes, I used AI to make this. That’s the point.

Someone’s going to ask whether I really made this myself, or whether the AI did it. It’s the wrong question, so I’ll put the answer up front.

I used AI to build this page. I use AI on everything I can. And I read every word, checked every claim, and signed off on all of it — the same way I do with code, the same way a foreman signs off on a crew’s work before it leaves the site. That’s not hiding behind a tool. That’s the job. The skill was never typing every character myself. The skill is directing the work, catching what’s wrong before it ships, and standing behind what goes out the door.

I call it agentic living: use the best tools wherever they help, and stay the one in charge. If you want someone who refuses to touch AI so he can call himself pure, that’s not me. If you want someone who uses it hard and still owns every line — that’s exactly me. This page is the proof of it. I made it the way I’d make yours.

The rules I run the crew under

These aren’t theory. They came from real jobs where something went wrong and I wrote down the lesson so it wouldn’t happen twice.

Working version first, polish second. Ship the plain, mechanical, operator-checked version first. Make it nice after it works.
One thing at a time. On staged work, I stop between batches and wait for the go-ahead. I don’t let an agent run straight through, even on a clean job.
Test the fix against real data before I write it. For any non-trivial bug, I replay it against what’s actually on disk, check the data flow, and stress the design — before a line of the fix gets written.
Five failed fixes means the problem isn’t the code. If the same fix keeps not working, the thing I think is broken isn’t the thing that’s broken. I step back instead of swinging again.
Hostile-input test before “done.” Anywhere a user or a file can put text into the system, I throw bad input at it on purpose before I call it shipped.
Say up front whether a file is safe to delete. Every scratch file gets that call made out loud, so nothing important goes in the bin by accident.
Write it down so the next person can pick it up cold. Human or agent. Handover docs, current-state files, decision logs, audit trails. The next person on the job shouldn’t have to ring me to start.

If you’ve worked a site, you’ll recognise the shape of all of that. Method statements, sign-offs, RFIs when you’re blocked, a checkpoint you can’t skip. I run software the same way.

Featured work

I picked three to show in full, then two more I’m running right now. I’ve split every one the same way, so the lines are clear and you can scan it:

The job — what it is, in plain terms.
What’s mine — the part I built.
What I wired in — other people’s tools I integrated. Not my code. I pick them, read the docs, pin a version that works, and write the glue.
How I ran it — the agents and the checkpoints.
Why it counts — what it proves I can do for you.

One thing that pattern makes obvious: the projects aren’t all the same kind of work. PC_HEALTH is mostly wiring good tools together. ECO is mostly my own code. L.I.LY is both. That’s the range.

PC_HEALTH leads because it’s finished and I use it every day.

PC_HEALTH — a longevity monitor for Windows PCs

Finished. In daily use on my own machine. MIT licensed.

This one’s smaller and looser than L.I.LY, and that’s on purpose — a focused tool for one clean problem, not a months-long build. Different jobs need different shapes.

The job. Watch the health of a Windows PC over time — drive wear, temperatures, GPU, and system errors — and keep a timestamped record so you can see trouble coming before a drive dies on you. Read-only by design. It never writes to firmware, never phones home, never sends your data anywhere.

What’s mine. The layer that sits on top of the sensors: the part that polls four separate sources on a sensible schedule, decides what counts as healthy versus worrying, writes clean timestamped JSON snapshots, and — the key piece — hands all of it to an AI through an MCP server with five typed, read-only tools (get_current_health, get_recent_live, get_smart_history, get_recent_events, list_drives). I also wrote the packaging and the docs that ship with it — a README that walks setup, AI integration, architecture, and dependency versions, plus a documented thresholds file for CPU and drive cutoffs with examples for per-model overrides. Docs built along the way, not bolted on.

What I wired in. The actual readings come from other people’s open-source tools, not mine: LibreHardwareMonitor reads the CPU, GPU, and board sensors; smartmontools (smartctl) reads drive SMART data; nvidia-smi reads the GPU; the Windows Event Log handles system errors; and the whole thing speaks the Model Context Protocol (MCP). I didn’t build a hardware monitor. I built the thing that turns four separate hardware tools into one clean stream an AI can read.

How I ran it. A solo build, operator-directed, AI-assisted under the rules above — read-only chosen on purpose, hostile-input tested, every dependency and version written down before I called it shipped.

Why it counts. I can ship working software end-to-end on a real stack, with clean separation between the readers, the aggregator, and the server. An honest README that lists every dependency. The right boundaries picked on purpose. Real software a client can run today.

Python 3.11+, on top of LibreHardwareMonitor, smartmontools, nvidia-smi, the Windows Event Log, and MCP. GitHub: github.com/tyrienjones-tech/PC_HEALTH_MCP

L.I.LY — a personal AI stack with a real safety crew

v0.9. About three months of continuous work. 139 unit tests passing.

The job. A personal AI system that runs local. I don’t call it a chatbot, a persona, or a character — I treat it as its own category, what I call an AP (Artificial Person). It has voice, memory across sessions, and a safety crew watching it. The audio stays on my PC and my phone, and nowhere else.

What’s mine. The whole thing, top to bottom: the server, the web interface, the ledgers, and the glue between them. The internal safety crew is three named agents: WARD and KEEP today, with a third, GATE, planned. WARD itself runs in three tiers: a deterministic floor with rules that can’t be talked around (the rule file is hash-locked, so tampering gets caught), a faster classifier above that, and a heavier interpreter on top for the hard calls. The core systems (capture, long-term memory, voice handling, journaling, observation, and mid-turn guidance) are mine too, built as separate pieces that work as one.
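
A hash-locked rule file can be sketched like this in Python. It is a minimal illustration under assumptions, not WARD’s actual code: SHA-256 is assumed as the hash, and the file_sha256 and load_rules helpers and the one-rule-per-line format are invented for the example.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hex SHA-256 digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_rules(path: Path, expected_sha256: str) -> list[str]:
    """Load the rules only if the file still matches the recorded digest."""
    actual = file_sha256(path)
    if actual != expected_sha256:
        # Refuse to run on a rule file that has been edited since it was locked.
        raise RuntimeError(
            f"rule file changed: expected {expected_sha256[:12]}, got {actual[:12]}"
        )
    text = path.read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]
```

The recorded digest has to live somewhere the rule file’s editor can’t also change, otherwise the lock proves nothing.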

What I wired in. The voice runs on XTTS-v2, faster-whisper, and silero-vad — other people’s machine-learning models — on PyTorch with CUDA. Phone access goes over Tailscale so it never leaves my own network. The encrypted backups use Restic. The AI calls go to Anthropic’s API. None of those are my code. One piece of integration discipline worth pointing at: one voice library breaks on a newer version of another dependency, so I pin it on purpose and I wrote down why — the kind of thing that bites you six months later if nobody recorded it.

How I ran it. L.I.LY isn’t only the product — it’s the workspace I work in. I set up the structure inside it the way you’d lay out a job site: a shared folder split (one side for design docs, plans, audits, and handovers; the other for live code), continuity rules so a fresh session can pick up cold from where the last one stopped, decision logs, and a memory layer for the development work itself. That’s the part that lets a three-month build hold together instead of unravelling every session. The features ship in staged batches with hostile-review gates between them, not in one run. The safety-hardening sweep that left 139 tests passing was the last batch on top of all that.

Why it counts. I can carry a long, architectural build over months without it falling apart, stand up a genuine multi-agent safety system, and integrate a messy multi-tool stack while keeping the versions straight.

Python (FastAPI) and a web interface I built, integrating Anthropic’s API, XTTS-v2, faster-whisper, silero-vad, PyTorch + CUDA, Tailscale, and Restic.

ECO — a control plane for running a crew of AI agents on long, complex builds

My biggest project, and the one I’m proudest of. In active development since March 2026 — 267 commits deep, 661 Rust tests passing. It’s not finished, and I’m not going to pretend it is. That’s the point of putting it here.

The job. When a build runs long and the work gets complex, it goes sideways in the same few ways every time. Different AI models end up touching the same code — ChatGPT, Claude across versions, Sonnet, the agents I’ve named — and it still has to read like one hand wrote it. The truth about where things stand gets scattered across chat windows that vanish the moment you close them. And the second you let the agents run themselves, you lose the thread on what was actually checked. ECO is the control plane that keeps all of that in line: one verified place where the truth lives, the human as the final word on every commit, every decision, every risk, verification proofs treated as real artifacts with their freshness tracked, and a bounded run → write-back → re-run → mark-verified lifecycle that’s approval-gated, not turned loose. No off-the-shelf tool does this. That’s why I had to build it.

What’s mine. Nearly all of it — the Tauri desktop shell, the Rust backend, the SvelteKit front end, the SQLite store with full-text search, the indexer that reads my existing folders without moving a file. It all runs local, on my machine, not someone’s cloud. And the governance itself, which I’ve now built twice. The first version was the rulebook written as Markdown docs and prompts, with the AI told to read them and behave. That works partway — but rules in a doc can be skipped, forgotten, or argued around. So the version I’m building now codifies the same rules into the Rust itself: the sign-offs, the tier classification, the bounded lifecycle, the proposal-to-write-back flow, the freshness tracking. Once it’s in the binary, the agent can’t talk its way past it — it has to go through it. Rewriting governance from “Markdown the AI is supposed to follow” to “Rust the AI has to pass through” is slow work, and it’s why ECO is still going.
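
The difference between "rules in a doc" and "rules in the code" can be shown with a small approval-gated state machine. ECO does this in Rust; the sketch below is Python for illustration only, and the Stage and Task names, the PROPOSED starting stage, and the approved_by_operator flag are assumptions, not ECO’s actual types.

```python
from enum import Enum


class Stage(Enum):
    PROPOSED = "proposed"
    RUN = "run"
    WRITTEN_BACK = "written_back"
    RE_RUN = "re_run"
    VERIFIED = "verified"


# The only legal order. There is no path that skips a stage.
ORDER = [Stage.PROPOSED, Stage.RUN, Stage.WRITTEN_BACK, Stage.RE_RUN, Stage.VERIFIED]


class Task:
    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = Stage.PROPOSED

    def advance(self, to: Stage, approved_by_operator: bool) -> None:
        """Move exactly one stage forward, and only with operator approval."""
        if not approved_by_operator:
            raise PermissionError(f"{self.name}: operator approval required for {to.value}")
        position = ORDER.index(self.stage)
        if position + 1 >= len(ORDER) or ORDER[position + 1] is not to:
            raise ValueError(f"{self.name}: cannot go from {self.stage.value} to {to.value}")
        self.stage = to
```

An agent calling advance without approval, or trying to jump straight to VERIFIED, gets an exception instead of a polite reminder in a Markdown file.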

What I wired in. Less than the other projects — but not nothing. There’s a provider layer so ECO can run against mock AI or live, and swap between vendors: Anthropic is the daily driver, but there’s a proven live run on OpenAI too (VERIF-182), so this one isn’t tied to a single AI company. Past that, the code is mine.

How I ran it. The agent setup isn’t a quirk — it’s the shape complex work needs. Any well-run shop splits it the same way: someone plans, someone builds, and the boss signs off on both. ECO is the desktop app that lets me work like that with AI. I run two of Anthropic’s agents under role prompts I wrote — one I named Talos for strategy and audit, one I named Effy for the build — and I hold final say over both. The agents are off the shelf; the framework around them — the roles, the separation, the rules, the checkpoints — is mine. Every task goes through that checkpoint discipline, and the pre-commit review is a hard stop. Nothing gets committed without my sign-off.

The receipts. 267 commits since March. One example of the discipline: I cut the app’s main screen from 8,912 lines down to 2,043 — across thirteen passes, each one regression-proven before I moved to the next, no “trust me, it still works.” And the whole governed lifecycle — run, write back, re-run, mark verified — is proven end to end, under both a mock AI and a live one. That’s not a demo. That’s the thing working.

Why it counts. It’s the biggest thing I’ve built, it’s not done, and it’s solving a problem I couldn’t find solved anywhere else. Building it is how I learned to keep a long, multi-agent job coherent, keep a human in charge of every call, and prove what’s been checked instead of hoping. If a client’s got a build that’s going to run for months across more than one AI, this is the kind of control I bring to it.

Tauri, Rust, SvelteKit, and SQLite with full-text search.

Also running right now

ETS2LA audit — the bugs are in the patterns, not the language. ETS2LA is a self-driving plugin for Euro Truck Simulator 2 — a four-year-old Python project the upstream team is now rewriting in C# because of long-running performance and stability issues. My read on it: those are design-pattern bugs, not language bugs. They’ll come back in C# unless the patterns get fixed first. This campaign is the proof, one bug at a time.

This is local work, not an upstream-PR effort — and there’s a reason for that. Upstream has a flat ban on AI-written code (they’ve closed eight PRs over it), and my workflow leans on AI hard. So I work inside the constraint instead of around it: I hand-type every diff into VS Code myself, run the in-game checks myself, and the fixes stay on my machine. The mod is theirs — GPL v3, every line of it. What’s mine is the campaign and the argument it makes. It’s a skill-test and a learning project at the same time, and I’ll say so plainly.

The report opens with the rule I wrote for it: “write like you’re explaining the fix to a foreman, not pitching to a board.”

Two bugs closed so far:

Arbitrary code execution through the settings file (ETS-021/022). The loader was running pickle.loads(eval(value)) on text it read straight out of a SQLite file — so anyone who could write to that file could run Python on the next launch. I replaced it with json.loads and took both eval and pickle out of that path. Verified with hostile-input round-trip tests and a real in-game window-drag. The same hole comes back in C# through Roslyn scripting or BinaryFormatter — Microsoft has explicitly warned against BinaryFormatter for this exact reason. The language doesn’t save you; fixing the pattern does.
Window position that never saved (ETS-023). A function called check_if_window_still_open was written to poll the OS for the window position and save it, but nothing in the codebase ever called it. Dead since at least August 2025. A two-line fix wired it back into the existing 10 Hz tick. Verified in-game. No compiler flags this one: catching it in the C# port means a human walking the call graph by hand, which is exactly the kind of thing a rewrite skips.
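
The pattern behind the ETS-021/022 fix fits in a few lines of Python. This is a minimal illustration, not ETS2LA’s code: the helper names encode_setting and decode_setting and the default argument are made up for the example. The point is that json.loads only ever builds data, so hostile text fails to parse instead of running.

```python
import json


def encode_setting(value) -> str:
    """Serialise a setting to plain JSON text for storage."""
    return json.dumps(value)


def decode_setting(text: str, default=None):
    """Parse stored text as data only; nothing in it is ever executed."""
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        # Corrupt or hostile text falls back to a safe default.
        return default


# Round trip with a normal value, then with text that eval() would have run.
window = decode_setting(encode_setting({"x": 10, "y": 20}))
hostile = decode_setting("__import__('os').system('echo pwned')", default={})
```

With the old eval-then-unpickle path, the second call would have executed the string. Here it comes back as the default and nothing runs.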

How I run it: four roles, not two — and except for me, they’re external AI under role prompts I wrote. A Director (Talos) sets strategy and scope, reached only through me. An Auto Drive Agent does the building — it probes the repo read-only and drafts the diff text in the reports, but it never edits the code, never commits, never opens a PR. An Auto Drive Strategist (a separate Claude session) gives a second opinion each cycle. And I’m the operator: I route between them, make the calls, hand-type every diff, and run the in-game verification. The paper trail is the same jobsite kit I use everywhere — a Kanban board (INBOX → ACTIVE → AWAITING_OPERATOR → VERIFY → COMPLETED, plus a HALTED lane), an append-only JSONL ledger that logs every state change, session logs, handover and context-pack files, and a rolling current-state file.
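
The board and the ledger above can be sketched together in Python. The lane names come from the board; everything else is an assumption made for the example: the one-lane-forward rule, HALTED being reachable from any open lane, the entry field names, and the allowed and move_card helpers.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

LANES = ["INBOX", "ACTIVE", "AWAITING_OPERATOR", "VERIFY", "COMPLETED"]


def allowed(src: str, dst: str) -> bool:
    """A card moves one lane forward, or into HALTED from any open lane."""
    if dst == "HALTED":
        return src in LANES[:-1]
    return src in LANES and dst in LANES and LANES.index(dst) == LANES.index(src) + 1


def move_card(ledger: Path, card_id: str, src: str, dst: str) -> None:
    """Validate the move, then append one JSON line to the ledger."""
    if not allowed(src, dst):
        raise ValueError(f"illegal move {src} -> {dst}")
    entry = {
        "at": datetime.now(timezone.utc).isoformat(),
        "card": card_id,
        "from": src,
        "to": dst,
    }
    # Mode "a" only ever adds to the end; earlier lines are never rewritten.
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
```

Because every state change is one appended line, the ledger can be replayed from the top to rebuild where any card stands.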

Why it counts: reading someone else’s code at scale and explaining why it’s wrong, not just what’s wrong. Holding a real technical argument — the C# point — against a four-year codebase and backing it bug by bug. And working inside a hard limit instead of around it: the AI-code ban meant hand-typing every diff and taking the slower path on purpose. Two bugs closed so far. It’s focused work, not a cathedral — and that’s exactly what it needs to be.

UnoAi — a browser-based AI companion, built for ownership. The job: buy once, bring your own key, conversations stay on your device, self-hostable. What’s mine: the app and the strict procedures around it. How I ran it: a written rule set, a pre-commit hook that runs secret-scanning and linting, a method statement required before any file changes, sign-offs with proof, and an RFI when something’s blocked — jobsite paperwork applied to software, which is exactly why I use it. SvelteKit and TypeScript, PolyForm Noncommercial license. Phase 0b complete.

The arc — three months, dated

I’m showing the early stuff on purpose. These are first attempts, not shipped products, and I’m not going to pretend otherwise. The story is the climb.

Feb 17, 2026 — Earliest dated file on my machine: an audio archive from before I started coding. (I made music first.)
Feb 19 — First Python I wrote: an early audio engine and an LLM bridge script. These survive only as zip snapshots now, not a live project — so I’m calling them what they are: where it started.
Feb 22–28 — First real week. Audio tools (a music player with a beat-reactive visualiser, a visual playlist tool, an audio-video mesh), a job-hunting web app — the first version of what I’d iterate through into HuntOS v2, then v3 and v4 in the weeks after — and a couple of skeletons I started and didn’t finish. Even these first ones shipped with READMEs, troubleshooting docs, and a plug-in structure. The documentation habit was there from day one.
March — Prototypes get serious. Stability Forge — first started in late February alongside the audio tools, but in early March it got my first written-down agent crew: Alex on the engine, Sarah on the wrapper, and a read-only auditor I called TALLAS. That role-separation method has shown up in every project since. Alongside it: HuntOS v3 → v4 (the job-hunting platform from week one, now with named agents David and Heather), Founder Brain (a small, clean service with a local database), a deterministic combat simulator with replay and seeded randomness, plus a behavioural-intelligence framework, an author-enrichment pipeline, a beat generator, and a couple of music tools.
Mar 27 — First commit on ECO. The folder was opened a week earlier on Mar 19 — I didn’t start tracking it in git until Mar 27.
Apr 30 — Set up the two-role agent rotation (Talos and Effy) on top of ECO.
May 17 — Shipped PC_HEALTH. In daily use ever since.
May 26 — L.I.LY hit v0.9, 139 tests passing after a safety-hardening sweep.
May 27 — ETS2LA audit report delivered.

The March prototypes mostly didn’t ship to production, and I won’t say they did. What they show is the climb — and that the way I run AI agents now didn’t appear from nowhere. The role-separation method has a March prototype behind it. You can watch the same idea get sharper across every project.

How I work with AI — the crew

This is the spine of everything above. I don’t treat an AI like a magic 8-ball you shake and trust. I run them like a crew: I give each one a name, a narrow job, a written role prompt, and a checkpoint it can’t skip — and I stay in the loop as the foreman who signs off. Let me be straight about what that is and isn’t. The agents are Anthropic’s, off the shelf. What’s mine is the method around them — the role separation, the workflow rules, the prompt files, the checkpoints. That’s light scaffolding, not a deep technical build. The one real exception is WARD, inside L.I.LY: that’s a multi-agent safety system I wrote in code, with 139 tests behind it. Here’s the bird’s-eye view.

When	Project	The agents I named, and their jobs	What’s mine in it
March	Stability Forge	Alex (engine), Sarah (wrapper), TALLAS (read-only audit)	The role split and the workflow rules
March	HuntOS v4	David, Heather	The role prompts I drafted
March	Combat sim	Kade Orion (systems builder)	The role prompt and the scope
Apr–now	ECO	Talos (strategy and audit), Effy (build)	Role separation, prompt files, five checkpoints
May–now	L.I.LY	WARD, KEEP, GATE (planned) — the internal safety crew	The code, end to end — the safety crew plus the dev infrastructure that holds the build together
May	ETS2LA	Director (Talos) for strategy, Auto Drive Agent for the work, Auto Drive Strategist for second-opinion review	The four-role split, the workflow, the routing, and hand-typing every diff into VS Code myself per the upstream AI-code ban

Same shape every time: agents I name and scope narrowly, the operator holding final authority, and the rules written down. The agents are accelerators. The judgment is mine. When one gets it wrong — and they do — I’m the one who catches it before it ships, because the checkpoints that catch it are mine.

I’m not locked to one vendor, either. I mostly run Claude models day to day, but I’ve worked across Claude Code, Cursor, Cline, and Windsurf. I pick the tool for the job. The work is AI-assisted; the calls are mine.

The stack — what I write, and what I wire together

Languages I write and read in

I’m not claiming to be an expert in nine languages after three months. What I’m claiming is that I can move between stacks fast enough to ship in them and to audit them.

Language	Where I’ve used it
Python	L.I.LY, PC_HEALTH, the ETS2LA audit, most of the March work
Rust	ECO’s backend
TypeScript	ECO and UnoAi
Svelte	ECO and UnoAi
JavaScript / HTML / CSS	Web interfaces across L.I.LY, ECO, UnoAi
PowerShell	Backup scripts, the PC_HEALTH event-log piece
Bash	The UnoAi validator script (`scripts/validate.sh`). One file, but it does real work — secrets-scan + Prettier + ESLint + cspell + structural validation as a pre-commit gate.
SQL	Databases in ECO, Founder Brain, and others
C# (read, not written)	Reading a game’s runtime to plan mod work

What I integrate — other people’s tools, wired together right

A lot of what I ship sits on top of open-source software other people wrote. I’m straight about that, because picking the right tool, reading its docs, pinning a version that works, writing the glue, and documenting how to put it back when it breaks — that’s a real skill, and it’s one a lot of “I built it all myself” portfolios quietly skip. Plenty of clients have been burned by someone who rebuilt a worse version of a solved problem instead of using the tool that already works.

These are tools I integrate. They’re not my code. I wire them in and I keep them running:

AI / LLM: Anthropic’s API and the Model Context Protocol (MCP)
Networking: Tailscale (private network access, so things never leave your own machines)
Backup: Restic (encrypted snapshots, with a retention and restore routine I wrote and tested)
Hardware sensing: LibreHardwareMonitor, smartmontools, nvidia-smi, the Windows Event Log
Voice / audio ML: XTTS-v2, faster-whisper, silero-vad
Compute: PyTorch with CUDA
Web frameworks: FastAPI, SvelteKit, Tauri
Storage: SQLite with full-text search, Supabase
Hosting: Cloudflare Pages
Quality and security: Playwright, Vitest, ESLint, secret-scanning

The discipline that isn’t code

Some of how I work doesn’t show up as code at all, but it’s where the blue-collar side earns its keep:

Two backup systems, both running. Daily automated snapshots of ECO since 2026-04-28 (31 dated snapshot folders on the backup drive, set-and-forget). Separately, an encrypted Restic system for L.I.LY plus my private vault — a wrapper script I wrote in PowerShell, ACL-locked passphrase, retention policy (14 daily / 8 weekly / 12 monthly), and a documented disaster-recovery flow I’ve walked through. Two different jobs, two different shapes, both real.
Verification logs and decision logs as real documents, not afterthoughts. The last verified entry is the ground truth. Every decision has a record.
Method statements, sign-offs, RFIs, inspections — jobsite paperwork applied to software. When the work’s blocked I raise it; when it’s done I prove it.

Let’s talk about the job

I’ll be straight with you: I haven’t had a paid client yet. I’m after the first one. No fake reviews, no borrowed logos, no made-up “trusted by” line on this page — when I’ve done paid work, you’ll see it here, and not before.

What I can do is show you working software instead of talking about it. If you’ve got a job — a desktop tool, an AI integration, automation for a small business, an audit of code someone left you — I’ll show you the code, walk you through how I’d run it, and tell you straight what I can and can’t do.

Geraldton, Western Australia · tyrienjones@gmail.com · GitHub: tyrienjones-tech