Compiler as Harness
Agenda
- 10:00 — Welcome
- 10:01 — Introductions
- 10:10 — AI news of the week & discussions
- 11:50 — Photos & clean up
What we talked about
- SpaceX buying T-Mobile
- T-Mobile’s mobile offerings — pros and cons
- Master’s in AI — what does it actually entail?
- What is a transformer — and what GPT stands for
- Garry Tan’s gstack
- GPT Pilot
- Claude Code’s /goal command
- Prisma drops Rust for TypeScript
- Rust for agentic coding
- NixOS
- Firecracker
- nix-darwin
- Kansai Robot World
- EyePop.ai
- Micro AGI
- DSLs for agentic projects
- Run agentic coders in a VM
- LSP — if you’re building an editor
- BDD with Cucumber and Playwright
- Jest
- WordPress MCP server
SpaceX buying T-Mobile
Another week, another Musk acquisition rumour — straight on from last week’s “SpaceX buys Cursor” thread. A TD Cowen analyst floated that if SpaceX can’t land a wholesale MVNO deal to push Starlink into consumer mobile (the big US carriers have reportedly refused), buying T-Mobile outright becomes the “clear choice” — a deal that could run to ~$320B with debt. T-Mobile already runs the Starlink satellite-to-cell partnership, so it’s the obvious target. It’s analyst speculation, not a reported deal. Open question — is owning the carrier the only way to break the spectrum chokehold, and how many more of the tools and pipes we depend on end up inside one Musk stack? — Could SpaceX Buy T-Mobile? — 24/7 Wall St. — Seeking Alpha
T-Mobile’s mobile offerings — pros and cons
Naturally led into weighing what T-Mobile actually sells, with the T-Satellite (Starlink direct-to-cell) feature front and centre. Pros — your normal phone becomes an always-on satellite messenger with no aiming, the low-orbit constellation beats Globalstar/Iridium on reliability, free emergency-911 texting for everyone, and it’s bundled on the top plan or $10/mo otherwise. Cons — needs a clear view of the sky (no good indoors, in canyons, or under tree cover), data is throttled and many apps don’t work, native voice is still in beta, and your “lifeline” is only as tough as your phone’s battery and glass. Open question — is satellite-to-cell a genuine reason to switch carriers yet, or still an emergency-only novelty until voice and indoor coverage land? — T-Satellite with Starlink — T-Mobile
Master’s in AI — what does it actually entail?
The room asked what a “Master’s of AI degree” actually means and what you’d do in one. The common shape — roughly 30 credit hours, around ten courses: a handful of core classes (machine learning, deep learning, NLP) plus a concentration and a capstone, with a sideline in AI ethics, explainable AI and policy. So it’s a formal grounding in the maths and methods, not a tooling bootcamp. The pointed bit for this table of self-taught, agent-wielding builders — when you can ship real AI work by orchestrating models and reading docs, does the credential still buy you anything, or is it for the theory and the doors it opens rather than the day-to-day skill? — MS in Artificial Intelligence — UT Austin — AI MS curriculum — Columbia Engineering
What is a transformer — and what GPT stands for
A back-to-basics moment for the room — unpacking the architecture under all of this. The transformer is the neural-network design from Google’s 2017 “Attention Is All You Need” paper, whose self-attention mechanism lets a model weigh a whole sequence at once instead of word-by-word, which is what made today’s giant models trainable. And GPT just spells it out — Generative Pre-trained Transformer: generative (it produces text), pre-trained (on a huge corpus first, then fine-tuned), transformer (that same architecture, decoder half). A 3Blue1Brown video came recommended as the way to actually see it — a visual, intuition-first walk through what’s happening inside. Open question — how much of the “attention” intuition do you actually need to build well with these tools, versus treating the model as a black box and learning it by prompting? — 3Blue1Brown — visual intro to transformers — Attention Is All You Need — Wikipedia — What is GPT? — IBM
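For anyone who wants to see the "weigh a whole sequence at once" idea in code, here is a minimal sketch of single-head scaled dot-product self-attention. It makes some loud assumptions: queries, keys and values are just the raw token vectors (a real transformer learns separate projection matrices for each), there is no causal mask of the kind a GPT-style decoder uses, and the names `softmax`, `dot` and `self_attention` are made up for illustration. It expects a non-empty list of equal-length vectors.

```rust
/// Numerically stable softmax over one row of scores.
fn softmax(scores: &[f64]) -> Vec<f64> {
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Dot product of two equal-length vectors.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Single-head self-attention with identity projections (a simplification:
/// queries, keys and values are all the input vectors themselves).
/// Returns (attention weights, output vectors), one row per token.
fn self_attention(tokens: &[Vec<f64>]) -> (Vec<Vec<f64>>, Vec<Vec<f64>>) {
    let dim = tokens.first().map_or(0, |t| t.len());
    let scale = (dim as f64).sqrt();
    let mut weights = Vec::with_capacity(tokens.len());
    let mut outputs = Vec::with_capacity(tokens.len());

    for query in tokens {
        // Score this token against every token in the sequence at once.
        let scores: Vec<f64> = tokens.iter().map(|key| dot(query, key) / scale).collect();
        let row = softmax(&scores);

        // The output is the weighted average of all value vectors.
        let mut out = vec![0.0; dim];
        for (w, value) in row.iter().zip(tokens) {
            for (o, v) in out.iter_mut().zip(value) {
                *o += w * v;
            }
        }
        weights.push(row);
        outputs.push(out);
    }
    (weights, outputs)
}
```

Each row of weights sums to 1, so every output vector is a blend of the whole sequence, with the blend decided by how similar each token is to the one asking.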
Garry Tan’s gstack
YC’s CEO open-sourced his exact Claude Code setup — 23 opinionated tools that play CEO, designer, eng manager, release manager, doc engineer and QA. It rocketed to ~20k stars, then drew as much hate as love. The substance gripe: it’s “a bunch of prompts in a text file” that plenty of Claude Code users had already rolled themselves, riding Tan’s YC platform rather than novelty. The tone gripe is what the room latched onto — a viral “god mode” tweet (a CTO friend claiming gstack found a critical XSS his team missed) plus SXSW talk of “cyber psychosis” and four hours’ sleep read as un-humble hype. The room’s read: vanilla Claude Code generally beats a setup buried in skills — Anthropic’s own updates tend to outpace whatever you’ve bolted on, so a big skill pile quietly becomes legacy cruft fighting the latest model. Open question — strip away the swagger and the founder clout, is there anything in gstack worth lifting into your own setup, or is the lesson just that a good CLAUDE.md and a few subagents get you most of the way? — github.com/garrytan/gstack — Why it got so much love and hate — TechCrunch
GPT Pilot
A throwback that came up — Pythagora’s GPT Pilot, one of the early “first real AI developer” wrappers around GPT, billed as coding a production app step by step with a crew of specialised agents (spec writer, architect, tech lead, developer, code monkey) and a context filter that only fed the LLM the relevant files. Ahead of its time on the multi-agent, plan-then-build idea — but now reads as legacy next to Claude Code and the harnesses that ate its lunch, a neat callback to the gstack “vanilla beats bolted-on” thread above. Open question — did the wrapper era teach us anything the model-makers haven’t since absorbed, or was every clever scaffold always destined to be a stopgap until the base tools caught up? — github.com/Pythagora-io/gpt-pilot
Claude Code’s /goal command
Sits right against the “vanilla beats bolted-on” thread — a native feature rather than a wrapper. /goal flips Claude Code out of turn-by-turn mode into a persistent agent that keeps working toward a completion condition you state, with a small fast model checking the transcript after each turn and kicking off another turn until the condition holds. The whole trick is a verifiable end state — “npm test exits 0”, “git status is clean”, a file count — plus constraints on what mustn’t change along the way; manage it with /goal stop|pause|resume|clear (needs v2.1.139+ and a trusted workspace). Open question — when you can hand Claude a finish line and walk away, where’s the line between a goal worth automating and one you still want to babystep turn by turn? — Keep Claude working toward a goal — Claude Code Docs
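The shape of that loop is easy to sketch. This is not how Claude Code implements /goal; it is a hypothetical illustration in which the completion check is a plain closure standing in for the small model that reads the transcript, and a turn budget stands in for the constraints. `GoalOutcome` and `run_toward_goal` are invented names.

```rust
/// Outcome of driving work toward a stated completion condition.
#[derive(Debug, PartialEq)]
enum GoalOutcome {
    /// The condition held after this many turns.
    Met { turns: u32 },
    /// The turn budget ran out before the condition held.
    BudgetExhausted { turns: u32 },
}

/// Keeps calling `take_turn` until `goal_met` reports success
/// or `max_turns` turns have been spent.
fn run_toward_goal<S>(
    state: &mut S,
    max_turns: u32,
    mut take_turn: impl FnMut(&mut S),
    goal_met: impl Fn(&S) -> bool,
) -> GoalOutcome {
    let mut turns = 0;
    // Check first: if the end state already holds, no work is done.
    while !goal_met(&*state) {
        if turns == max_turns {
            return GoalOutcome::BudgetExhausted { turns };
        }
        take_turn(&mut *state);
        turns += 1;
    }
    GoalOutcome::Met { turns }
}
```

In a real setup the check would be something mechanical, such as running the test command and looking at its exit code. The point of the sketch is that the loop is only as trustworthy as that check: a vague condition gives the loop nothing to stop on.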
Prisma drops Rust for TypeScript
The against-the-grain one — everyone else rewrites in Rust for speed, and Prisma 7 went the other way, ripping out its Rust query engine for a pure TypeScript + WASM one. The twist is they got faster doing it: ~3x quicker queries and 90% smaller bundles, because the old per-OS native binaries were the real drag — hard to contribute to, painful to deploy, and incompatible with serverless and edge runtimes. Now it runs anywhere JS/WASM does (Cloudflare Workers, Bun, Deno) — a neat callback to last week’s edge-hosting thread. Open question — is “rewrite it in Rust” sometimes the wrong instinct, where the native-binary tax outweighs the raw-speed win, or is Prisma a special case because the bottleneck was deployment, not the language? — From Rust to TypeScript — Prisma blog
Rust for agentic coding
The counterpoint right after the Prisma chat — why Rust is a great fit when an AI is the one writing the code. The pitch: the compiler is the harness. Strong types, ownership and trait signatures turn a whole class of “fails halfway through the task” runtime bugs into compile-time errors the agent must fix before moving on, so the loop self-corrects instead of shipping plausible-but-broken code. It also constrains the shape of a correct answer, which is partly why one-shot AI Rust often just works where Python or TS needs babysitting. Practical tips from the room — pin the agent to a recent Rust edition (2024, say) rather than letting it default to the older 2021 edition it leans on, since the newer language and crate idioms matter; Rust by hand is a brutal learning curve but agents clear it fast, so it’s far more approachable than it used to be; and start small to get a feel before pointing an agent at anything big. Open question — so Rust is harder for humans but easier for agents — does that flip the old “use the easy language” calculus once the model writes most of the code, and is the compiler the cheapest verifier we’ve got? — The Compiler Is the Harness — Medium
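A small sketch of what "the compiler is the harness" looks like in practice. The order-shipping domain and every name here (`UserId`, `OrderId`, `OrderState`, `TransitionError`, `ship`) are hypothetical, picked only to show three checks the agent gets for free: distinct ID types that cannot be swapped, a `match` that must cover every state, and a `Result` the caller has to deal with.

```rust
/// Distinct ID types: passing an `OrderId` where a `UserId` is expected
/// is a compile error rather than a runtime surprise.
#[derive(Debug, Clone, Copy, PartialEq)]
struct UserId(u32);

#[derive(Debug, Clone, Copy, PartialEq)]
struct OrderId(u32);

/// Every state an order can be in. Illegal combinations
/// (say, "shipped but unpaid") simply cannot be constructed.
#[derive(Debug, PartialEq)]
enum OrderState {
    Pending,
    Paid { by: UserId },
    Shipped { tracking: String },
}

#[derive(Debug, PartialEq)]
enum TransitionError {
    NotPaid(OrderId),
    AlreadyShipped(OrderId),
}

/// `match` must cover every variant. Adding a new `OrderState` variant
/// later makes this function fail to compile until the new case is handled.
fn ship(id: OrderId, state: OrderState, tracking: &str) -> Result<OrderState, TransitionError> {
    match state {
        OrderState::Pending => Err(TransitionError::NotPaid(id)),
        OrderState::Paid { .. } => Ok(OrderState::Shipped {
            tracking: tracking.to_string(),
        }),
        OrderState::Shipped { .. } => Err(TransitionError::AlreadyShipped(id)),
    }
}
```

If an agent forgets the `Shipped` arm or passes a `UserId` as the order ID, the build fails with a pointed error message, and that message is the feedback the loop runs on.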
NixOS
The reproducible-by-design Linux distro — you describe the whole system state in a declarative config file and Nix builds exactly that, with package isolation, atomic upgrades and instant rollback to any previous generation. For newcomers the one-liner that landed: Nix is a package manager and build system where you declare the environment you want instead of running install commands, and it hands you that exact environment every time. A neat workflow that came up — instead of full-copying a project to spin up a dev or experiment branch, you mirror it and track the diff, so each variant is a delta off a shared base rather than a heavyweight duplicate. The pull for this room is environment determinism: “works on my machine” stops being a thing, and an agent handed a flake gets the identical toolchain every run instead of drifting. It also does dev containers — Nix can hand you a reproducible per-project dev shell (and even build the container image declaratively), so the environment is pinned by the same config rather than a hand-maintained Dockerfile. And for anyone who wants to go deeper, OKTech’s next meetup is a whole event on Nix — worth turning up for. Open question — is a fully declarative, rollback-able OS the right substrate for agent work, or is the famous Nix learning curve exactly the kind of friction an agent should be clearing for you now? — nixos.org — OKTech — next event on Nix
Firecracker
Paired naturally with the NixOS chat — AWS’s open-source microVM monitor that boots a hardware-isolated VM in ~125ms with under 5 MiB of memory overhead each, and can launch up to 150 microVMs per second per host. It’s what powers Lambda and Fargate: container-like speed and density, real VM isolation underneath. And another callback to the Rust thread — it’s written in Rust. The obvious fit is sandboxing untrusted agent code — give each agent run a disposable, genuinely isolated box rather than trusting a container boundary. Open question — for running agent-generated code we don’t fully trust, is a Firecracker microVM the sweet spot between “too slow” full VMs and “not isolated enough” containers? — github.com/firecracker-microvm/firecracker — firecracker-microvm.github.io
nix-darwin
The macOS on-ramp to the Nix thread above — nix-darwin brings the same declarative, reproducible config model to a Mac, pitched as an alternative to Homebrew. Instead of a pile of imperative brew install commands and whatever state they leave behind, you declare your packages and system settings in one config and rebuild to that exact state, with the rollback and reproducibility Nix is known for. The draw for this room — your whole dev machine becomes version-controlled and repeatable, the same determinism we want for agent environments but for your daily driver. Open question — is declaring your Mac worth giving up Homebrew’s huge formula catalogue and zero learning curve, or is nix-darwin only worth it once you’re already bought into Nix? — github.com/nix-darwin/nix-darwin
Kansai Robot World
Came up because it ran just earlier this week — Kansai Robot World hit Intex Osaka on 25–26 June, four expos under one roof (service robots, industrial robots, next-gen mobility, and a space-development business show). The bigger point for the room: Japan runs a steady stream of robotics events like this, but they’re genuinely hard to find — scattered across organiser sites and Japanese-only listings, a callback to last week’s “where do you even find a meetup here” thread on connpass, Peatix and tunagate. Open question — is there a single decent aggregator for Japanese robotics/tech events, or is the discoverability gap itself the opportunity? — srobo.jp
EyePop.ai
Came up off the robotics chat — EyePop.ai is a self-service computer-vision platform that lets you detect, count and measure things in images, video or livestreams without an ML engineer: pick a ready-made model or train your own in about an hour, then embed it in your app via their SDK. The bit that landed was deployment — it runs in the cloud or fully on-prem/embedded when latency or data sensitivity rules out a round-trip, the same vision model living right where the camera is. Open question — now that bolting custom vision onto a product is an afternoon, not a research project, where’s the first place this table would actually wire one in? — eyepop.ai
Micro AGI
Rounds out the robotics-data thread — Micro AGI (a Munich startup) is crowdsourcing the one thing simulators can’t fake: footage of real human hands doing real chores in real cluttered homes. Two angles — a free NYC home-cleaning service where vetted cleaners wear head cameras, and a paid program (~$20/hr) where students film themselves doing everyday tasks with a phone on a head strap. Faces and personal items are anonymised out, then the datasets get sold on to AI and robotics firms. It’s the embodied-AI counterpart to EyePop above — the bottleneck for household robots isn’t the model, it’s the data. Open question — is “let cameras watch you mop for $20/hr” the start of a real gig economy for training data, and would anyone here actually opt in? — microagi.ai — Free cleaning for AI training — Crypto Briefing
DSLs for agentic projects
Introduced for the room — a domain-specific language is a small, deliberately limited language aimed at one slice of a problem (think SQL, regex, CSS) rather than a general-purpose language for anything. The pros that clicked for agentic work: a tight DSL gives the agent a narrow, well-defined vocabulary, so there’s far less room to hallucinate than in open-ended code, the surface is easy to validate or even formally check, and the same spec reads cleanly for both the human and the model. It rhymes with the Rust “compiler is the harness” thread — constrain the space and the agent self-corrects — and with last week’s Beads graph: give the agent structure and it stays on the rails. Open question — is the real unlock writing a bespoke DSL for your domain and having the agent target that, rather than pointing it at a general-purpose language and hoping? — Domain Specific Language — Martin Fowler
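To make the "narrow vocabulary, easy to validate" point concrete, here is a sketch of a made-up three-verb DSL for image edits. The language itself (`resize`, `rotate`, `grayscale`) and the function names are invented for illustration. Anything outside the vocabulary is rejected at parse time, before any work happens.

```rust
/// The entire vocabulary of the (hypothetical) DSL.
#[derive(Debug, PartialEq)]
enum Command {
    Resize { width: u32, height: u32 },
    Rotate { degrees: u32 },
    Grayscale,
}

/// Parses a non-negative whole number, with a readable error.
fn parse_num(text: &str) -> Result<u32, String> {
    text.parse::<u32>()
        .map_err(|_| format!("expected a number, got {text:?}"))
}

/// Parses one line. Unknown verbs or wrong argument counts are errors.
fn parse_line(line: &str) -> Result<Command, String> {
    let parts: Vec<&str> = line.split_whitespace().collect();
    match parts.as_slice() {
        ["resize", w, h] => Ok(Command::Resize {
            width: parse_num(w)?,
            height: parse_num(h)?,
        }),
        ["rotate", d] => {
            let degrees = parse_num(d)?;
            // Domain rule baked into the language: quarter turns only.
            if degrees % 90 == 0 && degrees < 360 {
                Ok(Command::Rotate { degrees })
            } else {
                Err(format!("rotate must be 0, 90, 180 or 270, got {degrees}"))
            }
        }
        ["grayscale"] => Ok(Command::Grayscale),
        _ => Err(format!("unknown command: {line:?}")),
    }
}

/// Parses a whole program, stopping at the first invalid line.
fn parse_program(source: &str) -> Result<Vec<Command>, String> {
    source
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .map(parse_line)
        .collect()
}
```

An agent that emits `delete everything` gets a parse error back instead of a side effect, and the error string is something it can act on in the next turn.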
Run agentic coders in a VM
A practical tip for the security-minded — instead of fighting an agent for per-action approval on your real machine, run the agentic coder inside a throwaway VM and let it off the leash in full-access mode. You get the speed of no permission prompts without betting your actual filesystem, secrets and SSH keys on the agent never doing anything daft; if it goes sideways you nuke the VM. It’s the pragmatic version of the Firecracker thread above — isolation first, then full autonomy inside the box. Open question — is a disposable VM the right default for letting agents run unattended, and where’s the line between “isolated enough to YOLO” and just trusting the harness’s own permission model? — firecracker-microvm.github.io
LSP — if you’re building an editor
For anyone tempted to build their own code editor or IDE — the Language Server Protocol is the thing you want. It’s an open JSON-RPC standard between the editor and a language server that supplies the smarts (autocomplete, go-to-definition, find-references, diagnostics), so you implement against one protocol and inherit language intelligence for every language that ships a server, instead of hand-rolling support per language. Rust’s server, rust-analyzer, came up as a standout — exceptionally good, and a nice tie-back to the Rust-for-agents thread: the same rich type signal that helps the compiler self-correct also powers first-class editor tooling. Open question — if you were building an agent-native editor, is leaning entirely on LSP the shortcut, or do agents want a richer interface than a protocol designed for humans clicking around? — Language Server Protocol — Microsoft — rust-analyzer.github.io
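The wire format is simple enough to sketch by hand. LSP's base protocol sends each JSON-RPC message as a `Content-Length` header (a byte count), a blank line, then the JSON body. The sketch below handles only that framing and treats the body as an opaque string. The helper names `frame` and `unframe` are hypothetical, and a real client would also parse the JSON and accept the optional `Content-Type` header.

```rust
/// Wraps a JSON-RPC payload in the LSP base-protocol header.
/// `Content-Length` counts bytes, which is what `str::len` returns.
fn frame(body: &str) -> String {
    format!("Content-Length: {}\r\n\r\n{}", body.len(), body)
}

/// Splits one framed message off the front of `input`.
/// Returns (body, unread remainder), or `None` if the message is
/// incomplete or the header is missing or malformed.
fn unframe(input: &str) -> Option<(&str, &str)> {
    // Headers end at the first blank line.
    let (headers, rest) = input.split_once("\r\n\r\n")?;
    let length: usize = headers
        .split("\r\n")
        .find_map(|h| h.strip_prefix("Content-Length: "))?
        .trim()
        .parse()
        .ok()?;
    // Not enough bytes yet, or the count lands inside a UTF-8 character.
    if rest.len() < length || !rest.is_char_boundary(length) {
        return None;
    }
    Some(rest.split_at(length))
}
```

For example, framing the `shutdown` request `{"jsonrpc":"2.0","id":1,"method":"shutdown"}` and passing the result to `unframe` returns the same body with an empty remainder. Everything editor-specific (completion, go-to-definition, diagnostics) is just different JSON inside this envelope.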
BDD with Cucumber and Playwright
Behaviour-driven development came up — write the expected behaviour first in plain-language Given/When/Then scenarios (Cucumber’s Gherkin), then bind each step to code that drives the app. The combo discussed: Cucumber for the human-readable specs, Playwright underneath to actually click through the browser and assert. Neat callback to the DSL thread above — Gherkin is a little DSL, a constrained vocabulary that reads the same for the product person, the developer and an agent, which is exactly the kind of spec an agent can implement against and verify. A handy side-use that came up — because Playwright already drives a real browser, you can point it at your app to auto-capture screenshots and turn them into tutorials, docs and step-by-step guides, regenerated whenever the UI changes so the screengrabs never go stale. Open question — do executable plain-English specs earn their keep as the shared contract on an agentic project, or is the Gherkin layer just ceremony on top of tests the agent could write straight in Playwright? — cucumber.io — playwright.dev — github.com/vitalets/playwright-bdd — playwright-bdd agent skill — getting started
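The step-binding idea can be sketched without Cucumber or a browser. This is a hypothetical stand-in, not the Cucumber or Playwright API: each Given/When/Then line is matched against a table of step prefixes and dispatched to a function that mutates an in-memory `World`, where a real setup would drive the app through Playwright. All names (`World`, `StepFn`, `bound_steps`, `run_scenario`) and the shopping-cart scenario are invented.

```rust
/// Shared state the steps act on (a real suite would hold a browser page).
#[derive(Debug, Default, PartialEq)]
struct World {
    cart: Vec<String>,
    checked_out: bool,
}

/// A step takes the world plus the text after the matched prefix.
type StepFn = fn(&mut World, &str) -> Result<(), String>;

fn add_item(world: &mut World, item: &str) -> Result<(), String> {
    world.cart.push(item.to_string());
    Ok(())
}

fn check_out(world: &mut World, _rest: &str) -> Result<(), String> {
    world.checked_out = !world.cart.is_empty();
    Ok(())
}

fn order_confirmed(world: &mut World, _rest: &str) -> Result<(), String> {
    if world.checked_out {
        Ok(())
    } else {
        Err("order was not confirmed".to_string())
    }
}

/// The binding table: step text prefix -> code that runs it.
fn bound_steps() -> Vec<(&'static str, StepFn)> {
    vec![
        ("the cart contains", add_item as StepFn),
        ("I check out", check_out as StepFn),
        ("the order is confirmed", order_confirmed as StepFn),
    ]
}

/// Runs each scenario line in order, failing on the first unbound or failed step.
fn run_scenario(scenario: &str, world: &mut World) -> Result<(), String> {
    let steps = bound_steps();
    for line in scenario.lines().map(str::trim).filter(|l| !l.is_empty()) {
        let (keyword, text) = line
            .split_once(' ')
            .ok_or_else(|| format!("malformed step: {line}"))?;
        if !["Given", "When", "Then", "And"].contains(&keyword) {
            return Err(format!("unknown keyword: {keyword}"));
        }
        let (prefix, step) = steps
            .iter()
            .find(|(prefix, _)| text.starts_with(*prefix))
            .ok_or_else(|| format!("no step bound for: {text}"))?;
        step(world, text[prefix.len()..].trim())?;
    }
    Ok(())
}
```

Running the three-line scenario "Given the cart contains a teapot / When I check out / Then the order is confirmed" passes, while dropping the When line makes the Then step fail. A step with no binding is reported by name, which gives an agent a clear target to implement against.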
Jest
Mentioned alongside the BDD chat as the go-to JavaScript testing framework — the batteries-included option for unit and integration tests: test runner, assertions, mocking and snapshot testing in one package, zero-config for most JS/TS projects. Where Cucumber/Playwright above cover plain-English specs and browser end-to-end, Jest is the layer underneath for the fast, plentiful unit tests. The agentic angle is the recurring one — a quick, comprehensive test suite is exactly the green/red signal an agent needs to know its change actually worked. Open question — for an agent-driven codebase, is a dense Jest suite the cheapest guardrail going, or do snapshot tests just become noise the agent learns to blindly update? — jestjs.io
WordPress MCP server
For the WordPress devs in the room — there’s now an official MCP story. The WordPress MCP Adapter bridges the new Abilities API to the Model Context Protocol, so an agent (Claude Code, Cursor, VS Code) can discover and invoke a site’s plugin/theme/core abilities as MCP tools and read site data as MCP resources. WordPress.com ships a built-in server on paid plans, and WordPress.org runs one aimed at preparing and submitting plugins to the directory. Given how much of the web still runs on WordPress, this quietly opens a huge surface to agent automation. Open question — does an MCP server turn WordPress into a serious agent-driven CMS for building and maintaining sites, or is letting an agent invoke live site “abilities” a security headache waiting to happen? — WordPress MCP Adapter — Developer Blog — github.com/WordPress/mcp-adapter
Announcements
- Heads-up for next week — check out Algodyne, a software shop building for property management, fintech and language learning. Worth a look before the next assembly. — algodyne.com
- Next OKTech event is a whole session on Nix — come along. — OKTech — Nix event