AA003 30 May 2026 Office Park 703C

Workflows Unleashed

Plan to code, code to swarm — Dynamic workflows shipped Thursday alongside Opus 4.8. The plan moves into JavaScript: Claude writes the orchestrator from your prompt, a runtime executes it in the background, and up to 16 concurrent / 1,000 total subagents fan out. Intermediate state lives in script variables, so the conversation only sees the final answer. Claude writes its own swarm controller, then lets it loose.
Cleaner glass, fewer ghosts — Opus 4.8 shipped 2026-05-28. Simon calls it "a modest but tangible improvement" — the leap isn't on benchmarks but on candour. First Claude model to score 0% on uncritically reporting flawed results, 4× fewer unflagged code flaws, 10× less overconfidence. The shine is in what it stops pretending.
The flood scours the riverbed Armin Ronacher on dogfooding Pi to build Pi (2026-05-24). LLM issue reports come in confidently wrong; AI patches defend locally instead of fixing root causes; the volume fragments upstream coordination and replaces human review with isolated workarounds. The flood is real — the question is whether strong foundational design and active maintainership still hold the banks.

What we talked about

Pandora music

The internet-radio pioneer built on the Music Genome Project — human analysts tag each song across hundreds of musical attributes, and the recommender builds stations from that hand-labelled signal rather than pure collaborative filtering. Came up as the pre-streaming-era counterpoint to today’s algorithmic feeds: a curation model that leaned on expert features instead of “people who liked this also liked.” — pandora.com

Anthropic’s open-source ESP32 desk pet

Anthropic shipped firmware for a tiny ESP32 desk pet that mirrors Claude Code’s live state — and sits up when Claude is waiting on a permission prompt, so you stop missing approvals after switching windows. Originally targets the M5StickC Plus; the XDA writeup ports it to a WT32-SC01 Plus for a bigger screen. — XDA writeup

pdfplumber

Python library for pulling text, tables, and metadata out of PDFs — character-level positioning, table detection without OCR, visual debugging overlays. Came up as a more reliable handle on structured PDFs than throwing the raw file at a model. — github

/goal

Claude Code slash command for pinning a top-level objective at the session level — Claude keeps the goal in view across turns, sub-tasks, and tool calls, so long arcs don’t drift. Pairs with /loop for “keep iterating toward this until done” runs; came up as the existing pattern that dynamic workflows have to beat.

Hacker News on dynamic workflows

The HN thread on dynamic workflows split sharply — some users called it materially better than turn-by-turn for anything beyond a one-shot PR; others burned through Max limits after ~90 agents on a small package, and several argued the headline Bun port was mechanical enough to not really prove the case. Room consensus was closer to “worth trying on the right shape of task” than either pole. — HN thread

Reddit — r/LLMDevs and r/ExperiencedDevs

The other two watering holes for agentic-coding chatter outside HN. r/LLMDevs leans toward people shipping with the SDKs and arguing about model picks; r/ExperiencedDevs is the senior-engineer reality check — where AI hype gets stress-tested against actual production codebases and team workflows. Worth sampling both when the HN thread feels like an echo. — r/LLMDevsr/ExperiencedDevs

Hacker Newsletter

Weekly email digest of the top Hacker News stories and comments, curated by Kale Davis since 2010 — the low-effort way to skim the week’s signal without living in the orange site. Pairs well with the agentic-coding firehose: a lot of the model-release and tooling chatter we cover here lands there a few days later, already filtered. — hackernewsletter.com

LLMs as an academic-paper news source

Idea floated around the table: point a deep-research run (or a /loop’d agent) at arXiv / Semantic Scholar / Papers with Code on a weekly cadence and let it surface what’s new in agent design, eval, scaling, or whatever niche you actually care about — instead of waiting for it to reach Twitter or HN. Cheaper than reading every abstract; better signal than the algorithmic feeds. Open question for the room: who’s already doing this, and what’s the prompt that actually works?

Karpathy’s take on agentic coding

Andrej Karpathy’s running thesis — “software 3.0”, the LLM as a new kind of OS, the autonomy slider from autocomplete to fully delegated agent — keeps showing up as the frame people reach for when arguing about where workflows, subagents, and skills sit. His point about staying on the left of the autonomy slider until verification is cheap maps directly onto the dynamic-workflows debate: hundreds of background agents are only a good idea if you can actually check the diff at the end. — Software is Changing (Again) — YC talk

Paper orchestra — research notes → full scientific paper

A workflow idea that fell out of the dynamic-workflows brainstorm: feed in a pile of rough research notes and let an orchestrated agent swarm draft a full scientific paper. The “orchestra” framing maps onto the mechanism — a conductor script fans out sections (related work, methods, results, discussion) to parallel writer agents, each grounded in the relevant notes, then a synthesis pass enforces a single voice, consistent notation, and a coherent argument; an adversarial reviewer pass checks every claim traces back to a note rather than being hallucinated. Intermediate drafts live in script variables, so you only see the assembled manuscript. Open question for the room: how much of the reasoning can actually be delegated versus just the prose scaffolding — and where does the human have to stay in the loop to keep it honest?

Announcements

  • OKTech — Kitchen Robots & State of AI — Sat 30 May, 17:00 ~ 19:30, Cybozu Osaka (35F Hankyu Bldg). Robotics in the home + AI in web dev. — oktech.jp
  • Co-hosts wanted — Looking for folks to help run upcoming assemblies — picking topics, lining up demos, opening up the room. Come find us after the meet, or ping us via feedback.

Headlines

Dynamic workflows in Claude Code

Claude writes a JavaScript orchestrator from your prompt; the runtime fans out up to 16 concurrent / 1,000 total subagents in the background while your session stays free. Trigger with the word workflow in any prompt or /effort ultracode. Bundled: /deep-research. Vendor showcase: Jarred Sumner’s Bun port — Zig → Rust, ~750k lines, 99.8% test pass, 11 days. HN reception is split — some swear by it for anything bigger than a 1-shot PR, others blew through Max after 90 agents on a small package. We’ll run one live on this repo. Where does this beat /goal + /loop, and what’s the real cost? — Anthropic blogDocsHNRegister on Bun

When to use a workflow

Subagents, skills, and workflows can all run a multi-step task. The difference is who holds the plan:

  • What it is
    • Subagents — a worker Claude spawns
    • Skills — instructions Claude follows
    • Workflows — a script the runtime executes
  • Who decides what runs next
    • Subagents — Claude, turn by turn
    • Skills — Claude, following the prompt
    • Workflows — the script
  • Where intermediate results live
    • Subagents & Skills — Claude’s context window
    • Workflows — script variables
  • What’s repeatable
    • Subagents — the worker definition
    • Skills — the instructions
    • Workflows — the orchestration itself
  • Scale
    • Subagents — a few delegated tasks per turn
    • Skills — same as subagents
    • Workflows — dozens to hundreds of agents per run
  • Interruption
    • Subagents & Skills — restarts the turn
    • Workflows — resumable in the same session

Opus 4.8 shipped Thursday

Released 2026-05-28, same price as 4.7. Modest on bench (SWE-Bench Pro 69.2%, GDPval-AA 1890 Elo, GPQA Diamond −0.6 pts); meaningful on candour — first Claude model at 0% on uncritically reporting flawed results, 4× fewer unflagged code flaws, 10× less overconfidence. New effort control (Low → Max). Simon: “a modest but tangible improvement.” Does the room’s lived experience match the candour claims, or does it confabulate the same as 4.7? — AnthropicSimon@claudeai

Discussion Topics

  1. How are you using coding agents? — opener. Which tools, what shipped, what got dropped, where AI hit a wall this week.
  2. Try many different harnesses — round-the-table on which harnesses people are actively switching between and what each is genuinely best at. The field is crowded (Terminal Trove tracks 47 actively-developed terminal agents alone), but a handful dominate the rotation. By work adoption, GitHub Copilot (Microsoft/OpenAI) still leads, with Cursor (Anysphere) and Claude Code (Anthropic) tied right behind — and Claude Code growing fastest. The CLI tier: Claude Code — OpenAI’s Codex CLI — Cline — Aider — OpenCode (SST) — OpenHands — Gemini CLI — Pi. Agentic IDEs: Cursor — Google’s free Antigravity (parallel agents via its Manager Surface). Background/autonomous: Google Jules. Are we converging on one daily driver, or running a stable of three? — JetBrains AI PulseTerminal Trove
  3. TOON vs. JSON for prompts — tees up Robert’s show-and-tell. Token-Oriented Object Notation claims ~40% fewer tokens than JSON for structured prompt data with comparable model accuracy. Where does a compact, indentation-based format actually earn its keep, and where is JSON still the right call? — toonformat.dev

Workflow demo (live, this repo)

Live-run a dynamic workflow on the aa-website repo itself — codify the CLAUDE.md em-dash rule and let the workflow find every place we accidentally broke it. Demonstrates exactly the “codebase-wide consistency check” pattern Anthropic cites as the canonical use case.

Workflows are pre-enabled for this repo via .claude/settings.json — no /config toggle needed:

{
  "disableWorkflows": false
}

Then paste this into Claude Code:

Run a workflow to audit src/**/*.{astro,mdx,md} for places that should
be em-dashes per the CLAUDE.md house rule but aren't. Look for:

- `·` (middle dot) used as a divider
- `|` used between list items in a three-item strip
- `/` used between options (e.g. `light/dark`, `A/B`)
- ` - ` (space-hyphen-space) used as a divider, not as a hyphen
  in a compound word

For each finding, return: file path, line number, the offending
fragment, and the proposed em-dash replacement.

Then run an adversarial reviewer pass that rejects findings that are
actually legitimate hyphens (e.g. `A4-landscape`, `light-mode`),
ranges (`10:00 ~ 12:00`), or code (paths, flags, identifiers). Only
report violations that survive the review.

Fan-out: one generator agent per glob (or per directory under src/). Adversarial reviewer pass de-noises. Expected scale: ~10–20 agents total — well under the HN cautionary tale. Artifact: PR-ready diff against master.

Show and Tell

Member picks

  • TOON — Token-Oriented Object Notation (Robert) (toonformat.dev) — compact, human-readable data format positioned as a JSON alternative for LLM prompts. Claim: ~40% fewer tokens than JSON for the same structured data, with comparable or better model accuracy across providers. Indentation-based, minimal quoting. Robert to walk through the format and where it earns its keep vs. where JSON is still the right call.
  • /lab/ route convention — drop a /lab/ route into a project as a scratch space for technical testing — quick UI experiments, agent-driven prototypes, integration trials, throwaway demos. Keeps the spike code reachable in the running app without leaking into production routing or polluting the main IA. Worth comparing notes: how do people gate it (env flag, dev-only build, robots-noindex), and what’s the rule for graduating something out of /lab/?

Next time

  • TBD.

Run /news skill

Floor: 2026-05-23 (262305-aa02). Generated 2026-05-29.

New Claude Code commands & features

  • Dynamic workflows + /workflows + /deep-research (v2.1.154, 2026-05-28) — Claude writes a JS script from your prompt; runtime executes background subagents (16 concurrent / 1,000 total cap per run); script holds the loop, branching, and intermediate state so Claude’s context only sees the final answer. Trigger with the word workflow in any prompt or /effort ultracode. /workflows opens the per-phase progress view with pause/resume/restart. /deep-research is the bundled cross-checked-research workflow. Save successful runs as /<name> to .claude/workflows/ (project) or ~/.claude/workflows/ (personal). Pro requires turning on in /config; off-by-default on Enterprise — blogdocsrelease
  • claude agents — background sessions (v2.1.154, 2026-05-28) — ! <command> runs shell commands as attach/detach background sessions; also claude --bg --exec '<command>'release
  • Opus 4.8 default + cheaper Fast Mode (v2.1.154, 2026-05-28) — Opus 4.8 defaults to high effort (/effort xhigh); Fast Mode now 2× standard rate for 2.5× speed (down from prior pricing) — release
  • /chrome — browser selection (v2.1.154, 2026-05-28) — pick which connected Chrome instance Claude attaches to via /chrome → “Select browser…” — release
  • Lean system prompt default (v2.1.154, 2026-05-28) — leaner system prompt now default for every model except Haiku, Sonnet, and Opus 4.7 and earlier — release
  • Plugin defaultEnabled: false (v2.1.154, 2026-05-28) — plugins can declare default-off in plugin.json; enable with /plugin or claude plugin enablerelease
  • /reload-skills (v2.1.152, 2026-05-27) — re-scan skill directories without restarting the session — release
  • /code-review --fix (v2.1.152, 2026-05-27) — after a review, apply the surfaced reuse / simplification / efficiency suggestions to the working tree — release
  • MessageDisplay hook (v2.1.153, 2026-05-28) — new hook event for transforming or hiding assistant message text during display — release
  • SessionStart sessionTitle + reloadSkills (v2.1.152, 2026-05-27) — SessionStart hooks can set session title via hookSpecificOutput.sessionTitle and return reloadSkills: true to re-scan skills in the same session — release
  • Skill disallowed-tools (v2.1.152, 2026-05-27) — skill frontmatter can remove tools from the model while the skill is active — release
  • Vim / reverse history search (v2.1.152, 2026-05-27) — / in NORMAL mode opens reverse search (like Ctrl+R) — release
  • pluginSuggestionMarketplaces managed setting (v2.1.153, 2026-05-28) — admins allowlist org marketplaces for context-aware plugin suggestions — release
  • Model fallback on missing model (v2.1.152, 2026-05-27) — falls back to --fallback-model when the primary model isn’t available — release
  • Auto Mode no longer needs opt-in consent (v2.1.152, 2026-05-27) — release

Claude Code — other notes

  • [2026-05-28] v2.1.154 also: stdio MCP servers now receive CLAUDE_CODE_SESSION_ID and CLAUDECODE=1 env vars; /remote-control shows “Disconnect Remote Control” when active; CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE deprecated (removal 06/01) — release
  • [2026-05-29] v2.1.156 — fix for Opus 4.8 thinking-block modification causing API errors — release

Anthropic

  • [2026-05-28] Opus 4.8 GA — Claude API, Bedrock, Vertex, Foundry, same pricing as 4.7 ($5 / $25 per M). Headline numbers: SWE-Bench Pro 69.2% (from 64.3%), GDPval-AA 1890 Elo (+137 over 4.7, +121 over GPT-5.5), Terminal-Bench +8.5pts (GPT-5.5 still leads at 78.2%), GPQA Diamond −0.6pts. Anthropic claims 4× drop in unflagged code flaws, 10× drop in overconfidence, first Claude model at 0% uncritically reporting flawed results — announcement
  • [2026-05-28] Dynamic workflows — research preview — official blog post + docs. Background runtime that orchestrates tens-to-hundreds of subagents per session (cap 16 concurrent / 1,000 total); script holds state outside Claude’s context. Vendor showcase: Jarred Sumner’s Bun Zig→Rust port — ~750k lines of Rust, 99.8% test pass, 11 days first-commit-to-merge, binary 3–8 MB smaller, memory safety (not perf) the stated motivation. HN reception is split — some users found it materially better than turn-by-turn for anything beyond a 1-shot PR; others hit Max limits after ~90 agents on a small package; multiple commenters argued the Bun port was mechanical and doesn’t prove much — blogdocsHNThe Register on Bun

Codex

  • [2026-05-28] Codex rust-v0.135.0codex doctor expanded with environment, Git, terminal, app-server, and thread diagnostics; /status shows remote connection + server version; /permissions understands named profiles and custom configs; vim text-object editing + configurable interrupt-turn binding; Python SDK exposes Sandbox presets — release
  • [2026-05-26] Codex rust-v0.134.0 — search across local conversation history with content-match previews; --profile becomes primary profile selector across CLI / TUI / sandbox; MCP setup gains per-server environment targeting and OAuth options; connector schemas preserve $ref / $defs; read-only MCP tools (readOnlyHint) can run concurrently — release

Adjacent tools

  • [2026-05-24] Armin Ronacher — “Pi, OSS, and the AI flood” — dogfooding Pi to build Pi; the three problems: confidently-wrong LLM issue reports, AI patches that locally defend instead of fixing root causes or maintaining invariants, and the volume fragmenting upstream human-to-human coordination — post
  • TOON — Token-Oriented Object Notation — compact JSON alternative for LLM prompts; vendor-cited ~40% fewer tokens than JSON with comparable or better model accuracy across providers; indentation-based, minimal quoting — toonformat.dev

Simon says

  • [2026-05-28] “Claude Opus 4.8: ‘a modest but tangible improvement’” — Simon’s read on the 4.8 release: incremental on benchmarks, meaningful on hallucination / overconfidence — post

Topics worth a 5-min slot

  1. Opus 4.8 — modest bench, big honesty story — does the room’s lived experience match the vendor’s “0% uncritical reporting” claim, or does it still confabulate the same as 4.7?
  2. Dynamic workflows + claude agents — Claude Code now lets you fan out across hundreds of background subagents from one prompt, and detach/reattach to shell sessions. Anyone tried it yet? Where does it beat /goal + /loop?
  3. The AI flood on OSS maintainers — Armin’s three claims: bad issue reports, local-defense patches, fragmented upstream coordination. Match the room’s experience maintaining repos?
Further reading