AA001 16 May 2026 The DECK
Hello, AA
Bells in a closed room — v2.1.141 added `terminalSequence` to hook output. Hooks can now fire OS desktop notifications, change the window title, and ring the terminal bell with no controlling terminal attached. Headless agents finally have a way to tap you on the shoulder.
Pockets now hold a foreman — Codex landed inside the ChatGPT mobile app: approve tasks, review diffs, redirect a running agent mid-flight from iPhone, iPad, or Android. OpenAI cites 4M weekly Codex users — supervision is leaving the terminal.
Slash and you shall find Biblical inversion for the slash-command zoo we've each grown inside Claude Code. Today's round-the-table picks up which custom commands earned their keep and which got deleted in a week.
Suggested Topics
- How are you using coding agents? — opener. Which tools, which loops, what shipped vs. still vibes? What got dropped this week?
- Agent SDK billing split (June 15) —
claude -p, GitHub Actions, and third-party SDK apps move onto a metered credit pool at API rates. Likely a response to clones like OpenClaw routing subscription credits into non-Anthropic agents — the split fences off SDK traffic from human-drivenclaudeuse. What does the math look like at Pro / Max / Team? - Claude Code commands — let’s scroll through Claude’s command menu together first — what’s actually in there, and what surprised you? Then:
/goal(v2.1.139) lets Claude run to a verifiable finish line on its own. What custom commands have people built? Which earned their keep, which got deleted in a week? - DORA on AI ROI (Robert) — Robert brings both reports. Amplifier thesis, J-curve, the −$344k stability tax. Does the room buy it?
- HTML over Markdown — Simon’s pitch: ask Claude for HTML and you get SVG diagrams, widgets, sliders. Anyone using it for real?
- CAD agent demo — Adam-CAD/CADAM.
Show and Tell
Reports — DORA (via Robert)
- DORA — ROI of AI-assisted Software Development (Google Cloud, updated 2026-04-22) — report — calculator — TLDR: AI is an amplifier; the ROI comes from platform quality, workflow clarity, and team alignment, not the tool. Models a J-curve (productivity dip before upside) and ships an interactive calculator — run conservative / realistic / optimistic and present finance a range. The catch: AI adoption correlates with increased delivery instability — the sample calc shows a −$344k downtime hit from change-failure rate going 5% → 6%. Fix it with automated testing, CI, and small batches.
- DORA — State of AI-assisted Software Development 2025 (Google Cloud, 2025) — report — PDF — TLDR: 90% of devs use AI at work, 80%+ say it raises productivity, but 30% have little or no trust in the generated code. Same amplifier thesis: teams in loosely-coupled architectures with fast feedback loops capture the gains; tightly-coupled / slow-process teams see little or none. Identifies seven org capabilities that magnify AI’s impact — foundational eng practices, healthy data ecosystem, quality internal platform, etc.
Member picks
- Simon Willison — “The Unreasonable Effectiveness of HTML” (2026-05-08) — ask Claude for HTML instead of Markdown and you get SVG diagrams, widgets, and interactive explanations — post
- CAD agent —
Adam-CAD/CADAM— the repo behind today’s demo — github.com - GSD —
gsd-build/get-shit-done— spec-driven dev system for Claude Code — six-command loop (/gsd-new-project→ discuss → plan → execute → verify → ship), parallel-wave subagents on fresh 200k contexts,npx get-shit-done-cc@latestinstaller. 62k stars since December, with its own$GSDSolana token — make of that what you will. — github.com /simplify— custom Claude Code command — diff → three parallel review agents (reuse, quality, efficiency) → fix. Worth it on real code; overkill on prose-only diffs (skip the fanout).- Hazy Research — “Minions” (Stanford, 2025-02-24) — small local model + frontier cloud model collaborating: remote decomposes, local chews through long docs in parallel, remote aggregates. 97.9% of remote-only accuracy at 17.5% of the cost. — post
- Liquid AI — LFM2 (2025-07-10) — on-device foundation models (350M / 700M / 1.2B, 10T tokens) tuned for edge inference; hybrid 16-block stack of gated short convolutions + grouped query attention, claimed 2× faster CPU decode/prefill vs. Qwen3. Pair-with-Minions material. — post
- Chrome Translator API (Chrome 138+, desktop) —
Translator.create()/translator.translate(), fully on-device via downloaded language packs, no cloud round-trip. The on-device-AI thread continues into the browser itself. — docs - Zellij — terminal workspace / multiplexer (Rust, WASM-plugin architecture). Batteries-included tmux: floating panes, named sessions, layouts you can check into the repo — handy for parking a Claude Code pane next to a Codex pane next to logs. — zellij.dev
ENABLE_PROMPT_CACHING_1H=1— Claude Code v2.1.108 env var that flips the prompt cache off the 5-min default and onto a 1-hour TTL (API key, Bedrock, Vertex, Foundry). 1h writes cost more than 5m writes — break-even is ~5+ reads inside the window — but worth it on long sessions over a stable codebase that pauses between turns.FORCE_PROMPT_CACHING_5M=1pins back to 5m. — release/model opusplan— Claude Code model alias that runs Opus while in plan mode and Sonnet everywhere else. Hard reasoning gets the smart model; execution drops to Sonnet for speed and quota. Pairs cleanly with any/plan-driven loop — you only burn Opus tokens where the architecture decisions actually live. — docs- GitHub Copilot — it’s good now — May 2026 dropped the Copilot CLI agent into JetBrains, a unified sessions view, global
.agent.mdconfig, and a standalone desktop app (tech preview, Win/Mac/Linux) that hands each parallel session its own git worktree + branch. Basically Claude Code’sclaude agentsdashboard, shipped by Microsoft. Worth a look if you wrote it off in the GPT-3.5 era. — changelog - NVIDIA NemoClaw (early preview) — NVIDIA’s privacy/security stack layered over OpenClaw. Agent Toolkit for policy-based guardrails, OpenShell runtime, and Nemotron-class open models running locally on RTX / RTX PRO / DGX Station / DGX Spark — with a privacy router that hands the cloud only what the policy lets through. Lands while Anthropic is busy fencing OpenClaw out with the June 15 SDK billing split — different bets on where agent traffic should run. — nvidia.com
- pnpm 11 —
minimumReleaseAgeon by default (2026-05-05) — newly published versions can’t be resolved until they’re at least 24h old (minimumReleaseAge: 1440minutes). A delay window that disrupts “publish-and-pwn” supply-chain attacks like the recent Mini Shai-Hulud install-hook credential stealer. Opt out per-package viaminimumReleaseAgeExclude(for hotfixes), or globally withminimumReleaseAge: 0inpnpm-workspace.yaml. Worth knowing about as agents start runningpnpm addunsupervised. — release
Around town
- OKTech — Kitchen Robots & State of AI — Sat 30 May, 17:00–19:30, Cybozu Osaka (35F Hankyu Bldg). Robotics in the home + AI in web dev. — oktech.jp
- State of Web Dev AI (Devographics) — TLDR: the same crew behind State of JS / CSS now run a survey on how web developers actually use AI — model providers, coding assistants, spend, pain points — and deliberately include the AI-skeptics so the picture isn’t a hype curve. Worth filling in. — stateofai.dev
Next time
- How to create a Claude Code skill — what goes in
SKILL.md, frontmatter contract, how the harness decides when to invoke, walkthrough of an example from this repo.
Run /news skill
Floor: 2026-05-09 (no prior entry — defaulted to 7-day window). Generated 2026-05-16.
New Claude Code commands & features
/goal(v2.1.139, 2026-05-11) — set a completion condition; Claude keeps working across turns until a small evaluator model says it’s met. Works in interactive,-p, and Remote Control; shows live elapsed / turns / tokens overlay — releaseclaude agents(v2.1.139, Research Preview) — single dashboard of every Claude Code session (running / blocked-on-you / done);--cwd <path>to scope by directory (v2.1.141);--add-dir,--settings,--mcp-config,--plugin-dir,--permission-mode,--model,--effort,--dangerously-skip-permissionsto configure dispatched background sessions (v2.1.142) — agent view docs- Fast mode → Opus 4.7 (v2.1.142, 2026-05-14) — Fast mode now defaults to Opus 4.7 instead of 4.6; pin to 4.6 with
CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE=1— release - Projected context cost in
/plugin(v2.1.143, 2026-05-15) — per-turn and per-invocation token estimates shown in the marketplace browse pane before you install — release - Hooks
terminalSequence(v2.1.141, 2026-05-13) — emit desktop notifications, window titles, and bells from hooks without a controlling terminal — release - Rewind “Summarize up to here” (v2.1.141) — compress earlier context while keeping recent turns intact — release
claude plugin details <name>(v2.1.139) — component inventory + projected per-session token cost — release/scroll-speed(v2.1.139) — tune mouse-wheel scroll speed with a live preview — release- Transcript view navigation (v2.1.139) —
?for keyboard shortcuts,{/}to jump between user prompts,vto toggle the shortcut panel — release
Claude Code — other notes
- [2026-05-06] Anthropic — Code w/ Claude 2026 keynote: SpaceX/xAI Colossus capacity deal, doubled 5-hour API limits, multi-agent orchestration + routines for Claude managed agents, API usage up 17× YoY — Simon’s live blog
- [2026-05] Anthropic announces June 15 billing split — Agent SDK,
claude -p, GitHub Actions, and third-party Agent-SDK apps move onto a separate monthly credit pool at API rates — help center - [2026-05-13] v2.1.141 also:
CLAUDE_CODE_PLUGIN_PREFER_HTTPSfor plugin clones without GitHub SSH;/feedbackcan attach the last 24h / 7d of sessions — release - [2026-05-14] v2.1.142 also: plugins with a root-level
SKILL.mdsurface as a skill;MCP_TOOL_TIMEOUTfinally raises the 60s remote-MCP fetch ceiling — release - [2026-05-15] v2.1.143 also: plugin dependency enforcement on
disable/enable;worktree.bgIsolation: "none"for repos where worktrees are impractical — release
Codex
- [2026-05-14] Codex on the ChatGPT mobile app — approve tasks, review diffs, redirect running runs, monitor terminal output from iPhone/iPad/Android; preview on all plans incl. Free; OpenAI cites 4M weekly Codex users — TechCrunch
- [2026-05-15] Codex CLI 0.131.0 alpha train — daily alpha cuts landing on
main(rust-v0.131.0-alpha.21 latest as of today) — releases - [2026-05-08] Codex CLI 0.130.0 —
codex remote-controlheadless app-server entrypoint; thread pagination with summary/full views; Bedrock auth viaaws loginconsole credentials;view_imageresolves through selected environments — release - [2026-05-07] Codex CLI 0.129.0 — vim mode in the TUI, redesigned resume/fork picker,
/hooksbrowser, plugin sharing — release
Adjacent tools
- [2026-05-14] xAI ships Grok Build — terminal-native CLI, up to 8 parallel subagents, 16-agent Heavy architecture, 2M token context, MCP support, plan-review approval before applying changes; $99–$299/mo SuperGrok Heavy — xAI — Bloomberg
- [2026-05-13] Cursor 3.4 — Development Environments for Cloud Agents; multi-repo environments re-used across sessions; Dockerfile-based setup with build secrets — changelog
- [2026-05-11] Cursor in Microsoft Teams —
@Cursordelegates to a cloud agent, reads the thread, opens the PR; Bugbot effort levels (default / high / custom) — changelog - [2026-05-12] Signadot skill — Claude Code / Codex / Cursor can validate changes against live Kubernetes environments — SiliconANGLE
- [2026-05-11] AWS Bedrock AgentCore — preview of managed payments lets agents autonomously access/pay APIs, MCP servers, web content, other agents; Agent Toolkit for AWS GA at no extra charge — AWS
- [2026-05-08] Industry — Snyk × Anthropic integration, Opsera DevSecOps Agents embedded in Cursor IDE, Prismatic Skills plugin for Claude Code, Coder Agents native architecture for self-hosted enterprise — SD Times roundup
Simon says
- [2026-05-14] “Not so locked in any more” — LLM-assisted language migration (Bun: Zig → Rust) changes the cost calculus for picking a stack — post
- [2026-05-11] “Learning on the Shop floor” — Shopify’s internal coding agent River and what it implies about org-specific tooling — post
- [2026-05-11] Thoughts on GitLab’s workforce reduction in the agentic era — post
- [2026-05-07] Notes on the xAI/Anthropic Colossus data-center deal — post
Reports & research
- [2026-05] Anthropic — 2026 Agentic Coding Trends Report — eight trends across foundation / capability / impact: multi-agent coordination replacing single-agent loops, task horizons stretching from minutes to days, and the “delegation gap” — engineers use AI for ~60% of their work but fully delegate only 0–20% — report
Topics worth a 5-min slot
- Grok Build vs. the field — 8 parallel subagents, 2M context, MCP-compatible, $99/mo intro. Worth a hands-on, or another CLI that fades after the hype week?
- Codex on your phone — what does “approve and redirect agent runs from anywhere” actually change about your workflow vs. always-attached terminal sessions?
- The delegation gap (Anthropic 2026 report) — 60% AI use but 0–20% full delegation. What’s the bottleneck in this room — trust, review cost, or task-fit?
Further reading
- Anthropic — Agent SDK on your Claude plan (June 15 split) support.claude.com
- Simon Willison — The Unreasonable Effectiveness of HTML simonwillison.net
- Claude Code changelog code.claude.com