AA001 16 May 2026 The DECK

Hello, AA

Bells in a closed room — v2.1.141 added `terminalSequence` to hook output. Hooks can now fire OS desktop notifications, change the window title, and ring the terminal bell with no controlling terminal attached. Headless agents finally have a way to tap you on the shoulder.

Pockets now hold a foreman — Codex landed inside the ChatGPT mobile app: approve tasks, review diffs, redirect a running agent mid-flight from iPhone, iPad, or Android. OpenAI cites 4M weekly Codex users — supervision is leaving the terminal.

Slash and you shall find Biblical inversion for the slash-command zoo we've each grown inside Claude Code. Today's round-the-table picks up which custom commands earned their keep and which got deleted in a week.

Show and Tell

Reports — DORA (via Robert)

DORA — ROI of AI-assisted Software Development (Google Cloud, updated 2026-04-22) — report — calculator — TLDR: AI is an amplifier; the ROI comes from platform quality, workflow clarity, and team alignment, not the tool. Models a J-curve (productivity dip before upside) and ships an interactive calculator — run conservative / realistic / optimistic and present finance a range. The catch: AI adoption correlates with increased delivery instability — the sample calc shows a −$344k downtime hit from change-failure rate going 5% → 6%. Fix it with automated testing, CI, and small batches.
DORA — State of AI-assisted Software Development 2025 (Google Cloud, 2025) — report — PDF — TLDR: 90% of devs use AI at work, 80%+ say it raises productivity, but 30% have little or no trust in the generated code. Same amplifier thesis: teams in loosely-coupled architectures with fast feedback loops capture the gains; tightly-coupled / slow-process teams see little or none. Identifies seven org capabilities that magnify AI’s impact — foundational eng practices, healthy data ecosystem, quality internal platform, etc.

Member picks

Simon Willison — “The Unreasonable Effectiveness of HTML” (2026-05-08) — ask Claude for HTML instead of Markdown and you get SVG diagrams, widgets, and interactive explanations — post
CAD agent — Adam-CAD/CADAM — the repo behind today’s demo — github.com
GSD — gsd-build/get-shit-done — spec-driven dev system for Claude Code — six-command loop (/gsd-new-project → discuss → plan → execute → verify → ship), parallel-wave subagents on fresh 200k contexts, npx get-shit-done-cc@latest installer. 62k stars since December, with its own $GSD Solana token — make of that what you will. — github.com
/simplify — custom Claude Code command — diff → three parallel review agents (reuse, quality, efficiency) → fix. Worth it on real code; overkill on prose-only diffs (skip the fanout).
Hazy Research — “Minions” (Stanford, 2025-02-24) — small local model + frontier cloud model collaborating: remote decomposes, local chews through long docs in parallel, remote aggregates. 97.9% of remote-only accuracy at 17.5% of the cost. — post
Liquid AI — LFM2 (2025-07-10) — on-device foundation models (350M / 700M / 1.2B, 10T tokens) tuned for edge inference; hybrid 16-block stack of gated short convolutions + grouped query attention, claimed 2× faster CPU decode/prefill vs. Qwen3. Pair-with-Minions material. — post
Chrome Translator API (Chrome 138+, desktop) — Translator.create() / translator.translate(), fully on-device via downloaded language packs, no cloud round-trip. The on-device-AI thread continues into the browser itself. — docs
Zellij — terminal workspace / multiplexer (Rust, WASM-plugin architecture). Batteries-included tmux: floating panes, named sessions, layouts you can check into the repo — handy for parking a Claude Code pane next to a Codex pane next to logs. — zellij.dev
ENABLE_PROMPT_CACHING_1H=1 — Claude Code v2.1.108 env var that flips the prompt cache off the 5-min default and onto a 1-hour TTL (API key, Bedrock, Vertex, Foundry). 1h writes cost more than 5m writes — break-even is ~5+ reads inside the window — but worth it on long sessions over a stable codebase that pauses between turns. FORCE_PROMPT_CACHING_5M=1 pins back to 5m. — release
/model opusplan — Claude Code model alias that runs Opus while in plan mode and Sonnet everywhere else. Hard reasoning gets the smart model; execution drops to Sonnet for speed and quota. Pairs cleanly with any /plan-driven loop — you only burn Opus tokens where the architecture decisions actually live. — docs
GitHub Copilot — it’s good now — May 2026 dropped the Copilot CLI agent into JetBrains, a unified sessions view, global .agent.md config, and a standalone desktop app (tech preview, Win/Mac/Linux) that hands each parallel session its own git worktree + branch. Basically Claude Code’s claude agents dashboard, shipped by Microsoft. Worth a look if you wrote it off in the GPT-3.5 era. — changelog
NVIDIA NemoClaw (early preview) — NVIDIA’s privacy/security stack layered over OpenClaw. Agent Toolkit for policy-based guardrails, OpenShell runtime, and Nemotron-class open models running locally on RTX / RTX PRO / DGX Station / DGX Spark — with a privacy router that hands the cloud only what the policy lets through. Lands while Anthropic is busy fencing OpenClaw out with the June 15 SDK billing split — different bets on where agent traffic should run. — nvidia.com
pnpm 11 — minimumReleaseAge on by default (2026-05-05) — newly published versions can’t be resolved until they’re at least 24h old (minimumReleaseAge: 1440 minutes). A delay window that disrupts “publish-and-pwn” supply-chain attacks like the recent Mini Shai-Hulud install-hook credential stealer. Opt out per-package via minimumReleaseAgeExclude (for hotfixes), or globally with minimumReleaseAge: 0 in pnpm-workspace.yaml. Worth knowing about as agents start running pnpm add unsupervised. — release

Around town

OKTech — Kitchen Robots & State of AI — Sat 30 May, 17:00–19:30, Cybozu Osaka (35F Hankyu Bldg). Robotics in the home + AI in web dev. — oktech.jp
State of Web Dev AI (Devographics) — TLDR: the same crew behind State of JS / CSS now run a survey on how web developers actually use AI — model providers, coding assistants, spend, pain points — and deliberately include the AI-skeptics so the picture isn’t a hype curve. Worth filling in. — stateofai.dev

Next time

How to create a Claude Code skill — what goes in SKILL.md, frontmatter contract, how the harness decides when to invoke, walkthrough of an example from this repo.

Run /news skill

Floor: 2026-05-09 (no prior entry — defaulted to 7-day window). Generated 2026-05-16.

New Claude Code commands & features

/goal (v2.1.139, 2026-05-11) — set a completion condition; Claude keeps working across turns until a small evaluator model says it’s met. Works in interactive, -p, and Remote Control; shows live elapsed / turns / tokens overlay — release
claude agents (v2.1.139, Research Preview) — single dashboard of every Claude Code session (running / blocked-on-you / done); --cwd <path> to scope by directory (v2.1.141); --add-dir, --settings, --mcp-config, --plugin-dir, --permission-mode, --model, --effort, --dangerously-skip-permissions to configure dispatched background sessions (v2.1.142) — agent view docs
Fast mode → Opus 4.7 (v2.1.142, 2026-05-14) — Fast mode now defaults to Opus 4.7 instead of 4.6; pin to 4.6 with CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE=1 — release
Projected context cost in /plugin (v2.1.143, 2026-05-15) — per-turn and per-invocation token estimates shown in the marketplace browse pane before you install — release
Hooks terminalSequence (v2.1.141, 2026-05-13) — emit desktop notifications, window titles, and bells from hooks without a controlling terminal — release
Rewind “Summarize up to here” (v2.1.141) — compress earlier context while keeping recent turns intact — release
claude plugin details <name> (v2.1.139) — component inventory + projected per-session token cost — release
/scroll-speed (v2.1.139) — tune mouse-wheel scroll speed with a live preview — release
Transcript view navigation (v2.1.139) — ? for keyboard shortcuts, { / } to jump between user prompts, v to toggle the shortcut panel — release

Claude Code — other notes

[2026-05-06] Anthropic — Code w/ Claude 2026 keynote: SpaceX/xAI Colossus capacity deal, doubled 5-hour API limits, multi-agent orchestration + routines for Claude managed agents, API usage up 17× YoY — Simon’s live blog
[2026-05] Anthropic announces June 15 billing split — Agent SDK, claude -p, GitHub Actions, and third-party Agent-SDK apps move onto a separate monthly credit pool at API rates — help center
[2026-05-13] v2.1.141 also: CLAUDE_CODE_PLUGIN_PREFER_HTTPS for plugin clones without GitHub SSH; /feedback can attach the last 24h / 7d of sessions — release
[2026-05-14] v2.1.142 also: plugins with a root-level SKILL.md surface as a skill; MCP_TOOL_TIMEOUT finally raises the 60s remote-MCP fetch ceiling — release
[2026-05-15] v2.1.143 also: plugin dependency enforcement on disable/enable; worktree.bgIsolation: "none" for repos where worktrees are impractical — release

Codex

[2026-05-14] Codex on the ChatGPT mobile app — approve tasks, review diffs, redirect running runs, monitor terminal output from iPhone/iPad/Android; preview on all plans incl. Free; OpenAI cites 4M weekly Codex users — TechCrunch
[2026-05-15] Codex CLI 0.131.0 alpha train — daily alpha cuts landing on main (rust-v0.131.0-alpha.21 latest as of today) — releases
[2026-05-08] Codex CLI 0.130.0 — codex remote-control headless app-server entrypoint; thread pagination with summary/full views; Bedrock auth via aws login console credentials; view_image resolves through selected environments — release
[2026-05-07] Codex CLI 0.129.0 — vim mode in the TUI, redesigned resume/fork picker, /hooks browser, plugin sharing — release

Adjacent tools

[2026-05-14] xAI ships Grok Build — terminal-native CLI, up to 8 parallel subagents, 16-agent Heavy architecture, 2M token context, MCP support, plan-review approval before applying changes; $99–$299/mo SuperGrok Heavy — xAI — Bloomberg
[2026-05-13] Cursor 3.4 — Development Environments for Cloud Agents; multi-repo environments re-used across sessions; Dockerfile-based setup with build secrets — changelog
[2026-05-11] Cursor in Microsoft Teams — @Cursor delegates to a cloud agent, reads the thread, opens the PR; Bugbot effort levels (default / high / custom) — changelog
[2026-05-12] Signadot skill — Claude Code / Codex / Cursor can validate changes against live Kubernetes environments — SiliconANGLE
[2026-05-11] AWS Bedrock AgentCore — preview of managed payments lets agents autonomously access/pay APIs, MCP servers, web content, other agents; Agent Toolkit for AWS GA at no extra charge — AWS
[2026-05-08] Industry — Snyk × Anthropic integration, Opsera DevSecOps Agents embedded in Cursor IDE, Prismatic Skills plugin for Claude Code, Coder Agents native architecture for self-hosted enterprise — SD Times roundup

Simon says

[2026-05-14] “Not so locked in any more” — LLM-assisted language migration (Bun: Zig → Rust) changes the cost calculus for picking a stack — post
[2026-05-11] “Learning on the Shop floor” — Shopify’s internal coding agent River and what it implies about org-specific tooling — post
[2026-05-11] Thoughts on GitLab’s workforce reduction in the agentic era — post
[2026-05-07] Notes on the xAI/Anthropic Colossus data-center deal — post

Reports & research

[2026-05] Anthropic — 2026 Agentic Coding Trends Report — eight trends across foundation / capability / impact: multi-agent coordination replacing single-agent loops, task horizons stretching from minutes to days, and the “delegation gap” — engineers use AI for ~60% of their work but fully delegate only 0–20% — report

Topics worth a 5-min slot

Grok Build vs. the field — 8 parallel subagents, 2M context, MCP-compatible, $99/mo intro. Worth a hands-on, or another CLI that fades after the hype week?
Codex on your phone — what does “approve and redirect agent runs from anywhere” actually change about your workflow vs. always-attached terminal sessions?
The delegation gap (Anthropic 2026 report) — 60% AI use but 0–20% full delegation. What’s the bottleneck in this room — trust, review cost, or task-fit?