From 53e64201c5b7052bf5297e98d0bf4cbe9b038a70 Mon Sep 17 00:00:00 2001 From: sacOO7 Date: Fri, 27 Mar 2026 19:44:23 +0530 Subject: [PATCH] Created skill to perform behavior testing of given command group --- .claude/skills/ably-behavior-testing/SKILL.md | 483 ++++++++++++++++++ .../references/command-inventory.md | 381 ++++++++++++++ .../references/report-template.md | 314 ++++++++++++ .../references/testing-dimensions.md | 361 +++++++++++++ .gitignore | 3 + 5 files changed, 1542 insertions(+) create mode 100644 .claude/skills/ably-behavior-testing/SKILL.md create mode 100644 .claude/skills/ably-behavior-testing/references/command-inventory.md create mode 100644 .claude/skills/ably-behavior-testing/references/report-template.md create mode 100644 .claude/skills/ably-behavior-testing/references/testing-dimensions.md diff --git a/.claude/skills/ably-behavior-testing/SKILL.md b/.claude/skills/ably-behavior-testing/SKILL.md new file mode 100644 index 00000000..472057b7 --- /dev/null +++ b/.claude/skills/ably-behavior-testing/SKILL.md @@ -0,0 +1,483 @@ +--- +name: ably-behavior-testing +description: "Perform behavior testing of Ably CLI commands with full output reports. Use this skill whenever asked to test, validate, or behavior test CLI commands — even casually (e.g., 'test the channels commands', 'run behavior tests on rooms', 'validate the CLI output', 'test all commands', 'test spaces', 'test the control API commands', 'test channels publish and subscribe', 'test ably apps create'). Supports testing at any granularity: a single command (e.g., 'test channels publish'), a subcommand group (e.g., 'test channels presence'), a full command group (e.g., 'test channels'), or the entire CLI. When the user specifies particular commands, test ONLY those — do not expand scope unless asked. Generates reports under CLAUDE-BEHAVIOR-TESTING//: a primary report, human-readable output report, and JSON output report (using --pretty-json). 
Do NOT use for writing unit tests (use ably-new-command), code review (use ably-review), or codebase audits (use ably-codebase-review)." +--- + +# Behavior Testing — Ably CLI + +You are acting as the **lead expert tester**. Your task is to perform comprehensive behavior testing of Ably CLI commands by executing them against the live Ably service and documenting results in structured Markdown reports. + +## When to Use This Skill + +- Testing all commands and subcommands under a command group (e.g., `channels`, `rooms`, `spaces`, `apps`) +- Validating `--help` output accuracy and completeness (including topic-level help) +- Verifying subscribe/publish/history workflows end-to-end +- Comparing human-readable vs JSON output formats (using `--pretty-json`) +- Testing Control API commands (apps, auth, queues, integrations) +- Generating test reports for QA review + +## Prerequisites + +- Authentication is already configured (no need to set access tokens) +- The CLI is built and ready (`pnpm clean && pnpm build` if needed) +- Use `pnpm cli` to run commands (equivalent to `ably`) + +--- + +## Step 0: Confirm Scope with the User + +**Before doing any testing**, check whether the user already specified what to test in their message. + +- **If the user specified scope** (e.g., "test channels", "test spaces members", "test channels publish and subscribe") → skip this step, go straight to Step 1. +- **If the user did NOT specify scope** (e.g., they just invoked `/ably-behavior-testing` or said "run behavior tests") → ask them to choose. + +Present this prompt: + +--- + +**What would you like to test?** + +| # | Command Group | API Type | Subcommands | +|---|--------------|----------|-------------| +| 1 | `channels` | Product API | subscribe, publish, history, list, presence, occupancy, annotations, ... 
| +| 2 | `rooms` | Product API | messages, presence, occupancy, reactions, typing | +| 3 | `spaces` | Product API | members, locations, locks, cursors, occupancy | +| 4 | `logs` | Product API | subscribe, history, channel-lifecycle, connection-lifecycle, push | +| 5 | `apps` | Control API | create, list, update, delete, channel-rules, rules | +| 6 | `auth` | Control API | issue-ably-token, issue-jwt-token, keys | +| 7 | `queues` | Control API | create, list, delete | +| 8 | `integrations` | Control API | create, list, get, update, delete | +| 9 | `push` | Control API | publish, channels, devices, config | +| 10 | `stats` | Control API | app, account | +| 11 | `connections` | Product API | test | +| 12 | **All commands** | All | **Takes a long time — tests every group above sequentially** | + +Pick a number, a group name, or tell me specific commands (e.g., "channels publish and subscribe"). + +--- + +Wait for the user's response before proceeding. Do NOT start testing until they confirm. + +--- + +## Step 1: Identify Scope + +Determine which commands to test based on what the user asked for (either from their original message or from Step 0). The user may specify: + +- **Specific commands**: "test channels publish" → test only `channels publish` +- **A subcommand group**: "test channels presence" → test `presence enter`, `presence get`, `presence subscribe` +- **A full command group**: "test channels" → test all channels subcommands +- **Multiple groups**: "test channels and rooms" → test both groups +- **Everything**: "test all commands" → test all groups + +**Respect the requested scope.** If the user says "test channels publish and subscribe", do NOT expand to test history, list, presence, etc. Only broaden scope if the user explicitly asks or says "test all". + +Consult `references/command-inventory.md` for the complete list organized by API type. 
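To make the scope rule concrete, here is a minimal sketch of how a requested scope maps to test targets. The group names are real; the `resolve_scope` helper itself is hypothetical and not part of the CLI:

```bash
#!/bin/sh
# Hypothetical helper: map a user request onto test targets.
# Specific commands and groups pass through unchanged; scope is
# only widened when the user explicitly asks for "all".
resolve_scope() {
  case "$1" in
    all) echo "channels rooms spaces logs apps auth queues integrations push stats connections" ;;
    *)   echo "$1" ;;
  esac
}

resolve_scope "channels publish"   # stays exactly "channels publish"
resolve_scope "all"                # expands to every command group
```

The point of the sketch is the default branch: anything the user names passes through verbatim, so "channels publish and subscribe" never grows into the full `channels` group.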
+ +Commands fall into two API categories with different testing patterns: + +| API Type | Groups | Auth | Transport | +|----------|--------|------|-----------| +| **Product API** (Ably SDK) | `channels`, `rooms`, `spaces`, `logs`, `connections`, `bench` | API key (`ABLY_API_KEY`) | SDK (REST/Realtime) | +| **Control API** (HTTP) | `accounts`, `apps`, `auth`, `queues`, `integrations`, `push`, `stats`, `channel-rule` | Access token (`ABLY_ACCESS_TOKEN`) | HTTP requests | +| **Local** | `config`, `support`, `version`, `status`, `login` | Varies | Local/mixed | + +For each group, enumerate **all** commands and subcommands: + +```bash +pnpm cli --help # Topic-level: lists subcommands +pnpm cli --help # Command-level: shows flags and usage +``` + +--- + +## Step 2: Build a Test Plan + +For each command group, organize testing into these categories: + +### A. Help Validation + +Test **both** topic-level and command-level help: + +**Topic-level** (`ably channels --help`): +- Lists all subcommands with descriptions +- Subcommand names match actual available commands +- No missing or extra subcommands + +**Command-level** (`ably channels publish --help`): +- USAGE section is present and accurate +- All documented flags appear and descriptions match behavior +- Required arguments are clearly marked +- Examples are valid and runnable +- Flag aliases listed (e.g., `-D` for `--duration`) + +### B. Argument and Flag Validation +- Missing required arguments produce clear errors with **non-zero exit code** +- Unknown flags are rejected with suggestion ("Did you mean...?") +- Flag type constraints enforced (e.g., `--limit` with `min: 1`) +- Flag aliases work (short `-D` for `--duration`, `-v` for `--verbose`) +- Default values applied when flags omitted +- Mutually exclusive flags handled (e.g., `--json` and `--pretty-json` together) + +### C. Subscribe-First Workflow (for groups with subscribe/publish) + +Follow this exact sequence: + +1. 
**Start subscriber** — Run subscribe in background with `--duration`. Wait for output to contain the listening/ready signal (e.g., "Listening" or "Subscribed") before proceeding — do NOT use a blind `sleep`; poll for the signal instead. +2. **Publish data** — In a separate process, publish to the subscribed resource. +3. **Validate receipt** — Confirm the subscriber receives the published data in the correct format. +4. **Multiple messages** — Publish several messages. Verify all are received in order. +5. **Stop subscriber** — Let the duration expire, or terminate the process. +6. **Query history** — Run `history` to verify the published messages appear. +7. **One-shot queries** — Test `get`, `list`, and other read commands. + +### D. Output Format Testing + +Every command must be tested in **two modes**: + +| Mode | Flag | Format | Use Case | +|------|------|--------|----------| +| Human-readable | (none) | Styled text with labels, colors | Interactive use | +| JSON | `--pretty-json` | Indented, colorized JSON | Scripting, debugging | + +> **Why `--pretty-json` only?** The `--json` and `--pretty-json` flags produce identical data — only whitespace/indentation differs. Testing with `--pretty-json` validates all JSON behavior (envelope structure, field parity, stdout cleanliness) while also being easier to read in reports. There is no need to test `--json` separately. + +Validate per mode: +- **Human-readable**: Correct formatting, labels, progress messages. No raw JSON leaking. Progress/listening messages present. +- **JSON (`--pretty-json`)**: Valid JSON (verify with `jq`). No human-readable text mixed in (no progress messages, no "Listening..." text). Correct envelope structure. For streaming commands, each event is a valid JSON object. +- **Cross-mode parity**: JSON mode exposes all fields from human-readable (JSON may have more, human-readable must not have fields missing from JSON). Null/undefined fields omitted in both modes. 
+- **stdout/stderr separation**: In human-readable mode, both data and decoration (progress, success, listening messages) go to stdout. Warnings go to stderr. In `--pretty-json` mode, stdout must contain ONLY valid JSON — decoration is suppressed (not redirected), so piping to `jq` must succeed. Errors go to stderr in all modes. + +### E. Error Path Testing +- Invalid arguments → clear error, non-zero exit code +- Nonexistent channels/rooms → appropriate error or empty result +- Missing required flags → error with guidance +- Error output in JSON mode → JSON error envelope (`type: "error"`, `success: false`) +- No stack traces leaked in any mode +- Exit codes: 0 for success, non-zero for errors (document which exit codes appear) + +### F. Edge Cases +- Empty channel/room names (should error) +- Special characters in names (`#`, `/`, `%`, spaces) +- Unicode in message data +- Very long messages (>64KB) +- Empty message body +- Pagination boundaries (for history/list with `--limit`) +- Multi-channel subscribe (channels subscribe accepts multiple channel names) + +### G. 
Flag-Specific Testing + +Test these flags where applicable (see `references/command-inventory.md` for which commands support each): + +| Flag | Test | Verify | +|------|------|--------| +| `--duration N` | Set to 3-5s | Command auto-exits after N seconds, clean exit code 0 | +| `--rewind N` | Subscribe with rewind after publishing | Receives N historical messages on attach | +| `--client-id` | Set custom ID | ID appears in presence/message output | +| `--limit N` | Set on history/list | Returns exactly N items (or fewer if less exist) | +| `--direction` | `forwards` vs `backwards` on history | Order changes | +| `--start` / `--end` | Time range on history | Only items in range returned | +| `--app` | Specify app ID | Overrides default app | +| `--verbose` | Add to any command | Additional debug output on stderr | +| `--force` | On commands that have it (push, apps delete) | Skips confirmation prompt | + +--- + +## Step 3: Execute Tests + +### Execution Method + +Use the Bash tool to run each command. Capture both stdout and stderr separately to verify stream separation. + +**All temporary files** (stdout captures, stderr captures, subscriber output, etc.) must be written under `CLAUDE-BEHAVIOR-TESTING//temp/`. Create this directory at the start of testing a command group. Clean up the `temp/` directory after all reports are generated. + +```bash +# Create temp directory at the start of testing a command group +TEMP_DIR="CLAUDE-BEHAVIOR-TESTING//temp" +mkdir -p "$TEMP_DIR" +``` + +**Pattern for stdout/stderr separation:** +```bash +# Capture stdout and stderr separately +pnpm cli channels list --pretty-json >"$TEMP_DIR/stdout.txt" 2>"$TEMP_DIR/stderr.txt" +echo "Exit code: $?" +echo "=== STDOUT ===" && cat "$TEMP_DIR/stdout.txt" +echo "=== STDERR ===" && cat "$TEMP_DIR/stderr.txt" +# Verify JSON validity +jq . 
"$TEMP_DIR/stdout.txt" >/dev/null 2>&1 && echo "Valid JSON" || echo "INVALID JSON" +``` + +**Pattern for subscribe + publish workflow:** +```bash +# Start subscriber with file-based output capture +pnpm cli channels subscribe test-channel --duration 15 >"$TEMP_DIR/sub_stdout.txt" 2>"$TEMP_DIR/sub_stderr.txt" & +SUBSCRIBER_PID=$! + +# Wait for ready signal (NOT a fixed sleep) — poll both captures, since the signal may land on stdout or stderr +for i in $(seq 1 30); do + if grep -q "Listening\|Subscribed\|Attaching" "$TEMP_DIR/sub_stdout.txt" "$TEMP_DIR/sub_stderr.txt" 2>/dev/null; then + break + fi + sleep 0.5 +done + +# Publish messages +pnpm cli channels publish test-channel "Hello World 1" +pnpm cli channels publish test-channel "Hello World 2" +pnpm cli channels publish test-channel "Hello World 3" + +# Wait for subscriber to finish +wait $SUBSCRIBER_PID +echo "Exit code: $?" +echo "=== Subscriber stdout ===" && cat "$TEMP_DIR/sub_stdout.txt" +echo "=== Subscriber stderr ===" && cat "$TEMP_DIR/sub_stderr.txt" +``` + +**Pattern for JSON subscribe + publish:** +```bash +pnpm cli channels subscribe test-channel --duration 15 --pretty-json >"$TEMP_DIR/sub_json.txt" 2>"$TEMP_DIR/sub_err.txt" & +SUBSCRIBER_PID=$! + +for i in $(seq 1 30); do + if [ -s "$TEMP_DIR/sub_json.txt" ] || grep -q "Attaching" "$TEMP_DIR/sub_err.txt" 2>/dev/null; then + break + fi + sleep 0.5 +done + +pnpm cli channels publish test-channel "Hello World 1" +wait $SUBSCRIBER_PID + +# Verify the captured output is valid JSON +jq . "$TEMP_DIR/sub_json.txt" >/dev/null 2>&1 && echo "Valid JSON" || echo "INVALID JSON" +``` + +### Naming Convention for Test Resources + +Use unique, descriptive names to avoid collisions: +- Channels: `behavior-test--` +- Rooms: `behavior-test-room--` +- Spaces: `behavior-test-space--` + +### Retry on Network Errors + +If a command fails due to network issues, retry once before recording the failure. 
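That retry rule can be wrapped in a small helper. This is a sketch: `retry_once` is a hypothetical wrapper, not a CLI feature:

```bash
#!/bin/sh
# Hypothetical wrapper: run a command; if it fails, retry exactly once
# and report the second attempt's exit code as the final result.
retry_once() {
  "$@" && return 0
  echo "first attempt failed; retrying once..." >&2
  "$@"
}

# Usage with the real CLI would look like:
#   retry_once pnpm cli channels list --pretty-json
```

Only one retry is attempted, so a genuine failure is still recorded rather than masked by an endless loop.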
+ +### Control API Commands + +Control API commands (apps, auth, queues, integrations) behave differently: +- They make HTTP requests, not SDK calls — no subscribe/publish workflows +- They use access tokens, not API keys +- Test CRUD patterns: create → list → get → update → delete +- Verify `--app` flag overrides default app +- Test `--pretty-json` output has correct envelope structure +- Some require `--force` for destructive operations (delete) + +--- + +## Step 4: Generate Reports + +All reports are written under `CLAUDE-BEHAVIOR-TESTING//`. For example, testing `channels` produces: + +``` +CLAUDE-BEHAVIOR-TESTING/ +└── channels/ + ├── REPORT_PRIMARY.md + ├── REPORT_NON_JSON.md + ├── REPORT_JSON.md + └── temp/ ← intermediate files (cleaned up after reports are generated) +``` + +Generate **three** report files: + +1. **`REPORT_PRIMARY.md`** — The **primary report**. Contains the consolidated summary of ALL findings (critical, major, minor, and low severity) discovered across both output modes. A reader of this file alone should have the complete picture of every issue found during testing — they should never need to check the per-mode reports to discover additional issues. +2. **`REPORT_NON_JSON.md`** — All commands tested without JSON flags (default human-readable output). Only includes issues specific to human-readable output. +3. **`REPORT_JSON.md`** — All commands tested with `--pretty-json`. Only includes issues specific to JSON output. 
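Before the temp cleanup, it is worth asserting that all three reports were actually written. A sketch (`reports_complete` is a hypothetical helper, and `channels` is just an example group):

```bash
#!/bin/sh
# Hypothetical guard: only clean up temp/ once every report exists and is non-empty.
reports_complete() {
  for report in REPORT_PRIMARY.md REPORT_NON_JSON.md REPORT_JSON.md; do
    if [ ! -s "$1/$report" ]; then
      echo "missing or empty report: $1/$report; keeping temp/ for debugging" >&2
      return 1
    fi
  done
}

# Demo against a scratch directory ("channels" is an example group):
demo="$(mktemp -d)/CLAUDE-BEHAVIOR-TESTING/channels"
mkdir -p "$demo/temp"
for f in REPORT_PRIMARY.md REPORT_NON_JSON.md REPORT_JSON.md; do
  echo "# stub" > "$demo/$f"
done
reports_complete "$demo" && rm -rf "$demo/temp"
[ -d "$demo/temp" ] || echo "temp/ removed"
```

Failing closed here means a crashed test run keeps its intermediate captures around for debugging instead of deleting them.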
+ +After all three reports are generated, **clean up the temp directory**: +```bash +rm -rf "CLAUDE-BEHAVIOR-TESTING//temp" +``` + +### Primary Report Structure (`REPORT_PRIMARY.md`) + +```markdown +# Behavior Test Report — [Command Group] + +**Date:** YYYY-MM-DD +**CLI Version:** X.Y.Z +**Tester:** Claude (automated) + +## Overall Summary + +| Output Mode | Total Tests | Passed | Failed | Skipped | +|-------------|-------------|--------|--------|---------| +| Human-readable | N | N | N | N | +| --pretty-json | N | N | N | N | +| **Total** | **N** | **N** | **N** | **N** | + +## All Issues Found + +[Every issue found across both output modes, consolidated here. Each entry includes:] +[- Severity (critical/major/minor/low)] +[- Affected command(s)] +[- Description] +[- Which output mode(s) are affected] +[- Steps to reproduce] +[- Expected vs actual behavior] + +## Cross-Mode Analysis + +[Comparison findings: field parity, JSON envelope correctness, stdout cleanliness] + +## Per-Mode Report Links + +- [Human-Readable](REPORT_NON_JSON.md) +- [JSON](REPORT_JSON.md) +``` + +### Per-Mode Report Structure (`REPORT_*.md`) + +Use the templates in `references/report-template.md` for each command entry. Each per-mode report follows this structure: + +```markdown +# Behavior Test Report — [Command Group] ([Output Mode]) + +**Date:** YYYY-MM-DD +**CLI Version:** X.Y.Z +**Tester:** Claude (automated) +**Output Mode:** Human-readable / --pretty-json + +## Summary + +| Category | Total | Passed | Failed | Skipped | +|----------|-------|--------|--------|---------| +| Help Validation | N | N | N | N | +| Argument Validation | N | N | N | N | +| Functionality | N | N | N | N | +| Output Format | N | N | N | N | +| Error Handling | N | N | N | N | +| JSON Cleanliness | N | N | N | N | + +## Issues Found + +[Issues specific to THIS output mode only. For the full consolidated list of all issues across all modes, see REPORT_PRIMARY.md.] 
+ +## Commands Tested + +[Individual command reports follow, separated by ---] +``` + +--- + +## Step 5: Analyze and Cross-Reference + +After generating both per-mode reports: + +1. **Compare across output modes** — For each command, compare both modes: + - Are the same data fields present across modes? + - Does `--pretty-json` produce valid JSON? + - Is stdout clean in JSON mode (no progress messages leaking)? + - Does JSON output use correct envelope structure (`type`, `command`, `success`, ``)? + +2. **Cross-command validation** — Verify workflows span commands: + - Data published via `publish` appears in `subscribe` output + - Data published via `publish` appears in `history` output + - Presence entered via `enter` appears in `get` output + - Messages sent appear with correct metadata + - For Control API: resources created via `create` appear in `list` + +3. **Consistency checks** (per `references/testing-dimensions.md`): + - Same flag behaves identically across commands (`--limit`, `--duration`, `--pretty-json`) + - Same field formatting across commands (timestamps, IDs, labels) + - Same error patterns across commands (stderr, exit codes, JSON error envelope) + +4. **Document issues** — For any failures or inconsistencies: + - Exact steps to reproduce + - Expected vs actual behavior + - Severity assessment (critical/major/minor/low) + - Which output modes affected + +5. **Generate `REPORT_PRIMARY.md`** — After completing the two per-mode reports, create the primary report that consolidates ALL findings (critical, major, minor, and low severity) discovered across both output modes. Each issue entry must specify which output mode(s) it affects. A reader of `REPORT_PRIMARY.md` alone should have the complete picture of every issue found during testing — they should never need to check the per-mode reports to discover additional issues. 
The per-mode reports (`REPORT_*.md`) should only contain issues specific to that mode and reference `REPORT_PRIMARY.md` for the full list. + +6. **Clean up temp directory** — Remove `CLAUDE-BEHAVIOR-TESTING//temp/` after all reports are finalized. + +--- + +## Behavior Testing Dimensions + +Beyond basic functional testing, cover these behavior testing dimensions per `references/testing-dimensions.md`: + +### Code Path Coverage +- Exercise all conditional branches in flag handling +- Test both REST and Realtime transport paths where applicable +- Test single-item vs batch operations +- Test with and without optional flags (encryption, rewind, client-id) + +### State Machine Validation (Long-Running Commands) +- INIT -> CONNECTING -> CONNECTED -> SUBSCRIBED -> RECEIVING -> CLEANUP -> EXIT +- Verify progress messages at each transition (on stdout in human-readable mode; suppressed in `--pretty-json` mode) +- Test clean shutdown via `--duration` + +### Output Contract Verification +- JSON envelope: `type`, `command`, `success` fields present +- Domain nesting: data under singular key (events) or plural key (collections) +- Metadata: `total`, `hasMore`, `timestamp` at correct level +- No raw data fields at envelope level +- Streaming: each event independently parseable as JSON (test via `--pretty-json`) + +### Output Cleanliness in JSON Mode +- In human-readable mode: both data and decoration (progress, success, listening) go to stdout. Warnings go to stderr. +- In `--pretty-json` mode: decoration is **suppressed** (via `shouldOutputJson` guard), so stdout contains ONLY JSON. Verify: `stdout | jq .` must succeed. +- Errors and warnings go to stderr in all modes. 
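The contract and cleanliness checks can be mechanized with `jq`. A sketch, using a made-up sample capture: the envelope field names come from the contract above, but the values and the `channels` payload key are illustrative, not the CLI's exact schema:

```bash
#!/bin/sh
# Sketch: validate a captured --pretty-json stdout file.
# The sample below stands in for a real capture; its payload is invented.
capture=$(mktemp)
cat > "$capture" <<'EOF'
{
  "type": "result",
  "command": "channels list",
  "success": true,
  "channels": []
}
EOF

# 1. stdout must parse as JSON: fails if any decoration leaked onto stdout
jq . "$capture" >/dev/null || { echo "INVALID JSON on stdout" >&2; exit 1; }

# 2. envelope fields must be present with the expected types
jq -e '(.type | type == "string") and
       (.command | type == "string") and
       (.success | type == "boolean")' "$capture" >/dev/null \
  || { echo "bad envelope structure" >&2; exit 1; }

echo "envelope OK"
```

The same two checks run unchanged against any real capture file produced in Step 3.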
+ +### Pagination Testing (history, list commands) +- Default limit behavior +- Custom `--limit` values +- `--direction backwards` vs `forwards` +- `--start` and `--end` time range filters +- `hasMore` indicator accuracy in JSON output +- Pagination hint in JSON when `hasMore` is true + +### Exit Code Verification +- 0 for successful operations +- Non-zero for errors (missing args, auth failures, not found) +- 0 for clean exit after `--duration` expires +- Consistent across all commands + +--- + +## Parallel Execution Strategy + +For efficiency, spawn parallel agents by command group: + +1. **Agent per API type** — One agent for Product API groups (channels, rooms, spaces), one for Control API groups (apps, auth, queues, integrations) +2. **Within each agent** — Test subcommands sequentially where workflows depend on each other (subscribe needs publish first, CRUD needs create first) +3. **Report generation** — Each agent produces its section, combine into final reports + +--- + +## Quality Gates + +A command **passes** if: +- `--help` output is accurate and complete (topic-level and command-level) +- All documented flags work as described +- Output format is correct in both modes (human-readable, `--pretty-json`) +- In `--pretty-json` mode, stdout contains only valid JSON (no progress/decoration leaking) +- Exit codes are correct (0 for success, non-zero for errors) +- Error messages are clear and actionable (no stack traces) +- Subscribe/publish workflow delivers messages correctly +- History/get/list queries return expected data +- JSON output (via `--pretty-json`) is valid and parseable (`jq`-friendly) +- Streaming output (via `--pretty-json`) has one valid JSON object per event + +A command **fails** if: +- Output is missing fields compared to documentation +- JSON output has incorrect envelope structure +- Human-readable text (progress, listening) leaks into stdout in `--pretty-json` mode +- Error messages are unclear or missing +- Commands crash, hang, or 
exit with wrong code +- Data published is not received by subscriber +- Pagination produces incorrect results +- Exit code doesn't match success/failure state diff --git a/.claude/skills/ably-behavior-testing/references/command-inventory.md b/.claude/skills/ably-behavior-testing/references/command-inventory.md new file mode 100644 index 00000000..9d6b12b5 --- /dev/null +++ b/.claude/skills/ably-behavior-testing/references/command-inventory.md @@ -0,0 +1,381 @@ +# Command Inventory + +Complete list of commands to test, organized by API type. Update this when commands are added or removed. + +--- + +## Product API Commands (Ably SDK) + +These commands use the Ably SDK (REST or Realtime) and authenticate via API key. + +### Channels (`ably channels`) + +#### Top-Level Commands +| Command | Type | Description | +|---------|------|-------------| +| `channels subscribe ` | Subscribe (long-running) | Subscribe to messages on one or more channels | +| `channels publish ` | Publish (one-shot/batch) | Publish a message to a channel | +| `channels history ` | History (paginated) | Retrieve message history for a channel | +| `channels list` | List (paginated) | List active channels | +| `channels inspect ` | Get (one-shot) | Open dashboard to inspect a channel | +| `channels batch-publish` | Publish (batch) | Publish messages to multiple channels | +| `channels append ` | REST mutation | Append data to a message | +| `channels delete ` | REST mutation | Delete a message | +| `channels update ` | REST mutation | Update a message | + +#### Presence Subcommands (`ably channels presence`) +| Command | Type | Description | +|---------|------|-------------| +| `channels presence enter ` | Enter/Hold (long-running) | Enter presence on a channel | +| `channels presence get ` | Get (one-shot) | Get current presence members | +| `channels presence subscribe ` | Subscribe (long-running) | Subscribe to presence changes | + +#### Occupancy Subcommands (`ably channels occupancy`) +| Command 
| Type | Description | +|---------|------|-------------| +| `channels occupancy get ` | Get (one-shot) | Get occupancy metrics | +| `channels occupancy subscribe ` | Subscribe (long-running) | Subscribe to occupancy changes | + +#### Annotations Subcommands (`ably channels annotations`) +| Command | Type | Description | +|---------|------|-------------| +| `channels annotations publish ` | Publish | Publish annotation on a message | +| `channels annotations subscribe ` | Subscribe (long-running) | Subscribe to annotation events | +| `channels annotations get ` | Get (one-shot) | Get annotations for a message | +| `channels annotations delete ` | REST mutation | Delete an annotation | + +--- + +### Rooms (`ably rooms`) + +#### Top-Level Commands +| Command | Type | Description | +|---------|------|-------------| +| `rooms list` | List (paginated) | List active chat rooms | + +#### Messages Subcommands (`ably rooms messages`) +| Command | Type | Description | +|---------|------|-------------| +| `rooms messages send ` | Send (one-shot) | Send a message to a room | +| `rooms messages subscribe ` | Subscribe (long-running) | Subscribe to messages in a room | +| `rooms messages history ` | History (paginated) | Get message history for a room | +| `rooms messages delete ` | REST mutation | Delete a message | +| `rooms messages update ` | REST mutation | Update a message | + +#### Message Reactions Subcommands (`ably rooms messages reactions`) +| Command | Type | Description | +|---------|------|-------------| +| `rooms messages reactions send ` | Send | Send a reaction to a message | +| `rooms messages reactions remove ` | REST mutation | Remove a reaction | +| `rooms messages reactions subscribe ` | Subscribe (long-running) | Subscribe to reaction changes | + +#### Presence Subcommands (`ably rooms presence`) +| Command | Type | Description | +|---------|------|-------------| +| `rooms presence enter ` | Enter/Hold (long-running) | Enter presence in a room | +| `rooms 
presence get ` | Get (one-shot) | Get presence members | +| `rooms presence subscribe ` | Subscribe (long-running) | Subscribe to presence changes | + +#### Occupancy Subcommands (`ably rooms occupancy`) +| Command | Type | Description | +|---------|------|-------------| +| `rooms occupancy get ` | Get (one-shot) | Get room occupancy metrics | +| `rooms occupancy subscribe ` | Subscribe (long-running) | Subscribe to occupancy changes | + +#### Reactions Subcommands (`ably rooms reactions`) +| Command | Type | Description | +|---------|------|-------------| +| `rooms reactions send ` | Send | Send a room-level reaction | +| `rooms reactions subscribe ` | Subscribe (long-running) | Subscribe to room reactions | + +#### Typing Subcommands (`ably rooms typing`) +| Command | Type | Description | +|---------|------|-------------| +| `rooms typing keystroke ` | Send | Send a typing indicator | +| `rooms typing subscribe ` | Subscribe (long-running) | Subscribe to typing indicators | + +--- + +### Spaces (`ably spaces`) + +#### Top-Level Commands +| Command | Type | Description | +|---------|------|-------------| +| `spaces create ` | Create (one-shot) | Create a space | +| `spaces get ` | Get (one-shot) | Get space details | +| `spaces list` | List (paginated) | List spaces | +| `spaces subscribe ` | Subscribe (long-running) | Subscribe to space events | + +#### Members Subcommands (`ably spaces members`) +| Command | Type | Description | +|---------|------|-------------| +| `spaces members enter ` | Enter/Hold (long-running) | Enter a space as a member | +| `spaces members get ` | Get (one-shot) | Get current space members | +| `spaces members subscribe ` | Subscribe (long-running) | Subscribe to member changes | + +#### Locations Subcommands (`ably spaces locations`) +| Command | Type | Description | +|---------|------|-------------| +| `spaces locations set ` | Set/Hold (long-running) | Set location in a space | +| `spaces locations get ` | Get (one-shot) | Get member 
locations | +| `spaces locations subscribe ` | Subscribe (long-running) | Subscribe to location changes | + +#### Locks Subcommands (`ably spaces locks`) +| Command | Type | Description | +|---------|------|-------------| +| `spaces locks acquire ` | Acquire/Hold (long-running) | Acquire a lock in a space | +| `spaces locks get ` | Get (one-shot) | Get locks in a space | +| `spaces locks subscribe ` | Subscribe (long-running) | Subscribe to lock changes | + +#### Cursors Subcommands (`ably spaces cursors`) +| Command | Type | Description | +|---------|------|-------------| +| `spaces cursors set ` | Set/Hold (long-running) | Set cursor position in a space | +| `spaces cursors get ` | Get (one-shot) | Get cursor positions | +| `spaces cursors subscribe ` | Subscribe (long-running) | Subscribe to cursor changes | + +#### Occupancy Subcommands (`ably spaces occupancy`) +| Command | Type | Description | +|---------|------|-------------| +| `spaces occupancy get ` | Get (one-shot) | Get space occupancy metrics | +| `spaces occupancy subscribe ` | Subscribe (long-running) | Subscribe to occupancy changes | + +--- + +### Logs (`ably logs`) + +#### Top-Level Commands +| Command | Type | Description | +|---------|------|-------------| +| `logs subscribe` | Subscribe (long-running) | Subscribe to all log events | +| `logs history` | History (paginated) | Get log history | + +#### Channel Lifecycle (`ably logs channel-lifecycle`) +| Command | Type | Description | +|---------|------|-------------| +| `logs channel-lifecycle subscribe` | Subscribe (long-running) | Subscribe to channel lifecycle events | + +#### Connection Lifecycle (`ably logs connection-lifecycle`) +| Command | Type | Description | +|---------|------|-------------| +| `logs connection-lifecycle subscribe` | Subscribe (long-running) | Subscribe to connection lifecycle events | +| `logs connection-lifecycle history` | History (paginated) | Get connection lifecycle history | + +#### Push Logs (`ably logs push`) 
+| Command | Type | Description | +|---------|------|-------------| +| `logs push subscribe` | Subscribe (long-running) | Subscribe to push notification logs | +| `logs push history` | History (paginated) | Get push notification log history | + +--- + +### Connections (`ably connections`) +| Command | Type | Description | +|---------|------|-------------| +| `connections test` | One-shot | Test connection to Ably | + +--- + +### Bench (`ably bench`) +| Command | Type | Description | +|---------|------|-------------| +| `bench publisher` | Long-running | Benchmark publish throughput | +| `bench subscriber` | Long-running | Benchmark subscribe throughput | + +--- + +## Control API Commands (HTTP) + +These commands use the Ably Control API via HTTP and authenticate via access token. + +### Accounts (`ably accounts`) +| Command | Type | Description | +|---------|------|-------------| +| `accounts login` | Auth flow | Log in to an Ably account | +| `accounts logout` | Auth flow | Log out of current account | +| `accounts current` | Get (one-shot) | Show current account info | +| `accounts list` | List | List configured accounts | +| `accounts switch` | Config | Switch between accounts | + +### Apps (`ably apps`) +| Command | Type | Description | +|---------|------|-------------| +| `apps create` | Create | Create a new app | +| `apps current` | Get (one-shot) | Show current app | +| `apps list` | List | List apps in account | +| `apps update` | Update | Update app settings | +| `apps delete` | Delete (destructive) | Delete an app | +| `apps switch` | Config | Switch between apps | + +#### Channel Rules (`ably apps channel-rules`) +| Command | Type | Description | +|---------|------|-------------| +| `apps channel-rules create` | Create | Create a channel rule | +| `apps channel-rules list` | List | List channel rules | +| `apps channel-rules update` | Update | Update a channel rule | +| `apps channel-rules delete` | Delete (destructive) | Delete a channel rule | + +#### 
Integration Rules (`ably apps rules`)
+| Command | Type | Description |
+|---------|------|-------------|
+| `apps rules create` | Create | Create an integration rule |
+| `apps rules list` | List | List integration rules |
+| `apps rules update` | Update | Update an integration rule |
+| `apps rules delete` | Delete (destructive) | Delete an integration rule |
+
+### Auth (`ably auth`)
+| Command | Type | Description |
+|---------|------|-------------|
+| `auth issue-ably-token` | Create | Issue an Ably token |
+| `auth issue-jwt-token` | Create | Issue a JWT token |
+| `auth revoke-token` | Delete | Revoke a token |
+
+#### Keys (`ably auth keys`)
+| Command | Type | Description |
+|---------|------|-------------|
+| `auth keys create` | Create | Create an API key |
+| `auth keys current` | Get (one-shot) | Show current key |
+| `auth keys get <keyName>` | Get (one-shot) | Get key details |
+| `auth keys list` | List | List API keys |
+| `auth keys revoke <keyName>` | Delete (destructive) | Revoke a key |
+| `auth keys switch` | Config | Switch between keys |
+| `auth keys update <keyName>` | Update | Update key capabilities |
+
+### Queues (`ably queues`)
+| Command | Type | Description |
+|---------|------|-------------|
+| `queues create` | Create | Create a queue |
+| `queues list` | List | List queues |
+| `queues delete` | Delete (destructive) | Delete a queue |
+
+### Integrations (`ably integrations`)
+| Command | Type | Description |
+|---------|------|-------------|
+| `integrations create` | Create | Create an integration |
+| `integrations get <ruleId>` | Get (one-shot) | Get integration details |
+| `integrations list` | List | List integrations |
+| `integrations update <ruleId>` | Update | Update an integration |
+| `integrations delete <ruleId>` | Delete (destructive) | Delete an integration |
+
+### Channel Rules (`ably channel-rule`) — alias
+| Command | Type | Description |
+|---------|------|-------------|
+| `channel-rule create` | Create | Create a channel rule (alias) |
+| `channel-rule 
list` | List | List channel rules (alias) |
+| `channel-rule update` | Update | Update a channel rule (alias) |
+| `channel-rule delete` | Delete (destructive) | Delete a channel rule (alias) |
+
+### Push (`ably push`)
+| Command | Type | Description |
+|---------|------|-------------|
+| `push publish` | Publish | Send a push notification |
+| `push batch-publish` | Publish (batch) | Send push notifications in batch |
+
+#### Push Channels (`ably push channels`)
+| Command | Type | Description |
+|---------|------|-------------|
+| `push channels list` | List | List push channels |
+| `push channels list-channels` | List | List push channels (alternate) |
+| `push channels save` | Create/Update | Save a push channel subscription |
+| `push channels remove` | Delete | Remove a push channel subscription |
+| `push channels remove-where` | Delete (bulk) | Remove push channels matching criteria |
+
+#### Push Devices (`ably push devices`)
+| Command | Type | Description |
+|---------|------|-------------|
+| `push devices get <deviceId>` | Get (one-shot) | Get device details |
+| `push devices list` | List | List registered devices |
+| `push devices save` | Create/Update | Register/update a device |
+| `push devices remove` | Delete | Remove a device |
+| `push devices remove-where` | Delete (bulk) | Remove devices matching criteria |
+
+#### Push Config (`ably push config`)
+| Command | Type | Description |
+|---------|------|-------------|
+| `push config show` | Get (one-shot) | Show push config |
+| `push config set-apns` | Update | Configure APNS |
+| `push config set-fcm` | Update | Configure FCM |
+| `push config clear-apns` | Delete | Clear APNS config |
+| `push config clear-fcm` | Delete | Clear FCM config |
+
+### Stats (`ably stats`)
+| Command | Type | Description |
+|---------|------|-------------|
+| `stats app` | Get (one-shot) | Get app-level stats |
+| `stats account` | Get (one-shot) | Get account-level stats |
+
+---
+
+## Local / Utility Commands
+
+These 
commands don't call external APIs or have mixed behavior. + +| Command | Type | Description | +|---------|------|-------------| +| `login` | Auth flow | Alias for `accounts login` | +| `version` | Local | Display CLI version | +| `status` | Local | Show current config status | +| `config path` | Local | Show config file path | +| `config show` | Local | Show current config | +| `support ask` | External | Ask Ably support a question | +| `support contact` | External | Contact Ably support | +| `interactive` | Interactive | Launch interactive mode | +| `help` | Local | Show help | +| `test wait` | Test utility | Wait for a duration (internal) | + +--- + +## Testing Priority Order + +### Phase 1: Core Workflows (subscribe + publish) +1. `channels subscribe` + `channels publish` +2. `rooms messages subscribe` + `rooms messages send` + +### Phase 2: History and Query +3. `channels history` +4. `rooms messages history` +5. `channels list` +6. `rooms list` + +### Phase 3: Presence +7. `channels presence enter` + `channels presence get` + `channels presence subscribe` +8. `rooms presence enter` + `rooms presence get` + `rooms presence subscribe` + +### Phase 4: Spaces +9. `spaces members enter` + `spaces members get` + `spaces members subscribe` +10. `spaces locations set` + `spaces locations get` + `spaces locations subscribe` +11. `spaces locks acquire` + `spaces locks get` + `spaces locks subscribe` +12. `spaces cursors set` + `spaces cursors get` + `spaces cursors subscribe` + +### Phase 5: Occupancy +13. `channels occupancy get` + `channels occupancy subscribe` +14. `rooms occupancy get` + `rooms occupancy subscribe` +15. `spaces occupancy get` + `spaces occupancy subscribe` + +### Phase 6: Mutations and Specialized +16. `channels append` / `channels update` / `channels delete` +17. `rooms messages update` / `rooms messages delete` +18. `channels batch-publish` +19. `channels annotations` (all subcommands) + +### Phase 7: Rooms Extras +20. 
`rooms reactions send` + `rooms reactions subscribe`
+21. `rooms messages reactions send` + `rooms messages reactions remove` + `rooms messages reactions subscribe`
+22. `rooms typing keystroke` + `rooms typing subscribe`
+
+### Phase 8: Logs
+23. `logs subscribe` + `logs history`
+24. `logs channel-lifecycle subscribe`
+25. `logs connection-lifecycle subscribe` + `logs connection-lifecycle history`
+26. `logs push subscribe` + `logs push history`
+
+### Phase 9: Control API (CRUD)
+27. `apps create` + `apps list` + `apps update` + `apps delete`
+28. `auth keys create` + `auth keys list` + `auth keys get` + `auth keys update` + `auth keys revoke`
+29. `queues create` + `queues list` + `queues delete`
+30. `integrations create` + `integrations list` + `integrations get` + `integrations update` + `integrations delete`
+
+### Phase 10: Stats and Utilities
+31. `stats app` + `stats account`
+32. `connections test`
+33. `version` + `status` + `config show` + `config path`
diff --git a/.claude/skills/ably-behavior-testing/references/report-template.md b/.claude/skills/ably-behavior-testing/references/report-template.md
new file mode 100644
index 00000000..b47f18cf
--- /dev/null
+++ b/.claude/skills/ably-behavior-testing/references/report-template.md
@@ -0,0 +1,314 @@
+# Report Template
+
+Use these templates for each command entry in the test reports.
+
+---
+
+## Template: Individual Command Report
+
+```markdown
+---
+
+### `ably <command>`
+
+**Full Command:**
+\`\`\`bash
+pnpm cli <command> [args] [flags]
+\`\`\`
+
+**Description:**
+Brief explanation of what the command does, based on `--help` and Ably documentation.
+
+**Flags Tested:**
+| Flag | Value | Purpose |
+|------|-------|---------|
+| `--flag-name` | `value` | What it does |
+
+**Output (stdout):**
+\`\`\`
+[Paste actual stdout here]
+\`\`\`
+
+**Stderr:**
+\`\`\`
+[Paste actual stderr here — progress messages, warnings, etc.]
+\`\`\`
+
+**Exit Code:** N
+
+**Review & Analysis:**
+
+| Check | Result | Notes |
+|-------|--------|-------|
+| Command executes successfully | PASS/FAIL | |
+| Exit code correct | PASS/FAIL | Expected: 0, Got: N |
+| Output on correct stream (stdout) | PASS/FAIL | |
+| Progress/errors on stderr | PASS/FAIL | |
+| Output format correct for mode | PASS/FAIL | |
+| All expected fields present | PASS/FAIL | List missing fields |
+| Matches documented behavior | PASS/FAIL | |
+| Consistent with other output modes | PASS/FAIL | Note differences |
+| Error handling appropriate | PASS/FAIL | |
+
+**Issues Found:**
+- [ ] Issue description (Severity: Critical/Major/Minor)
+  - **Expected:** What should happen
+  - **Actual:** What happened
+  - **Affected Modes:** human-readable / --pretty-json / all
+  - **Steps to Reproduce:**
+    1. Step 1
+    2. Step 2
+
+---
+```
+
+## Template: Help Validation
+
+```markdown
+---
+
+### `ably <command> --help`
+
+**Full Command:**
+\`\`\`bash
+pnpm cli <command> --help
+\`\`\`
+
+**Output:**
+\`\`\`
+[Paste actual --help output here]
+\`\`\`
+
+**Help Validation:**
+
+| Check | Result | Notes |
+|-------|--------|-------|
+| USAGE section present | PASS/FAIL | |
+| All flags documented | PASS/FAIL | List missing flags |
+| Flag descriptions accurate | PASS/FAIL | |
+| Flag aliases listed | PASS/FAIL | e.g., -D for --duration |
+| Required args marked | PASS/FAIL | |
+| Examples valid | PASS/FAIL | |
+| Description matches behavior | PASS/FAIL | |
+
+---
+```
+
+## Template: Topic-Level Help
+
+```markdown
+---
+
+### `ably <topic> --help`
+
+**Full Command:**
+\`\`\`bash
+pnpm cli <topic> --help
+\`\`\`
+
+**Output:**
+\`\`\`
+[Paste actual topic help output here]
+\`\`\`
+
+**Topic Help Validation:**
+
+| Check | Result | Notes |
+|-------|--------|-------|
+| Lists all subcommands | PASS/FAIL | Missing: list any missing |
+| No extra/removed subcommands | PASS/FAIL | Extra: list any extra |
+| Descriptions match actual behavior | PASS/FAIL | |
+| Topic description is 
clear | PASS/FAIL | |
+
+---
+```
+
+## Template: Subscribe-Publish Workflow
+
+```markdown
+---
+
+### Workflow: Subscribe + Publish — `<channel>`
+
+**Subscribe Command:**
+\`\`\`bash
+pnpm cli <group> subscribe <channel> --duration 15
+\`\`\`
+
+**Publish Commands:**
+\`\`\`bash
+pnpm cli <group> publish <channel> "Message 1"
+pnpm cli <group> publish <channel> "Message 2"
+pnpm cli <group> publish <channel> "Message 3"
+\`\`\`
+
+**Subscriber stdout:**
+\`\`\`
+[Paste subscriber stdout showing received messages]
+\`\`\`
+
+**Subscriber stderr:**
+\`\`\`
+[Paste subscriber stderr showing progress/listening messages]
+\`\`\`
+
+**Subscriber Exit Code:** N
+
+**Validation:**
+
+| Check | Result | Notes |
+|-------|--------|-------|
+| Subscriber connects (progress on stderr) | PASS/FAIL | |
+| Listening message on stderr (not stdout) | PASS/FAIL | |
+| No data received before publish | PASS/FAIL | |
+| All published messages received on stdout | PASS/FAIL | Count: N/N |
+| Messages received in order | PASS/FAIL | |
+| Output format correct for mode | PASS/FAIL | |
+| Timestamps present and valid | PASS/FAIL | |
+| Clean exit after duration (code 0) | PASS/FAIL | |
+
+**History Verification:**
+\`\`\`bash
+pnpm cli <group> history <channel> --limit 10
+\`\`\`
+
+**History stdout:**
+\`\`\`
+[Paste history output]
+\`\`\`
+
+**History Exit Code:** N
+
+| Check | Result | Notes |
+|-------|--------|-------|
+| Published messages in history | PASS/FAIL | Count: N/N |
+| Correct chronological order | PASS/FAIL | |
+| Message data intact | PASS/FAIL | |
+
+---
+```
+
+## Template: Error Path
+
+```markdown
+---
+
+### Error: `ably <command>` — [Error Scenario]
+
+**Full Command:**
+\`\`\`bash
+pnpm cli <command> [invalid args/flags]
+\`\`\`
+
+**Expected Behavior:**
+Clear error message indicating [what went wrong] with actionable guidance.
+
+**stdout:**
+\`\`\`
+[Paste stdout — should be empty or contain JSON error envelope in --pretty-json mode]
+\`\`\`
+
+**stderr:**
+\`\`\`
+[Paste stderr — error messages appear here]
+\`\`\`
+
+**Exit Code:** N (expected: non-zero)
+
+**Validation:**
+
+| Check | Result | Notes |
+|-------|--------|-------|
+| Error message clear | PASS/FAIL | |
+| Exits with non-zero code | PASS/FAIL | Exit code: N |
+| Error on stderr (not stdout) | PASS/FAIL | |
+| Actionable guidance provided | PASS/FAIL | |
+| No stack traces leaked | PASS/FAIL | |
+| JSON error envelope (if --pretty-json) | PASS/FAIL | type: "error", success: false |
+
+---
+```
+
+## Template: JSON Validation (for --pretty-json streaming)
+
+```markdown
+---
+
+### JSON Validation: `ably <command> --pretty-json`
+
+**Full Command:**
+\`\`\`bash
+pnpm cli <command> [args] --pretty-json --duration N
+\`\`\`
+
+**Raw stdout:**
+\`\`\`
+[Paste raw stdout — should be valid JSON]
+\`\`\`
+
+**Validation:**
+
+| Check | Result | Notes |
+|-------|--------|-------|
+| Valid JSON output | PASS/FAIL | |
+| No non-JSON text on stdout | PASS/FAIL | |
+| Correct envelope structure | PASS/FAIL | |
+| Consistent domain key | PASS/FAIL | |
+| Parseable by `jq` | PASS/FAIL | `stdout \| jq .` succeeds |
+
+---
+```
+
+## Template: Control API CRUD Workflow
+
+```markdown
+---
+
+### CRUD Workflow: `ably <resource>`
+
+**Create:**
+\`\`\`bash
+pnpm cli <resource> create [args] [flags]
+\`\`\`
+- Exit Code: N
+- stdout: [summary of output]
+
+**List (verify created):**
+\`\`\`bash
+pnpm cli <resource> list
+\`\`\`
+- Exit Code: N
+- Created resource appears: YES/NO
+
+**Get (if available):**
+\`\`\`bash
+pnpm cli <resource> get <id>
+\`\`\`
+- Exit Code: N
+- Fields match create response: YES/NO
+
+**Update (if available):**
+\`\`\`bash
+pnpm cli <resource> update <id> [flags]
+\`\`\`
+- Exit Code: N
+- Change reflected in subsequent get: YES/NO
+
+**Delete:**
+\`\`\`bash
+pnpm cli <resource> delete <id> --force
+\`\`\`
+- Exit Code: N
+- Resource gone from list: YES/NO
+
+**Validation:**
+
+| Check | Result | Notes | 
+|-------|--------|-------| +| Full CRUD lifecycle works | PASS/FAIL | | +| --force required for delete | PASS/FAIL | | +| JSON output correct at each step | PASS/FAIL | | +| Error on non-existent resource | PASS/FAIL | | + +--- +``` diff --git a/.claude/skills/ably-behavior-testing/references/testing-dimensions.md b/.claude/skills/ably-behavior-testing/references/testing-dimensions.md new file mode 100644 index 00000000..ad20ae46 --- /dev/null +++ b/.claude/skills/ably-behavior-testing/references/testing-dimensions.md @@ -0,0 +1,361 @@ +# Behavior Testing Dimensions + +This reference covers the specific testing dimensions that go beyond basic functional testing. These are derived from established CLI behavior testing best practices. + +--- + +## 1. Code Path Coverage + +Behavior testing means exercising all observable code paths through their external behavior. For each command, identify and test: + +### Conditional Branches in Flag Handling +- Commands with `--pretty-json`: verify human-readable output is suppressed on stdout +- Commands with `--verbose`: verify additional debug information appears on stderr +- Commands with optional flags: test with and without each optional flag +- Commands with mutually dependent flags: test combinations + +### Transport Path Selection +Some commands (e.g., `channels publish`) auto-select between REST and Realtime based on flags: +- Single publish: REST transport +- Publish with `--count > 1`: Realtime transport +- Subscribe commands: always Realtime + +### Encryption Paths +`--cipher-key` is available on `channels subscribe` and `channels history`: +- Without `--cipher-key`: standard message format +- With `--cipher-key`: messages are decrypted using the provided hex-encoded key +- Test: publish encrypted messages, then verify `subscribe --cipher-key` and `history --cipher-key` both decrypt correctly + +### Batch vs Single Operations +- `channels publish` with single message vs `channels batch-publish` +- `channels publish` 
with `--count 1` vs `--count 5`
+- Verify progress indicators for batch operations
+
+---
+
+## 2. State Machine Validation
+
+Long-running commands follow a state machine. Test each state transition:
+
+```
+INIT
+  |
+  v
+CONNECTING ──(error)──> ERROR ──> EXIT
+  |
+  v
+CONNECTED
+  |
+  v
+SUBSCRIBED ──(disconnect)──> RECONNECTING ──> CONNECTED
+  |
+  v
+RECEIVING
+  |
+  v (duration expires or SIGINT)
+CLEANUP
+  |
+  v
+EXIT
+```
+
+### What to Verify at Each State
+
+| State | Expected Output | Stream | Flags Affected |
+|-------|----------------|--------|----------------|
+| CONNECTING | Progress message: "Attaching to channel..." | stderr | `--verbose` shows details |
+| CONNECTED | (implicit, no separate message) | — | |
+| SUBSCRIBED | "Listening for messages." / "Press Ctrl+C to exit." | stderr | Suppressed in `--pretty-json` |
+| RECEIVING | Formatted message output | stdout | `--pretty-json` changes format |
+| CLEANUP | (silent) | — | `--verbose` may show cleanup |
+| EXIT | Clean exit, code 0 | — | |
+
+### Hold Commands (enter, set, acquire)
+- ENTER: Operation completes, state is held
+- HOLD: Status message emitted (especially JSON: `logJsonStatus("holding", ...)`)
+- EXIT: Leave/release/cleanup triggered, state removed
+- Verify: hold status appears in JSON mode (`type: "status"`)
+
+---
+
+## 3. Output Contract Verification
+
+### JSON Envelope Structure
+Every JSON output must follow this contract:
+
+```json
+{
+  "type": "result" | "event" | "status" | "error",
+  "command": "channels.subscribe",
+  "success": true,
+  "<domain-key>": { ... 
} +} +``` + +### Verification Checklist + +**One-shot commands** (publish, get, history, list): +- [ ] `type` is `"result"` +- [ ] `command` matches the actual command +- [ ] `success` is `true` for success, absent for events +- [ ] Domain data nested under singular key (single item) or plural key (collection) +- [ ] No raw data fields spread at envelope level +- [ ] `total` / `hasMore` metadata alongside domain key (not inside it) + +**Streaming commands** (subscribe): +- [ ] Output is valid JSON — verify with `jq . "$TEMP_DIR/stdout.txt"` (with `--pretty-json`, events are multi-line indented JSON, not single-line NDJSON) +- [ ] `type` is `"event"` for data events +- [ ] Domain data nested under singular key (e.g., `"message"`) +- [ ] Timestamps are present and in correct format + +**Hold commands** (enter, set, acquire): +- [ ] Initial result with `type: "result"` +- [ ] Followed by `type: "status"` with `"holding"` message +- [ ] Both are valid JSON lines + +**Error responses** (with `--pretty-json`): +- [ ] `type` is `"error"` +- [ ] `success` is `false` +- [ ] Error details include code and message +- [ ] No human-readable text mixed into stdout +- [ ] stderr may still contain human-readable error + +### Domain Key Naming + +| Command Type | Key | Example | +|-------------|-----|---------| +| Single message event | `message` | `{ "type": "event", "message": { ... } }` | +| Message history | `messages` | `{ "type": "result", "messages": [ ... ] }` | +| Presence event | `presence` | `{ "type": "event", "presence": { ... } }` | +| Presence get | `members` | `{ "type": "result", "members": [ ... ] }` | +| Channel list | `channels` | `{ "type": "result", "channels": [ ... ] }` | +| Room list | `rooms` | `{ "type": "result", "rooms": [ ... ] }` | +| Occupancy | `occupancy` | `{ "type": "result", "occupancy": { ... } }` | +| Lock event | `lock` | `{ "type": "event", "lock": { ... } }` | +| Cursor event | `cursor` | `{ "type": "event", "cursor": { ... 
} }` | +| Space members | `members` | `{ "type": "result", "members": [ ... ] }` | + +--- + +## 4. Stream Separation Testing + +This is a critical CLI testing dimension. All CLI tools must properly separate data from metadata. + +### Rules +- **stdout**: Data output only — human-readable records OR JSON payloads +- **stderr**: Progress messages, warnings, verbose output, error messages + +### How to Test + +All temp files go under `CLAUDE-BEHAVIOR-TESTING//temp/` (see SKILL.md Step 3). + +```bash +# Capture streams separately +pnpm cli channels list --pretty-json >"$TEMP_DIR/stdout.txt" 2>"$TEMP_DIR/stderr.txt" + +# Verify stdout is pure JSON +jq . "$TEMP_DIR/stdout.txt" >/dev/null 2>&1 +echo "JSON valid: $?" + +# Verify stderr has progress (if any) +cat "$TEMP_DIR/stderr.txt" + +# For streaming commands +pnpm cli channels subscribe test-ch --pretty-json --duration 5 >"$TEMP_DIR/stdout.txt" 2>"$TEMP_DIR/stderr.txt" +# Verify JSON validity +jq . "$TEMP_DIR/stdout.txt" >/dev/null 2>&1 && echo "Valid JSON" || echo "INVALID JSON" +``` + +### Common Violations to Check +- Progress messages ("Attaching to channel...") appearing on stdout in JSON mode +- "Listening for messages." appearing on stdout in JSON mode +- Warning messages on stdout instead of stderr +- Error messages on stdout instead of stderr +- ANSI color codes in piped/non-TTY output + +--- + +## 5. Configuration Resolution Testing + +Test the auth/config precedence chain: + +``` +CLI flags (highest priority) + | +Environment variables (ABLY_API_KEY, ABLY_TOKEN, ABLY_ACCESS_TOKEN) + | +Stored config (ably login) + | +Defaults (lowest priority) +``` + +### Test Scenarios +- Command with stored config (default): should work +- Command with `--app` flag: should override default app +- Command with invalid app: should produce clear error + +--- + +## 6. 
Pagination Testing + +For commands that return paginated results (history, list): + +### Scenarios + +| Scenario | How to Test | Verify | +|----------|------------|--------| +| Default limit | Run without `--limit` | Returns default number of items | +| Custom limit | `--limit 5` | Returns exactly 5 (or fewer if less exist) | +| Limit 1 | `--limit 1` | Returns exactly 1 item | +| Large limit | `--limit 1000` | Returns available items, no crash | +| Direction | `--direction forwards` vs `backwards` | Order changes | +| Time range | `--start "1h" --end "now"` | Only items in range | +| hasMore indicator | Check JSON output | `hasMore: true` when more pages exist | +| Pagination hint | Check JSON output | `hint` field when `hasMore` is true | + +### Pagination Log +When in non-JSON mode, verify: +- "Fetched N pages" message appears when multiple pages consumed +- Billable warning for history commands + +--- + +## 7. Human-Readable vs JSON Field Parity + +For every command, compare the fields shown in human-readable output vs JSON output: + +### Rules +- JSON output should contain **all** fields from human-readable output +- JSON output may contain **additional** fields not shown in human-readable output +- Human-readable output should **never** contain fields absent from JSON output +- Null/undefined fields should be **omitted** in both modes (not shown as "null") + +### Common Field Mismatches to Check +- Timestamps: human-readable may format differently (ISO string vs Unix ms) +- IDs: may be truncated in human-readable but full in JSON +- Metadata: may be flattened in human-readable but nested in JSON +- Empty arrays: should be omitted in human-readable, present as `[]` in JSON + +--- + +## 8. 
Error Path Testing Matrix + +| Error Category | Test Method | Expected Behavior | +|---------------|-------------|-------------------| +| Missing required arg | Omit channel/room name | Clear error on stderr, non-zero exit code | +| Invalid flag value | `--limit -1` or `--limit abc` | Validation error with guidance | +| Unknown flag | `--nonexistent-flag` | "Unknown flag" error, possibly with suggestion | +| Auth failure | (if testable) Invalid API key | "Authentication failed" with hint | +| Network error | (if reproducible) Invalid host | Connection error with retry guidance | +| Not found | Nonexistent channel in history | Empty result or appropriate message | +| Permission denied | (if testable) Restricted key | Permission error with hint | +| JSON error envelope | Any error with `--pretty-json` | `type: "error"`, `success: false`, on stdout | + +--- + +## 9. Exit Code Testing + +Every command must be tested for correct exit codes: + +| Scenario | Expected Exit Code | +|----------|-------------------| +| Successful operation | 0 | +| Missing required argument | Non-zero (typically 2) | +| Unknown flag | Non-zero (typically 2) | +| Auth failure | Non-zero | +| Resource not found | Non-zero | +| Clean exit after `--duration` | 0 | +| Network error | Non-zero | + +### How to Test +```bash +pnpm cli channels list --pretty-json >/dev/null 2>&1; echo "Exit: $?" +pnpm cli channels publish 2>&1; echo "Exit: $?" # missing args +pnpm cli channels list --bogus-flag 2>&1; echo "Exit: $?" +``` + +--- + +## 10. 
Signal and Lifecycle Testing + +### Duration Flag +- `--duration 5`: command exits after ~5 seconds +- `--duration 0.5`: command exits quickly (useful for testing) +- No `--duration`: command runs until interrupted + +### Clean Shutdown +When a long-running command exits (via duration or interrupt): +- No error messages on clean exit +- Exit code is 0 +- Resources cleaned up (no leaked connections in verbose output) +- Presence left (if entered) +- Any held state released + +--- + +## 11. Cross-Command Consistency + +Verify consistent behavior patterns across similar commands: + +### Naming Consistency +- Channels use "publish" / Rooms use "send" +- Both use "subscribe" for listening +- Both use "history" for past events +- Both use "presence" subgroup +- Spaces use "enter"/"set"/"acquire" for hold commands + +### Output Format Consistency +- Same field formatting (timestamps, IDs, labels) across all commands +- Same progress message patterns ("Attaching to...", "Listening for...") +- Same error message patterns +- Same JSON envelope structure + +### Flag Consistency +- `--limit` behaves the same across history, list commands +- `--duration` behaves the same across all subscribe commands +- `--pretty-json` produces correct envelope structure everywhere +- `--verbose` produces same level of detail everywhere +- `--client-id` is available on all subscribe/presence/publish commands + +--- + +## 12. 
Control API Testing Patterns + +Control API commands have different testing patterns from Product API commands: + +### CRUD Workflow Testing +For resource-managing commands (apps, keys, queues, integrations, rules): + +``` +CREATE → verify in LIST → GET details → UPDATE → verify change in GET → DELETE → verify gone from LIST +``` + +### What to Verify +- Create returns the created resource with all fields +- List includes the created resource +- Get returns full details matching create response +- Update changes only specified fields +- Delete removes the resource (may require `--force`) +- Proper error when operating on non-existent resource + +### Destructive Operation Safety +- Delete commands should require `--force` or prompt for confirmation +- Error message when `--force` not provided should explain how to proceed +- `--force` skips confirmation and deletes immediately + +### Auth Differences +- Control API uses `ABLY_ACCESS_TOKEN` (not `ABLY_API_KEY`) +- Missing token produces different error than missing API key +- Some commands require account-level access, others app-level + +--- + +## 13. Idempotency Testing + +For one-shot commands, verify that running the same command twice produces consistent results: + +- `list` commands return same data on repeated calls +- `get` commands return same data on repeated calls +- `history` commands return same data (assuming no new messages) +- `publish`/`send` commands each create a new message (not idempotent — this is correct) +- Error conditions produce consistent error messages diff --git a/.gitignore b/.gitignore index 54a0ea42..d0ed819e 100644 --- a/.gitignore +++ b/.gitignore @@ -58,3 +58,6 @@ test/e2e/web-cli/*.log # Environment files with secrets .env + +# Behavior testing reports +CLAUDE-BEHAVIOR-TESTING/