writing-reference-skills
Use when creating skills that document APIs, libraries, CLIs, or other external tools with version-sensitive facts like endpoints, pricing, quotas, or auth flows. Also use when an existing reference skill has outdated information.
| Model | Source | Category |
|---|---|---|
| opus | core | Meta |
Context: fork
Full Reference
Writing Reference Skills
Overview
Reference skills document external tools (APIs, libraries, CLIs) where facts change over time. The critical difference from technique/pattern skills: training data alone produces plausible-looking but wrong information. Research is mandatory.
Mandatory Announcement — FIRST OUTPUT before anything else:

    ┏━ 🛡 writing-reference-skills ━━━━━━━━━━━━━━━━━━┓
    ┃ [one-line description of what API/tool/skill] ┃
    ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

No exceptions. Box frame first, then work.
REQUIRED BACKGROUND: You MUST understand writing-skills for the TDD cycle and SKILL.md structure. This skill adds the research and accuracy layer specific to reference documentation.
When to Use
- Creating a skill for an API, SDK, CLI tool, or library
- Updating an existing reference skill with new versions/pricing/deprecations
- Any skill where facts (versions, endpoints, quotas, pricing) can become outdated
Don’t use for: Technique skills, pattern skills, or discipline-enforcing skills — those don’t need external research.
The Core Problem
Baseline testing revealed: agents asked to write reference skills skip web research entirely and rely on training data. They acknowledge uncertainty (“this version might be outdated”) but don’t verify. Result: plausible-looking skills with wrong API versions, outdated pricing tiers, incorrect rate limits, and missing deprecations.
Evidence from our Google API skills: 4 of 7 skills needed REFACTOR fixes despite the agent being “confident” in the content. The errors included wrong pricing tier names (3 tiers vs. the actual 5), outdated upload quotas (1,600 vs. the actual ~100 units), missing service name distinctions, and wrong rate limits.
Two-File Structure

    skill-name/
      SKILL.md      # Concise entry point (<100 lines, <500 words)
      reference.md  # Comprehensive reference (400-800 lines)

SKILL.md Template (Reference Type)

    ---
    name: api-name
    description: Use when [specific triggering conditions with API/tool name]
    ---

    # API Name

    ## Overview
    One sentence: what it does + current version/date.

    ## Quick Reference
    | Item     | Value |
    |----------|-------|
    | Base URL | ... |
    | Auth     | ... |
    | Python   | `pip install ...` |
    | Node.js  | `npm install ...` |

    ## Authentication
    Minimal setup code (one language, 3-5 lines).

    ## Common Operations
    2-3 most frequent operations with code snippets.

    ## Rate Limits / Quotas
    Table of key limits.

    ## Common Mistakes
    | Mistake | Fix |
    |---------|-----|

    ## Full Reference
    See `reference.md` in this skill directory for [list topics covered].

reference.md Structure
- Table of contents at top
- Organized by feature area (not CRUD operations)
- Code examples in 1-2 languages max (pick the API’s primary ecosystem)
- Error codes with actionable fixes
- Appendices for lookup tables (currency codes, error codes, etc.)
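The SKILL.md size budget from the two-file structure (under 100 lines, under 500 words) is easy to check mechanically. The following is a minimal sketch, not part of the skill itself; the function name `check_skill_md` and its message wording are hypothetical, and word counting is assumed to mean whitespace-separated tokens.

```python
def check_skill_md(text: str, max_lines: int = 100, max_words: int = 500) -> list[str]:
    """Return human-readable problems if SKILL.md exceeds its size budget.

    The budget is "<100 lines, <500 words", so hitting the limit exactly
    already counts as a violation.
    """
    problems: list[str] = []
    line_count = len(text.splitlines())
    word_count = len(text.split())  # assumption: words are whitespace-separated tokens
    if line_count >= max_lines:
        problems.append(f"SKILL.md has {line_count} lines; budget is under {max_lines}")
    if word_count >= max_words:
        problems.append(f"SKILL.md has {word_count} words; budget is under {max_words}")
    return problems


if __name__ == "__main__":
    sample = "# API Name\n\n## Overview\nOne sentence summary.\n"
    print(check_skill_md(sample))  # an empty list means the file is within budget
```

A non-empty result suggests moving detail out of SKILL.md and into the reference file.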
Research Phase (Mandatory)

    digraph research {
      rankdir=TB;
      "Start" [shape=oval];
      "Search current API state" [shape=box];
      "Verify version/date" [shape=box];
      "Check pricing/quotas" [shape=box];
      "Check deprecations" [shape=box];
      "Write skill" [shape=box];
      "Any uncertain facts?" [shape=diamond];
      "Search to verify" [shape=box];
      "Done" [shape=oval];

      "Start" -> "Search current API state";
      "Search current API state" -> "Verify version/date";
      "Verify version/date" -> "Check pricing/quotas";
      "Check pricing/quotas" -> "Check deprecations";
      "Check deprecations" -> "Write skill";
      "Write skill" -> "Any uncertain facts?";
      "Any uncertain facts?" -> "Search to verify" [label="yes"];
      "Any uncertain facts?" -> "Done" [label="no"];
      "Search to verify" -> "Any uncertain facts?";
    }

What to Research (Checklist)
Before writing ANY reference skill, search for:
- Current version — API version number, SDK version, release date
- Authentication — Has the auth method changed? New scopes? Deprecated flows?
- Pricing/quotas — Current tier names, free thresholds, rate limits, daily caps
- Deprecations — What was removed or sunset recently? Migration deadlines?
- New features — Major additions in the last 6-12 months
- Breaking changes — Renamed endpoints, changed parameters, new requirements
How to Research
Use WebSearch with targeted queries:

    "[API name] API changelog 2025 2026"
    "[API name] pricing changes"
    "[API name] deprecated endpoints"
    "[API name] rate limits quota"
    "[API name] latest version"

Use WebFetch on official docs pages for specific facts. Prioritize official documentation over blog posts.
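The query patterns above can be generated for any API name. This is a small sketch under stated assumptions: `research_queries` is a hypothetical helper that only builds the query strings, it does not call WebSearch, and the `years` default simply mirrors the years in the example query.

```python
def research_queries(api_name: str, years: tuple[int, ...] = (2025, 2026)) -> list[str]:
    """Build the five targeted search queries for one API or tool name."""
    year_text = " ".join(str(year) for year in years)
    return [
        f"{api_name} API changelog {year_text}",
        f"{api_name} pricing changes",
        f"{api_name} deprecated endpoints",
        f"{api_name} rate limits quota",
        f"{api_name} latest version",
    ]


if __name__ == "__main__":
    # "Example" is a placeholder API name, not a real service.
    for query in research_queries("Example"):
        print(query)
```

Passing the years explicitly keeps the changelog query current instead of hard-coding them.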
The “Flag and Verify” Rule
Never write “this might be outdated” in a skill. If you’re uncertain about a fact:
- Search for the current value
- If you find it, use the verified value
- If you can’t find it, note the date of your best information: (as of [date])
- Never leave hedging language (“probably”, “I think”, “might be”) in the final skill
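A final pass over the skill text can catch leftover hedging before it ships. The sketch below is illustrative only: `find_hedging` is a hypothetical helper, and the phrase list contains just the three examples named in the rule, so a real check would likely need a longer list.

```python
import re

# Only the phrases named in the rule; extend as needed.
HEDGE_PHRASES = ("probably", "i think", "might be")


def find_hedging(text: str) -> list[tuple[int, str]]:
    """Return (line_number, phrase) for every hedging phrase found, 1-indexed."""
    hits: list[tuple[int, str]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for phrase in HEDGE_PHRASES:
            # Word boundaries avoid matching inside longer words.
            if re.search(rf"\b{re.escape(phrase)}\b", lowered):
                hits.append((line_no, phrase))
    return hits


if __name__ == "__main__":
    draft = (
        "Rate limit is probably 100 requests per minute.\n"
        "Free tier: 1,000 calls per day (as of 2025-01-01)\n"
        "This might be outdated.\n"
    )
    print(find_hedging(draft))  # [(1, 'probably'), (3, 'might be')]
```

Each hit marks a fact to search for and verify, or to replace with a dated statement.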
Source Quality Gates
Before trusting any source found during research, apply these 4 gatekeeper gates. A source must pass at least 3 of 4 gates to be used as reference material.
The Four Gates
| Gate | Question | Pass | Fail |
|---|---|---|---|
| G1: Mechanism Specificity | Does it define a specific pattern, technique, or API? | Concrete implementation details | Vague advice or opinion |
| G2: Implementable Artifacts | Does it contain code, schemas, templates, or diagrams? | Actionable artifacts | Theory without examples |
| G3: Beyond Basics | Does it cover advanced patterns, not just “getting started”? | Edge cases, gotchas, production patterns | Tutorial-level intro |
| G4: Source Verifiability | Is the author/org a demonstrated technical authority? | Official docs, core team, recognized expert | Anonymous blog, no credentials |
Gate Application

    For each source:
      gates_passed = count(G1, G2, G3, G4 that pass)
      if gates_passed >= 3: TRUST — use as reference material
      if gates_passed == 2: VERIFY — cross-reference with a trusted source
      if gates_passed <= 1: REJECT — do not use

Special Cases
- Official documentation (docs.*.com, *.dev) always passes G4 automatically
- GitHub README from the project repo passes G4 automatically
- Stack Overflow answers with 10+ upvotes pass G2 but need G4 verification
- AI-generated content (ChatGPT answers, Copilot suggestions) NEVER passes G4 — always verify against official docs
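The gate-counting rule can be written as a short function. This is a sketch rather than a prescribed implementation: the `GateResult` type and `classify_source` name are hypothetical, and deciding whether each gate passes (including the special cases above) is assumed to happen before this function is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """Pass/fail outcome of the four gates for one source."""
    mechanism_specificity: bool    # G1
    implementable_artifacts: bool  # G2
    beyond_basics: bool            # G3
    source_verifiability: bool     # G4


def classify_source(gates: GateResult) -> str:
    """Map the number of passed gates to TRUST, VERIFY, or REJECT."""
    gates_passed = sum(
        [
            gates.mechanism_specificity,
            gates.implementable_artifacts,
            gates.beyond_basics,
            gates.source_verifiability,
        ]
    )
    if gates_passed >= 3:
        return "TRUST"   # use as reference material
    if gates_passed == 2:
        return "VERIFY"  # cross-reference with a trusted source
    return "REJECT"      # do not use


if __name__ == "__main__":
    # A detailed answer with code, but from an unverified author and tutorial-level.
    print(classify_source(GateResult(True, True, False, False)))  # VERIFY
```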
RED-GREEN Testing for Reference Skills
Reference skills test differently from discipline skills. You’re testing accuracy, not compliance.
RED Phase (Baseline)
Ask a subagent API-specific questions WITHOUT the skill. Document:
- What facts they get wrong (versions, pricing, endpoints)
- What they’re uncertain about
- What they omit entirely
GREEN Phase (With Skill)
Same questions WITH the skill loaded. The skill should:
- Correct the wrong facts
- Fill knowledge gaps
- Provide working code examples
What to Test (4 Questions Minimum)
- Authentication/setup — How to get started
- Common operation — Most frequent API call with code
- Gotcha/pricing/limits — The thing people get wrong
- Recent change — Something that changed in the last year
Success Criteria
Rate each answer: does the skill provide correct, current, actionable information that the agent couldn’t produce from training data alone? If the skill doesn’t add value over baseline on at least 2 of 4 questions, it needs more research.
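The success criterion can be tallied from the RED and GREEN results. The sketch below makes one interpretive assumption: a question counts as "adding value" when the answer with the skill is correct and the baseline answer was not. The `QuestionResult` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionResult:
    topic: str
    baseline_correct: bool    # RED phase: answered without the skill
    with_skill_correct: bool  # GREEN phase: answered with the skill loaded


def adds_value(result: QuestionResult) -> bool:
    """Assumption: value means correct with the skill, wrong without it."""
    return result.with_skill_correct and not result.baseline_correct


def needs_more_research(results: list[QuestionResult], minimum: int = 2) -> bool:
    """True when the skill beats baseline on fewer than `minimum` questions."""
    if len(results) < 4:
        raise ValueError("test at least 4 questions")
    return sum(adds_value(r) for r in results) < minimum


if __name__ == "__main__":
    results = [
        QuestionResult("authentication/setup", baseline_correct=True, with_skill_correct=True),
        QuestionResult("common operation", baseline_correct=True, with_skill_correct=True),
        QuestionResult("gotcha/pricing/limits", baseline_correct=False, with_skill_correct=True),
        QuestionResult("recent change", baseline_correct=False, with_skill_correct=True),
    ]
    print(needs_more_research(results))  # False: the skill adds value on 2 of 4
```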
Living Reference Architecture
Reference skills with substantial documentation MUST use a reference/ directory instead of a monolithic reference.md. This keeps each category scannable, searchable, and independently updatable.
Directory Structure

    skill-name/
      SKILL.md               # Router — routes to reference/ files (use skill-router-template.md)
      reference/
        authentication.md    # Named after user tasks, not internal architecture
        sending-messages.md
        webhooks.md
        error-handling.md
        pricing-limits.md

Category file names MUST reflect what the developer is trying to do (“sending-messages”), not internal structure (“message-resource”). See templates/reference-doc-guidelines.md for full naming conventions.
SKILL.md Router Template
When using reference/ structure, SKILL.md becomes a router. Use the template at templates/skill-router-template.md:

    ## Where to Go

    | I want to...           | File                            |
    |------------------------|---------------------------------|
    | Authenticate / set up  | `reference/authentication.md`   |
    | Send a message         | `reference/sending-messages.md` |
    | Handle webhooks        | `reference/webhooks.md`         |
    | Understand errors      | `reference/error-handling.md`   |
    | Check pricing / limits | `reference/pricing-limits.md`   |

Inline Gotchas — Not Separate Sections
Place gotchas inline next to the code they apply to, tagged with severity. Never put them in a separate “Common Mistakes” section buried at the bottom.
```bash
# Send message
curl -X POST https://api.example.com/messages \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"to": "+15551234567", "body": "Hello"}'
```

Severity tags: `◆ CRITICAL` (silent data loss, auth failure) · `⚠ WARNING` (common mistake) · `ℹ INFO` (non-obvious behavior)

### When to Use reference/ vs reference.md
| Condition | Structure |
|-----------|-----------|
| Reference content > 300 lines | `reference/` directory |
| 3+ distinct topic areas | `reference/` directory |
| Single focused API (e.g., one endpoint) | `reference.md` still fine |
| Existing skill with reference.md | Migrate when updating, not proactively |
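
The conditions in this table can be read as a simple decision function. This is one possible reading, offered as a sketch: `choose_reference_structure` and its parameters are hypothetical, and the last table row is interpreted as "leave an existing reference.md alone unless the skill is already being updated".

```python
def choose_reference_structure(
    line_count: int,
    topic_count: int,
    has_reference_md: bool = False,
    is_being_updated: bool = True,
) -> str:
    """Pick "reference/" or "reference.md" for a skill's reference content."""
    # Existing skills migrate only when they are being updated anyway.
    if has_reference_md and not is_being_updated:
        return "reference.md"
    # Large or multi-topic references get one file per user task.
    if line_count > 300 or topic_count >= 3:
        return "reference/"
    # A single focused API (for example, one endpoint) fits in one file.
    return "reference.md"


if __name__ == "__main__":
    print(choose_reference_structure(line_count=650, topic_count=5))  # reference/
    print(choose_reference_structure(line_count=120, topic_count=1))  # reference.md
```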
---
## Common Mistakes When Writing Reference Skills
| Mistake | Fix |
|---------|-----|
| Skipping research ("I know this API") | You know the API as of your training cutoff. Search for changes. |
| SKILL.md too long (>100 lines) | Move details to reference/ or reference.md. SKILL.md is the index card. |
| reference.md too short (<300 lines) | Not enough detail. Cover auth, operations, errors, limits, examples. |
| Wrong pricing/quota numbers | Always verify with a web search. These change frequently. |
| Hedging language ("probably", "might be") | Search and verify, or note the date of your information. |
| Examples in 3+ languages | Pick 1-2 max. Match the API's primary ecosystem. |
| No error codes section | Developers search for error codes. Include the common ones with fixes. |
| Organized by CRUD instead of features | Developers think "I want to send an SMS" not "I want to create a resource." |
| Gotchas in a separate section | Place inline next to the code they apply to with severity tags. |
| Monolithic reference.md > 300 lines | Split into reference/ category files named after user tasks. |
</details>