content-qa

Self-assessment quality gate for generated content — LLM-as-Judge scoring across brand voice, factual accuracy, SEO compliance, originality, engagement potential, and AI citation readiness. Also runs SEO and AI-visibility optimization checks. Use before publishing any generated content.

Model: sonnet
Pack: content-pumper

┏━ 📋 content-qa ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ LLM-as-Judge quality gate — score before ship  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

Quality gate for the content-pumper pipeline. Scores generated content across 6 dimensions using LLM-as-Judge, runs SEO and AI optimization checks, and returns a verdict: approve / review / reject. Nothing ships without passing this gate.


| Dimension | 0–40 (fail) | 40–60 (poor) | 60–80 (acceptable) | 80–100 (excellent) |
|---|---|---|---|---|
| Brand voice alignment | Wrong tone, off-brand vocab, personality mismatch with brand.json | Some alignment, occasional drift | Mostly on-brand, minor tone inconsistencies | Perfect match — tone, vocabulary, personality indistinguishable from brand guidelines |
| Factual accuracy / citation quality | Claims unsupported or fabricated, no citations | Some citations but missing key claims or sources unverifiable | Most claims cited, 1–2 gaps | Every claim cited, sources verified, no fabrication, primary sources preferred |
| SEO compliance | Missing title tag, meta desc, H1; no keyword use; no schema | Has title/meta/H1 but keyword density off, no schema | Solid structure, keyword present, schema missing or partial | Title ≤ 60 chars, meta ≤ 160 chars with keyword, H1–H3 hierarchy clean, density 1–2%, schema valid, internal links present |
| Originality / unique angle | Rehash of top-ranking articles, no differentiation | Slightly different framing but no new insight | Unique angle present, minor overlap with competitor content | Clear POV not found in competitor research, new data or framing, differentiator explicit in intro |
| Engagement potential | Weak hook, poor readability, no CTA, no emotional triggers | Mediocre hook, readability issues, CTA buried | Good hook, readable, CTA present but mild | Strong hook in first 100 words, Flesch ≥ 60, CTA clear and action-oriented, emotional trigger present |
| AI citation readiness | No structured data, no FAQ, answers buried or absent | Partial FAQ or sparse answers | FAQ present, answers mostly direct | Schema markup complete, FAQ with concise direct answers, clear headings per question, optimized for AI snippet extraction |
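The hard limits in the SEO compliance row (title ≤ 60 chars, meta ≤ 160 chars with keyword, density 1–2%) can be checked mechanically before invoking the judge. A minimal sketch — the field names and the single-word keyword handling are assumptions, not part of the pipeline:

```python
import re

def seo_limit_checks(title: str, meta: str, body: str, keyword: str) -> dict:
    """Check the mechanical SEO thresholds from the rubric.

    Assumes a single-word target keyword; phrase keywords would need
    n-gram matching instead of the simple token count below.
    """
    words = re.findall(r"[\w'-]+", body.lower())
    hits = sum(1 for w in words if w == keyword.lower())
    density = hits / len(words) if words else 0.0
    return {
        "title_ok": len(title) <= 60,
        "meta_ok": len(meta) <= 160 and keyword.lower() in meta.lower(),
        "density_ok": 0.01 <= density <= 0.02,
    }
```

Anything the LLM judge scores (hierarchy quality, internal-link relevance) stays with the judge; only the numeric limits belong in code like this.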

Each dimension scores 0–100. Composite = average of all 6.


Before invoking LLM-as-Judge, run these checks and apply fixes automatically:

| Check | Fix if missing |
|---|---|
| Schema markup (application/ld+json) | Inject Article or FAQPage schema based on content type |
| FAQ section | Add FAQ block from research Q&A pairs if content supports it |
| Meta description includes target keyword | Rewrite meta desc to lead with keyword |
| H1–H3 hierarchy follows logical structure | Flatten or promote headings to enforce hierarchy |
| Concise answers under each heading (≤ 60 words for AI snippets) | Prepend a 1–2 sentence direct answer at the top of each H2 section |
| Internal links placeholder (at least 2) | Flag for human — do not fabricate URLs |

Log every change made during the optimization pass and include it in the score card output.
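The detect-and-fix pass with its change log could be sketched as follows. The string-matching and the schema payload are simplified assumptions; a real pass would parse the DOM rather than probe the raw HTML:

```python
import json

def optimization_pass(html: str, content_type: str = "Article") -> tuple[str, list[str]]:
    """Apply automatic fixes and return (html, change_log)."""
    changes = []
    if 'application/ld+json' not in html:
        schema = json.dumps({"@context": "https://schema.org", "@type": content_type})
        html += f'\n<script type="application/ld+json">{schema}</script>'
        changes.append(f"Injected {content_type} schema")
    if html.lower().count("<a href") < 2:
        # Never fabricate URLs — this check only flags, it does not fix.
        changes.append("Internal links — flagged for human (no URLs to inject)")
    return html, changes
```

Running the pass twice is safe: once the schema block exists, the schema fix is a no-op, so only genuinely missing pieces are logged.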


  1. Run optimization pass — apply fixes, log changes
  2. Invoke llm-evaluation — pass content + rubric for each dimension; request numeric score + 1–2 sentence justification per dimension
  3. Invoke seo-pulse — run on the generated file path; extract title, meta, H1, keyword density, schema validation results
  4. Invoke brand-compliance — pass content + brand.json; check tone, vocabulary, prohibited phrases
  5. Invoke ai-visibility — assess FAQ structure, schema completeness, snippet-readiness
  6. Compile score card — aggregate all scores, generate per-dimension feedback, apply threshold logic

| Result | Condition | Action |
|---|---|---|
| Approve | All 6 dimensions ≥ 80 | Auto-approve — return verdict: approve, pass to delivery skill |
| Review | Any dimension 60–79, none below 60 | Queue for human review — include specific suggestions per flagged dimension |
| Reject | Any dimension < 60 | Reject — include detailed feedback per failing dimension, suggest targeted rewrites |
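The threshold logic is purely mechanical. A sketch, taking the six per-dimension scores as a dict:

```python
def verdict(scores: dict[str, int]) -> tuple[str, float]:
    """Map per-dimension scores to approve / review / reject, plus composite."""
    composite = round(sum(scores.values()) / len(scores), 1)
    if any(s < 60 for s in scores.values()):
        return "reject", composite
    if all(s >= 80 for s in scores.values()):
        return "approve", composite
    return "review", composite
```

Fed the example score card later in this document (85, 78, 90, 82, 61, 88), this returns a review verdict with composite 80.7: two dimensions sit in the 60–79 band, none below 60.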

When content-pumper-pimp receives a reject verdict:

  1. Pass full score card + per-dimension feedback to content-writer
  2. Re-invoke content-writer with the feedback as a revision prompt
  3. Re-run content-qa on the revised output
  4. Max 2 iterations — if still failing after 2 revision cycles, escalate to human with both score cards and the revision history

Track iteration count in the score card (iteration: 1 | 2 | escalated).
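The revision cycle can be sketched as below; content_writer and content_qa stand in for the real skill invocations, and the score-card dict shape is an assumption:

```python
def revision_loop(draft, first_card, content_writer, content_qa, max_iterations=2):
    """Revise a rejected draft up to max_iterations times, then escalate."""
    cards = [first_card]  # the initial reject that triggered the loop
    for iteration in range(1, max_iterations + 1):
        draft = content_writer(draft, feedback=cards[-1]["feedback"])
        card = content_qa(draft)
        card["iteration"] = iteration
        cards.append(card)
        if card["verdict"] != "reject":
            return draft, cards
    cards[-1]["iteration"] = "escalated"  # still failing: human gets full history
    return draft, cards
```

Returning every score card, not just the last one, is what lets the human reviewer see whether revisions are converging or thrashing.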


┌─ content-qa score card ───────────────────────────┐
│ File: <path>                                      │
│ Topic: <title>                                    │
│ Iteration: <1 | 2 | escalated>                    │
└───────────────────────────────────────────────────┘
Dimension                    Score   Verdict
────────────────────────────────────────────────────────
Brand voice alignment          85    ✓
Factual accuracy               78    ◐ needs 2 more citations
SEO compliance                 90    ✓
Originality / unique angle     82    ✓
Engagement potential           61    ◐ hook too generic, CTA weak
AI citation readiness          88    ✓
────────────────────────────────────────────────────────
Composite: 80.7 → verdict: review
Optimization pass:
✓ Injected FAQPage schema
✓ Rewrote meta desc to lead with keyword
○ Internal links — flagged for human (no URLs to inject)
Suggested fixes:
▸ Factual accuracy: cite source for claim on line 47 re: market share figure
▸ Engagement: replace hook with a stat or provocative question; move CTA above fold

| Skill | Role |
|---|---|
| content-pumper-pimp | Orchestrator — invokes content-qa after content-writer completes |
| content-writer | Receives rejection feedback for revision cycles |
| llm-evaluation | LLM-as-Judge scoring per dimension |
| seo-pulse | SEO signal extraction and validation |
| brand-compliance | Brand voice and vocabulary check against brand.json |
| ai-visibility | FAQ and schema assessment for AI snippet readiness |