content-qa

Self-assessment quality gate for generated content — LLM-as-Judge scoring across brand voice, factual accuracy, SEO compliance, originality, engagement potential, and AI citation readiness. Also runs SEO and AI-visibility optimization checks. Use before publishing any generated content.

Model: sonnet
Pack: content-pumper

┏━ 📋 content-qa ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ LLM-as-Judge quality gate — score before ship  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

Quality gate for the content-pumper pipeline. Scores generated content across 6 dimensions using LLM-as-Judge, runs SEO and AI optimization checks, and returns a verdict: approve / review / reject. Nothing ships without passing this gate.


| Dimension | 0–40 (fail) | 40–60 (poor) | 60–80 (acceptable) | 80–100 (excellent) |
|---|---|---|---|---|
| Brand voice alignment | Wrong tone, off-brand vocab, personality mismatch with brand.json | Some alignment, occasional drift | Mostly on-brand, minor tone inconsistencies | Perfect match — tone, vocabulary, personality indistinguishable from brand guidelines |
| Factual accuracy / citation quality | Claims unsupported or fabricated, no citations | Some citations but missing key claims or sources unverifiable | Most claims cited, 1–2 gaps | Every claim cited, sources verified, no fabrication, primary sources preferred |
| SEO compliance | Missing title tag, meta desc, H1; no keyword use; no schema | Has title/meta/H1 but keyword density off, no schema | Solid structure, keyword present, schema missing or partial | Title ≤ 60 chars, meta ≤ 160 chars with keyword, H1–H3 hierarchy clean, density 1–2%, schema valid, internal links present |
| Originality / unique angle | Rehash of top-ranking articles, no differentiation | Slightly different framing but no new insight | Unique angle present, minor overlap with competitor content | Clear POV not found in competitor research, new data or framing, differentiator explicit in intro |
| Engagement potential | Weak hook, poor readability, no CTA, no emotional triggers | Mediocre hook, readability issues, CTA buried | Good hook, readable, CTA present but mild | Strong hook in first 100 words, Flesch ≥ 60, CTA clear and action-oriented, emotional trigger present |
| AI citation readiness | No structured data, no FAQ, answers buried or absent | Partial FAQ or sparse answers | FAQ present, answers mostly direct | Schema markup complete, FAQ with concise direct answers, clear headings per question, optimized for AI snippet extraction |
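The hard limits in the SEO compliance row (title ≤ 60 chars, meta ≤ 160 chars with keyword, density 1–2%) can be checked mechanically before invoking the judge. A minimal sketch — the field names and the single-word keyword handling are assumptions, not part of the pipeline:

```python
import re

def seo_limit_checks(title: str, meta: str, body: str, keyword: str) -> dict:
    """Check the mechanical SEO thresholds from the rubric.

    Assumes a single-word target keyword; phrase keywords would need
    n-gram matching instead of the simple token count below.
    """
    words = re.findall(r"[\w'-]+", body.lower())
    hits = sum(1 for w in words if w == keyword.lower())
    density = hits / len(words) if words else 0.0
    return {
        "title_ok": len(title) <= 60,
        "meta_ok": len(meta) <= 160 and keyword.lower() in meta.lower(),
        "density_ok": 0.01 <= density <= 0.02,
    }
```

Anything the LLM judge scores (hierarchy quality, internal-link relevance) stays with the judge; only the numeric limits belong in code like this.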

Each dimension scores 0–100. Composite = average of all 6.


Before invoking LLM-as-Judge, run these checks and apply fixes automatically:

| Check | Fix if missing |
|---|---|
| Schema markup (application/ld+json) | Inject Article or FAQPage schema based on content type |
| FAQ section | Add FAQ block from research Q&A pairs if content supports it |
| Meta description includes target keyword | Rewrite meta desc to lead with keyword |
| H1–H3 hierarchy follows logical structure | Flatten or promote headings to enforce hierarchy |
| Concise answers under each heading (≤ 60 words for AI snippets) | Prepend a 1–2 sentence direct answer at the top of each H2 section |
| Internal links placeholder (at least 2) | Flag for human — do not fabricate URLs |

Log every change made during the optimization pass and include it in the score card output.
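The detect-and-fix pass with its change log could be sketched as follows. The string-matching and the schema payload are simplified assumptions; a real pass would parse the DOM rather than probe the raw HTML:

```python
import json

def optimization_pass(html: str, content_type: str = "Article") -> tuple[str, list[str]]:
    """Apply automatic fixes and return (html, change_log)."""
    changes = []
    if 'application/ld+json' not in html:
        schema = json.dumps({"@context": "https://schema.org", "@type": content_type})
        html += f'\n<script type="application/ld+json">{schema}</script>'
        changes.append(f"Injected {content_type} schema")
    if html.lower().count("<a href") < 2:
        # Never fabricate URLs — this check only flags, it does not fix.
        changes.append("Internal links — flagged for human (no URLs to inject)")
    return html, changes
```

Running the pass twice is safe: once the schema block exists, the schema fix is a no-op, so only genuinely missing pieces are logged.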


  1. Run optimization pass — apply fixes, log changes
  2. Invoke llm-evaluation — pass content + rubric for each dimension; request numeric score + 1–2 sentence justification per dimension
  3. Invoke seo-pulse — run on the generated file path; extract title, meta, H1, keyword density, schema validation results
  4. Invoke brand-compliance — pass content + brand.json; check tone, vocabulary, prohibited phrases
  5. Invoke ai-visibility — assess FAQ structure, schema completeness, snippet-readiness
  6. Compile score card — aggregate all scores, generate per-dimension feedback, apply threshold logic

| Result | Condition | Action |
|---|---|---|
| Approve | All 6 dimensions ≥ 80 | Auto-approve — return verdict: approve, pass to delivery skill |
| Review | Any dimension 60–79, none below 60 | Queue for human review — include specific suggestions per flagged dimension |
| Reject | Any dimension < 60 | Reject — include detailed feedback per failing dimension, suggest targeted rewrites |
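The threshold logic is purely mechanical. A sketch, taking the six per-dimension scores as a dict:

```python
def verdict(scores: dict[str, int]) -> tuple[str, float]:
    """Map per-dimension scores to approve / review / reject, plus composite."""
    composite = round(sum(scores.values()) / len(scores), 1)
    if any(s < 60 for s in scores.values()):
        return "reject", composite
    if all(s >= 80 for s in scores.values()):
        return "approve", composite
    return "review", composite
```

Fed the example score card later in this document (85, 78, 90, 82, 61, 88), this returns a review verdict with composite 80.7: two dimensions sit in the 60–79 band, none below 60.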

When content-pumper-pimp receives a reject verdict:

  1. Pass full score card + per-dimension feedback to content-writer
  2. Re-invoke content-writer with the feedback as a revision prompt
  3. Re-run content-qa on the revised output
  4. Max 2 iterations — if still failing after 2 revision cycles, escalate to human with both score cards and the revision history

Track iteration count in the score card (iteration: 1 | 2 | escalated).
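The revision cycle can be sketched as below; content_writer and content_qa stand in for the real skill invocations, and the score-card dict shape is an assumption:

```python
def revision_loop(draft, first_card, content_writer, content_qa, max_iterations=2):
    """Revise a rejected draft up to max_iterations times, then escalate."""
    cards = [first_card]  # the initial reject that triggered the loop
    for iteration in range(1, max_iterations + 1):
        draft = content_writer(draft, feedback=cards[-1]["feedback"])
        card = content_qa(draft)
        card["iteration"] = iteration
        cards.append(card)
        if card["verdict"] != "reject":
            return draft, cards
    cards[-1]["iteration"] = "escalated"  # still failing: human gets full history
    return draft, cards
```

Returning every score card, not just the last one, is what lets the human reviewer see whether revisions are converging or thrashing.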


┌─ content-qa score card ───────────────────────────┐
│ File: <path>                                      │
│ Topic: <title>                                    │
│ Iteration: <1 | 2 | escalated>                    │
└───────────────────────────────────────────────────┘
Dimension                    Score   Verdict
────────────────────────────────────────────────────────
Brand voice alignment          85    ✓
Factual accuracy               78    ◐ needs 2 more citations
SEO compliance                 90    ✓
Originality / unique angle     82    ✓
Engagement potential           61    ◐ hook too generic, CTA weak
AI citation readiness          88    ✓
────────────────────────────────────────────────────────
Composite: 80.7 → verdict: review
Optimization pass:
✓ Injected FAQPage schema
✓ Rewrote meta desc to lead with keyword
○ Internal links — flagged for human (no URLs to inject)
Suggested fixes:
▸ Factual accuracy: cite source for claim on line 47 re: market share figure
▸ Engagement: replace hook with a stat or provocative question; move CTA above fold

| Skill | Role |
|---|---|
| content-pumper-pimp | Orchestrator — invokes content-qa after content-writer completes |
| content-writer | Receives rejection feedback for revision cycles |
| llm-evaluation | LLM-as-Judge scoring per dimension |
| seo-pulse | SEO signal extraction and validation |
| brand-compliance | Brand voice and vocabulary check against brand.json |
| ai-visibility | FAQ and schema assessment for AI snippet readiness |