
take-classifier

Classifies raw video takes against storyboard requirements. Scores each take for quality, content match, and usability.

Model: sonnet
Full Agent Prompt


Analyzes raw footage transcripts against storyboard beat requirements and classifies every take. Produces scored classification data used by edit-planner to build the EDL.

| File | Description |
| --- | --- |
| `storyboard.json` | Beat-by-beat requirements with expected spoken lines |
| `transcripts/*.json` | Per-clip Deepgram transcripts with word-level timing |
| `clip-manifest.json` | Technical metadata for each clip |

For each transcript:

  1. Read full transcript text and word-level timing
  2. Compare against each storyboard beat's `spokenLines` using fuzzy text similarity
  3. Detect mess-ups: self-corrections, long pauses mid-sentence, restarts
  4. Detect silence takes: < 10 words total or > 80% silence
  5. Detect outtakes: content that doesn’t match any storyboard beat
  6. Assign classification: good, mess_up, partial, silence, outtake
  7. Score good and partial takes on four dimensions
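Steps 2, 4, and 6 above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual implementation: `difflib.SequenceMatcher` stands in for whatever fuzzy metric is used, the 0.85/0.50 thresholds are invented for the example, and mess-up detection is handled separately.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds -- the real cutoffs are not specified in the prompt.
GOOD_THRESHOLD = 0.85
PARTIAL_THRESHOLD = 0.50

def best_beat_match(transcript_text: str, beats: list[dict]):
    """Return (beat_id, similarity) for the storyboard beat whose
    spokenLines best match the transcript."""
    best_id, best_score = None, 0.0
    for beat in beats:
        target = " ".join(beat["spokenLines"]).lower()
        score = SequenceMatcher(None, transcript_text.lower(), target).ratio()
        if score > best_score:
            best_id, best_score = beat["id"], score
    return best_id, best_score

def classify(transcript_text: str, word_count: int, beats: list[dict]) -> str:
    """Map a take to a classification label (mess_up detection omitted here)."""
    if word_count < 10:          # the "< 10 words total" silence rule
        return "silence"
    _, score = best_beat_match(transcript_text, beats)
    if score >= GOOD_THRESHOLD:
        return "good"
    if score >= PARTIAL_THRESHOLD:
        return "partial"
    return "outtake"             # matches no storyboard beat well enough
```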
| Dimension | Weight | Measurement |
| --- | --- | --- |
| content_match | 50% | Fuzzy similarity to target spoken lines (0.0-1.0) |
| delivery_quality | 30% | Word confidence average + pause pattern analysis (0.0-1.0) |
| technical_quality | 20% | Clip manifest: resolution, codec health, audio presence (0.0-1.0) |
| usability | composite | content_match × 0.5 + delivery_quality × 0.3 + technical_quality × 0.2 |
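The composite usability score follows directly from the weights in the table:

```python
def usability(content_match: float, delivery_quality: float,
              technical_quality: float) -> float:
    """Weighted composite of the three scored dimensions (each 0.0-1.0)."""
    return (content_match * 0.5
            + delivery_quality * 0.3
            + technical_quality * 0.2)
```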
| Signal | Classification |
| --- | --- |
| "sorry", "let me start over", "ugh" | mess_up |
| Same phrase repeated 2+ times | mess_up |
| > 3s silence mid-utterance | mess_up (unless beat expects a dramatic pause) |
| Transcript confidence < 0.60 | partial or mess_up |
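A sketch of how the first three signals might be detected from word-level timing. The filler list, substring matching, and trigram-repetition heuristic are all assumptions for illustration; only the 3-second pause threshold comes from the table above.

```python
# Hypothetical filler phrases; substring matching is crude (e.g. "ugh"
# would also hit "laughter") but keeps the sketch short.
FILLERS = ("sorry", "let me start over", "ugh")

def detect_mess_up(words: list[dict], beat_expects_pause: bool = False) -> bool:
    """words: Deepgram-style [{'word': str, 'start': float, 'end': float}, ...]."""
    text = " ".join(w["word"].lower() for w in words)
    if any(f in text for f in FILLERS):
        return True
    # > 3s silence between consecutive words mid-utterance
    if not beat_expects_pause:
        for prev, cur in zip(words, words[1:]):
            if cur["start"] - prev["end"] > 3.0:
                return True
    # same 3-word phrase appearing 2+ times (simple restart heuristic)
    tokens = text.split()
    trigrams = [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]
    return any(trigrams.count(t) >= 2 for t in trigrams)
```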

Produces classification data as structured output, which edit-planner reads to build the EDL. A classification summary is printed to the conversation for user review before proceeding.
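One record of the structured output might look like the following. The field names and layout here are illustrative assumptions, not the agent's actual schema:

```python
# Hypothetical classification record; the usability field is the composite
# of the three dimension scores per the weights above.
example_record = {
    "clip": "clip_014.mp4",
    "beat": "beat_03",
    "classification": "good",
    "scores": {
        "content_match": 0.92,
        "delivery_quality": 0.81,
        "technical_quality": 1.0,
        "usability": 0.92 * 0.5 + 0.81 * 0.3 + 1.0 * 0.2,
    },
}
```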