Curated Claude Code catalog
Updated 07.05.2026 · 19:39 CET
01 / Skill
digitalsamba

claude-code-video-toolkit

Quality: 9.0

This toolkit transforms Claude Code into an AI-native video production workspace, integrating specialized skills, commands, and templates for creating professional videos. It excels at autonomously generating "explainer" style content, such as product demos, walkthroughs, and sprint reviews, by orchestrating open-source AI models for voiceovers, image generation, and music composition.

USP

It uniquely positions Claude Code as the "builder" and "director" of video projects, offering a flexible, AI-orchestrated workflow with deep integration of open-source AI models. This toolkit provides a structured yet adaptable framework f…

Use cases

  • Creating AI-generated explainer videos
  • Producing sprint review videos with demos
  • Developing product demo videos
  • Automating video content creation
  • Composing programmatic video with AI elements

Detected files (8)

  • .claude/skills/frontend-design/SKILL.md (skill, 4275 bytes)
    ---
    name: frontend-design
    description: Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
    license: Complete terms in LICENSE.txt
    ---
    
    This skill guides creation of distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Implement real working code with exceptional attention to aesthetic details and creative choices.
    
    The user provides frontend requirements: a component, page, application, or interface to build. They may include context about the purpose, audience, or technical constraints.
    
    ## Design Thinking
    
    Before coding, understand the context and commit to a BOLD aesthetic direction:
    - **Purpose**: What problem does this interface solve? Who uses it?
    - **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. Use these for inspiration, but design one that stays true to the chosen aesthetic direction.
    - **Constraints**: Technical requirements (framework, performance, accessibility).
    - **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?
    
    **CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work - the key is intentionality, not intensity.
    
    Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
    - Production-grade and functional
    - Visually striking and memorable
    - Cohesive with a clear aesthetic point-of-view
    - Meticulously refined in every detail
    
    ## Frontend Aesthetics Guidelines
    
    Focus on:
    - **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for unexpected, characterful choices that elevate the frontend's aesthetics. Pair a distinctive display font with a refined body font.
    - **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes.
    - **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. Use scroll-triggering and hover states that surprise.
    - **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
    - **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Add contextual effects and textures that match the overall aesthetic. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, custom cursors, and grain overlays.
    
    NEVER use generic AI-generated aesthetics: overused font families (Inter, Roboto, Arial, system fonts), clichéd color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, or cookie-cutter design that lacks context-specific character.
    
    Interpret creatively and make unexpected choices that feel genuinely designed for the context. No design should be the same. Vary between light and dark themes, different fonts, different aesthetics. NEVER converge on common choices (Space Grotesk, for example) across generations.
    
    **IMPORTANT**: Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well.
    
    Remember: Claude is capable of extraordinary creative work. Don't hold back; show what can truly be created when thinking outside the box and committing fully to a distinctive vision.
    
  • .claude/skills/ltx2/SKILL.md (skill, 9783 bytes)
    ---
    name: ltx2
    description: AI video generation with LTX-2.3 22B — text-to-video, image-to-video clips for video production. Use when generating video clips, animating images, creating b-roll, animated backgrounds, or motion content. Triggers include video generation, animate image, b-roll, motion, video clip, text-to-video, image-to-video.
    ---
    
    # LTX-2.3 Video Generation
    
    Generate ~5 second video clips from text prompts or images using the LTX-2.3 22B DiT model.
    Runs on Modal (A100-80GB). Requires `MODAL_LTX2_ENDPOINT_URL` in `.env`.
    
    ## Quick Reference
    
    ```bash
    # Text-to-video
    python3 tools/ltx2.py --prompt "A sunset over the ocean, golden light on waves, cinematic" --output sunset.mp4
    
    # Image-to-video (animate a still image)
    python3 tools/ltx2.py --prompt "Gentle camera drift, soft ambient motion" --input photo.jpg --output animated.mp4
    
    # Custom resolution and duration
    python3 tools/ltx2.py --prompt "..." --width 1024 --height 576 --num-frames 161 --output wide.mp4
    
    # Fast mode (fewer steps, quicker)
    python3 tools/ltx2.py --prompt "..." --quality fast --output quick.mp4
    
    # Reproducible output
    python3 tools/ltx2.py --prompt "..." --seed 42 --output reproducible.mp4
    ```
    
    ## Parameters
    
    | Parameter | Default | Description |
    |-----------|---------|-------------|
    | `--prompt` | (required) | Text description of the video |
    | `--input` | - | Input image for image-to-video |
    | `--width` | 768 | Video width (divisible by 64) |
    | `--height` | 512 | Video height (divisible by 64) |
    | `--num-frames` | 121 | Frame count, must satisfy `(n-1) % 8 == 0` |
    | `--fps` | 24 | Frames per second |
    | `--quality` | standard | `standard` (30 steps) or `fast` (15 steps) |
    | `--steps` | 30 | Override inference steps directly |
    | `--seed` | random | Seed for reproducibility |
    | `--output` | auto | Output file path |
    | `--negative-prompt` | sensible default | What to avoid |
    | `--lora` | none | Style LoRA preset. Currently: `crt-terminal`. |
    
    ## Style LoRAs
    
    Style LoRAs bias the output toward a specific visual aesthetic. They're baked into the Modal image and selected per-request; switching LoRAs forces a pipeline rebuild, a one-time ~60s cost per switch within each container's lifetime.
    
    ### `crt-terminal` — CRT / pixel-art terminals
    
    Base: LTX-2.3 22B, trained by [@lovis93](https://huggingface.co/lovis93/crt-animation-terminal-ltx-2.3-lora) (Apache 2.0).
    
    ```bash
    # Trigger word is auto-prepended — write the prompt normally
    python3 tools/ltx2.py --lora crt-terminal \
      --prompt "a terminal typing out \"\\$ claude --continue\" character by character in glowing green pixel font, scanlines, phosphor glow, low choppy frame rate, hacker mood" \
      --output crt_claude.mp4
    ```
    
    **What the preset changes:**
    - Prepends `crtanim,` to the prompt (the LoRA's trigger word)
    - Defaults to 1024×1024, 121 frames (the ratio it was trained on)
    - Relaxes the default negative prompt so on-screen text isn't filtered out
    
    **Prompt pattern:** `<CRT aesthetic> → <color palette> → <animation style> → <subject> → <literal text in quotes> → <mood>`. Keep on-screen text to 1–3 words — the model can't render long strings reliably. The LoRA prefers static framing; ask for camera moves explicitly if you want them.
    
    ## Valid Frame Counts
    
    `(n - 1) % 8 == 0`: 25 (~1s), 49 (~2s), 73 (~3s), 97 (~4s), **121 (~5s default)**, 161 (~6.7s), 193 (~8s max practical).
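
    A quick helper for turning a target duration into a valid `--num-frames` value (a minimal sketch; the validity rule and 24 fps default come from the tables above):

    ```python
    def nearest_frame_count(seconds: float, fps: int = 24) -> int:
        """Snap a target duration to the nearest valid LTX-2 frame count."""
        n = round(seconds * fps)
        n = 8 * round((n - 1) / 8) + 1   # enforce (n - 1) % 8 == 0
        return max(25, min(n, 193))      # clamp to the practical range

    for s in (1, 2.5, 5, 8):
        print(f"{s}s -> --num-frames {nearest_frame_count(s)}")
    ```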
    
    ## Common Resolutions
    
    | Resolution | Ratio | Notes |
    |------------|-------|-------|
    | 768x512 | 3:2 | Default, good balance |
    | 512x512 | 1:1 | Square, fastest |
    | 1024x576 | 16:9 | Widescreen |
    | 576x1024 | 9:16 | Portrait/vertical |
    
    ## Prompting Guide
    
    LTX-2 responds well to cinematographic descriptions. Layer these dimensions:
    
    - **Camera:** "Slow dolly forward", "Aerial drone shot", "Tracking shot", "Static wide angle"
    - **Lighting:** "Golden hour", "Cinematic lighting", "Neon-lit", "Soft diffused light"
    - **Motion:** "Timelapse of...", "Slow motion", "Gentle camera drift", "Gradually transitions"
    - **Style:** "Shot on 35mm film", "Documentary style", "Clean minimal aesthetic"
    - **Negative:** Always implicitly avoids "worst quality, blurry, jittery, watermark, text, logo"
    
    Keep prompts under 200 words. Be specific about the scene.
    
    ### Good Prompts
    
    ```
    # Atmospheric b-roll
    "Aerial drone shot slowly flying over turquoise ocean waves breaking on white sand, golden hour sunlight, cinematic"
    
    # Product/tech scene
    "Close-up of hands typing on a mechanical keyboard, shallow depth of field, soft desk lamp lighting, cozy atmosphere"
    
    # Abstract background
    "Dark moody abstract background with flowing blue light streaks, subtle geometric grid, bokeh particles floating, cinematic tech atmosphere"
    
    # Animate a portrait
    "Professional headshot, subtle natural head movement, confident warm expression, studio lighting, shallow depth of field"
    
    # Animate a slide/screenshot
    "Gentle subtle particle effects floating across a presentation slide, soft ambient light shifts, very slight camera drift"
    ```
    
    ### Bad Prompts
    
    ```
    # Too vague
    "A cool video"
    
    # Too many competing ideas
    "A cat riding a skateboard while juggling fire on the moon during a thunderstorm"
    
    # Describing text/UI (model can't render text reliably)
    "A website showing the text 'Welcome to our platform'"
    ```
    
    ## Video Production Use Cases
    
    ### B-Roll Clips
    Generate atmospheric 5s shots for cutaways between narrated scenes:
    ```bash
    python3 tools/ltx2.py --prompt "Futuristic holographic interface, glowing data visualizations, clean workspace, cinematic" --output broll_tech.mp4
    python3 tools/ltx2.py --prompt "Aerial view of European city at golden hour, modern architecture" --output broll_europe.mp4
    ```
    
    ### Animated Slide Backgrounds
    Feed a slide screenshot and add subtle motion:
    ```bash
    python3 tools/ltx2.py --prompt "Gentle particle effects, soft ambient light shifts, very slight camera drift" --input slide.png --output animated_slide.mp4
    ```
    
    ### Animated Portraits
    Bring still headshots to life:
    ```bash
    python3 tools/ltx2.py --prompt "Subtle natural head movement, warm expression, professional lighting" --input headshot.png --output animated_portrait.mp4
    ```
    
    ### Stylized Character Cameo (SadTalker Alternative)
    For non-realistic faces — fantasy characters, masked figures, heavy beards, helmets, illustrations — SadTalker often produces uncanny or broken lip sync because it's trained on photoreal humans. LTX-2 image-to-video is frequently a better choice when **lip-sync precision isn't critical** (the viewer's brain fills in the gap as long as something is moving). Prompt for *motion + atmosphere*, not phonemes:
    
    ```bash
    python3 tools/ltx2.py \
      --input character_portrait.png \
      --prompt "Ancient warrior speaks slowly with gravitas, beard shifts subtly, glowing aura pulses, embers drift past, slow head movement, cinematic close-up, mystical atmosphere" \
      --width 768 --height 768 \
      --output character_speaking.mp4
    ```
    
    **When LTX-2 wins over SadTalker:**
    - Stylized / illustrated / fantasy characters
    - Heavy facial hair or accessories obscuring the mouth
    - Masked or helmeted figures
    - Short cameo lines where atmosphere matters more than precision
    - Dramatic VO rather than dialogue
    
    **When SadTalker still wins:**
    - Photoreal human presenters
    - Full sentences where mouth shape needs to match phonemes
    - Tutorials / talking-head explainers where the viewer is effectively reading lips
    
    ### Branded Intro/Outro
    Generate abstract motion backgrounds for title cards:
    ```bash
    python3 tools/ltx2.py --prompt "Dark moody background with flowing blue and coral light streaks, bokeh particles, cinematic tech atmosphere, no text" --output intro_bg.mp4
    ```
    
    ### Combining with Other Tools
    
    LTX-2 generates raw clips. Combine with the rest of the toolkit:
    
    | Workflow | Tools |
    |----------|-------|
    | Generate clip → upscale | `ltx2.py` → `upscale.py` |
    | Generate clip → add to Remotion | `ltx2.py` → use as `<OffthreadVideo>` in composition |
    | Generate image → animate | `flux2.py` → `ltx2.py --input` |
    | Generate clip → extract audio | `ltx2.py` → `ffmpeg -i clip.mp4 -vn audio.wav` |
    | Generate clip → add voiceover | `ltx2.py` → mix with `qwen3_tts.py` output |
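
    These hand-offs are plain files on disk, so they script directly. A minimal sketch of the clip → extract-audio row (run from the repo root; only flags documented above are used):

    ```python
    import subprocess

    # Generate a clip, then strip its ambient audio track for separate mixing
    subprocess.run(
        ["python3", "tools/ltx2.py",
         "--prompt", "A candle flickering on a dark table, cinematic",
         "--output", "clip.mp4"],
        check=True,
    )
    subprocess.run(
        ["ffmpeg", "-y", "-i", "clip.mp4", "-vn", "audio.wav"],
        check=True,
    )
    ```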
    
    ## Technical Details
    
    - **Model:** LTX-2.3 22B DiT (Lightricks), bf16
    - **GPU:** A100-80GB on Modal (~$4.68/hr)
    - **Inference:** ~2.5 min per clip (768x512, 121 frames, 30 steps)
    - **Cost:** ~$0.20-0.25 per 5s clip
    - **Cold start:** ~60-90s (loading ~55GB weights)
    - **Output:** H.264 MP4 with synchronized ambient audio (24fps)
    - **Max duration:** ~8s (193 frames) per clip
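
    The per-clip cost follows from the hourly rate and inference time:

    ```python
    a100_per_hour = 4.68      # Modal A100-80GB rate quoted above
    minutes_per_clip = 2.5    # 768x512, 121 frames, 30 steps
    print(a100_per_hour * minutes_per_clip / 60)  # ~0.195, i.e. the ~$0.20/clip figure
    ```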
    
    ### Known Limitations
    
    - **Training data artifacts:** ~30% of generations may have unwanted logos/text from training data. Re-run with different `--seed`.
    - **Text rendering:** Cannot reliably generate readable text in video. Use Remotion overlays instead.
    - **Max duration:** ~8s per clip. Longer content needs stitching.
    - **Audio:** Generated audio is ambient/environmental only. Use voiceover/music tools for speech and music.
    - **License:** Community License — free under $10M revenue, commercial license needed above that.
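
    For the artifact problem in particular, it is often cheapest to batch a few seeds per prompt and keep the clean take (a sketch using only the documented flags; the seed values are arbitrary):

    ```python
    import subprocess

    prompt = "Aerial drone shot over turquoise ocean waves, golden hour, cinematic"
    for seed in (11, 42, 77):   # arbitrary seeds; review the takes, keep the best
        subprocess.run(
            ["python3", "tools/ltx2.py",
             "--prompt", prompt,
             "--seed", str(seed),
             "--output", f"take_{seed}.mp4"],
            check=True,
        )
    ```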
    
    ## Setup
    
    ```bash
    # 1. Create Modal secret for HuggingFace (one-time)
    modal secret create huggingface-token HF_TOKEN=hf_your_token
    
    # 2. Deploy (downloads ~55GB of weights, takes ~10 min)
    modal deploy docker/modal-ltx2/app.py
    
    # 3. Save endpoint URL to .env
    echo "MODAL_LTX2_ENDPOINT_URL=https://yourname--video-toolkit-ltx2-ltx2-generate.modal.run" >> .env
    
    # 4. Test
    python3 tools/ltx2.py --prompt "A candle flickering on a dark table, cinematic" --output test.mp4
    ```
    
    **Important:** HuggingFace token needs read-access scope. Accept the [Gemma 3 license](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized) before deploying. Unauthenticated downloads are severely rate-limited.
    
  • .claude/skills/moviepy/SKILL.md (skill, 13198 bytes)
    ---
    name: moviepy
    description: Python video composition with moviepy 2.x — overlaying deterministic text on AI-generated video (LTX-2, SadTalker), compositing clips, single-file build.py video projects. Use when adding labels/captions/lower-thirds to LTX-2 or SadTalker outputs, building short ad-style spots in pure Python without Remotion, or doing programmatic video composition. Triggers include text overlay on video, label LTX-2 clip, caption SadTalker output, lower third, build.py video, moviepy, Python video composition, sub-30s ad spot.
    ---
    
    # moviepy for Video Production
    
    moviepy is the toolkit's go-to library for **putting deterministic text on top of AI-generated video** and for building short, single-file Python video projects without a Remotion toolchain.
    
    The deeper principle is **trustworthy text**: any genre where text *has to* be readable, accurate, and consistent (legally, editorially, or commercially) is a genre where AI-rendered in-frame text is unacceptable and a moviepy overlay step is the natural fix. Names must be spelled right. Prices must be exact. Source attributions must be pixel-perfect. AI generation models cannot guarantee any of that.
    
    ## When to use moviepy vs. Remotion
    
    | Use moviepy when… | Use Remotion when… |
    |-------------------|---------------------|
    | Overlaying text/labels on an LTX-2 or SadTalker output | Building long-form sprint reviews or product demos |
    | Building sub-30s ad-style spots in a single `build.py` | Multi-template, multi-brand, design-heavy work |
    | Compositing data-driven visuals (matplotlib `FuncAnimation` → mp4) | Anything needing React components or design system reuse |
    | One-off transformations on existing video files | Anything where the project lifecycle (planning → render) matters |
    | You want zero Node.js / no React mental overhead | You want hot-reload preview in Remotion Studio |
    
    Two runnable references for everything in this skill live in `examples/`:
    
    - **`examples/quick-spot/build.py`** — 15-second ad-style spot. Audio-anchored timeline, text overlay, optional VO + ducked music. Renders silent out of the box with zero external assets.
    - **`examples/data-viz-chart/build.py`** — animated time-series chart with deterministic title and source attribution. Demonstrates the matplotlib (data) + moviepy (trustworthy text) split.
    
    Both run with `python3 build.py` and produce a real `out.mp4` immediately. Read them alongside this skill — every pattern below is shown working there.
    
    **Dependencies.** `moviepy`, `Pillow`, and `matplotlib` are declared in `tools/requirements.txt` and installed with the toolkit's one-line Python setup: `python3 -m pip install -r tools/requirements.txt`. If you hit `Missing dependency` when running an example, run that command from the repo root — the examples' `build.py` files will tell you the same thing in their error message and exit cleanly rather than printing a bare traceback.
    
    ## The main use case: text on AI-generated video
    
    Both LTX-2 and SadTalker output bare visuals:
    
    - **LTX-2** cannot reliably render readable text (the model hallucinates letterforms — see the ltx2 skill's "Bad Prompts").
    - **SadTalker** outputs a talking head with no captions, labels, lower thirds, or context.
    
    The fix is to generate the visual cleanly, then composite text over it deterministically with moviepy. This is the canonical pattern in this toolkit:
    
    ```python
    from moviepy import VideoFileClip, ImageClip, CompositeVideoClip
    
    # 1. AI-generated visual (LTX-2 or SadTalker output)
    bg = VideoFileClip("lugh_ltx.mp4").without_audio()
    
    # 2. Text rendered via PIL → ImageClip (see "Text rendering" below)
    title = (
        ImageClip("text_cache/intro_title.png")
        .with_duration(2.0)
        .with_start(0.5)
        .with_position(("center", 880))
    )
    
    # 3. Composite
    final = CompositeVideoClip([bg, title], size=(1920, 1080))
    final.write_videofile("lugh_with_caption.mp4", fps=30, codec="libx264")
    ```
    
    Common shapes this takes:
    
    | Shape | LTX-2 use | SadTalker use |
    |-------|-----------|---------------|
    | Title card over hero footage | "INTRODUCING LONGARM" over a cinematic LTX-2 b-roll | n/a |
    | Lower third / name plate | n/a | "Lugh — Ancient Warrior God" under a talking head |
    | Quote caption | "I am going home." over an LTX-2 character cameo | Same, over a SadTalker talking head |
    | Brand attribution | Logo + URL fade-in over the last second | Same |
    | Tinted overlay for contrast | Dark navy semi-transparent layer behind text | Same |
    
    ## Genres where this shines
    
    The "AI-visual + deterministic text overlay" pattern is the natural production pipeline for several styles of video. If the request matches one of these, reach for moviepy by default:
    
    | Genre | What you overlay | Why moviepy is the right call |
    |-------|------------------|-------------------------------|
    | **News / talking-head journalism** | Speaker name plates, location bars, breaking-news banners, source attribution, pull quotes | Names must be spelled right (editorial / legal). The biggest category by volume. |
    | **Documentary segments** | Interviewee lower thirds, chapter titles, archival source credits, location stamps | Same trust requirement as news. |
    | **Trailers / promo spots** | Title cards, credit overlays ("FROM THE DIRECTOR OF…"), date stings, quote cards, CTAs | Tightly timed, text-heavy, every frame matters. The `q2-townhall-longarm-ad` example is exactly this. |
    | **Social short-form (Reels, TikTok, Shorts)** | Word-accurate captions for sound-off viewing, hashtag overlays | Most social viewing is muted; captions are non-negotiable. |
    | **Product demos with annotations** | Pricing callouts, feature labels, "click here" pointers over screen recordings, before/after labels | Prices and product names must be exact. |
    | **Tutorials / explainers** | Step number overlays, terminal-command captions, keyboard-shortcut callouts | Step numbers must be sequential, commands must be copy-pasteable. |
    
    Lesser-but-real fits: music videos (lyric overlays), reaction videos (source attribution), sports recaps (score overlays), real-estate tours (price / sqft), conference talks (speaker + session plate).
    
    **For full SRT-driven subtitling** (long-form, time-coded, multilingual) moviepy is workable but not ideal — reach for `ffmpeg` with the `subtitles` filter or a dedicated subtitle tool. moviepy is best for hand-placed overlays, not bulk caption tracks.
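
    For that bulk-caption case, a minimal sketch of the `ffmpeg` route (assuming a standard `captions.srt`; the `subtitles` filter burns the captions into the frames, so the video is re-encoded):

    ```python
    import subprocess

    # Burn a full SRT caption track into the video; audio is copied untouched
    subprocess.run(
        ["ffmpeg", "-y", "-i", "input.mp4",
         "-vf", "subtitles=captions.srt",
         "-c:a", "copy",
         "subtitled.mp4"],
        check=True,
    )
    ```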
    
    ## Text rendering — use PIL, not `TextClip`
    
    **Critical gotcha:** moviepy 2.x's `TextClip(method='label')` has a tight-bbox bug that **clips letter ascenders and descenders** (the tops of capitals, the tails of g/p/y). On Apple Silicon you'll see characters with sliced edges and not realise what's wrong for hours.
    
    **The workaround:** render text to a transparent PNG via PIL, then load it as an `ImageClip`. Cache the result by content hash so re-builds are free.
    
    ```python
    import hashlib
    from pathlib import Path
    from PIL import Image, ImageDraw, ImageFont
    
    ARIAL_BOLD = "/System/Library/Fonts/Supplemental/Arial Bold.ttf"
    
    def render_text_png(txt, size, hex_color, cache_dir="./text_cache"):
        cache = Path(cache_dir); cache.mkdir(parents=True, exist_ok=True)
        key = hashlib.sha1(f"{txt}|{size}|{hex_color}".encode()).hexdigest()[:16]
        path = cache / f"{key}.png"
        if path.exists():
            return str(path)
    
        font = ImageFont.truetype(ARIAL_BOLD, size)
        bbox = ImageDraw.Draw(Image.new("RGBA", (1, 1))).textbbox((0, 0), txt, font=font)
        tw, th = bbox[2] - bbox[0], bbox[3] - bbox[1]
        pad = max(20, size // 4)
    
        img = Image.new("RGBA", (tw + pad * 2, th + pad * 2), (0, 0, 0, 0))
        rgb = tuple(int(hex_color.lstrip("#")[i:i+2], 16) for i in (0, 2, 4))
        ImageDraw.Draw(img).text((pad - bbox[0], pad - bbox[1]), txt, font=font, fill=(*rgb, 255))
        img.save(path)
        return str(path)
    ```
    
    The full helper (with kwargs for bold, position, fades, and cleaner ergonomics) is in `examples/quick-spot/build.py` — copy it rather than re-implementing.
    
    ## Audio-anchored timeline pattern
    
    For ad-style edits where every frame matters, generate per-scene VO first and anchor every visual to known absolute timestamps. This eliminates timing drift entirely. See **CLAUDE.md → Video Timing → Audio-Anchored Timelines** for the full pattern. The short version:
    
    ```python
    # Audio-anchored timeline (25s):
    #   Scene 1 tired      0.3 → 3.74  (audio 3.44s)
    #   Scene 2 worries    4.0 → 8.88  (audio 4.88s)
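    # text_clip / vo_clip are illustrative helpers; working versions live in examples/quick-spot/build.py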
    
    text_clip("TIRED OF",     start=0.5,  duration=1.2)
    text_clip("THIRD-PARTY",  start=1.0,  duration=1.8)
    vo_clip("01_tired.mp3",   start=0.3)
    vo_clip("02_worries.mp3", start=4.0)
    ```
    
    ## Common recipes
    
    ### Text on a single AI-generated clip
    
    ```python
    from moviepy import VideoFileClip, ImageClip, CompositeVideoClip
    
    bg = VideoFileClip("ltx_hero.mp4").without_audio()
    caption = (
        ImageClip(render_text_png("THE FUTURE OF AGENTS", 140, "#FFFFFF"))
        .with_duration(bg.duration)
        .with_position(("center", 880))
    )
    CompositeVideoClip([bg, caption], size=bg.size).write_videofile("captioned.mp4", fps=30)
    ```
    
    ### Lower third over a SadTalker talking head
    
    ```python
    from moviepy import VideoFileClip, ImageClip, ColorClip, CompositeVideoClip
    
    talking = VideoFileClip("narrator_sadtalker.mp4")
    W, H = talking.size
    
    # Semi-transparent bar across the bottom for contrast
    bar = (
        ColorClip((W, 140), color=(20, 24, 38))
        .with_duration(talking.duration)
        .with_opacity(0.75)
        .with_position(("center", H - 160))
    )
    name = (
        ImageClip(render_text_png("LUGH", 72, "#F06859"))
        .with_duration(talking.duration)
        .with_position((80, H - 150))
    )
    title = (
        ImageClip(render_text_png("Ancient Warrior God", 36, "#FFFFFF"))
        .with_duration(talking.duration)
        .with_position((80, H - 80))
    )
    CompositeVideoClip([talking, bar, name, title]).write_videofile("with_lower_third.mp4", fps=30)
    ```
    
    ### Tinted overlay for text contrast over busy footage
    
    LTX-2 b-roll is often too visually busy for legible text. Drop a semi-transparent navy layer between the video and the text:
    
    ```python
    from moviepy import ColorClip
    
    tint = (
        ColorClip((W, H), color=(20, 24, 38))
        .with_duration(duration)
        .with_opacity(0.55)
    )
    # Composite order: bg → tint → text
    CompositeVideoClip([bg, tint, text_clip])
    ```
    
    ### Side-by-side composite
    
    ```python
    from moviepy import VideoFileClip, CompositeVideoClip, ColorClip
    
    left  = VideoFileClip("demo_a.mp4").resized(width=960).with_position((  0, "center"))
    right = VideoFileClip("demo_b.mp4").resized(width=960).with_position((960, "center"))
    bg    = ColorClip((1920, 1080), color=(0, 0, 0)).with_duration(max(left.duration, right.duration))
    CompositeVideoClip([bg, left, right]).write_videofile("split.mp4", fps=30)
    ```
    
    ### Mix per-scene VO with ducked music
    
    ```python
    from moviepy import AudioFileClip, CompositeAudioClip
    from moviepy.audio.fx.MultiplyVolume import MultiplyVolume
    from moviepy.audio.fx.AudioFadeIn import AudioFadeIn
    from moviepy.audio.fx.AudioFadeOut import AudioFadeOut
    
    music = AudioFileClip("music.mp3").with_effects([
        MultiplyVolume(0.22),  # duck under VO
        AudioFadeIn(0.5),
        AudioFadeOut(1.5),
    ])
    vo = [
        AudioFileClip(f"scenes/0{i}.mp3").with_effects([MultiplyVolume(1.15)]).with_start(start)
        for i, start in [(1, 0.3), (2, 4.0), (3, 9.1)]
    ]
    final_audio = CompositeAudioClip([music] + vo)
    ```
    
    ## Gotchas
    
    - **moviepy 2.x renamed methods.** Use `subclipped` (not `subclip`), `with_duration` / `with_start` / `with_position` (not `set_duration` etc.), `with_effects([...])` instead of `.fadein()`/`.fadeout()`. Many tutorials online still show 1.x syntax — be skeptical.
    - **`TextClip(method='label')` clips ascenders/descenders.** Always use the PIL workaround above.
    - **`OffthreadVideo` is Remotion-only.** moviepy uses `VideoFileClip`. Don't mix the two.
    - **Resizing requires Pillow ≥ 10.0** for the LANCZOS resample. If you see `ANTIALIAS` errors, upgrade Pillow.
    - **`ColorClip` takes RGB tuples, not hex strings.** Use `(20, 24, 38)`, not `"#141826"`.
    - **Audio in `VideoFileClip` is loaded by default.** Call `.without_audio()` if you only want the visual — composing with audio you don't want will cause silent VO drops in `CompositeAudioClip`.
    - **Always set `size=(W, H)` on `CompositeVideoClip`.** Without it, output dimensions follow the first clip, which can be smaller than your target.
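
    The renames from the first gotcha, side by side (a hedged sketch; the 2.x spellings are the ones used throughout this skill):

    ```python
    from moviepy import VideoFileClip
    from moviepy.video.fx.FadeIn import FadeIn

    clip = (
        VideoFileClip("in.mp4")
        .subclipped(0, 5)                  # 1.x: .subclip(0, 5)
        .with_duration(5)                  # 1.x: .set_duration(5)
        .with_position(("center", 880))    # 1.x: .set_position(...)
        .with_effects([FadeIn(0.5)])       # 1.x: .fadein(0.5)
    )
    ```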
    
    ## When to reach for what
    
    | Task | Tool |
    |------|------|
    | Animate a still image | `tools/ltx2.py --input` |
    | Talking head from photoreal portrait | `tools/sadtalker.py` |
    | Talking head from stylized character | `tools/ltx2.py --input` (see ltx2 skill) |
    | **Add a label/caption/lower third to either of the above** | **moviepy + PIL (this skill)** |
    | Convert / compress / resize an existing file | `ffmpeg` (see ffmpeg skill) |
    | Long-form, design-system-driven video | Remotion (see remotion skill) |
    
    ## References
    
    - Runnable example — short ad-style spot: `examples/quick-spot/build.py`
    - Runnable example — data-viz with text overlay: `examples/data-viz-chart/build.py`
    - Audio-anchored timelines: `CLAUDE.md → Video Timing → Audio-Anchored Timelines`
    - Related skills: `ltx2`, `ffmpeg`, `remotion`
    
  • .claude/skills/playwright-recording/SKILL.md (skill, 12670 bytes)
    ---
    name: playwright-recording
    description: Record browser interactions as video using Playwright. Use for capturing demo videos, app walkthroughs, and UI flows for Remotion videos. Triggers include recording a demo, capturing browser video, screen recording a website, or creating walkthrough footage.
    ---
    
    # Playwright Video Recording
    
    Playwright can record browser interactions as video - perfect for demo footage in Remotion compositions.
    
    ## Quick Start
    
    ### Installation
    
    ```bash
    # In your video project
    npm init -y
    npm install -D playwright @playwright/test
    npx playwright install chromium
    ```
    
    ### Basic Recording Script
    
    ```typescript
    // scripts/record-demo.ts
    import { chromium } from 'playwright';
    
    async function recordDemo() {
      const browser = await chromium.launch();
      const context = await browser.newContext({
        viewport: { width: 1920, height: 1080 },
        recordVideo: {
          dir: './recordings',
          size: { width: 1920, height: 1080 }
        }
      });
    
      const page = await context.newPage();
    
      // Your recording actions
      await page.goto('https://example.com');
      await page.waitForTimeout(2000);
      await page.click('button.demo');
      await page.waitForTimeout(3000);
    
      // Close to save video
      await context.close();
      await browser.close();
    
      console.log('Recording saved to ./recordings/');
    }
    
    recordDemo();
    ```
    
    Run with:
    ```bash
    npx ts-node scripts/record-demo.ts
    # or
    npx tsx scripts/record-demo.ts
    ```
    
    ## Recording Configuration
    
    ### Viewport Sizes
    
    ```typescript
    // Standard 1080p (recommended for Remotion)
    viewport: { width: 1920, height: 1080 }
    
    // 720p (smaller files)
    viewport: { width: 1280, height: 720 }
    
    // Square (social media)
    viewport: { width: 1080, height: 1080 }
    
    // Mobile
    viewport: { width: 390, height: 844 } // iPhone 14
    ```
    
    ### Video Quality Settings
    
    ```typescript
    const context = await browser.newContext({
      viewport: { width: 1920, height: 1080 },
      recordVideo: {
        dir: './recordings',
        size: { width: 1920, height: 1080 } // Match viewport for crisp output
      },
      // Slow down for visibility
      // Note: slowMo is on browser launch, not context
    });
    
    // For slow motion, launch browser with slowMo
    const browser = await chromium.launch({
      slowMo: 100 // 100ms delay between actions
    });
    ```
    
    ## Recording Patterns
    
    ### Form Submission Demo
    
    ```typescript
    import { chromium } from 'playwright';
    
    async function recordFormDemo() {
      const browser = await chromium.launch({ slowMo: 50 });
      const context = await browser.newContext({
        viewport: { width: 1920, height: 1080 },
        recordVideo: { dir: './recordings', size: { width: 1920, height: 1080 } }
      });
      const page = await context.newPage();
    
      await page.goto('https://myapp.com/form');
      await page.waitForTimeout(1000);
    
      // Type with realistic speed (pressSequentially types character by character)
      await page.locator('#name').pressSequentially('John Smith', { delay: 50 });
      await page.waitForTimeout(500);

      await page.locator('#email').pressSequentially('john@example.com', { delay: 50 });
      await page.waitForTimeout(500);
    
      // Click submit
      await page.click('button[type="submit"]');
    
      // Wait for result
      await page.waitForSelector('.success-message');
      await page.waitForTimeout(2000);
    
      await context.close();
      await browser.close();
    }
    ```
    
    ### Multi-Page Navigation
    
    ```typescript
    async function recordNavDemo() {
      const browser = await chromium.launch({ slowMo: 100 });
      const context = await browser.newContext({
        viewport: { width: 1920, height: 1080 },
        recordVideo: { dir: './recordings', size: { width: 1920, height: 1080 } }
      });
      const page = await context.newPage();
    
      // Page 1
      await page.goto('https://myapp.com');
      await page.waitForTimeout(2000);
    
      // Navigate to page 2
      await page.click('nav a[href="/features"]');
      await page.waitForLoadState('networkidle');
      await page.waitForTimeout(2000);
    
      // Navigate to page 3
      await page.click('nav a[href="/pricing"]');
      await page.waitForLoadState('networkidle');
      await page.waitForTimeout(2000);
    
      await context.close();
      await browser.close();
    }
    ```
    
    ### Scroll Demo
    
    ```typescript
    async function recordScrollDemo() {
      const browser = await chromium.launch();
      const context = await browser.newContext({
        viewport: { width: 1920, height: 1080 },
        recordVideo: { dir: './recordings', size: { width: 1920, height: 1080 } }
      });
      const page = await context.newPage();
    
      await page.goto('https://myapp.com/long-page');
      await page.waitForTimeout(1000);
    
      // Smooth scroll
      await page.evaluate(async () => {
        const delay = (ms: number) => new Promise(r => setTimeout(r, ms));
        for (let i = 0; i < 10; i++) {
          window.scrollBy({ top: 200, behavior: 'smooth' });
          await delay(300);
        }
      });
    
      await page.waitForTimeout(1000);
      await context.close();
      await browser.close();
    }
    ```
    
    ### Login Flow
    
    ```typescript
    async function recordLoginDemo() {
      const browser = await chromium.launch({ slowMo: 75 });
      const context = await browser.newContext({
        viewport: { width: 1920, height: 1080 },
        recordVideo: { dir: './recordings', size: { width: 1920, height: 1080 } }
      });
      const page = await context.newPage();
    
      await page.goto('https://myapp.com/login');
      await page.waitForTimeout(1000);
    
      await page.fill('#email', 'demo@example.com');
      await page.waitForTimeout(300);
    
      await page.fill('#password', '••••••••');
      await page.waitForTimeout(500);
    
      await page.click('button[type="submit"]');
    
      // Wait for dashboard
      await page.waitForURL('**/dashboard');
      await page.waitForTimeout(3000);
    
      await context.close();
      await browser.close();
    }
    ```
    
    ## Cursor Highlighting
    
    Playwright doesn't show cursor by default. Add visual indicators:
    
    ### CSS Cursor Highlight
    
    ```typescript
    // Inject cursor visualization
    await page.addStyleTag({
      content: `
        * { cursor: none !important; }
        .playwright-cursor {
          position: fixed;
          width: 24px;
          height: 24px;
          background: rgba(255, 100, 100, 0.5);
          border: 2px solid rgba(255, 50, 50, 0.8);
          border-radius: 50%;
          pointer-events: none;
          z-index: 999999;
          transform: translate(-50%, -50%);
          transition: transform 0.1s ease;
        }
        .playwright-cursor.clicking {
          transform: translate(-50%, -50%) scale(0.8);
          background: rgba(255, 50, 50, 0.8);
        }
      `
    });
    
    // Add cursor element
    await page.evaluate(() => {
      const cursor = document.createElement('div');
      cursor.className = 'playwright-cursor';
      document.body.appendChild(cursor);
    
      document.addEventListener('mousemove', (e) => {
        cursor.style.left = e.clientX + 'px';
        cursor.style.top = e.clientY + 'px';
      });
    
      document.addEventListener('mousedown', () => cursor.classList.add('clicking'));
      document.addEventListener('mouseup', () => cursor.classList.remove('clicking'));
    });
    ```
    
    ### Click Ripple Effect
    
    ```typescript
    // Add click ripple visualization
    await page.addStyleTag({
      content: `
        .click-ripple {
          position: fixed;
          width: 40px;
          height: 40px;
          border-radius: 50%;
          background: rgba(234, 88, 12, 0.4);
          pointer-events: none;
          z-index: 999998;
          transform: translate(-50%, -50%) scale(0);
          animation: ripple 0.4s ease-out forwards;
        }
        @keyframes ripple {
          to {
            transform: translate(-50%, -50%) scale(2);
            opacity: 0;
          }
        }
      `
    });
    
    // Custom click function with ripple
    async function clickWithRipple(page: Page, selector: string) {
      const element = page.locator(selector);
      const box = await element.boundingBox();
      if (!box) throw new Error(`No bounding box for ${selector}`);

      await page.evaluate(({ x, y }) => {
        const ripple = document.createElement('div');
        ripple.className = 'click-ripple';
        ripple.style.left = x + 'px';
        ripple.style.top = y + 'px';
        document.body.appendChild(ripple);
        setTimeout(() => ripple.remove(), 400);
      }, { x: box.x + box.width / 2, y: box.y + box.height / 2 });

      await element.click();
    }
    ```
    
    ## Output for Remotion
    
    ### Move Recording to public/demos/
    
    ```typescript
    import { chromium } from 'playwright';
    import * as fs from 'fs';
    import * as path from 'path';
    
    async function recordForRemotion(outputName: string) {
      const browser = await chromium.launch({ slowMo: 50 });
      const context = await browser.newContext({
        viewport: { width: 1920, height: 1080 },
        recordVideo: { dir: './temp-recordings', size: { width: 1920, height: 1080 } }
      });
      const page = await context.newPage();
    
      // ... recording actions ...
    
      await context.close();
    
      // Get the video path
      const video = page.video();
      const videoPath = await video?.path();
    
      if (videoPath) {
        const destPath = `./public/demos/${outputName}.webm`;
        fs.mkdirSync(path.dirname(destPath), { recursive: true });
        fs.renameSync(videoPath, destPath);
        console.log(`Recording saved to: ${destPath}`);
    
        // Get duration for config
        // Use ffprobe: ffprobe -v error -show_entries format=duration -of csv=p=0 file.webm
      }
    
      await browser.close();
    }
    ```
    
    ### Convert WebM to MP4
    
    Playwright outputs WebM. Convert for better Remotion compatibility:
    
    ```bash
    ffmpeg -i recording.webm -c:v libx264 -crf 20 -preset medium -movflags faststart public/demos/demo.mp4
    ```
    
    ## Interactive Recording
    
    For user-driven recordings where you manually perform actions:
    
    ```typescript
    // Inject ESC key listener to stop recording
    async function injectStopListener(page: Page): Promise<void> {
      await page.evaluate(() => {
        if ((window as any).__escListenerAdded) return;
        (window as any).__escListenerAdded = true;
        (window as any).__stopRecording = false;
        document.addEventListener('keydown', (e) => {
          if (e.key === 'Escape') {
            e.preventDefault();
            (window as any).__stopRecording = true;
          }
        });
      });
    }
    
    // Poll for stop signal - handle navigation errors gracefully
    while (true) {
      try {
        const shouldStop = await page.evaluate(() => (window as any).__stopRecording === true);
        if (shouldStop) break;
      } catch {
        // Page navigating - continue recording
      }
      await new Promise(r => setTimeout(r, 200));
    }
    ```
    
    **Key insight:** `page.evaluate()` throws during navigation. Use try/catch and continue - don't treat errors as stop signals.
    
    ## Window Scaling for Laptops
    
    Record at full 1080p while showing a smaller window:
    
    ```typescript
    const scale = 0.75; // 75% window size
    const context = await browser.newContext({
      viewport: { width: 1920 * scale, height: 1080 * scale },
      deviceScaleFactor: 1 / scale,
      recordVideo: { dir: './recordings', size: { width: 1920, height: 1080 } },
    });
    ```
    
    ## Cookie Banner Dismissal
    
    Comprehensive selector list for common consent platforms:
    
    ```typescript
    const COOKIE_SELECTORS = [
      '#onetrust-accept-btn-handler',           // OneTrust
      '#CybotCookiebotDialogBodyButtonAccept',  // Cookiebot
      '.cc-btn.cc-dismiss',                      // Cookie Consent by Insites
      '[class*="cookie"] button[class*="accept"]',
      '[class*="consent"] button[class*="accept"]',
      'button:has-text("Accept all")',
      'button:has-text("Accept cookies")',
      'button:has-text("Got it")',
    ];
    
    async function dismissCookieBanners(page: Page): Promise<void> {
      await page.waitForTimeout(500);
      for (const selector of COOKIE_SELECTORS) {
        try {
          const btn = page.locator(selector).first();
          if (await btn.isVisible({ timeout: 100 })) {
            await btn.click({ timeout: 500 });
            return;
          }
        } catch { /* try next */ }
      }
    }
    ```
    
    Call it after `page.goto()`, and again from a `page.on('load')` handler so banners that appear on later navigations are dismissed too.
    
    ## Important: Injected Elements Appear in Video
    
    **Warning:** Any DOM elements you inject (cursors, control panels, overlays) will be recorded. For UI-free recordings, use terminal-based controls only (Ctrl+C, max duration timer).
    
    ## Tips for Good Demo Recordings
    
    1. **Use slowMo** - 50-100ms makes actions visible
    2. **Add waitForTimeout** - Pause between actions for comprehension
    3. **Wait for animations** - Use `waitForLoadState('networkidle')`
    4. **Match Remotion dimensions** - 1920x1080 at 30fps typical
    5. **Test without recording first** - Debug before final capture
    6. **Clear browser state** - Use fresh context for clean demos
    7. **Dismiss cookie banners** - Use comprehensive selector list above
    8. **Re-inject on navigation** - Cursor/listeners reset on page load
    
    ---
    
    ## Feedback & Contributions
    
    If this skill is missing information or could be improved:
    
    - **Missing a pattern?** Describe what you needed
    - **Found an error?** Let me know what's wrong
    - **Want to contribute?** I can help you:
      1. Update this skill with improvements
      2. Create a PR to github.com/digitalsamba/claude-code-video-toolkit
    
    Just say "improve this skill" and I'll guide you through updating `.claude/skills/playwright-recording/SKILL.md`.
    
  • .claude/skills/acestep/SKILL.md (skill, 13332 bytes)
    ---
    name: acestep
    description: AI music generation with ACE-Step 1.5 — background music, vocal tracks, covers, stem extraction, audio repainting, and continuation for video production. Use when generating music, soundtracks, jingles, or working with audio stems. Triggers include background music, soundtrack, jingle, music generation, stem extraction, cover, style transfer, repaint, continuation, or musical composition tasks.
    ---
    
    # ACE-Step 1.5 Music Generation
    
    Open-source music generation via `tools/music_gen.py`.
    
    **Cloud providers:**
    - **acemusic** (default) — Official ACE-Step cloud API with XL Turbo (4B) model + 5Hz LM thinking mode. Free API key from [acemusic.ai/api-key](https://acemusic.ai/api-key). No GPU required.
    - **modal** — Self-hosted ACE-Step 2B Turbo on Modal. Requires `MODAL_MUSIC_GEN_ENDPOINT_URL`.
    - **runpod** — Self-hosted ACE-Step 2B Turbo on RunPod. Requires `RUNPOD_ACESTEP_ENDPOINT_ID`.
    
    ## Setup
    
    ```bash
    # acemusic (recommended — free, best quality, no GPU)
    echo "ACEMUSIC_API_KEY=your_key" >> .env
    # Get key at https://acemusic.ai/api-key
    
    # Self-hosted (optional fallback)
    python tools/music_gen.py --setup             # RunPod
    modal deploy docker/modal-music-gen/app.py    # Modal
    ```
    
    ## Quick Reference
    
    ```bash
    # Basic generation (uses acemusic XL Turbo by default)
    python tools/music_gen.py --prompt "Upbeat tech corporate" --duration 60 --output bg.mp3
    
    # Generate 4 variations, pick the best
    python tools/music_gen.py --prompt "Calm ambient piano" --duration 30 --variations 4 --output ambient.mp3
    
    # Fast mode (disable thinking)
    python tools/music_gen.py --no-thinking --prompt "Quick draft" --duration 30 --output draft.mp3
    
    # With musical control
    python tools/music_gen.py --prompt "Calm ambient piano" --duration 30 --bpm 72 --key "D Major" --output ambient.mp3
    
    # Scene presets (video production)
    python tools/music_gen.py --preset corporate-bg --duration 60 --output bg.mp3
    python tools/music_gen.py --preset tension --duration 20 --output problem.mp3
    python tools/music_gen.py --preset cta --brand digital-samba --duration 15 --output cta.mp3
    
    # Vocals with lyrics
    python tools/music_gen.py --prompt "Indie pop jingle" --lyrics "[verse]\nBuild it better\nShip it faster" --duration 30 --output jingle.mp3
    
    # Cover / style transfer
    python tools/music_gen.py --cover --reference theme.mp3 --prompt "Jazz piano version" --duration 60 --output jazz_cover.mp3
    
    # Repaint a weak section
    python tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3
    
    # Continue from existing audio
    python tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3
    
    # Stem extraction
    python tools/music_gen.py --extract vocals --input mixed.mp3 --output vocals.mp3
    
    # Fall back to self-hosted
    python tools/music_gen.py --cloud modal --prompt "Background music" --duration 60 --output bg.mp3
    ```
    
    ## Fixing "Samey" Output
    
    If generated music sounds repetitive or lacks variety, try these in order:
    
    1. **Use acemusic cloud** (default) — the XL Turbo 4B model is significantly more capable than the 2B model on Modal/RunPod
    2. **Keep thinking mode on** (default for acemusic) — the 5Hz LM enriches sparse prompts into detailed musical descriptions
    3. **Generate variations** — `--variations 4` generates 4 takes, pick the best
    4. **Use stochastic inference** — `--infer-method sde` adds randomness (same seed gives different results)
    5. **Vary BPM and key across scenes** — don't use the same preset for every scene
    6. **Write sparser prompts** — "Upbeat indie rock" gives the model more creative freedom than a hyper-detailed description
    7. **Vary seeds** — omit `--seed` to let each generation be unique
    
    ## Creating a Song (Step by Step)
    
    ### 1. Instrumental background track (simplest)
    ```bash
    python tools/music_gen.py --prompt "Upbeat indie rock, driving drums, jangly guitar" --duration 60 --bpm 120 --key "G Major" --output track.mp3
    ```
    
    ### 2. Song with vocals and lyrics
    Write lyrics in a temp file or pass inline. Use structure tags to control song sections.
    
    ```bash
    # Write lyrics to a file first (recommended for longer songs)
    cat > /tmp/lyrics.txt << 'LYRICS'
    [Verse 1]
    Walking through the morning light
    Coffee in my hand feels right
    Another day to build and dream
    Nothing's ever what it seems
    
    [Chorus - anthemic]
    WE KEEP MOVING FORWARD
    Through the noise and doubt
    We keep moving forward
    That's what it's about
    
    [Verse 2]
    Screens are glowing late at night
    Shipping code until it's right
    The deadline's close but so are we
    Almost there, just wait and see
    
    [Chorus - bigger]
    WE KEEP MOVING FORWARD
    Through the noise and doubt
    We keep moving forward
    That's what it's about
    
    [Outro - fade]
    (Moving forward...)
    LYRICS
    
    # Generate the song
    python tools/music_gen.py \
      --prompt "Upbeat indie rock anthem, male vocal, driving drums, electric guitar, studio polish" \
      --lyrics "$(cat /tmp/lyrics.txt)" \
      --duration 60 \
      --bpm 128 \
      --key "G Major" \
      --output my_song.mp3
    ```
    
    ### 3. Repaint a weak section
    If the chorus sounds weak, regenerate just that section:
    ```bash
    python tools/music_gen.py --repaint --input my_song.mp3 --repaint-start 20 --repaint-end 35 --prompt "Powerful anthemic chorus, big drums" --output fixed.mp3
    ```
    
    ### 4. Continue/extend a track
    ```bash
    python tools/music_gen.py --continuation --input my_song.mp3 --prompt "Continue with gentle acoustic outro" --output extended.mp3
    ```
    
    ### Key tips for good results
    - **Caption = overall style** (genre, instruments, mood, production quality)
    - **Lyrics = temporal structure** (verse/chorus flow, vocal delivery)
    - **UPPERCASE in lyrics** = high vocal intensity
    - **Parentheses** = background vocals: "We rise (together)"
    - **Keep 6-10 syllables per line** for natural rhythm
    - **Don't describe the melody in the caption** — describe the *sound* and *feeling*
    - **Use `--seed`** to lock randomness when iterating on prompt/lyrics
    
    ### Controlling vocal gender
    The model doesn't reliably follow "female vocal" or "male vocal" on its own. Use **both** of these together:
    1. **In the prompt**: Be explicit — "solo female singer, alto voice" or "female vocalist only, breathy intimate voice". Adding an artist reference helps (e.g., "Brandi Carlile style").
    2. **In the lyrics**: Add `[female vocal]` tags before each section:
    ```
    [female vocal]
    [Verse 1]
    Walking through the morning light...
    
    [female vocal]
    [Chorus - anthemic]
    WE KEEP MOVING FORWARD...
    ```
    Just saying "female vocal" in the prompt alone is often ignored. The combination of prompt + lyrics tags is what works.
    
    ### Duets and vocal trading
    For duets with male/female vocals trading verses, use both the prompt and per-section lyrics tags:
    - **Prompt**: "duet, male and female vocals trading verses, warm harmonies on chorus"
    - **Lyrics**: Tag each section with who sings it:
    ```
    [Verse 1 - male vocal, storytelling]
    First verse lyrics here...
    
    [Chorus - male and female duet, harmonies]
    Chorus lyrics here...
    
    [Verse 2 - female vocal, wry]
    Second verse lyrics here...
    
    [Bridge - male vocal, spoken]
    Spoken bridge...
    
    [Bridge - female vocal, sung]
    Sung response...
    ```
    This reliably produces vocal trading between sections and harmonies on shared parts.
    
    ## Scene Presets
    
    | Preset | BPM | Key | Use Case |
    |--------|-----|-----|----------|
    | `corporate-bg` | 110 | C Major | Professional background, presentations |
    | `upbeat-tech` | 128 | G Major | Product launches, tech demos |
    | `ambient` | 72 | D Major | Overview slides, reflective content |
    | `dramatic` | 90 | D Minor | Reveals, announcements |
    | `tension` | 85 | A Minor | Problem statements, challenges |
    | `hopeful` | 120 | C Major | Solution reveals, resolutions |
    | `cta` | 135 | E Major | Call to action, closing energy |
    | `lofi` | 85 | F Major | Screen recordings, coding demos |
    
    ## Task Types
    
    ### text2music (default)
    Generate music from text prompt + optional lyrics.
    
    ### cover
    Style transfer from reference audio. Control blend with `--cover-strength` (0.0-1.0):
    - **0.2** — Loose style inspiration (more creative freedom)
    - **0.5** — Balanced style transfer
    - **0.7** — Close to original structure (default)
    - **1.0** — Maximum fidelity to source
    
    ### extract
    Stem separation — isolate individual tracks from mixed audio.
    Tracks: `vocals`, `drums`, `bass`, `guitar`, `piano`, `keyboard`, `strings`, `brass`, `woodwinds`, `other`
    
    ### repainting (acemusic only)
    Regenerate a specific time segment within existing audio while preserving the rest.
    ```bash
    python tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3
    ```
    
    ### continuation (acemusic only)
    Extend existing audio by continuing from where it ends.
    ```bash
    python tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3
    ```
    
    ## Prompt Engineering
    
    ### Caption Writing — Layer Dimensions
    
    Write captions by layering multiple descriptive dimensions rather than single-word descriptions.
    
    **Dimensions to include:**
    - **Genre/Style**: pop, rock, jazz, electronic, lo-fi, synthwave, orchestral
    - **Emotion/Mood**: melancholic, euphoric, dreamy, nostalgic, intimate, tense
    - **Instruments**: acoustic guitar, synth pads, 808 drums, strings, brass, piano
    - **Timbre**: warm, crisp, airy, punchy, lush, polished, raw
    - **Era**: "80s synth-pop", "modern indie", "classical romantic"
    - **Production**: lo-fi, studio-polished, live recording, cinematic
    - **Vocal**: breathy, powerful, falsetto, raspy, spoken word (or "instrumental")
    
    **Good**: "Slow melancholic piano ballad with intimate female vocal, warm strings building to powerful chorus, studio-polished production"
    **Bad**: "Sad song"
    
    ### Key Principles
    
    1. **Specificity over vagueness** — describe instruments, mood, production style
    2. **Avoid contradictions** — don't request "classical strings" and "hardcore metal" simultaneously
    3. **Repetition reinforces priority** — repeat important elements for emphasis
    4. **Sparse captions = more creative freedom** — detailed captions constrain the model
    5. **Use metadata params for BPM/key** — don't write "120 BPM" in the caption, use `--bpm 120`
    
    ### Lyrics Formatting
    
    **Structure tags** (use in lyrics, not caption):
    ```
    [Intro]
    [Verse]
    [Chorus]
    [Bridge]
    [Outro]
    [Instrumental]
    [Guitar Solo]
    [Build]
    [Drop]
    [Breakdown]
    ```
    
    **Vocal control** (prefix lines or sections):
    ```
    [raspy vocal]
    [whispered]
    [falsetto]
    [powerful belting]
    [harmonies]
    [ad-lib]
    ```
    
    **Energy indicators:**
    - UPPERCASE = high intensity ("WE RISE ABOVE")
    - Parentheses = background vocals ("We rise (together)")
    - Keep 6-10 syllables per line within sections for natural rhythm
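    
    A short example combining structure tags, vocal control, and energy indicators (lyrics invented for illustration):
    
    ```
    [Verse]
    [whispered]
    We started small, one spark, one line
    
    [Chorus]
    [powerful belting]
    WE RISE ABOVE (we rise together)
    ```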
    
    ## Video Production Integration
    
    ### Music for Scene Types
    
    | Scene | Preset | Duration | Notes |
    |-------|--------|----------|-------|
    | Title | `dramatic` or `ambient` | 3-5s | Short, mood-setting |
    | Problem | `tension` | 10-15s | Dark, unsettling |
    | Solution | `hopeful` | 10-15s | Relief, optimism |
    | Demo | `lofi` or `corporate-bg` | 30-120s | Non-distracting, matches demo length |
    | Stats | `upbeat-tech` | 8-12s | Building credibility |
    | CTA | `cta` | 5-10s | Maximum energy, punchy |
    | Credits | `ambient` | 5-10s | Gentle fade-out |
    
    ### Timing Workflow
    
    1. Plan scene durations first (from voiceover script)
    2. Generate music to match: `--duration <scene_seconds>`
    3. Music duration is precise (within 0.1s of requested)
    4. For background music spanning multiple scenes: generate one long track
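    
    For example (presets from the table above; durations come from your voiceover script, paths are illustrative):
    
    ```bash
    # One track per scene, matched to the planned scene lengths
    python tools/music_gen.py --preset tension --duration 12 --output public/audio/problem.mp3
    python tools/music_gen.py --preset hopeful --duration 14 --output public/audio/solution.mp3
    ```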
    
    ### Combining with Voiceover
    
    Background music should be mixed at 10-20% volume in Remotion:
    ```tsx
    <Audio src={staticFile('voiceover.mp3')} volume={1} />
    <Audio src={staticFile('bg-music.mp3')} volume={0.15} />
    ```
    
    For music under narration: use instrumental presets (`corporate-bg`, `ambient`, `lofi`).
    For music-forward scenes (title, CTA): can use higher volume or vocal tracks.
    
    ### Brand Consistency
    
    Use `--brand <name>` to load hints from `brands/<name>/brand.json`.
    Use `--cover --reference brand_theme.mp3` to create variations of a brand's sonic identity.
    For consistent sound across a project: fix the seed (`--seed 42`) and vary only duration/prompt.
    
    ## Advanced Parameters
    
    | Flag | Default | Description |
    |------|---------|-------------|
    | `--thinking` | on (acemusic) | 5Hz LM enriches prompts and generates audio codes |
    | `--no-thinking` | - | Faster generation, skip LM reasoning |
    | `--variations N` | 1 | Generate N variations (1-8, acemusic only) |
    | `--guidance-scale` | 7.0 | Prompt adherence (1.0-15.0) |
    | `--infer-method` | ode | `ode` (deterministic) or `sde` (stochastic, more variety) |
    | `--seed` | random | Lock randomness for reproducibility |
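    
    A typical explore-then-lock workflow using these flags (values illustrative; `--seed` reproduces a full run, not one specific variation):
    
    ```bash
    # Explore: stochastic sampling, several candidates
    python tools/music_gen.py --prompt "Dreamy synthwave" --duration 20 \
      --variations 4 --infer-method sde --output explore.mp3
    
    # Lock: deterministic re-run once you have settings you like
    python tools/music_gen.py --prompt "Dreamy synthwave" --duration 20 \
      --seed 42 --infer-method ode --output final.mp3
    ```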
    
    ## Technical Details
    
    - **acemusic cloud**: XL Turbo 4B DiT + 4B LM, best quality, ~5-15s per generation
    - **Modal/RunPod**: Standard Turbo 2B DiT, no LM, ~2-3s per generation
    - **Output**: 48kHz MP3/WAV/FLAC
    - **Duration range**: 10-600 seconds
    - **BPM range**: 30-300
    
    ### When NOT to use ACE-Step
    - **Voice cloning** — use Qwen3-TTS or ElevenLabs instead
    - **Sound effects** — use ElevenLabs SFX (`tools/sfx.py`)
    - **Speech/narration** — use voiceover tools, not music gen
    - **Stem extraction from video** — extract audio first with FFmpeg, then use `--extract`
    
  • .claude/skills/elevenlabs/SKILL.md (skill, 10961 bytes)
    ---
    name: elevenlabs
    description: Generate AI voiceovers, sound effects, and music using ElevenLabs APIs. Use when creating audio content for videos, podcasts, or games. Triggers include generating voiceovers, narration, dialogue, sound effects from descriptions, background music, soundtrack generation, voice cloning, or any audio synthesis task.
    ---
    
    # ElevenLabs Audio Generation
    
    Requires `ELEVENLABS_API_KEY` in `.env`.
    
    ## Text-to-Speech
    
    ```python
    from elevenlabs.client import ElevenLabs
    from elevenlabs import save, VoiceSettings
    import os
    
    client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))
    
    audio = client.text_to_speech.convert(
        text="Welcome to my video!",
        voice_id="JBFqnCBsd6RMkjVDRZzb",
        model_id="eleven_multilingual_v2",
        voice_settings=VoiceSettings(
            stability=0.5,
            similarity_boost=0.75,
            style=0.5,
            speed=1.0
        )
    )
    save(audio, "voiceover.mp3")
    ```
    
    ### Models
    
    | Model | Quality | SSML Support | Notes |
    |-------|---------|--------------|-------|
    | `eleven_multilingual_v2` | Highest consistency | None | Stable, production-ready, 29 languages |
    | `eleven_flash_v2_5` | Good | `<break>`, `<phoneme>` | Fast, supports pause/pronunciation tags |
    | `eleven_turbo_v2_5` | Good | `<break>`, `<phoneme>` | Fastest latency |
    | `eleven_v3` | Most expressive | None | Alpha — unreliable, needs prompt engineering |
    
    **Choose:** multilingual_v2 for reliability, flash/turbo for SSML control, v3 for maximum expressiveness (expect retakes).
    
    ### Voice Settings by Style
    
    | Style | stability | similarity | style | speed |
    |-------|-----------|------------|-------|-------|
    | Natural/professional | 0.75-0.85 | 0.9 | 0.0-0.1 | 1.0 |
    | Conversational | 0.5-0.6 | 0.85 | 0.3-0.4 | 0.9-1.0 |
    | Energetic/YouTuber | 0.3-0.5 | 0.75 | 0.5-0.7 | 1.0-1.1 |
    
    ### Pauses Between Sections
    
    **With flash/turbo models:** Use SSML break tags inline:
    ```
    ...end of section. <break time="1.5s" /> Start of next...
    ```
    Max 3 seconds per break. Excessive breaks can cause speed artifacts.
    
    **With multilingual_v2 / v3:** No SSML support. Options:
    - Paragraph breaks (blank lines) — creates ~0.3-0.5s natural pause
    - Post-process with ffmpeg: split audio and insert silence (see the sketch below)
    
    **WARNING:** `...` (ellipsis) is NOT a reliable pause — it can be vocalized as a word/sound. Do not use ellipsis as a pause mechanism.
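    
    For the ffmpeg route, one approach is to generate a silence clip and concatenate (a sketch; assumes MP3 sections with matching sample rate and channel layout):
    
    ```bash
    # 1.5s of silence encoded to match the voiceover MP3s
    ffmpeg -f lavfi -i anullsrc=r=44100:cl=mono -t 1.5 -c:a libmp3lame -q:a 2 silence.mp3
    
    # Join section 1 + silence + section 2 (MP3 frames concatenate cleanly)
    ffmpeg -i "concat:section1.mp3|silence.mp3|section2.mp3" -c copy joined.mp3
    ```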
    
    ### Pronunciation Control
    
    **Phonetic spelling (any model):** Write words as you want them pronounced:
    - `Janus` → `Jan-us`
    - `nginx` → `engine-x`
    - Use dashes, capitals, apostrophes to guide pronunciation
    
    **SSML phoneme tags (flash/turbo only):**
    ```
    <phoneme alphabet="ipa" ph="ˈdʒeɪnəs">Janus</phoneme>
    ```
    
    ### Iterative Workflow
    
    1. Generate → listen → identify pronunciation/pacing issues
    2. Adjust: phonetic spellings, break tags, voice settings
    3. Regenerate. If pauses aren't precise enough, add silence in post with ffmpeg rather than fighting the TTS engine.
    
    ## Voice Cloning
    
    ### Instant Voice Clone
    
    ```python
    with open("sample.mp3", "rb") as f:
        voice = client.voices.ivc.create(
            name="My Voice",
            files=[f],
            remove_background_noise=True
        )
    print(f"Voice ID: {voice.voice_id}")
    ```
    
    - Use `client.voices.ivc.create()` (not `client.voices.clone()`)
    - Pass file handles in binary mode (`"rb"`), not paths
    - Convert m4a first: `ffmpeg -i input.m4a -codec:a libmp3lame -qscale:a 2 output.mp3`
    - Multiple samples (2-3 clips) improve accuracy
    - Save voice ID for reuse
    
    **Professional Voice Clone:** Requires Creator plan+, 30+ min audio. See [reference.md](reference.md).
    
    ## Sound Effects
    
    Max 22 seconds per generation.
    
    ```python
    result = client.text_to_sound_effects.convert(
        text="Thunder rumbling followed by heavy rain",
        duration_seconds=10,
        prompt_influence=0.3
    )
    with open("thunder.mp3", "wb") as f:
        for chunk in result:
            f.write(chunk)
    ```
    
    **Prompt tips:** Be specific — "Heavy footsteps on wooden floorboards, slow and deliberate, with creaking"
    
    ## Music Generation
    
    10 seconds to 5 minutes. Use `client.music.compose()` (not `.generate()`).
    
    ```python
    result = client.music.compose(
        prompt="Upbeat indie rock, catchy guitar riff, energetic drums, travel vlog",
        music_length_ms=60000,
        force_instrumental=True
    )
    with open("music.mp3", "wb") as f:
        for chunk in result:
            f.write(chunk)
    ```
    
    **Prompt structure:** Genre, mood, instruments, tempo, use case. Add "no vocals" or use `force_instrumental=True` for background music.
    
    ## Remotion Integration
    
    ### Complete Workflow: Script to Synchronized Scene
    
    ```
    VOICEOVER-SCRIPT.md → voiceover.py → public/audio/ → Remotion composition
            ↓                  ↓               ↓                 ↓
      Scene narration    Generate MP3    Audio files     <Audio> component
      with durations     per scene       with timing     synced to scenes
    ```
    
    ### Step 1: Generate Per-Scene Audio
    
    Use the toolkit's voiceover tool to generate audio for each scene:
    
    ```bash
    # Generate voiceover files for each scene
    python tools/voiceover.py --scene-dir public/audio/scenes --json
    
    # Output:
    # public/audio/scenes/
    #   ├── scene-01-title.mp3
    #   ├── scene-02-problem.mp3
    #   ├── scene-03-solution.mp3
    #   └── manifest.json  (durations for each file)
    ```
    
    The `manifest.json` contains timing info:
    ```json
    {
      "scenes": [
        { "file": "scene-01-title.mp3", "duration": 4.2 },
        { "file": "scene-02-problem.mp3", "duration": 12.8 },
        { "file": "scene-03-solution.mp3", "duration": 15.3 }
      ],
      "totalDuration": 32.3
    }
    ```
    
    ### Step 2: Use Audio in Remotion Composition
    
    ```tsx
    // src/Composition.tsx
    import { Audio, staticFile, Series, useVideoConfig } from 'remotion';
    
    // Import scene components
    import { TitleSlide } from './scenes/TitleSlide';
    import { ProblemSlide } from './scenes/ProblemSlide';
    import { SolutionSlide } from './scenes/SolutionSlide';
    
    // Scene durations (from manifest.json, converted to frames at 30fps)
    const SCENE_DURATIONS = {
      title: Math.ceil(4.2 * 30),      // 126 frames
      problem: Math.ceil(12.8 * 30),   // 384 frames
      solution: Math.ceil(15.3 * 30),  // 459 frames
    };
    
    export const MainComposition: React.FC = () => {
      return (
        <>
          {/* Scene sequence */}
          <Series>
            <Series.Sequence durationInFrames={SCENE_DURATIONS.title}>
              <TitleSlide />
            </Series.Sequence>
            <Series.Sequence durationInFrames={SCENE_DURATIONS.problem}>
              <ProblemSlide />
            </Series.Sequence>
            <Series.Sequence durationInFrames={SCENE_DURATIONS.solution}>
              <SolutionSlide />
            </Series.Sequence>
          </Series>
    
          {/* Audio track - plays continuously across all scenes */}
          <Audio src={staticFile('audio/voiceover.mp3')} volume={1} />
    
          {/* Optional: Background music at lower volume */}
          <Audio src={staticFile('audio/music.mp3')} volume={0.15} />
        </>
      );
    };
    ```
    
    ### Step 3: Per-Scene Audio (Alternative)
    
    For more control, add audio to each scene individually:
    
    ```tsx
    // src/scenes/ProblemSlide.tsx
    import { Audio, staticFile, useCurrentFrame } from 'remotion';
    
    export const ProblemSlide: React.FC = () => {
      const frame = useCurrentFrame();
    
      return (
        <div style={{ /* slide styles */ }}>
          <h1>The Problem</h1>
          {/* Scene content */}
    
          {/* Audio starts when this scene starts (frame 0 of this sequence) */}
          <Audio src={staticFile('audio/scenes/scene-02-problem.mp3')} />
        </div>
      );
    };
    ```
    
    ### Syncing Visuals to Voiceover
    
    Calculate scene duration from audio, not the other way around:
    
    ```tsx
    // src/config/timing.ts
    import manifest from '../../public/audio/scenes/manifest.json';
    
    const FPS = 30;
    
    // Convert audio durations to frame counts
    export const sceneDurations = manifest.scenes.reduce((acc, scene) => {
      const name = scene.file.replace(/^scene-\d+-/, '').replace('.mp3', '');
      acc[name] = Math.ceil(scene.duration * FPS);
      return acc;
    }, {} as Record<string, number>);
    
    // Usage in composition:
    // <Series.Sequence durationInFrames={sceneDurations.title}>
    ```
    
    ### Audio Timing Patterns
    
    ```tsx
    import { Audio, Sequence, interpolate, useCurrentFrame } from 'remotion';
    
    // Fade in audio
    export const FadeInAudio: React.FC<{ src: string; fadeFrames?: number }> = ({
      src,
      fadeFrames = 30
    }) => {
      const frame = useCurrentFrame();
      const volume = interpolate(frame, [0, fadeFrames], [0, 1], {
        extrapolateRight: 'clamp',
      });
      return <Audio src={src} volume={volume} />;
    };
    
    // Delayed audio start
    export const DelayedAudio: React.FC<{ src: string; delayFrames: number }> = ({
      src,
      delayFrames
    }) => (
      <Sequence from={delayFrames}>
        <Audio src={src} />
      </Sequence>
    );
    
    // Usage:
    // <FadeInAudio src={staticFile('audio/music.mp3')} fadeFrames={60} />
    // <DelayedAudio src={staticFile('audio/sfx/whoosh.mp3')} delayFrames={45} />
    ```
    
    ### Voiceover + Demo Video Sync
    
    When a scene has both voiceover and demo video:
    
    ```tsx
    import { Audio, OffthreadVideo, staticFile, useVideoConfig } from 'remotion';
    
    export const DemoScene: React.FC = () => {
      const { durationInFrames, fps } = useVideoConfig();
    
      // Calculate playback rate to fit demo into voiceover duration
      const demoDuration = 45; // seconds (original demo length)
      const sceneDuration = durationInFrames / fps; // seconds (from voiceover)
      const playbackRate = demoDuration / sceneDuration;
    
      return (
        <>
          <OffthreadVideo
            src={staticFile('demos/feature-demo.mp4')}
            playbackRate={playbackRate}
          />
          <Audio src={staticFile('audio/scenes/scene-04-demo.mp3')} />
        </>
      );
    };
    ```
    
    ### Error Handling
    
    ```tsx
    import { Audio, staticFile, delayRender, continueRender } from 'remotion';
    import { useEffect, useState } from 'react';
    
    export const SafeAudio: React.FC<{ src: string }> = ({ src }) => {
      const [handle] = useState(() => delayRender());
      const [audioReady, setAudioReady] = useState(false);
    
      useEffect(() => {
        const audio = new window.Audio(src);
        audio.oncanplaythrough = () => {
          setAudioReady(true);
          continueRender(handle);
        };
        audio.onerror = () => {
          console.error(`Failed to load audio: ${src}`);
          continueRender(handle); // Continue without audio rather than hang
        };
      }, [src, handle]);
    
      if (!audioReady) return null;
      return <Audio src={src} />;
    };
    ```
    
    ### Toolkit Command: /generate-voiceover
    
    The `/generate-voiceover` command handles the full workflow:
    
    ```
    /generate-voiceover
    
    1. Reads VOICEOVER-SCRIPT.md
    2. Extracts narration for each scene
    3. Generates audio via ElevenLabs API
    4. Saves to public/audio/scenes/
    5. Creates manifest.json with durations
    6. Updates project.json with timing info
    ```
    
    ## Popular Voices
    
    - George: `JBFqnCBsd6RMkjVDRZzb` (warm narrator)
    - Rachel: `21m00Tcm4TlvDq8ikWAM` (clear female)
    - Adam: `pNInz6obpgDQGcFmaJgB` (professional male)
    
    List all: `client.voices.get_all()`
    
    For full API docs, see [reference.md](reference.md).
    
  • .claude/skills/ffmpeg/SKILL.md (skill, 13247 bytes)
    ---
    name: ffmpeg
    description: Video and audio processing with FFmpeg. Use for format conversion, resizing, compression, audio extraction, and preparing assets for Remotion. Triggers include converting GIF to MP4, resizing video, extracting audio, compressing files, or any media transformation task.
    ---
    
    # FFmpeg for Video Production
    
    FFmpeg is the essential tool for video/audio processing. This skill covers common operations for Remotion video projects.
    
    ## Quick Reference
    
    ### GIF to MP4 (Remotion-compatible)
    
    ```bash
    ffmpeg -i input.gif -movflags faststart -pix_fmt yuv420p \
      -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" output.mp4
    ```
    
    **Why these flags:**
    - `-movflags faststart` - Moves metadata to start for web streaming
    - `-pix_fmt yuv420p` - Ensures compatibility with most players
    - `scale=trunc(...)` - Forces even dimensions (required by most codecs)
    
    ### Resize Video
    
    ```bash
    # To 1920x1080 (maintain aspect ratio, add black bars)
    ffmpeg -i input.mp4 -vf "scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2" output.mp4
    
    # To 1920x1080 (crop to fill)
    ffmpeg -i input.mp4 -vf "scale=1920:1080:force_original_aspect_ratio=increase,crop=1920:1080" output.mp4
    
    # Scale to width, auto height
    ffmpeg -i input.mp4 -vf "scale=1280:-2" output.mp4
    ```
    
    ### Compress Video
    
    ```bash
    # Good quality, smaller file (CRF 23 is default, lower = better quality)
    ffmpeg -i input.mp4 -c:v libx264 -crf 23 -preset medium -c:a aac -b:a 128k output.mp4
    
    # Aggressive compression for web preview
    ffmpeg -i input.mp4 -c:v libx264 -crf 28 -preset fast -c:a aac -b:a 96k output.mp4
    
    # Target file size (e.g., ~10MB for 60s video = ~1.3Mbps)
    ffmpeg -i input.mp4 -c:v libx264 -b:v 1300k -c:a aac -b:a 128k output.mp4
    ```
    
    ### Extract Audio
    
    ```bash
    # Extract to MP3
    ffmpeg -i input.mp4 -vn -acodec libmp3lame -q:a 2 output.mp3
    
    # Extract to AAC
    ffmpeg -i input.mp4 -vn -acodec aac -b:a 192k output.m4a
    
    # Extract to WAV (uncompressed)
    ffmpeg -i input.mp4 -vn output.wav
    ```
    
    ### Convert Audio Formats
    
    ```bash
    # M4A to MP3 (for ElevenLabs voice samples)
    ffmpeg -i input.m4a -codec:a libmp3lame -qscale:a 2 output.mp3
    
    # WAV to MP3
    ffmpeg -i input.wav -codec:a libmp3lame -b:a 192k output.mp3
    
    # Adjust volume
    ffmpeg -i input.mp3 -filter:a "volume=1.5" output.mp3
    ```
    
    ### Trim/Cut Video
    
    ```bash
    # Cut from timestamp to duration (recommended - reliable)
    ffmpeg -i input.mp4 -ss 00:00:30 -t 00:00:15 -c:v libx264 -c:a aac output.mp4
    
    # Cut from timestamp to timestamp
    ffmpeg -i input.mp4 -ss 00:00:30 -to 00:00:45 -c:v libx264 -c:a aac output.mp4
    
    # Stream copy (faster but may lose frames at cut points)
    # Only use when source has frequent keyframes
    ffmpeg -i input.mp4 -ss 00:00:30 -t 00:00:15 -c copy output.mp4
    ```
    
    **Note:** Re-encoding is recommended for trimming. Stream copy (`-c copy`) can silently drop video if the seek point doesn't align with a keyframe.
    
    ### Speed Up / Slow Down
    
    ```bash
    # 2x speed (video and audio)
    ffmpeg -i input.mp4 -filter_complex "[0:v]setpts=0.5*PTS[v];[0:a]atempo=2.0[a]" -map "[v]" -map "[a]" output.mp4
    
    # 0.5x speed (slow motion)
    ffmpeg -i input.mp4 -filter_complex "[0:v]setpts=2.0*PTS[v];[0:a]atempo=0.5[a]" -map "[v]" -map "[a]" output.mp4
    
    # Video only (no audio)
    ffmpeg -i input.mp4 -filter:v "setpts=0.5*PTS" -an output.mp4
    ```
    
    ### Concatenate Videos
    
    ```bash
    # Create file list
    echo "file 'clip1.mp4'" > list.txt
    echo "file 'clip2.mp4'" >> list.txt
    echo "file 'clip3.mp4'" >> list.txt
    
    # Concatenate (same codec/resolution)
    ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4
    
    # Concatenate with re-encoding (different sources)
    ffmpeg -f concat -safe 0 -i list.txt -c:v libx264 -c:a aac output.mp4
    ```
    
    ### Add Fade In/Out
    
    ```bash
    # Fade in first 1 second, fade out last 1 second (st=9 assumes a 10-second video; use st = duration - 1)
    ffmpeg -i input.mp4 -vf "fade=t=in:st=0:d=1,fade=t=out:st=9:d=1" -c:a copy output.mp4
    
    # Audio fade
    ffmpeg -i input.mp4 -af "afade=t=in:st=0:d=1,afade=t=out:st=9:d=1" -c:v copy output.mp4
    ```
    
    ### Get Video Info
    
    ```bash
    # Duration, resolution, codec info
    ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 input.mp4
    
    # Full info
    ffprobe -v quiet -print_format json -show_format -show_streams input.mp4
    ```
    
    ## Remotion-Specific Patterns
    
    ### Video Speed Adjustment for Remotion
    
    **When to use FFmpeg vs Remotion `playbackRate`:**
    
    | Scenario | Use FFmpeg | Use Remotion |
    |----------|------------|--------------|
    | Constant speed (1.5x, 2x) | Either works | ✅ Simpler |
    | Extreme speeds (>4x or <0.25x) | ✅ More reliable | May have issues |
    | Variable speed (accelerate over time) | ✅ Pre-process | Complex workaround needed |
    | Need perfect audio sync | ✅ Guaranteed | Usually fine |
    | Demo needs to fit voiceover timing | ✅ Pre-calculate | Runtime adjustment |
    
    **Remotion limitation:** `playbackRate` must be constant. Dynamic interpolation like `playbackRate={interpolate(frame, [0, 100], [1, 5])}` won't work correctly because Remotion evaluates frames independently.
    
    ```bash
    # Speed up demo to fit a scene (e.g., 60s demo into 20s = 3x speed)
    ffmpeg -i demo-raw.mp4 \
      -filter_complex "[0:v]setpts=0.333*PTS[v];[0:a]atempo=3.0[a]" \
      -map "[v]" -map "[a]" \
      public/demos/demo-fast.mp4
    
    # Slow motion for emphasis (0.5x speed)
    ffmpeg -i action.mp4 \
      -filter_complex "[0:v]setpts=2.0*PTS[v];[0:a]atempo=0.5[a]" \
      -map "[v]" -map "[a]" \
      public/demos/action-slow.mp4
    
    # Speed up without audio (common for screen recordings)
    ffmpeg -i demo.mp4 -filter:v "setpts=0.5*PTS" -an public/demos/demo-2x.mp4
    
    # Timelapse effect (10x speed, drop audio)
    ffmpeg -i long-demo.mp4 -filter:v "setpts=0.1*PTS" -an public/demos/timelapse.mp4
    ```
    
    **Calculate speed factor:**
    - To fit X seconds of video into Y seconds of scene: `speed = X / Y`
    - setpts multiplier = `1 / speed` (e.g., 3x speed = setpts=0.333*PTS)
    - atempo value = `speed` (e.g., 3x speed = atempo=3.0)
    
    **Extreme speed (>2x audio):** Chain atempo filters (each limited to 0.5-2.0 range):
    ```bash
    # 4x speed audio
    -filter_complex "[0:a]atempo=2.0,atempo=2.0[a]"
    
    # 8x speed audio
    -filter_complex "[0:a]atempo=2.0,atempo=2.0,atempo=2.0[a]"
    ```
    
    ### Prepare Demo Recording for Remotion
    
    ```bash
    # Standard 1080p, 30fps, Remotion-ready
    ffmpeg -i raw-recording.mp4 \
      -vf "scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,fps=30" \
      -c:v libx264 -crf 18 -preset slow \
      -c:a aac -b:a 192k \
      -movflags faststart \
      public/demos/demo.mp4
    ```
    
    ### Screen Recording to Remotion Asset
    
    ```bash
    # From iPhone/iPad recording (usually 60fps, variable resolution)
    ffmpeg -i iphone-recording.mov \
      -vf "scale=1920:-2,fps=30" \
      -c:v libx264 -crf 20 \
      -an \
      public/demos/mobile-demo.mp4
    ```
    
    ### Batch Convert GIFs
    
    ```bash
    for f in assets/*.gif; do
      ffmpeg -i "$f" -movflags faststart -pix_fmt yuv420p \
        -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" \
        "public/demos/$(basename "$f" .gif).mp4"
    done
    ```
    
    ## Common Issues
    
    ### "Height not divisible by 2"
    Add scale filter: `-vf "scale=trunc(iw/2)*2:trunc(ih/2)*2"`
    
    ### Video won't play in browser
    Use: `-movflags faststart -pix_fmt yuv420p -c:v libx264`
    
    ### Audio out of sync after speed change
    Use filter_complex with atempo: `-filter_complex "[0:v]setpts=0.5*PTS[v];[0:a]atempo=2.0[a]"`
    
    ### File too large
    Increase CRF (23→28) or reduce resolution
    
    ## Quality Guidelines
    
    | Use Case | CRF | Preset | Notes |
    |----------|-----|--------|-------|
    | Archive/Master | 18 | slow | Best quality, large files |
    | Production | 20-22 | medium | Good balance |
    | Web/Preview | 23-25 | fast | Smaller files |
    | Draft/Quick | 28+ | veryfast | Fast encoding |
    
    ## Platform-Specific Output Optimization
    
    After Remotion renders your video (typically to `out/video.mp4`), use FFmpeg to optimize for each distribution platform.
    
    ### Workflow Integration
    
    ```
    Remotion render (master)     FFmpeg optimization      Platform upload
           ↓                            ↓                       ↓
       out/video.mp4  ────────→  out/video-youtube.mp4  ───→  YouTube
                      ────────→  out/video-twitter.mp4  ───→  Twitter/X
                      ────────→  out/video-linkedin.mp4 ───→  LinkedIn
                      ────────→  out/video-web.mp4      ───→  Website embed
    ```
    
    ### YouTube (Recommended Settings)
    
    YouTube re-encodes everything, so upload high quality:
    
    ```bash
    # YouTube optimized (1080p)
    ffmpeg -i out/video.mp4 \
      -c:v libx264 -preset slow -crf 18 \
      -profile:v high -level 4.0 \
      -bf 2 -g 30 \
      -c:a aac -b:a 192k -ar 48000 \
      -movflags +faststart \
      out/video-youtube.mp4
    
    # YouTube Shorts (vertical 1080x1920)
    ffmpeg -i out/video.mp4 \
      -vf "scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2" \
      -c:v libx264 -crf 18 -c:a aac -b:a 192k \
      out/video-shorts.mp4
    ```
    
    ### Twitter/X
    
    Twitter has strict limits: max 140s, 512MB, 1920x1200:
    
    ```bash
    # Twitter optimized (under 15MB target for fast upload)
    ffmpeg -i out/video.mp4 \
      -c:v libx264 -preset medium -crf 24 \
      -profile:v main -level 3.1 \
      -vf "scale='min(1280,iw)':'min(720,ih)':force_original_aspect_ratio=decrease" \
      -c:a aac -b:a 128k -ar 44100 \
      -movflags +faststart \
      -fs 15M \
      out/video-twitter.mp4
    
    # Check file size and duration
    ffprobe -v error -show_entries format=duration,size -of csv=p=0 out/video-twitter.mp4
    ```
    
    ### LinkedIn
    
    LinkedIn prefers MP4 with AAC audio, max 10 minutes:
    
    ```bash
    # LinkedIn optimized
    ffmpeg -i out/video.mp4 \
      -c:v libx264 -preset medium -crf 22 \
      -profile:v main \
      -vf "scale='min(1920,iw)':'min(1080,ih)':force_original_aspect_ratio=decrease" \
      -c:a aac -b:a 192k -ar 48000 \
      -movflags +faststart \
      out/video-linkedin.mp4
    ```
    
    ### Website/Embed (Optimized for Fast Loading)
    
    ```bash
    # Web-optimized MP4 (small file, progressive loading)
    ffmpeg -i out/video.mp4 \
      -c:v libx264 -preset medium -crf 26 \
      -profile:v baseline -level 3.0 \
      -vf "scale=1280:720" \
      -c:a aac -b:a 128k \
      -movflags +faststart \
      out/video-web.mp4
    
    # WebM alternative (better compression, though browser support is narrower than MP4)
    ffmpeg -i out/video.mp4 \
      -c:v libvpx-vp9 -crf 30 -b:v 0 \
      -vf "scale=1280:720" \
      -c:a libopus -b:a 128k \
      -deadline good \
      out/video-web.webm
    ```
    
    ### GIF (for Previews/Thumbnails)
    
    ```bash
    # High-quality GIF (first 5 seconds)
    ffmpeg -i out/video.mp4 -t 5 \
      -vf "fps=15,scale=480:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" \
      out/preview.gif
    
    # Smaller file GIF
    ffmpeg -i out/video.mp4 -t 3 \
      -vf "fps=10,scale=320:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" \
      out/preview-small.gif
    ```
    
    ### Platform Requirements Quick Reference
    
    | Platform | Max Resolution | Max Size | Max Duration | Audio |
    |----------|---------------|----------|--------------|-------|
    | YouTube | 8K | 256GB | 12 hours | AAC 48kHz |
    | Twitter/X | 1920x1200 | 512MB | 140s | AAC 44.1kHz |
    | LinkedIn | 4096x2304 | 5GB | 10 min | AAC 48kHz |
    | Instagram Feed | 1080x1350 | 4GB | 60s | AAC 48kHz |
    | Instagram Reels | 1080x1920 | 4GB | 90s | AAC 48kHz |
    | TikTok | 1080x1920 | 287MB | 10 min | AAC |
    
    ### Batch Export for All Platforms
    
    ```bash
    #!/bin/bash
    # save as: export-all-platforms.sh
    INPUT="out/video.mp4"
    
    # YouTube (high quality)
    ffmpeg -i "$INPUT" -c:v libx264 -preset slow -crf 18 \
      -c:a aac -b:a 192k -movflags +faststart \
      out/video-youtube.mp4
    
    # Twitter (compressed)
    ffmpeg -i "$INPUT" -c:v libx264 -crf 24 \
      -vf "scale='min(1280,iw)':'-2'" \
      -c:a aac -b:a 128k -movflags +faststart \
      out/video-twitter.mp4
    
    # LinkedIn
    ffmpeg -i "$INPUT" -c:v libx264 -crf 22 \
      -c:a aac -b:a 192k -movflags +faststart \
      out/video-linkedin.mp4
    
    # Web embed (small)
    ffmpeg -i "$INPUT" -c:v libx264 -crf 26 \
      -vf "scale=1280:720" \
      -c:a aac -b:a 128k -movflags +faststart \
      out/video-web.mp4
    
    echo "Exported:"
    ls -lh out/video-*.mp4
    ```
    
    ## Error Handling
    
    Common errors and fixes when processing video:
    
    ```bash
    # Check if FFmpeg succeeded
    ffmpeg -i input.mp4 -c:v libx264 output.mp4 && echo "Success" || echo "Failed: check input file"
    
    # Validate output file is playable
    ffprobe -v error -select_streams v:0 -show_entries stream=codec_name -of csv=p=0 output.mp4
    
    # Get detailed error info
    ffmpeg -v error -i input.mp4 -f null - 2>&1 | head -20
    ```
    
    ### Handling Common Failures
    
    | Error | Cause | Fix |
    |-------|-------|-----|
    | "No such file" | Input path wrong | Check path, use quotes for spaces |
    | "Invalid data" | Corrupted input | Re-download or re-record source |
    | "height not divisible by 2" | Odd dimensions | Add scale filter with trunc |
    | "encoder not found" | Missing codec | Install FFmpeg with full codecs |
    | Output 0 bytes | Silent failure | Check full ffmpeg output for errors |
    
    ---
    
    ## Feedback & Contributions
    
    If this skill is missing information or could be improved:
    
    - **Missing a command?** Describe what you needed
    - **Found an error?** Let me know what's wrong
    - **Want to contribute?** I can help you:
      1. Update this skill with improvements
      2. Create a PR to github.com/digitalsamba/claude-code-video-toolkit
    
    Just say "improve this skill" and I'll guide you through updating `.claude/skills/ffmpeg/SKILL.md`.
    
  • .claude/skills/qwen-edit/SKILL.md (skill, 2842 bytes)
    ---
    name: qwen-edit
    description: AI image editing prompting patterns for Qwen-Image-Edit. Use when editing photos while preserving identity, reframing cropped images, changing clothing or accessories, adjusting poses, applying style transfers, or character transformations. Provides prompt patterns, parameter tuning, and examples.
    ---
    
    # Qwen-Image-Edit Skill
    
    AI-powered image editing using Qwen-Image-Edit-2511 via RunPod serverless.
    
    **Status:** Evolving - learnings being captured as we experiment
    
    ## When to Use This Skill
    
    Use when the user wants to:
    - Edit/transform photos while preserving identity
    - Reframe cropped images (fix cut-off heads, etc.)
    - Change clothing, add accessories
    - Change pose (arm positions, hand placement)
    - Apply style transfers (cyberpunk, anime, oil painting)
    - Adjust lighting/color grading
    - Add/remove objects
    - Character transformations (Bond, Neo, etc.)
    
    ## When NOT to Use
    
    - **Background replacement (single image)** - creates cut-out artifacts, halos
    - **Face swapping** - cannot preserve identity from reference
    - **Outpainting** - can't extend canvas reliably
    
    ## Use With Care
    
    - **Multi-image compositing** - CAN work with explicit identity anchors (see examples.md for prompt patterns). Requires describing distinctive features (hair texture/color, ethnicity, outfit) and using guidance ~2.0
    - **Camera angle changes** - Inconsistent results. Vertical angles (low/high) work better than rotational (three-quarter view)
    
    ## Quick Reference
    
    ```bash
    # Basic edit
    python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses"
    
    # With negative prompt (recommended)
    python tools/image_edit.py --input photo.jpg \
      --prompt "Reframe as portrait with full head visible" \
      --negative "blur, distortion, artifacts"
    
    # Style transfer
    python tools/image_edit.py --input photo.jpg --style cyberpunk
    
    # Background (use cautiously - often fails)
    python tools/image_edit.py --input photo.jpg --background office
    
    # Higher quality
    python tools/image_edit.py --input photo.jpg --prompt "..." --steps 16 --guidance 3.0
    
    # Multi-image composite (identity-preserving)
    python tools/image_edit.py --input person.jpg background.jpg \
      --prompt "The [ethnicity] [gender] with [hair description] from first image is now in [scene] from second image. Same [features], [outfit]." \
      --negative "different ethnicity, different hair color, different face shape, generic stock photo" \
      --steps 16 --guidance 2.0
    ```
    
    ## Key Files
    
    - `prompting.md` - Prompt patterns and structure
    - `examples.md` - Good/bad examples from experiments
    - `parameters.md` - Tuning steps, guidance, negative prompts
    
    ## Tool Location
    
    `tools/image_edit.py` - CLI wrapper for RunPod endpoint
    
    ## Related Docs
    
    - `docs/qwen-edit-patterns.md` - Character transformation patterns
    - `.ai_dev/qwen-edit-research.md` - Research notes
    

README

claude-code-video-toolkit — NARRATE ▸ SCORE ▸ GENERATE ▸ COMPOSE ▸ RENDER

An AI-native video production workspace for Claude Code. Skills, commands, templates, and tools that give Claude Code everything it needs to help you create professional videos — from concept to final render.

Quick Start

```bash
git clone https://github.com/digitalsamba/claude-code-video-toolkit.git
cd claude-code-video-toolkit
python3 -m pip install -r tools/requirements.txt   # Optional: AI voiceover, image gen, music, moviepy examples
claude                                              # Open Claude Code in the toolkit
```

Then in Claude Code:

```
/setup                    # Configure cloud GPU, storage, voice (~5 min, mostly free)
/video                    # Create your first video
```

That's it. /setup walks you through everything interactively — cloud GPU provider, file transfer, voice config. /video creates a project from a template and guides you through the whole workflow.

What's free: The toolkit leans heavily on open-source AI models — voiceovers (Qwen3-TTS), image generation (FLUX.2), music (ACE-Step), and more. You deploy them to your own cloud GPU account and run them at cost. Cloudflare R2 has a generous free tier (10GB, zero egress), and Modal gives $30/month free compute on the Starter plan — more than enough for a few 5-minute videos a month.

Requirements: Node.js 18+ and Claude Code. Python 3.9+ recommended for AI tools. FFmpeg optional.

Want to skip setup and just render something?

```bash
cd examples/hello-world && npm install && npm run render
```

No API keys needed — outputs an MP4 immediately.


A Note from the Author (not AI-generated)

I've spent months painstakingly putting this toolkit together and plan to keep iterating on it. AI makes things easier, but hard work still has huge value. Every video I create is a chance for improvement — every skill, template, tool, and workflow here has been refined through that cycle. It would be wonderful if others wanted to get involved: use it, refine it, and feed what you learn back into the repo via an issue or PR.

My own use case is fairly specific: creating sprint review videos for the AI mobile development arm of Digital Samba. But the idea behind this project is a reusable toolkit for using Claude Code to autonomously generate any kind of "explainer" style video — product demos, walkthroughs, presentations, whatever you need. Autonomous video creation is a lofty ideal for such a subjective field, but we can try :)

What makes this work is that Claude Code is fantastically resourceful and flexible — give it the framing and tooling this toolkit provides, and it will adapt them to create templates and videos based on your prompting. The skills, templates, and tools here are building blocks. Claude Code is the builder. You are the director, editor, and designer.

If you're getting started, run /setup then /video and let Claude Code guide you. Or start with /template to create a template for your own use case.

Cloud GPU — I recommend Modal for running the toolkit's AI tools. The Starter plan gives you $30/month free compute, which is more than enough. RunPod is also supported as an alternative. Run /setup to deploy the tools you need.

My motto: Be brave. Experiment. And please share any videos you create or ideas you have back with the project — it helps me keep improving this toolkit for everyone.

Features

Skills

Claude Code has deep knowledge in:

| Skill | Description |
|-------|-------------|
| remotion | React-based video framework — compositions, animations, rendering |
| elevenlabs | AI audio — text-to-speech, voice cloning, music, sound effects |
| ffmpeg | Media processing — format conversion, compression, resizing |
| playwright-recording | Browser automation — record demos as video |
| frontend-design | Visual design refinement for distinctive, production-grade aesthetics |
| qwen-edit | AI image editing — prompting patterns and best practices |
| acestep | AI music generation — prompts, lyrics, scene presets, video integration |
| ltx2 | AI video generation — text-to-video, image-to-video clips, prompting guide |
| moviepy | Python video composition — overlay text on LTX-2/SadTalker output, build.py-style projects |
| runpod | Cloud GPU — setup, Docker images, endpoint management, costs |

Commands

| Command | Description |
|---------|-------------|
| /setup | First-time setup — cloud GPU, file transfer, voice, prerequisites |
| /video | Video projects — list, resume, or create new |
| /scene-review | Scene-by-scene review in Remotion Studio |
| /design | Focused design refinement session for a scene |
| /brand | Brand profiles — list, edit, or create new |
| /template | List available templates or create new ones |
| /skills | List installed skills or create new ones |
| /contribute | Share improvements — issues, PRs, examples |
| /record-demo | Record browser interactions with Playwright |
| /generate-voiceover | Generate AI voiceover from a script |
| /redub | Redub existing video with a different voice |
| /voice-clone | Record, test, and save a cloned voice to a brand |
| /versions | Check dependency versions and toolkit updates |

Note: After creating or modifying commands/skills, restart Claude Code to load changes.

Templates

Pre-built video structures in templates/:

  • sprint-review — Sprint review videos with demos, stats, and voiceover
  • sprint-review-v2 — Composable scene-based sprint review with modular architecture
  • product-demo — Marketing videos with dark tech aesthetic, stats, CTA

See examples/ for finished projects you can learn from (oldest first, showing toolkit evolution):

| Date | Demo | Description |
|------|------|-------------|
| 2025-12-05 | sprint-review-cho-oyu | iOS sprint review with demos |
| 2025-12-10 | digital-samba-skill-demo | Product demo showcasing Claude Code skill |
| 2026-01-22 | ds-remote-mcp | Remote MCP server demo (the jazz background music is a joke) |
| 2026-01-25 | schlumbergera | Android sprint review video |
| 2026-02-23 | cortina | Mobile platforms sprint review |
| 2026-03-15 | the-space-between | AI-generated video essay — flux2 avatar, Qwen3-TTS voice, SadTalker animation |
| 2026-04-08 | q2-townhall-longarm-ad | Super Bowl-style launch ad with dramatic Qwen3-TTS announcer and LTX-2 animated Lugh cameo |
| 2026-04-08 | q2-townhall-stars | GitHub star history time-lapse with animated chart and deadpan-to-excited commentary |

Scene Transitions

The toolkit includes a transitions library for scene-to-scene effects:

| Transition | Description |
|------------|-------------|
| glitch() | Digital distortion with RGB shift |
| rgbSplit() | Chromatic aberration effect |
| zoomBlur() | Radial motion blur |
| lightLeak() | Cinematic lens flare |
| clockWipe() | Radial sweep reveal |
| pixelate() | Digital mosaic dissolution |
| checkerboard() | Grid-based reveal (9 patterns) |

Plus official Remotion transitions: slide(), fade(), wipe(), flip()

Preview all transitions:

```bash
cd showcase/transitions && npm install && npm run studio
```

See lib/transitions/README.md for full documentation.

Brand Profiles

Define visual identity in brands/. When you create a project with /video, the brand's colors, fonts, and styling are automatically applied.

```
brands/my-brand/
├── brand.json    # Colors, fonts, typography
├── voice.json    # ElevenLabs voice settings
└── assets/       # Logo, backgrounds
```

Included brands: default, digital-samba

Create your own with /brand.

Project Management System

Video projects are tracked through a multi-session lifecycle:

planning → assets → review → audio → editing → rendering → complete

Each project has a project.json that tracks:

  • Scenes — What to show, asset status, visual types
  • Audio — Voiceover and music status
  • Sessions — Work history across Claude Code sessions
  • Phase — Current stage in the workflow

The system automatically reconciles intent (what you planned) with reality (what files exist), and generates a CLAUDE.md per project for instant context when resuming.
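
A sketch of the shape (field names here are illustrative, not the actual schema):

```json
{
  "_comment": "Illustrative shape only; see lib/project/README.md for the actual schema",
  "phase": "assets",
  "scenes": [
    { "id": "scene-02-problem", "visual": "slide", "assets": "pending" }
  ],
  "audio": { "voiceover": "pending", "music": "pending" },
  "sessions": [
    { "date": "2026-04-08", "summary": "Planned scenes, drafted script" }
  ]
}
```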

See lib/project/README.md for schema details, scene status tracking, and filesystem reconciliation logic.

Python Tools

Audio, video, and image tools in tools/:

```bash
# Generate voiceover (ElevenLabs)
python tools/voiceover.py --script script.md --output voiceover.mp3

# Generate voiceover (Qwen3-TTS — self-hosted, cheaper alternative)
python tools/voiceover.py --provider qwen3 --speaker Ryan --scene-dir public/audio/scenes --json
python tools/qwen3_tts.py --text "Hello world" --tone warm --output hello.mp3

# Generate background music (ElevenLabs)
python tools/music.py --prompt "Upbeat corporate" --duration 120 --output music.mp3

# Generate background music (ACE-Step — free cloud API, XL Turbo 4B model)
python tools/music_gen.py --preset corporate-bg --duration 120 --output music.mp3
python tools/music_gen.py --prompt "Dramatic cinematic" --duration 30 --bpm 90 --key "D Minor" --output reveal.mp3
python tools/music_gen.py --prompt "Upbeat indie rock" --duration 60 --variations 4 --output intro.mp3

# Generate sound effects
python tools/sfx.py --preset whoosh --output sfx.mp3

# Redub video with different voice
python tools/redub.py --input video.mp4 --voice-id VOICE_ID --output dubbed.mp4

# Add background music to existing video
python tools/addmusic.py --input video.mp4 --prompt "Subtle ambient" --output output.mp4

# Rebrand NotebookLM videos (trim outro, add your logo/URL)
python tools/notebooklm_brand.py --input video.mp4 --logo logo.png --url "mysite.com" --output branded.mp4

# AI image editing (style transfer, backgrounds, custom prompts)
python tools/image_edit.py --input photo.jpg --style cyberpunk --cloud modal
python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" --cloud modal

# AI image upscaling (2x/4x)
python tools/upscale.py --input photo.jpg --output photo_4x.png --cloud modal

# Remove watermarks (requires cloud GPU)
python tools/dewatermark.py --input video.mp4 --preset sora --output clean.mp4 --cloud modal

# Locate watermark coordinates
python tools/locate_watermark.py --input video.mp4 --grid --output-dir ./review/

# Generate talking head video from image + audio (SadTalker)
python tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output talking.mp4 --cloud modal

# AI image generation (FLUX.2 Klein 4B — text-to-image + editing)
python tools/flux2.py --prompt "A sunset over mountains" --cloud modal
python tools/flux2.py --preset title-bg --brand digital-samba --cloud modal
python tools/flux2.py --list-presets

# AI video generation (LTX-2.3 22B — text-to-video + image-to-video)
python tools/ltx2.py --prompt "A sunset over the ocean, cinematic" --cloud modal
python tools/ltx2.py --prompt "Gentle camera drift" --input photo.jpg --cloud modal

Tool Categories:

| Type | Tools | Purpose |
|------|-------|---------|
| Project | voiceover, music, music_gen, sfx | Used during video creation workflow |
| Utility | redub, addmusic, notebooklm_brand, locate_watermark | Quick transformations, no project needed |
| Cloud GPU | image_edit, upscale, dewatermark, sadtalker, qwen3_tts, flux2, music_gen, ltx2 | AI processing via Modal or RunPod |

Cloud GPU (Modal + RunPod)

8 AI tools run on cloud GPUs. Use --cloud modal (recommended) or --cloud runpod on any tool.

| Tool | What It Does | Est. Cost |
|------|--------------|-----------|
| qwen3_tts | AI text-to-speech (9 speakers, voice cloning) | ~$0.01 |
| flux2 | AI image generation & editing | ~$0.02 |
| image_edit | AI image editing & style transfer | ~$0.03 |
| upscale | AI image upscaling (2x/4x) | ~$0.01 |
| music_gen | AI music generation (8 scene presets) | Free (acemusic) / ~$0.05 (self-hosted) |
| sadtalker | Talking head video from portrait + audio | ~$0.10 |
| ltx2 | AI video generation (text-to-video, image-to-video) | ~$0.23 |
| dewatermark | Video watermark removal | ~$0.10 |

Modal (recommended): Each tool deploys from docker/modal-*/app.py — Modal builds and hosts the containers. $30/month free compute on the Starter plan, typical usage is $1-2/month. Run /setup to deploy all tools automatically.

RunPod (alternative): Uses pre-built Docker images from ghcr.io/conalmullan/video-toolkit-*. Pay-per-second, no minimums. Run python3 tools/<tool>.py --setup to create endpoints.

See docs/modal-setup.md and docs/runpod-setup.md for details.

Project Structure

```
claude-code-video-toolkit/
├── .claude/
│   ├── skills/          # Domain knowledge for Claude
│   └── commands/        # Slash commands (/video, /brand, etc.)
├── lib/                 # Shared components, theme system, utilities
│   ├── components/      # Reusable video components (11 components)
│   ├── transitions/     # Scene transition effects (7 custom + 4 official)
│   ├── theme/           # ThemeProvider, useTheme
│   └── project/         # Multi-session project system
├── tools/               # Python CLI tools
├── templates/           # Video templates
├── brands/              # Brand profiles
├── projects/            # Your video projects (gitignored)
├── examples/            # Curated showcase projects with finished videos
├── assets/              # Shared assets
├── playwright/          # Recording infrastructure
├── docs/                # Documentation
└── _internal/           # Toolkit metadata & roadmap
```

Documentation

Video Workflow

/video → Script → Assets → Scene Review → Design → Audio → Preview → Render
  1. Create project — Run /video, choose template and brand
  2. Review script — Edit VOICEOVER-SCRIPT.md to plan content and assets
  3. Gather assets — Record demos with /record-demo or add external videos
  4. Scene review — Run /scene-review to verify visuals in Remotion Studio
  5. Design refinement — Use /design to improve slide visuals with the frontend-design skill
  6. Generate audio — AI voiceover with /generate-voiceover
  7. Configure — Update config file with asset paths and timing
  8. Preview — npm run studio for live preview
  9. Iterate — Work with Claude Code to adjust timing, styling, content
  10. Render — npm run render for final MP4

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

License

MIT License — see LICENSE for details.


Built for use with Claude Code by Anthropic.