Curated Claude Code catalog
Updated 07.05.2026 · 19:39 CET
01 / Skill
digitalsamba

claude-code-video-toolkit

Quality: 9.0

This toolkit transforms Claude Code into an AI-native video production workspace, integrating specialized skills, commands, and templates for creating professional videos. It excels at autonomously generating "explainer" style content, such as product demos, walkthroughs, and sprint reviews, by orchestrating open-source AI models for voiceovers, image generation, and music composition.

USP

It uniquely positions Claude Code as the "builder" and "director" of video projects, offering a flexible, AI-orchestrated workflow with deep integration of open-source AI models. This toolkit provides a structured yet adaptable framework f…

Use cases

  • Creating AI-generated explainer videos
  • Producing sprint review videos with demos
  • Developing product demo videos
  • Automating video content creation
  • Composing programmatic video with AI elements

Detected files (8)

  • .claude/skills/frontend-design/SKILL.md (skill, 4275 bytes)
    ---
    name: frontend-design
    description: Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
    license: Complete terms in LICENSE.txt
    ---
    
    This skill guides creation of distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Implement real working code with exceptional attention to aesthetic details and creative choices.
    
    The user provides frontend requirements: a component, page, application, or interface to build. They may include context about the purpose, audience, or technical constraints.
    
    ## Design Thinking
    
    Before coding, understand the context and commit to a BOLD aesthetic direction:
    - **Purpose**: What problem does this interface solve? Who uses it?
    - **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. Use these for inspiration, but design one that stays true to the chosen aesthetic direction.
    - **Constraints**: Technical requirements (framework, performance, accessibility).
    - **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?
    
    **CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work - the key is intentionality, not intensity.
    
    Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
    - Production-grade and functional
    - Visually striking and memorable
    - Cohesive with a clear aesthetic point-of-view
    - Meticulously refined in every detail
    
    ## Frontend Aesthetics Guidelines
    
    Focus on:
    - **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for unexpected, characterful choices that elevate the frontend's aesthetics. Pair a distinctive display font with a refined body font.
    - **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes.
    - **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. Use scroll-triggering and hover states that surprise.
    - **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
    - **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Add contextual effects and textures that match the overall aesthetic. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, custom cursors, and grain overlays.
    
    NEVER use generic AI-generated aesthetics: overused font families (Inter, Roboto, Arial, system fonts), clichéd color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, or cookie-cutter design that lacks context-specific character.
    
    Interpret creatively and make unexpected choices that feel genuinely designed for the context. No design should be the same. Vary between light and dark themes, different fonts, different aesthetics. NEVER converge on common choices (Space Grotesk, for example) across generations.
    
    **IMPORTANT**: Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well.
    
    Remember: Claude is capable of extraordinary creative work. Don't hold back; show what can truly be created when thinking outside the box and committing fully to a distinctive vision.
    
  • .claude/skills/ltx2/SKILL.md (skill, 9783 bytes)
    ---
    name: ltx2
    description: AI video generation with LTX-2.3 22B — text-to-video, image-to-video clips for video production. Use when generating video clips, animating images, creating b-roll, animated backgrounds, or motion content. Triggers include video generation, animate image, b-roll, motion, video clip, text-to-video, image-to-video.
    ---
    
    # LTX-2.3 Video Generation
    
    Generate ~5 second video clips from text prompts or images using the LTX-2.3 22B DiT model.
    Runs on Modal (A100-80GB). Requires `MODAL_LTX2_ENDPOINT_URL` in `.env`.
    
    ## Quick Reference
    
    ```bash
    # Text-to-video
    python3 tools/ltx2.py --prompt "A sunset over the ocean, golden light on waves, cinematic" --output sunset.mp4
    
    # Image-to-video (animate a still image)
    python3 tools/ltx2.py --prompt "Gentle camera drift, soft ambient motion" --input photo.jpg --output animated.mp4
    
    # Custom resolution and duration
    python3 tools/ltx2.py --prompt "..." --width 1024 --height 576 --num-frames 161 --output wide.mp4
    
    # Fast mode (fewer steps, quicker)
    python3 tools/ltx2.py --prompt "..." --quality fast --output quick.mp4
    
    # Reproducible output
    python3 tools/ltx2.py --prompt "..." --seed 42 --output reproducible.mp4
    ```
    
    ## Parameters
    
    | Parameter | Default | Description |
    |-----------|---------|-------------|
    | `--prompt` | (required) | Text description of the video |
    | `--input` | - | Input image for image-to-video |
    | `--width` | 768 | Video width (divisible by 64) |
    | `--height` | 512 | Video height (divisible by 64) |
    | `--num-frames` | 121 | Frame count, must satisfy `(n-1) % 8 == 0` |
    | `--fps` | 24 | Frames per second |
    | `--quality` | standard | `standard` (30 steps) or `fast` (15 steps) |
    | `--steps` | 30 | Override inference steps directly |
    | `--seed` | random | Seed for reproducibility |
    | `--output` | auto | Output file path |
    | `--negative-prompt` | sensible default | What to avoid |
    | `--lora` | none | Style LoRA preset. Currently: `crt-terminal`. |
    
    ## Style LoRAs
    
    Style LoRAs bias the output toward a specific visual aesthetic. They're baked into the Modal image and selected per-request; switching LoRAs forces a pipeline rebuild, a one-time ~60s cost per switch within each container's lifetime.
    
    ### `crt-terminal` — CRT / pixel-art terminals
    
    Base: LTX-2.3 22B, trained by [@lovis93](https://huggingface.co/lovis93/crt-animation-terminal-ltx-2.3-lora) (Apache 2.0).
    
    ```bash
    # Trigger word is auto-prepended — write the prompt normally
    python3 tools/ltx2.py --lora crt-terminal \
      --prompt "a terminal typing out \"\\$ claude --continue\" character by character in glowing green pixel font, scanlines, phosphor glow, low choppy frame rate, hacker mood" \
      --output crt_claude.mp4
    ```
    
    **What the preset changes:**
    - Prepends `crtanim,` to the prompt (the LoRA's trigger word)
    - Defaults to 1024×1024, 121 frames (the ratio it was trained on)
    - Relaxes the default negative prompt so on-screen text isn't filtered out
    
    **Prompt pattern:** `<CRT aesthetic> → <color palette> → <animation style> → <subject> → <literal text in quotes> → <mood>`. Keep on-screen text to 1–3 words — the model can't render long strings reliably. The LoRA prefers static framing; ask for camera moves explicitly if you want them.
    
    ## Valid Frame Counts
    
    `(n - 1) % 8 == 0`: 25 (~1s), 49 (~2s), 73 (~3s), 97 (~4s), **121 (~5s default)**, 161 (~6.7s), 193 (~8s max practical).
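
    A quick helper for turning a target duration into a valid `--num-frames` value (a minimal sketch; the validity rule and 24 fps default come from the tables above):

    ```python
    def nearest_frame_count(seconds: float, fps: int = 24) -> int:
        """Snap a target duration to the nearest valid LTX-2 frame count."""
        n = round(seconds * fps)
        n = 8 * round((n - 1) / 8) + 1   # enforce (n - 1) % 8 == 0
        return max(25, min(n, 193))      # clamp to the practical range

    for s in (1, 2.5, 5, 8):
        print(f"{s}s -> --num-frames {nearest_frame_count(s)}")
    ```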
    
    ## Common Resolutions
    
    | Resolution | Ratio | Notes |
    |------------|-------|-------|
    | 768x512 | 3:2 | Default, good balance |
    | 512x512 | 1:1 | Square, fastest |
    | 1024x576 | 16:9 | Widescreen |
    | 576x1024 | 9:16 | Portrait/vertical |
    
    ## Prompting Guide
    
    LTX-2 responds well to cinematographic descriptions. Layer these dimensions:
    
    - **Camera:** "Slow dolly forward", "Aerial drone shot", "Tracking shot", "Static wide angle"
    - **Lighting:** "Golden hour", "Cinematic lighting", "Neon-lit", "Soft diffused light"
    - **Motion:** "Timelapse of...", "Slow motion", "Gentle camera drift", "Gradually transitions"
    - **Style:** "Shot on 35mm film", "Documentary style", "Clean minimal aesthetic"
    - **Negative:** Always implicitly avoids "worst quality, blurry, jittery, watermark, text, logo"
    
    Keep prompts under 200 words. Be specific about the scene.
    
    ### Good Prompts
    
    ```
    # Atmospheric b-roll
    "Aerial drone shot slowly flying over turquoise ocean waves breaking on white sand, golden hour sunlight, cinematic"
    
    # Product/tech scene
    "Close-up of hands typing on a mechanical keyboard, shallow depth of field, soft desk lamp lighting, cozy atmosphere"
    
    # Abstract background
    "Dark moody abstract background with flowing blue light streaks, subtle geometric grid, bokeh particles floating, cinematic tech atmosphere"
    
    # Animate a portrait
    "Professional headshot, subtle natural head movement, confident warm expression, studio lighting, shallow depth of field"
    
    # Animate a slide/screenshot
    "Gentle subtle particle effects floating across a presentation slide, soft ambient light shifts, very slight camera drift"
    ```
    
    ### Bad Prompts
    
    ```
    # Too vague
    "A cool video"
    
    # Too many competing ideas
    "A cat riding a skateboard while juggling fire on the moon during a thunderstorm"
    
    # Describing text/UI (model can't render text reliably)
    "A website showing the text 'Welcome to our platform'"
    ```
    
    ## Video Production Use Cases
    
    ### B-Roll Clips
    Generate atmospheric 5s shots for cutaways between narrated scenes:
    ```bash
    python3 tools/ltx2.py --prompt "Futuristic holographic interface, glowing data visualizations, clean workspace, cinematic" --output broll_tech.mp4
    python3 tools/ltx2.py --prompt "Aerial view of European city at golden hour, modern architecture" --output broll_europe.mp4
    ```
    
    ### Animated Slide Backgrounds
    Feed a slide screenshot and add subtle motion:
    ```bash
    python3 tools/ltx2.py --prompt "Gentle particle effects, soft ambient light shifts, very slight camera drift" --input slide.png --output animated_slide.mp4
    ```
    
    ### Animated Portraits
    Bring still headshots to life:
    ```bash
    python3 tools/ltx2.py --prompt "Subtle natural head movement, warm expression, professional lighting" --input headshot.png --output animated_portrait.mp4
    ```
    
    ### Stylized Character Cameo (SadTalker Alternative)
    For non-realistic faces — fantasy characters, masked figures, heavy beards, helmets, illustrations — SadTalker often produces uncanny or broken lip sync because it's trained on photoreal humans. LTX-2 image-to-video is frequently a better choice when **lip-sync precision isn't critical** (the viewer's brain fills in the gap as long as something is moving). Prompt for *motion + atmosphere*, not phonemes:
    
    ```bash
    python3 tools/ltx2.py \
      --input character_portrait.png \
      --prompt "Ancient warrior speaks slowly with gravitas, beard shifts subtly, glowing aura pulses, embers drift past, slow head movement, cinematic close-up, mystical atmosphere" \
      --width 768 --height 768 \
      --output character_speaking.mp4
    ```
    
    **When LTX-2 wins over SadTalker:**
    - Stylized / illustrated / fantasy characters
    - Heavy facial hair or accessories obscuring the mouth
    - Masked or helmeted figures
    - Short cameo lines where atmosphere matters more than precision
    - Dramatic VO rather than dialogue
    
    **When SadTalker still wins:**
    - Photoreal human presenters
    - Full sentences where mouth shape needs to match phonemes
    - Tutorials / talking-head explainers where the viewer is effectively reading lips
    
    ### Branded Intro/Outro
    Generate abstract motion backgrounds for title cards:
    ```bash
    python3 tools/ltx2.py --prompt "Dark moody background with flowing blue and coral light streaks, bokeh particles, cinematic tech atmosphere, no text" --output intro_bg.mp4
    ```
    
    ### Combining with Other Tools
    
    LTX-2 generates raw clips. Combine with the rest of the toolkit:
    
    | Workflow | Tools |
    |----------|-------|
    | Generate clip → upscale | `ltx2.py` → `upscale.py` |
    | Generate clip → add to Remotion | `ltx2.py` → use as `<OffthreadVideo>` in composition |
    | Generate image → animate | `flux2.py` → `ltx2.py --input` |
    | Generate clip → extract audio | `ltx2.py` → `ffmpeg -i clip.mp4 -vn audio.wav` |
    | Generate clip → add voiceover | `ltx2.py` → mix with `qwen3_tts.py` output |
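
    These hand-offs are plain files on disk, so they script directly. A minimal sketch of the clip → extract-audio row (run from the repo root; only flags documented above are used):

    ```python
    import subprocess

    # Generate a clip, then strip its ambient audio track for separate mixing
    subprocess.run(
        ["python3", "tools/ltx2.py",
         "--prompt", "A candle flickering on a dark table, cinematic",
         "--output", "clip.mp4"],
        check=True,
    )
    subprocess.run(
        ["ffmpeg", "-y", "-i", "clip.mp4", "-vn", "audio.wav"],
        check=True,
    )
    ```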
    
    ## Technical Details
    
    - **Model:** LTX-2.3 22B DiT (Lightricks), bf16
    - **GPU:** A100-80GB on Modal (~$4.68/hr)
    - **Inference:** ~2.5 min per clip (768x512, 121 frames, 30 steps)
    - **Cost:** ~$0.20-0.25 per 5s clip
    - **Cold start:** ~60-90s (loading ~55GB weights)
    - **Output:** H.264 MP4 with synchronized ambient audio (24fps)
    - **Max duration:** ~8s (193 frames) per clip
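
    The per-clip cost follows from the hourly rate and inference time:

    ```python
    a100_per_hour = 4.68      # Modal A100-80GB rate quoted above
    minutes_per_clip = 2.5    # 768x512, 121 frames, 30 steps
    print(a100_per_hour * minutes_per_clip / 60)  # ~0.195, i.e. the ~$0.20/clip figure
    ```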
    
    ### Known Limitations
    
    - **Training data artifacts:** ~30% of generations may have unwanted logos/text from training data. Re-run with different `--seed`.
    - **Text rendering:** Cannot reliably generate readable text in video. Use Remotion overlays instead.
    - **Max duration:** ~8s per clip. Longer content needs stitching.
    - **Audio:** Generated audio is ambient/environmental only. Use voiceover/music tools for speech and music.
    - **License:** Community License — free under $10M revenue, commercial license needed above that.
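
    For the artifact problem in particular, it is often cheapest to batch a few seeds per prompt and keep the clean take (a sketch using only the documented flags; the seed values are arbitrary):

    ```python
    import subprocess

    prompt = "Aerial drone shot over turquoise ocean waves, golden hour, cinematic"
    for seed in (11, 42, 77):   # arbitrary seeds; review the takes, keep the best
        subprocess.run(
            ["python3", "tools/ltx2.py",
             "--prompt", prompt,
             "--seed", str(seed),
             "--output", f"take_{seed}.mp4"],
            check=True,
        )
    ```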
    
    ## Setup
    
    ```bash
    # 1. Create Modal secret for HuggingFace (one-time)
    modal secret create huggingface-token HF_TOKEN=hf_your_token
    
    # 2. Deploy (downloads ~55GB of weights, takes ~10 min)
    modal deploy docker/modal-ltx2/app.py
    
    # 3. Save endpoint URL to .env
    echo "MODAL_LTX2_ENDPOINT_URL=https://yourname--video-toolkit-ltx2-ltx2-generate.modal.run" >> .env
    
    # 4. Test
    python3 tools/ltx2.py --prompt "A candle flickering on a dark table, cinematic" --output test.mp4
    ```
    
    **Important:** HuggingFace token needs read-access scope. Accept the [Gemma 3 license](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized) before deploying. Unauthenticated downloads are severely rate-limited.
    
  • .claude/skills/moviepy/SKILL.md (skill, 13198 bytes)
    ---
    name: moviepy
    description: Python video composition with moviepy 2.x — overlaying deterministic text on AI-generated video (LTX-2, SadTalker), compositing clips, single-file build.py video projects. Use when adding labels/captions/lower-thirds to LTX-2 or SadTalker outputs, building short ad-style spots in pure Python without Remotion, or doing programmatic video composition. Triggers include text overlay on video, label LTX-2 clip, caption SadTalker output, lower third, build.py video, moviepy, Python video composition, sub-30s ad spot.
    ---
    
    # moviepy for Video Production
    
    moviepy is the toolkit's go-to library for **putting deterministic text on top of AI-generated video** and for building short, single-file Python video projects without a Remotion toolchain.
    
    The deeper principle is **trustworthy text**: any genre where text *has to* be readable, accurate, and consistent (legally, editorially, or commercially) is a genre where AI-rendered in-frame text is unacceptable and a moviepy overlay step is the natural fix. Names must be spelled right. Prices must be exact. Source attributions must be pixel-perfect. AI generation models cannot guarantee any of that.
    
    ## When to use moviepy vs. Remotion
    
    | Use moviepy when… | Use Remotion when… |
    |-------------------|---------------------|
    | Overlaying text/labels on an LTX-2 or SadTalker output | Building long-form sprint reviews or product demos |
    | Building sub-30s ad-style spots in a single `build.py` | Multi-template, multi-brand, design-heavy work |
    | Compositing data-driven visuals (matplotlib `FuncAnimation` → mp4) | Anything needing React components or design system reuse |
    | One-off transformations on existing video files | Anything where the project lifecycle (planning → render) matters |
    | You want zero Node.js / no React mental overhead | You want hot-reload preview in Remotion Studio |
    
    Two runnable references for everything in this skill live in `examples/`:
    
    - **`examples/quick-spot/build.py`** — 15-second ad-style spot. Audio-anchored timeline, text overlay, optional VO + ducked music. Renders silent out of the box with zero external assets.
    - **`examples/data-viz-chart/build.py`** — animated time-series chart with deterministic title and source attribution. Demonstrates the matplotlib (data) + moviepy (trustworthy text) split.
    
    Both run with `python3 build.py` and produce a real `out.mp4` immediately. Read them alongside this skill — every pattern below is shown working there.
    
    **Dependencies.** `moviepy`, `Pillow`, and `matplotlib` are declared in `tools/requirements.txt` and installed with the toolkit's one-line Python setup: `python3 -m pip install -r tools/requirements.txt`. If you hit `Missing dependency` when running an example, run that command from the repo root — the examples' `build.py` files will tell you the same thing in their error message and exit cleanly rather than printing a bare traceback.
    
    ## The main use case: text on AI-generated video
    
    Both LTX-2 and SadTalker output bare visuals:
    
    - **LTX-2** cannot reliably render readable text (the model hallucinates letterforms — see the ltx2 skill's "Bad Prompts").
    - **SadTalker** outputs a talking head with no captions, labels, lower thirds, or context.
    
    The fix is to generate the visual cleanly, then composite text over it deterministically with moviepy. This is the canonical pattern in this toolkit:
    
    ```python
    from moviepy import VideoFileClip, ImageClip, CompositeVideoClip
    
    # 1. AI-generated visual (LTX-2 or SadTalker output)
    bg = VideoFileClip("lugh_ltx.mp4").without_audio()
    
    # 2. Text rendered via PIL → ImageClip (see "Text rendering" below)
    title = (
        ImageClip("text_cache/intro_title.png")
        .with_duration(2.0)
        .with_start(0.5)
        .with_position(("center", 880))
    )
    
    # 3. Composite
    final = CompositeVideoClip([bg, title], size=(1920, 1080))
    final.write_videofile("lugh_with_caption.mp4", fps=30, codec="libx264")
    ```
    
    Common shapes this takes:
    
    | Shape | LTX-2 use | SadTalker use |
    |-------|-----------|---------------|
    | Title card over hero footage | "INTRODUCING LONGARM" over a cinematic LTX-2 b-roll | n/a |
    | Lower third / name plate | n/a | "Lugh — Ancient Warrior God" under a talking head |
    | Quote caption | "I am going home." over an LTX-2 character cameo | Same, over a SadTalker talking head |
    | Brand attribution | Logo + URL fade-in over the last second | Same |
    | Tinted overlay for contrast | Dark navy semi-transparent layer behind text | Same |
    
    ## Genres where this shines
    
    The "AI-visual + deterministic text overlay" pattern is the natural production pipeline for several styles of video. If the request matches one of these, reach for moviepy by default:
    
    | Genre | What you overlay | Why moviepy is the right call |
    |-------|------------------|-------------------------------|
    | **News / talking-head journalism** | Speaker name plates, location bars, breaking-news banners, source attribution, pull quotes | Names must be spelled right (editorial / legal). The biggest category by volume. |
    | **Documentary segments** | Interviewee lower thirds, chapter titles, archival source credits, location stamps | Same trust requirement as news. |
    | **Trailers / promo spots** | Title cards, credit overlays ("FROM THE DIRECTOR OF…"), date stings, quote cards, CTAs | Tightly timed, text-heavy, every frame matters. The `q2-townhall-longarm-ad` example is exactly this. |
    | **Social short-form (Reels, TikTok, Shorts)** | Word-accurate captions for sound-off viewing, hashtag overlays | Most social viewing is muted; captions are non-negotiable. |
    | **Product demos with annotations** | Pricing callouts, feature labels, "click here" pointers over screen recordings, before/after labels | Prices and product names must be exact. |
    | **Tutorials / explainers** | Step number overlays, terminal-command captions, keyboard-shortcut callouts | Step numbers must be sequential, commands must be copy-pasteable. |
    
    Lesser-but-real fits: music videos (lyric overlays), reaction videos (source attribution), sports recaps (score overlays), real-estate tours (price / sqft), conference talks (speaker + session plate).
    
    **For full SRT-driven subtitling** (long-form, time-coded, multilingual) moviepy is workable but not ideal — reach for `ffmpeg` with the `subtitles` filter or a dedicated subtitle tool. moviepy is best for hand-placed overlays, not bulk caption tracks.
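
    For that bulk-caption case, a minimal sketch of the `ffmpeg` route (assuming a standard `captions.srt`; the `subtitles` filter burns the captions into the frames, so the video is re-encoded):

    ```python
    import subprocess

    # Burn a full SRT caption track into the video; audio is copied untouched
    subprocess.run(
        ["ffmpeg", "-y", "-i", "input.mp4",
         "-vf", "subtitles=captions.srt",
         "-c:a", "copy",
         "subtitled.mp4"],
        check=True,
    )
    ```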
    
    ## Text rendering — use PIL, not `TextClip`
    
    **Critical gotcha:** moviepy 2.x's `TextClip(method='label')` has a tight-bbox bug that **clips letter ascenders and descenders** (the tops of capitals, the tails of g/p/y). On Apple Silicon you'll see characters with sliced edges and not realise what's wrong for hours.
    
    **The workaround:** render text to a transparent PNG via PIL, then load it as an `ImageClip`. Cache the result by content hash so re-builds are free.
    
    ```python
    import hashlib
    from pathlib import Path
    from PIL import Image, ImageDraw, ImageFont
    
    ARIAL_BOLD = "/System/Library/Fonts/Supplemental/Arial Bold.ttf"
    
    def render_text_png(txt, size, hex_color, cache_dir="./text_cache"):
        cache = Path(cache_dir); cache.mkdir(parents=True, exist_ok=True)
        key = hashlib.sha1(f"{txt}|{size}|{hex_color}".encode()).hexdigest()[:16]
        path = cache / f"{key}.png"
        if path.exists():
            return str(path)
    
        font = ImageFont.truetype(ARIAL_BOLD, size)
        bbox = ImageDraw.Draw(Image.new("RGBA", (1, 1))).textbbox((0, 0), txt, font=font)
        tw, th = bbox[2] - bbox[0], bbox[3] - bbox[1]
        pad = max(20, size // 4)
    
        img = Image.new("RGBA", (tw + pad * 2, th + pad * 2), (0, 0, 0, 0))
        rgb = tuple(int(hex_color.lstrip("#")[i:i+2], 16) for i in (0, 2, 4))
        ImageDraw.Draw(img).text((pad - bbox[0], pad - bbox[1]), txt, font=font, fill=(*rgb, 255))
        img.save(path)
        return str(path)
    ```
    
    The full helper (with kwargs for bold, position, fades, and cleaner ergonomics) is in `examples/quick-spot/build.py` — copy it rather than re-implementing.
    
    ## Audio-anchored timeline pattern
    
    For ad-style edits where every frame matters, generate per-scene VO first and anchor every visual to known absolute timestamps. This eliminates timing drift entirely. See **CLAUDE.md → Video Timing → Audio-Anchored Timelines** for the full pattern. The short version:
    
    ```python
    # Audio-anchored timeline (25s):
    #   Scene 1 tired      0.3 → 3.74  (audio 3.44s)
    #   Scene 2 worries    4.0 → 8.88  (audio 4.88s)
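    # text_clip / vo_clip are illustrative helpers; working versions live in examples/quick-spot/build.py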
    
    text_clip("TIRED OF",     start=0.5,  duration=1.2)
    text_clip("THIRD-PARTY",  start=1.0,  duration=1.8)
    vo_clip("01_tired.mp3",   start=0.3)
    vo_clip("02_worries.mp3", start=4.0)
    ```
    
    ## Common recipes
    
    ### Text on a single AI-generated clip
    
    ```python
    from moviepy import VideoFileClip, ImageClip, CompositeVideoClip
    
    bg = VideoFileClip("ltx_hero.mp4").without_audio()
    caption = (
        ImageClip(render_text_png("THE FUTURE OF AGENTS", 140, "#FFFFFF"))
        .with_duration(bg.duration)
        .with_position(("center", 880))
    )
    CompositeVideoClip([bg, caption], size=bg.size).write_videofile("captioned.mp4", fps=30)
    ```
    
    ### Lower third over a SadTalker talking head
    
    ```python
    from moviepy import VideoFileClip, ImageClip, ColorClip, CompositeVideoClip
    
    talking = VideoFileClip("narrator_sadtalker.mp4")
    W, H = talking.size
    
    # Semi-transparent bar across the bottom for contrast
    bar = (
        ColorClip((W, 140), color=(20, 24, 38))
        .with_duration(talking.duration)
        .with_opacity(0.75)
        .with_position(("center", H - 160))
    )
    name = (
        ImageClip(render_text_png("LUGH", 72, "#F06859"))
        .with_duration(talking.duration)
        .with_position((80, H - 150))
    )
    title = (
        ImageClip(render_text_png("Ancient Warrior God", 36, "#FFFFFF"))
        .with_duration(talking.duration)
        .with_position((80, H - 80))
    )
    CompositeVideoClip([talking, bar, name, title]).write_videofile("with_lower_third.mp4", fps=30)
    ```
    
    ### Tinted overlay for text contrast over busy footage
    
    LTX-2 b-roll is often too visually busy for legible text. Drop a semi-transparent navy layer between the video and the text:
    
    ```python
    from moviepy import ColorClip
    
    tint = (
        ColorClip((W, H), color=(20, 24, 38))
        .with_duration(duration)
        .with_opacity(0.55)
    )
    # Composite order: bg → tint → text
    CompositeVideoClip([bg, tint, text_clip])
    ```
    
    ### Side-by-side composite
    
    ```python
    from moviepy import VideoFileClip, CompositeVideoClip, ColorClip
    
    left  = VideoFileClip("demo_a.mp4").resized(width=960).with_position((  0, "center"))
    right = VideoFileClip("demo_b.mp4").resized(width=960).with_position((960, "center"))
    bg    = ColorClip((1920, 1080), color=(0, 0, 0)).with_duration(max(left.duration, right.duration))
    CompositeVideoClip([bg, left, right]).write_videofile("split.mp4", fps=30)
    ```
    
    ### Mix per-scene VO with ducked music
    
    ```python
    from moviepy import AudioFileClip, CompositeAudioClip
    from moviepy.audio.fx.MultiplyVolume import MultiplyVolume
    from moviepy.audio.fx.AudioFadeIn import AudioFadeIn
    from moviepy.audio.fx.AudioFadeOut import AudioFadeOut
    
    music = AudioFileClip("music.mp3").with_effects([
        MultiplyVolume(0.22),  # duck under VO
        AudioFadeIn(0.5),
        AudioFadeOut(1.5),
    ])
    vo = [
        AudioFileClip(f"scenes/0{i}.mp3").with_effects([MultiplyVolume(1.15)]).with_start(start)
        for i, start in [(1, 0.3), (2, 4.0), (3, 9.1)]
    ]
    final_audio = CompositeAudioClip([music] + vo)
    ```
    
    ## Gotchas
    
    - **moviepy 2.x renamed methods.** Use `subclipped` (not `subclip`), `with_duration` / `with_start` / `with_position` (not `set_duration` etc.), `with_effects([...])` instead of `.fadein()`/`.fadeout()`. Many tutorials online still show 1.x syntax — be skeptical.
    - **`TextClip(method='label')` clips ascenders/descenders.** Always use the PIL workaround above.
    - **`OffthreadVideo` is Remotion-only.** moviepy uses `VideoFileClip`. Don't mix the two.
    - **Resizing requires Pillow ≥ 10.0** for the LANCZOS resample. If you see `ANTIALIAS` errors, upgrade Pillow.
    - **`ColorClip` takes RGB tuples, not hex strings.** Use `(20, 24, 38)`, not `"#141826"`.
    - **Audio in `VideoFileClip` is loaded by default.** Call `.without_audio()` if you only want the visual — composing with audio you don't want will cause silent VO drops in `CompositeAudioClip`.
    - **Always set `size=(W, H)` on `CompositeVideoClip`.** Without it, output dimensions follow the first clip, which can be smaller than your target.
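
    The renames from the first gotcha, side by side (a hedged sketch; the 2.x spellings are the ones used throughout this skill):

    ```python
    from moviepy import VideoFileClip
    from moviepy.video.fx.FadeIn import FadeIn

    clip = (
        VideoFileClip("in.mp4")
        .subclipped(0, 5)                  # 1.x: .subclip(0, 5)
        .with_duration(5)                  # 1.x: .set_duration(5)
        .with_position(("center", 880))    # 1.x: .set_position(...)
        .with_effects([FadeIn(0.5)])       # 1.x: .fadein(0.5)
    )
    ```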
    
    ## When to reach for what
    
    | Task | Tool |
    |------|------|
    | Animate a still image | `tools/ltx2.py --input` |
    | Talking head from photoreal portrait | `tools/sadtalker.py` |
    | Talking head from stylized character | `tools/ltx2.py --input` (see ltx2 skill) |
    | **Add a label/caption/lower third to either of the above** | **moviepy + PIL (this skill)** |
    | Convert / compress / resize an existing file | `ffmpeg` (see ffmpeg skill) |
    | Long-form, design-system-driven video | Remotion (see remotion skill) |
    
    ## References
    
    - Runnable example — short ad-style spot: `examples/quick-spot/build.py`
    - Runnable example — data-viz with text overlay: `examples/data-viz-chart/build.py`
    - Audio-anchored timelines: `CLAUDE.md → Video Timing → Audio-Anchored Timelines`
    - Related skills: `ltx2`, `ffmpeg`, `remotion`
    
  • .claude/skills/playwright-recording/SKILL.md (skill, 12670 bytes)
    ---
    name: playwright-recording
    description: Record browser interactions as video using Playwright. Use for capturing demo videos, app walkthroughs, and UI flows for Remotion videos. Triggers include recording a demo, capturing browser video, screen recording a website, or creating walkthrough footage.
    ---
    
    # Playwright Video Recording
    
    Playwright can record browser interactions as video - perfect for demo footage in Remotion compositions.
    
    ## Quick Start
    
    ### Installation
    
    ```bash
    # In your video project
    npm init -y
    npm install -D playwright @playwright/test
    npx playwright install chromium
    ```
    
    ### Basic Recording Script
    
    ```typescript
    // scripts/record-demo.ts
    import { chromium } from 'playwright';
    
    async function recordDemo() {
      const browser = await chromium.launch();
      const context = await browser.newContext({
        viewport: { width: 1920, height: 1080 },
        recordVideo: {
          dir: './recordings',
          size: { width: 1920, height: 1080 }
        }
      });
    
      const page = await context.newPage();
    
      // Your recording actions
      await page.goto('https://example.com');
      await page.waitForTimeout(2000);
      await page.click('button.demo');
      await page.waitForTimeout(3000);
    
      // Close to save video
      await context.close();
      await browser.close();
    
      console.log('Recording saved to ./recordings/');
    }
    
    recordDemo();
    ```
    
    Run with:
    ```bash
    npx ts-node scripts/record-demo.ts
    # or
    npx tsx scripts/record-demo.ts
    ```
    
    ## Recording Configuration
    
    ### Viewport Sizes
    
    ```typescript
    // Standard 1080p (recommended for Remotion)
    viewport: { width: 1920, height: 1080 }
    
    // 720p (smaller files)
    viewport: { width: 1280, height: 720 }
    
    // Square (social media)
    viewport: { width: 1080, height: 1080 }
    
    // Mobile
    viewport: { width: 390, height: 844 } // iPhone 14
    ```
    
    ### Video Quality Settings
    
    ```typescript
    const context = await browser.newContext({
      viewport: { width: 1920, height: 1080 },
      recordVideo: {
        dir: './recordings',
        size: { width: 1920, height: 1080 } // Match viewport for crisp output
      },
      // Slow down for visibility
      // Note: slowMo is on browser launch, not context
    });
    
    // For slow motion, launch browser with slowMo
    const browser = await chromium.launch({
      slowMo: 100 // 100ms delay between actions
    });
    ```
    
    ## Recording Patterns
    
    ### Form Submission Demo
    
    ```typescript
    import { chromium } from 'playwright';
    
    async function recordFormDemo() {
      const browser = await chromium.launch({ slowMo: 50 });
      const context = await browser.newContext({
        viewport: { width: 1920, height: 1080 },
        recordVideo: { dir: './recordings', size: { width: 1920, height: 1080 } }
      });
      const page = await context.newPage();
    
      await page.goto('https://myapp.com/form');
      await page.waitForTimeout(1000);
    
      // Type with realistic speed (pressSequentially types character by character)
      await page.locator('#name').pressSequentially('John Smith', { delay: 50 });
      await page.waitForTimeout(500);

      await page.locator('#email').pressSequentially('john@example.com', { delay: 50 });
      await page.waitForTimeout(500);
    
      // Click submit
      await page.click('button[type="submit"]');
    
      // Wait for result
      await page.waitForSelector('.success-message');
      await page.waitForTimeout(2000);
    
      await context.close();
      await browser.close();
    }
    ```
    
    ### Multi-Page Navigation
    
    ```typescript
    async function recordNavDemo() {
      const browser = await chromium.launch({ slowMo: 100 });
      const context = await browser.newContext({
        viewport: { width: 1920, height: 1080 },
        recordVideo: { dir: './recordings', size: { width: 1920, height: 1080 } }
      });
      const page = await context.newPage();
    
      // Page 1
      await page.goto('https://myapp.com');
      await page.waitForTimeout(2000);
    
      // Navigate to page 2
      await page.click('nav a[href="/features"]');
      await page.waitForLoadState('networkidle');
      await page.waitForTimeout(2000);
    
      // Navigate to page 3
      await page.click('nav a[href="/pricing"]');
      await page.waitForLoadState('networkidle');
      await page.waitForTimeout(2000);
    
      await context.close();
      await browser.close();
    }
    ```
    
    ### Scroll Demo
    
    ```typescript
    async function recordScrollDemo() {
      const browser = await chromium.launch();
      const context = await browser.newContext({
        viewport: { width: 1920, height: 1080 },
        recordVideo: { dir: './recordings', size: { width: 1920, height: 1080 } }
      });
      const page = await context.newPage();
    
      await page.goto('https://myapp.com/long-page');
      await page.waitForTimeout(1000);
    
      // Smooth scroll
      await page.evaluate(async () => {
        const delay = (ms: number) => new Promise(r => setTimeout(r, ms));
        for (let i = 0; i < 10; i++) {
          window.scrollBy({ top: 200, behavior: 'smooth' });
          await delay(300);
        }
      });
    
      await page.waitForTimeout(1000);
      await context.close();
      await browser.close();
    }
    ```
    
    ### Login Flow
    
    ```typescript
    async function recordLoginDemo() {
      const browser = await chromium.launch({ slowMo: 75 });
      const context = await browser.newContext({
        viewport: { width: 1920, height: 1080 },
        recordVideo: { dir: './recordings', size: { width: 1920, height: 1080 } }
      });
      const page = await context.newPage();
    
      await page.goto('https://myapp.com/login');
      await page.waitForTimeout(1000);
    
      await page.fill('#email', 'demo@example.com');
      await page.waitForTimeout(300);
    
      await page.fill('#password', '••••••••');
      await page.waitForTimeout(500);
    
      await page.click('button[type="submit"]');
    
      // Wait for dashboard
      await page.waitForURL('**/dashboard');
      await page.waitForTimeout(3000);
    
      await context.close();
      await browser.close();
    }
    ```
    
    ## Cursor Highlighting
    
    Playwright doesn't show cursor by default. Add visual indicators:
    
    ### CSS Cursor Highlight
    
    ```typescript
    // Inject cursor visualization
    await page.addStyleTag({
      content: `
        * { cursor: none !important; }
        .playwright-cursor {
          position: fixed;
          width: 24px;
          height: 24px;
          background: rgba(255, 100, 100, 0.5);
          border: 2px solid rgba(255, 50, 50, 0.8);
          border-radius: 50%;
          pointer-events: none;
          z-index: 999999;
          transform: translate(-50%, -50%);
          transition: transform 0.1s ease;
        }
        .playwright-cursor.clicking {
          transform: translate(-50%, -50%) scale(0.8);
          background: rgba(255, 50, 50, 0.8);
        }
      `
    });
    
    // Add cursor element
    await page.evaluate(() => {
      const cursor = document.createElement('div');
      cursor.className = 'playwright-cursor';
      document.body.appendChild(cursor);
    
      document.addEventListener('mousemove', (e) => {
        cursor.style.left = e.clientX + 'px';
        cursor.style.top = e.clientY + 'px';
      });
    
      document.addEventListener('mousedown', () => cursor.classList.add('clicking'));
      document.addEventListener('mouseup', () => cursor.classList.remove('clicking'));
    });
    ```
    
    ### Click Ripple Effect
    
    ```typescript
    // Add click ripple visualization
    await page.addStyleTag({
      content: `
        .click-ripple {
          position: fixed;
          width: 40px;
          height: 40px;
          border-radius: 50%;
          background: rgba(234, 88, 12, 0.4);
          pointer-events: none;
          z-index: 999998;
          transform: translate(-50%, -50%) scale(0);
          animation: ripple 0.4s ease-out forwards;
        }
        @keyframes ripple {
          to {
            transform: translate(-50%, -50%) scale(2);
            opacity: 0;
          }
        }
      `
    });
    
    // Custom click function with ripple
    async function clickWithRipple(page: Page, selector: string) {
      const element = page.locator(selector);
      const box = await element.boundingBox();
      if (!box) throw new Error(`No bounding box for ${selector}`);

      await page.evaluate(({ x, y }) => {
        const ripple = document.createElement('div');
        ripple.className = 'click-ripple';
        ripple.style.left = x + 'px';
        ripple.style.top = y + 'px';
        document.body.appendChild(ripple);
        setTimeout(() => ripple.remove(), 400);
      }, { x: box.x + box.width / 2, y: box.y + box.height / 2 });

      await element.click();
    }
    ```
    
    ## Output for Remotion
    
    ### Move Recording to public/demos/
    
    ```typescript
    import { chromium } from 'playwright';
    import * as fs from 'fs';
    import * as path from 'path';
    
    async function recordForRemotion(outputName: string) {
      const browser = await chromium.launch({ slowMo: 50 });
      const context = await browser.newContext({
        viewport: { width: 1920, height: 1080 },
        recordVideo: { dir: './temp-recordings', size: { width: 1920, height: 1080 } }
      });
      const page = await context.newPage();
    
      // ... recording actions ...
    
      await context.close();
    
      // Get the video path
      const video = page.video();
      const videoPath = await video?.path();
    
      if (videoPath) {
        const destPath = `./public/demos/${outputName}.webm`;
        fs.mkdirSync(path.dirname(destPath), { recursive: true });
        fs.renameSync(videoPath, destPath);
        console.log(`Recording saved to: ${destPath}`);
    
        // Get duration for config
        // Use ffprobe: ffprobe -v error -show_entries format=duration -of csv=p=0 file.webm
      }
    
      await browser.close();
    }
    ```
    
    ### Convert WebM to MP4
    
    Playwright outputs WebM. Convert for better Remotion compatibility:
    
    ```bash
    ffmpeg -i recording.webm -c:v libx264 -crf 20 -preset medium -movflags faststart public/demos/demo.mp4
    ```
    
    ## Interactive Recording
    
    For user-driven recordings where you manually perform actions:
    
    ```typescript
    // Inject ESC key listener to stop recording
    async function injectStopListener(page: Page): Promise<void> {
      await page.evaluate(() => {
        if ((window as any).__escListenerAdded) return;
        (window as any).__escListenerAdded = true;
        (window as any).__stopRecording = false;
        document.addEventListener('keydown', (e) => {
          if (e.key === 'Escape') {
            e.preventDefault();
            (window as any).__stopRecording = true;
          }
        });
      });
    }
    
    // Poll for stop signal - handle navigation errors gracefully
    while (true) {
      try {
        const shouldStop = await page.evaluate(() => (window as any).__stopRecording === true);
        if (shouldStop) break;
      } catch {
        // Page navigating - continue recording
      }
      await new Promise(r => setTimeout(r, 200));
    }
    ```
    
    **Key insight:** `page.evaluate()` throws during navigation. Use try/catch and continue - don't treat errors as stop signals.
    
    ## Window Scaling for Laptops
    
    Record at full 1080p while showing a smaller window:
    
    ```typescript
    const scale = 0.75; // 75% window size
    const context = await browser.newContext({
      viewport: { width: 1920 * scale, height: 1080 * scale },
      deviceScaleFactor: 1 / scale,
      recordVideo: { dir: './recordings', size: { width: 1920, height: 1080 } },
    });
    ```
    
    ## Cookie Banner Dismissal
    
    Comprehensive selector list for common consent platforms:
    
    ```typescript
    const COOKIE_SELECTORS = [
      '#onetrust-accept-btn-handler',           // OneTrust
      '#CybotCookiebotDialogBodyButtonAccept',  // Cookiebot
      '.cc-btn.cc-dismiss',                      // Cookie Consent by Insites
      '[class*="cookie"] button[class*="accept"]',
      '[class*="consent"] button[class*="accept"]',
      'button:has-text("Accept all")',
      'button:has-text("Accept cookies")',
      'button:has-text("Got it")',
    ];
    
    async function dismissCookieBanners(page: Page): Promise<void> {
      await page.waitForTimeout(500);
      for (const selector of COOKIE_SELECTORS) {
        try {
          const btn = page.locator(selector).first();
          if (await btn.isVisible({ timeout: 100 })) {
            await btn.click({ timeout: 500 });
            return;
          }
        } catch { /* try next */ }
      }
    }
    ```
    
    Call it after `page.goto()`, and again from a `page.on('load')` handler so banners that appear on later navigations are dismissed too.
    
    ## Important: Injected Elements Appear in Video
    
    **Warning:** Any DOM elements you inject (cursors, control panels, overlays) will be recorded. For UI-free recordings, use terminal-based controls only (Ctrl+C, max duration timer).
    
    ## Tips for Good Demo Recordings
    
    1. **Use slowMo** - 50-100ms makes actions visible
    2. **Add waitForTimeout** - Pause between actions for comprehension
    3. **Wait for animations** - Use `waitForLoadState('networkidle')`
    4. **Match Remotion dimensions** - 1920x1080 at 30fps typical
    5. **Test without recording first** - Debug before final capture
    6. **Clear browser state** - Use fresh context for clean demos
    7. **Dismiss cookie banners** - Use comprehensive selector list above
    8. **Re-inject on navigation** - Cursor/listeners reset on page load
    
    ---
    
    ## Feedback & Contributions
    
    If this skill is missing information or could be improved:
    
    - **Missing a pattern?** Describe what you needed
    - **Found an error?** Let me know what's wrong
    - **Want to contribute?** I can help you:
      1. Update this skill with improvements
      2. Create a PR to github.com/digitalsamba/claude-code-video-toolkit
    
    Just say "improve this skill" and I'll guide you through updating `.claude/skills/playwright-recording/SKILL.md`.
    
  • .claude/skills/acestep/SKILL.md (skill, 13332 bytes)
    ---
    name: acestep
    description: AI music generation with ACE-Step 1.5 — background music, vocal tracks, covers, stem extraction, audio repainting, and continuation for video production. Use when generating music, soundtracks, jingles, or working with audio stems. Triggers include background music, soundtrack, jingle, music generation, stem extraction, cover, style transfer, repaint, continuation, or musical composition tasks.
    ---
    
    # ACE-Step 1.5 Music Generation
    
    Open-source music generation via `tools/music_gen.py`.
    
    **Cloud providers:**
    - **acemusic** (default) — Official ACE-Step cloud API with XL Turbo (4B) model + 5Hz LM thinking mode. Free API key from [acemusic.ai/api-key](https://acemusic.ai/api-key). No GPU required.
    - **modal** — Self-hosted ACE-Step 2B Turbo on Modal. Requires `MODAL_MUSIC_GEN_ENDPOINT_URL`.
    - **runpod** — Self-hosted ACE-Step 2B Turbo on RunPod. Requires `RUNPOD_ACESTEP_ENDPOINT_ID`.
    
    ## Setup
    
    ```bash
    # acemusic (recommended — free, best quality, no GPU)
    echo "ACEMUSIC_API_KEY=your_key" >> .env
    # Get key at https://acemusic.ai/api-key
    
    # Self-hosted (optional fallback)
    python tools/music_gen.py --setup             # RunPod
    modal deploy docker/modal-music-gen/app.py    # Modal
    ```
    
    ## Quick Reference
    
    ```bash
    # Basic generation (uses acemusic XL Turbo by default)
    python tools/music_gen.py --prompt "Upbeat tech corporate" --duration 60 --output bg.mp3
    
    # Generate 4 variations, pick the best
    python tools/music_gen.py --prompt "Calm ambient piano" --duration 30 --variations 4 --output ambient.mp3
    
    # Fast mode (disable thinking)
    python tools/music_gen.py --no-thinking --prompt "Quick draft" --duration 30 --output draft.mp3
    
    # With musical control
    python tools/music_gen.py --prompt "Calm ambient piano" --duration 30 --bpm 72 --key "D Major" --output ambient.mp3
    
    # Scene presets (video production)
    python tools/music_gen.py --preset corporate-bg --duration 60 --output bg.mp3
    python tools/music_gen.py --preset tension --duration 20 --output problem.mp3
    python tools/music_gen.py --preset cta --brand digital-samba --duration 15 --output cta.mp3
    
    # Vocals with lyrics
    python tools/music_gen.py --prompt "Indie pop jingle" --lyrics "[verse]\nBuild it better\nShip it faster" --duration 30 --output jingle.mp3
    
    # Cover / style transfer
    python tools/music_gen.py --cover --reference theme.mp3 --prompt "Jazz piano version" --duration 60 --output jazz_cover.mp3
    
    # Repaint a weak section
    python tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3
    
    # Continue from existing audio
    python tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3
    
    # Stem extraction
    python tools/music_gen.py --extract vocals --input mixed.mp3 --output vocals.mp3
    
    # Fall back to self-hosted
    python tools/music_gen.py --cloud modal --prompt "Background music" --duration 60 --output bg.mp3
    ```
    
    ## Fixing "Samey" Output
    
    If generated music sounds repetitive or lacks variety, try these in order:
    
    1. **Use acemusic cloud** (default) — the XL Turbo 4B model is significantly more capable than the 2B model on Modal/RunPod
    2. **Keep thinking mode on** (default for acemusic) — the 5Hz LM enriches sparse prompts into detailed musical descriptions
    3. **Generate variations** — `--variations 4` generates 4 takes, pick the best
    4. **Use stochastic inference** — `--infer-method sde` adds randomness (same seed gives different results)
    5. **Vary BPM and key across scenes** — don't use the same preset for every scene
    6. **Write sparser prompts** — "Upbeat indie rock" gives the model more creative freedom than a hyper-detailed description
    7. **Vary seeds** — omit `--seed` to let each generation be unique
    
    ## Creating a Song (Step by Step)
    
    ### 1. Instrumental background track (simplest)
    ```bash
    python tools/music_gen.py --prompt "Upbeat indie rock, driving drums, jangly guitar" --duration 60 --bpm 120 --key "G Major" --output track.mp3
    ```
    
    ### 2. Song with vocals and lyrics
    Write lyrics in a temp file or pass inline. Use structure tags to control song sections.
    
    ```bash
    # Write lyrics to a file first (recommended for longer songs)
    cat > /tmp/lyrics.txt << 'LYRICS'
    [Verse 1]
    Walking through the morning light
    Coffee in my hand feels right
    Another day to build and dream
    Nothing's ever what it seems
    
    [Chorus - anthemic]
    WE KEEP MOVING FORWARD
    Through the noise and doubt
    We keep moving forward
    That's what it's about
    
    [Verse 2]
    Screens are glowing late at night
    Shipping code until it's right
    The deadline's close but so are we
    Almost there, just wait and see
    
    [Chorus - bigger]
    WE KEEP MOVING FORWARD
    Through the noise and doubt
    We keep moving forward
    That's what it's about
    
    [Outro - fade]
    (Moving forward...)
    LYRICS
    
    # Generate the song
    python tools/music_gen.py \
      --prompt "Upbeat indie rock anthem, male vocal, driving drums, electric guitar, studio polish" \
      --lyrics "$(cat /tmp/lyrics.txt)" \
      --duration 60 \
      --bpm 128 \
      --key "G Major" \
      --output my_song.mp3
    ```
    
    ### 3. Repaint a weak section
    If the chorus sounds weak, regenerate just that section:
    ```bash
    python tools/music_gen.py --repaint --input my_song.mp3 --repaint-start 20 --repaint-end 35 --prompt "Powerful anthemic chorus, big drums" --output fixed.mp3
    ```
    
    ### 4. Continue/extend a track
    ```bash
    python tools/music_gen.py --continuation --input my_song.mp3 --prompt "Continue with gentle acoustic outro" --output extended.mp3
    ```
    
    ### Key tips for good results
    - **Caption = overall style** (genre, instruments, mood, production quality)
    - **Lyrics = temporal structure** (verse/chorus flow, vocal delivery)
    - **UPPERCASE in lyrics** = high vocal intensity
    - **Parentheses** = background vocals: "We rise (together)"
    - **Keep 6-10 syllables per line** for natural rhythm
    - **Don't describe the melody in the caption** — describe the *sound* and *feeling*
    - **Use `--seed`** to lock randomness when iterating on prompt/lyrics
    
    ### Controlling vocal gender
    The model doesn't reliably follow "female vocal" or "male vocal" on its own. Use **both** of these together:
    1. **In the prompt**: Be explicit — "solo female singer, alto voice" or "female vocalist only, breathy intimate voice". Adding an artist reference helps (e.g., "Brandi Carlile style").
    2. **In the lyrics**: Add `[female vocal]` tags before each section:
    ```
    [female vocal]
    [Verse 1]
    Walking through the morning light...
    
    [female vocal]
    [Chorus - anthemic]
    WE KEEP MOVING FORWARD...
    ```
    Just saying "female vocal" in the prompt alone is often ignored. The combination of prompt + lyrics tags is what works.
    
    ### Duets and vocal trading
    For duets with male/female vocals trading verses, use both the prompt and per-section lyrics tags:
    - **Prompt**: "duet, male and female vocals trading verses, warm harmonies on chorus"
    - **Lyrics**: Tag each section with who sings it:
    ```
    [Verse 1 - male vocal, storytelling]
    First verse lyrics here...
    
    [Chorus - male and female duet, harmonies]
    Chorus lyrics here...
    
    [Verse 2 - female vocal, wry]
    Second verse lyrics here...
    
    [Bridge - male vocal, spoken]
    Spoken bridge...
    
    [Bridge - female vocal, sung]
    Sung response...
    ```
    This reliably produces vocal trading between sections and harmonies on shared parts.
    
    ## Scene Presets
    
    | Preset | BPM | Key | Use Case |
    |--------|-----|-----|----------|
    | `corporate-bg` | 110 | C Major | Professional background, presentations |
    | `upbeat-tech` | 128 | G Major | Product launches, tech demos |
    | `ambient` | 72 | D Major | Overview slides, reflective content |
    | `dramatic` | 90 | D Minor | Reveals, announcements |
    | `tension` | 85 | A Minor | Problem statements, challenges |
    | `hopeful` | 120 | C Major | Solution reveals, resolutions |
    | `cta` | 135 | E Major | Call to action, closing energy |
    | `lofi` | 85 | F Major | Screen recordings, coding demos |
    
    ## Task Types
    
    ### text2music (default)
    Generate music from text prompt + optional lyrics.
    
    ### cover
    Style transfer from reference audio. Control blend with `--cover-strength` (0.0-1.0):
    - **0.2** — Loose style inspiration (more creative freedom)
    - **0.5** — Balanced style transfer
    - **0.7** — Close to original structure (default)
    - **1.0** — Maximum fidelity to source
    
    ### extract
    Stem separation — isolate individual tracks from mixed audio.
    Tracks: `vocals`, `drums`, `bass`, `guitar`, `piano`, `keyboard`, `strings`, `brass`, `woodwinds`, `other`
    
    ### repainting (acemusic only)
    Regenerate a specific time segment within existing audio while preserving the rest.
    ```bash
    python tools/music_gen.py --repaint --input track.mp3 --repaint-start 15 --repaint-end 25 --prompt "Guitar solo" --output fixed.mp3
    ```
    
    ### continuation (acemusic only)
    Extend existing audio by continuing from where it ends.
    ```bash
    python tools/music_gen.py --continuation --input track.mp3 --prompt "Continue with jazz piano" --output extended.mp3
    ```
    
    ## Prompt Engineering
    
    ### Caption Writing — Layer Dimensions
    
    Write captions by layering multiple descriptive dimensions rather than single-word descriptions.
    
    **Dimensions to include:**
    - **Genre/Style**: pop, rock, jazz, electronic, lo-fi, synthwave, orchestral
    - **Emotion/Mood**: melancholic, euphoric, dreamy, nostalgic, intimate, tense
    - **Instruments**: acoustic guitar, synth pads, 808 drums, strings, brass, piano
    - **Timbre**: warm, crisp, airy, punchy, lush, polished, raw
    - **Era**: "80s synth-pop", "modern indie", "classical romantic"
    - **Production**: lo-fi, studio-polished, live recording, cinematic
    - **Vocal**: breathy, powerful, falsetto, raspy, spoken word (or "instrumental")
    
    **Good**: "Slow melancholic piano ballad with intimate female vocal, warm strings building to powerful chorus, studio-polished production"
    **Bad**: "Sad song"
    
    ### Key Principles
    
    1. **Specificity over vagueness** — describe instruments, mood, production style
    2. **Avoid contradictions** — don't request "classical strings" and "hardcore metal" simultaneously
    3. **Repetition reinforces priority** — repeat important elements for emphasis
    4. **Sparse captions = more creative freedom** — detailed captions constrain the model
    5. **Use metadata params for BPM/key** — don't write "120 BPM" in the caption, use `--bpm 120`
    
    ### Lyrics Formatting
    
    **Structure tags** (use in lyrics, not caption):
    ```
    [Intro]
    [Verse]
    [Chorus]
    [Bridge]
    [Outro]
    [Instrumental]
    [Guitar Solo]
    [Build]
    [Drop]
    [Breakdown]
    ```
    
    **Vocal control** (prefix lines or sections):
    ```
    [raspy vocal]
    [whispered]
    [falsetto]
    [powerful belting]
    [harmonies]
    [ad-lib]
    ```
    
    **Energy indicators:**
    - UPPERCASE = high intensity ("WE RISE ABOVE")
    - Parentheses = background vocals ("We rise (together)")
    - Keep 6-10 syllables per line within sections for natural rhythm
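    
    A short example combining structure tags, vocal control, and energy indicators (lyrics invented for illustration):
    
    ```
    [Verse]
    [whispered]
    We started small, one spark, one line
    
    [Chorus]
    [powerful belting]
    WE RISE ABOVE (we rise together)
    ```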
    
    ## Video Production Integration
    
    ### Music for Scene Types
    
    | Scene | Preset | Duration | Notes |
    |-------|--------|----------|-------|
    | Title | `dramatic` or `ambient` | 3-5s | Short, mood-setting |
    | Problem | `tension` | 10-15s | Dark, unsettling |
    | Solution | `hopeful` | 10-15s | Relief, optimism |
    | Demo | `lofi` or `corporate-bg` | 30-120s | Non-distracting, matches demo length |
    | Stats | `upbeat-tech` | 8-12s | Building credibility |
    | CTA | `cta` | 5-10s | Maximum energy, punchy |
    | Credits | `ambient` | 5-10s | Gentle fade-out |
    
    ### Timing Workflow
    
    1. Plan scene durations first (from voiceover script)
    2. Generate music to match: `--duration <scene_seconds>`
    3. Music duration is precise (within 0.1s of requested)
    4. For background music spanning multiple scenes: generate one long track
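    
    For example (presets from the table above; durations come from your voiceover script, paths are illustrative):
    
    ```bash
    # One track per scene, matched to the planned scene lengths
    python tools/music_gen.py --preset tension --duration 12 --output public/audio/problem.mp3
    python tools/music_gen.py --preset hopeful --duration 14 --output public/audio/solution.mp3
    ```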
    
    ### Combining with Voiceover
    
    Background music should be mixed at 10-20% volume in Remotion:
    ```tsx
    <Audio src={staticFile('voiceover.mp3')} volume={1} />
    <Audio src={staticFile('bg-music.mp3')} volume={0.15} />
    ```
    
    For music under narration: use instrumental presets (`corporate-bg`, `ambient`, `lofi`).
    For music-forward scenes (title, CTA): can use higher volume or vocal tracks.
    
    ### Brand Consistency
    
    Use `--brand <name>` to load hints from `brands/<name>/brand.json`.
    Use `--cover --reference brand_theme.mp3` to create variations of a brand's sonic identity.
    For consistent sound across a project: fix the seed (`--seed 42`) and vary only duration/prompt.
    
    ## Advanced Parameters
    
    | Flag | Default | Description |
    |------|---------|-------------|
    | `--thinking` | on (acemusic) | 5Hz LM enriches prompts and generates audio codes |
    | `--no-thinking` | - | Faster generation, skip LM reasoning |
    | `--variations N` | 1 | Generate N variations (1-8, acemusic only) |
    | `--guidance-scale` | 7.0 | Prompt adherence (1.0-15.0) |
    | `--infer-method` | ode | `ode` (deterministic) or `sde` (stochastic, more variety) |
    | `--seed` | random | Lock randomness for reproducibility |
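    
    A typical explore-then-lock workflow using these flags (values illustrative; `--seed` reproduces a full run, not one specific variation):
    
    ```bash
    # Explore: stochastic sampling, several candidates
    python tools/music_gen.py --prompt "Dreamy synthwave" --duration 20 \
      --variations 4 --infer-method sde --output explore.mp3
    
    # Lock: deterministic re-run once you have settings you like
    python tools/music_gen.py --prompt "Dreamy synthwave" --duration 20 \
      --seed 42 --infer-method ode --output final.mp3
    ```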
    
    ## Technical Details
    
    - **acemusic cloud**: XL Turbo 4B DiT + 4B LM, best quality, ~5-15s per generation
    - **Modal/RunPod**: Standard Turbo 2B DiT, no LM, ~2-3s per generation
    - **Output**: 48kHz MP3/WAV/FLAC
    - **Duration range**: 10-600 seconds
    - **BPM range**: 30-300
    
    ### When NOT to use ACE-Step
    - **Voice cloning** — use Qwen3-TTS or ElevenLabs instead
    - **Sound effects** — use ElevenLabs SFX (`tools/sfx.py`)
    - **Speech/narration** — use voiceover tools, not music gen
    - **Stem extraction from video** — extract audio first with FFmpeg, then use `--extract`
    
  • .claude/skills/elevenlabs/SKILL.md (skill, 10961 bytes)
    ---
    name: elevenlabs
    description: Generate AI voiceovers, sound effects, and music using ElevenLabs APIs. Use when creating audio content for videos, podcasts, or games. Triggers include generating voiceovers, narration, dialogue, sound effects from descriptions, background music, soundtrack generation, voice cloning, or any audio synthesis task.
    ---
    
    # ElevenLabs Audio Generation
    
    Requires `ELEVENLABS_API_KEY` in `.env`.
    
    ## Text-to-Speech
    
    ```python
    from elevenlabs.client import ElevenLabs
    from elevenlabs import save, VoiceSettings
    import os
    
    client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))
    
    audio = client.text_to_speech.convert(
        text="Welcome to my video!",
        voice_id="JBFqnCBsd6RMkjVDRZzb",
        model_id="eleven_multilingual_v2",
        voice_settings=VoiceSettings(
            stability=0.5,
            similarity_boost=0.75,
            style=0.5,
            speed=1.0
        )
    )
    save(audio, "voiceover.mp3")
    ```
    
    ### Models
    
    | Model | Quality | SSML Support | Notes |
    |-------|---------|--------------|-------|
    | `eleven_multilingual_v2` | Highest consistency | None | Stable, production-ready, 29 languages |
    | `eleven_flash_v2_5` | Good | `<break>`, `<phoneme>` | Fast, supports pause/pronunciation tags |
    | `eleven_turbo_v2_5` | Good | `<break>`, `<phoneme>` | Fastest latency |
    | `eleven_v3` | Most expressive | None | Alpha — unreliable, needs prompt engineering |
    
    **Choose:** multilingual_v2 for reliability, flash/turbo for SSML control, v3 for maximum expressiveness (expect retakes).
    
    ### Voice Settings by Style
    
    | Style | stability | similarity | style | speed |
    |-------|-----------|------------|-------|-------|
    | Natural/professional | 0.75-0.85 | 0.9 | 0.0-0.1 | 1.0 |
    | Conversational | 0.5-0.6 | 0.85 | 0.3-0.4 | 0.9-1.0 |
    | Energetic/YouTuber | 0.3-0.5 | 0.75 | 0.5-0.7 | 1.0-1.1 |
    
    ### Pauses Between Sections
    
    **With flash/turbo models:** Use SSML break tags inline:
    ```
    ...end of section. <break time="1.5s" /> Start of next...
    ```
    Max 3 seconds per break. Excessive breaks can cause speed artifacts.
    
    **With multilingual_v2 / v3:** No SSML support. Options:
    - Paragraph breaks (blank lines) — creates ~0.3-0.5s natural pause
    - Post-process with ffmpeg: split audio and insert silence (see the sketch below)
    
    **WARNING:** `...` (ellipsis) is NOT a reliable pause — it can be vocalized as a word/sound. Do not use ellipsis as a pause mechanism.
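    
    For the ffmpeg route, one approach is to generate a silence clip and concatenate (a sketch; assumes MP3 sections with matching sample rate and channel layout):
    
    ```bash
    # 1.5s of silence encoded to match the voiceover MP3s
    ffmpeg -f lavfi -i anullsrc=r=44100:cl=mono -t 1.5 -c:a libmp3lame -q:a 2 silence.mp3
    
    # Join section 1 + silence + section 2 (MP3 frames concatenate cleanly)
    ffmpeg -i "concat:section1.mp3|silence.mp3|section2.mp3" -c copy joined.mp3
    ```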
    
    ### Pronunciation Control
    
    **Phonetic spelling (any model):** Write words as you want them pronounced:
    - `Janus` → `Jan-us`
    - `nginx` → `engine-x`
    - Use dashes, capitals, apostrophes to guide pronunciation
    
    **SSML phoneme tags (flash/turbo only):**
    ```
    <phoneme alphabet="ipa" ph="ˈdʒeɪnəs">Janus</phoneme>
    ```
    
    ### Iterative Workflow
    
    1. Generate → listen → identify pronunciation/pacing issues
    2. Adjust: phonetic spellings, break tags, voice settings
    3. Regenerate. If pauses aren't precise enough, add silence in post with ffmpeg rather than fighting the TTS engine.
    
    ## Voice Cloning
    
    ### Instant Voice Clone
    
    ```python
    with open("sample.mp3", "rb") as f:
        voice = client.voices.ivc.create(
            name="My Voice",
            files=[f],
            remove_background_noise=True
        )
    print(f"Voice ID: {voice.voice_id}")
    ```
    
    - Use `client.voices.ivc.create()` (not `client.voices.clone()`)
    - Pass file handles in binary mode (`"rb"`), not paths
    - Convert m4a first: `ffmpeg -i input.m4a -codec:a libmp3lame -qscale:a 2 output.mp3`
    - Multiple samples (2-3 clips) improve accuracy
    - Save voice ID for reuse
    
    **Professional Voice Clone:** Requires Creator plan+, 30+ min audio. See [reference.md](reference.md).
    
    ## Sound Effects
    
    Max 22 seconds per generation.
    
    ```python
    result = client.text_to_sound_effects.convert(
        text="Thunder rumbling followed by heavy rain",
        duration_seconds=10,
        prompt_influence=0.3
    )
    with open("thunder.mp3", "wb") as f:
        for chunk in result:
            f.write(chunk)
    ```
    
    **Prompt tips:** Be specific — "Heavy footsteps on wooden floorboards, slow and deliberate, with creaking"
    
    ## Music Generation
    
    10 seconds to 5 minutes. Use `client.music.compose()` (not `.generate()`).
    
    ```python
    result = client.music.compose(
        prompt="Upbeat indie rock, catchy guitar riff, energetic drums, travel vlog",
        music_length_ms=60000,
        force_instrumental=True
    )
    with open("music.mp3", "wb") as f:
        for chunk in result:
            f.write(chunk)
    ```
    
    **Prompt structure:** Genre, mood, instruments, tempo, use case. Add "no vocals" or use `force_instrumental=True` for background music.
    
    ## Remotion Integration
    
    ### Complete Workflow: Script to Synchronized Scene
    
    ```
    VOICEOVER-SCRIPT.md → voiceover.py → public/audio/ → Remotion composition
            ↓                  ↓               ↓                 ↓
      Scene narration    Generate MP3    Audio files     <Audio> component
      with durations     per scene       with timing     synced to scenes
    ```
    
    ### Step 1: Generate Per-Scene Audio
    
    Use the toolkit's voiceover tool to generate audio for each scene:
    
    ```bash
    # Generate voiceover files for each scene
    python tools/voiceover.py --scene-dir public/audio/scenes --json
    
    # Output:
    # public/audio/scenes/
    #   ├── scene-01-title.mp3
    #   ├── scene-02-problem.mp3
    #   ├── scene-03-solution.mp3
    #   └── manifest.json  (durations for each file)
    ```
    
    The `manifest.json` contains timing info:
    ```json
    {
      "scenes": [
        { "file": "scene-01-title.mp3", "duration": 4.2 },
        { "file": "scene-02-problem.mp3", "duration": 12.8 },
        { "file": "scene-03-solution.mp3", "duration": 15.3 }
      ],
      "totalDuration": 32.3
    }
    ```
    
    ### Step 2: Use Audio in Remotion Composition
    
    ```tsx
    // src/Composition.tsx
    import { Audio, staticFile, Series, useVideoConfig } from 'remotion';
    
    // Import scene components
    import { TitleSlide } from './scenes/TitleSlide';
    import { ProblemSlide } from './scenes/ProblemSlide';
    import { SolutionSlide } from './scenes/SolutionSlide';
    
    // Scene durations (from manifest.json, converted to frames at 30fps)
    const SCENE_DURATIONS = {
      title: Math.ceil(4.2 * 30),      // 126 frames
      problem: Math.ceil(12.8 * 30),   // 384 frames
      solution: Math.ceil(15.3 * 30),  // 459 frames
    };
    
    export const MainComposition: React.FC = () => {
      return (
        <>
          {/* Scene sequence */}
          <Series>
            <Series.Sequence durationInFrames={SCENE_DURATIONS.title}>
              <TitleSlide />
            </Series.Sequence>
            <Series.Sequence durationInFrames={SCENE_DURATIONS.problem}>
              <ProblemSlide />
            </Series.Sequence>
            <Series.Sequence durationInFrames={SCENE_DURATIONS.solution}>
              <SolutionSlide />
            </Series.Sequence>
          </Series>
    
          {/* Audio track - plays continuously across all scenes */}
          <Audio src={staticFile('audio/voiceover.mp3')} volume={1} />
    
          {/* Optional: Background music at lower volume */}
          <Audio src={staticFile('audio/music.mp3')} volume={0.15} />
        </>
      );
    };
    ```
    
    ### Step 3: Per-Scene Audio (Alternative)
    
    For more control, add audio to each scene individually:
    
    ```tsx
    // src/scenes/ProblemSlide.tsx
    import { Audio, staticFile, useCurrentFrame } from 'remotion';
    
    export const ProblemSlide: React.FC = () => {
      const frame = useCurrentFrame();
    
      return (
        <div style={{ /* slide styles */ }}>
          <h1>The Problem</h1>
          {/* Scene content */}
    
          {/* Audio starts when this scene starts (frame 0 of this sequence) */}
          <Audio src={staticFile('audio/scenes/scene-02-problem.mp3')} />
        </div>
      );
    };
    ```
    
    ### Syncing Visuals to Voiceover
    
    Calculate scene duration from audio, not the other way around:
    
    ```tsx
    // src/config/timing.ts
    import manifest from '../../public/audio/scenes/manifest.json';
    
    const FPS = 30;
    
    // Convert audio durations to frame counts
    export const sceneDurations = manifest.scenes.reduce((acc, scene) => {
      const name = scene.file.replace(/^scene-\d+-/, '').replace('.mp3', '');
      acc[name] = Math.ceil(scene.duration * FPS);
      return acc;
    }, {} as Record<string, number>);
    
    // Usage in composition:
    // <Series.Sequence durationInFrames={sceneDurations.title}>
    ```
    
    ### Audio Timing Patterns
    
    ```tsx
    import { Audio, Sequence, interpolate, useCurrentFrame } from 'remotion';
    
    // Fade in audio
    export const FadeInAudio: React.FC<{ src: string; fadeFrames?: number }> = ({
      src,
      fadeFrames = 30
    }) => {
      const frame = useCurrentFrame();
      const volume = interpolate(frame, [0, fadeFrames], [0, 1], {
        extrapolateRight: 'clamp',
      });
      return <Audio src={src} volume={volume} />;
    };
    
    // Delayed audio start
    export const DelayedAudio: React.FC<{ src: string; delayFrames: number }> = ({
      src,
      delayFrames
    }) => (
      <Sequence from={delayFrames}>
        <Audio src={src} />
      </Sequence>
    );
    
    // Usage:
    // <FadeInAudio src={staticFile('audio/music.mp3')} fadeFrames={60} />
    // <DelayedAudio src={staticFile('audio/sfx/whoosh.mp3')} delayFrames={45} />
    ```
    
    ### Voiceover + Demo Video Sync
    
    When a scene has both voiceover and demo video:
    
    ```tsx
    import { Audio, OffthreadVideo, staticFile, useVideoConfig } from 'remotion';
    
    export const DemoScene: React.FC = () => {
      const { durationInFrames, fps } = useVideoConfig();
    
      // Calculate playback rate to fit demo into voiceover duration
      const demoDuration = 45; // seconds (original demo length)
      const sceneDuration = durationInFrames / fps; // seconds (from voiceover)
      const playbackRate = demoDuration / sceneDuration;
    
      return (
        <>
          <OffthreadVideo
            src={staticFile('demos/feature-demo.mp4')}
            playbackRate={playbackRate}
          />
          <Audio src={staticFile('audio/scenes/scene-04-demo.mp3')} />
        </>
      );
    };
    ```
    
    ### Error Handling
    
    ```tsx
    import { Audio, staticFile, delayRender, continueRender } from 'remotion';
    import { useEffect, useState } from 'react';
    
    export const SafeAudio: React.FC<{ src: string }> = ({ src }) => {
      const [handle] = useState(() => delayRender());
      const [audioReady, setAudioReady] = useState(false);
    
      useEffect(() => {
        const audio = new window.Audio(src);
        audio.oncanplaythrough = () => {
          setAudioReady(true);
          continueRender(handle);
        };
        audio.onerror = () => {
          console.error(`Failed to load audio: ${src}`);
          continueRender(handle); // Continue without audio rather than hang
        };
      }, [src, handle]);
    
      if (!audioReady) return null;
      return <Audio src={src} />;
    };
    ```
    
    ### Toolkit Command: /generate-voiceover
    
    The `/generate-voiceover` command handles the full workflow:
    
    ```
    /generate-voiceover
    
    1. Reads VOICEOVER-SCRIPT.md
    2. Extracts narration for each scene
    3. Generates audio via ElevenLabs API
    4. Saves to public/audio/scenes/
    5. Creates manifest.json with durations
    6. Updates project.json with timing info
    ```
    
    ## Popular Voices
    
    - George: `JBFqnCBsd6RMkjVDRZzb` (warm narrator)
    - Rachel: `21m00Tcm4TlvDq8ikWAM` (clear female)
    - Adam: `pNInz6obpgDQGcFmaJgB` (professional male)
    
    List all: `client.voices.get_all()`
    
    For full API docs, see [reference.md](reference.md).
    
  • .claude/skills/ffmpeg/SKILL.md (skill, 13247 bytes)
    ---
    name: ffmpeg
    description: Video and audio processing with FFmpeg. Use for format conversion, resizing, compression, audio extraction, and preparing assets for Remotion. Triggers include converting GIF to MP4, resizing video, extracting audio, compressing files, or any media transformation task.
    ---
    
    # FFmpeg for Video Production
    
    FFmpeg is the essential tool for video/audio processing. This skill covers common operations for Remotion video projects.
    
    ## Quick Reference
    
    ### GIF to MP4 (Remotion-compatible)
    
    ```bash
    ffmpeg -i input.gif -movflags faststart -pix_fmt yuv420p \
      -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" output.mp4
    ```
    
    **Why these flags:**
    - `-movflags faststart` - Moves metadata to start for web streaming
    - `-pix_fmt yuv420p` - Ensures compatibility with most players
    - `scale=trunc(...)` - Forces even dimensions (required by most codecs)
    
    ### Resize Video
    
    ```bash
    # To 1920x1080 (maintain aspect ratio, add black bars)
    ffmpeg -i input.mp4 -vf "scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2" output.mp4
    
    # To 1920x1080 (crop to fill)
    ffmpeg -i input.mp4 -vf "scale=1920:1080:force_original_aspect_ratio=increase,crop=1920:1080" output.mp4
    
    # Scale to width, auto height
    ffmpeg -i input.mp4 -vf "scale=1280:-2" output.mp4
    ```
    
    ### Compress Video
    
    ```bash
    # Good quality, smaller file (CRF 23 is default, lower = better quality)
    ffmpeg -i input.mp4 -c:v libx264 -crf 23 -preset medium -c:a aac -b:a 128k output.mp4
    
    # Aggressive compression for web preview
    ffmpeg -i input.mp4 -c:v libx264 -crf 28 -preset fast -c:a aac -b:a 96k output.mp4
    
    # Target file size (e.g., ~10MB for 60s video = ~1.3Mbps)
    ffmpeg -i input.mp4 -c:v libx264 -b:v 1300k -c:a aac -b:a 128k output.mp4
    ```
    
    ### Extract Audio
    
    ```bash
    # Extract to MP3
    ffmpeg -i input.mp4 -vn -acodec libmp3lame -q:a 2 output.mp3
    
    # Extract to AAC
    ffmpeg -i input.mp4 -vn -acodec aac -b:a 192k output.m4a
    
    # Extract to WAV (uncompressed)
    ffmpeg -i input.mp4 -vn output.wav
    ```
    
    ### Convert Audio Formats
    
    ```bash
    # M4A to MP3 (for ElevenLabs voice samples)
    ffmpeg -i input.m4a -codec:a libmp3lame -qscale:a 2 output.mp3
    
    # WAV to MP3
    ffmpeg -i input.wav -codec:a libmp3lame -b:a 192k output.mp3
    
    # Adjust volume
    ffmpeg -i input.mp3 -filter:a "volume=1.5" output.mp3
    ```
    
    ### Trim/Cut Video
    
    ```bash
    # Cut from timestamp to duration (recommended - reliable)
    ffmpeg -i input.mp4 -ss 00:00:30 -t 00:00:15 -c:v libx264 -c:a aac output.mp4
    
    # Cut from timestamp to timestamp
    ffmpeg -i input.mp4 -ss 00:00:30 -to 00:00:45 -c:v libx264 -c:a aac output.mp4
    
    # Stream copy (faster but may lose frames at cut points)
    # Only use when source has frequent keyframes
    ffmpeg -i input.mp4 -ss 00:00:30 -t 00:00:15 -c copy output.mp4
    ```
    
    **Note:** Re-encoding is recommended for trimming. Stream copy (`-c copy`) can silently drop video if the seek point doesn't align with a keyframe.
    
    ### Speed Up / Slow Down
    
    ```bash
    # 2x speed (video and audio)
    ffmpeg -i input.mp4 -filter_complex "[0:v]setpts=0.5*PTS[v];[0:a]atempo=2.0[a]" -map "[v]" -map "[a]" output.mp4
    
    # 0.5x speed (slow motion)
    ffmpeg -i input.mp4 -filter_complex "[0:v]setpts=2.0*PTS[v];[0:a]atempo=0.5[a]" -map "[v]" -map "[a]" output.mp4
    
    # Video only (no audio)
    ffmpeg -i input.mp4 -filter:v "setpts=0.5*PTS" -an output.mp4
    ```
    
    ### Concatenate Videos
    
    ```bash
    # Create file list
    echo "file 'clip1.mp4'" > list.txt
    echo "file 'clip2.mp4'" >> list.txt
    echo "file 'clip3.mp4'" >> list.txt
    
    # Concatenate (same codec/resolution)
    ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4
    
    # Concatenate with re-encoding (different sources)
    ffmpeg -f concat -safe 0 -i list.txt -c:v libx264 -c:a aac output.mp4
    ```
    
    ### Add Fade In/Out
    
    ```bash
    # Fade in first 1 second, fade out last 1 second (st=9 assumes a 10-second video; use st = duration - 1)
    ffmpeg -i input.mp4 -vf "fade=t=in:st=0:d=1,fade=t=out:st=9:d=1" -c:a copy output.mp4
    
    # Audio fade
    ffmpeg -i input.mp4 -af "afade=t=in:st=0:d=1,afade=t=out:st=9:d=1" -c:v copy output.mp4
    ```
    
    ### Get Video Info
    
    ```bash
    # Duration, resolution, codec info
    ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 input.mp4
    
    # Full info
    ffprobe -v quiet -print_format json -show_format -show_streams input.mp4
    ```
    
    ## Remotion-Specific Patterns
    
    ### Video Speed Adjustment for Remotion
    
    **When to use FFmpeg vs Remotion `playbackRate`:**
    
    | Scenario | Use FFmpeg | Use Remotion |
    |----------|------------|--------------|
    | Constant speed (1.5x, 2x) | Either works | ✅ Simpler |
    | Extreme speeds (>4x or <0.25x) | ✅ More reliable | May have issues |
    | Variable speed (accelerate over time) | ✅ Pre-process | Complex workaround needed |
    | Need perfect audio sync | ✅ Guaranteed | Usually fine |
    | Demo needs to fit voiceover timing | ✅ Pre-calculate | Runtime adjustment |
    
    **Remotion limitation:** `playbackRate` must be constant. Dynamic interpolation like `playbackRate={interpolate(frame, [0, 100], [1, 5])}` won't work correctly because Remotion evaluates frames independently.
    
    ```bash
    # Speed up demo to fit a scene (e.g., 60s demo into 20s = 3x speed)
    ffmpeg -i demo-raw.mp4 \
      -filter_complex "[0:v]setpts=0.333*PTS[v];[0:a]atempo=3.0[a]" \
      -map "[v]" -map "[a]" \
      public/demos/demo-fast.mp4
    
    # Slow motion for emphasis (0.5x speed)
    ffmpeg -i action.mp4 \
      -filter_complex "[0:v]setpts=2.0*PTS[v];[0:a]atempo=0.5[a]" \
      -map "[v]" -map "[a]" \
      public/demos/action-slow.mp4
    
    # Speed up without audio (common for screen recordings)
    ffmpeg -i demo.mp4 -filter:v "setpts=0.5*PTS" -an public/demos/demo-2x.mp4
    
    # Timelapse effect (10x speed, drop audio)
    ffmpeg -i long-demo.mp4 -filter:v "setpts=0.1*PTS" -an public/demos/timelapse.mp4
    ```
    
    **Calculate speed factor:**
    - To fit X seconds of video into Y seconds of scene: `speed = X / Y`
    - setpts multiplier = `1 / speed` (e.g., 3x speed = setpts=0.333*PTS)
    - atempo value = `speed` (e.g., 3x speed = atempo=3.0)
    
    **Extreme speed (>2x audio):** Chain atempo filters (each limited to 0.5-2.0 range):
    ```bash
    # 4x speed audio
    -filter_complex "[0:a]atempo=2.0,atempo=2.0[a]"
    
    # 8x speed audio
    -filter_complex "[0:a]atempo=2.0,atempo=2.0,atempo=2.0[a]"
    ```
    
    ### Prepare Demo Recording for Remotion
    
    ```bash
    # Standard 1080p, 30fps, Remotion-ready
    ffmpeg -i raw-recording.mp4 \
      -vf "scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,fps=30" \
      -c:v libx264 -crf 18 -preset slow \
      -c:a aac -b:a 192k \
      -movflags faststart \
      public/demos/demo.mp4
    ```
    
    ### Screen Recording to Remotion Asset
    
    ```bash
    # From iPhone/iPad recording (usually 60fps, variable resolution)
    ffmpeg -i iphone-recording.mov \
      -vf "scale=1920:-2,fps=30" \
      -c:v libx264 -crf 20 \
      -an \
      public/demos/mobile-demo.mp4
    ```
    
    ### Batch Convert GIFs
    
    ```bash
    for f in assets/*.gif; do
      ffmpeg -i "$f" -movflags faststart -pix_fmt yuv420p \
        -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" \
        "public/demos/$(basename "$f" .gif).mp4"
    done
    ```
    
    ## Common Issues
    
    ### "Height not divisible by 2"
    Add scale filter: `-vf "scale=trunc(iw/2)*2:trunc(ih/2)*2"`
    
    ### Video won't play in browser
    Use: `-movflags faststart -pix_fmt yuv420p -c:v libx264`
    
    ### Audio out of sync after speed change
    Use filter_complex with atempo: `-filter_complex "[0:v]setpts=0.5*PTS[v];[0:a]atempo=2.0[a]"`
    
    ### File too large
    Increase CRF (23→28) or reduce resolution
    
    ## Quality Guidelines
    
    | Use Case | CRF | Preset | Notes |
    |----------|-----|--------|-------|
    | Archive/Master | 18 | slow | Best quality, large files |
    | Production | 20-22 | medium | Good balance |
    | Web/Preview | 23-25 | fast | Smaller files |
    | Draft/Quick | 28+ | veryfast | Fast encoding |
    
    ## Platform-Specific Output Optimization
    
    After Remotion renders your video (typically to `out/video.mp4`), use FFmpeg to optimize for each distribution platform.
    
    ### Workflow Integration
    
    ```
    Remotion render (master)     FFmpeg optimization      Platform upload
           ↓                            ↓                       ↓
       out/video.mp4  ────────→  out/video-youtube.mp4  ───→  YouTube
                      ────────→  out/video-twitter.mp4  ───→  Twitter/X
                      ────────→  out/video-linkedin.mp4 ───→  LinkedIn
                      ────────→  out/video-web.mp4      ───→  Website embed
    ```
    
    ### YouTube (Recommended Settings)
    
    YouTube re-encodes everything, so upload high quality:
    
    ```bash
    # YouTube optimized (1080p)
    ffmpeg -i out/video.mp4 \
      -c:v libx264 -preset slow -crf 18 \
      -profile:v high -level 4.0 \
      -bf 2 -g 30 \
      -c:a aac -b:a 192k -ar 48000 \
      -movflags +faststart \
      out/video-youtube.mp4
    
    # YouTube Shorts (vertical 1080x1920)
    ffmpeg -i out/video.mp4 \
      -vf "scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2" \
      -c:v libx264 -crf 18 -c:a aac -b:a 192k \
      out/video-shorts.mp4
    ```
    
    ### Twitter/X
    
    Twitter has strict limits: max 140s, 512MB, 1920x1200:
    
    ```bash
    # Twitter optimized (under 15MB target for fast upload)
    ffmpeg -i out/video.mp4 \
      -c:v libx264 -preset medium -crf 24 \
      -profile:v main -level 3.1 \
      -vf "scale='min(1280,iw)':'min(720,ih)':force_original_aspect_ratio=decrease" \
      -c:a aac -b:a 128k -ar 44100 \
      -movflags +faststart \
      -fs 15M \
      out/video-twitter.mp4
    
    # Check file size and duration
    ffprobe -v error -show_entries format=duration,size -of csv=p=0 out/video-twitter.mp4
    ```
    
    ### LinkedIn
    
    LinkedIn prefers MP4 with AAC audio, max 10 minutes:
    
    ```bash
    # LinkedIn optimized
    ffmpeg -i out/video.mp4 \
      -c:v libx264 -preset medium -crf 22 \
      -profile:v main \
      -vf "scale='min(1920,iw)':'min(1080,ih)':force_original_aspect_ratio=decrease" \
      -c:a aac -b:a 192k -ar 48000 \
      -movflags +faststart \
      out/video-linkedin.mp4
    ```
    
    ### Website/Embed (Optimized for Fast Loading)
    
    ```bash
    # Web-optimized MP4 (small file, progressive loading)
    ffmpeg -i out/video.mp4 \
      -c:v libx264 -preset medium -crf 26 \
      -profile:v baseline -level 3.0 \
      -vf "scale=1280:720" \
      -c:a aac -b:a 128k \
      -movflags +faststart \
      out/video-web.mp4
    
    # WebM alternative (better compression, though browser support is narrower than MP4)
    ffmpeg -i out/video.mp4 \
      -c:v libvpx-vp9 -crf 30 -b:v 0 \
      -vf "scale=1280:720" \
      -c:a libopus -b:a 128k \
      -deadline good \
      out/video-web.webm
    ```
    
    ### GIF (for Previews/Thumbnails)
    
    ```bash
    # High-quality GIF (first 5 seconds)
    ffmpeg -i out/video.mp4 -t 5 \
      -vf "fps=15,scale=480:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" \
      out/preview.gif
    
    # Smaller file GIF
    ffmpeg -i out/video.mp4 -t 3 \
      -vf "fps=10,scale=320:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" \
      out/preview-small.gif
    ```
    
    ### Platform Requirements Quick Reference
    
    | Platform | Max Resolution | Max Size | Max Duration | Audio |
    |----------|---------------|----------|--------------|-------|
    | YouTube | 8K | 256GB | 12 hours | AAC 48kHz |
    | Twitter/X | 1920x1200 | 512MB | 140s | AAC 44.1kHz |
    | LinkedIn | 4096x2304 | 5GB | 10 min | AAC 48kHz |
    | Instagram Feed | 1080x1350 | 4GB | 60s | AAC 48kHz |
    | Instagram Reels | 1080x1920 | 4GB | 90s | AAC 48kHz |
    | TikTok | 1080x1920 | 287MB | 10 min | AAC |
    
    ### Batch Export for All Platforms
    
    ```bash
    #!/bin/bash
    # save as: export-all-platforms.sh
    INPUT="out/video.mp4"
    
    # YouTube (high quality)
    ffmpeg -i "$INPUT" -c:v libx264 -preset slow -crf 18 \
      -c:a aac -b:a 192k -movflags +faststart \
      out/video-youtube.mp4
    
    # Twitter (compressed)
    ffmpeg -i "$INPUT" -c:v libx264 -crf 24 \
      -vf "scale='min(1280,iw)':'-2'" \
      -c:a aac -b:a 128k -movflags +faststart \
      out/video-twitter.mp4
    
    # LinkedIn
    ffmpeg -i "$INPUT" -c:v libx264 -crf 22 \
      -c:a aac -b:a 192k -movflags +faststart \
      out/video-linkedin.mp4
    
    # Web embed (small)
    ffmpeg -i "$INPUT" -c:v libx264 -crf 26 \
      -vf "scale=1280:720" \
      -c:a aac -b:a 128k -movflags +faststart \
      out/video-web.mp4
    
    echo "Exported:"
    ls -lh out/video-*.mp4
    ```
    
    ## Error Handling
    
    Common errors and fixes when processing video:
    
    ```bash
    # Check if FFmpeg succeeded
    ffmpeg -i input.mp4 -c:v libx264 output.mp4 && echo "Success" || echo "Failed: check input file"
    
    # Validate output file is playable
    ffprobe -v error -select_streams v:0 -show_entries stream=codec_name -of csv=p=0 output.mp4
    
    # Get detailed error info
    ffmpeg -v error -i input.mp4 -f null - 2>&1 | head -20
    ```
    
    ### Handling Common Failures
    
    | Error | Cause | Fix |
    |-------|-------|-----|
    | "No such file" | Input path wrong | Check path, use quotes for spaces |
    | "Invalid data" | Corrupted input | Re-download or re-record source |
    | "height not divisible by 2" | Odd dimensions | Add scale filter with trunc |
    | "encoder not found" | Missing codec | Install FFmpeg with full codecs |
    | Output 0 bytes | Silent failure | Check full ffmpeg output for errors |
    
    ---
    
    ## Feedback & Contributions
    
    If this skill is missing information or could be improved:
    
    - **Missing a command?** Describe what you needed
    - **Found an error?** Let me know what's wrong
    - **Want to contribute?** I can help you:
      1. Update this skill with improvements
      2. Create a PR to github.com/digitalsamba/claude-code-video-toolkit
    
    Just say "improve this skill" and I'll guide you through updating `.claude/skills/ffmpeg/SKILL.md`.
    
  • .claude/skills/qwen-edit/SKILL.md (skill, 2842 bytes)
    ---
    name: qwen-edit
    description: AI image editing prompting patterns for Qwen-Image-Edit. Use when editing photos while preserving identity, reframing cropped images, changing clothing or accessories, adjusting poses, applying style transfers, or character transformations. Provides prompt patterns, parameter tuning, and examples.
    ---
    
    # Qwen-Image-Edit Skill
    
    AI-powered image editing using Qwen-Image-Edit-2511 via RunPod serverless.
    
    **Status:** Evolving - learnings being captured as we experiment
    
    ## When to Use This Skill
    
    Use when the user wants to:
    - Edit/transform photos while preserving identity
    - Reframe cropped images (fix cut-off heads, etc.)
    - Change clothing, add accessories
    - Change pose (arm positions, hand placement)
    - Apply style transfers (cyberpunk, anime, oil painting)
    - Adjust lighting/color grading
    - Add/remove objects
    - Character transformations (Bond, Neo, etc.)
    
    ## When NOT to Use
    
    - **Background replacement (single image)** - creates cut-out artifacts, halos
    - **Face swapping** - cannot preserve identity from reference
    - **Outpainting** - can't extend canvas reliably
    
    ## Use With Care
    
    - **Multi-image compositing** - CAN work with explicit identity anchors (see examples.md for prompt patterns). Requires describing distinctive features (hair texture/color, ethnicity, outfit) and using guidance ~2.0
    - **Camera angle changes** - Inconsistent results. Vertical angles (low/high) work better than rotational (three-quarter view)
    
    ## Quick Reference
    
    ```bash
    # Basic edit
    python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses"
    
    # With negative prompt (recommended)
    python tools/image_edit.py --input photo.jpg \
      --prompt "Reframe as portrait with full head visible" \
      --negative "blur, distortion, artifacts"
    
    # Style transfer
    python tools/image_edit.py --input photo.jpg --style cyberpunk
    
    # Background (use cautiously - often fails)
    python tools/image_edit.py --input photo.jpg --background office
    
    # Higher quality
    python tools/image_edit.py --input photo.jpg --prompt "..." --steps 16 --guidance 3.0
    
    # Multi-image composite (identity-preserving)
    python tools/image_edit.py --input person.jpg background.jpg \
      --prompt "The [ethnicity] [gender] with [hair description] from first image is now in [scene] from second image. Same [features], [outfit]." \
      --negative "different ethnicity, different hair color, different face shape, generic stock photo" \
      --steps 16 --guidance 2.0
    ```
    
    ## Key Files
    
    - `prompting.md` - Prompt patterns and structure
    - `examples.md` - Good/bad examples from experiments
    - `parameters.md` - Tuning steps, guidance, negative prompts
    
    ## Tool Location
    
    `tools/image_edit.py` - CLI wrapper for RunPod endpoint
    
    ## Related Docs
    
    - `docs/qwen-edit-patterns.md` - Character transformation patterns
    - `.ai_dev/qwen-edit-research.md` - Research notes
    

README

claude-code-video-toolkit — NARRATE ▸ SCORE ▸ GENERATE ▸ COMPOSE ▸ RENDER

An AI-native video production workspace for Claude Code. Skills, commands, templates, and tools that give Claude Code everything it needs to help you create professional videos — from concept to final render.

Quick Start

```bash
git clone https://github.com/digitalsamba/claude-code-video-toolkit.git
cd claude-code-video-toolkit
python3 -m pip install -r tools/requirements.txt   # Optional: AI voiceover, image gen, music, moviepy examples
claude                                              # Open Claude Code in the toolkit
```

Then in Claude Code:

```
/setup                    # Configure cloud GPU, storage, voice (~5 min, mostly free)
/video                    # Create your first video
```

That's it. /setup walks you through everything interactively — cloud GPU provider, file transfer, voice config. /video creates a project from a template and guides you through the whole workflow.

What's free: The toolkit leans heavily on open-source AI models — voiceovers (Qwen3-TTS), image generation (FLUX.2), music (ACE-Step), and more. You deploy them to your own cloud GPU account and run them at cost. Cloudflare R2 has a generous free tier (10GB, zero egress), and Modal gives $30/month free compute on the Starter plan — more than enough for a few 5-minute videos a month.

Requirements: Node.js 18+ and Claude Code. Python 3.9+ recommended for AI tools. FFmpeg optional.

Want to skip setup and just render something?

```bash
cd examples/hello-world && npm install && npm run render
```

No API keys needed — outputs an MP4 immediately.


A Note from the Author (not AI-generated)

I've spent months painstakingly putting this toolkit together and plan to keep iterating on it. AI makes things easier, but hard work still has huge value. Every video I create is a chance for improvement — every skill, template, tool, and workflow here has been refined through that cycle. It would be wonderful if others wanted to get involved: use it, refine it, and feed what you learn back into the repo via an issue or PR.

My own use case is fairly specific: creating sprint review videos for the AI mobile development arm of Digital Samba. But the idea behind this project is a reusable toolkit for using Claude Code to autonomously generate any kind of "explainer" style video — product demos, walkthroughs, presentations, whatever you need. Autonomous video creation is a lofty ideal for such a subjective field, but we can try :)

What makes this work is that Claude Code is fantastically resourceful and flexible — give it the framing and tooling this toolkit provides, and it will adapt them to create templates and videos based on your prompting. The skills, templates, and tools here are building blocks. Claude Code is the builder. You are the director, editor, and designer.

If you're getting started, run /setup then /video and let Claude Code guide you. Or start with /template to create a template for your own use case.

Cloud GPU — I recommend Modal for running the toolkit's AI tools. The Starter plan gives you $30/month free compute, which is more than enough. RunPod is also supported as an alternative. Run /setup to deploy the tools you need.

My motto: Be brave. Experiment. And please share any videos you create or ideas you have back with the project — it helps me keep improving this toolkit for everyone.

Features

Skills

Claude Code has deep knowledge in:

| Skill | Description |
|-------|-------------|
| remotion | React-based video framework — compositions, animations, rendering |
| elevenlabs | AI audio — text-to-speech, voice cloning, music, sound effects |
| ffmpeg | Media processing — format conversion, compression, resizing |
| playwright-recording | Browser automation — record demos as video |
| frontend-design | Visual design refinement for distinctive, production-grade aesthetics |
| qwen-edit | AI image editing — prompting patterns and best practices |
| acestep | AI music generation — prompts, lyrics, scene presets, video integration |
| ltx2 | AI video generation — text-to-video, image-to-video clips, prompting guide |
| moviepy | Python video composition — overlay text on LTX-2/SadTalker output, build.py-style projects |
| runpod | Cloud GPU — setup, Docker images, endpoint management, costs |

Commands

| Command | Description |
|---------|-------------|
| /setup | First-time setup — cloud GPU, file transfer, voice, prerequisites |
| /video | Video projects — list, resume, or create new |
| /scene-review | Scene-by-scene review in Remotion Studio |
| /design | Focused design refinement session for a scene |
| /brand | Brand profiles — list, edit, or create new |
| /template | List available templates or create new ones |
| /skills | List installed skills or create new ones |
| /contribute | Share improvements — issues, PRs, examples |
| /record-demo | Record browser interactions with Playwright |
| /generate-voiceover | Generate AI voiceover from a script |
| /redub | Redub existing video with a different voice |
| /voice-clone | Record, test, and save a cloned voice to a brand |
| /versions | Check dependency versions and toolkit updates |

Note: After creating or modifying commands/skills, restart Claude Code to load changes.

Templates

Pre-built video structures in templates/:

  • sprint-review — Sprint review videos with demos, stats, and voiceover
  • sprint-review-v2 — Composable scene-based sprint review with modular architecture
  • product-demo — Marketing videos with dark tech aesthetic, stats, CTA

See examples/ for finished projects you can learn from (oldest first, showing toolkit evolution):

| Date | Demo | Description |
|------|------|-------------|
| 2025-12-05 | sprint-review-cho-oyu | iOS sprint review with demos |
| 2025-12-10 | digital-samba-skill-demo | Product demo showcasing Claude Code skill |
| 2026-01-22 | ds-remote-mcp | Remote MCP server demo (the jazz background music is a joke) |
| 2026-01-25 | schlumbergera | Android sprint review video |
| 2026-02-23 | cortina | Mobile platforms sprint review |
| 2026-03-15 | the-space-between | AI-generated video essay — flux2 avatar, Qwen3-TTS voice, SadTalker animation |
| 2026-04-08 | q2-townhall-longarm-ad | Super Bowl-style launch ad with dramatic Qwen3-TTS announcer and LTX-2 animated Lugh cameo |
| 2026-04-08 | q2-townhall-stars | GitHub star history time-lapse with animated chart and deadpan-to-excited commentary |

Scene Transitions

The toolkit includes a transitions library for scene-to-scene effects:

| Transition | Description |
|------------|-------------|
| glitch() | Digital distortion with RGB shift |
| rgbSplit() | Chromatic aberration effect |
| zoomBlur() | Radial motion blur |
| lightLeak() | Cinematic lens flare |
| clockWipe() | Radial sweep reveal |
| pixelate() | Digital mosaic dissolution |
| checkerboard() | Grid-based reveal (9 patterns) |

Plus official Remotion transitions: slide(), fade(), wipe(), flip()

Preview all transitions:

```bash
cd showcase/transitions && npm install && npm run studio
```

See lib/transitions/README.md for full documentation.

Brand Profiles

Define visual identity in brands/. When you create a project with /video, the brand's colors, fonts, and styling are automatically applied.

```
brands/my-brand/
├── brand.json    # Colors, fonts, typography
├── voice.json    # ElevenLabs voice settings
└── assets/       # Logo, backgrounds
```

Included brands: default, digital-samba

Create your own with /brand.

Project Management System

Video projects are tracked through a multi-session lifecycle:

planning → assets → review → audio → editing → rendering → complete

Each project has a project.json that tracks:

  • Scenes — What to show, asset status, visual types
  • Audio — Voiceover and music status
  • Sessions — Work history across Claude Code sessions
  • Phase — Current stage in the workflow

The system automatically reconciles intent (what you planned) with reality (what files exist), and generates a CLAUDE.md per project for instant context when resuming.
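
A sketch of the shape (field names here are illustrative, not the actual schema):

```json
{
  "_comment": "Illustrative shape only; see lib/project/README.md for the actual schema",
  "phase": "assets",
  "scenes": [
    { "id": "scene-02-problem", "visual": "slide", "assets": "pending" }
  ],
  "audio": { "voiceover": "pending", "music": "pending" },
  "sessions": [
    { "date": "2026-04-08", "summary": "Planned scenes, drafted script" }
  ]
}
```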

See lib/project/README.md for schema details, scene status tracking, and filesystem reconciliation logic.

Python Tools

Audio, video, and image tools in tools/:

```bash
# Generate voiceover (ElevenLabs)
python tools/voiceover.py --script script.md --output voiceover.mp3

# Generate voiceover (Qwen3-TTS — self-hosted, cheaper alternative)
python tools/voiceover.py --provider qwen3 --speaker Ryan --scene-dir public/audio/scenes --json
python tools/qwen3_tts.py --text "Hello world" --tone warm --output hello.mp3

# Generate background music (ElevenLabs)
python tools/music.py --prompt "Upbeat corporate" --duration 120 --output music.mp3

# Generate background music (ACE-Step — free cloud API, XL Turbo 4B model)
python tools/music_gen.py --preset corporate-bg --duration 120 --output music.mp3
python tools/music_gen.py --prompt "Dramatic cinematic" --duration 30 --bpm 90 --key "D Minor" --output reveal.mp3
python tools/music_gen.py --prompt "Upbeat indie rock" --duration 60 --variations 4 --output intro.mp3

# Generate sound effects
python tools/sfx.py --preset whoosh --output sfx.mp3

# Redub video with different voice
python tools/redub.py --input video.mp4 --voice-id VOICE_ID --output dubbed.mp4

# Add background music to existing video
python tools/addmusic.py --input video.mp4 --prompt "Subtle ambient" --output output.mp4

# Rebrand NotebookLM videos (trim outro, add your logo/URL)
python tools/notebooklm_brand.py --input video.mp4 --logo logo.png --url "mysite.com" --output branded.mp4

# AI image editing (style transfer, backgrounds, custom prompts)
python tools/image_edit.py --input photo.jpg --style cyberpunk --cloud modal
python tools/image_edit.py --input photo.jpg --prompt "Add sunglasses" --cloud modal

# AI image upscaling (2x/4x)
python tools/upscale.py --input photo.jpg --output photo_4x.png --cloud modal

# Remove watermarks (requires cloud GPU)
python tools/dewatermark.py --input video.mp4 --preset sora --output clean.mp4 --cloud modal

# Locate watermark coordinates
python tools/locate_watermark.py --input video.mp4 --grid --output-dir ./review/

# Generate talking head video from image + audio (SadTalker)
python tools/sadtalker.py --image portrait.png --audio voiceover.mp3 --output talking.mp4 --cloud modal

# AI image generation (FLUX.2 Klein 4B — text-to-image + editing)
python tools/flux2.py --prompt "A sunset over mountains" --cloud modal
python tools/flux2.py --preset title-bg --brand digital-samba --cloud modal
python tools/flux2.py --list-presets

# AI video generation (LTX-2.3 22B — text-to-video + image-to-video)
python tools/ltx2.py --prompt "A sunset over the ocean, cinematic" --cloud modal
python tools/ltx2.py --prompt "Gentle camera drift" --input photo.jpg --cloud modal

Tool Categories:

| Type | Tools | Purpose |
|------|-------|---------|
| Project | voiceover, music, music_gen, sfx | Used during video creation workflow |
| Utility | redub, addmusic, notebooklm_brand, locate_watermark | Quick transformations, no project needed |
| Cloud GPU | image_edit, upscale, dewatermark, sadtalker, qwen3_tts, flux2, music_gen, ltx2 | AI processing via Modal or RunPod |

Cloud GPU (Modal + RunPod)

8 AI tools run on cloud GPUs. Use --cloud modal (recommended) or --cloud runpod on any tool.

| Tool | What It Does | Est. Cost |
|------|--------------|-----------|
| qwen3_tts | AI text-to-speech (9 speakers, voice cloning) | ~$0.01 |
| flux2 | AI image generation & editing | ~$0.02 |
| image_edit | AI image editing & style transfer | ~$0.03 |
| upscale | AI image upscaling (2x/4x) | ~$0.01 |
| music_gen | AI music generation (8 scene presets) | Free (acemusic) / ~$0.05 (self-hosted) |
| sadtalker | Talking head video from portrait + audio | ~$0.10 |
| ltx2 | AI video generation (text-to-video, image-to-video) | ~$0.23 |
| dewatermark | Video watermark removal | ~$0.10 |

Modal (recommended): Each tool deploys from docker/modal-*/app.py — Modal builds and hosts the containers. $30/month free compute on the Starter plan, typical usage is $1-2/month. Run /setup to deploy all tools automatically.

RunPod (alternative): Uses pre-built Docker images from ghcr.io/conalmullan/video-toolkit-*. Pay-per-second, no minimums. Run python3 tools/<tool>.py --setup to create endpoints.

See docs/modal-setup.md and docs/runpod-setup.md for details.

Project Structure

```
claude-code-video-toolkit/
├── .claude/
│   ├── skills/          # Domain knowledge for Claude
│   └── commands/        # Slash commands (/video, /brand, etc.)
├── lib/                 # Shared components, theme system, utilities
│   ├── components/      # Reusable video components (11 components)
│   ├── transitions/     # Scene transition effects (7 custom + 4 official)
│   ├── theme/           # ThemeProvider, useTheme
│   └── project/         # Multi-session project system
├── tools/               # Python CLI tools
├── templates/           # Video templates
├── brands/              # Brand profiles
├── projects/            # Your video projects (gitignored)
├── examples/            # Curated showcase projects with finished videos
├── assets/              # Shared assets
├── playwright/          # Recording infrastructure
├── docs/                # Documentation
└── _internal/           # Toolkit metadata & roadmap
```

Documentation

Video Workflow

/video → Script → Assets → Scene Review → Design → Audio → Preview → Render
  1. Create project — Run /video, choose template and brand
  2. Review script — Edit VOICEOVER-SCRIPT.md to plan content and assets
  3. Gather assets — Record demos with /record-demo or add external videos
  4. Scene review — Run /scene-review to verify visuals in Remotion Studio
  5. Design refinement — Use /design to improve slide visuals with the frontend-design skill
  6. Generate audio — AI voiceover with /generate-voiceover
  7. Configure — Update config file with asset paths and timing
  8. Preview — npm run studio for live preview
  9. Iterate — Work with Claude Code to adjust timing, styling, content
  10. Render — npm run render for final MP4

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

License

MIT License — see LICENSE for details.


Built for use with Claude Code by Anthropic.