
Kling 3.0 Motion Control

Start Here: Record Yourself First

Before anything else, you need a source video of yourself. This is the puppet master — the motion that Kling 3.0 extracts and applies to whatever character you choose.

⌨️ Ready to build? All the scripts, skills, and blueprints to run these models are inside the library. 👉 Join the Skool Community: https://www.skool.com/augmented-ai-automations-1536/about 👉 Need Bespoke AI Automations: https://calpro.augmentedstartups.com/

I recorded a simple talking-head video. Nothing fancy. Just me, decent lighting, and a clean background. The key is that your body movement — head turns, gestures, expressions — becomes the driver for the final character. The better your source video, the more natural the puppet looks.

Keep it conversational. Move naturally. Don't stiffen up or over-rehearse; loose, natural motion translates better than precise, practiced movement.


Step 1 — Build Your Image Generation Dashboard (Vibe Coding)

Rather than generating each celebrity image one by one in a manual UI, I built an automation dashboard using Cursor. This is what people are calling vibe coding — you describe what you want, Cursor builds it, you iterate until it works.

The whole point of the dashboard is to lock in a base prompt that never changes. You only swap out the celebrity name and outfit per row. No re-typing. No copy-pasting prompts across tabs. One table, one click, all generations dispatched.

The Base Prompt

Every image generation uses this same base prompt — only {celebrity} and {outfit} change:

📋 Transform the person in the reference image to look exactly like {celebrity}, wearing {outfit}. Keep the exact same background, same room, same position. The subject is forward-facing, looking directly into the camera. Ultra-realistic photographic quality, professional portrait photography, photorealistic skin texture, highly detailed facial features matching {celebrity}'s likeness precisely. Natural lighting consistent with the original scene.
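If you're vibe-coding this yourself, the templating logic is only a few lines. Here's a minimal Python sketch of the idea: the base prompt is frozen, and each dashboard row only supplies its own {celebrity} and {outfit}. The two outfits not named in this guide are placeholder examples.

```python
# Minimal sketch of the locked base prompt with per-row substitutions.
# The rows and the final dispatch are illustrative placeholders; your
# dashboard would wire these into the real Sjinn API call.

BASE_PROMPT = (
    "Transform the person in the reference image to look exactly like "
    "{celebrity}, wearing {outfit}. Keep the exact same background, same "
    "room, same position. The subject is forward-facing, looking directly "
    "into the camera. Ultra-realistic photographic quality, professional "
    "portrait photography, photorealistic skin texture, highly detailed "
    "facial features matching {celebrity}'s likeness precisely. Natural "
    "lighting consistent with the original scene."
)

ROWS = [
    {"celebrity": "Emily Ratajkowski", "outfit": "a casual white t-shirt"},  # outfit is a placeholder
    {"celebrity": "Tony Stark (Robert Downey Jr.)", "outfit": "a tailored suit"},
    {"celebrity": "Steve Jobs", "outfit": "a black turtleneck"},
    {"celebrity": "Black Widow", "outfit": "her tactical suit"},  # outfit is a placeholder
]

for row in ROWS:
    prompt = BASE_PROMPT.format(**row)
    print(prompt[:80], "...")  # in the real dashboard, dispatch this to the API
```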


Step 2 — Add Your Celebrities and Outfits

With the dashboard set up, I added my list of characters. For this guide I used:

  • Emily Ratajkowski
  • Tony Stark — Robert Downey Jr.
  • Steve Jobs (black turtleneck)
  • Black Widow

The dashboard also has an outfit field per row. Each celebrity gets their signature look. Tony Stark gets the suit. Steve Jobs gets the turtleneck. Each row = one generation. You can add as many as you want — the API handles them all.


Step 3 — Auto-Generate All Images via Sjinn

Because Sjinn has a proper API, the dashboard calls it directly. One click. Every celebrity image generates simultaneously — no waiting, no clicking through a manual UI, no browser tab management.

This is the key difference versus tools like Higgsfield or other platforms that require you to manually click generate, wait, download, repeat — one image at a time. With the Sjinn API approach, you dispatch all generations at once and collect the results when they're ready. Four celebrities = four API calls = done in the same time it used to take to do one.
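I won't reproduce Sjinn's actual API here, so treat this as a hedged sketch: the endpoint URL, payload fields, and API-key variable are stand-ins for whatever their docs specify. The pattern is what matters: fire every row's request at once from a thread pool instead of clicking through a UI.

```python
# One-click parallel dispatch. SJINN_URL, the payload shape, and the
# SJINN_API_KEY env var are assumptions; swap in the real details from
# Sjinn's API docs.
import os
from concurrent.futures import ThreadPoolExecutor

import requests

SJINN_URL = "https://api.sjinn.example/v1/generate"  # hypothetical endpoint
HEADERS = {"Authorization": f"Bearer {os.environ.get('SJINN_API_KEY', '')}"}

def dispatch(prompt: str) -> dict:
    """Fire one image-generation request and return the job descriptor."""
    resp = requests.post(
        SJINN_URL,
        headers=HEADERS,
        json={"prompt": prompt, "reference_image": "reference.png"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

# One prompt per dashboard row, built from the base prompt above.
prompts = [
    "Transform the person in the reference image to look exactly like "
    "Steve Jobs, wearing a black turtleneck. ...",
    "Transform the person in the reference image to look exactly like "
    "Tony Stark (Robert Downey Jr.), wearing a tailored suit. ...",
]

with ThreadPoolExecutor(max_workers=8) as pool:
    jobs = list(pool.map(dispatch, prompts))  # all rows dispatched at once
print(f"dispatched {len(jobs)} generations")
```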


Step 4 — Set Your Reference Frame

Once the images were generated, I took a screenshot of one of the outputs and used it as the reference image in the Kling 3.0 prompt.

The critical instruction in the prompt: do not change the background. Only change the character. The background stays consistent across all clips — same room, same lighting, same position. Only the person changes. This is what makes the final edited video look seamless when you cut between celebrities.


Step 5 — Clip Your Source Video

In any standard video editor, I manually cut my source video into sections, one per celebrity. I was deliberate about where the cuts landed: the timing and the natural movement at each cut point make a big difference to how smooth the final result looks.

If you want to automate this step, it's entirely possible — as long as your video has a consistent cadence. For example, if you want to swap character every 5 seconds, you can build a simple script that auto-splits the video at fixed intervals and feeds each clip to the corresponding celebrity image. You could vibe-code this directly into your dashboard.

⚠️ Important: Kling requires each video clip to be longer than 3 seconds. Keep that in mind when cutting. Very short clips will be rejected.
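If you do automate the split, ffmpeg's segment muxer does the heavy lifting. A minimal sketch, assuming ffmpeg and ffprobe are installed and a fixed 5-second cadence. Note that stream-copied segments snap to keyframes, so durations are approximate, which is exactly why the script re-checks each clip against the 3-second minimum.

```python
# Auto-split a source video into fixed-length clips with ffmpeg, then
# drop any clip that falls under Kling's 3-second minimum.
# Assumes ffmpeg and ffprobe are installed and on PATH.
import json
import subprocess
from pathlib import Path

SEGMENT_SECONDS = 5
MIN_CLIP_SECONDS = 3.0  # Kling rejects clips that are too short

def duration_of(path: Path) -> float:
    """Read a clip's duration in seconds via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", str(path)],
        capture_output=True, check=True, text=True,
    )
    return float(json.loads(out.stdout)["format"]["duration"])

def split_video(src: str, out_dir: str = "clips") -> list[Path]:
    Path(out_dir).mkdir(exist_ok=True)
    # -c copy splits at keyframes, so segment lengths are approximate.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-c", "copy", "-f", "segment",
         "-segment_time", str(SEGMENT_SECONDS), "-reset_timestamps", "1",
         f"{out_dir}/clip_%03d.mp4"],
        check=True,
    )
    clips = sorted(Path(out_dir).glob("clip_*.mp4"))
    return [c for c in clips if duration_of(c) > MIN_CLIP_SECONDS]

if __name__ == "__main__":
    kept = split_video("source_video.mp4")
    print(f"{len(kept)} usable clips")
```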

A note on cost: you could convert your entire source video for every celebrity, but that gets expensive fast. Be selective. Pick your best segments — the moments where your movement is natural and expressive.


Step 6 — Motion Control via Sjinn

With reference images and video clips ready, I went back to Sjinn to run the motion control step. For this round I did it manually — uploading each video clip with its matching celebrity image.

I spoke to Jax, the founder of Sjinn, and he confirmed automated motion control is being added to the platform within a day. When that's live, the entire pipeline from images to animated clips becomes one-click automated.

You could also run the motion control step via fal.ai — both work well. I used Sjinn because I had credits to use up and the output quality is solid.

The future version of this dashboard auto-splits your source video AND auto-feeds each clip to its corresponding celebrity image for motion control. Video in, finished clips out. Fully automated. The pieces are all there; they just need to be connected.
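The glue itself is short. Here's a hedged sketch of the pairing step, assuming a motion-control endpoint that accepts one video clip plus one reference image per job; the URL and field names are placeholders, not Sjinn's or fal.ai's real API.

```python
# Sketch of the missing glue: pair each clip with its celebrity image
# and dispatch one motion-control job per pair. The endpoint and payload
# are hypothetical placeholders, not the real Sjinn or fal.ai API.
import requests

MOTION_URL = "https://api.sjinn.example/v1/motion-control"  # hypothetical

clips = ["clips/clip_000.mp4", "clips/clip_001.mp4",
         "clips/clip_002.mp4", "clips/clip_003.mp4"]
images = ["emily.png", "tony_stark.png", "steve_jobs.png", "black_widow.png"]

# The two lists must line up: clip N drives celebrity image N.
for clip, image in zip(clips, images):
    with open(clip, "rb") as v, open(image, "rb") as i:
        resp = requests.post(
            MOTION_URL,
            files={"video": v, "reference_image": i},
            timeout=300,
        )
    resp.raise_for_status()
    print(clip, "->", image, resp.json().get("job_id"))
```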


Step 7 — Connect the Dashboard to OpenClaw 🦞

The final piece I added was exposing the dashboard as API endpoints and connecting them to OpenClaw 🦞. Now my AI agents can trigger celebrity image generations directly, without me touching the dashboard at all.

I tell OpenClaw 🦞 which celebrity I want and what they're wearing. It calls the dashboard. The dashboard calls Sjinn. The image comes back. I spend zero time in any UI.
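To make "exposing the dashboard as API endpoints" concrete, here's a minimal sketch using FastAPI. The framework choice, the route, and the downstream dispatch call are all my assumptions, not the exact setup from the video.

```python
# Minimal sketch of exposing the dashboard as an endpoint an agent can hit.
# FastAPI is an illustrative choice; the route and the downstream Sjinn
# call are hypothetical placeholders.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    celebrity: str
    outfit: str

@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    # Lock in the base prompt; only the two fields vary per request.
    prompt = (
        f"Transform the person in the reference image to look exactly like "
        f"{req.celebrity}, wearing {req.outfit}. Keep the exact same "
        f"background, same room, same position. ..."
    )
    # job = dispatch_to_sjinn(prompt)  # your real Sjinn API call goes here
    return {"status": "dispatched", "prompt": prompt}
```

Run it with uvicorn (uvicorn main:app, assuming the file is main.py), point OpenClaw 🦞 at the /generate route, and an agent can trigger a generation with a single POST request.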

This is the shift from a fast manual process to a fully automated content engine. The first version takes an afternoon to build. After that, generating celebrity content is a voice command.


Where This Gets Useful for Business

The celebrity examples are the pattern interrupt — they stop the scroll. But the real applications go well beyond entertainment:

  • Training videos with a consistent branded presenter — no camera setup, no scheduling, no studio
  • Personalised sales videos at scale — same script, different character adapted for different audience segments
  • Internal communications that actually get watched — swap your avatar into a character your team responds to
  • Product demos with a controlled, consistent avatar — no bad hair days, no retakes, no travel
  • Multilingual content — same character, different voiceover, different market, produced automatically

Learn the Full Workflow — Live, With Me

If you want to build this end-to-end — from vibe-coding the dashboard to Kling 3.0 motion control to connecting it all to OpenClaw 🦞 — we cover it live inside the Corporate Automation Library (CAL).

You get the full tutorial — every step shown, every prompt shared, every tool explained. Plus live group Q&A sessions where you can ask questions as you build and get unblocked fast.

We don't just cover video automation. The community covers the full corporate automation stack — n8n workflows that eliminate repetitive admin, OpenClaw 🦞 agents that run your operations, content pipelines that produce while you sleep, and the automation thinking that makes all of it compound over time.

105 members already in. 4 founding spots left at $19/month — price goes to $29 at 200 members.

👉 Join CAL here — lock in your founding rate before it closes.


Ritesh Kanjee — Augmented AI
