title: "Getting Wan2.2 to Actually Run: A Fox, a Single Frame, and a 4 GB Wall"
slug: "wan22-video-generation-debugging"
published: 2026-05-31
tags: [local-ai, video-generation, wan2.2, stable-diffusion-cpp, vulkan, loch-nessh, amd, cortex]
summary: "Every video request was returning a 1-frame WebM. Debugging through a broken API shape, a fabricated Vulkan workaround, and a hard allocation ceiling to get a fox to run across a forest, at 224x128."
menu_title: "Getting Wan2.2 to Run"
series: "cortex-fleet"
series_part: 4
draft: false

Getting Wan2.2 to Actually Run: A Fox, a Single Frame, and a 4 GB Wall

I'd had the Wan2.2 model loaded on ness-linux3 for a few days. The deployment manifest was in git, the weights were on disk, loch-nessh would load the pod on demand. Everything looked right. But every video generation request came back the same way: a tiny WebM file, correct dimensions, VP8 codec — and exactly one frame.

This is the story of finding out why, and what it took to get past it.


The symptom

/tmp/fox-10s.webm — 65 KB, 512×512, 16 fps, 1 frame. Duration: 0.001 seconds.

$ ffprobe -v quiet -select_streams v:0 -count_frames \
    -show_entries stream=nb_read_frames,r_frame_rate,width,height \
    -of default=noprint_wrappers=1 /tmp/fox-10s.webm

width=512
height=512
r_frame_rate=16/1
nb_read_frames=1

The filename said 10 seconds. The file said one frame. Something was wrong before the model even started generating.
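The same check is easy to automate. The sketch below parses the `key=value` lines that ffprobe prints with `-of default=noprint_wrappers=1` and flags a suspiciously short result. The function names and the two-frame threshold are my own illustration, not part of loch-nessh.

```rust
use std::collections::HashMap;

/// Parse ffprobe's `key=value` lines into a map; lines without '=' are skipped.
fn parse_ffprobe(output: &str) -> HashMap<String, String> {
    output
        .lines()
        .filter_map(|line| line.trim().split_once('='))
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}

/// Return the decoded frame count, if ffprobe reported one.
fn frame_count(fields: &HashMap<String, String>) -> Option<u64> {
    fields.get("nb_read_frames")?.parse().ok()
}

fn main() {
    // Sample input copied from the ffprobe output above.
    let output = "width=512\nheight=512\nr_frame_rate=16/1\nnb_read_frames=1\n";
    let fields = parse_ffprobe(output);

    // Hypothetical threshold: anything under 2 frames is not really a video.
    match frame_count(&fields) {
        Some(n) if n >= 2 => println!("ok: {n} frames"),
        Some(n) => println!("suspicious: only {n} frame(s)"),
        None => println!("no frame count in ffprobe output"),
    }
}
```

Piping real ffprobe output into something like this after every generation would have caught the one-frame files on day one.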


Bug one: the request shape

loch-nessh translates OpenAI-format image/video requests into the sdcpp vid_gen HTTP API. When I looked at what it was actually sending, I found this in executor.rs:

let vid_body = serde_json::json!({
    "prompt": req.get("prompt")...,
    "sample_params": {
        "width":        width,
        "height":       height,
        "video_frames": req.get("video_frames")...unwrap_or(33),
        "fps":          ...,
        "sample_steps": ...,
        "cfg_scale":    ...,
    },
    "output_format": "webm",
});

The parameters were wrapped inside a sample_params object. I tested a job directly against the sdcpp pod to find out what it actually expected:

# nested — always 1 frame
curl .../sdcpp/v1/vid_gen -d '{"prompt":"test","sample_params":{"video_frames":5,...}}'

# flat — correct frame count
curl .../sdcpp/v1/vid_gen -d '{"prompt":"test","video_frames":5,"width":256,"height":256,...}'

Five frames with flat params, one frame with nested params. The sdcpp server accepted the nested body without error — it just silently fell back to its default of one frame when it didn't find video_frames at the top level. The fix was straightforward: unwrap the object, hoist everything to the root.
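As a rough illustration of the hoist, here is a std-only sketch. The real executor uses serde_json; the `Value` enum below is a stand-in so the example compiles without crates, and `hoist_sample_params` is a hypothetical name, not the function in executor.rs.

```rust
use std::collections::BTreeMap;

/// Tiny stand-in for a JSON value so the sketch needs no external crates.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Num(i64),
    Str(String),
    Obj(BTreeMap<String, Value>),
}

type Object = BTreeMap<String, Value>;

/// Move the fields of a nested `sample_params` object to the top level.
/// A key that already exists at the top level wins over the nested one.
fn hoist_sample_params(mut body: Object) -> Object {
    match body.remove("sample_params") {
        Some(Value::Obj(params)) => {
            for (key, value) in params {
                body.entry(key).or_insert(value);
            }
        }
        // Present but not an object: put it back untouched.
        Some(other) => {
            body.insert("sample_params".to_string(), other);
        }
        None => {}
    }
    body
}

fn main() {
    let mut params = Object::new();
    params.insert("video_frames".to_string(), Value::Num(5));
    params.insert("width".to_string(), Value::Num(256));
    params.insert("height".to_string(), Value::Num(256));

    let mut body = Object::new();
    body.insert("prompt".to_string(), Value::Str("test".to_string()));
    body.insert("sample_params".to_string(), Value::Obj(params));

    let flat = hoist_sample_params(body);
    println!("{flat:?}");
}
```

In the actual fix it is simpler still: build the body flat in the first place, so there is nothing to hoist.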

I fixed it, rebuilt, pushed, redeployed. Tested a small case. Worked: 5 frames, 256×256, confirmed with ffprobe. Great.


The image

Before going further, here's the fox that started this whole investigation. It was generated with FLUX.2-dev earlier in the session, a still image that became the conceptual target for the video:

[Image: fox still generated with FLUX.2-dev]


Bug two: the 4 GB wall

With the API shape fixed, I tried to generate an actual 10-second video. 160 frames at 16 fps, 832×480 — the native Wan2.2 resolution. The pod received the request, logged:

[WARN] stable-diffusion.cpp — align video frames from 160 to 157 for Wan 2.x
[INFO] generate_video 832x480x157
ggml_vulkan: Device memory allocation of size 31539456000 failed.
ggml_vulkan: Requested buffer size exceeds device buffer size limit: ErrorOutOfDeviceMemory
[ERROR] Wan2.x-T2V-14B: failed to allocate the compute buffer

31.5 GB requested. Failed immediately. I checked the Vulkan device properties inside the pod:

maxMemoryAllocationSize = 0xfffffffc   # 4 GB

There it is. The RADV driver on gfx1151 (the iGPU in the Ryzen AI Max+ 395) reports a hard ceiling of just under 4 GiB per Vulkan memory allocation. This isn't a total VRAM limit (the hardware has 96 GB of unified memory pinned to the GPU); it's a per-allocation limit that the driver advertises through the Vulkan API.
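To put the two numbers side by side, this small sketch decodes the hex limit and compares it with the size from the failed allocation. Both constants come from the output above; `parse_hex_u64` is just a helper written for this example.

```rust
/// Parse a hex literal such as "0xfffffffc" into a u64.
fn parse_hex_u64(text: &str) -> Option<u64> {
    let trimmed = text.trim();
    let digits = trimmed
        .strip_prefix("0x")
        .or_else(|| trimmed.strip_prefix("0X"))?;
    u64::from_str_radix(digits, 16).ok()
}

fn main() {
    let limit = parse_hex_u64("0xfffffffc").expect("valid hex literal");
    let requested: u64 = 31_539_456_000; // size from the ggml_vulkan log line
    let gib = (1u64 << 30) as f64;

    println!("limit     = {limit} bytes ({:.2} GiB)", limit as f64 / gib);
    println!("requested = {requested} bytes ({:.2} GiB)", requested as f64 / gib);
    println!("ratio     = {:.1}x", requested as f64 / limit as f64);
}
```

The limit decodes to 4,294,967,292 bytes, four bytes short of a full 4 GiB.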

ggml's video attention mechanism allocates a single contiguous buffer for the spatial-temporal attention computation. The size of that buffer scales roughly as:

buffer_size ∝ seq²
seq = (width/8) × (height/8) × ((frames-1)/4 + 1)

For 832×480 at 33 frames: seq = 104 × 60 × 9 = 56,160 → buffer ~31 GB. For 157 frames: seq = 104 × 60 × 40 = 249,600 → buffer ~587 GB. Even the default 33 frames at native resolution exceed the ceiling by about 7×.

I dropped resolution and recalculated. The relationship held across every test:

Resolution   Frames   Buffer needed   Result
832×480      33       31 GB           OOM
832×480      157      587 GB          OOM
256×256      161      18.9 GB         OOM
224×128      161      3.3 GB          OK
256×256      5        ~44 MB          OK

224×128 at 161 frames fits. Barely — 3.3 GB against a 4 GB ceiling. But it fits.
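The scaling rule is simple enough to turn into a pre-flight check. The sketch below uses the seq formula from above with integer division. The 10-bytes-per-element constant is an assumption on my part: the one allocation size in the log, 31,539,456,000 bytes, equals 10 × 56,160², and 56,160 is the sequence length for 832×480 at 33 frames. It is fitted to that single number, not read from ggml's source, so the estimates only approximate the table.

```rust
/// Sequence length: seq = (width/8) * (height/8) * ((frames-1)/4 + 1),
/// using integer division throughout.
fn seq_len(width: u64, height: u64, frames: u64) -> u64 {
    (width / 8) * (height / 8) * (frames.saturating_sub(1) / 4 + 1)
}

/// ASSUMPTION: 10 bytes per seq^2 element, fitted to the one logged
/// allocation (31_539_456_000 == 10 * 56_160^2). Not from ggml's source.
const BYTES_PER_ELEMENT: u64 = 10;

/// maxMemoryAllocationSize reported by the driver (0xfffffffc).
const MAX_ALLOCATION: u64 = 0xffff_fffc;

/// Rough size of the attention buffer for one request.
fn estimated_bytes(width: u64, height: u64, frames: u64) -> u64 {
    let seq = seq_len(width, height, frames);
    BYTES_PER_ELEMENT * seq * seq
}

/// True if the estimate stays under the per-allocation ceiling.
fn fits(width: u64, height: u64, frames: u64) -> bool {
    estimated_bytes(width, height, frames) <= MAX_ALLOCATION
}

fn main() {
    let cases = [
        (832, 480, 33),
        (832, 480, 157),
        (256, 256, 161),
        (224, 128, 161),
        (256, 256, 5),
    ];
    for (w, h, f) in cases {
        let bytes = estimated_bytes(w, h, f);
        println!(
            "{w}x{h} @ {f} frames: seq={} est={:.2} GB fits={}",
            seq_len(w, h, f),
            bytes as f64 / 1e9,
            fits(w, h, f)
        );
    }
}
```

A check like this in the executor could reject an oversized request in milliseconds instead of loading a 38 GB pod to find out.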


The detour: a document with a made-up flag

At some point during the debugging, a document appeared with the heading "Force Vulkan Memory Splitting" and this suggestion:

GGML_VK_FORCE_CONCURRENT=1 ./sd-cli ...

The claim was that this environment variable would split ggml's Vulkan compute allocations across multiple physical buffers, bypassing the 4 GB ceiling. It cited GitHub issues, Reddit threads, StackOverflow — the full bibliography.

I checked:

strings /app/sd-server | grep -i "FORCE_CONCURRENT\|VK_FORCE\|GGML_VK\|GGML_VULKAN"
# (only GGML_VULKAN_DEVICE appears)

Not in the binary. The string doesn't exist anywhere in the compiled sd-server. The document was AI-generated, and that specific flag was fabricated. The other suggestion in the document — --vae-tiling — is real and confirmed in the sd-server --help output. But the memory splitting flag is fiction.
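The reasoning behind that check: an environment variable a program reads has to appear as a literal string somewhere in the binary, so a byte search settles the question. Here is the same idea as a small Rust sketch. Unlike `grep -i` it is case-sensitive, and the `check_flag` name in the usage line is made up.

```rust
use std::{env, fs};

/// True if `needle` occurs anywhere in `haystack` as a contiguous byte run.
/// An empty needle never matches (and `windows(0)` would panic).
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

fn main() {
    // Example: check_flag /app/sd-server GGML_VK_FORCE_CONCURRENT
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: check_flag <binary> <string>");
        return;
    }
    match fs::read(&args[1]) {
        Ok(data) => {
            let verdict = if contains_bytes(&data, args[2].as_bytes()) {
                "found"
            } else {
                "not found"
            };
            println!("{verdict}: {}", args[2]);
        }
        Err(e) => eprintln!("cannot read {}: {e}", args[1]),
    }
}
```

It is a thirty-second test, and worth running on any flag that arrives with a long bibliography and no source link.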

The real fix for the 4 GB limit should be a ROCm/HIP rebuild of stable-diffusion.cpp (cmake -DSD_HIP=ON, or -DSD_HIPBLAS=ON as upstream's README has spelled it; check the CMakeLists.txt of the checkout). ROCm treats the GPU memory as a unified contiguous pool and doesn't impose Vulkan's per-allocation ceiling. That's the next step. For now: 224×128.


The generation

curl -X POST http://ness-linux3:32100/v1/images/generations \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "wan22-video",
    "prompt": "a fox running through a forest, cinematic",
    "size": "224x128",
    "video_frames": 161,
    "fps": 16
  }'

Submitted. The pod loaded (38 GB VRAM reserved). HighNoise pass: 520 seconds. LowNoise pass: 520 seconds. VAE decode: 11 seconds. Total: 17 minutes and 32 seconds for a 10-second video at 224×128.

$ ffprobe -v quiet -select_streams v:0 -count_frames \
    -show_entries stream=nb_read_frames,r_frame_rate,width,height \
    -of default=noprint_wrappers=1 /tmp/fox10s_224x128.webm

width=224
height=128
r_frame_rate=16/1
nb_read_frames=161

161 frames. 10.06 seconds. The fox runs.
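Why 161 and not 160? The log line "align video frames from 160 to 157" and the (frames-1)/4 + 1 term in the buffer formula both suggest that Wan 2.x wants frame counts of the form 4k + 1. The sketch below encodes that inference along with the duration arithmetic. The rounding rule is my reading of the log, not something taken from the stable-diffusion.cpp source.

```rust
/// Round a requested frame count down to the nearest 4k + 1 (minimum 1).
/// ASSUMPTION: inferred from the log line "align video frames from 160 to 157";
/// the real rule lives in stable-diffusion.cpp and may differ in detail.
fn align_wan_frames(requested: u32) -> u32 {
    if requested == 0 {
        return 1;
    }
    (requested - 1) / 4 * 4 + 1
}

/// Clip duration in seconds for a given frame count and frame rate.
fn duration_secs(frames: u32, fps: u32) -> f64 {
    frames as f64 / fps as f64
}

fn main() {
    for requested in [160u32, 161, 33, 5] {
        let aligned = align_wan_frames(requested);
        println!(
            "{requested} -> {aligned} frames, {:.4} s at 16 fps",
            duration_secs(aligned, 16)
        );
    }
}
```

Asking for 160 frames would have been trimmed to 157, which is 9.8 seconds; 161 survives alignment and lands just over 10.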


What's next

The resolution ceiling is a solvable problem. ness-linux3 has /dev/kfd mounted in all pods, so ROCm is already accessible. A Dockerfile.sd-hip with cmake -DSD_HIP=ON and HSA_OVERRIDE_GFX_VERSION=11.0.0 for gfx1151 compatibility should lift the constraint entirely. No BIOS changes, no GRUB changes: ROCm and Vulkan run side by side, so the LLM pods don't need to change.

There's also the question of I2V (image-to-video) continuation — generating a sequence of connected clips rather than one low-resolution shot. That requires the Wan2.2-I2V-A14B weights, which aren't downloaded yet. But the plumbing is there.

For now: the fox runs. It runs at 224×128, it takes 17 minutes, and it took most of a day to get there. But it runs.


The fixes, in summary

  1. loch-nessh executor: sample_params wrapper removed; vid_gen params are now flat top-level fields (commit 0651e30)
  2. Vulkan ceiling: maxMemoryAllocationSize = 4 GB on RADV gfx1151; buffer scales as seq²; practical limit for 10s@16fps is 224×128
  3. Fabricated workaround: GGML_VK_FORCE_CONCURRENT is not a real flag — do not use
  4. Real fix pending: ROCm/HIP rebuild of sd-server