Simple ComfyUI Workflow for WAN2.1 Image-to-Video (i2v) Using GGUF Models

Creating high-quality AI-generated videos from images has become more accessible with the latest advances in video diffusion models. WAN2.1 is a powerful model for generating smooth motion from a single image, and when paired with ComfyUI, it offers a highly customizable and efficient workflow.

In this guide, we’ll walk through a simple ComfyUI workflow for WAN2.1 image-to-video generation using GGUF models, making it possible to run on lower-power GPUs without sacrificing quality. Based on the official WAN2.1 workflow, this setup is optimized for ease of use while maintaining flexibility for creative control.

Models Download

  • GGUF Models: The 480p WAN2.1 i2v models are available here. I have an RTX 3090 with 24GB VRAM and used wan2.1-i2v-14b-480p-Q8_0.gguf. If you have less VRAM, use one of the smaller variants such as Q5 or Q6. The model goes in the ComfyUI\models\unet directory. The 720p WAN2.1 i2v models are here.
  • Text Encoder: Download umt5_xxl_fp8_e4m3fn_scaled.safetensors and place it in ComfyUI\models\text_encoders
  • Clip Vision: Download clip_vision_h.safetensors and place it in ComfyUI\models\clip_vision
  • VAE: Download wan_2.1_vae.safetensors and place it in ComfyUI\models\vae (a quick file-placement check is sketched right after this list)
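
If you want to double-check the downloads before loading the workflow, here is a minimal sketch that verifies each file landed in the right folder. The base path is an assumption; point it at your own ComfyUI install:

    from pathlib import Path

    # Assumed install location; adjust for your setup.
    base = Path(r".\ComfyUI\models")

    # One expected file per folder, per the download list above.
    expected = {
        "unet": "wan2.1-i2v-14b-480p-Q8_0.gguf",
        "text_encoders": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
        "clip_vision": "clip_vision_h.safetensors",
        "vae": "wan_2.1_vae.safetensors",
    }

    for folder, filename in expected.items():
        path = base / folder / filename
        print(("OK      " if path.exists() else "MISSING ") + str(path))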

Installation

  • Update your ComfyUI to the latest version.
  • Drag the full-size workflow image onto your ComfyUI canvas.
  • Use ComfyUI Manager to install any missing nodes.

Nodes

This node loads the GGUF model.

Use this one instead if you want to use the original (non-GGUF) diffusion model. Connect it to the ModelSamplingSD3 node.

This node loads the CLIP model.

This specifies the shift. Use 3 for the 480p model and 5 for the 720p model.

Positive prompt and negative prompt. You can use ChatGPT to help you write the prompt if you are not sure how to describe the motion. The negative prompt is from Wan2.1’s repository.

Specify the size and length here. Length is the number of frames to generate. At the default frame rate of 16, 81 frames is about 5 seconds.
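
As a quick check of the math (duration = frames ÷ frame rate):

    # Clip duration in seconds = frames / frame rate
    frames = 81
    fps = 16
    print(frames / fps)  # 5.0625 -> about 5 seconds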

Sampler node. 20 steps is enough for a smooth video. The default cfg is 5, and the default sampler_name is uni_pc.

This loads the VAE model.

The default frame rate is 16. crf controls the quality of the generated video; lower values generally mean higher quality, but don’t go below 17.

 

Default Settings Summary

Looking through the code in the WAN2.1 repository, here are some of its default settings (collected into a small snippet right after this list).

  • Frame Rate: 16
  • Shift: 5 for 720p and 3 for 480p
  • cfg: 5
  • Sampler: uni_pc
  • Negative prompt (keep the Chinese text as-is in the node): 色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走 (roughly: garish colors, overexposed, static, blurry details, subtitles, style, artwork, painting, image, still, overall gray, worst quality, low quality, JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, many people in the background, walking backwards)
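
If you script ComfyUI (for example through its HTTP API), it can be handy to keep these defaults in one place. A minimal sketch; the dictionary layout is my own illustration, not an official API:

    # WAN2.1 defaults, collected from the list above.
    WAN21_DEFAULTS = {
        "frame_rate": 16,
        "shift": {"480p": 3, "720p": 5},
        "cfg": 5,
        "sampler_name": "uni_pc",
    }

    def shift_for(resolution: str) -> int:
        """Return the ModelSamplingSD3 shift for a given model resolution."""
        return WAN21_DEFAULTS["shift"][resolution]

    print(shift_for("480p"))  # -> 3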

 

Examples

480p Q8_0 model, 336×592, 49 frames, 20 steps

480×848, 49 frames, 20 steps

Prompt: The girl uses one hand to gently touch her hair. she tilts her head gently. her eyes blink naturally. the camera slowly zooms in for a soft-focus close-up

336×592, 49 frames, 20 steps

480×848, 49 frames, 20 steps

Prompt: The girl walks forward slowly. the camera zooms in slowly.

720p Q5_0 model, 720×1280, 49 frames, 20 steps

The prompt is the same as the first example.

Here are the running times on my RTX 3090:

i2v 480p Q8_0 model

  • 480 × 608, 33 frames, 20 steps: 4:42
  • 480 × 608, 81 frames, 30 steps: 22:44
  • 336 × 592, 49 frames, 20 steps: 4:37

i2v 720p Q5_0 model

  • 720 × 1280, 49 frames, 20 steps: 44:45
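
As a rough rule of thumb from these numbers, runtime grows with pixels × frames × steps, and a bit faster than linearly for longer or larger clips. A small sketch that computes seconds per million pixel-frame-steps from the timings above:

    # Timings from above: (width, height, frames, steps, seconds)
    runs = [
        (480, 608, 33, 20, 4 * 60 + 42),    # 480p Q8_0
        (480, 608, 81, 30, 22 * 60 + 44),   # 480p Q8_0
        (336, 592, 49, 20, 4 * 60 + 37),    # 480p Q8_0
        (720, 1280, 49, 20, 44 * 60 + 45),  # 720p Q5_0
    ]

    for w, h, frames, steps, secs in runs:
        work = w * h * frames * steps / 1e6  # million pixel-frame-steps
        print(f"{w}x{h}, {frames} frames, {steps} steps: {secs / work:.2f} s/unit")

The per-unit cost climbs for the bigger runs, which is expected since attention cost grows with sequence length; note the 720p run also uses a different quant, so it isn’t a pure scaling comparison.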

 


Conclusion

Using WAN2.1 with ComfyUI provides an intuitive yet powerful way to generate AI-driven motion from still images. I have tried quite a few image-to-video models and this is by far the best one (as of 3/3/2025). By leveraging GGUF models, this workflow makes it easier to achieve smooth animations even on devices with limited computing power.

Whether you’re experimenting with AI video creation or looking for an efficient way to integrate WAN2.1 into your workflow, this guide offers a solid foundation to get started. As ComfyUI and GGUF models continue to evolve, even more possibilities for high-quality image-to-video generation will emerge.

Further Reading

Simple ComfyUI Workflow for WAN2.1 Text-to-Video (t2v) Using GGUF Models

Simple ComfyUI Workflow for Hunyuan Image-to-Video (i2v) Using GGUF Models



