Z-Image is a newly released open-source image generation model from the Tongyi-MAI team at Alibaba. It is built on a large-scale diffusion transformer architecture that unifies text and image understanding into a single model, allowing it to generate high-quality images with strong prompt adherence and consistent visual detail.
One of the most interesting aspects of Z-Image is that it balances quality and performance well. While the base model focuses on visual fidelity and flexibility, it also supports quantized formats such as GGUF, making it much more accessible for local use on consumer hardware. This is especially useful for creators who want to experiment with modern image models without relying on cloud services.
In this article, I’ll walk through how to use Z-Image GGUF in ComfyUI. ComfyUI’s node-based workflow makes it easy to integrate large models, test prompts, and iterate visually. If you’re already using ComfyUI for image or video generation, Z-Image fits naturally into an existing local workflow.
Z-Image Models
- GGUF Models: You can find the GGUF models here. You only need one model. I have an RTX 5090 and use the Q8 variant, so I downloaded z_image_turbo-Q8_0.gguf. If your GPU has less VRAM, consider the Q5 or Q4 variants. Put the GGUF model in ComfyUI\models\unet\
- Text Encoder: Download qwen_3_4b.safetensors and put it in ComfyUI\models\text_encoders\
- VAE: Download ae.safetensors and put it in ComfyUI\models\vae\
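Before opening the workflow, it can help to confirm that the three files ended up in the right folders. The following is a minimal sketch, not part of ComfyUI: the helper name find_missing and the "ComfyUI" root path are assumptions for illustration, and the GGUF filename is simply the one I downloaded, so change it if you picked a different quantization.

```python
from pathlib import Path

# Relative locations taken from the list above. The GGUF filename is the
# Q8 file mentioned in this article; replace it if you chose Q5 or Q4.
EXPECTED_FILES = {
    "GGUF model": Path("models") / "unet" / "z_image_turbo-Q8_0.gguf",
    "Text encoder": Path("models") / "text_encoders" / "qwen_3_4b.safetensors",
    "VAE": Path("models") / "vae" / "ae.safetensors",
}


def find_missing(comfy_root):
    """Return the labels of expected model files that are not present."""
    root = Path(comfy_root)
    return [
        label
        for label, relative_path in EXPECTED_FILES.items()
        if not (root / relative_path).is_file()
    ]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own install folder.
    missing = find_missing("ComfyUI")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All model files are in place.")
```

If anything is reported missing, check the folder name first: the GGUF model goes in unet, not in checkpoints.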
Z-Image Workflow Installation
- Update ComfyUI to the latest version if you haven’t already (on Windows, run update\update_comfyui.bat). Also update whichever GGUF custom node you installed (ComfyUI-GGUF or gguf) if you have not updated it recently.
- Download the workflow JSON file and open it in ComfyUI.
- Use ComfyUI Manager to install missing nodes.
- Restart ComfyUI.
Nodes
Select the GGUF model you downloaded here.
qwen_3_4b is used as the text encoder. Note that the type is lumina2.
Z-Image uses Flux.1’s VAE.
Specify the image size here. The range is 512×512 to 2048×2048.
You can now enter a negative prompt, which is not possible with Z-Image-Turbo.
Recommended steps: 28 to 50. Recommended cfg: 3 to 5.
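If you drive the workflow from a script or just want a quick sanity check, the ranges above can be written down as a small validator. This is only a sketch: check_settings is a hypothetical helper, not a ComfyUI function, and it assumes the size range applies to each side independently (512 to 2048 for both width and height).

```python
def check_settings(width, height, steps, cfg):
    """Return warnings for values outside the ranges given in this article."""
    warnings = []
    # Assumption: the 512x512 to 2048x2048 range applies to each side separately.
    if not (512 <= width <= 2048 and 512 <= height <= 2048):
        warnings.append(f"size {width}x{height} is outside 512 to 2048 per side")
    if not 28 <= steps <= 50:
        warnings.append(f"steps={steps} is outside the recommended 28 to 50")
    if not 3 <= cfg <= 5:
        warnings.append(f"cfg={cfg} is outside the recommended 3 to 5")
    return warnings


if __name__ == "__main__":
    # The size and step count used for the examples in this article.
    for message in check_settings(1152, 2048, 28, 4.0) or ["settings look fine"]:
        print(message)
```

The step and cfg ranges are recommendations rather than hard limits, so the helper returns warnings instead of raising an error.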
Z-Image GGUF Examples
The following examples use 28 steps. After the models have loaded, generating a 1152×2048 image takes about 64 seconds on my RTX 5090. Z-Image-Turbo with 4 steps takes about 6 seconds for an image of the same size.
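To put those two timings side by side, here is the arithmetic as a short script. It uses only the figures quoted above, which are rough measurements from a single machine, so treat the results as ballpark numbers.

```python
# Figures quoted in this article (RTX 5090, 1152 x 2048, models already loaded).
base_seconds, base_steps = 64, 28
turbo_seconds, turbo_steps = 6, 4

base_per_step = base_seconds / base_steps
turbo_per_step = turbo_seconds / turbo_steps
speedup = base_seconds / turbo_seconds

print(f"Z-Image:       {base_per_step:.2f} s/step")
print(f"Z-Image-Turbo: {turbo_per_step:.2f} s/step")
print(f"Turbo is about {speedup:.1f}x faster per image")
```

On these numbers, the gap comes from more than the step count alone: each Z-Image step also takes longer than a Z-Image-Turbo step.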
Ultra-realistic portrait of an East Asian woman with warm natural skin tone, soft diffused daylight, crisp facial details, natural pores and fine hair texture, minimal makeup, slight smile, smooth gradient background, shallow depth of field, cinematic realism, perfect color accuracy, lifelike eyes, gentle catchlights, high dynamic range, 8K photo aesthetic.
Hyper-realistic close-up portrait of a Black man with deep rich skin texture, natural sheen, tight curls, expressive warm eyes, subtle facial hair, precise shadows, Rembrandt lighting, extremely detailed pores, realistic highlights, neutral dark background, professional portrait look, ultra-sharp realism.
Ultra-detailed portrait of a South Asian woman wearing traditional gold earrings, soft warm skin tone, intricate hair strands, authentic facial texture, natural makeup, ambient window light, soft bokeh background, lifelike colors, elegant realism, 8K clarity, professional studio depth of field.
Photorealistic portrait of a Latino man with defined jawline, subtle beard texture, sun-kissed skin, detailed pores, warm directional sunlight, slight backlight rim on hair, soft bokeh city background, crisp sharp focus on the eyes, authentic natural expression, HDR realism.
Ultra-realistic portrait of a Middle Eastern woman with expressive eyes, long dark hair, smooth warm olive skin tone, subtle makeup, natural reflections in the eyes, fine eyebrow details, high-precision lighting, matte background, strong facial realism, soft cinematic shadows.
Photorealistic street portrait of a stylish mixed-race woman walking in a city street at golden hour. Natural skin texture, warm highlights, realistic hair movement, soft bokeh from street lights, high contrast rim light, accurate shadows, natural expression, 8K fashion photography feel.
Conclusion
Z-Image brings a modern diffusion transformer model into the open-source ecosystem, offering strong image quality, good prompt control, and flexible deployment options. Using the GGUF version allows the model to run efficiently on local machines, making it practical for everyday experimentation and creative work.
When combined with ComfyUI, Z-Image becomes even more approachable. The visual workflow makes it easy to adjust parameters, refine prompts, and integrate the model into larger pipelines without writing code. Whether you are testing new image styles, building reusable workflows, or exploring next-generation diffusion models, Z-Image GGUF in ComfyUI is a solid option worth trying.
Further Reading
Generate Realistic Images with Z-Image-Turbo GGUF in ComfyUI