{
  "id": "165e90cf-4df4-4363-b6a6-0c659f70fc84",
  "revision": 0,
  "last_node_id": 4,
  "last_link_id": 2,
  "nodes": [
    {
      "id": 1,
      "type": "LoadImage",
      "pos": [
        -456.7727933750763,
        726.069700733737
      ],
      "size": [
        490,
        490
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "slot_index": 0,
          "links": [
            1
          ]
        },
        {
          "name": "MASK",
          "type": "MASK",
          "slot_index": 1,
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfy-core",
        "ver": "0.3.46",
        "Node name for S&R": "LoadImage",
        "ue_properties": {
          "version": "7.0.1",
          "widget_ue_connectable": {}
        }
      },
      "widgets_values": [
        "ComfyUI_temp_quizc_00005_.png",
        "image"
      ]
    },
    {
      "id": 2,
      "type": "AILab_QwenVL",
      "pos": [
        91.69069884428215,
        722.7827861975916
      ],
      "size": [
        642.9140661541601,
        684.9041511829064
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "label": "image",
          "name": "image",
          "shape": 7,
          "type": "IMAGE",
          "link": 1
        },
        {
          "label": "video",
          "name": "video",
          "shape": 7,
          "type": "IMAGE",
          "link": null
        }
      ],
      "outputs": [
        {
          "label": "text",
          "name": "text",
          "type": "STRING",
          "links": [
            2
          ]
        }
      ],
      "properties": {
        "cnr_id": "ComfyUI-QwenVL",
        "ver": "0a1790fbea3da3e99638f6fb7bba17b3fc5ce659",
        "Node name for S&R": "AILab_QwenVL",
        "ue_properties": {
          "version": "7.0.1",
          "widget_ue_connectable": {}
        }
      },
      "widgets_values": [
        "Qwen3-VL-4B-Instruct",
        "8-bit (Balanced)",
        "Prompt Style - Detailed",
        "You are an experienced film concept designer and video generation expert. Based on the given image, conduct a detailed analysis and generate a highly detailed and professional video prompt in JSON format for a 5-second video.\nPlease strictly adhere to the following JSON structure and content specifications. Each field should be as specific, vivid, and imaginative as possible to capture real-world filmmaking details.\n--------------------------------------------------------------------------------\n**JSON Structure Template:**\n{\n  \"shot\": {\n    \"composition\": \"string\",\n    \"camera_motion\": \"string\"\n  },\n  \"subject\": {\n    \"description\": \"string\",\n    \"wardrobe\": \"string\" // Use null if the subject is an animal or has no specific wardrobe\n  },\n  \"scene\": {\n    \"location\": \"string\",\n    \"time_of_day\": \"string\",\n    \"environment\": \"string\"\n  },\n  \"visual_details\": {\n    \"action\": \"string\",\n    \"props\": \"string\", // Use null if there are no props\n    \"action_sequence\": \"array of strings\"\n  },\n  \"cinematography\": {\n    \"lighting\": \"string\",\n    \"tone\": \"string\"\n  }\n}\n--------------------------------------------------------------------------------\n**Content Generation Guidelines (Please keep these principles in mind during generation):**\n\n**1. shot**\n*   **composition**: Describe the shot type in detail (e.g., wide shot, medium shot, close-up, long shot), focal length (e.g., 35mm lens, 85mm lens, 50mm lens, 100mm macro telephoto lens, 26mm equivalent lens), camera equipment (e.g., Sony Venice, ARRI Alexa series, RED series, iPhone 15 Pro Max, DJI Inspire 3 drone), and depth of field (e.g., deep depth of field, shallow depth of field).\n*   **camera_motion**: Precisely describe how the camera moves (e.g., smooth Steadicam arc, slow lateral dolly, static, handheld shake, slow pan, drone orbit, rising crane).\n\n**2. subject**\n*   **description**: Provide an extremely detailed depiction of the subject, including their age (e.g., 25-year-old, 23-year-old, 40-year-old, 92-year-old), gender, ethnicity (e.g., Chinese female, Egyptian female, K-pop artist, European female, East Asian female, African male, Korean female, German female, Italian female, Japanese), body type (e.g., slender and athletic), hair (color, style), and any unique facial features. For non-human subjects (e.g., beluga whale, phoenix, emu, golden eagle, duck, snail), describe their physical characteristics in detail.\n\n**3. scene**\n*   **location**: Specify the exact shooting location.\n*   **time_of_day**: State the specific time of day (e.g., dawn, early morning, morning, midday, afternoon, dusk, night).\n*   **environment**: Provide a detailed environmental description that captures the atmosphere and background details.\n\n**4. visual_details**\n*   **action**: A general summary of the action depicted in the video.\n*   **action_sequence**: To enhance the visual tension of the generated 5s video, analyze the image and expand upon it creatively. Design a key action for each second, using the format `\"0-1s: subject + action\"` to briefly and precisely describe the action occurring in that second.\n*   **props**: List all relevant props and elements in the scene (e.g., silver-hilted sword, campfire, candelabra, matcha latte and cheesecake, futuristic motorcycle). If there are no props in the scene, this field should be explicitly set to `null`.\n\n**5. cinematography**\n*   **lighting**: Describe the light source, quality, color, and direction in detail (e.g., natural dawn light softened by fog, campfire as the key light, natural sunlight through stained glass windows, soft HDR reflections, warm tungsten light and natural window light).\n*   **tone**: Capture the abstract emotional or stylistic feel of the video (e.g., \"fierce, elegant, fluid\", \"mystical, elegant, enchanting\", \"hyperrealistic with an ironic, dark comedic twist\", \"dreamy, serene, emotionally healing\", \"documentary realism\", \"epic, majestic, awe-inspiring\", \"wild, dynamic, uninhibited\").\n\n--------------------------------------------------------------------------------\n**Additional Considerations for Prompt Generation:**\n\n*   **Granularity of Detail**: The LLM should understand that every field requires as much specific detail as possible, not generalizations. For example, instead of writing \"a woman,\" write \"a 25-year-old Chinese female with long black hair tied back with a silk ribbon, a slender build, wearing a flowing, pale blue Hanfu...\".\n*   **Consistency and Diversity**: While the JSON structure must be strictly consistent, the content of each video prompt should be creative and diverse, reflecting the unique elements of different video types (e.g., martial arts, dance, drama, nature documentary, sci-fi action, motivational, commercial, fantasy).\n*   **Handling Null Values**: When a field is not applicable (e.g., wardrobe for an animal), the LLM should use `null` rather than an empty string or omitting the field, to maintain the integrity of the JSON structure.\n*   **Contextual Description**: When describing action, lighting, and environment, think about how these elements work together to create a specific **\"tone\"** and express it using vivid language.\n*   **Language Requirements**: All output should be clear, concise, and use professional filmmaking terminology.",
        1024,
        true,
        42,
        "fixed"
      ]
    },
    {
      "id": 3,
      "type": "TextPreview",
      "pos": [
        849.2665407721911,
        749.7115811901474
      ],
      "size": [
        639.6813180182103,
        677.0552838838539
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "text",
          "type": "STRING",
          "link": 2
        }
      ],
      "outputs": [
        {
          "name": "STRING",
          "shape": 6,
          "type": "STRING",
          "links": null
        }
      ],
      "properties": {
        "cnr_id": "comfyui-utils-nodes",
        "ver": "1.3.6",
        "Node name for S&R": "TextPreview",
        "ue_properties": {
          "version": "7.0.1",
          "widget_ue_connectable": {}
        }
      },
      "widgets_values": [
        "{\n  \"shot\": {\n    \"composition\": \"Medium close-up shot at 85mm lens equivalence, captured on a Sony Venice cinema camera; shallow depth of field with bokeh effect blurring the classroom background while keeping the subject sharply focused.\",\n    \"camera_motion\": \"Static frame with subtle digital zoom-in over 0.5 seconds followed by gentle tilt-down motion from eye level to chin-level.\"\n  },\n  \"subject\": {\n    \"description\": \"A 23-year-old East Asian female with strikingly vibrant violet-purple shoulder-length bob haircut featuring straight-cut bangs swept across her forehead, one side pulled into a low bun secured with elastic bands. Her eyes are large, almond-shaped, glowing bright green irises with defined eyelashes and minimal makeup accentuating them. She possesses fair skin tone, delicate jawline, and softly arched eyebrows. Her expression conveys calm confidence mixed with quiet curiosity—lips slightly parted, gaze fixed directly forward with slight head tilt toward viewer’s right.\",\n    \"wardrobe\": \"Black long-sleeved crew-neck shirt layered under beige canvas-style apron straps fastened with gold-tone metal buckles, complemented by a thin matte-black choker around neck. Apron appears worn but clean, suggesting practical daily activity such as cooking or crafting.\"\n  },\n  \"scene\": {\n    \"location\": \"Interior of a vintage-inspired elementary school classroom with wooden desks arranged neatly along walls and chalkboard visible behind student seating area.\",\n    \"time_of_day\": \"Late morning — sun filtering gently through tall multi-paneled windows casting diffused illumination onto floorboards and furniture surfaces.\",\n    \"environment\": \"Classroom exhibits faded green-painted lower wall panelings contrasted against cream-colored upper sections. Wooden desk tops show signs of wear with reddish-brown leather-bound notebooks resting atop some chairs. In distant corner sits a small whiteboard covered partially with illegible scribbles. Natural daylight streams inward via rectangular grid-patterned windows framed in dark wood trim, creating sharp geometric shadows aligned diagonally across room corners.\"\n  },\n  \"visual_details\": {\n    \"action\": \"The character slowly turns her face towards the camera while maintaining steady eye contact, subtly shifting weight between feet and adjusting posture without breaking focus.\",\n    \"action_sequence\": [\n      \"0-1s: Subject maintains direct gaze, lips part minimally revealing faint pink gloss sheen as ambient light catches highlights beneath cheekbones;\",\n      \"1-2s: Gently tilts head further left, causing strands of purple hair to sway naturally off temple line, emphasizing asymmetry of hairstyle;\",\n      \"2-3s: Slightly raises eyebrow upward just enough to convey playful intrigue before lowering again seamlessly;\",\n      \"3-4s: Subtly shifts shoulders outward then inward, mimicking relaxed readiness without altering stance significantly;\",\n      \"4-5s: Eyes soften marginally with inner glow dimming ever so slightly, conveying warmth or internal thought process nearing completion.\"\n    ],\n    \"props\": null\n  },\n  \"cinematography\": {\n    \"lighting\": \"Soft directional backlight sourced primarily from high-side transom windows illuminating front-facing plane evenly; primary fill light comes from overhead recessed fixtures producing balanced chiaroscuro contrast highlighting contour lines of nose bridge and collarbone region.\",\n    \"tone\": \"Serene yet enigmatic, imbued with cinematic elegance reminiscent of contemporary indie dramas blending academic mundanity with personal magnetism — evoking introspection paired with understated charm.\"\n  }\n}"
      ]
    },
    {
      "id": 4,
      "type": "MarkdownNote",
      "pos": [
        -1040.0169552262935,
        700.5351635057795
      ],
      "size": [
        551.0266512441287,
        556.1788589403103
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [],
      "title": "Markdown Note from the repo",
      "properties": {
        "ue_properties": {
          "version": "7.0.1"
        }
      },
      "widgets_values": [
        "## **⚙️ Parameters**\n\n| Parameter | Description | Default | Range | Node(s) |\n| :---- | :---- | :---- | :---- | :---- |\n| **model\\_name** | The Qwen-VL model to use. | Qwen3-VL-4B-Instruct | \\- | Standard & Advanced |\n| **quantization** | On-the-fly quantization. Ignored for pre-quantized models (e.g., FP8). | 8-bit (Balanced) | 4-bit, 8-bit, None | Standard & Advanced |\n| **preset\\_prompt** | A selection of pre-defined prompts for common tasks. | \"Describe this...\" | Any text | Standard & Advanced |\n| **custom\\_prompt** | Overrides the preset prompt if provided. |  | Any text | Standard & Advanced |\n| **max\\_tokens** | Maximum number of new tokens to generate. | 1024 | 64-2048 | Standard & Advanced |\n| **keep\\_model\\_loaded** | Keep the model in VRAM for faster subsequent runs. | True | True/False | Standard & Advanced |\n| **seed** | A seed for reproducible results. | 1 | 1 \\- 2^64-1 | Standard & Advanced |\n| **temperature** | Controls randomness. Higher values \\= more creative. (Used when num\\_beams is 1). | 0.6 | 0.1-1.0 | Advanced Only |\n| **top\\_p** | Nucleus sampling threshold. (Used when num\\_beams is 1). | 0.9 | 0.0-1.0 | Advanced Only |\n| **num\\_beams** | Number of beams for beam search. \\> 1 disables temperature/top\\_p sampling. | 1 | 1-10 | Advanced Only |\n| **repetition\\_penalty** | Discourages repeating tokens. | 1.2 | 0.0-2.0 | Advanced Only |\n| **frame\\_count** | Number of frames to sample from the video input. | 16 | 1-64 | Advanced Only |\n| **device** | Override automatic device selection. | auto | auto, cuda, cpu | Advanced Only |"
      ],
      "color": "#432",
      "bgcolor": "#653"
    }
  ],
  "links": [
    [
      1,
      1,
      0,
      2,
      0,
      "IMAGE"
    ],
    [
      2,
      2,
      0,
      3,
      0,
      "STRING"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ue_links": [],
    "ds": {
      "scale": 0.8140274938684007,
      "offset": [
        1126.8196477402666,
        -501.0236682724645
      ]
    },
    "links_added_by_ue": [],
    "frontendVersion": "1.28.8",
    "VHS_latentpreview": false,
    "VHS_latentpreviewrate": 0,
    "VHS_MetadataImage": true,
    "VHS_KeepIntermediate": true
  },
  "version": 0.4
}