Diamond Brand TV Commercial with AI

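Bottom line first:
You want the full chain "diamond ring TVC → structured storyboard JSON → Nano Banana hyper-realistic 8K image Meta → Veo 3.1 motion prompt". This is entirely feasible technically, but it depends on a few existing open-source tools plus a prompt/JSON spec layer you write yourself, one that turns "diamond fire / sparkle behavior" into explicit fields.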


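1. Overall architecture (the version tailored to diamond ring TVCs)

The pipeline splits into four stages:

  1. Shot cutting (Shot / Scene Detection)

    • PySceneDetect automatically cuts the whole diamond ring commercial into shots and gives you each shot's start_time / end_time plus keyframes. ([GitHub][1])
    • Alternatively, use a scene-detection library such as openscenesense for higher-level analysis (frame + audio). ([GitHub][2])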
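  2. Gemini 3 Pro visual understanding → storyboard JSON

    • For each shot: feed it to Gemini 3 Pro

      • If the API supports video input, pass the clip directly
      • Otherwise, provide at least 3–5 keyframes + the ASR transcript for that segment
    • Gemini outputs the "storyboard JSON": scene description + composition + lighting + diamond sparkle parameters + empty slots for the nano_banana_meta / veo_meta used later.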

  3. Second-stage Gemini: storyboard JSON → Nano Banana / Veo 3.1 prompt JSON

    • In the system prompt, define a "generation Meta JSON schema" that forces Gemini to fill in, for every shot:

      • nano_banana_text_prompt (long English prompt)
      • nano_banana_meta_json (your own custom, parseable Meta)
      • veo_3_prompt + video technical fields
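  4. Generation layer

    • Nano Banana Pro: use nano_banana_text_prompt, or the prompt expanded from the Meta JSON, to produce 4K/8K still frames of the diamond ring. ([Reddit][3])
    • Veo 3.1: use veo_3_prompt to generate a 6–8 second clip per shot, then edit the clips together. Google currently emphasizes 1080p, high image quality + native audio, and upcoming versions are already moving toward longer durations and multi-prompt workflows. ([Google AI Studio][4])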
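
Stage 1 in the list above (shot cutting) could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the PySceneDetect 0.6-style Python API (`detect` and `ContentDetector`), and the helper names `format_timecode`, `to_shot_records` and `detect_shot_bounds` are hypothetical. The ID and timecode formats simply mirror the "SC_001_SH_001" / "00:03.500" style used in the storyboard schema in this answer.

```python
from typing import Dict, List, Tuple


def format_timecode(seconds: float) -> str:
    """Format seconds as MM:SS.mmm, the timecode style used in the storyboard schema."""
    total_ms = round(seconds * 1000)  # work in integer milliseconds to avoid "00:60.000"
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def to_shot_records(
    bounds: List[Tuple[float, float]], scene_id: str = "SC_001"
) -> List[Dict[str, str]]:
    """Turn (start, end) pairs in seconds into shot records with IDs and timecodes."""
    records = []
    for index, (start, end) in enumerate(bounds, start=1):
        records.append({
            "scene_id": scene_id,
            "shot_id": f"{scene_id}_SH_{index:03d}",
            "timecode_start": format_timecode(start),
            "timecode_end": format_timecode(end),
        })
    return records


def detect_shot_bounds(video_path: str) -> List[Tuple[float, float]]:
    """Run PySceneDetect's content detector and return (start, end) in seconds."""
    # Imported lazily so the helpers above work without PySceneDetect installed.
    from scenedetect import ContentDetector, detect

    scenes = detect(video_path, ContentDetector())
    return [(start.get_seconds(), end.get_seconds()) for start, end in scenes]
```

In use, `to_shot_records(detect_shot_bounds("ring_tvc.mp4"))` (the file name is a placeholder) would give one record per shot, ready to be paired with keyframes and sent to Gemini.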

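On 8K:

  • At the image layer, Nano Banana can be set to 8K / ultra-high resolution.
  • Veo 3.1 is currently still mostly 1080p or somewhat higher, so 8K video has to come from post-production upscaling or an external upscaler; do not expect native 8K output.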
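
To get a feel for how large the gap between 1080p and 8K is, here is a small arithmetic sketch. It assumes the standard 16:9 frame sizes (1920 × 1080 and 7680 × 4320); the `RESOLUTIONS` table and `upscale_factor` helper are illustrative names, not part of any tool mentioned here.

```python
from typing import Tuple

# Standard 16:9 frame sizes in pixels (assumed for illustration).
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "4k": (3840, 2160),
    "8k": (7680, 4320),
}


def upscale_factor(source: str, target: str) -> Tuple[float, float]:
    """Return (linear scale per axis, total pixel-count ratio) between two formats."""
    source_w, source_h = RESOLUTIONS[source]
    target_w, target_h = RESOLUTIONS[target]
    linear = target_w / source_w
    pixels = (target_w * target_h) / (source_w * source_h)
    return linear, pixels


print(upscale_factor("1080p", "8k"))  # (4.0, 16.0)
```

Going from a 1080p clip to 8K means a 4x upscale per axis, so the upscaler has to synthesize 15 of every 16 output pixels. That is worth keeping in mind when judging how much facet detail survives.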

2. GitHub / open source: which repos to assemble directly

The core falls into three areas:

  1. Shot cutting

    • Breakthrough/PySceneDetect: the most widely used shot detection tool, with a CLI + Python API; use it directly to cut the commercial. ([GitHub][1])
    • erikdejonge/scenedetect: a script wrapper that integrates PySceneDetect with thumbnail output. ([GitHub][5])
    • ymrohit/openscenesense: an AI-based approach to scene analysis that supports frame/audio feature extraction. ([GitHub][2])
  2. Video-LLM / video understanding research roundups

    • yunlong10/Awesome-LLMs-for-Video-Understanding: a large list of Vid-LLM papers + code, useful for studying video caption / storyboard architectures. ([GitHub][6])
  3. Veo 3 JSON Prompting / Meta Prompt

    • snubroot/Veo-3-Prompting-Guide: a guide dedicated to Veo 3 prompt structure and components. ([GitHub][7])
    • The JSON Meta prompt posts on Reddit r/VEO3 and the dev.to article "Best Practices of JSON Prompting for Video Generation Models (+Examples for Veo 3.1)" both cover how to manage Veo 3.1 prompts with JSON, turning Identity / Motion / Lighting / Duration into structured fields. ([DEV Community][8])

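3. Gemini 3 Pro: skeleton of a dedicated "diamond ring storyboard system instruction"

What you need is a system prompt that "outputs storyboard JSON only" and has separate fields for diamond sparkle.

Illustrative version (written in English for the model; in practice you can mix Chinese and English):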

You are a senior commercial director and storyboard artist specialized in luxury diamond ring TVCs.

INPUT:
- A short video segment from a diamond ring commercial (or several keyframes).
- Optional: ASR transcript of any voiceover or dialogue.

TASK:
Analyze the segment and output a JSON array of SHOTS. 
Each SHOT must:
- Cover at least 1 second.
- Align with actual shot boundaries as closely as possible.
- Describe the diamond, its sparkle behavior, and realistic lighting in technical terms that can be used for ultra-realistic 8K rendering.

OUTPUT RULES:
- Output ONLY valid JSON. No comments, no explanations.
- Follow the exact schema below.

SCHEMA:

[
  {
    "scene_id": "SC_001",
    "shot_id": "SC_001_SH_001",
    "timecode_start": "00:00.000",
    "timecode_end": "00:03.500",
    "narrative_role": "product_hero_intro / emotional_closeup / transition / logo_reveal",
    "visual_description": {
      "title": "macro diamond hero on velvet",
      "long_text": "Concise but rich description of the shot: camera position, subject, environment, mood, composition."
    },
    "product": {
      "type": "engagement_ring",
      "metal": "rose_gold / white_gold / platinum",
      "setting": "solitaire / halo / three_stone / pavé",
      "cut": "round_brilliant / oval / cushion / emerald",
      "carat_impression": "1.0_to_1.5",
      "surface_condition": "flawless_polished",
      "branding_visible": true
    },
    "diamond_luster": {
      "overall_impression": "high_fire_high_scintillation",
      "sparkle_profile": {
        "brilliance_intensity": 0.9,
        "fire_intensity": 0.85,
        "scintillation_intensity": 0.9,
        "dispersion_character": "fine_rainbow_splinters",
        "sparkle_speed": "slow_elegant"
      },
      "luster_effects": [
        "prismatic_dispersion",
        "brilliance_flash",
        "scintillation",
        "caustic_reflections"
      ],
      "caustic_pattern": {
        "surface": "polished_marble / velvet / hand_skin",
        "shape": "soft_elliptical_pools",
        "motion": "gentle_drift_with_camera_move"
      }
    },
    "cinematography": {
      "shot_size": "macro_closeup",
      "camera_angle": "slightly_above_ring",
      "camera_motion": "slow_orbit_clockwise",
      "lens": "100mm_macro_equivalent",
      "frame_rate_style": "real_time",
      "aspect_ratio": "16:9"
    },
    "lighting_mood": {
      "key_light": "soft_box_from_back_left",
      "fill_light": "subtle_fill_from_front",
      "rim_light": "focused_rim_on_diamond",
      "color_temperature": "warm_3200k",
      "contrast_level": "medium_high",
      "background_mood": "deep_shadow_with_soft_bokeh"
    },
    "environment": {
      "background_material": "black_velvet / marble / blurred_city_night",
      "bokeh_elements": "warm_gold_bokeh_orbs",
      "particles": "none_or_soft_dust"
    },
    "audio_context": {
      "voiceover_summary": "short summary or null",
      "music_mood": "warm_cinematic_minimal",
      "sfx": ["subtle_glitter_chime"]
    }
  }
]

This layer only does "precise verbalization + parameterization of diamond behavior"; it does not yet produce Nano Banana / Veo prompts directly.
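
Before passing the storyboard on, it may help to check the model's output against the rules in the system prompt. The sketch below is one possible validator. The required keys and the "at least 1 second" rule come from the prompt above; treating the three intensity fields as numbers in [0, 1] is an assumption inferred from the example values, and `validate_storyboard` is a hypothetical helper name.

```python
import json
from typing import List

REQUIRED_KEYS = {
    "scene_id", "shot_id", "timecode_start", "timecode_end", "narrative_role",
    "visual_description", "product", "diamond_luster", "cinematography",
    "lighting_mood", "environment", "audio_context",
}
INTENSITY_KEYS = ("brilliance_intensity", "fire_intensity", "scintillation_intensity")


def timecode_to_seconds(timecode: str) -> float:
    """Parse MM:SS.mmm into seconds."""
    minutes, seconds = timecode.split(":")
    return int(minutes) * 60 + float(seconds)


def validate_storyboard(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the storyboard passed."""
    try:
        shots = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(shots, list):
        return ["top level must be a JSON array of shots"]

    problems = []
    for index, shot in enumerate(shots):
        if not isinstance(shot, dict):
            problems.append(f"shot[{index}]: must be a JSON object")
            continue
        label = shot.get("shot_id", f"shot[{index}]")
        missing = REQUIRED_KEYS - shot.keys()
        if missing:
            problems.append(f"{label}: missing keys {sorted(missing)}")
            continue
        try:
            duration = (timecode_to_seconds(shot["timecode_end"])
                        - timecode_to_seconds(shot["timecode_start"]))
        except (AttributeError, ValueError):
            problems.append(f"{label}: timecodes must look like MM:SS.mmm")
        else:
            if duration < 1.0:
                problems.append(f"{label}: covers {duration:.3f}s, less than 1 second")
        luster = shot["diamond_luster"]
        profile = luster.get("sparkle_profile", {}) if isinstance(luster, dict) else {}
        for key in INTENSITY_KEYS:
            value = profile.get(key)
            # Assumption: intensities are normalized to the range [0, 1].
            if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
                problems.append(f"{label}: {key} must be a number in [0, 1]")
    return problems
```

If the list comes back non-empty, one option is to send the problems back to Gemini and ask for a corrected JSON rather than patching it by hand.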


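4. Nano Banana Pro: diamond-specific Meta JSON Schema (with 8K and fire fields)

In the second stage, open a separate Gemini conversation, feed in "one shot from the storyboard JSON", and have it output the Nano Banana-specific Meta JSON + the final English prompt.

Skeleton (you can adopt it as is and fine-tune it later):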

{
  "meta": {
    "scene_id": "SC_001_SH_001",
    "brand": "ALUXE",
    "usage": "hero_frame_for_TVC_and_key_visual",
    "aspect_ratio": "16:9",
    "target_resolution": "8k",
    "pipeline": "nano_banana_pro"
  },
  "composition": {
    "framing": "macro_product_closeup",
    "camera_angle": "slightly_above_ring_three_quarter_view",
    "subject_focus": "single_engagement_ring_center_frame",
    "depth_of_field": "extremely_shallow",
    "negative_space_zone": "upper_left_corner",
    "rule_of_thirds_alignment": "ring_on_lower_right_intersection"
  },
  "product": {
    "category": "engagement_ring",
    "metal": "polished_18k_rose_gold",
    "diamond_cut": "round_brilliant",
    "diamond_count": 1,
    "setting_style": "classic_four_prong",
    "micro_pave_details": "subtle_pave_on_band",
    "engraving": "aluxe_logo_inside_band"
  },
  "diamond_luster_physics": {
    "brilliance": {
      "intensity": 0.92,
      "description": "crisp_white_returns_with_high_contrast"
    },
    "fire": {
      "intensity": 0.88,
      "spectrum": ["violet","blue","cyan","green","yellow","orange","red"],
      "description": "fine_rainbow_sparks_along_upper_facets"
    },
    "scintillation": {
      "intensity": 0.9,
      "pattern": "balanced_mix_of_broad_and_pinpoint_flashes",
      "temporal_behavior": "slow_elegant_twinkle_when_camera_moves"
    },
    "caustics": {
      "surface": "soft_off_white_marble",
      "pattern_shape": "soft_elliptical_pools",
      "motion": "subtle_drift_sync_with_camera_orbit",
      "intensity": 0.7
    },
    "special_effects": [
      "prismatic_dispersion",
      "brilliance_flash",
      "spectral_glitter",
      "holographic_shimmer"
    ]
  },
  "lighting": {
    "setup": "three_point_luxury_jewelry",
    "key_light_type": "large_softbox",
    "key_light_direction": "back_left",
    "fill_light_direction": "front_soft_fill",
    "accent_light_direction": "narrow_rim_on_diamond_top",
    "color_temperature_key": "3200k_warm",
    "color_temperature_background": "cooler_4500k",
    "contrast": "medium_high_with_clean_speculars"
  },
  "background_and_surface": {
    "surface_material": "polished_off_white_marble_with_soft_texture",
    "surface_reflection": "subtle_clean_mirror_like",
    "background_bokeh": {
      "color_palette": ["warm_champagne","soft_rose_gold"],
      "shape": "round_out_of_focus_orbs",
      "density": "medium"
    }
  },
  "color_grading": {
    "primary_palette": ["#FBE9E7","#F8BBD0","#FFE0B2"],
    "mood": "warm_romantic_minimal",
    "saturation": 0.65,
    "contrast_curve": "soft_s_curve",
    "highlight_priority": "preserve_specular_detail_on_facets"
  },
  "style": {
    "photography_style": "high_end_jewelry_campaign",
    "render_quality": "hyper_realistic_8k",
    "lens_simulation": "100mm_macro_prime",
    "grain": "very_subtle_35mm",
    "postprocessing": [
      "clarity_only_on_diamond_facets",
      "soft_bloom_on_speculars",
      "gentle_vignette"
    ]
  },
  "generation_params": {
    "width": 7680,
    "height": 4320,
    "cfg_scale": 7.5,
    "steps": 40,
    "seed": null
  }
}

The task description for Gemini can read:

Given ONE shot JSON from the storyboard, convert it into a Nano Banana Pro meta JSON following this schema. 
The goal is ultra-realistic 8K diamond ring rendering with physically-plausible sparkle behavior.
Only output valid JSON.
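
Since the second stage works on one shot at a time, the request text can be assembled per shot. The sketch below shows one way to do that. It only builds the message strings and does not call any API; `build_stage2_requests` is a hypothetical helper, and the "SCHEMA:" / "SHOT:" layout is an assumption about how you might present the inputs.

```python
import json
from typing import Any, Dict, List

# Task wording taken from the description above.
TASK = (
    "Given ONE shot JSON from the storyboard, convert it into a Nano Banana Pro "
    "meta JSON following this schema.\n"
    "The goal is ultra-realistic 8K diamond ring rendering with "
    "physically-plausible sparkle behavior.\n"
    "Only output valid JSON."
)


def build_stage2_requests(
    shots: List[Dict[str, Any]], schema: Dict[str, Any]
) -> List[str]:
    """Build one request message per shot: task text + schema skeleton + that shot."""
    # ensure_ascii=False keeps values such as "pavé" readable in the prompt.
    schema_text = json.dumps(schema, indent=2, ensure_ascii=False)
    requests = []
    for shot in shots:
        shot_text = json.dumps(shot, indent=2, ensure_ascii=False)
        requests.append(f"{TASK}\n\nSCHEMA:\n{schema_text}\n\nSHOT:\n{shot_text}")
    return requests
```

Each returned string would be sent in its own fresh conversation, so no shot leaks context into the next one.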

Then, on the backend, use this Meta JSON to assemble an English prompt automatically, along these lines:

ultra realistic 8K macro product photo of a rose gold engagement ring with a round brilliant diamond, shot on polished off-white marble, extremely shallow depth of field, large softbox back-left, fine rainbow fire and crisp white brilliance on the diamond, soft champagne and rose gold bokeh, campaign-style luxury jewelry photography, 100mm macro lens, subtle 35mm grain, preserved facet detail

Community diamond / jewelry prompt examples (Midjourney / SD) all follow this structure:
"hyper-realistic close-up + cut / metal / setting + background material + sparkling facets / brilliant fire / prismatic light + studio lighting setup"; you are simply formalizing it as JSON. ([OpenArt][9])
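
The backend step that expands the Meta JSON into an English prompt could look something like this. It is a minimal sketch that reads only a handful of fields from the skeleton above and joins them in a fixed order; `build_image_prompt` and `humanize` are hypothetical names, and a real template would likely cover more fields and read more naturally.

```python
from typing import Any, Dict


def humanize(token: str) -> str:
    """Turn a snake_case schema value into plain words."""
    return token.replace("_", " ")


def build_image_prompt(meta_json: Dict[str, Any]) -> str:
    """Expand selected Meta JSON fields into a comma-separated English prompt."""
    product = meta_json["product"]
    composition = meta_json["composition"]
    lighting = meta_json["lighting"]
    luster = meta_json["diamond_luster_physics"]
    surface = meta_json["background_and_surface"]
    style = meta_json["style"]
    resolution = meta_json["meta"]["target_resolution"].upper()

    parts = [
        f"ultra realistic {resolution} macro product photo of a "
        f"{humanize(product['metal'])} {humanize(product['category'])} "
        f"with a {humanize(product['diamond_cut'])} diamond",
        f"shot on {humanize(surface['surface_material'])}",
        f"{humanize(composition['depth_of_field'])} depth of field",
        f"{humanize(lighting['key_light_type'])} from "
        f"{humanize(lighting['key_light_direction'])}",
        humanize(luster["fire"]["description"]),
        humanize(luster["brilliance"]["description"]),
        f"{humanize(style['photography_style'])} photography",
        f"{humanize(style['lens_simulation'])} lens",
        f"{humanize(style['grain'])} grain",
    ]
    return ", ".join(parts)
```

Keeping the template in code rather than asking the model to write free text each time is one way to hold the ring's description constant across shots.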


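5. Veo 3.1: diamond ring TVC-specific Motion Prompt + JSON Meta

The current consensus approach for Veo 3.1:

  • The prompt should read like a "storyboard script + cinematography directions", not just a pile of adjectives.
  • The JSON Meta should spell out: camera movement, duration, fps, resolution, lighting, and character/product consistency. ([DEV Community][8])

You can define a Veo Meta schema: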

{
  "scene_id": "SC_001_SH_001",
  "identity": {
    "brand": "ALUXE",
    "product_id": "ALUXE_CLASSIC_ROUND_001",
    "visual_consistency_note": "same ring design, same marble surface, same color palette as previous shots"
  },
  "technical": {
    "model": "veo_3_1",
    "duration_seconds": 6,
    "fps": 24,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "audio_enabled": true
  },
  "camera": {
    "shot_role": "hero_product_macro_intro",
    "starting_position": "three_quarter_front_above_ring",
    "ending_position": "top_down_over_diamond",
    "movement": "slow_180_degree_orbit_clockwise",
    "movement_speed": "very_slow_elegant",
    "stabilization_style": "cinematic_smooth"
  },
  "lighting": {
    "setup": "studio_three_point",
    "description": "large soft key from back-left, gentle front fill, narrow rim light kissing the upper facets",
    "sparkle_control": "maximize_specular_detail_without_clipping"
  },
  "diamond_luster_dynamics": {
    "brilliance_flash": {
      "intensity": 0.85,
      "pattern": "occasional_starburst_on_major_facets"
    },
    "fire_effect": {
      "intensity": 0.8,
      "motion": "rainbow_sparks_slide_across_facets_as_camera_moves"
    },
    "scintillation": {
      "intensity": 0.9,
      "behavior": "small_pinpoint_flashes_flicker_as_camera_orbits"
    },
    "caustic_reflections": {
      "intensity": 0.7,
      "surface": "marble",
      "motion": "slow_drift_matching_camera_parallax"
    }
  },
  "environment": {
    "surface_material": "off_white_marble",
    "background": "deep_soft_black_with_warm_gold_bokeh",
    "particles": "none"
  },
  "final_prompt": "Cinematic macro product shot of a rose gold engagement ring with a round brilliant diamond resting on polished off-white marble. Ultra-realistic, physically plausible lighting. The camera performs a slow 180-degree orbit clockwise, starting from a three-quarter front angle slightly above the ring and ending in a precise top-down view over the diamond. Large soft key light from back-left and gentle front fill create crisp white brilliance and rich rainbow fire across the diamond facets. Fine, elegant scintillation appears as tiny pin-point flashes that respond to the camera movement, while soft elliptical caustic light pools drift across the marble surface. Background falls into deep black with warm champagne bokeh orbs, 24 fps, 6 seconds, 16:9, 1080p, luxury jewelry TV commercial style, clean composition, no text or subtitles."
}

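final_prompt is the text actually sent to Veo 3.1.
You can have Gemini fill in this Meta automatically from the earlier "storyboard JSON + diamond_luster_physics + product", the same way the dev.to / Reddit JSON prompting guides do; the only difference is that you swap in diamond-specific fields. ([DEV Community][8])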
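
Because final_prompt repeats values that also live in the "technical" block (fps, duration, aspect ratio, resolution), the two can drift apart when an LLM fills them in. The sketch below is one way to catch that. The "24 fps, 6 seconds, 16:9, 1080p" phrasing is taken from the example above, while `technical_suffix` and `check_final_prompt` are hypothetical helper names.

```python
from typing import Any, Dict, List


def technical_suffix(technical: Dict[str, Any]) -> str:
    """Render the technical block in the phrasing used inside final_prompt."""
    return (
        f"{technical['fps']} fps, {technical['duration_seconds']} seconds, "
        f"{technical['aspect_ratio']}, {technical['resolution']}"
    )


def check_final_prompt(veo_meta: Dict[str, Any]) -> List[str]:
    """Return problems if final_prompt disagrees with the technical block."""
    expected = technical_suffix(veo_meta["technical"])
    if expected not in veo_meta["final_prompt"]:
        return [f"final_prompt does not contain '{expected}'"]
    return []
```

An alternative design is to have the model write the prompt without any technical values and append `technical_suffix(...)` yourself, so the numbers come from a single place.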


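6. Reddit / X community consensus on this chain (directly relevant to your needs)

Key points from r/VEO3, r/GeminiAI, Veo 3 tutorials, and Nano Banana / jewelry prompt discussions that bear directly on this use case:

  1. Veo 3.1 needs JSON Meta to be stable

    • Many r/VEO3 posts share a "VEO 3 JSON Prompting Guide (Easy)" or a "7-Component Framework". In essence they split the prompt into Identity / Environment / Camera / Motion / Lighting / Duration / Quality, write all of it in JSON, and let an LLM assemble the final_prompt. ([DEV Community][8])
  2. Realism = lighting + camera logic, not just shouting "hyper-realistic"

    • In discussions of video models such as Veo 3 and Sora, the recurring advice is to "spell out camera movement + lighting setup"; otherwise the result looks like generic AI animation. ([blog.designhero.tv][10])

    • Diamond ring prompt examples (Midjourney / SD / prompt marketplaces) almost always specify:

      • cut / shape / metal / setting
      • background material (black velvet / marble / skin)
      • multi-point studio lighting / softbox / rim light
      • sparkling facets / prismatic light refractions / bokeh orbs. ([PromptBase][11])
  3. Nano Banana + Veo 3 in practice: people are already making jewelry ads

    • On r/GeminiAI, people have made jewelry ad images with Nano Banana and then animated them in Veo 3; the key feedback is consistent:

      • Faces must stay consistent → use a character ID + fixed description
      • Metal / diamond must stay consistent → store product_id + prompt template in the Meta. ([Reddit][3])
  4. Gemini 3 Pro instruction following

    • Some Reddit users complain that Gemini 3 Pro follows "long system instructions" less reliably than GPT; common workarounds:

      • Always require JSON-only output, with the full schema + an example up front. ([Reddit][12])
      • Split the task into stages: produce the storyboard JSON first, then use a new conversation to turn a single shot into Meta JSON.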
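
Even with a JSON-only instruction, a reply may still arrive wrapped in a code fence or with a sentence in front of it. A defensive parser is a cheap safeguard. The sketch below uses only the standard library; `extract_json` is a hypothetical helper name.

```python
import json
from typing import Any


def extract_json(reply: str) -> Any:
    """Parse the first JSON array or object found anywhere in a model reply."""
    decoder = json.JSONDecoder()
    for position, char in enumerate(reply):
        if char in "[{":
            try:
                # raw_decode parses one JSON value starting at `position`
                # and ignores whatever text follows it.
                value, _end = decoder.raw_decode(reply, position)
                return value
            except json.JSONDecodeError:
                continue  # not real JSON here, keep scanning
    raise ValueError("no JSON array or object found in reply")
```

If this raises, retrying the request in a fresh conversation is usually simpler than trying to repair the text.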

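7. A few hard limits to keep in mind up front

  1. Veo 3.1 resolution / duration ceiling

    • Per Google and press coverage: Veo 3.1 is currently mostly 1080p, with clips from a few seconds up to about a minute; it supports multi-prompt and improved scene/character consistency, but it is not native 8K. ([Google AI Studio][4])

    • If you want 8K, your only options are:

      • Run Nano Banana images at 8K
      • Upscale video with an external upscaler (Topaz, etc.).
  2. Diamond physics cannot be reproduced perfectly

    • Even if your JSON describes fire / brilliance / scintillation in very physical terms, the model is still reproducing "approximate statistical behavior learned from lots of Instagram jewelry photos", not doing actual ray tracing.
    • Strictly physically accurate fire, refraction, and caustics still require a 3D / retail CG pipeline; AI can only approximate the look of high-end commercial photography.
  3. Copyright and reference TVCs

    • Using an original Bvlgari / Cartier film as a reference for Gemini's storyboarding is technically feasible, but when recreating it, do not write "exactly same as Bulgari commercial" in the prompt. Instead:

      • Use your own composition language
      • Apply your own brand / color palette / cut / setting.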
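
A small lint on generated prompts can catch reference-brand names before anything is sent to the image or video model. This is a minimal sketch: the list of names is whatever you choose to block, and `find_reference_mentions` is a hypothetical helper name.

```python
import re
from typing import Iterable, List


def find_reference_mentions(prompt: str, reference_names: Iterable[str]) -> List[str]:
    """Return the reference names that appear in the prompt (case-insensitive, whole word)."""
    found = []
    for name in reference_names:
        if re.search(rf"\b{re.escape(name)}\b", prompt, flags=re.IGNORECASE):
            found.append(name)
    return found


blocked = ["Bvlgari", "Bulgari", "Cartier"]
print(find_reference_mentions("exactly same as Bulgari commercial", blocked))  # ['Bulgari']
```

Running this over every final_prompt and nano_banana_text_prompt, and rejecting any prompt with a non-empty result, keeps the reference film as an analysis input only.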