YourNextLandingPage
Playbook·12 min read

AI image prompts for product photography: the DTC formula

AI image prompts for product photography need 6 specific inputs. Here's the DTC formula — plus 8 ready-to-use templates for packshots, lifestyle, and ads.

The formula has six parts. Here's why most prompts skip four of them.

A supplement founder paid $1,400 for a 10-SKU photography session. Three weeks later she tried to replicate one shot with AI. Her prompt: 'a supplement bottle on a wooden table with good lighting.' The result was beautiful. The bottle was amber glass. Her product is matte black. The label was blank. Her logo is the only thing on her bottle. The AI had made a better supplement brand than she had.

She came back with a longer prompt — described the matte black bottle, the yellow label, the exact table. The AI ignored half of it and hallucinated the rest. Third attempt she added lighting. Fourth she specified the camera angle. By the fifth prompt, using a structured six-part formula, she got a shot that matched her brand. She ran all 10 SKUs through it for $28 total.

The formula is not complicated. But it requires specificity in the right order — and most guides online stop at part two. Part two is the scene. Parts three through six are why your photos look like someone else's brand.

AI image generation interface on a laptop showing a supplement bottle product photography prompt with six labeled input fields on a light raw oak desk

Why most AI product photo prompts generate the wrong thing

Most AI image tools are trained on billions of photographs. When you write 'supplement bottle on a wooden table,' the model pattern-matches to the most statistically likely version of that scene: a warm amber glass bottle, probably with a herbal leaf motif, photographed on raw walnut under golden-hour light. That is the average supplement photo on the internet. Not your product. Not your brand.

The more vague your prompt, the closer to average your result. The six-part formula works because it overrides the model's priors at each stage — it doesn't leave room for the statistically likely version to fill the gaps. Each part raises the specificity ceiling by one level.

The 6-part AI image prompt formula for product photography

  1. Product subject description — what you're photographing, with material, finish, color, and form
  2. Scene and environment — the surface, backdrop, and any supporting props
  3. Lighting specification — the source, direction, quality, and shadow behavior
  4. Camera angle and distance — overhead, eye-level, macro, or packshot distance
  5. Style, mood, and brand constraints — editorial feel, palette, and named reference aesthetic
  6. Negative constraints — what to exclude to prevent hallucination and brand drift
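
The six parts above can be sketched as a small data structure, assuming you assemble prompts in Python before pasting them into a generation tool. The `ProductPrompt` class, its field names, and the example values are hypothetical, chosen to mirror the list above; no particular image tool's API is implied.

```python
from dataclasses import dataclass, fields


@dataclass
class ProductPrompt:
    """Six prompt parts, declared in the order the formula lists them."""
    subject: str    # Part 1: material, finish, color, form
    scene: str      # Part 2: surface, backdrop, props
    lighting: str   # Part 3: source, direction, quality, shadows
    camera: str     # Part 4: angle and distance
    style: str      # Part 5: reference aesthetic and palette
    negatives: str  # Part 6: what to exclude

    def missing_parts(self) -> list[str]:
        """Return the names of any parts left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Join the six parts in formula order; refuse to render with gaps."""
        missing = self.missing_parts()
        if missing:
            raise ValueError(f"Prompt is missing parts: {', '.join(missing)}")
        parts = [getattr(self, f.name).strip().rstrip(".") for f in fields(self)]
        return ". ".join(parts) + "."


# Illustrative values only, adapted from the examples in this playbook.
prompt = ProductPrompt(
    subject="A matte black cylindrical supplement bottle with a white screw cap "
            "and a bold yellow rectangular label",
    scene="On a light raw oak surface beside a ceramic water glass",
    lighting="Soft diffused window light from off-frame left, gentle product shadows",
    camera="Eye-level, straight-on, product at center frame",
    style="Kinfolk-style editorial restraint, color palette: warm cream, deep amber",
    negatives="No recognizable logos, no real brand names, no identifiable real people",
)
print(prompt.render())
```

Refusing to render an incomplete prompt is a design choice: it turns a skipped part into an error at writing time instead of a surprise in the generated image.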

Part 1: Product subject description

Describe your product like a stranger seeing it for the first time

Material first, then color, then finish, then form. 'A matte black cylindrical supplement bottle with a white screw cap, approximately 150ml, with a bold yellow rectangular label — no visible logo or readable text, only abstract geometric shapes on the label surface.' That level of specificity anchors the model. 'My product' or a brand name does not.

The three things that kill this step: calling it 'my product' (the model has no memory of previous prompts), using your brand name (the model may pattern-match to something else entirely), or using adjectives like 'sleek' or 'premium' instead of material properties. Premium is not a material. Matte is.

  • Lead with material descriptors: matte, glossy, frosted, embossed, textured, smooth, metallic
  • Include form factor: cylindrical, flat, hexagonal, tapered, square-shouldered, collapsible
  • Use specific color names: 'deep forest green' over 'dark green', 'dusty rose' over 'pink'
  • Include scale where relevant: '150ml bottle, fits in a palm'
  • Avoid brand names, logos, or product names the model may associate with a different brand
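
One way to catch the Part 1 mistakes before you spend credits is a quick text check. This is a rough sketch, not a validated tool: the `lint_subject` function is hypothetical, and the word lists are just the terms named in this section, so extend them for your own category.

```python
import re

# Illustrative word lists taken from the guidance above.
VAGUE_TERMS = {"sleek", "premium", "luxury", "my product"}
MATERIAL_TERMS = {"matte", "glossy", "frosted", "embossed", "textured", "smooth", "metallic"}


def lint_subject(description: str) -> list[str]:
    """Return warnings for a Part 1 subject description."""
    text = description.lower()
    warnings = []
    # Flag adjectives and phrases that carry no material information.
    for term in sorted(VAGUE_TERMS):
        if re.search(rf"\b{re.escape(term)}\b", text):
            warnings.append(f"vague term: '{term}'")
    # Require at least one material descriptor.
    if not any(re.search(rf"\b{m}\b", text) for m in MATERIAL_TERMS):
        warnings.append("no material descriptor found")
    return warnings


print(lint_subject("A sleek premium bottle"))
print(lint_subject("A matte black cylindrical supplement bottle with a white screw cap"))
```

The check cannot detect brand names, since it has no way to know yours; that part still needs a manual read.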

Part 2: Scene and environment

The surface is the scene

For DTC brands, the scene is almost always a surface — a marble counter, a raw oak desk, a white paper sweep, a linen-covered tray. Don't describe the room. Describe the surface and what's on it within 30cm of the product. 'A white marble surface with natural grey veining, a small linen cloth folded at the left edge, and a single dried eucalyptus sprig' is a scene the model can execute. 'A beautiful minimalist kitchen' is an aesthetic preference it can only guess at.

Props are supporting actors, not set dressing. Three objects max. Each one should reinforce the product's usage occasion — not decorate it. A supplement bottle beside a ceramic water glass is on-brand. A supplement bottle beside a blooming orchid is a lifestyle brand that lost the plot. The supplements niche and skincare niche have very different prop vocabularies — confusing them is how you end up with a whey protein shot that looks like a serum ad.

Three surface setups for AI product photography — marble, raw oak, and linen — each with matching on-brand props arranged on a light studio desk

  • Name the surface material and color: 'light raw oak', 'warm white marble', 'weathered terracotta tile'
  • Keep props to three or fewer — restraint reads as premium, crowded reads as stock
  • Props should belong to the same usage occasion as the product
  • Include a background note if relevant: 'background fading to soft paper white'
  • Avoid 'surrounded by' phrasing — it overwhelms the scene and confuses the composition
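
The scene rules above are simple enough to enforce in code. A minimal sketch, assuming a hypothetical `build_scene` helper that composes the Part 2 line and rejects more than three props:

```python
MAX_PROPS = 3  # this playbook's limit: three objects max


def build_scene(surface: str, props: list[str], background: str = "") -> str:
    """Compose a Part 2 scene line from a surface, up to three props, and an optional background note."""
    if len(props) > MAX_PROPS:
        raise ValueError(f"{len(props)} props given; keep it to {MAX_PROPS} or fewer")
    scene = f"On {surface}"
    if props:
        scene += ", with " + ", ".join(props)
    if background:
        scene += f", {background}"
    return scene


print(build_scene(
    "a warm white marble surface",
    ["a folded linen cloth", "a single dried eucalyptus sprig"],
    background="background fading to soft paper white",
))
```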

Part 3: Lighting specification

The most-skipped line in every failing prompt

Most prompts skip lighting entirely, or write 'good lighting', which is as useful as writing 'tasty' in a recipe. Light direction, quality, and source all need to be specified. 'Soft diffused window light from off-frame left, creating gentle product shadows with no harsh highlights' is a lighting spec. It tells the model where the source is, what quality the light has, and what it does to the product surface.

The two most common AI product photo failures: overexposure (the model defaults to bright, even light because that reads as 'professional' in its training data) and harsh rim lighting (the model learned that 'dramatic product photography' means a dark background with a lit edge). A one-line lighting spec fixes both.

  • Specify direction: 'from off-frame left', 'overhead diffused', 'from lower right at 45 degrees'
  • Specify quality: 'soft and diffused', 'golden hour', 'cool north-facing window light', 'overcast daylight'
  • Specify shadow behavior: 'gentle product shadow', 'crisp short shadow', 'no visible shadows'
  • Add a reflective-surface negative if needed: 'no harsh highlights on the product surface'
  • Natural light references the model understands well: 'morning window light', 'late afternoon sun', 'cloudy diffused day'
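
The bullets above amount to a fixed slot order: quality, source, direction, then shadow behavior, with an optional negative at the end. A small sketch of that ordering, using a hypothetical `lighting_spec` helper:

```python
def lighting_spec(quality: str, source: str, direction: str, shadows: str, avoid: str = "") -> str:
    """Build a one-line lighting spec: quality + source + direction, then shadow behavior."""
    line = f"{quality} {source} {direction}, {shadows}"
    if avoid:
        line += f", {avoid}"
    # Capitalize only the first character so the line reads as a sentence opener.
    return line[0].upper() + line[1:]


print(lighting_spec(
    quality="soft diffused",
    source="window light",
    direction="from off-frame left",
    shadows="gentle product shadow",
    avoid="no harsh highlights on the product surface",
))
```

Making every slot a required argument means 'good lighting' is no longer something you can write by accident.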

Part 4: Camera angle and distance

The three angles that sell different things

Camera angle changes what the product communicates. Overhead is editorial and works for multi-product arrangements. Eye-level is intimate and best for hero packshots and paid ads. Macro is technical and trust-building — use it for beauty and premium tactile goods. Pick the angle based on what the product needs to prove, not what looks most interesting.

  1. Overhead (flat lay) — editorial and airy, best for arrangements and social-media crops, works for any product category that photographs well from above
  2. Eye-level straight-on — the standard paid-ads packshot, intimate and direct, strongest for single-hero placement and conversion-focused landing pages
  3. Macro close-up — shows texture, finish, and material detail, builds premium trust for cosmetics, skincare, jewelry, and anything with a tactile story to tell

Three-panel diagram on a light desk showing overhead, eye-level, and macro camera angles for the same unbranded supplement bottle on a paper sweep

Part 5: Style, mood, and brand constraints

The palette line your prompt is missing

This is the section most brands discover last, after 20 prompts that all look like the same unnamed magazine. The style line does two jobs: it tells the model which aesthetic tradition to draw from, and it sets the color palette so your photos cohere across SKUs and campaigns.

The safest named references are editorial publications with a consistent visual identity: Kinfolk, Cereal, Monocle. Avoid Pinterest or broad descriptors like 'luxury' — they're too ambiguous for the model to translate into consistent output. Pair the reference with an explicit palette: 'Kinfolk-style editorial restraint. Color palette: warm cream, deep amber, espresso accents, off-white linen.' That combination is what makes 8 different SKU photos look like a catalog, not a grab bag.

The palette line is the difference between photos that match your brand and beautiful photos of someone else's brand. Most DTC founders find this out after 15 generations.

Part 6: Negative constraints

Tell the model what you don't want

Negative constraints are the guardrail at the end of every prompt. They prevent the model from filling the gaps in your prompt with its statistical defaults. The minimum guardrail for any commercial product prompt: 'No recognizable logos, no real brand names, no identifiable real people.' This also protects against the model pattern-matching your product description to a recognizable real brand and importing that brand's visual language into the shot.

  • Always include: 'No recognizable logos, no real brand names'
  • 'No identifiable real people' if the scene includes any human presence
  • 'No generic [product type] packaging' if your product has a distinctive form you don't want replaced
  • 'No neon, no glass morphism, no fluorescent lighting' to prevent default AI aesthetics
  • 'No exaggerated reflections' for glass or metallic products
  • 'No perfectly symmetrical composition' to avoid the overworked centered-product-on-gradient look
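
The conditional bullets above (people in the scene, reflective products, distinctive packaging) can be sketched as flags on one function. The `negative_constraints` name and its parameters are hypothetical; the constraint wording comes from the list above.

```python
def negative_constraints(has_people: bool = False, reflective: bool = False, product_type: str = "") -> str:
    """Assemble the Part 6 guardrail line from the bullets above."""
    items = ["No recognizable logos", "no real brand names"]  # always included
    if has_people:
        items.append("no identifiable real people")
    if product_type:
        items.append(f"no generic {product_type} packaging")
    if reflective:
        items.append("no exaggerated reflections")
    return ", ".join(items) + "."


# A hands-only lifestyle shot of a glass bottle:
print(negative_constraints(has_people=True, reflective=True, product_type="supplement"))
```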

The 8 AI image prompt templates DTC brands use

1. Standard packshot

Editorial photograph, magazine-quality, of a [product subject description] on a clean white paper sweep, eye-level at center frame. Soft diffused window light from off-frame left, wrapping the product without harsh highlights. Photorealistic. Color palette: white, [1–2 product accent colors]. 16:9 composition. No recognizable logos, no real brand names, no identifiable real people.

2. Lifestyle hero

Editorial overhead photograph, magazine-quality, of a [product subject description] on a [surface — marble counter / raw oak desk / linen tray], with [2–3 on-brand props] in a considered arrangement. Soft morning light from off-frame upper left, gentle product shadows. Photorealistic, shallow depth of field, Kinfolk-style editorial restraint. Color palette: [your palette]. 16:9. No recognizable logos, no real brand names.

3. Macro detail

Macro photograph, magazine-quality, of [specific product detail — label texture / cap mechanism / material surface] on a [surface], captured close-range with extreme shallow depth of field. Soft diffused side light from off-frame left. Photorealistic, Cereal magazine-style restraint. Color palette: [your palette]. 16:9. No recognizable logos.

4. Flat-lay grid

Editorial overhead photograph, magazine-quality, of [3–5 products from the same range] arranged in a considered flat-lay grid on a [surface]. Even spacing, slight angular offset — not rigid. Soft diffused daylight, gentle shadows. Photorealistic, Kinfolk-style editorial. Color palette: [your palette]. 16:9. No recognizable logos, no real brand names.

5. Hands-only lifestyle

Editorial close-up photograph, magazine-quality, of a person's hands — no face, no identifiable features — holding [product subject description] against a [background — soft off-white wall / blurred outdoor scene]. Natural daylight, warm and soft. Photorealistic, shallow depth of field, Cereal-style. Color palette: [your palette]. 16:9. No identifiable real people, no recognizable logos.

6. Ingredient hero

Editorial overhead photograph, magazine-quality, of [product subject description] beside a small arrangement of its key ingredients — [name 2–3 raw ingredients] — on a [surface]. Soft morning light from off-frame left, gentle shadows. Photorealistic, Kinfolk-style editorial restraint. Color palette: [your palette]. 16:9. No recognizable logos, no real brand names.

7. Campaign header

Editorial wide-angle photograph, magazine-quality, of [product subject description] as the hero of a considered scene — [surface + setting] — with deliberate negative space on the left for typography overlay. Soft diffused light, gentle shadows. Photorealistic, Monocle-style editorial. Color palette: [your palette]. 16:9 with compositional space for headline text left of center. No recognizable logos, no real brand names.

8. Mobile ad thumbnail

Editorial close-up photograph, magazine-quality, of [product subject description] at eye level, filling 70% of the frame against a [clean background]. Soft directional light, sharp product focus, background thrown out of focus. Photorealistic. Color palette: [your palette]. Square or 4:5 composition optimized for mobile feed. No recognizable logos, no real brand names.

Eight AI image prompt template cards displayed on a light desk, each labeled with a product photography use case from packshot to mobile ad thumbnail
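
The bracketed placeholders in the templates above lend themselves to simple substitution. A minimal sketch, using template 8 verbatim and a hypothetical `fill_template` helper that fails loudly if a placeholder is left unfilled; the example values are illustrative.

```python
import re

# Template 8 (mobile ad thumbnail), copied from above.
MOBILE_AD = (
    "Editorial close-up photograph, magazine-quality, of [product subject description] "
    "at eye level, filling 70% of the frame against a [clean background]. "
    "Soft directional light, sharp product focus, background thrown out of focus. "
    "Photorealistic. Color palette: [your palette]. "
    "Square or 4:5 composition optimized for mobile feed. "
    "No recognizable logos, no real brand names."
)


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each [placeholder] with its value; raise if any placeholder has no value."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder [{key}]")
        return values[key]
    return re.sub(r"\[([^\]]+)\]", substitute, template)


values = {
    "product subject description": "a matte black cylindrical supplement bottle with a bold yellow label",
    "clean background": "soft off-white wall",
    "your palette": "warm cream, matte black, bold yellow",
}
print(fill_template(MOBILE_AD, values))
```

Keys must match the bracket text exactly, so copy them from the template instead of retyping them.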

Building cross-SKU visual consistency with a scene lock

The system that makes 8 SKUs look like one brand

The highest-value use of AI image prompts for a DTC brand isn't the single great photo — it's the catalog. A brand with 12 SKUs needs 12 product photos that look like they were shot the same afternoon by the same photographer in the same studio. Traditional photography achieves this through setup continuity: the lighting rig stays constant, the sweep doesn't move, the same camera settings apply across every product. AI achieves this through a scene lock.

A scene lock is a 4–6 line block you write once and paste unchanged into every prompt for a campaign. Surface, lighting, style reference, palette — only the product subject description changes between SKUs. Run the same scene lock across your full range and the photos naturally cohere: same marble counter, same morning light quality, same muted palette, same depth-of-field signature. This is the same technique automated branding workflows use to generate visual systems from a single brand kit — applied to AI photography.

  • Write the scene lock as a separate reusable block and save it in your brand doc
  • Swap only the product subject description between SKU prompts
  • Test the lock on 3 diverse products before running your full catalog
  • If one SKU breaks visual consistency, adjust the scene lock rather than the individual prompt
  • Version the scene lock per campaign season — same structure, updated palette and setting
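
In code, a scene lock is a constant string and the catalog is a loop. The sketch below assumes made-up SKU IDs and subject descriptions; the lock text reuses phrasing from earlier in this playbook, and `prompts_for_catalog` is a hypothetical helper.

```python
# Written once per campaign and pasted unchanged into every prompt.
SCENE_LOCK = (
    "on a warm white marble surface with natural grey veining. "
    "Soft morning light from off-frame upper left, gentle product shadows. "
    "Photorealistic, shallow depth of field, Kinfolk-style editorial restraint. "
    "Color palette: warm cream, deep amber, espresso accents, off-white linen. "
    "No recognizable logos, no real brand names."
)

# Hypothetical SKUs: only the product subject description differs.
SKUS = {
    "sku-001": "A matte black cylindrical supplement bottle with a white screw cap",
    "sku-002": "A frosted glass dropper bottle with a matte black cap",
    "sku-003": "A flat embossed tin with a deep forest green lid",
}


def prompts_for_catalog(skus: dict[str, str], scene_lock: str) -> dict[str, str]:
    """Pair every SKU's subject description with the same unchanged scene lock."""
    return {sku: f"{subject} {scene_lock}" for sku, subject in skus.items()}


for sku, prompt in prompts_for_catalog(SKUS, SCENE_LOCK).items():
    print(sku, "->", prompt)
```

Because the lock lives in one place, adjusting it when a SKU breaks consistency updates every prompt on the next run.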

The AI workflow most DTC founders skip

The traditional product photography workflow for a 10-SKU line: $1,200–$2,000 for a studio day, $200–$400 for retouching, 7–14 days from booking to delivery. Total: $1,400–$2,400, per campaign — every time you run a new ad set, change a seasonal story, or drop a new colorway, you're back in the queue.

The AI workflow with a proper six-part prompt: one source photo per SKU, a scene lock built in one afternoon, 8 prompts run through a generation platform. Cost: $20–$50 in platform credits. Time from first prompt to final export: 2–4 hours. The gap between the two workflows isn't quality anymore — it's packaging fidelity. The AI interprets your product; it doesn't reproduce it pixel-accurately. Which is why the prompt formula matters: the more precise your input, the closer your output is to your actual product. The product photography for online stores playbook covers how the two workflows overlap at the review and selection stage.
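
The cost gap is easier to see per SKU. This is back-of-envelope arithmetic using only the figures quoted above, and it assumes the $20–$50 in credits covers the same 10-SKU line as the studio day, which the figures imply but do not state.

```python
SKUS = 10

# (low, high) in USD, from the figures quoted above.
studio_day = (1200, 2000)
retouching = (200, 400)
ai_credits = (20, 50)  # assumed to cover the same 10-SKU line

traditional = tuple(s + r for s, r in zip(studio_day, retouching))

print(f"Traditional total:   ${traditional[0]:,} to ${traditional[1]:,}")
print(f"Traditional per SKU: ${traditional[0] / SKUS:.0f} to ${traditional[1] / SKUS:.0f}")
print(f"AI per SKU:          ${ai_credits[0] / SKUS:.2f} to ${ai_credits[1] / SKUS:.2f}")
```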

YourNextLandingPage's generation goes one step further — it locks your actual packaging using reference image conditioning, so the label, logo, and form stay intact while the scene, lighting, and composition change. For DTC skincare brands running new ad creative every week, that's the difference between 'looks like our product' and 'is our product.'

  • Caspa — background replacement and scene generation, strong for CPG and supplements
  • Flair.ai — product staging with drag-and-drop prop placement, best for lifestyle context
  • Midjourney v6 — highest prompt flexibility and style control, steeper learning curve
  • Adobe Firefly — integrated with Creative Cloud, best for teams already in the Adobe stack
  • YourNextLandingPage — reference-locked generation that preserves actual packaging fidelity

Common mistakes that tank your AI product photos

  1. Describing your product by brand name instead of material properties — the model pattern-matches to whatever the name sounds like, not your actual product
  2. Skipping the lighting spec — the model defaults to bright even light that flattens texture and makes every product look like a stock photo
  3. Using more than three props — crowded scenes read as stock imagery; restraint reads as premium
  4. Asking for 'good composition' instead of specifying an angle — 'good' is statistically average; specify overhead, eye-level, or macro
  5. Running 20 prompts without a scene lock and wondering why the photos don't cohere as a catalog
  6. Using broad aesthetic adjectives ('luxury', 'premium', 'sleek') instead of specific material descriptors
  7. Skipping the negative constraint block — the model fills every unspecified gap with its statistical prior, which is rarely your brand
  8. Generating the hero photo before testing the prompt on a secondary shot — always validate the formula on a lower-stakes image first

Frequently asked questions

What are the best AI tools for product photography prompts?

Caspa and Flair.ai are the most accessible for DTC founders — both accept a reference product photo and let you describe a new scene. Midjourney v6 offers the most prompt flexibility but requires more technical proficiency. Adobe Firefly integrates with Creative Cloud for teams already in that stack. For brands that need packaging fidelity (the logo and label must survive the generation), YourNextLandingPage uses reference-image conditioning that most standalone tools lack.

How long should an AI image prompt for product photography be?

A complete six-part prompt runs 80–150 words. Shorter than 50 words and you're leaving too many decisions to the model's defaults. Longer than 200 words and the model begins to deprioritize content at the end of the prompt. The six-part structure ensures the most critical information (product subject, scene, lighting) comes first, where the model weights it most heavily.
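
The word-count bands above translate directly into a check you can run before generating. A minimal sketch with a hypothetical `check_prompt_length` function; the thresholds are the ones stated in this answer, and counting words by splitting on whitespace is a simplification.

```python
def check_prompt_length(prompt: str) -> str:
    """Classify a prompt by word count using the bands described above."""
    words = len(prompt.split())
    if words < 50:
        return "too_short"   # too many decisions left to the model's defaults
    if words > 200:
        return "too_long"    # content at the end may be deprioritized
    if 80 <= words <= 150:
        return "target"      # typical length of a complete six-part prompt
    return "borderline"      # 50-79 or 151-200 words: usable, worth a second look


print(check_prompt_length("a supplement bottle on a wooden table with good lighting"))
```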

Can I use AI-generated product photos for ads on Meta or Google?

Yes. Meta and Google do not currently prohibit AI-generated images in paid ads, and the Baymard Institute's research on product detail pages confirms that image quality — not production method — is what drives click-through and add-to-cart rates. The only restrictions apply if the generated image misrepresents the product (adding claims not on the label, etc.), which is a compliance issue independent of how the image was made.

Why does the AI keep changing my product's color?

Because you haven't specified the color with enough precision in Part 1, or the lighting spec in Part 3 is introducing a warm or cool cast. Write the color as a precise descriptor ('matte deep forest green' rather than 'green') and add a lighting spec that uses neutral or cool diffused daylight. Add a negative constraint: 'no color cast on product surfaces.' Color accuracy is consistently among the top friction points that page speed and UX research flags in product evaluation sessions.

What is a scene lock for AI image prompts?

A scene lock is a reusable 4–6 line block — surface, lighting, style reference, palette — that you paste unchanged into every prompt for a campaign or catalog. Only the product subject description changes between SKUs. It's the technique that makes 8 AI-generated product photos look like they were shot in the same studio session, and it's the single most underused tactic in AI product photography.

How do I get AI-generated product photos to look consistent across a full catalog?

Build a scene lock and test it on three diverse SKUs before running the full catalog. If the outputs cohere visually — same light quality, same surface warmth, same depth-of-field signature — you're ready to scale. If one SKU breaks consistency, adjust the scene lock rather than the individual prompt. A 30-minute investment in a tested scene lock saves hours of post-generation editing.

Do I need a professional product photo to start, or can the AI work from a phone shot?

Most AI tools that accept a reference image can work from a reasonably sharp phone shot — the model uses it as a structural reference, not a high-res source file. The better your source image, the more accurately the AI can preserve your product's details. A white-background phone shot in good natural light is sufficient for most DTC use cases. The photoshoot AI DTC playbook covers how to prepare a source image that maximizes AI output quality.

The takeaway

The six-part formula exists because AI models fill every unspecified gap with their statistical average — and the statistical average is never your brand. Product subject, scene, lighting, angle, style, negative constraints. Six inputs, in order, every time. The prompts that fail are the ones that skip parts three through six because they feel like detail. They're not detail. They're brand.

Build a scene lock once. Test it on three SKUs. Run the catalog. The $1,400 studio session is still available if you want it, but for DTC brands running paid ads weekly, waiting one to two weeks for photos is the real cost.

YourNextLandingPage is in early access. Join the waitlist to try reference-locked AI generation on your product catalog — and stop rebuilding the same prompt from scratch every campaign.
