Available Image Generation Models
Choose from a wide variety of specialized image generation models:
Qwen-Image Text to Image
- Model: Qwen-Image’s 20B MMDiT model
- Specialty: Multilingual text rendering, advanced editing
- Best for: Text-heavy images, multilingual content, detailed editing
OmniGen2 Text to Image
- Model: OmniGen2’s unified 7B multimodal model
- Architecture: Dual-path architecture
- Best for: High-quality text-to-image generation, versatile applications
OmniGen2 Image Edit
- Model: OmniGen2’s advanced image editing capabilities
- Features: Text rendering support, natural language editing
- Best for: Image modification, style changes, content editing
Cosmos Predict2 2B T2I
- Model: Cosmos-Predict2 2B T2I
- Specialty: Physically accurate, detail-rich generation
- Best for: Realistic images, scientific visualization, detailed artwork
Chroma Text to Image
- Model: Chroma (modified from Flux)
- Architecture: Enhanced Flux-based architecture
- Best for: High-quality generation, architectural improvements
HiDream Series
- HiDream I1 Dev: Development and testing
- HiDream I1 Fast: Fast image generation
- HiDream I1 Full: Full-featured generation
- HiDream E1.1 Image Edit: Advanced image editing (better quality than E1)
- HiDream E1 Image Edit: Image editing capabilities
Stable Diffusion 3.5 Series
- SD3.5 Simple: Standard text-to-image generation
- SD3.5 Large Canny ControlNet: Edge detection guided generation
- SD3.5 Large Depth: Depth-aware image generation
- SD3.5 Large Blur: Blur-based reference image generation
Stable Diffusion XL Series
- SDXL Simple: High-quality standard generation
- SDXL Refiner Prompt: Enhanced results with refiners
- SDXL Revision Text Prompts: Reference image concept transfer
- SDXL Revision Zero Positive: Text prompts with reference images
- SDXL Turbo: Single-step image generation
Lotus Depth
- Model: Lotus Depth in ComfyUI
- Specialty: Efficient depth estimation with high detail retention
- Best for: Depth-aware applications, 3D processing
Quick Start
Launch GPU Instance
ComfyUI Template
Recommended GPU
Access ComfyUI
Choose a Workflow
- Text-to-Image
- Image-to-Image
- Advanced Workflows
- Stable Diffusion XL workflows
- ControlNet integration
- LoRA model support
- Batch generation capabilities
Advanced Prompt Techniques
Prompt Structure Best Practices
Subject Description
✅ Detailed: “A modern glass skyscraper with geometric patterns”
Action and Pose
- “running through a field”
- “sitting peacefully by a window”
- “dancing in the rain”
- “looking directly at the camera”
- “reaching toward the sky”
Environment and Setting
- “in a mystical forest with glowing mushrooms”
- “on a busy city street at night”
- “in a cozy library with warm lighting”
- “against a dramatic storm sky”
- “in a minimalist white studio”
Style and Aesthetic
- “professional portrait photography”
- “street photography, candid moment”
- “macro photography, extreme close-up”
- “aerial photography, bird’s eye view”
- “oil painting in impressionist style”
- “digital art, concept art style”
- “watercolor illustration, soft edges”
- “pencil sketch, detailed line art”
Technical Parameters
- “8k ultra high resolution”
- “cinematic lighting, dramatic shadows”
- “shallow depth of field, bokeh background”
- “HDR, vibrant colors, high contrast”
- “soft natural lighting, golden hour”
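The five components above (subject, action, environment, style, technical parameters) can be assembled programmatically. The following is a minimal sketch, not part of any ComfyUI API: `PromptParts` and `build_prompt` are hypothetical names, and joining the parts with commas is an assumption about how you want the final prompt laid out.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    """Hypothetical container mirroring the five prompt components."""
    subject: str
    action: str = ""
    environment: str = ""
    style: str = ""
    technical: list[str] = field(default_factory=list)


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty components into one comma-separated prompt."""
    pieces = [parts.subject, parts.action, parts.environment, parts.style]
    pieces.extend(parts.technical)
    return ", ".join(p.strip() for p in pieces if p and p.strip())


prompt = build_prompt(PromptParts(
    subject="A modern glass skyscraper with geometric patterns",
    environment="against a dramatic storm sky",
    style="aerial photography, bird's eye view",
    technical=["cinematic lighting, dramatic shadows", "8k ultra high resolution"],
))
print(prompt)
```

Keeping the components separate makes it easy to swap one part (for example the style) while holding the rest of the prompt fixed.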
Prompt Weighting and Control
- Attention Weighting
- Negative Prompts
- Style Mixing
- (keyword:1.3): Increase emphasis by 30%
- (keyword:0.8): Decrease emphasis by 20%
- ((keyword)): Strong emphasis (equivalent to 1.21)
- [keyword]: Slight de-emphasis (equivalent to 0.91)
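The equivalences quoted for the attention weighting syntax (1.21 and 0.91) are consistent with each bare pair of parentheses multiplying the weight by 1.1 and each pair of square brackets dividing it by 1.1. The sketch below works through that arithmetic. The 1.1 step is inferred from those numbers, `effective_weight` is a hypothetical helper, and some front ends treat square brackets differently, so check the behavior of the interface you are using.

```python
import re

EMPHASIS_STEP = 1.1  # assumed: each ( ) multiplies by 1.1, each [ ] divides by 1.1


def effective_weight(token: str) -> tuple[str, float]:
    """Return (keyword, weight) for a single weighted prompt token."""
    token = token.strip()
    # Explicit form: (keyword:1.3)
    explicit = re.fullmatch(r"\(([^()\[\]:]+):([0-9]*\.?[0-9]+)\)", token)
    if explicit:
        return explicit.group(1).strip(), float(explicit.group(2))
    # Implicit form: peel nested ( ) or [ ] one layer at a time.
    weight = 1.0
    while len(token) >= 2:
        if token[0] == "(" and token[-1] == ")":
            weight *= EMPHASIS_STEP
        elif token[0] == "[" and token[-1] == "]":
            weight /= EMPHASIS_STEP
        else:
            break
        token = token[1:-1].strip()
    return token, round(weight, 2)


for example in ["(keyword:1.3)", "(keyword:0.8)", "((keyword))", "[keyword]"]:
    print(example, effective_weight(example))
```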
Model-Specific Guidelines
Stable Diffusion XL (SDXL)
Strengths
- Exceptional detail and resolution
- Great with complex compositions
- Excellent text rendering in images
- Superior photorealism capabilities
Optimal Settings
- Steps: 25-35 (30 recommended)
- Guidance Scale: 6-9 (7.5 recommended)
- Resolution: 1024x1024 or 1152x896
- Sampler: DPM++ 2M Karras
Stable Diffusion 2.1
Strengths
- Fast generation times
- Good for iteration and experimentation
- Reliable results with simple prompts
- Cost-effective for batch generation
Optimal Settings
- Steps: 20-30 (25 recommended)
- Guidance Scale: 7-12 (9 recommended)
- Resolution: 512x512 or 768x512
- Sampler: Euler a or DPM++ 2M
DALL-E Style Model
Strengths
- Excellent instruction following
- Great with complex scene descriptions
- Superior text integration
- Photorealistic human faces
Optimal Settings
- Steps: 30-40 (35 recommended)
- Guidance Scale: 8-12 (10 recommended)
- Resolution: 1024x1024, 1024x1792, 1792x1024
- Sampler: DDIM or DPM++ SDE
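The recommended ranges for the three model families can be kept in a lookup table and used to sanity-check a request before it is queued. This is an illustrative sketch only: the keys `sdxl`, `sd21`, and `dalle_style` and the function `check_settings` are hypothetical names, and the numbers are copied from the Optimal Settings lists in this section.

```python
# Recommended ranges copied from the Optimal Settings lists in this guide.
RECOMMENDED = {
    "sdxl": {"steps": (25, 35), "guidance": (6.0, 9.0),
             "resolutions": {(1024, 1024), (1152, 896)}},
    "sd21": {"steps": (20, 30), "guidance": (7.0, 12.0),
             "resolutions": {(512, 512), (768, 512)}},
    "dalle_style": {"steps": (30, 40), "guidance": (8.0, 12.0),
                    "resolutions": {(1024, 1024), (1024, 1792), (1792, 1024)}},
}


def check_settings(model: str, steps: int, guidance: float,
                   width: int, height: int) -> list[str]:
    """Return a list of warnings; an empty list means all values are in range."""
    rec = RECOMMENDED[model]
    warnings = []
    lo, hi = rec["steps"]
    if not lo <= steps <= hi:
        warnings.append(f"steps {steps} outside recommended {lo}-{hi}")
    lo_g, hi_g = rec["guidance"]
    if not lo_g <= guidance <= hi_g:
        warnings.append(f"guidance {guidance} outside recommended {lo_g}-{hi_g}")
    if (width, height) not in rec["resolutions"]:
        warnings.append(f"resolution {width}x{height} not in recommended list")
    return warnings


print(check_settings("sdxl", 30, 7.5, 1024, 1024))
print(check_settings("sd21", 50, 9, 1024, 1024))
```

Values outside these ranges are not necessarily wrong, which is why the helper returns warnings rather than raising an error.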
Generation Parameters
Resolution and Aspect Ratios
- Square Formats
- Portrait Formats
- Landscape Formats
| Resolution | Use Case | Cost |
|---|---|---|
| 512x512 | Social media avatars, icons | 1x |
| 768x768 | Instagram posts, thumbnails | 1.5x |
| 1024x1024 | High-quality social media, prints | 2x |
| 1536x1536 | Large prints, detailed artwork | 3x |
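The cost column can be used to compare iteration strategies. The sketch below, with hypothetical helper names, computes relative cost units from the table and shows that cost grows more slowly than pixel count (1024x1024 has four times the pixels of 512x512 at twice the cost). The "cost unit" here is simply the table multiplier times the number of images; actual billing may be computed differently.

```python
# Cost multipliers for square formats, keyed by side length (from the table).
COST_MULTIPLIER = {512: 1.0, 768: 1.5, 1024: 2.0, 1536: 3.0}


def batch_cost_units(side: int, count: int) -> float:
    """Relative cost of `count` square images with the given side length."""
    return COST_MULTIPLIER[side] * count


def pixels_relative_to_base(side: int, base: int = 512) -> float:
    """How many times more pixels a square image has than the base size."""
    return (side * side) / (base * base)


# Strategy A: iterate on 8 drafts at 512x512, then render 2 finals at 1024x1024.
draft_then_final = batch_cost_units(512, 8) + batch_cost_units(1024, 2)
# Strategy B: render all 10 images at 1024x1024.
all_final = batch_cost_units(1024, 10)

print(draft_then_final, all_final)
print(pixels_relative_to_base(1024))
```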
Quality vs Speed Settings
Draft Quality (Fast)
- Steps: 15-20
- Guidance Scale: 6-8
- Generation Time: 1-2 seconds
- Cost: Standard pricing
Standard Quality (Balanced)
- Steps: 25-35
- Guidance Scale: 7-9
- Generation Time: 2-4 seconds
- Cost: Standard pricing
High Quality (Detailed)
- Steps: 40-50
- Guidance Scale: 8-12
- Generation Time: 4-8 seconds
- Cost: 1.5x standard pricing
Ultra Quality (Maximum)
- Steps: 60-80
- Guidance Scale: 10-15
- Generation Time: 8-15 seconds
- Cost: 2x standard pricing
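The four quality tiers can be encoded as presets and selected from a time budget. This is a sketch under stated assumptions: `QualityPreset` and `pick_preset` are hypothetical names, the values come from the lists above, and the selection rule (highest tier whose worst-case generation time fits the budget) is one reasonable choice among several.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualityPreset:
    name: str
    steps: tuple[int, int]        # recommended step range
    guidance: tuple[int, int]     # recommended guidance scale range
    seconds: tuple[int, int]      # typical generation time range
    cost_multiplier: float        # relative to standard pricing


# Ordered from fastest to highest quality.
PRESETS = [
    QualityPreset("Draft", (15, 20), (6, 8), (1, 2), 1.0),
    QualityPreset("Standard", (25, 35), (7, 9), (2, 4), 1.0),
    QualityPreset("High", (40, 50), (8, 12), (4, 8), 1.5),
    QualityPreset("Ultra", (60, 80), (10, 15), (8, 15), 2.0),
]


def pick_preset(max_seconds: float) -> Optional[QualityPreset]:
    """Return the highest tier whose worst-case time fits, or None if none fit."""
    fitting = [p for p in PRESETS if p.seconds[1] <= max_seconds]
    return fitting[-1] if fitting else None


print(pick_preset(5))
```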
Batch Generation and Variations
Generating Multiple Images
Seed Control and Reproducibility
- Using Seeds
- Seed Variations
- Random Seeds
- Reproduce exact same image
- Create systematic variations
- A/B test different parameters
- Debug generation issues
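The seed use cases listed above can be sketched as job lists. The helpers below are hypothetical and do not call any generation API; they only build the parameter dictionaries you would pass to one. Python's `random.Random` stands in for the sampler's noise source to illustrate why a fixed seed gives a repeatable result.

```python
import random


def seed_variations(base_seed: int, count: int) -> list[int]:
    """Systematic variations: consecutive seeds starting from a known base."""
    return [base_seed + i for i in range(count)]


def ab_test_jobs(prompt: str, seed: int, guidance_values: list[float]) -> list[dict]:
    """Hold the seed fixed and vary one parameter so differences are attributable."""
    return [{"prompt": prompt, "seed": seed, "guidance_scale": g}
            for g in guidance_values]


def random_seed(rng: random.Random) -> int:
    """Draw a fresh 32-bit seed; record it so the image can be reproduced later."""
    return rng.randrange(2**32)


# Same seed, same pseudo-random stream: the basis of reproducible generation.
same = random.Random(42).random() == random.Random(42).random()
print(same)
print(seed_variations(1000, 4))
print(ab_test_jobs("a cozy library with warm lighting", 1000, [6.0, 7.5, 9.0]))
```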
Image Enhancement and Post-Processing
Upscaling and Super-Resolution
Real-ESRGAN Upscaling
- Scale factors: 2x, 4x, 8x
- Models: RealESRGAN, ESRGAN, SRCNN
- Cost: $0.05 per upscale operation
Face Enhancement
- Enhance facial features and skin texture
- Preserve original identity and characteristics
- Adjustable enhancement strength
- Cost: $0.03 per enhancement
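As a worked example of the per-operation prices quoted above ($0.05 per upscale, $0.03 per face enhancement), the sketch below computes output dimensions and a post-processing budget. The function names are hypothetical, and the calculation assumes each operation is billed once per image with no volume discounts.

```python
UPSCALE_COST_CENTS = 5        # $0.05 per upscale operation
FACE_ENHANCE_COST_CENTS = 3   # $0.03 per enhancement
SCALE_FACTORS = (2, 4, 8)


def upscaled_size(width: int, height: int, factor: int) -> tuple[int, int]:
    """Output dimensions after upscaling by one of the supported factors."""
    if factor not in SCALE_FACTORS:
        raise ValueError(f"unsupported scale factor: {factor}")
    return width * factor, height * factor


def post_processing_cost(upscales: int, face_enhancements: int) -> float:
    """Total cost in dollars; summed in integer cents to avoid rounding drift."""
    cents = upscales * UPSCALE_COST_CENTS + face_enhancements * FACE_ENHANCE_COST_CENTS
    return cents / 100


print(upscaled_size(1024, 1024, 4))
print(post_processing_cost(upscales=10, face_enhancements=4))
```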
Background Removal
Style Transfer and Artistic Effects
- Neural Style Transfer
- Preset Artistic Filters
Commercial Use and Licensing
Usage Rights and Licensing
Personal Use License
Permitted
- Personal projects and portfolios
- Educational and research purposes
- Social media posts (personal accounts)
- Non-commercial art and creativity
Not permitted
- Commercial sales or licensing
- Business marketing materials
- Stock photo services
- Resale or redistribution
Commercial Use License
- Business marketing and advertising
- Product packaging and branding
- Website and app content
- Print materials and merchandise
- Client work and services
- Stock photo creation
Notes
- Attribution may be required for certain uses
- Some models have specific commercial restrictions
- Enterprise plans include full commercial rights
Extended Commercial License
- Unlimited commercial usage
- Resale and redistribution rights
- No attribution requirements
- White-label usage rights
- Custom licensing terms available
Model-Specific Considerations
- Open Source Models
- Proprietary Models
- Generally permissive licensing
- Commercial use typically allowed
- May require attribution in some cases
- Check specific model cards for details
Troubleshooting and Tips
Common Issues and Solutions
Poor Image Quality
- Increase the number of steps (30-50)
- Adjust guidance scale (7-12 for most models)
- Use negative prompts to exclude quality issues
- Try different sampling methods
- Increase resolution if budget allows
Prompt Not Followed
- Be more specific and detailed in prompts
- Use attention weighting: (important detail:1.3)
- Break complex prompts into simpler parts
- Try different guidance scale values
- Use negative prompts to exclude unwanted elements
Inconsistent Results
- Use seed values for reproducible results
- Increase guidance scale for more prompt adherence
- Use more specific and detailed prompts
- Try different sampling methods
- Generate multiple images and select best results
Anatomical Issues
- Use negative prompts: bad anatomy, extra limbs, deformed hands
- Try models specifically trained for human subjects
- Use reference images or controlnets
- Generate multiple versions and select best anatomy
- Consider post-processing with face/hand enhancement
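Negative prompt terms for anatomy tend to be reused across many generations, so it can help to keep them in a list and merge them with any job-specific terms. A minimal sketch follows; `build_negative_prompt` is a hypothetical helper, and "watermark" is only an illustrative extra term that does not come from this guide.

```python
# Anatomy terms taken from the negative prompt suggested in this section.
ANATOMY_NEGATIVES = ["bad anatomy", "extra limbs", "deformed hands"]


def build_negative_prompt(*groups: list[str]) -> str:
    """Merge term groups into one negative prompt, dropping case-insensitive repeats."""
    seen = set()
    terms = []
    for group in groups:
        for term in group:
            cleaned = term.strip()
            key = cleaned.lower()
            if key and key not in seen:
                seen.add(key)
                terms.append(cleaned)
    return ", ".join(terms)


negative = build_negative_prompt(ANATOMY_NEGATIVES, ["Extra limbs", "watermark"])
print(negative)
```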
Optimization Tips
Cost Optimization
- Start with lower resolution for iteration
- Use faster models for experimentation
- Batch similar requests together
- Optimize prompts to reduce generation attempts
Quality Optimization
- Spend time crafting detailed prompts
- Use appropriate negative prompts
- Choose the right model for your use case
- Experiment with different parameters
Speed Optimization
- Use lower step counts for drafts
- Choose faster models when quality allows
- Batch multiple images in single requests
- Use appropriate resolution for final use
Workflow Optimization
- Save successful prompts and settings
- Use seeds for reproducible results
- Create prompt templates for common use cases
- Organize generated images with metadata
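One way to save successful prompts and settings alongside the images they produced is a JSON sidecar file per image. The sketch below uses only the Python standard library; the function names and the record layout are assumptions, not a format that ComfyUI reads or writes.

```python
import json
from pathlib import Path


def save_run_metadata(image_path: str, prompt: str, settings: dict) -> Path:
    """Write prompt and settings to a .json file next to the image."""
    image = Path(image_path)
    sidecar = image.with_suffix(".json")
    record = {"image": image.name, "prompt": prompt, "settings": settings}
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


def load_run_metadata(sidecar_path: str) -> dict:
    """Read a sidecar file back so a run can be reproduced or templated."""
    return json.loads(Path(sidecar_path).read_text(encoding="utf-8"))
```

For example, `save_run_metadata("out/library.png", prompt, {"seed": 1000, "steps": 30, "guidance_scale": 7.5})` would write `out/library.json`, provided the `out` directory already exists. Storing the seed with the prompt is what makes the result reproducible later.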