Stable Diffusion 3 (SD3) Is Now Available And Here’s How You Can Run It Locally


Stability AI has introduced Stable Diffusion 3 Medium, which they describe as their "most advanced text-to-image open model." Released on June 12th, 2024, this new model is the latest evolution following the groundbreaking SDXL. It features 2 billion parameters, is designed to deliver photorealistic images without the need for complicated workflows, operates effectively on standard consumer systems, and addresses common rendering flaws in hands and faces. Consider using Flux 1 instead, as it's far better in all aspects.

To learn how to run it locally, scroll to the bottom.

SD3 Images

Enhanced Capabilities and Features

Stable Diffusion 3 (SD3) Medium is engineered to process sophisticated prompts involving spatial arrangements, compositional elements, actions, and various styles. Stability AI has also significantly improved text rendering within images, achieving what they call "unprecedented" accuracy, which they attribute to their new Multimodal Diffusion Transformer (MM-DiT) architecture.

For those looking for technical details, the research paper offers an in-depth look at the SD3 architecture, including an overview of its components and a diagram of the MM-DiT block.
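
For readers who prefer code to block diagrams, here is a heavily simplified, hypothetical sketch of the core idea behind an MM-DiT block: image-latent tokens and text tokens keep separate projection weights and feed-forward networks but exchange information in a single joint attention pass. It omits the timestep and pooled-text conditioning, multi-head attention, and many other details of the real architecture.

```python
# Toy, single-head "joint attention" block in the spirit of MM-DiT.
# Not the real SD3 implementation; a conceptual sketch only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyJointAttentionBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Each modality gets its own norms, QKV projections, and MLP.
        self.norm1_img, self.norm1_txt = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.norm2_img, self.norm2_txt = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.qkv_img = nn.Linear(dim, 3 * dim)
        self.qkv_txt = nn.Linear(dim, 3 * dim)
        self.mlp_img = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.mlp_txt = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, img_tokens: torch.Tensor, txt_tokens: torch.Tensor):
        n_img = img_tokens.shape[1]
        # Modality-specific projections...
        q_i, k_i, v_i = self.qkv_img(self.norm1_img(img_tokens)).chunk(3, dim=-1)
        q_t, k_t, v_t = self.qkv_txt(self.norm1_txt(txt_tokens)).chunk(3, dim=-1)
        # ...then one attention pass over the concatenated sequence, so image
        # and text tokens attend to each other directly.
        q = torch.cat([q_i, q_t], dim=1)
        k = torch.cat([k_i, k_t], dim=1)
        v = torch.cat([v_i, v_t], dim=1)
        joint = F.scaled_dot_product_attention(q, k, v)
        img_tokens = img_tokens + joint[:, :n_img]
        txt_tokens = txt_tokens + joint[:, n_img:]
        # Separate feed-forward networks per modality, with residuals.
        img_tokens = img_tokens + self.mlp_img(self.norm2_img(img_tokens))
        txt_tokens = txt_tokens + self.mlp_txt(self.norm2_txt(txt_tokens))
        return img_tokens, txt_tokens

# Quick shape check with random tokens.
img, txt = ToyJointAttentionBlock(dim=64)(torch.randn(1, 16, 64), torch.randn(1, 8, 64))
```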

Model Efficiency and Customization

With 2 billion parameters, SD3 Medium sits toward the smaller end of the planned SD3 family, which spans roughly 800 million to 8 billion parameters. Its low VRAM requirement makes it well suited to standard consumer GPUs without degrading performance, and it excels at learning detailed customizations from minimal datasets.
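
To make the "runs on standard consumer GPUs" claim concrete, here is a minimal sketch using Hugging Face's diffusers library, which added a StableDiffusion3Pipeline around the model's release; the repository ID and memory settings below are assumptions to verify against your own environment, and this scripted route is separate from the ComfyUI workflow covered later.

```python
# Minimal sketch: load SD3 Medium in half precision and offload idle submodules
# to CPU to keep peak VRAM low. Assumes diffusers >= 0.29 and that you have
# accepted the license / logged in for the gated repo (assumed repo id below).
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",  # assumed repo id
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # trades some speed for a smaller VRAM footprint

image = pipe(
    prompt="a photo of a red fox reading a newspaper on a park bench",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("sd3_medium_test.png")
```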

Users can download the SD3 safetensors and text encoders from Civitai; the download includes the base SD3 Medium file as well as variants that bundle the CLIP and T5XXL text encoders. A variety of pre-made ComfyUI workflows are also available to make image generation accessible.

More SD3 Images

Ongoing Development and Future Plans

Christian Laforte, co-CEO of Stability AI, conveyed the company’s commitment to continual model improvements. "Stability AI will continue to push the frontier of generative AI," said Laforte, emphasizing their aim to lead in image generation technology. Future updates will not only enhance image generation but will also extend to multimodal capabilities in video, audio, and language.

Access and Licensing Information

Users can explore the capabilities of SD3 Medium through Stability’s API. The model weights are provided under an open non-commercial license, with an option for a low-cost Creator License for broader use. Those interested in large-scale commercial applications can contact the startup for detailed licensing arrangements.
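
As a rough illustration, a request to the hosted model might look like the sketch below; the endpoint path, form fields, and model identifier reflect Stability's v2beta REST API around the SD3 launch and should be confirmed against the current documentation.

```python
# Hedged sketch of a Stability AI REST API call for SD3. Check the current
# docs: the endpoint, field names, and model identifier may have changed.
import os
import requests

resp = requests.post(
    "https://api.stability.ai/v2beta/stable-image/generate/sd3",  # assumed endpoint
    headers={
        "authorization": f"Bearer {os.environ['STABILITY_API_KEY']}",
        "accept": "image/*",
    },
    files={"none": ""},  # forces multipart/form-data encoding
    data={
        "prompt": "a watercolor lighthouse at dawn",
        "model": "sd3-medium",   # assumed model identifier
        "output_format": "png",
    },
)
resp.raise_for_status()
with open("sd3_api_output.png", "wb") as f:
    f.write(resp.content)
```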

SD3 has specific licensing considerations under the Stability AI Non-Commercial Research Community License. The model is free for non-commercial use, such as academic research, while commercial usage requires a separate license from Stability AI. More details can be found on the Stability AI license page.

Current Challenges and Company's Response

Stability AI’s launch of SD3 Medium comes during a challenging period. Founded in 2020, the company quickly rose to prominence in the generative AI field, standing alongside competitors like Midjourney and OpenAI’s DALL-E. By 2022, the startup's valuation had reached $1 billion.

However, the company has faced numerous lawsuits and financial difficulties recently. Artists have taken legal action against Stability AI for using their work to train models without permission. Additionally, financial constraints have prompted discussions of a potential sale, as noted by The Information.

Leadership Changes Amidst Turbulence

In March, CEO and founder Emad Mostaque resigned to focus on decentralized AI projects. Despite these hurdles, Stability AI's software continues to show impressive results, with SD3 Medium demonstrating significant performance enhancements.

Training and Future Model Releases

Training requirements for SD3 Medium are expected to be similar to, or slightly lower than, those of SDXL. The model is capable of absorbing nuanced details from small datasets, enhancing its suitability for customization and creativity. Additionally, Stability AI has announced plans to release other SD3 model versions, including Small (1B parameters), Large (4B parameters), and Huge (8B parameters), for free as they complete training.

How to Use SD3 Medium Locally

For a video guide, check out Olivio’s Tutorial.

1. Download the Models and Text Encoders:

You can get all the SD3 safetensors, text encoders, and example ComfyUI workflows from Civitai.
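
If you prefer a scripted download, the same weights are mirrored on Hugging Face; the sketch below uses huggingface_hub, with the repository ID and filenames being assumptions to double-check (the repo is gated, so accept the license on the model page and log in first).

```python
# Hedged sketch: scripted download of the SD3 Medium files from the Hugging Face
# mirror. Run `huggingface-cli login` first; repo id and filenames are assumptions.
from huggingface_hub import hf_hub_download

REPO = "stabilityai/stable-diffusion-3-medium"  # assumed repo id

files = [
    "sd3_medium.safetensors",
    "sd3_medium_incl_clips.safetensors",
    "sd3_medium_incl_clips_t5xxlfp8.safetensors",
    "text_encoders/clip_l.safetensors",
    "text_encoders/clip_g.safetensors",
    "text_encoders/t5xxl_fp8_e4m3fn.safetensors",
]

for name in files:
    # Files land under downloads/sd3, keeping the repo's subfolder layout;
    # move them into the ComfyUI folders described in steps 3 and 4 below.
    path = hf_hub_download(repo_id=REPO, filename=name, local_dir="downloads/sd3")
    print("downloaded", path)
```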

2. Update Your Software:

  • ComfyUI has full SD3 support. Make sure to update to the latest version.
  • Note that, as of 10:00 am EST on 6/12/2024, Automatic1111 WebUI does not support SD3 yet, but support is expected soon.

3. Choose the Correct Checkpoint:

  • For ComfyUI, you can use SD3 Medium Incl Clips or SD3 Medium Incl Clips_t5xxlfp8 checkpoints by placing them in the ComfyUI/models/checkpoints directory.
  • If you use sd3_medium.safetensors, you will need to load text encoder/CLIP weights separately.

4. Load Text Encoders Separately (if needed):

  • Place the downloaded text encoder weights in the ComfyUI/models/clip/ directory.
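
As a quick sanity check that the files landed where ComfyUI looks for them, a small throwaway script like this hypothetical helper can verify the layout; adjust the filenames to whichever checkpoint variant you downloaded.

```python
# Hypothetical sanity-check helper: confirm the SD3 files sit in the folders
# ComfyUI reads from. Adjust names to the variant you actually downloaded.
from pathlib import Path

expected = {
    "ComfyUI/models/checkpoints": ["sd3_medium_incl_clips_t5xxlfp8.safetensors"],
    "ComfyUI/models/clip": [
        "clip_l.safetensors",       # only needed with the bare sd3_medium checkpoint
        "clip_g.safetensors",
        "t5xxl_fp8_e4m3fn.safetensors",
    ],
}

for folder, names in expected.items():
    for name in names:
        status = "ok" if (Path(folder) / name).is_file() else "MISSING"
        print(f"{status:8} {folder}/{name}")
```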

5. Use Pre-Made Workflows:

  • Download and utilize pre-made ComfyUI workflows designed for easy setup:
  • Simple txt2img Workflow
  • Multi-Prompt Workflow

6. Generate Images:

  • Use the downloaded models and pre-made workflows in ComfyUI to create your text-to-image transformations.

In conclusion, while the company faces challenges, Stability AI continues to enhance its technology and expand its offerings in generative AI. Stable Diffusion 3 Medium represents a significant advancement with its combination of performance, efficiency, and versatility, ensuring its position at the forefront of the text-to-image field.