This repository contains Stable Diffusion models trained from scratch and will be continuously updated with new checkpoints. The following list provides an overview of all currently available models:

- New stable diffusion finetune (Stable unCLIP 2.1, Hugging Face) at 768x768 resolution, based on SD2.1-768. This model allows for image variations and mixing operations as described in Hierarchical Text-Conditional Image Generation with CLIP Latents and, thanks to its modularity, can be combined with other models such as KARLO. It comes in two variants, Stable unCLIP-L and Stable unCLIP-H, which are conditioned on CLIP ViT-L and ViT-H image embeddings, respectively. Instructions are available here. A public demo of SD-unCLIP is already available at /stable-diffusion-reimagine (see the usage sketch after this list).
- New stable diffusion model (Stable Diffusion 2.1-v, Hugging Face) at 768x768 resolution and (Stable Diffusion 2.1-base, HuggingFace) at 512x512 resolution, both based on the same number of parameters and architecture as 2.0 and fine-tuned from 2.0 on a less restrictive NSFW filtering of the LAION-5B dataset (a loading example follows this list). By default, the attention operation of the model is evaluated at full precision when xformers is not installed. To enable fp16 (which can cause numerical instabilities with the vanilla attention module on the v2.1 model), run your script with the environment variable `ATTN_PRECISION=fp16` set; a sketch of this switch also follows the list.
- New stable diffusion model (Stable Diffusion 2.0-v) at 768x768 resolution. Same number of parameters in the U-Net as 1.5, but it uses OpenCLIP-ViT/H as the text encoder and is trained from scratch.
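If you use the diffusers library rather than this repository's own scripts, image variations with Stable unCLIP can be sketched as follows. This is a minimal sketch, assuming the `StableUnCLIPImg2ImgPipeline` class and the `stabilityai/stable-diffusion-2-1-unclip` checkpoint id; check the Stability AI organization on Hugging Face for the exact ids of the unCLIP-L and unCLIP-H variants:

```python
import torch
from diffusers import StableUnCLIPImg2ImgPipeline
from diffusers.utils import load_image

# Load the unCLIP finetune; the checkpoint id is an assumption, see above.
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Condition the diffusion model on the CLIP image embedding of an input
# image to produce a variation of it ("reimagine").
init_image = load_image("input.png")  # hypothetical input path
variation = pipe(init_image).images[0]
variation.save("variation.png")
```

Because the model is conditioned on a CLIP image embedding rather than on pixels, the output shares the input's semantics but not its exact layout, which is what makes the variation and mixing operations possible.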
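Likewise, a minimal text-to-image sketch for the 2.1 checkpoints via diffusers (the 512x512 base model is published separately as `stabilityai/stable-diffusion-2-1-base`; verify both ids on Hugging Face):

```python
import torch
from diffusers import StableDiffusionPipeline

# 768x768 v-prediction model; swap in stable-diffusion-2-1-base for 512x512.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```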
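As for the `ATTN_PRECISION` switch: the sketch below illustrates how such an environment-variable-driven precision toggle is typically wired into the attention similarity computation. It is a simplified illustration, not the repository's exact code; the function name and tensor shapes are assumptions:

```python
import os
import torch

# Default to full precision, matching the behavior when xformers is absent.
_ATTN_PRECISION = os.environ.get("ATTN_PRECISION", "fp32")

def attention_scores(q: torch.Tensor, k: torch.Tensor, scale: float) -> torch.Tensor:
    """Scaled dot-product similarity with a precision switch (illustrative).

    q, k: (batch, tokens, head_dim) query and key tensors.
    """
    if _ATTN_PRECISION == "fp32":
        # Upcast to fp32 for the matmul: vanilla attention can overflow in
        # fp16, which is the numerical instability warned about above.
        with torch.autocast(device_type="cuda", enabled=False):
            return torch.einsum("b i d, b j d -> b i j", q.float(), k.float()) * scale
    # ATTN_PRECISION=fp16: keep half-precision inputs as-is (faster and
    # lighter on memory, but numerically riskier).
    return torch.einsum("b i d, b j d -> b i j", q, k) * scale
```

Launched as `ATTN_PRECISION=fp16 python your_script.py` (hypothetical script name), the variable is read once at import time and the half-precision branch is taken.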