/AIGC/ Design Your Pokémon! (Stable Diffusion)

This project develops a text-to-Pokémon model by fine-tuning Stable Diffusion (SD). SD is pre-trained on a large, diverse dataset; fine-tuning continues training that pre-trained model on a smaller custom dataset with a specific style, in this case Justin Pinkney's captioned Pokémon dataset of 800+ images.

Reference: Huggingface/Train/Text2image
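When training from a local folder, the referenced Hugging Face script expects the folder to follow the `imagefolder` layout, with a `metadata.jsonl` file that supplies one caption per image. The sketch below shows that layout under stated assumptions: the default folder name, the image file names, and the captions are all hypothetical placeholders, and `text` is assumed to be the caption column (the script's default).

```shell
# Hypothetical location; point TRAIN_DIR at your own dataset folder.
TRAIN_DIR="${TRAIN_DIR:-./pokemon_dataset}"
mkdir -p "$TRAIN_DIR"

# One JSON object per line: "file_name" is relative to TRAIN_DIR,
# "text" holds the caption. File names and captions are made up here.
cat > "$TRAIN_DIR/metadata.jsonl" <<'EOF'
{"file_name": "0001.png", "text": "a green cartoon creature with red eyes"}
{"file_name": "0002.png", "text": "a round blue cartoon creature with a smile"}
EOF

# The image files named above must sit in the same folder as metadata.jsonl.
ls "$TRAIN_DIR"
```

Each image referenced in `metadata.jsonl` has to exist in the folder before training starts; the sketch only writes the caption file.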

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export TRAIN_DIR="path_to_your_dataset"
export OUTPUT_DIR="path_to_save_model"

accelerate launch train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$TRAIN_DIR \
  --use_ema \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --mixed_precision="fp16" \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir=${OUTPUT_DIR}
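
To make the batch settings above concrete, here is a rough back-of-the-envelope calculation of the effective batch size and the number of passes over the data. It is only a sketch: `NUM_PROCESSES=1` assumes a single GPU, and `DATASET_SIZE=800` is a lower bound taken from the "800+" figure, so the epoch count is an upper estimate.

```shell
TRAIN_BATCH_SIZE=1      # --train_batch_size
GRAD_ACCUM_STEPS=4      # --gradient_accumulation_steps
MAX_TRAIN_STEPS=15000   # --max_train_steps (counted in optimizer updates)
NUM_PROCESSES=1         # assumption: a single GPU/process
DATASET_SIZE=800        # assumption: lower bound from "800+" images

# Images contributing to each optimizer update.
EFFECTIVE_BATCH=$((TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_PROCESSES))
# Total images processed over the whole run.
IMAGES_SEEN=$((EFFECTIVE_BATCH * MAX_TRAIN_STEPS))
# Integer estimate of passes over the dataset.
APPROX_EPOCHS=$((IMAGES_SEEN / DATASET_SIZE))

echo "effective batch size: $EFFECTIVE_BATCH"   # 4
echo "images seen: $IMAGES_SEEN"                # 60000
echo "approx. epochs: $APPROX_EPOCHS"           # 75
```

Gradient accumulation keeps per-step memory at a batch of 1 while each weight update still averages over 4 images, which, together with gradient checkpointing and fp16, suits a GPU with limited memory.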

Training and Testing Samples

Generated Images Along Training

If we compare images generated at 500 iterations with those at 20000 iterations, the earlier ones show flaws such as irregular faces and twisted eyes, while the style and quality of the later ones are much closer to the ground-truth Pokémon images in the first row. Image quality clearly improves as training progresses.