
SD1.5 and SD2.1 not suitable for DDIM inverse #34

Open

shouwangzhe200 opened this issue Oct 8, 2023 · 6 comments

@shouwangzhe200
I tried the code in playground_real.ipynb on SD1.4, SD1.5, and SD2.1, and found that DDIM inversion only reconstructs images consistent with the original for SD1.4. For SD1.5, the reconstructions deviate significantly from the original, and for SD2.1 they collapse completely.

@ljzycmd
Collaborator

ljzycmd commented Oct 8, 2023

Hi @shouwangzhe200, when DDIM inversion fails, you can increase the success rate in two ways:

  1. Increase the number of inversion steps. In our script, the default is only 50 steps, so using more steps (e.g., 500) can produce satisfying results. This may alleviate the poor reconstruction quality with the SD1.5 model.
  2. As for the collapse with SD2.1, be careful about the prediction_type of the U-Net: the SD2.1 checkpoint in the Hugging Face repo here is trained with velocity prediction (prediction_type="v_prediction") rather than epsilon prediction, while the implementation in our repo relies on epsilon prediction. Please make sure you use their released model here trained with prediction_type="epsilon" (see the sketch after this list).
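A minimal sketch (not part of our script) of how you might check both points with diffusers before running the notebook; the model ID and the 500-step value here are illustrative:

```python
from diffusers import StableDiffusionPipeline

# Point 2: check the scheduler's prediction type before inverting.
# To my knowledge, "stabilityai/stable-diffusion-2-1" (768px) was trained with
# v-prediction, while "stabilityai/stable-diffusion-2-1-base" uses epsilon.
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base")
print(pipe.scheduler.config.prediction_type)  # want "epsilon", not "v_prediction"

# Point 1: pass a larger step count (e.g., 500 instead of the default 50) to
# whatever inversion routine you use; the exact argument name depends on the script.
NUM_INVERSION_STEPS = 500
```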

In addition, even with the above suggestions, DDIM inversion may still reconstruct the image poorly in some cases. Follow-up works such as Null-text Inversion can further alleviate this; you can refer to #30 (comment) for more details.

Hope this can help you.

@shouwangzhe200
Author

Thank you for the very detailed explanation; it has been very helpful. Another question: why does SD1.4 need only 50 inversion steps, while SD1.5 needs 500?

@ljzycmd
Collaborator

ljzycmd commented Oct 9, 2023

Hi @shouwangzhe200, in my view that is not strictly true; the phenomenon is case-specific and uneven. Some images can be reconstructed well with only 50 inversion steps on SD1.5, while many images still cannot be inverted successfully with 500 steps on SD1.4. In general, more denoising steps help obtain higher reconstruction quality.

@LiuShiyu95

@ljzycmd Thank you for your great work! May I ask whether inversion requires a model trained with epsilon prediction, or whether inversion can also be used with a model trained with v_prediction?

@ljzycmd
Collaborator

ljzycmd commented Dec 26, 2023

Hi @LiuShiyu95, the prediction type used during inversion should match the prediction type the model was trained with. You can use the DDIMInverseScheduler for inversion: https://huggingface.co/docs/diffusers/api/schedulers/ddim_inverse.
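For reference, a minimal, self-contained sketch of DDIM inversion and reconstruction with diffusers' DDIMInverseScheduler, assuming an epsilon-prediction checkpoint; the model ID, image path, and step count are placeholders:

```python
import numpy as np
import torch
from diffusers import DDIMInverseScheduler, DDIMScheduler, StableDiffusionPipeline
from diffusers.utils import load_image

device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
inv_scheduler = DDIMInverseScheduler.from_config(pipe.scheduler.config)

# Encode the source image into VAE latents (pixel values scaled to [-1, 1]).
image = load_image("source.png").resize((512, 512))
x = torch.from_numpy(np.array(image)).float().permute(2, 0, 1)[None] / 127.5 - 1.0
with torch.no_grad():
    latents = pipe.vae.encode(x.to(device)).latent_dist.sample()
latents = latents * pipe.vae.config.scaling_factor

# Embed a (here empty) prompt for the U-Net's text conditioning.
ids = pipe.tokenizer([""], padding="max_length",
                     max_length=pipe.tokenizer.model_max_length,
                     return_tensors="pt").input_ids.to(device)
with torch.no_grad():
    prompt_embeds = pipe.text_encoder(ids)[0]

# Inversion: the inverse scheduler's timesteps run from clean toward noisy.
inv_scheduler.set_timesteps(50, device=device)
with torch.no_grad():
    for t in inv_scheduler.timesteps:
        noise_pred = pipe.unet(latents, t, encoder_hidden_states=prompt_embeds).sample
        latents = inv_scheduler.step(noise_pred, t, latents).prev_sample

# Reconstruction: denoise the inverted latents with the forward DDIM scheduler.
# guidance_scale=1.0 disables classifier-free guidance, which otherwise hurts
# reconstruction fidelity.
recon = pipe(prompt="", latents=latents, num_inference_steps=50,
             guidance_scale=1.0).images[0]
```

Note that DDIMInverseScheduler inherits prediction_type from the scheduler config it is built from, so the same loop should also work for a v-prediction checkpoint as long as the config matches how the model was trained.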

Hope the above can help you.

@LiuShiyu95

I understand, thanks very much!
