Closed
Description
I've noticed a potential inconsistency in how the VAE-encoded control_image is processed between the training script for ControlNet with Stable Diffusion 3 and the corresponding inference pipeline.
In the inference pipeline (pipeline_stable_diffusion_3_controlnet.py):
The control_image latent is processed by both subtracting the vae_shift_factor and multiplying by the scaling_factor.
diffusers/src/diffusers/pipelines/controlnet_sd3/pipeline_stable_diffusion_3_controlnet.py
Line 1074 in 425a715
control_image = (control_image - vae_shift_factor) * self.vae.config.scaling_factor
However, in the provided training example, the VAE-encoded controlnet_image is only multiplied by the scaling_factor, without subtracting the shift_factor.
controlnet_image = controlnet_image * vae.config.scaling_factor
To match the inference pipeline, this line should presumably be:
controlnet_image = (controlnet_image - vae.config.shift_factor) * vae.config.scaling_factor
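To give a rough sense of the mismatch, here is a minimal sketch comparing the two normalizations on a toy latent. It uses plain Python lists instead of tensors, the function names are hypothetical, and the two constants are illustrative assumptions; the real values should be read from vae.config.

```python
# Illustrative constants only (assumed); read the real ones from vae.config.
SCALING_FACTOR = 1.5305
SHIFT_FACTOR = 0.0609


def normalize_like_inference(latent, shift=SHIFT_FACTOR, scale=SCALING_FACTOR):
    """Shift, then scale, as the inference pipeline does."""
    return [(x - shift) * scale for x in latent]


def normalize_like_training(latent, scale=SCALING_FACTOR):
    """Scale only, as the training example currently does."""
    return [x * scale for x in latent]


# Toy stand-in for a VAE-encoded control image.
latent = [-1.0, 0.0, 0.5, 2.0]

train_out = normalize_like_training(latent)
infer_out = normalize_like_inference(latent)

# Element-wise gap between what the ControlNet sees in training vs. inference.
offsets = [t - i for t, i in zip(train_out, infer_out)]
print(offsets)
```

Every element differs by the same constant, shift_factor * scaling_factor, so the ControlNet would be trained on latents that are uniformly offset from the ones it receives at inference.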