Commit f33b89b

Authored by yiyixuxu, hlky, asomoza, a-r-r-o-w, and DN6
The Modular Diffusers (#9672)
adding modular diffusers as experimental feature

Co-authored-by: hlky
Co-authored-by: Álvaro Somoza
Co-authored-by: Aryan
Co-authored-by: Dhruv Nair
Co-authored-by: Sayak Paul
1 parent 48a6d29 commit f33b89b

52 files changed

+16691
-42
lines changed

docs/source/en/_toctree.yml

Lines changed: 20 additions & 0 deletions
@@ -93,6 +93,26 @@
9393
- local: hybrid_inference/api_reference
9494
title: API Reference
9595
title: Hybrid Inference
96+
- sections:
97+
- local: modular_diffusers/overview
98+
title: Overview
99+
- local: modular_diffusers/modular_pipeline
100+
title: Modular Pipeline
101+
- local: modular_diffusers/components_manager
102+
title: Components Manager
103+
- local: modular_diffusers/modular_diffusers_states
104+
title: Modular Diffusers States
105+
- local: modular_diffusers/pipeline_block
106+
title: Pipeline Block
107+
- local: modular_diffusers/sequential_pipeline_blocks
108+
title: Sequential Pipeline Blocks
109+
- local: modular_diffusers/loop_sequential_pipeline_blocks
110+
title: Loop Sequential Pipeline Blocks
111+
- local: modular_diffusers/auto_pipeline_blocks
112+
title: Auto Pipeline Blocks
113+
- local: modular_diffusers/end_to_end_guide
114+
title: End-to-End Example
115+
title: Modular Diffusers
96116
- sections:
97117
- local: using-diffusers/consisid
98118
title: ConsisID
Lines changed: 316 additions & 0 deletions
@@ -0,0 +1,316 @@
1+
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License.
11+
-->
12+
13+
# AutoPipelineBlocks
14+
15+
<Tip warning={true}>
16+
17+
🧪 **Experimental Feature**: Modular Diffusers is an experimental feature we are actively developing. The API may be subject to breaking changes.
18+
19+
</Tip>
20+
21+
`AutoPipelineBlocks` is a subclass of `ModularPipelineBlocks`. It is a multi-block that automatically selects which sub-block to run based on the inputs provided at runtime, creating conditional workflows that adapt to different scenarios. Its main purpose is convenience and portability: developers can package several workflows into a single block, which makes them easier to share and use.
22+
23+
In this tutorial, we show how to create an `AutoPipelineBlocks` and explain how the conditional selection works.
24+
25+
<Tip>
26+
27+
Other types of multi-blocks include [SequentialPipelineBlocks](sequential_pipeline_blocks.md) (for linear workflows) and [LoopSequentialPipelineBlocks](loop_sequential_pipeline_blocks.md) (for iterative workflows). For information on creating individual blocks, see the [PipelineBlock guide](pipeline_block.md).
28+
29+
Additionally, like all `ModularPipelineBlocks`, `AutoPipelineBlocks` are definitions/specifications, not runnable pipelines. You need to convert them into a `ModularPipeline` to actually execute them. For information on creating and running pipelines, see the [Modular Pipeline guide](modular_pipeline.md).
30+
31+
</Tip>
32+
33+
For example, you might want to support text-to-image and image-to-image tasks. Instead of creating two separate pipelines, you can create an `AutoPipelineBlocks` that automatically chooses the workflow based on whether an `image` input is provided.
34+
35+
Let's see an example. We'll use the helper function from the [PipelineBlock guide](./pipeline_block.md) to create our blocks:
36+
37+
**Helper Function**
38+
39+
```py
40+
from diffusers.modular_pipelines import PipelineBlock, InputParam, OutputParam
41+
import torch
42+
43+
def make_block(inputs=[], intermediate_inputs=[], intermediate_outputs=[], block_fn=None, description=None):
44+
class TestBlock(PipelineBlock):
45+
model_name = "test"
46+
47+
@property
48+
def inputs(self):
49+
return inputs
50+
51+
@property
52+
def intermediate_inputs(self):
53+
return intermediate_inputs
54+
55+
@property
56+
def intermediate_outputs(self):
57+
return intermediate_outputs
58+
59+
@property
60+
def description(self):
61+
return description if description is not None else ""
62+
63+
def __call__(self, components, state):
64+
block_state = self.get_block_state(state)
65+
if block_fn is not None:
66+
block_state = block_fn(block_state, state)
67+
self.set_block_state(state, block_state)
68+
return components, state
69+
70+
return TestBlock
71+
```
72+
73+
Now let's create a dummy `AutoPipelineBlocks` that includes dummy text-to-image, image-to-image, and inpaint workflows.
74+
75+
76+
```py
77+
from diffusers.modular_pipelines import AutoPipelineBlocks
78+
79+
# These are dummy blocks and we only focus on "inputs" for our purpose
80+
inputs = [InputParam(name="prompt")]
81+
# block_fn prints out which workflow is running so we can see the execution order at runtime
82+
block_fn = lambda x, y: print("running the text-to-image workflow")
83+
block_t2i_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a text-to-image workflow!")
84+
85+
inputs = [InputParam(name="prompt"), InputParam(name="image")]
86+
block_fn = lambda x, y: print("running the image-to-image workflow")
87+
block_i2i_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a image-to-image workflow!")
88+
89+
inputs = [InputParam(name="prompt"), InputParam(name="image"), InputParam(name="mask")]
90+
block_fn = lambda x, y: print("running the inpaint workflow")
91+
block_inpaint_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a inpaint workflow!")
92+
93+
class AutoImageBlocks(AutoPipelineBlocks):
94+
# List of sub-block classes to choose from
95+
block_classes = [block_inpaint_cls, block_i2i_cls, block_t2i_cls]
96+
# Names for each block in the same order
97+
block_names = ["inpaint", "img2img", "text2img"]
98+
# Trigger inputs that determine which block to run
99+
# - "mask" triggers inpaint workflow
100+
# - "image" triggers img2img workflow (but only if mask is not provided)
101+
# - if none of above, runs the text2img workflow (default)
102+
block_trigger_inputs = ["mask", "image", None]
103+
# Description is extremely important for AutoPipelineBlocks
104+
@property
105+
def description(self):
106+
return (
107+
"Pipeline generates images given different types of conditions!\n"
108+
+ "This is an auto pipeline block that works for text2img, img2img and inpainting tasks.\n"
109+
+ " - inpaint workflow is run when `mask` is provided.\n"
110+
+ " - img2img workflow is run when `image` is provided (but only when `mask` is not provided).\n"
111+
+ " - text2img workflow is run when neither `image` nor `mask` is provided.\n"
112+
)
113+
114+
# Create the blocks
115+
auto_blocks = AutoImageBlocks()
116+
# convert to pipeline
117+
auto_pipeline = auto_blocks.init_pipeline()
118+
```
119+
120+
We have now created an `AutoPipelineBlocks` that contains 3 sub-blocks. Notice the warning message at the top of the printed representation of `auto_blocks` below: it appears automatically in every `ModularPipelineBlocks` that contains `AutoPipelineBlocks`, to remind end users that block selection happens dynamically at runtime.
121+
122+
```py
123+
AutoImageBlocks(
124+
Class: AutoPipelineBlocks
125+
126+
====================================================================================================
127+
This pipeline contains blocks that are selected at runtime based on inputs.
128+
Trigger Inputs: ['mask', 'image']
129+
====================================================================================================
130+
131+
132+
Description: Pipeline generates images given different types of conditions!
133+
This is an auto pipeline block that works for text2img, img2img and inpainting tasks.
134+
- inpaint workflow is run when `mask` is provided.
135+
- img2img workflow is run when `image` is provided (but only when `mask` is not provided).
136+
- text2img workflow is run when neither `image` nor `mask` is provided.
137+
138+
139+
140+
Sub-Blocks:
141+
• inpaint [trigger: mask] (TestBlock)
142+
Description: I'm a inpaint workflow!
143+
144+
• img2img [trigger: image] (TestBlock)
145+
Description: I'm a image-to-image workflow!
146+
147+
• text2img [default] (TestBlock)
148+
Description: I'm a text-to-image workflow!
149+
150+
)
151+
```
152+
153+
Check out the documentation with `print(auto_pipeline.doc)`:
154+
155+
```py
156+
>>> print(auto_pipeline.doc)
157+
class AutoImageBlocks
158+
159+
Pipeline generates images given different types of conditions!
160+
This is an auto pipeline block that works for text2img, img2img and inpainting tasks.
161+
- inpaint workflow is run when `mask` is provided.
162+
- img2img workflow is run when `image` is provided (but only when `mask` is not provided).
163+
- text2img workflow is run when neither `image` nor `mask` is provided.
164+
165+
Inputs:
166+
167+
prompt (`None`, *optional*):
168+
169+
image (`None`, *optional*):
170+
171+
mask (`None`, *optional*):
172+
```
173+
174+
There is a fundamental trade-off with `AutoPipelineBlocks`: it trades clarity for convenience. It makes packaging multiple workflows easy, but it can become confusing without proper documentation. For example, suppose we hand you a pipeline, tell you only that it contains 3 sub-blocks and takes 3 inputs (`prompt`, `image`, and `mask`), and ask you to run an image-to-image workflow. Without prior knowledge of how these pipelines work, you would have little to go on.
175+
176+
The pipeline we just made, though, has a docstring that lists all available inputs and workflows and explains which inputs select each one. For example, it is clear that you need to pass `image` to run img2img. This is why the description field is critical for `AutoPipelineBlocks`. We highly recommend explaining the conditional logic clearly in every `AutoPipelineBlocks` you create. We also recommend always testing the individual pipelines before packaging them into an `AutoPipelineBlocks`.
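
One way to keep such a description in sync with the trigger list is to generate the bullet lines from it. The sketch below is only an illustration: `describe_triggers` is a hypothetical helper, not part of diffusers, and it assumes triggers are checked in list order with the `None` (default) entry listed last.

```python
from typing import Optional, Sequence


def describe_triggers(
    block_names: Sequence[str],
    block_trigger_inputs: Sequence[Optional[str]],
) -> str:
    """Build one description line per workflow from the trigger list.

    Hypothetical helper for illustration; it is not part of diffusers.
    """
    lines = []
    earlier = []  # triggers that take priority over the current one
    for name, trigger in zip(block_names, block_trigger_inputs):
        if trigger is None:
            # Default block: runs only when no earlier trigger was provided.
            listed = ", ".join(f"`{t}`" for t in earlier)
            when = f"when none of {listed} is provided" if earlier else "unconditionally"
        else:
            when = f"when `{trigger}` is provided"
            if earlier:
                blockers = " or ".join(f"`{t}`" for t in earlier)
                when += f" (but only when {blockers} is not provided)"
            earlier.append(trigger)
        lines.append(f" - {name} workflow is run {when}.")
    return "\n".join(lines)


print(describe_triggers(["inpaint", "img2img", "text2img"], ["mask", "image", None]))
#  - inpaint workflow is run when `mask` is provided.
#  - img2img workflow is run when `image` is provided (but only when `mask` is not provided).
#  - text2img workflow is run when none of `mask`, `image` is provided.
```

The returned string could be appended to a hand-written summary inside the `description` property, so that the trigger list and its explanation cannot drift apart.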
177+
178+
Let's run this auto pipeline with different inputs to check that the conditional logic works as described. Recall that each block's `block_fn` prints its workflow name when the block is called, so it is easy to tell which one is running:
179+
180+
```py
181+
>>> _ = auto_pipeline(image="image", mask="mask")
182+
running the inpaint workflow
183+
>>> _ = auto_pipeline(image="image")
184+
running the image-to-image workflow
185+
>>> _ = auto_pipeline(prompt="prompt")
186+
running the text-to-image workflow
187+
>>> _ = auto_pipeline(prompt="prompt", mask="mask")
188+
running the inpaint workflow
189+
```
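
The behavior observed above can be approximated in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the diffusers implementation: `select_block` is a hypothetical function, triggers are assumed to be checked in list order, and an input counts as "provided" when its value is not `None`.

```python
from typing import Optional, Sequence


def select_block(
    block_names: Sequence[str],
    block_trigger_inputs: Sequence[Optional[str]],
    **inputs,
) -> Optional[str]:
    """Return the name of the first block whose trigger input was provided.

    Hypothetical stand-in for the selection step; not the diffusers implementation.
    """
    default = None
    for name, trigger in zip(block_names, block_trigger_inputs):
        if trigger is None:
            # A `None` trigger marks the default block.
            default = name
        elif inputs.get(trigger) is not None:
            # Triggers are checked in list order, so earlier entries win.
            return name
    # No trigger matched: fall back to the default, or return None (skip) if there is none.
    return default


names = ["inpaint", "img2img", "text2img"]
triggers = ["mask", "image", None]

print(select_block(names, triggers, image="image", mask="mask"))  # inpaint
print(select_block(names, triggers, image="image"))               # img2img
print(select_block(names, triggers, prompt="prompt"))             # text2img
# Without a default entry, nothing is selected when no trigger is provided.
print(select_block(names[:2], triggers[:2], prompt="prompt"))     # None
```

The order of `block_trigger_inputs` acts as a priority list in this model, which is why `mask` takes precedence over `image` when both are passed.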
190+
191+
However, even with documentation, it can become very confusing when AutoPipelineBlocks are combined with other blocks. The complexity grows quickly when you have nested AutoPipelineBlocks or use them as sub-blocks in larger pipelines.
192+
193+
Let's make another `AutoPipelineBlocks`. This one contains only one block and does not include `None` in its `block_trigger_inputs` (a `None` entry marks the default block, which runs when none of the trigger inputs are provided). As a result, this block is skipped if its trigger input (`ip_adapter_image`) is not provided at runtime.
194+
195+
```py
196+
from diffusers.modular_pipelines import SequentialPipelineBlocks, InsertableDict
197+
inputs = [InputParam(name="ip_adapter_image")]
198+
block_fn = lambda x, y: print("running the ip-adapter workflow")
199+
block_ipa_cls = make_block(inputs=inputs, block_fn=block_fn, description="I'm a IP-adapter workflow!")
200+
201+
class AutoIPAdapter(AutoPipelineBlocks):
202+
block_classes = [block_ipa_cls]
203+
block_names = ["ip-adapter"]
204+
block_trigger_inputs = ["ip_adapter_image"]
205+
@property
206+
def description(self):
207+
return "Run IP Adapter step if `ip_adapter_image` is provided."
208+
```
209+
210+
Now let's combine these 2 auto blocks together into a `SequentialPipelineBlocks`:
211+
212+
```py
213+
auto_ipa_blocks = AutoIPAdapter()
214+
blocks_dict = InsertableDict()
215+
blocks_dict["ip-adapter"] = auto_ipa_blocks
216+
blocks_dict["image-generation"] = auto_blocks
217+
all_blocks = SequentialPipelineBlocks.from_blocks_dict(blocks_dict)
218+
pipeline = all_blocks.init_pipeline()
219+
```
220+
221+
Let's take a look. Things now get more confusing. In this particular example, you could still explain the conditional logic in the `description` field: there are only 6 possible execution paths (the IP-Adapter step either runs or is skipped, followed by one of the 3 image-generation workflows), so it is doable. However, because this is a `SequentialPipelineBlocks` that could contain many more blocks, the complexity quickly gets out of hand as the number of blocks increases.
222+
223+
```py
224+
>>> all_blocks
225+
SequentialPipelineBlocks(
226+
Class: ModularPipelineBlocks
227+
228+
====================================================================================================
229+
This pipeline contains blocks that are selected at runtime based on inputs.
230+
Trigger Inputs: ['image', 'mask', 'ip_adapter_image']
231+
Use `get_execution_blocks()` with input names to see selected blocks (e.g. `get_execution_blocks('image')`).
232+
====================================================================================================
233+
234+
235+
Description:
236+
237+
238+
Sub-Blocks:
239+
[0] ip-adapter (AutoIPAdapter)
240+
Description: Run IP Adapter step if `ip_adapter_image` is provided.
241+
242+
243+
[1] image-generation (AutoImageBlocks)
244+
Description: Pipeline generates images given different types of conditions!
245+
This is an auto pipeline block that works for text2img, img2img and inpainting tasks.
246+
- inpaint workflow is run when `mask` is provided.
247+
- img2img workflow is run when `image` is provided (but only when `mask` is not provided).
248+
- text2img workflow is run when neither `image` nor `mask` is provided.
249+
250+
251+
)
252+
253+
```
254+
255+
This is where the `get_execution_blocks()` method comes in handy: it extracts a `SequentialPipelineBlocks` containing only the blocks that would actually run for the given inputs.
256+
257+
Let's try some examples:
258+
259+
With `mask` only, we expect the pipeline to skip the ip-adapter block, since `ip_adapter_image` is not provided, and then run the inpaint workflow in the second block.
260+
261+
```py
262+
>>> all_blocks.get_execution_blocks('mask')
263+
SequentialPipelineBlocks(
264+
Class: ModularPipelineBlocks
265+
266+
Description:
267+
268+
269+
Sub-Blocks:
270+
[0] image-generation (TestBlock)
271+
Description: I'm a inpaint workflow!
272+
273+
)
274+
```
275+
276+
Let's also actually run the pipeline to confirm:
277+
278+
```py
279+
>>> _ = pipeline(mask="mask")
280+
skipping auto block: AutoIPAdapter
281+
running the inpaint workflow
282+
```
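
To reason about such compositions without loading a pipeline, the path resolution can be modeled in plain Python. This is a rough sketch, not how `get_execution_blocks()` is implemented: `resolve_path` and the `stages` dictionary are hypothetical, and each auto block is reduced to its list of names and triggers.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Each auto block is modeled as (block_names, block_trigger_inputs).
AutoSpec = Tuple[Sequence[str], Sequence[Optional[str]]]


def resolve_path(stages: Dict[str, AutoSpec], *provided: str) -> List[Tuple[str, str]]:
    """Return (stage name, selected sub-block) pairs for the given input names.

    Hypothetical model of the selection; not the diffusers implementation.
    """
    path = []
    for stage, (names, triggers) in stages.items():
        selected = None
        for name, trigger in zip(names, triggers):
            if trigger is None:
                selected = name  # default block, used if no trigger matches
            elif trigger in provided:
                selected = name  # first matching trigger wins
                break
        if selected is not None:
            path.append((stage, selected))  # stages with no selection are skipped
    return path


stages = {
    "ip-adapter": (["ip-adapter"], ["ip_adapter_image"]),
    "image-generation": (["inpaint", "img2img", "text2img"], ["mask", "image", None]),
}

print(resolve_path(stages, "mask"))
# [('image-generation', 'inpaint')]
print(resolve_path(stages, "mask", "ip_adapter_image"))
# [('ip-adapter', 'ip-adapter'), ('image-generation', 'inpaint')]
```

A model like this is only a planning aid. The real `get_execution_blocks()` output and an actual pipeline run remain the source of truth.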
283+
284+
Try a few more:
285+
286+
```py
287+
print(f"inputs: ip_adapter_image:")
288+
blocks_select = all_blocks.get_execution_blocks('ip_adapter_image')
289+
print(f"expected_execution_blocks: {blocks_select}")
290+
print(f"actual execution blocks:")
291+
_ = pipeline(ip_adapter_image="ip_adapter_image", prompt="prompt")
292+
# expect to see ip-adapter + text2img
293+
294+
print(f"inputs: image:")
295+
blocks_select = all_blocks.get_execution_blocks('image')
296+
print(f"expected_execution_blocks: {blocks_select}")
297+
print(f"actual execution blocks:")
298+
_ = pipeline(image="image", prompt="prompt")
299+
# expect to see img2img
300+
301+
print(f"inputs: prompt:")
302+
blocks_select = all_blocks.get_execution_blocks('prompt')
303+
print(f"expected_execution_blocks: {blocks_select}")
304+
print(f"actual execution blocks:")
305+
_ = pipeline(prompt="prompt")
306+
# expect to see text2img (prompt is not a trigger input so fallback to default)
307+
308+
print(f"inputs: mask + ip_adapter_image:")
309+
blocks_select = all_blocks.get_execution_blocks('mask','ip_adapter_image')
310+
print(f"expected_execution_blocks: {blocks_select}")
311+
print(f"actual execution blocks:")
312+
_ = pipeline(mask="mask", ip_adapter_image="ip_adapter_image")
313+
# expect to see ip-adapter + inpaint
314+
```
315+
316+
In summary, `AutoPipelineBlocks` is a good tool for packaging multiple workflows into a single, convenient interface and it can greatly simplify the user experience. However, always provide clear descriptions explaining the conditional logic, test individual pipelines first before combining them, and use `get_execution_blocks()` to understand runtime behavior in complex compositions.
