

Wan 2.2, FLUX & Qwen Image Upgraded: Ultimate Tutorial for Open Source SOTA Image & Video Gen Models



Wan 2.2, Qwen Image, FLUX, and FLUX Krea are the current SOTA open-source models, and in this master tutorial I will show you how to use them in the easiest, most performant, and most accurate way. After almost a week of research, I have determined the very best presets and prepared this tutorial. With literally one click you will be able to install, download models, set presets, and use these amazing models. Wan 2.2 is currently the king of video generation models, and it is now super fast with the lightx2v Wan2.2-Lightning LoRAs. Moreover, Qwen Image is now ultra-fast with the recently released 8-step LoRA, with almost no quality loss. Furthermore, I have updated the FLUX and FLUX Krea presets to improve image generation quality. Finally, I have trained FLUX Krea with our existing DreamBooth and LoRA training workflows and analyzed and shared the results in this tutorial. As additional information, I preview the upcoming Qwen Image editing/inpainting model and the Qwen Image training application I am developing.

▶️ SwarmUI Installers, Presets and Model Downloader App : 🔗 https://www.patreon.com/posts/114517862

▶️ ComfyUI Backend Installer : 🔗 https://www.patreon.com/posts/105023709

▶️ FLUX / FLUX Krea DreamBooth Training : 🔗 https://www.patreon.com/posts/112099700

▶️ FLUX / FLUX Krea LoRA Training : 🔗 https://www.patreon.com/posts/110879657

▶️ Main SwarmUI Installation Tutorial : 🔗 https://youtu.be/fTzlQ0tjxj0

▶️ RunPod SwarmUI Installation Tutorial : 🔗 https://youtu.be/R02kPf9Y3_w

▶️ Massed Compute SwarmUI Installation Tutorial (starting 00:21:32) : 🔗 https://youtu.be/8cMIwS9qo4M

Video Chapters

00:00:00 Introduction to New State-of-the-Art AI Models

00:00:43 Wan 2.2 vs Wan 2.1 Image-to-Video Comparison

00:01:43 Huge Improvement with New Wan 2.2 Text-to-Video Presets

00:02:44 More Examples of New Wan 2.2 Presets (Text & Image-to-Video)

00:03:08 Using RIFE for Smooth Frame Interpolation (2x FPS)

00:04:30 Image Generation: Wan 2.2 Realism vs FLUX Dev & Krea Dev

00:05:08 Introducing Ultra-Fast Qwen Image 8-Step Preset

00:05:44 Coming Soon: Qwen Image Editing Capabilities Preview

00:06:10 Comparing Qwen Image Presets (High Quality, Fast & Realism)

00:07:14 Behind the Scenes: The Extensive Testing Process for Presets

00:07:42 FLUX Krea Dev Training Experiments (DreamBooth & LoRA)

00:08:21 Updates: Qwen Training App, ComfyUI & SwarmUI Installers

00:08:59 How to Update SwarmUI and ComfyUI Installations

00:10:05 Importing New Presets into SwarmUI

00:10:51 Easiest Way: Using the Automatic Preset Import Script

00:12:22 Using the Model Downloader for Required AI Models

00:13:22 Configuring Downloader for ComfyUI & Forge WebUI

00:15:31 Demo: Generating a Wan 2.2 Image-to-Video (8-Steps)

00:17:02 Using Google AI Studio for High-Quality Prompt Generation

00:18:19 Starting the Generation & Multi-GPU Trick

00:19:46 Advanced Video Options: Frames, FPS, and RIFE Settings

00:21:11 Demo: Generating a Wan 2.2 Text-to-Video (8-Steps)

00:22:27 Live Result: Image-to-Video Generation Finished

00:23:34 Demo: Ultra-Fast Image Generation with Qwen (8-Steps)

00:24:50 Live Result: Text-to-Video Generation Finished (Amazing Quality)

00:25:21 Generation Speed Analysis & Downloading Your Video

00:26:38 Comparing FLUX Krea Dev & Qwen Realism Presets

00:28:46 How to Upscale Images to 2x High Resolution

00:29:47 Summary of New Presets and Recommendations

00:30:40 In-Depth: Training on FLUX Krea Dev (LoRA & DreamBooth)

00:33:48 Coming Soon: One-Click Qwen Image Training Application

00:36:11 Join The Community (Discord & Reddit) & Final Words

Advancements in AI Image and Video Generation in 2025

The year 2025 has marked a pivotal era for AI-driven content creation, with models pushing boundaries in realism, speed, and versatility. From text-to-video (T2V) to image editing, innovations like Mixture-of-Experts (MoE) architectures and enhanced prompt adherence are transforming industries such as film, advertising, and design.

Alibaba's Tongyi Wanxiang (Wan) 2.2 stands out as the first MoE-based video diffusion model, boasting 27 billion parameters (14B active) for cinematic T2V and image-to-video (I2V) at 720p resolution. It excels in motion dynamics, lighting control, and ultra-fast rendering, outperforming predecessors like Wan 2.1 in physics simulation and quality. Open-sourced on July 28, 2025, it's ideal for creators seeking high-fidelity outputs.

Qwen-Image, another Alibaba gem, is a 20B parameter MMDiT foundation model specializing in complex text rendering in English and Chinese, even in intricate scenes. Released in August 2025, it supports precise editing, style preservation, and multilingual prompts, surpassing benchmarks in text incorporation and aesthetics. Its open-source nature makes it a go-to for detailed image generation.

Black Forest Labs' Flux.1 [dev], a 12B parameter flow transformer, shines in text-to-image tasks with exceptional detail and commercial viability.

Some background music by NoCopyrightSounds: https://gist.github.com/FurkanGozukara/681667e5d7051b073f2e795794c46170

Video Transcription

  • 00:00:00 Greetings everyone. Today I am going to show you how to use state-of-the-art image generation and

  • 00:00:07 video generation models in the easiest, most accurate, and best-performing way. I have

  • 00:00:14 been relentlessly testing new Wan 2.2 LoRAs to  update our presets. Moreover, not only Wan 2.2  

  • 00:00:24 LoRAs, but I also tested FLUX Dev and Qwen  Image as well. And I will show all of them  

  • 00:00:32 to you in this tutorial video so that you  will see the significant differences and  

  • 00:00:38 improvements we have for each preset. For  example, here we are seeing the difference  

  • 00:00:45 between the Wan 2.1 and the new Wan 2.2 image-to-video models. As you can see, we have a

  • 00:00:53 significant improvement with image-to-video. And  this is the image used to generate those videos. 

  • 00:01:01 The difference in text-to-video is even more  significant. This is Wan 2.1 text-to-video  

  • 00:01:07 base model and let's see what it generates. So you  see, this was what we were generating with Wan 2.1  

  • 00:01:15 base model. And this is the prompt used. When we  compare it with Wan 2.1 Fusion X text-to-video,  

  • 00:01:21 this is the result. As you can see, the result is  also not good with Fusion X text-to-video. When  

  • 00:01:28 we move to the old Wan 2.2 high quality text-to-video preset, this is the result we get. This was our

  • 00:01:36 best result with Wan 2.2 text-to-video previously.  However, with the updated configuration,  

  • 00:01:43 let's see the results. So this is our Wan 2.2  high quality text-to-video 20 steps. And let's  

  • 00:01:49 see the significant difference. So you see,  there is a huge difference between this older  

  • 00:01:55 version and this newer version. By the way, if you feel that it is too fast, you can

  • 00:02:00 reduce FPS. I will explain all of that. This is  24 FPS, 121 frames video. Moreover, we have a new  

  • 00:02:09 Wan 2.2 text-to-video 8 steps, and this is just  amazing. You see, with only 8 steps, we are able  

  • 00:02:17 to generate this amazing video just from text. 24  FPS, 121 frames video takes less than 5 minutes  

  • 00:02:28 on RTX 5090. If you generate 16 FPS, 81 frames  video, it will be even faster, under 3 minutes. 

  • 00:02:36 Moreover, here is another example with Wan 2.2 high quality text-to-video 20 steps, and you see this

  • 00:02:44 is like animation-level quality with only 20 steps. I updated all the presets, and the newer presets

  • 00:02:52 are generating amazing videos. This is 24 FPS and 121 frames, and you see the quality. And here is

  • 00:03:00 another example of Wan 2.2 image-to-video with  only 8 steps. This is 16 FPS, 81 frames video,  

  • 00:03:08 and I can apply a RIFE 2x FPS increase and make it much more fluent. All I need to do is enable this

  • 00:03:16 frame interpolation and set it to 2x. And now it is regenerating it with a 2x FPS

  • 00:03:25 increase. Moreover, I have updated my Windows  installation of SwarmUI, RunPod installation,  

  • 00:03:32 and Massed Compute installation to automatically  install RIFE frame interpolation and also famous  

  • 00:03:40 TeaCache. They will be automatically installed  when you make a fresh installation of Windows  

  • 00:03:45 or RunPod or Massed Compute with the newest zip file. But you can always install them manually, like

  • 00:03:52 clicking here to install TeaCache, or when you go to the image-to-video tab, you will see that there is an

  • 00:03:58 install RIFE button, and from there you can install RIFE frame interpolation. And this is the result of a RIFE 2x

  • 00:04:05 FPS increase. This is a different seed, therefore the video is different, but you can see that it

  • 00:04:11 looks much smoother this way. Moreover, here is another example of Wan 2.2 text-to-video with 8

  • 00:04:18 steps, and you see the quality. This is generated  with only 8 steps, therefore it is really fast,  

  • 00:04:25 and you can see that it is pretty good,  pretty decent even though it is only 8 steps. 
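
For readers who want to reproduce the 2x FPS effect outside of SwarmUI's RIFE button, here is a minimal sketch. It does not use RIFE itself (RIFE is the learned interpolator that SwarmUI installs for you); instead it uses FFmpeg's built-in motion-interpolation filter, which illustrates the same idea: synthesizing in-between frames to double the frame rate. The file names are placeholders.

```python
import subprocess

def double_fps(src: str, dst: str, target_fps: int) -> None:
    """Synthesize in-between frames to raise FPS (e.g., 16 -> 32).

    An FFmpeg-based stand-in for RIFE: the minterpolate filter with
    motion-compensated interpolation (mci) invents the new frames
    instead of duplicating existing ones, so motion looks smoother.
    """
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-vf", f"minterpolate=fps={target_fps}:mi_mode=mci",
         dst],
        check=True,
    )

# Example: a 16 FPS clip becomes a 32 FPS clip of the same duration.
double_fps("wan22_clip_16fps.mp4", "wan22_clip_32fps.mp4", 32)
```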

  • 00:04:30 Furthermore, now we have a Wan 2.2 image realism preset as well. With this preset,

  • 00:04:37 you can generate really, really good realistic  images like this. These are raw images. It is  

not very good at stylized images like this. And when we compare the same prompt with FLUX Dev,

  • 00:04:49 this is the FLUX Dev result for realistic prompt,  and this is FLUX Dev result for stylized prompt.  

  • 00:04:56 And let's compare with FLUX Krea Dev. This is  FLUX Krea Dev. FLUX Krea Dev is also amazing at  

  • 00:05:02 realism, especially with humans. Its stylized  prompt is like this as you are seeing right  

  • 00:05:08 now. It is also pretty decent. And now we have  Qwen Image 8 steps. This is really, really fast,  

  • 00:05:15 like 10 times faster than before. This is Qwen 8  steps fast result for realistic prompt, and this  

  • 00:05:22 is the stylized prompt result of the Qwen. As you  are seeing, Qwen is unchallenged if your aim is  

  • 00:05:30 not realism. Hopefully, I will also make a video  for realism of Qwen soon with training it, but so  

far, these are the results and they are amazing. Qwen Image editing has just been published while

  • 00:05:44 I was editing this video. It is looking  amazing, extremely promising. Hopefully,  

  • 00:05:50 I will make a full tutorial and one-click install presets for it very soon. So it is not ready yet,

  • 00:05:57 but I am showing you what it is capable of,  what demo images they have published. Hopefully,  

  • 00:06:02 it is coming very soon. So stay subscribed  and our tutorial is continuing right now. 

  • 00:06:10 And this is the Qwen high quality preset we have. This is the result for realism prompts, and this is the

  • 00:06:16 result for the stylized prompt. As you can see, with the stylized prompt, the really fast 8 steps

  • 00:06:24 preset is almost the same quality as high quality. Therefore, we are getting like a 10 times speed gain

  • 00:06:32 with almost no quality loss, as you are seeing,  this to this. This is amazing. We also have Qwen  

  • 00:06:39 realism fast preset. You see, this is definitely more realistic than the high quality or 8 steps.

  • 00:06:47 Let me show you: this is 8 steps, this is high quality, and this is our Qwen realism preset.

  • 00:06:52 It really makes it realistic, and it is also faster than Qwen's high quality. And this is the result

  • 00:06:58 of Qwen realism. It also made the stylized prompt's output somewhat more realistic compared

  • 00:07:06 to the high quality or Qwen fast results. To arrive at all these new presets, I have

  • 00:07:14 literally done hundreds of generations, analyzed  hundreds of results. For example, let's open this  

  • 00:07:22 one randomly and let's see the result. This is the  grid test that I did, and these are the results of  

  • 00:07:28 the grid test. So I did hundreds of tests like  this. I have been doing this for several days,  

  • 00:07:34 analyzed all of them, and prepared these amazing  presets for you. Furthermore, I did a DreamBooth  

  • 00:07:42 training on FLUX Krea Dev. This was our original  post if you remember. And when you scroll down,  

  • 00:07:48 you will see that I have posted the comparison  results of the FLUX Krea Dev with each epoch  

  • 00:07:55 grid. And in this tutorial, I will also  analyze these and show you. I also did a  

LoRA training on FLUX Krea Dev as well. The post is also updated and the full grid is posted as

  • 00:08:08 well. So I will analyze the grid, compare results, and give my comments on this training.

  • 00:08:16 Another thing that I have to mention is that  I am working on Qwen Image training right now,  

  • 00:08:21 developing an application. We will talk  about this as well in this video. Moreover,  

  • 00:08:26 I have updated our ComfyUI installer as well.  Now it will automatically install FFMPEG, RIFE,  

  • 00:08:32 and TeaCache on Massed Compute, and it is made  more robust to update all of the extra nodes that  

  • 00:08:40 we automatically install. And finally, our SwarmUI  installer. This is where we will get our presets,  

our installers. This is a big update, and now we have an automatic preset import

  • 00:08:53 feature as well. So let's begin the tutorial. As usual, follow the links in the description

  • 00:08:59 of the video. Download the SwarmUI model  downloader latest version. Also download  

  • 00:09:04 the ComfyUI installer latest version. Move them  into your installation folder and extract all  

  • 00:09:12 files and overwrite everything. You can use any  extraction method. Everything is extracted. Let's  

  • 00:09:18 sort by name. Then update your SwarmUI. You see,  Windows update SwarmUI, but before doing that,  

  • 00:09:24 I recommend you to update your ComfyUI as  usual. So put the latest zip file into your  

  • 00:09:30 ComfyUI installation, extract and overwrite  all the files, then first run Windows update  

  • 00:09:36 ComfyUI.bat file. We also improved the update  process. Now it is much more robust. Okay,  

  • 00:09:42 update has been completed. Then return back  to SwarmUI. Let's sort by name. Windows update  

  • 00:09:47 SwarmUI.bat file. Okay, run. It will update it  with maximum accuracy and robustness, and it  

will start SwarmUI as usual. So if you don't know how to install and set up ComfyUI and SwarmUI,

  • 00:10:00 this is the tutorial that you need to watch, but  if you already have them installed, you are ready. 

  • 00:10:05 And the latest version of SwarmUI will start like this. You see, these are my existing presets. Let

  • 00:10:11 me demonstrate something. I will just make this like this, edit. And how are you going

  • 00:10:15 to update to the new presets that we have? Either you can use the import preset feature: choose file,

  • 00:10:22 go back to the installation folder, amazing SwarmUI presets, overwrite. But if you have leftovers

  • 00:10:28 whose names we have changed or some other stale entries, they will stay here. Still, this is the way

  • 00:10:34 of keeping your existing presets. If you want a clean import, you need to delete every one

  • 00:10:40 of them like this and then import. I asked the author of SwarmUI to add a mass delete option,

  • 00:10:47 but it is not available yet. So I have developed a solution myself. You see,

  • 00:10:51 Windows preset delete import.bat file. This file will ask you whether to import or not. When you click yes,

  • 00:11:00 it will automatically delete your existing presets, then it will import everything. However,

  • 00:11:05 for this to work, you need to have SwarmUI running on port 7861, which is the default port.

  • 00:11:12 Currently, my SwarmUI is running on a different port because I started it with the Windows update file,

  • 00:11:18 so I will close it. Let's close this as well and let's use the Windows start SwarmUI.bat file. This

  • 00:11:24 will start it on the correct port. And you see, it has started. I will fix this issue by the time

  • 00:11:28 you are watching; I will make both of them use the same port, but keep this in mind. Then I will

  • 00:11:34 double-click Windows preset delete and import.bat  file, click yes. And you see it deleted all of my  

  • 00:11:40 existing presets and imported the new ones. Moreover, it will back up your existing presets into a presets

  • 00:11:47 backup folder that it generates automatically. And you see, these are my deleted presets. They were

  • 00:11:53 saved here. Now our presets are ready. Don't  forget to click refresh icon because it may  

  • 00:12:00 still display older presets. Also, let's sort  by name, and now all of our presets are ready. 
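
If you are curious what the delete-and-import script does conceptually, here is a minimal Python sketch under stated assumptions: back up the current presets, delete them, then import the new ones cleanly. Note that the real .bat file drives the running SwarmUI instance (hence the port 7861 requirement), while this sketch approximates the same effect with plain file operations while SwarmUI is stopped; the folder paths here are hypothetical placeholders, not the script's actual contents.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Hypothetical locations -- adjust to your installation.
SWARM_PRESETS = Path("SwarmUI/Data/Presets")   # assumed preset store
NEW_PRESETS = Path("amazing_swarmui_presets")  # presets shipped in the zip
BACKUP_ROOT = Path("presets_backup")           # mirrors the backup folder the script creates

def backup_delete_import() -> None:
    if input("Delete existing presets and import the new ones? [y/N] ").strip().lower() != "y":
        return
    # 1) Back up existing presets into a timestamped folder.
    backup_dir = BACKUP_ROOT / datetime.now().strftime("%Y%m%d_%H%M%S")
    shutil.copytree(SWARM_PRESETS, backup_dir)
    # 2) Delete the old presets so no renamed leftovers remain.
    shutil.rmtree(SWARM_PRESETS)
    # 3) Copy the new presets in for a clean import.
    shutil.copytree(NEW_PRESETS, SWARM_PRESETS)
    print(f"Old presets backed up to {backup_dir}; new presets imported.")

if __name__ == "__main__":
    backup_delete_import()
```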

  • 00:12:06 So how are you going to use these presets? These presets will automatically select model files as

  • 00:12:12 well. To be able to use them, you need to use the Windows start download models app.bat file or

  • 00:12:18 change the model names and paths yourself. This application is in active development. I am adding

  • 00:12:25 new models, adding new bundles, improving its features. Currently, it supports so many models

  • 00:12:30 that you can download. I recommend you to use  SwarmUI bundles. We have Qwen Image core bundle.  

  • 00:12:35 It shows all the models, their sizes. We have Wan  2.2 core 8 steps bundle. It shows all the models  

  • 00:12:41 it is going to download, sizes. For example, with  new Wan 2.2 presets, we are using these four new  

  • 00:12:48 LoRAs. Moreover, we have Wan 2.1 core bundle. You  see all the models. We have FLUX models bundle.  

  • 00:12:54 So you can download all these bundles, then you  will be ready to use all of them. So I recommend  

  • 00:12:59 to download them. For today's tutorial, you need  to download Qwen Image core bundle, click it.  

  • 00:13:04 You need to download Wan 2.2 8 steps core bundle.  And if you want to also test Wan 2.1, you need to  

  • 00:13:11 download this one. And if you want to also use  FLUX, you need to download this one. These are  

  • 00:13:16 the core bundles. These are state-of-the-art  image and video generation models. So just  

download them and you will be ready. Moreover, if you are a ComfyUI user or

  • 00:13:26 a Forge WebUI user, we support both of them. For ComfyUI, select this ComfyUI folder structure,

  • 00:13:33 go back to your ComfyUI installation, go inside models like this, copy this path, and give its path

  • 00:13:41 like this. So now it will download into the correct ComfyUI folders; for example, the LoRAs folder

  • 00:13:47 differs between ComfyUI and SwarmUI. Moreover, if you are a Forge WebUI user, just check this out and

  • 00:13:54 then give its path here, and it will download into there. Again, it is the same for Forge WebUI:

  • 00:13:59 you need to give this path. For example, we also have an installer for Forge WebUI. It is updated,

  • 00:14:05 it has more features, and it fully supports the RTX 5000 series. And if you want to make it

  • 00:14:12 use lowercase folder names, just check this and  when you click remember settings, it will save  

  • 00:14:18 them and when you start next time, it will use  them. Moreover, you can manually download models  

  • 00:14:24 one by one. For example, image generation models,  Qwen Image models. You see we have Qwen GGUF Q4,  

  • 00:14:30 GGUF Q5. So if you are low on VRAM, you can use  them. However, with SwarmUI and ComfyUI, it will  

  • 00:14:36 automatically do block swapping, so you don't even  need that. FLUX models, we have all of them here.  

  • 00:14:41 We have FLUX GGUF models, you see, all of them. We have HiDream models. However, I don't recommend

  • 00:14:47 HiDream anymore. Qwen is coming on so strong. It is the best model, if you ask my opinion. Stable

  • 00:14:52 Diffusion 1.5 models, Stable Diffusion XL models.  So we support so many models. We support image  

  • 00:14:57 upscaling models, YOLO masking models. We have  text encoders, UMT5 text encoders, CLIP models.  

  • 00:15:05 We have video generation models, a huge set: Wan 2.1 official models, Wan 2.1 Fusion X models, Wan 2.1

  • 00:15:13 LoRAs like this, Wan 2.2 official models. You  see, we have also GGUF models, Wan 2.2 LoRAs.  

  • 00:15:20 So check this application out, and if you need any  other models, you can message me from this video  

  • 00:15:25 or from Patreon, and hopefully I will add them.
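
As an aside on the folder-structure option above: ComfyUI and SwarmUI expect models in differently named subfolders, which is what the downloader's ComfyUI mode accounts for. Below is a simplified sketch of typical ComfyUI model subfolders; the downloader's real internal mapping may differ, so treat this as illustrative.

```python
from pathlib import Path

# Typical ComfyUI model subfolders (illustrative; the downloader's
# real mapping may differ).
COMFYUI_SUBFOLDERS = {
    "checkpoint": "checkpoints",
    "lora": "loras",
    "vae": "vae",
    "text_encoder": "clip",                  # e.g. UMT5 / CLIP encoders
    "diffusion_model": "diffusion_models",   # e.g. Wan 2.2 / Qwen UNets
    "upscaler": "upscale_models",
}

def target_path(comfy_root: str, model_kind: str, filename: str) -> Path:
    """Resolve where a downloaded model lands in a ComfyUI install."""
    return Path(comfy_root) / "models" / COMFYUI_SUBFOLDERS[model_kind] / filename

# Hypothetical file name, for illustration only.
print(target_path("ComfyUI", "lora", "wan2.2_lightning_high_noise.safetensors"))
# -> ComfyUI/models/loras/wan2.2_lightning_high_noise.safetensors
```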

  • 00:15:31 Once you have downloaded all the models and imported the new presets, how are you going to use them? This is important. For example, Wan 2.2 image-to-video

  • 00:15:37 8 steps. Let's do a demo with it. So first of all,  click Quick Tools and Reset params to default.  

  • 00:15:44 This is mandatory. Do this at every step and  you will not have any issues. Then click this  

hamburger menu and direct apply. Do not just select it; use direct apply. This works better. You see

  • 00:15:56 it did set every parameter, including the models  and everything. Then click Init Image because  

  • 00:16:03 this is an image-to-video model. Choose file. For  example, let's use this Pikachu for animation. So  

  • 00:16:10 I will select it. This is my image resolution,  and this is the base resolution of the model.  

  • 00:16:15 You can always change the base resolution and  it will automatically calculate new resolution  

  • 00:16:20 based on that. How can you change it? From models, you see this model is automatically selected,

  • 00:16:24 click here, edit metadata. Make sure that its  architecture is accurate and this is the base  

  • 00:16:30 resolution of the model that you can change  for automatic calculation, then save. Then  

  • 00:16:34 click this res and use closest aspect ratio. I  recommend this. You can also use exact aspect  

ratio to avoid cropping it, but closest aspect ratio is better. Then all I need to do is write

  • 00:16:46 my prompt here. Do not change this. This is how it  uses both of the lightning LoRAs because Wan 2.2  

  • 00:16:54 works with base and refiner model. This is how we  are setting up two LoRAs for each of the models. 
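
To make the base/refiner LoRA pairing concrete: Wan 2.2 runs a high-noise model for the early denoising steps and a low-noise model for the late steps, and the lightning speed-up LoRAs come as a matched pair, one for each model. The sketch below only illustrates that structure; the field and file names are hypothetical placeholders, not SwarmUI's actual preset schema.

```python
# Conceptual sketch of a Wan 2.2 8-step preset (hypothetical names,
# not SwarmUI's real schema). The key point: two models, two LoRAs.
wan22_i2v_8_steps = {
    "steps": 8,
    "base": {      # high-noise expert handles the early denoising steps
        "model": "wan2.2_i2v_high_noise_14B.safetensors",
        "lora": "wan2.2_lightning_high_noise.safetensors",
    },
    "refiner": {   # low-noise expert handles the late denoising steps
        "model": "wan2.2_i2v_low_noise_14B.safetensors",
        "lora": "wan2.2_lightning_low_noise.safetensors",
    },
}
# Hand-editing the LoRA section of the preset breaks this pairing,
# which is why the tutorial says not to change it.
```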

  • 00:17:02 For writing a prompt for this image, I will use our prompt generation file. So type Google AI

  • 00:17:09 Studio into Google. It is free. Enter the site. Click this plus icon after you log in with your

  • 00:17:15 account. This is amazing. Upload file. Go back to your SwarmUI installation

  • 00:17:21 and you will see that, let's sort by name, and  you will see that we have video models prompt  

  • 00:17:26 generate guidance. This guidance can be used for  Qwen Image generation as well. It is amazing. I  

  • 00:17:32 will type this. Write me a prompt for uploaded  image with a very intense action scene. You can  

  • 00:17:40 type anything. Then I will click this upload icon  and I will select my image. So upload both of  

the files for Google AI Studio to process. I will set the temperature to around 50%. I like it. Set the thinking

  • 00:17:54 budget to maximum. Make sure that grounding with  Google Search is off. So these are the parameters  

  • 00:17:59 and run. This model is just amazing. This is for  free. Google is still providing Gemini Pro with  

maximum context size, with maximum features in Google AI Studio. So leverage it for yourself

  • 00:18:13 until it becomes paid. Okay, so it will do  the thinking and write a prompt for us by  

  • 00:18:19 using our amazing video models prompt generation  guidance. Okay, we are getting the prompt here. So  

  • 00:18:25 let's copy this and paste it here. If we read the  prompt, it is cinematic, high contrast lighting,  

  • 00:18:31 low angle shot with a dynamic handheld camera  feel, an intense action scene unfolds in a misty,  

  • 00:18:36 primal forest where a hyper-realistic, furry  Pikachu is suddenly ambushed on a slick,  

  • 00:18:42 mossy rock by a churning stream, and it goes on.  Just pause the video and read it. Then generate. 

  • 00:18:47 So currently, I am running this on RunPod because I am recording this video on my laptop right now.

  • 00:18:54 So I want it to be fast and I will show you a  trick. This is a usual setup that I have shown  

  • 00:18:59 you numerous times. I have installed the ComfyUI  and I am using Sage Attention. This is exactly  

  • 00:19:04 same in my local computer as well. If you look at  my local computer, this is its backend. Currently,  

since I closed its terminal, it shows that it failed to send. You see, this is my ComfyUI

  • 00:19:14 installation. I am still using Sage Attention. By  the way, currently Sage Attention is working with  

  • 00:19:19 Qwen Image as well when you update your ComfyUI  and SwarmUI to the latest version. So use it.  

  • 00:19:25 And the generation will start soon. It is first  loading the models. Okay, generation started.  

  • 00:19:30 Yeah, the trick that I was going to show you is  that you see there is OverQueue. When I make this  

zero, as soon as I hit generate, it will start the generation on the next available GPU. So currently I

  • 00:19:41 have four GPUs, so I can use all of them at the  same time. And the generation started. By the way,  

  • 00:19:46 what other parameters can you set besides the prompt? Let's click this way,

  • 00:19:50 advanced options, and in here you will see that in the image-to-video section, video frames. This is super

  • 00:19:56 important. This determines the length of your video. Currently, since my video FPS is 24, which

  • 00:20:04 is set here (you see video FPS; this is what the image-to-video model uses), the duration will be 73

  • 00:20:10 minus 1, the first frame, divided by 24. So this will be a 3-second video. If you want it longer,

  • 00:20:17 you can set this to 81 frames and 16 FPS, then it will be a 5-second video, or you can even make it

  • 00:20:25 121 frames and 24 FPS, and it will be a 5-second video. So it is up to you. Test with your case and see

  • 00:20:33 which one is working better. If you make it 16  FPS, then I recommend you to also enable video  

  • 00:20:39 frame interpolation. When you set this as two,  it will make it double FPS with almost realistic  

  • 00:20:48 quality. This is working great. By the way, I  also have implemented automatic installation  

  • 00:20:53 of this video frame interpolation RIFE and also  TeaCache into the installers. So when you next  

  • 00:21:00 time install, it will be automatically installed  for both Windows and RunPod and Massed Compute. 
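
The frame and FPS arithmetic above is easy to sanity-check in a few lines. Following the formula from the video, duration = (frames - 1) / FPS, since the first frame is the starting image; RIFE 2x interpolation then doubles the effective FPS without changing the duration. A minimal sketch:

```python
def clip_seconds(frames: int, fps: int, rife_multiplier: int = 1) -> tuple[float, int]:
    """Return (duration in seconds, effective FPS) for a generated clip.

    Duration uses (frames - 1) / fps as described in the video; RIFE
    interpolation multiplies the FPS but leaves the duration unchanged.
    """
    return (frames - 1) / fps, fps * rife_multiplier

print(clip_seconds(73, 24))     # (3.0, 24) -> 3-second video
print(clip_seconds(81, 16))     # (5.0, 16) -> 5-second video
print(clip_seconds(121, 24))    # (5.0, 24) -> 5-second video
print(clip_seconds(81, 16, 2))  # (5.0, 32) -> same length, twice as smooth
```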

  • 00:21:05 So meanwhile this is getting generated, let's also  generate a text-to-video. So quick tools, reset  

  • 00:21:11 params to default. Let's go back to preset, and  I'm going to use Wan 2.2 text-to-video 8 steps.  

  • 00:21:17 So click here, direct apply. Everything is set,  you see. Just type my prompt here, and I need to  

change the resolution to whichever I want. Let's make this 16:9, so this will be the resolution. How

  • 00:21:30 many frames I want? Let's change the frame count  for this. The frame count of text-to-video is set  

here. So text-to-video and image-to-video use different panels in SwarmUI. Then let's make this

  • 00:21:41 81 frames. I will make this 16 FPS, I will apply the 2x FPS increase with RIFE, and hit generate.

  • 00:21:50 Then this will generate the video. By the way,  if you want to see how it works in ComfyUI, go  

  • 00:21:56 to ComfyUI workflow and import from generate tab.  It will import it. Let's just wait for it to load.  

  • 00:22:04 Okay, it is loaded. Import from generate tab,  and this is the workflow that it uses. From here,  

  • 00:22:09 you can also verify how it is working and you will  see that it is using both of the LoRAs accurately,  

both of the base models accurately. I spent a huge amount of time preparing these easy-to-use,

  • 00:22:22 very easy-to-use presets. And this is the  generated video, live generated video from  

  • 00:22:27 image. You see with only 8 steps, and this is the  quality of the generation. This is really good.  

  • 00:22:33 This is a 3-second video. Let's see how much time it took. The generation took 3 minutes. You see,

  • 00:22:39 only 3 minutes to generate this  amazing quality video with Wan 2.2. 

  • 00:22:45 Let's also generate an image since we have updated  Qwen Image with fast preset. I could use this  

prompt, but let's give another command here: make this prompt generate a static image, not a video.

  • 00:23:00 Then hit run and let's see what we will get.  So this way, you can both get video generation  

  • 00:23:06 prompts or you can get image generation prompts.  Okay, we have a prompt here. Let's see. So let's  

  • 00:23:12 use both of the prompts to generate. First, let's  use this prompt. So I will click quick tools,  

  • 00:23:16 reset params to default. Then what I need to do  is select my preset. So let's sort this by name  

to not get confused, because the default ordering is different. Then I will select Qwen Image 8 steps

  • 00:23:28 ultra fast. So direct apply, write my prompt, and  generate. This will really fast generate an image  

  • 00:23:34 with Qwen Image and the quality is just amazing.  You will see in a moment. Currently it is loading  

  • 00:23:39 the model on next available GPU. You can always  see it from server logs, debug, and let's see  

what is happening. So you see, it shows got prompt on ComfyUI 2. This means it is on the

  • 00:23:50 third GPU right now. It is loading the model.  Everything is automatically downloaded with my  

  • 00:23:55 automatic downloader. Everything is automatically  set. So I am making this extremely easy to use and  

  • 00:24:01 way cheaper to use than online services. And  everything is same in your local computer as  

  • 00:24:07 well. So just use it with your local computer if  you have a decent GPU, or if you want to scale it,  

  • 00:24:12 use it on a cloud service like Massed Compute,  which I recommend, or like RunPod. It is either  

  • 00:24:17 way fine. So this is the generation of the video.  The Qwen Image model is still being loaded. RunPod  

  • 00:24:24 is really slow when loading the models. On my  computer, this is almost lightning fast. Second  

video is almost done. Once this Qwen Image model is loaded, we will be able to generate way faster.

  • 00:24:34 Okay, you see the generation started. It is really  fast. This is real-time generation. It is really  

  • 00:24:39 fast. And generation almost finished. We will  see in a moment. And you see now it is working  

  • 00:24:44 with Sage Attention as well. Okay, the image has  been generated. Let's see it. My internet is slow,  

  • 00:24:50 unfortunately. Yes, you see, this is an amazing  composition as you are seeing right now. So let's  

  • 00:24:57 see the other prompt that it generated. I will  just... Oh, by the way, this is the text-to-video  

  • 00:25:03 result. Let's also see that. Yes. This has been  generated with 8 steps from text-to-video. You  

  • 00:25:09 see the quality? This is just amazing, amazing  quality with an amazing prompt. Wan 2.2 is  

  • 00:25:16 just amazing. This is mind-blowing quality. And how much time did this take? We generated  

  • 00:25:21 this live when we are recording the video. This  took only 3.86 minutes, under 4 minutes, as you  

  • 00:25:28 are seeing. And this is 81 frames, 5 seconds video  with RIFE interpolation 2x. So this is actually  

  • 00:25:35 32 FPS right now. So I can just download this.  How? Click more, download, and it will download  

  • 00:25:42 it into your computer. When I open it, I can  view it in my computer. When I see properties,  

I can see the FPS. By the way, it is not exactly 5 seconds, because we are trimming the first four frames,

  • 00:25:54 which usually causes some color differences. So maybe you noticed it in other generations. This

  • 00:26:01 is why. Okay, let's return to the generation of the static image. Let's generate, because the current

  • 00:26:06 setup is selected for Qwen Image. And the next generation will be much faster. So the first

  • 00:26:11 generation was 86 seconds. Let's see the second  generation. Okay, it is getting generated with 8  

steps with Qwen Image. And it is done. Yes, you see the quality? This took only 20 seconds.

  • 00:26:25 You see, Qwen Image is taking only 20 seconds with 8 steps. Making these presets really took a huge amount

  • 00:26:32 of my time. So I really recommend you to use them. Let's also generate with FLUX Krea Dev. I also

  • 00:26:38 updated that. So reset params to default.  Let's direct apply FLUX Krea Dev here and  

  • 00:26:45 generate. Now it will generate with FLUX Krea Dev.  Meanwhile it is generating with FLUX Krea Dev,  

  • 00:26:50 let's also generate with Wan 2.2 image realism.  This is not a very realistic prompt, but let's  

  • 00:26:56 see. Reset params to default, direct apply. Just  type the prompt here. Do not change whatever it  

writes here. This is important for it to work accurately. Okay, generate. So whenever you change a model,

  • 00:27:08 a preset, always quick tools, reset params to  default to not make any mistakes. Then direct  

  • 00:27:14 apply. Okay, this is I think FLUX Krea... Oh,  this is probably Qwen Image realism because the  

  • 00:27:20 Qwen models were already loaded on my GPUs. Yes,  this is Qwen Image realism. This is not a very  

realistic prompt. I will also make a realistic prompt. Oh, really good. You see? Still really,

  • 00:27:32 really high quality even though this is not a really realistic prompt. This is not a prompt of a

  • 00:27:37 man. Let's make the prompt a realistic one: photo of a handsome man wearing an expensive

  • 00:27:45 suit in an amazing garden. Let's generate. And this is FLUX Krea Dev. You see, FLUX Krea is also

extremely optimized for realistic images. This is a really, really good, really decent image.

  • 00:27:58 Our presets are also using the very best available  samplers and schedulers, all optimized for highest  

quality with minimal loss of speed. Qwen Image realism is generating this realistic prompt

  • 00:28:12 rather realistically. And here, this is the raw generation of Qwen Image realism. I probably need to work on

  • 00:28:18 the prompt. This is a very primitive prompt. Let's also try this prompt on FLUX Krea Dev.

So I will just direct apply and just type it. Since it is loaded on the GPU, it will use it right

  • 00:28:29 away. You see, with SwarmUI, it is handling everything automatically for me, and really fast.

  • 00:28:35 I am using four GPUs at the same time. If you have multiple GPUs, whether it

  • 00:28:40 is on a cloud service or on your local computer, it will work. And this is FLUX Krea Dev.

  • 00:28:46 Let's also do a 2x upscale. So I will just direct apply. This is selecting FLUX Dev by default,

  • 00:28:53 but I am going to just change the model from  here. So if your model names are different,  

  • 00:28:58 you need to also change them. Let's generate.  This time, we will both generate and 2x upscale  

with our upscaling workflow and let's see the results. The upscaling will be slow because

  • 00:29:10 the resolution will now be four times higher in pixel count compared to what we were generating,

  • 00:29:15 so it will take about four times as long. We can always check the server logs. By the way, it won't be

  • 00:29:21 exactly four times. Why? Because we are doing fewer steps this time. Okay, it will still be

  • 00:29:26 pretty fast. Okay, upscaling started. So this is how you use the presets. This is four times

  • 00:29:32 upscaled. Let's open it in a new tab. The resolution is 2048 by 2048. Let me make it default. Okay,

  • 00:29:41 so this is upscaled FLUX Krea Dev image. So this is how you use presets. I recommend you to  

  • 00:29:47 try all of them. If you want the highest quality, then we have Wan 2.2 high quality 20 steps. As you

  • 00:29:54 do more steps, it becomes better, but the quality of this is also amazing. And we also have Wan 2.2 high

  • 00:30:01 quality 20 steps. This is also super quality, but it takes long. I am also keeping the older presets from

  • 00:30:07 now on, which begin with the letter Z: Z old Wan 2.2 high quality, Z old Wan 2.2 8 steps. So

  • 00:30:15 you don't need to use them, but I'm just keeping them if you want to compare later. Moreover,

  • 00:30:19 the Qwen 8 steps ultra fast matches the Qwen Image high quality, and it is almost six times

  • 00:30:28 to ten times faster than the high quality that we had previously. So use this Qwen Image 8 steps;

  • 00:30:34 it is amazing, as I have just shown you. For FLUX Kontext, follow the FLUX Kontext tutorial.
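
A quick note on the upscale timing mentioned in the demo above: doubling each side of an image quadruples the pixel count, which is why a 2x upscale costs roughly 4x the compute. A two-line check:

```python
# Why a 2x upscale takes roughly 4x as long: doubling width and height
# quadruples the number of pixels to denoise.
w, h = 1024, 1024
print((2 * w) * (2 * h) / (w * h))  # 4.0 -> pixel count grows with the square
# In practice it is a bit under 4x wall time, because the upscaling pass
# in the preset runs fewer steps than the base generation.
```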

  • 00:30:40 Okay, what about FLUX Krea Dev training? Because I have been getting asked about it. FLUX Krea Dev

  • 00:30:47 training works right away with our FLUX Dev training workflow. Just use the latest zip file,

  • 00:30:53 download models, and it will automatically download FLUX Krea Dev as well. And I have done the training,

  • 00:30:59 but not only the training, I also posted comparisons and my opinions. FLUX Krea Dev

  • 00:31:05 requires either a slightly higher learning rate or more epochs. My followers also verified that. So

  • 00:31:12 you can download these epoch grid comparisons from  here. The links are in the post. And when you open  

  • 00:31:18 them, you will see full quality grid comparisons  like this. For example, this is epoch 150,  

trained on the same data set. I also trained with slightly lower and higher learning rates. So this

  • 00:31:30 is our FLUX Dev DreamBooth. This is FLUX Krea Dev DreamBooth with a slightly lower learning rate. This is the same FLUX Dev

  • 00:31:38 learning rate, and this is a slightly higher FLUX Dev learning rate on FLUX Krea. So in my opinion,

  • 00:31:45 FLUX Krea is working better on some prompts and in some cases. For example, for this prompt,

  • 00:31:52 FLUX Krea is definitely better, a better match for my face, more realistic. For example,

in this case, FLUX Krea Dev is different than FLUX Dev. So it's a matter of taste, whichever you like.

  • 00:32:05 I think FLUX Krea Dev looks more realistic,  but FLUX Dev has some better, I don't know,  

  • 00:32:11 maybe details. It is up to you. For example,  in this case, let's look at the results.  

  • 00:32:17 So FLUX Dev again looks more colorful, more lively. So it is up to you, whichever one

you like. You can train on both of the models and compare. Our workflows and presets work right

  • 00:32:30 away. Just train, analyze these grids yourself on your computer, and decide for yourself.

  • 00:32:36 Moreover, I also trained a FLUX Krea LoRA. I also shared the grid; exactly the same configuration,

  • 00:32:42 workflow, and presets are working. You can download the massive grid from here. When you download it,

  • 00:32:48 you will see the grid. By the way, this grid is a little bit edited because I had to generate the FLUX

  • 00:32:55 Krea Dev LoRA on the FLUX Krea Dev base model and the FLUX LoRA on the FLUX Dev base model. So I compiled this grid.

  • 00:33:02 This is not raw output of SwarmUI, but it is an accurate way of displaying it. It shows starting from

  • 00:33:10 epoch 125 up to 200 epochs. Compare for yourself. I think the FLUX LoRA is better than the FLUX Krea Dev LoRA,

but it is up to you. Just train on both of them and see whichever version you like more.

  • 00:33:25 It may not work as well on these prompts, but it may work better on your prompts.

  • 00:33:30 So it may depend on the prompts. Definitely FLUX Krea Dev is more realistic when we compare these

  • 00:33:37 two images. FLUX Krea Dev has more realism in  itself, but as I said, it depends on your prompt,  

your case, your data set. So train, compare, and see for yourself. It is working right away.

  • 00:33:48 So what about Qwen Image training? Because I have been getting asked about it, and what I

  • 00:33:55 believe is that Qwen Image will surpass FLUX Dev in every case, because its base model is

  • 00:34:03 better than FLUX Dev in every way, even at realism. Its base

  • 00:34:10 resolution is better, its prompt following is better, its prompt composition is better,

  • 00:34:14 everything is better. For training Qwen Image, I am going to use Kohya's Musubi Tuner, and I am

  • 00:34:20 developing an amazing Gradio application for it  with all the features, with all the parameters,  

  • 00:34:27 options available. You can see this is the  interface. It is not complete yet. I am still  

  • 00:34:32 developing it. Then I will find the very best  configurations for every GPU. I think as low as  

6 or 8 GB GPUs will be able to train the Qwen Image model, maybe 10 GB, we will see. Then you will

  • 00:34:47 be able to train Qwen Image on your computer or  on cloud service with just one click as we did for  

  • 00:34:54 FLUX models. You see there are so many options.  I will test all of them. Don't you worry. The  

  • 00:34:59 presets will be just ready to use. You will just  load it and use it right away. I am adding all the  

  • 00:35:05 features. Moreover, I am adding other features  that Kohya is implementing into Musubi Tuner,  

like image captioning, which has arrived recently. So you will be able to batch caption

  • 00:35:17 with the Qwen text encoder itself, Qwen VL itself. You will be able to caption a single image or

  • 00:35:23 just batch caption. I don't know if captions will be necessary for Qwen Image. We will see it

after we have done the research, but I am adding the features. Moreover, I will implement Wan

  • 00:35:35 2.2 training into this application as well, since Kohya is implementing it into Musubi Tuner. This

  • 00:35:41 Musubi Tuner interface is what was originally made in the original repo of this application, but

  • 00:35:47 I am developing a completely different one right now. I still haven't deleted it. However, what I am

  • 00:35:53 developing is Qwen Image LoRA. There will be a Wan 2.2 training tab and image captioning so far. So

  • 00:36:00 this will be one-click install, one-click setup,  one-click download, everything will be so easy,  

  • 00:36:05 so ready with highest possible quality, hopefully. So if you have any questions, always ask me. I  

recommend you to join our Discord channel. It is the SECourses Discord. When you type it into Google,

  • 00:36:16 you will find it. Just join the server and message me from there if you want.

  • 00:36:21 We have 11,000 members. We are growing. Currently 1,200 people are online. Moreover,

  • 00:36:28 we have a growing Reddit page. You see, SECourses. Our member count is growing, our visit count,

  • 00:36:35 everything is growing. By the way, our visit count was over 500k. Currently it is displaying with a

  • 00:36:40 bug; I don't know why. I am posting a lot of good stuff here: news about AI, technology,

  • 00:36:47 science. You will see a lot of good stuff here, and I recommend it. I'm also sharing news regarding our

developed applications. For example, our Joy Caption app has recently been updated. I'm also

  • 00:36:58 sometimes posting research results, the things that I do. For example, Qwen Image inpainting is

  • 00:37:04 coming, almost there. I will hopefully also make a video for it. The experiments I am conducting,

  • 00:37:09 a lot of robotics, and all kinds of AI-related stuff I share here if I find it interesting. So I

  • 00:37:16 really recommend you to join our Reddit as well. Hopefully, see you in the next amazing tutorial video.
