

Wan 2.2, FLUX & Qwen Image Upgraded: Ultimate Tutorial for Open Source SOTA Image & Video Gen Models



Wan 2.2, Qwen Image, FLUX, and FLUX Krea are the current SOTA open-source models, and in this master tutorial I will show you how to use them in the easiest, most performant, and most accurate way. After almost a week of research, I have determined the very best presets and prepared this tutorial. With literally one click you will be able to install, download models, set presets, and use these amazing models. Wan 2.2 is currently the king of video generation models, and it is now super fast with the lightx2v Wan2.2-Lightning LoRAs. Moreover, Qwen Image is now ultra-fast with the recently released 8-step LoRA, with almost no quality loss. Furthermore, I have updated the FLUX and FLUX Krea presets to improve image generation quality. Finally, I have trained FLUX Krea with our existing DreamBooth and LoRA training workflows and analyzed and shared the results in this tutorial. As additional information, I preview the upcoming Qwen Image editing/inpainting model and the Qwen Image training application I am developing.

▶️ SwarmUI Installers, Presets and Model Downloader App : 🔗 https://www.patreon.com/posts/114517862

▶️ ComfyUI Backend Installer : 🔗 https://www.patreon.com/posts/105023709

▶️ FLUX / FLUX Krea DreamBooth Training : 🔗 https://www.patreon.com/posts/112099700

▶️ FLUX / FLUX Krea LoRA Training : 🔗 https://www.patreon.com/posts/110879657

▶️ Main SwarmUI Installation Tutorial : 🔗 https://youtu.be/fTzlQ0tjxj0

▶️ RunPod SwarmUI Installation Tutorial : 🔗 https://youtu.be/R02kPf9Y3_w

▶️ Massed Compute SwarmUI Installation Tutorial (starting 00:21:32) : 🔗 https://youtu.be/8cMIwS9qo4M

Video Chapters

00:00:00 Introduction to New State-of-the-Art AI Models

00:00:43 Wan 2.2 vs Wan 2.1 Image-to-Video Comparison

00:01:43 Huge Improvement with New Wan 2.2 Text-to-Video Presets

00:02:44 More Examples of New Wan 2.2 Presets (Text & Image-to-Video)

00:03:08 Using RIFE for Smooth Frame Interpolation (2x FPS)

00:04:30 Image Generation: Wan 2.2 Realism vs FLUX Dev & Krea Dev

00:05:08 Introducing Ultra-Fast Qwen Image 8-Step Preset

00:05:44 Coming Soon: Qwen Image Editing Capabilities Preview

00:06:10 Comparing Qwen Image Presets (High Quality, Fast & Realism)

00:07:14 Behind the Scenes: The Extensive Testing Process for Presets

00:07:42 FLUX Krea Dev Training Experiments (DreamBooth & LoRA)

00:08:21 Updates: Qwen Training App, ComfyUI & SwarmUI Installers

00:08:59 How to Update SwarmUI and ComfyUI Installations

00:10:05 Importing New Presets into SwarmUI

00:10:51 Easiest Way: Using the Automatic Preset Import Script

00:12:22 Using the Model Downloader for Required AI Models

00:13:22 Configuring Downloader for ComfyUI & Forge WebUI

00:15:31 Demo: Generating a Wan 2.2 Image-to-Video (8-Steps)

00:17:02 Using Google AI Studio for High-Quality Prompt Generation

00:18:19 Starting the Generation & Multi-GPU Trick

00:19:46 Advanced Video Options: Frames, FPS, and RIFE Settings

00:21:11 Demo: Generating a Wan 2.2 Text-to-Video (8-Steps)

00:22:27 Live Result: Image-to-Video Generation Finished

00:23:34 Demo: Ultra-Fast Image Generation with Qwen (8-Steps)

00:24:50 Live Result: Text-to-Video Generation Finished (Amazing Quality)

00:25:21 Generation Speed Analysis & Downloading Your Video

00:26:38 Comparing FLUX Krea Dev & Qwen Realism Presets

00:28:46 How to Upscale Images to 2x High Resolution

00:29:47 Summary of New Presets and Recommendations

00:30:40 In-Depth: Training on FLUX Krea Dev (LoRA & DreamBooth)

00:33:48 Coming Soon: One-Click Qwen Image Training Application

00:36:11 Join The Community (Discord & Reddit) & Final Words

Advancements in AI Image and Video Generation in 2025

The year 2025 has marked a pivotal era for AI-driven content creation, with models pushing boundaries in realism, speed, and versatility. From text-to-video (T2V) to image editing, innovations like Mixture-of-Experts (MoE) architectures and enhanced prompt adherence are transforming industries such as film, advertising, and design.

Alibaba's Tongyi Wanxiang (Wan) 2.2 stands out as the first MoE-based video diffusion model, boasting 27 billion parameters (14B active) for cinematic T2V and image-to-video (I2V) at 720p resolution. It excels in motion dynamics, lighting control, and ultra-fast rendering, outperforming predecessors like Wan 2.1 in physics simulation and quality. Open-sourced on July 28, 2025, it's ideal for creators seeking high-fidelity outputs.

Qwen-Image, another Alibaba gem, is a 20B parameter MMDiT foundation model specializing in complex text rendering in English and Chinese, even in intricate scenes. Released in August 2025, it supports precise editing, style preservation, and multilingual prompts, surpassing benchmarks in text incorporation and aesthetics. Its open-source nature makes it a go-to for detailed image generation.

Black Forest Labs' Flux.1 [dev], a 12B parameter flow transformer, shines in text-to-image tasks with exceptional detail and commercial viability.

Some background music by NoCopyrightSounds: https://gist.github.com/FurkanGozukara/681667e5d7051b073f2e795794c46170

Video Transcription

  • 00:00:00 Greetings everyone. Today I am going to show you how to use state-of-the-art image generation and

  • 00:00:07 video generation models in the easiest, most accurate, and best-performing way. I have

  • 00:00:14 been relentlessly testing new Wan 2.2 LoRAs to  update our presets. Moreover, not only Wan 2.2  

  • 00:00:24 LoRAs, but I also tested FLUX Dev and Qwen  Image as well. And I will show all of them  

  • 00:00:32 to you in this tutorial video so that you  will see the significant differences and  

  • 00:00:38 improvements we have for each preset. For  example, here we are seeing the difference  

  • 00:00:45 between the Wan 2.1 and the new Wan 2.2 image-to-video models. As you can see, we have a

  • 00:00:53 significant improvement with image-to-video. And  this is the image used to generate those videos. 

  • 00:01:01 The difference in text-to-video is even more  significant. This is Wan 2.1 text-to-video  

  • 00:01:07 base model and let's see what it generates. So you  see, this was what we were generating with Wan 2.1  

  • 00:01:15 base model. And this is the prompt used. When we  compare it with Wan 2.1 Fusion X text-to-video,  

  • 00:01:21 this is the result. As you can see, the result is  also not good with Fusion X text-to-video. When  

  • 00:01:28 we move to the old Wan 2.2 high quality text-to-video preset, this is the result we get. This was our

  • 00:01:36 best result with Wan 2.2 text-to-video previously.  However, with the updated configuration,  

  • 00:01:43 let's see the results. So this is our Wan 2.2  high quality text-to-video 20 steps. And let's  

  • 00:01:49 see the significant difference. So you see,  there is a huge difference between this older  

  • 00:01:55 version and this newer version. By the way, if you feel that it is too fast, you can

  • 00:02:00 reduce FPS. I will explain all of that. This is  24 FPS, 121 frames video. Moreover, we have a new  

  • 00:02:09 Wan 2.2 text-to-video 8 steps, and this is just  amazing. You see, with only 8 steps, we are able  

  • 00:02:17 to generate this amazing video just from text. 24  FPS, 121 frames video takes less than 5 minutes  

  • 00:02:28 on RTX 5090. If you generate 16 FPS, 81 frames  video, it will be even faster, under 3 minutes. 

  • 00:02:36 Moreover, here is another example with Wan 2.2 high quality text-to-video 20 steps, and you see this

  • 00:02:44 is like animation-level quality with only 20 steps. I updated all the presets, and the newer presets

  • 00:02:52 are generating amazing videos. This is 24 FPS and 121 frames, and you see the quality. And here is

  • 00:03:00 another example of Wan 2.2 image-to-video with  only 8 steps. This is 16 FPS, 81 frames video,  

  • 00:03:08 and I can apply a RIFE 2x FPS increase and make it much more fluent. All I need to do is enable this

  • 00:03:16 frame interpolation and set it to 2x. And now it is regenerating it with a 2x FPS

  • 00:03:25 increase. Moreover, I have updated my Windows  installation of SwarmUI, RunPod installation,  

  • 00:03:32 and Massed Compute installation to automatically  install RIFE frame interpolation and also famous  

  • 00:03:40 TeaCache. They will be automatically installed  when you make a fresh installation of Windows  

  • 00:03:45 or RunPod or Massed Compute with the newest zip file. But you can always install them manually, like

  • 00:03:52 clicking here to install TeaCache, or when you go to the image-to-video tab, you will see that there is an

  • 00:03:58 install RIFE button, and from there you can install RIFE frame interpolation. And this is the result of a RIFE 2x

  • 00:04:05 FPS increase. This is a different seed, therefore the video is different, but you can see that it

  • 00:04:11 looks much smoother this way. Moreover, here is another example of Wan 2.2 text-to-video with 8

  • 00:04:18 steps, and you see the quality. This is generated  with only 8 steps, therefore it is really fast,  

  • 00:04:25 and you can see that it is pretty good,  pretty decent even though it is only 8 steps. 
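
For readers who want to reproduce the 2x FPS effect outside of SwarmUI's RIFE button, here is a minimal sketch. It does not use RIFE itself (RIFE is the learned interpolator that SwarmUI installs for you); instead it uses FFmpeg's built-in motion-interpolation filter, which illustrates the same idea: synthesizing in-between frames to double the frame rate. The file names are placeholders.

```python
import subprocess

def double_fps(src: str, dst: str, target_fps: int) -> None:
    """Synthesize in-between frames to raise FPS (e.g., 16 -> 32).

    An FFmpeg-based stand-in for RIFE: the minterpolate filter with
    motion-compensated interpolation (mci) invents the new frames
    instead of duplicating existing ones, so motion looks smoother.
    """
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-vf", f"minterpolate=fps={target_fps}:mi_mode=mci",
         dst],
        check=True,
    )

# Example: a 16 FPS clip becomes a 32 FPS clip of the same duration.
double_fps("wan22_clip_16fps.mp4", "wan22_clip_32fps.mp4", 32)
```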

  • 00:04:30 Furthermore, now we have a Wan 2.2 image realism preset as well. With this preset,

  • 00:04:37 you can generate really, really good realistic  images like this. These are raw images. It is  

not very good at stylized images like this. And when we compare the same prompt with FLUX Dev,

  • 00:04:49 this is the FLUX Dev result for realistic prompt,  and this is FLUX Dev result for stylized prompt.  

  • 00:04:56 And let's compare with FLUX Krea Dev. This is  FLUX Krea Dev. FLUX Krea Dev is also amazing at  

  • 00:05:02 realism, especially with humans. Its stylized  prompt is like this as you are seeing right  

  • 00:05:08 now. It is also pretty decent. And now we have  Qwen Image 8 steps. This is really, really fast,  

  • 00:05:15 like 10 times faster than before. This is Qwen 8  steps fast result for realistic prompt, and this  

  • 00:05:22 is the stylized prompt result of the Qwen. As you  are seeing, Qwen is unchallenged if your aim is  

  • 00:05:30 not realism. Hopefully, I will also make a video  for realism of Qwen soon with training it, but so  

far, these are the results and they are amazing. Qwen Image editing has just been published while

  • 00:05:44 I was editing this video. It is looking  amazing, extremely promising. Hopefully,  

  • 00:05:50 I will make a full tutorial and one-click install presets for it very soon. So it is not ready yet,

  • 00:05:57 but I am showing you what it is capable of,  what demo images they have published. Hopefully,  

  • 00:06:02 it is coming very soon. So stay subscribed  and our tutorial is continuing right now. 

  • 00:06:10 And this is the Qwen high quality preset we have. This is the result for realism prompts, and this is the

  • 00:06:16 result for the stylized prompt. As you can see, with the stylized prompt, the really fast 8 steps

  • 00:06:24 preset is almost the same quality as high quality. Therefore, we are getting like a 10 times speed gain

  • 00:06:32 with almost no quality loss, as you are seeing,  this to this. This is amazing. We also have Qwen  

  • 00:06:39 realism fast preset. You see, this is definitely more realistic than the high quality or 8 steps.

  • 00:06:47 Let me show you: this is 8 steps, this is high quality, and this is our Qwen realism preset.

  • 00:06:52 It really makes it realistic, and it is also faster than Qwen's high quality. And this is the result

  • 00:06:58 of Qwen realism. It also made the stylized prompt's output somewhat more realistic compared

  • 00:07:06 to the high quality or Qwen fast results. To arrive at all these new presets, I have

  • 00:07:14 literally done hundreds of generations, analyzed  hundreds of results. For example, let's open this  

  • 00:07:22 one randomly and let's see the result. This is the  grid test that I did, and these are the results of  

  • 00:07:28 the grid test. So I did hundreds of tests like  this. I have been doing this for several days,  

  • 00:07:34 analyzed all of them, and prepared these amazing  presets for you. Furthermore, I did a DreamBooth  

  • 00:07:42 training on FLUX Krea Dev. This was our original  post if you remember. And when you scroll down,  

  • 00:07:48 you will see that I have posted the comparison  results of the FLUX Krea Dev with each epoch  

  • 00:07:55 grid. And in this tutorial, I will also  analyze these and show you. I also did a  

LoRA training on FLUX Krea Dev as well. The post is also updated and the full grid is posted as

  • 00:08:08 well. So I will analyze the grid, compare results, and give my comments on this training.

  • 00:08:16 Another thing that I have to mention is that  I am working on Qwen Image training right now,  

  • 00:08:21 developing an application. We will talk  about this as well in this video. Moreover,  

  • 00:08:26 I have updated our ComfyUI installer as well.  Now it will automatically install FFMPEG, RIFE,  

  • 00:08:32 and TeaCache on Massed Compute, and it is made  more robust to update all of the extra nodes that  

  • 00:08:40 we automatically install. And finally, our SwarmUI  installer. This is where we will get our presets,  

our installers. This is a big update, and now we have an automatic preset import

  • 00:08:53 feature as well. So let's begin the tutorial. As usual, follow the links in the description

  • 00:08:59 of the video. Download the SwarmUI model  downloader latest version. Also download  

  • 00:09:04 the ComfyUI installer latest version. Move them  into your installation folder and extract all  

  • 00:09:12 files and overwrite everything. You can use any  extraction method. Everything is extracted. Let's  

  • 00:09:18 sort by name. Then update your SwarmUI. You see,  Windows update SwarmUI, but before doing that,  

  • 00:09:24 I recommend you to update your ComfyUI as  usual. So put the latest zip file into your  

  • 00:09:30 ComfyUI installation, extract and overwrite  all the files, then first run Windows update  

  • 00:09:36 ComfyUI.bat file. We also improved the update  process. Now it is much more robust. Okay,  

  • 00:09:42 update has been completed. Then return back  to SwarmUI. Let's sort by name. Windows update  

  • 00:09:47 SwarmUI.bat file. Okay, run. It will update it  with maximum accuracy and robustness, and it  

will start SwarmUI as usual. So if you don't know how to install and set up ComfyUI and SwarmUI,

  • 00:10:00 this is the tutorial that you need to watch, but  if you already have them installed, you are ready. 

  • 00:10:05 And the latest version of SwarmUI will start like this. You see, these are my existing presets. Let

  • 00:10:11 me demonstrate something. I will just make this like this, edit. And how are you going

  • 00:10:15 to update to the new presets that we have? Either you can use the import preset feature: choose file,

  • 00:10:22 go back to the installation folder, amazing SwarmUI presets, overwrite. But if you have leftovers

  • 00:10:28 whose names we have changed or some other stale entries, they will stay here. Still, this is the way

  • 00:10:34 of keeping your existing presets. If you want a clean import, you need to delete every one

  • 00:10:40 of them like this and then import. I asked the author of SwarmUI to add a mass delete option,

  • 00:10:47 but it is not available yet. So I have developed a solution myself. You see,

  • 00:10:51 Windows preset delete import.bat file. This file will ask you whether to import or not. When you click yes,

  • 00:11:00 it will automatically delete your existing presets, then it will import everything. However,

  • 00:11:05 for this to work, you need to have SwarmUI running on port 7861, which is the default port.

  • 00:11:12 Currently, my SwarmUI is running on a different port because I started it with the Windows update file,

  • 00:11:18 so I will close it. Let's close this as well and let's use the Windows start SwarmUI.bat file. This

  • 00:11:24 will start it on the correct port. And you see, it has started. I will fix this issue by the time

  • 00:11:28 you are watching; I will make both of them use the same port, but keep this in mind. Then I will

  • 00:11:34 double-click Windows preset delete and import.bat  file, click yes. And you see it deleted all of my  

  • 00:11:40 existing presets and imported the new ones. Moreover, it will back up your existing presets into a presets

  • 00:11:47 backup folder that it generates automatically. And you see, these are my deleted presets. They were

  • 00:11:53 saved here. Now our presets are ready. Don't  forget to click refresh icon because it may  

  • 00:12:00 still display older presets. Also, let's sort  by name, and now all of our presets are ready. 
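
If you are curious what the delete-and-import script does conceptually, here is a minimal Python sketch under stated assumptions: back up the current presets, delete them, then import the new ones cleanly. Note that the real .bat file drives the running SwarmUI instance (hence the port 7861 requirement), while this sketch approximates the same effect with plain file operations while SwarmUI is stopped; the folder paths here are hypothetical placeholders, not the script's actual contents.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Hypothetical locations -- adjust to your installation.
SWARM_PRESETS = Path("SwarmUI/Data/Presets")   # assumed preset store
NEW_PRESETS = Path("amazing_swarmui_presets")  # presets shipped in the zip
BACKUP_ROOT = Path("presets_backup")           # mirrors the backup folder the script creates

def backup_delete_import() -> None:
    if input("Delete existing presets and import the new ones? [y/N] ").strip().lower() != "y":
        return
    # 1) Back up existing presets into a timestamped folder.
    backup_dir = BACKUP_ROOT / datetime.now().strftime("%Y%m%d_%H%M%S")
    shutil.copytree(SWARM_PRESETS, backup_dir)
    # 2) Delete the old presets so no renamed leftovers remain.
    shutil.rmtree(SWARM_PRESETS)
    # 3) Copy the new presets in for a clean import.
    shutil.copytree(NEW_PRESETS, SWARM_PRESETS)
    print(f"Old presets backed up to {backup_dir}; new presets imported.")

if __name__ == "__main__":
    backup_delete_import()
```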

  • 00:12:06 So how are you going to use these presets? These presets will automatically select model files as

  • 00:12:12 well. To be able to use them, you need to use the Windows start download models app.bat file or

  • 00:12:18 change the model names and paths yourself. This application is in active development. I am adding

  • 00:12:25 new models, adding new bundles, improving its features. Currently, it supports so many models

  • 00:12:30 that you can download. I recommend you to use  SwarmUI bundles. We have Qwen Image core bundle.  

  • 00:12:35 It shows all the models, their sizes. We have Wan  2.2 core 8 steps bundle. It shows all the models  

  • 00:12:41 it is going to download, sizes. For example, with  new Wan 2.2 presets, we are using these four new  

  • 00:12:48 LoRAs. Moreover, we have Wan 2.1 core bundle. You  see all the models. We have FLUX models bundle.  

  • 00:12:54 So you can download all these bundles, then you  will be ready to use all of them. So I recommend  

  • 00:12:59 to download them. For today's tutorial, you need  to download Qwen Image core bundle, click it.  

  • 00:13:04 You need to download Wan 2.2 8 steps core bundle.  And if you want to also test Wan 2.1, you need to  

  • 00:13:11 download this one. And if you want to also use  FLUX, you need to download this one. These are  

  • 00:13:16 the core bundles. These are state-of-the-art  image and video generation models. So just  

download them and you will be ready. Moreover, if you are a ComfyUI user or

  • 00:13:26 a Forge WebUI user, we support both of them. For ComfyUI, select this ComfyUI folder structure,

  • 00:13:33 go back to your ComfyUI installation, go inside models like this, copy this path, and give its path

  • 00:13:41 like this. So now it will download into the correct ComfyUI folders; for example, the LoRAs folder

  • 00:13:47 differs between ComfyUI and SwarmUI. Moreover, if you are a Forge WebUI user, just check this out and

  • 00:13:54 then give its path here, and it will download into there. Again, it is the same for Forge WebUI:

  • 00:13:59 you need to give this path. For example, we also have an installer for Forge WebUI. It is updated,

  • 00:14:05 it has more features, and it fully supports the RTX 5000 series. And if you want to make it

  • 00:14:12 use lowercase folder names, just check this and  when you click remember settings, it will save  

  • 00:14:18 them and when you start next time, it will use  them. Moreover, you can manually download models  

  • 00:14:24 one by one. For example, image generation models,  Qwen Image models. You see we have Qwen GGUF Q4,  

  • 00:14:30 GGUF Q5. So if you are low on VRAM, you can use  them. However, with SwarmUI and ComfyUI, it will  

  • 00:14:36 automatically do block swapping, so you don't even  need that. FLUX models, we have all of them here.  

  • 00:14:41 We have FLUX GGUF models, you see, all of them. We have HiDream models. However, I don't recommend

  • 00:14:47 HiDream anymore. Qwen is coming on so strong. It is the best model, if you ask my opinion. Stable

  • 00:14:52 Diffusion 1.5 models, Stable Diffusion XL models.  So we support so many models. We support image  

  • 00:14:57 upscaling models, YOLO masking models. We have  text encoders, UMT5 text encoders, CLIP models.  

  • 00:15:05 We have video generation models, a huge set: Wan 2.1 official models, Wan 2.1 Fusion X models, Wan 2.1

  • 00:15:13 LoRAs like this, Wan 2.2 official models. You  see, we have also GGUF models, Wan 2.2 LoRAs.  

  • 00:15:20 So check this application out, and if you need any  other models, you can message me from this video  

  • 00:15:25 or from Patreon, and hopefully I will add them.
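
As an aside on the folder-structure option above: ComfyUI and SwarmUI expect models in differently named subfolders, which is what the downloader's ComfyUI mode accounts for. Below is a simplified sketch of typical ComfyUI model subfolders; the downloader's real internal mapping may differ, so treat this as illustrative.

```python
from pathlib import Path

# Typical ComfyUI model subfolders (illustrative; the downloader's
# real mapping may differ).
COMFYUI_SUBFOLDERS = {
    "checkpoint": "checkpoints",
    "lora": "loras",
    "vae": "vae",
    "text_encoder": "clip",                  # e.g. UMT5 / CLIP encoders
    "diffusion_model": "diffusion_models",   # e.g. Wan 2.2 / Qwen UNets
    "upscaler": "upscale_models",
}

def target_path(comfy_root: str, model_kind: str, filename: str) -> Path:
    """Resolve where a downloaded model lands in a ComfyUI install."""
    return Path(comfy_root) / "models" / COMFYUI_SUBFOLDERS[model_kind] / filename

# Hypothetical file name, for illustration only.
print(target_path("ComfyUI", "lora", "wan2.2_lightning_high_noise.safetensors"))
# -> ComfyUI/models/loras/wan2.2_lightning_high_noise.safetensors
```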

  • 00:15:31 Once you have downloaded all the models and imported the new presets, how are you going to use them? This is important. For example, Wan 2.2 image-to-video

  • 00:15:37 8 steps. Let's do a demo with it. So first of all,  click Quick Tools and Reset params to default.  

  • 00:15:44 This is mandatory. Do this at every step and  you will not have any issues. Then click this  

hamburger menu and direct apply. Do not just select it; use direct apply. This works better. You see

  • 00:15:56 it did set every parameter, including the models  and everything. Then click Init Image because  

  • 00:16:03 this is an image-to-video model. Choose file. For  example, let's use this Pikachu for animation. So  

  • 00:16:10 I will select it. This is my image resolution,  and this is the base resolution of the model.  

  • 00:16:15 You can always change the base resolution and  it will automatically calculate new resolution  

  • 00:16:20 based on that. How can you change it? From models, you see this model is automatically selected,

  • 00:16:24 click here, edit metadata. Make sure that its  architecture is accurate and this is the base  

  • 00:16:30 resolution of the model that you can change  for automatic calculation, then save. Then  

  • 00:16:34 click this res and use closest aspect ratio. I  recommend this. You can also use exact aspect  

ratio to avoid cropping it, but closest aspect ratio is better. Then all I need to do is write

  • 00:16:46 my prompt here. Do not change this. This is how it  uses both of the lightning LoRAs because Wan 2.2  

  • 00:16:54 works with base and refiner model. This is how we  are setting up two LoRAs for each of the models. 
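
To make the base/refiner LoRA pairing concrete: Wan 2.2 runs a high-noise model for the early denoising steps and a low-noise model for the late steps, and the lightning speed-up LoRAs come as a matched pair, one for each model. The sketch below only illustrates that structure; the field and file names are hypothetical placeholders, not SwarmUI's actual preset schema.

```python
# Conceptual sketch of a Wan 2.2 8-step preset (hypothetical names,
# not SwarmUI's real schema). The key point: two models, two LoRAs.
wan22_i2v_8_steps = {
    "steps": 8,
    "base": {      # high-noise expert handles the early denoising steps
        "model": "wan2.2_i2v_high_noise_14B.safetensors",
        "lora": "wan2.2_lightning_high_noise.safetensors",
    },
    "refiner": {   # low-noise expert handles the late denoising steps
        "model": "wan2.2_i2v_low_noise_14B.safetensors",
        "lora": "wan2.2_lightning_low_noise.safetensors",
    },
}
# Hand-editing the LoRA section of the preset breaks this pairing,
# which is why the tutorial says not to change it.
```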

  • 00:17:02 For writing a prompt for this image, I will use our prompt generation file. So type Google AI

  • 00:17:09 Studio into Google. It is free. Enter the site. Click this plus icon after you log in with your

  • 00:17:15 account. This is amazing. Upload file. Go back to your SwarmUI installation

  • 00:17:21 and you will see that, let's sort by name, and  you will see that we have video models prompt  

  • 00:17:26 generate guidance. This guidance can be used for  Qwen Image generation as well. It is amazing. I  

  • 00:17:32 will type this. Write me a prompt for uploaded  image with a very intense action scene. You can  

  • 00:17:40 type anything. Then I will click this upload icon  and I will select my image. So upload both of  

the files for Google AI Studio to process. I will set the temperature to around 50%. I like it. Set the thinking

  • 00:17:54 budget to maximum. Make sure that grounding with  Google Search is off. So these are the parameters  

  • 00:17:59 and run. This model is just amazing. This is for  free. Google is still providing Gemini Pro with  

maximum context size, with maximum features in Google AI Studio. So leverage it for yourself

  • 00:18:13 until it becomes paid. Okay, so it will do  the thinking and write a prompt for us by  

  • 00:18:19 using our amazing video models prompt generation  guidance. Okay, we are getting the prompt here. So  

  • 00:18:25 let's copy this and paste it here. If we read the  prompt, it is cinematic, high contrast lighting,  

  • 00:18:31 low angle shot with a dynamic handheld camera  feel, an intense action scene unfolds in a misty,  

  • 00:18:36 primal forest where a hyper-realistic, furry  Pikachu is suddenly ambushed on a slick,  

  • 00:18:42 mossy rock by a churning stream, and it goes on.  Just pause the video and read it. Then generate. 

  • 00:18:47 So currently, I am running this on RunPod because I am recording this video on my laptop right now.

  • 00:18:54 So I want it to be fast and I will show you a  trick. This is a usual setup that I have shown  

  • 00:18:59 you numerous times. I have installed the ComfyUI  and I am using Sage Attention. This is exactly  

  • 00:19:04 same in my local computer as well. If you look at  my local computer, this is its backend. Currently,  

since I closed its terminal, it shows that it failed to send. You see, this is my ComfyUI

  • 00:19:14 installation. I am still using Sage Attention. By  the way, currently Sage Attention is working with  

  • 00:19:19 Qwen Image as well when you update your ComfyUI  and SwarmUI to the latest version. So use it.  

  • 00:19:25 And the generation will start soon. It is first  loading the models. Okay, generation started.  

  • 00:19:30 Yeah, the trick that I was going to show you is  that you see there is OverQueue. When I make this  

zero, as soon as I hit generate, it will start the generation on the next available GPU. So currently I

  • 00:19:41 have four GPUs, so I can use all of them at the  same time. And the generation started. By the way,  

  • 00:19:46 what other parameters can you set besides the prompt? Let's click this way,

  • 00:19:50 advanced options, and in here you will see that in the image-to-video section, video frames. This is super

  • 00:19:56 important. This determines the length of your video. Currently, since my video FPS is 24, which

  • 00:20:04 is set here (you see video FPS; this is what the image-to-video model uses), the duration will be 73

  • 00:20:10 minus 1, the first frame, divided by 24. So this will be a 3-second video. If you want it longer,

  • 00:20:17 you can set this to 81 frames and 16 FPS, then it will be a 5-second video, or you can even make it

  • 00:20:25 121 frames and 24 FPS, and it will be a 5-second video. So it is up to you. Test with your case and see

  • 00:20:33 which one is working better. If you make it 16  FPS, then I recommend you to also enable video  

  • 00:20:39 frame interpolation. When you set this as two,  it will make it double FPS with almost realistic  

  • 00:20:48 quality. This is working great. By the way, I  also have implemented automatic installation  

  • 00:20:53 of this video frame interpolation RIFE and also  TeaCache into the installers. So when you next  

  • 00:21:00 time install, it will be automatically installed  for both Windows and RunPod and Massed Compute. 
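
The frame and FPS arithmetic above is easy to sanity-check in a few lines. Following the formula from the video, duration = (frames - 1) / FPS, since the first frame is the starting image; RIFE 2x interpolation then doubles the effective FPS without changing the duration. A minimal sketch:

```python
def clip_seconds(frames: int, fps: int, rife_multiplier: int = 1) -> tuple[float, int]:
    """Return (duration in seconds, effective FPS) for a generated clip.

    Duration uses (frames - 1) / fps as described in the video; RIFE
    interpolation multiplies the FPS but leaves the duration unchanged.
    """
    return (frames - 1) / fps, fps * rife_multiplier

print(clip_seconds(73, 24))     # (3.0, 24) -> 3-second video
print(clip_seconds(81, 16))     # (5.0, 16) -> 5-second video
print(clip_seconds(121, 24))    # (5.0, 24) -> 5-second video
print(clip_seconds(81, 16, 2))  # (5.0, 32) -> same length, twice as smooth
```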

  • 00:21:05 So meanwhile this is getting generated, let's also  generate a text-to-video. So quick tools, reset  

  • 00:21:11 params to default. Let's go back to preset, and  I'm going to use Wan 2.2 text-to-video 8 steps.  

  • 00:21:17 So click here, direct apply. Everything is set,  you see. Just type my prompt here, and I need to  

change the resolution to whichever I want. Let's make this 16:9, so this will be the resolution. How

  • 00:21:30 many frames I want? Let's change the frame count  for this. The frame count of text-to-video is set  

here. So text-to-video and image-to-video use different panels in SwarmUI. Then let's make this

  • 00:21:41 81 frames. I will make this 16 FPS, I will apply the 2x FPS increase with RIFE, and hit generate.

  • 00:21:50 Then this will generate the video. By the way,  if you want to see how it works in ComfyUI, go  

  • 00:21:56 to ComfyUI workflow and import from generate tab.  It will import it. Let's just wait for it to load.  

  • 00:22:04 Okay, it is loaded. Import from generate tab,  and this is the workflow that it uses. From here,  

  • 00:22:09 you can also verify how it is working and you will  see that it is using both of the LoRAs accurately,  

both of the base models accurately. I spent a huge amount of time preparing these easy-to-use,

  • 00:22:22 very easy-to-use presets. And this is the  generated video, live generated video from  

  • 00:22:27 image. You see with only 8 steps, and this is the  quality of the generation. This is really good.  

  • 00:22:33 This is a 3-second video. Let's see how much time it took. The generation took 3 minutes. You see,

  • 00:22:39 only 3 minutes to generate this  amazing quality video with Wan 2.2. 

  • 00:22:45 Let's also generate an image since we have updated  Qwen Image with fast preset. I could use this  

prompt, but let's give another command here: make this prompt generate a static image, not a video.

  • 00:23:00 Then hit run and let's see what we will get.  So this way, you can both get video generation  

  • 00:23:06 prompts or you can get image generation prompts.  Okay, we have a prompt here. Let's see. So let's  

  • 00:23:12 use both of the prompts to generate. First, let's  use this prompt. So I will click quick tools,  

  • 00:23:16 reset params to default. Then what I need to do  is select my preset. So let's sort this by name  

to not get confused, because the default ordering is different. Then I will select Qwen Image 8 steps

  • 00:23:28 ultra fast. So direct apply, write my prompt, and  generate. This will really fast generate an image  

  • 00:23:34 with Qwen Image and the quality is just amazing.  You will see in a moment. Currently it is loading  

  • 00:23:39 the model on next available GPU. You can always  see it from server logs, debug, and let's see  

what is happening. So you see, it shows got prompt on ComfyUI 2. This means it is on the

  • 00:23:50 third GPU right now. It is loading the model.  Everything is automatically downloaded with my  

  • 00:23:55 automatic downloader. Everything is automatically  set. So I am making this extremely easy to use and  

  • 00:24:01 way cheaper to use than online services. And  everything is same in your local computer as  

  • 00:24:07 well. So just use it with your local computer if  you have a decent GPU, or if you want to scale it,  

  • 00:24:12 use it on a cloud service like Massed Compute,  which I recommend, or like RunPod. It is either  

  • 00:24:17 way fine. So this is the generation of the video.  The Qwen Image model is still being loaded. RunPod  

  • 00:24:24 is really slow when loading the models. On my  computer, this is almost lightning fast. Second  

video is almost done. Once this Qwen Image model is loaded, we will be able to generate way faster.

  • 00:24:34 Okay, you see the generation started. It is really  fast. This is real-time generation. It is really  

  • 00:24:39 fast. And generation almost finished. We will  see in a moment. And you see now it is working  

  • 00:24:44 with Sage Attention as well. Okay, the image has  been generated. Let's see it. My internet is slow,  

  • 00:24:50 unfortunately. Yes, you see, this is an amazing  composition as you are seeing right now. So let's  

  • 00:24:57 see the other prompt that it generated. I will  just... Oh, by the way, this is the text-to-video  

  • 00:25:03 result. Let's also see that. Yes. This has been  generated with 8 steps from text-to-video. You  

  • 00:25:09 see the quality? This is just amazing, amazing  quality with an amazing prompt. Wan 2.2 is  

  • 00:25:16 just amazing. This is mind-blowing quality. And how much time did this take? We generated  

  • 00:25:21 this live when we are recording the video. This  took only 3.86 minutes, under 4 minutes, as you  

  • 00:25:28 are seeing. And this is 81 frames, 5 seconds video  with RIFE interpolation 2x. So this is actually  

  • 00:25:35 32 FPS right now. So I can just download this.  How? Click more, download, and it will download  

  • 00:25:42 it into your computer. When I open it, I can  view it in my computer. When I see properties,  

I can see the FPS. By the way, it is not exactly 5 seconds, because we are trimming the first four frames,

  • 00:25:54 which usually causes some color differences. So maybe you noticed it in other generations. This

  • 00:26:01 is why. Okay, let's return to the generation of the static image. Let's generate, because the current

  • 00:26:06 setup is selected for Qwen Image. And the next generation will be much faster. So the first

  • 00:26:11 generation was 86 seconds. Let's see the second  generation. Okay, it is getting generated with 8  

steps with Qwen Image. And it is done. Yes, you see the quality? This took only 20 seconds.

  • 00:26:25 You see, Qwen Image is taking only 20 seconds with 8 steps. Making these presets really took a huge amount

  • 00:26:32 of my time. So I really recommend you to use them. Let's also generate with FLUX Krea Dev. I also

  • 00:26:38 updated that. So reset params to default.  Let's direct apply FLUX Krea Dev here and  

  • 00:26:45 generate. Now it will generate with FLUX Krea Dev.  Meanwhile it is generating with FLUX Krea Dev,  

  • 00:26:50 let's also generate with Wan 2.2 image realism.  This is not a very realistic prompt, but let's  

  • 00:26:56 see. Reset params to default, direct apply. Just  type the prompt here. Do not change whatever it  

writes here. This is important for it to work accurately. Okay, generate. So whenever you change a model,

  • 00:27:08 a preset, always quick tools, reset params to  default to not make any mistakes. Then direct  

  • 00:27:14 apply. Okay, this is I think FLUX Krea... Oh,  this is probably Qwen Image realism because the  

  • 00:27:20 Qwen models were already loaded on my GPUs. Yes,  this is Qwen Image realism. This is not a very  

realistic prompt. I will also make a realistic prompt. Oh, really good. You see? Still really,

  • 00:27:32 really high quality even though this is not a really realistic prompt. This is not a prompt of a

  • 00:27:37 man. Let's make the prompt a realistic one: photo of a handsome man wearing an expensive

  • 00:27:45 suit in an amazing garden. Let's generate. And this is FLUX Krea Dev. You see, FLUX Krea is also

extremely optimized for realistic images. This is a really, really good, really decent image.

  • 00:27:58 Our presets are also using the very best available  samplers and schedulers, all optimized for highest  

quality with minimal loss of speed. Qwen Image realism is generating this realistic prompt

  • 00:28:12 rather realistically. And here, this is the raw generation of Qwen Image realism. I probably need to work on

  • 00:28:18 the prompt. This is a very primitive prompt. Let's also try this prompt on FLUX Krea Dev.

So I will just direct apply and just type it. Since it is loaded on the GPU, it will use it right

  • 00:28:29 away. You see, with SwarmUI, it is handling everything automatically for me, and really fast.

  • 00:28:35 I am using four GPUs at the same time. If you have multiple GPUs, whether it

  • 00:28:40 is on a cloud service or on your local computer, it will work. And this is FLUX Krea Dev.

  • 00:28:46 Let's also do a 2x upscale. So I will just direct apply. This is selecting FLUX Dev by default,

  • 00:28:53 but I am going to just change the model from  here. So if your model names are different,  

  • 00:28:58 you need to also change them. Let's generate.  This time, we will both generate and 2x upscale  

with our upscaling workflow and let's see the results. The upscaling will be slow because

  • 00:29:10 the resolution will now be four times higher in pixel count compared to what we were generating,

  • 00:29:15 so it will take about four times as long. We can always check the server logs. By the way, it won't be

  • 00:29:21 exactly four times. Why? Because we are doing fewer steps this time. Okay, it will still be

  • 00:29:26 pretty fast. Okay, upscaling started. So this is how you use the presets. This is four times

  • 00:29:32 upscaled. Let's open it in a new tab. The resolution is 2048 by 2048. Let me make it default. Okay,

  • 00:29:41 so this is upscaled FLUX Krea Dev image. So this is how you use presets. I recommend you to  

  • 00:29:47 try all of them. If you want the highest quality, then we have Wan 2.2 high quality 20 steps. As you

  • 00:29:54 do more steps, it becomes better, but the quality of this is also amazing. And we also have Wan 2.2 high

  • 00:30:01 quality 20 steps. This is also super quality, but it takes long. I am also keeping the older presets from

  • 00:30:07 now on, which begin with the letter Z: Z old Wan 2.2 high quality, Z old Wan 2.2 8 steps. So

  • 00:30:15 you don't need to use them, but I'm just keeping them if you want to compare later. Moreover,

  • 00:30:19 the Qwen 8 steps ultra fast matches the Qwen Image high quality, and it is almost six times

  • 00:30:28 to ten times faster than the high quality that we had previously. So use this Qwen Image 8 steps;

  • 00:30:34 it is amazing, as I have just shown you. For FLUX Kontext, follow the FLUX Kontext tutorial.
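
A quick note on the upscale timing mentioned in the demo above: doubling each side of an image quadruples the pixel count, which is why a 2x upscale costs roughly 4x the compute. A two-line check:

```python
# Why a 2x upscale takes roughly 4x as long: doubling width and height
# quadruples the number of pixels to denoise.
w, h = 1024, 1024
print((2 * w) * (2 * h) / (w * h))  # 4.0 -> pixel count grows with the square
# In practice it is a bit under 4x wall time, because the upscaling pass
# in the preset runs fewer steps than the base generation.
```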

  • 00:30:40 Okay, what about FLUX Krea Dev training? Because I have been getting asked about it. FLUX Krea Dev

  • 00:30:47 training works right away with our FLUX Dev training workflow. Just use the latest zip file,

  • 00:30:53 download models, and it will automatically download FLUX Krea Dev as well. And I have done the training,

  • 00:30:59 but not only the training, I also posted comparisons and my opinions. FLUX Krea Dev

  • 00:31:05 requires either a slightly higher learning rate or more epochs. My followers also verified that. So

  • 00:31:12 you can download these epoch grid comparisons from  here. The links are in the post. And when you open  

  • 00:31:18 them, you will see full quality grid comparisons  like this. For example, this is epoch 150,  

trained on the same data set. I also trained with slightly lower and higher learning rates. So this

  • 00:31:30 is our FLUX Dev DreamBooth. This is FLUX Krea Dev DreamBooth with a slightly lower learning rate. This is the same FLUX Dev

  • 00:31:38 learning rate, and this is a slightly higher FLUX Dev learning rate on FLUX Krea. So in my opinion,

  • 00:31:45 FLUX Krea is working better on some prompts and in some cases. For example, for this prompt,

  • 00:31:52 FLUX Krea is definitely better, a better match for my face, more realistic. For example,

in this case, FLUX Krea Dev is different than FLUX Dev. So it's a matter of taste, whichever you like.

  • 00:32:05 I think FLUX Krea Dev looks more realistic,  but FLUX Dev has some better, I don't know,  

  • 00:32:11 maybe details. It is up to you. For example,  in this case, let's look at the results.  

  • 00:32:17 So FLUX Dev again looks more colorful, more lively. So it is up to you, whichever one

you like. You can train on both of the models and compare. Our workflows and presets work right

  • 00:32:30 away. Just train, analyze these grids yourself on your computer, and decide for yourself.

  • 00:32:36 Moreover, I also trained a FLUX Krea LoRA. I also shared the grid; exactly the same configuration,

  • 00:32:42 workflow, and presets are working. You can download the massive grid from here. When you download it,

  • 00:32:48 you will see the grid. By the way, this grid is a little bit edited because I had to generate the FLUX

  • 00:32:55 Krea Dev LoRA on the FLUX Krea Dev base model and the FLUX LoRA on the FLUX Dev base model. So I compiled this grid.

  • 00:33:02 This is not raw output of SwarmUI, but it is an accurate way of displaying it. It shows starting from

  • 00:33:10 epoch 125 up to 200 epochs. Compare for yourself. I think the FLUX LoRA is better than the FLUX Krea Dev LoRA,

but it is up to you. Just train on both of them and see whichever version you like more.

  • 00:33:25 It may not work as well on these prompts, but it may work better on your prompts.

  • 00:33:30 So it may depend on the prompts. Definitely FLUX Krea Dev is more realistic when we compare these

  • 00:33:37 two images. FLUX Krea Dev has more realism in  itself, but as I said, it depends on your prompt,  

your case, your data set. So train, compare, and see for yourself. It is working right away.

  • 00:33:48 So what about Qwen Image training? Because I have been getting asked about it, and what I

  • 00:33:55 believe is that Qwen Image will surpass FLUX Dev in every case, because its base model is

  • 00:34:03 better than FLUX Dev in every way, even at realism. Its base

  • 00:34:10 resolution is better, its prompt following is better, its prompt composition is better,

  • 00:34:14 everything is better. For training Qwen Image, I am going to use Kohya's Musubi Tuner, and I am

  • 00:34:20 developing an amazing Gradio application for it  with all the features, with all the parameters,  

  • 00:34:27 options available. You can see this is the  interface. It is not complete yet. I am still  

  • 00:34:32 developing it. Then I will find the very best  configurations for every GPU. I think as low as  

6 or 8 GB GPUs will be able to train the Qwen Image model, maybe 10 GB, we will see. Then you will

  • 00:34:47 be able to train Qwen Image on your computer or  on cloud service with just one click as we did for  

  • 00:34:54 FLUX models. You see there are so many options.  I will test all of them. Don't you worry. The  

  • 00:34:59 presets will be just ready to use. You will just  load it and use it right away. I am adding all the  

  • 00:35:05 features. Moreover, I am adding other features  that Kohya is implementing into Musubi Tuner,  

like image captioning, which has arrived recently. So you will be able to batch caption

  • 00:35:17 with the Qwen text encoder itself, Qwen VL itself. You will be able to caption a single image or

  • 00:35:23 just batch caption. I don't know if captions will be necessary for Qwen Image. We will see it

after we have done the research, but I am adding the features. Moreover, I will implement Wan

  • 00:35:35 2.2 training into this application as well, since Kohya is implementing it into Musubi Tuner. This

  • 00:35:41 Musubi Tuner interface is what was originally made in the original repo of this application, but

  • 00:35:47 I am developing a completely different one right now. I still haven't deleted it. However, what I am

  • 00:35:53 developing is Qwen Image LoRA. There will be a Wan 2.2 training tab and image captioning so far. So

  • 00:36:00 this will be one-click install, one-click setup,  one-click download, everything will be so easy,  

  • 00:36:05 so ready with highest possible quality, hopefully. So if you have any questions, always ask me. I  

recommend you to join our Discord channel. It is the SECourses Discord. When you type it into Google,

  • 00:36:16 you will find it. Just join the server and message me from there if you want.

  • 00:36:21 We have 11,000 members. We are growing. Currently 1,200 people are online. Moreover,

  • 00:36:28 we have a growing Reddit page. You see, SECourses. Our member count is growing, our visit count,

  • 00:36:35 everything is growing. By the way, our visit count was over 500k. Currently it is displaying with a

  • 00:36:40 bug; I don't know why. I am posting a lot of good stuff here: news about AI, technology,

  • 00:36:47 science. You will see a lot of good stuff here, and I recommend it. I'm also sharing news regarding our

developed applications. For example, our Joy Caption app has recently been updated. I'm also

  • 00:36:58 sometimes posting research results, the things that I do. For example, Qwen Image inpainting is

  • 00:37:04 coming, almost there. I will hopefully also make a video for it. The experiments I am conducting,

  • 00:37:09 a lot of robotics, and all kinds of AI-related stuff I share here if I find it interesting. So I

  • 00:37:16 really recommend you to join our Reddit as well. Hopefully, see you in the next amazing tutorial video.
