How To Install And Use Kohya LoRA GUI Web UI on RunPod IO With Stable Diffusion and Automatic1111
Full tutorial link > https://www.youtube.com/watch?v=3uzCNrQao3o
How to install the famous Kohya SS LoRA GUI on RunPod IO pods and do training in the cloud as seamlessly as on your PC, then use the Automatic1111 Web UI to generate images with your trained LoRA files. Everything is explained step by step, and an amazing GitHub resource file with the necessary commands is provided. If you want to use Kohya's Stable Diffusion trainers on RunPod, this tutorial is for you.
Source GitHub File
Auto Installer Script
https://www.patreon.com/posts/84898806
Sign up RunPod
Our Discord server
https://bit.ly/SECoursesDiscord
If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron 🥰
https://www.patreon.com/SECourses
Technology & Science: News, Tips, Tutorials, Tricks, Best Applications, Guides, Reviews
https://www.youtube.com/playlist?list=PL_pbwdIyffsnkay6X91BWb9rrfLATUMr3
Playlist of StableDiffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img
https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3
00:00:00 Introduction to how to install Kohya GUI on RunPod tutorial
00:00:20 Pick which RunPod server and template
00:01:20 Starting installation of Kohya LoRA on RunPod
00:03:42 How to start Kohya Web GUI after installation
00:04:16 How to download models on RunPod and start Kohya LoRA training
00:05:36 LoRA training parameters
00:06:57 Starting Kohya LoRA training on RunPod
00:07:46 Where are Kohya LoRA training checkpoints saved
00:08:05 How to use LoRA saved checkpoints on RunPod
00:08:29 How to use LoRA checkpoints in Automatic1111 Web UI
00:09:12 Noticing a very crucial mistake during training
00:10:59 Testing different checkpoints after fixing the previous training mistake
00:11:36 How to understand model overtraining
00:12:28 How to fix overtraining problem
Title: Install Kohya GUI on RunPod for LoRA Training: Step-by-Step Tutorial
Description:
Welcome to my comprehensive guide on how to install Kohya GUI on RunPod for LoRA training. I take you through each step, explaining everything clearly so you can follow along with ease. This tutorial will help you set up a powerful training environment on a RunPod instance with an RTX 3090 GPU and 30GB of RAM.
In this video, we will:
Deploy a community cloud with a specific template.
Edit template overrides and set the container disk.
Connect to JupyterLab and clone a GitHub repository.
Generate a new virtual environment and activate it.
Install Kohya on RunPod and handle common errors.
Set up and start the Kohya web UI on RunPod.
Execute a quick demonstration of training a realistic vision model.
Troubleshoot common errors during the training process.
Optimize the training process and improve training quality.
Navigate through our GitHub repository for further learning.
Remember, if you're unfamiliar with how to use Kohya or RunPod, I've included links to excellent tutorials in the video description.
Whether you're just getting started with Kohya, RunPod, or LoRA training, or looking to enhance your existing skills, this tutorial offers valuable insights.
Don't forget to like, share, and subscribe for more tutorials like this!
#StableDiffusion #Kohya #RunPod #LoRATraining #Tutorial #MachineLearning #AI
-
00:00:00 Greetings everyone. In this video I will show you how to install Kohya GUI on RunPod to do
-
00:00:07 LoRA training. I have been asked about this many times. Sorry for the delay. Hopefully I will
-
00:00:12 explain that today. So this is the beginning screen of the RunPod IO. Let's go to the
-
00:00:18 community cloud. I will use RTX 3090 which is a very powerful GPU. Also this RunPod has 30GB
-
00:00:26 RAM. Click deploy. Select a template here. Select web automatic template. This is really important.
-
00:00:32 Currently it is 6.0.1; when you are watching this tutorial, it may be higher. Then go to
-
00:00:39 the edit template overrides. Make the container disk 10GB. You can set the volume disk as much as
-
00:00:44 you want. Set overrides and click continue. Click deploy. Okay container started. Let's
-
00:00:51 connect with JupyterLab. Now I am using a GitHub repository to hold the descriptions and commands used
-
00:00:58 in my tutorials. The link to this file will be in the description. Every command is written here. If
-
00:01:04 you don't know how to use Kohya I have excellent tutorial here. You can click this link and watch
-
00:01:09 it. Also if you don't know how to use RunPod I have another excellent tutorial here. You can
-
00:01:14 click and watch it. So the commands are ready here. Kohya LoRA GUI on RunPod. First thing is,
-
00:01:20 we will clone the repository with this command. Select it copy. Then in the JupyterLab terminal
-
00:01:26 in the workspace. Let's clone the repository like this. The repository is cloned. Then this command.
-
00:01:33 Now we are inside Kohya SS. Then we will generate a new virtual environment in this folder with
-
00:01:40 this command. The virtual environment is generated inside Kohya SS folder here. Then we will execute
-
00:01:46 this command. Now the new virtual environment is activated. Now we will run this command. This
-
00:01:53 won't affect our Stable Diffusion installation on our RunPod. So this is a very convenient way
-
00:01:59 to install Kohya on RunPod. Okay in the first try, we have got an error because obviously the
-
00:02:07 download of the file failed. So what I am going to do is I will repeat the operation. So I will
-
00:02:13 rerun the command to be sure. To rerun the command I just did like this while the virtual environment
-
00:02:19 is activated. Okay this time we didn't get the previous error. However we have got the tkinter
-
00:02:26 error. This is the most common error that you may encounter. I have a solution for that. While
-
00:02:32 virtual environment is activated you don't need to run this again. However, if you start a new terminal,
-
00:02:38 you need to. Then just copy this command while the virtual environment is activated and paste
-
00:02:43 it like this. Then copy this command. This will install tkinter. It is installed. And finally
-
00:02:51 we will install latest torch. Copy while virtual environment is activated. Install. This is really
-
00:02:58 important. You need to have the Kohya SS virtual environment activated while executing all
-
00:03:06 of these commands. So if your virtual environment is activated, you don't need to run this once
-
00:03:11 again. The torch installation is pretty fast. It is installing 2.0.1 version which is the latest
-
00:03:18 official version. This is being installed in our Kohya virtual environment. This won't affect our
-
00:03:24 Stable Diffusion installation. By default Kohya installs torch 1.12, which works pretty slowly on
-
00:03:31 the newest GPUs. Okay the installation has been completed. You can ignore this message because
-
00:03:37 we won't use xformers while training. It is just slowing us down. Then we will start the Kohya web
-
00:03:43 UI. Copy this. For starting this, you don't need to have virtual environment activated. Actually
-
00:03:48 it is preferable not to activate it. Open a new terminal inside Kohya SS like this. Just copy
-
00:03:55 paste it. It will automatically activate the virtual environment and also it will give you
-
00:04:00 a Gradio link like this. Open it and the Kohya GUI web UI started on RunPod and ready to use.
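The sequence up to this point can be sketched as a terminal session. This is a hedged sketch, not the authoritative commands (those are in the linked GitHub file): the repository URL assumes the standard bmaltais/kohya_ss GUI, and the `python3-tk` package, torch index URL, and `--share` flag are assumptions based on that project's usual setup.

```shell
# Sketch of the Kohya SS setup on a RunPod pod -- exact commands are in
# the tutorial's GitHub file; versions and flags here are assumptions.
cd /workspace
git clone https://github.com/bmaltais/kohya_ss
cd kohya_ss

# Dedicated virtual environment, so Automatic1111's install is untouched
python -m venv venv
source venv/bin/activate

# Install Kohya's requirements (rerun this if a download fails midway)
pip install -r requirements.txt

# Fix the common tkinter error on the pod's Linux image
apt update && apt install -y python3-tk

# Upgrade to the latest official torch (2.0.1 at recording time);
# the default 1.12 is slow on newer GPUs
pip install torch==2.0.1 torchvision --extra-index-url https://download.pytorch.org/whl/cu118

# Launch the web GUI from a fresh terminal (no manual venv activation
# needed; the script activates it itself), then open the Gradio link
deactivate
./gui.sh --share
```

The `--share` flag is what produces the public Gradio link used to reach the GUI from outside the pod.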
-
00:04:07 As I said if you don't know how to use Kohya to do training, you can watch this amazing tutorial. I
-
00:04:13 will do a quick demonstration of training. So I will use the Realistic Vision full model. To
-
00:04:18 download it just copy this. Run it inside Stable Diffusion models folder so we can use it with our
-
00:04:24 Automatic1111 web UI. I will also download the best VAE file from this link. It will get into
-
00:04:30 VAE folder. You can also download realistic vision version 2 classification images from
-
00:04:36 this post. Posted on our Patreon. So I will use these training images same as in the last video.
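The two downloads can be sketched like this. The Hugging Face URLs below are assumptions standing in for the tutorial's actual links (which are in the GitHub file), and the paths assume the template's default Automatic1111 location:

```shell
# Hypothetical URLs -- substitute the links from the tutorial's GitHub file.
cd /workspace/stable-diffusion-webui/models/Stable-diffusion
wget https://huggingface.co/SG161222/Realistic_Vision_V2.0/resolve/main/Realistic_Vision_V2.0.safetensors

# The commonly used MSE-trained VAE, into the VAE folder
cd ../VAE
wget https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.safetensors
```

Downloading straight into these folders means both files appear in the web UI after a refresh, with no extra moving around.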
-
00:04:43 I have uploaded them to here. Also classification images are ready as well. So in the Kohya web UI
-
00:04:51 obviously these icons won't work because we are on RunPod. Therefore we need to copy paste the model
-
00:04:58 path ourselves or you can use the automatic models from this drop down. So I will get the path of the
-
00:05:05 model from Stable Diffusion realistic vision. Copy the path and paste it here. Put a slash at the beginning
-
00:05:12 of it and our model is ready. Then, as shown in the previous video: ohwx man. I will set the training
-
00:05:20 images directory manually which is here copy path. Let's also set the regularization images. Copy
-
00:05:27 path like this and the destination directory will be: test1. Prepare training data. Okay,
-
00:05:33 test one appeared here. Copy info to folders tab. Okay, everything is copied and in the
-
00:05:39 training parameters, I will use everything default, only network rank 256. These are the
-
00:05:46 best settings that I have found. In the advanced tab. Now this is important. Don't use xformers,
-
00:05:52 uncheck it. And finally, let's also save our configuration. So for saving, open a notepad
-
00:06:01 file, type workspace/kohya_test1.json. It will be saved here. Copy and paste it here. Save and
-
00:06:10 you will see kohya_test1.json file is generated. From there you can just load it by typing this,
-
00:06:18 type here and click load and it will load the settings. Okay, everything ready. Let's train
-
00:06:23 model and we will see entire training in here. By the way, we have forgotten to set number of
-
00:06:30 epochs. Therefore, I will kill and restart or shut down all of the terminals. Okay, let's go back to
-
00:06:37 Kohya folder. Open a new terminal, start the web UI with this command. Once again like this, open
-
00:06:44 the new link. Let's copy our saved configuration file path like this: put a slash at the beginning of it,
-
00:06:51 click load, and the settings are loaded. Let's also set the epochs to 14. Save every one epoch, save
-
00:06:58 and click train. Okay, training started. You see there are some errors and warning messages.
-
00:07:05 These are fine. It is just working very well. The important thing is do not use xformers. Okay,
-
00:07:12 it has started, and the it/s you are seeing is 5.4 with batch size one. I can also
-
00:07:20 increase batch size. Currently GPU memory used is only 60% because the Automatic1111 web UI is also
-
00:07:28 running at the same time on the same gpu. This is also using some vram you see I have opened
-
00:07:34 it. The first checkpoint already saved, the second checkpoint already saved. Now it is processing the
-
00:07:40 third checkpoint with 5.15 it per second. Okay, the entire training is done in only three minutes and
-
00:07:48 53 seconds. The files are generated inside test1 folder inside model and here our checkpoints.
-
00:07:57 Let's also save the last checkpoint as 14, then I will select them all while holding left Shift, use
-
00:08:05 cut and then I will move them into my Stable Diffusion web UI inside models inside LoRA folder.
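Instead of cut-and-paste in JupyterLab, the same move can be done in a terminal. The paths below are assumptions based on the template's defaults and the `test1` output name used in this run:

```shell
# Move every saved LoRA checkpoint into Automatic1111's LoRA folder
# (assumed default locations -- adjust to your pod's layout)
mv /workspace/test1/model/*.safetensors \
   /workspace/stable-diffusion-webui/models/Lora/
```

After the move, refresh the extra-networks LoRA tab in the web UI so the new checkpoints show up.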
-
00:08:12 Paste here and all pasted. Let's refresh our web ui. Refresh models folder. Let's pick realistic
-
00:08:20 vision. Okay, realistic vision is selected. Click show hide extra networks. In here click LoRA click
-
00:08:27 refresh. Okay LoRA checkpoints arrived. Let's see last checkpoint and see our result photo of ohwx
-
00:08:35 man. Generate and here our picture. It doesn't look very good. We need to do some beautifying
-
00:08:41 and also some checkpoint comparison. So I will go to the tutorials in my GitHub page. I will go to
-
00:08:48 the generate studio quality realistic photos. In here I have some prompts. I will copy the negative
-
00:08:54 prompt as well, and let's say DPM++ SDE Karras, 30 steps, CFG scale 5. Let's try again. Okay,
-
00:09:04 still not looking very good so let's try different checkpoints. Interestingly, the results are not
-
00:09:11 very good. I have found the reason because I have uploaded only one training image and based on this
-
00:09:19 image, the model was trained. How did I notice it? I noticed it from these processing messages
-
00:09:27 displayed on the command line interface. You see, it says that 40_ohwx man contains one image
-
00:09:36 file, so you may also encounter such a problem. Be careful. Now I will repeat the training and see
-
00:09:42 what will happen and nothing else is different only I will change the model output name as
-
00:09:49 test3, save, hit train. This time it will take more time, because before it was training on only a single
-
00:09:56 image. Now the training. Okay, it is still seeing only a single image. Oh I see, because we need to
-
00:10:03 update this folder as well. Don't forget that. So let's kill this too. So I go to the test1 folder.
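For reference, Kohya's "Prepare training data" step builds a folder layout like the sketch below (names taken from this run; the `1_man` regularization folder name is an assumption following Kohya's usual convention). The leading number on the image folder is the repeat count, which is why the message above reports "40_ohwx man":

```
test1/
├── img/
│   └── 40_ohwx man/    <- put ALL training images here (40 = repeats)
├── reg/
│   └── 1_man/          <- regularization / classification images
├── log/
└── model/              <- trained LoRA checkpoints are written here
```

If new training images are uploaded only to the original source folder and not into `img/40_ohwx man/`, the trainer keeps seeing the old contents, exactly as happens here.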
-
00:10:09 Go to the image folder in here. I will upload the training images into this folder otherwise
-
00:10:16 it won't be effective. Let's go back to Kohya SS and restart. Okay, test4, save, train and you
-
00:10:24 see now it has found 13 images. Correct, number of steps is correct, total number of steps and other
-
00:10:31 things are correct. Of course this time it will take 13 times more time. I am not deleting any
-
00:10:37 of these parts of the video because you may also encounter such problems. You may also make same
-
00:10:44 mistakes. This is how you debug your mistake, debug your error, and fix it. Training started
-
00:10:51 and this time it is taking like 48 minutes. It has been 10 epochs since the training started.
-
00:10:57 I think this is enough for testing purposes and demonstration. So I will terminate this terminal.
-
00:11:04 The model files are saved inside the test1 folder, inside model, with the name test4. So I will cut
-
00:11:13 them, paste into the LoRA folder. Paste it. Let's connect to our Stable Diffusion web UI. Let's load
-
00:11:20 the last prompt. So this time we will use the new LoRA. To do that let's click the show hide extra
-
00:11:28 networks LoRA refresh. Okay, the test4 LoRA has arrived. Let's look for the checkpoint 6. Okay, it
-
00:11:36 looks memorized, overtrained, because there is no stylization. Let's look for a lower checkpoint.
-
00:11:43 With checkpoint 2, we are able to get somewhat okay results. However, this is still not very
-
00:11:51 good. I know the reason. Because I have repeating backgrounds and same clothing in my training
-
00:11:59 images and when I check the generated images I see that it is almost generating same backgrounds
-
00:12:05 in the images. Which means it is memorized. Another thing is, even if I use checkpoint 3, you
-
00:12:14 see it is the same place as in the training images. That means this model is already overtrained at
-
00:12:21 checkpoint 3; checkpoint 2 is also already overtrained, I think. Therefore, what we need to do is
-
00:12:29 we need to have better training data set. First of all, this is really important. Another thing is we
-
00:12:36 need to reduce the number of repeats, because with the repeat count at 40 we are not able to save
-
00:12:44 checkpoints more frequently with this small training data set. It is saving checkpoints after every 40
-
00:12:51 multiplied by 13 steps. Therefore, it is 520 steps for every checkpoint save. Therefore,
-
00:12:58 we can reduce this to 20 and have more frequent, more fine-grained checkpoints. Other than that,
-
00:13:07 network rank 128 may also work well. Maybe we can try other optimizers; you see,
-
00:13:15 there are so many optimizers, but improving our training data set is the number one thing that
-
00:13:21 will improve our training quality. This is all for today. So the link of this page will be in
-
00:13:27 the description and also in the pinned comment of the video. Everything you need is written here:
-
00:13:33 I didn't compare realistic vision half model versus full model so you can test both of them.
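As a quick check of the checkpoint-interval arithmetic mentioned a moment ago (assuming batch size 1 and ignoring the step doubling that regularization images add in Kohya):

```shell
# Steps per epoch = repeats x number of training images (batch size 1)
images=13
echo $((40 * images))   # repeats=40 -> 520 steps between checkpoint saves
echo $((20 * images))   # repeats=20 -> 260 steps: checkpoints twice as often
```

With save-every-epoch enabled, halving the repeat count halves the steps per epoch, so you get twice as many intermediate checkpoints to compare for overtraining.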
-
00:13:39 I will also add the full model link here. If you support us on Patreon I would appreciate that very
-
00:13:44 much. Also, on our channel, we have amazing other Stable Diffusion related videos as well. Just go
-
00:13:51 to the playlist, you will see our Stable Diffusion playlist. All of the Stable Diffusion related
-
00:13:57 videos are in here. Check it out! Also, please support us on Patreon and by joining our YouTube
-
00:14:03 channel. I would appreciate those very much. And if you star our repository, fork it and watch it.
-
00:14:10 I would appreciate that too. You will find a lot of useful stuff on our GitHub repository. You will
-
00:14:17 find tutorials, other useful readme files, and in our GitHub page all of our Stable Diffusion
-
00:14:25 tutorials are listed like you are seeing right now. Neatly organized with their thumbnails,
-
00:14:31 their titles so you can check out these links and see which one of them you want to learn. Hopefully
-
00:14:38 see you in another amazing video tutorial. And don't forget to join our Discord channel.
