How to Install and Run TensorRT on RunPod Unix Linux for 2x Faster Stable Diffusion Inference Speed
Full tutorial link > https://www.youtube.com/watch?v=eKnMVXVjVoU
Stable Diffusion Gets A Major Boost With RTX Acceleration. One of the most common ways to use Stable Diffusion, the popular Generative AI tool that allows users to produce images from simple text descriptions, is through the Stable Diffusion Web UI by Automatic1111. In today’s Game Ready Driver, NVIDIA added TensorRT acceleration for Stable Diffusion Web UI, which boosts GeForce RTX performance by up to 2X. In this tutorial video I will show you everything about this new Speed up via extension installation and TensorRT SD UNET generation on RunPod. The tutorial can be also used on other Unix systems and on local Linux Operating Systems.
#TensorRT #StableDiffusion #NVIDIA
Automatic Installer Of Tutorial
https://www.patreon.com/posts/86438018
Comprehensive TensorRT Main Tutorial
TensorRT Official GitHub Repo
https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT
SECourses Discord To Get Full Support
https://discord.com/servers/software-engineering-courses-secourses-772774097734074388
My LinkedIn
https://www.linkedin.com/in/furkangozukara/
My Instagram
https://www.instagram.com/gozukarafurkan/
My Medium
@FurkanGozukara https://medium.com/@furkangozukara
My CivitAI
https://civitai.com/user/SECourses
00:00:00 Introduction to speed increase of TensorRT - RTX Acceleration on RunPod & Unix
00:03:10 Image quality comparison of TensorRT on vs TensorRT off for Stable Diffusion XL (SDXL)
00:04:14 How to install TensorRT on RunPod and on local Unix operating systems
00:07:30 How to check your current Nvidia driver on RunPod and on Unix
00:08:10 Extra tips for TensorRT
00:08:45 How to connect / open your Automatic1111 Web UI on RunPod
00:09:27 How to enable quick selection drop down options for VAE and TensorRT UNET
00:10:09 How to generate your first TensorRT model
00:10:19 TensorRT engine generation speed and duration
00:10:55 How to reload last image generation settings quickly
00:11:44 The amount of speed increase on RTX 3090 on RunPod with TensorRT
-
00:00:00 Greetings everyone.
-
00:00:01 At the moment you are seeing two identical pods running on RunPod.
-
00:00:06 Let me show you their speed difference when generating Stable Diffusion
-
00:00:13 XL (SDXL) images.
-
00:00:15 They are exactly the same pods, with the same GPU.
-
00:00:18 One of them is running on the default setup.
-
00:00:22 The other one is using the TensorRT RTX acceleration, which is a newly arrived feature.
-
00:00:30 So I have got 50 amazing prompts generated with ChatGPT (GPT-4), as you are seeing right
-
00:00:37 now.
-
00:00:38 I have copied it.
-
00:00:39 Let's go to the prompts from textbox, paste it there, and do the same thing in the other
-
00:00:45 instance, and paste that and then let's hit generate and let's see the speed.
-
00:00:51 So on the left one, which is the default pod, it is going to take about 3 minutes. On the
-
00:00:57 right one, which is the pod using TensorRT,
-
00:01:01 it is going to take about 1.5 minutes.
-
00:01:04 You see the speed difference.
-
00:01:06 It is huge.
-
00:01:07 So in this tutorial I am going to show you how to install TensorRT on RunPod and also
-
00:01:14 this installation applies to Unix users because RunPod is using the Ubuntu operating system.
-
00:01:22 Therefore it's a Unix system.
-
00:01:23 So if you don't know how to install TensorRT on a Unix system or on RunPod, watch this
-
00:01:30 tutorial and get amazing speed.
-
00:01:32 Let me also show you the it per second (iterations per second).
-
00:01:35 So currently I am generating 1024 x 1024 default resolution of SDXL images with 20 steps.
-
00:01:42 This is the default pod, which is not using TensorRT on RTX 4090 GPU.
-
00:01:49 And this is the pod using TensorRT, also on an RTX 4090 GPU.
-
00:01:53 The speed is slower than what it should be because this is still using an older Nvidia driver.
-
00:02:00 Unfortunately there is no way to upgrade the Nvidia driver on the RunPod template at the moment.
-
00:02:07 The RunPod developers have to upgrade the Nvidia driver, but still the speed difference
-
00:02:12 is huge, as you are seeing right now.
-
00:02:14 The image quality is exactly the same, just the speed is much improved.
-
00:02:20 The VRAM usages are also almost the same, as you are seeing right now.
-
00:02:24 So the TensorRT version already finished the processing.
-
00:02:27 The TensorRT pod took only 1 minute, 44 seconds to generate 50 images and the images are the same
-
00:02:36 as the other pod.
-
00:02:38 Let's go to the first image.
-
00:02:39 Then we will compare that.
-
00:02:41 The regular pod also finished the processing.
-
00:02:44 It took 2 minutes, 53 seconds.
-
00:02:47 So what is the difference?
-
00:02:49 Let's calculate it.
-
00:02:50 The regular pod took 173 seconds.
-
00:02:52 The TensorRT pod took 104 seconds.
-
00:02:57 So what is the speed difference?
-
00:02:59 Let's calculate it.
-
00:03:00 173 over 104.
-
00:03:01 The TensorRT pod is 66 percent faster than the regular pod.
-
00:03:08 So with just this installation, you gain a 66 percent speed increase.
-
00:03:14 Is the image quality the same?
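The percentage quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name `speedup_percent` is made up for illustration, and the two durations are the 2 minutes 53 seconds and 1 minute 44 seconds reported in the video.

```python
def speedup_percent(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Return how much faster the accelerated run is, in percent."""
    if baseline_seconds <= 0 or accelerated_seconds <= 0:
        raise ValueError("durations must be positive")
    # Same work in less time: ratio of durations minus 1.
    return (baseline_seconds / accelerated_seconds - 1.0) * 100.0


regular_pod = 2 * 60 + 53   # 2 min 53 s = 173 s
tensorrt_pod = 1 * 60 + 44  # 1 min 44 s = 104 s
print(f"TensorRT pod is {speedup_percent(regular_pod, tensorrt_pod):.1f}% faster")
```

Dividing the slower time by the faster one gives the throughput ratio, so 173 / 104 is about 1.66, which is where the 66 percent figure comes from.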
-
00:03:17 Let's also compare them.
-
00:03:18 So on the left we are seeing the regular pod.
-
00:03:21 On the right, we are seeing the TensorRT pod.
-
00:03:24 I used the same seed.
-
00:03:26 Let's move image by image to see that they are almost exactly the same.
-
00:03:32 There is a very small difference that comes from xFormers, as you may know.
-
00:03:37 So you see, and when the Nvidia drivers of the pod template get upgraded, we will get
-
00:03:44 even better speeds.
-
00:03:45 On your own computer, by installing TensorRT, you can get even better improvements.
-
00:03:51 For example, on Windows with an RTX 3090 I got an over 75 percent speed increase.
-
00:03:59 So if you don't know TensorRT, watch this amazing tutorial.
-
00:04:02 It is over 40 minutes.
-
00:04:04 I have explained all of the details of TensorRT in this tutorial.
-
00:04:09 In this video I will show you how to install TensorRT on Unix.
-
00:04:13 So let me close my TensorRT pod.
-
00:04:16 Actually I will close both of them.
-
00:04:17 I will make a fresh pod so you will see how to make it.
-
00:04:21 Let's delete them.
-
00:04:22 Let's go to Community Cloud and select extreme speed from here.
-
00:04:26 For demonstration I will use RTX 4090.
-
00:04:30 Actually, this time let's use RTX 3090 and see the speed difference on that one.
-
00:04:36 So we are getting our pod, wait until the connect button appears here.
-
00:04:41 So the pod has been started.
-
00:04:43 Let's connect, connect to JupyterLab.
-
00:04:45 Okay it is not ready yet.
-
00:04:47 Let's refresh until this becomes ready.
-
00:04:49 This may take a while sometimes.
-
00:04:51 Sometimes it will show you orange.
-
00:04:53 Okay, it turned to blue so it should be ready now.
-
00:04:56 Yeah.
-
00:04:57 So to follow this tutorial we are going to download the attachments from this amazingly
-
00:05:02 detailed Patreon post.
-
00:05:05 For downloading the attachments go to the very bottom, you will see all of the attachments,
-
00:05:09 let's click all of them one by one and download.
-
00:05:11 The first thing is that since we are using Stable Diffusion Web UI template, you are
-
00:05:17 seeing here Stable Diffusion Web UI template, we need to change the relauncher.py.
-
00:05:23 Because it will permanently relaunch your Web UI instance whenever you restart it.
-
00:05:29 I don't like this behavior.
-
00:05:30 You see it has a while loop.
-
00:05:32 So I am going to upload the relauncher.py from my Patreon post.
-
00:05:37 And after that you will see it becomes like this.
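To make the "while loop" remark concrete, here is a minimal sketch of what a relauncher of this kind does and how capping the restarts changes it. It is not the template's relauncher.py nor the replacement file from the Patreon post; the function `relaunch`, its parameters, and the stand-in command are hypothetical.

```python
import subprocess
import sys
import time
from typing import List, Optional


def relaunch(command: List[str], max_restarts: Optional[int], delay_seconds: float = 2.0) -> int:
    """Run `command`, restarting it after each exit.

    max_restarts=None loops forever (the always-relaunch behaviour);
    max_restarts=0 runs the command once and returns.
    Returns the number of times the command was launched.
    """
    launches = 0
    while True:
        print(f"Relauncher: launching (launch #{launches + 1})")
        subprocess.run(command, check=False)  # blocks until the process exits
        launches += 1
        if max_restarts is not None and launches > max_restarts:
            return launches
        time.sleep(delay_seconds)


if __name__ == "__main__":
    # Stand-in command; a real setup would start the Web UI here instead.
    relaunch([sys.executable, "-c", "print('web ui would start here')"], max_restarts=0)
```

With `max_restarts=0` the process is started once and stays stopped after it exits, which lets you stop and start the Web UI by hand from a terminal.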
-
00:05:40 The first time you do this operation, you need to restart your pod so that it will become
-
00:05:45 effective.
-
00:05:46 So I just did restart and after restart, just wait a little bit and then connect to JupyterLab
-
00:05:52 one more time.
-
00:05:53 Wait until it becomes available.
-
00:05:55 It should be pretty quick this time.
-
00:05:57 So the pod is restarted and we got our pod.
-
00:06:00 So go back to the workspace, click this icon and upload install_tensorRT.sh file and 1_click_auto1111_SDXL.sh
-
00:06:07 file.
-
00:06:09 Then in the Patreon post, you will see this command.
-
00:06:14 First execute this command, this will download the latest VAE file for us.
-
00:06:19 This doesn't do much at the moment because they have added the accurate SDXL base version
-
00:06:25 into the template.
-
00:06:26 So we are just downloading the best VAE files for both SDXL and SD 1.5 based versions.
-
00:06:35 Then just wait for Stable Diffusion Web UI instance to start.
-
00:06:39 So the instance has been started.
-
00:06:41 So let's open another terminal.
-
00:06:43 And what we are going to do is copy this, like this.
-
00:06:47 Copy, copy-paste it here.
-
00:06:49 And this time it will install the latest TensorRT version with its necessary dependencies.
-
00:06:56 When you use the extension install feature of the Automatic1111 Web UI, it will not work.
-
00:07:03 I have tested it, even if you restart it.
-
00:07:06 Even if you remove the skip install, it doesn't work.
-
00:07:08 So, I have come up with a specific way to install it.
-
00:07:13 And if you are a Unix user on your computer, you can just edit this install_tensorRT.sh
-
00:07:21 file, and change the folder paths and use it on your local installation.
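As an illustration of what "changing the folder paths" means, the sketch below builds the command that would clone the official extension repository into the `extensions` folder of a Web UI installation at a configurable location. It does not reproduce the contents of install_tensorRT.sh, which also handles the dependencies; the function name and the example paths are assumptions, and the command is only printed, not executed.

```python
from pathlib import PurePosixPath
from typing import List

# Official extension repository linked in the video description.
EXTENSION_REPO = "https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT"


def clone_command(webui_dir: str) -> List[str]:
    """Build the git command that places the extension under <webui_dir>/extensions."""
    target = PurePosixPath(webui_dir) / "extensions" / "Stable-Diffusion-WebUI-TensorRT"
    return ["git", "clone", EXTENSION_REPO, str(target)]


# Assumed pod path versus a hypothetical local install path.
print(" ".join(clone_command("/workspace/stable-diffusion-webui")))
print(" ".join(clone_command("/home/user/stable-diffusion-webui")))
```

Only the base folder differs between the two commands, which is the kind of edit a local Linux user would need to make.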
-
00:07:27 The installation is pretty fast actually, as you are seeing right now.
-
00:07:30 Let me also show you the current Nvidia driver on the RunPod.
-
00:07:34 It is 525.
-
00:07:35 This is a super old driver.
-
00:07:39 The TensorRT developers are suggesting this driver for Linux, which is Unix, 450.
-
00:07:47 But we are using a very old one.
-
00:07:49 This is why we are not getting the expected speed output from our TensorRT
-
00:07:55 installation.
-
00:07:56 So, when this driver gets upgraded, hopefully we will get much better speeds.
-
00:08:02 On your local computer you can install the latest driver and enjoy it.
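If you would rather check the driver from a script than read it off the screen, a sketch like the following may help. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu=driver_version` query; the helper names are made up for illustration, and the script simply reports that nothing was found when no driver is installed.

```python
import shutil
import subprocess
from typing import Optional


def parse_driver_major(version: str) -> int:
    """Return the major part of a driver version string such as '525.105.17'."""
    return int(version.strip().split(".")[0])


def installed_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # One line per GPU; they share one driver, so the first line is enough.
    return result.stdout.strip().splitlines()[0].strip()


version = installed_driver_version()
if version is None:
    print("nvidia-smi not found or returned no driver version")
else:
    print(f"Driver {version} (major version {parse_driver_major(version)})")
```

Running plain `nvidia-smi` in a terminal shows the same driver version in its header.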
-
00:08:06 There are also some other tips here.
-
00:08:09 For example, once you have generated your TensorRT engine with the necessary resolutions, you can use high
-
00:08:15 resolution fix as well, as you are seeing right now.
-
00:08:18 There are also some other tips here.
-
00:08:20 So also read these instructions.
-
00:08:22 I will put the link of this repository and this Patreon post into the description of
-
00:08:28 the video and also the comment section of the video.
-
00:08:31 For Udemy users you will get the attachments in the attachments section.
-
00:08:36 So let's look at the installation.
-
00:08:38 Okay, Web UI is starting right now.
-
00:08:40 The necessary packages have been installed and Web UI started as you are seeing right
-
00:08:45 now.
-
00:08:46 So let's connect it.
-
00:08:47 For connecting, I prefer the connect button.
-
00:08:48 Connect to the HTTP service on port 3001.
-
00:08:52 So we got our default installation.
-
00:08:54 We didn't use the TensorRT yet.
-
00:08:56 Let's try one of the prompts here and see the speed.
-
00:08:59 This is RTX 3090.
-
00:09:01 Okay so let's make this, let's make the batch count 3.
-
00:09:05 This will generate images one by one.
-
00:09:07 So this is not batch size.
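The difference between the two settings can be written down in a few lines. This is only a sketch of the bookkeeping, with a hypothetical function name: batch count is the number of sequential passes, batch size is how many images each pass produces at once.

```python
def generation_plan(batch_count: int, batch_size: int) -> dict:
    """Describe how many passes run and how many images come out in total."""
    if batch_count < 1 or batch_size < 1:
        raise ValueError("batch_count and batch_size must be at least 1")
    return {
        "sequential_passes": batch_count,   # run one after another
        "images_per_pass": batch_size,      # generated together on the GPU
        "total_images": batch_count * batch_size,
    }


# The setting used in the video: batch count 3, batch size 1.
print(generation_plan(batch_count=3, batch_size=1))
```

Batch size matters for TensorRT because an engine is built for specific batch sizes, while batch count only repeats the same pass.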
-
00:09:09 Let's generate 3 images and look at the it per second.
-
00:09:12 So we are getting about 3.64.
-
00:09:15 3.65 it per second.
-
00:09:18 Okay 3.65 it per second.
-
00:09:23 We are getting the images and 3.65.
-
00:09:26 Yes this is our it per second right now.
-
00:09:29 First of all go to the settings and in here go to the user interface, from here select
-
00:09:36 SD_VAE and SD_UNET here as you are seeing right now.
-
00:09:40 Apply settings, reload UI.
-
00:09:42 This is important.
-
00:09:43 This will reload the UI as you are seeing right now.
-
00:09:46 Then we will have two quick selection boxes which we will use.
-
00:09:52 Okay it is getting reloaded.
-
00:09:54 Okay.
-
00:09:55 So now we can set our VAE and our SD UNET.
-
00:09:58 We don't have any SD UNET yet.
-
00:10:00 Go to the TensorRT.
-
00:10:02 As I said, watch this amazing full tutorial to learn much more about TensorRT.
-
00:10:07 In this tutorial I am just going to show how to install and use it.
-
00:10:11 I will make a 1024 x 1024, batch size 1 static engine.
-
00:10:15 So these are all the settings.
-
00:10:17 Export engine.
-
00:10:18 This export engine duration totally depends on the GPU.
-
00:10:22 For example, on RTX 4090 with this older driver, it took about 3 minutes.
-
00:10:28 Let's see how much time it will take on RTX 3090.
-
00:10:32 During the ONNX file generation, you will not get any messages.
-
00:10:36 You will see this screen.
-
00:10:38 After that it will start generating the TensorRT file that we need.
-
00:10:43 And you will see the progress like this.
-
00:10:45 So the TensorRT generation has been completed.
-
00:10:48 It took 320 seconds and now we can begin using it.
-
00:10:53 So let's go to text to image tab again.
-
00:10:55 We have loaded the last used settings by clicking this icon.
-
00:10:59 Let's make the seed random.
-
00:11:01 Then let's refresh the VAE and UNET from here.
-
00:11:05 I will use the best SDXL VAE FP16.
-
00:11:09 This doesn't bring any speed increase but I am using it so that without using --no-half-vae
-
00:11:17 you can use both SD 1.5 based models and SDXL models.
-
00:11:19 Okay it is selected.
-
00:11:24 After that let's refresh the UNET and you can leave this automatic or select it.
-
00:11:28 I will select it.
-
00:11:30 Okay, let's generate an image, so the first image generation may be slower than the subsequent
-
00:11:35 ones.
-
00:11:36 Okay it per second is 6.24.
-
00:11:40 Let's generate 9 images and see the speed.
-
00:11:44 6.16 it per second.
-
00:11:47 So from 3.64 it per second to 6.17 it per second.
-
00:11:52 How much speed increase does this make?
-
00:11:54 Let's calculate it.
-
00:11:55 So 617 minus 364 over 364.
-
00:12:01 We got about a 69 percent speed increase, about a 70 percent increase, with RTX 3090 on this very old Nvidia driver.
-
00:12:14 So you see this is huge.
-
00:12:16 TensorRT is huge.
-
00:12:18 You just wait several minutes one time, then you can generate images so well, so fast, and
-
00:12:24 you can generate hundreds or thousands of images faster and save a huge amount of time.
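How many images it takes for the one-time engine build to pay for itself can be estimated from the numbers in the video. The sketch below uses the 320 second build time and the 3.64 and 6.17 it per second rates measured on the RTX 3090; the 20 steps per image is an assumption carried over from the earlier RTX 4090 demo, and per-image overhead such as VAE decoding is ignored. The function names are made up for illustration.

```python
import math


def seconds_per_image(steps: int, its_per_second: float) -> float:
    """Sampling time for one image, ignoring any per-image overhead."""
    return steps / its_per_second


def break_even_images(build_seconds: float, steps: int,
                      base_rate: float, trt_rate: float) -> int:
    """Number of images after which the engine build time is recovered."""
    saved = seconds_per_image(steps, base_rate) - seconds_per_image(steps, trt_rate)
    if saved <= 0:
        raise ValueError("the accelerated rate must be higher than the base rate")
    return math.ceil(build_seconds / saved)


rate_gain = (6.17 - 3.64) / 3.64 * 100.0
print(f"Rate increase: {rate_gain:.1f}%")
print("Images needed to recover the build time:",
      break_even_images(build_seconds=320, steps=20, base_rate=3.64, trt_rate=6.17))
```

Under these assumptions the build is recovered after roughly 140 to 150 images, so the benefit grows with every batch beyond that.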
-
00:12:30 This is amazing.
-
00:12:32 Hopefully it will get even better over time.
-
00:12:34 The developer is working very actively.
-
00:12:37 Don't forget to watch this tutorial.
-
00:12:39 Then you can use the advanced setup by unchecking this "use static shapes" option, changing these settings,
-
00:12:45 setting them up according to your needs, and getting the speed-up.
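The difference between a static engine and one built with static shapes unchecked can be pictured as exact values versus allowed ranges. The sketch below is a hypothetical model for illustration only, not the extension's internal representation; the class names and the dynamic range values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    minimum: int
    maximum: int

    def contains(self, value: int) -> bool:
        return self.minimum <= value <= self.maximum


@dataclass(frozen=True)
class EngineProfile:
    width: Range
    height: Range
    batch_size: Range

    def supports(self, width: int, height: int, batch_size: int) -> bool:
        """True if a generation request fits inside this engine's shapes."""
        return (self.width.contains(width)
                and self.height.contains(height)
                and self.batch_size.contains(batch_size))


def static_profile(width: int, height: int, batch_size: int) -> EngineProfile:
    """A static engine accepts exactly one shape: minimum equals maximum."""
    return EngineProfile(Range(width, width), Range(height, height),
                         Range(batch_size, batch_size))


static_sdxl = static_profile(1024, 1024, 1)  # the engine built in the video
dynamic = EngineProfile(Range(768, 1024), Range(768, 1024), Range(1, 4))  # made-up ranges

print(static_sdxl.supports(1024, 1024, 1))
print(static_sdxl.supports(1024, 1024, 2))
print(dynamic.supports(896, 1024, 2))
```

A request outside the profile, such as a different resolution or batch size, needs another engine that covers it, which is why a feature like high resolution fix depends on having engines for the resolutions involved.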
-
00:12:48 Hopefully you have enjoyed it. Please like and subscribe to our channel.
-
00:12:52 The links will be in the description of the video and also in the comment section of the
-
00:12:57 video.
-
00:12:58 We have links here.
-
00:12:59 Hopefully see you in another amazing tutorial video.
