How to Install and Run TensorRT on RunPod Unix Linux for 2x Faster Stable Diffusion Inference Speed

Full tutorial link > https://www.youtube.com/watch?v=eKnMVXVjVoU

Stable Diffusion Gets A Major Boost With RTX Acceleration. One of the most common ways to use Stable Diffusion, the popular Generative AI tool that allows users to produce images from simple text descriptions, is through the Stable Diffusion Web UI by Automatic1111. In today’s Game Ready Driver, NVIDIA added TensorRT acceleration for Stable Diffusion Web UI, which boosts GeForce RTX performance by up to 2X. In this tutorial video I show everything about this new speedup: installing the extension and generating the TensorRT SD UNET on RunPod. The tutorial can also be used on other Unix systems and on local Linux operating systems.

#TensorRT #StableDiffusion #NVIDIA

Automatic Installer Of Tutorial ⤵️

https://www.patreon.com/posts/86438018

Comprehensive TensorRT Main Tutorial ⤵️

https://youtu.be/kvxX6NrPtEk

TensorRT Official GitHub Repo ⤵️

https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT

SECourses Discord To Get Full Support ⤵️

https://discord.com/servers/software-engineering-courses-secourses-772774097734074388

My LinkedIn ⤵️

https://www.linkedin.com/in/furkangozukara/

My Instagram ⤵️

https://www.instagram.com/gozukarafurkan/

My Medium ⤵️

https://medium.com/@furkangozukara

My CivitAI ⤵️

https://civitai.com/user/SECourses

00:00:00 Introduction to speed increase of TensorRT - RTX Acceleration on RunPod & Unix

00:03:10 Image quality comparison of TensorRT on vs TensorRT off for Stable Diffusion XL (SDXL)

00:04:14 How to install TensorRT on RunPod and on local Unix operating systems

00:07:30 How to check your current Nvidia driver on RunPod and on Unix

00:08:10 Extra tips for TensorRT

00:08:45 How to connect / open your Automatic1111 Web UI on RunPod

00:09:27 How to enable quick selection drop down options for VAE and TensorRT UNET

00:10:09 How to generate your first TensorRT model

00:10:19 TensorRT engine generation speed and duration

00:10:55 How to reload last image generation settings quickly

00:11:44 The amount of speed increase on RTX 3090 on RunPod with TensorRT

Video Transcription

00:00:00 Greetings everyone.
00:00:01 At the moment you are seeing two identical pods running on RunPod.
00:00:06 Let me show you their speed difference when generating Stable Diffusion
00:00:13 XL (SDXL) images.
00:00:15 They are exactly the same pods, same GPU.
00:00:18 One of them is running on the default setup.
00:00:22 The other one is using the TensorRT RTX acceleration, which is a newly arrived feature.
00:00:30 So I have got 50 amazing prompts generated with ChatGPT GPT4, as you are seeing right
00:00:37 now.
00:00:38 I have copied it.
00:00:39 Let's go to the prompts from textbox, paste it there, and do the same thing in the other
00:00:45 instance, and paste that and then let's hit generate and let's see the speed.
00:00:51 So on the left one, which is the default pod, it is going to take about 3 minutes. On the
00:00:57 right one, which is the pod using TensorRT,
00:01:01 it is going to take about 1.5 minutes.
00:01:04 You see the speed difference.
00:01:06 It is huge.
00:01:07 So in this tutorial I am going to show you how to install TensorRT on RunPod and also
00:01:14 this installation applies to Unix users because RunPod is using the Ubuntu operating system.
00:01:22 Therefore it's a Unix system.
00:01:23 So if you don't know how to install TensorRT on a Unix system or on RunPod, watch this
00:01:30 tutorial and get amazing speed.
00:01:32 Let me also show you the it per second.
00:01:35 So currently I am generating 1024 x 1024 default resolution of SDXL images with 20 steps.
00:01:42 This is the default pod, which is not using TensorRT, on an RTX 4090 GPU.
00:01:49 And this is the pod using TensorRT, on an RTX 4090 GPU.
00:01:53 The speed is slower than what it should be because this is still using an older Nvidia driver.
00:02:00 Unfortunately there is no way to upgrade the Nvidia driver on the RunPod template at the moment.
00:02:07 The RunPod developers have to upgrade the Nvidia driver, but still the speed difference
00:02:12 is huge, as you are seeing right now.
00:02:14 The image quality is the same; only the speed is much improved.
00:02:20 The VRAM usage is also almost the same, as you are seeing right now.
00:02:24 So the TensorRT version already finished the processing.
00:02:27 The TensorRT pod took only 1 minute, 44 seconds to generate 50 images and the images are the same
00:02:36 as the other pod's.
00:02:38 Let's go to the first image.
00:02:39 Then we will compare that.
00:02:41 The regular pod also finished the processing.
00:02:44 It took 2 minutes, 53 seconds.
00:02:47 So what is the difference?
00:02:49 Let's calculate it.
00:02:50 The regular pod took 173 seconds.
00:02:52 The TensorRT pod took 104 seconds.
00:02:57 So what is the speed difference?
00:02:59 Let's calculate it.
00:03:00 173 minus 104, over 104.
00:03:01 The TensorRT pod is 66 percent faster than the regular pod.
00:03:08 So with just this installation, you gain a 66 percent speed increase.
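The calculation above can be reproduced with a minimal sketch. The function name `percent_faster` is made up for illustration; the two durations are the ones reported in the transcript. Because these are run times, the accelerated time goes in the denominator.

```python
def percent_faster(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Return how much faster the accelerated run is, in percent.

    Durations are compared, so the accelerated (shorter) time is the denominator.
    """
    return (baseline_seconds - accelerated_seconds) / accelerated_seconds * 100.0


# Durations reported in the transcript for generating 50 SDXL images.
regular_pod = 2 * 60 + 53   # 2 minutes 53 seconds = 173 seconds
tensorrt_pod = 1 * 60 + 44  # 1 minute 44 seconds = 104 seconds

print(f"TensorRT pod is {percent_faster(regular_pod, tensorrt_pod):.0f} percent faster")
# prints "TensorRT pod is 66 percent faster"
```

Put differently, the regular pod needs 173 / 104, or about 1.66 times, as long as the TensorRT pod for the same 50 images.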
00:03:14 Is the image quality the same?
00:03:17 Let's also compare them.
00:03:18 So on the left we are seeing the regular pod.
00:03:21 On the right, we are seeing the TensorRT pod.
00:03:24 I used the same seed.
00:03:26 Let's move image by image to see that they are almost exactly the same.
00:03:32 There is a very small difference that comes from xFormers, as you may know.
00:03:37 So you see, and when the Nvidia drivers of the pod template get upgraded, we will get
00:03:44 even better speeds.
00:03:45 On your own computer, by installing TensorRT, you can get even better improvements.
00:03:51 For example, on Windows on RTX 3090 I got a speed increase of over 75 percent.
00:03:59 So if you don't know TensorRT, watch this amazing tutorial.
00:04:02 It is over 40 minutes.
00:04:04 I have explained all of the details of TensorRT in that tutorial.
00:04:09 In this video I will show you how to install TensorRT on Unix.
00:04:13 So let me close my TensorRT pod.
00:04:16 Actually I will close both of them.
00:04:17 I will make a fresh pod so you will see how to make it.
00:04:21 Let's delete them.
00:04:22 Let's go to the Community Cloud and select extreme speed from here.
00:04:26 For demonstration I will use RTX 4090.
00:04:30 Actually, this time let's use RTX 3090 and see the speed difference on that one.
00:04:36 So we are getting our pod, wait until the connect button appears here.
00:04:41 So the pod has been started.
00:04:43 Let's connect, connect to JupyterLab.
00:04:45 Okay it is not ready yet.
00:04:47 Let's refresh until this becomes ready.
00:04:49 This may take a while sometimes.
00:04:51 Sometimes it will show you orange.
00:04:53 Okay, it turned to blue so it should be ready now.
00:04:56 Yeah.
00:04:57 So to follow this tutorial we are going to download the attachments from this amazingly
00:05:02 detailed Patreon post.
00:05:05 For downloading the attachments go to the very bottom, you will see all of the attachments,
00:05:09 let's click all of them one by one and download.
00:05:11 The first thing is that, since we are using the Stable Diffusion Web UI template, you are
00:05:17 seeing here the Stable Diffusion Web UI template, we need to change the relauncher.py,
00:05:23 because it will permanently relaunch your Web UI instance whenever you restart it.
00:05:29 I don't like this behavior.
00:05:30 You see it has a while loop.
00:05:32 So I am going to upload the relauncher.py from my Patreon post.
00:05:37 And after that you will see it becomes like this.
00:05:40 The first time you do this operation, you need to restart your pod so that it will become
00:05:45 effective.
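The transcript does not show the contents of either relauncher.py, so the following is only a sketch of the idea: a launcher with a bounded launch count instead of an endless while loop. The function `run_with_relaunch`, its parameters, and the stand-in command are all hypothetical and are not the template's or the Patreon post's actual code.

```python
import subprocess
import sys
import time


def run_with_relaunch(command, max_launches=1, delay_seconds=2.0):
    """Run `command` and relaunch it after it exits, up to `max_launches` times.

    max_launches=None reproduces the "relaunch forever" behavior of a bare
    while loop; max_launches=1 starts the process exactly once.
    Returns the list of exit codes, one per launch.
    """
    exit_codes = []
    launches = 0
    while max_launches is None or launches < max_launches:
        launches += 1
        print(f"Relauncher: launch {launches}")
        exit_codes.append(subprocess.call(command))
        # Pause only if another launch is going to follow.
        if max_launches is None or launches < max_launches:
            time.sleep(delay_seconds)
    return exit_codes


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; on a pod this would be
    # whatever command the template uses to start the Web UI (not shown here).
    demo_command = [sys.executable, "-c", "print('Web UI would start here')"]
    run_with_relaunch(demo_command, max_launches=1)
```

With `max_launches=1` the Web UI stays stopped once it is stopped, which makes it possible to start it manually from a terminal with different options.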
00:05:46 So I just did restart and after restart, just wait a little bit and then connect to JupyterLab
00:05:52 one more time.
00:05:53 Wait until it becomes available.
00:05:55 It should be pretty quick this time.
00:05:57 So the pod is restarted and we got our pod.
00:06:00 So go back to the workspace, click this icon and upload the install_tensorRT.sh file and the 1_click_auto1111_SDXL.sh
00:06:07 file.
00:06:09 Then in the Patreon post, you will see this command.
00:06:14 First execute this command, this will download the latest VAE file for us.
00:06:19 This doesn't do much at the moment because they have added the accurate SDXL base version
00:06:25 into the template.
00:06:26 So we are just downloading the best VAE files for both SDXL and SD 1.5 based versions.
00:06:35 Then just wait for Stable Diffusion Web UI instance to start.
00:06:39 So the instance has been started.
00:06:41 So let's open another terminal.
00:06:43 And what we are going to do is copy this, like this.
00:06:47 Copy, copy-paste it here.
00:06:49 And this time it will install the latest TensorRT version with its necessary dependencies.
00:06:56 When you use the extension install feature of the Automatic1111 Web UI, it will not work.
00:07:03 I have tested it, even if you restart it.
00:07:06 Even if you remove the skip install, it doesn't work.
00:07:08 So, I have come up with a specific way to install it.
00:07:13 And if you are a Unix user on your computer, you can just edit this install_tensorRT.sh
00:07:21 file, and change the folder paths and use it on your local installation.
00:07:27 The installation is pretty fast actually, as you are seeing right now.
00:07:30 Let me also show you the current Nvidia driver on the RunPod.
00:07:34 It is 525.
00:07:35 This is a super old driver.
00:07:39 The TensorRT developers are suggesting this driver for Linux, which is Unix, 450.
00:07:47 But we are using a very old one.
00:07:49 Therefore, this is why we are not getting the expected speed output from our TensorRT
00:07:55 installation.
00:07:56 So, when this driver gets upgraded, hopefully we will get much better speeds.
00:08:02 On your local computer you can install the latest driver and enjoy it.
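The driver check shown above can also be scripted. This is a minimal sketch that assumes `nvidia-smi` is on the PATH of the pod or local machine; the helper names `driver_major` and `query_driver_version` are made up for illustration.

```python
import shutil
import subprocess


def driver_major(version: str) -> int:
    """Extract the major number from a driver version string such as '525.x.y'."""
    return int(version.strip().split(".")[0])


def query_driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    lines = result.stdout.strip().splitlines()
    # With several GPUs there is one line per GPU; the first one is enough here.
    return lines[0].strip() if result.returncode == 0 and lines else None


version = query_driver_version()
if version is None:
    print("nvidia-smi not found or it returned no driver version")
else:
    print(f"Nvidia driver {version} (major version {driver_major(version)})")
```

Running plain `nvidia-smi` in a terminal shows the same driver version in the header of its output.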
00:08:06 There are also some other tips here.
00:08:09 For example, once you have generated your TensorRT engine with the necessary resolutions, you can use high
00:08:15 resolution fix as well, as you are seeing right now.
00:08:18 There are also some other tips here.
00:08:20 So also read these instructions.
00:08:22 I will put the link of this repository and this Patreon post into the description of
00:08:28 the video and also the comment section of the video.
00:08:31 For Udemy users you will get the attachments in the attachments section.
00:08:36 So let's look at the installation.
00:08:38 Okay, Web UI is starting right now.
00:08:40 The necessary packages have been installed and Web UI started as you are seeing right
00:08:45 now.
00:08:46 So let's connect it.
00:08:47 For connecting I prefer the Connect button.
00:08:48 Connect to HTTP service on port 3001.
00:08:52 So we got our default installation.
00:08:54 We haven't used TensorRT yet.
00:08:56 Let's try one of the prompts here and see the speed.
00:08:59 This is RTX 3090.
00:09:01 Okay so let's make this, let's make the batch count 3.
00:09:05 This will generate images one by one.
00:09:07 So this is not batch size.
00:09:09 Let's generate 3 images and look at the it per second.
00:09:12 So we are getting about 3.64.
00:09:15 3.65 it per second.
00:09:18 Okay 3.65 it per second.
00:09:23 We are getting the images and 3.65.
00:09:26 Yes this is our it per second right now.
00:09:29 First of all go to the settings and in here go to the user interface, from here select
00:09:36 SD_VAE and SD_UNET here as you are seeing right now.
00:09:40 Apply settings, reload UI.
00:09:42 This is important.
00:09:43 This will reload the UI as you are seeing right now.
00:09:46 Then we will have two quick selection boxes which we will use.
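The same two quick selection boxes can probably also be enabled by editing the Web UI's config.json while the Web UI is stopped. The sketch below is an assumption, not something shown in the video: it assumes the setting is stored under the key `quicksettings_list` and that the option names are `sd_vae` and `sd_unet`. The helper `add_quicksettings` is hypothetical, and the demo writes to a temporary file rather than a real installation.

```python
import json
import tempfile
from pathlib import Path


def add_quicksettings(config_path, names=("sd_vae", "sd_unet"), key="quicksettings_list"):
    """Append `names` to the quick settings list in a JSON config, without duplicates.

    The key name and the option names are assumptions; check your own config.json.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    current = list(config.get(key, ["sd_model_checkpoint"]))
    for name in names:
        if name not in current:
            current.append(name)
    config[key] = current
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return current


# Demo on a throwaway file, so no real installation is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_config = Path(tmp) / "config.json"
    demo_config.write_text(json.dumps({"quicksettings_list": ["sd_model_checkpoint"]}), encoding="utf-8")
    print(add_quicksettings(demo_config))
    # prints "['sd_model_checkpoint', 'sd_vae', 'sd_unet']"
```

Using the settings page as shown in the video remains the safer route, because the Web UI validates the values itself.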
00:09:52 Okay it is getting reloaded.
00:09:54 Okay.
00:09:55 So now we can set our VAE and our SD UNET.
00:09:58 We don't have any SD UNET yet.
00:10:00 Go to the TensorRT.
00:10:02 As I said, watch this amazing full tutorial to learn much more about TensorRT.
00:10:07 In this tutorial I am just going to show how to install and use it.
00:10:11 I will make a 1024 x 1024, batch size 1 static engine.
00:10:15 So these are all the settings.
00:10:17 Export engine.
00:10:18 This export engine duration totally depends on the GPU.
00:10:22 For example, on RTX 4090 with this older driver, it took about 3 minutes.
00:10:28 Let's see how much time it will take on RTX 3090.
00:10:32 During the ONNX file generation, you will not get any messages.
00:10:36 You will see this screen.
00:10:38 After that it will start generating the TensorRT file that we need.
00:10:43 And you will see the progress like this.
00:10:45 So the TensorRT generation has been completed.
00:10:48 It took 320 seconds and now we can begin using it.
00:10:53 So let's go to text to image tab again.
00:10:55 We have loaded the last used settings by clicking this icon.
00:10:59 Let's make the seed random.
00:11:01 Then let's refresh the VAE and UNET from here.
00:11:05 I will use the best SDXL VAE FP16.
00:11:09 This doesn't bring any speed increase but I am using it so that without using --no-half-vae
00:11:17 you can use both SD 1.5 based models and SDXL models.
00:11:19 Okay it is selected.
00:11:24 After that let's refresh the UNET and you can leave this automatic or select it.
00:11:28 I will select it.
00:11:30 Okay, let's generate an image. The first image generation may be slower than the subsequent
00:11:35 ones.
00:11:36 Okay it per second is 6.24.
00:11:40 Let's generate 9 images and see the speed.
00:11:44 6.16 it per second.
00:11:47 So from 3.64 it to 6.17 it per second.
00:11:52 How much speed increase does this make?
00:11:54 Let's calculate it.
00:11:55 So 617 minus 364 over 364.
00:12:01 We got about a 69 percent speed increase, about a 70 percent increase, with RTX 3090 on this very old Nvidia driver.
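Here the comparison is between rates (it per second) rather than durations, so the baseline rate goes in the denominator. The sketch below repeats the calculation and adds a rough per-image estimate. The helper names are hypothetical, and the 20 steps per image is an assumption carried over from the earlier RTX 4090 run; the estimate ignores VAE decoding and other overhead.

```python
def percent_increase(before_its: float, after_its: float) -> float:
    """Percent increase in throughput, with the baseline rate as the denominator."""
    return (after_its - before_its) / before_its * 100.0


def seconds_per_image(its: float, steps: int = 20) -> float:
    """Rough sampling time per image; steps=20 is an assumption, overhead is ignored."""
    return steps / its


before, after = 3.64, 6.17  # it per second on the RTX 3090, without and with TensorRT

print(f"Speed increase: {percent_increase(before, after):.1f} percent")
# prints "Speed increase: 69.5 percent"
print(f"Without TensorRT: about {seconds_per_image(before):.1f} s per image")
# prints "Without TensorRT: about 5.5 s per image"
print(f"With TensorRT: about {seconds_per_image(after):.1f} s per image")
# prints "With TensorRT: about 3.2 s per image"
```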
00:12:14 So you see this is huge.
00:12:16 TensorRT is huge.
00:12:18 You just wait several minutes one time, then you can generate images so well, so fast, and
00:12:24 you can generate hundreds or thousands of images faster and save a huge amount of time.
00:12:30 This is amazing.
00:12:32 Hopefully it will get even better over time.
00:12:34 The developer is working very actively.
00:12:37 Don't forget to watch this tutorial.
00:12:39 Then you can use the advanced setup by unchecking this use static shapes option, change these settings,
00:12:45 set them up according to your needs and get the speed up.
00:12:48 Hopefully you have enjoyed it. Please like and subscribe to our channel.
00:12:52 The links will be in the description of the video and also in the comment section of the
00:12:57 video.
00:12:58 We have links here.
00:12:59 Hopefully see you in another amazing tutorial video.
