
Testing Stable Diffusion Inference Performance with Latest NVIDIA Driver including TensorRT ONNX

FurkanGozukara edited this page Oct 19, 2025 · 1 revision


🚀 UNLOCK INSANE SPEED BOOSTS with NVIDIA's Latest Driver Update or not? 🚀 Are you ready to turbocharge your AI performance? Watch me compare the brand-new NVIDIA 555 driver against the older 552 driver on an RTX 3090 TI for #StableDiffusion. Discover how TensorRT and ONNX models can skyrocket your speed! Don't miss out on these game-changing results!

1-Click fresh Automatic1111 SD Web UI Installer Script with TensorRT and more ⤵️

https://www.patreon.com/posts/86307255

00:00:00 Introduction to the NVIDIA newest driver update performance boost claims

00:00:25 What I am going to test and compare in this video

00:01:11 How to install latest version of Automatic1111 Web UI

00:01:40 The very best sampler of Automatic1111 for Stable Diffusion image generation / inference

00:01:57 Automatic1111 SD Web UI default installation versions

00:02:12 RTX 3090 TI image generation / inference speed for SDXL model with default Automatic1111 SD Web UI installation

00:02:22 How to see your NVIDIA driver version and many more info with nvitop library

00:02:40 Default installation speed for NVIDIA 551.23 driver

00:02:53 How to update Automatic1111 SD Web UI to the latest Torch and xFormers

00:03:05 Which CPU and RAM used to conduct these speed tests CPU-Z results

00:03:54 nvitop status while generating an image with Stable Diffusion XL (SDXL) on Automatic1111 Web UI

00:04:10 The new generation speed after updating Torch (2.3.0) and xFormers (0.0.26) to the latest version

00:04:20 How to install TensorRT extension on Automatic1111 SD Web UI

00:05:28 How to generate a TensorRT ONNX model for huge speed up during image generation / inference

00:06:39 How to enable SD Unet selection to be able to use TensorRT generated model

00:07:13 TensorRT pros and cons

00:07:38 TensorRT image generation / inference speed results

00:08:09 How to download and install the latest NVIDIA driver properly and cleanly on Windows

00:09:03 Repeating all the testing again on the newest NVIDIA driver (555.85)

00:10:06 Comparison of other optimizations such as SDP attention or Doggettx

00:10:35 Conclusion of the tutorial

NVIDIA's Latest Driver: Does It Really Deliver?

In this video, we dive deep into NVIDIA's newest driver update, comparing the performance of driver versions 552 and 555 on an RTX 3090 TI running Windows 10. We'll explore the claims of speed improvements, particularly with #ONNX runtime and TensorRT integration, using the popular Automatic1111 Web UI.

What You'll Learn:

Driver Comparison: Direct performance comparison between NVIDIA drivers 552 and 555.

Setup and Installation: Step-by-step guide on setting up a fresh #Automatic1111 Web UI installation, including the latest versions of Torch and xFormers.

ONNX and TensorRT Models: Detailed testing of default and TensorRT-generated models to measure speed differences.

Hardware Specifications: Insights into the hardware used for testing, including CPU and memory configurations.
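The video reads the driver version off nvitop and the `nvidia-smi` command. As a minimal sketch (not from the video), here is how the CSV output of `nvidia-smi --query-gpu=driver_version,name --format=csv,noheader` could be parsed; the sample string mirrors the values shown on screen, and the helper name is made up for illustration.

```python
# Hypothetical helper: parse one line of
#   nvidia-smi --query-gpu=driver_version,name --format=csv,noheader
# into a (driver_version, gpu_name) tuple. The sample below mirrors
# the video's setup (driver 551.23, RTX 3090 Ti).

def parse_gpu_query(csv_line: str) -> tuple[str, str]:
    # Split only on the first comma so commas in the GPU name survive.
    driver_version, gpu_name = (field.strip() for field in csv_line.split(",", 1))
    return driver_version, gpu_name

sample = "551.23, NVIDIA GeForce RTX 3090 Ti"
driver, name = parse_gpu_query(sample)
print(driver, "-", name)  # → 551.23 - NVIDIA GeForce RTX 3090 Ti
```

Running the real query of course requires an NVIDIA GPU and driver; the parsing itself does not.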

Testing Procedure:

Initial Setup:

Fresh installation using a custom installer script which includes necessary models and styles.

Initial speed test with default settings and configurations.

Driver 552 Performance:

Speed testing on driver 552 with default models and configurations.

Detailed performance metrics and image generation speed analysis.

Upgrading to Latest Torch and xFormers:

Updating to the latest versions of Torch (2.3.0) and xFormers (0.0.26).

Performance testing after updates and comparison with initial setup.
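The update step above can be sanity-checked programmatically. A minimal sketch, assuming dotted-integer version strings (real Torch builds often carry local suffixes like `+cu121`, which this strips); the target numbers are the ones the video installs:

```python
# Minimal sketch: check whether installed versions meet the targets
# used in the video (Torch 2.3.0, xFormers 0.0.26). Assumes dotted
# integer version cores; not a full PEP 440 parser.

def version_tuple(version: str) -> tuple[int, ...]:
    core = version.split("+")[0]  # drop local suffixes such as "+cu121"
    return tuple(int(part) for part in core.split("."))

def meets_target(installed: str, target: str) -> bool:
    # Tuple comparison handles (2, 3, 0) >= (2, 1, 2) and similar.
    return version_tuple(installed) >= version_tuple(target)

print(meets_target("2.3.0+cu121", "2.3.0"))  # → True
print(meets_target("0.0.23", "0.0.26"))      # → False
```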

TensorRT Installation and Testing:

Installing TensorRT extension and generating TensorRT models.

Overcoming common installation errors and optimizations.

Speed testing with TensorRT models and analysis of performance improvements.
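For reference, the dynamic-shape profile chosen in the video, written out as a plain dict. The key names here are made up for clarity; they are not the TensorRT extension's actual field names:

```python
# Illustrative only: the export settings used in the video.
# min/opt/max batch sizes of 1/1/4 and an optimal prompt length of
# 150 tokens with a maximum of 225. A wider max range makes the
# engine more flexible but can cost some peak speed versus a fully
# static profile.

export_profile = {
    "batch_size": {"min": 1, "opt": 1, "max": 4},
    "prompt_tokens": {"opt": 150, "max": 225},
}

# Sanity check: min <= opt <= max must hold for a valid profile.
bs = export_profile["batch_size"]
assert bs["min"] <= bs["opt"] <= bs["max"]
print(export_profile)
```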

Upgrading to Driver 555:

Step-by-step guide on downloading and installing NVIDIA driver 555.

Performance comparison between driver 552 and 555.

Analyzing the impact on speed and efficiency.

Results and Conclusions:

Performance Metrics: Detailed analysis of speed improvements (or lack thereof) with the newest NVIDIA driver.

TensorRT Benefits: How TensorRT models significantly boost performance.

Driver Update Impact: Understanding the real-world impact of updating to the latest NVIDIA driver.
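The percentages quoted in the video come from a simple relative-change calculation over the measured it/s numbers (3.41 it/s baseline, 5.51 it/s with TensorRT, 3.27 it/s after the driver update). A quick sketch:

```python
# Relative speed change from the it/s figures measured in the video.

def speedup_percent(baseline_its: float, new_its: float) -> float:
    """Relative speed change in percent: (new - old) / old * 100."""
    return (new_its - baseline_its) / baseline_its * 100.0

# TensorRT engine vs the default pipeline on the older driver:
print(round(speedup_percent(3.41, 5.51), 1))  # → 61.6 (the video rounds to 61.5%)

# Default pipeline before vs after updating to driver 555.85:
print(round(speedup_percent(3.41, 3.27), 1))  # → -4.1 (a small regression)
```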

Video Transcription

  • 00:00:00 NVIDIA claims that the newest driver brings huge speed improvements when you are using AI.

  • 00:00:06 It is claimed that the newest driver brings huge  performance with ONNX runtime. Automatic1111 Web  

  • 00:00:13 UI supports ONNX models with TensorRT. So today I  am going to compare this newest driver with basic  

  • 00:00:21 installation and also TensorRT ONNX models. I am  going to do testing on the RTX 3090 TI on Windows  

  • 00:00:30 10. I am going to compare NVIDIA drivers 552 vs  555, which is the latest driver. All tests are  

  • 00:00:38 compared on both drivers. I am going to do testing  on fresh Automatic1111 Web UI installation. I  

  • 00:00:45 am going to test the speed with the latest torch  version and the xFormers version. Moreover, I will  

  • 00:00:52 install TensorRT and test and repeat the testing  on TensorRT generated model, and we will see the  

  • 00:01:00 speed differences between older driver, newer  driver, between default and TensorRT model.  

  • 00:01:07 Make sure to watch the entire tutorial because  it is super important. For fresh installation,  

  • 00:01:12 I will use my installer script. So let's use this  folder. Extract here. Just let's install. This  

  • 00:01:19 installer will install everything automatically  for us, including downloading the VAE fixed SDXL  

  • 00:01:26 base model and the very best styles. So the  installation has been completed and the Web  

  • 00:01:32 UI started. Let's see the default downloaded  models. Let's try the default speed. Okay,  

  • 00:01:38 photo of an amazing sports car. The very best  sampler that I am finding is UniPC. Let's make it  

  • 00:01:44 40 steps. Change the resolution to the default  resolution. So initially I will do a warm-up  

  • 00:01:51 generation. Then I will generate four images to  see the speed. This is the default installation.  

  • 00:01:57 You see version 1.9.3, Python 3.10.11,  Torch version is 2.1.2, xFormers is 0.0.23  

  • 00:02:06 and the image is generated. To see the speed, I  will pause the video and generate images. Okay,  

  • 00:02:12 four images are generated. The IT per second is  3.42. So what is my GPU and my driver right now?  

  • 00:02:21 To show you that, I will use nvitop. You can use  this with pip install nvitop. My driver version is  

  • 00:02:30 551.23, CUDA version is 12.4. The GPU model  is not shown in the nvitop. So this is the GPU  

  • 00:02:38 version that I have. The nvidia-smi command shows a 3090 TI. So with default fresh installation, the speed is  

  • 00:02:44 3.42 for this driver. Now I am going to update  my installation to the latest Torch version and  

  • 00:02:52 xFormers. To do that, I will use this .bat file.  It will update my installation to the latest.  

  • 00:03:00 For the speed comparison, the CPU also matters. This is my CPU, the 13900K. This is the frequency it is  

  • 00:03:07 running at right now. Also, my memory is 64 gigabytes of 2500 megahertz DDR4. The version updater will  

  • 00:03:17 show you this error, but it is not important because it will be fixed automatically. Yes, it's fixed  

  • 00:03:22 and the installation completed. So let's start  again our web UI with this .bat file. You can also  

  • 00:03:30 start it from the web UI user .bat file here.  Okay, the latest version started. You will also  

  • 00:03:35 notice that this updater installed the latest version of one of the very best extensions, After Detailer (ADetailer).  

  • 00:03:42 So now the Torch version is 2.3.0 and xFormers  is 0.0.26. So let's generate the warmup image  

  • 00:03:50 with the same settings. Let's set them up and  let's generate. Let's see the nvitop status while  

  • 00:03:56 generating. You see, these are the parameters nvitop displays: the watt usage, the GPU utilization,  

  • 00:04:02 the memory utilization and the warmup image is  generated. Now let's try the actual speed. The  

  • 00:04:09 generation speed looks unchanged. You see the same speed. Probably for this NVIDIA driver  

  • 00:04:16 to be effective, we need TensorRT. So now I  am going to install TensorRT extension and  

  • 00:04:22 generate TensorRT model to calculate its speed.  And after this, I will update my NVIDIA driver  

  • 00:04:30 to the latest version, and I will repeat the  test. So we will see the speed difference. Okay,  

  • 00:04:36 the TensorRT installed. Then let's restart the web  UI. In the initial run of TensorRT installation,  

  • 00:04:44 it may take a while. Just patiently wait.  Okay, during the initial installation,  

  • 00:04:48 you will also get this error. Unfortunately,  TensorRT developers still didn't fix this error,  

  • 00:04:54 but we will fix it. So just click OK to  these errors. Okay, okay, okay, okay.

  • 00:05:00 Then it will start and the web UI started with  TensorRT. Now I will show you how to fix those  

  • 00:05:05 annoying pop-ups. So close the web UI, then run  the TensorRT installer one more time like this.  

  • 00:05:13 And it is done. Then start the web UI again. So  after doing that, you will not see that error  

  • 00:05:19 anymore while starting your web UI. You see it  is getting started. No errors and started. For  

  • 00:05:26 TensorRT to work, we need to generate a TensorRT engine. Make sure that you have selected the model  

  • 00:05:30 for which you want to generate TensorRT. I will use this batch size one static preset. Actually,  

  • 00:05:37 I am going to change this. So the min batch  size one, optimal batch size one, let's say  

  • 00:05:42 maximum batch size four. It depends on you. You  can set the minimum height, whichever you want  

  • 00:05:47 like this. Okay, let's also set this like this and  prompt. This is important. Let's make the optimal  

  • 00:05:54 prompt like 150 and let's make the maximum 225.  Okay, so then export engine. To see the status,  

  • 00:06:03 you should check out the CMD window. It may take  a while depending on your GPU model. In the CMD  

  • 00:06:10 window, it doesn't show anything. However, in  the nvitop, I can see that it is using GPU.  

  • 00:06:16 GPU utilization increases or decreases. So it is  working. You can also see the memory usage of the  

  • 00:06:21 GPU. After a while, you will start seeing messages  like this. It is working and progressing. Okay,  

  • 00:06:27 the model generation has been completed. You can  see that TensorRT engines have been saved to the  

  • 00:06:33 disk. It took around six minutes, a little  bit longer perhaps. And we can see exported  

  • 00:06:39 successfully. To be able to use it, we need to  go to the settings. And in here search for quick,  

  • 00:06:45 you will see this part and type here Unet.  And you will see SD Unet apply settings.  

  • 00:06:53 Don't worry, you will also get this message  and reload UI. Now we have SD Unet and in here  

  • 00:06:59 we can select none, which will use the default  Unet of the model or we can use the Unet of the  

  • 00:07:06 Realistic Vision which we are going to use.  This is the TensorRT model. And let's use  

  • 00:07:11 the same prompt. This doesn't change quality, but  this increases the speed significantly. However,  

  • 00:07:18 this takes time to compile, but compiling is only  one time and it is ready. And let's generate the  

  • 00:07:24 initial warm-up. We can see that it is loading the  TensorRT Unet and the generation started. So this  

  • 00:07:30 is the initial warm-up generation while recording  video. Okay, now it is time to test. So I'm going  

  • 00:07:36 to make batch count four. So the test has been  completed. We can see that now we are getting  

  • 00:07:41 5.51 it per second. This is almost the speed of a 4090. The speed increase is (5.51 minus 3.41)  

  • 00:07:53 over 3.41. We got a 61.5% speed increase without any quality change, loss, or difference.  

  • 00:08:04 Now I am going to install the NVIDIA driver to see  the difference. I am going to install the latest  

  • 00:08:10 version of NVIDIA driver. So how are we going to  download it? Download NVIDIA drivers, because I  

  • 00:08:16 see that some people are having issues. So go to  here, select your GPU model. It is this one. And  

  • 00:08:23 yes, I am using the Game Ready Driver. Let's search; I think it works better. Download, and the  

  • 00:08:28 download should start. If it doesn't start, we need to click here. Download, and  

  • 00:08:33 the download started. So from driver version 551, I  am going to 555. Okay, it is downloaded. So all we  

  • 00:08:40 need to do is run as administrator. Click Yes,  when it is asking and click OK. I am going to  

  • 00:08:46 show you the selection that I am making NVIDIA  Graphics Driver, agree and continue. Custom, advanced  

  • 00:08:53 and I am going to perform a clean installation.  Now I will just do next. But before doing that,  

  • 00:08:58 I am going to turn off the video recording and I  will restart the computer and we will return back.  

  • 00:09:03 Okay, so I have installed the latest driver and restarted the system. You can see the newest  

  • 00:09:09 driver version (555.85) and the CUDA version here. And let's make a warm-up test. And let's see the results. So  

  • 00:09:16 after the warm-up, I have generated four images  and we can see that we have a significant speed  

  • 00:09:22 drop from 3.41 it per second to 3.27 it per  second. Let's see the TensorRT speed because  

  • 00:09:31 this is what I wonder. So let's just change the  batch size to one and do the initial warm-up.  

  • 00:09:37 You should watch the messages here. When doing the TensorRT optimization, xFormers gets  

  • 00:09:43 disabled, so TensorRT does the work. And yes, we can see the speed right now. So here are the speeds 

  • 00:09:50 after generating four images. We got a speed  drop after the newest drivers. NVIDIA is  

  • 00:09:56 disappointing us. I don't know why. Before ending  this video, I will also test the optimization of  

  • 00:10:04 the PyTorch itself. So you see there are different  optimizations here. Let's try the SDP. This is  

  • 00:10:12 the optimization of the PyTorch as far as I  know. Let's see the speed of generation. OK,  

  • 00:10:18 this one is even slower than xFormers. I'm  going to test these other ones as well. Let's  

  • 00:10:24 see which one will perform best. This other one  was even slower. Let's see the Doggettx. This is  

  • 00:10:31 the latest one. So this is it. It is not always  best to upgrade NVIDIA drivers. Unfortunately,  

  • 00:10:38 currently, this new driver didn't bring me any speed increase for some reason. If you get a speed  

  • 00:10:44 increase, let me know. But the TensorRT is working  amazingly. So you should use it if you are going  

  • 00:10:51 to generate a lot of images on the same model.  Hopefully, see you in another amazing tutorial.
