Testing Stable Diffusion Inference Performance with Latest NVIDIA Driver including TensorRT ONNX
Full tutorial link > https://www.youtube.com/watch?v=TNR2HZRw74E
🚀 UNLOCK INSANE SPEED BOOSTS with NVIDIA's Latest Driver Update or not? 🚀 Are you ready to turbocharge your AI performance? Watch me compare the brand-new NVIDIA 555 driver against the older 552 driver on an RTX 3090 TI for #StableDiffusion. Discover how TensorRT and ONNX models can skyrocket your speed! Don't miss out on these game-changing results!
1-Click fresh Automatic1111 SD Web UI Installer Script with TensorRT and more
https://www.patreon.com/posts/86307255
00:00:00 Introduction to NVIDIA's newest driver update and its performance boost claims
00:00:25 What I am going to test and compare in this video
00:01:11 How to install the latest version of Automatic1111 Web UI
00:01:40 The very best sampler of Automatic1111 for Stable Diffusion image generation / inference
00:01:57 Automatic1111 SD Web UI default installation versions
00:02:12 RTX 3090 TI image generation / inference speed for SDXL model with default Automatic1111 SD Web UI installation
00:02:22 How to see your NVIDIA driver version and much more info with the nvitop library
00:02:40 Default installation speed for NVIDIA 551.23 driver
00:02:53 How to update Automatic1111 SD Web UI to the latest Torch and xFormers
00:03:05 Which CPU and RAM were used to conduct these speed tests (CPU-Z results)
00:03:54 nvitop status while generating an image with Stable Diffusion XL (SDXL) on Automatic1111 Web UI
00:04:10 The new generation speed after updating Torch (2.3.0) and xFormers (0.0.26) to the latest version
00:04:20 How to install TensorRT extension on Automatic1111 SD Web UI
00:05:28 How to generate a TensorRT ONNX model for a huge speed-up during image generation / inference
00:06:39 How to enable SD Unet selection to be able to use the TensorRT-generated model
00:07:13 TensorRT pros and cons
00:07:38 TensorRT image generation / inference speed results
00:08:09 How to download and install the latest NVIDIA driver properly and cleanly on Windows
00:09:03 Repeating all the testing again on the newest NVIDIA driver (555.85)
00:10:06 Comparison of other optimizations such as SDP attention or Doggettx
00:10:35 Conclusion of the tutorial
NVIDIA's Latest Driver: Does It Really Deliver?
In this video, we dive deep into NVIDIA's newest driver update, comparing the performance of driver versions 552 and 555 on an RTX 3090 TI running Windows 10. We'll explore the claims of speed improvements, particularly with #ONNX runtime and TensorRT integration, using the popular Automatic1111 Web UI.
What You'll Learn:
Driver Comparison: Direct performance comparison between NVIDIA drivers 552 and 555.
Setup and Installation: Step-by-step guide on setting up a fresh #Automatic1111 Web UI installation, including the latest versions of Torch and xFormers.
ONNX and TensorRT Models: Detailed testing of default and TensorRT-generated models to measure speed differences.
Hardware Specifications: Insights into the hardware used for testing, including CPU and memory configurations.
Testing Procedure:
Initial Setup:
Fresh installation using a custom installer script which includes necessary models and styles.
Initial speed test with default settings and configurations.
Driver 552 Performance:
Speed testing on driver 552 with default models and configurations.
Detailed performance metrics and image generation speed analysis.
Upgrading to Latest Torch and xFormers:
Updating to the latest versions of Torch (2.3.0) and xFormers (0.0.26).
Performance testing after updates and comparison with initial setup.
TensorRT Installation and Testing:
Installing TensorRT extension and generating TensorRT models.
Overcoming common installation errors and applying optimizations.
Speed testing with TensorRT models and analysis of performance improvements.
Upgrading to Driver 555:
Step-by-step guide on downloading and installing NVIDIA driver 555.
Performance comparison between driver 552 and 555.
Analyzing the impact on speed and efficiency.
Results and Conclusions:
Performance Metrics: Detailed analysis of speed improvements (or lack thereof) with the newest NVIDIA driver.
TensorRT Benefits: How TensorRT models significantly boost performance.
Driver Update Impact: Understanding the real-world impact of updating to the latest NVIDIA driver.
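If you want to reproduce the speed measurements from this comparison programmatically instead of reading them off the Web UI, the sketch below shows one way to do it over the Web UI's API. This is not part of the video: it assumes the Web UI was launched with the --api flag on the default local address, and the it/s it reports is a rough end-to-end figure, so it will not exactly match the progress-bar numbers shown on screen.

```python
import time
import requests

URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"  # Web UI started with --api (assumed default address)
payload = {
    "prompt": "photo of an amazing sports car",
    "steps": 40,
    "sampler_name": "UniPC",
    "width": 1024,
    "height": 1024,
    "batch_size": 1,
}

requests.post(URL, json=payload, timeout=600)   # warm-up generation, result discarded

start = time.time()
n_images = 4
for _ in range(n_images):
    requests.post(URL, json=payload, timeout=600)
elapsed = time.time() - start

# Rough throughput: sampling steps completed per second across the whole run.
print(f"{n_images * payload['steps'] / elapsed:.2f} it/s (end-to-end, includes overhead)")
```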
00:00:00 NVIDIA claims that the newest driver brings huge speed improvements when you are using AI.
00:00:06 It is claimed that the newest driver brings huge performance gains with the ONNX runtime. Automatic1111 Web
00:00:13 UI supports ONNX models with TensorRT. So today I am going to compare this newest driver with a basic
00:00:21 installation and also with TensorRT ONNX models. I am going to do the testing on an RTX 3090 TI on Windows
00:00:30 10. I am going to compare NVIDIA driver 552 vs 555, which is the latest driver. All tests are
00:00:38 run on both drivers. I am going to do the testing on a fresh Automatic1111 Web UI installation. I
00:00:45 am going to test the speed with the latest Torch and xFormers versions. Moreover, I will
00:00:52 install TensorRT and repeat the testing on a TensorRT-generated model, and we will see the
00:01:00 speed differences between the older driver and the newer driver, and between the default and TensorRT models.
00:01:07 Make sure to watch the entire tutorial because it is super important. For the fresh installation,
00:01:12 I will use my installer script. So let's use this folder. Extract here. Just install. This
00:01:19 installer will install everything automatically for us, including downloading the VAE-fixed SDXL
00:01:26 base model and the very best styles. So the installation has been completed and the Web
00:01:32 UI has started. Let's see the default downloaded models. Let's try the default speed. Okay,
00:01:38 photo of an amazing sports car. The very best sampler that I have found is UniPC. Let's make it
00:01:44 40 steps. Change the resolution to the default resolution. So initially I will do a warm-up
00:01:51 generation. Then I will generate four images to see the speed. This is the default installation.
00:01:57 You see version 1.9.3, Python 3.10.11, the Torch version is 2.1.2, and xFormers is 0.0.23,
00:02:06 and the image is generated. To see the speed, I will pause the video and generate the images. Okay,
00:02:12 four images are generated. The it/s (iterations per second) is 3.42. So what are my GPU and my driver right now?
00:02:21 To show you that, I will use nvitop. You can install it with pip install nvitop. My driver version is
00:02:30 551.23 and the CUDA version is 12.4. The GPU model is not shown in nvitop, so to show the GPU
00:02:38 that I have, I use the nvidia-smi command: a 3090 TI.
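As a side note not shown in the video, the same driver, GPU, and utilization details that nvitop displays can also be read programmatically through NVML. A minimal sketch using the pynvml bindings (installed with pip install nvidia-ml-py); these are standard NVML calls, not part of the Web UI:

```python
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
print("Driver version:", pynvml.nvmlSystemGetDriverVersion())

handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
print("GPU:", pynvml.nvmlDeviceGetName(handle))

mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000  # reported in milliwatts
print(f"VRAM used: {mem.used / 1024**3:.1f} / {mem.total / 1024**3:.1f} GiB")
print(f"GPU utilization: {util.gpu}%  |  Power: {power_w:.0f} W")

pynvml.nvmlShutdown()
```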
00:02:44 So with the default fresh installation, the speed is 3.42 it/s for this driver. Now I am going to update my installation to the latest Torch and
00:02:52 xFormers. To do that, I will use this .bat file. It will update my installation to the latest versions.
00:03:00 For the speed comparison, the CPU also matters. This is my CPU, a 13900K. This is the frequency it is
00:03:07 running at right now. Also, my memory is 64 gigabytes of 2500 MHz DDR4. The version updater will
00:03:17 show you this error, but it is not important because it will be fixed automatically. Yes, it is fixed
00:03:22 and the installation is completed. So let's start our Web UI again with this .bat file. You can also
00:03:30 start it from the webui-user .bat file here. Okay, the latest version has started. You will also
00:03:35 notice that this updater installed the latest version of one of the very best extensions, After Detailer (ADetailer).
00:03:42 So now the Torch version is 2.3.0 and xFormers is 0.0.26.
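If you want to confirm the versions from the Web UI's own Python environment rather than from the console banner, a quick check (not shown in the video) looks like this. Run it with the Web UI's own interpreter (for example venv\Scripts\python.exe) so it sees the same packages:

```python
import torch
import xformers

print("Torch:", torch.__version__)          # e.g. 2.3.0
print("xFormers:", xformers.__version__)    # e.g. 0.0.26
print("CUDA (built against):", torch.version.cuda)
print("GPU:", torch.cuda.get_device_name(0))
```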
00:03:50 So let's generate the warm-up image with the same settings. Let's set them up and generate. Let's see the nvitop status while
00:03:56 generating. You can see the parameters nvitop displays: the wattage, the GPU utilization,
00:04:02 the memory utilization, and the warm-up image is generated. Now let's try the actual speed. The
00:04:09 generation speed looks unchanged. You see the same speed. Probably, for this NVIDIA driver
00:04:16 to be effective, we need TensorRT. So now I am going to install the TensorRT extension and
00:04:22 generate a TensorRT model to measure its speed. After this, I will update my NVIDIA driver
00:04:30 to the latest version and repeat the test, so we will see the speed difference. Okay,
00:04:36 the TensorRT extension is installed. Then let's restart the Web UI. The initial run after installing TensorRT
00:04:44 may take a while; just wait patiently. Okay, during the initial installation,
00:04:48 you will also get this error. Unfortunately, the TensorRT developers still haven't fixed this error,
00:04:54 but we will fix it. So just click OK on these errors. Okay, okay, okay, okay.
00:05:00 Then it will start, and the Web UI has started with TensorRT. Now I will show you how to fix those
00:05:05 annoying pop-ups. So close the Web UI, then run the TensorRT installer one more time like this.
00:05:13 And it is done. Then start the Web UI again. After doing that, you will not see that error
00:05:19 anymore while starting your Web UI. You see it is getting started: no errors, and it has started. For
00:05:26 TensorRT to work, we need to generate a TensorRT engine. Make sure that you have selected the model
00:05:30 for which you want to generate the TensorRT engine. I will use this "batch size one static" preset. Actually,
00:05:37 I am going to change this. So, minimum batch size one, optimal batch size one, and let's say
00:05:42 maximum batch size four. It depends on you. You can set the minimum height to whatever you want,
00:05:47 like this. Okay, let's also set this like this, and the prompt length. This is important. Let's make the optimal
00:05:54 prompt length 150 and the maximum 225 (the Web UI handles prompts in 75-token chunks, so these correspond to two and three chunks). Okay, so then Export Engine. To see the status,
00:06:03 you should check the CMD window. It may take a while depending on your GPU model. In the CMD
00:06:10 window it doesn't show anything at first. However, in nvitop I can see that it is using the GPU:
00:06:16 the GPU utilization increases and decreases, so it is working. You can also see the memory usage of the
00:06:21 GPU. After a while, you will start seeing messages like this. It is working and progressing. Okay,
00:06:27 the model generation has been completed. You can see that the TensorRT engines have been saved to
00:06:33 disk. It took around six minutes, perhaps a little longer. And we can see "exported
00:06:39 successfully". To be able to use it, we need to go to the Settings. In here, search for "quick";
00:06:45 you will see this section. Type "Unet" here, and you will see SD Unet. Apply the settings.
00:06:53 Don't worry, you will also get this message; then reload the UI.
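For reference (not covered in the video), the same quick-settings change can be made by editing config.json in the Web UI folder while the Web UI is closed. A minimal sketch; the key name quicksettings_list and the option name sd_unet are assumptions based on recent Web UI versions and may differ in yours:

```python
import json
from pathlib import Path

config_path = Path("config.json")  # assumed to be in the stable-diffusion-webui folder
config = json.loads(config_path.read_text(encoding="utf-8"))

# "quicksettings_list" holds the options shown at the top of the UI (assumed key name).
quick = config.setdefault("quicksettings_list", ["sd_model_checkpoint"])
if "sd_unet" not in quick:  # "sd_unet" is the SD Unet dropdown (assumed option name)
    quick.append("sd_unet")

config_path.write_text(json.dumps(config, indent=4), encoding="utf-8")
print("Quicksettings:", quick)
```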
00:06:59 Now we have the SD Unet dropdown. Here we can select None, which uses the model's default Unet, or we can use the Unet of
00:07:06 Realistic Vision, which we are going to use. This is the TensorRT model. And let's use
00:07:11 the same prompt. This doesn't change the quality, but it increases the speed significantly. However,
00:07:18 it takes time to compile; compiling is only done once and then it is ready. And let's generate the
00:07:24 initial warm-up. We can see that it is loading the TensorRT Unet and the generation has started. So this
00:07:30 is the initial warm-up generation while recording video. Okay, now it is time to test. So I'm going
00:07:36 to make the batch count four. So the test has been completed. We can see that now we are getting
00:07:41 5.51 it/s. This is almost the speed of a 4090. So the speed increase is (5.51 minus 3.41)
00:07:53 over 3.41: a 61.5% speed increase without any quality change, quality loss, or quality difference.
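The percentage quoted above is simply the relative change of the two it/s measurements; as a quick check:

```python
baseline, tensorrt = 3.41, 5.51           # it/s measured before and after TensorRT
speedup = (tensorrt - baseline) / baseline
print(f"Speed increase: {speedup:.2%}")   # -> Speed increase: 61.58% (quoted as 61.5% in the video)
```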
00:08:04 Now I am going to install the new NVIDIA driver to see the difference. I am going to install the latest
00:08:10 version of the NVIDIA driver. So how are we going to download it? I will show how to download NVIDIA drivers, because I
00:08:16 see that some people are having issues. Go here and select your GPU model; it is this one. And
00:08:23 yes, I am using the Game Ready Driver; it works better, I think. Let's search, click Download, and
00:08:28 the download should start. If it doesn't start, we need to click here. Download, and
00:08:33 the download has started. So I am going from driver version 551 to 555. Okay, it is downloaded. All we
00:08:40 need to do is run it as administrator. Click Yes when it asks and click OK. I am going to
00:08:46 show you the selections I am making: NVIDIA Graphics Driver, agree and continue, Custom (Advanced),
00:08:53 and I am going to perform a clean installation. Now I will just click Next. But before doing that,
00:08:58 I am going to turn off the video recording and restart the computer, and then we will return.
00:09:03 Okay, so I have installed the latest driver and restarted the system. You can see the newest CUDA
00:09:09 version and the driver version (555.85) here. Let's do a warm-up test and then see the results. So
00:09:16 after the warm-up, I have generated four images, and we can see that we have a significant speed
00:09:22 drop, from 3.41 it/s to 3.27 it/s. Let's see the TensorRT speed, because
00:09:31 this is what I wonder about. So let's just change the batch size to one and do the initial warm-up.
00:09:37 You should watch the messages here. When using the TensorRT optimization, xFormers gets
00:09:43 disabled, so TensorRT is working. And yes, we can see the speed right now. So here are the speeds
00:09:50 after generating four images. We got a speed drop with the newest driver. NVIDIA is
00:09:56 disappointing us; I don't know why. Before ending this video, I will also test the optimizations of
00:10:04 PyTorch itself. So you see there are different optimizations here. Let's try SDP. This is
00:10:12 PyTorch's own optimization, as far as I know.
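For context (my addition, not from the video): SDP here refers to PyTorch 2.x's built-in scaled dot product attention, which the Web UI can use as its cross-attention backend instead of xFormers. A minimal, standalone illustration of the PyTorch call itself, not the Web UI's internal code:

```python
import torch
import torch.nn.functional as F

# Toy attention inputs: (batch, heads, tokens, head_dim), fp16 on the GPU.
q = torch.randn(1, 8, 4096, 40, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# PyTorch picks a fused backend (flash / memory-efficient / math) automatically.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 4096, 40])
```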
00:10:18 Let's see the speed of generation. OK, this one is even slower than xFormers. I'm going to test the other ones as well. Let's
00:10:24 see which one performs best. This other one was even slower. Let's see Doggettx. This is
00:10:31 the last one. So this is it. It is not always best to upgrade NVIDIA drivers. Unfortunately,
00:10:38 currently, this new driver didn't bring me any speed increase for some reason. If you get a speed
00:10:44 increase, let me know. But TensorRT is working amazingly, so you should use it if you are going
00:10:51 to generate a lot of images with the same model. Hopefully, see you in another amazing tutorial.
