
PIXART-α: First Open Source Rival to Midjourney - Better Than Stable Diffusion SDXL - Full Tutorial


Introduction to the new PixArt-α (PixArt Alpha) text-to-image model, which genuinely outperforms Stable Diffusion models, even SDXL. PixArt-α is close to the Midjourney level while being open source and supporting full fine-tuning and DreamBooth training. In this tutorial I show how to install and use PixArt-α both locally and on the RunPod cloud service, with automatic installers and step-by-step guidance.

The link to download resources ⤵️

https://www.patreon.com/posts/pixart-alpha-for-93614549

Stable Diffusion GitHub repository ⤵️

https://github.com/FurkanGozukara/Stable-Diffusion

SECourses Discord To Get Full Support ⤵️

https://discord.com/servers/software-engineering-courses-secourses-772774097734074388

PixArt Repo ⤵️

https://github.com/PixArt-alpha/PixArt-alpha

#PixArt #StableDiffusion #SDXL

00:00:00 Introduction to PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis and the tutorial content

00:02:38 What are the requirements to follow this tutorial and install PixArt Alpha

00:03:05 How to install PixArt Alpha on your machine and start using it

00:03:59 Where Hugging Face models are downloaded by default and how to change this default cache folder

00:05:44 How to revert to the default Hugging Face cache folder

00:06:08 How to fix corrupted files error during installation

00:06:29 How to start PixArt Web APP after installation has been completed

00:07:24 How to use PixArt Web APP and its features

00:07:59 Comparing a dragon prompt with SDXL base version

00:08:14 How to use provided styles csv file

00:08:40 How to start Automatic1111 SD Web UI on your second GPU

00:08:50 Where the PixArt Web APP generated images are saved

00:09:30 How to set parameters in your Automatic1111 SD Web UI to generate high quality images

00:09:49 PixArt generated image vs SDXL generated image for the same simple prompt

00:10:15 Anime style same prompt comparison

00:10:55 Another strong aspect of the PixArt Alpha model

00:11:29 Fantasy art style comparison of SDXL vs PixArt-α

00:11:52 3D style comparison of SDXL vs PixArt-α

00:12:16 Manga style image generation comparison between SDXL vs PixArt-α

00:12:44 Comparing PixArt vs SDXL vs Midjourney with same prompt

00:13:41 How to use LLaVA for captioning and obtaining prompt ideas and generating more amazing images

00:16:12 Comparison of PixArt vs SDXL prompt following in detail

00:17:29 Getting prompt idea from ChatGPT and comparing SDXL and PixArt prompt following

00:19:46 PixArt decisively beats SDXL with this new detailed prompt

00:22:00 How to install PixArt on a RunPod pod / machine

00:23:54 How to set default Hugging Face cache folder on RunPod / Linux machines

00:25:05 How to tell when a RunPod machine / pod is not working correctly and how to fix it

00:26:00 How to properly delete files / folders on RunPod machines / pods

00:26:51 How to connect and use PixArt web UI on a RunPod machine after it was started

00:28:20 How to download all of the generated images on RunPod with runpodctl very fast

Paper Summary

The paper introduces PIXART-α, a Transformer-based text-to-image (T2I) diffusion model designed to significantly lower training costs while maintaining image generation quality competitive with leading models like Imagen and Midjourney. It achieves high-resolution synthesis up to 1024x1024 pixels.

Key Innovations:

Training Strategy Decomposition: The process is divided into three steps focusing on pixel dependency, text-image alignment, and image aesthetic quality. This approach reduces learning costs by starting with a low-cost class-condition model and then pretraining and fine-tuning on data rich in information density and aesthetic quality.

Efficient T2I Transformer: Built on the Diffusion Transformer (DiT) framework, it includes cross-attention modules for text conditions and streamlines computation. A reparameterization technique enables loading parameters from class-condition models, leveraging prior knowledge from ImageNet, thus accelerating training.

High-informative Data: To overcome deficiencies in existing text-image datasets, the paper introduces an auto-labeling pipeline using a vision-language model (LLaVA) to generate captions on the SAM dataset. This dataset is selected for its diverse collection of objects, aiding in creating high-information-density text-image pairs for efficient alignment learning.

Image Quality: The model excels in image quality, artistry, and semantic control, surpassing existing models in user studies and benchmarks.

Broader Implications: The paper suggests that PIXART-α's approach allows individual researchers and startups to develop high-quality T2I models at lower costs, potentially democratizing access to advanced AI-generated content.

The paper concludes with the hope that PIXART-α will inspire the AIGC community and enable more entities to build their own generative models efficiently and affordably.

Video Transcription

  • 00:00:00 Greetings, everyone.

  • 00:00:01 In this video, I will introduce you to a new generative AI model to generate images from

  • 00:00:07 prompts: PixArt-α (PixArt Alpha), and it is truly a rival for Stable Diffusion XL (SDXL).

  • 00:00:15 Actually, it is better than SDXL, and I will show you that.

  • 00:00:18 The power of PixArt is that it is able to follow prompts much better than Stable Diffusion

  • 00:00:24 XL.

  • 00:00:25 This power comes from the Text Encoder that PixArt uses.

  • 00:00:30 It is using a T5 Text Encoder, which is the most powerful Text Encoder.

  • 00:00:36 They also utilized LLaVA captioning during their training, which helped them significantly.

  • 00:00:41 So, in this tutorial, I will show you how to install PixArt on your computer and run

  • 00:00:46 it with just one-click installers that I have prepared for you.

  • 00:00:51 I will compare it with Stable Diffusion XL base version.

  • 00:00:54 I will compare it with Midjourney with the same prompting.

  • 00:00:57 I will show you how you can change your default Hugging Face caching folder where the models

  • 00:01:04 will be downloaded.

  • 00:01:05 I will share the styles file that you can use in your Automatic1111 Web UI, which comes

  • 00:01:11 with the PixArt Gradio.

  • 00:01:13 Moreover, I will show you how to use LLaVA for captioning images.

  • 00:01:19 Moreover, in this video, I will show you how to install and use PixArt on a RunPod machine

  • 00:01:25 as well.

  • 00:01:26 By following the same steps of RunPod installation, you can install PixArt on a Linux machine

  • 00:01:32 as well.

  • 00:01:33 So if you are a Linux user, then you can follow this tutorial to learn how to install and

  • 00:01:39 use PixArt on a Linux machine, or if you don't have a strong GPU, then you can follow this

  • 00:01:45 tutorial to install and use the PixArt on a RunPod machine.

  • 00:01:50 And finally, I keep working on the interface.

  • 00:01:53 After the tutorial has been completed, I made several improvements.

  • 00:01:58 You see, now it is using a better space of the screen.

  • 00:02:02 You can now see the entire prompt.

  • 00:02:05 Now it will display the generated images like this, as a gallery.

  • 00:02:09 Also, when you click this X icon, it will display the original resolution of the images.

  • 00:02:16 When you click any image back, it will return to the gallery option.

  • 00:02:20 Hopefully, I will keep improving the Gradio application.

  • 00:02:24 So, everything we are going to need is shared in this post.

  • 00:02:28 I am going to share the link of this post in the description of the video and also in

  • 00:02:33 the comment section of the video.

  • 00:02:36 I have prepared amazing installer files.

  • 00:02:38 All you need to do is install Python and Git if you haven't yet.

  • 00:02:44 I have this amazing tutorial for how to install Python and Git.

  • 00:02:48 When you type python, you should get a message showing a 3.10.x version.

  • 00:02:54 I prefer 3.10.11.

  • 00:02:56 I haven't tested with other Python versions.

  • 00:02:58 It may work, but it also may not.

  • 00:03:01 And when you type git, you should get a message showing that Git is installed.

  • 00:03:06 After that, just download the PixArt installer.zip file.

  • 00:03:10 When you click here or click the attachment, you will get the attachment downloaded.

  • 00:03:17 Move it into wherever you want to install.

  • 00:03:20 Let's make a new folder in G drive: Test PixArt.

  • 00:03:25 Paste it there.

  • 00:03:26 Extract it.

  • 00:03:27 This is a zip file, so you can extract it on Windows automatically.

  • 00:03:31 And just double-click install.bat file, and it will install everything fully automatically

  • 00:03:36 for you.

  • 00:03:37 If you are a Linux user, then follow the instructions for the RunPod installer.sh file.

  • 00:03:44 I will also show how to install the RunPod in this tutorial, so you can look at the video

  • 00:03:48 chapters and seek to that section as well.

  • 00:03:52 The installer will automatically install everything for us.

  • 00:03:56 Then the models will get automatically downloaded.

  • 00:03:58 If you have previously used any Hugging Face models, then you will know that they are getting

  • 00:04:04 downloaded into the cache folder.

  • 00:04:06 The cache folder is inside C drive, inside users, inside your username, inside cache,

  • 00:04:14 inside Hugging Face, inside hub.

  • 00:04:17 This is where all of the model files are by default downloaded.

  • 00:04:22 So let me show you the size of my cache folder in my C drive.

  • 00:04:26 You see, I am using 406 gigabytes in my cache folder.

  • 00:04:31 So some people were asking me how they can change the cache folder where the model files

  • 00:04:38 will be automatically downloaded.

  • 00:04:39 To change it, start a new cmd as administrator.

  • 00:04:42 So I type cmd, right-click, and run as administrator.

  • 00:04:46 Click yes.

  • 00:04:47 Then execute this command according to where you want the cache folder to be.

  • 00:04:53 So let me show you closely.

  • 00:04:55 Let's say I want my cache folder to be Hugging Face models folder.

  • 00:04:59 So I right-click, select New, create the folder, and enter it.

  • 00:05:04 Copy the path and paste it here.

  • 00:05:06 Then I copy this, paste it into the command line interface.

  • 00:05:09 Hit enter.

  • 00:05:11 This will create a key in the system environment variables.

  • 00:05:15 So when I open edit system environment variables, click environment variables, I will see that

  • 00:05:22 HF_HOME.

  • 00:05:23 This is Hugging Face home.

  • 00:05:25 This is where the Hugging Face libraries will look to download the models as a cache.

  • 00:05:31 So now all of the models will be downloaded into this drive.

  • 00:05:35 Let's click OK and OK.

  • 00:05:36 This is done.

  • 00:05:37 Let's see the installer status.

  • 00:05:39 It is still installing.

  • 00:05:40 This totally depends on your hard drive speed and your internet connection speed.

  • 00:05:44 So let's say you want to remove this custom cache folder and return to the default one.

  • 00:05:51 What you need to do: open the environment variables again, environment variables, and

  • 00:05:55 just delete this variable.

  • 00:05:56 When you click delete, it removes the variable, and models will download into the

  • 00:06:02 default cache again.

  • 00:06:04 I tested this, and it is working.
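
To make the HF_HOME mechanism concrete, here is a minimal sketch of a per-process alternative to the system-wide approach shown above. The target path is a hypothetical example, and the override must happen before any Hugging Face library is imported, because the cache location is resolved at import time.

```python
# Hedged sketch: per-process alternative to setting HF_HOME system-wide.
# The path is a hypothetical example; set it before importing any
# Hugging Face library, since the cache location is read at import time.
import os

os.environ["HF_HOME"] = r"G:\Hugging Face models"

from huggingface_hub import constants

print(constants.HF_HUB_CACHE)  # models will now cache under the new HF_HOME/hub
```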

  • 00:06:05 Okay, installer almost completed.

  • 00:06:08 Let's say during the installation something happened and some of your files are corrupted.

  • 00:06:13 Then you need to delete this temporary folder.

  • 00:06:15 This is usually where the downloaded libraries are temporarily saved.

  • 00:06:21 So if you delete this temporary folder in your case, then it should fix your installation

  • 00:06:26 errors if you encounter any error.

  • 00:06:29 Okay, the installation has been completed.

  • 00:06:31 This is the screen.

  • 00:06:32 Just press any key, and it will close automatically.

  • 00:06:35 Let's return back to our installation folder.

  • 00:06:38 Now there are several options.

  • 00:06:40 I suggest you use the 1024 pixel model.

  • 00:06:43 The 512 pixel model is really bad.

  • 00:06:45 So, it isn't worth spending your time on.

  • 00:06:48 There are two options: run as 8-bit and run as 16-bit.

  • 00:06:52 The 8-bit version will load the Text Encoder in 8-bit instead of 16-bit.

  • 00:06:57 So, it will use less VRAM.

  • 00:07:00 However, it may have a little bit degraded quality.

  • 00:07:04 I will run the 16-bit version.

  • 00:07:07 I have an RTX 3090 Ti, so I don't need the 8-bit version.

  • 00:07:13 Since I have previously downloaded the model files, it didn't re-download.

  • 00:07:17 I can already see them here.

  • 00:07:19 You see, this is the 1024 pixel model.

  • 00:07:23 It is 20 GB.
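
For readers who prefer a script over the provided .bat files, here is a minimal sketch of loading the 1024-pixel model with the diffusers PixArtAlphaPipeline in 16-bit. It assumes diffusers (0.22 or newer), torch, and a CUDA GPU; the installer's actual app.py may differ.

```python
# Minimal sketch (not the installer's exact app.py): PixArt-α 1024px in 16-bit.
import torch
from diffusers import PixArtAlphaPipeline

pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS",  # the ~20 GB 1024-pixel checkpoint
    torch_dtype=torch.float16,           # 16-bit; an 8-bit text encoder would save VRAM
)
pipe.to("cuda")

image = pipe("a dragon", num_inference_steps=20).images[0]
image.save("dragon.png")
```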

  • 00:07:24 So, this is the screen which we will use.

  • 00:07:27 I have edited this screen and added new functionality on top of the official repository.

  • 00:07:32 For example, you have batch count.

  • 00:07:34 This batch count will generate any number of images that you want.

  • 00:07:39 Alright, let's try a prompt: a dragon, and nothing else.

  • 00:07:43 Let's run it.

  • 00:07:44 I also modified the command line interface screen so it will give us more information.

  • 00:07:49 It says starting generating 1 images.

  • 00:07:53 It will also show the average step duration and the average image duration like this,

  • 00:07:58 and we got the image.

  • 00:08:00 And you see, just a dragon.

  • 00:08:01 We got an amazing result.

  • 00:08:03 This is an amazing image.

  • 00:08:05 Let's compare this with SDXL as well.

  • 00:08:09 By the way, currently no style is selected.

  • 00:08:12 I also made the style csv file in here.

  • 00:08:15 You see, style csv.

  • 00:08:16 This is the same one this repository is using.

  • 00:08:21 So when I double-click and edit the app.py file, you can see the styles are provided

  • 00:08:27 already here.

  • 00:08:28 So I made a file exactly the same as that, and I will put it into my Automatic1111

  • 00:08:35 Web UI installation to compare with SDXL.

  • 00:08:39 So I just copy-paste it.
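
To make the styles.csv idea concrete: Automatic1111-style rows are name, prompt, negative_prompt, and a {prompt} placeholder in the style template is substituted with the user prompt. The helper below is a hypothetical illustration, not code from either repository.

```python
# Hypothetical helper showing how a styles.csv row is typically applied
# (columns: name, prompt, negative_prompt; "{prompt}" marks the user prompt).
import csv

def apply_style(style_name, user_prompt, path="styles.csv"):
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["name"] == style_name:
                return (row["prompt"].replace("{prompt}", user_prompt),
                        row.get("negative_prompt", ""))
    return user_prompt, ""  # unknown style: pass the prompt through unchanged

print(apply_style("Anime", "a dragon"))
```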

  • 00:08:40 Then I need to start my SDXL.

  • 00:08:43 I will start it on my second GPU, which is an RTX 3060, here.

  • 00:08:49 So we can use both of them at the same time.

  • 00:08:52 Also, I modified this application to save the generated images inside outputs folder

  • 00:08:58 here.

  • 00:08:59 It saves files in this format: the date the image was generated plus a random name.

  • 00:09:05 So you can right-click and sort by date to see the latest generations at the top.
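
As a hedged sketch of the naming scheme just described (date plus a random name; the app's actual code may differ):

```python
# Hypothetical sketch of date-plus-random output naming, as described above.
import datetime
import os
import uuid

def save_generated(image, outdir="outputs"):
    os.makedirs(outdir, exist_ok=True)
    name = f"{datetime.date.today():%Y-%m-%d}_{uuid.uuid4().hex[:8]}.png"
    path = os.path.join(outdir, name)
    image.save(path)  # PIL image returned by the pipeline
    return path
```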

  • 00:09:10 So this is the image generated.

  • 00:09:12 Let's generate the same image in the SDXL.

  • 00:09:14 This was the first image, not cherry-picked or anything.

  • 00:09:18 This model also supports fine-tuning, like DreamBooth training, but I am still researching

  • 00:09:25 it.

  • 00:09:26 Hopefully, I will make tutorials for fine-tuning as well.

  • 00:09:28 It also supports ControlNet as far as I know.

  • 00:09:31 So, let's load our SDXL base model from here.

  • 00:09:36 Let's select the fp16 VAE.

  • 00:09:39 Let's select the sampler.

  • 00:09:40 This is the best sampler that I have found.

  • 00:09:43 Let's select the resolution and let's type the same prompt, and nothing else, and generate.

  • 00:09:49 And this is the same prompt output we got from the SDXL base version.

  • 00:09:54 Do you see the difference?

  • 00:09:56 This model generates this.

  • 00:09:58 It is like Midjourney, and this model generates this, SDXL base version.

  • 00:10:04 Let's make this somewhat styled.

  • 00:10:06 Let's try anime, for example, and let's also select the anime.

  • 00:10:10 So, it will be exactly the same prompt.

  • 00:10:12 Okay, let's hit generate in both of them.

  • 00:10:15 This model is also pretty fast, and I find that DPM-Solver with 60 inference steps is best.

  • 00:10:22 You can also play with other variables here, and this is the anime version of the image.

  • 00:10:28 Let's open this in a new tab.

  • 00:10:29 Yeah, this is the default resolution.

  • 00:10:31 This is the anime output, and this is the anime output from SDXL.

  • 00:10:36 I think SDXL performed better this time in terms of quality.

  • 00:10:41 Let's try another one.

  • 00:10:43 So, let's try fantasy art.

  • 00:10:45 Okay, run.

  • 00:10:46 Let's select the fantasy art from here and run.

  • 00:10:49 There is also another very strong aspect of the PixArt Alpha model: It can follow prompts

  • 00:10:56 much better than the SDXL Stable Diffusion XL.

  • 00:10:59 When we read the paper, we can see it.

  • 00:11:03 They are using a T5 Text Encoder, which is an extremely strong text encoder.

  • 00:11:08 So, you can also read this paper.

  • 00:11:11 The official links are shared here.

  • 00:11:13 Open them, and you will see the links.

  • 00:11:15 So, in this case, in the fantasy art, this is the output that PixArt generated.

  • 00:11:21 Let's see it from here.

  • 00:11:23 This is the output PixArt generated, and this is the SDXL generated image.

  • 00:11:29 Let's compare them.

  • 00:11:30 So, really, really cool.

  • 00:11:31 I don't know which one is the winner.

  • 00:11:33 I think PixArt is better than the SDXL model.

  • 00:11:37 Let's try another style.

  • 00:11:39 Let's try the 3D model style and generate, and let's also select the 3D model style from here and generate.

  • 00:11:45 By the way, using the same seed will not have the same effect because of the differences between the models.

  • 00:11:51 Okay, it generated this as a 3D output, and this is the 3D output of the SDXL.

  • 00:11:57 I think this time the winner is PixArt.

  • 00:12:01 Maybe we should generate multiple images.

  • 00:12:04 Let's try manga and let's make the batch count 4.

  • 00:12:08 Let's generate 4 images.

  • 00:12:09 Let's do the same in Automatic1111 Web UI for the SDXL base.

  • 00:12:14 Let's generate.

  • 00:12:15 Okay, images have been generated.

  • 00:12:17 Let's compare them.

  • 00:12:18 So, this is the first image of the PixArt.

  • 00:12:21 In the Gradio interface, you can also move between images like this, as you are seeing.

  • 00:12:27 So, these are all of the images generated with the PixArt, and here we see the images

  • 00:12:32 generated by the SDXL base version.

  • 00:12:35 You see, these are the manga images.

  • 00:12:37 I think PixArt is the winner here.

  • 00:12:40 Now, I want to compare them with Midjourney.

  • 00:12:43 So, there is a Midjourney prompt here.

  • 00:12:46 Let's copy it.

  • 00:12:47 You see, this is the output of the Midjourney.

  • 00:12:49 Let's open it in a browser, so we can see it in full resolution.

  • 00:12:53 So, for this prompt, Midjourney generated this output.

  • 00:12:57 Let's try the same in here.

  • 00:12:59 I will try with a default prompt.

  • 00:13:02 Let's generate in Automatic1111 Web UI, and let's also generate here with no style and

  • 00:13:07 run.

  • 00:13:08 Okay, we got the results.

  • 00:13:10 Let's compare.

  • 00:13:11 So, these are the images generated by the PixArt Alpha version.

  • 00:13:16 These are the images generated by the SDXL, and these are the images generated by Midjourney.

  • 00:13:23 So, if we see them in a single image, they are looking like this: First one is SDXL,

  • 00:13:28 the second one is PixArt Alpha, and the third one is Midjourney.

  • 00:13:33 However, as I said, the power of PixArt comes from the prompting itself.

  • 00:13:39 So, what I am going to do is, I am going to use the LLaVA captioning.

  • 00:13:45 I just updated it in my Patreon post.

  • 00:13:48 Now, you can automatically install and caption with LLaVA.

  • 00:13:52 Now, I will start my LLaVA to caption the image of Midjourney, and I will use that caption

  • 00:13:58 in PixArt and SDXL, and we will see the difference.

  • 00:14:01 So, I will run the first part.

  • 00:14:04 This is from my automatic installation, and I will start the second part.

  • 00:14:08 Hopefully, I will also make a full tutorial for all of the captioning models I shared

  • 00:14:13 here.

  • 00:14:14 This is the very best captioning arsenal.

  • 00:14:17 I have prepared everything for you guys.

  • 00:14:19 So, part one started.

  • 00:14:21 Part two started.

  • 00:14:22 Let's start part three.

  • 00:14:23 By the way, this will take a huge amount of RAM.

  • 00:14:26 So, I will use the 8-bit version of the 13b model.

  • 00:14:30 So, let's start it.

  • 00:14:32 This LLaVA also has RunPod installation.

  • 00:14:34 It is working amazingly on RunPod.

  • 00:14:37 If you rent an A6000 from RunPod, it's only 70 cents per hour, or even cheaper.

  • 00:14:42 You can use this amazingly on RunPod.

  • 00:14:45 Okay, it is starting.

  • 00:14:47 These are also all modified and made into one-click installation by me.

  • 00:14:53 Okay, the Web UI started.

  • 00:14:55 Let's open the LLaVA chatbot.

  • 00:14:58 You can also chat with it.

  • 00:14:59 You can use it for captioning images.

  • 00:15:02 You can use it for any task you want with LLaVA.

  • 00:15:05 You see the model loaded.

  • 00:15:07 Let's select the image from downloads here.

  • 00:15:10 Okay, so I will use this prompt.

  • 00:15:11 Okay, I need to select again.

  • 00:15:13 Okay, I will use this prompt.

  • 00:15:15 This is a prompt that I have found to caption images.

  • 00:15:19 Just caption the image with details, colors, items, objects, emotions, art style, drawing

  • 00:15:24 style.

  • 00:15:25 You can also try other prompts to generate captions.
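
The video uses the one-click LLaVA Gradio app; as a hedged alternative, the transformers library can run a community LLaVA checkpoint to produce the same kind of caption. The model id, input filename, and prompt template below are assumptions, not the installer's code.

```python
# Hedged sketch using a community llava-hf checkpoint via transformers,
# not the Gradio app from the video; 8-bit loading reduces memory, as discussed.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-13b-hf"  # assumed community checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, load_in_8bit=True)

image = Image.open("midjourney_output.png")  # hypothetical input file
prompt = ("USER: <image>\nCaption the image with details, colors, items, "
          "objects, emotions, art style, drawing style. ASSISTANT:")
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output[0], skip_special_tokens=True))
```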

  • 00:15:27 Okay, we are getting a detailed caption right now.

  • 00:15:31 You see, the LLaVA is able to caption images amazingly, and it is for free.

  • 00:15:36 Okay, let's copy this.

  • 00:15:37 Then, I will terminate my LLaVA and rerun the PixArt Alpha.

  • 00:15:43 Okay, let's run it again.

  • 00:15:45 My Stable Diffusion is still running on my second GPU.

  • 00:15:49 It is 112 tokens in the Stable Diffusion tokenizer.

  • 00:15:54 Okay, the Stable Diffusion started.

  • 00:15:56 The PixArt also started.

  • 00:15:58 Let's enter the prompt and let's generate four images and run.

  • 00:16:03 Let's see this time what we will get.

  • 00:16:05 So, this is an amazing methodology to get detailed prompts and generate images.

  • 00:16:10 Okay, we got the images.

  • 00:16:12 Now, time to compare how well the prompt is followed by the model.

  • 00:16:18 First, let's begin with the PixArt output.

  • 00:16:22 So, the image features a robot with a fiery orange background.

  • 00:16:26 Very accurate.

  • 00:16:27 The robot is wearing black and orange armor, and its face is glowing red.

  • 00:16:32 Accurate.

  • 00:16:33 The robot is standing in front of a building, which appears to be on fire.

  • 00:16:37 This is also accurate.

  • 00:16:38 The scene is set against a dark sky, adding to the dramatic atmosphere.

  • 00:16:43 Accurate.

  • 00:16:44 The overall color palette of the image is predominantly orange and black, with some

  • 00:16:49 red accents.

  • 00:16:50 Accurate.

  • 00:16:51 The art style seems to be a mix of futuristic and post-apocalyptic, with the robot's design

  • 00:16:56 and burning building creating a sense of danger and intensity.

  • 00:17:01 You see how well it followed the prompt.

  • 00:17:04 It is exactly as the prompt.

  • 00:17:07 Now, let's compare it with the SDXL output.

  • 00:17:11 The SDXL output looks much weaker than the PixArt.

  • 00:17:15 Maybe this is the best one.

  • 00:17:17 So, we can compare it with the prompt.

  • 00:17:20 The SDXL also tried to follow the prompt, but not as accurately as PixArt, I think.

  • 00:17:28 Now, let's try another, more beautiful prompt.

  • 00:17:32 Actually, for this prompt, I will use ChatGPT.

  • 00:17:34 Write me a detailed prompt to generate an amazing image of a horse running on a beautiful

  • 00:17:46 mountain on a beautiful day.

  • 00:17:48 Okay, I fixed the typos, and let's see the result from ChatGPT.

  • 00:17:54 So, I will use this prompt to generate images both on the Stable Diffusion XL and also on

  • 00:18:01 the PixArt.

  • 00:18:02 Okay, here, this is a big prompt, so the tokenizer of PixArt may not work.

  • 00:18:08 Let's generate this and let's see the command line interface.

  • 00:18:11 Yes, the tokenizer of PixArt is currently limited to 120 tokens.

  • 00:18:17 So the prompt is truncated after this part.
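
A hedged way to check the limit yourself is to count T5 tokens before generating. The snippet assumes the tokenizer ships in the pipeline repository's tokenizer subfolder; the ~120-token limit is taken from the video.

```python
# Hedged sketch: counting T5 tokens to see whether a prompt will be truncated.
from transformers import T5Tokenizer

tok = T5Tokenizer.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS", subfolder="tokenizer"
)
prompt = "Visualize a majestic horse with a glossy chestnut coat, running ..."
n_tokens = len(tok(prompt).input_ids)
print(f"{n_tokens} tokens; will be truncated: {n_tokens > 120}")
```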

  • 00:18:22 Let's use the Automatic1111 Web UI and generate the images, then compare the results.

  • 00:18:27 Okay, the PixArt images are generated.

  • 00:18:29 Let's compare and see how well it followed the prompt.

  • 00:18:34 However, the prompt was truncated.

  • 00:18:35 Visualize a majestic horse with a glossy chestnut coat, running spiritedly along a lush, vibrant

  • 00:18:42 green mountain trail.

  • 00:18:44 Looking accurate. The scene is set on a gorgeous day, with a clear azure sky above and a few

  • 00:18:50 fluffy white clouds drifting lazily.

  • 00:18:53 Looking accurate.

  • 00:18:54 The sun is shining brightly, casting a warm golden glow over the landscape.

  • 00:18:59 Surrounding the trails are wildflowers in a kaleidoscope of colors, swaying gently in

  • 00:19:06 the light breeze.

  • 00:19:07 It is looking amazing, and it was truncated from "stead in snow."

  • 00:19:13 So, where was it?

  • 00:19:15 Okay, it was truncated from "stead in snow."

  • 00:19:20 Okay, it was truncated from here, so only the prompt up to this point was used to generate this image.

  • 00:19:28 However, there is some quality loss on the horse.

  • 00:19:33 Let's also try an increased number of steps.

  • 00:19:38 I will try with 60 steps.

  • 00:19:39 I think 60 steps is the best.

  • 00:19:42 Let's generate again, and meanwhile, let's see the results of the Stable Diffusion XL.

  • 00:19:47 The Stable Diffusion XL prompt is like this.

  • 00:19:50 It is nothing like the PixArt output.

  • 00:19:53 You see, this is PixArt.

  • 00:19:55 This is like Midjourney level, and this is the SDXL.

  • 00:19:58 It cannot even be compared with the PixArt.

  • 00:20:02 PixArt is much better at following and also at the quality, and let's also see the results

  • 00:20:08 of 60 steps.

  • 00:20:09 So, this is a 60-step image.

  • 00:20:12 You see, with 60 steps, there is a significant improvement in the quality of the horse.

  • 00:20:18 This is 20 steps, and this is 60 steps.

  • 00:20:20 The 60 steps is much better, much clearer.

  • 00:20:23 Still, it is able to follow the prompt very well.

  • 00:20:26 It will take longer to generate images, but in the end, we can get better quality images

  • 00:20:33 with fewer attempts.

  • 00:20:35 So, the 60 steps definitely improved the quality of the image.
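
As a hedged sketch of this 20-vs-60-step comparison, fixing the seed so that only the step count changes (the prompt is a stand-in for the ChatGPT one used in the video):

```python
# Hedged sketch: same prompt and seed at 20 vs 60 steps.
import torch
from diffusers import PixArtAlphaPipeline

pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")

prompt = "a horse running on a beautiful mountain on a beautiful day"
for steps in (20, 60):
    generator = torch.Generator("cuda").manual_seed(42)  # same seed each run
    image = pipe(prompt, num_inference_steps=steps, generator=generator).images[0]
    image.save(f"horse_{steps}_steps.png")
```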

  • 00:20:40 You see, PixArt is the future.

  • 00:20:42 PixArt is also supporting DreamBooth training.

  • 00:20:45 However, there aren't enough resources for how to do it yet.

  • 00:20:49 It is requiring you to prepare a specific config file with images.

  • 00:20:55 So, I am not sure yet how to train it.

  • 00:20:58 However, I will look for it.

  • 00:21:00 Hopefully, Kohya will add this to its training pipeline so we can directly train PixArt and

  • 00:21:07 see the difference of DreamBooth training in PixArt.

  • 00:21:11 We are already able to train DreamBooth of Stable Diffusion XL very well.

  • 00:21:17 I would like to see that in PixArt too.

  • 00:21:19 Wow, the final image of the PixArt is very different from the others.

  • 00:21:23 By the way, we can also apply styles, but we haven't applied any yet.

  • 00:21:28 So, with PixArt, you can get ideas and prompts from ChatGPT or wherever you want, and make

  • 00:21:34 it follow them.

  • 00:21:35 And the difference is just humongous between the Stable Diffusion XL and the PixArt when

  • 00:21:40 it comes to following the prompts.

  • 00:21:43 So, PixArt is definitely better than Stable Diffusion XL, if you ask my opinion.

  • 00:21:49 It really needs to be added to the Automatic1111 Web UI pipeline and the Kohya training pipeline.

  • 00:21:56 I hope they add this PixArt into their repositories.

  • 00:22:00 So, now I will show you how to install and use PixArt on a RunPod machine.

  • 00:22:06 If you are new to RunPod, watch this amazing beginner's RunPod tutorial.

  • 00:22:11 It is over 100 minutes.

  • 00:22:14 It will teach you pretty much everything with RunPod.

  • 00:22:17 Use this link to register or login.

  • 00:22:20 I will log into my account.

  • 00:22:22 I will go to community cloud.

  • 00:22:24 Select the extreme speed from here.

  • 00:22:26 I will use RTX 3090.

  • 00:22:28 This is an amazing GPU.

  • 00:22:31 It works with a lot of things.

  • 00:22:33 From here, search for PyTorch.

  • 00:22:34 I will use RunPod PyTorch 2.0.1.

  • 00:22:38 It doesn't matter.

  • 00:22:39 Customize deployment.

  • 00:22:40 Make the volume disk 50 gigabytes and continue and deploy.

  • 00:22:45 Let's go to my pods.

  • 00:22:46 Let's delete this older one and let's see the logs.

  • 00:22:50 The logs are really important.

  • 00:22:51 Watch the logs.

  • 00:22:53 If the machine is broken, if something is not working, you will see messages here.

  • 00:22:58 So, this is an easy-to-load template, as you are seeing right now.

  • 00:23:01 The pod is loaded.

  • 00:23:03 Click connect.

  • 00:23:04 Connect to JupyterLab.

  • 00:23:06 Let's download the installer again if you haven't downloaded it yet.

  • 00:23:10 Extract it into any folder.

  • 00:23:12 Enter inside extraction.

  • 00:23:13 Drag and drop the RunPod installer sh file here.

  • 00:23:18 Open a new terminal.

  • 00:23:19 Open the RunPod instructions file.

  • 00:23:22 All of the instructions are written here.

  • 00:23:25 If you are a Linux user, all you need to do is change the file paths from here.

  • 00:23:30 Nothing else.

  • 00:23:31 This will generate a new virtual environment folder and install everything there.

  • 00:23:37 Make sure that you are using Python 3.10 because I haven't tested with other

  • 00:23:43 Python versions.

  • 00:23:44 So, I don't know whether they will work or not.

  • 00:23:47 The first command that we will execute is this one.

  • 00:23:51 And after that, to run it, we will use this one.

  • 00:23:54 So, you see, we set the Hugging Face home cache folder (HF_HOME) to the workspace folder.

  • 00:23:59 This is really important.

  • 00:24:01 This is where the models will get downloaded.
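
A quick, hedged sanity check that the variable took effect on the pod; run it inside the same shell, and note the expected values follow the instructions file:

```python
# Hedged sanity check that HF_HOME points at the pod's persistent volume.
import os
from huggingface_hub import constants

print(os.environ.get("HF_HOME"))  # expected: /workspace per the instructions file
print(constants.HF_HUB_CACHE)     # expected: /workspace/hub
```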

  • 00:24:03 Then, we will use this command to run it after the installation has been completed.

  • 00:24:10 The installation has been completed.

  • 00:24:11 Let's open a new terminal.

  • 00:24:13 Let's copy the start command and execute it.

  • 00:24:17 Copy paste.

  • 00:24:18 When you first time start the web UI, it will download the model into the cache folder.

  • 00:24:25 We will see in a moment.

  • 00:24:27 It is here.

  • 00:24:28 You see, inside hub folder.

  • 00:24:29 It will download the model files and it will start the web UI with Gradio live sharing.

  • 00:24:36 Unfortunately, connecting via RunPod's proxy on a certain port is not working.

  • 00:24:43 I have opened an issue thread for this on the Gradio GitHub.

  • 00:24:48 Let me show you Gradio GitHub.

  • 00:24:51 And let me show you the issue that I have opened.

  • 00:24:54 If they fix this error, hopefully, you can also use the RunPod and PixArt Alpha with

  • 00:25:01 proxy connection.

  • 00:25:02 But for now, we will use Gradio sharing.

  • 00:25:04 So, the models will get downloaded.

  • 00:25:07 By the way, is this machine not working correctly?

  • 00:25:09 Yes, the download speeds of this machine are horrific.

  • 00:25:13 If this happens, you usually need to get a new machine, because as you can see, the downloads are

  • 00:25:19 really, really slow for some reason.

  • 00:25:22 Maybe we can restart and see if the speed will get fixed.

  • 00:25:26 But this is how you install and run.

  • 00:25:29 Meanwhile, I will start another pod.

  • 00:25:31 Sometimes, you may also have these problems.

  • 00:25:34 I will go with RTX A5000 this time.

  • 00:25:36 Maybe this machine will work better.

  • 00:25:40 Okay.

  • 00:25:41 Set overrides and deploy.

  • 00:25:43 Okay.

  • 00:25:44 This machine has better hard drive speed perhaps.

  • 00:25:48 Let's see which one is faster.

  • 00:25:50 Let's go to the JupyterLab.

  • 00:25:52 You see the RunPod prices are really, really competitive.

  • 00:25:55 They really have the best prices among cloud GPU providers.

  • 00:25:59 Okay.

  • 00:26:00 I will delete the hub folder with this command.

  • 00:26:02 This is really important.

  • 00:26:04 rm -r hub.

  • 00:26:06 Do not use the Jupyter interface to delete folders.

  • 00:26:10 Okay, let's copy paste and start again.

  • 00:26:13 And let's connect the second machine meanwhile and try it.

  • 00:26:17 Okay.

  • 00:26:18 The speed is... yeah.

  • 00:26:20 The speed is fixed after I closed the server and opened it again.

  • 00:26:23 Now it is downloading the files properly.

  • 00:26:25 So, we don't need the second machine, which is this one.

  • 00:26:30 Let's stop it and delete it.

  • 00:26:32 With RunPod, you may also have similar problems.

  • 00:26:35 So, you should join our Discord channel and ask me questions or join the official RunPod

  • 00:26:41 Discord channel and ask questions there.

  • 00:26:43 This should be quickly completed because the download speed is really, really good around

  • 00:26:48 100 megabytes per second.

  • 00:26:50 Okay.

  • 00:26:51 The models are downloaded.

  • 00:26:52 Now we have a Gradio public link.

  • 00:26:55 Let's open it.

  • 00:26:56 Do not worry.

  • 00:26:57 No one can find this link if you don't share it.

  • 00:27:01 So, it is pretty safe.

  • 00:27:02 You can also set a password for connecting, and that's it.

  • 00:27:06 Just type your prompts and run it.

  • 00:27:08 It will generate an image.

  • 00:27:10 You can also follow this terminal to see what is happening.

  • 00:27:14 You can see memory usage, the GPU utilization, and everything.

  • 00:27:19 And the image is here.

  • 00:27:20 It was really, really fast.

  • 00:27:22 Okay.

  • 00:27:23 Let's generate more images, like batch count 10 run.

  • 00:27:27 So, the speed is amazing.

  • 00:27:29 The average step duration is only 350 milliseconds.

  • 00:27:33 Each image is generated in seven seconds with 20 steps.

  • 00:27:38 We can see the outputs getting saved inside this folder.

  • 00:27:44 Let's see the images.

  • 00:27:45 Okay.

  • 00:27:46 We are seeing the images generated on RunPod.

  • 00:27:48 Really, really cool.

  • 00:27:49 Really, really amazing quality.

  • 00:27:52 Just amazing.

  • 00:27:53 We just typed a car, nothing else.

  • 00:27:55 And it is using its imagination to generate these images.

  • 00:28:00 You can also write a much more advanced prompt.

  • 00:28:03 So, this is how we install the PixArt on RunPod and use it.

  • 00:28:09 Exactly the same installer will work on Linux machines as well.

  • 00:28:13 So, if you have a Linux machine, you can follow exactly the same steps and install it on a

  • 00:28:19 Linux machine.

  • 00:28:20 And let's say you want to download all of the images.

  • 00:28:22 I suggest you to use runpodctl.

  • 00:28:24 So, run runpodctl send with the folder name; it will give you a link, and you can download

  • 00:28:32 it anywhere you want.

  • 00:28:33 And let's download it here.

  • 00:28:35 Open a new cmd here, copy paste the link, and it will download all of the images into

  • 00:28:41 here.

  • 00:28:42 You see all of the generated images are downloaded into this folder.

  • 00:28:45 If you don't know how to use runpodctl, it is explained in this tutorial in

  • 00:28:53 full detail.

  • 00:28:54 I hope you have enjoyed.

  • 00:28:55 If you sponsor me, or if you fork my repository and star it, I would appreciate that very much.

  • 00:29:01 In our GitHub repository, you will find all of the links and tutorials like this.

  • 00:29:06 You see, they are all listed in a structured way, sorted by the date they were released.

  • 00:29:12 So, if you're a beginner of Stable Diffusion, you can watch all of the tutorials here.

  • 00:29:17 They are all amazing.

  • 00:29:18 They will teach you how the AI field is progressing, in a timely manner.

  • 00:29:25 I am adding the new tutorials, as I record them, to the end

  • 00:29:29 of the list.

  • 00:29:30 So, you will not be disappointed.

  • 00:29:33 You should also join our Discord from here.

  • 00:29:35 You can support me also with buy me a coffee.

  • 00:29:38 You can follow me on Medium, CivitAI, DeviantArt.

  • 00:29:41 You should definitely subscribe to our YouTube channel.

  • 00:29:43 You can follow me on LinkedIn.

  • 00:29:45 You can also purchase my Udemy course and you can follow me on Twitter as well.

  • 00:29:47 Hopefully, see you in another amazing tutorial video.
