PSLLM is a PowerShell module for managing and interacting with a locally hosted Large Language Model (LLM) using the Cortex server. It enables AI-driven text generation, conversation management, Retrieval-Augmented Generation (RAG), and local model operations.
- AI Completion & Conversations: Generate AI responses and manage chat threads.
- Configuration & Server Control: Install, start, stop, and configure the LLM server.
- Model & Engine Management: Install, start, stop, and retrieve models and engines.
- File & RAG Integration: Upload and retrieve files for AI-augmented searches.
PSLLM should be used for...
- Sensitive Data: Completely local LLMs - no data ever leaves the computer.
- Asynchronous Workflows: With e.g., scheduled tasks or in potentially long-running scripts.
- Bulk Operations: Because it can be scheduled and run in the background, it is well suited to running the same LLM operation (or several) over an array of inputs; see the sketch after this list.
- Cost-sensitive Automation: It is free - what more could you want?
- PowerShell Integrations: Everything you can access from PowerShell (local and Internet) can be used in the LLM workflow, e.g., as input data or output mechanism.
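As a minimal sketch of such a bulk workflow (the input file, property names, and prompt are hypothetical; Get-PSLLMCompletion is the only PSLLM call used):
# Hypothetical list of inputs; each line becomes one prompt.
$topics = Get-Content -Path 'C:\data\topics.txt'
$results = foreach ($topic in $topics) {
    [PSCustomObject]@{
        Topic  = $topic
        Answer = Get-PSLLMCompletion -Message "Summarize the topic '$topic' in three sentences."
    }
}
# Collect everything into a CSV for later review.
$results | Export-Csv -Path 'C:\data\answers.csv' -NoTypeInformation
Because everything runs locally, a loop like this can be scheduled as a background task without per-request costs.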
PSLLM should not be used for...
- Acting as a Chatbot: Speed depends heavily on your hardware. A cloud GPU cluster will be faster, but not every workflow depends on speed. And at the pace models currently advance in quality and speed, this will not be an issue for long.
- PowerShell 5.1
- Internet for installation, not for LLM usage
Install-Module -Name PSLLM -Scope CurrentUser
- Download the latest release from GitHub Releases.
- Extract the module to your PowerShell modules directory (a path listed in $env:PSModulePath).
- Import the module:
Import-Module PSLLM
Get-PSLLMCompletion -Message "What is the capital of France?"
On the first run, the following happens:
- Download the Cortex Windows installer (~1.3 GB) and install Cortex.
- Download the default engine (llama-cpp).
- Download and load the default model (Mistral 7B). Model size depends on the number of parameters as well as the quantization level; check out Managing Models for more information.
- Generate the response.
Subsequent executions start the server (if it is not already running), load the model (if it is not already loaded), and generate the response.
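If you prefer to warm up the server and model explicitly, for example at the beginning of a scheduled task, a sketch like the following works (engine and model names are just examples):
# Start the server and load a model up front, then request a completion.
Start-PSLLMServer -EngineName "llama-cpp" -ModelName "mistral:7b"
Get-PSLLMCompletion -Message "What is the capital of France?"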
This command starts or adds to a multi-turn conversation. It sends the whole thread to the LLM and adds the new message as well as the AI answer to the thread.
Enter-PSLLMConversation -Message "Explain list comprehensions in Python" -ThreadName "Python Basics"
Display the whole thread:
Get-PSLLMThreadMessages -ThreadName "Python Basics" -FormatAsChatHistory
Model selection can be tricky because the options are vast. The recommendation is to start with the models specially prepared by Cortex.so, available on HuggingFace. Any model in the *.gguf format can be used, but let's start with the Cortex.so models.
The easiest way is through their model page. Copy the command for the model you'd like to try (e.g., cortex run llama3.2). After installing Cortex (run Install-PSLLMServer in PowerShell), open a command prompt and run that command.
You should then be presented with a selection of models. In this example:
Available to download:
1. llama3.2:1b
2. llama3.2:3b
Copy the name of the model in the size and quantization you want (e.g., "llama3.2:3b"); for size reference, check the table below.
This name can then be used as the $ModelName parameter within the PowerShell module. The command prompt can now be closed.
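For example, the copied name can be installed right away and stored as the default model (both cmdlets are part of PSLLM; calling Save-PSLLMConfig with only the model name is an assumption):
# Download the selected model and make it the module default.
Install-PSLLMModel -ModelName "llama3.2:3b"
Save-PSLLMConfig -ModelName "llama3.2:3b"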
Approximate model sizes based on the number of parameters (P) and the quantization level (Q):
| P/Q | q2 | q3 | q4 | q5 | q6 | q8 |
|-----|-------|-------|-------|-------|-------|-------|
| 1B  | 0.6GB | 0.7GB | 0.8GB | 0.9GB | 1GB   | 1.3GB |
| 3B  | 1.4GB | 1.7GB | 2GB   | 2.3GB | 2.6GB | 3.4GB |
| 7B  | 2.7GB | 3.5GB | 4.3GB | 5.1GB | 6GB   | 7.7GB |
| 14B | 5.7GB | 7.3GB | 9GB   | 10GB  | 12GB  | 16GB  |
| 32B | 12GB  | 16GB  | 19GB  | 23GB  | 27GB  | 35GB  |
| 70B | 26GB  | 34GB  | 42GB  | 50GB  | N/A   | N/A   |
This is also roughly the amount of physical memory (RAM, not GPU) needed to run the models. Inference can be run on GPUs as well as CPUs; the primary difference is speed.
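To check what your machine offers before picking a size, you can query the server's hardware information (the exact properties returned depend on the Cortex server):
# Show the hardware the local server detects (CPU, memory, GPUs).
Get-PSLLMHardwareInfo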
Some parameters that are used throughout the module can be stored centrally. This eliminates the need to specify them each time.
This example enables logging to '$env:localappdata\PSLLM\PSLLM.log' and sets the Llama 3.2 model with 3 billion parameters as the default. If the model is not already installed, it will be downloaded and loaded when needed.
Save-PSLLMConfig -Logging $true -ModelName 'llama3.2:3b'
For all configuration options, see Save-PSLLMConfig.
For interactive usage, for example during development, it is highly recommended to make use of the '-Verbose' parameter, available for every PSLLM function.
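For example, with the completion call from above:
Get-PSLLMCompletion -Message "What is the capital of France?" -Verbose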
See the full command reference for details on available cmdlets.
Contributions are welcome!
This project is licensed under the Apache License 2.0. See the LICENSE file for details. PSLLM is built on top of other open source projects, most directly on Cortex.so. Therefore, PSLLM uses the same license.
All the large language models that can be used are open source, but they are licensed individually and differently. You are responsible for researching and understanding the license requirements of the configuration you use (check HuggingFace, for example).
For issues, please open a ticket on the GitHub Issues page.
- Start-PSLLMServer
- Stop-PSLLMServer
- Install-PSLLMServer
- Uninstall-PSLLMServer
- Get-PSLLMHardwareInfo
- Test-PSLLMHealth
- Add-PSLLMThreadMessage
- Get-PSLLMThreadMessages
- Get-PSLLMThread
- Get-PSLLMThreads
- New-PSLLMThread
- Remove-PSLLMThread
- Get-PSLLMModel
- Get-PSLLMModels
- Start-PSLLMModel
- Stop-PSLLMModel
- Install-PSLLMModel
- Remove-PSLLMModel
- Get-PSLLMEngine
- Get-PSLLMEngineReleases
- Start-PSLLMEngine
- Update-PSLLMEngine
- Stop-PSLLMEngine
- Install-PSLLMEngine
- Uninstall-PSLLMEngine
Retrieves an AI-generated response from a local language model via the Cortex server.
This advanced function interacts with a local AI model through the Cortex server's chat completion endpoint. It supports:
- Single message with system context
- Multiple message thread conversations
- Customizable model parameters
- Synchronous and asynchronous processing modes
- Optional detailed response metadata
Key Capabilities:
- Supports both single message and multi-message conversation contexts
- Configurable model parameters (temperature, max tokens)
- Async processing with file or window output
- Detailed response tracking and logging
Type: String
Description: A single user message to send to the language model. Used when not providing a full message thread.
Type: Object
Description: An array of message objects representing a conversation thread. Allows for more complex conversational contexts.
Type: String
Description: Specifies the AI model to use. Defaults to the model configured in the system settings.
Type: String
Description: Optional. Specifies a particular engine for model processing.
Type: String
Description: Defines the system role or persona for the AI. Defaults to a helpful assistant persona.
Type: Int32
Description: Maximum number of tokens in the AI's response. Controls response length. Default is 2048.
Type: Single
Description: Controls response randomness (0.0-1.0):
- 0.0: Deterministic, focused responses
- 1.0: Maximum creativity and variation. Default is 0.8.
Type: Single
Description: Controls token selection probability. Influences response diversity. Lower values make responses more focused, higher values increase variability. Default is 0.95.
Type: SwitchParameter
Description: When specified, returns comprehensive metadata about the response instead of just the text.
Type: SwitchParameter
Description: Enables asynchronous processing of the request.
Type: String
Description: Specifies async output method: "File", "Window", or "Both".
Type: String
Description: Directory for storing async results. Defaults to %LOCALAPPDATA%\PSLLM.
Type: SwitchParameter
Description: If set, saves the response to a JSON file in the DataDirectory.
Type: Object
Description: Configuration object containing system settings.
Basic usage with a simple question
Get-PSLLMCompletion -Message "What is the capital of France?"
Retrieve detailed response metadata
Get-PSLLMCompletion -Message "Explain quantum computing" -Detailed
Async processing with window display
Get-PSLLMCompletion -Message "Generate a Python script" -Async -AsyncType Window
Complex conversation thread
$thread = @(
@{ role = "user"; content = "Explain machine learning" },
@{ role = "assistant"; content = "Machine learning is..." }
)
Get-PSLLMCompletion -Messages $thread -Temperature 0.7
Continues or starts a conversation (thread) with a local Language Model (LLM).
The Enter-PSLLMConversation function allows you to interact with a local language model by sending a message and receiving a response. It manages conversation threads, creating a new thread if one doesn't exist or adding to an existing thread with the specified title.
Key features:
- Automatically creates a new thread if the specified title doesn't exist
- Adds user message to the thread
- Generates an AI response using the specified or default model
- Adds the AI response back to the thread
- Supports customization of model parameters like temperature and max tokens
Type: String
Description: The user input message for which the AI model will generate a response. This is a mandatory parameter.
Type: String
Description: The name of the conversation to create or add to. This helps in organizing and tracking multiple conversations.
Type: String
Description: Optional. The name of the AI model to use for generating responses. If not specified, uses the model from the configuration.
Type: String
Description: Optional. The initial system role or persona that defines the AI's behavior. Defaults to "You are a helpful assistant."
Type: Int32
Description: Optional. Maximum number of tokens in the AI's response. Defaults to 2048. Controls the length of the generated response.
Type: Single
Description: Optional. Controls the randomness of the AI's response. Range is 0.0-1.0.
- Lower values (closer to 0) make the output more focused and deterministic
- Higher values (closer to 1) make the output more creative and varied. Defaults to 0.8.
Type: Single
Description: Optional. Controls the cumulative probability cutoff for token selection.
- Helps in controlling the diversity of the generated text
- Defaults to 0.95
Type: Object
Description: Optional. The configuration object containing settings for the LLM interaction. If not provided, the function will import the default configuration.
Start a new conversation about Python programming
Enter-PSLLMConversation -Message "Explain list comprehensions in Python" -ThreadName "Python Basics"
Continue an existing conversation with more context
Enter-PSLLMConversation -Message "Can you provide an example of a list comprehension?" -ThreadName "Python Basics" -Temperature 0.5
Use a specific model with custom settings
Enter-PSLLMConversation -Message "Write a short poem about technology" -ThreadName "Creative Writing" -ModelName "mistral:7b" -MaxTokens 2048 -Temperature 0.9
Retrieves relevant content from RAG (Retrieval-Augmented Generation) storage based on input text.
Uses embeddings to find and retrieve the most semantically similar content from previously stored RAG data. Calculates cosine similarity between the input text and stored embeddings to identify the most relevant content.
Type: String
Description: The input text to find similar content for.
Type: String
Description: The RAG group to search in. Defaults to "Default".
Type: String
Description: Optional. The name of the model to use. If not specified, uses the model from configuration.
Type: String
Description: Optional. The name of the engine to use. If not specified, uses the engine from configuration.
Type: Object
Description: The current configuration object.
Retrieves content most similar to the question about virtual machines.
Get-PSLLMRAGContent -Text "How do I create a new virtual machine?"
Searches for content about Azure Storage in the AzureDocs RAG group.
Get-PSLLMRAGContent -Text "What is Azure Storage?" -RAGGroup "AzureDocs"
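As a sketch of a complete RAG round trip (the file path, RAG group, and prompt wording are hypothetical; the three cmdlets and their parameters are part of PSLLM, and the retrieved content is assumed to come back as text that can be embedded in a prompt):
# Upload a document into a RAG group, retrieve the most relevant content, and use it as context.
Add-PSLLMFile -FilePath "C:\docs\vm-howto.txt" -RAGGroup "VMDocs"
$context = Get-PSLLMRAGContent -Text "How do I create a new virtual machine?" -RAGGroup "VMDocs"
Get-PSLLMCompletion -Message "Answer using only the following context:`n$context`n`nQuestion: How do I create a new virtual machine?"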
Imports the PSLLM configurations.
Imports the PSLLM configurations from a JSON file in the local AppData directory.
Import-PSLLMConfig
Saves the PSLLM configurations.
Saves the PSLLM configurations to a JSON file in the local AppData directory.
Type: String
Description: The name of the engine to use.
Type: String
Description: The name of the model to use.
Type: Boolean
Description: Whether verbose outputs are logged to a file.
Type: String
Description: Base URI of the Cortex server. Defaults to "http://127.0.0.1:39281".
Save-PSLLMConfig -EngineName "llama-cpp" -ModelName "mistral:7b" -Logging $true
Starts the local LLM server with specified engine and model.
Initializes and starts the local LLM server, installing components if necessary. This function will:
- Install the server if not present
- Start the server process if not running
- Install and start the specified engine
- Install and start the specified model
Type: String
Description: The name of the engine to use. Must be one of: 'llama-cpp', 'onnxruntime', or 'tensorrt-llm'. If not specified, uses the engine from configuration.
Type: String
Description: The name of the model to load. If not specified, uses the model from configuration.
Type: Boolean
Description: Determines whether only the server is started or the model is loaded as well. Defaults to starting the server only.
Type: SwitchParameter
Description: Restarts the server: stops it first, then starts it again.
Type: Object
Description: The current configuration object.
Starts server with default engine and model from config
Start-PSLLMServer
Starts server with specific engine and model
Start-PSLLMServer -EngineName "llama-cpp" -ModelName "mistral:7b"
Stops the local LLM server process.
Sends a request to gracefully stop the local LLM server process.
Type: Object
Description: The current configuration object.
Stop-PSLLMServer
Installs the Cortex server for local LLM operations.
Downloads and installs the Cortex server application required for running local LLM operations. This function handles the complete installation process including:
- Checking for existing installation
- Downloading the installer (~1.3 GB)
- Running the installation
- Verifying the installation
Type: SwitchParameter
Description: If specified, skips confirmation prompts and proceeds with download and installation. Use this for automated installations.
Type: String
Description: The address from where to download the latest Cortex Windows installer.
Type: Object
Description: The current configuration object.
Interactively installs the server with confirmation prompts
Install-PSLLMServer
Installs the server without confirmation prompts
Install-PSLLMServer -Force
Installs the server with detailed progress information
Install-PSLLMServer -Verbose
Removes the Cortex application from the system.
Uninstalls the Cortex server and optionally deletes its associated data directory. The function identifies the uninstaller in the application directory and executes it silently.
If specified, the data directory is also deleted to ensure a clean uninstallation.
Type: SwitchParameter
Description: Skips confirmation prompts and directly executes the uninstallation.
Type: SwitchParameter
Description: Removes the data directory after uninstallation.
Type: String
Description: Specifies the path to the data directory. Defaults to %LOCALAPPDATA%\PSLLM.
Type: Object
Description: The current configuration object.
Uninstalls the Cortex server and deletes its data directory.
Uninstall-PSLLMServer -DeleteData
Retrieves hardware information from the local LLM server.
Gets information about the hardware configuration and capabilities of the local LLM server.
Type: Object
Description: The current configuration object.
Get-PSLLMHardwareInfo
Tests the health status of the local LLM server.
Performs a health check on the local LLM server by making a request to the health endpoint. This function will return the server's health status and can be used to verify connectivity and server readiness.
Type: Object
Description: The current configuration object.
Test-PSLLMHealth
Adds a message to a chat thread.
Adds a new message to a specified chat thread using either its ID or title. Can optionally create the thread if it doesn't exist.
Type: Object
Description: The whole thread to add the message to.
Type: String
Description: The ID of the thread to add the message to.
Type: String
Description: The title of the thread to add the message to.
Type: String
Description: The content of the message to add.
Type: String
Description: The role of the message sender. Can be either "system", "user" or "assistant".
Type: SwitchParameter
Description: If specified, creates a new thread with the given name if it doesn't exist.
Type: Object
Description: The current configuration object.
Add-PSLLMThreadMessage -ThreadId "thread-123456" -Message "Hello!"
Add-PSLLMThreadMessage -ThreadName "My Chat" -Message "Hi there" -CreateThreadIfNotExists
Retrieves messages from a chat thread.
Gets all messages from a specified chat thread using either its ID or title. Can optionally format the messages as a chat history.
Type: Object
Description: The whole thread to retrieve messages from.
Type: String
Description: The ID of the thread to retrieve messages from.
Type: String
Description: The title of the thread to retrieve messages from.
Type: SwitchParameter
Description: If specified, formats the output as a readable chat history.
Type: Object
Description: The current configuration object.
Get-PSLLMThreadMessages -ThreadId "thread-123456"
Get-PSLLMThreadMessages -ThreadName "My Chat" -FormatAsChatHistory
Retrieves a specific chat thread by title.
Gets a chat thread from the local LLM server using its title.
Type: String
Description: The name of the thread to retrieve.
Type: Object
Description: The current configuration object.
Get-PSLLMThread -ThreadName "My Chat Session"
Retrieves all chat threads from the local LLM server.
Gets a list of all available chat threads from the local LLM server.
Type: Object
Description: The current configuration object.
Get-PSLLMThreads
Creates a new chat thread.
Creates a new chat thread on the local LLM server with the specified title. Optionally can reuse an existing thread if one exists with the same title.
Type: String
Description: The name for the new thread.
Type: SwitchParameter
Description: If specified, will return an existing thread with the same title instead of creating a new one.
Type: Object
Description: The current configuration object.
New-PSLLMThread -ThreadName "New Chat Session"
New-PSLLMThread -ThreadName "My Chat" -ReuseExisting
Removes a chat thread from the local LLM server.
Deletes a specified chat thread from the local LLM server using either its ID or title.
Type: Object
Description: The whole thread to remove.
Type: String
Description: The ID of the thread to remove.
Type: String
Description: The title of the thread to remove.
Type: Object
Description: The current configuration object.
Type: SwitchParameter
Description:
Type: SwitchParameter
Description:
Remove-PSLLMThread -ThreadId "thread-123456"
Remove-PSLLMThread -ThreadName "My Chat Session"
Uploads a file to the local LLM server.
Uploads a specified file to the local LLM server for use with assistants or other purposes. Supports various file purposes and handles the multipart form data upload.
Type: String
Description: The path to the file to upload.
Type: String
Description: The purpose of the file. Defaults to "assistants".
Type: String
Description: The RAG group to add the file to. Defaults to "Default".
Type: Int32
Description: The size of the chunks to embed. Defaults to 1024.
Type: String
Description: Optional. The name of the model to use. If not specified, uses the model from configuration.
Type: Object
Description: The current configuration object.
Add-PSLLMFile -FilePath "C:\data\context.txt"
Add-PSLLMFile -FilePath "C:\data\training.json" -Purpose "fine-tuning"
Retrieves the content of a file from the local LLM server.
Gets the content of a specified file from the local LLM server using its file ID.
Type: String
Description: The ID of the file to retrieve.
Type: Object
Description: The current configuration object.
Get-PSLLMFileContent -FileId "file-123456"
Retrieves a list of files available on the local LLM server.
Gets all files that have been uploaded to the local LLM server for use with assistants or other purposes.
Type: Object
Description: The current configuration object.
Get-PSLLMFiles
Removes a file from the local LLM server.
Deletes a specified file from the local LLM server using its file ID.
Type: String
Description: The ID of the file to remove.
Type: Object
Description: The current configuration object.
Remove-PSLLMFile -FileId "file-123456"
Retrieves a specific model by name.
Gets a model from the local LLM server using its name.
Type: String
Description: The name of the model to retrieve.
Type: Object
Description: The current configuration object.
Get-PSLLMModel -ModelName "tinyllama"
Retrieves all available models from the local LLM server.
Gets a list of all models that are available on the local LLM server.
Type: Object
Description: The current configuration object.
Get-PSLLMModels
Starts a model on the local LLM server.
Initializes and starts a specified model on the local LLM server. If the model is not already installed, it will be downloaded and installed first. This function handles the complete lifecycle of getting a model ready for use, including:
- Checking if the model exists
- Installing if necessary
- Starting the model
- Verifying the model is running
Type: String
Description: The name and version of the model to start, in the format "name:version". If not specified, uses the model from configuration.
Type: Object
Description: The current configuration object.
Starts the default model specified in configuration
Start-PSLLMModel
Starts the specified model
Start-PSLLMModel -ModelName "mistral:7b"
Stops a running model on the local LLM server.
Gracefully stops a specified model that is running on the local LLM server.
Type: Object
Description: The model to stop.
Type: String
Description: The name of the model to stop.
Type: Object
Description: The current configuration object.
Stop-PSLLMModel -ModelName "mistral:7b"
Installs a new model on the local LLM server.
Downloads and installs a specified model on the local LLM server for use with chat completions and other tasks. Choose any model from "https://cortex.so/models".
Type: String
Description: The name of the model to install.
Type: Object
Description: The current configuration object.
Install-PSLLMModel -ModelName "mistral:7b"
Removes a model from the local LLM server.
Deletes a specified model from the local LLM server using either its ID or name.
Type: Object
Description: The whole model to remove.
Type: String
Description: The ID of the model to remove.
Type: String
Description: The name of the model to remove.
Type: Object
Description: The current configuration object.
Type: SwitchParameter
Description:
Type: SwitchParameter
Description:
Remove-PSLLMModel -ModelId "model-123456"
Remove-PSLLMModel -ModelName "mistral:7b"
Retrieves the requested LLM engine from the local server.
Gets the requested LLM engine (llama-cpp, onnxruntime, tensorrt-llm) from the local server.
Type: String
Description: The name of the engine to use.
Type: Object
Description: The current configuration object.
Get-PSLLMEngine -EngineName "llama-cpp"
Retrieves all available releases for a specific LLM engine.
Gets a list of all releases for the specified LLM engine from the local server.
Type: String
Description: The name of the engine (llama-cpp, onnxruntime, or tensorrt-llm).
Type: SwitchParameter
Description: Switch to only get the latest release.
Type: Object
Description: The current configuration object.
Get-PSLLMEngineReleases -EngineName "llama-cpp"
Loads and starts a specific LLM engine on the local server.
Initializes and starts the specified LLM engine on the local server.
Type: String
Description: The name of the engine to start (llama-cpp, onnxruntime, or tensorrt-llm).
Type: Object
Description: The current configuration object.
Start-PSLLMEngine -EngineName "llama-cpp"
Updates a specific LLM engine on the local server.
Updates the specified LLM engine to the latest version on the local server.
Type: Object
Description: The engine to update.
Type: String
Description: The name of the engine to update (llama-cpp, onnxruntime, or tensorrt-llm).
Type: Object
Description: The current configuration object.
Update-PSLLMEngine -EngineName "llama-cpp"
Stops a loaded engine on the local LLM server.
Gracefully stops a specified engine that is running on the local LLM server.
Type: Object
Description: The engine to stop.
Type: String
Description: The name of the engine to stop.
Type: Object
Description: The current configuration object.
Stop-PSLLMEngine -EngineName "llama-cpp"
Installs a specific LLM engine on the local server.
Downloads and installs the specified LLM engine on the local server.
Type: String
Description: The name of the engine to install (llama-cpp, onnxruntime, or tensorrt-llm).
Type: Object
Description: The current configuration object.
Install-PSLLMEngine -EngineName "llama-cpp"
Uninstalls a specific LLM engine from the local server.
Removes the specified LLM engine from the local server.
Type: Object
Description: The engine to uninstall.
Type: String
Description: The name of the engine to uninstall (llama-cpp, onnxruntime, or tensorrt-llm).
Type: Object
Description: The current configuration object.
Uninstall-PSLLMEngine -EngineName "llama-cpp"