RFC: Deprecating AutoDeploy Backend
Rationale and Improving Model Support Moving Forward
Over the last year or so, the TensorRT LLM team has been testing AutoDeploy as a compiler-based approach to supporting models closer to their release date. Based on external usage and internal engineering alignment, the team plans to deprecate the AutoDeploy backend in TensorRT LLM. Once deprecated, the backend will receive no further feature development or new model support. After a 3-month period, the backend will be removed from TensorRT LLM.
We are aware that earlier model support is a critical priority for many users of TensorRT LLM, and we are working on agentic approaches to reduce the time to functional model support in the PyTorch backend. As an early indicator, this approach was used to release functional support for Minimax M3 on Day 0.
We plan to keep improving the modeling agent so that it can implement more models and more features, and to release it to users once we are more confident in its reliability.
Retrospective
Successes
- Over 100 LLMs and VLMs supported out of the box
- Performance comparable to manually tuned baselines for some models
- Successful production deployments with a select set of customers
Why AutoDeploy didn't win as a product feature
- AutoDeploy’s value proposition (less effort and time to support a new model) was weakened by agents
  - Agentic coding has reduced the cost of manual implementation
- Automation stopped short of the last mile
  - AutoDeploy automated graph-level work, but peak performance still relies on human- and agent-written kernel optimizations
Feedback
We welcome feedback on this change and would like to hear about any use cases that the PyTorch backend does not currently support. Our plan is for the PyTorch backend to reach parity on every feature for which users currently rely on the AutoDeploy backend.