### TL;DR
Runtime optimization in torch-tensorrt is crucial for maximizing model performance in real-world applications.
This story tracks the effort to improve runtime performance.
### Goal(s)
- Understand the overhead in the C++/Python runtime modules and improve inference performance (see the profiling sketch after this list)
- Ensure the optimizations have no or minimal impact on accuracy and resource usage
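For reference, a minimal sketch of how the Python-side runtime overhead could be measured: it compares eager vs. compiled latency with CUDA events and uses `torch.profiler` to separate TensorRT engine execution from the surrounding wrapper time. The model, input shape, precision, and iteration counts below are illustrative placeholders, not taken from the linked PRs/issues.

```python
import torch
import torch_tensorrt

# Placeholder model and input; any CUDA-capable module would do.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).eval().cuda()
example_input = torch.randn(8, 1024, device="cuda")

# Compile with the dynamo frontend; IR and precision choices here are illustrative.
trt_module = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[example_input],
    enabled_precisions={torch.float16},
)

def time_module(mod, inp, iters=200, warmup=50):
    """Average latency in ms per call, measured with CUDA events."""
    for _ in range(warmup):
        mod(inp)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        mod(inp)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

print(f"eager : {time_module(model, example_input):.3f} ms / iter")
print(f"trt   : {time_module(trt_module, example_input):.3f} ms / iter")

# A profiler trace shows how much of each call is spent inside the TensorRT
# execution op versus the Python/C++ runtime wrapper around it.
with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ]
) as prof:
    for _ in range(20):
        trt_module(example_input)
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```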
### Tasks
- [ ] https://github.com/pytorch/TensorRT/pull/3276
- [ ] https://github.com/pytorch/TensorRT/issues/3277
### Additional context