### 🚀 The feature, motivation and pitch

- [ ] Object Detection: YOLOv8 to YOLOv10, detr-resnet-50, YOLOS, EfficientDet, MobileViT, OWL-ViT
- [ ] Depth Estimation: Depth Anything 2
- [ ] TTS: Kokoro, parler-tts, microsoft/speecht5_tts
- [ ] Audio: Whisper, Wav2Vec2, AST, CLAP
- [ ] OCR: TrOCR, PaddlePaddle/PP-OCRv5_mobile_rec
- [ ] Image Super-Resolution: Swin2SR, Real-ESRGAN
- [ ] Image Understanding: OpenCLIP, SegFormer
- [ ] Image Reasoning: SmolVLM-256M-Instruct
- [ ] Semantic Text Search: sentence-transformers/all-MiniLM-L6-v2, BAAI/bge-base-en-v1.5
- [ ] Text Sentiment Analysis: RoBERTa

### Alternatives

_No response_

### Additional context

_No response_

### RFC (Optional)

_No response_