A curated collection of cutting-edge research and resources for reasoning in Large Audio Language Models. Explore the frontier of multimodal reasoning through speech processing, chain-of-thought techniques, and audio-visual understanding.
🔥 Hot Topics
Multimodal Reasoning • Chain-of-Thought Prompting • Speech Translation • Emotion Recognition • Reinforcement Learning
| Title | Conference | Code | Highlights |
|---|---|---|---|
| Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model | Preprint | | Pioneering work on CoT in audio models |
| CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought | INTERSPEECH 2024 | | Multimodal CoT framework |
| Chain-of-thought prompting for speech translation | EMNLP 2024 | | Zero-shot speech translation |
| Title | Conference | Code | Highlights |
|---|---|---|---|
| Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models | ICASSP 2025 | | Dedicated audio reasoning architecture |
| Internalizing ASR with Implicit Chain of Thought for Efficient Speech-to-Speech Conversational LLM | ACL 2024 | | End-to-end speech conversation |
| Title | Conference | Code | Highlights |
|---|---|---|---|
| Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio QA | NeurIPS 2024 | | RL vs SFT comparison |
| Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and CoT | ICMI 2024 | | Emotion recognition stability |
- Multimodal Chain-of-Thought
  Exploring reasoning paths through audio-text interactions
- Speech Translation
  Enhancing translation quality with reasoning mechanisms
- Emotion Understanding
  Stable emotion recognition through contextual reasoning
- Efficient Architectures
  Optimizing model structures for real-time applications
Coming soon...
Submit your project!
Coming soon...
Suggest a dataset!
We welcome contributions!
Ways to contribute:
- Add missing papers (with summary)
- Suggest new categories
- Add dataset/resources section
- Improve documentation
This repository is licensed under CC-BY-4.0.
Please check individual paper licenses for specific usage.
Maintained with ❤️ by [SK-HUANG]
Last updated: 2025/3/21