
Add Intel Gaudi HPU Support #1023


Open: wants to merge 9 commits into base: main


BartoszBLL

This PR introduces support for Habana Gaudi (HPU) acceleration and improves device handling across the project. Key changes include:

New HPU-Compatible Dockerfile

  • Added Dockerfile.hpu to support Habana Gaudi with PyTorch.
  • Uses vault.habana.ai/gaudi-docker as the base image.
  • Installs necessary dependencies and sets environment variables for HPU.

Benchmarking on HPU

  • Introduced benchmark_hpu.py to compare execution times between CPU and HPU.
  • Uses habana_frameworks.torch for HPU acceleration.
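As a rough illustration of the kind of timing such a benchmark performs, here is a minimal sketch of a helper that averages wall-clock time over several runs. It is not the code in benchmark_hpu.py; the name `average_runtime` and its `sync` parameter are hypothetical. The optional `sync` callable stands in for whatever device synchronization is needed so that asynchronously queued accelerator work is included in the measurement.

```python
import time
from typing import Callable, Optional


def average_runtime(
    fn: Callable[[], object],
    runs: int = 3,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock time in seconds of `fn` over `runs` calls.

    `sync` is a hypothetical hook: pass a device synchronization function
    when the workload runs asynchronously on an accelerator.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued device work before stopping the clock
        total += time.perf_counter() - start
    return total / runs


if __name__ == "__main__":
    # Stand-in CPU workload; a real benchmark would run model inference here.
    seconds = average_runtime(lambda: sum(i * i for i in range(100_000)))
    print(f"average over 3 runs: {seconds:.4f} s")
```

The same helper can be called once per device, which keeps the CPU and HPU measurements on identical footing.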

Refactored Device Selection

  • Unified device selection via get_device() in utils/util.py.
  • Replaced multiple scattered torch.device("cuda" if torch.cuda.is_available() else "cpu") checks.
  • Now prioritizes HPU if available, then CUDA, falling back to CPU.
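The priority order described above can be sketched as follows. This is an illustration under stated assumptions, not the actual get_device() from utils/util.py: the helper names (`choose_device`, `probe_hpu`, `probe_cuda`, `get_device_name`) are hypothetical, and the HPU probe assumes the Habana PyTorch bridge exposes `habana_frameworks.torch.hpu.is_available()`. Imports are done lazily so the sketch also runs on machines without PyTorch or the Habana stack.

```python
import importlib.util


def choose_device(hpu_available: bool, cuda_available: bool) -> str:
    """Apply the HPU > CUDA > CPU priority to already-probed flags."""
    if hpu_available:
        return "hpu"
    if cuda_available:
        return "cuda"
    return "cpu"


def probe_hpu() -> bool:
    """True only if the Habana bridge is installed and reports a device."""
    if importlib.util.find_spec("habana_frameworks") is None:
        return False
    try:
        # Assumption: the bridge provides this module and is_available().
        import habana_frameworks.torch.hpu as hthpu
    except ImportError:
        return False
    return bool(hthpu.is_available())


def probe_cuda() -> bool:
    """True only if PyTorch is installed and sees a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


def get_device_name() -> str:
    """Return 'hpu', 'cuda', or 'cpu', in that order of preference."""
    return choose_device(probe_hpu(), probe_cuda())


if __name__ == "__main__":
    print(get_device_name())
```

Separating the priority rule from the hardware probes keeps the rule testable on any machine, and callers can wrap the returned name in `torch.device(...)` where a device object is needed.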

Model Loading Adjustments

  • Models are now loaded on the CPU first and then moved to the selected device, which prevents device-mismatch errors when loading state dictionaries saved on different hardware.
  • Updated synthesizer, vocoder, encoder, and other model scripts accordingly.
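A minimal sketch of the load-on-CPU-then-transfer pattern, assuming the checkpoint file holds a plain state dictionary (the project's real checkpoints may wrap it in additional keys). The function name `load_checkpoint_to_device` is hypothetical. `torch.load(..., map_location="cpu")` places all checkpoint tensors on the CPU regardless of the device they were saved from, and `model.to(device)` then performs the transfer.

```python
from pathlib import Path


def load_checkpoint_to_device(model, checkpoint_path, device):
    """Load weights on CPU first, then move the model to `device`.

    Assumption: the file at `checkpoint_path` contains a plain state dict.
    """
    # Imported lazily so this helper can be defined without PyTorch installed.
    import torch

    # map_location="cpu" avoids needing the device the checkpoint was saved on.
    state_dict = torch.load(Path(checkpoint_path), map_location="cpu")
    model.load_state_dict(state_dict)
    # For "hpu", the Habana PyTorch bridge must already have been imported.
    return model.to(device)
```

A call might look like `load_checkpoint_to_device(model, "encoder.pt", "cpu")`, where the file name is purely illustrative.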

Dependency Updates

  • Updated requirements.txt to allow NumPy versions >=1.21.0 for better compatibility.

Why

  • Adds Habana Gaudi acceleration support for improved training and inference performance.
  • Standardizes device management for easier maintainability.
  • Enhances model compatibility across different hardware configurations.

Testing

  • Verified model loading and inference on HPU, CUDA, and CPU.
  • Benchmark (3 runs averaged) shows roughly a 16.2× speedup on HPU vs CPU:
    • HPU: ~23.04 seconds
    • CPU: ~374.27 seconds
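For reference, the speedup is the CPU average divided by the HPU average. The short check below uses the rounded averages listed above; the function name `speedup` is hypothetical. The result is about 16.24, in line with the reported figure once rounding of the listed times is taken into account.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated_seconds must be positive")
    return baseline_seconds / accelerated_seconds


if __name__ == "__main__":
    cpu_avg = 374.27  # seconds, rounded average from the benchmark above
    hpu_avg = 23.04   # seconds, rounded average from the benchmark above
    print(f"{speedup(cpu_avg, hpu_avg):.2f}x")  # prints 16.24x
```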
