
Brax + PPO integration #313

Draft
vwxyzjn wants to merge 2 commits into master from brax



@vwxyzjn (Owner) commented Nov 6, 2022

Description

This PR tests an integration with Brax. It appears to work out of the box, without having to implement observation normalization; see this tracked run:
https://wandb.ai/costa-huang/cleanRL/runs/2aemjwey?workspace=user-costa-huang
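For context, observation normalization in many PPO implementations (for example, the `gym.wrappers.NormalizeObservation` wrapper used by cleanRL's `ppo_continuous_action.py`) keeps a running mean and variance of observations and standardizes each observation with them. This PR does not implement it. The block below is only an illustrative JAX sketch of what such a normalizer could look like if it turned out to be needed; `RunningStats`, `init_stats`, `update_stats`, and `normalize` are hypothetical names, not part of this PR or of Brax.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class RunningStats(NamedTuple):
    mean: jnp.ndarray   # per-dimension running mean
    var: jnp.ndarray    # per-dimension running (population) variance
    count: jnp.ndarray  # samples seen so far (starts at a tiny epsilon)


def init_stats(obs_dim: int) -> RunningStats:
    # Same initialization convention as gym's RunningMeanStd: mean 0, var 1, count ~0.
    return RunningStats(jnp.zeros(obs_dim), jnp.ones(obs_dim), jnp.asarray(1e-4))


@jax.jit
def update_stats(stats: RunningStats, batch: jnp.ndarray) -> RunningStats:
    # Merge a batch of shape (N, obs_dim) into the running statistics using the
    # parallel mean/variance combination formula.
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0)
    batch_count = batch.shape[0]
    delta = batch_mean - stats.mean
    total = stats.count + batch_count
    new_mean = stats.mean + delta * batch_count / total
    m2 = (stats.var * stats.count + batch_var * batch_count
          + delta**2 * stats.count * batch_count / total)
    return RunningStats(new_mean, m2 / total, total)


@jax.jit
def normalize(stats: RunningStats, obs: jnp.ndarray, clip: float = 10.0) -> jnp.ndarray:
    # Standardize with the running statistics, then clip to [-clip, clip].
    return jnp.clip((obs - stats.mean) / jnp.sqrt(stats.var + 1e-8), -clip, clip)


# Usage: feed two batches of fake 3-dimensional observations.
key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
batch_a = 2.0 * jax.random.normal(key_a, (1000, 3)) + 1.0
batch_b = 2.0 * jax.random.normal(key_b, (1000, 3)) + 1.0
stats = update_stats(update_stats(init_stats(3), batch_a), batch_b)
normalized = normalize(stats, batch_b)
```

Because the merge step only needs per-batch means and variances, it fits naturally with the batched observations that a vectorized Brax environment returns.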


Compilation takes ~400 seconds, and reaching a return of ~6000 in Ant takes about 100 seconds on a GPU. For comparison, the official Brax demo takes about 30 seconds to compile and about 80 seconds to reach ~8000 (presumably on a TPU). Our compilation is significantly slower, most likely because we don't use lax.scan or jax.lax.fori_loop, but once compilation finishes, throughput is about 600k steps per second (SPS).
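The compile-time explanation above comes down to how `jax.jit` traces Python loops: a plain `for` loop is unrolled during tracing, so the compiled program contains one copy of the step function per iteration, whereas `jax.lax.scan` and `jax.lax.fori_loop` trace the loop body once. The sketch below illustrates the difference on a toy problem; `env_step` and `policy` are hypothetical stand-ins (a linear system and a linear policy), not Brax's environment API or the code in this PR.

```python
import jax
import jax.numpy as jnp

NUM_STEPS = 64  # rollout length; the unrolled program grows linearly with this


def env_step(state, action):
    # Hypothetical stand-in for an environment step: a toy linear system.
    next_state = 0.99 * state + 0.1 * action
    reward = -jnp.sum(next_state**2)
    return next_state, reward


def policy(params, state):
    # Hypothetical deterministic linear policy with tanh-squashed actions.
    return jnp.tanh(state @ params)


@jax.jit
def rollout_unrolled(params, state):
    # Python loop: tracing runs the body NUM_STEPS times, so XLA receives
    # NUM_STEPS copies of env_step/policy to compile.
    rewards = []
    for _ in range(NUM_STEPS):
        state, reward = env_step(state, policy(params, state))
        rewards.append(reward)
    return state, jnp.stack(rewards)


@jax.jit
def rollout_scan(params, state):
    # lax.scan: the body is traced once and compiled into a single loop;
    # per-step rewards are stacked into an array of shape (NUM_STEPS,).
    def body(carry, _):
        next_state, reward = env_step(carry, policy(params, carry))
        return next_state, reward

    return jax.lax.scan(body, state, None, length=NUM_STEPS)


@jax.jit
def total_reward_fori(params, state):
    # lax.fori_loop: also traced once, but only the final carry is returned,
    # so rewards are accumulated into a running sum instead of stacked.
    def body(_, carry):
        s, total = carry
        s, reward = env_step(s, policy(params, s))
        return s, total + reward

    return jax.lax.fori_loop(0, NUM_STEPS, body, (state, jnp.zeros(())))


params = 0.5 * jnp.eye(4)
state = jnp.ones(4)
final_a, rewards_a = rollout_unrolled(params, state)
final_b, rewards_b = rollout_scan(params, state)
final_c, total_c = total_reward_fori(params, state)
```

All three compute the same rollout. Comparing lowered program sizes, e.g. `len(rollout_unrolled.lower(params, state).as_text())`, shows the unrolled version growing with `NUM_STEPS` while the scanned version does not. `lax.scan` is usually the better fit for rollouts because it also stacks per-step outputs such as rewards.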

CC @joaogui1

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (e.g., the original paper or another reference implementation).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
    • I have updated the overview sections in the docs and the repo
  • I have updated the tests accordingly (if applicable).

vercel bot commented Nov 6, 2022

The latest updates on your projects.

Name: cleanrl | Status: ✅ Ready | Preview: Visit Preview | Updated: Nov 6, 2022 at 9:28 PM (UTC)

@Surya-77

Hi @vwxyzjn,

I hope you're doing well. I was reviewing the Brax + PPO integration PR (#313) and noticed that it is currently closed. Were there any difficulties in merging this change into the main repository? Also, is there an updated version of this integration that addresses any outstanding issues or incorporates newer changes? Looking forward to your response.

Best regards,
Surya

