Conversation
Hi @vwxyzjn, I hope you're doing well. I was reviewing the PR (Brax + PPO integration #313) and noticed that it's currently closed. I wanted to check whether there have been any difficulties in merging this change into the main repository. Additionally, is there an updated version of this integration that addresses any issues or incorporates new changes? Looking forward to your response. Best regards,
Description
Test out the integration with Brax. It seems to work out of the box without having to implement observation normalization:
https://wandb.ai/costa-huang/cleanRL/runs/2aemjwey?workspace=user-costa-huang
Compilation takes ~400 seconds, and reaching a reward of 6000 in Ant takes about 100 seconds with a GPU. In comparison, the official demo takes 30 seconds to compile and about 80 seconds to reach a reward of ~8000 (using a TPU, I presume). Our compilation takes significantly longer, most likely because we didn't use lax.scan or jax.lax.fori_loop, but once compilation has finished the SPS is about 600k.
CC @joaogui1
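To illustrate the compilation-time point, here is a minimal sketch of why a Python-level loop tends to compile more slowly than lax.scan. It is not the code from this PR and does not use Brax: `step` is a hypothetical stand-in for one environment/policy step, and the constants and shapes are arbitrary assumptions chosen only to keep the example small.

```python
from functools import partial

import jax
import jax.numpy as jnp
from jax import lax


def step(state, _):
    # Hypothetical stand-in for one environment/policy step (not Brax code).
    new_state = 0.9 * state + 1.0
    return new_state, new_state  # (carry, per-step output)


def rollout_unrolled(state, num_steps):
    # A Python loop is unrolled while tracing: the traced program holds
    # num_steps copies of `step`, so it grows with the rollout length.
    outputs = []
    for _ in range(num_steps):
        state, out = step(state, None)
        outputs.append(out)
    return state, jnp.stack(outputs)


def rollout_scan(state, num_steps):
    # lax.scan traces `step` once and emits a single loop primitive.
    return lax.scan(step, state, xs=None, length=num_steps)


init = jnp.zeros(4)
num_steps = 50
unrolled_fn = partial(rollout_unrolled, num_steps=num_steps)
scan_fn = partial(rollout_scan, num_steps=num_steps)

# Both versions compute the same trajectory.
final_unrolled, traj_unrolled = jax.jit(unrolled_fn)(init)
final_scan, traj_scan = jax.jit(scan_fn)(init)

# Compare the size of the traced programs handed to the compiler.
n_eqns_unrolled = len(jax.make_jaxpr(unrolled_fn)(init).jaxpr.eqns)
n_eqns_scan = len(jax.make_jaxpr(scan_fn)(init).jaxpr.eqns)
print("equations (unrolled):", n_eqns_unrolled)
print("equations (scan):", n_eqns_scan)
```

The unrolled version's traced program grows with num_steps, while the scan version stays roughly constant in size. That is one plausible reason for the gap in compile time described above, though the actual cause in this PR has not been profiled.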
Types of changes
Checklist:
pre-commit run --all-files passes (required).
mkdocs serve.
If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.
--capture-video flag toggled on (required).
mkdocs serve.
width=500 and height=300).