Replies: 6 comments 1 reply
-
Thank you for posting this. Replacing an entrypoint is not typically an easy task. We will review this with the team and follow up.
-
Thanks for the swift response! Replacing the entrypoint as shown above seems to work smoothly and is quite common, as far as I understand. Maybe I'm missing something, though.
-
@hhansen-bdai for visibility. Thanks for any help with this.
-
We implemented our own minimal docker image for IsaacLab, and the computation is still ~3 times slower on the enroot cluster. I noticed that when converting from docker to enroot, my enroot container is missing some configurations that are present in the docker container, such as the exported bash aliases in
I suspect that other important settings from the IsaacSim base container might also be missing in my enroot container. Is there a way to benchmark a plain Isaac-Sim simulation in a container on the enroot cluster? That would tell us whether the issue lies with the Isaac-Sim container or the Isaac-Lab container.
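One way to look for the missing configuration mentioned above is to dump the same settings inside the Docker container and inside the enroot container, then diff the two. The following is a minimal sketch, not an official tool: the variable prefixes (`NVIDIA_`, `CUDA_`, `OMNI_`, `ISAAC`) are assumptions about what might matter, and the function names are made up for illustration. It also records how many CPUs the process may use, since a scheduler can restrict that independently of the image.

```python
import json
import os

# Assumed prefixes; extend with whatever your image actually exports.
DEFAULT_PREFIXES = ("NVIDIA_", "CUDA_", "OMNI_", "ISAAC", "LD_LIBRARY_PATH", "PATH")


def snapshot_environment(prefixes=DEFAULT_PREFIXES):
    """Collect settings that may differ between the Docker and enroot containers."""
    env = {k: v for k, v in sorted(os.environ.items()) if k.startswith(tuple(prefixes))}
    # CPUs this process may actually use (Linux); fall back to the total count elsewhere.
    if hasattr(os, "sched_getaffinity"):
        usable_cpus = len(os.sched_getaffinity(0))
    else:
        usable_cpus = os.cpu_count() or 1
    return {"env": env, "usable_cpus": usable_cpus}


def diff_snapshots(a, b):
    """Return {name: (value_in_a, value_in_b)} for every variable that differs."""
    names = set(a["env"]) | set(b["env"])
    return {
        name: (a["env"].get(name), b["env"].get(name))
        for name in sorted(names)
        if a["env"].get(name) != b["env"].get(name)
    }


if __name__ == "__main__":
    # Run once per container, save the output, then load both files and call diff_snapshots.
    print(json.dumps(snapshot_environment(), indent=2))
```

Note that this only covers environment variables and CPU affinity; shell aliases and files sourced at login would need a separate check.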
-
I think that some instance of Isaac-Sim is starting in the background. While I can avoid that by overwriting the entrypoint during I identified However, even with that file removed in enroot, it is still 3x slower. Might there be any other file starting Isaac-Sim in the background?
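To test the suspicion that something is running in the background, one option is to list the processes visible inside the container while training is running. Below is a rough sketch that scans `/proc` using only the standard library (Linux only). The keywords `kit` and `isaac` are guesses at how an Isaac-Sim process might appear on the command line, and `find_processes` is a hypothetical helper name.

```python
import os


def find_processes(keywords, proc_root="/proc"):
    """Return sorted (pid, command line) pairs whose command line contains a keyword."""
    matches = []
    if not os.path.isdir(proc_root):
        return matches
    lowered = [k.lower() for k in keywords]
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories correspond to processes
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited or is not readable
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if any(k in cmdline.lower() for k in lowered):
            matches.append((int(entry), cmdline))
    return sorted(matches)


if __name__ == "__main__":
    # Assumed keywords; adjust to whatever the suspected process is called.
    for pid, cmd in find_processes(["kit", "isaac"]):
        print(pid, cmd)
```

An empty result only rules out processes in the container's own PID namespace; it says nothing about other jobs sharing the node.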
-
Thanks for following up. I'll move this to our Discussions for the team to follow up.
-
Hi all,
I'd like to run an IsaacLab container on a cluster powered by enroot. I build my container with
./docker/container.py start
which gives me my Docker container. When I run
docker run --entrypoint tail --gpus all isaac-lab-base -f /dev/null
on my local machine, I get the usual training speed. (NOTE: overriding the entrypoint is important; otherwise I think IsaacSim starts in the container and my training runs are 3x slower.) However, when I start the same container with enroot on the cluster, training is ~3x slower, even when I override the container's entrypoint.
The speed on the cluster should really be the same as on my local machine (using RTX4090 and H100). I made sure that I have GPU access both locally and remotely. If I train a plain PyTorch NN, I get the same training speed on both.
I have spent about 1.5 days on this and tried all the options that Slurm, Docker, and enroot provide.
Has anyone had a similar experience, or does anyone have an idea what the issue could be?
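To compare the local and cluster runs on equal footing, it can help to time the same step function in both places and compare the rates. This is only a sketch under assumptions: `steps_per_second` and `dummy_step` are illustrative names, and the CPU-only placeholder workload should be replaced with one simulation or training step. Because GPU work is launched asynchronously, the step function should block until the work has finished (for PyTorch, for example, by calling `torch.cuda.synchronize()` at the end of the step), otherwise the measured rate is misleading.

```python
import time


def steps_per_second(step_fn, num_steps=200, warmup=20):
    """Call step_fn warmup times untimed, then time num_steps calls and return the rate."""
    for _ in range(warmup):
        step_fn()  # warmup absorbs one-off costs such as caching and lazy initialization
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading for extremely cheap step functions.
    return num_steps / max(elapsed, 1e-12)


if __name__ == "__main__":
    # Placeholder workload (assumption); swap in a single simulation or training step.
    def dummy_step():
        sum(i * i for i in range(10_000))

    print(f"{steps_per_second(dummy_step):.1f} steps/s")
```

Running this with a plain PyTorch step and again with a simulation step would show which of the two accounts for the slowdown.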