[Question] Load an rl_games trained model for real robot deployment and control frequency setup #938
-
I don't think this is an IsaacLab-specific issue. RL-Games provides the checkpoint of the model, and you need to load this checkpoint into your deployment code, ideally while considering real-time control interfaces for your robot. The way you're doing it above seems okay, but it is better to check with the RL-Games developers whether it is correct. In my experience, most robots have a C++ control layer that is updated at the highest possible actuator frequency (for instance, 400 Hz on ANYmal). We add the network inference in this control loop to make sure the policy is called at the exact frequency; otherwise, real-time control can become difficult. Most groups export an ONNX model and load that policy for inference. I haven't personally done real-time control through Python, so I can't comment on whether the "decimation" loop you're doing will work effectively.
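For illustration, a minimal sketch of running an exported ONNX policy in a fixed-rate Python loop; the file name, input/output layout, 50 Hz rate, and the helper functions `get_observation`/`apply_action` are assumptions, not from this thread:

```python
import time
import numpy as np
import onnxruntime as ort

# Load the exported policy; inspect sess.get_inputs()/get_outputs() to
# confirm the actual input/output names of your export.
sess = ort.InferenceSession("policy.onnx")
input_name = sess.get_inputs()[0].name

control_dt = 1.0 / 50.0  # must match the env step rate used during training

while True:
    start = time.perf_counter()
    obs = get_observation()  # hypothetical: read robot state, shape (1, num_obs)
    action = sess.run(None, {input_name: obs.astype(np.float32)})[0]
    apply_action(action)     # hypothetical: hand the action to the control layer
    # sleep off the remainder of the control period to hold the rate
    time.sleep(max(0.0, control_dt - (time.perf_counter() - start)))
```

Note that a plain Python sleep loop gives only soft real-time guarantees, which is why the C++ control-layer approach described above is preferred for dynamic robots.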
-
Hi @dhruvkm2402, you might benefit from exporting and loading a scripted model instead of the state dict; that way, it's less hassle. I am not sure if this option is already implemented in rl_games, but it is implemented in the rsl_rl workflow. For the second part, yes, it's possible to run the inference in Python.
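For illustration, loading a scripted (TorchScript) policy requires no Python class definition or state-dict key matching at deployment time; a minimal sketch, where the file name and observation size are assumptions:

```python
import torch

# A scripted model bundles the architecture with the weights, so it can be
# loaded without the original network code.
policy = torch.jit.load("exported/policy.pt", map_location="cpu")
policy.eval()

with torch.no_grad():
    obs = torch.zeros(1, 48)  # placeholder; use your robot's actual observation vector
    action = policy(obs)
```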
-
What do you mean by high level? E.g. a translation or pose goal in 3D?
My intuition is that when you move the 12 actuators of a mobile robot for locomotion, you need to look at the environment quite often, because it can change very quickly. So that is a design choice for your environment step frequency.
In your case, if you drive a car or move a static-base robot (e.g. a UR10), it is usually in a stable pose the whole time. If you stop a TurtleBot at any moment, it will just stand; if you stop a UR10, it will just stay still. However, if you stop an ANYmal or a Spot mid-movement, it can very well fall.
So how quickly those states change implies how often the environment should be sampled.
You should always run the policy at the frequency it was trained at. If you choose not to, it MIGHT work for the TurtleBot or UR10, but for a more dynamic system like a quadruped, the result is erratic, shaky movement that almost always leads to instability and failure.
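For illustration, a minimal Python/ROS sketch of running the policy at its training frequency; the 50 Hz rate, file name, and the helper functions `build_observation`/`publish_action` are assumptions, not from this thread:

```python
import rospy
import torch

rospy.init_node("policy_runner")

# Assumption: the environment was trained at 50 Hz (sim dt * decimation);
# the deployment loop must hold the same rate.
rate = rospy.Rate(50)

policy = torch.jit.load("policy.pt", map_location="cpu")
policy.eval()

while not rospy.is_shutdown():
    obs = build_observation()    # hypothetical: assemble the observation vector from sensors
    with torch.no_grad():
        action = policy(obs)
    publish_action(action)       # hypothetical: forward the action to the actuator interface
    rate.sleep()                 # sleep off the remainder of the 20 ms period
```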
…On Fri, 6 Sept 2024, 08:02, Dhruv Mehta wrote:
> Hi @dhajnes, thank you for your answer. Basically, I'm just sending high-level commands to the robots. Do I still need to get information on the actuator frequency and use that for training?
-
Line 101 does the JIT-scripted model exporting:
https://github.com/isaac-sim/IsaacLab/blob/788a061d57ead17ff669eccce3d776fbccc59790/source/standalone/workflows/rsl_rl/play.py#L101
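In spirit, that line scripts the trained actor network and saves it to a file. A minimal sketch of the same idea with plain torch.jit; the network here is an illustrative placeholder, not the actual rsl_rl actor-critic wrapped by that script:

```python
import torch
import torch.nn as nn

# Placeholder MLP standing in for the trained rsl_rl actor network;
# layer sizes are illustrative (48 observations -> 12 joint actions).
actor = nn.Sequential(nn.Linear(48, 256), nn.ELU(), nn.Linear(256, 12))
actor.eval()

# Scripting bundles the architecture with the weights, so the deployment
# side only needs torch.jit.load("policy.pt"), no Python class definition.
scripted = torch.jit.script(actor)
scripted.save("policy.pt")
```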
…On Fri, 6 Sept 2024, 08:09, Dhruv Mehta wrote:
> And could you also point me to a reference example for the rsl_rl workflow you mentioned?
-
Hi @dhruvkm2402, as another reference, you can take a look at skrl's real-world examples, which include ROS.
-
Hello,
I have a trained rl_games model, and I want to load it for use with ROS. The model does not function correctly at all, which I understand could be due to several reasons. However, I wanted to confirm whether the process of loading an rl_games model trained in Isaac Lab could be different.
Here is the code:
And then we load it as
We had to use strict=False because of a missing-keys error.
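The snippets referenced above were not captured in this thread. For illustration, a minimal sketch of inspecting an rl_games checkpoint to diagnose such a missing-keys error; the checkpoint path is an assumption, and rl_games typically stores the network weights under a "model" key:

```python
import torch

# Load the checkpoint produced by rl_games during Isaac Lab training.
# The path is illustrative; adjust it to your run directory.
checkpoint = torch.load("runs/MyTask/nn/MyTask.pth", map_location="cpu")
print(checkpoint.keys())  # typically includes 'model', plus optimizer/epoch state

# List the stored parameter names and shapes to compare against the
# deployment network's state_dict(). strict=False silently drops any
# mismatched keys, which can hide a real architecture mismatch (e.g. a
# running-mean-std observation normalizer trained with normalize_input).
for name, tensor in checkpoint["model"].items():
    print(name, tuple(tensor.shape))
```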
I'd appreciate quick help on this if possible, since I'm aiming for it to be part of my research.
Thank you!