[Question] Single action episode #507
Unanswered
riccardobussola asked this question in Q&A
Hi everyone,
A brief introduction: my RL task consists of learning the optimal parameters of a trajectory (e.g. a spline or a Bezier curve) in Cartesian space that a quadruped has to follow for a non-constant time t_follow.
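Concretely, the action could be something like the control points of a cubic Bezier curve; here is a minimal evaluation sketch in plain NumPy (the degree and shapes are just an example I made up, nothing Orbit-specific):

```python
import numpy as np
from math import comb

def bezier(control_points: np.ndarray, s: float) -> np.ndarray:
    """Evaluate a Bezier curve at phase s in [0, 1].
    control_points: (n+1, 3) Cartesian control points (degree-n curve)."""
    n = len(control_points) - 1
    return sum(
        comb(n, i) * s**i * (1.0 - s) ** (n - i) * p  # Bernstein basis weights
        for i, p in enumerate(control_points)
    )

# e.g. a cubic curve: the 12 numbers below would be the single 'action'
cps = np.array([[0, 0, 0], [0.3, 0.5, 0.1], [0.7, 0.5, 0.1], [1, 1, 0]], dtype=float)
point = bezier(cps, 0.5)  # Cartesian target at mid-phase
```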
I'm trying to implement this using Orbit's RLTaskEnv, but I'm struggling with a few things.
My episode queries the policy for an action a single time (once the parameters are obtained, I only need to compute the trajectory). For the rest of the simulation, the robot has to follow that trajectory for an amount of time that varies from episode to episode, so a task-space IK controller has to run in the background.
The episode therefore corresponds to a single policy step: the reward (and a possible NN update) is computed at termination, once I can verify the robot's final position/orientation.
Is there a way to implement this with Orbit without rewriting the logic of the provided RLTaskEnv?
Changing only the decimation factor is not enough, since the episode duration varies with t_follow.
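To make the pattern concrete, here is a minimal sketch of the behaviour I'm after, written as a plain gymnasium.Env rather than against Orbit's API (the point-mass tracking dynamics, the goal, and all names here are placeholders I invented to stand in for the quadruped plus the task-space IK controller):

```python
import gymnasium as gym
import numpy as np
from math import comb

class SingleStepTrajectoryEnv(gym.Env):
    """One policy step per episode: the action encodes all trajectory
    parameters; step() then simulates the whole follow phase internally
    and terminates immediately."""

    def __init__(self, dt: float = 0.01, goal=(1.0, 1.0, 0.0)):
        self.dt = dt
        self.goal = np.asarray(goal, dtype=np.float64)
        # Action: 4 Bezier control points in 3D, flattened to 12 numbers.
        self.action_space = gym.spaces.Box(-2.0, 2.0, shape=(12,), dtype=np.float32)
        # Observation: the tracker's current Cartesian position.
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = np.zeros(3)
        # t_follow is resampled every reset, so episode duration varies.
        self.t_follow = float(self.np_random.uniform(1.0, 3.0))
        return self.pos.astype(np.float32), {}

    @staticmethod
    def _bezier(cps, s):
        # Same Bernstein evaluation as the helper above.
        n = len(cps) - 1
        return sum(comb(n, i) * s**i * (1.0 - s) ** (n - i) * p
                   for i, p in enumerate(cps))

    def step(self, action):
        cps = np.asarray(action, dtype=np.float64).reshape(4, 3)
        # Roll the whole follow phase forward inside a single step() call;
        # the toy first-order dynamics stand in for quadruped + IK tracking.
        t = 0.0
        while t < self.t_follow:
            target = self._bezier(cps, min(t / self.t_follow, 1.0))
            self.pos += 2.0 * (target - self.pos) * self.dt
            t += self.dt
        # Reward is judged only at termination, from the final position.
        reward = -float(np.linalg.norm(self.pos - self.goal))
        return self.pos.astype(np.float32), reward, True, False, {}
```

With this structure, every step() consumes exactly one action and returns terminated=True, so the variable t_follow lives inside the env rather than in the decimation factor.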
Many thanks for considering my request.