
Conversation

@jadechoghari
Member

Title

feat(policies): add autoregressive VLAs with FAST tokenization (PiFast)

This PR brings autoregressive Vision-Language-Action (VLA) models back to LeRobot, alongside the existing flow-matching–based policies.

Unlike flow matching, which predicts actions in parallel over a horizon, autoregressive VLAs model actions sequentially as discrete tokens.
As a first step toward supporting multiple action tokenizers, this PR introduces PiFast together with a training script for FAST tokenization; this provides a concrete reference implementation for autoregressive action modeling in LeRobot.

Future work will extend this framework to additional tokenizers and autoregressive variants.
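To make the sequential prediction described above concrete, here is a minimal sketch of an autoregressive decoding loop; all names are hypothetical illustrations, not the PR's API:

```python
def decode_autoregressive(step_fn, prefix, eos_token, max_steps=16):
    """Greedy token-by-token decoding.

    step_fn(tokens) -> next token id; `prefix` stands in for the prompt
    (image/language/state tokens in a real VLA).
    """
    tokens = list(prefix)
    actions = []
    for _ in range(max_steps):
        nxt = step_fn(tokens)
        if nxt == eos_token:
            break  # end of the action token sequence
        actions.append(nxt)
        tokens.append(nxt)  # condition the next step on everything so far
    return actions

# Toy step function: emit three action tokens, then EOS.
def toy_step(tokens, _eos=99):
    produced = len(tokens) - 2  # 2 prompt tokens in the prefix
    return produced + 10 if produced < 3 else _eos

print(decode_autoregressive(toy_step, prefix=[1, 2], eos_token=99))  # [10, 11, 12]
```

This is the structural difference from flow matching: each emitted token feeds back into the context for the next prediction, which is also why KV-caching (below) matters for inference speed.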

TODO:
1- Support KV-caching for faster inference (a must for this PR) https://mett29.github.io/posts/kv-cache/
2- Provide PiFast pretrained checkpoints and unveil HF LeRobot's new AR VLA work.
3- Add testing and docs.
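On item 1, the point of KV-caching is that each layer's keys/values for the prefix are stored once, so every new action token only computes attention for itself instead of re-encoding the whole sequence. A toy single-head sketch (hypothetical names, not the PR's implementation):

```python
import numpy as np

def attend(q, K, V):
    # scaled dot-product attention for a single query over all cached positions
    w = q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(w - w.max())
    return (w / w.sum()) @ V

def step_with_cache(cache, k_new, v_new, q):
    # append this step's key/value to the cache instead of recomputing the prefix
    cache["K"] = np.vstack([cache["K"], k_new])
    cache["V"] = np.vstack([cache["V"], v_new])
    return attend(q, cache["K"], cache["V"])

rng = np.random.default_rng(0)
K, V = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
cache = {"K": K[:3], "V": V[:3]}          # keys/values cached from prior steps
out_cached = step_with_cache(cache, K[3:4], V[3:4], K[3])
out_full = attend(K[3], K, V)             # full recompute gives the same result
assert np.allclose(out_cached, out_full)
```

The cached path does O(1) new key/value work per step, which is what turns the quadratic-per-token decoding loop into something usable at inference time.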

DONE:
1- Trained and evaluated successfully on LIBERO; we will share the checkpoints along with the results.

Copilot AI review requested due to automatic review settings December 30, 2025 15:59
@jadechoghari jadechoghari added the policies Items related to robot policies label Dec 30, 2025
@github-actions github-actions bot added the processor Issue related to processor label Dec 30, 2025
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces autoregressive Vision-Language-Action (VLA) models to LeRobot, implementing PiFast alongside existing flow-matching policies. Unlike flow matching, which predicts actions in parallel over a horizon, this implementation models actions sequentially as discrete tokens using the FAST (Frequency-space Action Sequence Tokenization) tokenizer. The PR provides a complete reference implementation including model architecture, training scripts, and processor pipelines.

Key Changes:

  • Implements PI0Fast policy with autoregressive action token prediction using cross-entropy loss
  • Adds FAST tokenizer integration for converting continuous actions to discrete tokens via DCT coefficients and BPE
  • Introduces custom attention masking patterns supporting bidirectional attention for images/language and causal attention for action tokens
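The third change can be sketched as a boolean mask: prefix tokens (image + language) attend to each other in both directions, while each action token sees the full prefix plus only earlier action tokens. A toy illustration, not the PR's actual masking code:

```python
import numpy as np

def make_prefix_causal_mask(n_prefix, n_action):
    # True at (i, j) means token i may attend to token j
    n = n_prefix + n_action
    mask = np.zeros((n, n), dtype=bool)
    # image/language prefix: fully bidirectional among itself
    mask[:n_prefix, :n_prefix] = True
    # action tokens: see the whole prefix and earlier (plus own) action tokens
    for i in range(n_action):
        row = n_prefix + i
        mask[row, : row + 1] = True
    return mask

print(make_prefix_causal_mask(2, 3).astype(int))
```

Rows for the prefix never attend to action positions, so the prefix representation is independent of the actions and can be cached across decoding steps.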

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 12 comments.

Show a summary per file

src/lerobot/utils/constants.py: Adds constants for action tokens and token masks
src/lerobot/processor/tokenizer_processor.py: Implements ActionTokenizerProcessorStep for tokenizing actions using FAST with PaliGemma token space conversion
src/lerobot/processor/__init__.py: Exports ActionTokenizerProcessorStep for use in pipelines
src/lerobot/policies/pi0_fast/train_fast_tokenizer.py: Provides a training script for the FAST tokenizer with delta transforms, normalization, and compression statistics
src/lerobot/policies/pi0_fast/processor_pi0_fast.py: Creates pre/post-processor pipelines including state discretization and language tokenization
src/lerobot/policies/pi0_fast/modeling_pi0_fast.py: Implements the core PI0FastPytorch model with PaliGemma+Gemma expert architecture and autoregressive decoding
src/lerobot/policies/pi0_fast/configuration_pi0_fast.py: Defines PI0FastConfig with model hyperparameters and training settings
src/lerobot/policies/pi0_fast/__init__.py: Exports PI0Fast components for module access
src/lerobot/policies/factory.py: Registers PI0FastPolicy in the policy factory
src/lerobot/policies/__init__.py: Exports PI0FastConfig at package level
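The scheme behind the train_fast_tokenizer.py entry compresses an action chunk with a DCT, quantizes the coefficients, and BPE-encodes the integer stream. A toy round-trip that omits the BPE stage; the function names and quantization scale are illustrative assumptions:

```python
import numpy as np

def dct_matrix(N):
    # orthonormal DCT-II basis: rows are cosine basis vectors
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    M = np.sqrt(2.0 / N) * np.cos(np.pi / N * (n + 0.5) * k)
    M[0] /= np.sqrt(2.0)
    return M

def tokenize_actions(actions, scale=0.1):
    # continuous action chunk -> quantized DCT coefficients (integer "tokens")
    coeffs = dct_matrix(len(actions)) @ actions
    return np.round(coeffs / scale).astype(int)

def detokenize_actions(tokens, scale=0.1):
    # invert quantization, then apply the inverse (transposed) DCT
    return dct_matrix(len(tokens)).T @ (tokens * scale)

a = np.linspace(-1.0, 1.0, 8)
print(tokenize_actions(a))                 # compact integer representation
print(detokenize_actions(tokenize_actions(a)))  # approximately recovers a
```

Because the DCT concentrates energy in a few low-frequency coefficients for smooth trajectories, most quantized coefficients are zero or small, which is what makes the subsequent BPE stage effective.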


padding="max_length",
),
ActionTokenizerProcessorStep(
tokenizer_name="/fsx/jade_choghari/outputs/fast_tokenizer", # TODO: jade put the PI

Copilot AI Dec 30, 2025


This line contains a hardcoded file path that appears to be a personal development path. This should be replaced with a configurable parameter or removed before merging. The tokenizer path should be passed via the config or made configurable through a proper mechanism.


Copilot AI Dec 30, 2025


The comment mentions a TODO with the author's name. This indicates that the tokenizer path configuration is incomplete and needs to be properly addressed. The hardcoded path should be replaced with a proper configuration parameter that can be passed to the ActionTokenizerProcessorStep.
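One way to address both comments is to lift the path into the policy configuration. A hypothetical sketch: the field names and default value are placeholders, not the PR's API:

```python
from dataclasses import dataclass

@dataclass
class FastTokenizerSettings:
    # hub repo id or local directory; no personal paths baked into the code
    tokenizer_name: str = "your-org/fast-tokenizer"  # placeholder default
    max_action_tokens: int = 256

def tokenizer_step_kwargs(settings: FastTokenizerSettings) -> dict:
    # in the real pipeline these kwargs would feed ActionTokenizerProcessorStep
    return {
        "tokenizer_name": settings.tokenizer_name,
        "max_action_tokens": settings.max_action_tokens,
    }

# Users override the path per run instead of editing source:
print(tokenizer_step_kwargs(FastTokenizerSettings(tokenizer_name="outputs/fast_tokenizer")))
```

Keeping the path in a config dataclass also lets it be serialized with the checkpoint, so a saved policy records which tokenizer it was trained against.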

Comment on lines +914 to +922
# # Optionally visualize the attention mask
# self.visualize_attention_mask(
# att_mask_segments=att_mask_segments,
# att_2d_masks=att_masks,
# save_path="/admin/home/jade_choghari/lerobot/src/lerobot/policies/pi05/attention_mask_visualization.png",
# batch_idx=0,
# max_display_tokens=512 # Limit display for very long sequences
# )


Copilot AI Dec 30, 2025


There's commented-out visualization code that should be removed or properly implemented. If attention mask visualization is needed for debugging, it should be controlled by a configuration parameter rather than left as commented code.

Suggested change
# # Optionally visualize the attention mask
# self.visualize_attention_mask(
# att_mask_segments=att_mask_segments,
# att_2d_masks=att_masks,
# save_path="/admin/home/jade_choghari/lerobot/src/lerobot/policies/pi05/attention_mask_visualization.png",
# batch_idx=0,
# max_display_tokens=512 # Limit display for very long sequences
# )

)
# Detokenize action tokens to continuous actions
action_horizon = self.config.n_action_steps
action_dim = 7

Copilot AI Dec 30, 2025


The hardcoded action dimension value of 7 should be made configurable. This magic number limits the flexibility of the model and should be replaced with a configuration parameter, possibly using self.config.max_action_dim or a similar configurable value.

Suggested change
action_dim = 7
action_dim = getattr(self.config, "max_action_dim", 7)

"""
Inefficient but safe autoregressive decoding for FAST tokens.
Matches the pattern of _generate_subtask_tokens.
TODO: jadechoghari, should we move this logic to PI0FastPolicy class?

Copilot AI Dec 30, 2025


The TODO comment indicates incomplete implementation. The author questions whether this method is necessary and whether to move the logic to the PI0FastPolicy class. This should be resolved before merging - either implement the proper location for this logic or confirm the current implementation is correct and remove the TODO.

Suggested change
TODO: jadechoghari, should we move this logic to PI0FastPolicy class?

Comment on lines +95 to +96
tokenizer_max_length: int = 200 # see openpi `__post_init__`


Copilot AI Dec 30, 2025


The configuration has a duplicate field definition. The field 'tokenizer_max_length' is defined both at line 65 and line 95 with the same default value. This duplication should be removed.

Suggested change
tokenizer_max_length: int = 200 # see openpi `__post_init__`

Comment on lines +536 to +542
# # Apply dtype conversion to FAST layers to match model precision
# if config.dtype == "bfloat16":
# self.fast_action_embedding = self.fast_action_embedding.to(dtype=torch.bfloat16)
# self.fast_action_lm_head = self.fast_action_lm_head.to(dtype=torch.bfloat16)
# elif config.dtype == "float32":
# self.fast_action_embedding = self.fast_action_embedding.to(dtype=torch.float32)
# self.fast_action_lm_head = self.fast_action_lm_head.to(dtype=torch.float32)

Copilot AI Dec 30, 2025


There's commented-out code that should either be removed or properly implemented before merging. This appears to be related to FAST layer dtype conversion. If this functionality is not needed, it should be removed to keep the codebase clean.

Suggested change
# # Apply dtype conversion to FAST layers to match model precision
# if config.dtype == "bfloat16":
# self.fast_action_embedding = self.fast_action_embedding.to(dtype=torch.bfloat16)
# self.fast_action_lm_head = self.fast_action_lm_head.to(dtype=torch.bfloat16)
# elif config.dtype == "float32":
# self.fast_action_embedding = self.fast_action_embedding.to(dtype=torch.float32)
# self.fast_action_lm_head = self.fast_action_lm_head.to(dtype=torch.float32)

Comment on lines 1142 to 1159
# from transformers import AutoTokenizer
# self._paligemma_tokenizer = AutoTokenizer.from_pretrained(
# "google/paligemma-3b-pt-224",
# trust_remote_code=True,
# add_eos_token=True,
# add_bos_token=False
# )
# # remove
# decoded_tokens = [
# self._paligemma_tokenizer.convert_ids_to_tokens(seq.tolist())
# for seq in fast_targets
# ]
# corrected_tokens = [
# self._paligemma_tokenizer.convert_ids_to_tokens(seq.tolist())
# for seq in fast_logits_for_pred.argmax(dim=-1)
# ]
# breakpoint()


Copilot AI Dec 30, 2025


There's a large block of commented-out debugging code that should be removed before merging. Commented code like this makes the codebase harder to maintain and should be deleted or moved to proper debug utilities if needed.

Suggested change
# from transformers import AutoTokenizer
# self._paligemma_tokenizer = AutoTokenizer.from_pretrained(
# "google/paligemma-3b-pt-224",
# trust_remote_code=True,
# add_eos_token=True,
# add_bos_token=False
# )
# # remove
# decoded_tokens = [
# self._paligemma_tokenizer.convert_ids_to_tokens(seq.tolist())
# for seq in fast_targets
# ]
# corrected_tokens = [
# self._paligemma_tokenizer.convert_ids_to_tokens(seq.tolist())
# for seq in fast_logits_for_pred.argmax(dim=-1)
# ]
# breakpoint()


# Get optional parameters
temperature = kwargs.get("temperature", 0.0)
max_decoding_steps = 256

Copilot AI Dec 30, 2025


The hardcoded value of 256 for max_decoding_steps should be made configurable or derived from the configuration. This should use self.config.max_action_tokens or a similar configuration parameter instead of a magic number.

Suggested change
max_decoding_steps = 256
max_decoding_steps = getattr(self.config, "max_action_tokens", 256) or 256

if tasks is None:
raise ValueError("No task found in complementary data")

# TODO: check if this necessary

Copilot AI Dec 30, 2025


The comment has a typo: 'check if this necessary' is missing the word 'is' and should read 'check if this is necessary'.

Suggested change
# TODO: check if this necessary
# TODO: check if this is necessary

@jadechoghari jadechoghari self-assigned this Dec 30, 2025
