feat(policies): add autoregressive VLAs with tokenization PiFast #2734
Conversation
Pull request overview
This PR introduces autoregressive Vision-Language-Action (VLA) models to LeRobot, implementing PiFast alongside the existing flow-matching policies. Unlike flow matching, which predicts actions in parallel over a horizon, this implementation models actions sequentially as discrete tokens using the FAST (Frequency-space Action Sequence Tokenization) tokenizer. The PR provides a complete reference implementation, including the model architecture, training scripts, and processor pipelines.
Key Changes:
- Implements PI0Fast policy with autoregressive action token prediction using cross-entropy loss
- Adds FAST tokenizer integration for converting continuous actions to discrete tokens via DCT coefficients and BPE
- Introduces custom attention masking patterns supporting bidirectional attention for images/language and causal attention for action tokens
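As a rough sketch of the FAST-style pipeline mentioned above (DCT over an action chunk, quantization of the coefficients, then BPE over the resulting symbols), here is an illustrative implementation of the first two stages. The function names and the quantization scale are invented for this example and the BPE stage is elided; this is not LeRobot's actual code.

```python
import numpy as np

def dct_ii(x: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II along the time axis (first axis)."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    scale = np.full((n, 1), np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (scale * basis) @ x

def tokenize_chunk(actions: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Map a (horizon, action_dim) chunk to integer symbols.

    The DCT compacts a smooth chunk's energy into a few low
    frequencies; rounding small coefficients to zero produces long
    zero runs that the BPE stage (not shown) compresses well.
    """
    coeffs = dct_ii(actions)  # frequency-domain coefficients
    return np.round(coeffs * scale).astype(np.int64).ravel()

# Smooth 50-step, 7-dof action chunk as a toy input.
chunk = np.linspace(0.0, 1.0, 50 * 7).reshape(50, 7)
symbols = tokenize_chunk(chunk)
sparsity = float((symbols == 0).mean())  # fraction of zero symbols
```

Because the DCT concentrates a smooth chunk's energy in a few low frequencies, most quantized symbols come out zero, which is what makes the subsequent BPE compression effective.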
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| src/lerobot/utils/constants.py | Adds constants for action tokens and token masks |
| src/lerobot/processor/tokenizer_processor.py | Implements ActionTokenizerProcessorStep for tokenizing actions using FAST with PaliGemma token space conversion |
| src/lerobot/processor/__init__.py | Exports ActionTokenizerProcessorStep for use in pipelines |
| src/lerobot/policies/pi0_fast/train_fast_tokenizer.py | Provides a training script for the FAST tokenizer with delta transforms, normalization, and compression statistics |
| src/lerobot/policies/pi0_fast/processor_pi0_fast.py | Creates pre/post-processor pipelines, including state discretization and language tokenization |
| src/lerobot/policies/pi0_fast/modeling_pi0_fast.py | Implements the core PI0FastPytorch model with the PaliGemma+Gemma expert architecture and autoregressive decoding |
| src/lerobot/policies/pi0_fast/configuration_pi0_fast.py | Defines PI0FastConfig with model hyperparameters and training settings |
| src/lerobot/policies/pi0_fast/__init__.py | Exports PI0Fast components for module access |
| src/lerobot/policies/factory.py | Registers PI0FastPolicy in the policy factory |
| src/lerobot/policies/__init__.py | Exports PI0FastConfig at the package level |
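The custom masking described in the overview (bidirectional attention over image/language tokens, causal attention over action tokens) can be sketched as a boolean mask builder. This is an illustrative reconstruction with invented names, not the code from modeling_pi0_fast.py:

```python
import numpy as np

def build_prefix_causal_mask(prefix_len: int, action_len: int) -> np.ndarray:
    """Boolean (L, L) mask: entry [i, j] is True if position i may attend to j.

    Image and language tokens form a fully bidirectional prefix; action
    tokens attend to the whole prefix plus earlier action tokens only,
    so action decoding stays autoregressive. Prefix positions never see
    action positions (those entries are simply left False).
    """
    total = prefix_len + action_len
    mask = np.zeros((total, total), dtype=bool)
    mask[:, :prefix_len] = True  # every position sees the full prefix
    rows = np.arange(total)[:, None]
    cols = np.arange(total)[None, :]
    mask |= (cols >= prefix_len) & (cols <= rows)  # causal over the suffix
    return mask
```

With 3 prefix tokens and 4 action tokens, the top-left 3x3 block is all True (bidirectional), the top-right 3x4 block is all False, and the bottom-right 4x4 block is lower-triangular (causal).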
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
```python
        padding="max_length",
    ),
    ActionTokenizerProcessorStep(
        tokenizer_name="/fsx/jade_choghari/outputs/fast_tokenizer",  # TODO: jade put the PI
```
**Copilot AI** (Dec 30, 2025):
This line hardcodes what appears to be a personal development path. It should be replaced before merging with a configurable parameter, for example a tokenizer path passed in through the policy config.
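One way to address this, sketched below with a hypothetical config field (not LeRobot's actual API), is to thread the tokenizer path through a dataclass config and hand it to the processor step as a constructor argument. The default shown assumes the public `physical-intelligence/fast` Hugging Face repo is an acceptable fallback:

```python
from dataclasses import dataclass

@dataclass
class PI0FastProcessorConfig:
    # Hypothetical field; in the PR this would live on PI0FastConfig.
    tokenizer_name: str = "physical-intelligence/fast"

def make_action_tokenizer_step(config: PI0FastProcessorConfig) -> dict:
    """Build the tokenizer step from config instead of a hardcoded path."""
    # Dict stands in for ActionTokenizerProcessorStep(tokenizer_name=...).
    return {"step": "ActionTokenizerProcessorStep",
            "tokenizer_name": config.tokenizer_name}

default_step = make_action_tokenizer_step(PI0FastProcessorConfig())
custom_step = make_action_tokenizer_step(
    PI0FastProcessorConfig(tokenizer_name="/tmp/my_tokenizer")
)
```

This keeps personal paths out of the source tree while letting local experiments override the tokenizer location without code changes.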
```python
        padding="max_length",
    ),
    ActionTokenizerProcessorStep(
        tokenizer_name="/fsx/jade_choghari/outputs/fast_tokenizer",  # TODO: jade put the PI
```
**Copilot AI** (Dec 30, 2025):
The TODO carrying the author's name indicates that the tokenizer path configuration is incomplete. The hardcoded path should be replaced with a proper configuration parameter passed to ActionTokenizerProcessorStep.
```python
# # Optionally visualize the attention mask
# self.visualize_attention_mask(
#     att_mask_segments=att_mask_segments,
#     att_2d_masks=att_masks,
#     save_path="/admin/home/jade_choghari/lerobot/src/lerobot/policies/pi05/attention_mask_visualization.png",
#     batch_idx=0,
#     max_display_tokens=512  # Limit display for very long sequences
# )
```
**Copilot AI** (Dec 30, 2025):
There's commented-out visualization code that should be removed or properly implemented. If attention mask visualization is needed for debugging, it should be controlled by a configuration parameter rather than left as commented code.
Suggested change: delete the commented-out visualization block.
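If the visualization is worth keeping, a common pattern is to gate it behind a config flag instead of leaving it commented out. The sketch below uses invented names (`DebugConfig`, `visualize_attention`, `forward_step`) that are not from the PR:

```python
from dataclasses import dataclass

@dataclass
class DebugConfig:
    visualize_attention_mask: bool = False  # hypothetical flag, off by default
    attention_viz_path: str = "attention_mask.png"

calls = []  # records visualizer invocations for this demo

def visualize_attention(mask, save_path):
    calls.append(save_path)  # stand-in for the real plotting code

def forward_step(mask, config: DebugConfig):
    # Debug-only work runs solely when explicitly enabled in config.
    if config.visualize_attention_mask:
        visualize_attention(mask, config.attention_viz_path)
    return mask

forward_step([[True]], DebugConfig())                               # no viz
forward_step([[True]], DebugConfig(visualize_attention_mask=True))  # viz once
```

The debugging capability survives in version control, but the default training path pays no cost and the source stays clean.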
```python
)
# Detokenize action tokens to continuous actions
action_horizon = self.config.n_action_steps
action_dim = 7
```
**Copilot AI** (Dec 30, 2025):
The hardcoded action dimension value of 7 should be made configurable. This magic number limits the flexibility of the model and should be replaced with a configuration parameter, possibly using self.config.max_action_dim or a similar configurable value.
Suggested change:
```diff
-action_dim = 7
+action_dim = getattr(self.config, "max_action_dim", 7)
```
| """ | ||
| Inefficient but safe autoregressive decoding for FAST tokens. | ||
| Matches the pattern of _generate_subtask_tokens. | ||
| TODO: jadechoghari, should we move this logic to PI0FastPolicy class? |
**Copilot AI** (Dec 30, 2025):
The TODO indicates an unresolved design question: whether this decoding logic belongs in the PI0FastPolicy class. This should be settled before merging: either move the logic or confirm the current location is correct and remove the TODO.
Suggested change:
```diff
-TODO: jadechoghari, should we move this logic to PI0FastPolicy class?
```
```python
tokenizer_max_length: int = 200  # see openpi `__post_init__`
```
**Copilot AI** (Dec 30, 2025):
The configuration defines the field 'tokenizer_max_length' twice (at line 65 and line 95) with the same default value. The duplicate definition should be removed.
Suggested change:
```diff
-tokenizer_max_length: int = 200  # see openpi `__post_init__`
```
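Duplicate dataclass fields are easy to miss because Python accepts them silently: the later annotation simply replaces the earlier one, and the later default wins. A small demonstration (the differing defaults are contrived here to make the override visible; in the PR both defaults happen to be 200, which hides the bug):

```python
from dataclasses import dataclass, fields

@dataclass
class Config:
    tokenizer_max_length: int = 200  # first definition
    other: str = "x"
    tokenizer_max_length: int = 100  # duplicate: silently replaces the first

cfg = Config()
n_fields = len(fields(Config))  # the duplicate collapses into one field
```

Since the dataclass only ever sees one field, a future edit to the first definition would be a no-op, which is exactly why the duplicate should go.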
```python
# # Apply dtype conversion to FAST layers to match model precision
# if config.dtype == "bfloat16":
#     self.fast_action_embedding = self.fast_action_embedding.to(dtype=torch.bfloat16)
#     self.fast_action_lm_head = self.fast_action_lm_head.to(dtype=torch.bfloat16)
# elif config.dtype == "float32":
#     self.fast_action_embedding = self.fast_action_embedding.to(dtype=torch.float32)
#     self.fast_action_lm_head = self.fast_action_lm_head.to(dtype=torch.float32)
```
**Copilot AI** (Dec 30, 2025):
There's commented-out code that should either be removed or properly implemented before merging. This appears to be related to FAST layer dtype conversion. If this functionality is not needed, it should be removed to keep the codebase clean.
Suggested change: delete the commented-out dtype-conversion block.
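If the conversion turns out to be needed, the if/elif chain can collapse into a string-to-dtype lookup. The sketch below uses NumPy dtypes to stay self-contained; in the PR the same idea would use torch dtypes (e.g. `getattr(torch, config.dtype)`) and `.to(dtype=...)` on the modules. Names here are illustrative:

```python
import numpy as np

# Hypothetical mapping from config strings to dtypes.
DTYPES = {"float32": np.float32, "float16": np.float16, "float64": np.float64}

def convert_layers(layers: dict, dtype_name: str) -> dict:
    """Cast every layer's weights to the configured precision."""
    dtype = DTYPES[dtype_name]  # raises KeyError on unsupported values
    return {name: w.astype(dtype) for name, w in layers.items()}

layers = {
    "fast_action_embedding": np.zeros((4, 4)),
    "fast_action_lm_head": np.zeros((4, 4)),
}
converted = convert_layers(layers, "float16")
```

A lookup keeps the supported precisions in one place and fails loudly on a typo, unlike an if/elif chain that silently does nothing for an unknown string.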
```python
# from transformers import AutoTokenizer
# self._paligemma_tokenizer = AutoTokenizer.from_pretrained(
#     "google/paligemma-3b-pt-224",
#     trust_remote_code=True,
#     add_eos_token=True,
#     add_bos_token=False
# )
# # remove
# decoded_tokens = [
#     self._paligemma_tokenizer.convert_ids_to_tokens(seq.tolist())
#     for seq in fast_targets
# ]
# corrected_tokens = [
#     self._paligemma_tokenizer.convert_ids_to_tokens(seq.tolist())
#     for seq in fast_logits_for_pred.argmax(dim=-1)
# ]
# breakpoint()
```
**Copilot AI** (Dec 30, 2025):
There's a large block of commented-out debugging code that should be removed before merging. Commented code like this makes the codebase harder to maintain and should be deleted or moved to proper debug utilities if needed.
Suggested change: delete the commented-out debugging block, including the `breakpoint()` call.
```python
# Get optional parameters
temperature = kwargs.get("temperature", 0.0)
max_decoding_steps = 256
```
**Copilot AI** (Dec 30, 2025):
The hardcoded value of 256 for max_decoding_steps should be made configurable or derived from the configuration. This should use self.config.max_action_tokens or a similar configuration parameter instead of a magic number.
Suggested change:
```diff
-max_decoding_steps = 256
+max_decoding_steps = getattr(self.config, "max_action_tokens", 256) or 256
```
```python
if tasks is None:
    raise ValueError("No task found in complementary data")

# TODO: check if this necessary
```
**Copilot AI** (Dec 30, 2025):
The comment has a typo: 'check if this necessary' is missing the word 'is' and should read 'check if this is necessary'.
Suggested change:
```diff
-# TODO: check if this necessary
+# TODO: check if this is necessary
```
Pull request description
This PR brings autoregressive Vision-Language-Action (VLA) models back to LeRobot, alongside the existing flow-matching–based policies.
Unlike flow matching, which predicts actions in parallel over a horizon, autoregressive VLAs model actions sequentially as discrete tokens.
As a first step toward supporting multiple action tokenizers, this PR introduces PiFast together with a training script for FAST tokenization, providing a concrete reference implementation for autoregressive action modeling in LeRobot.
Future work will extend this framework to additional tokenizers and autoregressive variants.
TODO:
1. Support KV-caching for faster inference (a must for this PR): https://mett29.github.io/posts/kv-cache/
2. Provide PiFast pretrained checkpoints and unveil HF LeRobot's new AR VLA work.
3. Add testing and docs.
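On the KV-caching TODO: the core idea is that at decoding step t, the keys and values for the earlier tokens were already computed, so only the new token's key/value pair needs to be appended before attending with the newest query. A minimal single-head NumPy sketch (illustrative only, not the PR's planned implementation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention; q has shape (1, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d = 8
wk, wv = rng.normal(size=(d, d)), rng.normal(size=(d, d))
tokens = rng.normal(size=(5, d))  # embeddings of 5 decoded tokens

# Incremental decoding: append each new token's K/V to the cache and
# attend with only the newest query. Per-step cost is O(t*d) instead of
# recomputing all keys and values from scratch at every step.
k_cache = np.empty((0, d))
v_cache = np.empty((0, d))
cached_outputs = []
for t in range(5):
    x = tokens[t:t + 1]
    k_cache = np.vstack([k_cache, x @ wk])
    v_cache = np.vstack([v_cache, x @ wv])
    cached_outputs.append(attention(x, k_cache, v_cache))

# Reference: recompute all K/V from scratch for the final step.
full = attention(tokens[4:5], tokens @ wk, tokens @ wv)
```

The cached result for the last step matches the full recomputation exactly, which is why KV caching is a pure speed win for autoregressive decoding.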
DONE:
1. Trained and evaluated successfully on LIBERO; we will share the checkpoints along with the results.