Commit cc48945

basilwong authored and facebook-github-bot committed
Update the generate function to enable configuring index/offset/length type (#2767)
Summary: Pull Request resolved: #2767

# Diff Specific Changes

Adds new parameters for the index/offset/length type, and modifies the [generate](https://www.internalfb.com/code/fbsource/[4217c068fa966d569d2042a7263cefe1a06dc87a]/fbcode/torchrec/distributed/test_utils/test_model.py?lines=55) function to use them. This diff keeps the `long_indices` argument for backwards compatibility; a follow-up diff will deprecate `long_indices` (since it is now redundant) along with downstream references that depend on it.

# Context

Doc: https://docs.google.com/document/d/1YVfxsafqXkxAAdRyXbjmSH4AEz3-6DBiTGjs1rT8ZHQ/edit?usp=sharing

Updates the TorchRec unit test suite to cover int32 and int64 indices/offsets support.

# Summary

For the [test_model_parallel](https://www.internalfb.com/code/fbsource/[3505ccb75a649a7d21218bcda126d1e8392afc5a]/fbcode/torchrec/distributed/test_utils/test_model_parallel.py?lines=34) suite, the change is fairly straightforward:

1. The [ModelParallelTestShared](https://www.internalfb.com/code/fbsource/fbcode/torchrec/distributed/test_utils/test_model_parallel.py?lines=34) class defines a [test suite python library](https://www.internalfb.com/code/fbsource/[cbd0bd0020a7afbec4922d8abc0d88b7d45cba56]/fbcode/torchrec/distributed/test_utils/TARGETS?lines=65-69) referenced by multiple unit tests in the TorchRec codebase, including [test_model_parallel_nccl](https://www.internalfb.com/code/fbsource/[cbd0bd0020a7afbec4922d8abc0d88b7d45cba56]/fbcode/torchrec/distributed/tests/TARGETS?lines=85-100), which is the one of interest here. All of the unit tests in this class use the [`_test_sharding`](https://www.internalfb.com/code/fbsource/[fa9508a29b62ce57681ee73cd6d4cac56f153a58]/fbcode/torchrec/distributed/test_utils/test_model_parallel.py?lines=132) method. Within `_test_sharding`, the "callable" argument passed to [`_run_multi_process_test`](https://www.internalfb.com/code/symbol/fbsource/py/fbcode/caffe2.torch.fb.hpc.tests.sparse_data_dist_test.SparseDataDistTest._run_multi_process_test) is [`sharding_single_rank_test`](https://www.internalfb.com/code/fbsource/[fa9508a29b62ce57681ee73cd6d4cac56f153a58]/fbcode/torchrec/distributed/test_utils/test_sharding.py?lines=296), which shows how the input data/model is generated. Additional arguments need to be added to both [`_test_sharding`](https://www.internalfb.com/code/fbsource/[fa9508a29b62ce57681ee73cd6d4cac56f153a58]/fbcode/torchrec/distributed/test_utils/test_model_parallel.py?lines=132) and [`_run_multi_process_test`](https://www.internalfb.com/code/symbol/fbsource/py/fbcode/caffe2.torch.fb.hpc.tests.sparse_data_dist_test.SparseDataDistTest._run_multi_process_test).

2. The [`sharding_single_rank_test`](https://www.internalfb.com/code/fbsource/[fa9508a29b62ce57681ee73cd6d4cac56f153a58]/fbcode/torchrec/distributed/test_utils/test_sharding.py?lines=296) function is where the additional kwargs are defined. It leverages [`gen_model_and_input`](https://www.internalfb.com/code/fbsource/[f7e6a3281d924b465e0e90ff079aa9df83ae9530]/fbcode/torchrec/distributed/test_utils/test_sharding.py?lines=131) to define the test model and, more importantly for our purposes, the input tables:

   ```python
   generate=(
       cast(VariableBatchModelInputCallable, ModelInput.generate_variable_batch_input)
       if variable_batch_per_feature
       else ModelInput.generate
   ),
   ```

3. The [ModelInput](https://www.internalfb.com/code/fbsource/[4217c068fa966d569d2042a7263cefe1a06dc87a]/fbcode/torchrec/distributed/test_utils/test_model.py?lines=48) class' [`generate`](https://www.internalfb.com/code/fbsource/[4217c068fa966d569d2042a7263cefe1a06dc87a]/fbcode/torchrec/distributed/test_utils/test_model.py?lines=55) and [`generate_variable_batch_input`](https://www.internalfb.com/code/fbsource/[4217c068fa966d569d2042a7263cefe1a06dc87a]/fbcode/torchrec/distributed/test_utils/test_model.py?lines=589) methods generate the input tensors used in the unit tests. All we need to do is add new arguments that enable configuring the index/offset type of the tables.

# Diff stack change summary:

a. Update generate_variable_batch_input to enable configuring index/offset/length type
b. Update generate to enable configuring index/offset/length type
c. Update the Model Input Callable Protocol to enable configuring index/offset/length type
d. test_model_parallel: new test for different table index types
e. Deprecate the long_indices argument in favor of torch.dtype arguments

Reviewed By: TroyGarden

Differential Revision: D70055042

fbshipit-source-id: f0563bca57047b41fbefc61f177ab92caec48a21
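The backwards-compatibility behavior described above can be sketched as a small pure-Python helper. This is a hypothetical illustration, not code from the diff: the legacy `long_indices` flag overrides the new dtype arguments, and the torch dtypes are represented as strings so the sketch runs without torch installed.

```python
# Hypothetical sketch of the compatibility shim in ModelInput.generate:
# long_indices=True keeps the legacy 64-bit behavior, long_indices=False
# falls back to 32-bit indices/lengths. Strings stand in for torch dtypes.

def resolve_index_dtypes(long_indices: bool = True) -> dict:
    """Map the deprecated `long_indices` flag onto the new arguments."""
    if long_indices:
        # Legacy default: int64 indices/lengths, lengths-based KJT input.
        return {
            "indices_dtype": "int64",
            "lengths_dtype": "int64",
            "use_offsets": False,
        }
    # long_indices=False previously meant 32-bit indices.
    return {
        "indices_dtype": "int32",
        "lengths_dtype": "int32",
        "use_offsets": False,
    }
```

Once `long_indices` is deprecated (step e above), callers would pass `indices_dtype`/`lengths_dtype`/`offsets_dtype` directly instead of going through this shim.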
1 parent 449bb82 commit cc48945

File tree: 3 files changed, +108 −43 lines changed

torchrec/distributed/test_utils/test_model.py

Lines changed: 105 additions & 43 deletions
```diff
@@ -71,19 +71,30 @@ def generate(
             ]
         ] = None,
         variable_batch_size: bool = False,
-        long_indices: bool = True,
         tables_pooling: Optional[List[int]] = None,
         weighted_tables_pooling: Optional[List[int]] = None,
         randomize_indices: bool = True,
         device: Optional[torch.device] = None,
         max_feature_lengths: Optional[List[int]] = None,
         input_type: str = "kjt",
+        use_offsets: bool = False,
+        indices_dtype: torch.dtype = torch.int64,
+        offsets_dtype: torch.dtype = torch.int64,
+        lengths_dtype: torch.dtype = torch.int64,
+        long_indices: bool = True,  # TODO - remove this once code base is updated to support more than long_indices spec
     ) -> Tuple["ModelInput", List["ModelInput"]]:
         """
         Returns a global (single-rank training) batch
         and a list of local (multi-rank training) batches of world_size.
         """
-
+        if long_indices:
+            indices_dtype = torch.int64
+            lengths_dtype = torch.int64
+            use_offsets = False
+        else:
+            indices_dtype = torch.int32
+            lengths_dtype = torch.int32
+            use_offsets = False
         batch_size_by_rank = [batch_size] * world_size
         if variable_batch_size:
             batch_size_by_rank = [
@@ -119,7 +130,6 @@ def _validate_pooling_factor(
                 if tables[idx].num_embeddings_post_pruning is not None
                 else tables[idx].num_embeddings
             )
-
            idlist_features_to_max_length[feature] = (
                 max_feature_lengths[feature_idx] if max_feature_lengths else None
             )
@@ -144,18 +154,21 @@ def _validate_pooling_factor(

         idlist_pooling_factor = list(idlist_features_to_pooling_factor.values())
         idscore_pooling_factor = weighted_tables_pooling
-
         idlist_max_lengths = list(idlist_features_to_max_length.values())

         # Generate global batch.
         global_idlist_lengths = []
         global_idlist_indices = []
+        global_idlist_offsets = []
         global_idscore_lengths = []
         global_idscore_indices = []
+        global_idscore_offsets = []
         global_idscore_weights = []

         for idx in range(len(idlist_ind_ranges)):
             ind_range = idlist_ind_ranges[idx]
+
             if idlist_pooling_factor:
                 lengths_ = torch.max(
                     torch.normal(
@@ -165,17 +178,19 @@ def _validate_pooling_factor(
                         device=device,
                     ),
                     torch.tensor(1.0, device=device),
-                ).int()
+                ).to(lengths_dtype)
             else:
                 lengths_ = torch.abs(
                     torch.randn(batch_size * world_size, device=device) + pooling_avg,
-                ).int()
+                ).to(lengths_dtype)

             if idlist_max_lengths[idx]:
                 lengths_ = torch.clamp(lengths_, max=idlist_max_lengths[idx])

             if variable_batch_size:
-                lengths = torch.zeros(batch_size * world_size, device=device).int()
+                lengths = torch.zeros(batch_size * world_size, device=device).to(
+                    lengths_dtype
+                )
                 for r in range(world_size):
                     lengths[r * batch_size : r * batch_size + batch_size_by_rank[r]] = (
                         lengths_[
@@ -186,42 +201,30 @@ def _validate_pooling_factor(
                 lengths = lengths_

             num_indices = cast(int, torch.sum(lengths).item())
+
             if randomize_indices:
                 indices = torch.randint(
                     0,
                     ind_range,
                     (num_indices,),
-                    dtype=torch.long if long_indices else torch.int32,
+                    dtype=indices_dtype,
                     device=device,
                 )
             else:
                 indices = torch.zeros(
-                    (num_indices),
-                    dtype=torch.long if long_indices else torch.int32,
+                    (num_indices,),
+                    dtype=indices_dtype,
                     device=device,
                 )
+
+            # Calculate offsets from lengths
+            offsets = torch.cat(
+                [torch.tensor([0], device=device), lengths.cumsum(0)]
+            ).to(offsets_dtype)
+
             global_idlist_lengths.append(lengths)
             global_idlist_indices.append(indices)
-
-        if input_type == "kjt":
-            global_idlist_input = KeyedJaggedTensor(
-                keys=idlist_features,
-                values=torch.cat(global_idlist_indices),
-                lengths=torch.cat(global_idlist_lengths),
-            )
-        elif input_type == "td":
-            dict_of_nt = {
-                k: torch.nested.nested_tensor_from_jagged(
-                    values=values,
-                    lengths=lengths,
-                )
-                for k, values, lengths in zip(
-                    idlist_features, global_idlist_indices, global_idlist_lengths
-                )
-            }
-            global_idlist_input = TensorDict(source=dict_of_nt)
-        else:
-            raise ValueError(f"For IdList features, unknown input type {input_type}")
+            global_idlist_offsets.append(offsets)

         for idx, ind_range in enumerate(idscore_ind_ranges):
             lengths_ = torch.abs(
@@ -231,9 +234,12 @@ def _validate_pooling_factor(
                 if idscore_pooling_factor
                 else pooling_avg
             )
-            ).int()
+            ).to(lengths_dtype)
+
             if variable_batch_size:
-                lengths = torch.zeros(batch_size * world_size, device=device).int()
+                lengths = torch.zeros(batch_size * world_size, device=device).to(
+                    lengths_dtype
+                )
                 for r in range(world_size):
                     lengths[r * batch_size : r * batch_size + batch_size_by_rank[r]] = (
                         lengths_[
@@ -242,39 +248,68 @@ def _validate_pooling_factor(
                 )
             else:
                 lengths = lengths_
+
             num_indices = cast(int, torch.sum(lengths).item())
+
             if randomize_indices:
                 indices = torch.randint(
                     0,
                     # pyre-ignore [6]
                     ind_range,
                     (num_indices,),
-                    dtype=torch.long if long_indices else torch.int32,
+                    dtype=indices_dtype,
                     device=device,
                 )
             else:
                 indices = torch.zeros(
-                    (num_indices),
-                    dtype=torch.long if long_indices else torch.int32,
+                    (num_indices,),
+                    dtype=indices_dtype,
                     device=device,
                 )
             weights = torch.rand((num_indices,), device=device)
+            # Calculate offsets from lengths
+            offsets = torch.cat(
+                [torch.tensor([0], device=device), lengths.cumsum(0)]
+            ).to(offsets_dtype)
+
             global_idscore_lengths.append(lengths)
             global_idscore_indices.append(indices)
             global_idscore_weights.append(weights)
+            global_idscore_offsets.append(offsets)

         if input_type == "kjt":
+            global_idlist_input = KeyedJaggedTensor(
+                keys=idlist_features,
+                values=torch.cat(global_idlist_indices),
+                offsets=torch.cat(global_idlist_offsets) if use_offsets else None,
+                lengths=torch.cat(global_idlist_lengths) if not use_offsets else None,
+            )
+
             global_idscore_input = (
                 KeyedJaggedTensor(
                     keys=idscore_features,
                     values=torch.cat(global_idscore_indices),
-                    lengths=torch.cat(global_idscore_lengths),
+                    offsets=torch.cat(global_idscore_offsets) if use_offsets else None,
+                    lengths=(
+                        torch.cat(global_idscore_lengths) if not use_offsets else None
+                    ),
                     weights=torch.cat(global_idscore_weights),
                 )
                 if global_idscore_indices
                 else None
             )
         elif input_type == "td":
+            dict_of_nt = {
+                k: torch.nested.nested_tensor_from_jagged(
+                    values=values,
+                    lengths=lengths,
+                )
+                for k, values, lengths in zip(
+                    idlist_features, global_idlist_indices, global_idlist_lengths
+                )
+            }
+            global_idlist_input = TensorDict(source=dict_of_nt)
+
             assert (
                 len(idscore_features) == 0
             ), "TensorDict does not support weighted features"
@@ -295,14 +330,20 @@ def _validate_pooling_factor(

         # Split global batch into local batches.
         local_inputs = []
+
         for r in range(world_size):
             local_idlist_lengths = []
             local_idlist_indices = []
+            local_idlist_offsets = []
+
             local_idscore_lengths = []
             local_idscore_indices = []
             local_idscore_weights = []
+            local_idscore_offsets = []

-            for lengths, indices in zip(global_idlist_lengths, global_idlist_indices):
+            for lengths, indices, offsets in zip(
+                global_idlist_lengths, global_idlist_indices, global_idlist_offsets
+            ):
                 local_idlist_lengths.append(
                     lengths[r * batch_size : r * batch_size + batch_size_by_rank[r]]
                 )
@@ -312,9 +353,15 @@ def _validate_pooling_factor(
                 local_idlist_indices.append(
                     indices[lengths_cumsum[r] : lengths_cumsum[r + 1]]
                 )
+                local_idlist_offsets.append(
+                    offsets[r * batch_size : r * batch_size + batch_size_by_rank[r] + 1]
+                )

-            for lengths, indices, weights in zip(
-                global_idscore_lengths, global_idscore_indices, global_idscore_weights
+            for lengths, indices, weights, offsets in zip(
+                global_idscore_lengths,
+                global_idscore_indices,
+                global_idscore_weights,
+                global_idscore_offsets,
             ):
                 local_idscore_lengths.append(
                     lengths[r * batch_size : r * batch_size + batch_size_by_rank[r]]
@@ -329,18 +376,32 @@ def _validate_pooling_factor(
                     weights[lengths_cumsum[r] : lengths_cumsum[r + 1]]
                 )

+                local_idscore_offsets.append(
+                    offsets[r * batch_size : r * batch_size + batch_size_by_rank[r] + 1]
+                )
+
             if input_type == "kjt":
                 local_idlist_input = KeyedJaggedTensor(
                     keys=idlist_features,
                     values=torch.cat(local_idlist_indices),
-                    lengths=torch.cat(local_idlist_lengths),
+                    offsets=torch.cat(local_idlist_offsets) if use_offsets else None,
+                    lengths=(
+                        torch.cat(local_idlist_lengths) if not use_offsets else None
+                    ),
                 )

                 local_idscore_input = (
                     KeyedJaggedTensor(
                         keys=idscore_features,
                         values=torch.cat(local_idscore_indices),
-                        lengths=torch.cat(local_idscore_lengths),
+                        offsets=(
+                            torch.cat(local_idscore_offsets) if use_offsets else None
+                        ),
+                        lengths=(
+                            torch.cat(local_idscore_lengths)
+                            if not use_offsets
+                            else None
+                        ),
                         weights=torch.cat(local_idscore_weights),
                     )
                     if local_idscore_indices
@@ -353,15 +414,16 @@ def _validate_pooling_factor(
                         lengths=lengths,
                     )
                     for k, values, lengths in zip(
-                        idlist_features, local_idlist_indices, local_idlist_lengths
+                        idlist_features,
+                        local_idlist_indices,
+                        local_idlist_lengths,
                     )
                 }
                 local_idlist_input = TensorDict(source=dict_of_nt)
                 assert (
                     len(idscore_features) == 0
                 ), "TensorDict does not support weighted features"
                 local_idscore_input = None
-
             else:
                 raise ValueError(
                     f"For weighted features, unknown input type {input_type}"
```
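Both loops in the diff above derive per-feature offsets from lengths with the same pattern: `torch.cat([torch.tensor([0], device=device), lengths.cumsum(0)]).to(offsets_dtype)`. A pure-Python sketch of that transformation, using `itertools.accumulate` in place of `cumsum` so it runs standalone without torch:

```python
from itertools import accumulate
from typing import List

def lengths_to_offsets(lengths: List[int]) -> List[int]:
    """Prepend 0 and take the running sum of lengths, mirroring the
    torch.cat([torch.tensor([0]), lengths.cumsum(0)]) pattern in the diff."""
    return [0] + list(accumulate(lengths))

# Sample i's indices then live in values[offsets[i] : offsets[i + 1]],
# which is why a batch of B samples produces B + 1 offsets.
```

This also explains the `+ 1` in the local-batch slicing above: each rank's slice of the offsets tensor needs one more element than its slice of the lengths tensor.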

torchrec/distributed/test_utils/test_model_parallel_base.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -93,6 +93,7 @@ def _test_sharded_forward(
         dedup_tables: Optional[List[EmbeddingTableConfig]] = None,
         weighted_tables: Optional[List[EmbeddingTableConfig]] = None,
         constraints: Optional[Dict[str, ParameterConstraints]] = None,
+        # pyre-ignore [9]
         generate: ModelInputCallable = ModelInput.generate,
     ) -> None:
         default_rank = 0
```

torchrec/distributed/test_utils/test_sharding.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -148,6 +148,7 @@ def gen_model_and_input(
     tables: List[EmbeddingTableConfig],
     embedding_groups: Dict[str, List[str]],
     world_size: int,
+    # pyre-ignore [9]
     generate: Union[
         ModelInputCallable, VariableBatchModelInputCallable
     ] = ModelInput.generate,
@@ -344,6 +345,7 @@ def sharding_single_rank_test(
     (global_model, inputs) = gen_model_and_input(
         model_class=model_class,
         tables=tables,
+        # pyre-ignore [6]
         generate=(
             cast(
                 VariableBatchModelInputCallable,
```
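Every KeyedJaggedTensor construction in this stack follows the same convention: exactly one of `lengths`/`offsets` is passed, selected by `use_offsets`, and the other is `None`. A hypothetical standalone helper (not part of the diff) illustrating that mutual exclusivity:

```python
from typing import Dict, List, Optional

def jagged_kwargs(
    lengths: List[int], offsets: List[int], use_offsets: bool
) -> Dict[str, Optional[List[int]]]:
    """Select exactly one of lengths/offsets, the way the diff does when
    constructing each KeyedJaggedTensor."""
    return {
        "offsets": offsets if use_offsets else None,
        "lengths": lengths if not use_offsets else None,
    }
```

Passing only one representation avoids ambiguity about which tensor is authoritative and lets the tests exercise both the lengths-based and offsets-based KJT code paths.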
