Update Model Input Callable Protocol to enable configuring index/offset/length type (#2768)

basilwong · facebook-github-bot · commit a3eee1927372 · 2025-02-27T00:38:03.000-08:00
Summary:
Pull Request resolved: #2768

# Diff Specific Changes

Updates the Model Input Callable Protocol in TorchRec to enable configuring the index/offset/length type. The changes include adding new parameters to the ModelInput class constructor, which allow users to specify the data type of indices, offsets, and lengths.

# Context

Doc: https://docs.google.com/document/d/1YVfxsafqXkxAAdRyXbjmSH4AEz3-6DBiTGjs1rT8ZHQ/edit?usp=sharing

Updating the TorchRec unit test suite to cover int32 and int64 indices/offsets support.

# Summary

For the [test_model_parallel](https://www.internalfb.com/code/fbsource/[3505ccb75a649a7d21218bcda126d1e8392afc5a]/fbcode/torchrec/distributed/test_utils/test_model_parallel.py?lines=34) suite specifically, the change appears to be fairly straightforward.

1. The [ModelParallelTestShared](https://www.internalfb.com/code/fbsource/fbcode/torchrec/distributed/test_utils/test_model_parallel.py?lines=34) class defines a [test suite python library](https://www.internalfb.com/code/fbsource/[cbd0bd0020a7afbec4922d8abc0d88b7d45cba56]/fbcode/torchrec/distributed/test_utils/TARGETS?lines=65-69) referenced by multiple unit tests in the TorchRec codebase, including [test_model_parallel_nccl](https://www.internalfb.com/code/fbsource/[cbd0bd0020a7afbec4922d8abc0d88b7d45cba56]/fbcode/torchrec/distributed/tests/TARGETS?lines=85-100), which is the one of interest here. All of the unit tests in this class use the [`_test_sharding`](https://www.internalfb.com/code/fbsource/[fa9508a29b62ce57681ee73cd6d4cac56f153a58]/fbcode/torchrec/distributed/test_utils/test_model_parallel.py?lines=132) method.

   Within the `_test_sharding` function, the "callable" argument passed to the [`_run_multi_process_test`](https://www.internalfb.com/code/symbol/fbsource/py/fbcode/caffe2.torch.fb.hpc.tests.sparse_data_dist_test.SparseDataDistTest._run_multi_process_test) function is [`sharding_single_rank_test`](https://www.internalfb.com/code/fbsource/[fa9508a29b62ce57681ee73cd6d4cac56f153a58]/fbcode/torchrec/distributed/test_utils/test_sharding.py?lines=296), which shows how the input data/model is generated. Additional arguments will need to be added to both the [`_test_sharding`](https://www.internalfb.com/code/fbsource/[fa9508a29b62ce57681ee73cd6d4cac56f153a58]/fbcode/torchrec/distributed/test_utils/test_model_parallel.py?lines=132) and [`_run_multi_process_test`](https://www.internalfb.com/code/symbol/fbsource/py/fbcode/caffe2.torch.fb.hpc.tests.sparse_data_dist_test.SparseDataDistTest._run_multi_process_test) functions.

2. The [`sharding_single_rank_test`](https://www.internalfb.com/code/fbsource/[fa9508a29b62ce57681ee73cd6d4cac56f153a58]/fbcode/torchrec/distributed/test_utils/test_sharding.py?lines=296) function is where we define additional kwargs. This function uses [`gen_model_and_input`](https://www.internalfb.com/code/fbsource/[f7e6a3281d924b465e0e90ff079aa9df83ae9530]/fbcode/torchrec/distributed/test_utils/test_sharding.py?lines=131) to define the test model and, more importantly for our purposes, the input tables.

```
generate=(
    cast(VariableBatchModelInputCallable, ModelInput.generate_variable_batch_input)
    if variable_batch_per_feature
    else ModelInput.generate
),
```

3. The [ModelInput](https://www.internalfb.com/code/fbsource/[4217c068fa966d569d2042a7263cefe1a06dc87a]/fbcode/torchrec/distributed/test_utils/test_model.py?lines=48) class's [`generate`](https://www.internalfb.com/code/fbsource/[4217c068fa966d569d2042a7263cefe1a06dc87a]/fbcode/torchrec/distributed/test_utils/test_model.py?lines=55) and [`generate_variable_batch_input`](https://www.internalfb.com/code/fbsource/[4217c068fa966d569d2042a7263cefe1a06dc87a]/fbcode/torchrec/distributed/test_utils/test_model.py?lines=589) methods generate the input tensors used in the unit tests. All we need to do is add new arguments that enable configuring the index/offset type of the tables.

# Diff stack change summary:

a. Update `generate_variable_batch_input` to enable configuring index/offset/length type
b. Update `generate` to enable configuring index/offset/length type
c. Update Model Input Callable Protocol to enable configuring index/offset/length type
d. test_model_parallel: new test for different table index types
e. Deprecate `long_indices` argument in favor of `torch.dtype` arguments

Reviewed By: TroyGarden

Differential Revision: D70055498

fbshipit-source-id: 2e06c3d647e206bd92aa2becb05fd27f05818f62
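
As a rough illustration of the pattern this diff applies (a callable protocol that gains `use_offsets`, `indices_dtype`, `offsets_dtype`, and `lengths_dtype` keywords, with a caller that forwards them unchanged), here is a minimal sketch. It is not the TorchRec implementation: `IntDType`, `SparseInput`, `SparseInputCallable`, `generate`, and `build_input` are hypothetical stand-ins written in plain Python so the example runs without PyTorch, and the assumption that `use_offsets=True` yields offsets in place of lengths is illustrative only.

```python
import itertools
import random
from dataclasses import dataclass
from enum import Enum
from typing import Any, List, Optional, Protocol


class IntDType(Enum):
    # Stand-in for torch.int32 / torch.int64 (assumption: avoids a PyTorch dependency).
    INT32 = "int32"
    INT64 = "int64"


@dataclass
class SparseInput:
    # Hypothetical, simplified stand-in for the sparse part of a ModelInput.
    indices: List[int]
    lengths: Optional[List[int]]
    offsets: Optional[List[int]]
    indices_dtype: IntDType
    offsets_dtype: IntDType
    lengths_dtype: IntDType


class SparseInputCallable(Protocol):
    # Hypothetical protocol; only the four new keyword names come from the diff.
    def __call__(
        self,
        batch_size: int,
        num_embeddings: int,
        use_offsets: bool = False,
        indices_dtype: IntDType = IntDType.INT64,
        offsets_dtype: IntDType = IntDType.INT64,
        lengths_dtype: IntDType = IntDType.INT64,
    ) -> SparseInput: ...


def generate(
    batch_size: int,
    num_embeddings: int,
    use_offsets: bool = False,
    indices_dtype: IntDType = IntDType.INT64,
    offsets_dtype: IntDType = IntDType.INT64,
    lengths_dtype: IntDType = IntDType.INT64,
) -> SparseInput:
    rng = random.Random(0)  # fixed seed so the generated input is reproducible
    lengths = [rng.randint(0, 3) for _ in range(batch_size)]
    indices = [rng.randrange(num_embeddings) for _ in range(sum(lengths))]
    # Offsets are the running sum of lengths with a leading zero.
    offsets = [0, *itertools.accumulate(lengths)]
    return SparseInput(
        indices=indices,
        lengths=None if use_offsets else lengths,
        offsets=offsets if use_offsets else None,
        indices_dtype=indices_dtype,
        offsets_dtype=offsets_dtype,
        lengths_dtype=lengths_dtype,
    )


def build_input(generate_fn: SparseInputCallable, **dtype_kwargs: Any) -> SparseInput:
    # Forward the dtype settings untouched, in the same spirit as gen_model_and_input.
    return generate_fn(batch_size=4, num_embeddings=100, **dtype_kwargs)


sample = build_input(
    generate,
    use_offsets=True,
    indices_dtype=IntDType.INT32,
    offsets_dtype=IntDType.INT32,
)
```

Keeping the defaults at the 64-bit type means existing callers that pass none of the new keywords see no behavior change, which matches the defaults chosen in the diff below.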
diff --git a/torchrec/distributed/test_utils/test_sharding.py b/torchrec/distributed/test_utils/test_sharding.py
@@ -126,6 +126,10 @@ def __call__(
             Union[List[EmbeddingTableConfig], List[EmbeddingBagConfig]]
         ] = None,
         variable_batch_size: bool = False,
+        use_offsets: bool = False,
+        indices_dtype: torch.dtype = torch.int64,
+        offsets_dtype: torch.dtype = torch.int64,
+        lengths_dtype: torch.dtype = torch.int64,
         long_indices: bool = True,
     ) -> Tuple["ModelInput", List["ModelInput"]]: ...
 
@@ -140,6 +144,10 @@ def __call__(
         weighted_tables: Union[List[EmbeddingTableConfig], List[EmbeddingBagConfig]],
         pooling_avg: int = 10,
         global_constant_batch: bool = False,
+        use_offsets: bool = False,
+        indices_dtype: torch.dtype = torch.int64,
+        offsets_dtype: torch.dtype = torch.int64,
+        lengths_dtype: torch.dtype = torch.int64,
     ) -> Tuple["ModelInput", List["ModelInput"]]: ...
 
 
@@ -161,10 +169,14 @@ def gen_model_and_input(
     variable_batch_size: bool = False,
     batch_size: int = 4,
     feature_processor_modules: Optional[Dict[str, torch.nn.Module]] = None,
-    long_indices: bool = True,
+    use_offsets: bool = False,
+    indices_dtype: torch.dtype = torch.int64,
+    offsets_dtype: torch.dtype = torch.int64,
+    lengths_dtype: torch.dtype = torch.int64,
     global_constant_batch: bool = False,
     num_inputs: int = 1,
     input_type: str = "kjt",  # "kjt" or "td"
+    long_indices: bool = True,
 ) -> Tuple[nn.Module, List[Tuple[ModelInput, List[ModelInput]]]]:
     torch.manual_seed(0)
     if dedup_feature_names:
@@ -205,6 +217,10 @@ def gen_model_and_input(
                     tables=tables,
                     weighted_tables=weighted_tables or [],
                     global_constant_batch=global_constant_batch,
+                    use_offsets=use_offsets,
+                    indices_dtype=indices_dtype,
+                    offsets_dtype=offsets_dtype,
+                    lengths_dtype=lengths_dtype,
                 )
             )
     elif generate == ModelInput.generate:
@@ -218,8 +234,12 @@ def gen_model_and_input(
                     num_float_features=num_float_features,
                     variable_batch_size=variable_batch_size,
                     batch_size=batch_size,
-                    long_indices=long_indices,
                     input_type=input_type,
+                    use_offsets=use_offsets,
+                    indices_dtype=indices_dtype,
+                    offsets_dtype=offsets_dtype,
+                    lengths_dtype=lengths_dtype,
+                    long_indices=long_indices,
                 )
             )
     else:
@@ -233,6 +253,10 @@ def gen_model_and_input(
                     num_float_features=num_float_features,
                     variable_batch_size=variable_batch_size,
                     batch_size=batch_size,
+                    use_offsets=use_offsets,
+                    indices_dtype=indices_dtype,
+                    offsets_dtype=offsets_dtype,
+                    lengths_dtype=lengths_dtype,
                     long_indices=long_indices,
                 )
             )
@@ -336,6 +360,10 @@ def sharding_single_rank_test(
     input_type: str = "kjt",  # "kjt" or "td"
     allow_zero_batch_size: bool = False,
     custom_all_reduce: bool = False,  # 2D parallel
+    use_offsets: bool = False,
+    indices_dtype: torch.dtype = torch.int64,
+    offsets_dtype: torch.dtype = torch.int64,
+    lengths_dtype: torch.dtype = torch.int64,
 ) -> None:
     with MultiProcessContext(rank, world_size, backend, local_size) as ctx:
         batch_size = (
@@ -363,6 +391,10 @@ def sharding_single_rank_test(
             feature_processor_modules=feature_processor_modules,
             global_constant_batch=global_constant_batch,
             input_type=input_type,
+            use_offsets=use_offsets,
+            indices_dtype=indices_dtype,
+            offsets_dtype=offsets_dtype,
+            lengths_dtype=lengths_dtype,
         )
         global_model = global_model.to(ctx.device)
         global_input = inputs[0][0].to(ctx.device)