Commit 5cea44a
Set intra_group_size from env var inside comm.py. (meta-pytorch#3697)
Summary:
Context
---------
TorchRec comms needs a way to obtain the pod size (topology-domain-multiple) and the total number of process groups in the topology group for TWRW/grid sharding.
The number of intra-nodes within a pod is read from the `TOPOLOGY_DOMAIN_MULTIPLE` environment variable (see diff stack).
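
To make the two quantities above concrete, here is a minimal sketch of how an intra-group size might split a flat list of ranks into intra-group and cross-group peers. The helper name `partition_ranks` and the contiguous rank layout are assumptions for illustration only; the commit's actual grouping logic is in the diff and is not reproduced here.

```python
from typing import List, Tuple


def partition_ranks(
    world_size: int, intra_group_size: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: split ranks into intra-group and cross-group peers.

    Assumes ranks are laid out contiguously, so ranks
    [g * size, (g + 1) * size) form intra group g.
    """
    if world_size <= 0 or intra_group_size <= 0:
        raise ValueError("world_size and intra_group_size must be positive")
    if world_size % intra_group_size != 0:
        raise ValueError("world_size must be a multiple of intra_group_size")

    num_groups = world_size // intra_group_size
    # Ranks that share a pod (intra-group communication).
    intra = [
        [g * intra_group_size + r for r in range(intra_group_size)]
        for g in range(num_groups)
    ]
    # Ranks holding the same local position in every pod (cross-group communication).
    cross = [
        [r + g * intra_group_size for g in range(num_groups)]
        for r in range(intra_group_size)
    ]
    return intra, cross


intra, cross = partition_ranks(world_size=8, intra_group_size=4)
print(intra)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(cross)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

With 8 ranks and an intra-group size of 4, this yields 2 intra groups of 4 ranks each and 4 cross groups of 2 ranks each.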
Implementation
------------------
- Created a `get_intra_group_size` function that reads the intra-node size from the environment variable and, if it is not set, defaults to the usual `get_local_size`.
- Updated `intra_and_cross_node_pg` to use `get_intra_group_size` instead.
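
The fallback described above could look roughly like the following sketch. It is not the code from this commit: the signature, the `env` parameter, the stand-in `_local_size_fallback` (which substitutes for TorchRec's `get_local_size` by reading `LOCAL_WORLD_SIZE`), and the positivity check are all assumptions. It also assumes the variable holds the intra-group size directly rather than a multiplier.

```python
import os
from typing import Mapping, Optional

TOPOLOGY_ENV_VAR = "TOPOLOGY_DOMAIN_MULTIPLE"


def _local_size_fallback(env: Mapping[str, str]) -> int:
    # Hypothetical stand-in for TorchRec's get_local_size; here it simply
    # reads LOCAL_WORLD_SIZE (set by torchrun) and defaults to 1.
    return int(env.get("LOCAL_WORLD_SIZE", "1"))


def get_intra_group_size(env: Optional[Mapping[str, str]] = None) -> int:
    """Sketch: prefer the topology env var, else fall back to the local size."""
    env = os.environ if env is None else env
    raw = env.get(TOPOLOGY_ENV_VAR)
    if raw is None:
        # Variable not set: keep the previous behaviour.
        return _local_size_fallback(env)
    size = int(raw)
    if size <= 0:
        # Assumed validation; the actual commit may handle this differently.
        raise ValueError(f"{TOPOLOGY_ENV_VAR} must be positive, got {raw!r}")
    return size


print(get_intra_group_size({"TOPOLOGY_DOMAIN_MULTIPLE": "16"}))  # 16
print(get_intra_group_size({"LOCAL_WORLD_SIZE": "8"}))  # 8
```

Passing the environment as a mapping keeps the sketch easy to test without mutating `os.environ`.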
Differential Revision: D916178891
parent 302da75
commit 5cea44a
File tree
4 files changed, +77 -6 lines changed
- torchrec/distributed
- test_utils
- tests

Lines changed per file, in diff order:
- File 1: 62 additions & 4 deletions
- File 2: 3 additions & 0 deletions
- File 3: 4 additions & 1 deletion
- File 4: 8 additions & 1 deletion