Commit 3169481

TomekWei authored and facebook-github-bot committed
Fix bug in criteo.py that caused NaN issues (#2150)
Summary:
The original script simply added 3 to each dense value before taking the log. As a result, a value of -3 in the data became -inf during preprocessing, and any value below -3 became NaN. This problem was reported in facebookresearch/dlrm#363 (comment). I changed the preprocessing operation in the tsv_to_npys function to dense_np -= dense_np.min() - 2, so that the Criteo Kaggle dataset is handled correctly.

Pull Request resolved: #2150
Reviewed By: spmex
Differential Revision: D77049372
Pulled By: TroyGarden
fbshipit-source-id: c9e2d0de08babd97066810e8f66ac7009e8bd29b
1 parent 4b3f60c commit 3169481

1 file changed: +1 -1 lines changed

torchrec/datasets/criteo.py

Lines changed: 1 addition & 1 deletion
@@ -258,7 +258,7 @@ def row_mapper(row: List[str]) -> Tuple[List[int], List[int], int]:
 del labels

 # Log is expensive to compute at runtime.
-dense_np += 3
+dense_np -= dense_np.min() - 2
 dense_np = np.log(dense_np, dtype=np.float32)

 # To be consistent with dense and sparse.
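
The effect of the one-line change can be reproduced on a toy array. The following is a minimal sketch, not the torchrec code itself: the helper names `log_with_constant_shift` and `log_with_min_shift` and the sample values are hypothetical, and the input is assumed to be a float32 array. It shows that a fixed shift of 3 yields -inf for -3 and NaN for anything smaller, while shifting by `dense_np.min() - 2` maps the smallest value to 2 and keeps every logarithm finite.

```python
import numpy as np


def log_with_constant_shift(dense):
    # Pre-fix behaviour: shift by the fixed constant 3, then take the log.
    shifted = dense.astype(np.float32) + 3
    # Silence the divide-by-zero / invalid-value warnings for the demo.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log(shifted, dtype=np.float32)


def log_with_min_shift(dense):
    # Post-fix behaviour: shift so the global minimum of the array becomes 2.
    shifted = dense.astype(np.float32)
    shifted -= shifted.min() - 2
    return np.log(shifted, dtype=np.float32)


# Hypothetical sample containing -3 and a value below -3.
toy = np.array([[-3.0, 0.0], [-5.0, 10.0]], dtype=np.float32)

old = log_with_constant_shift(toy)  # contains -inf (from -3) and NaN (from -5)
new = log_with_min_shift(toy)       # all entries finite; smallest is log(2)

print(old)
print(new)
```

Because `min()` is called without an `axis` argument, the shift is a single scalar taken over the whole array, so the transformed values depend on the minimum of whatever data is passed in.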