Description
When I call the `aggregate_moe_loss_stats` function, I get an `AttributeError`:
```
Traceback (most recent call last):
  File "/nemo_run/code/train_moe.py", line 141, in _log_moe_metrics
    moe_loss_dict = aggregate_moe_loss_stats(loss_scale=1.0)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/NeMo/nemo/lightning/megatron_parallel.py", line 1833, in aggregate_moe_loss_stats
    tracker = parallel_state.get_moe_layer_wise_logging_tracker()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'megatron.core.parallel_state' has no attribute 'get_moe_layer_wise_logging_tracker'
```
This occurs in both the nvcr.io/nvidia/nemo:25.07 and nvcr.io/nvidia/nemo:25.07.02 containers.
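The error suggests a version mismatch: NeMo's `aggregate_moe_loss_stats` looks up `get_moe_layer_wise_logging_tracker` on `megatron.core.parallel_state`, but the installed megatron-core no longer exposes it there. As a temporary diagnostic/workaround, a small resolver can probe the old attribute location and a fallback module before calling. Note this is a hedged sketch: the fallback path `megatron.core.transformer.moe.moe_utils` is an assumption about where newer megatron-core releases define the tracker, not something confirmed by this issue.

```python
import importlib


def resolve_moe_tracker():
    """Try to locate get_moe_layer_wise_logging_tracker across
    megatron-core versions.

    Assumption (unverified): older releases exposed the function on
    megatron.core.parallel_state, while newer ones may define it in
    megatron.core.transformer.moe.moe_utils. Returns the callable if
    found in either place, else None (e.g. megatron-core not installed
    or the function moved again).
    """
    candidates = [
        "megatron.core.parallel_state",             # location NeMo expects
        "megatron.core.transformer.moe.moe_utils",  # assumed newer location
    ]
    for mod_name in candidates:
        try:
            mod = importlib.import_module(mod_name)
        except ImportError:
            continue  # module path does not exist in this install
        fn = getattr(mod, "get_moe_layer_wise_logging_tracker", None)
        if fn is not None:
            return fn
    return None
```

If the resolver returns the function from the fallback module but not from `parallel_state`, that would confirm the container ships a megatron-core whose API is newer than what `/opt/NeMo/nemo/lightning/megatron_parallel.py` was written against.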