Add support for gradient accumulation to enable larger effective batch sizes on resource-constrained hardware. This feature is crucial for training high-quality embedding models when GPU memory is limited: it lets users simulate large-batch training by accumulating gradients over several smaller micro-batches before each optimizer step.
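The mechanism described above can be sketched in plain PyTorch (all names here are illustrative, not from any specific library): each micro-batch's loss is scaled by the number of accumulation steps, gradients are summed across micro-batches via repeated `backward()` calls, and the optimizer steps only once per accumulation window.

```python
import torch
from torch import nn

# Minimal gradient-accumulation sketch (assumed, illustrative setup):
# gradients from `accum_steps` micro-batches are summed before a single
# optimizer step, simulating a batch of micro_batch_size * accum_steps.
torch.manual_seed(0)

model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

accum_steps = 4        # micro-batches accumulated per optimizer step
micro_batch_size = 8   # small enough to fit in limited GPU memory
# effective batch size = accum_steps * micro_batch_size = 32

optimizer.zero_grad()
for step in range(16):
    x = torch.randn(micro_batch_size, 8)
    y = torch.randn(micro_batch_size, 1)

    loss = loss_fn(model(x), y)
    # Scale the loss so the summed gradients match the gradient of the
    # mean loss over the full effective batch.
    (loss / accum_steps).backward()

    # Step and reset gradients only at accumulation boundaries.
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Note that this simple scheme is exactly equivalent to large-batch training only for losses that decompose per-sample (e.g. MSE); in-batch-negatives contrastive losses common in embedding training additionally need the negatives shared across micro-batches, which requires a caching strategy beyond this sketch.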