Open
Labels: enhancement (New feature or request)
Description
The parallel `eachobs` implementation is not deterministic: observations are returned as soon as they are loaded, so they may be returned out of order. This is very performant and fine for use cases like training, where data should be shuffled anyway.
An option for deterministic iteration would be helpful in many use cases, though.
This could be implemented as a wrapper around an existing iterator that does the following:
- instead of iterating over `data` with the wrapped iterator, iterate over `(1:nobs(data), data)` to preserve ordering information
- collect returned observations, stripping the index
- return an observation only if all previous (by index) observations have been returned
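
The reordering step in the list above could look roughly like the following. This is only a sketch using Base Julia: `inorder` and `arrivals` are hypothetical names, not part of any existing package, and the input is assumed to be any iterator that yields `(index, observation)` tuples in arbitrary order with indices starting at 1.

```julia
# Hypothetical helper (not an existing API): re-emit (index, obs) pairs in index order.
# Assumes indices are the integers 1, 2, 3, ... and each appears exactly once.
function inorder(pairs)
    Channel{Any}() do out
        pending = Dict{Int,Any}()  # observations that arrived before their turn
        next = 1                   # index of the next observation to emit
        for (i, obs) in pairs
            pending[i] = obs
            # Flush every observation whose predecessors have all been emitted.
            while haskey(pending, next)
                put!(out, pop!(pending, next))
                next += 1
            end
        end
    end
end

# Simulated out-of-order arrival, as a parallel loader might produce it.
arrivals = [(2, "b"), (1, "a"), (4, "d"), (3, "c")]
collect(inorder(arrivals))  # ["a", "b", "c", "d"]
```

In this sketch the `pending` dictionary grows with the number of observations that finish ahead of the slowest outstanding one, so memory usage depends on how far loading can run ahead.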
I am unsure how much this will affect performance and memory usage, and how it interacts with `buffersize`. Are there alternative approaches to this implementation?