Deterministic, parallel data iteration

The parallel `eachobs` implementation is not deterministic: observations are returned as soon as they are loaded, so they may arrive out of order. This is fast, and it is fine for use cases like training, where the data is shuffled anyway.

An option for deterministic iteration would be helpful in many use cases, though.

This could be implemented as a wrapper around an existing iterator that does the following:

- instead of iterating over `data` with the wrapped iterator, iterate over `(1:nobs(data), data)` to preserve ordering information
- collect returned observations, stripping the index
- return an observation only if all previous (by index) observations have been returned
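
The steps above can be sketched roughly as follows. This is a minimal illustration, not an existing API: `ordered` is a hypothetical helper, and it assumes the wrapped iterator yields `(index, observation)` pairs whose indices cover `1:n` exactly once.

```julia
# Hypothetical helper, not part of any existing package.
# `unordered` is assumed to yield `(index, observation)` pairs in arbitrary
# order, with the indices forming a permutation of 1:n.
function ordered(unordered)
    Channel{Any}(0) do out
        pending = Dict{Int,Any}()  # observations that arrived early, keyed by index
        next = 1                   # index of the next observation to release
        for (i, obs) in unordered
            pending[i] = obs
            # Release every observation whose predecessors have all been returned.
            while haskey(pending, next)
                put!(out, pop!(pending, next))  # the index is stripped here
                next += 1
            end
        end
    end
end

# Simulated out-of-order arrival from a parallel loader.
arrivals = [(3, "c"), (1, "a"), (2, "b"), (5, "e"), (4, "d")]
result = collect(ordered(arrivals))
```

In this sketch the `pending` dictionary is where the extra memory goes: if the first observation happens to be the slowest to load, every other observation may have to be held until it arrives.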

I am unsure how much this would affect performance and memory usage, and how it would interact with `buffersize`. Are there alternative approaches to this implementation?
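
One possible alternative, sketched here under stated assumptions rather than proposed as the implementation: start the loads in index order and consume them in that same order, so no reordering buffer is needed. The names `ordered_parallel` and `load` are hypothetical, and the `buffersize` keyword only mirrors the existing option by name. It uses only `Channel`, `Threads.@spawn`, and `fetch` from Base.

```julia
# Hypothetical sketch; `load(i)` stands in for loading observation `i`.
function ordered_parallel(load, n; buffersize = 4)
    # Tasks are queued in index order. The channel capacity limits how far
    # loading can run ahead of the consumer (roughly `buffersize` tasks).
    tasks = Channel{Task}(buffersize) do ch
        for i in 1:n
            put!(ch, Threads.@spawn load(i))
        end
    end
    # Fetching in queue order yields observations in index order,
    # regardless of which load finishes first.
    return (fetch(t) for t in tasks)
end

# Loads take a random amount of time, but results still come back in order.
squares = collect(ordered_parallel(i -> (sleep(0.01 * rand()); i^2), 10; buffersize = 3))
```

The trade-off in this sketch is that a single slow observation stalls the consumer even when later ones are already loaded, while memory stays bounded by the channel capacity.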

Deterministic, parallel data iteration #68