Update items in the dataset without map #7520

Open
@mashdragon

Feature request

I would like to be able to update items in my dataset without affecting all rows. At a minimum, if there were a range option, I could process just those items, save the dataset, and then continue.

If I am supposed to split the dataset first, the docs do not make that clear. They suggest that each of those functions returns a new object rather than modifying the dataset in place, so I don't think that would work.

Motivation

I am applying an extremely time-consuming function to each item in my Dataset. Unfortunately, datasets only supports updating values via map, so if my computer dies in the middle of this long-running process, I lose all progress. This is far from ideal. I would like to use datasets throughout this processing, but this limitation is now forcing me to write my own dataset format just for this intermediate step.

It would be less intuitive, but I suppose I could split the dataset, process the pieces, and concatenate them again before saving. That feels very inefficient, though.

Your contribution

I can test the feature.

Labels: enhancement (New feature or request)
