This repository was archived by the owner on Oct 14, 2018. It is now read-only.

Asynchronous algorithms and "Good enough" RandomSearchCV #32

@mrocklin


It would be interesting to start exploring asynchronous algorithms within this project using the dask.distributed API. Because this API is somewhat different, it might be wise to start with something simple.

One simple application would be to build a variant of RandomSearchCV that, instead of taking a fixed number of candidates to try, took a stopping criterion like "have tried 100 options and not improved by more than 1%" and kept submitting computations until that criterion was met.
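
For concreteness, a rough sketch of that stopping rule might look like the following (the should_stop helper and its patience/tol parameters are illustrative, not an existing API):

def should_stop(new_score, best_score, since_improvement, patience=100, tol=0.01):
    # Count how many candidates in a row failed to improve the best score
    # by more than `tol` (relative improvement; assumes positive scores).
    if new_score > best_score * (1 + tol):
        return False, new_score, 0  # meaningful improvement, reset the counter
    return since_improvement + 1 >= patience, best_score, since_improvement + 1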

My initial approach to do this would be to periodically check the number of cores I had

import distributed

client = distributed.client.default_client()
ncores = sum(client.ncores().values())

and try to keep roughly twice that many candidates in flight

import toolz

candidate_pool = create_infinite_candidates(parameterspace)  # placeholder: infinite generator of parameter settings
futures = client.map(try_and_score, list(toolz.take(ncores * 2, candidate_pool)))

Then I would consume those futures as they finished

best, best_params = -float("inf"), None
af = distributed.as_completed(futures)
for future in af:
    score, params = future.result()
    if score > best:
        best = score
        best_params = params
        ...

and then submit new futures as necessary

    future = client.submit(try_and_score, next(candidate_pool))
    af.add(future)

If we wanted to be cool, we could also re-check the number of cores periodically and submit more or fewer candidates accordingly; a rough sketch of that follows.
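
One possible shape of that rescaling step, reusing the try_and_score and candidate_pool placeholders from above (the top_up helper and its bookkeeping are illustrative):

def top_up(client, af, candidate_pool, try_and_score, n_in_flight):
    # Re-read the core count so the target tracks workers joining or leaving,
    # then submit enough new candidates to keep ~2x that many in flight.
    target = 2 * sum(client.ncores().values())
    while n_in_flight < target:
        af.add(client.submit(try_and_score, next(candidate_pool)))
        n_in_flight += 1
    return n_in_flight

Calling something like this once per completed future inside the as_completed loop would keep the cluster roughly saturated even as workers come and go.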

cc @jcrist @amueller
