This repository was archived by the owner on Oct 14, 2018. It is now read-only.

Asynchronous algorithms and "Good enough" RandomSearchCV #32

@mrocklin


It would be interesting to start exploring asynchronous algorithms within this project using the dask.distributed API. Because this API is somewhat different, it might be wise to start with something simple.

One simple application would be to build a variant of RandomSearchCV that, instead of taking a fixed number of candidates to try, took a stopping criterion like "have tried 100 options and not improved by more than 1%" and kept submitting computations until that criterion was met.
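
For concreteness, a rough sketch of that stopping rule might look like the following (the should_stop helper and its patience/tol parameters are illustrative, not an existing API):

def should_stop(new_score, best_score, since_improvement, patience=100, tol=0.01):
    # Count how many candidates in a row failed to improve the best score
    # by more than `tol` (relative improvement; assumes positive scores).
    if new_score > best_score * (1 + tol):
        return False, new_score, 0  # meaningful improvement, reset the counter
    return since_improvement + 1 >= patience, best_score, since_improvement + 1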

My initial approach to do this would be to periodically check the number of cores I had

import distributed

client = distributed.client.default_client()
ncores = sum(client.ncores().values())

and try to keep roughly twice that many candidates in flight

import toolz

candidate_pool = create_infinite_candidates(parameterspace)  # placeholder: infinite generator of parameter settings
futures = client.map(try_and_score, list(toolz.take(ncores * 2, candidate_pool)))

Then I would consume those futures as they finished

best, best_params = -float("inf"), None
af = distributed.as_completed(futures)
for future in af:
    score, params = future.result()
    if score > best:
        best = score
        best_params = params
        ...

and then submit new futures as necessary

    future = client.submit(try_and_score, next(candidate_pool))
    af.add(future)

If we wanted to be cool, we could also re-check the number of cores periodically and submit more or fewer candidates accordingly; a rough sketch of that follows.
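
One possible shape of that rescaling step, reusing the try_and_score and candidate_pool placeholders from above (the top_up helper and its bookkeeping are illustrative):

def top_up(client, af, candidate_pool, try_and_score, n_in_flight):
    # Re-read the core count so the target tracks workers joining or leaving,
    # then submit enough new candidates to keep ~2x that many in flight.
    target = 2 * sum(client.ncores().values())
    while n_in_flight < target:
        af.add(client.submit(try_and_score, next(candidate_pool)))
        n_in_flight += 1
    return n_in_flight

Calling something like this once per completed future inside the as_completed loop would keep the cluster roughly saturated even as workers come and go.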

cc @jcrist @amueller
