Implementing HA Active/Active with distributed file system #594

@saipradeepkumar

Hi Kevin,

We are trying to implement HA active/active. As you mentioned in #562 (comment), we are able to handle scenario 3: "logic to the kernel web handlers such that if a request comes in for an unknown kernel ID, we attempt to re-establish connections to that kernel using persisted state (if any)". We tested one scenario by running two instances of Enterprise Gateway and launching four spark-python YARN cluster kernels. I then killed one instance with kill -9 on its process ID. Afterwards I checked my kernels: two of them are responding and the other two are not. If I reconnect, or close and reopen my notebook, the unresponsive ones work again.

I need your help understanding why only two of them keep working without a reconnect.
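
To make the question concrete, here is a minimal sketch of how I understand the scenario 3 lookup. It is only an illustration under my own assumptions: `KernelRegistry`, `start`, `lookup`, and the one-JSON-file-per-kernel layout are hypothetical names, not the actual Enterprise Gateway code or API. Each instance keeps its own in-memory map of kernels and falls back to the persisted state on the shared file system when a request arrives for a kernel ID it does not know.

```python
import json
from pathlib import Path


class KernelRegistry:
    """Hypothetical per-instance kernel registry (not the Enterprise Gateway API)."""

    def __init__(self, store_dir):
        # Directory on the shared (distributed) file system; assumed layout.
        self.store_dir = Path(store_dir)
        # Kernels this gateway instance currently knows about.
        self.live = {}

    def start(self, kernel_id, connection_info):
        # Remember the kernel locally and persist its connection info
        # so that another instance can pick it up later.
        self.live[kernel_id] = connection_info
        path = self.store_dir / f"{kernel_id}.json"
        path.write_text(json.dumps(connection_info))

    def lookup(self, kernel_id):
        # Known kernel: serve it from memory.
        if kernel_id in self.live:
            return self.live[kernel_id]
        # Unknown kernel: try to re-establish it from persisted state, if any.
        path = self.store_dir / f"{kernel_id}.json"
        if not path.exists():
            raise KeyError(kernel_id)
        info = json.loads(path.read_text())
        self.live[kernel_id] = info
        return info
```

In this sketch, if instance A starts a kernel and is then killed, instance B can still answer `lookup` for that kernel ID as long as both point at the same `store_dir`. It only covers the lookup itself; it does not model the existing websocket connection between the notebook and the killed instance.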
