This repository was archived by the owner on Jan 9, 2020. It is now read-only.
job failed after shuffle pod restart #606
Open
Description
How to reproduce
- Submit a job such as PageRank that uses the external shuffle service.
- Once the executors are running, stop one or more external-shuffle-service pods on the executors' hosts.
- Each stopped external-shuffle-service pod restarts with a new pod IP.
- The driver exits with a failed status.
The driver and executor logs show that the pods keep trying to fetch blocks using the old shuffle pod IP.
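
The symptom above can be illustrated with a toy model. This is only a sketch of the suspected failure mode, not Spark or Kubernetes code: `Cluster`, `CachingExecutor`, `RefreshingExecutor`, the node name, and the `10.0.0.x` addresses are all hypothetical names invented for illustration. It assumes the executor resolves the shuffle pod IP once at startup and never looks it up again, which would match the "old shuffle-pod-ip" seen in the logs.

```python
class Cluster:
    """Toy stand-in for the cluster: maps a node name to its shuffle pod IP."""

    def __init__(self):
        self.pods = {}
        self._next = 1

    def start_pod(self, node):
        # A (re)started pod always receives a fresh IP, as in the report.
        ip = f"10.0.0.{self._next}"
        self._next += 1
        self.pods[node] = ip
        return ip

    def fetch(self, ip):
        # The fetch succeeds only if a live shuffle pod owns this IP.
        if ip not in self.pods.values():
            raise ConnectionError(f"no shuffle pod at {ip}")
        return f"block from {ip}"


class CachingExecutor:
    """Assumed behavior: the shuffle pod IP is resolved once and cached."""

    def __init__(self, cluster, node):
        self.cluster = cluster
        self.node = node
        self.shuffle_ip = cluster.pods[node]

    def fetch_block(self):
        return self.cluster.fetch(self.shuffle_ip)


class RefreshingExecutor(CachingExecutor):
    """Hypothetical contrast: look the IP up again when a fetch fails."""

    def fetch_block(self):
        try:
            return self.cluster.fetch(self.shuffle_ip)
        except ConnectionError:
            self.shuffle_ip = self.cluster.pods[self.node]
            return self.cluster.fetch(self.shuffle_ip)
```

In this model, once `start_pod` is called a second time for the same node, `CachingExecutor.fetch_block` raises `ConnectionError` on every call, while `RefreshingExecutor.fetch_block` recovers after one failed attempt.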