Training an SDXL image-to-text model, but getting the following error #8706
preethamp0197 started this conversation in General
Map: 49%|████▉ | 977000/1985039 [5:01:52<5:08:01, 54.54 examples/s]
Map: 50%|█████ | 993000/1985039 [5:01:56<4:58:49, 55.33 examples/s]
Map: 50%|█████ | 994000/1985039 [5:01:56<5:05:28, 54.07 examples/s]
Map: 51%|█████ | 1017000/1985039 [5:01:56<4:45:43, 56.47 examples/s]
Map: 49%|████▉ | 975000/1985039 [5:01:56<5:11:09, 54.10 examples/s]
Map: 50%|█████ | 1000000/1985039 [5:01:57<5:05:26, 53.75 examples/s]
Map: 50%|█████ | 994000/1985039 [5:01:58<4:56:14, 55.76 examples/s]
Map: 51%|█████ | 1009000/1985039 [5:01:58<4:57:23, 54.70 examples/s]
Map: 49%|████▉ | 976000/1985039 [5:02:01<5:11:59, 53.90 examples/s]
Map: 51%|█████ | 1010000/1985039 [5:02:01<4:55:56, 54.91 examples/s]
Map: 50%|█████ | 1001000/1985039 [5:02:04<5:01:38, 54.37 examples/s]
Map: 50%|█████ | 994000/1985039 [5:02:07<5:05:28, 54.07 examples/s]
Map: 51%|█████ | 1017000/1985039 [5:02:08<4:45:43, 56.47 examples/s]
Map: 49%|████▉ | 977000/1985039 [5:02:09<5:08:01, 54.54 examples/s]
Map: 49%|████▉ | 978000/1985039 [5:02:10<5:07:39, 54.55 examples/s]
Map: 50%|█████ | 994000/1985039 [5:02:10<4:56:14, 55.76 examples/s]
Map: 50%|█████ | 995000/1985039 [5:02:14<5:01:36, 54.71 examples/s]
Map: 51%|█████▏ | 1018000/1985039 [5:02:14<4:46:07, 56.33 examples/s]
[2024-06-25 13:00:06,795] torch.distributed.elastic.agent.server.api: [WARNING] Received Signals.SIGTERM death signal, shutting down workers
[2024-06-25 13:00:06,796] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 75 closing signal SIGTERM
[2024-06-25 13:00:06,796] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 76 closing signal SIGTERM
[2024-06-25 13:00:06,797] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 77 closing signal SIGTERM
[2024-06-25 13:00:06,797] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 78 closing signal SIGTERM
[2024-06-25 13:00:06,798] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 79 closing signal SIGTERM
[2024-06-25 13:00:06,798] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 80 closing signal SIGTERM
[2024-06-25 13:00:06,798] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 81 closing signal SIGTERM
[2024-06-25 13:00:06,799] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 82 closing signal SIGTERM
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 1073, in launch_command
multi_gpu_launcher(args)
File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 718, in multi_gpu_launcher
distrib_run.run(args)
File "/usr/local/lib/python3.9/dist-packages/torch/distributed/run.py", line 803, in run
elastic_launch(
File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 135, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
result = agent.run()
File "/usr/local/lib/python3.9/dist-packages/torch/distributed/elastic/metrics/api.py", line 123, in wrapper
result = f(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/torch/distributed/elastic/agent/server/api.py", line 727, in run
result = self._invoke_run(role)
File "/usr/local/lib/python3.9/dist-packages/torch/distributed/elastic/agent/server/api.py", line 868, in _invoke_run
time.sleep(monitor_interval)
File "/usr/local/lib/python3.9/dist-packages/torch/distributed/elastic/multiprocessing/api.py", line 62, in _terminate_process_handler
raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 1 got signal: 15
Please help me solve this issue.
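For reference, the final exception (`SignalException: Process 1 got signal: 15`) means the `accelerate` launcher itself received SIGTERM from outside the job, commonly from the OS OOM killer or the cluster/container runtime, while all eight ranks were still in the `datasets` `Map:` preprocessing phase after ~5 hours. As a workaround I am considering precomputing the `map()` step once outside `accelerate launch` and saving the result to disk, so a restart does not repeat the preprocessing inside the distributed run. A minimal sketch of what I mean, where the dataset id and `preprocess()` are placeholders for whatever the training script actually does:

```python
# Hypothetical one-off preprocessing script, run outside `accelerate launch`.
# The dataset id and preprocess() body are placeholders; num_proc and
# writer_batch_size values are guesses for my machine, not recommendations.
from datasets import load_dataset

def preprocess(batch):
    # placeholder: tokenize captions / transform images the same way the
    # training script's map() function does
    return batch

dataset = load_dataset("my_dataset", split="train")  # placeholder dataset id
dataset = dataset.map(
    preprocess,
    batched=True,
    num_proc=8,               # parallel workers for the map step
    writer_batch_size=1000,   # flush to the Arrow cache often to limit RAM use
)
dataset.save_to_disk("preprocessed_train")  # reload later with load_from_disk()
```

The training run could then call `datasets.load_from_disk("preprocessed_train")` instead of re-mapping on every launch. Does this look like a reasonable way to avoid the SIGTERM, or is something else going on?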