Describe the bug
If I have a pipeline error and pass Nextflow the -resume flag, the pipeline restarts at SPADES, even if downstream modules have already completed. I think this is because of the way SPADES is being run: since the input is declared as path(full_outdir), any change to the output directory's contents invalidates the cached task, so this module is re-run every time.
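To illustrate why this defeats -resume, here is a minimal sketch (a hypothetical process, not the actual PHOENIX SPADES module): Nextflow hashes every path input when deciding whether a cached task can be reused, so staging the whole --outdir means the hash changes whenever anything downstream writes into that directory.

```nextflow
// Hypothetical sketch only -- not the real SPADES module.
// Because `full_outdir` is a `path` input, Nextflow hashes the
// directory's contents as part of the task's cache key. Any file
// written into --outdir by a later module changes that hash, so
// -resume considers this task stale and re-runs it.
process ASSEMBLE {
    input:
    tuple val(meta), path(reads), path(full_outdir)

    output:
    tuple val(meta), path("${meta.id}.contigs.fasta")

    script:
    """
    run_assembly.sh ${reads} > ${meta.id}.contigs.fasta
    """
}
```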
Impact
When there is an error in one of the last modules, the majority (and most time-consuming part) of the pipeline has to be re-run, even though those steps have already run to completion.
To Reproduce
- What environment you are running the pipeline in (i.e. HPC, local, cloud etc): AWS EC2
- What version of the pipeline you are using: 2.0.2
- The command that was run to cause the error:
nextflow run phoenix/main.nf -resume -profile docker -entry PHOENIX --input manifest/samplesheet_01.csv --outdir /path/to/output
- If you used a custom config file please provide it: NA
- The text of the error itself - screenshots are great for this!: There isn't an error message; this is a caching bug.
- Include the files that caused the error
- (If the file(s) contain something you don't want public, open an issue with a brief description AND then email [email protected] with the title QH ISSUE # so we can track the problem. DO NOT send PII!): happens with any file input
Expected behavior
I would expect that the resume flag would continue the pipeline from the point at which it failed.
Additional context
I did try to see if I could come up with a quick solution that didn't involve rewriting the SPADES module. If you remove the path(outdir) from the input and edit the code to avoid using the full path variable (I basically stopped the fail log from being created), it runs with a warning (which makes sense, given the tuple is now incomplete):
```
WARN: Input tuple does not match input set cardinality declared by process `PHOENIX_SLIM:PHOENIX_EXTERNAL_SLIM:SPADES_WF:SPADES` -- offending value: [[id:v2], [/output/dir/work/82/e8d8d3abb9c77bd86eb3baad10465c/v2_1.trim.fastq.gz, /output/dir/work/82/e8d8d3abb9c77bd86eb3baad10465c/v2_2.trim.fastq.gz], /output/dir/work/c5/aef45b22ff908136d1fb98e291ce76/v2.singles.fastq.gz, /output/dir/work/fb/b3d5f52649fc64c19d8e2aea70f4e0/v2.kraken2_trimd.top_kraken_hit.txt, /output/dir/work/89/5d37276a9cc8e20e0ac46dd90b15d5/v2_raw_read_counts.txt, /output/dir/work/c1/cf4bb3c83f69750599b08228b27da5/v2_trimmed_read_counts.txt, /output/dir/work/a4/48219be006a02dcd541fb808ab5cb5/v2.kraken2_trimd.summary.txt, /output/dir/]
```
This isn't ideal, though, since I can see the path is important for error logging if SPADES doesn't create the outputs as expected. Hoping you might have a cleaner solution, or a different perspective on why this isn't working as expected for me?
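One possible middle ground (a hedged sketch, assuming the fail log only needs the outdir as a string, not as a staged directory): declare the outdir as a val input instead of a path input. A val is hashed by its string value, which is stable across runs, so -resume should work, while the script can still write the fail log into it. The process and script below are hypothetical, not the actual PHOENIX code.

```nextflow
// Hypothetical alternative -- not the real SPADES module.
// `val(full_outdir)` keeps the tuple cardinality intact and gives the
// script the directory path for fail-log writing, but Nextflow hashes
// only the string itself, so changes inside --outdir no longer
// invalidate the cached task.
process ASSEMBLE {
    input:
    tuple val(meta), path(reads), val(full_outdir)

    output:
    tuple val(meta), path("${meta.id}.contigs.fasta")

    script:
    """
    run_assembly.sh ${reads} > ${meta.id}.contigs.fasta \\
        || cp assembly.log ${full_outdir}/${meta.id}_assembly_failure.log
    """
}
```

The trade-off is that the directory is no longer staged into the work dir, so the script must treat it as an absolute path, which may not suit every executor.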