You have almost certainly encountered a problem with some software in your day-to-day life. The error message may have given you a cryptic number or code to describe what went wrong. When documented and understood, such error codes can be used to diagnose, and perhaps resolve, a technical fault. For example, when your browser tells you that a web page has not been found, it is reacting to a 404 code, one of a well-established, agreed-upon set of codes for web requests.
We try our best to make our workflows as robust as possible, but encountering software errors is inevitable. Your cluster could lose connectivity with your job scheduler, you could run out of disk space, or the website we use to host our workflows could be unreachable at just the wrong moment. Some of the user reports we receive describe problems that a user in the know can fix without having to contact us.
The aim of this blog post is to introduce you to the error codes that may arise while running our workflows on the command line or through our EPI2ME Desktop application, and to explain what you need to do to resolve them. Much of what is introduced here applies equally to other command line software.
One of the first things we do when reading a user report about a problem with a workflow is look for the exit status. To the uninitiated, these simple numbers seem like an unhelpful piece of information, but once you have finished reading this blog post, you too will be able to cut through lines of red text and diagnose some problems immediately. Bioinformatics is generally not well known for its user experience (and our team’s core goal is to change this), so you would be forgiven if you have not looked closely at an error produced by Nextflow before. Here is one now:
```text
% nextflow run main.nf
N E X T F L O W ~ version 23.04.4
Launching `main.nf` [nauseous_ramanujan] DSL2 - revision: 734d48c47d
executor > local (3)
executor > local (3)
[b1/4f1c6b] process > countOwls (1) [100%] 1 of 1, failed: 1
ERROR ~ Error executing process > 'countOwls (2)'

Caused by:
  Process `countOwls (2)` terminated with an error exit status (1)

Command executed:

  #!/usr/bin/env python3
  raise Exception("Not enough owls.")

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File ".command.sh", line 2, in <module>
      raise Exception("Not enough owls.")
  Exception: Not enough owls.

Work dir:
  /Users/Sam.Nicholls/scratch/nf-testing/exit_codes/work/79/f9172717a1de210d6e6e17a45a0481

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
```
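Here’s what we know by reading each section of this human-readable output:

- Caused by: the `countOwls (2)` process terminated with an error exit status of 1.
- Command executed: the process ran a short Python script that raises an exception.
- Command exit status: 1, the same status reported under "Caused by".
- Command output: the command wrote nothing to standard output.
- Command error: standard error holds a Python traceback ending in `Exception: Not enough owls.`
- Work dir: the directory in which the task ran; as the tip says, running `bash .command.run` there replicates the issue.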
This is not much of a whodunnit: our Python script has raised an Exception. We’re going to have to get some more owls! In the case of a real problem, the command executed, output and error sections may contain many more lines, which may or may not be related to the problem. This is exactly why the command exit status is so useful: it summarises the outcome of the command in a single number.
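On the command line, the same details can also be collected from the task’s work directory (the path printed under "Work dir"). The sketch below is a hypothetical helper, `summarise_task`, not part of Nextflow or our workflows. It assumes the conventional Nextflow task layout, in which the exit status is written to a `.exitcode` file and standard error to `.command.err`. Either file may be missing, for example if a scheduler killed the task abruptly, so the helper treats both as optional.

```python
from pathlib import Path
from typing import List, Optional


def summarise_task(work_dir: str, tail_lines: int = 5) -> dict:
    """Collect the exit status and the last lines of stderr for one task.

    Hypothetical helper: assumes Nextflow's usual task files `.exitcode`
    (exit status) and `.command.err` (standard error). Either may be absent.
    """
    task = Path(work_dir)

    exit_status: Optional[int] = None
    exit_file = task / ".exitcode"
    if exit_file.is_file():
        text = exit_file.read_text().strip()
        if text:  # guard against an empty or truncated file
            exit_status = int(text)

    stderr_tail: List[str] = []
    err_file = task / ".command.err"
    if err_file.is_file():
        # The last few lines of stderr usually hold the actual error
        stderr_tail = err_file.read_text(errors="replace").splitlines()[-tail_lines:]

    return {"exit_status": exit_status, "stderr_tail": stderr_tail}
```

Pointed at the work directory from the example above, this would be expected to report an exit status of 1 and a stderr tail ending in `Exception: Not enough owls.`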
If you’re using the EPI2ME Desktop application, this Nextflow output can be found on the “Logs” tab.
Now that you know how to read Nextflow errors and locate the command exit status, we can cover what these mysterious numbers mean. The tables below list common (and not so common) command exit statuses, why they might occur and what you could try to do about them.
Despite their ubiquity, exit codes mostly rely on historical conventions, and few have a reserved meaning. The most common exit codes are:
Exit status | Name | Potential causes | Possible next steps |
---|---|---|---|
0 | OK (successful exit) | The process exited successfully but did not produce the output files Nextflow expected | Nextflow will report an error for a process that exits successfully if output files that should be present are missing. The process has failed to produce output that we always expect, and our workflow has not accounted for a case where a file may only optionally be generated. You should open an issue with us. |
1 | Failure | General catch-all for an application error | Open an issue with us. |
2 | Shell misuse | General catch-all for misuse of shell built-in commands | Open an issue with us. |
125 | Docker command error | The command to run a Docker container did not execute successfully. | Check the Docker engine is running and that you have permissions to start containers. See our install guide. |
127 | Command not found | The command line tool the workflow is trying to run cannot be found. | You may encounter this if you do not have Docker or Singularity installed, one of which is required to run the dependencies of our workflows. Check our install guide for how to install Docker. Otherwise, ensure that you have not provided a Nextflow configuration file that has changed the executor or container directives. You may have inadvertently caused the workflow to run outside of a container, where its dependencies are not available on your computer. |
255 | Out of range | The exit status was beyond the allowed range of 0-255 | Check the error message for actionable information about your compute environment and contact your system administrator. Otherwise, open an issue with us. |
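To see where these numbers come from, the sketch below uses Python’s standard `subprocess` module to run a few commands through `sh` and print each exit status. It assumes a POSIX shell is available as `sh`; `definitely_not_a_real_command` is a made-up name chosen only to trigger status 127.

```python
import subprocess


def exit_status(shell_command: str) -> int:
    """Run a command with `sh -c` and return its exit status."""
    result = subprocess.run(
        ["sh", "-c", shell_command],
        stdout=subprocess.DEVNULL,  # discard output; only the status matters here
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


examples = {
    "true": "success",
    "false": "general failure",
    "exit 2": "explicit exit with status 2",
    "definitely_not_a_real_command": "command not found",
}

for command, meaning in examples.items():
    print(f"{exit_status(command):>3}  {command:<32}{meaning}")
```

On a typical Linux or macOS system this should print statuses 0, 1, 2 and 127, matching rows of the table above.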
Exit codes above 128 have a special meaning: they indicate that a process was terminated by a “signal” from the operating system. Each signal has a number, and you can determine which signal was received by subtracting 128 from the exit code. These codes are often encountered when running workflows on compute clusters and in containers; useful examples are included below. The signal numbers shown are those used on Linux; a few (such as SIGBUS, SIGUSR1 and SIGUSR2) have different numbers on other systems such as macOS.
Exit status | Name | Potential causes | Possible next steps |
---|---|---|---|
128 + 2 = 130 | SIGINT | Process was interrupted | The process was sent the interrupt signal; you may have pressed Ctrl+C to quit the Nextflow workflow, and Nextflow forwarded the signal to your processes. Some clusters will use this signal to stop processes that have exceeded their time or memory limit. You should speak to your system administrator to diagnose the reason that the job was terminated. |
128 + 6 = 134 | SIGABRT | Process aborted | The process has aborted as it cannot continue. Some clusters will use this signal to stop processes that have exceeded their time or memory limit. You should speak to your system administrator to diagnose the reason that the job was terminated. If you are not using a cluster, open an issue with us. |
128 + 7 = 135 | SIGBUS | Invalid memory address | You should speak to your system administrator to check the installation of prerequisites like Docker. If you are not using a cluster, open an issue with us. |
128 + 9 = 137 | SIGKILL | Process was killed | The process was forcibly killed. This may be your operating system invoking the out-of-memory (OOM) killer to stop a process from freezing your computer. Some clusters will send a kill signal to a job for exceeding its memory or time limit. You should speak to your system administrator to diagnose the reason that the job was terminated. See additional guidance below this table. |
128 + 10 = 138 | SIGUSR1 | Process received user signal (1) | Typically observed when your cluster is signalling that the job should wrap up because it is about to exceed a time or memory limit. You should speak to your system administrator to help you adjust your job resource requests. See additional guidance below this table. |
128 + 11 = 139 | SIGSEGV | Invalid memory access | The process tried to access memory it should not have. Open an issue with us. |
128 + 12 = 140 | SIGUSR2 | Process received user signal (2) | Typically observed when your cluster is signalling that the job should wrap up because it is about to exceed a time or memory limit. You should speak to your system administrator to help you adjust your job resource requests. See additional guidance below this table. |
128 + 15 = 143 | SIGTERM | Process was terminated | Some clusters will use this signal to stop processes that have exceeded their time or memory limit. You should speak to your system administrator to diagnose the reason that the job was terminated. |
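The subtract-128 rule can be written as a small function. The sketch below is hypothetical (the name `signal_from_exit_status` is ours) and uses Python’s standard `signal.Signals` enumeration, so numbers are looked up for the platform it runs on. That is why 138 and 140 decode to SIGUSR1 and SIGUSR2 on Linux but to other signals on macOS.

```python
import signal
from typing import Optional


def signal_from_exit_status(status: int) -> Optional[str]:
    """Name the signal implied by a shell-style exit status (128 + N).

    Returns None for statuses of 128 or below, or when N is not a signal
    number on this platform (e.g. 255 - 128 = 127).
    """
    if status <= 128:
        return None
    try:
        # signal.Signals maps numbers to names using this platform's values
        return signal.Signals(status - 128).name
    except ValueError:
        return None


for status in (130, 134, 137, 139, 143, 255):
    print(status, signal_from_exit_status(status))
```

One design note: Python’s `subprocess` reports a child killed by signal N as a negative `returncode` (-N) rather than 128 + N. The 128 + N form is what a shell reports, and it is the form listed in the table above.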
Several of the above exit codes are encountered across different cluster schedulers when a job tries to use more memory than has been requested by the workflow configuration. We try to define sensible limits for common use cases of each workflow but we cannot get this right for every possible use case. In these cases, you will need to increase the memory limit applied to the process that encountered the problem. You can do this by using Nextflow process selectors to apply configuration to particular parts of our workflow.
For example, to increase the memory for a process called “countOwls”, create a file (for example, named `increase_memory.config`) with the following configuration, replacing `XX` with a number higher than the memory limit defined in the workflow’s configuration (i.e. its `nextflow.config` file):
```groovy
process {
    withName: countOwls {
        memory = "XX GB"
    }
}
```
When invoking Nextflow with `nextflow run`, reference this extra configuration with `-c increase_memory.config`.
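If you need to raise the memory of several processes, or want to generate the file from a script, the hypothetical helper below writes the same kind of `process { withName: ... }` block described above. It is a convenience sketch, not something shipped with our workflows; the process names and memory values you pass in are yours to choose.

```python
from pathlib import Path
from typing import Dict


def write_memory_overrides(overrides: Dict[str, str],
                           path: str = "increase_memory.config") -> str:
    """Write a Nextflow config setting `memory` for each named process.

    Hypothetical helper: `overrides` maps a process name to a memory
    value in Nextflow's string form, e.g. "16 GB". Returns the text written.
    """
    lines = ["process {"]
    for process_name, memory in overrides.items():
        # A quoted selector also works for names containing ':'
        lines.append(f"    withName: '{process_name}' {{")
        lines.append(f'        memory = "{memory}"')
        lines.append("    }")
    lines.append("}")
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text)
    return text
```

For instance, `write_memory_overrides({"countOwls": "16 GB"})` would write `increase_memory.config` to the current directory, where "16 GB" is only a placeholder value.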
If you are using the EPI2ME Desktop application, you can instead enter additional Nextflow configuration, such as the example above, in the “Extra configuration” section at the bottom of the launch window. The configuration you provide in that text box will be loaded into the workflow.
Introduced by the BSD operating system for its mail utility, the `sysexits` library attempted to standardise exit codes to indicate common problems when processing input arguments and files from users.
Usage of sysexits codes has not caught on very widely, but some well-used programs such as the AWS CLI do use them.
We aim to use some of the sysexits codes in Python scripts that we include with our workflows. Although we use Nextflow to catch these codes and generate clear error messages for users, they are presented here in case you see one!
Exit status | Name | Potential causes | Possible next steps |
---|---|---|---|
64 | Usage error | A command was used incorrectly | Check your workflow parameters to ensure they are set correctly. We try our best to make sure incorrect parameters are caught by the Nextflow workflow before the workflow is run. If your parameters look correct, this may be a bug inside the workflow; open an issue with us. |
65 | Data error | The input data was incorrect | Check your input files are in the correct format and not corrupted. If the error persists, open an issue with us. |
66 | No input | The input data is missing, not readable or empty | Check your files exist, can be read by your user and are not empty. |
68 | No host | The network host could not be found | If you are specifying a remote host (e.g. a kraken classification server), check the host is correct. |
70 | Software error | An unspecified internal software error | You have likely encountered an error state we hoped would not be possible. Open an issue with us. |
73 | Cannot create | An output file cannot be created | Check that you have permission to write to any defined output locations and that you have not run out of disk space. |
74 | I/O error | An error occurred while reading or writing a file | This could be temporary; try running the workflow again. Check you can access your inputs and defined output locations. If this persists, open an issue with us. |
75 | Temporary failure | A task failed for a reason that is probably temporary | Wait a little while and retry the workflow. If this persists, open an issue with us. |
78 | Config error | A configuration that was loaded is incorrect or missing | Check your workflow parameters to ensure they are set correctly. We try our best to make sure incorrect parameters are caught by the Nextflow workflow before the workflow is run. If your parameters look correct, this may be a bug inside the workflow; open an issue with us. |
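As an illustration of this convention, the sketch below shows how a Python script could report input problems with sysexits codes. It is a hypothetical example, not one of the scripts shipped with our workflows: the function names and the deliberately simplistic FASTQ check are assumptions. Python exposes the sysexits values as `os.EX_*` constants on Unix-like systems.

```python
import os
import sys
from pathlib import Path
from typing import List


def check_fastq(path: str) -> int:
    """Return a sysexits-style status for a loosely checked FASTQ file.

    Illustrative only: checks the file exists, is readable and non-empty,
    and starts with '@'. Compressed (.gz) files are not handled.
    """
    fastq = Path(path)
    if not fastq.is_file() or not os.access(fastq, os.R_OK):
        return os.EX_NOINPUT  # 66: missing or unreadable input
    with fastq.open("rb") as handle:
        first_byte = handle.read(1)
    if not first_byte:
        return os.EX_NOINPUT  # 66: empty input
    if first_byte != b"@":
        return os.EX_DATAERR  # 65: not in the expected format
    return os.EX_OK  # 0: nothing obviously wrong


def main(argv: List[str]) -> int:
    if len(argv) != 2:
        print(f"usage: {argv[0]} READS.fastq", file=sys.stderr)
        return os.EX_USAGE  # 64: script called incorrectly
    return check_fastq(argv[1])
```

In a real script, ending with `sys.exit(main(sys.argv))` would turn the returned value into the process exit status that Nextflow then sees.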
Hopefully this post has helped you to recognise some common exit codes and to tell whether a problem calls for a sanity check of your workflow parameters, a configuration change, or a bug report to us. If you have a GitHub account, you can open an issue on the relevant epi2me-labs workflow repository. Please use the “Bug report” issue type and answer all the questions to help us help you! If you’re using the EPI2ME Desktop application, you can follow the instructions in the “Report issue” option on the errored workflow’s “Overview” tab to generate a bug report and send it to us.
This post continues our weekly How To series. If you have any feedback on these posts or our workflows, please get in touch!