You have almost certainly encountered a problem with some software in your day-to-day life. The error message may have given you a cryptic number or code to describe what went wrong. If documented and understood, such error codes can be used to diagnose and perhaps resolve a technical fault. For example, when your browser shows you a message explaining that a web page has not been found, it is reacting to a 404 code, which is defined by a well-established set of agreed-upon status codes for web requests.

We try our best to make our workflows as robust as possible, but encountering software errors is an inevitability. Your cluster could lose connectivity with your job scheduler, you could run out of disk space, or the website we use to host our workflows could be unreachable at just the wrong moment. Some of the user reports we get are the result of problems that a user in the know can fix without having to contact us.

The aim of this blog post is to introduce you to error codes that may arise while running our workflows on the command line, or through our EPI2ME Desktop application, and what you need to do to resolve them. Much of what is introduced here is equally applicable to other command line software too.

How do I know the exit code of a workflow?

One of the first things that we do when reading a user report for a problem with a workflow is look for the exit status. To the uninitiated, these simple numbers seem like an unhelpful piece of information, but once you have finished reading this blog post, you too will be able to cut through lines of red text and diagnose some problems immediately. Bioinformatics is generally not well known for its user experience (and our team’s core goal is to change this), so you would be forgiven if you have not looked too closely at an error produced by Nextflow before. Here is one now:

% nextflow run main.nf
N E X T F L O W  ~  version 23.04.4
Launching `main.nf` [nauseous_ramanujan] DSL2 - revision: 734d48c47d
executor >  local (3)
executor >  local (3)
[b1/4f1c6b] process > countOwls (1) [100%] 1 of 1, failed: 1
ERROR ~ Error executing process > 'countOwls (2)'

Caused by:
  Process `countOwls (2)` terminated with an error exit status (1)

Command executed:

  #!/usr/bin/env python3
  raise Exception("Not enough owls.")

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File ".command.sh", line 2, in <module>
      raise Exception("Not enough owls.")
  Exception: Not enough owls.

Work dir:
  /Users/Sam.Nicholls/scratch/nf-testing/exit_codes/work/79/f9172717a1de210d6e6e17a45a0481

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

Here’s what we know by reading each of the sections of this human-readable output:

Caused by: This section tells you which Nextflow process encountered the problem. Here the error was triggered by the “countOwls” process.
Command executed: This section will print out the relevant command that triggered the problem. Here it was an inline Python script.
Command exit status: The error code returned by the process, which was 1.
Command output: Any lines of text written to the terminal by the process. In our case there weren’t any.
Command error: Similarly, any lines of error text written to the terminal by the process will be printed here. For this example we can see a Python traceback.
Work dir: Where to go and look on the file system to inspect any of the inputs and outputs for this process.

This is not much of a whodunnit: our Python script has raised an Exception. We’re going to have to get some more owls! In the case of a real problem, the command executed, output and error sections may have many more lines of information, which may or may not be related to the problem. This is the very reason why the command exit status can be so useful.
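The exit status is not specific to Nextflow; any program that launches another can read it. As a minimal sketch (not taken from our workflows), the Python snippet below runs the same owl-counting script as a child process and collects its exit status and error text. The helper name run_and_report is purely illustrative.

```python
import subprocess
import sys


def run_and_report(args):
    """Run a command and return (exit_status, stderr_text)."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stderr


# Re-create the failing process from the example above as a child process.
status, stderr = run_and_report(
    [sys.executable, "-c", 'raise Exception("Not enough owls.")']
)

print("Command exit status:", status)
# The last line of the traceback names the exception that was raised.
print("Last error line:", stderr.strip().splitlines()[-1])
```

An unhandled Python exception ends the interpreter with exit status 1, which is the same status Nextflow reported for the countOwls process.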

If you’re using the EPI2ME Desktop application, this Nextflow output can be found on the “Logs” tab:

Figure 1 - An exemplar Nextflow error message inside the EPI2ME Desktop application.

What error codes might I see while running a workflow?

Now that you know how to read Nextflow errors and locate the command exit status, we can cover what these mysterious numbers mean. The tables below enumerate common (and not so common) command exit status codes, why they might manifest and what you could try to do about them.

Common exit codes

Despite their ubiquity, exit codes mostly rely on historical conventions and few are reserved for a specific meaning. The most common exit codes are:

Exit status	Name	Potential causes	Possible next steps
0	OK (successful exit)	A successful process	Nextflow will error on a successful process if output files that should be present are not present. The process has failed to produce output that we always expect and our workflow has not accounted for a case where a file may optionally be generated. You should open an issue with us.
1	Failure	General catch-all for an application error	Open an issue with us.
2	Shell misuse	General catch-all for misuse of the shell commands	Open an issue with us.
125	Docker command error	The command to run a Docker container did not execute successfully.	Check the Docker engine is running and that you have permissions to start containers. See our install guide.
127	Command not found	The command line tool the workflow is trying to run cannot be found.	You may encounter this if you have neither Docker nor Singularity installed; one of them is required to run the dependencies of our workflows. Check our install guide for how to install Docker. Otherwise, ensure that you have not provided a Nextflow configuration file that has changed the executor or container directives. You may have inadvertently caused the workflow to run outside of a container, where the dependencies are not available on your computer.
255	Out of range	The exit status was beyond the allowed range of 0-255	Check the error message for actionable information about your compute environment and contact your system administrator. Otherwise, open an issue with us.
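To see one of these conventions in action, the sketch below asks a shell to run a command that does not exist and reads back the exit status. It assumes a POSIX `sh` is available on your system; the helper exit_status_of and the command name no_such_owl_tool are made up for illustration.

```python
import subprocess


def exit_status_of(shell_command):
    """Run a command line through `sh` and return its exit status."""
    completed = subprocess.run(["sh", "-c", shell_command], capture_output=True)
    return completed.returncode


# `true` always succeeds; the second command is deliberately not installed.
ok_status = exit_status_of("true")
missing_status = exit_status_of("no_such_owl_tool --count")

print("successful command:", ok_status)
print("missing command:", missing_status)
```

On a typical Linux or macOS system the missing command should report 127, matching the “Command not found” row in the table.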

Exit codes from clusters and containers

Exit codes above 128 conventionally indicate that a process was terminated by a “signal” sent by the operating system. Many common signals are associated with a signal number, and the signal can be determined by subtracting 128 from the exit code; for example, 137 - 128 = 9, which is SIGKILL. These codes are often encountered when running workflows on compute clusters and in containers, and useful examples are included below.

Exit status	Name	Potential causes	Possible next steps
128 + 2 = 130	SIGINT	Process was interrupted	The process was sent the interrupt signal. You may have used Ctrl+C to quit the Nextflow workflow, and Nextflow forwarded the signal to your processes. Some clusters will use this signal to stop processes that have exceeded their time or memory limit. You should speak to your system administrator to diagnose the reason that the job was terminated.
128 + 6 = 134	SIGABRT	Process aborted	The process has aborted as it cannot continue. Some clusters will use this signal to stop processes that have exceeded their time or memory limit. You should speak to your system administrator to diagnose the reason that the job was terminated. If you are not using a cluster, open an issue with us.
128 + 7 = 135	SIGBUS	Invalid memory address	You should speak to your system administrator to check the installation of prerequisites like Docker. If you are not using a cluster, open an issue with us.
128 + 9 = 137	SIGKILL	Process was killed	The process was forcibly killed. This may be your operating system invoking the out-of-memory (OOM) killer to stop a process from freezing your computer. Some clusters will send a kill signal to a job for exceeding its memory or time limit. You should speak to your system administrator to diagnose the reason that the job was terminated. See additional guidance below this table.
128 + 10 = 138	SIGUSR1	Process received user signal (1)	Typically observed when your cluster is signalling that the job should wrap up because it is about to exceed a time or memory limit. You should speak to your system administrator to help you adjust your job resource requests. See additional guidance below this table.
128 + 11 = 139	SIGSEGV	Invalid memory access	The process tried to access some memory it should not have. Open an issue with us.
128 + 12 = 140	SIGUSR2	Process received user signal (2)	Typically observed when your cluster is signalling that the job should wrap up because it is about to exceed a time or memory limit. You should speak to your system administrator to help you adjust your job resource requests. See additional guidance below this table.
128 + 15 = 143	SIGTERM	Process was terminated	Some clusters will use this signal to stop processes that have exceeded their time or memory limit. You should speak to your system administrator to diagnose the reason that the job was terminated.
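The subtraction used in the table above can be sketched in a few lines of Python using the standard signal module. The function name signal_from_exit_status is hypothetical. Note that some signal numbers differ between operating systems (for example, SIGUSR1 and SIGBUS are numbered differently on Linux and macOS), so the name you get back depends on where you run the code.

```python
import signal


def signal_from_exit_status(status):
    """Return the signal name implied by an exit status above 128, else None."""
    if not 128 < status < 256:
        return None
    try:
        # By convention the exit status is 128 plus the signal number.
        return signal.Signals(status - 128).name
    except ValueError:
        # No signal with this number exists on the current platform.
        return None


for status in (130, 137, 143, 1):
    print(status, signal_from_exit_status(status))
```

A status such as 1, which is not above 128, is not treated as a signal and returns None.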

Several of the above exit codes are encountered across different cluster schedulers when a job tries to use more memory than has been requested by the workflow configuration. We try to define sensible limits for common use cases of each workflow but we cannot get this right for every possible use case. In these cases, you will need to increase the memory limit applied to the process that encountered the problem. You can do this by using Nextflow process selectors to apply configuration to particular parts of our workflow.

For example, to increase the memory for a process called “countOwls”, create a file (for example, named increase_memory.config) with the following configuration (replacing XX with a number higher than the memory limit defined in the workflow’s configuration; i.e. the nextflow.config file):

process {
    withName:countOwls {
        memory = "XX GB"
    }
}

When invoking Nextflow with nextflow run, reference this extra configuration with -c increase_memory.config. If you are using the EPI2ME Desktop application, you can instead specify additional Nextflow config options such as the example provided here in the “Extra configuration” options section at the bottom of the launch window. The configuration you provide in that text box will be loaded into the workflow.

Sysexit codes

Introduced by the BSD operating system for its mail utility, the sysexits header (sysexits.h) attempted to standardise exit codes to indicate common problems when processing input arguments and files from users. Usage of sysexit codes does not appear to have caught on very widely, but some well-used programs such as the AWS CLI do use them.

We aim to use some of the sysexit codes in Python scripts that we include with our workflows. Although we use Nextflow to catch these codes and generate clear error messages to users, they are presented here in case you see one!

Exit status	Name	Potential causes	Possible next steps
64	Usage error	A command was used incorrectly	Check your workflow parameters to ensure they are set correctly. We try our best to make sure incorrect parameters are caught by the Nextflow workflow before the workflow is run. If your parameters look correct, this may be a bug inside the workflow; open an issue with us.
65	Data error	The input data was incorrect	Check your input files are in the correct format and not corrupted. If the error persists, open an issue with us.
66	No input	The input data is missing, not readable or empty	Check your files exist, can be read by your user and are not empty.
68	No host	The network host could not be found	If you are specifying a remote host (e.g. a kraken classification server), check the host is correct.
70	Software error	An unspecific software error	You have likely encountered an error state we hoped would not be possible. Open an issue with us.
73	Cannot create	An output file cannot be created	Check that you have permission to write to any defined output locations and that you have not run out of disk space.
74	I/O error	An error occurred while reading or writing a file	This could be temporary, try the workflow again. Check you can access your inputs and defined output locations. If this persists, open an issue with us.
75	Temporary failure	A task failed for a reason that is probably temporary	Wait a little while and retry the workflow. If this persists, open an issue with us.
78	Config error	A configuration loaded is incorrect or missing	Check your workflow parameters to ensure they are set correctly. We try our best to make sure incorrect parameters are caught by the Nextflow workflow before the workflow is run. If your parameters look correct, this may be a bug inside the workflow; open an issue with us.
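For illustration, this is roughly how a script might choose one of the codes in the table above. On Unix-like systems, Python exposes the sysexit values as constants in the os module (for example os.EX_NOINPUT). The function check_input and the file path are hypothetical and are not taken from our workflows.

```python
import os


def check_input(path):
    """Return a sysexits-style status describing whether an input file is usable."""
    if not os.path.exists(path):
        return os.EX_NOINPUT  # 66: the input is missing
    if not os.access(path, os.R_OK):
        return os.EX_NOINPUT  # 66: the input is not readable
    if os.path.getsize(path) == 0:
        return os.EX_NOINPUT  # 66: the input is empty
    return os.EX_OK  # 0: nothing wrong with the input


# A hypothetical input path that does not exist.
status = check_input("/path/that/does/not/exist/owls.fastq")
print("exit status:", status)
```

A real script would pass the returned value to sys.exit so that the calling workflow can see it as the command exit status.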

Conclusion

Hopefully this post has helped you to recognise some common exit codes and to work out which problems call for a sanity check of your workflow parameters, a configuration change, or a bug report to us. If you have a GitHub account, you can open an issue on the relevant epi2me-labs workflow repository. Please use the “Bug report” issue type and provide answers to all the questions to help us help you! If you’re using the EPI2ME Desktop application, you can follow the instructions on the “Report issue” option on the errored workflow’s “Overview” tab to generate a bug report and send it to us.

This post continues our weekly How To series. If you have any feedback on these posts or our workflows, please get in touch!