How to interpret exit codes

By Sam Nicholls
Published in How Tos
October 06, 2023

You have almost certainly encountered a problem with some software at some point in your day-to-day life. The error message may have given you a cryptic number or code to describe what went wrong. If documented and understood, such error codes can be used to diagnose and perhaps resolve a technical fault. For example, when your browser shows you a message explaining that a web page has not been found, it is reacting to a 404 code, which is defined by a well-established set of agreed-upon codes for web requests.

We try our best to make our workflows as robust as possible, but encountering software errors is an inevitability. Your cluster could lose connectivity with your job scheduler, you could run out of disk space, or the website we use to host our workflows could be unreachable at just the wrong moment. Some of the user reports we get are the result of problems that can be fixed by a user in the know without having to contact us.

The aim of this blog post is to introduce you to error codes that may arise while running our workflows on the command line, or through our EPI2ME Desktop application, and what you need to do to resolve them. Much of what is introduced here is equally applicable to other command line software too.

How do I know the exit code of a workflow?

One of the first things that we do when reading a user report about a problem with a workflow is look for the exit status. To the uninitiated, these simple numbers seem like an unhelpful piece of information, but once you have finished reading this blog post, you too will be able to cut through lines of red text and diagnose some problems immediately. Bioinformatics is generally not well known for its user experience (and our team’s core goal is to change this), so you would be forgiven if you have not looked too closely at an error produced by Nextflow before. Here is one now:

% nextflow run main.nf
N E X T F L O W ~ version 23.04.4
Launching `main.nf` [nauseous_ramanujan] DSL2 - revision: 734d48c47d
executor > local (3)
executor > local (3)
[b1/4f1c6b] process > countOwls (1) [100%] 1 of 1, failed: 1
ERROR ~ Error executing process > 'countOwls (2)'
Caused by:
Process `countOwls (2)` terminated with an error exit status (1)
Command executed:
#!/usr/bin/env python3
raise Exception("Not enough owls.")
Command exit status:
1
Command output:
(empty)
Command error:
Traceback (most recent call last):
File ".command.sh", line 2, in <module>
raise Exception("Not enough owls.")
Exception: Not enough owls.
Work dir:
/Users/Sam.Nicholls/scratch/nf-testing/exit_codes/work/79/f9172717a1de210d6e6e17a45a0481
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details

Here’s what we know by reading each of the sections of this human readable output:

  • Caused by: This section tells you which Nextflow process encountered the problem. Here the error was triggered by the “countOwls” process.
  • Command executed: This section will print out the relevant command that triggered the problem. Here it was an inline Python script.
  • Command exit status: The error code returned by the process, which was 1.
  • Command output: Any lines of text written to the terminal by the process. In our case there weren’t any.
  • Command error: Similarly, any lines of error text written to the terminal by the process will be printed here. For this example we can see a Python traceback.
  • Work dir: Where to go and look on the file system to inspect any of the inputs and outputs for this process.

This is not much of a whodunnit: our Python script has raised an Exception. We’re going to have to get some more owls! In the case of a real problem, the command executed, output and error sections may contain many more lines of information, which may or may not be related to the problem. This is the very reason why the command exit status can be so useful.
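To see where the exit status of 1 comes from, the failing script can be run outside Nextflow. The following is a minimal sketch rather than anything Nextflow does internally: it assumes only a standard Python 3 installation and uses the standard library's subprocess module to run the same one-line script and read back its exit status.

```python
import subprocess
import sys

# Run the same one-line script as the countOwls example in a child Python process.
result = subprocess.run(
    [sys.executable, "-c", 'raise Exception("Not enough owls.")'],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,
)

# An uncaught Python exception makes the interpreter exit with status 1.
exit_status = result.returncode
last_error_line = result.stderr.strip().splitlines()[-1]

print(f"Command exit status: {exit_status}")
print(f"Last line of command error: {last_error_line}")
```

The exit status and the final line of the traceback correspond to the “Command exit status” and “Command error” sections of the Nextflow report.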

If you’re using the EPI2ME Desktop application, this Nextflow output can be found on the “Logs” tab:

Figure 1 - An exemplar Nextflow error message inside the EPI2ME desktop application.

What error codes might I see while running a workflow?

Now that you know how to read Nextflow errors and locate the command exit status, we can cover what these mysterious numbers mean. The tables below enumerate common (and not so common) command exit status codes, why they might manifest and what you could try to do about it.

Common exit codes

Despite their ubiquity, exit codes mostly rely on historical conventions and few are reserved for specific meaning. The most common exit codes are:

| Exit status | Name | Potential causes | Possible next steps |
|---|---|---|---|
| 0 | OK (successful exit) | A successful process | Nextflow will error on a successful process if output files that should be present are not present. The process has failed to produce output that we always expect and our workflow has not accounted for a case where a file may optionally be generated. You should open an issue with us. |
| 1 | Failure | General catch-all for an application error | Open an issue with us. |
| 2 | Shell misuse | General catch-all for misuse of the shell commands | Open an issue with us. |
| 125 | Docker command error | The command to run a Docker container did not execute successfully. | Check the Docker engine is running and that you have permissions to start containers. See our install guide. |
| 127 | Command not found | The command line tool the workflow is trying to run cannot be found. | You may encounter this if you do not have Docker or Singularity installed, which is required to run the dependencies of our workflows. Check our install guide for how to install Docker. Otherwise, ensure that you have not provided a Nextflow configuration file that has changed the executor or container directives. You may have inadvertently caused the workflow to run outside of a container and the dependencies are not on your computer. |
| 255 | Out of range | The exit status was beyond the allowed range of 0-255 | Check the error message for actionable information about your compute environment and contact your system administrator. Otherwise, open an issue with us. |
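A few of these codes are easy to reproduce on your own machine. The sketch below assumes a Unix-like system with a POSIX `sh` on the PATH; `exit_status_of` and the deliberately nonexistent command name are hypothetical and used only for illustration.

```python
import subprocess


def exit_status_of(shell_command: str) -> int:
    """Run a command through `sh` and return its exit status."""
    completed = subprocess.run(
        ["sh", "-c", shell_command],
        capture_output=True,  # keep the demonstration quiet
    )
    return completed.returncode


ok = exit_status_of("true")        # a command that always succeeds
failure = exit_status_of("false")  # a command that always fails
# Hypothetical command name that should not exist on any system.
not_found = exit_status_of("owl_counter_that_does_not_exist")

print(f"true -> {ok}, false -> {failure}, missing command -> {not_found}")
```

The last case is the same situation a workflow hits when it runs outside its container and the tools it depends on are not installed.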

Exit codes from clusters and containers

Exit codes above 128 have a special meaning and indicate that a process was sent a “signal” by the operating system. Each common signal has a signal number, which can be determined by subtracting 128 from the exit code. These codes are often encountered when running workflows on compute clusters and in containers, and useful examples are included below.

| Exit status | Name | Potential causes | Possible next steps |
|---|---|---|---|
| 128 + 2 = 130 | SIGINT | Process was interrupted | The process was sent the interrupt signal. You may have used Ctrl+C to quit the Nextflow workflow and Nextflow forwarded the signal to your processes. Some clusters will use this signal to stop processes that have exceeded their time or memory limit. You should speak to your system administrator to diagnose the reason that the job was terminated. |
| 128 + 6 = 134 | SIGABRT | Process aborted | The process has aborted as it cannot continue. Some clusters will use this signal to stop processes that have exceeded their time or memory limit. You should speak to your system administrator to diagnose the reason that the job was terminated. If you are not using a cluster, open an issue with us. |
| 128 + 7 = 135 | SIGBUS | Invalid memory address | You should speak to your system administrator to check the installation of prerequisites like Docker. If you are not using a cluster, open an issue with us. |
| 128 + 9 = 137 | SIGKILL | Process was killed | The process was forcibly killed. This may be your operating system invoking the out-of-memory (OOM) killer to stop a process from freezing your computer. Some clusters will send a kill signal to a job for exceeding its memory or time limit. You should speak to your system administrator to diagnose the reason that the job was terminated. See additional guidance below this table. |
| 128 + 10 = 138 | SIGUSR1 | Process received user signal (1) | Typically observed when your cluster is signalling that the job should wrap up because it is about to exceed a time or memory limit. You should speak to your system administrator to help you adjust your job resource requests. See additional guidance below this table. |
| 128 + 11 = 139 | SIGSEGV | Invalid memory access | The process tried to access some memory it should not have. Open an issue with us. |
| 128 + 12 = 140 | SIGUSR2 | Process received user signal (2) | Typically observed when your cluster is signalling that the job should wrap up because it is about to exceed a time or memory limit. You should speak to your system administrator to help you adjust your job resource requests. See additional guidance below this table. |
| 128 + 15 = 143 | SIGTERM | Process was terminated | Some clusters will use this signal to stop processes that have exceeded their time or memory limit. You should speak to your system administrator to diagnose the reason that the job was terminated. |
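The subtraction rule can be sketched in a few lines of Python using the standard library's signal module. The helper name `describe_exit_status` is hypothetical. Note that signal numbers are defined by the operating system: SIGINT (2), SIGKILL (9) and SIGTERM (15) are the same on Linux and macOS, but others such as SIGBUS and SIGUSR1 differ between them, so run the lookup on the same kind of system that produced the exit code.

```python
import signal


def describe_exit_status(exit_status: int) -> str:
    """Translate a signal-style exit status (128 + N) into a signal name."""
    if 128 < exit_status < 255:
        signal_number = exit_status - 128
        try:
            # Look up the name this operating system gives to the number.
            name = signal.Signals(signal_number).name
        except ValueError:
            name = "unknown signal"
        return f"{exit_status} = 128 + {signal_number} ({name})"
    return f"{exit_status} (not a signal-style exit status)"


for status in (130, 137, 143, 1):
    print(describe_exit_status(status))
```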

Several of the above exit codes are encountered across different cluster schedulers when a job tries to use more memory than has been requested by the workflow configuration. We try to define sensible limits for common use cases of each workflow but we cannot get this right for every possible use case. In these cases, you will need to increase the memory limit applied to the process that encountered the problem. You can do this by using Nextflow process selectors to apply configuration to particular parts of our workflow.

For example, to increase the memory for a process called “countOwls”, create a file (for example, named increase_memory.config) with the following configuration (replacing XX with a number higher than the memory limit defined in the workflow’s configuration; i.e. the nextflow.config file):

process {
    withName:countOwls {
        memory = "XX GB"
    }
}

When invoking Nextflow with nextflow run, reference this extra configuration with -c increase_memory.config. If you are using the EPI2ME Desktop application, you can instead specify additional Nextflow config options such as the example provided here in the “Extra configuration” options section at the bottom of the launch window. The configuration you provide in that text box will be loaded into the workflow.

Sysexit codes

Introduced by the BSD operating system for its mail utility, the sysexits library attempted to standardise exit codes to indicate common problems when processing input arguments and files from users. Usage of sysexit codes does not appear to have caught on very widely but some well used programs such as the AWS CLI do use them.

We aim to use some of the sysexit codes in Python scripts that we include with our workflows. Although we use Nextflow to catch these codes and generate clear error messages to users, they are presented here in case you see one!

| Exit status | Name | Potential causes | Possible next steps |
|---|---|---|---|
| 64 | Usage error | A command was used incorrectly | Check your workflow parameters to ensure they are set correctly. We try our best to make sure incorrect parameters are caught by the Nextflow workflow before the workflow is run. This may be a bug inside the workflow, open an issue with us. |
| 65 | Data error | The input data was incorrect | Check your input files are in the correct format and not corrupted. If the error persists, open an issue with us. |
| 66 | No input | The input data is missing, not readable or empty | Check your files exist, can be read by your user and are not empty. |
| 68 | No host | The network host could not be found | If you are specifying a remote host (e.g. a kraken classification server), check the host is correct. |
| 70 | Software error | An unspecific software error | You have likely encountered an error state we hoped would not be possible. Open an issue with us. |
| 73 | Cannot create | An output file cannot be created | Check that you have permission to write to any defined output locations and that you have not run out of disk space. |
| 74 | I/O error | An error occurred while reading or writing a file | This could be temporary, try the workflow again. Check you can access your inputs and defined output locations. If this persists, open an issue with us. |
| 75 | Temporary failure | A task failed for a reason that is probably temporary | Wait a little while and retry the workflow. If this persists, open an issue with us. |
| 78 | Config error | A configuration loaded is incorrect or missing | Check your workflow parameters to ensure they are set correctly. We try our best to make sure incorrect parameters are caught by the Nextflow workflow before the workflow is run. This may be a bug inside the workflow, open an issue with us. |
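On Unix-like systems, Python exposes the sysexit codes as constants in the standard library's os module (for example `os.EX_NOINPUT` is 66). The sketch below shows how a script might use them. It is an illustration only and not taken from our workflow scripts; `check_input` and the file name are hypothetical.

```python
import os


def check_input(path: str) -> int:
    """Return a sysexit-style status describing whether an input file is usable."""
    # Missing or unreadable input maps to "No input" (66).
    if not os.path.isfile(path) or not os.access(path, os.R_OK):
        return os.EX_NOINPUT
    # An empty file is treated as "No input" as well.
    if os.path.getsize(path) == 0:
        return os.EX_NOINPUT
    return os.EX_OK


# Hypothetical file name, expected not to exist in the current directory.
status = check_input("owls_that_do_not_exist.txt")
print(f"Exit status for a missing input file: {status}")
```

A real script would pass the returned value to `sys.exit()` so that the calling process sees it as the command exit status.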

Conclusion

Hopefully this post has helped you to recognise some common exit codes and to distinguish problems that need a sanity check of your workflow parameters, a configuration change, or a bug report to us. If you have a GitHub account, you can open an issue on the relevant epi2me-labs workflow repository. Please use the “Bug report” issue type and provide answers to all the questions to help us help you! If you’re using the EPI2ME Desktop application, you can follow the instructions on the “Report issue” option on the errored workflow’s “Overview” tab to generate a bug report and send it to us.

This post continues our weekly How To series. If you have any feedback on these posts or our workflows, please get in touch!

