Conda or Mamba for production?

By Tom Rich
Published in Articles
June 18, 2021

Conda is a great choice for installing scientific software, permitting users to manage multiple isolated and reproducible environments. It’s known as a Python package manager, but really it’s a general purpose system that is also highly portable.

  • Conda is the package manager.
  • Anaconda includes Conda and is the scientific distribution that comes with many packages pre-installed alongside Python.
  • Miniconda installs Conda and Python, but it doesn’t include all the extra scientific packages. This makes it ideal for quickly getting a new environment up and running.

To get up and running with Miniconda, the instructions from Bioconda are easy to follow:

# E.g. for linux
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh
# Installing packages is easy
conda install python=3.8 jupyter -c conda-forge

Workflows in production

Miniconda is great for general purpose use, e.g. in research, but when it comes to moving bioinformatics software into production, there are some extra considerations we have to make.

To test and distribute our workflows, we have CI (continuous integration) pipelines set up to automatically build Docker images each time a workflow is updated, which can be many times a day for the development version of a given project.

Hence, we need to think about the following:

  • How long does a CI job take? Conda has a reputation for taking its time when dealing with complex sets of dependencies and we owe it to ourselves to make sure that CI jobs don’t take longer than they need to.
  • How big will the Docker image be? We want our images to be as small as possible so that they are quicker for our users to download. Miniconda is small compared with full-fat Anaconda, but the latest Miniconda3 Linux 64-bit Python 3.9 download is still 58.6 MiB. Could this be better?
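
One way to keep an eye on the second point is to print the size of each image at the end of a CI job. The following is a minimal sketch rather than our actual pipeline: it assumes Docker is available on the runner, and the function names and the image tag in the usage comment are made up for illustration.

```shell
# Convert a size in bytes to whole MiB (rounded down).
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Print the size of a local Docker image in whole MiB.
# `docker image inspect --format '{{.Size}}'` reports the size in bytes.
image_size_mib() {
    bytes_to_mib "$(docker image inspect --format '{{.Size}}' "$1")"
}

# Hypothetical usage in a CI step (the image tag is a placeholder):
#   echo "Image size: $(image_size_mib my-workflow:dev) MiB"
```

Logging this number on every build makes it easy to spot when a change to the environment quietly adds weight to the image.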

This is where Mamba comes in: the fast drop-in replacement for Conda, which reimplements the slow bits in C++. Mamba is most akin to Miniconda, in that it comes with Python but doesn’t ship with a whole load of extra software.

Here’s how to get started with Mamba:

# Again, assuming linux
# If you already have conda
conda install mamba -n base -c conda-forge
# Or if you don't have anything, use miniforge's mambaforge
wget -O Mambaforge.sh "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge.sh -b
source ~/mambaforge/bin/activate
# Installing packages is similarly easy
mamba install python=3.8 jupyter -c conda-forge

Speed

How much faster is Mamba? We took the environment dependencies of the Medaka project from Bioconda, unpinned their versions (to make the problem harder for the package manager to solve) and timed how long Miniconda and Mamba each took to re-create the environment:

m5.12xLarge (48cpus, ~200gigs RAM)
mamba 1m9.490s
conda 8m14.317s

As we can see, Mamba’s performance is significantly better when dealing with a complex environment like this. The results above aren’t close to being a formal benchmark, but are simply representative of the kinds of gains you can make with mamba out of the box.
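
For anyone wanting to try a similar comparison, here is a rough sketch of the two steps: stripping version pins from an environment file, and timing a command. It is not the script we used for the numbers above. The helper names, the file names and the environment names in the usage comments are all hypothetical, and the sed expression only handles simple `- package=version` style lines.

```shell
# Strip version constraints from the dependency lines of an environment file.
# Reads the file on stdin and writes the unpinned version to stdout.
unpin_versions() {
    sed -E 's/^([[:space:]]*-[[:space:]]*[A-Za-z0-9_.:-]+)[[:space:]]*[=<>!~].*$/\1/'
}

# Run a command, discard its output, and print the elapsed wall-clock seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# Hypothetical usage, assuming an environment.yml in the current directory:
#   unpin_versions < environment.yml > unpinned.yml
#   elapsed_seconds conda env create -n bench-conda -f unpinned.yml
#   elapsed_seconds mamba env create -n bench-mamba -f unpinned.yml
```

Whole-second resolution is plenty here, given that the differences of interest are measured in minutes.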

However, our workflows usually specify a small to medium number of pinned dependencies. So, is there still a benefit? We re-created the environment specified by our wf-artic workflow on three sizes of fresh AWS EC2 instance using Mamba and Miniconda, and measured the time taken.

t2.micro (1cpu, 1gig RAM)
mamba 2m32.842s
conda killed after 14 mins, didn't finish
t2.large (2cpus, 8gigs RAM)
mamba 2m4.234s
conda 2m49.810s
m5.12xLarge (48cpus, ~200gigs RAM)
mamba 1m47.622s
conda 2m15.770s

In general, even for this relatively small environment, Mamba was roughly 30 to 45 seconds quicker to set everything up on the instances where both tools finished. This may seem like a significant step down from the massive gains seen previously, but multiply it over a year’s worth of CI runs and one starts to see how this could be beneficial, especially if you’re paying for CI time.
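
To put a number on that, here is a back-of-the-envelope calculation. The 28 seconds is the rounded difference between the two m5.12xLarge timings above (2m15.770s for conda against 1m47.622s for mamba). The build frequency and the number of working days are purely illustrative assumptions, not figures from our own CI.

```shell
# Seconds saved per year = seconds saved per build * builds per day * days per year.
ci_seconds_saved() {
    echo $(( $1 * $2 * $3 ))
}

saved_per_build=28   # seconds, from the m5.12xLarge timings (rounded)
builds_per_day=10    # assumption for illustration
days_per_year=250    # assumption for illustration

total=$(ci_seconds_saved "$saved_per_build" "$builds_per_day" "$days_per_year")
echo "Roughly $(( total / 3600 )) hours of CI time saved per year"
# prints: Roughly 19 hours of CI time saved per year
```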

In addition, on the smallest instance Miniconda failed to complete its task because it ran out of memory and the process was unceremoniously terminated. This may be important if the computers where your CI jobs are running are particularly weak.

Size

Mamba is definitely faster than Miniconda, but unfortunately it is still quite a wedge to download. Because it depends on Conda, either way of installing it means you end up with both package managers on your system. The download size for the mambaforge package is ~100 MiB, which is a fair bit larger than Miniconda is.

For regular use, this isn’t a big deal, but when you’re trying to create lean docker images in a production environment without requiring lots of space-saving manual intervention, it helps to have as small a download as possible.

This is where Micromamba comes into play!

Micromamba is a standalone binary version of Mamba, i.e. it has no dependency on Conda. It also doesn’t include a default Python version, which makes it perfect for setting up fresh environments with as small a footprint as possible. Best of all? It’s only ~13 MiB to download.

This means you can save time and effort in building your images.

Here’s how to install Micromamba:

# Assuming linux
wget -qO- https://micromamba.snakepit.net/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
./bin/micromamba shell init -s bash -p ~/micromamba
source ~/.bashrc
# Installing packages is mostly similar
micromamba activate
micromamba install python=3.6 jupyter -c conda-forge

Here’s the catch: Micromamba is still experimental, and lacks some features from Conda. In general, use Mamba for day-to-day work, and Micromamba in contexts just like the one we’ve been discussing, i.e. building images in CI.
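
In an image build, that might look something like the sketch below: create the environment from a spec file, then clear the package caches in the same step so they don’t end up in the image. This is an illustration under assumptions rather than our actual build script. The function name, environment name and file name are placeholders, and since Micromamba is still changing it is worth checking the flags against the version you have installed.

```shell
# Create a named environment from a spec file, then remove cached packages
# and index files so they don't inflate the image.
create_lean_env() {
    env_name=$1
    spec_file=$2
    micromamba create --yes --name "$env_name" --file "$spec_file" &&
        micromamba clean --all --yes
}

# Hypothetical usage inside an image build step:
#   create_lean_env my-workflow environment.yml
```

Running both commands in a single build step matters for Docker in particular, because files deleted in a later layer still take up space in the earlier one.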

Conclusions

Mamba is a great drop-in replacement as your daily-driver scientific package manager. In some cases it will significantly speed up your workflow over using Miniconda. However, consider using Micromamba if space or minimalism matters.

