Conda is a great choice for installing scientific software, permitting users to manage multiple isolated and reproducible environments. It’s best known as a Python package manager, but really it’s a general-purpose system that is also highly portable.
To get up and running with Miniconda, the instructions from Bioconda are easy to follow:
    # E.g. for Linux
    curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    sh Miniconda3-latest-Linux-x86_64.sh

    # Installing packages is easy
    conda install python=3.8 jupyter -c conda-forge
Miniconda is great for general purpose use, e.g. in research, but when it comes to moving bioinformatics software into production, there are some extra considerations we have to make.
To test and distribute our workflows, we have CI (continuous integration) pipelines set up to automatically build Docker images each time a workflow is updated, which can be many times a day for the development version of a given project.
Hence, we need to think about the following:
This is where Mamba comes in: a fast drop-in replacement for conda that reimplements the slow bits in C++. Mamba is most akin to Miniconda, in that it comes with Python but doesn’t ship with a whole load of extra software.
Here’s how to get started with Mamba:
    # Again, assuming Linux
    # If you already have conda
    conda install mamba -n base -c conda-forge

    # Or if you don't have anything, use miniforge's mambaforge
    wget -O Mambaforge.sh "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
    bash Mambaforge.sh -b
    # With -b the installer uses its default prefix, ~/mambaforge
    source ~/mambaforge/bin/activate

    # Installing packages is similarly easy
    mamba install python=3.8 jupyter -c conda-forge
How much faster is Mamba? We took the environment dependencies of the Medaka project from Bioconda, unpinned their versions (to make it harder for the package manager to solve the problem) and timed how long Miniconda and Mamba each took to re-create the environment:
    m5.12xlarge (48 CPUs, ~200 GB RAM)
      mamba    1m9.490s
      conda    8m14.317s
As we can see, Mamba’s performance is significantly better when dealing with a complex environment like this. The results above aren’t close to being a formal benchmark, but are simply representative of the kinds of gains you can make with mamba out of the box.
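For anyone wanting to repeat a rough comparison like this, a small timing helper is enough. The following is only a sketch of one way to do it, not the exact commands used for the figures above; the `time_cmd` function, the `environment.yaml` file and the environment names are all hypothetical.

```shell
#!/usr/bin/env bash
# Run a command, discard its output, and report wall-clock time in whole seconds.
# time_cmd is a hypothetical helper written for this post.
time_cmd() {
    local label=$1
    shift
    local start end status
    start=$(date +%s)
    if "$@" >/dev/null 2>&1; then
        status=0
    else
        status=$?
    fi
    end=$(date +%s)
    echo "$label: $((end - start))s (exit $status)"
}

# Hypothetical usage; environment.yaml and the env names are placeholders:
# time_cmd mamba mamba env create -n bench-mamba -f environment.yaml
# time_cmd conda conda env create -n bench-conda -f environment.yaml
```

Reporting the exit status alongside the time makes it obvious when a run was killed rather than completed, which matters on small machines.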
However, our workflows usually specify a small to medium number of pinned dependencies. So, is there still a benefit? We re-created the environment specified by our wf-artic workflow on three sizes of fresh AWS EC2 instance using Mamba and Miniconda and measured the time taken.
    Instance                              mamba        conda
    t2.micro (1 CPU, 1 GB RAM)            2m32.842s    killed after 14 mins, didn't finish
    t2.large (1 CPU, 8 GB RAM)            2m4.234s     2m49.810s
    m5.12xlarge (48 CPUs, ~200 GB RAM)    1m47.622s    2m15.770s
In general, even for this relatively small environment, Mamba was approximately half a minute quicker to set everything up. This may seem like a significant step down from the massive gains seen previously, but multiply it over a year’s worth of CI runs and one starts to see how this could be beneficial, especially if you’re paying for CI time.
In addition, on the smallest instance Miniconda failed to complete its task because it ran out of memory and the process was unceremoniously terminated. This may be important if the computers where your CI jobs are running are particularly weak.
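To put the half-minute saving in perspective, here is a back-of-the-envelope calculation. The 30 seconds per build comes from the timings above; the figure of 10 builds per day is purely an assumption for illustration, so substitute your own.

```shell
# Rough yearly saving from a faster environment solve.
saved_per_build_s=30   # approximate saving per build, from the timings above
builds_per_day=10      # assumption for illustration only
days=365

total_s=$((saved_per_build_s * builds_per_day * days))
echo "Seconds saved per year: $total_s"
echo "Whole hours saved per year: $((total_s / 3600))"
```

With these numbers that is 109500 seconds, or a little over 30 hours of CI time per year for a single project.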
Mamba is definitely faster than Miniconda, but unfortunately it is still quite a wedge to download. Because it depends on Conda, either way of installing it means you end up with both package managers on your system. The download size for the mambaforge package is ~100 MiB, which is a fair bit larger than Miniconda is.
For regular use, this isn’t a big deal, but when you’re trying to create lean Docker images in a production environment without requiring lots of space-saving manual intervention, it helps to have as small a download as possible.
This is where Micromamba comes into play!
Micromamba is a standalone binary version of Mamba, i.e. it has no dependency on Conda and doesn’t include a default Python version, making it perfect for setting up fresh environments with as small a footprint as possible. Best of all? It’s only ~13 MiB to download.
This means you can save time and effort in building your images.
Here’s how to install Micromamba:
    # Assuming Linux
    wget -qO- https://micromamba.snakepit.net/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
    ./bin/micromamba shell init -s bash -p ~/micromamba
    source ~/.bashrc

    # Installing packages is mostly similar
    micromamba activate
    micromamba install python=3.6 jupyter -c conda-forge
Here’s the catch: Micromamba is still experimental, and lacks some features from Conda. In general, use Mamba for day-to-day work, and Micromamba in contexts just like the one we’ve been discussing, i.e. building images in CI.
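If the same setup script has to run both on a laptop and in CI, one option is to pick whichever tool happens to be on the `PATH`. This is just a sketch under that assumption; `pick_installer` is a hypothetical helper, and the preference order simply reflects the sizes and speeds discussed above.

```shell
# Print the first available package manager, preferring the smallest and fastest.
# pick_installer is a hypothetical helper, not part of conda, mamba or micromamba.
pick_installer() {
    local tool
    for tool in micromamba mamba conda; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    echo "no conda-compatible package manager found" >&2
    return 1
}
```

A script could then call `installer=$(pick_installer)` and run `"$installer" install python=3.8 jupyter -c conda-forge`, bearing in mind that Micromamba may not accept every option that Conda does.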
Mamba is a great drop-in replacement as your daily-driver scientific package manager. In some cases it will significantly speed up your workflow over using Miniconda. However, consider using Micromamba if space or minimalism matters.