Running MitoPilot on your own HPC cluster
MitoPilot ships built-in executor templates for local,
awsbatch, the Smithsonian Hydra cluster
(NMNH_Hydra), and the NOAA SEDNA cluster
(NOAA_SEDNA). For any other cluster, MitoPilot includes
generic templates for the four most common schedulers,
plus a helper function to build a Nextflow config for your cluster once
and reuse it for every project.
| Scheduler | scheduler = | Nextflow executor |
|---|---|---|
| SLURM | "slurm" | slurm |
| SGE / UGE | "sge" | sge |
| PBS Pro / OpenPBS / Torque | "pbs" | pbspro (edit to pbs for Torque/OpenPBS) |
| IBM Spectrum LSF | "lsf" | lsf |
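If you are unsure which scheduler your cluster runs, one heuristic is to check which scheduler tools are on your PATH on a login node. The sketch below is not part of MitoPilot: detect_scheduler is a hypothetical helper name, and it assumes that sbatch, bsub, qconf, and pbsnodes identify SLURM, LSF, SGE/UGE, and PBS respectively. Treat the result as a hint and confirm it against your cluster's documentation.

```shell
#!/usr/bin/env bash
# Heuristic only: guess the MitoPilot `scheduler =` value by looking for each
# scheduler's command-line tools on PATH. Not part of MitoPilot.
detect_scheduler() {
  if command -v sbatch >/dev/null 2>&1; then
    echo "slurm"      # SLURM ships sbatch
  elif command -v bsub >/dev/null 2>&1; then
    echo "lsf"        # IBM Spectrum LSF ships bsub
  elif command -v qconf >/dev/null 2>&1; then
    echo "sge"        # SGE / UGE ship qconf
  elif command -v pbsnodes >/dev/null 2>&1; then
    echo "pbs"        # PBS Pro / OpenPBS / Torque ship pbsnodes
  else
    echo "unknown"    # no known scheduler tools found
  fi
}

detect_scheduler
```

SLURM is checked first because some SLURM installations also provide qsub-style compatibility wrappers. If the helper prints "pbs", remember that the generic template uses the pbspro executor and needs a manual edit for Torque/OpenPBS.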
Configure MitoPilot for your cluster
generate_config() builds a Nextflow config from a
generic template, fills in your cluster-specific settings, and saves it
as a named profile in your persistent MitoPilot config
directory (see mitopilot_config_dir()).
library(MitoPilot)
generate_config(
name = "my_cluster", # what you'll pass to new_project(executor = ...)
scheduler = "slurm",
queue = "general", # partition / queue (omit to use the cluster default)
account = "my_allocation", # folded into clusterOptions (SLURM --account, PBS -A, LSF -P, SGE -P)
cluster_options = "--qos=long", # any extra raw scheduler directives (optional)
container_engine = "apptainer", # "singularity", "apptainer", or "docker"
container_cache = "/scratch/$USER/sif" # optional cacheDir for singularity/apptainer
)
This writes config.my_cluster to your config directory.
Prefer to be prompted interactively? Pass
interactive = TRUE and leave the cluster values unset.
Reuse your cluster config for every project
Once saved, the profile name behaves like any built-in executor:
new_project(
path = "path/to/project",
mapping_fn = "path/to/mapping.csv",
data_path = "path/to/raw_data",
executor = "my_cluster"
)
new_project() resolves the saved profile and fills in
the per-project values (raw data directory, container image, genetic
code, etc.), leaving a ready-to-run .config in the project
directory.
List everything available (built-in templates plus your saved profiles):
Per-process resources
The generated config sets defaults for CPUs and memory that work for
most datasets. Per-step resource requests (coverage,
blast_ref_align, blast_gb) live in the
params { } block of the .config file and can
be edited by hand. Memory and CPU for all other steps in the workflow
are pulled dynamically from the project database and can be set directly
in the MitoPilot app.
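To find the entries to edit by hand, it can help to list every line of the project's .config that mentions one of the three steps. This is a minimal sketch: show_step_resources is a hypothetical helper, and it assumes the step names (coverage, blast_ref_align, blast_gb) appear literally in the parameter names inside the params { } block.

```shell
#!/usr/bin/env bash
# Print, with line numbers, every line of a MitoPilot project .config that
# mentions one of the hand-editable steps. Hypothetical helper for this sketch;
# the exact parameter names in your .config may differ.
show_step_resources() {
  local config_file="${1:-.config}"   # defaults to .config in the current directory
  grep -nE 'coverage|blast_ref_align|blast_gb' "$config_file"
}
```

Call it as show_step_resources path/to/project/.config, then open the file at the reported line numbers to adjust the values.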
Validate before running
If nextflow is on your PATH, you can
sanity-check the generated config:
cd path/to/project
cp .config nextflow.config
nextflow config . # prints the resolved config, or a parse error
rm nextflow.config
After reviewing the .config file, launch the pipeline
from the MitoPilot app or submit it as a batch job using your cluster’s
normal submission tools (e.g. sbatch, qsub,
bsub) wrapping the nextflow run command. See
the NMNH Hydra and NOAA SEDNA vignettes for batch submission
examples you can adapt.
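As a rough illustration of the batch-job route on SLURM, the sketch below prints a submission script that changes into the project directory and runs the pipeline. Several pieces are assumptions rather than MitoPilot defaults: write_submit_script is a hypothetical helper, the job name, CPU, memory, and time values are placeholders for the Nextflow head job, and NEXTFLOW_RUN_CMD is a placeholder variable that you must set to the full nextflow run command for your project (see the Hydra and SEDNA vignettes for real examples).

```shell
#!/usr/bin/env bash
# Print a SLURM submission script for a MitoPilot project.
# Arguments: project directory, partition, account.
# All #SBATCH resource values below are illustrative placeholders.
write_submit_script() {
  local project_dir="$1" partition="$2" account="$3"
  cat <<EOF
#!/usr/bin/env bash
#SBATCH --job-name=mitopilot
#SBATCH --partition=${partition}
#SBATCH --account=${account}
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G
#SBATCH --time=48:00:00

cd "${project_dir}" || exit 1

# NEXTFLOW_RUN_CMD is a placeholder for this sketch: export it with the full
# "nextflow run ..." command for your project before calling sbatch.
: "\${NEXTFLOW_RUN_CMD:?set NEXTFLOW_RUN_CMD to the nextflow run command}"
eval "\${NEXTFLOW_RUN_CMD}"
EOF
}

write_submit_script "path/to/project" "general" "my_allocation"
```

Redirect the output to a file (for example submit.sh) and submit it with sbatch submit.sh. The same pattern carries over to qsub or bsub by swapping the #SBATCH lines for the equivalent directives of your scheduler.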
