Initialize a new project database
Usage
new_db(
  db_path = "./.sqlite",
  mapping_fn = NULL,
  mapping_id = "ID",
  mapping_taxon = "Taxon",
  genetic_code = 2,
  assemble_cpus = 6,
  assemble_memory = 24,
  assembler = "GetOrganelle",
  seeds_db = "https://raw.githubusercontent.com/smithsonian/MitoPilot/main/ref_dbs/getOrganelle/seeds/fish_mito_seeds.fasta",
  labels_db = "https://raw.githubusercontent.com/smithsonian/MitoPilot/main/ref_dbs/getOrganelle/labels/fish_mito_labels.fasta",
  getOrganelle = paste("-F 'anonym'", "-R 10 -k '21,45,65,85,105,115'",
    "--larger-auto-ws", "--expected-max-size 20000", "--target-genome-size 16500"),
  mitofinder_db = "https://raw.githubusercontent.com/Smithsonian/MitoPilot/refs/heads/main/ref_dbs/MitoFinder/NC_002333_Danio_rerio.gb",
  mitofinder = paste("--megahit"),
  annotate_cpus = 6,
  annotate_memory = 36,
  annotate_ref_db = "Chordata",
  annotate_ref_dir = "/ref_dbs/Mitos2",
  mitos_opts = "--intron 0 --oril 0",
  trnaScan_opts = "-M vert",
  curate_cpus = 4,
  curate_memory = 8,
  curate_target = "fish_mito",
  max_blast_hits = 100,
  curate_params = NULL
)
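The getOrganelle default in the signature is assembled with paste(), which joins its arguments with single spaces into one option string. The sketch below, using base R only, shows what that default evaluates to and checks that the default seeds_db and labels_db URLs end in different file names. The variable name getorganelle_opts is introduced here for illustration and is not part of the package.

```r
# Rebuild the default option string exactly as written in the signature.
# paste() uses sep = " " by default, so the pieces become one string.
getorganelle_opts <- paste(
  "-F 'anonym'",
  "-R 10 -k '21,45,65,85,105,115'",
  "--larger-auto-ws",
  "--expected-max-size 20000",
  "--target-genome-size 16500"
)
print(getorganelle_opts)

# Default database URLs copied from the signature.
seeds_db <- "https://raw.githubusercontent.com/smithsonian/MitoPilot/main/ref_dbs/getOrganelle/seeds/fish_mito_seeds.fasta"
labels_db <- "https://raw.githubusercontent.com/smithsonian/MitoPilot/main/ref_dbs/getOrganelle/labels/fish_mito_labels.fasta"

# The two databases must not share a file name; basename() also works on URLs.
stopifnot(basename(seeds_db) != basename(labels_db))
```

To change a single option, such as the target genome size, it is probably simplest to pass a complete replacement string, since the argument is stored as one string rather than as separate flags.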
Arguments
- db_path
Path to the new database file
- mapping_fn
Path to the mapping CSV file. Must contain the columns "ID", "Taxon", "R1", and "R2".
- mapping_id
Column name of the mapping file to use as the primary key
- mapping_taxon
Column name of the mapping file containing a taxonomic identifier (e.g., species name)
- genetic_code
Translation table for your organisms (the default, 2, is the vertebrate mitochondrial code). See the NCBI genetic codes page for details: https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
- assemble_cpus
Default number of CPUs for assembly
- assemble_memory
Default memory (GB) for assembly
- assembler
Assembler to use: "GetOrganelle" (default) or "MitoFinder"
- seeds_db
Path or URL to the GetOrganelle seeds database. The file name must differ from that of labels_db. The default is a fish database built from RefSeq: https://raw.githubusercontent.com/smithsonian/MitoPilot/main/ref_dbs/getOrganelle/seeds/fish_mito_seeds.fasta
- labels_db
Path or URL to the GetOrganelle labels database. The file name must differ from that of seeds_db. The default is a fish database built from RefSeq: https://raw.githubusercontent.com/smithsonian/MitoPilot/main/ref_dbs/getOrganelle/labels/fish_mito_labels.fasta
- getOrganelle
Default GetOrganelle command-line options
- mitofinder_db
Path or URL to the MitoFinder reference database, which must be in GenBank format (.gb). The default is the Danio rerio mitogenome (https://raw.githubusercontent.com/Smithsonian/MitoPilot/refs/heads/main/ref_dbs/MitoFinder/NC_002333_Danio_rerio.gb)
- mitofinder
Default MitoFinder command-line options
- annotate_cpus
Default number of CPUs for annotation
- annotate_memory
Default memory (GB) for annotation
- annotate_ref_db
Default MITOS2 reference database
- annotate_ref_dir
Default MITOS2 reference database directory
- mitos_opts
Default MITOS2 command-line options
- trnaScan_opts
Default tRNAscan-SE command-line options
- curate_cpus
Default number of CPUs for curation
- curate_memory
Default memory (GB) for curation
- curate_target
Default target database for curation
- max_blast_hits
Maximum number of top BLAST hits to retain (default = 100)
- curate_params
Default curation parameters
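As a rough illustration of the mapping_fn requirements described above, the following sketch writes a small mapping CSV with base R, checks that it has the required columns, and wraps a new_db() call in a helper. The sample IDs, taxa, read file names, and the helper name init_example_db are all hypothetical. The sketch also assumes that the MitoPilot package is installed and exports new_db(), and that a primary-key column should hold unique values.

```r
# Hypothetical samples and read file names, for illustration only.
mapping <- data.frame(
  ID    = c("sample_01", "sample_02"),
  Taxon = c("Danio rerio", "Salmo salar"),
  R1    = c("sample_01_R1.fastq.gz", "sample_02_R1.fastq.gz"),
  R2    = c("sample_01_R2.fastq.gz", "sample_02_R2.fastq.gz"),
  stringsAsFactors = FALSE
)

mapping_file <- file.path(tempdir(), "mapping.csv")
write.csv(mapping, mapping_file, row.names = FALSE)

# Re-read the file and confirm the documented columns are present.
required_cols <- c("ID", "Taxon", "R1", "R2")
mapping_check <- read.csv(mapping_file, stringsAsFactors = FALSE)
missing_cols <- setdiff(required_cols, names(mapping_check))
stopifnot(length(missing_cols) == 0)

# Assumption: the column used as the primary key should have no duplicates.
stopifnot(anyDuplicated(mapping_check$ID) == 0)

# Hypothetical helper; it is only defined here, not called, so the sketch
# runs even when MitoPilot is not installed.
init_example_db <- function(mapping_file, db_file) {
  MitoPilot::new_db(
    db_path = db_file,
    mapping_fn = mapping_file,
    mapping_id = "ID",
    mapping_taxon = "Taxon",
    genetic_code = 2
  )
}

# Example call (requires MitoPilot):
# init_example_db(mapping_file, file.path(tempdir(), ".sqlite"))
```

All other arguments are left at the defaults listed under Usage, so only the paths and the mapping column names need to be supplied in the simplest case.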