Command Line Interface

This page shows the MicroHapulator command line interface: how inputs and settings are specified for each subcommand.

NOTE: The MicroHapulator CLI follows Semantic Versioning. In brief, every stable release of MicroHapulator is assigned a version number, and any change to the software's behavior or interface requires that version number to be updated in prescribed and predictable ways.


Haplotype calling

mhpl8r type

Perform haplotype calling

usage: mhpl8r type [-h] [-o FILE] [-b B] [-m M] tsv bam

Positional Arguments

tsv

path of a TSV file containing marker metadata, specifically the offset of each SNP for every marker in the panel

bam

path of a sorted BAM file containing NGS reads aligned to the marker reference sequences

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

-b, --base-qual

minimum base quality (Phred score) for a base call to be considered reliable for haplotype calling; by default B=10, corresponding to Q10, i.e., a 90% probability that the base call is correct

-m, --max-depth

maximum permitted read depth; by default M=1000000
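The SNP offsets in the TSV file and the --base-qual cutoff work together when a read is converted into a haplotype. The sketch below is a minimal illustration, not MicroHapulator's implementation. It rests on three assumptions: the read is aligned without gaps starting at `read_start` on the marker reference; a read that does not span every offset yields no haplotype (such reads are discarded); and a base below the quality cutoff is masked as "N", which is a hypothetical choice. The Phred relation P(correct) = 1 - 10^(-Q/10) is standard and explains why Q10 means 90%. Function names are hypothetical.

```python
from typing import List, Optional


def phred_to_accuracy(q: int) -> float:
    """Probability that a base call with Phred quality q is correct: 1 - 10^(-q/10)."""
    return 1.0 - 10 ** (-q / 10)


def call_read_haplotype(
    read_start: int,
    sequence: str,
    qualities: List[int],
    offsets: List[int],
    min_base_qual: int = 10,
) -> Optional[str]:
    """Return the comma-separated alleles a read carries at the SNP offsets.

    Hypothetical helper: assumes an ungapped alignment beginning at read_start.
    Returns None if the read does not cover every offset; bases whose quality
    is below min_base_qual are masked as 'N' (an illustrative assumption).
    """
    alleles = []
    for offset in offsets:
        i = offset - read_start  # position of the SNP within the read
        if i < 0 or i >= len(sequence):
            return None  # read does not span this variant
        alleles.append(sequence[i] if qualities[i] >= min_base_qual else "N")
    return ",".join(alleles)
```

For example, `call_read_haplotype(100, "ACGTACGTAC", [30] * 10, [102, 105, 108])` reads positions 2, 5, and 8 of the sequence.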

mhpl8r filter

Apply static and/or dynamic thresholds to distinguish true and false haplotypes. Thresholds are applied to the haplotype read counts of a raw typing result. Static integer thresholds are commonly used as detection thresholds, below which any haplotype count is considered noise. Dynamic thresholds are commonly used as analytical thresholds and represent a percentage of the total read count at the marker, after any haplotypes failing a static threshold are discarded.

usage: mhpl8r filter [-h] [-o FILE] [-s ST] [-d DT] [-c FILE] result

Positional Arguments

result

MicroHapulator typing result in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

-s, --static

global fixed read count threshold

-d, --dynamic

global threshold expressed as a fraction of the total read count at the marker; e.g., use --dynamic=0.02 to apply a 2% analytical threshold

-c, --config

CSV file specifying marker-specific thresholds to override global thresholds; three required columns: 'Marker' for the marker name; 'Static' and 'Dynamic' for marker-specific thresholds
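The filtering order described above can be sketched in a few lines: apply the static threshold first, then compute the dynamic threshold from the reads that survive. This is a minimal sketch under assumptions, not MicroHapulator's code. The per-marker count structure and the function names are hypothetical, and it assumes a haplotype is kept when its count is at least the threshold. The config loader uses only the three CSV columns named above.

```python
import csv
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical simplified structure: marker name -> {haplotype: read count}
Counts = Dict[str, Dict[str, int]]


def load_overrides(lines: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Read marker-specific thresholds from CSV with Marker, Static, Dynamic columns."""
    return {
        row["Marker"]: (int(row["Static"]), float(row["Dynamic"]))
        for row in csv.DictReader(lines)
    }


def apply_thresholds(
    counts: Counts,
    static: int,
    dynamic: float,
    overrides: Optional[Dict[str, Tuple[int, float]]] = None,
) -> Counts:
    """Apply a static count threshold, then a dynamic fractional threshold, per marker."""
    overrides = overrides or {}
    filtered: Counts = {}
    for marker, hapcounts in counts.items():
        # Marker-specific (static, dynamic) values override the global ones
        st, dt = overrides.get(marker, (static, dynamic))
        # Step 1: discard haplotypes whose count falls below the static threshold
        kept = {hap: n for hap, n in hapcounts.items() if n >= st}
        # Step 2: the dynamic threshold is a fraction of the reads surviving step 1
        total = sum(kept.values())
        filtered[marker] = {hap: n for hap, n in kept.items() if n >= dt * total}
    return filtered
```

With counts of 480, 450, 30, and 4 at one marker, a static threshold of 5 removes the 4-read haplotype first. A 5% dynamic threshold is then computed from the remaining 960 reads (48 reads), so the 30-read haplotype is also removed. At 2% (19.2 reads) it would be kept.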

Analysis and interpretation

mhpl8r balance

Compute interlocus balance

usage: mhpl8r balance [-h] [-c FILE] [-D] input

Positional Arguments

input

a typing result including haplotype counts in JSON format

Named Arguments

-c, --csv

write read counts to FILE in CSV format

-D, --no-discarded

do not include reads that are mapped but discarded in the read counts; by default, reads that are mapped to a marker but discarded because they do not span all variants at that marker are included
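Interlocus balance concerns how evenly reads are spread across markers, and --no-discarded changes which reads are counted. The sketch below shows only that read-count bookkeeping as each marker's share of the total. It does not reproduce whatever balance statistic MicroHapulator reports. The input structure and function name are hypothetical.

```python
from typing import Dict, Tuple


def marker_read_shares(
    reads: Dict[str, Tuple[int, int]], include_discarded: bool = True
) -> Dict[str, float]:
    """Fraction of all counted reads attributed to each marker.

    reads maps marker -> (reads typed, reads mapped but discarded); this
    structure is hypothetical and chosen only for illustration.
    """
    counts = {
        marker: typed + (discarded if include_discarded else 0)
        for marker, (typed, discarded) in reads.items()
    }
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to compute balance from")
    return {marker: n / total for marker, n in counts.items()}
```

Excluding discarded reads (the analogue of -D) can shift the shares noticeably when markers differ in how many of their reads span all variants.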

mhpl8r contrib

Estimate the minimum number of DNA contributors to a suspected mixture

usage: mhpl8r contrib [-h] [-o FILE] result

Positional Arguments

result

typing result in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
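A common way to estimate a minimum contributor count is the maximum allele count approach. Each diploid contributor adds at most two haplotypes per marker, so the largest number of haplotypes observed at any marker, divided by two and rounded up, gives a lower bound. Whether `mhpl8r contrib` uses exactly this rule is an assumption here. The profile structure and function name are hypothetical.

```python
import math
from typing import Dict, Set


def min_contributors(profile: Dict[str, Set[str]]) -> int:
    """Lower bound on contributors: ceil(max haplotypes at any marker / 2)."""
    max_haplotypes = max(len(haps) for haps in profile.values())
    return math.ceil(max_haplotypes / 2)
```

For example, three distinct haplotypes at a single marker cannot come from one diploid individual, so the bound is 2.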

mhpl8r prob

Compute a profile random match probability (RMP) or an RMP-based likelihood ratio (LR) test

usage: mhpl8r prob [-h] [-e ε] [-o FILE] freq profile1 [profile2]

Positional Arguments

freq

population haplotype frequencies in tabular (TSV) format

profile1

typing result or simulated genotype in JSON format

profile2

typing result or simulated genotype in JSON format; optional

Named Arguments

-e, --erate

rate of genotyping error; by default ε=0.001

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
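For orientation, the textbook product-rule RMP under Hardy-Weinberg equilibrium multiplies a per-marker genotype frequency across markers. That frequency is p² for a homozygote and 2pq for a heterozygote. This sketch deliberately omits the genotyping error rate ε and any handling of haplotypes missing from the frequency table, so it is not a description of how `mhpl8r prob` computes its result. Names and data structures are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical structures: marker -> {haplotype: frequency}, marker -> (hap1, hap2)
Freqs = Dict[str, Dict[str, float]]
Genotype = Dict[str, Tuple[str, str]]


def random_match_probability(genotype: Genotype, freqs: Freqs) -> float:
    """Product-rule RMP assuming Hardy-Weinberg equilibrium and no genotyping error."""
    rmp = 1.0
    for marker, (h1, h2) in genotype.items():
        p = freqs[marker][h1]
        q = freqs[marker][h2]
        # Homozygote: p^2; heterozygote: 2pq
        rmp *= p * p if h1 == h2 else 2 * p * q
    return rmp
```

In the simplest single-source case where two profiles match, an RMP-based LR is commonly taken as 1/RMP.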

mhpl8r diff

Compare two profiles and determine the markers at which their genotypes differ

usage: mhpl8r diff [-h] [-o FILE] profile1 profile2

Positional Arguments

profile1

typing result or simulated profile in JSON format

profile2

typing result or simulated profile in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
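Conceptually, a profile diff walks every marker and reports where the two sets of haplotypes disagree. In this minimal sketch, a marker missing from one profile is treated as having no haplotypes, and each difference is reported as the haplotypes unique to each side. These are illustrative choices rather than MicroHapulator's output format, and the names are hypothetical.

```python
from typing import Dict, Set, Tuple

Profile = Dict[str, Set[str]]  # hypothetical: marker -> set of haplotypes


def diff_profiles(p1: Profile, p2: Profile) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Map each differing marker to (haplotypes only in p1, haplotypes only in p2)."""
    differences = {}
    for marker in sorted(set(p1) | set(p2)):
        h1 = p1.get(marker, set())
        h2 = p2.get(marker, set())
        if h1 != h2:
            differences[marker] = (h1 - h2, h2 - h1)
    return differences
```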

mhpl8r dist

Compute a simple Hamming distance between two profiles

usage: mhpl8r dist [-h] [-o FILE] profile1 profile2

Positional Arguments

profile1

typing result or simulated profile in JSON format

profile2

typing result or simulated profile in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
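A Hamming distance between profiles counts the positions, here markers, at which the two genotypes are not identical. Hamming distance is defined for sequences of equal length, so this sketch requires both profiles to cover the same markers. That requirement, and the function name, are assumptions for illustration.

```python
from typing import Dict, FrozenSet

Profile = Dict[str, FrozenSet[str]]  # hypothetical: marker -> haplotypes


def hamming_distance(p1: Profile, p2: Profile) -> int:
    """Number of markers at which the two genotypes differ."""
    if set(p1) != set(p2):
        raise ValueError("profiles must be typed at the same markers")
    return sum(1 for marker in p1 if p1[marker] != p2[marker])
```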

mhpl8r contain

Perform a simple containment test

usage: mhpl8r contain [-h] [-o FILE] profile1 profile2

Positional Arguments

profile1

simulated or inferred genotype profile in JSON format

profile2

simulated or inferred genotype profile in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
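One plausible reading of a simple containment test is to check whether every haplotype of one profile also appears in the other at each marker, for example whether a reference profile could be a contributor to a mixture. This page does not say which argument `mhpl8r contain` treats as the container. The direction chosen below (profile2 inside profile1) and the function name are assumptions.

```python
from typing import Dict, Set

Profile = Dict[str, Set[str]]  # hypothetical: marker -> set of haplotypes


def is_contained(p1: Profile, p2: Profile) -> bool:
    """True if, at every marker typed in p2, all of p2's haplotypes appear in p1."""
    return all(haps <= p1.get(marker, set()) for marker, haps in p2.items())
```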

mhpl8r convert

Convert a typing result to a format compatible with probabilistic genotyping software applications

usage: mhpl8r convert [-h] [-o FILE] [--no-counts] [-f] result sample

Positional Arguments

result

filtered MicroHapulator typing result in JSON format

sample

sample name

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

--no-counts

do not include haplotype counts; use this option when interpreting data with a semi-continuous probgen model such as LRMix Studio; by default, haplotype counts are included for interpretation with a fully continuous probgen model such as EuroForMix

-f, --fix-homo

duplicate a homozygous haplotype so that it is reported twice
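The effect of --fix-homo can be pictured as follows: a marker with a single observed haplotype is written out with that haplotype listed twice, so that it reads as a homozygous genotype. This is a minimal sketch on a hypothetical list-based structure, not the converter's actual output format.

```python
from typing import Dict, List


def fix_homozygotes(genotype: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Report a lone haplotype twice; leave markers with two or more haplotypes unchanged."""
    return {
        marker: haps * 2 if len(haps) == 1 else list(haps)
        for marker, haps in genotype.items()
    }
```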

Simulation

mhpl8r sim

Simulate a diploid genotype from the specified microhaplotype frequencies

usage: mhpl8r sim [-h] [-s INT] [-o FILE] [--haplo-seq FILE]
                  [--sequences FILE] [--markers FILE]
                  freq

Positional Arguments

freq

population microhaplotype frequencies in tabular (tab-separated) format

Named Arguments

-s, --seed

seed for random number generator

-o, --out

write simulated profile data in JSON format to FILE

--haplo-seq

write simulated haplotype sequences in FASTA format to FILE

--sequences

microhaplotype sequences in FASTA format; required if --haplo-seq is specified, ignored otherwise

--markers

microhaplotype marker definitions in tabular (tab-separated) format; required if --haplo-seq is specified, ignored otherwise
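Simulating a diploid genotype from population frequencies amounts to drawing two haplotypes independently at each marker, weighted by their frequencies, with a seed making the draw reproducible. The sketch below illustrates that idea with Python's `random.choices`. It assumes random mating and independence between markers. It is not MicroHapulator's simulator, and the names are hypothetical.

```python
import random
from typing import Dict, Optional, Tuple


def simulate_genotype(
    freqs: Dict[str, Dict[str, float]], seed: Optional[int] = None
) -> Dict[str, Tuple[str, str]]:
    """Draw two haplotypes per marker, weighted by population frequency."""
    rng = random.Random(seed)  # seeded generator for reproducible simulations
    genotype = {}
    for marker in sorted(freqs):  # fixed marker order keeps seeded runs stable
        haps = list(freqs[marker])
        weights = [freqs[marker][h] for h in haps]
        genotype[marker] = tuple(rng.choices(haps, weights=weights, k=2))
    return genotype
```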

mhpl8r mix

Combine simulated profiles into a mock DNA mixture

usage: mhpl8r mix [-h] [-o FILE] profiles [profiles ...]

Positional Arguments

profiles

simulated genotype profiles in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
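At the level of haplotype sets, a mock mixture is the union of its contributors' haplotypes at each marker. The sketch below shows only that union. It discards copy number and contributor identity, which MicroHapulator's mixture profiles may well retain, and the function name is hypothetical.

```python
from typing import Dict, Set


def mix_profiles(*profiles: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Union of haplotypes observed at each marker across all contributors."""
    mixture: Dict[str, Set[str]] = {}
    for profile in profiles:
        for marker, haps in profile.items():
            mixture.setdefault(marker, set()).update(haps)
    return mixture
```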

mhpl8r unite

Simulate the creation of a new profile from a mother and father

usage: mhpl8r unite [-h] [-o FILE] [-s INT] mom dad

Positional Arguments

mom

simulated or inferred genotype in JSON format

dad

simulated or inferred genotype in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

-s, --seed

seed for random number generator
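Simulating a child from two parents follows Mendelian inheritance: at each marker, the child receives one of the mother's two haplotypes and one of the father's, each chosen with equal probability. This sketch assumes both parents are typed at the same markers and treats markers as independent, ignoring linkage. The names are hypothetical.

```python
import random
from typing import Dict, Optional, Tuple

Genotype = Dict[str, Tuple[str, str]]  # hypothetical: marker -> (hap1, hap2)


def unite(mom: Genotype, dad: Genotype, seed: Optional[int] = None) -> Genotype:
    """Child genotype: one random haplotype from each parent at every marker."""
    rng = random.Random(seed)
    return {marker: (rng.choice(mom[marker]), rng.choice(dad[marker])) for marker in mom}
```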

mhpl8r seq

Simulate paired-end Illumina MiSeq sequencing of the given profile(s)

usage: mhpl8r seq [-h] [-o OUT [OUT ...]] [-n N] [-p P [P ...]]
                  [-s INT [INT ...]]
                  tsv refrseqs profiles [profiles ...]

Positional Arguments

tsv

microhaplotype marker definitions in tabular (TSV) format

refrseqs

microhaplotype reference sequences in FASTA format

profiles

one or more simple or complex profiles (JSON files)

Named Arguments

-o, --out

write simulated paired-end MiSeq reads in FASTQ format to the specified file(s); if one filename is provided, reads are interleaved and written to the file; if two filenames are provided, reads are written to paired files; by default, reads are interleaved and written to the terminal (standard output)

-n, --num-reads

number of reads to simulate; default is 500000

-p, --proportions

simulate mixture samples with multiple contributors at the specified proportions; by default, equal proportions are used

-s, --seeds

seeds for the random number generator, one per profile
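When a mixture is simulated, --num-reads and --proportions together determine roughly how many reads come from each profile. The sketch below splits N reads in proportion to the given weights. It uses largest-remainder rounding so the counts always sum to N, which is an illustrative choice rather than documented MicroHapulator behavior. Passing equal weights mimics the default even proportions. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def allocate_reads(num_reads: int, proportions: Sequence[float]) -> List[int]:
    """Split num_reads across contributors in proportion to the given weights."""
    total = sum(proportions)
    exact = [num_reads * p / total for p in proportions]
    counts = [math.floor(x) for x in exact]
    # Give leftover reads to the contributors with the largest fractional parts
    leftover = num_reads - sum(counts)
    order = sorted(range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

For example, `allocate_reads(500000, [1, 1])` gives each of two contributors 250000 reads.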