Command Line Interface

This page shows the MicroHapulator command line interface: how inputs and settings are specified for each subcommand.

NOTE: The MicroHapulator CLI follows Semantic Versioning. In brief, this means that every stable version of the MicroHapulator software is assigned a version number, and that any changes to the software's behavior or interface require the software version number to be updated in prescribed and predictable ways.

End-to-end analysis workflow

mhpl8r pipe

Perform a complete end-to-end microhap analysis pipeline

usage: mhpl8r pipe [-h] [-w D] [-n] [-t T] [-s ST] [-d DT] [-a AT] [-l LT]
                   [-c CSV] [--single] [--copy-input] [--hspace HS]
                   markerrefr markerdefn seqpath samples [samples ...]

Positional Arguments

markerrefr

path to a FASTA file containing marker reference sequences

markerdefn

path to a TSV file containing marker definitions

seqpath

path to a directory containing FASTQ files

samples

list of sample names, or path to a .txt file containing sample names

Named Arguments

-w, --workdir

pipeline working directory; default is current directory

-n, --dryrun

do not execute the workflow, but display what would have been done

-t, --threads

process each batch using T threads; by default, one thread per available core is used

-s, --static

global fixed read count threshold; ST=5 by default

-d, --dynamic

global percentage of total read count threshold; e.g. use --dynamic=0.02 to apply a 2% analytical threshold; DT=0.02 by default

-a, --ambiguous-thresh

filter out reads in which more than AT percent of characters are ambiguous ('N'); AT=0.2 by default

-l, --length-thresh

filter out reads that are less than LT bp long; LT=50 by default

-c, --config

CSV file specifying marker-specific thresholds to override global thresholds; three required columns: 'Marker' for the marker name; 'Static' and 'Dynamic' for marker-specific thresholds

--single

accept single-end reads only; by default, only paired-end reads are accepted

--copy-input

copy input files to the working directory; by default, input files are symlinked

--hspace

horizontal spacing between samples in the read length distribution ridge plots; a negative value enables overlapping plots; HS=-0.7 by default
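The `-c`/`--config` file is plain CSV with the three columns named above. The following is a minimal Python sketch that produces such a file; the marker names and threshold values are hypothetical placeholders, and the `Dynamic` column is written as a fraction (0.05 for 5%) on the assumption that it follows the same convention as `--dynamic`.

```python
import csv
import io


def format_threshold_config(rows):
    """Return CSV text with the three required columns: Marker, Static, Dynamic."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["Marker", "Static", "Dynamic"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical marker names and thresholds, for illustration only.
overrides = [
    {"Marker": "mhXX-001", "Static": 10, "Dynamic": 0.05},
    {"Marker": "mhXX-002", "Static": 3, "Dynamic": 0.01},
]

if __name__ == "__main__":
    # Redirect standard output to a file (e.g. thresholds.csv) to save the config.
    print(format_threshold_config(overrides), end="")
```

Redirecting the output to a file, say `thresholds.csv`, would give a file that can be passed as `-c thresholds.csv`.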

Haplotype calling

mhpl8r type

Perform haplotype calling

usage: mhpl8r type [-h] [-o FILE] [-b B] [-m M] tsv bam

Positional Arguments

tsv

path of a TSV file containing marker metadata, specifically the offset of each SNP for every marker in the panel

bam

path of a BAM file containing NGS reads aligned to marker reference sequences and sorted

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

-b, --base-qual

minimum base quality (PHRED score) to be considered reliable for haplotype calling; by default B=10, corresponding to Q10, i.e., 90% probability that the base call is correct

-m, --max-depth

maximum permitted read depth; by default M=1000000
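The PHRED scale behind `-b`/`--base-qual` is defined as Q = -10 log10(P_error), which is how B=10 translates into a 90% probability of a correct base call. The sketch below shows the conversion in both directions; it illustrates the standard definition only and is not taken from the MicroHapulator source.

```python
import math


def phred_to_accuracy(q):
    """Probability that a base call with PHRED quality q is correct."""
    # PHRED definition: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10).
    return 1 - 10 ** (-q / 10)


def accuracy_to_phred(p_correct):
    """PHRED quality score corresponding to a given probability of a correct call."""
    return -10 * math.log10(1 - p_correct)


if __name__ == "__main__":
    for q in (10, 20, 30):
        print(f"Q{q}: {phred_to_accuracy(q):.3f}")
```

Run as a script, it prints 0.900, 0.990, and 0.999 for Q10, Q20, and Q30, so each 10-point increase in B cuts the tolerated error rate tenfold.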

mhpl8r filter

Apply static and/or dynamic thresholds to distinguish true and false haplotypes. Thresholds are applied to the haplotype read counts of a raw typing result. Static integer thresholds are commonly used as detection thresholds, below which any haplotype count is considered noise. Dynamic thresholds are commonly used as analytical thresholds and represent a percentage of the total read count at the marker, after any haplotypes failing a static threshold are discarded.

usage: mhpl8r filter [-h] [-o FILE] [-s ST] [-d DT] [-c CSV] result

Positional Arguments

result

MicroHapulator typing result in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

-s, --static

global fixed read count threshold

-d, --dynamic

global percentage of total read count threshold; e.g. use --dynamic=0.02 to apply a 2% analytical threshold

-c, --config

CSV file specifying marker-specific thresholds to override global thresholds; three required columns: 'Marker' for the marker name; 'Static' and 'Dynamic' for marker-specific thresholds
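The two-stage filtering described above can be sketched in Python as follows. This is a minimal illustration, not MicroHapulator's implementation: it assumes haplotype counts are held in a plain dictionary per marker (the real JSON typing result has its own layout), that a count exactly equal to a threshold passes, and that a marker-specific override replaces both global values at once. The marker name and haplotype labels are hypothetical.

```python
def apply_thresholds(counts, static, dynamic):
    """Filter one marker's haplotype read counts: static threshold, then dynamic."""
    # Static (detection) threshold: counts below this fixed value are treated as noise.
    kept = {hap: n for hap, n in counts.items() if n >= static}
    # Dynamic (analytical) threshold: a fraction of the marker's total read count,
    # computed only after haplotypes failing the static threshold are discarded.
    cutoff = dynamic * sum(kept.values())
    return {hap: n for hap, n in kept.items() if n >= cutoff}


def filter_result(result, static, dynamic, overrides=None):
    """Apply global thresholds, or marker-specific (static, dynamic) overrides."""
    overrides = overrides or {}
    return {
        marker: apply_thresholds(counts, *overrides.get(marker, (static, dynamic)))
        for marker, counts in result.items()
    }


if __name__ == "__main__":
    # Hypothetical haplotype counts for a single marker.
    raw = {"mhXX-001": {"A,T,G": 1000, "A,C,G": 15, "G,T,G": 4}}
    print(filter_result(raw, static=5, dynamic=0.02))
```

In the example, the static threshold of 5 removes the 4-read haplotype; the dynamic cutoff is then 2% of the remaining 1015 reads (20.3), which removes the 15-read haplotype as well.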

Analysis, QA/QC, and interpretation

mhpl8r locbalance

Plot interlocus balance in the terminal and/or a high-resolution graphic. Also normalize read counts and perform a chi-square goodness-of-fit test assuming uniform read coverage across markers. The reported chi-square statistic measures the extent of imbalance, and can be compared among samples sequenced using the same panel: the minimum value of 0 represents perfectly uniform coverage, while the maximum value of D occurs when all reads map to a single marker (D represents the degrees of freedom, or the number of markers minus 1).

usage: mhpl8r locbalance [-h] [-c FILE] [-D] [-q] [--figure FILE]
                         [--figsize W H] [--dpi DPI] [-t T] [--color C]

Positional Arguments


a typing result including haplotype counts in JSON format

Named Arguments

-c, --csv

write read counts to FILE in CSV format

-D, --no-discarded

do not include mapped but discarded reads in read counts; by default, reads that map to the marker but are discarded because they do not span all variants at the marker are included

-q, --quiet

do not print interlocus balance histogram to standard output in ASCII

--figure

plot interlocus balance histogram to FILE using Matplotlib; image format is inferred from the extension of the provided file name

--figsize

dimensions (width × height in inches) of the image file to be generated; 6 4 by default

--dpi

resolution (in dots per inch) of the image file to be generated; DPI=200 by default

-t, --title

add a title (such as a sample name) to the histogram plot

--color

override histogram plot color; green by default
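The statistic described for `mhpl8r locbalance` can be reproduced with a short sketch, assuming that "normalize read counts" means converting each marker's count to a proportion of the total. That assumption is consistent with the stated bounds: with N markers and all reads on one of them, the statistic is N(1 - 1/N)² + (N - 1)/N = N - 1 = D. This is an illustration, not the tool's own code.

```python
def interlocus_chisq(read_counts):
    """Chi-square goodness-of-fit statistic for uniform coverage, on normalized counts."""
    total = sum(read_counts)
    n_markers = len(read_counts)
    expected = 1 / n_markers  # uniform proportion expected at every marker
    observed = [count / total for count in read_counts]  # normalized read counts
    return sum((obs - expected) ** 2 / expected for obs in observed)


if __name__ == "__main__":
    print(interlocus_chisq([250, 250, 250, 250]))  # perfectly uniform: 0.0
    print(interlocus_chisq([1000, 0, 0, 0]))       # all reads on one marker: 3.0 (= D)
```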

mhpl8r hetbalance

Compute and plot heterozygote balance

usage: mhpl8r hetbalance [-h] [-c FILE] [--figure FILE] [--figsize W H]
                         [--dpi DPI] [-t T] [--labels] [--absolute]

Positional Arguments


a typing result including haplotype counts in JSON format

Named Arguments

-c, --csv

write read counts to FILE in CSV format

--figure

plot heterozygote balance bar graph to FILE using Matplotlib; image format is inferred from the extension of the provided file name

--figsize

dimensions (width × height in inches) of the image file to be generated; figure dimensions determined automatically by default

--dpi

resolution (in dots per inch) of the image file to be generated; DPI=200 by default

-t, --title

add a title (such as a sample name) to the bar graph

--labels

include labels showing marker names and read counts

--absolute

plot absolute rather than relative read counts

mhpl8r repetitive

Calculate the number of reads that map to a marker sequence but map preferentially to another locus when aligned to the whole genome

usage: mhpl8r repetitive [-h] [-o FILE] [-b B] markerbam refbam tsv

Positional Arguments

markerbam

alignment file in BAM format of reads aligned to marker sequences

refbam

alignment file in BAM format of reads aligned to hg38

tsv

marker definitions TSV including chromosome and full reference genome offset columns

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

-b, --base-qual

minimum base quality (PHRED score) to be considered reliable for haplotype calling; by default B=10, corresponding to Q10, i.e., 90% probability that the base call is correct

mhpl8r mappingqc

Calculate the number of on-target, off-target, repetitive, and contaminant reads and create a donut plot

usage: mhpl8r mappingqc [-h] --marker MARKER --refr REFR --rep REP --csv CSV
                        --figure FIGURE [--title TITLE]

Named Arguments

--marker

path of a CSV file containing the number of reads mapped to marker sequences

--refr

path of a CSV file containing the number of reads mapped to the full reference genome

--rep

path of a CSV file containing the number of repetitive reads per marker

--csv

write read counts to FILE in CSV format

--figure

create a donut plot in FILE showing the proportions of on-target, off-target, repetitive, and contaminant reads

--title

add a title (such as a sample name) to the donut plot

mhpl8r contrib

Estimate the minimum number of DNA contributors to a suspected mixture

usage: mhpl8r contrib [-h] [-o FILE] result

Positional Arguments

result

typing result in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
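A widely used heuristic for the minimum number of contributors is the maximum allele count method: a diploid individual contributes at most two haplotypes per marker, so the marker showing the most distinct haplotypes gives a lower bound. The sketch below implements that heuristic on a plain dictionary of haplotype calls (a hypothetical stand-in for the JSON typing result); whether `mhpl8r contrib` uses exactly this method is an assumption.

```python
import math


def min_contributors(result):
    """Maximum allele count estimate of the minimum number of contributors."""
    # Each diploid contributor adds at most two haplotypes per marker, so the
    # marker with the most distinct haplotypes sets a lower bound.
    max_haplotypes = max(len(set(haps)) for haps in result.values())
    return math.ceil(max_haplotypes / 2)
```

For example, a result with three haplotypes at one marker and five at another yields ceil(5 / 2) = 3 contributors.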

mhpl8r prob

Compute a profile random match probability (RMP) or an RMP-based likelihood ratio (LR) test

usage: mhpl8r prob [-h] [-e ε] [-o FILE] freq profile1 [profile2]

Positional Arguments

freq

population haplotype frequencies in tabular (TSV) format

profile1

typing result or simulated genotype in JSON format

profile2

typing result or simulated genotype in JSON format; optional

Named Arguments

-e, --erate

rate of genotyping error; by default ε=0.01

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
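As a rough sketch of what a profile RMP involves, the snippet below multiplies per-marker genotype frequencies under Hardy-Weinberg assumptions (p² for homozygotes, 2pq for heterozygotes), and the simplest matching-profile LR is then 1/RMP. It omits the genotyping error rate ε and any population substructure correction, and the nested-dictionary frequency table is a hypothetical stand-in for the TSV input; MicroHapulator's actual computation may differ. A haplotype missing from the frequency table raises a `KeyError` here.

```python
import math


def genotype_frequency(haplotypes, freqs):
    """Hardy-Weinberg genotype frequency at one marker: p^2 or 2pq."""
    distinct = sorted(set(haplotypes))
    if len(distinct) == 1:
        p = freqs[distinct[0]]
        return p * p
    if len(distinct) == 2:
        p, q = (freqs[h] for h in distinct)
        return 2 * p * q
    raise ValueError("expected a single-source (diploid) genotype")


def random_match_probability(profile, population_freqs):
    """Product of per-marker genotype frequencies across all typed markers."""
    return math.prod(
        genotype_frequency(haps, population_freqs[marker])
        for marker, haps in profile.items()
    )
```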

mhpl8r diff

Compare two profiles and determine the markers at which their genotypes differ

usage: mhpl8r diff [-h] [-o FILE] profile1 profile2

Positional Arguments

profile1

typing result or simulated profile in JSON format

profile2

typing result or simulated profile in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

mhpl8r dist

Compute a simple Hamming distance between two profiles

usage: mhpl8r dist [-h] [-o FILE] profile1 profile2

Positional Arguments

profile1

typing result or simulated profile in JSON format

profile2

typing result or simulated profile in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
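The comparisons performed by `mhpl8r diff` and `mhpl8r dist` can be illustrated with a small sketch in which each profile is a dictionary mapping marker names to haplotype lists (a hypothetical stand-in for the JSON files). Genotypes are compared as sets of haplotypes, and a marker typed in only one profile counts as a difference; both choices are assumptions, not documented behavior.

```python
def differing_markers(profile1, profile2):
    """Sorted list of markers at which the two profiles' genotypes differ."""
    markers = set(profile1) | set(profile2)
    return sorted(
        marker
        for marker in markers
        if set(profile1.get(marker, ())) != set(profile2.get(marker, ()))
    )


def hamming_distance(profile1, profile2):
    """Number of markers at which the two profiles differ."""
    return len(differing_markers(profile1, profile2))
```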

mhpl8r contain

Perform a simple containment test

usage: mhpl8r contain [-h] [-o FILE] profile1 profile2

Positional Arguments

profile1

simulated or inferred genotype profile in JSON format

profile2

simulated or inferred genotype profile in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
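A minimal containment sketch follows. It assumes the test asks what fraction of the second profile's haplotypes (for example, a reference) also appear at the same marker in the first (for example, a mixture); the direction of the test and the statistic that `mhpl8r contain` actually reports are assumptions, and profiles are represented as hypothetical marker-to-haplotype dictionaries.

```python
def containment(profile1, profile2):
    """Fraction of profile2's haplotypes that also appear in profile1 at the same marker."""
    total = 0
    found = 0
    for marker, haplotypes in profile2.items():
        present = set(profile1.get(marker, ()))
        for hap in set(haplotypes):
            total += 1
            if hap in present:
                found += 1
    if total == 0:
        raise ValueError("profile2 contains no haplotypes")
    return found / total
```

A value of 1.0 would mean every haplotype of the second profile is accounted for in the first.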

mhpl8r convert

Convert a typing result to a format compatible with probabilistic genotyping software applications

usage: mhpl8r convert [-h] [-o FILE] [--no-counts] [-f] result sample

Positional Arguments

result

filtered MicroHapulator typing result in JSON format

sample

sample name

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

--no-counts

do not include haplotype counts if you are interpreting your data with a semi-continuous probgen model such as LRMix Studio; by default, haplotype counts are included for interpretation with a fully continuous probgen model such as EuroForMix

-f, --fix-homo

duplicate a homozygous haplotype so that it is reported twice

mhpl8r getrefr

Download and index a GRCh38 assembly file suitable as a whole-genome mapping reference

usage: mhpl8r getrefr [-h]


mhpl8r sim

Simulate a diploid genotype from the specified microhaplotype frequencies

usage: mhpl8r sim [-h] [-s INT] [-o FILE] [--haplo-seq FILE]
                  [--sequences FILE] [--markers FILE]

Positional Arguments


population microhaplotype frequencies in tabular (tab separated) format

Named Arguments

-s, --seed

seed for random number generator

-o, --out

write simulated profile data in JSON format to FILE

--haplo-seq

write simulated haplotype sequences in FASTA format to FILE

--sequences

microhaplotype sequences in FASTA format; required if --haplo-seq is enabled, ignored otherwise

--markers

microhaplotype marker definitions in tabular (tab separated) format; required if --haplo-seq is enabled, ignored otherwise
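The simulation performed by `mhpl8r sim` can be pictured as two draws per marker, each weighted by population frequency. The sketch below assumes independent draws (Hardy-Weinberg equilibrium) and uses a nested dictionary in place of the TSV frequency file; the `seed` parameter plays the same role as `-s`/`--seed`, making results reproducible. It is an illustration, not the tool's implementation.

```python
import random


def simulate_genotype(population_freqs, seed=None):
    """Draw two haplotypes per marker, weighted by population frequency."""
    rng = random.Random(seed)
    genotype = {}
    for marker, table in population_freqs.items():
        haplotypes = list(table)
        weights = list(table.values())
        # Two independent weighted draws; identical draws give a homozygote.
        genotype[marker] = sorted(rng.choices(haplotypes, weights=weights, k=2))
    return genotype
```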

mhpl8r mix

Combine simulated profiles into a mock DNA mixture

usage: mhpl8r mix [-h] [-o FILE] profiles [profiles ...]

Positional Arguments

profiles

simulated genotype profiles in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
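In its simplest form, a mock mixture pools the haplotypes of every contributor at each marker. The sketch below does just that on hypothetical marker-to-haplotype dictionaries; the real `mhpl8r mix` output may retain more information (for example, which contributor each haplotype came from), so treat this as an illustration of the idea only.

```python
def mix_profiles(*profiles):
    """Pool the haplotypes of several profiles, marker by marker."""
    mixture = {}
    for profile in profiles:
        for marker, haplotypes in profile.items():
            mixture.setdefault(marker, set()).update(haplotypes)
    # Sorted lists give a stable, readable result.
    return {marker: sorted(haps) for marker, haps in mixture.items()}
```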

mhpl8r unite

Simulate the creation of a new profile from a mother and father

usage: mhpl8r unite [-h] [-o FILE] [-s INT] mom dad

Positional Arguments

mom

simulated or inferred genotype in JSON format

dad

simulated or inferred genotype in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

-s, --seed

seed for random number generator
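Mendelian inheritance, as simulated by `mhpl8r unite`, amounts to drawing one of the mother's two haplotypes and one of the father's at each marker. This sketch assumes both parents are typed at the same markers, ignores mutation, and reports the maternal haplotype first; these are illustrative choices on hypothetical marker-to-haplotype dictionaries, not documented behavior.

```python
import random


def unite(mom, dad, seed=None):
    """Simulate a child: one haplotype from each parent at every marker."""
    rng = random.Random(seed)
    # Maternal haplotype first, paternal second.
    return {
        marker: [rng.choice(mom[marker]), rng.choice(dad[marker])]
        for marker in mom
    }
```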

mhpl8r seq

Simulate paired-end Illumina MiSeq sequencing of the given profile(s)

usage: mhpl8r seq [-h] [-o OUT [OUT ...]] [-n N] [-p P [P ...]]
                  [-s INT [INT ...]]
                  tsv refrseqs profiles [profiles ...]

Positional Arguments

tsv

microhaplotype marker definitions in tabular (TSV) format

refrseqs

microhaplotype reference sequences in FASTA format

profiles

one or more simple or complex profiles (JSON files)

Named Arguments

-o, --out

write simulated paired-end MiSeq reads in FASTQ format to the specified file(s); if one filename is provided, reads are interleaved and written to the file; if two filenames are provided, reads are written to paired files; by default, reads are interleaved and written to the terminal (standard output)

-n, --num-reads

number of reads to simulate; default is 500000

-p, --proportions

simulate mixture samples with multiple contributors at the specified proportions; by default, even proportions are used

-s, --seeds

seeds for the random number generator, one per profile
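The interaction between `-n`/`--num-reads` and `-p`/`--proportions` can be sketched as a simple allocation of reads among contributors. The normalization of proportions and the handling of rounding below are assumptions made for illustration; they are not taken from the MicroHapulator source.

```python
def allocate_reads(num_reads, num_profiles, proportions=None):
    """Split num_reads among contributors according to (normalized) proportions."""
    if proportions is None:
        proportions = [1] * num_profiles  # even proportions by default
    if len(proportions) != num_profiles:
        raise ValueError("need one proportion per profile")
    total = sum(proportions)
    counts = [int(num_reads * p / total) for p in proportions]
    # Give any reads lost to truncation to the last contributor.
    counts[-1] += num_reads - sum(counts)
    return counts
```

With the default 500000 reads and proportions 0.8 and 0.2, the first contributor receives 400000 reads and the second 100000.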