Command Line Interface

This page shows the MicroHapulator command line interface: how inputs and settings are specified for each subcommand.

NOTE: The MicroHapulator CLI follows Semantic Versioning. In brief, this means that every stable version of the MicroHapulator software is assigned a version number, and that any changes to the software's behavior or interface require the software version number to be updated in prescribed and predictable ways.


End-to-end analysis workflow

mhpl8r pipe

Perform a complete end-to-end microhap analysis pipeline

usage: mhpl8r pipe [-h] [-w D] [-n] [-t T] [--copy-input]
                   markerrefr markerdefn seqpath samples [samples ...]

Positional Arguments

markerrefr

path to a FASTA file containing marker reference sequences

markerdefn

path to a TSV file containing marker definitions

seqpath

path to a directory containing FASTQ files

samples

list of sample names or path to .txt file containing sample names

Named Arguments

-w, --workdir

pipeline working directory; default is current directory

-n, --dryrun

do not execute the workflow, but display what would have been done

-t, --threads

process each batch using T threads; by default, one thread per available core is used

--copy-input

copy input files to working directory; by default, input files are symlinked

Haplotype calling

mhpl8r type

Perform haplotype calling

usage: mhpl8r type [-h] [-o FILE] [-b B] [-m M] tsv bam

Positional Arguments

tsv

path of a TSV file containing marker metadata, specifically the offset of each SNP for every marker in the panel

bam

path of a BAM file containing NGS reads aligned to marker reference sequences and sorted

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

-b, --base-qual

minimum base quality (PHRED score) to be considered reliable for haplotype calling; by default B=10, corresponding to Q10, i.e., 90% probability that the base call is correct

-m, --max-depth

maximum permitted read depth; by default M=1000000
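
The relationship between the --base-qual threshold and base call accuracy follows from the standard PHRED definition, Q = -10 log10(P_error). The short sketch below is only an illustration of that conversion; the function name is hypothetical and is not part of MicroHapulator.

```python
def phred_to_accuracy(q: int) -> float:
    """Probability that a base call with PHRED score q is correct.

    Hypothetical helper for illustration; not a MicroHapulator function.
    """
    error_prob = 10 ** (-q / 10)  # standard PHRED definition
    return 1.0 - error_prob


for q in (10, 20, 30):
    print(f"Q{q}: {phred_to_accuracy(q):.1%} probability of a correct call")
```

With the default B=10, bases with at least a 90% probability of being correct are considered reliable; raising B to 20 would require 99%.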

mhpl8r filter

Apply static and/or dynamic thresholds to distinguish true and false haplotypes. Thresholds are applied to the haplotype read counts of a raw typing result. Static integer thresholds are commonly used as detection thresholds, below which any haplotype count is considered noise. Dynamic thresholds are commonly used as analytical thresholds and represent a percentage of the total read count at the marker, after any haplotypes failing a static threshold are discarded.
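
The two-stage logic described above can be sketched as follows. This is a minimal illustration, not MicroHapulator's implementation: the function name, the dictionary layout, the haplotype strings, and the choice of inclusive comparisons (>=) are all assumptions.

```python
def apply_thresholds(counts, static=0, dynamic=0.0):
    """Filter the haplotype read counts of a single marker.

    counts  -- dict mapping haplotype to read count (assumed layout)
    static  -- detection threshold: minimum read count
    dynamic -- analytical threshold: minimum fraction of remaining reads
    """
    # Stage 1: discard haplotypes that fail the static threshold.
    kept = {hap: n for hap, n in counts.items() if n >= static}
    # Stage 2: the dynamic threshold is a fraction of the total read
    # count remaining after the static filter.
    total = sum(kept.values())
    return {hap: n for hap, n in kept.items() if n >= dynamic * total}


# Made-up counts for one marker.
raw = {"A,C,T": 480, "G,C,T": 505, "A,T,T": 12, "G,C,A": 3}
print(apply_thresholds(raw, static=5, dynamic=0.02))
```

In this example the haplotype with 3 reads fails the static threshold of 5. The remaining total is 997 reads, so the 2% dynamic threshold is 19.94 reads, and the haplotype with 12 reads is discarded as well.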

usage: mhpl8r filter [-h] [-o FILE] [-s ST] [-d DT] [-c FILE] result

Positional Arguments

result

MicroHapulator typing result in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

-s, --static

global fixed read count threshold

-d, --dynamic

global threshold expressed as a fraction of the total read count; e.g., use --dynamic=0.02 to apply a 2% analytical threshold

-c, --config

CSV file specifying marker-specific thresholds to override global thresholds; three required columns: 'Marker' for the marker name; 'Static' and 'Dynamic' for marker-specific thresholds
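
As a rough illustration of the --config file layout, the sketch below writes a CSV with the three required columns using Python's standard csv module. The marker names and threshold values are placeholders, and expressing the Dynamic column as a fraction (as with --dynamic) is an assumption.

```python
import csv
import io

# Placeholder marker names and thresholds, for illustration only.
rows = [
    {"Marker": "marker1", "Static": 10, "Dynamic": 0.05},
    {"Marker": "marker2", "Static": 4, "Dynamic": 0.01},
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["Marker", "Static", "Dynamic"],  # the three required columns
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)

config_text = buffer.getvalue()
print(config_text, end="")
```

Saving config_text to a file would give something that could be passed to --config; markers not listed would presumably fall back to the global --static and --dynamic values.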

Analysis, QA/QC, and interpretation

mhpl8r locbalance

Plot interlocus balance in the terminal and/or a high-resolution graphic. Also normalize read counts and perform a chi-square goodness-of-fit test assuming uniform read coverage across markers. The reported chi-square statistic measures the extent of imbalance, and can be compared among samples sequenced using the same panel: the minimum value of 0 represents perfectly uniform coverage, while the maximum value of D occurs when all reads map to a single marker (D represents the degrees of freedom, or the number of markers minus 1).
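
The stated bounds of the statistic (0 for uniform coverage, D when all reads map to one marker) are what one obtains when the chi-square statistic is computed on read counts normalized to proportions. The sketch below works under that assumption; the function name is hypothetical and the code is not taken from MicroHapulator.

```python
def interlocus_imbalance(read_counts):
    """Chi-square statistic on per-marker read proportions.

    Assumes counts are normalized to proportions and compared against a
    uniform expectation of 1/k for each of k markers.
    """
    total = sum(read_counts)
    k = len(read_counts)
    expected = 1 / k
    observed = [n / total for n in read_counts]
    return sum((o - expected) ** 2 / expected for o in observed)


print(interlocus_imbalance([100, 100, 100, 100]))  # perfectly uniform coverage
print(interlocus_imbalance([400, 0, 0, 0]))        # all reads at one marker
print(interlocus_imbalance([250, 90, 40, 20]))     # something in between
```

With four markers there are D = 3 degrees of freedom, so the first call gives the minimum of 0 and the second gives the maximum of 3.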

usage: mhpl8r locbalance [-h] [-c FILE] [-D] [-q] [--figure FILE]
                         [--figsize W H] [--dpi DPI] [-t T]
                         input

Positional Arguments

input

a typing result including haplotype counts in JSON format

Named Arguments

-c, --csv

write read counts to FILE in CSV format

-D, --no-discarded

do not include mapped but discarded reads in read counts; by default, reads that are mapped to the marker but discarded because they do not span all variants at the marker are included

-q, --quiet

do not print interlocus balance histogram to standard output in ASCII

--figure

plot interlocus balance histogram to FILE using Matplotlib; image format is inferred from extension of provided file name

--figsize

dimensions (width × height in inches) of the image file to be generated; 6 4 by default

--dpi

resolution (in dots per inch) of the image file to be generated; DPI=200 by default

-t, --title

add a title (such as a sample name) to the histogram plot

mhpl8r hetbalance

Compute and plot heterozygote balance

usage: mhpl8r hetbalance [-h] [-c FILE] [--figure FILE] [--figsize W H]
                         [--dpi DPI] [-t T] [--labels] [--absolute]
                         input

Positional Arguments

input

a typing result including haplotype counts in JSON format

Named Arguments

-c, --csv

write read counts to FILE in CSV format

--figure

plot heterozygote balance bar graph to FILE using Matplotlib; image format is inferred from extension of provided file name

--figsize

dimensions (width × height in inches) of the image file to be generated; figure dimensions determined automatically by default

--dpi

resolution (in dots per inch) of the image file to be generated; DPI=200 by default

-t, --title

add a title (such as a sample name) to the bar graph

--labels

include labels showing marker names and read counts

--absolute

plot absolute rather than relative read counts

mhpl8r contrib

Estimate the minimum number of DNA contributors to a suspected mixture

usage: mhpl8r contrib [-h] [-o FILE] result

Positional Arguments

result

typing result in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
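
One common way to estimate a minimum contributor count is the allele-counting rule: each diploid contributor carries at most two haplotypes per marker, so the marker with the most distinct haplotypes sets a lower bound. The sketch below assumes that rule and a hypothetical profile layout; it is not necessarily how mhpl8r contrib is implemented.

```python
import math


def min_contributors(profile):
    """Lower bound on the number of diploid contributors.

    profile -- dict mapping marker name to the set of distinct haplotypes
               observed at that marker (assumed layout)
    """
    max_haplotypes = max(len(haps) for haps in profile.values())
    return math.ceil(max_haplotypes / 2)


# Made-up mixture profile with placeholder marker names.
mixture = {
    "marker1": {"A,C", "G,C", "G,T"},
    "marker2": {"T,T", "C,T", "C,A", "T,A", "C,C"},
    "marker3": {"A,G"},
}
print(min_contributors(mixture))
```

Here the second marker shows five distinct haplotypes, which cannot be explained by fewer than three contributors.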

mhpl8r prob

Compute a profile random match probability (RMP) or an RMP-based likelihood ratio (LR) test

usage: mhpl8r prob [-h] [-e ε] [-o FILE] freq profile1 [profile2]

Positional Arguments

freq

population haplotype frequencies in tabular (TSV) format

profile1

typing result or simulated genotype in JSON format

profile2

typing result or simulated genotype in JSON format; optional

Named Arguments

-e, --erate

rate of genotyping error; by default ε=0.01

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

mhpl8r diff

Compare two profiles and determine the markers at which their genotypes differ

usage: mhpl8r diff [-h] [-o FILE] profile1 profile2

Positional Arguments

profile1

typing result or simulated profile in JSON format

profile2

typing result or simulated profile in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

mhpl8r dist

Compute a simple Hamming distance between two profiles

usage: mhpl8r dist [-h] [-o FILE] profile1 profile2

Positional Arguments

profile1

typing result or simulated profile in JSON format

profile2

typing result or simulated profile in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
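
A Hamming distance between two profiles can be thought of as the number of markers at which the genotypes are not identical. The sketch below illustrates that idea under an assumed profile layout (marker name mapped to a set of haplotypes); it is not taken from the MicroHapulator source.

```python
def hamming_distance(profile1, profile2):
    """Count markers at which two profiles have different genotypes.

    Each profile is a dict mapping marker name to a set of haplotypes
    (assumed layout). A marker missing from one profile counts as a
    difference.
    """
    markers = set(profile1) | set(profile2)
    return sum(1 for m in markers if profile1.get(m) != profile2.get(m))


# Made-up profiles with placeholder marker names.
p1 = {"marker1": {"A,C", "G,C"}, "marker2": {"T,T"}, "marker3": {"A,G", "C,G"}}
p2 = {"marker1": {"G,C", "A,C"}, "marker2": {"T,T", "C,T"}, "marker3": {"A,G", "C,G"}}
print(hamming_distance(p1, p2))
```

The two profiles differ only at the second marker, so the distance is 1. Using sets means the order in which haplotypes are listed does not matter.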

mhpl8r contain

Perform a simple containment test

usage: mhpl8r contain [-h] [-o FILE] profile1 profile2

Positional Arguments

profile1

simulated or inferred genotype profile in JSON format

profile2

simulated or inferred genotype profile in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

mhpl8r convert

Convert a typing result to a format compatible with probabilistic genotyping software applications

usage: mhpl8r convert [-h] [-o FILE] [--no-counts] [-f] result sample

Positional Arguments

result

filtered MicroHapulator typing result in JSON format

sample

sample name

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

--no-counts

do not include haplotype counts if you are interpreting your data with a semi-continuous probgen model such as LRMix Studio; by default, haplotype counts are included for interpretation with a fully continuous probgen model such as EuroForMix

-f, --fix-homo

duplicate a homozygous haplotype so that it is reported twice

mhpl8r getrefr

Download and index a GRCh38 assembly file suitable as a whole-genome mapping reference

usage: mhpl8r getrefr [-h]

Simulation

mhpl8r sim

Simulate a diploid genotype from the specified microhaplotype frequencies

usage: mhpl8r sim [-h] [-s INT] [-o FILE] [--haplo-seq FILE]
                  [--sequences FILE] [--markers FILE]
                  freq

Positional Arguments

freq

population microhaplotype frequencies in tabular (tab-separated) format

Named Arguments

-s, --seed

seed for random number generator

-o, --out

write simulated profile data in JSON format to FILE

--haplo-seq

write simulated haplotype sequences in FASTA format to FILE

--sequences

microhaplotype sequences in FASTA format; required if --haplo-seq enabled, ignored if not

--markers

microhaplotype marker definitions in tabular (tab-separated) format; required if --haplo-seq enabled, ignored if not

mhpl8r mix

Combine simulated profiles into a mock DNA mixture

usage: mhpl8r mix [-h] [-o FILE] profiles [profiles ...]

Positional Arguments

profiles

simulated genotype profiles in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

mhpl8r unite

Simulate the creation of a new profile from a mother and father

usage: mhpl8r unite [-h] [-o FILE] [-s INT] mom dad

Positional Arguments

mom

simulated or inferred genotype in JSON format

dad

simulated or inferred genotype in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

-s, --seed

seed for random number generator
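
Conceptually, creating a child profile from two parents amounts to drawing one haplotype at random from each parent at every marker. The sketch below shows that idea with a seeded random number generator, in the spirit of the --seed option. The function name and profile layout are assumptions, not MicroHapulator's own code.

```python
import random


def unite(mom, dad, seed=None):
    """Simulate a child genotype from two parental genotypes.

    mom, dad -- dicts mapping marker name to a pair of haplotypes
                (assumed layout)
    seed     -- optional seed for reproducible results
    """
    rng = random.Random(seed)
    child = {}
    # Only markers present in both parents are considered.
    for marker in sorted(set(mom) & set(dad)):
        from_mom = rng.choice(mom[marker])  # one maternal haplotype
        from_dad = rng.choice(dad[marker])  # one paternal haplotype
        child[marker] = (from_mom, from_dad)
    return child


# Made-up parental genotypes with placeholder marker names.
mom = {"marker1": ("A,C", "G,C"), "marker2": ("T,T", "T,T")}
dad = {"marker1": ("G,T", "G,C"), "marker2": ("C,T", "T,A")}
print(unite(mom, dad, seed=42))
```

Passing the same seed twice yields the same child, which is useful when a simulation needs to be repeated exactly.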

mhpl8r seq

Simulate paired-end Illumina MiSeq sequencing of the given profile(s)

usage: mhpl8r seq [-h] [-o OUT [OUT ...]] [-n N] [-p P [P ...]]
                  [-s INT [INT ...]]
                  tsv refrseqs profiles [profiles ...]

Positional Arguments

tsv

microhaplotype marker definitions in tabular (TSV) format

refrseqs

microhaplotype reference sequences in FASTA format

profiles

one or more simple or complex profiles (JSON files)

Named Arguments

-o, --out

write simulated paired-end MiSeq reads in FASTQ format to the specified file(s); if one filename is provided, reads are interleaved and written to the file; if two filenames are provided, reads are written to paired files; by default, reads are interleaved and written to the terminal (standard output)

-n, --num-reads

number of reads to simulate; default is 500000

-p, --proportions

simulate mixture samples with multiple contributors at the specified proportions; by default, even proportions are used

-s, --seeds

seeds for random number generator, 1 per profile