Command Line Interface

This page shows the MicroHapulator command line interface: how inputs and settings are specified for each subcommand.

NOTE: The MicroHapulator CLI is under Semantic Versioning. In brief, this means that every stable version of the MicroHapulator software is assigned a version number, and that any changes to the software's behavior or interface require the software version number to be updated in prescribed and predictable ways.


End-to-end analysis workflow

mhpl8r pipe

Perform a complete end-to-end microhap analysis pipeline

usage: mhpl8r pipe [-h] [-w D] [-n] [-t T] [-s ST] [-d DT] [-c CSV] [--single]
                   [--copy-input]
                   markerrefr markerdefn seqpath samples [samples ...]

Positional Arguments

markerrefr

path to a FASTA file containing marker reference sequences

markerdefn

path to a TSV file containing marker definitions

seqpath

path to a directory containing FASTQ files

samples

list of sample names or path to .txt file containing sample names

Named Arguments

-w, --workdir

pipeline working directory; default is current directory

-n, --dryrun

do not execute the workflow, but display what would have been done

-t, --threads

process each batch using T threads; by default, one thread per available core is used

-s, --static

global fixed read count threshold; ST=5 by default

-d, --dynamic

global percentage of total read count threshold; e.g. use --dynamic=0.02 to apply a 2% analytical threshold; DT=0.02 by default

-c, --config

CSV file specifying marker-specific thresholds to override global thresholds; three required columns: 'Marker' for the marker name; 'Static' and 'Dynamic' for marker-specific thresholds

--single

accept single-end reads only; by default, only paired-end reads are accepted

--copy-input

copy input files to working directory; by default, input files are symlinked
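
The --config option expects a CSV file with three required columns: Marker, Static, and Dynamic. As a rough illustration of that layout, the following sketch writes and re-reads such a file with Python's standard csv module. The marker names and threshold values are invented for the example, and the helper functions are hypothetical; they are not part of MicroHapulator.

```python
import csv
import io

# Hypothetical marker names and thresholds, for illustration only.
rows = [
    {"Marker": "markerA", "Static": 10, "Dynamic": 0.05},
    {"Marker": "markerB", "Static": 4, "Dynamic": 0.01},
]


def write_config(handle, rows):
    """Write marker-specific thresholds using the three required columns."""
    writer = csv.DictWriter(handle, fieldnames=["Marker", "Static", "Dynamic"])
    writer.writeheader()
    writer.writerows(rows)


def read_config(handle):
    """Return {marker: (static, dynamic)} parsed from a config CSV."""
    overrides = {}
    for row in csv.DictReader(handle):
        overrides[row["Marker"]] = (int(row["Static"]), float(row["Dynamic"]))
    return overrides


# Round-trip through an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_config(buffer, rows)
buffer.seek(0)
overrides = read_config(buffer)
```

Markers that are absent from the file would presumably fall back to the global --static and --dynamic values.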

Haplotype calling

mhpl8r type

Perform haplotype calling

usage: mhpl8r type [-h] [-o FILE] [-b B] [-m M] tsv bam

Positional Arguments

tsv

path of a TSV file containing marker metadata, specifically the offset of each SNP for every marker in the panel

bam

path of a BAM file containing NGS reads aligned to marker reference sequences and sorted

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

-b, --base-qual

minimum base quality (PHRED score) to be considered reliable for haplotype calling; by default B=10, corresponding to Q10, i.e., 90% probability that the base call is correct

-m, --max-depth

maximum permitted read depth; by default M=1000000
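
The relationship between a PHRED score and base call accuracy mentioned for --base-qual follows the standard definition Q = -10 log10(P_error). A minimal sketch of that conversion (the function name is chosen here for illustration):

```python
def phred_to_accuracy(q):
    """Probability that a base call with PHRED score q is correct."""
    # Standard PHRED definition: error probability = 10^(-q/10).
    error_probability = 10 ** (-q / 10)
    return 1 - error_probability


# Accuracy at the default threshold B=10 and at two stricter thresholds.
accuracies = {q: phred_to_accuracy(q) for q in (10, 20, 30)}
```

Raising B from 10 to 20 therefore reduces the tolerated error probability per base from 10% to 1%.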

mhpl8r filter

Apply static and/or dynamic thresholds to distinguish true and false haplotypes. Thresholds are applied to the haplotype read counts of a raw typing result. Static integer thresholds are commonly used as detection thresholds, below which any haplotype count is considered noise. Dynamic thresholds are commonly used as analytical thresholds and represent a percentage of the total read count at the marker, after any haplotypes failing a static threshold are discarded.

usage: mhpl8r filter [-h] [-o FILE] [-s ST] [-d DT] [-c CSV] result

Positional Arguments

result

MicroHapulator typing result in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

-s, --static

global fixed read count threshold

-d, --dynamic

global percentage of total read count threshold; e.g. use --dynamic=0.02 to apply a 2% analytical threshold

-c, --config

CSV file specifying marker-specific thresholds to override global thresholds; three required columns: 'Marker' for the marker name; 'Static' and 'Dynamic' for marker-specific thresholds
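
The two-stage filtering described above can be sketched for a single marker as follows. This is an illustration under stated assumptions, not MicroHapulator's implementation: the dictionary of haplotype counts is a hypothetical structure, a count equal to a threshold is assumed to pass, and the default values are the ones documented for mhpl8r pipe.

```python
def apply_thresholds(counts, static=5, dynamic=0.02):
    """Filter one marker's haplotype read counts.

    counts: dict mapping haplotype -> read count (hypothetical structure).
    """
    # Step 1: drop haplotypes whose count falls below the static threshold.
    passed = {hap: n for hap, n in counts.items() if n >= static}
    # Step 2: compute the marker total from the surviving haplotypes only.
    total = sum(passed.values())
    # Step 3: drop haplotypes below the dynamic fraction of that total.
    return {hap: n for hap, n in passed.items() if n >= dynamic * total}


# Invented counts: two strong haplotypes plus two low-level ones.
counts = {"A,C,T": 480, "A,G,T": 510, "G,G,T": 12, "A,C,C": 3}
filtered = apply_thresholds(counts)
```

In this example the haplotype with 3 reads fails the static threshold, and the haplotype with 12 reads then fails the dynamic threshold because 2% of the remaining 1002 reads is about 20.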

Analysis, QA/QC, and interpretation

mhpl8r locbalance

Plot interlocus balance in the terminal and/or a high-resolution graphic. Also normalize read counts and perform a chi-square goodness-of-fit test assuming uniform read coverage across markers. The reported chi-square statistic measures the extent of imbalance, and can be compared among samples sequenced using the same panel: the minimum value of 0 represents perfectly uniform coverage, while the maximum value of D occurs when all reads map to a single marker (D represents the degrees of freedom, or the number of markers minus 1).

usage: mhpl8r locbalance [-h] [-c FILE] [-D] [-q] [--figure FILE]
                         [--figsize W H] [--dpi DPI] [-t T] [--color C]
                         input

Positional Arguments

input

a typing result including haplotype counts in JSON format

Named Arguments

-c, --csv

write read counts to FILE in CSV format

-D, --no-discarded

do not include mapped but discarded reads in read counts; by default, reads that are mapped to the marker but discarded because they do not span all variants at the marker are included

-q, --quiet

do not print interlocus balance histogram to standard output in ASCII

--figure

plot interlocus balance histogram to FILE using Matplotlib; image format is inferred from extension of provided file name

--figsize

dimensions (width × height in inches) of the image file to be generated; 6 4 by default

--dpi

resolution (in dots per inch) of the image file to be generated; DPI=200 by default

-t, --title

add a title (such as a sample name) to the histogram plot

--color

override histogram plot color; green by default
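
The bounds stated for the chi-square statistic (0 for uniform coverage, D when all reads map to one marker) can be reproduced with a short calculation. The sketch below assumes that "normalize" means dividing each marker's read count by the total, so that the observed values sum to 1 and each marker's expected value is 1/k for k markers; the function name is hypothetical.

```python
def interlocus_chi_square(read_counts):
    """Chi-square statistic on normalized counts, assuming uniform coverage."""
    total = sum(read_counts)
    k = len(read_counts)
    expected = 1 / k  # uniform share of reads expected at every marker
    return sum((n / total - expected) ** 2 / expected for n in read_counts)


# Four markers, so D = 3 degrees of freedom.
uniform = interlocus_chi_square([100, 100, 100, 100])
single_marker = interlocus_chi_square([400, 0, 0, 0])
```

Because the counts are normalized before the statistic is computed, the value does not grow with sequencing depth, which is what makes it comparable among samples sequenced with the same panel.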

mhpl8r hetbalance

Compute and plot heterozygote balance

usage: mhpl8r hetbalance [-h] [-c FILE] [--figure FILE] [--figsize W H]
                         [--dpi DPI] [-t T] [--labels] [--absolute]
                         input

Positional Arguments

input

a typing result including haplotype counts in JSON format

Named Arguments

-c, --csv

write read counts to FILE in CSV format

--figure

plot heterozygote balance bar graph to FILE using Matplotlib; image format is inferred from extension of provided file name

--figsize

dimensions (width × height in inches) of the image file to be generated; figure dimensions determined automatically by default

--dpi

resolution (in dots per inch) of the image file to be generated; DPI=200 by default

-t, --title

add a title (such as a sample name) to the bar graph

--labels

include labels showing marker names and read counts

--absolute

plot absolute rather than relative read counts

mhpl8r offtarget

Calculate off-target read mapping

usage: mhpl8r offtarget [-h] [-o FILE] [-b B] markerbam refbam tsv

Positional Arguments

markerbam

alignment file of reads aligned to marker sequences

refbam

alignment file in BAM format of reads aligned to hg38

tsv

marker definitions TSV file including chromosome and full reference genome offset columns

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

-b, --base-qual

minimum base quality (PHRED score) to be considered reliable for haplotype calling; by default B=10, corresponding to Q10, i.e., 90% probability that the base call is correct

mhpl8r contrib

Estimate the minimum number of DNA contributors to a suspected mixture

usage: mhpl8r contrib [-h] [-o FILE] result

Positional Arguments

result

typing result in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

mhpl8r prob

Compute a profile random match probability (RMP) or an RMP-based likelihood ratio (LR) test

usage: mhpl8r prob [-h] [-e ε] [-o FILE] freq profile1 [profile2]

Positional Arguments

freq

population haplotype frequencies in tabular (TSV) format

profile1

typing result or simulated genotype in JSON format

profile2

typing result or simulated genotype in JSON format; optional

Named Arguments

-e, --erate

rate of genotyping error; by default ε=0.01

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
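
For orientation, a textbook random match probability under Hardy-Weinberg assumptions multiplies per-marker genotype frequencies: p² for a homozygous genotype and 2pq for a heterozygous one. The sketch below shows only that product rule. It ignores the genotyping error rate ε, uses hypothetical in-memory structures rather than the TSV and JSON inputs, and should not be read as the calculation mhpl8r prob performs.

```python
def random_match_probability(genotype, frequencies):
    """Product-rule RMP for a diploid genotype (no error model).

    genotype: dict marker -> (haplotype1, haplotype2)
    frequencies: dict marker -> {haplotype: population frequency}
    """
    rmp = 1.0
    for marker, (hap1, hap2) in genotype.items():
        p = frequencies[marker][hap1]
        q = frequencies[marker][hap2]
        # Homozygous genotypes contribute p^2, heterozygous ones 2pq.
        rmp *= p * p if hap1 == hap2 else 2 * p * q
    return rmp


# Invented frequencies and genotype for two markers.
frequencies = {
    "m1": {"A,C": 0.5, "G,T": 0.5},
    "m2": {"T,T": 0.2, "C,T": 0.8},
}
genotype = {"m1": ("A,C", "G,T"), "m2": ("T,T", "T,T")}
rmp = random_match_probability(genotype, frequencies)
```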

mhpl8r diff

Compare two profiles and determine the markers at which their genotypes differ

usage: mhpl8r diff [-h] [-o FILE] profile1 profile2

Positional Arguments

profile1

typing result or simulated profile in JSON format

profile2

typing result or simulated profile in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

mhpl8r dist

Compute a simple Hamming distance between two profiles

usage: mhpl8r dist [-h] [-o FILE] profile1 profile2

Positional Arguments

profile1

typing result or simulated profile in JSON format

profile2

typing result or simulated profile in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)
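
One plausible reading of a Hamming distance between profiles is the number of markers at which the two genotypes differ. The sketch below takes that reading as an assumption and uses plain dictionaries as a stand-in for the JSON profiles; neither the function nor the data layout comes from MicroHapulator.

```python
def hamming_distance(profile1, profile2):
    """Count markers whose haplotype sets differ between two profiles.

    Each profile is a dict mapping marker -> iterable of haplotypes.
    """
    markers = set(profile1) | set(profile2)
    return sum(
        1
        for marker in markers
        # Compare as sets so that haplotype order does not matter.
        if set(profile1.get(marker, ())) != set(profile2.get(marker, ()))
    )


# Invented profiles: identical at m1 and m3, different at m2.
p1 = {"m1": ("A,C", "G,T"), "m2": ("T,T", "T,T"), "m3": ("C,C", "C,G")}
p2 = {"m1": ("G,T", "A,C"), "m2": ("T,T", "C,T"), "m3": ("C,C", "C,G")}
distance = hamming_distance(p1, p2)
```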

mhpl8r contain

Perform a simple containment test

usage: mhpl8r contain [-h] [-o FILE] profile1 profile2

Positional Arguments

profile1

simulated or inferred genotype profile in JSON format

profile2

simulated or inferred genotype profile in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

mhpl8r convert

Convert a typing result to a format compatible with probabilistic genotyping software applications

usage: mhpl8r convert [-h] [-o FILE] [--no-counts] [-f] result sample

Positional Arguments

result

filtered MicroHapulator typing result in JSON format

sample

sample name

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

--no-counts

do not include haplotype counts if you are interpreting your data with a semi-continuous probgen model such as LRMix Studio; by default, haplotype counts are included for interpretation with a fully continuous probgen model such as EuroForMix

-f, --fix-homo

duplicate a homozygous haplotype so that it is reported twice

mhpl8r getrefr

Download and index a GRCh38 assembly file suitable as a whole-genome mapping reference

usage: mhpl8r getrefr [-h]

Simulation

mhpl8r sim

Simulate a diploid genotype from the specified microhaplotype frequencies

usage: mhpl8r sim [-h] [-s INT] [-o FILE] [--haplo-seq FILE]
                  [--sequences FILE] [--markers FILE]
                  freq

Positional Arguments

freq

population microhaplotype frequencies in tabular (tab separated) format

Named Arguments

-s, --seed

seed for random number generator

-o, --out

write simulated profile data in JSON format to FILE

--haplo-seq

write simulated haplotype sequences in FASTA format to FILE

--sequences

microhaplotype sequences in FASTA format; required if --haplo-seq enabled, ignored if not

--markers

microhaplotype marker definitions in tabular (tab separated) format; required if --haplo-seq enabled, ignored if not
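
Conceptually, simulating a diploid genotype from population frequencies amounts to drawing two haplotypes per marker, each weighted by its frequency, with a seed making the draw reproducible. The following sketch illustrates that idea with Python's random module. The data structures are hypothetical, and the sampling scheme is an assumption rather than a description of what mhpl8r sim does internally.

```python
import random


def simulate_genotype(frequencies, seed=None):
    """Draw two haplotypes per marker, weighted by population frequency.

    frequencies: dict marker -> {haplotype: frequency}
    """
    rng = random.Random(seed)  # private generator, reproducible with a seed
    genotype = {}
    for marker, hap_freqs in frequencies.items():
        haplotypes = list(hap_freqs)
        weights = [hap_freqs[hap] for hap in haplotypes]
        # Sample with replacement so homozygous genotypes are possible.
        genotype[marker] = tuple(rng.choices(haplotypes, weights=weights, k=2))
    return genotype


# Invented frequencies; m1 has a single haplotype, so it is always homozygous.
frequencies = {
    "m1": {"A,C": 1.0},
    "m2": {"G,T": 0.3, "G,C": 0.7},
}
genotype = simulate_genotype(frequencies, seed=42)
```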

mhpl8r mix

Combine simulated profiles into a mock DNA mixture

usage: mhpl8r mix [-h] [-o FILE] profiles [profiles ...]

Positional Arguments

profiles

simulated genotype profiles in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

mhpl8r unite

Simulate the creation of a new profile from a mother and father

usage: mhpl8r unite [-h] [-o FILE] [-s INT] mom dad

Positional Arguments

mom

simulated or inferred genotype in JSON format

dad

simulated or inferred genotype in JSON format

Named Arguments

-o, --out

write output to FILE; by default, output is written to the terminal (standard output)

-s, --seed

seed for random number generator
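
The idea of creating a child profile from two parents can be illustrated by passing on one randomly chosen haplotype from each parent at every marker. The sketch below assumes that simple inheritance model and a hypothetical dictionary layout; it is not taken from the MicroHapulator source.

```python
import random


def unite(mom, dad, seed=None):
    """Build a child genotype from one maternal and one paternal haplotype.

    mom, dad: dict marker -> (haplotype1, haplotype2)
    """
    rng = random.Random(seed)  # seeded generator for reproducible offspring
    child = {}
    # Only markers present in both parents can be inherited here.
    for marker in sorted(set(mom) & set(dad)):
        child[marker] = (rng.choice(mom[marker]), rng.choice(dad[marker]))
    return child


# Invented parental genotypes at two markers.
mom = {"m1": ("A,C", "G,T"), "m2": ("T,T", "C,T")}
dad = {"m1": ("G,T", "G,T"), "m2": ("C,C", "T,T")}
child = unite(mom, dad, seed=7)
```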

mhpl8r seq

Simulate paired-end Illumina MiSeq sequencing of the given profile(s)

usage: mhpl8r seq [-h] [-o OUT [OUT ...]] [-n N] [-p P [P ...]]
                  [-s INT [INT ...]]
                  tsv refrseqs profiles [profiles ...]

Positional Arguments

tsv

microhaplotype marker definitions in tabular (TSV) format

refrseqs

microhaplotype reference sequences in FASTA format

profiles

one or more simple or complex profiles (JSON files)

Named Arguments

-o, --out

write simulated paired-end MiSeq reads in FASTQ format to the specified file(s); if one filename is provided, reads are interleaved and written to the file; if two filenames are provided, reads are written to paired files; by default, reads are interleaved and written to the terminal (standard output)

-n, --num-reads

number of reads to simulate; default is 500000

-p, --proportions

simulate mixture samples with multiple contributors at the specified proportions; by default, even proportions are used

-s, --seeds

seeds for random number generator, 1 per profile
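
To make the interaction between --num-reads and --proportions concrete, the sketch below splits a total read count among contributors, using even proportions when none are given. The normalization of the proportions and the handling of the rounding remainder are assumptions made for this example, and the function is hypothetical.

```python
def reads_per_contributor(num_reads, num_profiles, proportions=None):
    """Split num_reads among contributors according to their proportions."""
    if proportions is None:
        # Default: every contributor receives an even share.
        proportions = [1 / num_profiles] * num_profiles
    if len(proportions) != num_profiles:
        raise ValueError("expected one proportion per profile")
    total = sum(proportions)
    # Normalize so the proportions need not sum exactly to 1.
    counts = [int(num_reads * p / total) for p in proportions]
    # Assign any reads lost to truncation to the first contributor.
    counts[0] += num_reads - sum(counts)
    return counts


even_split = reads_per_contributor(500000, 2)
uneven_split = reads_per_contributor(1000, 2, proportions=[3, 1])
```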