Command Line Interface¶
This page shows the MicroHapulator command line interface: how inputs and settings are specified for each subcommand.
NOTE: The MicroHapulator CLI follows Semantic Versioning. In brief, every stable version of the MicroHapulator software is assigned a version number, and any change to the software's behavior or interface requires the version number to be updated in prescribed and predictable ways.
End-to-end analysis workflow¶
mhpl8r pipe¶
Perform a complete end-to-end microhap analysis pipeline
usage: mhpl8r pipe [-h] [-w D] [-n] [-t T] [--copy-input]
markerrefr markerdefn seqpath samples [samples ...]
Positional Arguments¶
- markerrefr
path to a FASTA file containing marker reference sequences
- markerdefn
path to a TSV file containing marker definitions
- seqpath
path to a directory containing FASTQ files
- samples
list of sample names or path to .txt file containing sample names
Named Arguments¶
- -w, --workdir
pipeline working directory; default is current directory
- -n, --dryrun
do not execute the workflow, but display what would have been done
- -t, --threads
process each batch using T threads; by default, one thread per available core is used
- --copy-input
copy input files to working directory; by default, input files are symlinked
Haplotype calling¶
mhpl8r type¶
Perform haplotype calling
usage: mhpl8r type [-h] [-o FILE] [-b B] [-m M] tsv bam
Positional Arguments¶
- tsv
path of a TSV file containing marker metadata, specifically the offset of each SNP for every marker in the panel
- bam
path of a BAM file containing NGS reads aligned to marker reference sequences and sorted
Named Arguments¶
- -o, --out
write output to FILE; by default, output is written to the terminal (standard output)
- -b, --base-qual
minimum base quality (PHRED score) to be considered reliable for haplotype calling; by default B=10, corresponding to Q10, i.e., 90% probability that the base call is correct
- -m, --max-depth
maximum permitted read depth; by default M=1000000
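The relationship between a PHRED score and base call accuracy behind the --base-qual default can be sketched in a few lines. This is a generic illustration of the standard PHRED definition, not MicroHapulator code; the function name phred_to_accuracy is hypothetical.

```python
def phred_to_accuracy(q):
    """Probability that a base call with PHRED score q is correct."""
    error_prob = 10 ** (-q / 10)  # standard PHRED definition: Q = -10 * log10(P_error)
    return 1.0 - error_prob


# Compare the default threshold (Q10) with two stricter choices.
for q in (10, 20, 30):
    print(f"Q{q}: {phred_to_accuracy(q):.3f}")
```

Raising B from 10 to 20 moves the per-base accuracy floor from 90% to 99%, at the cost of discarding more reads.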
mhpl8r filter¶
Apply static and/or dynamic thresholds to distinguish true and false haplotypes. Thresholds are applied to the haplotype read counts of a raw typing result. Static integer thresholds are commonly used as detection thresholds, below which any haplotype count is considered noise. Dynamic thresholds are commonly used as analytical thresholds and represent a percentage of the total read count at the marker, after any haplotypes failing a static threshold are discarded.
usage: mhpl8r filter [-h] [-o FILE] [-s ST] [-d DT] [-c FILE] result
Positional Arguments¶
- result
MicroHapulator typing result in JSON format
Named Arguments¶
- -o, --out
write output to FILE; by default, output is written to the terminal (standard output)
- -s, --static
global fixed read count threshold
- -d, --dynamic
global percentage of total read count; e.g. use --dynamic=0.02 to apply a 2% analytical threshold
- -c, --config
CSV file specifying marker-specific thresholds to override global thresholds; three required columns: 'Marker' for the marker name; 'Static' and 'Dynamic' for marker-specific thresholds
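The order in which the static and dynamic thresholds are applied can be sketched as follows. This is a minimal illustration, not MicroHapulator's implementation: the function names, the in-memory layout (a dict of haplotype to read count for one marker), the marker name markerA, and the choice of >= at the threshold boundary are all assumptions made for the example.

```python
import csv
import io


def apply_thresholds(counts, static=None, dynamic=None):
    """Filter haplotype read counts for a single marker.

    counts  -- dict mapping haplotype -> read count (hypothetical layout)
    static  -- fixed read count cutoff, or None to skip
    dynamic -- fraction of the marker's remaining reads, or None to skip
    """
    kept = dict(counts)
    if static is not None:
        # Detection threshold: counts below the fixed cutoff are treated as noise.
        kept = {hap: n for hap, n in kept.items() if n >= static}
    if dynamic is not None:
        # Analytical threshold: a fraction of the reads that survived the static filter.
        total = sum(kept.values())
        kept = {hap: n for hap, n in kept.items() if n >= dynamic * total}
    return kept


def load_overrides(config_text):
    """Parse a config CSV with 'Marker', 'Static', and 'Dynamic' columns."""
    reader = csv.DictReader(io.StringIO(config_text))
    return {row["Marker"]: (int(row["Static"]), float(row["Dynamic"])) for row in reader}


# Hypothetical marker name and read counts, for illustration only.
counts = {"A,C,T": 480, "A,C,G": 450, "G,C,T": 30, "A,T,T": 3}
overrides = load_overrides("Marker,Static,Dynamic\nmarkerA,5,0.05\n")
static, dynamic = overrides.get("markerA", (5, 0.02))  # marker-specific values win
print(apply_thresholds(counts, static=static, dynamic=dynamic))
```

In this example the static threshold removes the haplotype with 3 reads, leaving 960 reads at the marker; the 5% dynamic threshold (48 reads) then removes the haplotype with 30 reads. With the global 2% threshold instead, that haplotype would have been kept.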
Analysis, QA/QC, and interpretation¶
mhpl8r locbalance¶
Plot interlocus balance in the terminal and/or a high-resolution graphic. Also normalize read counts and perform a chi-square goodness-of-fit test assuming uniform read coverage across markers. The reported chi-square statistic measures the extent of imbalance, and can be compared among samples sequenced using the same panel: the minimum value of 0 represents perfectly uniform coverage, while the maximum value of D occurs when all reads map to a single marker (D represents the degrees of freedom, or the number of markers minus 1).
usage: mhpl8r locbalance [-h] [-c FILE] [-D] [-q] [--figure FILE]
[--figsize W H] [--dpi DPI] [-t T]
input
Positional Arguments¶
- input
a typing result including haplotype counts in JSON format
Named Arguments¶
- -c, --csv
write read counts to FILE in CSV format
- -D, --no-discarded
do not include mapped but discarded reads in read counts; by default, reads that are mapped to the marker but discarded because they do not span all variants at the marker are included
- -q, --quiet
do not print interlocus balance histogram to standard output in ASCII
- --figure
plot interlocus balance histogram to FILE using Matplotlib; image format is inferred from extension of provided file name
- --figsize
dimensions (width × height in inches) of the image file to be generated; 6 4 by default
- --dpi
resolution (in dots per inch) of the image file to be generated; DPI=200 by default
- -t, --title
add a title (such as a sample name) to the histogram plot
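The bounds of the chi-square statistic described for locbalance (0 for uniform coverage, D when all reads map to one marker) can be checked with a short sketch. It assumes that normalization means converting read counts to proportions that sum to 1, which is one normalization consistent with the stated bounds; the function name is hypothetical and this is not MicroHapulator's code.

```python
def normalized_chi_square(read_counts):
    """Chi-square goodness-of-fit statistic on read proportions.

    The expected proportion is uniform (1/k for k markers), so the
    statistic ranges from 0 up to k - 1, the degrees of freedom.
    """
    total = sum(read_counts)
    k = len(read_counts)
    expected = 1.0 / k
    proportions = [n / total for n in read_counts]
    return sum((p - expected) ** 2 / expected for p in proportions)


uniform = [100, 100, 100, 100]   # perfectly balanced coverage
skewed = [400, 0, 0, 0]          # all reads at a single marker
print(normalized_chi_square(uniform))
print(normalized_chi_square(skewed))
```

Because the statistic is computed on proportions rather than raw counts, it does not grow with sequencing depth, which is what makes it comparable among samples sequenced with the same panel.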
mhpl8r hetbalance¶
Compute and plot heterozygote balance
usage: mhpl8r hetbalance [-h] [-c FILE] [--figure FILE] [--figsize W H]
[--dpi DPI] [-t T] [--labels] [--absolute]
input
Positional Arguments¶
- input
a typing result including haplotype counts in JSON format
Named Arguments¶
- -c, --csv
write read counts to FILE in CSV format
- --figure
plot heterozygote balance bar graph to FILE using Matplotlib; image format is inferred from extension of provided file name
- --figsize
dimensions (width × height in inches) of the image file to be generated; figure dimensions determined automatically by default
- --dpi
resolution (in dots per inch) of the image file to be generated; DPI=200 by default
- -t, --title
add a title (such as a sample name) to the bar graph
- --labels
include labels showing marker names and read counts
- --absolute
plot absolute rather than relative read counts
mhpl8r contrib¶
Estimate the minimum number of DNA contributors to a suspected mixture
usage: mhpl8r contrib [-h] [-o FILE] result
Positional Arguments¶
- result
typing result in JSON format
Named Arguments¶
- -o, --out
write output to FILE; by default, output is written to the terminal (standard output)
mhpl8r prob¶
Compute a profile random match probability (RMP) or an RMP-based likelihood ratio (LR) test
usage: mhpl8r prob [-h] [-e ε] [-o FILE] freq profile1 [profile2]
Positional Arguments¶
- freq
population haplotype frequencies in tabular (TSV) format
- profile1
typing result or simulated genotype in JSON format
- profile2
typing result or simulated genotype in JSON format; optional
Named Arguments¶
- -e, --erate
rate of genotyping error; by default ε=0.01
- -o, --out
write output to FILE; by default, output is written to the terminal (standard output)
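For orientation, the textbook random match probability under Hardy-Weinberg equilibrium can be sketched as below. This is only the classical formula (p² for a homozygote, 2pq for a heterozygote, multiplied across independent markers); it does not model the genotyping error rate ε or the likelihood ratio test, and it is not claimed to match how mhpl8r prob computes its result. The data layout and names are hypothetical.

```python
def random_match_probability(genotype, frequencies):
    """Classical RMP assuming Hardy-Weinberg equilibrium and independent markers.

    genotype    -- dict: marker -> (haplotype1, haplotype2)
    frequencies -- dict: marker -> {haplotype: population frequency}
    """
    rmp = 1.0
    for marker, (hap1, hap2) in genotype.items():
        p = frequencies[marker][hap1]
        q = frequencies[marker][hap2]
        # Homozygote: p^2; heterozygote: 2pq.
        rmp *= p * p if hap1 == hap2 else 2 * p * q
    return rmp


# Hypothetical two-marker panel.
frequencies = {
    "m1": {"A,C": 0.5, "G,T": 0.5},
    "m2": {"T,T": 0.2, "C,T": 0.8},
}
genotype = {"m1": ("A,C", "G,T"), "m2": ("T,T", "T,T")}
print(random_match_probability(genotype, frequencies))
```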
mhpl8r diff¶
Compare two profiles and determine the markers at which their genotypes differ
usage: mhpl8r diff [-h] [-o FILE] profile1 profile2
Positional Arguments¶
- profile1
typing result or simulated profile in JSON format
- profile2
typing result or simulated profile in JSON format
Named Arguments¶
- -o, --out
write output to FILE; by default, output is written to the terminal (standard output)
mhpl8r dist¶
Compute a simple Hamming distance between two profiles
usage: mhpl8r dist [-h] [-o FILE] profile1 profile2
Positional Arguments¶
- profile1
typing result or simulated profile in JSON format
- profile2
typing result or simulated profile in JSON format
Named Arguments¶
- -o, --out
write output to FILE; by default, output is written to the terminal (standard output)
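The idea behind the diff and dist subcommands can be illustrated with a small sketch that lists the markers where two profiles disagree and counts them. It assumes a simplified in-memory profile (a dict mapping marker name to a set of haplotypes) and assumes that the Hamming distance is the number of markers whose haplotype sets differ; neither the layout nor the function names come from MicroHapulator.

```python
def differing_markers(profile1, profile2):
    """Return the sorted markers at which two profiles have different genotypes."""
    markers = set(profile1) | set(profile2)
    return sorted(
        m for m in markers
        if profile1.get(m, set()) != profile2.get(m, set())
    )


def hamming_distance(profile1, profile2):
    """Number of markers at which the two profiles differ."""
    return len(differing_markers(profile1, profile2))


# Hypothetical profiles over three markers.
p1 = {"m1": {"A,C", "G,T"}, "m2": {"T,T"}, "m3": {"C,A", "C,G"}}
p2 = {"m1": {"A,C", "G,T"}, "m2": {"T,T", "C,T"}, "m3": {"C,A"}}
print(differing_markers(p1, p2))
print(hamming_distance(p1, p2))
```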
mhpl8r contain¶
Perform a simple containment test
usage: mhpl8r contain [-h] [-o FILE] profile1 profile2
Positional Arguments¶
- profile1
simulated or inferred genotype profile in JSON format
- profile2
simulated or inferred genotype profile in JSON format
Named Arguments¶
- -o, --out
write output to FILE; by default, output is written to the terminal (standard output)
mhpl8r convert¶
Convert a typing result to a format compatible with probabilistic genotyping software applications
usage: mhpl8r convert [-h] [-o FILE] [--no-counts] [-f] result sample
Positional Arguments¶
- result
filtered MicroHapulator typing result in JSON format
- sample
sample name
Named Arguments¶
- -o, --out
write output to FILE; by default, output is written to the terminal (standard output)
- --no-counts
do not include haplotype counts; use this when interpreting your data with a semi-continuous probgen model such as LRMix Studio; by default, haplotype counts are included for interpretation with a fully continuous probgen model such as EuroForMix
- -f, --fix-homo
duplicate a homozygous haplotype so that it is reported twice
mhpl8r getrefr¶
Download and index a GRCh38 assembly file suitable as a whole-genome mapping reference
usage: mhpl8r getrefr [-h]
Simulation¶
mhpl8r sim¶
Simulate a diploid genotype from the specified microhaplotype frequencies
usage: mhpl8r sim [-h] [-s INT] [-o FILE] [--haplo-seq FILE]
[--sequences FILE] [--markers FILE]
freq
Positional Arguments¶
- freq
population microhaplotype frequencies in tabular (tab-separated) format
Named Arguments¶
- -s, --seed
seed for random number generator
- -o, --out
write simulated profile data in JSON format to FILE
- --haplo-seq
write simulated haplotype sequences in FASTA format to FILE
- --sequences
microhaplotype sequences in FASTA format; required if --haplo-seq enabled, ignored if not
- --markers
microhaplotype marker definitions in tabular (tab-separated) format; required if --haplo-seq enabled, ignored if not
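Simulating a diploid genotype from population frequencies amounts to drawing two haplotypes per marker, each weighted by its frequency. The sketch below shows that idea with Python's standard library, including the role of a seed in making the draw reproducible. The frequency layout and function name are hypothetical and the sampling details in mhpl8r sim may differ.

```python
import random


def simulate_genotype(frequencies, seed=None):
    """Draw two haplotypes per marker, weighted by population frequency.

    frequencies -- dict: marker -> {haplotype: frequency} (hypothetical layout)
    seed        -- optional integer for a reproducible draw
    """
    rng = random.Random(seed)
    genotype = {}
    for marker, hap_freqs in frequencies.items():
        haplotypes = list(hap_freqs)
        weights = [hap_freqs[h] for h in haplotypes]
        # Sample the maternal and paternal haplotypes independently.
        genotype[marker] = tuple(rng.choices(haplotypes, weights=weights, k=2))
    return genotype


# Hypothetical frequencies for two markers.
frequencies = {
    "m1": {"A,C": 0.6, "G,T": 0.3, "G,C": 0.1},
    "m2": {"T,T": 0.5, "C,T": 0.5},
}
print(simulate_genotype(frequencies, seed=42))
```

Passing the same seed twice yields the same genotype, which is the behavior one would expect from a --seed style option.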
mhpl8r mix¶
Combine simulated profiles into a mock DNA mixture
usage: mhpl8r mix [-h] [-o FILE] profiles [profiles ...]
Positional Arguments¶
- profiles
simulated genotype profiles in JSON format
Named Arguments¶
- -o, --out
write output to FILE; by default, output is written to the terminal (standard output)
mhpl8r unite¶
Simulate the creation of a new profile from a mother and father
usage: mhpl8r unite [-h] [-o FILE] [-s INT] mom dad
Positional Arguments¶
- mom
simulated or inferred genotype in JSON format
- dad
simulated or inferred genotype in JSON format
Named Arguments¶
- -o, --out
write output to FILE; by default, output is written to the terminal (standard output)
- -s, --seed
seed for random number generator
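Creating a child profile from two parents can be pictured as simple Mendelian inheritance: at each marker the child receives one randomly chosen haplotype from each parent. The sketch below illustrates this under assumed data structures (a dict of marker to a pair of haplotypes); the function name and layout are hypothetical and do not describe MicroHapulator's internals.

```python
import random


def unite(mom, dad, seed=None):
    """Build a child genotype by drawing one haplotype per parent per marker.

    mom, dad -- dict: marker -> (haplotype1, haplotype2) (hypothetical layout)
    seed     -- optional integer for a reproducible draw
    """
    rng = random.Random(seed)
    child = {}
    # Only markers present in both parents can be inherited from both.
    for marker in sorted(set(mom) & set(dad)):
        child[marker] = (rng.choice(mom[marker]), rng.choice(dad[marker]))
    return child


# Hypothetical parental genotypes.
mom = {"m1": ("A,C", "G,T"), "m2": ("T,T", "T,T")}
dad = {"m1": ("G,C", "G,C"), "m2": ("C,T", "T,T")}
print(unite(mom, dad, seed=7))
```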
mhpl8r seq¶
Simulate paired-end Illumina MiSeq sequencing of the given profile(s)
usage: mhpl8r seq [-h] [-o OUT [OUT ...]] [-n N] [-p P [P ...]]
[-s INT [INT ...]]
tsv refrseqs profiles [profiles ...]
Positional Arguments¶
- tsv
microhaplotype marker definitions in tabular (TSV) format
- refrseqs
microhaplotype reference sequences in FASTA format
- profiles
one or more simple or complex profiles (JSON files)
Named Arguments¶
- -o, --out
write simulated paired-end MiSeq reads in FASTQ format to the specified file(s); if one filename is provided, reads are interleaved and written to the file; if two filenames are provided, reads are written to paired files; by default, reads are interleaved and written to the terminal (standard output)
- -n, --num-reads
number of reads to simulate; default is 500000
- -p, --proportions
simulate mixture samples with multiple contributors at the specified proportions; by default, even proportions are used
- -s, --seeds
seeds for random number generator, 1 per profile
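How a total read count might be divided among contributors according to --proportions can be sketched as follows. This is an assumption about the arithmetic, not a description of what mhpl8r seq does internally: the function name is hypothetical, proportions are rescaled to sum to 1, and any rounding remainder is assigned to the first contributor.

```python
def reads_per_contributor(num_reads, num_profiles, proportions=None):
    """Split num_reads among contributors; even split when no proportions given."""
    if proportions is None:
        proportions = [1.0 / num_profiles] * num_profiles
    if len(proportions) != num_profiles:
        raise ValueError("need exactly one proportion per profile")
    total = sum(proportions)
    # Rescale so the proportions sum to 1, then truncate to whole reads.
    counts = [int(num_reads * p / total) for p in proportions]
    # Give any reads lost to truncation to the first contributor.
    counts[0] += num_reads - sum(counts)
    return counts


print(reads_per_contributor(500000, 2))                   # default read count, even split
print(reads_per_contributor(500000, 3, [0.7, 0.2, 0.1]))  # uneven three-person mixture
```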