Running Jaeger
This guide covers the main prediction workflow, output interpretation, prophage extraction, and available utility commands.
For inference speedups and model quantization, see the Performance optimizations page.
Basic prediction
The simplest way to run Jaeger:
jaeger predict -i contigs.fasta -o output_dir
Run on CPU explicitly:
jaeger predict -i contigs.fasta -o output_dir --cpu
Run with Apptainer/Singularity:
apptainer run --nv jaeger.sif jaeger predict -i contigs.fasta -o output_dir
Common options
| Option | Description | Default |
|---|---|---|
| -i, --input | Input FASTA file (required) | — |
| -o, --output | Output directory (required) | — |
| --batch | Parallel batch size | 96 |
| --cpu | Force CPU execution | False |
| --workers | Number of CPU threads | 4 |
| --fsize | Sliding window length | 2000 |
| --stride | Step size between windows | 2000 |
| -f, --overwrite | Overwrite existing output | False |
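To make the relationship between window length and step size concrete, the sketch below enumerates window start positions for a contig. It is only an illustration of the general sliding-window idea: the function name `window_starts` is hypothetical, and how Jaeger itself treats contigs shorter than one window or a trailing partial window is an assumption here, not something this guide states.

```python
def window_starts(seq_len: int, fsize: int = 2000, stride: int = 2000) -> list:
    """Start offsets of every full window of length `fsize`, taken every `stride` bases."""
    if fsize <= 0 or stride <= 0:
        raise ValueError("fsize and stride must be positive")
    if seq_len < fsize:
        # Assumption: a sequence shorter than one window yields no windows.
        return []
    # Assumption: a trailing partial window is dropped.
    return list(range(0, seq_len - fsize + 1, stride))


# With the defaults (stride == fsize) the windows do not overlap.
print(len(window_starts(10_000)))               # 5
# A smaller stride produces overlapping windows and more of them.
print(len(window_starts(10_000, stride=1000)))  # 9
```

Halving the stride roughly doubles the number of windows, so the amount of work per contig grows accordingly.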
Choosing a model
Jaeger ships with a default model. Additional models can be downloaded and selected via the -m flag.
# List all available models
jaeger download --list
# Download a specific model
jaeger download --model_name jaeger_57341_1.5M_fragment --path ~/jaeger_models
# Use the downloaded model
jaeger predict -i contigs.fasta -o output_dir -m jaeger_57341_1.5M_fragment
To register a custom model path:
jaeger register-models --path ~/my_custom_models
Batch size and memory
The --batch option controls how many sequences are processed in parallel. If you encounter out-of-memory (OOM) errors, reduce this value.
| GPU memory | Suggested --batch |
|---|---|
| 4 GB | 32–64 |
| 8 GB | 64–96 |
| 16 GB | 96–128 |
| 24+ GB | 128–256 |
You can also limit GPU memory allocation:
jaeger predict -i contigs.fasta -o output_dir --mem 8
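If you script your runs, the table above can be encoded as a small lookup. The following is a minimal sketch: the helper names `suggest_batch_range` and `next_batch_after_oom` are hypothetical, and halving the batch after an out-of-memory error is just one reasonable way to "reduce this value", not a rule Jaeger prescribes.

```python
# (minimum GPU memory in GB, suggested --batch range), copied from the table above.
BATCH_TIERS = [
    (24, (128, 256)),
    (16, (96, 128)),
    (8, (64, 96)),
    (4, (32, 64)),
]


def suggest_batch_range(gpu_gb: float) -> tuple:
    """Return the suggested (low, high) --batch range for a GPU with `gpu_gb` of memory."""
    for min_gb, batch_range in BATCH_TIERS:
        if gpu_gb >= min_gb:
            return batch_range
    # The table gives no guidance below 4 GB, so refuse rather than guess.
    raise ValueError("no suggested batch size for GPUs with less than 4 GB")


def next_batch_after_oom(batch: int) -> int:
    """Assumed retry policy: halve the batch size, never going below 1."""
    return max(1, batch // 2)


low, high = suggest_batch_range(12)
print(low, high)                   # 64 96
print(next_batch_after_oom(high))  # 48
```

Memory sizes that fall between two rows are mapped to the lower row, which errs on the side of a smaller batch.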
Output files
After running Jaeger, the output directory contains:
output_dir/
├── <input>_jaeger.tsv # Main predictions table
├── <input>_phages_jaeger.tsv # Subset of phage-only predictions
└── <input>_jaeger.log # Runtime log
When prophage extraction is enabled (-p):
output_dir/
├── <input>_jaeger.tsv
├── <input>_phages_jaeger.tsv
├── <input>_jaeger.log
└── <sample_name>_prophages/
├── prophages_jaeger.tsv # Prophage coordinates
└── plots/
└── *.pdf # Genome visualizations (circular and/or linear)
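When chaining Jaeger into a pipeline it can help to compute the expected output paths up front. The sketch below assumes that `<input>` in the listings above stands for the input file name without its extension; that reading is a guess, so check it against a real run. The function name `expected_outputs` is hypothetical.

```python
from pathlib import Path


def expected_outputs(input_fasta: str, output_dir: str) -> dict:
    """Map a short label to each file the default run is documented to produce."""
    # Assumption: <input> is the input file name with its final extension removed.
    stem = Path(input_fasta).stem
    out = Path(output_dir)
    return {
        "predictions": out / f"{stem}_jaeger.tsv",
        "phages": out / f"{stem}_phages_jaeger.tsv",
        "log": out / f"{stem}_jaeger.log",
    }


for label, path in expected_outputs("contigs.fasta", "output_dir").items():
    print(label, path)
```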
Understanding the output table
The main TSV file (<input>_jaeger.tsv) contains one row per contig:
| Column | Description | Interpretation |
|---|---|---|
|  | FASTA header | — |
|  | Sequence length | ≥ |
| prediction | Final call |  |
| entropy | Softmax entropy | 0 (confident) → 2 (uncertain) |
| reliability_score | Model confidence | 0 (uncertain) → 1 (confident) |
|  | Potential host contamination |  |
| prophage_contam | Potential prophage contamination |  |
|  | GC content | 0–1 |
|  | Fraction of N bases | 0–1 |
|  | Secondary prediction | e.g., |
|  | Windows classified as bacteria | — |
|  | Mean logit for bacteria windows | — |
|  | Variance of bacteria logits | — |
|  | Windows classified as phage | — |
|  | Mean logit for phage windows | — |
|  | Variance of phage logits | — |
|  | Windows classified as eukarya | — |
|  | Mean logit for eukarya windows | — |
|  | Variance of eukarya logits | — |
|  | Windows classified as archaea | — |
|  | Mean logit for archaea windows | — |
|  | Variance of archaea logits | — |
|  | Pattern of V (phage) / n (non-phage) windows | e.g., |
|  | Detected terminal repeat type |  |
|  | Length of terminal repeat | bp |
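The 0 → 2 range of the entropy column matches the Shannon entropy, in bits, of a probability distribution over four classes (bacteria, phage, eukarya, archaea), since log2(4) = 2. The sketch below illustrates that relationship. It assumes a base-2 logarithm and a four-way softmax; both are inferred from the stated range rather than confirmed by this guide, and the helper names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Four equally likely classes: maximum uncertainty.
uniform = entropy_bits(softmax([0.0, 0.0, 0.0, 0.0]))
# One class dominates: entropy approaches zero.
confident = entropy_bits(softmax([20.0, 0.0, 0.0, 0.0]))

print(uniform)           # 2.0
print(confident < 1e-3)  # True
```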
Filtering tips
- High-confidence phages: prediction == "phage" and reliability_score > 0.5
- Uncertain calls: entropy > 1.0 or reliability_score < 0.3
- Potential prophages: prophage_contam == True
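These filters can be applied to the predictions table with the standard library alone. The sketch below is a starting point under a few assumptions: the first column holds the contig identifier, boolean values are written as the text `True`/`False`, and the demo header name `contig_id` is made up for illustration. The function names are hypothetical.

```python
import csv
import io


def classify_row(row: dict) -> list:
    """Return the filter labels from the tips above that apply to one TSV row."""
    labels = []
    score = float(row["reliability_score"])
    entropy = float(row["entropy"])
    if row["prediction"] == "phage" and score > 0.5:
        labels.append("high_confidence_phage")
    if entropy > 1.0 or score < 0.3:
        labels.append("uncertain")
    # Assumption: booleans are serialized as the text "True" / "False".
    if row["prophage_contam"].strip().lower() == "true":
        labels.append("potential_prophage")
    return labels


def tag_predictions(handle) -> dict:
    """Map each contig (first column) to its list of labels."""
    reader = csv.DictReader(handle, delimiter="\t")
    id_col = reader.fieldnames[0]  # assumption: first column is the contig identifier
    return {row[id_col]: classify_row(row) for row in reader}


# Tiny in-memory table; "contig_id" is a made-up header for this demo.
demo = (
    "contig_id\tprediction\tentropy\treliability_score\tprophage_contam\n"
    "c1\tphage\t0.10\t0.90\tFalse\n"
    "c2\tbacteria\t1.40\t0.20\tTrue\n"
)
tags = tag_predictions(io.StringIO(demo))
print(tags)
```

For a real run, pass an open file handle for the `_jaeger.tsv` file instead of the in-memory demo.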
Prophage extraction
Enable prophage detection with the -p flag:
jaeger predict -p -i genome.fna -o output_dir
Tune sensitivity with -s (0–4, higher = more sensitive):
jaeger predict -p -i genome.fna -o output_dir -s 2.0
Set minimum contig length for prophage scanning:
jaeger predict -p -i genome.fna -o output_dir --lc 100000
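Before choosing a value for --lc, it can be useful to see which contigs in your input meet a given length cutoff. The sketch below reads a FASTA stream with the standard library and lists contigs at or above the threshold. Whether Jaeger compares with "greater than" or "at least" is an assumption here, and the function names are hypothetical.

```python
import io


def contig_lengths(handle) -> dict:
    """Return {contig_name: sequence_length} for a FASTA stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def eligible_for_prophage_scan(lengths: dict, min_len: int = 500_000) -> list:
    """Contigs whose length is at least `min_len` (assumed inclusive comparison)."""
    return [name for name, n in lengths.items() if n >= min_len]


demo = ">chr1 example chromosome\nACGT\nACGT\n>plasmid\nAC\n"
lengths = contig_lengths(io.StringIO(demo))
print(lengths)                                         # {'chr1': 8, 'plasmid': 2}
print(eligible_for_prophage_scan(lengths, min_len=5))  # ['chr1']
```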
The plots/ directory contains genome visualizations with prophage regions highlighted. By default, circular plots are generated. Use --plot-type to select linear plots, both, or none.
Plot types
| --plot-type | Description |
|---|---|
| circular | Circular genome map (default) |
| linear | Horizontal 4-panel linear plot |
| both | Generate both circular and linear plots |
| none | Skip plot generation (report only) |
# Generate linear plots instead of circular
jaeger predict -p -i genome.fna -o output_dir --plot-type linear
# Generate both circular and linear plots
jaeger predict -p -i genome.fna -o output_dir --plot-type both
# Skip plots, generate TSV report only
jaeger predict -p -i genome.fna -o output_dir --plot-type none
Command-line reference
Main commands
jaeger [COMMAND] [OPTIONS]
| Command | Purpose |
|---|---|
| predict | Run phage detection on a FASTA file |
|  | Check installation and hardware |
| download | Download pre-trained models |
| register-models | Register custom model directories |
|  | Train new models from scratch |
| utils | Auxiliary tools (see below) |
|  | Experimental taxonomy prediction |
jaeger predict full help
Usage: jaeger predict [OPTIONS]
Runs Jaeger on a dataset
Options:
-i, --input PATH Path to input file [required]
-o, --output TEXT Path to output directory [required]
--fsize INTEGER Length of the sliding window [default: 2000]
--stride INTEGER The gap between two the sliding windows
(stride==fsize) [default: 2000]
-m, --model TEXT Select a deep-learning model to use. [default:
default]
--model_path TEXT Give the path to a model. overrides --model
--config PATH Path to Jaeger config file (e.g., when using
Apptainer or Docker)
-p, --prophage Extract and report prophage-like regions
-s, --sensitivity FLOAT Sensitivity of the prophage extraction algorithm
(0-4) [default: 1.5]
--lc INTEGER Minimum contig length for prophage extraction
[default: 500000]
--plot-type [circular|linear|both|none]
Prophage plot type [default: circular]
--rc FLOAT Minimum reliability score required to accept
predictions [default: 0.1]
--pc INTEGER Minimum phage score required to accept predictions
[default: 3]
--batch INTEGER Parallel batch size, lower if GPU runs out of
memory [default: 96]
--workers INTEGER Number of threads to use [default: 4]
--getalllogits Writes window-wise scores to a .npy file
--getsequences Writes the putative phage sequences to a .fasta
file
--cpu Ignore available GPUs and explicitly run on CPU
--physicalid INTEGER Set default GPU device ID for multi-GPU systems
[default: 0]
--mem INTEGER GPU memory limit [default: 4]
--getalllabels Get predicted labels for Non-Viral contigs
-v, --verbose Verbosity level: -vv debug, -v info [default: 1]
-f, --overwrite Overwrite existing files
--help Show this message and exit.
Utility commands (jaeger utils)
| Subcommand | Purpose |
|---|---|
|  | Generate non-redundant fragment databases for training |
|  | Simulate metagenome assemblies from genomes |
|  | Convert between CSV and FASTA formats |
|  | Mask or mutate positions in FASTA files |
|  | Generate out-of-distribution (shuffled) sequences |
|  | Combine multiple models into an ensemble |
|  | Calculate statistics from Jaeger output |
Python integration
Note: The Python API (jaeger.api) is currently experimental and not available in the latest release. For programmatic access, use the CLI via subprocess:
import subprocess

subprocess.run([
    "jaeger", "predict",
    "-i", "input.fasta",
    "-o", "output_dir",
    "--batch", "128",
])
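If you call Jaeger from several places, a thin wrapper around the subprocess call keeps the argument list in one spot. The sketch below is one possible shape, not part of Jaeger: `build_predict_cmd` and `run_predict` are hypothetical names, and only flags listed in the help output above are used.

```python
import subprocess


def build_predict_cmd(input_fasta, output_dir, *, batch=None, cpu=False,
                      prophage=False, overwrite=False):
    """Assemble the argument list for `jaeger predict` from keyword options."""
    cmd = ["jaeger", "predict", "-i", str(input_fasta), "-o", str(output_dir)]
    if batch is not None:
        cmd += ["--batch", str(batch)]
    if cpu:
        cmd.append("--cpu")
    if prophage:
        cmd.append("-p")
    if overwrite:
        cmd.append("-f")
    return cmd


def run_predict(input_fasta, output_dir, **options):
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_predict_cmd(input_fasta, output_dir, **options),
                          check=True)


# Inspect the command without executing it.
print(build_predict_cmd("input.fasta", "output_dir", batch=128, prophage=True))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces. `check=True` is a design choice in this sketch so that a failed run surfaces as an exception instead of being silently ignored.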