# Running Jaeger

This guide covers the main prediction workflow, output interpretation, prophage extraction, and available utility commands.

---

## Table of contents

- [Basic prediction](#basic-prediction)
- [Choosing a model](#choosing-a-model)
- [Batch size and memory](#batch-size-and-memory)
- [Output files](#output-files)
- [Understanding the output table](#understanding-the-output-table)
- [Prophage extraction](#prophage-extraction)
- [Command-line reference](#command-line-reference)
- [Python integration](#python-integration)

---

## Basic prediction

The simplest way to run Jaeger:

```bash
jaeger predict -i contigs.fasta -o output_dir
```

Run on CPU explicitly:

```bash
jaeger predict -i contigs.fasta -o output_dir --cpu
```

Run with Apptainer/Singularity:

```bash
apptainer run --nv jaeger.sif jaeger predict -i contigs.fasta -o output_dir
```

### Common options

| Option | Description | Default |
|--------|-------------|---------|
| `-i, --input` | Input FASTA file (required) | — |
| `-o, --output` | Output directory (required) | — |
| `--batch` | Parallel batch size | 96 |
| `--cpu` | Force CPU execution | False |
| `--workers` | Number of CPU threads | 4 |
| `--fsize` | Sliding window length | 2000 |
| `--stride` | Step size between windows | 2000 |
| `-f, --overwrite` | Overwrite existing output | False |

---

## Choosing a model

Jaeger ships with a `default` model. Additional models can be downloaded and selected via the `-m` flag.

```bash
# List all available models
jaeger download --list

# Download a specific model
jaeger download --model_name jaeger_57341_1.5M_fragment --path ~/jaeger_models

# Use the downloaded model
jaeger predict -i contigs.fasta -o output_dir -m jaeger_57341_1.5M_fragment
```

To register a custom model path:

```bash
jaeger register-models --path ~/my_custom_models
```

---

## Batch size and memory

The `--batch` option controls how many sequences are processed in parallel. If you encounter out-of-memory (OOM) errors, reduce this value.

| GPU memory | Suggested `--batch` |
|------------|---------------------|
| 4 GB | 32–64 |
| 8 GB | 64–96 |
| 16 GB | 96–128 |
| 24+ GB | 128–256 |

You can also limit GPU memory allocation:

```bash
jaeger predict -i contigs.fasta -o output_dir --mem 8
```

---

## Output files

After running Jaeger, the output directory contains:

```
output_dir/
├── _jaeger.tsv            # Main predictions table
├── _phages_jaeger.tsv     # Subset of phage-only predictions
└── _jaeger.log            # Runtime log
```

When prophage extraction is enabled (`-p`):

```
output_dir/
├── _jaeger.tsv
├── _phages_jaeger.tsv
├── _jaeger.log
└── _prophages/
    ├── prophages_jaeger.tsv   # Prophage coordinates
    └── plots/
        └── *.pdf              # Circular genome visualizations
```

---

## Understanding the output table

The main TSV file (`_jaeger.tsv`) contains one row per contig:

| Column | Description | Interpretation |
|--------|-------------|----------------|
| `contig_id` | FASTA header | — |
| `length` | Sequence length | ≥ `fsize` |
| `prediction` | Final call | `phage` or `bacteria` |
| `entropy` | Softmax entropy | 0 (confident) → 2 (uncertain) |
| `reliability_score` | Model confidence | 0 (uncertain) → 1 (confident) |
| `host_contam` | Potential host contamination | `True` / `False` |
| `prophage_contam` | Potential prophage contamination | `True` / `False` |
| `G+C` | GC content | 0–1 |
| `N%` | Fraction of N bases | 0–1 |
| `prediction_2` | Secondary prediction | e.g., `bacteria` |
| `#_bacteria_windows` | Windows classified as bacteria | — |
| `bacteria_score` | Mean logit for bacteria windows | — |
| `bacteria_var` | Variance of bacteria logits | — |
| `#_phage_windows` | Windows classified as phage | — |
| `phage_score` | Mean logit for phage windows | — |
| `phage_var` | Variance of phage logits | — |
| `#_eukarya_windows` | Windows classified as eukarya | — |
| `eukarya_score` | Mean logit for eukarya windows | — |
| `eukarya_var` | Variance of eukarya logits | — |
| `#_archaea_windows` | Windows classified as archaea | — |
| `archaea_score` | Mean logit for archaea windows | — |
| `archaea_var` | Variance of archaea logits | — |
| `window_summary` | Pattern of V (phage) / n (non-phage) windows | e.g., `5V1n2V` |
| `terminal_repeats` | Detected terminal repeat type | `DTR`, `ITR`, or `null` |
| `repeat_length` | Length of terminal repeat | bp |

### Filtering tips

- High-confidence phages: `prediction == "phage"` and `reliability_score > 0.5`
- Uncertain calls: `entropy > 1.0` or `reliability_score < 0.3`
- Potential prophages: `prophage_contam == True`

---

## Prophage extraction

Enable prophage detection with the `-p` flag:

```bash
jaeger predict -p -i genome.fna -o output_dir
```

Tune sensitivity with `-s` (0–4, higher = more sensitive):

```bash
jaeger predict -p -i genome.fna -o output_dir -s 2.0
```

Set minimum contig length for prophage scanning:

```bash
jaeger predict -p -i genome.fna -o output_dir --lc 100000
```

The `plots/` directory contains circular genome visualizations with prophage regions highlighted.

---

## Command-line reference

### Main commands

```
jaeger [COMMAND] [OPTIONS]
```

| Command | Purpose |
|---------|---------|
| `jaeger predict` | Run phage detection on a FASTA file |
| `jaeger health` | Check installation and hardware |
| `jaeger download` | Download pre-trained models |
| `jaeger register-models` | Register custom model directories |
| `jaeger train` | Train new models from scratch |
| `jaeger utils` | Auxiliary tools (see below) |
| `jaeger taxonomy` | Experimental taxonomy prediction |

### `jaeger predict` full help

````
Usage: jaeger predict [OPTIONS]

  Runs Jaeger on a dataset

Options:
  -i, --input PATH         Path to input file  [required]
  -o, --output TEXT        Path to output directory  [required]
  --fsize INTEGER          Length of the sliding window  [default: 2000]
  --stride INTEGER         The gap between two the sliding windows (stride==fsize)  [default: 2000]
  -m, --model TEXT         Select a deep-learning model to use.  [default: default]
  --model_path TEXT        Give the path to a model. overrides --model
  --config PATH            Path to Jaeger config file (e.g., when using Apptainer or Docker)
  -p, --prophage           Extract and report prophage-like regions
  -s, --sensitivity FLOAT  Sensitivity of the prophage extraction algorithm (0-4)  [default: 1.5]
  --lc INTEGER             Minimum contig length for prophage extraction  [default: 500000]
  --rc FLOAT               Minimum reliability score required to accept predictions  [default: 0.1]
  --pc INTEGER             Minimum phage score required to accept predictions  [default: 3]
  --batch INTEGER          Parallel batch size, lower if GPU runs out of memory  [default: 96]
  --workers INTEGER        Number of threads to use  [default: 4]
  --getalllogits           Writes window-wise scores to a .npy file
  --getsequences           Writes the putative phage sequences to a .fasta file
  --cpu                    Ignore available GPUs and explicitly run on CPU
  --physicalid INTEGER     Set default GPU device ID for multi-GPU systems  [default: 0]
  --mem INTEGER            GPU memory limit  [default: 4]
  --getalllabels           Get predicted labels for Non-Viral contigs
  -v, --verbose            Verbosity level: -vv debug, -v info  [default: 1]
  -f, --overwrite          Overwrite existing files
  --help                   Show this message and exit.
````

### Utility commands (`jaeger utils`)

| Subcommand | Purpose |
|------------|---------|
| `jaeger utils dataset` | Generate non-redundant fragment databases for training |
| `jaeger utils fragment` | Simulate metagenome assemblies from genomes |
| `jaeger utils convert` | Convert between CSV and FASTA formats |
| `jaeger utils mask` | Mask or mutate positions in FASTA files |
| `jaeger utils ood-data` | Generate out-of-distribution (shuffled) sequences |
| `jaeger utils combine-models` | Combine multiple models into an ensemble |
| `jaeger utils stats` | Calculate statistics from Jaeger output |

---

## Python integration

> **Note:** The Python API (`jaeger.api`) is currently experimental and not available in the latest release. For programmatic access, use the CLI via `subprocess`:
>
> ```python
> import subprocess
> subprocess.run([
>     "jaeger", "predict",
>     "-i", "input.fasta",
>     "-o", "output_dir",
>     "--batch", "128"
> ])
> ```
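
Once a run has finished, the filtering tips listed earlier can be applied to the main predictions table from Python. The sketch below is one possible way to do it using only the standard library. It is not part of Jaeger: the helper names (`run_jaeger`, `high_confidence_phages`, `uncertain_calls`) are hypothetical, and the code assumes the TSV starts with a header row whose column names match the output table described in this guide.

```python
import csv
import subprocess


def run_jaeger(fasta, out_dir, batch=96):
    """Call `jaeger predict` with flags documented in this guide."""
    cmd = ["jaeger", "predict", "-i", str(fasta), "-o", str(out_dir),
           "--batch", str(batch)]
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(cmd, check=True)


def high_confidence_phages(lines, min_reliability=0.5):
    """Rows with prediction == "phage" and reliability_score > min_reliability.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumption: the first line is a tab-separated header.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        row for row in reader
        if row["prediction"] == "phage"
        and float(row["reliability_score"]) > min_reliability
    ]


def uncertain_calls(lines, max_entropy=1.0, min_reliability=0.3):
    """Rows with entropy > max_entropy or reliability_score < min_reliability."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        row for row in reader
        if float(row["entropy"]) > max_entropy
        or float(row["reliability_score"]) < min_reliability
    ]
```

A typical call would open the main predictions table from the output directory, for example `with open(tsv_path, newline="") as handle: phages = high_confidence_phages(handle)`, where `tsv_path` is supplied by the caller. The thresholds are keyword arguments so they can be tightened or relaxed without editing the functions.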