Troubleshooting Guide

This guide covers common errors and how to resolve them.

Quick Reference

| Error | Cause | Fix |
|---|---|---|
| “Input is not sorted” | Unsorted BED file | Run `grit sort -i file.bed > sorted.bed` |
| “Chromosome not found in genome” | Missing chromosome in genome file | Update the genome file or check chromosome names |
| “Expected at least 3 fields” | Malformed BED line | Check that the file is tab-separated |
| “start > end” | Invalid interval | Fix the coordinates or check for parsing issues |
| High memory usage | Large file in parallel mode | Use `--streaming` mode |

Sorted Input Errors

Error: “Input is not sorted”

Full message:

```
Error: Input is not sorted: record at line 42 out of order

Fix: Run 'grit sort -i input.bed' first.
Or use '--assume-sorted' if you know the input is sorted.
```

Cause: Streaming operations require input sorted by chromosome, then by start position.
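The start-position rule can be checked before running a streaming command. The sketch below is a hypothetical helper (`first_unsorted_line` is not part of GRIT or pygrit). It assumes tab-separated input, skips blank and `#` lines, and reports the first record whose start is smaller than the previous start on the same chromosome. It does not check chromosome contiguity.

```python
from typing import Iterable, Optional


def first_unsorted_line(lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based line number of the first record whose start is
    smaller than the previous start on the same chromosome, or None."""
    prev_chrom = None
    prev_start = -1
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\r\n").split("\t")
        chrom, start = fields[0], int(fields[1])
        # Equal starts are fine; only a decrease on the same chromosome is an error
        if chrom == prev_chrom and start < prev_start:
            return lineno
        prev_chrom, prev_start = chrom, start
    return None


records = [
    "chr1\t100\t200\n",
    "chr1\t300\t400\n",
    "chr1\t250\t260\n",  # start 250 < 300 on chr1: out of order
]
print(first_unsorted_line(records))  # 3
```

To check a real file, pass an open handle: `with open("input.bed") as fh: print(first_unsorted_line(fh))`.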

Solutions:

Sort your file:

```
grit sort -i input.bed > sorted.bed
grit intersect -a sorted.bed -b other.bed --streaming
```

Use parallel mode (handles unsorted input):

```
# Remove --streaming flag
grit intersect -a unsorted.bed -b other.bed > output.bed
```

Skip validation (only if you’re sure input is sorted):

```
grit intersect -a input.bed -b other.bed --streaming --assume-sorted
```

Error: “Chromosome revisited”

Full message:

```
Error: Chromosome 'chr1' revisited at line 500 (previously seen, then other chromosomes appeared)
```

Cause: In streaming mode, all intervals for a chromosome must be contiguous. For example, this input triggers the error:

```
chr1    100    200    ✓
chr1    300    400    ✓
chr2    100    200    ✓
chr1    500    600    ✗ ERROR: chr1 after chr2
```
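A contiguity check only needs to remember which chromosomes have already finished. The hypothetical helper below (not a GRIT function) returns the chromosome and line number of the first revisit, analogous to the message above. It assumes tab-separated lines and ignores blank and `#` lines.

```python
from typing import Iterable, Optional, Tuple


def first_revisited_chrom(lines: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Return (chrom, line number) for the first chromosome that reappears
    after other chromosomes were seen, or None if all blocks are contiguous."""
    finished = set()  # chromosomes whose block of records has already ended
    current = None
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]
        if chrom != current:
            if chrom in finished:
                return chrom, lineno
            if current is not None:
                finished.add(current)  # the previous chromosome's block is closed
            current = chrom
    return None


records = [
    "chr1\t100\t200",
    "chr1\t300\t400",
    "chr2\t100\t200",
    "chr1\t500\t600",
]
print(first_revisited_chrom(records))  # ('chr1', 4)
```

Only a set of finished names is kept, so memory grows with the number of chromosomes, not the number of records.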

Solutions:

Sort your file:
```
grit sort -i input.bed > sorted.bed
```

Use parallel mode:

```
grit intersect -a input.bed -b other.bed > output.bed
```

Genome File Errors

Error: “Chromosome not found in genome file”

Full message:

```
Error: Chromosome 'chrUn_gl000220' at line 42 not found in genome file
```

Cause: The BED file contains a chromosome not listed in the genome file.
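To list every chromosome that would trigger this error, compare the BED file's first column against the genome file. This is a sketch with hypothetical function names, assuming a tab-separated genome file (`chrom<TAB>size`) and BED lines read as strings.

```python
from typing import Iterable, List, Set


def genome_chroms(genome_lines: Iterable[str]) -> Set[str]:
    """Chromosome names from the first column of a genome file."""
    return {line.split("\t", 1)[0].strip() for line in genome_lines if line.strip()}


def missing_chroms(bed_lines: Iterable[str], known: Set[str]) -> List[str]:
    """Chromosomes used in BED lines but absent from `known`, in order of first use."""
    missing: List[str] = []
    seen: Set[str] = set()
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        chrom = line.split("\t", 1)[0]
        if chrom not in known and chrom not in seen:
            seen.add(chrom)
            missing.append(chrom)
    return missing


genome = ["chr1\t248956422\n", "chr2\t242193529\n"]
bed = ["chr1\t10\t20\n", "chrUn_gl000220\t5\t50\n", "chr2\t1\t2\n"]
print(missing_chroms(bed, genome_chroms(genome)))  # ['chrUn_gl000220']
```

The output tells you whether to add a few entries to the genome file or to filter the BED file instead.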

Solutions:

Add the missing chromosome to the genome file (use its actual length from your assembly):

```
printf 'chrUn_gl000220\t168386\n' >> genome.txt
```

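Filter out chromosomes that are not in the genome file:

```
# Keep comment lines and lines whose first column exactly matches a chromosome
# in genome.txt (grep -f would also match substrings, e.g. chr1 inside chr10)
awk -F'\t' 'NR==FNR {keep[$1]; next} /^#/ || ($1 in keep)' genome.txt input.bed > filtered.bed
```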

Use a complete genome file. For human hg38, download the chromosome sizes from UCSC, for example from its public MySQL server:

```
mysql --host=genome-mysql.soe.ucsc.edu --user=genome -N -A \
  -e "SELECT chrom,size FROM hg38.chromInfo" > hg38.genome
```

Error: “Genome file required”

Full message:

```
Error: genome file is required for this operation
```

Cause: Commands like slop, complement, and genomecov need chromosome sizes.

Solution: Provide a genome file (tab-separated chromosome name and size, one per line):

```
# Create genome file (printf handles \t and \n portably, unlike echo -e)
printf 'chr1\t248956422\nchr2\t242193529\nchr3\t198295559\n' > genome.txt

# Use it
grit slop -i input.bed -g genome.txt -b 1000 > extended.bed
```
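Slop needs chromosome sizes because an extended interval has to be clipped to the chromosome's bounds. The sketch below assumes bedtools-style `slop -b` semantics (extend both sides by the same number of bases, clamp the start at 0 and the end at the chromosome length); `read_sizes` and `slop` are hypothetical names, and GRIT's exact behavior is not confirmed here.

```python
from typing import Dict, Iterable, Tuple


def read_sizes(genome_lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'chrom<TAB>size' lines into a dict."""
    sizes = {}
    for line in genome_lines:
        if line.strip():
            chrom, size = line.rstrip("\r\n").split("\t")[:2]
            sizes[chrom] = int(size)
    return sizes


def slop(chrom: str, start: int, end: int, b: int,
         sizes: Dict[str, int]) -> Tuple[str, int, int]:
    """Extend [start, end) by b bases on each side, clipped to the chromosome."""
    if chrom not in sizes:
        raise KeyError(f"Chromosome '{chrom}' not found in genome file")
    return chrom, max(0, start - b), min(sizes[chrom], end + b)


sizes = read_sizes(["chr1\t248956422\n", "chr2\t242193529\n"])
print(slop("chr1", 500, 800, 1000, sizes))
# ('chr1', 0, 1800): start clipped at 0
print(slop("chr2", 242193000, 242193400, 1000, sizes))
# ('chr2', 242192000, 242193529): end clipped at the chromosome length
```

Without the size of chr2, the second call would have no upper bound to clip against, which is why the command refuses to run without a genome file.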

BED Format Errors

Error: “Expected at least 3 fields”

Full message:

```
Error: Parse error at line 5: Expected at least 3 fields, got 2
```

Cause: BED format requires at least 3 tab-separated fields (chrom, start, end).

Common causes:

- Space-separated instead of tab-separated columns
- Missing columns
- Header lines without a # prefix
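The first cause is easy to see in code: splitting a space-separated line on tabs yields a single field. The hypothetical `split_bed_fields` below mimics the field-count check; its message copies the error text shown above, but it is an illustration, not GRIT's parser.

```python
from typing import List


def split_bed_fields(line: str, lineno: int) -> List[str]:
    """Split a BED line on tabs and require chrom, start and end."""
    fields = line.rstrip("\r\n").split("\t")  # also strips Windows line endings
    if len(fields) < 3:
        raise ValueError(
            f"Parse error at line {lineno}: "
            f"Expected at least 3 fields, got {len(fields)}"
        )
    return fields


print(split_bed_fields("chr1\t100\t200\tgeneA", 1))  # ['chr1', '100', '200', 'geneA']
try:
    split_bed_fields("chr1 100 200", 2)  # spaces, not tabs
except ValueError as err:
    print(err)  # Parse error at line 2: Expected at least 3 fields, got 1
```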

Solutions:

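Convert spaces to tabs (this also converts spaces inside fields such as names):
```
# GNU sed; on macOS/BSD use: tr -s ' ' '\t' < input.txt > input.bed
sed 's/ \+/\t/g' input.txt > input.bed
```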

Check file format:

```
# Show tabs as visible characters (GNU cat; on macOS use cat -et)
cat -A input.bed | head
# Tabs appear as ^I
```

Comment out a header line:

```
# If your file has a header, prefix it with #
sed '1s/^/#/' input.bed > fixed.bed
```

Error: “start > end”

Full message:

```
Error: Parse error at line 10: start (500) > end (100)
```

Cause: BED format requires start ≤ end.
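Both this check and the non-numeric check in the "Invalid coordinate" error can be sketched together. The hypothetical `parse_coords` below assumes BED coordinates are non-negative integers written with ASCII digits and reuses the message wording from this guide; it is not GRIT's implementation.

```python
import re
from typing import Tuple

COORD = re.compile(r"[0-9]+")  # non-negative integer, ASCII digits only


def parse_coords(start_s: str, end_s: str, lineno: int) -> Tuple[int, int]:
    """Validate BED start/end strings and return them as integers."""
    for value in (start_s, end_s):
        if not COORD.fullmatch(value):
            raise ValueError(
                f"Parse error at line {lineno}: invalid coordinate '{value}'"
            )
    start, end = int(start_s), int(end_s)
    if start > end:
        raise ValueError(
            f"Parse error at line {lineno}: start ({start}) > end ({end})"
        )
    return start, end


print(parse_coords("100", "500", 1))  # (100, 500)
print(parse_coords("5", "5", 2))      # (5, 5): start == end satisfies start <= end
```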

Solutions:

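Swap coordinates (only if start and end were reversed by mistake):

```
# Skip comment lines; swap start and end when start > end
awk -F'\t' 'BEGIN{OFS="\t"} !/^#/ && $2+0 > $3+0 {t=$2; $2=$3; $3=t} {print}' input.bed > fixed.bed
```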

Check for parsing issues (wrong column order):

```
head input.bed
# Verify: chrom<TAB>start<TAB>end
```

Error: “Invalid coordinate”

Full message:

```
Error: Parse error at line 3: invalid coordinate 'abc'
```

Cause: Non-numeric value in start or end column.

Solutions:

Check for header lines:

```
# Remove header or add # prefix
tail -n +2 input.bed > no_header.bed
# Or:
sed '1s/^/#/' input.bed > fixed.bed
```

Find problematic lines (prints the line number and content, skipping # comments):

```
awk -F'\t' '!/^#/ && ($2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/) {print NR": "$0}' input.bed
```
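`sed '1s/^/#/'` comments out the first line unconditionally, so running it on a file without a header turns the first record into a comment. A conditional version is sketched below; `comment_header` is a hypothetical helper, and it assumes a header is any first line whose start or end column is not a plain integer.

```python
import re
from typing import List

COORD = re.compile(r"[0-9]+")


def comment_header(lines: List[str]) -> List[str]:
    """Prefix '#' to the first line only if it looks like a header:
    not already a comment and start/end are not plain integers."""
    if not lines or lines[0].startswith("#"):
        return list(lines)
    fields = lines[0].rstrip("\r\n").split("\t")
    is_data = (
        len(fields) >= 3
        and COORD.fullmatch(fields[1]) is not None
        and COORD.fullmatch(fields[2]) is not None
    )
    if is_data:
        return list(lines)  # first line is a real record; leave it alone
    return ["#" + lines[0]] + list(lines[1:])


print(comment_header(["chrom\tstart\tend\n", "chr1\t1\t2\n"]))
# ['#chrom\tstart\tend\n', 'chr1\t1\t2\n']
```

Because it leaves data lines and existing comments untouched, it is safe to run more than once.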

Memory Issues

High Memory Usage

Symptom: Process uses excessive RAM, system becomes slow or swaps.

Cause: Large files loaded entirely into memory in parallel mode.
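The difference is between holding a whole file and holding one interval at a time. As an illustration only (not GRIT's implementation), the sketch below merges intervals from sorted input while keeping a single interval in memory. It assumes input sorted by chromosome, then start, and merges overlapping or book-ended intervals (one ending exactly where the next starts), which matches bedtools' default merge behavior; `streaming_merge` is a hypothetical name.

```python
from typing import Iterable, Iterator, Optional, Tuple

Interval = Tuple[str, int, int]


def streaming_merge(lines: Iterable[str]) -> Iterator[Interval]:
    """Merge overlapping or book-ended intervals from sorted BED lines,
    holding only the interval currently being extended."""
    current: Optional[Interval] = None
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, s, e = line.split("\t")[:3]
        start, end = int(s), int(e)  # int() tolerates a trailing newline
        if current is not None and chrom == current[0] and start <= current[2]:
            current = (chrom, current[1], max(current[2], end))
        else:
            if current is not None:
                yield current  # finished interval can be written out immediately
            current = (chrom, start, end)
    if current is not None:
        yield current


records = [
    "chr1\t100\t200\n",
    "chr1\t150\t300\n",
    "chr1\t300\t350\n",  # book-ended with the previous interval
    "chr1\t500\t600\n",
    "chr2\t0\t10\n",
]
print(list(streaming_merge(records)))
# [('chr1', 100, 350), ('chr1', 500, 600), ('chr2', 0, 10)]
```

Passing an open file handle instead of a list keeps memory use flat regardless of file size, which is only correct because the input is sorted.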

Solutions:

Use streaming mode:

```
# --assume-sorted skips the sort check; use it only if both inputs are already sorted
grit intersect -a large_a.bed -b large_b.bed --streaming --assume-sorted
```

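Process chromosomes separately:

```
for chr in chr1 chr2 chr3; do
    # Exact match on the first column (grep does not treat "\t" as a tab)
    awk -v c="$chr" -F'\t' '$1 == c' large.bed > "${chr}.bed"
    grit merge -i "${chr}.bed" > "${chr}_merged.bed"
done
# Concatenate in loop order; a *_merged.bed glob would also pick up all_merged.bed on reruns
for chr in chr1 chr2 chr3; do cat "${chr}_merged.bed"; done > all_merged.bed
```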

Limit threads (can reduce memory use in parallel mode):

```
RAYON_NUM_THREADS=2 grit intersect -a a.bed -b b.bed
```

Out of Memory (OOM)

Symptom: Process killed by system, “out of memory” error.

Solutions:

Switch to streaming mode (see above)

Increase swap space (temporary fix):

```
# Linux: Add swap file
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
```

Use a machine with more RAM

Performance Issues

Slow Processing

Possible causes and solutions:

Unsorted input being re-sorted on every run:

```
# Pre-sort once, reuse
grit sort -i input.bed > sorted.bed
```

Wrong mode for data size:
- Large files: Use --streaming
- Small files: Use parallel mode (default)

High overlap density:

```
# Use filtering to reduce output
grit intersect -a a.bed -b b.bed -u      # report each A interval once
grit intersect -a a.bed -b b.bed -f 0.5  # require a minimum overlap fraction (here 0.5)
```

I/O bottleneck:
- Use an SSD if available
- Avoid network filesystems for large files
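To judge what the `-f 0.5` filter in the high-overlap-density tip removes, it helps to compute the fraction by hand. The sketch assumes GRIT follows bedtools' definition, where `-f` is the minimum overlap as a fraction of the A interval, with half-open coordinates; `overlap_fraction` is a hypothetical helper.

```python
def overlap_fraction(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Fraction of interval A covered by interval B (half-open coordinates)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0 or a_end <= a_start:
        return 0.0  # no overlap, or a zero-length A interval
    return overlap / (a_end - a_start)


print(overlap_fraction(100, 200, 150, 400))  # 0.5 -> kept with -f 0.5
print(overlap_fraction(100, 200, 180, 400))  # 0.2 -> filtered out with -f 0.5
```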

No Output

Possible causes:

Chromosome names don't match (for example `chr1` vs `1`):

```
# Compare the chromosome names used in each file
cut -f1 a.bed | sort -u
cut -f1 b.bed | sort -u
```

Coordinates don't overlap: check your data.

```
# Sample from each file
head a.bed b.bed
```

Filtering too strict:

```
# Remove fraction requirement to test
grit intersect -a a.bed -b b.bed  # without -f flag
```
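The chromosome-name comparison above can be automated to flag the common `chr1` vs `1` mismatch. The sketch below uses hypothetical helper names and only handles a leading `chr` prefix; other naming differences (for example `chrM` vs `MT`) are not detected.

```python
from typing import Iterable, Set


def chrom_names(lines: Iterable[str]) -> Set[str]:
    """Distinct chromosome names (first column), ignoring blank and # lines."""
    return {line.split("\t", 1)[0] for line in lines
            if line.strip() and not line.startswith("#")}


def _without_chr(names: Set[str]) -> Set[str]:
    """Drop a leading 'chr' so 'chr1' and '1' compare equal."""
    return {n[3:] if n.startswith("chr") else n for n in names}


def prefix_mismatch(a: Set[str], b: Set[str]) -> bool:
    """True if the two files share no chromosome names but would after
    adding or removing the 'chr' prefix."""
    return not (a & b) and bool(_without_chr(a) & _without_chr(b))


a = chrom_names(["chr1\t100\t200\n", "chr2\t5\t9\n"])
b = chrom_names(["1\t150\t250\n", "2\t0\t10\n"])
print(prefix_mismatch(a, b))  # True: same chromosomes, different naming
```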

Python-Specific Issues

Import Error: “No module named ‘pygrit’”

Solution: Install the package:

```
pip install grit-genomics
```

Note: Package name is grit-genomics, import name is pygrit.

Incorrect Results in Python

Cause: Input files not sorted (Python API uses streaming internally).

Solution: Always sort input files:

```
import pygrit

# Sort first
pygrit.sort("unsorted.bed", output="sorted.bed")

# Then process
result = pygrit.intersect("sorted.bed", "other_sorted.bed")
```

TypeError: “expected str, got bytes”

Cause: Passing bytes instead of file path string.

Solution:

```
# Wrong
pygrit.intersect(b"a.bed", b"b.bed")

# Correct
pygrit.intersect("a.bed", "b.bed")
```
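If paths arrive as bytes (for example from `os.listdir(b".")`) or as `pathlib.Path` objects, the standard library's `os.fsdecode` converts them to `str` before calling pygrit. `as_str_path` is a hypothetical wrapper; whether pygrit accepts `Path` objects directly is not confirmed here.

```python
import os
from pathlib import Path
from typing import Union


def as_str_path(path: Union[str, bytes, os.PathLike]) -> str:
    """Return a str path for str, bytes, or path-like input."""
    # os.fsdecode decodes bytes with the filesystem encoding and
    # accepts os.PathLike objects such as pathlib.Path (Python 3.6+)
    return os.fsdecode(path)


print(as_str_path(b"a.bed"))       # a.bed
print(as_str_path(Path("b.bed")))  # b.bed
# Usage with the call shown above:
# pygrit.intersect(as_str_path(b"a.bed"), as_str_path(b"b.bed"))
```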

File Issues

Error: “No such file or directory”

Solutions:

Check file exists:
```
ls -la input.bed
```

Use absolute paths:

```
grit intersect -a /full/path/to/a.bed -b /full/path/to/b.bed
```

Make sure the file is readable:
```
chmod +r input.bed
```

Error: “Permission denied”

Solutions:

Check read permissions:
```
ls -la input.bed
chmod +r input.bed
```

Check that the output directory is writable:

```
# -d lists the directory itself rather than its contents
ls -ld output_directory/
chmod +w output_directory/
```
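The checks in this section can be run together before a long job. The sketch below is a hypothetical `preflight` helper, not part of GRIT; it uses `os.path` and `os.access`, and note that `os.access` tests permissions for the real user ID, so results can differ under `sudo` or setuid.

```python
import os
from typing import Iterable, List


def preflight(inputs: Iterable[str], output_dir: str = ".") -> List[str]:
    """Describe problems that would cause 'No such file or directory'
    or 'Permission denied' before GRIT runs."""
    problems: List[str] = []
    for path in inputs:
        if not os.path.isfile(path):
            problems.append(f"{path}: not found or not a regular file")
        elif not os.access(path, os.R_OK):
            problems.append(f"{path}: not readable")
    if not os.path.isdir(output_dir):
        problems.append(f"{output_dir}: output directory does not exist")
    elif not os.access(output_dir, os.W_OK | os.X_OK):
        problems.append(f"{output_dir}: output directory is not writable")
    return problems


for problem in preflight(["a.bed", "b.bed"], "results"):
    print(problem)
```

An empty list means the inputs exist and are readable and the output directory accepts new files.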

Getting Help

If your issue isn’t covered here:

Check the documentation:
- STREAMING_MODEL.md - Memory and algorithm details
- PERFORMANCE.md - Performance tuning
- COMMANDS.md - Command reference

Search existing issues:
- GitHub Issues

Open a new issue with:
- GRIT version (`grit --version`)
- Command that failed
- Full error message
- Sample input (if possible)
- Operating system

Common Fixes Summary

```
# Fix: Unsorted input
grit sort -i input.bed > sorted.bed

# Fix: Memory issues (--assume-sorted only for already-sorted input)
grit intersect -a a.bed -b b.bed --streaming --assume-sorted

# Fix: Chromosome naming mismatch
# Check if files use "chr1" vs "1"
cut -f1 a.bed | head
cut -f1 b.bed | head

# Fix: Space-separated file (GNU sed; on macOS use tr -s ' ' '\t')
sed 's/ \+/\t/g' input.txt > input.bed

# Fix: Header in file
sed '1s/^/#/' input.bed > fixed.bed
# Or remove it:
tail -n +2 input.bed > no_header.bed
```