intersect

Description

Find overlapping intervals between two BED files. Supports both a parallel in-memory mode and a streaming mode.

Example Input

cat example_a.bed
chr1	100	200	gene1	100	+
chr1	150	250	gene2	200	-
chr1	400	500	gene3	300	+
chr2	100	300	gene4	400	+
chr2	500	700	gene5	500	-
cat example_b.bed
chr1	120	180	feat1	50	+
chr1	220	280	feat2	60	-
chr1	450	480	feat3	70	+
chr2	150	250	feat4	80	+
chr2	600	650	feat5	90	-

Command

grit intersect -a example_a.bed -b example_b.bed

Output

chr1	120	180	gene1	100	+
chr1	150	180	gene2	200	-
chr1	220	250	gene2	200	-
chr1	450	480	gene3	300	+
chr2	150	250	gene4	400	+
chr2	600	650	gene5	500	-
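
Each output line above is the overlapping portion of an A interval, keeping A's name, score, and strand columns. A minimal Python sketch of that clipping step, assuming BED's half-open coordinates; the helper name `clip_overlap` is illustrative and not part of grit:

```python
def clip_overlap(a, b):
    """Return the overlapping portion of a and b, or None if they do not overlap.

    Intervals are (chrom, start, end) tuples in half-open BED coordinates.
    """
    if a[0] != b[0]:
        return None
    start = max(a[1], b[1])
    end = min(a[2], b[2])
    if start >= end:
        return None
    return (a[0], start, end)


gene2 = ("chr1", 150, 250)
feat1 = ("chr1", 120, 180)
feat2 = ("chr1", 220, 280)
print(clip_overlap(gene2, feat1))  # ('chr1', 150, 180)
print(clip_overlap(gene2, feat2))  # ('chr1', 220, 250)
```

This reproduces the two gene2 lines in the output: one clipped interval per overlapping B feature.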

Options

Flag               Description
-a, --file-a       Input BED file A
-b, --file-b       Input BED file B
--wa               Write the original A entry
--wb               Write the original B entry
-u, --unique       Report each overlapping A interval only once
-v, --no-overlap   Only report A intervals with no overlap in B
-f, --fraction     Minimum overlap as a fraction of A
-r, --reciprocal   Require the minimum fraction for both A and B
-c, --count        Report the number of overlaps for each A interval
--streaming        Use streaming mode (constant memory)
--assume-sorted    Skip sort-order validation
--stats            Print streaming statistics
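
The fraction filters (-f and -r) can be pictured with a short Python sketch. This page does not spell out their exact semantics, so the sketch assumes the usual convention: the overlap length divided by the length of A must reach the threshold, and with the reciprocal option the same must hold for B. The function name `passes_fraction` and the second pair of coordinates are hypothetical:

```python
def passes_fraction(a, b, fraction, reciprocal=False):
    """Check an overlap against a minimum fraction (assumed semantics).

    a and b are (start, end) pairs on the same chromosome, half-open.
    """
    a_start, a_end = a
    b_start, b_end = b
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return False  # no overlap at all
    if overlap / (a_end - a_start) < fraction:
        return False  # overlap covers too little of A
    if reciprocal and overlap / (b_end - b_start) < fraction:
        return False  # overlap covers too little of B
    return True


# gene1 vs feat1: 60 bp overlap covers 60% of A and 100% of B.
print(passes_fraction((100, 200), (120, 180), 0.5))
# gene2 vs feat1: 30 bp overlap covers only 30% of A.
print(passes_fraction((150, 250), (120, 180), 0.5))
# Hypothetical pair: 50% of A but only 20% of B.
print(passes_fraction((100, 200), (150, 400), 0.5, reciprocal=True))
```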

Write Both Entries

grit intersect -a example_a.bed -b example_b.bed --wa --wb

Output:

chr1	100	200	gene1	100	+	chr1	120	180	feat1	50	+
chr1	150	250	gene2	200	-	chr1	120	180	feat1	50	+
chr1	150	250	gene2	200	-	chr1	220	280	feat2	60	-
chr1	400	500	gene3	300	+	chr1	450	480	feat3	70	+
chr2	100	300	gene4	400	+	chr2	150	250	feat4	80	+
chr2	500	700	gene5	500	-	chr2	600	650	feat5	90	-

Count Overlaps

grit intersect -a example_a.bed -b example_b.bed -c

Output:

chr1	100	200	gene1	100	+	1
chr1	150	250	gene2	200	-	2
chr1	400	500	gene3	300	+	1
chr2	100	300	gene4	400	+	1
chr2	500	700	gene5	500	-	1
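
The counts above can be checked with a brute-force Python sketch over the example data. It only illustrates the counting rule (two half-open intervals on the same chromosome overlap when each starts before the other ends) and is not how grit computes it; `count_overlaps` is an illustrative name:

```python
a_intervals = [
    ("chr1", 100, 200, "gene1"),
    ("chr1", 150, 250, "gene2"),
    ("chr1", 400, 500, "gene3"),
    ("chr2", 100, 300, "gene4"),
    ("chr2", 500, 700, "gene5"),
]
b_intervals = [
    ("chr1", 120, 180, "feat1"),
    ("chr1", 220, 280, "feat2"),
    ("chr1", 450, 480, "feat3"),
    ("chr2", 150, 250, "feat4"),
    ("chr2", 600, 650, "feat5"),
]


def count_overlaps(a, b_list):
    """Count B intervals that overlap A (same chromosome, half-open coordinates)."""
    return sum(
        1 for b in b_list
        if a[0] == b[0] and a[1] < b[2] and b[1] < a[2]
    )


counts = {a[3]: count_overlaps(a, b_intervals) for a in a_intervals}
print(counts)  # {'gene1': 1, 'gene2': 2, 'gene3': 1, 'gene4': 1, 'gene5': 1}
```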

Streaming Mode

For large files with sorted input:

grit intersect -a example_a.bed -b example_b.bed --streaming --assume-sorted

Notes

  • Streaming mode requires sorted input and uses O(k) memory, where k is the maximum number of simultaneously overlapping intervals
  • Parallel mode loads files into memory and uses interval trees
  • Use --assume-sorted to skip validation for pre-sorted files
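
To show why sorted input keeps memory small, here is a simplified Python sketch of a sweep over two start-sorted interval lists. It is an assumption about the general technique, not grit's actual implementation: it handles a single chromosome, and `stream_intersect` is an illustrative name. Only B intervals that can still overlap the current or a later A interval stay in the `active` list:

```python
def stream_intersect(a_sorted, b_sorted):
    """Yield (a, b) overlapping pairs from two start-sorted lists of (start, end).

    Simplification: all intervals are assumed to lie on one chromosome.
    """
    active = []  # B intervals that may still overlap current or later A intervals
    b_iter = iter(b_sorted)
    pending = next(b_iter, None)
    for a_start, a_end in a_sorted:
        # Pull in B intervals that start before this A interval ends.
        while pending is not None and pending[0] < a_end:
            active.append(pending)
            pending = next(b_iter, None)
        # Drop B intervals ending at or before this A start: because A is
        # sorted by start, no later A interval can reach them either.
        active = [b for b in active if b[1] > a_start]
        for b in active:
            if b[0] < a_end:
                yield (a_start, a_end), b


# chr1 intervals from the example files
a_chr1 = [(100, 200), (150, 250), (400, 500)]
b_chr1 = [(120, 180), (220, 280), (450, 480)]
pairs = list(stream_intersect(a_chr1, b_chr1))
for a, b in pairs:
    print(a, b)
```

Each file is read once from front to back, so neither has to be held in memory in full. If the input were not sorted, the drop step could discard a B interval that a later A interval still needs, which is why streaming mode validates sort order unless told to skip it.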