bgzf-randreader is a BGZF reader that supports random access by offsets into the uncompressed data.
Suppose we have a BGZF file compressed from some text:
$ echo 'The quick brown fox jumps over the lazy dog' | bgzip > test.gz
We can randomly access any part of it via RandomAccessBgzFile without decompressing the whole file:
RandomAccessBgzFile file = new RandomAccessBgzFile(new File("test.gz"));
try {
    byte[] b = new byte[5];
    file.seek(4);  // offset 4 in the uncompressed text, right after "The "
    file.read(b);
    System.out.println(new String(b)); // outputs: quick
} finally {
    file.close(); // always close the file to prevent a memory leak
}
To use bgzf-randreader in a Maven-based project, add the following dependency:
<dependency>
    <groupId>com.vivimice</groupId>
    <artifactId>bgzf-randreader</artifactId>
    <version>1.1.1</version>
</dependency>
BGZF is a gzip-compatible compression format. It is a block compression format implemented on top of the standard gzip file format: the data is split into blocks, and each block is compressed independently.
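Because each block is compressed on its own, a reader only needs to know which block contains a given uncompressed offset. The following is a minimal sketch of that lookup, not necessarily how bgzf-randreader implements it. The class BlockIndexSketch, its methods, and the block sizes in main are all hypothetical and chosen for illustration; in a real file the sizes would come from the block headers and trailers.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: maps an uncompressed offset to the block that holds it.
final class BlockIndexSketch {
    // Key: uncompressed offset where a block starts.
    // Value: offset of that block in the compressed file.
    private final TreeMap<Long, Long> blockStarts = new TreeMap<>();
    private final long totalUncompressed;

    // sizes[i] = {compressedSize, uncompressedSize} of block i, in file order.
    BlockIndexSketch(long[][] sizes) {
        long compressed = 0;
        long uncompressed = 0;
        for (long[] s : sizes) {
            if (s[1] > 0) { // skip empty blocks, e.g. an end-of-file marker
                blockStarts.put(uncompressed, compressed);
            }
            compressed += s[0];
            uncompressed += s[1];
        }
        totalUncompressed = uncompressed;
    }

    // Returns {compressed offset of the block, offset inside its uncompressed data}.
    long[] locate(long uncompressedOffset) {
        if (uncompressedOffset < 0 || uncompressedOffset >= totalUncompressed) {
            throw new IllegalArgumentException("offset out of range: " + uncompressedOffset);
        }
        Map.Entry<Long, Long> e = blockStarts.floorEntry(uncompressedOffset);
        return new long[] { e.getValue(), uncompressedOffset - e.getKey() };
    }

    public static void main(String[] args) {
        // Made-up block sizes for illustration only.
        BlockIndexSketch index = new BlockIndexSketch(new long[][] {
            {18000, 65280}, {17500, 65280}, {9000, 30000}, {28, 0}
        });
        long[] hit = index.locate(70000);
        System.out.println("block at " + hit[0] + ", offset in block " + hit[1]);
    }
}
```

With such an index, a seek only has to decompress the one block that contains the target offset and then skip forward inside it.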
A BGZF file can be generated from an existing gzip file or from any uncompressed data with the bgzip utility. Any gzip-compatible tool or library (such as gunzip, zcat, zgrep, GZIPInputStream, etc.) can decompress a BGZF-compressed file.
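As a rough illustration of that compatibility, the sketch below reads a stream of concatenated gzip members sequentially with the standard java.util.zip.GZIPInputStream. The class GzipCompatSketch and its helper methods are hypothetical names. The in-memory data produced by concatMembers is only a stand-in for a BGZF file: it is a series of independent gzip members, but it lacks the extra header fields that real BGZF blocks carry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: sequential decompression with plain gzip tooling.
final class GzipCompatSketch {
    // Decompresses every gzip member in the stream, one after another.
    static byte[] gunzipAll(InputStream compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(compressed);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Stand-in for a BGZF file: each part becomes its own gzip member.
    static byte[] concatMembers(String... parts) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (String part : parts) {
            // Closing the gzip stream finishes this member; closing the
            // underlying ByteArrayOutputStream has no effect, so it stays usable.
            try (GZIPOutputStream gz = new GZIPOutputStream(file)) {
                gz.write(part.getBytes(StandardCharsets.UTF_8));
            }
        }
        return file.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = concatMembers("The quick brown fox ", "jumps over the lazy dog\n");
        byte[] text = gunzipAll(new ByteArrayInputStream(data));
        System.out.print(new String(text, StandardCharsets.UTF_8));
    }
}
```

To try this on a real file, pass new FileInputStream("test.gz") to gunzipAll instead of the in-memory stand-in. Reading this way is strictly sequential; random access is what RandomAccessBgzFile adds.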
On Debian/Ubuntu, the bgzip utility is included in the tabix package.
More about BGZF: http://samtools.github.io/hts-specs/SAMv1.pdf