Emulates a BAM index so that we can request chunks of records from SRAFileReader.
Here is how it works:
SRA allows fast reading of alignments by reference position, so the alignment part of our "file" range spans
the combined length of all references. Reading unaligned reads is then fast if we use read positions for lookup
and (internally) filter out aligned fragments.
The total SRA "file" range is calculated as the sum of all reference lengths plus the number of reads (both aligned
and unaligned) in the SRA archive.
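The range arithmetic described above can be sketched as follows. This is a minimal illustration, not the SRAIndex implementation: the class and method names are hypothetical, and it assumes that references are laid out first, in order, with one position per read following them.

```java
/** Hypothetical helper illustrating the emulated SRA "file" range. */
final class SraFileRangeSketch {
    private SraFileRangeSketch() {}

    /** Sum of all reference lengths plus the number of reads (aligned and unaligned). */
    static long totalRange(long[] referenceLengths, long numberOfReads) {
        long total = numberOfReads;
        for (long length : referenceLengths) {
            total += length;
        }
        return total;
    }

    /** Emulated "file" position at which the given reference starts (assumed layout). */
    static long referenceOffset(long[] referenceLengths, int referenceIndex) {
        if (referenceIndex < 0 || referenceIndex >= referenceLengths.length) {
            throw new IllegalArgumentException("No such reference: " + referenceIndex);
        }
        long offset = 0;
        for (int i = 0; i < referenceIndex; i++) {
            offset += referenceLengths[i];
        }
        return offset;
    }

    /** First position of the read part of the range (assumed to follow all references). */
    static long firstReadPosition(long[] referenceLengths) {
        return totalRange(referenceLengths, 0);
    }
}
```

For example, with reference lengths 1000, 500 and 250 and 40 reads, the total range is 1790 positions, the third reference starts at position 1500, and read positions start at 1750.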
Now we can use Chunks to look up aligned and unaligned fragments.
We emulate BAM index bins by mapping SRA reference positions to bin numbers.
We then map each bin number to a list of chunks, which represent SRA "file" positions (which are simply reference
positions).
We only emulate the last level of BAM index bins (each of which refers to a portion of the reference SRA_BIN_SIZE
bases long). For all other bins a RuntimeException will be thrown (but since nobody except the SRAIndex class
creates bins, that is fine).
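A minimal sketch of the position-to-bin mapping follows. It assumes that SRA_BIN_SIZE equals the 16384-base width of the last-level bins in the BAM binning scheme (the value is not stated here), and it uses 4681 as the number of the first last-level bin, as in the BAM spec. The class and method names are hypothetical.

```java
/** Hypothetical helper illustrating last-level bin emulation. */
final class SraBinSketch {
    /** Assumed value: the width of a last-level BAM bin (2^14 bases). */
    static final int SRA_BIN_SIZE = 16 * 1024;
    /** Number of the first last-level bin in the BAM binning scheme: ((1 << 15) - 1) / 7. */
    static final int FIRST_LAST_LEVEL_BIN = 4681;

    private SraBinSketch() {}

    /** Maps a 0-based reference position to its last-level bin number. */
    static int binForPosition(int zeroBasedPosition) {
        if (zeroBasedPosition < 0) {
            throw new IllegalArgumentException("Negative position: " + zeroBasedPosition);
        }
        return FIRST_LAST_LEVEL_BIN + zeroBasedPosition / SRA_BIN_SIZE;
    }

    /** Maps a last-level bin number back to the 0-based reference position where it starts. */
    static int binStart(int binNumber) {
        if (binNumber < FIRST_LAST_LEVEL_BIN) {
            // Bins from the upper levels are not emulated.
            throw new RuntimeException("Only last-level bins are emulated: " + binNumber);
        }
        return (binNumber - FIRST_LAST_LEVEL_BIN) * SRA_BIN_SIZE;
    }
}
```

Under these assumptions, positions 0 through 16383 fall into bin 4681, position 16384 starts bin 4682, and asking for the start of an upper-level bin such as bin 0 throws a RuntimeException.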
But since the last level of bins was not meant to refer to fragments that only partially overlap the bin's reference
positions, we also return a chunk that starts 5000 bases to the left of the beginning of the bin, to ensure that
fragments that start before the bin but still overlap it can be retrieved by the SRA reader.
Later we will add support to the NGS API for getting the maximum number of bases that we need to go left to retrieve such fragments.
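The left extension of a bin's chunk can be sketched as below. This is an illustration under stated assumptions rather than the actual SRAIndex code: SRA_BIN_SIZE is assumed to be 16384, the chunk is clamped to the boundaries of its reference, the end is exclusive, and the names are hypothetical. Only the 5000-base extension comes from the description above.

```java
/** Hypothetical helper illustrating the chunk returned for one emulated bin. */
final class SraChunkSketch {
    /** Assumed value: the width of a last-level BAM bin (2^14 bases). */
    static final int SRA_BIN_SIZE = 16 * 1024;
    /** Number of bases to reach left of the bin start, as described above. */
    static final int LEFT_EXTENSION = 5000;

    private SraChunkSketch() {}

    /**
     * Returns {start, endExclusive} in emulated "file" positions for the bin with the given
     * 0-based index on a reference that starts at referenceOffset and is referenceLength long.
     */
    static long[] chunkForBin(long referenceOffset, int binIndexOnReference, long referenceLength) {
        long binStart = (long) binIndexOnReference * SRA_BIN_SIZE;
        if (binIndexOnReference < 0 || binStart >= referenceLength) {
            throw new IllegalArgumentException("Bin is outside the reference: " + binIndexOnReference);
        }
        // Reach left so that fragments starting before the bin but overlapping it are included.
        long start = Math.max(0L, binStart - LEFT_EXTENSION);
        // Do not run past the end of the reference (an assumption of this sketch).
        long end = Math.min(referenceLength, binStart + SRA_BIN_SIZE);
        return new long[] {referenceOffset + start, referenceOffset + end};
    }
}
```

For a reference of 100000 bases at offset 0, the first bin yields the chunk [0, 16384) because there is nothing to the left of it, while the second bin yields [11384, 32768), which starts 5000 bases before the bin start at 16384.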
Created by andrii.nikitiuk on 9/4/15.
Gets the compressed chunks which should be searched for the contents of records contained by the span
referenceIndex:startPos-endPos, inclusive. See the BAM spec for more information on how a chunk is