Bio::DB::Qual - Fast indexed access to quality files

  use Bio::DB::Qual;
  # create database from directory of qual files
  my $db      = Bio::DB::Qual->new('/path/to/qual/files/');
  my @ids     = $db->get_all_primary_ids;
  # Simple access
  my @qualarr = @{$db->qual('CHROMOSOME_I',4_000_000 => 4_100_000)};
  my @revqual = @{$db->qual('CHROMOSOME_I',4_100_000 => 4_000_000)};
  my $length  = $db->length('CHROMOSOME_I');
  my $header  = $db->header('CHROMOSOME_I');
  # Access to sequence objects. See Bio::PrimarySeqI.
  my $obj     = $db->get_Qual_by_id('CHROMOSOME_I');
  my @qual    = @{$obj->qual};
  my @subqual = @{$obj->subqual(4_000_000 => 4_100_000)};
  $length     = $obj->length;
  # Loop through sequence objects
  my $stream  = $db->get_PrimarySeq_stream;
  while (my $qual = $stream->next_seq) {
    # Bio::Seq::PrimaryQual operations
  }
  # Filehandle access
  my $fh = Bio::DB::Qual->newFh('/path/to/qual/files/');
  while (my $qual = <$fh>) {
    # Bio::Seq::PrimaryQual operations
  }
  # Tied hash access
  tie my %qualities,'Bio::DB::Qual','/path/to/qual/files/';
  print $qualities{'CHROMOSOME_I:1,20000'};


Bio::DB::Qual provides indexed access to a single qual file, several files, or a directory of files. It provides random access to each quality score entry without having to read the file from the beginning. Access to subqualities (portions of a quality score) is provided, although, unlike Bio::DB::Fasta, the full quality score has to be loaded into memory first. Bio::DB::Qual is based on Bio::DB::IndexedBase. See that module's documentation for details.
The qual files should contain decimal quality scores. Entries may have any line length up to 65,536 characters, and different line lengths are allowed in the same file. However, within a quality score entry, all lines must be the same length except for the last. An error will be thrown if this is not the case.
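
The line-length rule above can be checked before indexing. The following is a minimal sketch in plain Perl; entry_lines_consistent is a hypothetical helper, not part of Bio::DB::Qual, and it only mirrors the rule as stated here (every line of an entry except the last must have the same length).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DB::Qual): takes the score lines of
# ONE entry, without the header and without trailing newlines, and returns 1
# if every line except the last has the same length, 0 otherwise.
sub entry_lines_consistent {
    my @lines = @_;
    return 1 if @lines < 2;          # zero or one line is trivially consistent
    pop @lines;                      # the last line is exempt from the rule
    my $width = length $lines[0];
    for my $line (@lines) {
        return 0 if length($line) != $width;
    }
    return 1;
}

# Two example entries with made-up scores
my @good = ('10 20 30 40', '11 21 31 41', '12 22');
my @bad  = ('10 20 30 40', '9 20 30 40',  '12 22');

print "good entry is consistent\n" if entry_lines_consistent(@good);
print "bad entry is inconsistent\n" unless entry_lines_consistent(@bad);
```

Running a check like this on each entry beforehand may help locate the offending entry when the module reports a line-length error.
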
The module uses /^>(\S+)/ to extract the primary ID of each quality score from the qual header. See -makeid in Bio::DB::IndexedBase to pass a callback routine to reversibly modify this primary ID, e.g. if you wish to extract a specific portion of the gi|gb|abc|xyz GenBank IDs.
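
As an illustration of the ID extraction described above, the sketch below applies the default pattern to a made-up header and then shows what a -makeid style callback might look like. The names default_id and make_my_id and the example header are hypothetical, and the sketch assumes the callback is handed the full header line; see Bio::DB::IndexedBase for the exact calling convention.

```perl
use strict;
use warnings;

# Default rule: the primary ID is the first run of non-whitespace
# characters following '>'. Returns undef if the line is not a header.
sub default_id {
    my ($header) = @_;
    return $header =~ /^>(\S+)/ ? $1 : undef;
}

# Hypothetical callback: keep only the accession field of a
# gi|number|db|accession| style ID, falling back to the default rule.
sub make_my_id {
    my ($header) = @_;
    return $header =~ /^>gi\|\d+\|\w+\|([^|\s]+)/ ? $1 : default_id($header);
}

my $header = '>gi|12345|gb|AB000001.1| example read';   # made-up header
print default_id($header), "\n";    # the whole first token
print make_my_id($header), "\n";    # just the accession part

# The callback would then be passed at construction time, e.g.
#   my $db = Bio::DB::Qual->new('/path/to/qual/files/', -makeid => \&make_my_id);
```
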


The object-oriented constructor is new(), the filehandle constructor is newFh() and the tied hash constructor is tie(). Each of them can index a single qual file, several files, or a directory of files. See Bio::DB::IndexedBase.




When a quality score is deleted from one of the qual files, the module does not detect the deletion, and the corresponding entry is not removed from the index. As a result, a "ghost" entry will remain in the index and will return garbage results if accessed. Currently, the only way to accommodate deletions is to rebuild the entire index, either by deleting it manually, or by passing -reindex=>1 to new() when initializing the module.
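
A forced rebuild with the -reindex option mentioned above might be wrapped as follows. This is only a sketch: rebuild_index is a hypothetical helper name, and the path in the usage comment is the placeholder used in the synopsis.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not part of Bio::DB::Qual): reopen the database and
# force the index to be rebuilt from the current contents of $path.
sub rebuild_index {
    my ($path) = @_;
    require Bio::DB::Qual;    # loaded lazily; requires BioPerl to be installed
    return Bio::DB::Qual->new($path, -reindex => 1);
}

# Usage, after entries have been deleted from the qual files:
#   my $db = rebuild_index('/path/to/qual/files/');
```

Rebuilding can be slow for large collections, so it is probably best done once after a batch of deletions rather than on every open.
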
