Bio::DB::GFF::Adaptor::berkeleydb -- Bio::DB::GFF database adaptor for in-memory
  use Bio::DB::GFF;
  my $db = Bio::DB::GFF->new(-adaptor => 'berkeleydb',
                             -create  => 1, # on initial build you need this
                             -dsn     => '/usr/local/share/gff/dmel');

  # initialize an empty database, then load GFF and FASTA files

  # do queries
  my $segment  = $db->segment(Chromosome => '1R');
  my $subseg   = $segment->subseq(5000,6000);
  my @features = $subseg->features('gene');
See Bio::DB::GFF for other methods.
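The synopsis mentions initializing an empty database and loading GFF and FASTA files. A minimal sketch of that step follows, using the initialize(), load_gff() and load_fasta() methods inherited from Bio::DB::GFF. The build_database() helper and the input paths are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

# Hypothetical input files; substitute your own GFF and FASTA paths.
my %INPUT = (
    gff   => '/path/to/features.gff',
    fasta => '/path/to/sequence.fa',
);

# Hypothetical helper: create, wipe and load a berkeleydb-backed database.
sub build_database {
    my ($dsn, %input) = @_;
    require Bio::DB::GFF;    # loaded lazily, only when the helper is called

    my $db = Bio::DB::GFF->new(
        -adaptor => 'berkeleydb',
        -create  => 1,       # needed on the initial build
        -dsn     => $dsn,
    );
    $db->initialize(1);      # a true argument erases any existing contents
    $db->load_gff($input{gff});
    $db->load_fasta($input{fasta});
    return $db;
}
```

Calling build_database('/usr/local/share/gff/dmel', %INPUT) would then return a handle that is ready for the queries shown in the synopsis.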
This adaptor implements a berkeleydb-indexed version of Bio::DB::GFF. It
requires the DB_File and Storable modules. It can be used to store and
retrieve short to medium-length GFF files of several million features in
Use Bio::DB::GFF->new() to construct new instances of this class.
The following named arguments are recognized:
  -adaptor           Set to "berkeleydb" to create an instance of this class.
  -dsn               Path to directory where the database index files will be
                     stored (alias -db)
  -autoindex         Monitor the indicated directory path for FASTA and GFF
                     files, and update the indexes automatically if they
                     change (alias -dir)
  -write             Set to a true value in order to update the database.
  -create            Set to a true value to create the database the first time
  -tmp               Location of temporary directory for storing intermediate
                     files during certain queries.
  -preferred_groups  Specify the grouping tag. See Bio::DB::GFF.
The -dsn argument selects the directory in which to store the database index
files. If the directory does not exist it will be created automatically,
provided that the current process has sufficient privileges. If no -dsn
argument is specified, a database named "test" will be created in
your system's temporary files directory.
The -tmp argument specifies the temporary directory to use for storing
intermediate search results. If not specified, your system's temporary files
directory will be used. On Unix systems, the TMPDIR environment variable is
honored. Note that some queries can require a lot of space.
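To see which directory these defaults are likely to resolve to on a given machine, the short sketch below asks File::Spec for the system temporary directory. Using File::Spec->tmpdir here is an assumption about how the lookup behaves, not a statement about the adaptor's internals, and the "test" path is only the location described above for a database opened without -dsn.

```perl
use strict;
use warnings;
use File::Spec;

# On Unix, File::Spec->tmpdir consults $ENV{TMPDIR} before falling back to /tmp.
my $tmp = File::Spec->tmpdir;

# Assumed location of the fallback database named "test" when -dsn is omitted.
my $default_dsn = File::Spec->catfile($tmp, 'test');

print "temporary directory: $tmp\n";
print "default database:    $default_dsn\n";
```

Because some queries can need a lot of scratch space, it is worth checking that the reported directory sits on a volume with enough room, or passing -tmp explicitly.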
The -autoindex argument, if present, selects a directory to be monitored for GFF
and FASTA files (which can be compressed with the gzip program if desired).
Whenever any file in this directory is changed, the index files will be
updated. Note that the indexing can take a long time to run: anywhere from 5
to 10 minutes for a million features. An alias for this argument is -dir,
which gives this adaptor a similar flavor to the "memory" adaptor.
-dsn and -dir can point to the same directory. If -dir is given but -dsn is
absent, the index files will be stored in the directory containing the source
files. For autoindexing to work, you must specify the same -dir path each time
you open the database.
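A minimal sketch of opening an autoindexed database follows. The helper names autoindex_args() and open_autoindexed() are hypothetical, as are any paths passed to them; only the -adaptor, -autoindex and -dsn arguments come from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: build the constructor arguments for autoindexing.
# $index_dir is optional; without it the index files are kept alongside
# the source files in $source_dir.
sub autoindex_args {
    my ($source_dir, $index_dir) = @_;
    my @args = (-adaptor => 'berkeleydb', -autoindex => $source_dir);
    push @args, (-dsn => $index_dir) if defined $index_dir;
    return @args;
}

# Hypothetical helper: open the database, reindexing if source files changed.
sub open_autoindexed {
    my ($source_dir, $index_dir) = @_;
    require Bio::DB::GFF;    # loaded lazily, only when the helper is called
    return Bio::DB::GFF->new(autoindex_args($source_dir, $index_dir));
}
```

Keeping the argument list in one place makes it easier to pass the same source directory on every open, which autoindexing relies on.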
If you do not choose autoindexing, then you will want to load the database using
the bp_load_gff.pl command-line tool. For example:
bp_load_gff.pl -a berkeleydb -c -d /usr/local/share/gff/dmel dna1.fa dna2.fa features.gff
See Bio::DB::GFF for inherited methods
The various get_Stream_* methods and the features() method with the
-iterator argument only return an iterator after the query runs completely and
the module has been able to generate a temporary results file on disk. This
means that iteration is not as big a win as it is for the relational-database
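For reference, iterating over results looks roughly like the sketch below. It relies on the -types and -iterator arguments of features() and on the iterator's next_seq() method, as documented in Bio::DB::GFF; the count_genes() helper is hypothetical. With this adaptor the loop starts only after the whole query has been written to a temporary results file.

```perl
use strict;
use warnings;

# Hypothetical helper: walk an iterator of 'gene' features and count them.
# $db is expected to be a Bio::DB::GFF handle opened with this adaptor.
sub count_genes {
    my ($db) = @_;
    my $iterator = $db->features(-types => 'gene', -iterator => 1);
    my $count = 0;
    while (my $feature = $iterator->next_seq) {
        $count++;            # process $feature here
    }
    return $count;
}
```

Iteration still keeps memory use low on the Perl side, since features are handed over one at a time.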
Like the dbi::mysqlopt adaptor, this module uses a binning scheme to speed up
range-based searches. The binning scheme used here imposes a hard-coded 1
gigabase (1000 Mbase) limit on the size of the largest chromosome or other
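The general idea behind a binning scheme can be sketched as follows. This is an illustration of tiered binning under assumed parameters, not the module's actual code: the smallest bin width of 1000 bases, the factor of 10 between tiers, and the smallest_bin() function are all assumptions. Only the 1 gigabase ceiling comes from the text above.

```perl
use strict;
use warnings;

use constant MIN_BIN => 1_000;            # assumed smallest bin width
use constant MAX_BIN => 1_000_000_000;    # 1 gigabase ceiling

# Return (bin width, bin index) of the smallest bin that fully contains
# the range [$start, $stop]. Coordinates are assumed to be non-negative.
sub smallest_bin {
    my ($start, $stop) = @_;
    die "invalid range\n" if $start < 0 || $start > $stop;
    die "range exceeds the 1 gigabase limit\n" if $stop >= MAX_BIN;

    my $tier = MIN_BIN;
    while ($tier < MAX_BIN) {
        # Stop at the first tier where both ends fall into the same bin.
        last if int($start / $tier) == int($stop / $tier);
        $tier *= 10;
    }
    return ($tier, int($start / $tier));
}

# smallest_bin(5000, 5999) returns (1000, 5)
# smallest_bin(5000, 6000) returns (10000, 0)
```

A range query then only has to inspect the bins at each tier that overlap the requested interval, rather than scanning every feature. Because the widest tier is fixed, coordinates beyond it cannot be assigned a bin, which is where a hard size limit comes from.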
Vsevolod (Simon) Ilyushchenko <firstname.lastname@example.org>, Lincoln Stein
Copyright (c) 2005 Cold Spring Harbor Laboratory.
This library is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.
Title : _feature_by_name
Usage : $db->get_features_by_name($class,$name,$callback)
Function: get a list of features by name and class
Returns : number of features retrieved
Args : class of feature, name of feature, and a callback
Status : protected
This method is used internally. The callback arguments are those used by