dnaindex - index dna file for use with ANFO
dnaindex [ option ... ]
dnaindex builds an index for a DNA file. DNA files must be indexed to be
usable with anfo(1). It is possible to have multiple indices for the
same DNA file.
- -V, --version
- Print version number and exit.
- -o file, --output file
- Write output to file. file customarily ends
in .idx. Default is genomename_wordsize.idx.
- -g file, --genome file
- Read the genome from file. This file name is also
stored in the resulting index so it can be found automatically whenever
the index is used. It is therefore best if file is just a file name rather than a path.
- -G dir, --genome-dir dir
- Add dir to the genome search path. This is useful if
the genome to be indexed is not yet in the place where it will later reside.
- -d text, --description text
- Add text as description to the index. This is purely informational.
- -s size, --wordsize size
- Set the wordsize to size. A smaller wordsize
increases precision at the expense of higher computational cost. The
default is 12, which with a stride of 8 yields a good compromise.
- -S num, --stride num
- Set the stride to num. Only one out of num
possible words of dna is actually indexed. A smaller stride increases
precision at the expense of a bigger index. The default is 8, which in
conjunction with a wordsize of 12 yields a good compromise.
- -l lim, --limit lim
- Prevents the indexing of words that occur more often than
lim times. This can be used to ignore repetitive seeds and save the
space to store them. A good value depends on the size of the genome
being indexed; something like 500 works for the human genome with wordsize
12 and stride 8.
- -h, --histogram
- Produce a histogram of word frequencies. This can be used
to get an idea of the frequency distribution in order to select an appropriate
value for --limit.
- -v, --verbose
- Print a progress indicator during operation.
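The interplay of --wordsize, --stride, --limit and --histogram can be
illustrated with a toy model. The Python sketch below is not dnaindex's
implementation: it assumes words are sampled at every stride-th position
starting at 0, keeps positions in a plain dictionary instead of a compact
binary index, and uses the hypothetical helpers sample_words,
frequency_histogram and apply_limit.

```python
from collections import Counter, defaultdict

def sample_words(genome, wordsize=12, stride=8):
    """Map each sampled word to the list of positions where it starts.

    Assumption: words are taken at positions 0, stride, 2*stride, ...;
    dnaindex may choose its sample positions differently.
    """
    index = defaultdict(list)
    for pos in range(0, len(genome) - wordsize + 1, stride):
        index[genome[pos:pos + wordsize]].append(pos)
    return dict(index)

def frequency_histogram(index):
    """Occurrence count -> number of distinct words seen that many times."""
    return Counter(len(positions) for positions in index.values())

def apply_limit(index, limit):
    """Drop words that occur more than `limit` times (in the spirit of --limit)."""
    return {word: ps for word, ps in index.items() if len(ps) <= limit}

if __name__ == "__main__":
    genome = "ACGT" * 4 + "AAAATTTTCCCCGGGG"   # 32 bases, one repetitive stretch
    idx = sample_words(genome, wordsize=4, stride=4)
    print(frequency_histogram(idx))      # Counter({1: 4, 4: 1})
    print(sorted(apply_limit(idx, 3)))   # ['AAAA', 'CCCC', 'GGGG', 'TTTT']
```

The histogram is computed before the limit is applied, which mirrors the
intended workflow: inspect the frequency distribution first, then pick a
cutoff that discards only the highly repetitive words.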
dnaindex is limited to genomes no longer than 4 gigabases due to its use
of 32 bit indices. The index is quite large, so depending on the parameters, a 64
bit platform is needed for genomes in the gigabase range.
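The 4 gigabase limit is simple arithmetic: an unsigned 32 bit integer has
2^32 = 4,294,967,296 distinct values. The Python sketch below shows that
bound together with a rough size estimate. The estimate assumes a
hypothetical layout (one 32 bit position per sampled word plus one 32 bit
bucket offset per possible word); it is not dnaindex's documented format, and
the helper names are made up for illustration.

```python
MAX_POSITIONS = 2 ** 32  # distinct values of an unsigned 32-bit integer

def fits_32bit_positions(genome_length):
    """True if every position 0..genome_length-1 fits in an unsigned 32-bit index."""
    return genome_length <= MAX_POSITIONS

def rough_index_bytes(genome_length, wordsize=12, stride=8, bytes_per_entry=4):
    """Rough size under a HYPOTHETICAL layout: one entry per sampled position
    plus one bucket offset per possible word (4**wordsize buckets)."""
    sampled_positions = genome_length // stride
    buckets = 4 ** wordsize
    return (sampled_positions + buckets) * bytes_per_entry

if __name__ == "__main__":
    human = 3_100_000_000                  # approximate human genome length
    print(MAX_POSITIONS)                   # 4294967296, about 4.3 gigabases
    print(fits_32bit_positions(human))     # True
    print(rough_index_bytes(human))        # 1617108864, about 1.6 GB
```

Even under these optimistic assumptions, a human-sized genome with the
default wordsize and stride needs well over a gigabyte for the index alone.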
If a genome contains IUPAC ambiguity codes, the affected seeds need to be
expanded. If there are many ambiguity codes in a small region, that results in
an unacceptably large index.
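Why ambiguity codes inflate the index can be seen by counting expansions:
each IUPAC code stands for one to four bases, so a seed stands for the
product of those counts. The Python sketch below uses the standard IUPAC
nucleotide codes; the helpers expansion_count and expand are hypothetical
and say nothing about how dnaindex actually performs the expansion.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expansion_count(word):
    """Number of concrete words a seed expands to (KeyError on non-IUPAC input)."""
    return prod(len(IUPAC[c]) for c in word.upper())

def expand(word):
    """All concrete words represented by an ambiguous seed."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in word.upper()))]

if __name__ == "__main__":
    print(expand("ACR"))                     # ['ACA', 'ACG']
    print(expansion_count("ACGTNNNNACGT"))   # 256
    print(expansion_count("N" * 12))         # 16777216
```

A single 12-base seed with four Ns already stands for 4^4 = 256 words, and
every overlapping seed in a dense cluster of ambiguity codes is affected in
the same way, which is why such regions blow up the index.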
- Colon separated list of directories searched for genome files.
The system wide configuration file for
popt(3). dnaindex identifies itself as "dnaindex" to
popt(3).
Per user configuration file for popt(3).
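Looking up a genome in a colon separated list of directories can be
sketched as below. This is an illustration only: find_genome is a
hypothetical helper, it simply tries the directories from left to right, and
it treats an empty entry as the current directory (a PATH-like convention
assumed here). How dnaindex combines --genome-dir entries with the
environment variable is not specified by this page.

```python
import os

def find_genome(name, search_path):
    """Return the first existing `name` in the colon-separated `search_path`, or None.

    Hypothetical helper: directories are tried left to right, and an empty
    entry is taken to mean the current directory (an assumption).
    """
    for directory in search_path.split(":"):
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate):
            return candidate
    return None

if __name__ == "__main__":
    # Look for "genome.dna" (a made-up name) in two example directories.
    print(find_genome("genome.dna", "/data/genomes:/tmp"))
```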
Udo Stenzel <firstname.lastname@example.org>