BAITFILTER-V1.0.5(1) User Commands BAITFILTER-V1.0.5(1)

NAME

BaitFilter-v1.0.5 - manual page for BaitFilter-v1.0.5

DESCRIPTION

USAGE:
./BaitFilter-v1.0.5
-i <string> [-o <string>] [-c <string>] [-m <string>] [--blast-second-hit-evalue <floating point number>] [--blast-first-hit-evalue <floating point number>] [--blast-min-hit-coverage-of-baits-in-tiling-stack <floating point number>] [--ref-blast-db <string>] [--blast-extra-commandline <string>] [--blast-evalue-cutoff <floating point number>] [-B <string>] [-t <positive integer>] [--ID-prefix <string>] [--verbosity <unsigned integer>] [-S] [--] [--version] [-h]
Where:
-i <string>, --input-bait-file-name <string>
(required)
Name of the input bait locus file. This is the bait file obtained from the BaitFisher program.
-o <string>, --output-bait-file-name <string>
Name of the output bait file. This is the file that contains the filtered bait loci and the baits.
-c <string>, --convert <string>
Allows the user to produce the final output file for the bait-producing company. In this mode, BaitFilter reads the input bait file and, instead of filtering it, produces a custom bait file that can be uploaded to the bait-producing company. To avoid confusion, filtering and conversion cannot be done in the same run. If you want to filter a bait file and convert the output, you need to call this program twice: first to do the filtering, then to do the conversion. The only conversion parameter currently allowed is: "Agilent-Germany".
New output formats can be added upon request. Please contact the author: Christoph Mayer, Email: Mayer Christoph <c.mayer.zfmk@uni-bonn.de>
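Because filtering and conversion cannot share a run, a typical workflow calls the program twice. The Python sketch below only assembles the two argument lists and does not execute anything. The file names and the prefix "RunA_" are hypothetical placeholders, the mode "ab" is chosen purely for illustration, and passing -o in the conversion run is an assumption.

```python
def filter_then_convert(executable, raw, filtered, converted, prefix):
    """Return two command lines: a filtering run, then a conversion run."""
    # Run 1: filter only (-m), no conversion.
    filter_cmd = [executable, "-i", raw, "-o", filtered, "-m", "ab"]
    # Run 2: convert the filtered file (-c), no filter mode.
    # Assumption: the converted file is written to the name given with -o.
    convert_cmd = [executable, "-i", filtered, "-o", converted,
                   "-c", "Agilent-Germany", "--ID-prefix", prefix]
    return filter_cmd, convert_cmd


filter_cmd, convert_cmd = filter_then_convert(
    "./BaitFilter-v1.0.5",      # executable name as shown in USAGE
    "baits.txt",                # hypothetical BaitFisher output
    "baits-filtered.txt",       # hypothetical intermediate file
    "baits-agilent.txt",        # hypothetical final file
    "RunA_",                    # hypothetical ProbeID prefix
)
print(" ".join(filter_cmd))
print(" ".join(convert_cmd))
```

The output of the first run is the input of the second, so the two commands have to be executed in this order.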
-m <string>, --mode <string>
Apart from the input file, this is the most essential option. It specifies which filter mode BaitFilter uses (see the user manual for more details):
"ab":
Retain only the best bait locus for each alignment file (e.g. gene) when using the optimality criterion to minimize the total number of required baits.
"as":
Retain only the best bait locus for each alignment file (e.g. gene) when using the optimality criterion to maximize the number of sequences the result is based on.
"fb":
Retain only the best bait locus for each feature (e.g. CDS) when using the optimality criterion to minimize the total number of required baits. Only applicable if alignment cutting has been used in BaitFisher.
"fs":
Retain only the best bait locus for each feature (e.g. CDS) when using the optimality criterion to maximize the number of sequences the result is based on. Only applicable if alignment cutting has been used in BaitFisher.
"blast-a": Remove all bait loci of ALIGNMENTs for which one or more baits have multiple good hits to a reference genome.
"blast-f": Remove all bait loci of FEATUREs for which one or more baits have multiple good hits to a reference genome.
"blast-l": Remove bait LOCI that contain a bait that has multiple good hits to a reference genome.
"thin-b":
Thin out a bait file to every Nth bait region, by finding
the start position that minimizes the number of baits.
"thin-s":
Thin out a bait file to every Nth bait region, by finding
the start position that maximizes the number of sequences.
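The "ab" and "as" criteria can be illustrated with a minimal Python sketch. It is not BaitFilter's implementation: the record layout (alignment, start, num_baits, num_sequences) is hypothetical, and ties are resolved here by keeping the first locus encountered, which is an assumption.

```python
from collections import defaultdict


def best_locus_per_alignment(loci, mode):
    """Keep one bait locus per alignment file according to mode 'ab' or 'as'."""
    groups = defaultdict(list)
    for locus in loci:
        groups[locus["alignment"]].append(locus)

    if mode == "ab":      # minimize the number of required baits
        return {name: min(group, key=lambda locus: locus["num_baits"])
                for name, group in groups.items()}
    if mode == "as":      # maximize the number of sequences
        return {name: max(group, key=lambda locus: locus["num_sequences"])
                for name, group in groups.items()}
    raise ValueError("this sketch only covers the modes 'ab' and 'as'")


# Hypothetical bait loci for two alignment files.
loci = [
    {"alignment": "geneA", "start": 1, "num_baits": 12, "num_sequences": 8},
    {"alignment": "geneA", "start": 61, "num_baits": 9, "num_sequences": 7},
    {"alignment": "geneB", "start": 1, "num_baits": 5, "num_sequences": 4},
]
fewest_baits = best_locus_per_alignment(loci, "ab")
most_sequences = best_locus_per_alignment(loci, "as")
print(fewest_baits["geneA"]["start"], most_sequences["geneA"]["start"])
```

The "fb" and "fs" modes apply the same two criteria per feature instead of per alignment file, so the same sketch would apply with a different grouping key.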
--blast-second-hit-evalue <floating point number>
Maximum E-value for the second hit. A bait is considered to bind ambiguously if it has two good hits. This option sets the E-value threshold for the second hit.
--blast-first-hit-evalue <floating point number>
Maximum E-value for the first hit. A bait is considered to bind ambiguously if it has two good hits. This option sets the E-value threshold for the first hit.
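The interplay of the two thresholds can be sketched as follows. This is only an illustration of the rule as described here: the function name, the inclusive comparison (<=) and the example threshold values are assumptions, not BaitFilter's actual code or defaults.

```python
def binds_ambiguously(hit_evalues, first_hit_max, second_hit_max):
    """Return True if the two best hits of a bait both count as good hits."""
    ranked = sorted(hit_evalues)          # smallest E-value = best hit
    if len(ranked) < 2:
        return False                      # a single hit cannot be ambiguous
    # Assumption: a hit is "good" if its E-value is <= the threshold.
    return ranked[0] <= first_hit_max and ranked[1] <= second_hit_max


# Hypothetical thresholds and E-values for three baits.
two_good_hits = binds_ambiguously([1e-30, 1e-12], 1e-20, 1e-10)
weak_second_hit = binds_ambiguously([1e-30, 1e-3], 1e-20, 1e-10)
single_hit = binds_ambiguously([1e-30], 1e-20, 1e-10)
print(two_good_hits, weak_second_hit, single_hit)
```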
--blast-min-hit-coverage-of-baits-in-tiling-stack <floating point number>
Specifies the minimum hit coverage of the primary hit that at least one bait in each tiling stack must have. Otherwise the bait region is discarded. If not specified, no hit coverage is checked. This parameter can only be used in conjunction with other filters. Since the order in which the coverage filter and the other filters are applied matters, the order is defined as follows: For the mode options ab, as, fb, fs, the coverage is checked before determining the optimal bait region. For the mode options blast-a, blast-f, blast-l, the hit coverage is checked after filtering for baits with multiple good hits to the reference genome.
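A minimal Python sketch of the coverage rule: a bait region is kept only if every tiling stack contains at least one bait whose primary hit reaches the minimum coverage. The data layout (one list of coverage values per tiling stack) and the use of fractions between 0 and 1 are assumptions made for illustration.

```python
def region_passes_coverage(tiling_stacks, min_coverage):
    """True if each tiling stack has at least one bait with enough coverage."""
    return all(
        any(coverage >= min_coverage for coverage in stack)
        for stack in tiling_stacks
    )


# Hypothetical primary-hit coverages, one inner list per tiling stack.
kept = region_passes_coverage([[0.95, 0.40], [0.80]], 0.75)
discarded = region_passes_coverage([[0.95], [0.50, 0.60]], 0.75)
print(kept, discarded)
```

In the second call the second tiling stack has no bait at or above 0.75, so the whole region would be discarded.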
--ref-blast-db <string>
Base name of a BLAST database. This name is passed to the blast command. It is the name of the FASTA file of your reference genome. IMPORTANT: The makeblastdb program has to be run before starting BaitFilter. makeblastdb takes the FASTA file and creates the database files from it.
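The required order of the two steps can be sketched in Python by assembling both command lines without running them. The makeblastdb options -in and -dbtype nucl are standard BLAST+ options. The file names are hypothetical, and the mode "blast-l" is chosen only as an example.

```python
def blast_filter_commands(executable, bait_file, out_file, genome_fasta):
    """Return the makeblastdb call and the subsequent BaitFilter call."""
    # Step 1: build a nucleotide BLAST database from the reference genome.
    # Without -out, the database base name equals the FASTA file name.
    make_db = ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl"]
    # Step 2: run a blast filter mode against that database.
    run_filter = [executable, "-i", bait_file, "-o", out_file,
                  "-m", "blast-l", "--ref-blast-db", genome_fasta]
    return make_db, run_filter


make_db, run_filter = blast_filter_commands(
    "./BaitFilter-v1.0.5", "baits.txt", "baits-unique.txt", "genome.fasta")
print(" ".join(make_db))
print(" ".join(run_filter))
```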
--blast-extra-commandline <string>
When invoking the blast command, extra command line parameters can be passed to it. For example, this option makes it possible to specify the number of threads the blast command should use.
--blast-evalue-cutoff <floating point number>
When invoking the blast command, an E-value cutoff of twice the --blast-first-hit-evalue is used by default. This should guarantee that all hits necessary for the blast filter are found. However, for the coverage filtering a smaller threshold might be necessary. This can be specified here.
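The default rule amounts to a one-line computation, sketched here in Python. The function and parameter names are hypothetical, and the E-values are example numbers rather than program defaults.

```python
import math


def effective_blast_cutoff(first_hit_evalue, user_cutoff=None):
    """Cutoff passed to blast: the user value if given, else twice the first-hit threshold."""
    if user_cutoff is not None:
        return user_cutoff
    return 2.0 * first_hit_evalue


default_cutoff = effective_blast_cutoff(1e-6)
explicit_cutoff = effective_blast_cutoff(1e-6, user_cutoff=1e-8)
print(default_cutoff, explicit_cutoff)
```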
-B <string>, --blast-executable <string>
Name of or path+name to the blast executable. Default: blastn. Minimum version number: Blast+ 2.2.x
-t <positive integer>, --thinning-step-width <positive integer>
Thin out the bait file by retaining only every Nth bait region. This option specifies the step width N. If one of the modes thin-b, thin-s is active, this option is required; otherwise it is not allowed to set this parameter.
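The thinning idea can be sketched in Python: for a step width N there are N possible start offsets, and the one with the best total score is kept. This is an illustration under assumptions, not BaitFilter's algorithm. The record fields are hypothetical, and how BaitFilter groups bait regions before thinning is not described here.

```python
def thin_regions(regions, step, mode):
    """Keep every step-th region, choosing the start offset by mode."""
    if step < 1:
        raise ValueError("step width must be a positive integer")
    # One candidate selection per possible start offset.
    candidates = [regions[offset::step]
                  for offset in range(min(step, len(regions)))]
    if not candidates:
        return []
    if mode == "thin-b":   # minimize the total number of baits
        return min(candidates,
                   key=lambda kept: sum(r["num_baits"] for r in kept))
    if mode == "thin-s":   # maximize the total number of sequences
        return max(candidates,
                   key=lambda kept: sum(r["num_sequences"] for r in kept))
    raise ValueError("mode must be 'thin-b' or 'thin-s'")


# Six hypothetical consecutive bait regions.
baits = [5, 3, 4, 6, 2, 7]
sequences = [4, 6, 5, 7, 3, 6]
regions = [{"index": i, "num_baits": b, "num_sequences": s}
           for i, (b, s) in enumerate(zip(baits, sequences))]

kept_b = [r["index"] for r in thin_regions(regions, 2, "thin-b")]
kept_s = [r["index"] for r in thin_regions(regions, 2, "thin-s")]
print(kept_b, kept_s)
```

With N = 2, offset 0 needs 11 baits and offset 1 needs 16, so thin-b keeps regions 0, 2, 4. The sequence totals are 12 and 19, so thin-s keeps regions 1, 3, 5.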
--ID-prefix <string>
In the conversion mode Agilent-Germany, each converted file should get a unique ProbeID prefix, since ProbeIDs are not allowed to be identical even across multiple files. With this option the user can specify a prefix string for all probe IDs in this file.
--verbosity <unsigned integer>
The verbosity option controls the amount of information BaitFilter writes to the console while running. 0: report only error messages that lead to exiting the program. 1: also report warnings. 2: also report progress. 3: report more detailed progress. >10: debug output. 10000: all possible diagnostic output.
-S, --stats
Compute stats for the input file and report them. Stats are computed automatically in all modes specified with -m and in the conversion mode specified with -c. The purpose of the -S option is to compute stats without having to filter or convert the input file. In particular, the -S mode does not require specifying an output file.
This option has no effect if combined with the -m or -c modes.
--, --ignore_rest
Ignores the rest of the labeled arguments following this flag.
--version
Displays version information and exits.
-h, --help
Displays usage information and exits.
This program can be used to produce the final output file for creating baits, or it can be used to filter bait loci obtained from the BaitFisher program according to different criteria. The bait file produced by BaitFisher contains a tiling design for each possible starting position. The purpose of BaitFilter is to determine the optimal bait region for each target alignment/gene/feature. As input it requires the bait file generated by the BaitFisher program or a bait file generated by a previous filtering run of BaitFilter. This bait file is specified with the -i command line parameter. Furthermore, the user has to specify an output file name with the -o parameter and a filter mode with the -m parameter.
To convert a file to a custom output format, see the -c option.
To compute stats of an input file, see the -S option.
The different filter modes provided by BaitFilter are the following:
1a) Retain only the best bait locus per alignment file. Criterion: Minimize number of required baits.
1b) Retain only the best bait locus per alignment file. Criterion: Maximize number of sequences.
2a) Retain only the best bait locus per feature (requires that features were selected in BaitFisher). Criterion: Minimize number of required baits.
2b) Retain only the best bait locus per feature (requires that features were selected in BaitFisher). Criterion: Maximize number of sequences.
3) Use a blast search of the bait sequences against a reference genome to detect putative non-unique target loci. Non-unique target sites will have multiple good hits against the reference genome. Furthermore, a minimum coverage of the best blast hit of each bait sequence against the genome can be specified. Note that all blast modes require additional command line parameters! These modes remove bait regions for which multiple blast hits were found. Different versions of this mode are available:
3a) If a single bait is not unique, remove all bait regions from the current gene.
3b) If a single bait is not unique, remove all bait regions from the current feature (if applicable).
3c) If a single bait is not unique, remove only the bait region that contains this bait.
4) Thin out the given bait file: Retain only every Nth bait region, where N has to be specified by the user. Two submodes are available:
4a) Thin out bait regions by retaining only every Nth bait region in a bait file. The starting offset will be chosen such that the number of required baits is minimized.
4b) Thin out bait regions by retaining only every Nth bait region in a bait file. The starting offset will be chosen such that the number of sequences the result is based on is maximized.
./BaitFilter-v1.0.5 version: 1.0.5

SEE ALSO

The full documentation for BaitFilter-v1.0.5 is maintained as a Texinfo manual. If the info and BaitFilter-v1.0.5 programs are properly installed at your site, the command
info BaitFilter-v1.0.5
should give you access to the complete manual.
September 2017 BaitFilter-v1.0.5