cd-hit-para.pl - divide a big clustering job into pieces to run cd-hit or cd-hit-est jobs in parallel
This script divides a big clustering job into pieces and submits the
jobs to remote computers over a network to run them in parallel. After
all the jobs have finished, the script merges the clustering results as
if you had run a single cd-hit or cd-hit-est.
You can also use it to divide big jobs on a single computer
if your computer does not have enough RAM (with the --L option).
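A minimal sketch of the two modes described above. The filenames db.fasta,
hosts.txt, and the identity cutoff "-c 0.9" are hypothetical examples, not
defaults of this script; the commands are printed rather than executed, since
the program, input, and host list must exist first.

```shell
# ssh mode: distribute pieces to the hosts listed in hosts.txt
cmd_ssh="./cd-hit-para.pl -i db.fasta -o db90 -P cd-hit --B hosts.txt -c 0.9"

# local mode: run one piece at a time on this machine to limit RAM use
cmd_local="./cd-hit-para.pl -i db.fasta -o db90 --L 1 --S 64 -c 0.9"

# Print the commands instead of running them
echo "$cmd_ssh"
echo "$cmd_local"
```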
Requirements:
  1. When running this script over a network, the directory where you
     run the script and the input files must be available on all the
     remote hosts with an identical path.
  2. If you choose "ssh" to submit jobs, you must have passwordless ssh
     access to every remote host; see the ssh manual for how to set up
     passwordless ssh.
  3. I suggest using a queuing system instead of ssh; I currently
     support PBS and SGE.
  4. cd-hit, cd-hit-2d, cd-hit-est, cd-hit-est-2d, cd-hit-div and
     cd-hit-div.pl must be in the same directory as this script.
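For ssh mode, the host list is a plain text file. The example below is
hypothetical; the format (one host name per line, with a host repeated to
run more than one job on it at a time) is my reading of the cd-hit user's
guide and should be checked against your installed version.

```
node1
node1
node2
node3
```

With this list, node1 would receive up to two jobs at a time while node2
and node3 receive one each.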
Options:
  -i   input filename in fasta format, required
  -o   output filename, required
  --P  program, "cd-hit" or "cd-hit-est", default "cd-hit"
  --B  filename of the list of hosts,
       required unless the --Q or --L option is used
  --L  number of cpus on the local computer, default 0
       when you are not running it over a cluster, you can use this
       option to divide a big clustering job into small pieces; I
       suggest you just use "--L 1" unless you have enough RAM for
       each cpu
  --S  number of segments to split the input DB into, default 64
  --Q  number of jobs to submit to the queuing system, default 0
       by default, the program uses ssh mode to submit remote jobs
  --T  type of queuing system; "PBS" and "SGE" are supported,
       default "PBS"
  --R  restart file, used after a crash of a run
  -h   print this help
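A sketch of queue-mode and restart invocations, under the same assumptions
as above: filenames and the "-c 0.9" cutoff are placeholders, and the
restart file name db90.restart stands in for whatever file your first run
actually wrote. The commands are printed, not executed.

```shell
# SGE mode: keep up to 20 jobs in the queue instead of using ssh
cmd_sge="./cd-hit-para.pl -i db.fasta -o db90 --Q 20 --T SGE --S 64 -c 0.9"

# After a crash, resume by adding --R with the restart file from that run
cmd_restart="$cmd_sge --R db90.restart"

echo "$cmd_sge"
echo "$cmd_restart"
```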
More cd-hit/cd-hit-est options can be specified on the command line.

Questions or bugs: contact Weizhong Li at email@example.com