gmt music path-scan - Find signifcantly mutated pathways in a cohort given a
list of somatic mutations.
This document describes gmt music path-scan version 0.04 (2016-01-01 at
gmt music path-scan --gene-covg-dir=? --bam-list=? --pathway-file=? --maf-file=?
--output-file=? [--bmr=?] [--genes-to-ignore=?] [--min-mut-genes-per-path=?]
... music path-scan \
--bam-list input_dir/bam_file_list \
--gene-covg-dir output_dir/gene_covgs/ \
--maf-file input_dir/myMAF.tsv \
--output-file output_dir/sm_pathways \
--pathway-file input_dir/pathway_dbs/KEGG.txt \
- gene-covg-dir Text
- Directory containing per-gene coverage files (Created using
music bmr calc-covg)
- bam-list Text
- Tab delimited list of BAM files [sample_name, normal_bam,
tumor_bam] (See Description)
- pathway-file Text
- Tab-delimited file of pathway information (See
- maf-file Text
- List of mutations using TCGA MAF specifications v2.3
- output-file Text
- Output file that will list the significant pathways and
- bmr Number
- Background mutation rate in the targeted regions
Default value '1e-06' if not specified
- genes-to-ignore Text
- Comma-delimited list of genes whose mutations should be
- min-mut-genes-per-path Number
- Pathways with fewer mutated genes than this, will be
Default value '1' if not specified
- skip-non-coding Boolean
- Skip non-coding mutations from the provided MAF file
Default value 'true' if not specified
- noskip-non-coding Boolean
- Make skip-non-coding 'false'
- skip-silent Boolean
- Skip silent mutations from the provided MAF file
Default value 'true' if not specified
- noskip-silent Boolean
- Make skip-silent 'false'
Only the following four columns in the MAF are used. All other columns may be
Col 1: Hugo_Symbol (Need not be HUGO, but must match gene names used in the pathway file)
Col 2: Entrez_Gene_Id (Matching Entrez ID trump gene name matches between pathway file and MAF)
Col 9: Variant_Classification
Col 16: Tumor_Sample_Barcode (Must match the name in sample-list, or contain it as a substring)
The Entrez_Gene_Id can also be left blank (or set to 0), but it is highly
recommended, in case genes are named differently in the pathway file and the
- This is a tab-delimited file prepared from a pathway
database (such as KEGG), with the columns: [path_id, path_name, class,
gene_line, diseases, drugs, description] The latter three columns are
optional (but are available on KEGG). The gene_line contains the
"entrez_id:gene_name" of all genes involved in this pathway, each
separated by a "|" symbol.
- For example, a line in the pathway-file would look like:
hsa00061 Fatty acid biosynthesis Lipid Metabolism 31:ACACA|32:ACACB|27349:MCAT|2194:FASN|54995:OXSM|55301:OLAH
Ensure that the gene names and entrez IDs used match those used in the MAF
file. Entrez IDs are not mandatory (use a 0 if Entrez ID unknown). But if
a gene name in the MAF does not match any gene name in this file, the
entrez IDs are used to find a match (unless it's a 0).
- This is usually the gene_covgs subdirectory created when
you run "music bmr calc-covg". It should contain files for each
sample that report per-gene covered base counts.
- Provide a file containing sample names and normal/tumor BAM
locations for each. Use the tab- delimited format [sample_name normal_bam
tumor_bam] per line. This tool only needs sample_name, so all other columns
can be skipped. The sample_name must be the same as the tumor sample names
used in the MAF file (16th column, with the header
- The overall background mutation rate. This can be
calculated using "music bmr calc-bmr".
- A comma-delimited list of genes to ignore from the MAF
file. This is useful when there are recurrently mutated genes like TP53
which might mask the significance of other genes.
Michael Wendl, Ph.D.
This module uses reformatted copies of data from the Kyoto Encyclopedia of Genes
and Genomes (KEGG) database:
* KEGG - http://www.genome.jp/kegg/