最近在捣鼓宏基因组数据分类注释工作,一直在倒腾这个软件,中文资料很少,可能大家都觉得太好用了……确实也不难,个人感觉就是参数设置要多花时间研究研究。
MEGAN的自我介绍
Pasted from: http://ab.inf.uni-tuebingen.de/software/megan6/ MEGAN不仅可以处理宏基因组数据还可以处理宏转录组、宏蛋白质组以及扩增子测序产生的数据。 序列比较是整个一个计算瓶颈。序列比较是整个一个计算瓶颈。
MEGAN的安装
# 共享版
wget -c http://ab.inf.uni-tuebingen.de/data/software/megan6/download/MEGAN_Community_unix_6_5_10.sh
# 旗舰版(免费30天)
wget -c http://ab.inf.uni-tuebingen.de/data/software/megan6/download/MEGAN_Ultimate_unix_6_5_10.sh
chmod +x MEGAN_Community_unix_6_5_10.sh
# New feature: The Ultimate Edition comes with the latest KEGG classification and pathway files. Computomics GmbH is a licensed reseller of KEGG.
# 打开Xming,以图形界面的方式进行安装;
# 下载比对文件
# 在megan/class/resources/files路径下安装以下文件
wget -c http://www-ab2.informatik.uni-tuebingen.de/megan/taxonomy/ncbi.zip
unzip ncbi.zip
rm ncbi.zip
mkdir MEGAN_resource
cd MEGAN-resource
wget -c http://ab.inf.uni-tuebingen.de/data/software/megan6/download/prot_acc2tax-Aug2016.abin.zip
wget -c http://ab.inf.uni-tuebingen.de/data/software/megan6/download/nucl_acc2tax-Aug2016.abin.zip
wget -c http://ab.inf.uni-tuebingen.de/data/software/megan6/download/prot_gi2tax-Aug2016X.bin.zip
wget -c http://ab.inf.uni-tuebingen.de/data/software/megan6/download/nucl_gi2tax-Aug2016.bin.zip
wget -c http://ab.inf.uni-tuebingen.de/data/software/megan6/download/gi2eggnog-June2016X.bin.zip
wget -c http://ab.inf.uni-tuebingen.de/data/software/megan6/download/acc2eggnog-June2016X.abin.zip
wget -c http://ab.inf.uni-tuebingen.de/data/software/megan6/download/gi2interpro-June2016X.bin.zip
wget -c http://ab.inf.uni-tuebingen.de/data/software/megan6/download/acc2interpro-June2016X.abin.zip
wget -c http://ab.inf.uni-tuebingen.de/data/software/megan6/download/gi2seed-May2015X.bin.zip
wget -c http://ab.inf.uni-tuebingen.de/data/software/megan6/download/gi2kegg-Aug2016X-ue.bin.zip
# prot_acc2tax-Aug2016.abin.zip Protein accession to NCBI-taxonomy mapping file. Must be unzipped before use.
# nucl_acc2tax-Aug2016.abin.zip Nucleotide accession to NCBI-taxonomy mapping file. Must be unzipped before use.
# prot_gi2tax-Aug2016X.bin.zip GI to NCBI-taxonomy mapping file. Must be unzipped before use. Please note that NCBI is discontinuing the use of GI numbers.
# nucl_gi2tax-Aug2016.bin.zip GI to NCBI-taxonomy mapping file. Must be unzipped before use. Please note that NCBI is discontinuing the use of GI numbers.
# gi2eggnog-June2016X.bin.zip GI to eggNOG mapping file. Must be unzipped before use.
# acc2eggnog-June2016X.abin.zip Accession to eggNOG mapping file. Must be unzipped before use.
# gi2interpro-June2016X.bin.zip GI to InterPro mapping file. Must be unzipped before use.
# acc2interpro-June2016X.abin.zip Accession to InterPro mapping file. Must be unzipped before use.
# gi2seed-May2015X.bin.zip GI to SEED mapping file. Must be unzipped before use.
# Additional mapping files for the Ultimate Edition:
# gi2kegg-Aug2016X-ue.bin.zip GI to KEGG mapping file. Only for use with the Ultimate Edition of MEGAN. Must be unzipped before use.
LCA parameters设置问题
刚开始我采用默认参数,发现会分到非常非常多物种,大概接近800种,显然不科学。后来我在官方论坛上看了一些关于LCA参数设置的讨论,尝试改变各种参数,分类注释的结果又明显的改变。
The min support and min support percent filters are tricky to use. I would suggest to use min support percent 0.1%, but this also depends on your dataset and the type of question that you are asking. Filtering using min support reduces the spread of taxa that you will see, but of course if your goal is to search for specific low abundance taxa, then you should set this value to something much smaller. Whether you need to use the min complexity filter or not will depend on what tool you used to compute the initial alignments. Blast or DIAMOND do there own complexity filtering, so in these cases using MEGAN’s filter may not be necessary, but on the other hand, it won’t do any harm (except that it will make processing of files take longer). So, in both cases, no 100% straight answer, I’m sorry… 注:min support和min supports percent使用很微妙。建议min support and min support percent设置为0.1%,但是还是要看你的数据集和要解决问题的类型。设置min support去过滤会很大程度上减少分类物种的分散,但是如果你想要探索低丰度的分类,你可能就需要把这个值设置得更小些。
来源:http://megan.informatik.uni-tuebingen.de/t/default-lca-params-different-from-manual/43
In version 5, we had implemented(应用) the LCA of X%. Using that algorithm, a read was placed on the lowest node (“LCA node”) that was above X% of all significant matches for that read. Version 6 does not contain that algorithm. Instead, version 6 contains an algorithm that we believe produces more accurate results, the “weighted LCA”, wLCA. The wLCA first assigns a weight to each reference sequence, which is given by the number of reads that align only to that reference sequence, or to reference sequences that belong to the same taxonomic species. Then, the wLCA places each read above 75% of the total weight of all the references to which it has a significant alignment. Could you please try using the weighted LCA algorithm to see whether this improves over the results obtained by MEGAN 5. I could look into also supporting the old LCA of X% in MEGAN 6, but I am worried that having two similar, but providing multiple variations of LCA will be confusing for most users.
注:wLCA首先会给每一个参考序列赋值权重,给出比对到参考序列的reads数目,或参考序列属于同一分类物种。然后,wLCA会将大于75%的每一条read视为显著性比对。
来源:http://megan.informatik.uni-tuebingen.de/t/help-with-the-lca-parameters/88
Dear Carmen, could you please try using the wLCA with a threshold of 50 %, to see whether you get the same results as with the LCA of 50%? The percent threshold used by the wLCA is currently not exposed directly in the user interface. However, you can access it in the Ultimate Edition of MEGAN; Use the Window-> Command Input menu item to enter the following command: setprop WeightedLCAPercent=50; If changing the percent does prove useful, then I will add it to the user interface.
来源:http://megan.informatik.uni-tuebingen.de/t/help-with-the-lca-parameters/88/3
I discovered another subtle bug in the assignment algorithm, which I have now fixed. The analysis of your data now looks absolutely sensible. Please update to 6.4.3 Here are the parameters that I used: minSupportPercent=0.3 minSupport=1 minScore=170.0 maxExpected=0.0 minPercent Identity=0.0 topPercent=3.0 weightedLCA=true weightedLCAPercent=50.0 minComplexity=-1.0 pairedReads=false useIdentityFilter=true;
来源:http://megan.informatik.uni-tuebingen.de/t/help-with-the-lca-parameters/88/15
参考资料
官方论坛:http://megan.informatik.uni-tuebingen.de/ 说明文档:http://ab.inf.uni-tuebingen.de/data/software/megan6/download/manual.pdf
软件使用视频
下面两个视频是我谷歌搜索到的,感觉有一定的指导作用,转移到墙内了。 MEGAN2使用说明 MEGAN6使用例子