Benchmarking Fast Homology Search Softwares Based on Model Organisms

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Issue:
2022, Issue 02
Page:
44-51
Research Field:
Biology
Publishing date:

Info

Title:
Benchmarking Fast Homology Search Softwares Based on Model Organisms
Author(s):
Wang Yinwei, Wu Jingjing, Zhang Chenning, Hua Yijia, Li Peng, Yan Jie
(School of Life Sciences, Nanjing Normal University, Nanjing 210023, China)
Keywords:
homology search; orthology inference; RBH; fast algorithms; sequence comparison
PACS:
Q33
DOI:
10.3969/j.issn.1001-4616.2022.02.006
Abstract:
Blastp in the traditional BLAST+ package is extremely slow in the era of big data. Homology search software has made great progress over the past decade or so, but comprehensive evaluations are scarce. In this study, seven fast homology search tools were compared comprehensively with blastp. The fast mode of diamond was generally faster than the others and had the lowest false discovery rate. In memory consumption, MMseqs2 was the lowest and ghostx the highest. In terms of the number of identified hits, the s7.5 mode of MMseqs2 found the most hits at medium Genomic Similarity Scores (GSS) apart from blastp, but the s5 mode should be a better choice. As GSS decreases, ghostx obtains the most results, while ublast obtains the most results as GSS increases. In terms of the number of identified Reciprocal Best Hits (RBH), ghostx has an advantage in remote search, and this advantage is also supported by synteny evidence. In terms of homology search, the results of almost all tools overlap heavily; the exception is ghostx, which has 43.4% additional results and the highest false discovery rate, while the s3 mode of MMseqs2 has the lowest. Overall, MMseqs2, diamond and ghostx are the three best alternatives to blastp. Diamond is well suited for orthology inference and can search accurately and quickly in “fast” mode, and “very” is the best search mode on balance; for remote search, however, ghostx is more advantageous, while for identification of homologous proteins at medium GSS, the s5 mode of MMseqs2 may be a good choice.
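
The abstract compares tools by the number of Reciprocal Best Hits (RBH) they identify. As an illustration only, the following Python sketch shows one common way to derive RBH pairs from two sets of search hits (proteome A searched against B, and B against A). It is not the authors' pipeline: the `Hit` tuple layout, the function names, and the choice of bit score as the ranking criterion are assumptions made for this example, and ties are broken simply by keeping the first hit seen.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record: (query_id, subject_id, bit_score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first hit wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Toy data, invented for the example.
    a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0),
              ("a2", "b2", 90.0), ("a3", "b1", 80.0)]
    b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 140.0), ("b2", "a2", 95.0)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In practice the hit tuples would be parsed from each tool's tabular output, and a study may apply additional filters (for example E-value or coverage thresholds) before selecting best hits; those details are not given in the abstract.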
