gene ontology prediction


The collected GO annotations are still quite incomplete, imbalanced, and rather shallow (Rhee et al., 2008; Thomas et al., 2012; Dessimoz and Škunca, 2017).

First, the GO annotations of genes are still incomplete, shallow, imbalanced across species, and even noisy (Thomas et al., 2012; Dessimoz and Škunca, 2017). NETL assumed a gene is a document and all terms affiliated with that gene are words of that document; then it used a Latent Dirichlet Allocation topic model (Blei et al., 2003) to select negative examples. To solve these problems, Zhao et al. Figure 1. (2013b, 2015a) further selectively fused multiple functional networks for gene function prediction. First, we introduce the conventions of GO and the widely adopted evaluation metrics for gene function prediction.


RankingLoss evaluates the average fraction of GO-term pairs that are incorrectly ranked.

As more evidence of gene functions is accumulated from experiments, the gene function prediction solutions will become more competent. Section 5 concludes the survey.

Here, IC(t) is the information content of the term t, which estimates a term's specificity by its frequency of annotation to genes (Lin, 1998).

Buza (2008) estimated the annotation quality with respect to terms in BPO via a rank of evidence codes.
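
The two quantities mentioned above, RankingLoss and IC(t), can be illustrated with a minimal Python sketch. It is not the survey's reference implementation: the function names, the list-of-lists score layout, the choice to count ties as ranking errors, and the estimate IC(t) = -log p(t) with p(t) taken as the fraction of genes annotated with t are all assumptions made for illustration.

```python
import math


def ranking_loss(scores, truth):
    """Average fraction of (annotated, unannotated) term pairs ranked incorrectly.

    scores: per-gene lists of predicted scores, one score per GO term.
    truth:  per-gene lists of 0/1 labels with the same shape as scores.
    Ties are counted as incorrect (an assumption of this sketch).
    """
    per_gene = []
    for s_row, y_row in zip(scores, truth):
        pos = [s for s, y in zip(s_row, y_row) if y == 1]
        neg = [s for s, y in zip(s_row, y_row) if y == 0]
        if not pos or not neg:
            continue  # this gene has no term pairs to compare
        wrong = sum(1 for p in pos for n in neg if p <= n)
        per_gene.append(wrong / (len(pos) * len(neg)))
    return sum(per_gene) / len(per_gene) if per_gene else 0.0


def information_content(term, annotations):
    """IC(t) = -log p(t), with p(t) the fraction of genes annotated with t.

    annotations: dict mapping a gene id to the set of its GO terms.
    """
    count = sum(1 for terms in annotations.values() if term in terms)
    if count == 0:
        raise ValueError("term is not annotated to any gene")
    return -math.log(count / len(annotations))
```

A term annotated to every gene gets IC 0, while rarer (more specific) terms get larger values, which matches the intuition that specificity grows as annotation frequency falls.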


GO now includes more than 45,000 GO terms, and most GO annotations of genes are sparse and incomplete. ITSS (Tao et al., 2007), dRW (Yu et al., 2015d), HashGO (Yu et al., 2017e), HPHash (Zhao et al., 2019a), and NMFGO (Yu et al., 2020b) are some representative methods introduced in sections 3.1.2 and 3.2.2.

To address the last issue, some efforts have been made toward compressing these terms before measuring the semantic similarity (Done et al., 2010; Yu et al., 2017e, 2020b; Zhao et al., 2019a); these were reviewed in previous subsections.

Measures of the similarity between genes can be extended from taxonomic similarity measures between GO terms.

But this inter-species method only considered a small number of GO terms.

To bridge this gap, we review the existing methods with an emphasis on recent solutions.

Three issues in gene function prediction (left), and categorization of existing computational solutions based on GO (right).

Clark and Radivojac (2011) investigated the quality of NAS and IEA annotations, and found that IEA annotations were much more reliable than NAS ones in the MFO branch.

In fact, gene function prediction relies on the known positive and negative annotations of a gene, but conventionally only the positive annotations of genes are reported and, thus, recorded in GO.


As such, predicting the associations between genes and massive terms is rather difficult. More advanced integrative solutions must integrate these heterogeneous biological data and the GO knowledge more effectively.

Here, par(t) denotes the parent term of term t, gpar(t) is the grandparent term of t, and uncle(t) is the uncle (parent's sibling) term of t. p(t|par(t)) is the conditional probability that a gene is annotated with t given that this gene is already annotated with par(t).
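
The family relations and the conditional probability p(t|par(t)) described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the hierarchy is given as a hypothetical `is_a` dictionary mapping each term to the set of its direct parents (a term may have several), annotations are assumed to be already propagated to ancestor terms, and all function names are made up for this example.

```python
def parents(term, is_a):
    """Direct parents par(t) of a term."""
    return set(is_a.get(term, ()))


def grandparents(term, is_a):
    """Grandparents gpar(t): parents of the parents."""
    return {g for p in parents(term, is_a) for g in parents(p, is_a)}


def uncles(term, is_a):
    """Uncles uncle(t): children of a grandparent that are not parents of t."""
    gps = grandparents(term, is_a)
    siblings_of_parents = {c for c, ps in is_a.items() if set(ps) & gps}
    return siblings_of_parents - parents(term, is_a) - {term}


def conditional_probability(term, parent, annotations):
    """Estimate p(t | par(t)) as a ratio of gene counts.

    annotations: dict mapping a gene id to the set of its GO terms,
    assumed to include ancestor terms (an assumption of this sketch).
    """
    with_parent = [terms for terms in annotations.values() if parent in terms]
    if not with_parent:
        return 0.0
    with_both = sum(1 for terms in with_parent if term in terms)
    return with_both / len(with_parent)
```

For a toy hierarchy in which `t` is a child of `p`, and `p` and `u` are both children of `g`, `uncles("t", is_a)` returns `{"u"}`.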


(2008) assumed that the negative examples of a given term were all genes not annotated with that term.
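
The baseline assumption described above (every gene not annotated with a term is a negative example for it) is simple enough to write down directly. The sketch below is illustrative only; the function name and the dictionary layout of the annotations are assumptions.

```python
def naive_negatives(term, annotations):
    """Return genes treated as negative examples for a term.

    Baseline rule: any gene not annotated with the term is a negative.
    annotations: dict mapping a gene id to the set of its GO terms.
    """
    return sorted(gene for gene, terms in annotations.items() if term not in terms)
```

Because GO records only positive annotations and those are incomplete, this rule also labels genes whose annotation is merely missing as negatives.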

(2016a) proposed a negative GO annotations selection approach (NegGOA) that leveraged GO hierarchy, random walks, and co-occurrence patterns of annotations to select negative examples of a gene.


Given the complexity of gene function prediction, these metrics aim to evaluate the performance from different aspects (Radivojac et al., 2013; Jiang et al., 2016).

According to the target task, we further divide these two types of methods into three subtypes, depending on whether they predict missing, noisy, or negative annotations of genes, as listed in Table 1.


Each GO annotation is tagged with one or more evidence codes, which state the type of evidence (or source) from which the annotation is collected.

Experimental studies have demonstrated that NegGOA suffered less from incomplete annotations than NETL or SNOB, and that the selected negative examples improved the performance of gene function prediction.

Here, m(θ) is the number of genes that have at least one predicted score no smaller than the threshold θ. TP_i counts the number of true positive predictions, FP_i is the number of false positive predictions, and FN_i counts the number of false negative predictions for gene i. Smin utilizes information theoretic analogs based on the GO hierarchy to evaluate the minimum semantic distance between the predictions and ground-truths across all possible thresholds (Jiang et al., 2014).

All evidence codes except IEA are assigned by curators.

Kahanda and Ben-Hur (2017) proposed a structured output solution that adopted a structural kernel function.

Fmax is the overall maximum harmonic mean of precision and recall across all possible thresholds on the predicted gene-term association matrix (Jiang et al., 2016).
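
A minimal sketch of how Fmax could be computed from a score matrix is given below. It follows the common protein-centric convention, which is an assumption here rather than something spelled out in this text: at each threshold, precision is averaged over the genes with at least one predicted score at or above the threshold, and recall is averaged over all genes. The function name, the threshold grid, and the list-of-lists input layout are likewise illustrative choices.

```python
def fmax(scores, truth, thresholds=None):
    """Maximum harmonic mean of averaged precision and recall over thresholds.

    scores: per-gene lists of predicted scores in [0, 1], one per GO term.
    truth:  per-gene lists of 0/1 labels with the same shape as scores.
    Assumes every gene has at least one true annotation.
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]  # 0.01, 0.02, ..., 1.00
    best = 0.0
    n_genes = len(scores)
    for theta in thresholds:
        precisions = []   # one entry per gene with a prediction at this threshold
        recall_sum = 0.0  # summed over all genes
        for s_row, y_row in zip(scores, truth):
            tp = sum(1 for s, y in zip(s_row, y_row) if s >= theta and y == 1)
            fp = sum(1 for s, y in zip(s_row, y_row) if s >= theta and y == 0)
            fn = sum(1 for s, y in zip(s_row, y_row) if s < theta and y == 1)
            if tp + fp > 0:
                precisions.append(tp / (tp + fp))
            if tp + fn > 0:
                recall_sum += tp / (tp + fn)
        if not precisions:
            continue  # no gene has a prediction at this threshold
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / n_genes
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

Passing an explicit `thresholds` list makes it easy to inspect a single operating point, for example `fmax(scores, truth, thresholds=[0.5])`.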

It is difficult to give a clean categorization of GO-based gene function prediction solutions because the categories always overlap.

The number of published papers related to GO-based gene function prediction over 10 years.

The key task has shifted from collecting such data to analyzing the data with a unified functional description scheme.


(2018) recently presented the GOLabeler, which separately trained five different classifiers from five different feature descriptors on sequence data, and then combined these classifiers to make a prediction.

Therefore, we give a comprehensive review of GO-based gene function prediction methods (categorized in Figure 3).

(2016) used hash tables to store essential information learned from the GO DAG and to efficiently compute the semantic similarity of genes.

For example, GO:0043473 represents pigmentation, and GO:0048066 describes developmental pigmentation; the two terms are connected by a line marked with I, which means that developmental pigmentation is a subtype of pigmentation.

Therefore, we focus on function prediction methods using GO.
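
The subtype relation in the pigmentation example above can be represented as a small directed graph. The sketch below is only an illustration: it encodes the single edge named in the text (the real ontology contains many more terms and edges), and the `IS_A` dictionary and `ancestors` helper are hypothetical names.

```python
# Term names and the single subtype edge taken from the example in the text.
GO_NAMES = {
    "GO:0043473": "pigmentation",
    "GO:0048066": "developmental pigmentation",
}
# Child term -> set of direct parent terms.
IS_A = {"GO:0048066": {"GO:0043473"}}


def ancestors(term, is_a):
    """Collect all terms reachable from `term` by following subtype edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

With this toy graph, `ancestors("GO:0048066", IS_A)` contains only GO:0043473, and the root-side term GO:0043473 has no ancestors.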

These solutions demonstrate that compressing GO terms improves accuracy and may even boost efficiency (Wang et al., 2015; Yu et al., 2017e; Zhao et al., 2019a).

(2016) proposed a novel model (NoisyGOA) that measured the taxonomic similarity between ontological terms using the GO hierarchy and the semantic similarity between genes using annotations.