TERL: classification of transposable elements by convolutional neural networks

Full metadata record
Metadata | Description | Language
Author(s): dc.contributor | Federal University of Technology - Parana (UTFPR) | -
Author(s): dc.contributor | Universidade Estadual Paulista (UNESP) | -
Author(s): dc.contributor | Universidade de São Paulo (USP) | -
Author(s): dc.contributor | Euripides Soares da Rocha University of Marilia | -
Author(s): dc.contributor | Universidade Estadual de Campinas (UNICAMP) | -
Author(s): dc.creator | da Cruz, Murilo Horacio Pereira | -
Author(s): dc.creator | Domingues, Douglas Silva [UNESP] | -
Author(s): dc.creator | Saito, Priscila Tiemi Maeda | -
Author(s): dc.creator | Paschoal, Alexandre Rossi | -
Author(s): dc.creator | Bugatti, Pedro Henrique | -
Acceptance date: dc.date.accessioned | 2022-08-04T22:09:41Z | -
Date made available: dc.date.available | 2022-08-04T22:09:41Z | -
Submission date: dc.date.issued | 2022-04-28 | -
Submission date: dc.date.issued | 2021-05-20 | -
Full source of the material: dc.identifier | http://dx.doi.org/10.1093/bib/bbaa185 | -
Full source of the material: dc.identifier | http://hdl.handle.net/11449/221755 | -
Source: dc.identifier.uri | http://educapes.capes.gov.br/handle/11449/221755 | -
Description: dc.description | Transposable elements (TEs) are the most represented sequences in eukaryotic genomes. Few methods classify these sequences at deeper levels, such as the superfamily level, which could provide useful and detailed information about them. Most methods that classify TE sequences rely on handcrafted features, such as k-mers, and on homology-based search, which can be inefficient for classifying non-homologous sequences. Here we propose an approach, called Transposable Elements Representation Learner (TERL), that preprocesses one-dimensional sequences, transforms them into two-dimensional data (i.e., image-like representations of the sequences) and feeds them to deep convolutional neural networks. This classification method tries to learn the best representation of the input data in order to classify it correctly. We conducted six experiments to test the performance of TERL against other methods. Our approach obtained macro mean accuracies and F1-scores of 96.4% and 85.8% for superfamilies and 95.7% and 91.5% for orders on sequences from RepBase, respectively. On sequences from seven databases, we obtained macro mean accuracies and F1-scores of 95.0% and 70.6% at the superfamily level and 89.3% and 73.9% at the order level, respectively. TERL surpassed the accuracy, recall and specificity of other methods in the experiment classifying order-level sequences from seven databases, and it ran far faster than every other method in all experiments. Therefore, TERL can learn to predict any hierarchical level of the TE classification system and is about 20 times and three orders of magnitude faster than TEclass and PASTEC, respectively. TERL is available at https://github.com/muriloHoracio/TERL. Contact: murilocruz@alunos.utfpr.edu.br. | -
Description: dc.description | Federal University of Technology - Parana (UTFPR) | -
Description: dc.description | Bioinformatics Graduation Program (PPGBIOINFO) Department of Computer Science Federal University of Technology - Parana (UTFPR) | -
Description: dc.description | São Paulo State University at Botucatu | -
Description: dc.description | University of São Paulo | -
Description: dc.description | Department of Biodiversity São Paulo State University at Rio Claro | -
Description: dc.description | Euripides Soares da Rocha University of Marilia | -
Description: dc.description | University of São Paulo (ICMC-USP) | -
Description: dc.description | University of Campinas (IC-UNICAMP) | -
Description: dc.description | Department of Computing Federal University of Technology - Parana (UTFPR) | -
Language: dc.language | en | -
Relation: dc.relation | Briefings in Bioinformatics | -
Source: dc.source | Scopus | -
Keywords: dc.subject | convolutional neural networks | -
Keywords: dc.subject | deep learning | -
Keywords: dc.subject | representation learning | -
Keywords: dc.subject | sequence classification | -
Keywords: dc.subject | transposable elements | -
Title: dc.title | TERL: classification of transposable elements by convolutional neural networks | -
File type: dc.type | digital book | -
Appears in collections: Repositório Institucional - Unesp

There are no files associated with this item.
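
Illustrative note: the abstract describes transforming one-dimensional sequences into two-dimensional, image-like data before passing them to a convolutional network, but this record does not specify the encoding. One common way to do this is one-hot encoding, sketched below under that assumption. The alphabet, the zero-padding scheme, and the name `one_hot_encode` are hypothetical and are not taken from the TERL repository.

```python
# Hypothetical sketch: the alphabet, padding scheme, and function name are
# illustrative assumptions, not taken from the TERL source code.
ALPHABET = "ACGT"


def one_hot_encode(sequence, width):
    """Return a len(ALPHABET) x width matrix (a list of rows) for a DNA sequence.

    Each column is one sequence position; the row matching the nucleotide
    at that position is set to 1. Unknown symbols (e.g. N) and padding
    columns stay all-zero. Sequences longer than `width` are truncated.
    """
    matrix = [[0] * width for _ in ALPHABET]
    for col, base in enumerate(sequence.upper()[:width]):
        row = ALPHABET.find(base)
        if row >= 0:
            matrix[row][col] = 1
    return matrix


if __name__ == "__main__":
    # Print one labelled row per nucleotide for a short example sequence.
    for label, row in zip(ALPHABET, one_hot_encode("ACGTNA", 8)):
        print(label, row)
```

Padding every sequence to the same width gives a batch of fixed-size matrices, which is the shape a two-dimensional convolutional layer typically expects.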