Handling imbalanced datasets through Optimum-Path Forest

Full metadata record
Metadata | Description | Language
Author(s): dc.contributor | Universidade Estadual Paulista (UNESP) | -
Author(s): dc.creator | Passos, Leandro Aparecido | -
Author(s): dc.creator | Jodas, Danilo S. | -
Author(s): dc.creator | Ribeiro, Luiz C.F. | -
Author(s): dc.creator | Akio, Marco | -
Author(s): dc.creator | de Souza, Andre Nunes | -
Author(s): dc.creator | Papa, João Paulo | -
Acceptance date: dc.date.accessioned | 2025-08-21T22:31:55Z | -
Availability date: dc.date.available | 2025-08-21T22:31:55Z | -
Submission date: dc.date.issued | 2022-05-01 | -
Submission date: dc.date.issued | 2022-04-22 | -
Full source of the material: dc.identifier | http://dx.doi.org/10.1016/j.knosys.2022.108445 | -
Full source of the material: dc.identifier | http://hdl.handle.net/11449/234201 | -
Source: dc.identifier.uri | http://educapes.capes.gov.br/handle/11449/234201 | -
Description: dc.description | In the last decade, machine learning-based approaches have become capable of performing a wide range of complex tasks, sometimes better than humans and in a fraction of the time. This advance is partially due to the exponential growth in the amount of data available, which makes it possible to extract trustworthy real-world information. However, such data is generally imbalanced, since some phenomena are more likely than others. This imbalance considerably affects a machine learning model's performance, because the model becomes biased toward the more frequent data it receives. Among the many machine learning methods available, one graph-based approach, the Optimum-Path Forest (OPF), has attracted considerable attention due to its outstanding performance in many applications. In this paper, we propose three OPF-based strategies to deal with the imbalance problem: the O2PF and the OPF-US, which are novel approaches for oversampling and undersampling, respectively, as well as a hybrid strategy combining both approaches. The paper also introduces a set of variants of these strategies. Results compared against several state-of-the-art techniques over public and private datasets confirm the robustness of the proposed approaches. | -
Description: dc.description | Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) | -
Description: dc.description | Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) | -
Description: dc.description | Department of Computing, São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01 | -
Description: dc.description | Department of Electrical Engineering, São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01 | -
Description: dc.description | FAPESP: #2013/07375-0 | -
Description: dc.description | FAPESP: #2014/12236-1 | -
Description: dc.description | FAPESP: #2017/02286-0 | -
Description: dc.description | FAPESP: #2018/21934-5 | -
Description: dc.description | FAPESP: #2019/07665-4 | -
Description: dc.description | FAPESP: #2019/18287-0 | -
Description: dc.description | FAPESP: #2020/12101-0 | -
Description: dc.description | CNPq: #307066/2017-7 | -
Description: dc.description | CNPq: #427968/2018-6 | -
Language: dc.language | en | -
Relation: dc.relation | Knowledge-Based Systems | -
Source: dc.source | Scopus | -
Keywords: dc.subject | Imbalanced data | -
Keywords: dc.subject | Optimum-Path Forest | -
Keywords: dc.subject | Oversampling | -
Keywords: dc.subject | Undersampling | -
Title: dc.title | Handling imbalanced datasets through Optimum-Path Forest | -
File type: dc.type | digital book | -
Appears in collections: Repositório Institucional - Unesp

There are no files associated with this item.