Efficient set similarity join on multi-attribute data using lightweight filters

Full metadata record
Metadata | Description | Language
Author(s): dc.creator | Ribeiro, Leonardo Andrade | -
Author(s): dc.creator | Borges, Felipe Ferreira | -
Author(s): dc.creator | Oliveira, Diego | -
Acceptance date: dc.date.accessioned | 2026-02-09T12:01:21Z | -
Availability date: dc.date.available | 2026-02-09T12:01:21Z | -
Submission date: dc.date.issued | 2022-05-13 | -
Submission date: dc.date.issued | 2021-09 | -
Full source of the material: dc.identifier | https://repositorio.ufla.br/handle/1/49940 | -
Full source of the material: dc.identifier | https://sol.sbc.org.br/journals/index.php/jidm/article/view/1969 | -
Source: dc.identifier.uri | http://educapes.capes.gov.br/handle/capes/1152619 | -
Description: dc.description | We consider the problem of efficiently answering set similarity joins on multi-attribute data. Traditional set similarity join algorithms assume string data represented by a single set and, thus, miss the opportunity to exploit predicates over multiple attributes to reduce the number of similarity computations. In this article, we present a framework to enhance existing algorithms with additional filters for dealing with multi-attribute data. We then instantiate this framework with a lightweight filtering technique based on a simple, yet effective data structure, for which exact and probabilistic implementations are evaluated. In this context, we devise a cost model to identify the best attribute ordering to reduce processing time. Moreover, alternative approaches are also investigated and a new algorithm combining key ideas from previous work is introduced. Finally, we present a thorough experimental evaluation, which demonstrates that our main proposal is efficient and significantly outperforms competing algorithms. | -
Language: dc.language | en | -
Publisher: dc.publisher | Brazilian Computer Society | -
Rights: dc.rights | restrictAccess | -
Source: dc.source | Journal of Information and Data Management | -
Keywords: dc.subject | Advanced query processing | -
Keywords: dc.subject | Data cleaning | -
Keywords: dc.subject | Data integration | -
Keywords: dc.subject | Multi-attribute data | -
Keywords: dc.subject | Similarity join | -
Title: dc.title | Efficient set similarity join on multi-attribute data using lightweight filters | -
File type: dc.type | Article | -
Appears in collections: Repositório Institucional da Universidade Federal de Lavras (RIUFLA)

There are no files associated with this item.