Strategies for automatic determination of similarity threshold for genre-aware focused crawling processes.

Full metadata record
Metadata | Description | Language
Author(s): dc.creator | Siqueira, Gustavo Oliveira de | -
Author(s): dc.creator | Assis, Guilherme Tavares de | -
Author(s): dc.creator | Ferreira, Anderson Almeida | -
Author(s): dc.creator | Mangaravite, Vítor | -
Author(s): dc.creator | Pádua, Flávio Luis Cardeal | -
Acceptance date: dc.date.accessioned | 2025-08-21T15:55:16Z | -
Date made available: dc.date.available | 2025-08-21T15:55:16Z | -
Submission date: dc.date.issued | 2018-10-15 | -
Submission date: dc.date.issued | 2017 | -
Full source of the material: dc.identifier | http://www.repositorio.ufop.br/handle/123456789/10363 | -
Full source of the material: dc.identifier | http://www.iadisportal.org/ijwi/papers/2017151102.pdf | -
Source: dc.identifier.uri | http://educapes.capes.gov.br/handle/capes/1027494 | -
Description: dc.description | The great popularity and, especially, the fast growth of the Web have led to the proposal and analysis of new techniques that help users effectively locate the information they need in a satisfactory time and without much difficulty. Traditional crawlers are not capable of identifying relevant sub-spaces of the Web related to a specific theme; focused crawlers, however, can solve this problem effectively and efficiently. Usually, a focused crawling process requires a specific value, called the similarity threshold, to determine whether a crawled Web page is relevant to a topic of interest; this value is distinct for each specific topic. To determine such a value automatically for focused crawlers that follow a genre-aware approach, we propose three strategies in this work. In our experimental evaluation, the best result was 100% precision and 98% F1 for a specific crawling process whose similarity threshold was determined automatically: a great result compared with the baseline. | -
Format: dc.format | application/pdf | -
Language: dc.language | en | -
Rights: dc.rights | restricted | -
Keywords: dc.subject | Similarity threshold | -
Keywords: dc.subject | Web crawling | -
Keywords: dc.subject | Focused crawling | -
Title: dc.title | Strategies for automatic determination of similarity threshold for genre-aware focused crawling processes. | -
Appears in collections: Repositório Institucional - UFOP

There are no files associated with this item.
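
The description in this record says that a focused crawler compares each crawled page against a topic of interest and treats the page as relevant when a similarity score reaches a threshold, and it reports results as precision and F1. A minimal Python sketch of that decision rule and of those two metrics is given below. It is an illustration only: the record does not state which similarity measure the authors use, so cosine similarity over raw term counts is an assumption, the function names (`cosine_similarity`, `is_relevant`, `precision_recall_f1`) are hypothetical, and neither the genre-aware approach nor the three proposed threshold strategies are reproduced here.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts using raw term counts (an assumed measure)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over the terms the two texts share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_relevant(page_text: str, topic_terms: str, threshold: float) -> bool:
    """A page counts as relevant when its similarity reaches the threshold."""
    return cosine_similarity(page_text, topic_terms) >= threshold


def precision_recall_f1(predicted: list, actual: list) -> tuple:
    """Precision, recall, and F1 for boolean relevance judgments."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

In this sketch the threshold is passed in by hand. The work described in the record is about choosing that value automatically for each topic, which the sketch does not attempt.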