A Simple Mechanism for Focused Web-harvesting - Computer Science > Information Retrieval





Abstract: Focused web-harvesting is deployed to build automated and comprehensive index databases as an alternative approach to virtual topical data integration. The web-harvesting has been implemented and extended by not only specifying the targeted URLs, but also predefining human-edited harvesting parameters to improve speed and accuracy. The harvesting parameter set comprises three main components. First, the depth-scale: the depth of the final pages to be harvested, which contain the desired information, counted from the first page at the targeted URLs. Second, the focus-point number, which determines the exact box containing the relevant information. Last, the combination of keywords used to recognize hyperlinks to relevant images or full texts embedded in those final pages. All parameters are accessible and fully customizable for each target by the administrators of participating institutions through an integrated web interface. A real implementation for the Indonesian Scientific Index, which covers all scientific information across Indonesia, is also briefly introduced.
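
The three-part parameter set described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names HarvestParams, FocusBoxLinks and relevant_links are hypothetical, and the sketch assumes that the focus-point number is a 1-based index over <div> elements in document order and that keywords are matched as case-insensitive substrings of link URLs. The depth-scale is only stored here; a real harvester would use it to bound how far it follows links from the targeted URL.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class HarvestParams:
    """Hypothetical per-target parameter set (names are assumptions)."""
    target_url: str    # first page of the target site
    depth: int         # depth-scale: levels from the first page to the final pages
    focus_point: int   # assumed: 1-based index of the <div> box holding the content
    keywords: tuple    # substrings that mark links to relevant images or full texts


class FocusBoxLinks(HTMLParser):
    """Collects href/src values found inside the focus_point-th <div>."""

    def __init__(self, focus_point):
        super().__init__()
        self.focus_point = focus_point
        self.div_count = 0   # number of <div> start tags seen so far
        self.open_divs = 0   # nesting level inside the focus box (0 means outside)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_count += 1
            if self.open_divs > 0:
                self.open_divs += 1          # nested <div> inside the focus box
            elif self.div_count == self.focus_point:
                self.open_divs = 1           # entering the focus box
        elif tag in ("a", "img") and self.open_divs > 0:
            attr_map = dict(attrs)
            url = attr_map.get("href") if tag == "a" else attr_map.get("src")
            if url:
                self.links.append(url)

    def handle_endtag(self, tag):
        if tag == "div" and self.open_divs > 0:
            self.open_divs -= 1              # reaching 0 means the focus box closed


def relevant_links(html, params):
    """Return links in the focus box whose URL contains any configured keyword."""
    parser = FocusBoxLinks(params.focus_point)
    parser.feed(html)
    wanted = [k.lower() for k in params.keywords]
    return [u for u in parser.links if any(k in u.lower() for k in wanted)]
```

Calling relevant_links(page_html, params) on a final page returns only the hyperlinks that sit inside the selected box and match a keyword, which is the filtering role the abstract assigns to the focus-point number and the keyword combination.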



Authors: Z. Akbar, L.T. Handoko

Source: https://arxiv.org/






