Tight and simple Web graph compression - Computer Science > Data Structures and Algorithms





Abstract: Analysing Web graphs has applications in determining page ranks, fighting Web spam, detecting communities and mirror sites, and more. This study is however hampered by the necessity of storing a major part of huge graphs in the external memory, which prevents efficient random access to edge (hyperlink) lists. A number of algorithms involving compression techniques have thus been presented, to represent Web graphs succinctly but also providing random access. Those techniques are usually based on differential encodings of the adjacency lists, finding repeating nodes or node regions in the successive lists, more general grammar-based transformations or 2-dimensional representations of the binary matrix of the graph. In this paper we present two Web graph compression algorithms. The first can be seen as engineering of the Boldi and Vigna (2004) method. We extend the notion of similarity between link lists, and use a more compact encoding of residuals. The algorithm works on blocks of varying size (in the number of input lines) and sacrifices access time for better compression ratio, achieving more succinct graph representation than other algorithms reported in the literature. The second algorithm works on blocks of the same size, in the number of input lines, and its key mechanism is merging the block into a single ordered list. This method achieves much more attractive space-time tradeoffs.
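
The abstract names two ideas, differential encoding of adjacency lists and merging a block of lists into a single ordered list, without giving details. The Python sketch below is only an illustration of how these ideas might combine; it is not the authors' implementation. The function names `merge_block` and `extract_list`, the use of one membership bitmask per merged target, and the sample block are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def merge_block(lists: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Merge a block of adjacency lists into one ordered list.

    Returns the gaps between consecutive targets of the sorted union
    (a differential encoding) and, for each target, a bitmask whose
    bit i is set when list i of the block contains that target.
    Hypothetical layout, chosen for illustration only.
    """
    membership: Dict[int, int] = {}
    for i, adjacency in enumerate(lists):
        for target in adjacency:
            membership[target] = membership.get(target, 0) | (1 << i)

    merged = sorted(membership)
    # First gap is taken relative to 0; later gaps relative to the previous target.
    gaps = [cur - prev for prev, cur in zip([0] + merged, merged)]
    masks = [membership[target] for target in merged]
    return gaps, masks


def extract_list(gaps: List[int], masks: List[int], i: int) -> List[int]:
    """Recover adjacency list i of the block from the merged representation."""
    result = []
    target = 0
    for gap, mask in zip(gaps, masks):
        target += gap  # undo the differential encoding
        if (mask >> i) & 1:
            result.append(target)
    return result


# A made-up block of three similar adjacency lists.
block = [[2, 5, 9], [2, 5, 10], [5, 9, 10]]
gaps, masks = merge_block(block)
recovered = [extract_list(gaps, masks, i) for i in range(len(block))]
```

In this sketch, retrieving one list means scanning the whole merged block, so the block size would govern the balance between space and access time. A real compressor would additionally pass the gaps and masks through an entropy or byte-oriented coder, a step omitted here.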



Authors: Szymon Grabowski, Wojciech Bieniecki

Source: https://arxiv.org/






