Abstract
The link structure of the Web is commonly modeled as a graph, the webgraph, and web structure mining is a research area that studies this graph, mainly to find hidden communities in the Web. In this paper, we identify a substructure that occurs frequently in the webgraph and define it as an isolated star (i-star). We propose an efficient algorithm for enumerating i-stars and apply it to structure mining on real web data. We observed that most i-stars correspond to index structures within single domains, while some were verified to represent useful communities, which supports the validity of i-stars as candidate substructures for structure mining. We also suggest that the notion of i-star can serve as a tool for preprocessing the webgraph into a succinct representation for further structure mining.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Uno, Y., Ota, Y., Uemichi, A. (2008). Web Structure Mining by Isolated Stars. In: Aiello, W., Broder, A., Janssen, J., Milios, E. (eds) Algorithms and Models for the Web-Graph. WAW 2006. Lecture Notes in Computer Science, vol 4936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78808-9_14
DOI: https://doi.org/10.1007/978-3-540-78808-9_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78807-2
Online ISBN: 978-3-540-78808-9
eBook Packages: Computer Science, Computer Science (R0)