Abstract
The World Wide Web has developed into a central source of information, an important marketplace, a prominent presentation platform, and a frequently visited meeting place, to name only a few of its roles. Furthermore, the ever-growing number of users and content creators drives the rapid evolution and emergence of diverse Web sites. As a consequence, it is increasingly difficult to identify the Web sites that provide the information and services of interest.
Copyright information
© 2010 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Lindemann, C., Littig, L. (2010). Classification of Web Sites at Super-genre Level. In: Mehler, A., Sharoff, S., Santini, M. (eds) Genres on the Web. Text, Speech and Language Technology, vol 42. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9178-9_10
DOI: https://doi.org/10.1007/978-90-481-9178-9_10
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-9177-2
Online ISBN: 978-90-481-9178-9
eBook Packages: Computer Science, Computer Science (R0)