Abstract
The Web media monitoring methodology underlying this paper provides linguistic descriptives by automatically mirroring, processing and comparing large samples of Web-based corpora. Since May 1999, the database of the webLyzard project has been extended continually and now comprises more than 3,700 sites, which are monitored at monthly intervals. The wealth of information contained in these sites is converted into aggregated representations through structural and textual analysis. Perceptual maps and the semantic orientation of Web-based corpora towards particular concepts are then computed from word frequencies and distance measures.
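The abstract does not spell out how word frequencies and distance measures are combined, so the following is only a minimal sketch under stated assumptions, not the webLyzard implementation. It assumes that the orientation towards a target concept can be approximated by summing distance-weighted co-occurrences with positive and negative words inside a fixed window. The lexicons `POSITIVE` and `NEGATIVE`, the window size, the inverse-distance weighting and all function names are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons for illustration; not taken from the paper.
POSITIVE = {"good", "reliable", "excellent"}
NEGATIVE = {"bad", "poor", "slow"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(tokens):
    """Return the relative frequency of each token in the sample."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def semantic_orientation(tokens, target, window=5):
    """Sum inverse-distance weights of sentiment words near each target hit.

    Assumption: a sentiment word d tokens away from the target contributes
    +1/d if it is in POSITIVE and -1/d if it is in NEGATIVE.
    """
    score = 0.0
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j == i:
                continue
            weight = 1.0 / abs(j - i)
            if tokens[j] in POSITIVE:
                score += weight
            elif tokens[j] in NEGATIVE:
                score -= weight
    return score


if __name__ == "__main__":
    sample = tokenize("The service is good, but the slow service desk is bad.")
    print(word_frequencies(sample)["service"])
    print(semantic_orientation(sample, "service"))
```

A positive score suggests that the sampled text tends to mention the target near positive terms, a negative score the opposite. In practice the score would need to be normalised by the number of target occurrences before corpora of different sizes are compared.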
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Scharl, A., Pollach, I., Bauer, C. (2003). Determining the Semantic Orientation of Web-Based Corpora. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_116
DOI: https://doi.org/10.1007/978-3-540-45080-1_116
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40550-4
Online ISBN: 978-3-540-45080-1
eBook Packages: Springer Book Archive