Abstract
We propose a method for classifying queries whose frequency spikes in a search engine into topical categories such as celebrities and sports. Previous methods rely on Web search results and query logs, which take some time to reflect spiking queries. We instead exploit Twitter's massive amount of very fresh content to classify spiking queries in a timely manner. The proposed method leverages information unique to Twitter: not only tweets but also users and hashtags. We integrate this heterogeneous information in a graph and classify queries using a graph-based semi-supervised classification method. We design an experiment that replicates the situation in which queries spike. The results indicate that the proposed method is effective and that accuracy improves when the heterogeneous information in Twitter is combined.
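To make the graph-based idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which semi-supervised algorithm is used, so the sketch assumes plain iterative label propagation with clamped seed labels on an unweighted graph. All node names, edges, and the `propagate_labels` helper are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, categories, iterations=20):
    """Assumed technique: simple label propagation with clamped seeds.

    edges      -- iterable of (node, node) pairs, treated as undirected
    seeds      -- dict mapping already-labeled nodes to their category
    categories -- list of all category names
    """
    # Build an undirected adjacency structure.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Every node starts with zero scores; seed nodes get a one-hot score.
    scores = {n: {c: 0.0 for c in categories} for n in neighbors}
    for node, category in seeds.items():
        scores.setdefault(node, {c: 0.0 for c in categories})
        scores[node][category] = 1.0

    for _ in range(iterations):
        updated = {}
        for node in scores:
            nbrs = neighbors.get(node, ())
            if node in seeds or not nbrs:
                # Seeds stay clamped; isolated nodes keep their scores.
                updated[node] = scores[node]
                continue
            # An unlabeled node takes the average of its neighbors' scores.
            updated[node] = {
                c: sum(scores[m][c] for m in nbrs) / len(nbrs)
                for c in categories
            }
        scores = updated
    return scores


# Hypothetical toy graph mixing queries, tweets, users, and hashtags.
edges = [
    ("q:spiking", "tweet:1"),
    ("tweet:1", "user:sports_fan"),
    ("tweet:1", "tag:#worldcup"),
    ("tag:#worldcup", "tweet:2"),
    ("tweet:2", "q:soccer_seed"),
    ("user:sports_fan", "tweet:3"),
    ("tweet:3", "q:soccer_seed"),
    ("q:actor_seed", "tweet:4"),
    ("tweet:4", "user:movie_fan"),
]
seeds = {"q:soccer_seed": "sports", "q:actor_seed": "celebrities"}
categories = ["celebrities", "sports"]

scores = propagate_labels(edges, seeds, categories)
predicted = max(categories, key=lambda c: scores["q:spiking"][c])
print(predicted)
```

In this toy graph the unlabeled query `q:spiking` reaches the labeled sports query only through a shared user and a shared hashtag, which illustrates why linking tweets, users, and hashtags in one graph can carry label information that tweet text alone would not.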
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yoshida, M., Arase, Y. (2012). Exploiting Twitter for Spiking Query Classification. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_12
DOI: https://doi.org/10.1007/978-3-642-35341-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35340-6
Online ISBN: 978-3-642-35341-3
eBook Packages: Computer Science, Computer Science (R0)