
A novel time-shifting method to find popular blog post topics

  • Lin-Chih Chen
  • Da-Ren Chen
  • Ming-Fong Lai
Methodologies and Application

Abstract

Blog search engines and general search engines both automatically crawl Web pages from the Internet and generate search results for users. One difference between the two is that blog search engines focus on blog posts and filter out general pages, which allows bloggers to see only results for posts rather than other types of pages. Another difference is that posts involve more time-related issues than general pages. For a general page, a general search engine can show only the page's last update time; for a post, a blog search engine can show all available time clues. For frequently updated posts, these time clues help bloggers find information more efficiently. In this paper, we first use several well-known semantic analysis models to analyze the performance of Google Blog Search. Next, we apply a hybrid strategy that considers the document link and time clue relationships between posts to further improve its retrieval performance. Finally, we present several experiments that simulate various possible scenarios to confirm the effectiveness of our strategy.

Keywords

  • Semantic analysis
  • Time-shifting method
  • Machine learning
  • Natural language processing
  • Normalized Google distance

Notes

Acknowledgements

This study was supported by Ministry of Science and Technology of Taiwan (Grant Nos. 108-2410-H-259-048-MY3, 107-2410-H-259-016).

Compliance with ethical standards

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary material

Supplementary material 1: 500_2019_4485_MOESM1_ESM.docx (DOCX, 5048 kB)


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Information Management, National Dong Hwa University, Hualien, Taiwan
  2. Department of Information Management, National Taichung University of Science and Technology, Taichung, Taiwan
  3. Department of Commerce Automation and Management, National Pingtung University, Pingtung City, Taiwan
