Enabling Time Sensitive Information Retrieval on the Web through Real Time Search Engines Using Streams

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 249)

Abstract

Real-time search engines must continuously index web content that originates from data streams, because sources such as social networking sites, news sites, and tweets deliver up-to-date information through streams. Because new content arrives constantly from these sources, it is challenging for search engines to provide efficient indexing mechanisms that ensure both index freshness and index coverage. An up-to-date index supports faster search whose results include the latest available content. Two latencies play an important role in index freshness: retrieval latency and indexing latency. The former is the time taken to fetch content after its publication, while the latter is the time taken to index the newly fetched content. This paper presents a framework that optimizes indexing latency and index coverage. The empirical results show that the proposed framework achieves index freshness and coverage and thereby supports faster processing of search queries.
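
The two latencies defined in the abstract can be illustrated with a minimal sketch. It is not the paper's framework: the class name `DocumentTimeline`, its fields, and the sample timestamps are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DocumentTimeline:
    """Hypothetical timestamps for one streamed item (not taken from the paper)."""
    published_at: datetime  # when the source published the item
    fetched_at: datetime    # when the search engine fetched it
    indexed_at: datetime    # when the item became searchable in the index

    def retrieval_latency(self) -> timedelta:
        # Time taken to fetch the content after its publication.
        return self.fetched_at - self.published_at

    def indexing_latency(self) -> timedelta:
        # Time taken to index the newly fetched content.
        return self.indexed_at - self.fetched_at

    def total_delay(self) -> timedelta:
        # Sum of both latencies: publication until the item is searchable.
        return self.indexed_at - self.published_at


# Made-up example: fetched 4 s after publication, indexed 6 s after fetching.
t0 = datetime(2014, 1, 1, 12, 0, 0)
item = DocumentTimeline(
    published_at=t0,
    fetched_at=t0 + timedelta(seconds=4),
    indexed_at=t0 + timedelta(seconds=10),
)
print(item.retrieval_latency().total_seconds())
print(item.indexing_latency().total_seconds())
print(item.total_delay().total_seconds())
```

Under these assumptions, lowering either latency shortens the delay between publication and searchability, which is one simple way to read the abstract's notion of index freshness.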

Keywords

Indexing; search engines; index freshness; index coverage; information retrieval



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Department of Computer Science and Engineering, GMR Institute of Technology, Rajam, India
