Efficient indexing structure to handle durable queries through web crawling

Devi, R. Suganya; Manjula, D.; Sugumaran, Vijayan

doi:10.1007/s10586-016-0595-4

Published: 11 July 2016

Cluster Computing, Volume 19, pages 1347–1358 (2016)

Abstract

This paper studies efficient processing of durable top-k queries on historical time series databases. Durable top-k queries extend snapshot top-k queries over a time period. They play a key role in finding objects whose quality is durable and in predicting the status of those objects in successive time intervals by updating the query interval at every timestamp. Web crawling and indexing have become especially important for answering durable top-k queries efficiently over a vast quantity of web documents. The results produced by existing algorithms are of limited use to analysts. This paper focuses on crawling the web, indexing query terms under their respective categories, and updating rank changes at every time interval. Links are crawled with the modified depth-first search (MDFS) algorithm, each link is accessed, and metadata such as the title, keywords, and description are extracted. To handle query indexing, novel indexing techniques are proposed to yield efficient results. The study aims to reduce the workload of analysts working with large data obtained through crawling and indexing.
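To make the notion of a durable top-k query concrete, the following is a minimal brute-force sketch and not the indexing structure proposed in the paper. It assumes one common reading of "durable": an object qualifies if it appears in the snapshot top-k for at least a given fraction of the timestamps in the query interval. The function names, the `ratio` parameter, and the sample scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def snapshot_top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the ids of the k highest-scoring objects at one timestamp."""
    # Sort by descending score; break ties by object id for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {obj for obj, _ in ranked[:k]}


def durable_top_k(
    series: List[Dict[str, float]],
    k: int,
    start: int,
    end: int,
    ratio: float,
) -> Set[str]:
    """Objects in the snapshot top-k for at least `ratio` of [start, end].

    Assumption: `series[t]` maps object id -> score at timestamp t, and
    `ratio` (0 < ratio <= 1) is the durability threshold.
    """
    window = series[start:end + 1]
    if not window:
        return set()
    counts: Dict[str, int] = {}
    for scores in window:
        for obj in snapshot_top_k(scores, k):
            counts[obj] = counts.get(obj, 0) + 1
    needed = ratio * len(window)
    return {obj for obj, hits in counts.items() if hits >= needed}


# Hypothetical scores for three documents over four timestamps.
series = [
    {"a": 0.9, "b": 0.8, "c": 0.1},
    {"a": 0.7, "b": 0.2, "c": 0.6},
    {"a": 0.8, "b": 0.9, "c": 0.3},
    {"a": 0.5, "b": 0.4, "c": 0.9},
]

# Only "a" stays in the top 2 for at least 75% of the interval.
print(durable_top_k(series, k=2, start=0, end=3, ratio=0.75))
```

This scan recomputes every snapshot ranking in the interval, so its cost grows with the interval length. Avoiding that repeated work is the usual motivation for a dedicated index that records rank changes per time interval, as the abstract describes.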

Acknowledgments

Dr. Sugumaran’s research has been supported in part by a 2016 School of Business Administration Spring/Summer Research Fellowship from Oakland University.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Anna University, Chennai, 600025, India
R. Suganya Devi & D. Manjula
Department of Decision and Information Sciences, School of Business Administration, Oakland University, Rochester, MI, 48309, USA
Vijayan Sugumaran

Corresponding author

Correspondence to R. Suganya Devi.

About this article

Cite this article

Devi, R.S., Manjula, D. & Sugumaran, V. Efficient indexing structure to handle durable queries through web crawling. Cluster Comput 19, 1347–1358 (2016). https://doi.org/10.1007/s10586-016-0595-4

Received: 02 April 2016
Revised: 27 June 2016
Accepted: 28 June 2016
Published: 11 July 2016
Issue Date: September 2016
DOI: https://doi.org/10.1007/s10586-016-0595-4