Efficient Subsequence Search in Databases

Jain, Rohit; Mohania, Mukesh K.; Prabhakar, Sunil

doi:10.1007/978-3-642-38562-9_45

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7923)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Finding tuples in a database that match a particular subsequence (with gaps) is an important problem for a range of applications. Subsequence search is equivalent to searching for regular expressions of the type .*q₁.*q₂.* … .*q_l.*, where the subsequence is q₁q₂…q_l. Executing these queries efficiently requires index structures that are both fast and scalable to large problem sizes. This paper presents two index structures for such queries, one based on a trie and one based on a bitmap. These indices are disk-resident and can therefore be used by large databases with limited memory. They are also applicable to dynamic databases, where tuples can be added or deleted. Both indices are implemented and validated against a naive approach. The results show that the proposed indices are efficient, with low I/O and time overhead.
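
The equivalence between subsequence search and the regular-expression form stated in the abstract can be illustrated with a minimal sketch. This is not the paper's trie or bitmap index; it is only a linear scan of the kind a naive baseline might use, and the function names (`is_subsequence`, `subsequence_regex`, `naive_search`) and the toy data are assumptions made for illustration.

```python
import re

def is_subsequence(query, text):
    """Return True if the symbols of `query` occur in `text` in order, gaps allowed."""
    it = iter(text)
    # `symbol in it` consumes the iterator up to the first match,
    # so each later symbol must be found after the previous one.
    return all(symbol in it for symbol in query)

def subsequence_regex(query):
    """Build the equivalent pattern .*q1.*q2.* ... .*ql.* for a string query."""
    body = ".*".join(re.escape(symbol) for symbol in query)
    return re.compile(".*" + body + ".*", re.DOTALL)

def naive_search(tuples, query):
    """Linear scan: ids of all tuples whose value contains `query` as a subsequence."""
    return [tid for tid, value in tuples.items() if is_subsequence(query, value)]

# Hypothetical toy relation: tuple id -> string attribute.
tuples = {1: "database", 2: "subsequence", 3: "bitmap"}

print(naive_search(tuples, "dbs"))  # prints [1]
```

Every tuple is scanned once per query, so the cost grows with the total size of the data. That linear cost is what an index for such queries is meant to avoid.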

Author information

Authors and Affiliations

Department of Computer Sciences, Purdue University, 305 N. University Street, West Lafayette, IN, 47907, USA
Rohit Jain & Sunil Prabhakar
IBM India Research Lab, New Delhi, India
Mukesh K. Mohania

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Jianyong Wang
Management Science and Information Systems Department, Rutgers, the State University of New Jersey, 1, Washington Park, 07102, Newark, NJ, USA
Hui Xiong
Department of Information Engineering, Nagoya University, 464-8601, Nagoya, Japan
Yoshiharu Ishikawa
Department of Computer Science, Hong Kong Baptist University, Hong Kong
Jianliang Xu
School of Information Science and Engineering, Yanshan University, Qinhuangdao, China
Junfeng Zhou

About this paper

Cite this paper

Jain, R., Mohania, M.K., Prabhakar, S. (2013). Efficient Subsequence Search in Databases. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_45

DOI: https://doi.org/10.1007/978-3-642-38562-9_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38561-2
Online ISBN: 978-3-642-38562-9
eBook Packages: Computer Science, Computer Science (R0)
