Incremental Maintenance of Biological Databases Using Association Rule Mining

Lam, Kai-Tak; Koh, Judice L. Y.; Veeravalli, Bharadwaj; Brusic, Vladimir

doi:10.1007/11818564_16

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4146)

Included in the following conference series:

International Workshop on Pattern Recognition in Bioinformatics

Abstract

Biological research frequently requires specialist databases to support in-depth analysis of specific subjects. With the rapid growth of biological sequences in public-domain data sources, it is difficult to keep these databases current with the sources. Simple queries formulated to retrieve relevant sequences typically return a large number of false matches and thus demand manual filtering. In this paper, we propose a novel methodology that supports automatic incremental updating of specialist databases. Complex queries for incrementally retrieving relevant sequences are learned using Association Rule Mining (ARM), resulting in a significant reduction in false-positive matches. This is the first time ARM has been used to formulate descriptive queries for the incremental maintenance of specialised biological databases. We have implemented and tested our methodology on two real-world databases. Our experiments show that the methodology achieves an F-score of up to 80% in detecting new sequences for these two databases.
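The abstract does not spell out how mined rules become retrieval queries. As a minimal sketch, assuming each sequence record is reduced to a set of annotation terms and that records already in the specialist database are labelled relevant, the hypothetical Python below mines class association rules of the form X → RELEVANT with a plain Apriori pass, treats each antecedent X as an AND clause of a disjunctive (OR) query, and scores the query on held-out records with the F-score (harmonic mean of precision and recall). The function names, the label item `__RELEVANT__`, the thresholds and the toy terms are illustrative assumptions, not the authors' implementation or data.

```python
from itertools import combinations

TARGET = "__RELEVANT__"  # hypothetical label item marking records already in the specialist database


def apriori(transactions, min_support):
    """Level-wise Apriori: return every frequent itemset with its support count."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 1
    while frequent:
        # Join frequent k-itemsets into (k+1)-candidates, pruning any candidate
        # that has an infrequent k-subset (downward closure of support).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent for sub in combinations(union, k)
            ):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def learn_query(records, labels, min_support, min_confidence):
    """Mine rules X -> TARGET; return the antecedents X as AND-clauses of an OR query."""
    transactions = [set(r) | ({TARGET} if y else set()) for r, y in zip(records, labels)]
    freq = apriori(transactions, min_support)
    clauses = []
    for itemset, support in freq.items():
        if TARGET in itemset and len(itemset) > 1:
            antecedent = itemset - {TARGET}
            # The antecedent is frequent by downward closure, so its count is in freq.
            if support / freq[antecedent] >= min_confidence:
                clauses.append(antecedent)
    # Under OR semantics a superset clause adds no new matches, so drop it.
    return [c for c in clauses if not any(o < c for o in clauses)]


def matches(record, clauses):
    """A new record is retrieved if it contains every term of at least one clause."""
    terms = set(record)
    return any(clause <= terms for clause in clauses)


def f_score(predicted, actual):
    """Harmonic mean of precision and recall over boolean predictions."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up annotation terms (not data from the paper).
TRAIN = [
    ({"venom", "neurotoxin", "snake"}, True),
    ({"venom", "neurotoxin", "cobra"}, True),
    ({"venom", "neurotoxin", "krait"}, True),
    ({"venom", "protease", "snake"}, False),
    ({"neurotoxin", "bacterial"}, False),
]
TEST = [
    ({"venom", "neurotoxin", "mamba"}, True),
    ({"venom", "neurotoxin", "spider"}, False),
    ({"neurotoxin", "snake"}, True),
    ({"venom", "protease"}, False),
]

if __name__ == "__main__":
    clauses = learn_query([r for r, _ in TRAIN], [y for _, y in TRAIN],
                          min_support=2, min_confidence=0.9)
    print(sorted(sorted(c) for c in clauses))  # [['neurotoxin', 'venom']]
    predicted = [matches(r, clauses) for r, _ in TEST]
    print(f_score(predicted, [y for _, y in TEST]))  # 0.5
```

On the toy data, the single-term rules {venom} → RELEVANT and {neurotoxin} → RELEVANT each reach only 0.75 confidence and are rejected, so the learned query is the conjunction venom AND neurotoxin. Raising the confidence threshold is one way such a scheme trades recall for fewer false-positive matches.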

Author information

Authors and Affiliations

Department of Electrical & Computer Engineering, National University of Singapore, 4 Engineering Drive 3, 117576, Singapore
Kai-Tak Lam & Bharadwaj Veeravalli
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Judice L. Y. Koh
School of Computing, National University of Singapore, 3 Science Drive 2, 119260, Singapore
Judice L. Y. Koh
Australian Centre for Plant Functional Genomics, School of Land and Food Sciences, and the Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
Vladimir Brusic

Editor information

Editors and Affiliations

Singapore-MIT Alliance, 50 Nanyang Avenue, N2-B2C-15, Singapore
Jagath C. Rajapakse
School of Computing, National University of Singapore, Singapore
Limsoon Wong
Computer Science and Engineering, The Penn State University, USA
Raj Acharya

About this paper

Cite this paper

Lam, KT., Koh, J.L.Y., Veeravalli, B., Brusic, V. (2006). Incremental Maintenance of Biological Databases Using Association Rule Mining. In: Rajapakse, J.C., Wong, L., Acharya, R. (eds) Pattern Recognition in Bioinformatics. PRIB 2006. Lecture Notes in Computer Science, vol 4146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11818564_16

DOI: https://doi.org/10.1007/11818564_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37446-6
Online ISBN: 978-3-540-37447-3
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition
