A Weighted Density-Based Approach for Identifying Standardized Items that are Significantly Related to the Biological Literature

Al-Azzam, Omar; Wu, Jianfei; Al-Nimer, Loai; Chitraranjan, Charith; Denton, Anne M.

doi:10.1007/978-3-642-45252-9_6

Part of the book series: Studies in Big Data (SBD, volume 3)

Abstract

A large part of scientific knowledge is confined to the text of publications. An algorithm is presented for distinguishing those pieces of information that can be predicted from the text of publication abstracts from those for which successes in prediction are spurious. The significance of relationships between textual data and information that is represented in standardized ontologies and protein domains is evaluated using a density-based approach. The approach also integrates a weighting system to account for many-to-many relationships between the abstracts and the genes they represent, as well as between genes and the items that describe them. We evaluate the approach using data from the model species yeast and show that our results are in better agreement with biological expectations than those of a comparison algorithm.

Supported by the National Science Foundation under Grant No. IDM-0415190.

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. IDM-0415190.

Author information

Authors and Affiliations

Math, Science, and Technology Department, University of Minnesota Crookston, Crookston, MN, 56716, USA
Omar Al-Azzam, Jianfei Wu, Loai Al-Nimer, Charith Chitraranjan & Anne M. Denton
Department of Computer Science, North Dakota State University, Fargo, ND, 58105, USA
Omar Al-Azzam, Jianfei Wu, Loai Al-Nimer, Charith Chitraranjan & Anne M. Denton

Corresponding author

Correspondence to Omar Al-Azzam.

Editor information

Editors and Affiliations

Faculty of Commerce, Kansai University, Osaka, Japan
Katsutoshi Yada

About this chapter

Cite this chapter

Al-Azzam, O., Wu, J., Al-Nimer, L., Chitraranjan, C., Denton, A.M. (2014). A Weighted Density-Based Approach for Identifying Standardized Items that are Significantly Related to the Biological Literature. In: Yada, K. (ed.) Data Mining for Service. Studies in Big Data, vol 3. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45252-9_6

DOI: https://doi.org/10.1007/978-3-642-45252-9_6
Published: 04 January 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45251-2
Online ISBN: 978-3-642-45252-9
eBook Packages: Engineering, Engineering (R0)
