Detecting the knowledge structure of bioinformatics by mining full-text collections
Bioinformatics is a fast-growing, diverse research field that has recently gained much public attention. Although several studies have tried to understand the field of bioinformatics through bibliometric analysis, the approach proposed in this paper is the first attempt to apply text mining techniques to a large set of full-text articles to detect the knowledge structure of the field. To this end, we use PubMed Central full-text articles for bibliometric analysis instead of relying on citation data provided in Web of Science. In particular, we develop text mining routines that build a custom-made citation database from the mined full text. We present several findings in this study. First, the majority of the papers published in the field of bioinformatics are rarely cited by others (63% of papers received fewer than two citations). Second, there is a linear, consistent increase in the number of publications, with 2003 as the turning point in publication growth. Third, most bioinformatics research is driven by USA-based institutes, followed by European institutes. Fourth, the results of topic modeling and word co-occurrence analysis reveal that major topics focus more on biological aspects than on computational aspects of bioinformatics. However, the top 10 articles ranked by PageRank are more closely related to computational aspects. Fifth, visualization of author co-citation analysis indicates that researchers in molecular biology or genomics play a key role in connecting sub-disciplines of bioinformatics.
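Two of the measures named in the abstract, the share of papers with fewer than two citations and a PageRank ranking of articles, can be illustrated on a toy citation graph. The following is a minimal sketch, not the authors' pipeline: the paper identifiers, the edge list, the function names, the damping factor of 0.85, and the fixed iteration count are all assumptions made for illustration, and the toy numbers are unrelated to the study's results.

```python
# Hypothetical toy data: each edge is (citing_paper, cited_paper).
papers = ["A", "B", "C", "D", "E"]
edges = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B"), ("E", "C")]


def citation_counts(edges, papers):
    """Count how many times each paper is cited."""
    counts = {p: 0 for p in papers}
    for _citing, cited in edges:
        counts[cited] += 1
    return counts


def share_below(counts, threshold=2):
    """Fraction of papers that received fewer than `threshold` citations."""
    return sum(1 for c in counts.values() if c < threshold) / len(counts)


def pagerank(edges, papers, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a citation graph.

    Rank flows from a citing paper to the papers it cites. Papers that
    cite nothing (dangling nodes) spread their rank evenly over all papers.
    """
    n = len(papers)
    out_links = {p: [] for p in papers}
    for citing, cited in edges:
        out_links[citing].append(cited)

    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in papers if not out_links[p])
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in papers}
        for citing, targets in out_links.items():
            if not targets:
                continue  # dangling mass was already redistributed above
            portion = damping * rank[citing] / len(targets)
            for cited in targets:
                new_rank[cited] += portion
        rank = new_rank
    return rank


counts = citation_counts(edges, papers)
low_cited_share = share_below(counts, threshold=2)
ranks = pagerank(edges, papers)
top_paper = max(ranks, key=ranks.get)
```

In this sketch a raw citation count treats every citation equally, whereas PageRank weights a citation by the rank of the citing paper, which is one reason the two measures can surface different articles.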
Volume 96, Issue 1, pp 183-201
- Springer Netherlands
- Text mining
- PubMed Central