Abstract
Alzheimer’s disease (AD) is a degenerative brain disease whose cause is difficult to diagnose accurately. As the number of AD patients has increased, researchers have strived to understand the disease and develop treatments through approaches such as medical experiments and literature analysis. In the area of literature analysis, several earlier studies analyzed the literature at the macro level, in terms of authors, journals, and institutions. However, analyzing the literature at both the macro and micro levels allows a better understanding of the AD research field. Therefore, in this study we adopt a more comprehensive approach to analyzing the AD literature, consisting of productivity analysis (by year, journal/proceeding, author, and Medical Subject Heading terms), network analysis (co-occurrence frequency, centrality, and community), and content analysis. To this end, we collect metadata for 96,081 articles retrieved from PubMed. Specifically, we perform concept graph-based network analysis, applying five centrality measures after mapping the semantic relationships between the UMLS concepts extracted from the AD literature. We also analyze time-series topical trends using the Dirichlet-multinomial regression topic modeling technique. The results indicate that 2013 is the most productive year and the Journal of Alzheimer’s Disease is the most productive journal. In identifying the core biological entities and their relationships in the AD-related PubMed literature, we find that the relationship with glycogen storage disease is the most frequently mentioned. In addition, we analyze 16 main topics of the AD literature and find a noticeable increasing trend in the topic of transgenic mouse.
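To make the network analysis step concrete, the following is a minimal sketch of how co-occurrence frequency and one centrality measure could be computed over concepts. It is an illustration under stated assumptions, not the pipeline used in the study: the toy articles, the concept labels, and the function names `cooccurrence_counts` and `degree_centrality` are hypothetical, and degree centrality stands in for the five measures, which the abstract does not name.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy input: each article is reduced to the set of concepts
# found in it. These labels are illustrative, not taken from the study's data.
ARTICLES = [
    {"amyloid beta", "tau protein", "transgenic mouse"},
    {"amyloid beta", "tau protein"},
    {"amyloid beta", "apolipoprotein E"},
    {"tau protein", "transgenic mouse"},
]


def cooccurrence_counts(articles):
    """Count, for each unordered concept pair, the articles containing both."""
    counts = Counter()
    for concepts in articles:
        # Sorting gives each pair one canonical ordering.
        for pair in combinations(sorted(concepts), 2):
            counts[pair] += 1
    return counts


def degree_centrality(counts):
    """Fraction of the other concepts that each concept co-occurs with."""
    neighbors = defaultdict(set)
    for a, b in counts:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


if __name__ == "__main__":
    counts = cooccurrence_counts(ARTICLES)
    for pair, freq in counts.most_common(3):
        print(pair, freq)
    for node, score in sorted(degree_centrality(counts).items()):
        print(f"{node}: {score:.2f}")
```

In this toy graph, "amyloid beta" co-occurs with every other concept and therefore receives the highest degree centrality. A real analysis would work on concepts mapped from the article metadata and would likely rely on a dedicated graph library.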
Notes
“Unified Medical Language System (UMLS)—Home” [Online]. http://www.nlm.nih.gov/research/umls/. Accessed: 8 Feb 2013.
“National Library of Medicine—National Institutes of Health” [Online]. http://www.nlm.nih.gov/. Accessed 8 Feb 2013.
Acknowledgments
This work was supported by the Bio-Synergy Research Project (2013M3A9C4078138) of the Ministry of Science, ICT and Future Planning through the National Research Foundation.
Cite this article
Song, M., Heo, G.E. & Lee, D. Identifying the landscape of Alzheimer’s disease research with network and content analysis. Scientometrics 102, 905–927 (2015). https://doi.org/10.1007/s11192-014-1372-x
DOI: https://doi.org/10.1007/s11192-014-1372-x