Classifying modeling and simulation as a scientific discipline
The body of knowledge related to modeling and simulation (M&S) comes from a variety of constituents: (1) practitioners and users, (2) tool developers, and (3) theorists and methodologists. Previous work has shown that categorizing M&S as a concentration within an existing, broader discipline is inadequate because it does not provide a uniform basis for research and education across all institutions. This article presents an approach for classifying M&S as a scientific discipline, together with a framework for the ensuing analysis. The novelty of the approach lies in applying machine learning classification to documents containing unstructured text (e.g., publications and funding solicitations) from a variety of established and emerging disciplines related to M&S. We demonstrate that machine learning classification models can be trained to accurately separate M&S from related disciplines using the abstracts from well-indexed research publication repositories. We evaluate the accuracy of our trained classifiers using k-fold cross-validation. We then demonstrate that our trained classifiers can effectively identify a set of previously unseen M&S funding solicitations and grant proposals. Finally, we use our approach to uncover new funding trends in M&S and to support a uniform basis for education and research.
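The abstract does not say which classifier or validation scheme was used, so the following is only a minimal sketch of the general technique it describes: training a text classifier on abstracts to separate M&S from another discipline and scoring it with k-fold cross-validation. It assumes a multinomial naive Bayes model with add-one smoothing, a tiny illustrative stop list, interleaved (unshuffled) folds, and a hypothetical toy corpus with made-up labels `MS` and `other`. None of these names or documents come from the article, and this is not the authors' pipeline.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list (an assumption; published stop lists are much longer).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lower-case the text, keep alphabetic runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


class NaiveBayesText:
    """Multinomial naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.counts, self.totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(t for d in class_docs for t in tokenize(d))
            self.totals[c] = sum(self.counts[c].values())
            self.vocab.update(self.counts[c])
        return self

    def predict(self, doc):
        # Words never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        v = len(self.vocab)

        def log_posterior(c):
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v)) for t in tokens
            )

        return max(self.classes, key=log_posterior)


def k_fold_accuracy(docs, labels, k):
    """Mean held-out accuracy; fold i holds out every k-th document starting at index i."""
    scores = []
    for i in range(k):
        held_out = set(range(i, len(docs), k))
        train = [j for j in range(len(docs)) if j not in held_out]
        model = NaiveBayesText().fit([docs[j] for j in train], [labels[j] for j in train])
        correct = sum(model.predict(docs[j]) == labels[j] for j in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


# Hypothetical toy "abstracts"; the article's real corpora are not reproduced here.
DOCS = [
    "discrete event simulation model of a queueing system",
    "agent based simulation model validation and verification",
    "monte carlo simulation of traffic flow model",
    "simulation model calibration for supply chain",
    "protein folding in cell membranes",
    "gene expression in cancer cell lines",
    "clinical trial of a drug targeting tumor cell growth",
    "cell signaling pathways in protein networks",
]
LABELS = ["MS"] * 4 + ["other"] * 4

if __name__ == "__main__":
    print("4-fold accuracy:", k_fold_accuracy(DOCS, LABELS, k=4))  # 4-fold accuracy: 1.0
    model = NaiveBayesText().fit(DOCS, LABELS)
    print(model.predict("an agent based simulation model of epidemic spread"))  # MS
```

The last step mirrors, in miniature, the abstract's idea of applying a trained classifier to previously unseen documents such as funding solicitations. On a toy corpus this small and this cleanly separated, perfect accuracy says nothing about real-world performance. A realistic study would shuffle or stratify the folds and compare several classifiers, for example logistic regression or a linear model trained with stochastic gradient descent.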
Keywords: Simulation, Research, History of OR, Machine learning
We gratefully acknowledge the support of our colleagues at the Virginia Modeling, Analysis and Simulation Center (VMASC), the University of Virginia (UVA), and Gettysburg College in manually classifying the 1000 NSF and NIH grants used in the evaluation.