Scientometrics, Volume 109, Issue 2, pp 615–628

Classifying modeling and simulation as a scientific discipline

  • Ross Gore (corresponding author)
  • Saikou Diallo
  • Jose Padilla

Abstract

The body of knowledge related to modeling and simulation (M&S) comes from a variety of constituents: (1) practitioners and users, (2) tool developers, and (3) theorists and methodologists. Previous work has shown that categorizing M&S as a concentration within an existing, broader discipline is inadequate because it does not provide a uniform basis for research and education across institutions. This article presents an approach for classifying M&S as a scientific discipline and a framework for the ensuing analysis. The novelty of the approach lies in its application of machine learning classification to documents containing unstructured text (e.g., publications and funding solicitations) from a variety of established and emerging disciplines related to M&S. We demonstrate that machine learning classifiers can be trained to accurately separate M&S from related disciplines using the abstracts in well-indexed research publication repositories, and we evaluate the accuracy of the trained classifiers using cross-validation. We then demonstrate that the trained classifiers can effectively identify a set of previously unseen M&S funding solicitations and grant proposals. Finally, we use our approach to uncover new funding trends in M&S and to support a uniform basis for M&S education and research.
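
The article itself presents no code, but a minimal sketch of the kind of pipeline the abstract describes (training a text classifier on labeled abstracts, estimating its accuracy with cross-validation, then scoring previously unseen documents) could look as follows. This is an illustrative assumption in Python with scikit-learn; the toy corpus, the labels, and the choice of a naive Bayes model are stand-ins, not the authors' exact setup.

    # Illustrative sketch only: scikit-learn, the toy corpus, and the
    # naive Bayes model are assumptions, not the authors' exact setup.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Toy stand-ins for abstracts labeled 1 (M&S) or 0 (related discipline).
    abstracts = [
        "agent-based simulation of supply chain dynamics",
        "discrete event simulation model validation methodology",
        "stochastic gradient descent for large scale linear prediction",
        "bibliometric mapping of interdisciplinary research fields",
    ]
    labels = [1, 1, 0, 0]

    # Term-frequency features feeding a naive Bayes text classifier.
    model = make_pipeline(TfidfVectorizer(stop_words="english"),
                          MultinomialNB())

    # Cross-validation estimates out-of-sample classification accuracy.
    scores = cross_val_score(model, abstracts, labels, cv=2)
    print("mean accuracy: %.2f" % scores.mean())

    # A fitted model can then score previously unseen documents,
    # e.g. funding solicitations, as M&S or not.
    model.fit(abstracts, labels)
    print(model.predict(["simulation interoperability funding solicitation"]))

Naive Bayes is shown here only because it is a standard baseline for text categorization; any classifier exposing the same fit/predict interface would slot into the pipeline unchanged.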

Keywords

Simulation · Research · History of OR · Machine learning

Acknowledgments

We gratefully acknowledge the support of our colleagues at the Virginia Modeling, Analysis and Simulation Center (VMASC), the University of Virginia (UVA), and Gettysburg College in manually classifying the 1000 NSF and NIH grants used in the evaluation.

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2016

Authors and Affiliations

  1. Virginia Modeling, Analysis and Simulation Center, Old Dominion University, Norfolk, USA
