Scientometrics, Volume 119, Issue 2, pp 845–862

Automatic zone identification in scientific papers via fusion techniques

  • Nasrin Asadi
  • Kambiz Badie
  • Maryam Tayefeh Mahmoudi

Abstract

Zone identification is a text mining task that helps researchers make effective use of the content of scientific papers. Its main aim is to classify the sentences of a scientific text into predefined zone categories, which is useful for summarization and information extraction. In this paper, we propose a two-level approach to zone identification. The first level classifies the sentences of a given paper using semantic and lexical features; for this purpose, several machine learning algorithms, such as Simple Logistics, Logistic Model Trees and Sequential Minimal Optimization, are applied. The second level applies fusion to the first-level classification results obtained for consecutive sentences in order to make the final decision. The proposed method is evaluated on two well-known data sets, the ART and DRI corpora. The zone identification accuracies obtained for these corpora are 65.75% and 84.15%, respectively, which appear promising compared with those obtained by previous approaches.
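
The two-level idea can be illustrated with a small sketch. The abstract does not state which fusion rule is used, so the code below assumes a simple weighted sum over the class scores of immediately neighbouring sentences; the function name `fuse_consecutive`, the `neighbor_weight` parameter, the zone names and the score values are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def fuse_consecutive(
    probs: Sequence[Sequence[float]], neighbor_weight: float = 0.5
) -> List[int]:
    """Second-level decision (assumed sum-rule fusion, not the paper's rule).

    For each sentence, add the first-level class scores of its immediate
    neighbours, scaled by neighbor_weight, to its own scores, then return
    the index of the highest-scoring class.
    """
    n = len(probs)
    labels = []
    for i in range(n):
        fused = list(probs[i])
        for j in (i - 1, i + 1):  # previous and next sentence, if present
            if 0 <= j < n:
                for k, p in enumerate(probs[j]):
                    fused[k] += neighbor_weight * p
        labels.append(max(range(len(fused)), key=fused.__getitem__))
    return labels


# Hypothetical zone categories and first-level scores for four
# consecutive sentences (one row per sentence, one column per zone).
ZONES = ["Background", "Method", "Result"]
first_level = [
    [0.70, 0.20, 0.10],
    [0.40, 0.45, 0.15],  # weakly "Method" when judged in isolation
    [0.60, 0.30, 0.10],
    [0.10, 0.20, 0.70],
]

fused_labels = fuse_consecutive(first_level)
print([ZONES[k] for k in fused_labels])
```

In this toy input, the second sentence is narrowly labelled "Method" on its own, but its neighbours both lean strongly towards "Background", so the fused decision moves it to "Background". Setting `neighbor_weight` to 0 reproduces the first-level decisions unchanged.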

Keywords

Zone identification; Semantic features; Logistic regression; Fusion techniques; Scientific paper

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2019

Authors and Affiliations

  • Nasrin Asadi (1)
  • Kambiz Badie (2)
  • Maryam Tayefeh Mahmoudi (3, corresponding author)

  1. IT Research Faculty, ICT Research Institute, Tehran, Iran
  2. E-Services and E-Content Research Group, IT Research Faculty, ICT Research Institute, Tehran, Iran
  3. Data Processing and Analysis Systems Research Group, IT Research Faculty, ICT Research Institute, Tehran, Iran
