Using text mining techniques for identifying research gaps and priorities: a case study of the environmental science in Iran

Published in Scientometrics

Abstract

This study observes researchers’ behavior in Iranian scientific databases to determine the research gaps and priorities in their fields of research. Text mining and natural language processing techniques were used to identify what researchers are looking for and to analyze existing research works. We process the search behavior of researchers working in environmental science, together with the existing research works in the Iranian scientific database. Search trends in all areas are evaluated by analyzing the users’ search data. The trend analysis indicates that, from February 2013 to July 2015, the growth of researchers’ requests in some environmental domains, such as Industry, Training, Assessment, Material, Water and Pollution, was 1.5 to 2 times that of overall requests. Combining the trend analysis with clustering of the queries yielded four priority zones. The research priorities for each environmental research area were then determined. The results show that Training, Pollution, Rangeland, Management and Law are the environmental research domains with the largest research gaps in Iran, whereas there is enough research in the Forest, Soil and Industry domains. Finally, we describe the steps for implementing a decision support system for environmental research management. Researchers, managers and policy makers can use the proposed “research demand and supply monitoring” (RDSM) system to make appropriate decisions and allocate their resources more efficiently.
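The comparison of demand and supply described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `DomainStats`, the helper functions, and all counts below are hypothetical. It assumes that "growth relative to overall requests" means the ratio of two growth factors, and that a research gap shows up as a demand share larger than the supply share.

```python
from dataclasses import dataclass


@dataclass
class DomainStats:
    """Hypothetical per-domain counts; none of these numbers come from the paper."""
    name: str
    queries_start: int  # user queries in the first time window (demand)
    queries_end: int    # user queries in the last time window (demand)
    documents: int      # existing research records in the domain (supply)


def growth(start: int, end: int) -> float:
    """Growth factor of a count between two time windows."""
    if start <= 0:
        raise ValueError("start count must be positive")
    return end / start


def relative_growth(domain: DomainStats, overall_start: int, overall_end: int) -> float:
    """Domain growth divided by overall growth (assumed reading of the abstract)."""
    return growth(domain.queries_start, domain.queries_end) / growth(overall_start, overall_end)


def demand_supply_ratio(domain: DomainStats, total_queries: int, total_documents: int) -> float:
    """Share of recent queries divided by share of documents; above 1 hints at a gap."""
    demand_share = domain.queries_end / total_queries
    supply_share = domain.documents / total_documents
    return demand_share / supply_share


# Illustrative data only.
pollution = DomainStats("Pollution", queries_start=100, queries_end=300, documents=50)
forest = DomainStats("Forest", queries_start=200, queries_end=300, documents=250)

overall_start = pollution.queries_start + forest.queries_start
overall_end = pollution.queries_end + forest.queries_end
total_documents = pollution.documents + forest.documents

for d in (pollution, forest):
    rg = relative_growth(d, overall_start, overall_end)
    ds = demand_supply_ratio(d, overall_end, total_documents)
    print(f"{d.name}: relative growth {rg:.2f}, demand/supply {ds:.2f}")
```

In this toy data the Pollution domain grows faster than overall demand and is under-supplied, while Forest grows more slowly and is well covered. How the paper turns such indicators and the query clusters into its four priority zones is not stated in the abstract, so no zone rule is sketched here.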

Notes

  1. http://ganj.irandoc.ac.ir.

  2. Available at: http://thesauri.irandoc.ac.ir/.

  3. The fraction of the records that are relevant to the query that are successfully retrieved.

  4. The fraction of retrieved records that are relevant to the query.

  5. An open source engine for full text search.

  6. Boolean operators connect search terms and define the relationship between them (e.g. +, −, &, not).

  7. Frequent words that carry little information, i.e. stop words (comparable to “the”, “in”, “as”… in English).

  8. The Islamic theory or philosophy of law.
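Notes 3, 4 and 7 can be made concrete with a short sketch. It computes recall and precision exactly as those notes define them, and counts N-grams in search queries after dropping stop words. The function names, the sample record IDs, and the English queries and stop-word list are hypothetical illustrations; the queries analyzed in the study are Persian and its actual preprocessing is not shown on this page.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Precision: relevant share of retrieved records. Recall: retrieved share of relevant records."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def ngram_counts(queries: Iterable[str], n: int, stop_words: Set[str]) -> Counter:
    """Count word N-grams over all queries after removing stop words."""
    counts: Counter = Counter()
    for query in queries:
        # Whitespace tokenization is a simplifying assumption.
        tokens = [t for t in query.lower().split() if t not in stop_words]
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Illustrative data only.
retrieved_ids = {1, 2, 3, 4}
relevant_ids = {2, 4, 5}
precision, recall = precision_recall(retrieved_ids, relevant_ids)

sample_queries = ["pollution of the water", "water pollution in Tehran", "the water pollution"]
sample_stop_words = {"the", "in", "of"}
bigrams = ngram_counts(sample_queries, n=2, stop_words=sample_stop_words)

print(f"precision={precision:.2f}, recall={recall:.2f}")
print(bigrams.most_common(3))
```

Removing stop words before building N-grams keeps frequent but uninformative words from dominating the counts, which is the motivation note 7 points to.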

Acknowledgements

The authors gratefully acknowledge the support of the Iranian Research Institute for Information Science and Technology (IRANDOC), and especially the help of IRANDOC’s R&D department.

Author information

Corresponding author

Correspondence to Seyyed-Mahdi Hosseini-Motlagh.

Appendix

Table 7 Original Persian list of the top 10 most frequent N-grams in users’ queries

Cite this article

Rabiei, M., Hosseini-Motlagh, SM. & Haeri, A. Using text mining techniques for identifying research gaps and priorities: a case study of the environmental science in Iran. Scientometrics 110, 815–842 (2017). https://doi.org/10.1007/s11192-016-2195-8

  • DOI: https://doi.org/10.1007/s11192-016-2195-8
