
Trainable Framework for Information Extraction, Structuring and Summarization of Unstructured Data, Using Modified NER

Wireless Personal Communications

Abstract

The World Wide Web is an ever-expanding source of data. Millions of terabytes of data and information are added every second. Although data in this information age is generated at an exponential rate, most of the information already available is in the form of natural language text. Extracting information from such mammoth data raises three concerns. The first is the quality and form of the available data. The second is the challenge of extracting useful information from an ever-increasing volume of data. The third is extracting that information as efficiently as possible. Retrieval therefore calls for an ingenious way to answer any kind of user query from the available unstructured data. This paper proposes a novel trainable and integrated Natural Language Information Interpretation and Representation System (NLIIRS) that accepts any available un-annotated natural language corpus and performs the following tasks: it identifies the useful data, extracts relevant information in a usable, structured form (tables), summarizes the data, and structures the data in relational form. Finally, the Question and Answering (Q&A) module demonstrates the cognitive abilities of the NLIIRS by answering natural language questions relevant to the text. This multi-purpose system goes beyond Q&A: it is a trainable system capable of transforming any unstructured data into structured, well-organized information, and it allows the user to ask questions in natural language. It uses a modified named entity recognition (NER) approach to bypass the time-consuming process of part-of-speech tagging while pre-processing the corpus for information extraction.
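The abstract does not spell out how the modified NER works, so the following is only a minimal sketch of the general idea it describes: tagging entities by dictionary lookup and regular expressions (no part-of-speech tagging), storing them as relational-style rows, and answering a simple question from those rows. The gazetteer entries, the `YEAR` pattern, the question-word mapping, and all function names are hypothetical and invented for illustration; they are not taken from the NLIIRS implementation.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical gazetteer (surface form -> entity type), invented for this sketch.
GAZETTEER: Dict[str, str] = {
    "durgapur": "LOCATION",
    "india": "LOCATION",
    "nit": "ORGANIZATION",
}

# Hypothetical pattern rules for entities that a dictionary cannot enumerate.
PATTERNS = {"YEAR": re.compile(r"\b(?:19|20)\d{2}\b")}

# Hypothetical mapping from a leading question word to the entity type it asks for.
QUESTION_TYPES: Dict[str, str] = {"where": "LOCATION", "when": "YEAR"}


def tag_entities(sentence: str) -> List[Tuple[str, str]]:
    """Return (text, type) pairs using lookups and regexes only, with no POS tagging."""
    entities: List[Tuple[str, str]] = []
    for token in re.findall(r"[A-Za-z]+", sentence):
        label = GAZETTEER.get(token.lower())
        if label is not None:
            entities.append((token, label))
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(sentence):
            entities.append((match.group(), label))
    return entities


def to_rows(corpus: List[str]) -> List[Dict[str, Any]]:
    """Flatten tagged entities into relational-style rows, one row per entity."""
    rows: List[Dict[str, Any]] = []
    for sentence_id, sentence in enumerate(corpus):
        for text, label in tag_entities(sentence):
            rows.append({"sentence_id": sentence_id, "entity": text, "type": label})
    return rows


def answer(question: str, rows: List[Dict[str, Any]]) -> List[Any]:
    """Answer a question by returning the entities whose type matches its first word."""
    words = re.findall(r"[A-Za-z]+", question.lower())
    wanted = QUESTION_TYPES.get(words[0]) if words else None
    return [row["entity"] for row in rows if row["type"] == wanted]


if __name__ == "__main__":
    table = to_rows(["The work was done at NIT Durgapur in 2015."])
    for row in table:
        print(row)
    print(answer("Where was the work done?", table))
```

A lookup-based tagger like this trades coverage for speed: it only finds entities that appear in its dictionary or match a pattern, which is why a trainable component that grows the dictionary from new corpora would matter in practice.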



Acknowledgements

This publication is an outcome of the R&D work undertaken under the Visvesvaraya Ph.D. Scheme of the Ministry of Electronics & Information Technology (MeitY), Government of India, being implemented by Digital India Corporation. This research work was carried out at the Research Project Lab of the National Institute of Technology (NIT), Durgapur, India. Financial support was received from the Visvesvaraya Ph.D. Scheme, Deity, Govt. of India (Order Number: PHD-MLA/4 (29)/2014_2015 Dated-27/4/2015) to carry out this research work. The authors would like to thank the Department of Computer Science and Engineering, NIT, Durgapur, for academically supporting this research work. The authors would also like to thank the Department of Computer Science and Engineering, Jaypee University of Engineering and Technology, Guna, MP. Sincere thanks are also due to Mr. Hardik Gupta and Mr. Deepak Tripathi, students of Jaypee University of Engineering and Technology, Guna.

Author information

Corresponding author

Correspondence to Partha Sarathy Banerjee.


Appendix


Table 1 lists the abbreviations used in the proposed algorithms. Tables 7 and 8 are comparison tables: Table 7 compares various other models, and Table 8 compares the features of the proposed model with those of some other models.

Table 7 Comparison table on system/technique description
Table 8 Comparison table on the basis of tasks performed by various systems

About this article


Cite this article

Banerjee, P.S., Chakraborty, B., Anand, U. et al. Trainable Framework for Information Extraction, Structuring and Summarization of Unstructured Data, Using Modified NER. Wireless Pers Commun 117, 769–807 (2021). https://doi.org/10.1007/s11277-020-07896-w


  • DOI: https://doi.org/10.1007/s11277-020-07896-w
