Towards Computation of Novel Ideas from Corpora of Scientific Text

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9285)


In this work we present a method for the computation of novel ‘ideas’ from corpora of scientific text. The system functions by first detecting concept noun-phrases within the titles and abstracts of publications using Part-Of-Speech tagging, before classifying these into sets of problem and solution phrases via a target-word matching approach. By defining an idea as a co-occurring \(\langle\)problem, solution\(\rangle\) pair, known-idea triples can be constructed through the additional assignment of a relevance value (computed via either phrase co-occurrence or an ‘idea frequency-inverse document frequency’ score). The resulting triples are then fed into a collaborative filtering algorithm, where problem-phrases are considered as users and solution-phrases as the items to be recommended. The final output is a ranked list of novel idea candidates, which hold potential for researchers to integrate into their hypothesis generation processes. This approach is evaluated using a subset of publications from the journal Science, with precision, recall and F-Measure results for a variety of model parametrizations indicating that the system is capable of generating useful novel ideas in an automated fashion.
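The step from known-idea triples to ranked candidates can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the problem and solution phrases have already been extracted per publication, uses raw co-occurrence counts as the relevance value, and substitutes a simple user-based collaborative filter with cosine similarity. The function names `build_triples`, `cosine` and `recommend`, and the toy phrases, are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_triples(docs):
    """Count co-occurring (problem, solution) pairs across publications.

    docs: iterable of (problem_phrases, solution_phrases), one per publication.
    Returns {(problem, solution): co-occurrence count}, i.e. the triples.
    """
    counts = Counter()
    for problems, solutions in docs:
        for p in set(problems):
            for s in set(solutions):
                counts[(p, s)] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(triples, problem, top_n=5):
    """Rank solutions not yet paired with `problem`.

    Problem-phrases play the role of users, solution-phrases the role of items.
    Each unseen solution is scored by the similarity-weighted mean relevance
    assigned to it by other problem-phrases (an assumed, simple CF variant).
    """
    rows = defaultdict(dict)
    for (p, s), relevance in triples.items():
        rows[p][s] = relevance
    target = rows.get(problem, {})
    scores, weights = defaultdict(float), defaultdict(float)
    for other, row in rows.items():
        if other == problem:
            continue
        sim = cosine(target, row)
        if sim <= 0:
            continue
        for s, relevance in row.items():
            if s not in target:  # only novel pairings are candidates
                scores[s] += sim * relevance
                weights[s] += sim
    ranked = [(s, scores[s] / weights[s]) for s in scores]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))[:top_n]


# Toy corpus: each tuple is (problem phrases, solution phrases) of one paper.
docs = [
    (["drug resistance"], ["gene sequencing", "protein assay"]),
    (["tumour growth"], ["gene sequencing", "imaging probe"]),
    (["drug resistance"], ["gene sequencing"]),
]
triples = build_triples(docs)
print(recommend(triples, "drug resistance"))
```

In this toy corpus, "imaging probe" is the only solution phrase never paired with "drug resistance", so it is the sole candidate returned for that problem. Swapping the co-occurrence count for an idea frequency-inverse document frequency score would only change how `build_triples` assigns relevance.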


Idea mining, Text mining, Natural language processing, Recommender systems, Collaborative filtering



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. School of Computer Science, University of Nottingham Malaysia Campus, Semenyih, Malaysia
  2. Horizon Digital Economy Research, School of Computer Science, University of Nottingham, Nottingham, UK
