Abstract
In this work we present a method for computing novel ‘ideas’ from corpora of scientific text. The system first detects concept noun phrases within the titles and abstracts of publications using part-of-speech tagging, then classifies them into sets of problem and solution phrases via a target-word matching approach. By defining an idea as a co-occurring \(\langle\)problem, solution\(\rangle\) pair, known-idea triples can be constructed by additionally assigning each pair a relevance value (computed via either phrase co-occurrence or an ‘idea frequency-inverse document frequency’ score). The resulting triples are then fed into a collaborative filtering algorithm, in which problem phrases are treated as users and solution phrases as the items to be recommended. The final output is a ranked list of novel idea candidates that researchers could integrate into their hypothesis generation processes. The approach is evaluated on a subset of publications from the journal Science, with precision, recall and F-measure results for a variety of model parametrizations indicating that the system is capable of generating useful novel ideas in an automated fashion.
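The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes phrase detection and problem/solution classification have already happened, uses raw co-occurrence counts as the relevance value, and swaps in a simple user-based collaborative filter with cosine similarity. The toy phrases and the function names `build_triples`, `cosine` and `recommend` are hypothetical.

```python
from collections import Counter
from math import sqrt

# Toy corpus (hypothetical): each document is already reduced to its
# sets of problem phrases and solution phrases.
docs = [
    ({"overfitting"}, {"dropout", "regularization"}),
    ({"overfitting"}, {"regularization"}),
    ({"high variance"}, {"regularization", "bagging"}),
    ({"high variance"}, {"bagging"}),
]

def build_triples(docs):
    """Map each co-occurring (problem, solution) pair to its co-occurrence count."""
    counts = Counter()
    for problems, solutions in docs:
        for p in problems:
            for s in solutions:
                counts[(p, s)] += 1
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def recommend(counts, problem):
    """Rank solutions not yet paired with `problem` (assumed user-based CF).

    Problems play the role of users and solutions the role of items.
    """
    vectors = {}
    for (p, s), c in counts.items():
        vectors.setdefault(p, {})[s] = c
    target = vectors.get(problem, {})
    scores = Counter()
    for other, vec in vectors.items():
        if other == problem:
            continue
        sim = cosine(target, vec)
        for s, c in vec.items():
            if s not in target:  # keep only novel pairs
                scores[s] += sim * c
    ranked = [(s, sc) for s, sc in scores.items() if sc > 0]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))

counts = build_triples(docs)
candidates = recommend(counts, "overfitting")
```

On this toy data, "bagging" is the only novel candidate proposed for the problem "overfitting", because "high variance" shares the solution "regularization" with it. A real system would replace the raw counts with whichever relevance score is chosen and would likely use a neighbourhood size or a model-based filter.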
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Liu, H., Goulding, J., Brailsford, T. (2015). Towards Computation of Novel Ideas from Corpora of Scientific Text. In: Appice, A., Rodrigues, P., Santos Costa, V., Gama, J., Jorge, A., Soares, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science, vol 9285. Springer, Cham. https://doi.org/10.1007/978-3-319-23525-7_33
DOI: https://doi.org/10.1007/978-3-319-23525-7_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23524-0
Online ISBN: 978-3-319-23525-7
eBook Packages: Computer Science (R0)