Selecting the Links in BisoNets Generated from Document Collections
According to Koestler, a bisociation is a connection between pieces of information from habitually separated domains or categories. In this paper, we consider a methodology for finding such bisociations using a network representation of knowledge, which is called a BisoNet because it promises to contain bisociations. As a first step, we consider how to create BisoNets from several textual databases taken from different domains using simple text-mining techniques. To achieve this, we introduce a procedure, based on a new measure for comparing text frequency vectors, that links the nodes of a BisoNet and assigns weights to these links. As a second step, we try to rediscover known bisociations that were originally found by a human domain expert, namely the indirect relations between migraine and magnesium hidden in medical research articles published before 1987. We observe that these bisociations are easily rediscovered by simply following the strongest links. Future work includes extending our methods to non-textual data, improving the similarity measure, and applying more sophisticated graph mining methods.
Keywords: Document Collection, Term Frequency, View Versus, Indirect Relation, Textual Database
- 1. Koestler, A.: The act of creation. Hutchinson, London (1964)
- 2. Barron, F.: Putting creativity to work. In: The nature of creativity. Cambridge Univ. Press, Cambridge (1988)
- 3. Mac Cormac, E.R.: A cognitive theory of metaphor. MIT Press, Cambridge (1985)
- 4. Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD international conference on management of data, pp. 207–216 (1993)
- 8. Aamodt, A., Plaza, E.: Case-based reasoning: foundational issues, methodological variations and system approaches. Artificial Intelligence Communications 7(1), 39–59 (1994)
- 9. Cardoso, A., Costa, E., Machado, P., Pereira, F., Gomes, P.: An architecture for hybrid creative reasoning. In: Soft Computing in Case Based Reasoning. Springer, Heidelberg (2000)
- 10. van Rijsbergen, C.J., Robertson, S.E., Porter, M.F.: New models in probabilistic information retrieval. In: British Library Research and Development Report, Number 5587. British Library, London (1980)
- 12. Jaccard, P.: Étude comparative de la distribution florale dans une portion des Alpes et du Jura. Bulletin de la Société Vaudoise des Sciences Naturelles 37, 547–579 (1901)
- 13. Swanson, D.R., Smalheiser, N.R., Torvik, V.I.: Ranking indirect connections in literature-based discovery: The role of medical subject headings. Journal of the American Society for Information Science and Technology (JASIST) 57(11) (September 2006)