Abstract
XML is poised to become the main language for exchanging financial information between businesses over the Internet. As more banks and financial institutions move to electronic information exchange and reporting, the financial world is flooded with information. Given the sheer amount of financial information stored, presented, and exchanged using XML-based standards, the ability to extract interesting knowledge from these data sources, in order to better understand customer buying and selling behavior and upward or downward trends in the stock market, is increasingly important and desirable. Hence, there is growing demand for efficient methods of discovering valuable information in large collections of XML-based data. One of the most popular approaches to finding useful information is to mine frequently occurring tree patterns. In this paper, we propose a novel algorithm, FIXiT, for efficiently extracting maximal frequent subtrees from a set of XML-based documents. The main contributions of our algorithm are that (1) it classifies the available financial XML standards, such as FIXML, FpML, and XBRL, with respect to their specifications, and (2) it requires no tree join operations during the phase of generating maximal frequent subtrees.
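To make the notion of a maximal frequent pattern concrete, the following is a minimal sketch and not the FIXiT algorithm itself. It assumes a deliberately simplified pattern type, root-to-node tag paths rather than general subtrees, and the element names (`order`, `instrument`, `symbol`, and so on) and the function names are hypothetical, chosen only for illustration. A path is frequent if it occurs in at least `min_support` documents, and maximal if no longer frequent path extends it.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def label_paths(xml_text):
    """Return the set of root-to-node tag paths that occur in one document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + (child.tag,)))
    return paths


def maximal_frequent_paths(docs, min_support):
    """Return frequent tag paths that are not a prefix of a longer frequent path."""
    support = Counter()
    for doc in docs:
        # A set is used per document, so each document counts once per path.
        support.update(label_paths(doc))
    frequent = {p for p, count in support.items() if count >= min_support}
    return sorted(
        p for p in frequent
        if not any(len(q) > len(p) and q[:len(p)] == p for q in frequent)
    )


# Hypothetical toy documents; these are not real FIXML, FpML, or XBRL messages.
docs = [
    "<order><instrument><symbol/></instrument><price/></order>",
    "<order><instrument><symbol/></instrument><qty/></order>",
    "<order><instrument><symbol/><currency/></instrument></order>",
]

result = maximal_frequent_paths(docs, min_support=2)
print(result)
```

With `min_support=2`, the paths ending in `price`, `qty`, and `currency` each occur in only one document and are dropped, while `order` and `order/instrument` are frequent but subsumed by the longer frequent path `order/instrument/symbol`. Reporting only maximal patterns in this way keeps the output small, which is the motivation for targeting maximal frequent subtrees.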
This work was supported in part by the Ubiquitous Autonomic Computing and Network Project, 21st Century Frontier R&D Program and by the university IT Research Center project (ITRC), funded by the Korean Ministry of Information and Communication.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Paik, J., Eom, Y.I., Kim, U.M. (2006). Extraction of Interesting Financial Information from Heterogeneous XML-Based Data. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science – ICCS 2006. ICCS 2006. Lecture Notes in Computer Science, vol 3994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11758549_52
DOI: https://doi.org/10.1007/11758549_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34385-1
Online ISBN: 978-3-540-34386-8
eBook Packages: Computer Science, Computer Science (R0)