Extraction of Interesting Financial Information from Heterogeneous XML-Based Data

Paik, Juryon; Eom, Young Ik; Kim, Ung Mo

doi:10.1007/11758549_52

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3994)

Included in the following conference series:

International Conference on Computational Science

Abstract

XML is becoming the main language for exchanging financial information between businesses over the Internet. As more banks and financial institutions move to electronic information exchange and reporting, the financial world is flooded with information. Given the sheer amount of financial information stored, presented, and exchanged using XML-based standards, the ability to extract interesting knowledge from these data sources, in order to better understand customer buying and selling behavior and upward and downward trends in the stock market, is increasingly important. Hence, there is growing demand for efficient methods of discovering valuable information in large collections of XML-based data. One of the most popular approaches to finding useful information is to mine frequently occurring tree patterns. In this paper, we propose a novel algorithm, FIXiT, for efficiently extracting maximal frequent subtrees from a set of XML-based documents. The main contributions of our algorithm are that (1) it classifies the available financial XML standards, such as FIXML, FpML, and XBRL, with respect to their specifications, and (2) it does not need to perform tree join operations while generating maximal frequent subtrees.
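
To make the terms "frequent" and "maximal" concrete, the following is a minimal sketch and not the FIXiT algorithm itself, whose details are not given in the abstract. It simplifies subtrees to root-to-node tag paths, counts in how many documents each path occurs, and keeps the frequent paths that no other frequent path extends. The function names, the sample documents, and the element names in them are assumptions made up for illustration; they do not follow any real FIXML, FpML, or XBRL schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def label_paths(xml_text):
    """Return the set of root-to-node tag paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + (child.tag,)))
    return paths


def maximal_frequent_paths(documents, min_support):
    """Return frequent tag paths that are not extended by another frequent path.

    Support of a path = number of documents that contain it at least once.
    """
    support = Counter()
    for doc in documents:
        # label_paths returns a set, so each path counts once per document.
        support.update(label_paths(doc))
    frequent = {p for p, n in support.items() if n >= min_support}
    # Every prefix of a frequent path is itself frequent, so checking
    # one-label extensions is enough to decide maximality.
    return sorted(
        p for p in frequent
        if not any(len(q) == len(p) + 1 and q[:len(p)] == p for q in frequent)
    )


# Hypothetical sample documents (illustrative tags only).
DOCS = [
    "<Order><Instrmt><Sym/></Instrmt><Qty/></Order>",
    "<Order><Instrmt><Sym/></Instrmt><Px/></Order>",
    "<Order><Instrmt/><Qty/></Order>",
]

if __name__ == "__main__":
    for path in maximal_frequent_paths(DOCS, min_support=2):
        print("/".join(path))
```

A path-based view ignores sibling structure, which a real subtree miner has to handle; it is used here only because it keeps the support and maximality checks short.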

This work was supported in part by the Ubiquitous Autonomic Computing and Network Project, 21st Century Frontier R&D Program and by the university IT Research Center project (ITRC), funded by the Korean Ministry of Information and Communication.


Author information

Authors and Affiliations

Department of Computer Engineering, Sungkyunkwan University, 300 Chunchun-dong, Jangan-gu, Suwon, Gyeonggi-do, 440-746, Republic of Korea
Juryon Paik, Young Ik Eom & Ung Mo Kim

Editor information

Editors and Affiliations

Advanced Computing and Emerging Technologies Centre, The School of Systems Engineering, University of Reading, RG6 6AY, Reading, United Kingdom
Vassil N. Alexandrov
Department of Mathematics and Computer Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
Geert Dick van Albada
Faculty of Sciences, Section of Computational Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
Peter M. A. Sloot
Computer Science Department, University of Tennessee, 37996-3450, Knoxville, TN, USA
Jack Dongarra

About this paper

Cite this paper

Paik, J., Eom, Y.I., Kim, U.M. (2006). Extraction of Interesting Financial Information from Heterogeneous XML-Based Data. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science – ICCS 2006. ICCS 2006. Lecture Notes in Computer Science, vol 3994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11758549_52

DOI: https://doi.org/10.1007/11758549_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34385-1
Online ISBN: 978-3-540-34386-8
eBook Packages: Computer Science, Computer Science (R0)
