Abstract
Some applications require sorting a set of elements according to an order relation that is not known a priori. In such cases, a training set of ordered elements is often available, from which the order relation can be learned automatically. This work assumes that the correct succession of elements in a training sequence (or chain) is given, so that the definitions of two predicates, first/1 and succ/2, can be induced and then used to establish the ordering. A peculiarity of this work is the relational representation of training data, which allows various relationships between the elements to be expressed in addition to the ordering itself. For this reason, an inductive logic programming (ILP) learning algorithm is applied to induce the definitions of the two predicates. Two methods are reported for identifying either a single chain or multiple chains over new objects. They have been applied to the problem of learning the reading order of layout components extracted from document images. Experimental results show the effectiveness of the proposed solution.
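To illustrate how definitions of first/1 and succ/2 can determine an ordering, here is a minimal Python sketch. It is not one of the two methods reported in the chapter. It assumes the learned predicates are available as plain Boolean functions and follows successor links greedily from the element recognised as first. The function name `build_chain`, the callables `is_first` and `is_succ`, and the layout block names are all hypothetical.

```python
from typing import Callable, Hashable, Iterable, List


def build_chain(
    items: Iterable[Hashable],
    is_first: Callable[[Hashable], bool],
    is_succ: Callable[[Hashable, Hashable], bool],
) -> List[Hashable]:
    """Greedily build one chain: start at an element satisfying is_first,
    then repeatedly append an unvisited element y with is_succ(current, y)."""
    remaining = list(items)
    start = next((x for x in remaining if is_first(x)), None)
    if start is None:
        return []  # no element is recognised as the head of a chain
    chain = [start]
    remaining.remove(start)
    while True:
        current = chain[-1]
        nxt = next((y for y in remaining if is_succ(current, y)), None)
        if nxt is None:
            break  # no successor found: the chain ends here
        chain.append(nxt)
        remaining.remove(nxt)  # each element is visited at most once
    return chain


# Hypothetical ground facts standing in for the induced predicate definitions.
first_facts = {"title"}
succ_facts = {
    ("title", "abstract"),
    ("abstract", "body"),
    ("body", "references"),
}

# Hypothetical layout components of one page, in arbitrary order.
blocks = ["body", "references", "title", "abstract", "page_number"]

chain = build_chain(
    blocks,
    lambda x: x in first_facts,
    lambda x, y: (x, y) in succ_facts,
)
print(chain)  # ['title', 'abstract', 'body', 'references']
```

In this sketch, elements that no succ/2 fact reaches, such as `page_number`, are left out of the chain, so the result can be a partial order over the input objects.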
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Malerba, D., Ceci, M. (2008). Learning to Order: A Relational Approach. In: Raś, Z.W., Tsumoto, S., Zighed, D. (eds) Mining Complex Data. MCD 2007. Lecture Notes in Computer Science, vol 4944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68416-9_17
DOI: https://doi.org/10.1007/978-3-540-68416-9_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68415-2
Online ISBN: 978-3-540-68416-9
eBook Packages: Computer Science (R0)