Algorithms for Redescription Mining

Galbrun, Esther; Miettinen, Pauli

doi:10.1007/978-3-319-72889-6_2


Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

The aim of redescription mining is to find valid redescriptions for a given dataset, query language, similarity relation, and set of user-specified constraints. In other words, we need to explore the search space of query pairs from the query language, looking for pairs whose supports in the data are similar enough and that satisfy the other constraints. In this chapter, we present the methods that have been proposed to carry out this exploration efficiently. Existing methods fall into three main categories: (1) mine-and-pair approaches, (2) alternating approaches, and (3) approaches that use atomic updates. We consider each category in turn, explaining its general principles and looking at the algorithms built on them. Next, we compare the methods and discuss their relative strengths and weaknesses. Finally, we consider how to adapt the algorithms to handle cases where some values are missing from the input data.
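
To make the notion of "similar enough support" concrete, the following is a minimal sketch, assuming the Jaccard index as the similarity relation and a naive exhaustive pairing of candidate queries. The toy data, the query names, and the helpers `support`, `jaccard`, and `pair_queries` are hypothetical illustrations and are not taken from the chapter or from any of the algorithms it surveys.

```python
def support(rows, query):
    """Indices of the entities (rows) for which the query holds."""
    return {i for i, row in enumerate(rows) if query(row)}


def jaccard(supp_a, supp_b):
    """Jaccard index of two support sets; taken to be 0.0 when both are empty."""
    union = supp_a | supp_b
    return len(supp_a & supp_b) / len(union) if union else 0.0


def pair_queries(left_rows, right_rows, left_queries, right_queries, min_sim):
    """Exhaustively pair named queries from both sides and keep the similar pairs."""
    found = []
    for l_name, l_query in left_queries.items():
        l_supp = support(left_rows, l_query)
        for r_name, r_query in right_queries.items():
            sim = jaccard(l_supp, support(right_rows, r_query))
            if sim >= min_sim:
                found.append((l_name, r_name, sim))
    return found


# Hypothetical toy data: the same four entities described by two attribute sets.
left_rows = [{"temp": 2}, {"temp": 15}, {"temp": 18}, {"temp": -4}]
right_rows = [{"moose": 1}, {"moose": 0}, {"moose": 0}, {"moose": 1}]

left_queries = {
    "temp < 5": lambda r: r["temp"] < 5,
    "temp >= 15": lambda r: r["temp"] >= 15,
}
right_queries = {"moose": lambda r: r["moose"] == 1}

redescriptions = pair_queries(left_rows, right_rows, left_queries, right_queries, 0.8)
print(redescriptions)
```

Enumerating every pair is only feasible for tiny query sets; the three categories of methods named in the abstract are different ways of avoiding this exhaustive search.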

Notes

1. The equation for the pessimistic Jaccard index presented by Galbrun and Miettinen (2012, Equation 5.7) is erroneous, as it misses two summands from the denominator.
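
For orientation only, here is a sketch of what a worst-case ("pessimistic") Jaccard index can look like when some query evaluations are unknown because of missing values. It is derived from first principles and is not the equation from the cited paper, nor its corrected form. It assumes that each query evaluates to `True`, `False`, or `None` (unknown) on every entity; the function name `pessimistic_jaccard` and this encoding are hypothetical.

```python
def pessimistic_jaccard(left_status, right_status):
    """Worst-case Jaccard index when None marks a query that cannot be evaluated.

    Assumption: every unknown value is resolved in the least favourable way,
    so an entity counts towards the intersection only if both queries
    certainly hold, and it is left out of the union only if both queries
    certainly fail.
    """
    both = 0     # entities where both queries certainly hold
    at_risk = 0  # entities that may end up in the union but not the intersection
    for left, right in zip(left_status, right_status):
        if left is True and right is True:
            both += 1
        elif left is False and right is False:
            continue  # certainly outside the union
        else:
            at_risk += 1
    denominator = both + at_risk
    return both / denominator if denominator else 0.0


# Hypothetical evaluation of two queries on five entities.
left_status = [True, True, None, False, False]
right_status = [True, None, True, None, False]
score = pessimistic_jaccard(left_status, right_status)
print(score)
```

Under this reading, the denominator gathers every combination of truth values except "both hold" (counted once, in the numerator and the denominator) and "both fail", so it contains one summand per remaining combination.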

References

Aggarwal CC (2015) Data Mining: The Textbook. Springer, Cham, https://doi.org/10.1007/978-3-319-14142-8
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of 20th International Conference on Very Large Data Bases (VLDB’94), pp 487–499
Blockeel H, De Raedt L, Ramon J (1998) Top-down induction of clustering trees. In: Proceedings of the 15th International Conference on Machine Learning (ICML’98), pp 55–63
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca Raton, FL
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297, https://doi.org/10.1007/BF00994018
Galbrun E, Miettinen P (2012) From black and white to full color: Extending redescription mining outside the Boolean world. Stat Anal Data Min 5(4):284–303, https://doi.org/10.1002/sam.11145
Gallo A, Miettinen P, Mannila H (2008) Finding subgroups having several descriptions: Algorithms for redescription mining. In: Proceedings of the 8th SIAM International Conference on Data Mining (SDM’08), pp 334–345, https://doi.org/10.1137/1.9781611972788.30
Ganter B, Wille R (1999) Formal Concept Analysis: Mathematical Foundations. Springer, Berlin, https://doi.org/10.1007/978-3-642-59830-2
Garey MR, Johnson DS (2002) Computers and intractability. A guide to the theory of NP-completeness, vol 29. W. H. Freeman and Co., San Francisco, CA
Kumar D (2007) Redescription mining: Algorithms and applications in bioinformatics. PhD thesis, Department of Computer Science, Virginia Polytechnic Institute and State University
Mannila H, Toivonen H, Verkamo AI (1994) Efficient algorithms for discovering association rules. In: Proceedings of the 1994 AAAI Workshop on Knowledge Discovery in Databases (KDD’94), pp 181–192
Mihelčić M, Džeroski S, Lavrač N, Šmuc T (2017) A framework for redescription set construction. Expert Syst Appl 68:196–215, https://doi.org/10.1016/j.eswa.2016.10.012
Mihelčić M, Džeroski S, Lavrač N, Šmuc T (2016) Redescription mining with multi-target predictive clustering trees. In: Proceedings of the 4th International Workshop on the New Frontiers in Mining Complex Patterns (NFMCP’15), pp 125–143, https://doi.org/10.1007/978-3-319-39315-5_9
Mihelčić M, Džeroski S, Lavrač N, Šmuc T (2017) Redescription mining augmented with random forest of multi-target predictive clustering trees. J Intell Inf Syst, pp 1–34, https://doi.org/10.1007/s10844-017-0448-5
Négrevergne B, Termier A, Rousset M, Méhaut J (2014) ParaMiner: A generic pattern mining algorithm for multi-core architectures. Data Min Knowl Disc 28(3):593–633, https://doi.org/10.1007/s10618-013-0313-2
Quinlan J (1986) Induction of decision trees. Mach Learn 1(1):81–106, https://doi.org/10.1023/A:1022643204877
Ramakrishnan N, Zaki MJ (2009) Redescription mining and applications in bioinformatics. In: Chen J, Lonardi S (eds) Biological Data Mining, Chapman and Hall/CRC, Boca Raton, FL
Ramakrishnan N, Kumar D, Mishra B, Potts M, Helm RF (2004) Turning CARTwheels: An alternating algorithm for mining redescriptions. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’04), pp 266–275, https://doi.org/10.1145/1014052.1014083
Zaki MJ, Hsiao CJ (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17(4):462–478, https://doi.org/10.1109/TKDE.2005.60
Zaki MJ, Ramakrishnan N (2005) Reasoning about sets using redescription mining. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’05), pp 364–373, https://doi.org/10.1145/1081870.1081912
Zhao L, Zaki MJ, Ramakrishnan N (2006) BLOSOM: A framework for mining arbitrary Boolean expressions. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’06), pp 827–832, https://doi.org/10.1145/1150402.1150511
Zinchenko T, Galbrun E, Miettinen P (2015) Mining predictive redescriptions with trees. In: IEEE International Conference on Data Mining Workshops, pp 1672–1675, https://doi.org/10.1109/ICDMW.2015.123

Author information

Authors and Affiliations

LORIA, Inria Nancy – Grand Est, Villers-lès-Nancy, France
Esther Galbrun
Databases and Information Systems, Max-Planck-Institute for Informatics, Saarbrücken, Germany
Pauli Miettinen

About this chapter

Cite this chapter

Galbrun, E., Miettinen, P. (2017). Algorithms for Redescription Mining. In: Redescription Mining. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-72889-6_2

DOI: https://doi.org/10.1007/978-3-319-72889-6_2
Published: 12 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72888-9
Online ISBN: 978-3-319-72889-6
eBook Packages: Computer Science, Computer Science (R0)
