Data Farming: Concepts and Methods

  • Andrew Kusiak
Part of the Massive Computing book series (MACO, volume 6)


A typical data mining project uses data collected for various purposes, ranging from routinely gathered data and data from process improvement projects to data required for archival purposes. In some cases, the set of considered features might be large (a wide data set) and sufficient for extraction of knowledge. In other cases, the data set might be narrow and insufficient to extract meaningful knowledge, or the data may not exist at all.

Mining wide data sets has received attention in the literature, and many feature selection models and algorithms have been developed for them.

Determining the features for which data should be collected when no data set exists, or when a data set is only partially available, has not been sufficiently addressed in the literature. Yet this issue is of paramount importance as interest in data mining grows. The methods and processes for defining the most appropriate features for data collection, data transformation, data quality assessment, and data analysis are referred to as data farming. This chapter outlines the elements of a data farming discipline.

Key Words

Data Farming, Data Mining, Feature Definition, Feature Functions, New Features





Copyright information

© Springer Science+Business Media, LLC 2006

Authors and Affiliations

  • Andrew Kusiak
    • Intelligent Systems Laboratory, Mechanical and Industrial Engineering, The University of Iowa, Iowa City
