Collective Principal Component Analysis from Distributed, Heterogeneous Data

  • Hillol Kargupta
  • Weiyun Huang
  • Krishnamoorthy Sivakumar
  • Byung-Hoon Park
  • Shuren Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1910)

Abstract

Principal component analysis (PCA) is a statistical technique to identify the dependency structure of multivariate stochastic observations. PCA is frequently used in data mining applications. This paper considers PCA in the context of the emerging network-based computing environments. It offers a technique to perform PCA from distributed and heterogeneous data sets with relatively small communication overhead. The technique is evaluated against different data sets, including a data set for a web mining application. This approach is likely to facilitate the development of distributed clustering, associative link analysis, and other heterogeneous data mining applications that frequently use PCA.
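
As a rough illustration of what PCA computes, the sketch below finds the leading principal component of a small data set. It is a minimal, centralized example and not the distributed technique the paper proposes: it assumes all attributes are available at one place, whereas in the heterogeneous setting the columns would be spread across sites. The function names, the toy data, and the use of power iteration are illustrative assumptions, not taken from the paper.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observation rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_component(cov, iterations=200):
    """Power iteration for the largest eigenvalue and its unit eigenvector.

    Assumes the starting vector is not orthogonal to the leading
    eigenvector, which holds for the toy data below.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v

# Hypothetical toy data: two strongly dependent attributes.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0], [5.0, 10.1]]
cov = covariance_matrix(data)
variance, direction = leading_component(cov)

# Fraction of total variance captured by the first principal component.
explained = variance / sum(cov[i][i] for i in range(len(cov)))
```

Here `direction` points along the line on which the two attributes vary together, and `explained` is close to 1 because the attributes are almost perfectly linearly dependent. That dependency structure is what PCA is meant to expose.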

Keywords

Central Site; Heterogeneous Data; Data Mining Application; Distribute Data Mining; Divisive Partitioning
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Hillol Kargupta (1)
  • Weiyun Huang (1)
  • Krishnamoorthy Sivakumar (1)
  • Byung-Hoon Park (1)
  • Shuren Wang (1)

  1. School of Electrical Engineering and Computer Science, Washington State University, Pullman, USA
