Transitive Identity Mapping Using Force-Based Clustering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8316)

Abstract

In most information retrieval systems, software processes (whether agent-based or not) reason about passive items of data. An alternative approach instantiates each record as an agent that actively self-organizes with other agents (including queries). Imitating the movement of bodies under physical forces, we describe a distributed algorithm (“force-based clustering,” or FBC) for dynamically clustering and querying large, heterogeneous, dynamic collections of entities. The algorithm moves entities in a virtual space in a way that estimates the transitive closure of the pairwise comparisons. We demonstrate this algorithm on a large, heterogeneous collection of records, each representing a person. We have some information about a person of interest, but no record in the collection directly matches this information. Application of FBC identifies a small subset of records that are good candidates for describing the person of interest, for further manual investigation and verification.
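The abstract does not give the force law, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a 2-D virtual space, a deterministic starting layout on a unit circle, spring-like attraction for positively scored pairs, and distance-decaying repulsion for negatively scored pairs. The function names, the `SCORES` table, and all parameter values are hypothetical.

```python
import math


def force_based_clustering(n, similarity, steps=200, rate=0.05):
    """Move n entities in a 2-D virtual space under pairwise forces.

    similarity(i, j) is assumed to return a score in [-1, 1]:
    positive pairs attract, negative pairs repel, and 0 means
    no direct comparison is available for that pair.
    """
    # Assumed starting layout: evenly spaced on a unit circle.
    pos = [[math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i in range(n)]
    for _ in range(steps):
        moves = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                s = similarity(i, j) if i != j else 0.0
                if s == 0.0:
                    continue  # no direct comparison, so no direct force
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                if s > 0:
                    gain = rate * s  # spring-like attraction
                else:
                    # repulsion that weakens with distance
                    gain = rate * s / (dx * dx + dy * dy + 1e-9)
                moves[i][0] += gain * dx
                moves[i][1] += gain * dy
        # Apply all moves at once so every entity reacts to the same snapshot.
        for p, m in zip(pos, moves):
            p[0] += m[0]
            p[1] += m[1]
    return pos


def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical pairwise scores: 0-1 and 1-2 match, 3-4 match, 1-3 conflict.
# Records 0 and 2 are never compared directly.
SCORES = {(0, 1): 1.0, (1, 2): 1.0, (3, 4): 1.0, (1, 3): -1.0}


def similarity(i, j):
    return SCORES.get((min(i, j), max(i, j)), 0.0)


positions = force_based_clustering(5, similarity)
print("0-2:", distance(positions[0], positions[2]))
print("0-3:", distance(positions[0], positions[3]))
```

In this toy run, records 0 and 2 exert no force on each other, yet both are pulled toward record 1 and so end up close together. That is the sense in which positions in the virtual space can approximate the transitive closure of the pairwise comparisons. A query could be treated as one more entity, with nearby records as candidates for manual review.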

Author information

Corresponding author

Correspondence to H. Van Dyke Parunak.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Van Dyke Parunak, H., Brueckner, S. (2014). Transitive Identity Mapping Using Force-Based Clustering. In: Cao, L., Zeng, Y., Symeonidis, A., Gorodetsky, V., Müller, J., Yu, P. (eds) Agents and Data Mining Interaction. ADMI 2013. Lecture Notes in Computer Science, vol 8316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55192-5_10

  • DOI: https://doi.org/10.1007/978-3-642-55192-5_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-55191-8

  • Online ISBN: 978-3-642-55192-5

  • eBook Packages: Computer Science, Computer Science (R0)