
Methods of Linked Data Integration

Chapter in: Linked Enterprise Data

Part of the book series: X.media.press (XMEDIAP)

Abstract

Data integration means combining different datasets so that they can be queried together, and it is an essential prerequisite for using Linked Data in an enterprise context. This chapter covers the processes required to establish a global view over multiple data sources. Because Linked Data publishers use a wide variety of vocabularies to represent information, the datasets must first be translated into a consistent target vocabulary. In a second step, resources in different datasets that represent the same real-world object must be identified and linked. Finally, the previously linked resources must be merged into a single entity.
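The three steps described above (vocabulary mapping, identity resolution, data fusion) can be illustrated with a minimal, self-contained Python sketch. It is not the chapter's method and does not follow the mapping or linkage-rule syntax of any particular tool. The triple representation, the property mapping table `VOCAB_MAP`, the helper names, the normalized Levenshtein similarity, the 0.8 threshold, and the "source A wins" fusion policy are all assumptions chosen for illustration, and every URI is a hypothetical placeholder.

```python
from itertools import product

Triple = tuple[str, str, str]  # (subject, predicate, object); objects are plain strings here

# Hypothetical mapping from source-specific properties to one target vocabulary.
VOCAB_MAP = {
    "http://a.example/vocab/label": "http://target.example/name",
    "http://a.example/vocab/staff": "http://target.example/employees",
    "http://b.example/schema/title": "http://target.example/name",
    "http://b.example/schema/headcount": "http://target.example/employees",
    "http://b.example/schema/founded": "http://target.example/founded",
}
NAME = "http://target.example/name"


def map_vocabulary(triples: list[Triple]) -> list[Triple]:
    """Step 1: rewrite predicates into the target vocabulary; drop unmapped ones."""
    return [(s, VOCAB_MAP[p], o) for s, p, o in triples if p in VOCAB_MAP]


def group_by_subject(triples: list[Triple]) -> dict[str, dict[str, list[str]]]:
    """Collect each resource's triples into a property -> values dictionary."""
    entities: dict[str, dict[str, list[str]]] = {}
    for s, p, o in triples:
        entities.setdefault(s, {}).setdefault(p, []).append(o)
    return entities


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive edit distance normalized to [0, 1]; 1.0 means identical."""
    a, b = a.lower(), b.lower()
    return 1.0 - levenshtein(a, b) / (max(len(a), len(b)) or 1)


def find_links(source_a, source_b, threshold: float = 0.8) -> list[tuple[str, str]]:
    """Step 2: link resources whose names are similar enough.
    Naive all-pairs comparison; no blocking or indexing."""
    links = []
    for (uri_a, props_a), (uri_b, props_b) in product(source_a.items(), source_b.items()):
        name_a = props_a.get(NAME, [""])[0]
        name_b = props_b.get(NAME, [""])[0]
        # Skip resources without a name instead of treating "" == "" as a match.
        if name_a and name_b and similarity(name_a, name_b) >= threshold:
            links.append((uri_a, uri_b))
    return links


def fuse(props_a: dict, props_b: dict) -> dict[str, list[str]]:
    """Step 3: merge two linked resources into one entity. Properties only one
    source has are kept; on conflicts, source A's values win (fixed priority)."""
    merged = {p: list(v) for p, v in props_b.items()}
    merged.update({p: list(v) for p, v in props_a.items()})
    return merged


# Toy input: two fictional sources describing the same company differently.
raw_a = [
    ("http://a.example/org/1", "http://a.example/vocab/label", "Acme GmbH"),
    ("http://a.example/org/1", "http://a.example/vocab/staff", "120"),
]
raw_b = [
    ("http://b.example/c/9", "http://b.example/schema/title", "ACME Gmbh."),
    ("http://b.example/c/9", "http://b.example/schema/headcount", "118"),
    ("http://b.example/c/9", "http://b.example/schema/founded", "1999"),
    ("http://b.example/c/9", "http://b.example/schema/legacyId", "X-17"),  # unmapped
    ("http://b.example/c/3", "http://b.example/schema/title", "Globex AG"),
]

src_a = group_by_subject(map_vocabulary(raw_a))
src_b = group_by_subject(map_vocabulary(raw_b))
links = find_links(src_a, src_b)
fused = {a: fuse(src_a[a], src_b[b]) for a, b in links}
print(fused)
```

Here "Acme GmbH" and "ACME Gmbh." differ by one edit after lowercasing, so they are linked, and the fused entity keeps source A's name and employee count while gaining the founding year that only source B provides. The all-pairs comparison in step 2 grows quadratically with the number of resources; practical link discovery prunes candidate pairs with blocking or indexing, and practical fusion policies usually weigh source quality or recency rather than a fixed source order.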


Author information

Correspondence to Robert Isele.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Isele, R. (2014). Methoden der Linked Data Integration. In: Pellegrini, T., Sack, H., Auer, S. (eds) Linked Enterprise Data. X.media.press. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30274-9_5
