Methoden der Linked Data Integration (Methods of Linked Data Integration)

Isele, Robert

doi:10.1007/978-3-642-30274-9_5

Part of the book series: X.media.press (XMEDIAP)

Abstract

Data integration refers to merging different datasets so that they can be queried together, and it is an essential prerequisite for using Linked Data in an enterprise context. This chapter covers the processes required to establish a global view over multiple data sources. Because Linked Data publishers use a large number of different vocabularies to represent information, the datasets must first be translated into a consistent target vocabulary. In a second step, resources in different datasets that represent the same real-world object must be identified and linked. Finally, the previously linked resources must be fused into a single entity.
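The three steps named in the abstract (vocabulary mapping, linking of resources, fusion) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the property names, the toy company records, the label-similarity threshold of 0.9, and the "prefer the first source" fusion rule are invented here and are not taken from the chapter or from any of the tools it discusses.

```python
from difflib import SequenceMatcher

# Hypothetical mapping from source properties to one target vocabulary.
VOCAB_MAPPING = {
    "rdfs:label": "label",
    "foaf:name": "label",
    "ex:employees": "employees",
    "ex:staff": "employees",
}

# Invented toy datasets: URI -> {property: value}.
SOURCE_A = {
    "a:acme": {"rdfs:label": "ACME GmbH", "ex:employees": 120},
    "a:gamma": {"rdfs:label": "Gamma KG"},
}
SOURCE_B = {
    "b:acme-gmbh": {"foaf:name": "Acme GmbH", "ex:staff": 115, "ex:city": "Hannover"},
    "b:beta": {"foaf:name": "Beta AG"},
}


def map_vocabulary(resource):
    """Step 1: rename source properties to the target vocabulary."""
    return {VOCAB_MAPPING.get(prop, prop): value for prop, value in resource.items()}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_resources(source_a, source_b, threshold=0.9):
    """Step 2: link resources whose labels are similar enough.

    Compares every pair, which is only feasible for small inputs.
    """
    links = []
    for uri_a, res_a in source_a.items():
        for uri_b, res_b in source_b.items():
            score = similarity(res_a.get("label", ""), res_b.get("label", ""))
            if score >= threshold:
                links.append((uri_a, uri_b))
    return links


def fuse(res_a, res_b):
    """Step 3: merge two linked resources; on conflict keep the value from res_a."""
    fused = dict(res_b)
    fused.update(res_a)
    return fused


def integrate(source_a, source_b):
    """Run all three steps and return the fused entities."""
    mapped_a = {uri: map_vocabulary(res) for uri, res in source_a.items()}
    mapped_b = {uri: map_vocabulary(res) for uri, res in source_b.items()}
    links = link_resources(mapped_a, mapped_b)
    return [fuse(mapped_a[a], mapped_b[b]) for a, b in links]


if __name__ == "__main__":
    for entity in integrate(SOURCE_A, SOURCE_B):
        print(entity)
```

In this sketch the two ACME records are linked because their labels are identical once case is ignored, and the fused entity keeps the employee count from the first source while gaining the city from the second. A realistic setup would replace the single label comparison with a richer linkage rule and the fixed conflict rule with a quality-aware fusion policy.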

Notes

1. See http://dbpedia.org, accessed 21 March 2014.
2. See http://www.freebase.com, accessed 21 March 2014.
3. See http://ldif.wbsg.de/, accessed 21 March 2014.
4. See http://r2r.wbsg.de/, accessed 21 March 2014.
5. See http://silk.wbsg.de/, accessed 21 March 2014.
6. See http://aksw.org/Projects/LIMES.html, accessed 21 March 2014.
7. See http://sieve.wbsg.de/, accessed 21 March 2014.

Author information

Authors and Affiliations

brox IT-Solutions GmbH, An der Breiten Wiese 9, 30625, Hannover, Deutschland
Robert Isele

Corresponding author

Correspondence to Robert Isele.

Editor information

Editors and Affiliations

Inst. f. Medienwirtschaft, Fachhochschule St. Pölten, St. Pölten, Austria
Tassilo Pellegrini
Hasso-Plattner-Institut für Softwaresystemtechnik GmbH, Potsdam, Germany
Harald Sack
Inst. f. Informatik III, Rheinische Friedrich-Wilhelms-Univ. Bonn, Bonn, Germany
Sören Auer

About this chapter

Cite this chapter

Isele, R. (2014). Methoden der Linked Data Integration. In: Pellegrini, T., Sack, H., Auer, S. (eds) Linked Enterprise Data. X.media.press. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30274-9_5

DOI: https://doi.org/10.1007/978-3-642-30274-9_5
Published: 31 October 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30273-2
Online ISBN: 978-3-642-30274-9
eBook Packages: Computer Science and Engineering (German Language)
