Knowledge and Information Systems, Volume 28, Issue 1, pp 175–196

Generational analysis of tension and entropy in data structures: impact on automatic data integration and on the semantic web

Regular Paper

Abstract

The move toward automatic data integration from autonomous and heterogeneous sources is viewed as a transition from a closed to an open system, which is in essence an adaptive information processing system. Data definition languages from computing eras spanning almost 50 years are examined to assess whether they have moved from a closed-systems to an open-systems paradigm. Using measurements of Variety, Tension and Entropy, three characteristics of complex adaptive systems (CAS), the study proves that contemporary data definition languages are indistinguishable from older ones. The conclusion is that even contemporary data definition languages designed for such integration exhibit closed-systems characteristics, with only aspirations toward open systems. Good intentions alone do not make them more suitable for automatic data integration than their oldest predecessors. A previous report and these new findings set the stage for developing and proposing a mathematically sound data definition language based on CAS, one potentially better suited for automatic data integration from autonomous heterogeneous sources.

Keywords

Data integration; Semantic web; Data definition languages; Law of requisite variety; Coding and information theory; Complex adaptive systems; Variety; Regulator; Tension; Entropy; GlossoMote

Copyright information

© Springer-Verlag London Limited 2010

Authors and Affiliations

  1. CogniMax LLC, Highland Park, USA