Part of the book series: Undergraduate Topics in Computer Science (UTICS)

Abstract

This is an introductory chapter in which:

  (i) The goals of core data analysis as a tool helping to enhance and augment knowledge of the domain are outlined. Since knowledge is represented by the concepts and statements of relation between them, two main pathways for data analysis are summarization, for developing and augmenting concepts, and correlation, for enhancing and establishing relations.

  (ii) A set of eight cases involving small datasets and related data analysis problems is presented. The datasets are taken from various fields such as monitoring market towns, computer security protocols, bioinformatics, and cognitive psychology.

  (iii) An overview of data visualization, its goals and some techniques, is given.

  (iv) A general view of strengths and pitfalls of data analysis is provided.

  (v) An overview of the concept of classification as a soft knowledge structure widely used in theory and practice is given.

Author information

Corresponding author

Correspondence to Boris Mirkin.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Mirkin, B. (2019). Topics in Substance of Data Analysis. In: Core Data Analysis: Summarization, Correlation, and Visualization. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-00271-8_1

  • DOI: https://doi.org/10.1007/978-3-030-00271-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00270-1

  • Online ISBN: 978-3-030-00271-8

  • eBook Packages: Computer Science (R0)
