Abstract
This introductory chapter covers five topics:
(i) The goals of core data analysis, as a tool that helps to enhance and augment knowledge of the domain, are outlined. Knowledge is represented by concepts and by statements of relations between them. The two main pathways for data analysis are therefore summarization, for developing and augmenting concepts, and correlation, for enhancing and establishing relations.
(ii) A set of eight cases involving small datasets and related data analysis problems is presented. The datasets are taken from various fields, such as the monitoring of market towns, computer security protocols, bioinformatics, and cognitive psychology.
(iii) An overview of data visualization, its goals, and some of its techniques is given.
(iv) A general view of the strengths and pitfalls of data analysis is provided.
(v) An overview is given of classification, a soft knowledge structure widely used in theory and practice.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Mirkin, B. (2019). Topics in Substance of Data Analysis. In: Core Data Analysis: Summarization, Correlation, and Visualization. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-00271-8_1
DOI: https://doi.org/10.1007/978-3-030-00271-8_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00270-1
Online ISBN: 978-3-030-00271-8
eBook Packages: Computer Science, Computer Science (R0)