Overview

Authors:

Boris Mirkin
Department of Data Analysis and Artificial Intelligence, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia

Focuses on the encoder-decoder interpretation of summarization methods, such as Principal Component Analysis and K-means clustering
Supplies an in-depth description of K-means partitioning including a data-driven mathematical theory
Covers novel topics such as Google PageRank ranking and Consensus clustering as interlaced within the general framework
Includes a multitude of worked examples, case studies and questions (with answers)

Part of the book series: Undergraduate Topics in Computer Science (UTICS)



Table of contents (5 chapters)

Front Matter (pages i-xv)
Topics in Substance of Data Analysis (pages 1-75)
Quantitative Summarization (pages 77-161)
Learning Correlations (pages 163-292)
Core Partitioning: K-means and Similarity Clustering (pages 293-403)
Divisive and Separate Cluster Structures (pages 405-475)
Back Matter (pages 477-524)


About this book

This text examines the goals of data analysis with respect to enhancing knowledge, and identifies data summarization and correlation analysis as the core issues. Data summarization, both quantitative and categorical, is treated within the encoder-decoder paradigm, which brings forward a number of mathematically supported insights into the methods and the relations between them. Two chapters describe methods for categorical summarization (partitioning, divisive clustering, and separate cluster finding), and another explains the methods for quantitative summarization, Principal Component Analysis and PageRank.
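As a rough illustration of the encoder-decoder reading of Principal Component Analysis, the sketch below encodes each data row into a few component scores and decodes those scores back into the feature space. It is a minimal NumPy sketch and not the book's own formulation: the function name `pca_encode_decode`, the synthetic data, and the choice to center the data first are all assumptions made for this example.

```python
import numpy as np

def pca_encode_decode(X, n_components):
    """Encode rows of X into n_components scores, then decode them back."""
    mean = X.mean(axis=0)
    Xc = X - mean                      # center the data (an assumption of this sketch)
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T            # loadings: features x components
    Z = Xc @ W                         # encoder: data -> component scores
    X_hat = Z @ W.T + mean             # decoder: scores -> reconstructed data
    return Z, X_hat

# Hypothetical data: 50 entities, 4 correlated features
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))

Z, X_hat = pca_encode_decode(X, 2)
residual = np.sum((X - X_hat) ** 2)    # squared error left unexplained by the decoder
```

The residual equals the sum of the squared singular values of the discarded components, so keeping all four components reconstructs the data exactly.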

Features:

· An in-depth presentation of K-means partitioning including a corresponding Pythagorean decomposition of the data scatter.

· Advice regarding such issues as clustering of categorical and mixed scale data, similarity and network data, interpretation aids, anomalous clusters, the number of clusters, etc.

· Thorough attention to data-driven modelling, with a number of mathematically stated relations between statistical and geometrical concepts, among them those between goodness-of-fit criteria for decision trees and data standardization; similarity and consensus clustering; and modularity clustering and uniform partitioning.
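The Pythagorean decomposition of the data scatter mentioned in the first feature can be checked numerically. The sketch below assumes the standard form of the identity: for centered data, the total scatter splits into a part explained by the cluster centroids and the within-cluster squared error. The plain Lloyd-style `kmeans` function, the helper `scatter_decomposition`, and the synthetic three-blob data are hypothetical and written only for this example; the book's notation may differ.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain batch K-means (Lloyd-style alternation), starting from k random rows."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update each center to its cluster mean; keep the old center if a cluster is empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def scatter_decomposition(X, labels):
    """Return (total, explained, within) scatter for centered data and a partition."""
    Y = X - X.mean(axis=0)
    total = np.sum(Y ** 2)
    explained = 0.0
    within = 0.0
    for j in np.unique(labels):
        members = Y[labels == j]
        c = members.mean(axis=0)
        explained += len(members) * np.sum(c ** 2)   # cluster size times squared centroid norm
        within += np.sum((members - c) ** 2)         # squared error around the centroid
    return total, explained, within

# Hypothetical data: three Gaussian blobs in the plane
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, size=(30, 2)) for m in ([0, 0], [5, 5], [0, 6])])

labels, _ = kmeans(X, k=3)
total, explained, within = scatter_decomposition(X, labels)
```

The identity holds for any partition whose centroids are the cluster means, so minimizing the within-cluster error is the same as maximizing the explained part.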

New edition highlights:

· Inclusion of ranking issues such as Google PageRank, linear stratification and tied rankings median, consensus clustering, semi-average clustering, one-cluster clustering

· Restructured to make the logic more straightforward and sections self-contained
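For readers unfamiliar with PageRank, one of the ranking topics listed above, the following sketch computes it by power iteration on a small directed graph. It uses the commonly cited formulation with a damping factor of 0.85 and uniform teleportation, which is an assumption here and may differ from the variant presented in the book; the function name `pagerank` and the four-node graph are hypothetical.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank; adj[i][j] = 1 means node i links to node j."""
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    safe = np.where(out > 0, out, 1.0)
    # Row-stochastic transition matrix; nodes without out-links jump uniformly
    P = np.where(out > 0, A / safe, 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = damping * (r @ P) + (1.0 - damping) / n
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

# Hypothetical graph: nodes 0, 1 and 3 all link to node 2; node 2 links back to node 0
adj = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
ranks = pagerank(adj)
```

Node 2 receives the most incoming links and ends up with the highest score, while node 3, which nothing links to, keeps only the teleportation share.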

Core Data Analysis: Summarization, Correlation, and Visualization is aimed at those who are eager to participate in developing the field, and it also appeals to novices and practitioners.

Reviews

“This book provides a clear overview of the data analysis process, the different types of statistical techniques employed for data analysis, and their role and purpose. … There is good use of a variety of examples to demonstrate how the different techniques are applied in practice. The book’s main purpose would be as a textbook for undergraduate students, or a reference book for data analysts.” (Mark Taylor, Computing Reviews, May 5, 2022)


About the author

Boris Mirkin holds PhD in Computer Science (Mathematics) and DSc in Systems Analysis (Technology) degrees from Russian universities. Between 1991 and 2010, he had long-term visiting appointments in France, Germany, and the USA, and a teaching appointment as a Professor of Computer Science at Birkbeck University of London, UK (2000-2010).

He develops methods for clustering and interpretation of complex data within the “data recovery” perspective. Currently these approaches are being extended to automation of text analysis problems, including the development and use of hierarchical ontologies. He has published a hundred refereed papers and a dozen books, of which the latest are "Clustering: A Data Recovery Approach" (Chapman and Hall/CRC Press, 2012) and a textbook "Introductory Data Analysis" (in Russian, URAIT Publishers, Moscow, 2016).

Bibliographic Information

Book Title: Core Data Analysis: Summarization, Correlation, and Visualization
Authors: Boris Mirkin
Series Title: Undergraduate Topics in Computer Science
DOI: https://doi.org/10.1007/978-3-030-00271-8
Publisher: Springer Cham
eBook Packages: Computer Science, Computer Science (R0)
Copyright Information: Springer Nature Switzerland AG 2019
Softcover ISBN: 978-3-030-00270-1 (published 18 April 2019)
eBook ISBN: 978-3-030-00271-8 (published 15 April 2019)
Series ISSN: 1863-7310
Series E-ISSN: 2197-1781
Edition Number: 2
Number of Pages: XV, 524
Number of Illustrations: 107 b/w illustrations, 80 illustrations in colour
Topics: Data Structures, Systems and Data Security, Data Mining and Knowledge Discovery, Math Applications in Computer Science
