
Data Exploration, Validation, and Data Sanitization

  • Venkat Reddy Konasani
  • Shailendra Kadre
Chapter

Abstract

Preparing the data for analysis is an important part of any analytics project. Raw data comes from a variety of sources, such as classical relational databases, flat files, spreadsheets, and unstructured sources such as social media text. A project may contain both structured and unstructured data, and, to add to the complexity, there can be numerous data sources. As you would expect, such data presents many challenges in both quality and quantity. An analyst first needs to read the data from its sources, which can itself be a challenging task, and then parse it so that it is usable for further analysis. SAS requires data to be stored in SAS datasets before any of its analysis routines can be applied. In short, raw data is not always ready for analysis; it needs to be validated and cleaned first.

Keywords

Credit Card, Discrete Variable, Monthly Income, Data Exploration, Frequency Table
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Venkat Reddy Konasani 2015

Authors and Affiliations

  • Venkat Reddy Konasani (1)
  • Shailendra Kadre (1)
  1. AP, India
