Outlier Detection in Categorical, Text and Mixed Attribute Data

Aggarwal, Charu C.

doi:10.1007/978-1-4614-6396-2_7

Abstract

A significant number of attributes in real data sets are not numerical, but have categorical values. For example, while demographic data may contain quantitative attributes such as age, many other attributes such as sex and zip code are categorical. Data collected from surveys often contain responses to multiple-choice questions, which are categorical. Similarly, many kinds of data such as the names of people and entities, IP addresses and URLs are inherently discrete in nature. In many cases, categorical and numerical data are found in the same data set, as different attributes. This is referred to as mixed-attribute data. Mixed data is quite challenging to address because of the difficulties in appropriately weighting the importance of the different attributes.

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
Charu C. Aggarwal

About this chapter

Cite this chapter

Aggarwal, C.C. (2013). Outlier Detection in Categorical, Text and Mixed Attribute Data. In: Outlier Analysis. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6396-2_7

DOI: https://doi.org/10.1007/978-1-4614-6396-2_7
Published: 24 December 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6395-5
Online ISBN: 978-1-4614-6396-2
eBook Packages: Computer Science, Computer Science (R0)
