Chapter

Big Data Analytics in Genomics

pp 145-167

Genomic Applications of the Neyman–Pearson Classification Paradigm

  • Jingyi Jessica Li (corresponding author), Department of Statistics, University of California, Los Angeles
  • Xin Tong, Department of Data Sciences and Operations, University of Southern California

Abstract

The Neyman–Pearson (NP) classification paradigm addresses a binary classification problem in which users want to minimize the type II error while controlling the type I error under a specified level α, usually a small number. This problem arises in many genomic applications that involve binary classification. The term "Neyman–Pearson classification paradigm" comes from its connection to the Neyman–Pearson paradigm in hypothesis testing. The NP paradigm applies when one type of error (e.g., type I error) is far more important than the other (e.g., type II error) and users have a specific target bound for the former. In this chapter, we review the NP classification literature, focusing on genomic applications and on our contributions to NP classification theory and algorithms. We also provide simulation examples and a genomic case study to demonstrate how to use the NP classification algorithm in practice.
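
To make the constraint in the abstract concrete, here is a minimal Python sketch of one way to choose a decision threshold so that the type I error stays under α with high probability. It is an illustration under stated assumptions, not necessarily the algorithm the chapter presents: it assumes class 0 is the class whose misclassification counts as the type I error, that a scoring function has already been fit (larger scores favor class 1), and that a held-out sample of i.i.d. class 0 scores with a continuous distribution is available. The function names `np_threshold` and `error_rates` and the tolerance parameter `delta` are hypothetical, introduced only for this example.

```python
from math import comb


def np_threshold(class0_scores, alpha=0.05, delta=0.05):
    """Choose a threshold from held-out class 0 scores (illustrative sketch).

    Returns the smallest order statistic t of the class 0 scores such that
    the rule "predict class 1 if score > t" has type I error above alpha
    with probability at most delta. Assumes i.i.d. continuous scores.
    """
    scores = sorted(class0_scores)
    n = len(scores)
    for k in range(1, n + 1):
        # Using the k-th smallest score as the threshold, the chance that the
        # type I error exceeds alpha is bounded by P(Binomial(n, 1 - alpha) >= k).
        violation = sum(
            comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
            for j in range(k, n + 1)
        )
        if violation <= delta:
            return scores[k - 1]
    raise ValueError("too few class 0 scores for this alpha and delta")


def error_rates(scores0, scores1, threshold):
    """Empirical type I and type II errors of the rule 'score > threshold'."""
    type1 = sum(s > threshold for s in scores0) / len(scores0)
    type2 = sum(s <= threshold for s in scores1) / len(scores1)
    return type1, type2
```

Taking the smallest qualifying order statistic keeps the threshold as low as the type I constraint allows, which in turn keeps the type II error as small as possible for the given scores. With α = δ = 0.05, this rule needs at least 59 held-out class 0 scores, because (0.95)^n must not exceed 0.05.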

Keywords

Classification; Genomic applications; Neyman–Pearson; Statistical learning; Methodology