Data Science: 6th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2020, Taiyuan, China, September 18-21, 2020, Proceedings, Part I

Data Science is about drawing useful conclusions from large and diverse data sets through exploration, prediction, and inference. Alternatively, Wikipedia defines data science as “a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.” Clinical pharmacology has always been a quantitative discipline making extensive use of different analytical methods to extract knowledge from data, and the rapidly evolving field of data science offers many opportunities for clinical pharmacologists. In this issue of Clinical Pharmacology & Therapeutics (CPT), we explore many of these opportunities and how they can benefit this discipline. This issue is also the first to be fully dedicated to a single topic, where the perspectives, reviews, and the original research articles are all related to the same theme. We also introduce the first tutorials, a new article type for CPT, dedicated to helping readers to understand new areas of science. Data science is impacting the data sources and types that are becoming available, how data are captured and made available to users, and how they are analyzed and interpreted. This issue considers how each of these can be relevant to clinical pharmacologists and clinical development researchers (Figure 1). Traditionally, clinical pharmacology, and indeed much of drug discovery and development, has proceeded through carefully designed experiments and clinical trials that test a hypothesis. Phase III clinical outcome trials evaluating new therapies and vaccines are among the most complex experiments performed in medicine, and a common theme is the difficulty of predicting clinical results in a wider patient base after regulatory approvals. 
The high cost of clinical trials, low success rates, and potentially reduced efficacy of approved therapies in larger populations can cost healthcare industries, government, and academic research hospitals millions of dollars each year, may drive up costs and delay life-saving treatments for patients, and in some cases lead to adverse events. Data in clinical trials are usually captured to answer the specific question under investigation. Real-world data (RWD) are data collected outside the boundaries of a specific experiment or clinical trial, often for reasons other than scientific hypothesis testing, and are thought of as an important source of additional or novel information.


Program Description
The Master of Data Science (Non-Thesis) program is designed to give candidates a foundation in statistics and computer science and also provide knowledge in a particular application domain of science or engineering. The balance between these three elements is a strength of the program and can prepare candidates for Data Science careers in industry or government, or for further study at the PhD level. Throughout, the program emphasizes working in teams, creative problem solving, and professional development.
The certificates are designed for college graduates and professionals interested in the emerging field of Data Science.

Master of Data Science (Non-Thesis)
The field of Data Science draws on elements of computer science, statistics, and interdisciplinary applications to address the unique needs of gaining knowledge and insight through data analysis. This master's non-thesis program is designed to give candidates a foundation in statistics and computer science and also provide knowledge in a particular application domain of science or engineering. The balance between these three elements is a strength of the program and can prepare candidates for Data Science careers in industry or government, or for further study at the PhD level. Moreover, the coursework will be flexible and tailored to each candidate. For example, the program will allow a candidate to increase their skills in data analytics while developing a focused area of application, or alternatively allow a candidate with depth in an area of application to gain skills in statistics and computer science. This program will follow a 3 × 3 + 1 design: three modules and a mini-module.

Modules (each consisting of three 3-credit courses)

Mines Combined Undergraduate / Graduate Degree Program
The Master of Science in Data Science (MSDS) program allows students to work on a Bachelor of Science and a Master of Science simultaneously. The MSDS requires 30 credit hours of coursework at the graduate level. Students enrolled in the combined program may choose up to six credits of coursework at the 400 or 500 level to "double count," that is, to apply toward both their Bachelor of Science degree requirements and their Master of Science degree requirements. Courses that will be double counted must be pre-approved by the Director of Data Science or the student's graduate advisor, and must be successfully completed with a grade of B or better.
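The double-counting conditions above can be checked mechanically. The following is a minimal sketch, not an official degree-audit tool: the `Course` record, the function names, and the plus/minus grade scale are assumptions made for illustration, and the course level is assumed to be readable from the first digit of the course number.

```python
from dataclasses import dataclass

# Limits taken from the paragraph above.
MAX_DOUBLE_COUNT_CREDITS = 6
ALLOWED_LEVELS = (400, 500)
# Grades treated as "B or better"; the plus/minus scale is an assumption.
ACCEPTED_GRADES = {"A", "A-", "B+", "B"}


@dataclass
class Course:
    """Hypothetical record for one course a student wants to double count."""
    code: str          # e.g. "DSCI403"
    credits: float
    grade: str
    preapproved: bool  # approved by the Director or graduate advisor


def course_level(code: str) -> int:
    """Return the course level (400, 500, ...) from the numeric part of the code."""
    digits = "".join(ch for ch in code if ch.isdigit())
    return (int(digits) // 100) * 100


def double_count_problems(courses: list[Course]) -> list[str]:
    """List every rule violation; an empty list means the selection is acceptable."""
    problems = []
    total = sum(c.credits for c in courses)
    if total > MAX_DOUBLE_COUNT_CREDITS:
        problems.append(f"{total} credits exceeds the {MAX_DOUBLE_COUNT_CREDITS}-credit limit")
    for c in courses:
        if course_level(c.code) not in ALLOWED_LEVELS:
            problems.append(f"{c.code} is not a 400- or 500-level course")
        if not c.preapproved:
            problems.append(f"{c.code} has not been pre-approved")
        if c.grade not in ACCEPTED_GRADES:
            problems.append(f"{c.code} grade {c.grade} is below B")
    return problems


if __name__ == "__main__":
    selection = [
        Course("DSCI403", 3.0, "A", True),
        Course("DSCI530", 3.0, "B", True),
    ]
    print(double_count_problems(selection))
```

Returning a list of problems rather than a single true/false value lets a student see every condition that fails at once.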

Certificate Programs in Data Science
There are five Certificates in Data Science. Applicants for each are required to have an undergraduate degree to be admitted into the Certificate programs. Course prerequisites, if any, are noted for each Certificate program.
Students working toward one of the Data Science Certificates are required to successfully complete 12 credit hours, as detailed below for each Certificate. The courses taken for the Certificates can be used toward a Master's or PhD degree at Mines; however, courses used for one Data Science Certificate cannot also be counted toward another Data Science Certificate.
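As an illustration of the two rules above (12 credit hours per Certificate, and no course counted toward two Data Science Certificates), a short sketch follows. The function names and the two example course plans are hypothetical; the actual course lists for each Certificate are not reproduced here.

```python
REQUIRED_CREDITS = 12  # credit hours per Certificate, as stated above


def certificate_complete(plan: dict[str, float]) -> bool:
    """True if the plan (course code -> credit hours) reaches 12 credit hours."""
    return sum(plan.values()) >= REQUIRED_CREDITS


def shared_courses(plan_a: dict[str, float], plan_b: dict[str, float]) -> list[str]:
    """Courses listed in both plans; these cannot count toward both Certificates."""
    return sorted(set(plan_a) & set(plan_b))


if __name__ == "__main__":
    # Hypothetical plans, built only from course codes that appear in this catalog.
    plan_a = {"DSCI403": 3.0, "DSCI530": 3.0, "DSCI560": 3.0, "DSCI575": 3.0}
    plan_b = {"DSCI530": 3.0, "DSCI560": 3.0, "DSCI561": 3.0}

    print(certificate_complete(plan_a))
    print(certificate_complete(plan_b))
    print(shared_courses(plan_a, plan_b))
```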

Post Baccalaureate Certificate in Data Science - Foundations (12 credit hours)
The Data Science - Foundations Post Baccalaureate Certificate is an online or residential program focusing on the foundational concepts in statistics and computer science that support the explosion of new methods for interpreting data in its many forms. The Certificate balances an introduction to data science with teaching basic skills in applying methods in statistics and machine learning to analyze data. Students will gain a perspective on the kinds of problems that can be solved by data-intensive methods and will also acquire new analysis skills outside of the certificate. Moreover, the coursework will cover a broad range of applications, making it relevant for varied scientific and engineering domains.
Applicants must have completed the following courses, or their equivalents, with a B- or better: CSCI261 and CSCI262 Data Structures, MATH332 Linear Algebra, and MATH334 Introduction to Probability.

Post Baccalaureate Certificate in Data Science - Computer Science (12 credit hours)
The Data Science - Computer Science Post Baccalaureate Certificate is an online or residential program focusing on data science concepts within computer science (e.g., computational techniques and machine learning) plus prerequisite knowledge (e.g., probability and regression). The aim of this new certificate is to help students develop an essential skill set in data analytics, including (1) deriving predictive insights by applying advanced statistics, modeling, and programming skills, (2) acquiring in-depth knowledge of machine learning and computational techniques, and (3)

Graduate Certificate in Data Science - Statistical Learning (12 credit hours)
The Data Science - Statistical Learning Graduate Certificate is an online or residential program focusing on statistical methods for interpreting complex data sets and quantifying the uncertainty in a data analysis. The Certificate also includes gaining new skills in computer science but is grounded in statistical models for data, also termed statistical learning, rather than algorithmic approaches. Students will develop an essential skill set in the statistical methods most commonly used in data science, along with an understanding of the methods' strengths and weaknesses. Moreover, the coursework will cover a broad range of applications, making it relevant for varied scientific and engineering domains.

Graduate Certificate in Data Science - Earth Resources (12 credit hours)
The Graduate Certificate in Data Science - Earth Resources is an online program building on the foundational concepts in data science as they pertain to managing surface and subsurface Earth resources, and on specific applications (use cases) from the petroleum and minerals industries, as well as water resource monitoring and remote sensing of Earth change. The Certificate includes one core introductory Data Science course, two courses specific to Earth resources, and one elective.

Graduate Certificate in Business Analytics
The certificate is an online or residential program. The requirements are to complete the following three courses: Course substitutions can be approved on a case-by-case basis by the certificate directors. Completing the certificate will also position students to complete either the MS ETM degree or MS in Data Science degree as all the certificate courses can be applied to either degree.

DSCI403. INTRODUCTION TO DATA SCIENCE. 3.0 Semester Hrs.
This course will teach students the core skills needed for gathering, cleaning, organizing, analyzing, interpreting, and visualizing data. Students will learn basic SQL for working with databases, basic Python programming for data manipulation, and the use and application of statistical and machine learning toolkits for data analysis. The course will be primarily focused on applications, with an emphasis on working with real (non-synthetic) datasets. Prerequisite: CSCI101 or CSCI261.
The goal of machine learning is to build computer systems that improve automatically with experience. Machine learning has been successfully applied to a variety of application areas, including gene discovery, financial forecasting, and credit card fraud detection. This introductory course will study both the theoretical properties of machine learning algorithms and their practical applications. Students will have an opportunity to experiment with machine learning techniques and apply them to a selected problem in the context of term projects. Prerequisite: MATH201, MATH332.

DSCI530. STATISTICAL METHODS I. 3.0 Semester Hrs.
Introduction to probability, random variables, and discrete and continuous probability models. Elementary simulation. Data summarization and analysis. Confidence intervals and hypothesis testing for means and variances. Chi-square tests. Distribution-free techniques and regression analysis. Prerequisite: MATH213 or equivalent.

DSCI560. INTRODUCTION TO KEY STATISTICAL LEARNING METHODS I. 3.0 Semester Hrs.
Part one of a two-course series introducing statistical learning methods with a focus on conceptual understanding and practical applications. Methods covered will include Introduction to Statistical Learning, Linear Regression, Classification, Resampling Methods, Basis Expansions, Regularization, Model Assessment and Selection.

DSCI561. INTRODUCTION TO KEY STATISTICAL LEARNING METHODS II. 3.0 Semester Hrs.
Part two of a two-course series introducing statistical learning methods with a focus on conceptual understanding and practical applications. Methods covered will include Non-linear Models, Tree-based Methods, Support Vector Machines, Neural Networks, Unsupervised Learning.

DSCI575. MACHINE LEARNING. 3.0 Semester Hrs.
The goal of machine learning research is to build computer systems that learn from experience and that adapt to their environments. Machine learning systems do not have to be programmed by humans to solve a problem; instead, they essentially program themselves based on examples of how they should behave, or based on trial and error experience trying to solve the problem. This course will focus on the methods that have proven valuable and successful in practical applications. The course will also contrast the various methods, with the aim of explaining the situations in which each is most appropriate. Prerequisite: CSCI262, MATH201, MATH332.
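To make the idea of a system that behaves according to examples rather than hand-written rules concrete, here is a minimal sketch of a one-nearest-neighbor classifier. It is only one simple illustration of learning from examples and is not taken from the course materials; the function name, feature values, and labels are invented for the example.

```python
import math


def nearest_neighbor_predict(examples, query):
    """Return the label of the labeled example closest to `query`.

    `examples` is a list of (features, label) pairs. No decision rule is coded
    by hand: the prediction depends only on the examples that are supplied.
    """
    best_label, best_distance = None, math.inf
    for features, label in examples:
        distance = math.dist(features, query)  # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


if __name__ == "__main__":
    # Toy, made-up training examples: two numeric features and a label.
    examples = [
        ((1.0, 1.0), "low"),
        ((1.5, 2.0), "low"),
        ((8.0, 9.0), "high"),
        ((9.0, 8.5), "high"),
    ]
    print(nearest_neighbor_predict(examples, (2.0, 1.5)))
    print(nearest_neighbor_predict(examples, (8.5, 9.0)))
```

Changing the examples changes the behavior without touching the code, which is the sense in which such a system is "programmed" by data.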