In DSCI 311, students will explore intermediate and advanced techniques in data science. This course prepares students to successfully apply computational and statistical techniques to upper-division coursework in data science as well as quantitative, data-driven courses in other domains or subject areas. Topics include managing data with software programs, data cleaning, handling text, dimensionality, principal components analysis, regression, classification and inference. Ethical concerns resulting from use of the techniques in this course will be addressed.
This course is primarily intended for data science majors, with others able to register if prerequisites are met.
Lectures and Labs
Two 80-minute lectures are delivered each week. Attendance at one 50-minute lab each week is also required.
Prerequisites
- DSCI 102 – Foundations of Data Science II
- CS 211 – Computer Science II
- Math 252 – Calculus II
- Math 342 – Elementary Linear Algebra II
Learning Outcomes
Upon successful completion of this course each student should be able to:
- Efficiently manage large data sets using advanced functions in the Pandas library.
- Clean data, including data sets that contain text, and prepare it for statistical analysis.
- Perform basic database entry and operations using SQL.
- Distinguish classification from regression, and build predictive models for both continuous and categorical variables.
- Apply resampling methods in general, and bootstrap resampling in particular.
- Enumerate ethical concerns resulting from the use of the techniques in this course.
Textbooks and Readings
- Sam Lau, Joey Gonzalez, and Deb Nolan. Principles and Techniques of Data Science.
- Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning.
- Various PDF files in Canvas/Readings
Course Requirements and Grading
This course will be taught as two 80-minute live lectures and one 50-minute lab each week. Aside from the required textbook, all course materials (project assignments, lab assignments, tutorial videos, sample exam material) will be available from Canvas. We will be using Slack for asynchronous questions and answers.
Grading will be based on the following criteria:
Percentage | Component |
---|---|
32 | Homework (8 x 4% each) |
8 | Lab attendance and submission (8 x 1% each) |
20 | Course Project 1 (2 x 10%) |
16 | Midterm exam |
24 | Final exam |
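As a rough illustration of how the weights above combine, the following sketch computes a course percentage from per-component scores. The weights are copied from the table; the function name `weighted_total`, the dictionary keys, and the example scores are hypothetical, and the sketch assumes each component is first averaged to a single 0-100 score. It is not the official grade calculation.

```python
# Component weights copied from the grading table (percent of the final grade).
WEIGHTS = {
    "homework": 32,
    "labs": 8,
    "projects": 20,
    "midterm": 16,
    "final": 24,
}


def weighted_total(scores):
    """Combine per-component scores (each on a 0-100 scale) into a course percentage.

    Assumption: every component has already been averaged to one 0-100 score.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {"homework": 90, "labs": 100, "projects": 85, "midterm": 80, "final": 75}
print(weighted_total(example))  # 84.6
```

Because the weights sum to 100, a student who scores 100 on every component receives a course percentage of 100.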
Grading Scale
Letter | + | (plain) | - |
---|---|---|---|
A | 96.67-100.0 | 93.34-96.66 | 90.00-93.33 |
B | 86.67-89.99 | 83.34-86.66 | 80.00-83.33 |
C | 76.67-79.99 | 73.34-76.66 | 70.00-73.33 |
D | 66.67-69.99 | 63.34-66.66 | 60.00-63.33 |
F | | 0.00-59.99 | |
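The scale above can be read as a lookup from a course percentage to a letter grade. The sketch below is one possible reading, not the official procedure: it assumes the first range in each row is the "+" grade and the last is the "-" grade, and it treats the lower bound of each range as the cutoff, so a value that falls between two listed ranges (for example 96.665) gets the lower grade. The names `CUTOFFS` and `letter_grade` are hypothetical.

```python
# Lower bounds taken from the grading scale, highest first.
# Assumption: a percentage earns the first grade whose lower bound it meets.
CUTOFFS = [
    (96.67, "A+"), (93.34, "A"), (90.00, "A-"),
    (86.67, "B+"), (83.34, "B"), (80.00, "B-"),
    (76.67, "C+"), (73.34, "C"), (70.00, "C-"),
    (66.67, "D+"), (63.34, "D"), (60.00, "D-"),
]


def letter_grade(percentage):
    """Map a course percentage (0-100) to a letter grade using CUTOFFS."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    for lower_bound, letter in CUTOFFS:
        if percentage >= lower_bound:
            return letter
    return "F"  # anything below 60.00


print(letter_grade(84.6))   # B
print(letter_grade(59.99))  # F
```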