DSCI 345: Probability and Statistics for Data Science | School of Computer and Data Sciences

DSCI 345 provides a foundation in probability and statistics for work in data science.

This course is aimed at students who have some experience working with real data (as in DSCI 101/102). It gives them a deeper, more quantitative understanding of what we are doing when we make predictions from data. The course covers both tools for modeling randomness and calculating properties of those models, and the process of estimating quantities from data. An important thread throughout the course is simulation: being able to construct sophisticated models of how random data are generated, and to simulate data from them.
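To give a flavor of what simulating from a model of random data generation means, here is a minimal sketch in Python. It is not taken from the course materials. The function `simulate_linear_data` and its parameters are hypothetical, and the model (uniform x values, a straight-line trend, Gaussian noise) is just one simple choice.

```python
import random

def simulate_linear_data(n, intercept, slope, noise_sd, seed=None):
    """Simulate n (x, y) pairs from the (hypothetical) model
    y = intercept + slope * x + Normal(0, noise_sd) noise,
    with each x drawn uniformly from [0, 10]."""
    rng = random.Random(seed)  # separate generator so a fixed seed reproduces the data
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)                                # random input
        y = intercept + slope * x + rng.gauss(0, noise_sd)    # trend plus noise
        data.append((x, y))
    return data

fake = simulate_linear_data(n=5, intercept=1.0, slope=2.0, noise_sd=0.5, seed=345)
for x, y in fake:
    print(f"x = {x:5.2f}, y = {y:6.2f}")
```

Fixing the seed makes the fake dataset reproducible, which is convenient when comparing a fitted model against the parameters that generated the data.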

Lectures

Two 80-minute lectures are delivered each week.

Prerequisites

MATH 342: Elementary Linear Algebra II
CS 211: Computer Science II

Learning Outcomes

Upon successful completion of this course each student should be able to:

describe and appropriately use different probability distributions in simulation and modeling
choose appropriate likelihood-based loss functions
produce and visualize compelling and realistic fake datasets
fit generative models to real data and estimate the associated uncertainty
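As an illustration of the last three outcomes, the sketch below (not from the course materials) uses the negative log-likelihood as the loss for an exponential model, fits the rate to simulated data, and estimates the uncertainty of the fit with a bootstrap. The model, the function names, and the choice of a bootstrap standard error are all illustrative assumptions. Only the Python standard library is used.

```python
import math
import random

def exponential_nll(rate, data):
    """Negative log-likelihood of i.i.d. Exponential(rate) observations."""
    return -sum(math.log(rate) - rate * x for x in data)

def fit_exponential(data):
    """Maximum-likelihood rate estimate. Setting the derivative of the NLL,
    -n / rate + sum(data), to zero gives rate = n / sum(data)."""
    return len(data) / sum(data)

def bootstrap_se(data, n_boot=1000, seed=None):
    """Bootstrap standard error: refit the model to datasets resampled
    with replacement and take the standard deviation of the estimates."""
    rng = random.Random(seed)
    estimates = [fit_exponential(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    mean_est = sum(estimates) / n_boot
    return math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (n_boot - 1))

rng = random.Random(0)
data = [rng.expovariate(2.0) for _ in range(200)]  # fake data with true rate 2.0
rate_hat = fit_exponential(data)
print(f"estimated rate: {rate_hat:.3f} +/- {bootstrap_se(data, seed=1):.3f}")
```

Because the data are simulated, the estimate can be checked against the true rate of 2.0 that generated them.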

Textbooks and readings

Assigned readings from Probability for Data Science, by Ani Adhikari and Jim Pitman (a freely available textbook)

Course Requirements and Grading

Course grades are based on assignments and a final exam, weighted either 85% assignments and 15% final or 70% assignments and 30% final, whichever gives the higher grade. Most assignments are weekly homework, due one week after they are assigned. These are multi-part assignments completed in Jupyter notebooks and contain writing, mathematics, and Python code.
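The "whichever is greater" rule can be written as a small function. This is only a sketch: the name `course_grade` is hypothetical, and it assumes both scores are percentages on a 0 to 100 scale. For example, an assignment average of 90 and a final exam score of 70 give 87 under the 85/15 weighting and 84 under the 70/30 weighting, so the grade is 87.

```python
def course_grade(assignments, final):
    """Course grade under the two weightings; the higher total counts.
    Both inputs are assumed to be percentages from 0 to 100."""
    weighting_85_15 = 0.85 * assignments + 0.15 * final
    weighting_70_30 = 0.70 * assignments + 0.30 * final
    return max(weighting_85_15, weighting_70_30)

print(course_grade(90, 70))  # assignments stronger than the final
print(course_grade(70, 90))  # final stronger than the assignments
```

Since the 70/30 total minus the 85/15 total equals 0.15 * (final - assignments), the 70/30 weighting applies exactly when the final exam score is higher than the assignment average.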