DSCI 345 provides a foundation in probability and statistics for work in data science.
This course is aimed at students who have done some work with real data (as in DSCI 101/102). Students will gain a deeper, more quantitative understanding of what we are doing when we make predictions using data. The course covers both tools for modeling randomness and calculating properties of those models, and the process of estimating quantities from data. An important thread throughout the course is simulation: being able to construct, and simulate from, sophisticated models of random data generation.
Lectures
Two 80-minute lectures are delivered each week.
Prerequisites
- MATH 342: Elementary Linear Algebra II
- CS 211: Computer Science II
Learning Outcomes
Upon successful completion of this course each student should be able to:
- describe and appropriately use different probability distributions in simulation and modeling
- choose appropriate likelihood-based loss functions
- produce and visualize compelling and realistic fake datasets
- fit generative models to real data and estimate the associated uncertainty
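As a rough illustration of what the outcomes above look like in practice, here is a minimal sketch using only the Python standard library. It simulates fake waiting times from an exponential model, fits the rate by maximum likelihood, and estimates uncertainty with a bootstrap. The model, the function names, and the parameter values are all assumptions chosen for illustration; they are not taken from the course materials.

```python
import random
import statistics


def simulate_waiting_times(rate, n, rng):
    """Draw n fake waiting times from an Exponential(rate) model."""
    return [rng.expovariate(rate) for _ in range(n)]


def fit_rate(data):
    """Maximum likelihood estimate of the exponential rate: 1 / sample mean."""
    return 1.0 / statistics.fmean(data)


def bootstrap_se(data, n_boot, rng):
    """Rough standard error of the fitted rate, from resampling with replacement."""
    estimates = [
        fit_rate(rng.choices(data, k=len(data))) for _ in range(n_boot)
    ]
    return statistics.stdev(estimates)


rng = random.Random(345)  # fixed seed (arbitrary) so the sketch is reproducible
true_rate = 2.0           # hypothetical "ground truth" used to generate the data
data = simulate_waiting_times(true_rate, n=500, rng=rng)

rate_hat = fit_rate(data)
se = bootstrap_se(data, n_boot=200, rng=rng)
print(f"estimated rate {rate_hat:.2f} +/- {se:.2f} (true rate {true_rate})")
```

Because the data are simulated, the true rate is known, so the fitted value and its standard error can be checked against it. That check is the main reason to simulate before fitting a model to real data.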
Textbooks and readings
Readings will be assigned from Probability for Data Science, by Ani Adhikari and Jim Pitman (a freely available book).
Course Requirements and Grading
Course grades will be based on assignments and a final, weighted as either 85% assignments and 15% final or 70% assignments and 30% final, whichever gives the higher score. Most assignments are weekly homework, due the week after they are assigned. These will be multi-part assignments done in Jupyter notebooks, containing writing, mathematics, and (Python) code.
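The grading rule can be sketched as a short calculation. This is only an illustration: the function name `course_score` is hypothetical, and it assumes both the assignment total and the final are expressed as percentages from 0 to 100.

```python
def course_score(assignments, final):
    """Return the better of the two weightings described above.

    Both inputs are assumed to be percentages between 0 and 100.
    """
    mostly_assignments = 0.85 * assignments + 0.15 * final
    more_final_weight = 0.70 * assignments + 0.30 * final
    return max(mostly_assignments, more_final_weight)


# A student stronger on assignments benefits from the 85/15 weighting,
# while a student stronger on the final benefits from the 70/30 weighting.
print(round(course_score(90, 70), 1))
print(round(course_score(70, 90), 1))
```

For example, with 90 on assignments and 70 on the final, the two weightings give 87 and 84, so the score is 87. With the numbers reversed, they give 73 and 76, so the score is 76.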