DSCI 411

This course gives each student the opportunity to apply the theoretical knowledge and techniques acquired in the Data Science degree curriculum to a project involving real data from the student's domain of specialization. During the project, the student will take part in the entire post-acquisition data science workflow: formulating a hypothesis about information contained in the data, preparing the raw data for analysis, applying appropriate analysis techniques to the data, establishing the statistical significance of the results, and visualizing the results for relevant stakeholders.

Each student is advised by one individual from the specialization domain and one from the DSCI program. The project may, and ideally will, extend work the student did during a summer internship between the junior and senior years; in that case, an industrial advisor from the internship company also joins the project team.

The domain advisor and/or the industrial advisor will provide the raw data. The advising team may specify the problem statement; the student may extend or modify it with the advising team's agreement.

Contact Hours

The advising team and student will schedule a weekly 30-minute progress meeting. The domain and DSCI advisors will each additionally offer weekly office hours during which the student can seek assistance.

Schedule

In week 5, the student will deliver a slide presentation describing progress to date at the advising team meeting. In week 10, the student will deliver a second slide presentation, describing the results of the project, at the advising team meeting. The student will submit a report on the project to the advising team for grading by 5pm on the Friday of week 10.

Prerequisites

  • A GPA ≥ 3.75 across all DSCI curriculum courses completed to date.

Learning Outcomes

Upon successful completion of the course, students will:

  • Demonstrate the ability to carry out a data science project in their domain of specialization from end to end.
  • Demonstrate proficiency in preparing and delivering a presentation.
  • Demonstrate the ability to carry out a literature search and summarize the state of the art.
  • Demonstrate the ability to translate the project objectives into a realistic work plan.
  • Demonstrate the ability to design and implement the required software using tools such as R, pandas, and SciPy, together with general-purpose programming languages such as Python and C.
  • Demonstrate the ability to professionally present the project plan and results.

Course Requirements and Grading

Grading will be based on the following criteria:

Percentage   Component
10           Week 5 presentation
10           Week 10 presentation
20           Weekly engagement
60           Final report
100          Total

The letter grade for the project is assigned from the weighted total of these components, using the grading scale below.
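The weighted total is a percentage-weighted sum of the component scores. The sketch below illustrates the arithmetic. It assumes each component is scored out of 100 and that the total is rounded to two decimals to match the precision of the grading scale; neither assumption is stated in this syllabus, and the component keys are hypothetical names chosen for illustration.

```python
# Component weights (percent) from the grading table; they sum to 100.
# The dictionary keys are hypothetical labels, not official names.
WEIGHTS = {
    "week5_presentation": 10,
    "week10_presentation": 10,
    "weekly_engagement": 20,
    "final_report": 60,
}


def weighted_total(scores):
    """Combine component scores (each assumed to be 0-100) into a 0-100 total.

    The result is rounded to two decimals; this rounding rule is an
    assumption made to match the two-decimal grading scale.
    """
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
    return round(total, 2)


# Example: 85 and 90 on the presentations, 95 for engagement, 92 on the report.
example = {
    "week5_presentation": 85,
    "week10_presentation": 90,
    "weekly_engagement": 95,
    "final_report": 92,
}
print(weighted_total(example))  # 91.7
```

Because the final report carries 60% of the weight, a one-point change in the report moves the total by 0.6 points, three times as much as the same change in weekly engagement.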

Grading Scale

Letter   +              (none)         -
A        96.67-100.00   93.34-96.66    90.00-93.33
B        86.67-89.99    83.34-86.66    80.00-83.33
C        76.67-79.99    73.34-76.66    70.00-73.33
D        66.67-69.99    63.34-66.66    60.00-63.33
F                       0.00-59.99
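The scale above can be read as a list of lower bounds, one per letter grade. The following sketch maps a course total to a letter with a binary search over those bounds. It assumes totals are rounded to two decimals before lookup, so that values such as 89.995 cannot fall into the small gaps between bands; the syllabus does not specify how such values are handled.

```python
from bisect import bisect_right

# Lower bound of each band, in ascending order, taken from the grading scale.
CUTOFFS = [60.00, 63.34, 66.67, 70.00, 73.34, 76.67,
           80.00, 83.34, 86.67, 90.00, 93.34, 96.67]
# LETTERS[i] is the grade for totals with exactly i cutoffs at or below them.
LETTERS = ["F", "D-", "D", "D+", "C-", "C", "C+",
           "B-", "B", "B+", "A-", "A", "A+"]


def letter_grade(total):
    """Map a 0-100 total (assumed rounded to two decimals) to a letter grade."""
    # bisect_right counts how many cutoffs are <= total, which indexes LETTERS.
    return LETTERS[bisect_right(CUTOFFS, total)]


for t in (59.99, 60.00, 83.34, 91.70, 93.33, 96.67):
    print(t, letter_grade(t))
```

Using lower bounds alone means every total from 0 to 100 maps to exactly one letter, and a band's upper limit in the table is simply the next band's lower bound minus 0.01.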

Completed Capstones

Covariance Changepoint Detection for Time Series Data by Sabrina Reis

Glucose Prediction During Physical Activity for Type 2 Diabetics on Non-Insulin Intensive Therapies by Lindsey Uribe