The final project assignment will be officially released after Project 2 is due on Friday, November 21 at 11:59PM PT. In anticipation of this, we would like you to start thinking about a rough sketch of your group’s plans for the final project. The final project will be due on Wednesday, December 17 at 11:59PM PT. Barring any extenuating circumstances, groups will be the same as the Project 2 groups.
Final Project Rough Guidelines
The final project is a free-form, report-style analysis in which you will investigate questions of interest using a dataset that you find online. Implementing sophisticated or novel statistical models is not a requirement of the project (if you’d like to, go for it!), but you should nonetheless conduct an organized, detailed, and thorough analysis of your question. Most importantly, your project should exemplify the workflows and practices that we have been developing throughout the semester.
Here are a couple of cool projects from the Spring 2023 version of this course to give you an idea of what’s expected.
Note that in the above examples, the websites are built using JupyterBook 1, which is an older version of the MyST/JupyterBook 2 that we have been using, so some of the build files in their repos may look different. However, all these projects share some common features: organized presentation, basic data cleaning, data analysis with exploratory plots, use of simple statistical models/tools (logistic regression, ARIMA, linear regression, etc.), and use of tools we have discussed in this class (environments, Makefiles, packages, tests, etc.).
Proposal Guidelines
The proposal process for the project is relatively informal: we just want to make sure your group is on the same page and has a reasonably fleshed-out idea in advance of the project release. Please discuss with your group and be able to answer the following questions. Ideally, this should be done by November 21 so we can review and give feedback as soon as possible. However, it’s fine if it comes a few days late; just know that delaying might eat into your available time to work on the project. You are of course welcome to make your proposals sooner.
We will be dedicating lab on November 21 as a space for you to meet with your groups and hash out these details. You are welcome to discuss your group’s proposal either in person with your lab TA during this session (preferred so we can give live feedback), or via email/office hours. After your lab TA OKs your proposal, your group can start the project.
Dataset
What dataset will you be working with? How big is it, and are you able to download it? Is it small enough to host in GitHub?
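If you are unsure whether your files are small enough for GitHub, a quick size check can help. The sketch below is only an illustration, not a required part of the proposal: the function names and the `data` directory are hypothetical, and the thresholds assume GitHub's documented per-file behavior (a warning above 50 MiB and a hard block above 100 MiB).

```python
from pathlib import Path

# Assumed GitHub per-file limits: warning above 50 MiB, push blocked above 100 MiB.
WARN_BYTES = 50 * 1024 * 1024
BLOCK_BYTES = 100 * 1024 * 1024


def classify_size(num_bytes: int) -> str:
    """Label a file size relative to the assumed GitHub limits."""
    if num_bytes > BLOCK_BYTES:
        return "too large"
    if num_bytes > WARN_BYTES:
        return "warning"
    return "ok"


def check_dataset(data_dir: str) -> dict[str, str]:
    """Map each file under data_dir (hypothetical path) to its size label."""
    root = Path(data_dir)
    return {
        str(path.relative_to(root)): classify_size(path.stat().st_size)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


if __name__ == "__main__":
    # "data" is a placeholder; point this at wherever your dataset lives.
    if Path("data").is_dir():
        for name, label in check_dataset("data").items():
            print(f"{label:>9}  {name}")
```

Any file labeled "too large" would need to be stored elsewhere (or downloaded by a script at build time) rather than committed directly.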
Questions and Answers
What question(s) do you intend to explore/ask with your dataset? Roughly speaking, what strategies do you anticipate using to answer these questions (any statistical models, plotting, etc.)?
Expected Challenges
What challenges do you expect in carrying out your analysis (computational, data storage, etc.)? What plans do you have in place to manage these challenges as they arise?