IPUMS Extract & Analysis Plan

Due by 11:59 PM on Friday, March 13, 2020

AS OF MARCH 12, THIS ASSIGNMENT IS OPTIONAL. MORE DETAILS ON THE GROUP PROJECT TO BE DETERMINED

Submit this as a pdf to moodle. The pdf doesn’t need to be made R - you can make it in Google Docs or Word, just saved as a pdf.

You should essentially add this additional material to your final group proposal. You can either add this at the end or integreate it where appropriate (i.e., when talking about a specific variable).

Content of the Document

Include the following:

Data Extract: What Variables and Samples you Plan to Request from IPUMS

This should be a list of the exact variables - by their variable name in the IPUMS system - you plan to include in a data extract request and the “sample”, or which time period(s) you are including.

Analysis Plan: What do you plan to do with the data once you have it from IPUMS?

This information should include a very clear list of steps or of the process that you expect to undertake to do any necessary data management/wrangling to get the variables in the format you need.

It should specify the steps you plan to take – not to actually take them yet, just thinking through the plan – to investigate the distributions of the variables.

You should have written out the regression models – in equation notation – with all of the variables as they will be included, and (re-)stated which coefficients of the model reflect your the hypothesis.

This last point should make really clear whether you are including variables as a factor (see the suggested HW4 question) or multiple indicator variables, and in either case which level will be the reference group. You will need to spell out the hypothetical regression model with each and every beta estimate that you plan to include.

Again, this does not expect that you have done any of this work yet with your data or in R. It is simply an exercise to get you thinking about what exactly you will need to do next – this keeps you organized and in many cases is an additional opportunity to explicate your thinking about your topic since this is getting even a little closer to actually testing your hypotheses. And for those of you newer to R, it will help articulate where you may want to get additional help for this kind of work (plan to come to the dplyr workshop on Friday 3/27, 2-4pm!).

The next assignment will be a data appendix, where you will illustrate that you have the data in hand and have brought it into R, what of the steps you list above that you have been able to complete and what you have left to do, what unanticipated issues have arisen, and what this exploratory data analysis (not yet having fit a model, just exploring the variables one at a time and with scatterplots and other visuals) is telling you about your main research question.