6. Problem set 6
Required
1. Log Transformations
Estimate the relationship between the number of hospitals and the number of doctors within a sample of counties in the dataset called CountyHealth. (This dataset is from the Stat2Data package, like other problems from the book; you can load it with the same code we've used in class or on previous homework problems, e.g., "Cereal" or "Perch".)
a. Make a scatterplot of the relationship between MDs as the response and Hospitals as the explanatory variable, and comment on what you see.
b. Make a new variable that is the log of MDs, and make a scatterplot of the relationship between the logged version of MDs and Hospitals.
c. Fit the regression model of the relationship between the logged version of MDs and Hospitals, and interpret the slope in a sentence.
d. Use the regression model from "c" to predict the number of doctors (not log(MDs)) for 4 counties with 2, 3, 9, and 10 hospitals, respectively. Comment on how the difference in predicted number of doctors between the counties with 2 and 3 hospitals, and between the counties with 9 and 10 hospitals, corresponds to the regression slope from "c".
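To see why part "d" compares two pairs of counties, it can help to work through the back-transformation arithmetic on made-up numbers first. The sketch below is written in Python only to illustrate the arithmetic (the course itself works in R, since Stat2Data is an R package). The coefficients `b0` and `b1` are hypothetical and are not estimates from CountyHealth, and the sketch assumes the natural log was used.

```python
import math

# Hypothetical coefficients for the model log(MDs) = b0 + b1 * Hospitals.
# These are NOT fitted values from CountyHealth; they only illustrate the arithmetic.
b0, b1 = 4.0, 0.25


def predict_mds(hospitals):
    """Back-transform a log-scale prediction to a predicted number of doctors."""
    return math.exp(b0 + b1 * hospitals)


preds = {h: predict_mds(h) for h in (2, 3, 9, 10)}

# Differences in predicted doctors for a one-hospital increase.
diff_low = preds[3] - preds[2]
diff_high = preds[10] - preds[9]

# Ratios in predicted doctors for a one-hospital increase.
ratio_low = preds[3] / preds[2]
ratio_high = preds[10] / preds[9]

print(f"difference 2 -> 3 hospitals:  {diff_low:.1f}")
print(f"difference 9 -> 10 hospitals: {diff_high:.1f}")
print(f"ratio 2 -> 3 hospitals:       {ratio_low:.4f}")
print(f"ratio 9 -> 10 hospitals:      {ratio_high:.4f}")
print(f"exp(b1):                      {math.exp(b1):.4f}")
```

With these made-up coefficients the two differences are not equal, but the two ratios are, and both match exp(b1). That pattern is what to look for when relating your own predictions to the slope from "c".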
2. Odds, Probabilities, and Odds Ratios
Below are data from an Intensive Care Unit (ICU) on whether someone survived (Yes/No) and their age group (Youngest: <50 years old, Middle: 50-69, Oldest: 70+).
AgeGroup | Survived: Yes | Survived: No |
---|---|---|
Youngest | 54 | 5 |
Middle | 60 | 17 |
Oldest | 46 | 18 |
a. Calculate the odds of surviving in the ICU for those in the youngest, middle, and oldest age groups. Interpret the odds of survival for the oldest group in a sentence.
b. From the odds in "a", calculate the probability of survival for each age group. Interpret the probability of survival for the youngest group in a sentence.
c. From the table above, calculate the proportion of the youngest group who survived, and compare this answer to your answer in "b".
d. Calculate the odds ratio of survival for the Middle vs. Youngest groups, and the odds ratio for the Oldest vs. Middle groups. (No need to interpret, just calculate.)
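The relationships among odds, probabilities, and odds ratios used in parts "a" through "d" can be sketched on a small made-up example. The Python code below is only an illustration of the formulas; the counts for `group_a` and `group_b` are hypothetical and are deliberately not the ICU table above.

```python
def odds(yes, no):
    """Odds of the outcome: count with the outcome divided by count without it."""
    return yes / no


def odds_to_prob(o):
    """Convert odds to a probability: p = odds / (1 + odds)."""
    return o / (1 + o)


# Hypothetical (yes, no) counts for two groups; NOT the ICU data.
group_a = (30, 10)
group_b = (20, 20)

odds_a = odds(*group_a)
odds_b = odds(*group_b)

# Probability recovered from the odds ...
prob_a = odds_to_prob(odds_a)
# ... and the same quantity computed directly as a proportion of the group.
prop_a = group_a[0] / sum(group_a)

# Odds ratio comparing group A to group B.
odds_ratio = odds_a / odds_b

print(f"odds A = {odds_a}, odds B = {odds_b}")
print(f"probability from odds = {prob_a}, proportion from counts = {prop_a}")
print(f"odds ratio (A vs. B) = {odds_ratio}")
```

Here the probability recovered from the odds equals the proportion computed directly from the counts, which is the comparison part "c" asks you to make with the real table.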
Lightly Recommended
Consider these only if you want a lot more practice with these ideas and want to deepen your understanding beyond what we will need for this class.
Q1e. Does this model actually fit the data well? Try other transformations - of Hospitals, MDs, or both - until you find an arrangement that looks reasonable for a linear regression model; report the results of that model and explain why you think it might be best.
Q2e. Calculate the 95% Confidence Intervals for the Odds Ratios.
Q3. a. Use the Guan paper to calculate the OR and 95% CI for diabetes and the "Primary Endpoint" (ICU admission, intubation, or death).
b. Use this paper, released this Tuesday by the U.S. Centers for Disease Control and Prevention (CDC), to calculate the OR and 95% CI for diabetes and ICU admission (ICU vs. non-ICU) in the U.S.
c. Are the ORs and the 95% CIs in "a" and "b" similar? How do they compare?
d. What does this tell you about how diabetes is associated with COVID progression in the U.S. compared to China? What doesn't it tell you, and what else would you want to know?
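For Q2e and Q3, one common way to get a 95% confidence interval for an odds ratio is the log-based (Woolf) method: compute the standard error of log(OR) from the four cell counts, build the interval on the log scale, and exponentiate. The Python sketch below assumes that method, which may differ from the one used in class, and the counts passed to the hypothetical helper `odds_ratio_ci` are made up (they come from neither the ICU table nor either paper).

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table (log/Woolf method).

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    or_hat = (a * d) / (b * c)
    # Standard error of log(OR) under the log-based approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper


# Hypothetical counts, for illustration only.
or_hat, lower, upper = odds_ratio_ci(30, 10, 20, 20)
print(f"OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the interval is symmetric on the log scale, it is not symmetric around the OR itself. The approximation also breaks down when any cell count is zero or very small.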