1. For decades it’s been suspected that schizophrenia involves anatomical abnormalities in the hippocampus, an area of the brain involved with memory. The following data bearing on this issue are from Suddath et al. (1990) and were used by Ramsey and Schafer (3rd ed., 2013, p. 31. Display 2.2). The researchers obtained MRI measurements of the volume of the left hippocampus from 15 pairs of identical twins discordant for schizophrenia, i.e, one the twin is affected with schizophrenia The data are displayed in the following table.
i. use `mutate()` to create a new variable `diff` which is the difference of the MRI measurements of each pair.
ii. use the pipe `%>%` operator add the new variable `diff` as column to `schizophrenia`.
iii. use `summarise` to compute the average difference of the MRI measurements. Use the pipe `%>%` operator to string multiple functions.
iv. use `summarise` to compute the standard deviation of the difference of the MRI measurements. Use the pipe `%>%` operator to string multiple functions.
v. based on your answers in (iii) and (iv), do you think there is evidence in favor of the initial hypothesis there is difference in the MRI measure
ments of the volume of the left hippocampus between those affected and unaffected with schizophrenia?
2. The Behavioral Risk Factor Surveillance System (BRFSS) is an annual telephone survey run by the [Centers of Disease Control](http://www.cdc.gov/brfss) in the United States. The BRFSS is designed to identify risk factors in the adult population and report emerging health trends. For example, respondents are asked about their diet and weekly physical activity, their HIV/AIDS status, possible tobacco use, and even their level of healthcare coverage.
i) How many variables are present in this data set? For each variable, identify its data type (e.g. categorical, continuous).
ii) Use `summarise` to compute the average of the variable `weight`.
iii) Use `group_by` to group the rows by `exerany` (exercise any). Repeat part (ii) on this grouped data. Comment on what you observe in the average weights between groups. Use the pipe `%>%` operator to string multiple functions.
> Do not print the data frame (too long, 20K rows), just print the average weights.
iv) Repeat part (iii) but now use the grouping variables `smoke100` and `gender`. Comment on what you observe in the average weights between groups. Use the pipe `%>%` operator to string multiple functions.
v) Obtain a random sample of 1000 rows and save this into `cdc.samp1`.
vi) Repeat parts (ii) to (v) but only using the subset data `cdc.samp1`. Use the pipe `%>%` operator to string multiple functions.
vii) Comment on what differences you observed (if any) between the results in the original sample and the smaller sample?
Use `sample()` to generate rolls from biased coin with $Pr(Head) = 0.6$ .
i) get a sample of size 10 tosses and tally the results
ii) get a sample of size 30 tosses and tally the results
iii) get a sample of size 100 tosses and tally the results
iv) what do you notice with the proportion of heads in each sample?