Statistics SPSS questions 1-6
Q: Questions 1-2 relate to the Framingham data set and require the use of SPSS.
The Framingham Heart Study is a long-term prospective study of the etiology of cardiovascular disease among a population of non-institutionalized people in the community of Framingham, Massachusetts. The Framingham Heart Study was a landmark study in epidemiology in that it was the first prospective study of cardiovascular disease and identified the concept of risk factors and their joint effects. The study began in 1948, and 5,209 subjects were initially enrolled. Our data set includes variables from the first examination period (1956) and the third examination period (1968). Clinic examination data have included cardiovascular disease risk factors and markers of disease such as blood pressure, blood chemistry, lung function, smoking history, health behaviors, ECG tracings, echocardiography, and medication use. Through regular surveillance of area hospitals, participant contact, and death certificates, the Framingham Heart Study reviews and adjudicates events for the occurrence of any of the following types of coronary heart disease (CHD): angina pectoris, myocardial infarction, heart failure, and cerebrovascular disease.
The associated data set is a subset of the data collected in the Framingham study and includes laboratory, clinic, questionnaire, and adjudicated event data on 400 participants. Participants were selected by drawing 100 smokers and 100 non-smokers at random from among all male participants; the same procedure yielded 100 female smokers and 100 female non-smokers. As a result, smokers are over-sampled. Each participant's data occupy one row. People who had any type of CHD at the initial examination are not included in the data set.
1. The following questions relate to relationships between pairs of variables.
a) Provide the value of the correlation (r) between the initial systolic blood pressure (SYSBP1) and whether or not the person was taking anti-hypertensive (blood pressure) medication (BPMEDS1). Evaluate the strength of the relationship using Cohen’s scale and explain the nature of the relationship between the two variables in context.
b) Given the relationship between the initial systolic blood pressure (SYSBP1) and whether or not the person was taking anti-hypertensive (blood pressure) medication (BPMEDS1) that you found in part (a), is it fair to say that blood pressure medication raises a person's blood pressure? Explain.
c) Which gender (SEX) was more likely to develop coronary heart disease by the end of the study (ANYCHD4)? Explain and support your answer with appropriate statistic(s).
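For parts (a) and (c), the calculations behind the SPSS output can be sketched in plain Python. Assuming BPMEDS1 is coded 0 = not taking medication and 1 = taking medication (an assumption; check the codebook), the Pearson r between SYSBP1 and BPMEDS1 is the point-biserial correlation. Cohen's conventional cut-offs label |r| of about 0.1 as small, 0.3 as medium, and 0.5 as large. For (c), one reasonable statistic is the percentage of each sex that developed CHD (the row percentages of a SEX by ANYCHD4 crosstab); the ratio of the two is the relative risk. All numbers and dictionary keys below are hypothetical toy values, not the Framingham data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cohen_label(r):
    """Label |r| with Cohen's conventional cut-offs (0.1, 0.3, 0.5)."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "below small"

# (a) Hypothetical toy values, NOT the Framingham data.
# BPMEDS1 is ASSUMED to be coded 0 = no medication, 1 = on medication.
sysbp1 = [118, 125, 132, 140, 150, 162, 171, 180]
bpmeds1 = [0, 0, 0, 0, 1, 0, 1, 1]
r = pearson_r(sysbp1, bpmeds1)  # point-biserial r
print(f"r = {r:.3f} ({cohen_label(r)})")

# (c) Hypothetical SEX x ANYCHD4 counts, NOT the real results.
chd_counts = {
    "male": {"chd": 60, "no_chd": 140},
    "female": {"chd": 35, "no_chd": 165},
}

def chd_proportion(row):
    """Share of one group that developed CHD (a row percentage)."""
    return row["chd"] / (row["chd"] + row["no_chd"])

p_male = chd_proportion(chd_counts["male"])
p_female = chd_proportion(chd_counts["female"])
relative_risk = p_male / p_female  # > 1 means males had the higher risk here
print(f"male {p_male:.1%}, female {p_female:.1%}, RR = {relative_risk:.2f}")
```

With 0/1 coding, a positive r means the group coded 1 tends to have the higher SYSBP1; reversing the coding flips the sign of r but not its strength.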
2. Perform a regression analysis to predict a person’s systolic blood pressure at the first examination (SYSBP1) from his or her initial age (AGE1).
a) Write down the regression equation for predicting SYSBP1 from AGE1.
b) What is the predicted systolic blood pressure at the first examination for a person who was 48 years old at the start of the study?
c) Provide the value of the y-intercept. Interpret it, or explain why it would not be meaningful to do so in the context of this problem, using language that someone who hasn't taken statistics would understand.
d) Provide the value of the slope and interpret it in the context of this problem using language that someone who hasn’t taken statistics would understand.
e) What is the residual value for the person with ID = 1? Does the model over- or under-predict the systolic blood pressure for this person?
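The arithmetic SPSS performs for question 2 can be sketched as follows: ordinary least squares gives slope b = Sxy / Sxx and intercept a = mean(y) - b * mean(x); a prediction plugs an age into a + b * AGE1; and a residual is observed minus predicted, so a positive residual means the line under-predicts. The (AGE1, SYSBP1) pairs are hypothetical toy values, and treating the first toy row as "the person with ID = 1" is likewise only illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical (AGE1, SYSBP1) pairs, NOT the Framingham data.
age1 = [35, 40, 45, 50, 55, 60, 65]
sysbp1 = [118, 124, 129, 137, 141, 150, 155]

a, b = fit_line(age1, sysbp1)
print(f"predicted SYSBP1 = {a:.2f} + {b:.3f} * AGE1")

# (b) Plug AGE1 = 48 into the fitted line.
predicted_48 = a + b * 48
print(f"prediction at age 48: {predicted_48:.1f}")

# (e) Residual = observed - predicted. The first toy row stands in
# for "the person with ID = 1" (hypothetical).
residual = sysbp1[0] - (a + b * age1[0])
if residual > 0:
    verdict = "under-predicts"
elif residual < 0:
    verdict = "over-predicts"
else:
    verdict = "fits exactly"
print(f"residual: {residual:.2f} (model {verdict})")
```

The intercept is the predicted value at AGE1 = 0. Since the participants are adults, that is an extrapolation far outside the observed ages, which is the usual reason an intercept is not interpreted literally.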
Questions 3-5 relate to the output provided; you do not need SPSS to answer them.
3. The following scatter plot depicts the regression line for predicting the mileage (the number of miles the car has been driven) from the age of the car. Use the graph to answer the following questions.
a) What is the sign (positive or negative) of the slope of the regression line?
b) What is the predicted mileage of a car that is six years old?
c) What is the make of the car with the largest mileage? Is the residual for that car positive or negative?
4. The clustered bar graph gives, for each car color, the number of cars equipped with anti-lock brakes and the number without them (or with unknown status). Use it to answer the following questions.
a) What percentage of black cars are equipped with anti-lock brakes?
b) Is there a relationship between car color and whether or not anti-lock brakes are available? Explain.
c) Can we calculate the Pearson correlation between these two variables? If so, indicate the sign of the correlation. If not, indicate why it would not make sense to do so.
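Parts (a) and (b) come down to conditional (within-color) percentages: divide each color's anti-lock count by that color's total and compare the results across colors; clearly different percentages suggest an association. The sketch below shows that calculation. The colors and counts are hypothetical, not read from the exam's bar graph.

```python
# Hypothetical counts, NOT read from the exam's bar graph.
brake_counts = {
    "black": {"abs": 12, "no_or_unknown": 8},
    "white": {"abs": 9, "no_or_unknown": 11},
    "red": {"abs": 5, "no_or_unknown": 10},
}

def percent_with_abs(row):
    """Percentage of one color's cars that have anti-lock brakes."""
    return 100 * row["abs"] / (row["abs"] + row["no_or_unknown"])

for color, row in brake_counts.items():
    print(f"{color}: {percent_with_abs(row):.1f}% with anti-lock brakes")
```

Dividing by each color's own total, rather than by the grand total, is what makes the percentages comparable when the colors have different numbers of cars.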
5. The variable AGE gives the age of the car in years and ranges from 0 (a car less than one year old on the used-car market) to 12. Assume that the variable is approximately normally distributed with a mean of 4.09 and a standard deviation of 2.84. Use the normal distribution summarized in Table 1 (attached to the back of this exam) to estimate:
a) The proportion (or percentage) of cars that are less than 4 years old.
b) The proportion (or percentage) of cars that are less than 6 years old.
c) The proportion (or percentage) of cars between 4 and 6 years old.
d) The AGE score which has 30% of scores below it.
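One way to check the hand calculations for question 5 is Python's standard-library `statistics.NormalDist` (Python 3.8+), using the mean 4.09 and SD 2.84 given above. `cdf` returns the area to the left of a value (Table 1 lists right-tail areas, so left = 1 minus the table entry), and `inv_cdf` goes the other way for part (d). Table-based answers round z to two decimals, so they may differ slightly in the third decimal place.

```python
from statistics import NormalDist

# Normal model for AGE stated in the question: mean 4.09, SD 2.84.
age = NormalDist(mu=4.09, sigma=2.84)

def z_score(x, mean=4.09, sd=2.84):
    """Standardize x: how many SDs it lies from the mean."""
    return (x - mean) / sd

p_below_4 = age.cdf(4)             # (a) P(AGE < 4)
p_below_6 = age.cdf(6)             # (b) P(AGE < 6)
p_between = p_below_6 - p_below_4  # (c) P(4 < AGE < 6)
age_30th = age.inv_cdf(0.30)       # (d) AGE with 30% of scores below it

print(f"z(4) = {z_score(4):.2f}, z(6) = {z_score(6):.2f}")
print(f"(a) {p_below_4:.4f}  (b) {p_below_6:.4f}  (c) {p_between:.4f}  (d) {age_30th:.2f}")
```

The normal model is only an approximation here, since AGE cannot fall below 0 or above 12.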
6. Answer the following questions about the Importance of Family Dinners II report, which may be found at http://www.casafamilyday.org/PDFs/FamilyDinnersII.pdf
a) Look at the results presented in Figure 2.A. What do you suppose is the sign (positive or negative) of the correlation between number of family dinners per week and teen substance-abuse risk? Explain.
b) It has been concluded from this article that increasing the number of family dinners will decrease teen substance-abuse risk, improve grades, and so on. Can this conclusion be drawn from the data?
Table 1. Areas under the standard normal curve (to the right of the z-score)
[For example, the area to the right of z = 1.96 is found by decomposing 1.96 as 1.9 + .06 and referencing the entry in the row labeled 1.9 and the column labeled .06. This right-tailed area is .0250. To obtain the area to the left, calculate 1 – .0250. The left-tailed area equals .9750. Because the normal curve is symmetric about its mean, the area to the left of z = -1.96 is equal to the area to the right of z = 1.96.]
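The lookup rule described in the Table 1 note can be reproduced with Python's `statistics.NormalDist` (standard library, Python 3.8+). The helper names `right_tail` and `left_tail` are illustrative, not part of any library; the sketch simply confirms the worked example: a right-tail area of about .0250 at z = 1.96, a left-tail area of .9750, and the symmetry between z = -1.96 and z = 1.96.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

def right_tail(z):
    """Area to the right of z, the quantity Table 1 lists."""
    return 1 - std_normal.cdf(z)

def left_tail(z):
    """Area to the left of z: 1 minus the right-tail area."""
    return 1 - right_tail(z)

z = 1.9 + 0.06  # row 1.9, column .06
print(f"right of {z:.2f}: {right_tail(z):.4f}")
print(f"left of {z:.2f}: {left_tail(z):.4f}")
# By symmetry, the left tail at -1.96 equals the right tail at +1.96.
print(f"left of -1.96: {left_tail(-1.96):.4f}")
```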



