Unit 2 Quiz - Exploring Two-Variable Data

Quiz 1

What is the primary purpose of a scatterplot?

A) To show the distribution of a single variable
B) To display the relationship between two variables
C) To summarize categorical data
D) To calculate the mean of a dataset

Which of the following indicates a positive correlation?

A) As one variable increases, the other decreases
B) As one variable increases, the other increases
C) No relationship between variables
D) Both variables decrease

What does the correlation coefficient measure?

A) The slope of the regression line
B) The strength and direction of a linear relationship
C) The mean of the dataset
D) The spread of the data

If the correlation coefficient is -0.85, what can be inferred?

A) Strong positive linear relationship
B) Weak positive linear relationship
C) Strong negative linear relationship
D) No linear relationship

What is the range of the correlation coefficient?

A) 0 to 1
B) -1 to 1
C) -1 to 0
D) -2 to 2
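
The correlation questions above can be checked numerically. The following is a minimal sketch, assuming made-up data; `pearson_r`, `hours`, and `scores` are hypothetical names introduced only for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]        # made-up explanatory values
scores = [52, 55, 61, 64, 70]  # made-up response values

r_up = pearson_r(hours, scores)                  # y rises with x
r_down = pearson_r(hours, [-s for s in scores])  # y falls with x
```

With these values `r_up` is close to 1 and `r_down` is its negative, so both stay inside the interval from -1 to 1.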

Which of the following is NOT a characteristic of the least squares regression line?

A) Minimizes the sum of squared residuals
B) Passes through the mean of x and y
C) Always passes through the origin
D) Used to predict y from x
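
The properties listed in this question can be verified on a small example. This is a sketch under the assumption of made-up data; `least_squares_line` is a hypothetical helper, not a library function.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line for y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]   # made-up explanatory values
ys = [3, 5, 4, 8, 10]  # made-up response values

slope, intercept = least_squares_line(xs, ys)
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# The fitted line evaluated at the mean of x gives the mean of y.
predicted_at_mean = intercept + slope * mean_x
```

Here the slope is about 1.7 and the intercept about 0.9, so the line passes through the point of means (3, 6) but not through the origin.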

What does a residual plot help to determine?

A) The strength of the correlation
B) The appropriateness of the linear model
C) The mean of the residuals
D) The slope of the regression line

In a residual plot, what pattern indicates a good fit for a linear model?

A) A clear pattern
B) Random scatter
C) A parabolic shape
D) A horizontal line

What does a high R-squared value indicate?

A) The model explains a large portion of the variability in the response variable
B) The model is not a good fit
C) There is no relationship between the variables
D) The residuals are large
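
One common way to compute R-squared is as one minus the ratio of residual variation to total variation. The sketch below assumes made-up data and a hypothetical helper named `r_squared`.

```python
def r_squared(xs, ys):
    """Proportion of variation in y explained by the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals (unexplained variation).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares (all variation in y about its mean).
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]   # made-up explanatory values
ys = [3, 5, 4, 8, 10]  # made-up response values
r2 = r_squared(xs, ys)
```

For this data `r2` is about 0.85, meaning roughly 85% of the variation in y is accounted for by the linear model.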

Which of the following is true about outliers in a scatterplot?

A) They always strengthen the correlation
B) They can have a large effect on the correlation and regression line
C) They have no effect on the analysis
D) They are always errors in data collection

What is the purpose of transforming data in regression analysis?

A) To make the data more linear
B) To increase the correlation coefficient
C) To decrease the number of outliers
D) To change the mean of the dataset

Which transformation is commonly used to linearize exponential data?

A) Logarithmic transformation
B) Square root transformation
C) Reciprocal transformation
D) Exponential transformation
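
To see why a logarithm linearizes exponential growth, note that if y = a * b^x then log(y) = log(a) + x * log(b), which is linear in x. The sketch below assumes an exactly exponential, made-up dataset and a hypothetical `pearson_r` helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(7))            # 0, 1, ..., 6
ys = [3 * 2 ** x for x in xs]  # made-up exponential data: y = 3 * 2^x
log_ys = [math.log(y) for y in ys]

r_raw = pearson_r(xs, ys)      # curved relationship, r noticeably below 1
r_log = pearson_r(xs, log_ys)  # straight-line relationship, r essentially 1
```

The higher correlation after the transformation is a consequence of the pattern becoming linear. Real data would include noise, so the transformed correlation would fall short of 1.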

What does the slope of the regression line represent?

A) The change in y for a one-unit change in x
B) The mean of the y-values
C) The correlation coefficient
D) The sum of squared residuals

If the slope of the regression line is zero, what does this indicate?

A) Perfect positive correlation
B) Perfect negative correlation
C) No linear relationship
D) Strong linear relationship

What is the intercept of the regression line?

A) The value of y when x is zero
B) The value of x when y is zero
C) The mean of the x-values
D) The correlation coefficient

Which of the following is true about extrapolation?

A) It is always accurate
B) It involves predicting outside the range of the data
C) It is the same as interpolation
D) It decreases the correlation coefficient

What is the effect of a lurking variable?

A) It strengthens the observed relationship
B) It weakens the observed relationship
C) It can create a false impression of a relationship
D) It has no effect on the analysis

Which of the following is an example of a categorical variable?

A) Height
B) Weight
C) Gender
D) Age

What is the purpose of a contingency table?

A) To display the relationship between two categorical variables
B) To calculate the mean of a dataset
C) To show the distribution of a single variable
D) To display the relationship between two quantitative variables

What does a chi-square test of independence assess?

A) The strength of a linear relationship
B) The independence of two categorical variables
C) The mean of a dataset
D) The spread of the data
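
The chi-square statistic compares observed counts in a contingency table with the counts expected if the two variables were independent. The following sketch computes only the statistic (not a p-value) for a made-up 2x2 table; `chi_square_statistic` is a hypothetical helper.

```python
def chi_square_statistic(table):
    """Chi-square statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

observed = [[30, 10],
            [20, 40]]  # made-up counts: rows and columns are categories

stat = chi_square_statistic(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
```

For this table the expected counts are 20, 20, 30, and 30, giving a statistic of about 16.67 on 1 degree of freedom.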

Which of the following is true about Simpson's paradox?

A) It occurs when the direction of an association reverses when data from several groups are combined
B) It strengthens the observed relationship
C) It has no effect on the analysis
D) It is the same as a lurking variable
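
A small numerical illustration can make the reversal concrete. The counts below are hypothetical, chosen only so that treatment A has the higher success rate within each group while treatment B has the higher rate once the groups are pooled.

```python
# (successes, trials) for two hypothetical treatments in two groups.
counts = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    """Success rate as a proportion."""
    return successes / trials

# Within every group, A has the higher success rate.
a_wins_each_group = all(
    rate(*group["A"]) > rate(*group["B"]) for group in counts.values()
)

# Pool the groups by adding successes and trials separately.
totals = {
    t: (
        sum(group[t][0] for group in counts.values()),
        sum(group[t][1] for group in counts.values()),
    )
    for t in ("A", "B")
}

# After pooling, B has the higher success rate.
b_wins_overall = rate(*totals["B"]) > rate(*totals["A"])
```

The reversal arises because the two treatments are spread very unevenly across the groups, so the group acts as a lurking variable.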

What is the purpose of a scatterplot matrix?

A) To display the relationship between multiple pairs of variables
B) To calculate the mean of a dataset
C) To show the distribution of a single variable
D) To display the relationship between two categorical variables

Which of the following is NOT a measure of association for categorical data?

A) Chi-square statistic
B) Correlation coefficient
C) Cramér's V
D) Phi coefficient

What does a mosaic plot display?

A) The relationship between two categorical variables
B) The distribution of a single variable
C) The relationship between two quantitative variables
D) The mean of a dataset

Which of the following is true about the least squares regression line?

A) It minimizes the sum of absolute residuals
B) It minimizes the sum of squared residuals
C) It always passes through the origin
D) It is used to predict x from y

Quiz 2

1. What type of graph is typically used to display the relationship between two quantitative variables?

- a) Histogram

- b) Boxplot

- c) Scatterplot

- d) Bar graph

2. In a scatterplot, the variable plotted along the horizontal axis is usually the:

- a) Dependent variable

- b) Explanatory variable

- c) Response variable

- d) Random variable

3. Which of the following describes a positive association between two variables?

- a) As one variable increases, the other decreases.

- b) As one variable increases, the other increases.

- c) There is no consistent pattern between the two variables.

- d) As one variable decreases, the other decreases.

4. If the correlation coefficient \( r \) is close to 1, it indicates:

- a) A weak positive linear relationship

- b) A strong positive linear relationship

- c) A strong negative linear relationship

- d) No linear relationship

5. What is the range of possible values for the correlation coefficient \( r \)?

- a) 0 to 1

- b) -1 to 0

- c) -1 to 1

- d) -∞ to ∞

6. Which of the following is NOT a characteristic of the correlation coefficient \( r \)?

- a) It is sensitive to outliers.

- b) It measures the strength and direction of a linear relationship.

- c) It requires both variables to be quantitative.

- d) It changes if the units of measurement are changed.

7. The least-squares regression line minimizes the:

- a) Sum of absolute residuals

- b) Sum of squared residuals

- c) Sum of residuals

- d) Sum of the values of \( y \)

8. If a residual plot shows a random scatter of points, what does this suggest about the model?

- a) The linear model is appropriate.

- b) The linear model is not appropriate.

- c) The residuals are not independent.

- d) The data is not linear.

9. Which of the following statements is true about influential points in regression analysis?

- a) Influential points always have large residuals.

- b) Influential points are always outliers in the x-direction.

- c) Removing an influential point significantly changes the slope of the regression line.

- d) Influential points always lie close to the regression line.
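
The effect of an influential point can be seen by fitting the line with and without it. This is a sketch on made-up data; `ls_slope` is a hypothetical helper and the point (20, 5) is an assumed high-leverage observation.

```python
def ls_slope(xs, ys):
    """Slope of the least squares line for y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

xs = [1, 2, 3, 4, 5]   # made-up data lying exactly on y = 2x
ys = [2, 4, 6, 8, 10]

slope_without = ls_slope(xs, ys)

# Add one point far from the other x-values that breaks the pattern.
slope_with = ls_slope(xs + [20], ys + [5])
```

In this example a single added point pulls the slope from 2 down to nearly 0, which is the kind of change that marks a point as influential.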

10. What does the coefficient of determination \( r^2 \) represent?

- a) The slope of the regression line.

- b) The percentage of variation in the response variable explained by the explanatory variable.

- c) The intercept of the regression line.

- d) The correlation between the residuals and the explanatory variable.

11. When interpreting the slope of a regression line, it tells us:

- a) The predicted change in the response variable for a one-unit increase in the explanatory variable.

- b) The strength of the relationship between two variables.

- c) The percentage of variation explained by the model.

- d) The predicted value of the response variable when the explanatory variable is zero.

12. Which of the following is a potential problem when using regression analysis?

- a) Non-linear relationships

- b) Outliers

- c) High-leverage points

- d) All of the above

13. Extrapolation refers to:

- a) Predicting values within the range of the data.

- b) Predicting values outside the range of the data.

- c) The process of calculating residuals.

- d) The slope of the regression line.

14. Which of the following is true about correlation and causation?

- a) A high correlation implies causation.

- b) A low correlation rules out causation.

- c) Correlation implies causation if \( r^2 \) is high.

- d) Correlation does not imply causation.

15. If two variables have a correlation close to zero, it means:

- a) There is a strong linear relationship.

- b) There is no linear relationship.

- c) There is a strong non-linear relationship.

- d) The variables are not related in any way.
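
A correlation near zero only rules out a linear pattern. The sketch below uses made-up data that follow y = x^2 exactly, so the variables are perfectly related, yet the correlation comes out as zero; `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]  # symmetric about zero
ys = [x ** 2 for x in xs]      # perfect quadratic relationship

r = pearson_r(xs, ys)          # the positive and negative halves cancel
```

A scatterplot of these points would show a clear parabola, which is why a plot should accompany any correlation value.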

Quiz 3

1. What does a scatterplot with points closely clustered around a straight line indicate?

- a) No relationship between variables

- b) A weak linear relationship

- c) A strong linear relationship

- d) A perfect linear relationship

2. If the correlation between two variables is \( r = -0.85 \), what can be said about their relationship?

- a) Strong positive linear relationship

- b) Weak positive linear relationship

- c) Strong negative linear relationship

- d) No linear relationship

3. Which of the following is a reason to use a logarithmic transformation on data?

- a) To create a linear relationship between variables

- b) To increase the correlation coefficient

- c) To eliminate outliers

- d) To decrease the spread of the data

4. An outlier in a scatterplot can:

- a) Have a small effect on the correlation coefficient

- b) Have no effect on the regression line

- c) Have a large effect on the correlation coefficient

- d) Improve the fit of the regression line

5. The slope of a least-squares regression line is interpreted as:

- a) The amount the explanatory variable increases when the response variable increases by one unit

- b) The predicted change in the explanatory variable for a one-unit change in the response variable

- c) The predicted change in the response variable for a one-unit change in the explanatory variable

- d) The percentage change in the response variable for a one-unit change in the explanatory variable

6. Which of the following would suggest that a linear model is not appropriate?

- a) A high correlation coefficient

- b) A random scatter of points in the residual plot

- c) A clear pattern in the residual plot

- d) A slope near zero in the regression line

7. What does it mean if the residual for a particular data point is negative?

- a) The actual value is greater than the predicted value.

- b) The actual value is less than the predicted value.

- c) The data point is an outlier.

- d) The model underestimates the actual value.
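
A residual is the actual value minus the predicted value. The sketch below assumes a hypothetical fitted line with intercept 0.9 and slope 1.7 and a made-up observation.

```python
# Hypothetical fitted line: predicted y = 0.9 + 1.7 * x
intercept = 0.9
slope = 1.7

x_observed = 3
y_actual = 4  # made-up observed response

y_predicted = intercept + slope * x_observed  # about 6.0
residual = y_actual - y_predicted             # actual minus predicted
```

The residual here is about -2, so the observed point lies below the line and the model's prediction is too high for this observation.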

8. A data point with high leverage:

- a) Lies far from the mean of the explanatory variable

- b) Lies far from the mean of the response variable

- c) Has a large residual

- d) Always increases the slope of the regression line

9. What is the effect of adding a constant to all values of a dataset on the correlation coefficient \( r \)?

- a) It increases \( r \).

- b) It decreases \( r \).

- c) \( r \) remains unchanged.

- d) It makes \( r \) zero.
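
The invariance of \( r \) under shifts and positive rescaling can be checked directly. This sketch assumes made-up heights and weights; `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

heights_cm = [150, 160, 170, 180, 190]  # made-up values
weights_kg = [52, 58, 69, 77, 80]       # made-up values

r_original = pearson_r(heights_cm, weights_kg)

# Add a constant to every height.
r_shifted = pearson_r([h + 10 for h in heights_cm], weights_kg)

# Change units from centimeters to inches.
r_inches = pearson_r([h / 2.54 for h in heights_cm], weights_kg)
```

All three values agree up to floating-point rounding, because \( r \) is computed from standardized deviations, which do not depend on the center or the units of either variable.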

10. Which statement is true about the y-intercept of a least-squares regression line?

- a) It is always meaningful.

- b) It has no interpretation.

- c) It represents the predicted value of the response variable when the explanatory variable is zero.

- d) It is the average of all y-values.

11. Which of the following is NOT true about influential points?

- a) They have large residuals.

- b) They can drastically change the slope of the regression line.

- c) They are usually outliers in the x-direction.

- d) Removing them can significantly alter the correlation coefficient.

12. The correlation coefficient \( r \) is not resistant to:

- a) Non-linearity

- b) Outliers

- c) Changes in the units of measurement

- d) Shifts in the center of the data

13. Which plot is useful for identifying outliers in a linear regression model?

- a) Scatterplot

- b) Residual plot

- c) Histogram

- d) Boxplot

14. If the correlation between two variables is close to zero, it means:

- a) There is no relationship between the variables.

- b) There is no linear relationship between the variables.

- c) The variables are independent.

- d) The variables are dependent.

15. Which of the following can indicate a non-linear relationship between two variables?

- a) A scatterplot that shows a curved pattern

- b) A correlation coefficient close to 1

- c) A perfectly horizontal line in a scatterplot

- d) A random scatter in the residual plot

16. A high \( r^2 \) value in a regression model indicates:

- a) The model explains a large proportion of the variation in the response variable.

- b) The model has a high correlation coefficient.

- c) The model is a perfect fit.

- d) The slope of the regression line is positive.

17. Which of the following scenarios would most likely result in extrapolation?

- a) Predicting values far outside the range of the explanatory variable.

- b) Predicting values within the range of the data.

- c) Using the residual plot to make predictions.

- d) Calculating the y-intercept of the regression line.

18. The term "bivariate data" refers to:

- a) Data involving two quantitative variables

- b) Data involving two categorical variables

- c) Data involving one quantitative and one categorical variable

- d) Data involving more than two variables

19. In a linear regression, if the residuals show a pattern, this suggests:

- a) The linear model is appropriate.

- b) The linear model is not appropriate.

- c) The correlation coefficient is near zero.

- d) There are no influential points.

20. Which of the following is true about the effect of an outlier on a regression analysis?

- a) An outlier always decreases the correlation coefficient.

- b) An outlier can significantly change the slope and y-intercept of the regression line.

- c) An outlier has no effect if it lies on the regression line.

- d) An outlier always increases the correlation coefficient.