Unit 1: Bivariate Data
Definitions:
Bivariate data: data on two variables measured on the same individuals; in this unit both variables are quantitative
Comparing two variables:
-does one variable cause changes in the other?
-is there an explanatory-response relationship?
-are there lurking variables?
Relationships: a strong association between two variables (the explanatory and response variables) can reflect any of several underlying relationships, including ones that involve lurking variables:
-causation
-common response
-confounding
Causation: changes in x (explanatory) cause changes in y (response)
Lurking variables: a variable that has an important effect on the relationship among the variables in a study but is not included among the variables studied
-may falsely suggest a strong relationship between x and y
-may hide a relationship that is there
-can increase variability (spread) in a study
-can create bias
Common response: both x and y respond to changes in some unobserved variable z
-x does not cause y, so increasing one does not increase the other
Confounding: the effect of x on y is mixed up with the effect of another variable z on y
Association vs Causation: An association between two variables (x and y), even if it is very strong, is not by itself good evidence that changes in x actually cause changes in y
-be cautious in accepting claims of causation
-best evidence of causation comes from an experiment
Coefficient of determination (r^2): the fraction of the variation in the values of y that is explained by the LSRL of y on x
Residuals: a residual measures the vertical distance from a point in the scatterplot to the LSRL
-residual = y - yhat (observed minus predicted); positive for points above the line, negative for points below it
Scatterplots: used only to graph the relationship between two quantitative variables measured on the same individuals
Graphing scatter plots on paper:
-label both axes with the variable and its units
-choose and label an appropriate scale for both axes
-scale must be consistent for each axis (no gaps)
-scale doesn't need to start at 0
-attempt to use entire space
Describing scatter plots:
Form: linear, exponential, power
Direction: positive or negative
Strength: how closely the points follow the form
-Influential observations:
-points that follow the general pattern but that are not near the general cluster of the data
-if the observation (point) is removed, the form and/or direction of the model changes drastically
-Regression outlier:
-a point or points that are far from the model in a vertical direction
-regression outliers do not have to be influential but usually are
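To see what "removing the point changes the model drastically" looks like, the sketch below fits an LSRL with and without one point that sits far to the right of the cluster. The data values and the helper name lsrl are made up for illustration; they are not from any real study.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least-squares line yhat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept
    return a, b


# Hypothetical data: a small cluster with a positive trend...
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 3, 5, 4]

# ...plus one point far from the cluster in the x direction.
xs_with = xs + [20]
ys_with = ys + [1]

a1, b1 = lsrl(xs_with, ys_with)   # model including the far point
a2, b2 = lsrl(xs, ys)             # model with the far point removed

print(f"with the far point:    yhat = {a1:.2f} + {b1:.2f}x")
print(f"without the far point: yhat = {a2:.2f} + {b2:.2f}x")
```

Here the slope is negative with the far point and positive (0.50) without it, so that single observation changes the direction of the model: it is influential.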
Describing scatterplots in the linear form:
-if the form is linear, the direction and strength of the relationship are measured by the correlation coefficient
-the correlation coefficient, sometimes just called the correlation, is abbreviated r
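One common way to compute r is to average the products of the z-scores of x and y, dividing by n - 1: r = (1/(n-1)) * sum of ((x - xbar)/sx)((y - ybar)/sy). The sketch below implements that formula directly; the function names and the data are hypothetical, chosen only for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = (1/(n-1)) * sum of z_x * z_y over all observations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data for illustration only.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 3, 5, 4]
print(round(correlation(hours, score), 2))  # prints 0.69
```

Points that lie exactly on a rising line, such as (1, 2), (2, 4), (3, 6), give r = 1.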
Facts about correlation:
-r does not require an explanatory/response relationship
-r has no units
-changing the units of the explanatory or response variable does not change r
-correlation is usually rounded to 2 decimal places
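The facts above can be checked numerically. This sketch converts hypothetical heights from inches to centimeters and weights from pounds to kilograms, and also swaps the roles of x and y; r stays the same each time. The data are made up, and the correlation function here uses the deviation form sxy / sqrt(sxx * syy), which is algebraically the same as the z-score formula.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r computed from deviations from the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: heights in inches and weights in pounds.
height_in = [60, 62, 65, 68, 71]
weight_lb = [115, 120, 140, 150, 170]

# Change units: inches -> centimeters, pounds -> kilograms.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

r_original = correlation(height_in, weight_lb)
r_converted = correlation(height_cm, weight_kg)
r_swapped = correlation(weight_lb, height_in)  # x and y roles reversed

print(round(r_original, 2), round(r_converted, 2), round(r_swapped, 2))
```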
Correlation examples:
-1 ≤ r ≤ 1
r=1 perfect positive correlation
r=0 no linear correlation
r=-1 perfect negative correlation
you will rarely see perfect correlation in the real world
Interpreting the correlation coefficient:
-include a descriptor of strength (see chart)
-include a descriptor of direction (positive or negative)
-write the solution in context (explanatory and response variables)
Least Squares Regression Line:
-requires an explanatory-response relationship
-used for predicting y values given an x value
-LSRL always passes through the point (x bar, y bar)
-describes how the response variable (y) changes as the explanatory variable (x) changes
-close connection between correlation and the slope of the LSRL (the slope always has the same sign as r)
Equation of the least squares regression line (LSRL):
yhat=a+bx
-x is the explanatory variable (an actual, observed value)
-y is the response variable (an actual, observed value)
-yhat is the predicted value of the response variable (not an actual observed value)
-slope: b=r(sy/sx)
-y intercept: a=ybar-b(xbar)
-when you are asked to state the LSRL, always define the variables
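The slope and intercept formulas can be applied straight from summary statistics. The sketch below uses made-up values for xbar, ybar, sx, sy, and r (the function name lsrl_from_summary is hypothetical) and confirms that the resulting line passes through (xbar, ybar).

```python
def lsrl_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Slope and intercept of yhat = a + b*x from summary statistics."""
    b = r * (s_y / s_x)      # slope: b = r(sy/sx)
    a = y_bar - b * x_bar    # y-intercept: a = ybar - b(xbar)
    return a, b


# Hypothetical summary statistics (not from real data).
x_bar, y_bar = 10.0, 50.0
s_x, s_y = 2.0, 8.0
r = 0.75

a, b = lsrl_from_summary(x_bar, y_bar, s_x, s_y, r)
print(f"yhat = {a} + {b}x")    # prints yhat = 20.0 + 3.0x

# The LSRL passes through (xbar, ybar): yhat at xbar equals ybar.
print(a + b * x_bar == y_bar)  # prints True
```

On an actual answer, the variables would still need to be defined in context (for example, "x = hours studied, yhat = predicted test score").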
Interpreting the LSRL:
Slope: for each 1-unit increase in the explanatory variable (x), the predicted response (yhat) increases or decreases by the amount of the slope
Y-intercept: the predicted value of the response variable (yhat) when the explanatory variable (x) is 0 (only meaningful if x = 0 makes sense in context)
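A quick numeric check of these interpretations, using a hypothetical LSRL (x = hours studied, yhat = predicted test score; the values 52.0 and 4.5 are invented): moving x up by 1 changes yhat by exactly b, and plugging in x = 0 returns a.

```python
def predict(a, b, x):
    """Predicted response yhat = a + b*x on the LSRL."""
    return a + b * x


# Hypothetical LSRL: x = hours studied, yhat = predicted test score.
a, b = 52.0, 4.5

# Slope: one more hour raises the *predicted* score by b points.
print(predict(a, b, 3) - predict(a, b, 2))  # prints 4.5

# Intercept: predicted score when x = 0 hours.
print(predict(a, b, 0))                     # prints 52.0
```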
Interpreting r and r^2:
Correlation Coefficient (r): comment on strength and direction in context (state both the explanatory and response variables)
Coefficient of determination (r^2): the % of the variation in the response variable (y) that is explained by the linear relationship with the explanatory variable (x)
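One way to see why r^2 is "the fraction of variation explained": compare the total variation in y around ybar (SST) with the variation still left in the residuals (SSE). The sketch below, on hypothetical data with a hypothetical helper named fit, shows that 1 - SSE/SST equals r squared for the LSRL.

```python
import math


def fit(xs, ys):
    """Return (a, b, r) for the LSRL yhat = a + b*x and the correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 3, 5, 4]
a, b, r = fit(xs, ys)

y_bar = sum(ys) / len(ys)
sst = sum((y - y_bar) ** 2 for y in ys)                 # total variation in y
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation left in residuals
r_squared = 1 - sse / sst                               # fraction explained by the LSRL

print(round(r_squared, 4), round(r ** 2, 4))            # prints 0.4808 0.4808
```

In context this reads: about 48% of the variation in y is explained by the linear relationship with x.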
Regression analysis
Is a line the best model?
-does the scatter plot seem to show a linear relationship?
-is the correlation moderate or strong?
Making a residual plot
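1. calculate the LSRL
2. calculate the residual for each observation. It is best to make a table to organize your results
3. graph the residuals (vertical axis) against the explanatory variable x (horizontal axis); a random scatter with no clear pattern suggests a line is a good model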
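The three steps can be sketched in Python as follows, assuming hypothetical data. The code prints the residual table from step 2 and builds the (x, residual) points to graph in step 3; the helper name lsrl is invented for this example.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least-squares line yhat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 3, 5, 4]

# Step 1: calculate the LSRL.
a, b = lsrl(xs, ys)

# Step 2: residual table (residual = y - yhat).
print(f"{'x':>3} {'y':>3} {'yhat':>6} {'resid':>6}")
residuals = []
for x, y in zip(xs, ys):
    y_hat = a + b * x
    resid = y - y_hat
    residuals.append(resid)
    print(f"{x:>3} {y:>3} {y_hat:>6.2f} {resid:>6.2f}")

# Step 3: points for the residual plot (x on the horizontal axis,
# residual on the vertical axis). LSRL residuals always sum to about 0.
points = [(x, round(resid, 2)) for x, resid in zip(xs, residuals)]
print(points)
```

If matplotlib is installed, `plt.scatter(xs, residuals)` followed by `plt.axhline(0)` (from `matplotlib.pyplot`) draws the residual plot with a reference line at 0.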