
Unit 1: Bivariate Data

Definitions:


Bivariate data: data on two variables measured on the same individuals; in this unit both variables are quantitative

Comparing two variables:

-does one variable cause change in the other?

-is there an explanatory-response relationship?

-are there lurking variables?

Relationships: a strong association between two variables, the explanatory and the response variable, can reflect any of several underlying relationships, including ones that involve lurking variables:

-causation

-common response

-confounding

Causation: changes in x (explanatory) cause changes in y (response)

Lurking variables: a variable that has an important effect on the relationship among the variables in a study but is not included among the variables studied

-may falsely suggest a strong relationship between x and y

-may hide a relationship that is there

-can increase variability (spread) in a study

-can create bias

Common response: both x and y respond to changes in some unobserved variable z

-changing one does not cause a change in the other; both change because of z

Confounding: the effect of x on y is mixed up with the effect of another variable z on y

Association vs. causation: an association between two variables (x, y), even if it is very strong, is not by itself good evidence that changes in x actually cause changes in y

-be cautious in accepting claims of causation

-best evidence of causation comes from an experiment

Coefficient of determination (r^2): the fraction of the variation in the values of y that is explained by the LSRL of y on x

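Residuals: a residual measures the vertical distance from a point in the scatterplot to the LSRL; it is positive when the point lies above the line and negative when it lies below

-residual = observed y - predicted y = y - yhat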


Scatterplots: used only to graph data that compare two quantitative variables measured on the same individuals

Graphing scatter plots on paper:

-label both axes with the variable and its units

-choose and label an appropriate scale for both axes

-scale must be consistent for each axis (no gaps)

-scale doesn't need to start at 0

-try to use the entire graphing space

Describing scatter plots:

Form: linear, exponential, or power

Direction: positive or negative

Strength: how closely the points follow the form

-Influential observations:

    -points that follow the general pattern but that are not near the general cluster of the data

    -if the observation (point) is removed, the form and/or direction of the model changes drastically

-Regression outlier:

    -a point or points that are far from the model in a vertical direction

    -regression outliers do not have to be influential but usually are

Describing scatterplots in the linear form:

-if the form is linear, the direction and strength of the relationship are measured by the correlation coefficient

-the correlation coefficient, sometimes just called the correlation, is abbreviated r

Facts about correlation:

-r does not require an explanatory/response relationship

-r has no units

-changing the units of the explanatory or response variable does not change r

-correlation is usually rounded to 2 decimal places
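The facts above can be checked numerically. Below is a minimal Python sketch that computes r with the standardized-score formula r = (1/(n-1)) Σ z_x·z_y using sample standard deviations. The height and weight values are hypothetical, and the `correlation` function here is a hand-written helper for illustration. The sketch shows that switching x and y, or converting inches to centimeters, leaves r unchanged.

```python
from statistics import mean, stdev

def correlation(x, y):
    """r = (1/(n-1)) * sum of z_x * z_y, using sample standard deviations."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data: height in inches (x) and weight in pounds (y)
height_in = [60, 62, 65, 68, 70]
weight_lb = [115, 120, 140, 155, 165]

r = correlation(height_in, weight_lb)
r_swapped = correlation(weight_lb, height_in)   # explanatory/response roles switched
height_cm = [h * 2.54 for h in height_in]       # change the units of x
r_cm = correlation(height_cm, weight_lb)

# Correlation is usually reported to 2 decimal places
print(round(r, 2), round(r_swapped, 2), round(r_cm, 2))
```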

Correlation examples:

r is always between -1 and 1: -1 ≤ r ≤ 1

r=1 perfect positive correlation

r=0 no linear correlation

r=-1 perfect negative correlation

You will rarely see perfect correlation in the real world.

Interpreting the correlation coefficient:

-include a descriptor of strength (see chart)

-include a descriptor of direction (positive or negative)

-write the solution in context (explanatory and response variables)

Least Squares Regression Line:

-requires an explanatory-response relationship

-used for predicting y values given an x value

-LSRL always passes through the point (x bar, y bar)

-describes how the response variable (y) changes as the explanatory variable (x) changes

-close connection between correlation and the slope of the LSRL

Equation of the least squares regression line (LSRL):

yhat=a+bx

-x is explanatory variable-actual value

-y is response variable-actual value

-y hat is predicted response variable-not an actual value

-slope: b=r(sy/sx)

-y intercept: a=ybar-b(xbar)

-when you are asked to state the LSRL always define the variables

Interpreting the LSRL:

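Slope: for each 1-unit increase in the explanatory variable (x), the predicted response (yhat) increases (if b > 0) or decreases (if b < 0) by the amount of the slope

Y-intercept: the predicted value of the response variable (y) when the explanatory variable (x) is 0; this is only meaningful if x = 0 is a sensible value near the range of the data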
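A small Python sketch of both interpretations, using a hypothetical fitted line yhat = 2.2 + 0.6x (not from real data). The `predict` helper is illustrative only.

```python
# Hypothetical fitted line (not from real data): yhat = 2.2 + 0.6x
a, b = 2.2, 0.6

def predict(x):
    """Predicted response yhat for a given explanatory value x."""
    return a + b * x

# Slope: each 1-unit increase in x changes the prediction by b
change = predict(4) - predict(3)

# Intercept: the predicted y when x = 0
at_zero = predict(0)

print(round(change, 2), round(at_zero, 2))  # prints 0.6 2.2
```

The slope describes the change in the predicted value yhat, not in any individual observed y, which is why interpretations should say "predicted".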

Interpreting r and r^2:

Correlation coefficient (r): comment on strength and direction in context (state both the explanatory and response variables)

Coefficient of determination (r^2): the percent of the variation in the response variable (y) that is explained by the least squares regression on the explanatory variable (x)

Regression analysis

Is a line the best model?

-does the scatter plot seem to show a linear relationship?

-is the correlation moderate or strong?

Making a residual plot

1. calculate the LSRL

2. calculate the residual for each observation. It is best to make a table to organize your results

3. graph the residual plot (residuals on the vertical axis, x on the horizontal axis)
