This is a Common Core State Standards Support Video in Mathematics. The standard is 8.SP.1. This standard states: Construct and interpret scatter plots for bivariate measurement data to investigate patterns of association between two quantities. Describe patterns such as clustering, outliers, positive or negative association, linear association, and nonlinear association.
This standard is more than likely the first experience most students have with this component of statistics, and by that we mean bivariate data. So it’s critical that students’ knowledge is built on a solid understanding of the key terms involved in this standard, and these would include bivariate data, scatter plot, clustering, outlier, positive association, negative association, and nonlinear association. So let’s take a look at these definitions. Bivariate data is data gathered for the purpose of finding the relationship between two variables by analyzing them together. There are two variables for each observation. A scatter plot is a two-dimensional graph on a coordinate plane where each observation is plotted as a data point, giving a visual picture of the relationship between the two variables in a set of bivariate data. Clustering is a situation where several data points are grouped together in the same general area of the graph.
An outlier is a data point that’s located far away from most of the data points. A positive association exists when the data points exhibit the pattern that when one variable increases, the second variable also increases, or when one variable decreases, so does the other; when the points also fall close to a straight line, the relationship will appear linear. A negative association exists when the data points exhibit the pattern that when one variable increases, the second variable decreases, or when one variable decreases, the other increases, so they do the opposite. Again, when the points fall close to a straight line, the relationship will appear linear. A nonlinear association would be one where the data points do not follow a linear pattern; they usually either follow a different type of path, such as a curve, or are widely dispersed.
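As a rough numeric companion to these definitions, the sketch below checks whether two variables tend to move together or in opposite directions. It sums the products of each point's deviations from the two means, which is not a method named in the video, and the data values and the function name `association_direction` are hypothetical, chosen only to echo the contexts mentioned in this video.

```python
from statistics import mean

def association_direction(xs, ys):
    """Classify the association by the sign of the summed deviation products.

    A positive sum means the variables tend to rise and fall together;
    a negative sum means one tends to rise as the other falls.
    """
    mean_x, mean_y = mean(xs), mean(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"

# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 80, 85]

# Hypothetical data: vehicle weight in pounds vs. miles per gallon.
weights = [2000, 2500, 3000, 3500, 4000]
mpg = [38, 33, 29, 24, 20]

print(association_direction(hours, scores))  # positive
print(association_direction(weights, mpg))   # negative
```

This check only reports direction. It says nothing about whether the pattern is linear, so the scatter plot is still needed to judge the shape.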
So what would be some contexts, what are some examples of bivariate measurement data? Well, one context would be where you’re wondering if the number of hours that a student studies has a relationship to the score they get on the test they were studying for. Another situation might be, is there a relationship between the age and the height of a person? Another context, ages of married couples: is there a relationship between the age of the husband and the age of the wife? Another context would be wondering if the weight of a vehicle and the gas mileage of that vehicle are related. What is the relationship? Another possibility would be, you have data on a number of car accidents and you’re wondering about the level of income of the drivers, if there’s a pattern there. And then another possible context would be wondering if there’s a relationship between, let’s say, the high or the low daily temperatures of two different cities.
The focus of this standard is to construct and interpret scatter plots for bivariate measurement data. So let’s do a few examples to see what these patterns look like. Let’s say, for example, that we’ve tracked the growth of a certain student, a boy, from age 2 to age 18. So here’s the data collected; then we do a scatter plot of our data points to see what the relationship might look like. And it definitely appears linear, and it is a positive association, because as the age increased, so did the height; there was growth. It is pretty linear, so on average, it looks like this student grew roughly 2 to 4 inches a year.
Now let’s continue with that same idea, but what about the heights of men who are already adults? So let’s say we go out and we collect data on some men who are in their forties. So we take the data, and we plot it, and here’s what it looks like. So it’s fairly obvious that this is a nonlinear association. There is no line to this; the points don’t follow a linear pattern. Now what about outliers? There do appear to be two of them. There’s one here and one here. Let’s see, that’s 61 inches, so this 46-year-old man is only about 5'1", and this 47-year-old man here is 79" tall, which is about 6'7". So those are two examples of outliers, because one man is much shorter than most and the other is much taller than most.
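The conversion from inches to feet and inches used for the two outliers can be checked with a short sketch. The function name `feet_and_inches` is made up for this illustration; the two heights, 61 and 79 inches, are the ones read off the plot above.

```python
def feet_and_inches(total_inches):
    """Convert a height in whole inches to a feet'inches" string."""
    # divmod gives the whole feet and the leftover inches in one step.
    feet, inches = divmod(total_inches, 12)
    return f"{feet}'{inches}\""

print(feet_and_inches(61))  # 5'1"
print(feet_and_inches(79))  # 6'7"
```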
This is just another example to illustrate a nonlinear association. This isn’t one where the data points are just widely dispersed. They do follow a pattern, but it isn’t along a straight line. So this isn’t a linear function; it will actually turn out to be a quadratic, but it is nonlinear.
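One way to see numerically that a pattern is curved rather than straight is to look at how much the second variable changes from one point to the next. The sketch below is only an illustration: the data from the video is not reproduced here, so the values are hypothetical, and the check assumes the first variable goes up in equal steps.

```python
def differences(values):
    """Return the change between each pair of consecutive values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

# Hypothetical, equally spaced values of the first variable.
xs = [0, 1, 2, 3, 4, 5]

# A linear pattern and a quadratic pattern built from those values.
linear_ys = [2 * x + 1 for x in xs]
quadratic_ys = [x * x for x in xs]

# A linear pattern changes by the same amount at every step.
print(differences(linear_ys))                  # [2, 2, 2, 2, 2]

# A quadratic pattern changes by a growing amount, so it is not linear...
print(differences(quadratic_ys))               # [1, 3, 5, 7, 9]

# ...but the changes themselves grow by a constant amount.
print(differences(differences(quadratic_ys)))  # [2, 2, 2, 2]
```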
Let’s take another context, and this is looking for a relationship between the ages of husbands and wives. So we go out, and we do a survey, and this is the data that we collect. We plot our data. We do our scatter plot, and this is what it looks like. So it definitely appears that there is a positive association, because if the wife is young, so is the husband, and if the wife is older, so is the husband. We also have some clustering here. Notice that, for whatever reason, the couples that were surveyed tended to be in their late twenties, mostly in their early thirties, and also in their forties. Now what about these points: this one out here, this one, this one, and these two over here; are those outliers?
One thing that we can do is draw a line, and this line is one that goes through the points where the age of the wife and the age of the husband are the same, such as 20 and 20, 30 and 30, 40 and 40, and so forth. Now we can have a little bit better idea of what’s going on here. This point here, and these two points here, they do follow the linear path. They looked like outliers, because they were far away from these other points, but they’re not outliers, so what we need to do is adjust our definition of an outlier a little bit. It’s not enough that a point is far away from the other points. Another condition is that it also has to not follow the pattern, so these three points here are not outliers, because they do follow the same pattern. For whatever reason in this sample, they just weren’t close to the ages of the other couples that were surveyed.
Now these two points are definitely outliers. They’re away from the other points, plus they are away from the pattern. They’re away from where the line would be. This point here is one where the wife is 47; the husband is 34. So the wife is quite a bit older than the husband. Over here, we have a situation where the husband is quite a bit older than the wife; he is 58, and she is 27.
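The "away from the pattern" part of this test can be sketched as a check on how far each point sits from the line where the two ages are equal. In the sketch below, only the pairs (47, 34) and (27, 58) come from the video; every other pair, the 10-year cutoff `MAX_AGE_GAP`, and the function name `off_pattern` are assumptions made for illustration. It checks only distance from the line, not distance from the other points.

```python
# Hypothetical survey data as (wife_age, husband_age) pairs.
# Only (47, 34) and (27, 58) are taken from the video; the rest are made up.
couples = [
    (20, 21), (28, 29), (29, 27), (31, 33), (32, 31), (33, 34),
    (44, 45), (46, 44), (62, 63),
    (47, 34), (27, 58),
]

# Assumed cutoff in years; the video does not state a numeric threshold.
MAX_AGE_GAP = 10

def off_pattern(points, max_gap=MAX_AGE_GAP):
    """Return points whose vertical distance from the line y = x exceeds max_gap."""
    return [(wife, husband) for wife, husband in points
            if abs(husband - wife) > max_gap]

print(off_pattern(couples))  # [(47, 34), (27, 58)]
```

Notice that made-up points such as (20, 21) and (62, 63) sit far from the clusters but are not flagged, because they stay close to the line.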
Let’s take another context. This is one where we are looking for the relationship between the weight of a vehicle and the gas mileage that it gets. So we go out and we do some research. This is the information that we found. We then plot our data to see what the pattern would be, and so here, we have a negative association. It is pretty linear, but it’s negative, because notice that as the weight of the vehicle increased, the mileage that it gets decreased. So again, this is an example of a negative association; if one variable increases, the other one decreases, or vice versa.
So this standard involves quite a bit of information that is probably new to a lot of students. Again the key is that they understand the definitions of all the terms involved, and then in turn, they know what these terms look like graphically when you do scatter plots on a two-dimensional plane.