Basics of Data Literacy
MAKE SENSE OF DATA
MICHAEL BOWEN
ANTHONY BARTLEY
Copyright © 2014 NSTA. All rights reserved. For more information, go to www.nsta.org/permissions.
Arlington, Virginia
Claire Reinburg, Director
Wendy Rubin, Managing Editor
Andrew Cooke, Senior Editor
Amanda O’Brien, Associate Editor
Amy America, Book Acquisitions Coordinator
NSTA is committed to publishing material that promotes the best in inquiry-based science education. However, conditions of
actual use may vary, and the safety procedures and practices described in this book are intended to serve only as a guide. Additional
precautionary measures may be required. NSTA and the authors do not warrant or represent that the procedures and practices in this
book meet any safety code or standard of federal, state, or local regulations. NSTA and the authors disclaim any liability for personal
injury or damage to property arising out of or relating to the use of this book, including any of the recommendations, instructions,
or materials contained therein.
Permissions
Book purchasers may photocopy, print, or e-mail up to five copies of an NSTA book chapter for personal use only; this
does not include display or promotional use. Elementary, middle, and high school teachers may reproduce forms, sample
documents, and single NSTA book chapters needed for classroom or noncommercial, professional-development use only.
E-book buyers may download files to multiple personal devices but are prohibited from posting the files to third-party servers
or websites, or from passing files to non-buyers. For additional permission to photocopy or use material electronically from
this NSTA Press book, please contact the Copyright Clearance Center (CCC) (www.copyright.com; 978-750-8400). Please access
www.nsta.org/permissions for further information about NSTA’s rights and permissions policies.
Cataloging-in-Publication Data for the e-book are available from the Library of Congress.
CONTENTS
Acknowledgments
Foreword
Section II: MORE ADVANCED WAYS OF COLLECTING, SHOWING, AND ANALYZING DATA
Chapter 8: Simple Statistics for Science Teachers: The t-Test, ANOVA Test, and Regression and
Correlation Coefficients
Chapter 9: Doing Surveys With Kids: Asking Good Questions, Making Sense of Answers
Chapter 10: Somewhat More Advanced Analysis of Survey Data
Afterword
Appendix III: Structure and Design of an “Ideal” Graph and Table
Resources
Index
ACKNOWLEDGMENTS
The idea for this book started developing 16 years ago when I was interviewing preservice
teachers and other science program graduates about their interpretations of graphs as
part of my PhD research. As I explored those issues further as a science methods instruc-
tor, I began creating resources and activities on this topic with my friend and colleague
Tony Bartley. We subsequently produced a workshop that we have presented many, many times
at NSTA conferences over the last decade. At those workshops, we gained more insights into the
data literacy issues confronting teachers, in part from the many comments we collected from
participants about resources they could use when doing investigations with their own students.
The content and approach used in this book arose from those observations. At those meetings
and others, Tony and I sat and hashed out the various activities, use of language, and explana-
tions offered here.
My own interests in data and representations of it began during my undergraduate studies
in 1982 when I took a research methods course with Hank Davis in the psychology department
at the University of Guelph. The seed that Hank planted at that time in his long and productive
career may be one that took the longest to flower, but I am glad that his efforts with me in that
most interesting (and somewhat bizarre) class 30 years ago have borne such fruit. Thanks, Hank.
Then, my MS supervisor, John Sprague, pushed me into doing multidimensional modeling as
part of my research in behavioral toxicology, and I was fortunate that software tools for the
newly developed personal computers allowed me to play with graphical representations of data
in his laboratory in ways that hadn’t been possible even a few years earlier. That work was
influential during my PhD research in science education because it gave me insights into how
individuals gain competency in working with data. John Haysom, my science methods instruc-
tor (and now author of numerous NSTA Press books), further pushed me to figure out ways to
develop children’s interest in science investigations and, additionally, how to make abstract rela-
tionships more “real” to them by developing hands-on activities to help students experientially
understand those relationships. Finally, my main academic influence has been Wolff-Michael
Roth at the University of Victoria. My PhD work with him helped cast light (for me and others)
on the role of graphical and tabular representations in science and how individuals at various
educational levels gain understandings of (abstract) science concepts from those. He continues to
push boundaries in these areas, and I admire his tenacity at teasing out the details of how under-
standing of science evolves. His academic achievements and his friendship have tremendously
impacted my work on this project.
Finally, I would like to thank the many teachers and students who—through support, com-
ments, advice, suggestions, and participation in hands-on activities with us—have helped gener-
ate this book.
It is my, and our, profound hope that you find this book useful for developing your students’,
and your own, understandings of how to work with data.
Mike Bowen
Halifax, Nova Scotia
Acknowledgments
My journey in science education started in Liverpool (England!) many
years ago when I completed a postgraduate certificate in Education at
Saint Katherine’s College of Education (now Liverpool Hope Universi-
ty). It was there I realized that most students I would meet in schools had
very different views about their own education and where science fit in. I became a
physics and chemistry teacher, first in Staffordshire, then West London, and finally
in Kent; by extension, I also became a teacher of mathematics because we worked
with data and problem sets.
Many of the people I worked with in England have now retired, but I remember both
their mentorship and collegiality: Geoff Morris of the Ounsdale School (Wombourne),
Hamish Miller of Christ’s School (Richmond-upon-Thames), and Rick Armstrong of
the Eden Valley School (Edenbridge) all deserve a mention, and thanks.
I moved to Canada in 1985 and taught in Victoria, British Columbia, from 1986 to
1989. In 1989 the University of British Columbia beckoned me for a PhD; it was here
that I was lucky enough to work with Gaelen Erickson, Bob Carlisle, Jim Gaskell, and
Dave Bateson as the home faculty members; Peter Fensham, Rosalind Driver, Cam
McRobbie, and Ruth Stavy were notable visitors; and Tony Clarke and René Fountain
were magnificent in their support as they too completed the doctoral journey.
I’m now approaching my 20th year at Lakehead University in Thunder Bay,
Ontario. Mike was here for 5 of these years, and we have an enduring friendship
through a broad range of collaborations in science education and other overlapping
interests. My colleague now and for the last 8 years at Lakehead has been Wayne
Melville, whose support for open inquiry has been consistent and strong; it helps
us both that we have one of the strongest schools on the continent, Sir Winston
Churchill Collegiate and Vocational Institute, just a few miles away. My apprecia-
tion as a university-based educator for the school-based support from the Churchill
science department in learning science through inquiry, and its chair Doug Jones,
cannot be overstated.
I have enjoyed this writing project and have learned a lot as Mike and I worked
through our approaches over the last few years. I hope that this book and its ideas
work for you and your students.
A. W. Bartley
Thunder Bay, Ontario
FOREWORD
If you are a scientist or a science student, data literacy matters because it helps
you make sense of information you’ve collected in lab investigations. But most
students aren’t going to be scientists, so why should developing data literacy be
important? Isn’t it enough to get them to know science concepts, remember facts
and patterns, and draw graphs on tests? Where else would they encounter scientific
data other than in a laboratory?
Although it may not seem like it, we are surrounded by data. When you open
the newspaper and see a graph or a table as part of an article, what you’re looking
at is data. When you listen to news on the television or radio, what you’re hearing
are conclusions drawn from data someone else has collected. And they’ve collected
that data to understand something, argue a position, make a point, or persuade the
listeners to adopt a particular view. Some of these arguments are better than others
because the data have been collected, analyzed, or summarized more effectively. This
book is about understanding what good data and data analysis are so that you can
make stronger arguments and better evaluate the arguments of others. It’s impor-
tant to realize that everyone has an agenda of some sort, and being more data literate
helps you understand if others are making a fair argument.
Part of being able to take a more informed (some might say skeptical) view of
data is being literate in how data are manipulated and subsequently presented: how
they are collected, made into tables, and shown in pictures or graphs. Once you
know how to do this the right way, such as you might learn in a science classroom,
you can start asking if someone else is doing it in a way that is fair, or if they are
distorting the data for their own purposes.
Data literacy is important for your students even if they aren’t going to be scientists
because data are used to argue and persuade people to, among other things, vote for
political agendas, support specific types of spending within organizations, sell life
insurance, or lease a car. An improved understanding of data practices means that
better questions can be asked in all of these situations.
Even in everyday life, data collection can be important. Bakers often keep diaries
when they’re learning how to bake a new type of biscuit. Gardeners keep a log about
the growth of their gardens, and birdwatchers keep track of where and when they
see what types of birds and what the weather conditions were. Drivers keep track of
vehicle mileage, and homeowners keep track of their electrical bill month to month.
This is all real-life data. We could go on with examples like this forever, but now you
can probably think of some data that you keep track of.
The point is, data literacy is an important skill to develop in students, and science
classrooms are a good place to do that because data collection and interpretation
are part of the science curriculum in most jurisdictions. Almost every teacher has
faced the challenge of helping students make sense of some data set; many times,
that teacher has sat there, scratched his or her head, and wondered how to help
the students make sense of the data they collected. In science, there are some fun-
damental concepts that help scientists make sense of data, particularly the messy
data found in the real world, and yet these fundamentals are infrequently taught
in undergraduate science courses. Teachers who have their students do inquiry lab
investigations can face data analysis challenges, even in a middle school science
class, that exceed what they learned in their college science courses.
Learning about how to analyze and make better sense of data also helps you learn
the best way to collect data. And learning how to collect, summarize, and analyze
data is a very important science skill, central to the newly released Next Generation
Science Standards (NGSS).
Lab investigations used to be pretty simple and straightforward (i.e., “cookbook
labs”): The teacher provided a clear set of instructions; the students all engaged in
the same activity, followed the same procedure, and were marked on getting the
same “correct” answer. Then inquiry investigations came along, and classroom
investigations got a lot more difficult. Many of us teachers didn’t have a background
sufficient for helping our students do those types of inquiry investigation activities.
The contrast between the two different types of lab activities could not be starker
(Table F.1).
TABLE F.1
In middle school, students would learn the use and justification of some of the
standard techniques for displaying, analyzing, and interpreting data, including
THE NGSS
The table below is taken from the NGSS (p. 9, Appendix F; NGSS Lead States 2013);
it clearly shows the significance of data literacy and the related progressions that
have been developed at the state standard level (Figure F.1).
FIGURE F.1
• formulate questions that can be addressed with data and to collect, organize,
and display relevant data to answer them;
• select and use appropriate statistical methods to analyze data;
• develop and evaluate inferences and predictions that are based on data; and
• understand and apply basic concepts of probability.
The above information on science and math standards was obtained directly from
the documents listed in the Reference section (and as such much of the text is a direct
copy from those documents).
References
National Council of Teachers of Mathematics (NCTM). 2000. Principles and standards for
school mathematics. Reston, VA: National Council of Teachers of Mathematics.
National Governors Association Center for Best Practices and Council of Chief State School
Officers (NGAC and CCSSO). 2010. Common core state standards. Washington, DC:
NGAC and CCSSO.
National Research Council (NRC). 2012. A framework for K–12 science education: Practices,
crosscutting concepts, and core ideas. Washington, DC: National Academies Press.
NGSS Lead States. 2013. Next Generation Science Standards: For states, by states.
Washington, DC: National Academies Press. www.nextgenscience.org/next-generation-science-standards.
CHAPTER 8
SIMPLE STATISTICS FOR SCIENCE TEACHERS: THE t-TEST, ANOVA
TEST, AND REGRESSION AND CORRELATION COEFFICIENTS
In earlier chapters (Chapters 3 and 4), we looked at patterns in nominal data using bar graphs
and ordinal data using line graphs. We “eyeballed” differences from those graphs, looking at the
size of the circles drawn around the tics, and drew conclusions that we discussed using hedging
language. But is that what scientists do?
Actually yes, in the early stages of their research or as the research is progressing, but it’s
not how they write their final reports. Those final reports often contain statistical analyses
that allow the scientist to state with more certainty what differences and patterns they have
found in their data. Remember what we mentioned earlier? That science was a probabilistic
endeavor? Well, part of it being probabilistic is that scientists want to state with as much
certainty as possible what the patterns and relationships are that they are looking at. Using
statistics helps scientists improve the certainty of their statements so they can be as precise
as possible.
In this chapter, we’re going to look at three basic statistical tests. The first is the t-test,
which is used when you have nominal or ordinal data and only two test variables you are comparing
(e.g., the speed of cats and dogs). The second is the analysis of variance (ANOVA) test for when
you have nominal or ordinal data and more than two test variables (e.g., the speed of cats and
dogs and pigs). The third statistical analysis is correlation and regression analysis, which is
for interval-ratio data when you are comparing two things you have measured (e.g., the amount of
salt in the pot and how long it takes potatoes to cook). These three basic tests cover most of
the types of inquiry studies we’ve seen grade 7–12 students conduct. There are tests for more
complicated designs, but this is a basic introduction, and understanding these will help you
understand more complicated designs if you need to.
In a book like this, we should probably mention why this chapter is here. Statistical tests
don’t seem very basic, do they? We agree that they’re not; however, we’ve seen projects by
grade 7 students (at science fairs) in which they used t-tests and could describe how the tests
worked and why they used them. Correctly, we should add. We’ve seen ANOVA tests used by grade 10
students in the same settings, and by grade 12 students as part of inquiry investigations in
their regular classes. As a teacher you never know when you’re going to have that hyper-keen
student in grade 8, so we thought you might appreciate having some resources to help you work
with them. If nothing else, the worksheets we provide will give you something to give them to
enhance their learning when they’ve raced ahead of the rest of the class. Besides that, this
chapter might help you understand some of those mail-outs from boards of education with
statistics in them that most of us have trouble making heads or tails of.
In Appendix IX we provide three resources: worksheets that do a step-by-step calculation of each
of these types of statistical analysis, critical value tables that let you determine if there
are statistically significant differences, and a worked-through example for each test from data
used in previous chapters in this book. We’ll also mention that in the Resource section
(Appendix VI) there are links to websites that also conduct these tests if you insert the data
into them.1 This chapter is an
1. We also intend to provide a resource page with analysis
tools at the NSTA Press website for the book.
FIGURE 8.2
[Graph; x-axis categories: Gray, White]
[Graph; x-axis: Thermometer Bulb Color, with categories Gray and White]
Here, you’ll notice that on the left bar the data are not close to the horizontal line at all;
the horizontal line is at the average of two separate clusters. This condition violates
assumption 2 because the data are not normally distributed (i.e., more toward the horizontal
line than away from it).
The more your data looks like the last two graphs (and data could look like a combination of
both of them), the less likely it is that the results of the t-test are reliable. In that
situation you might hedge how you phrased your interpretation of the data analysis. For
instance, if you found a significant difference (as we will describe below), you could write,
“Despite finding a significant difference between the mean temperatures for the gray and white
thermometers, there is still some room for doubt because of the amount of variation in the data
for the white thermometer, which was much greater than that for the gray thermometer.”
TABLE 8.1
Degrees of   Critical   Degrees of   Critical
freedom      value      freedom      value
4            2.78       15           2.13
5            2.57       16           2.12
6            2.45       18           2.10
7            2.37       20           2.09
8            2.31       22           2.07
9            2.26       24           2.06
10           2.23       26           2.06
11           2.20       28           2.05
12           2.18       30           2.04
13           2.16       40           2.02
14           2.15       60           2.00
                        120          1.98
If the t-statistic you calculated is less than the critical value in the table above (for the
correct degrees of freedom, which you calculate on the worksheet) then the difference between
the two means is not statistically significant.
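As a rough illustration of the comparison described above, the following Python sketch computes a t-statistic for two small samples and checks it against a critical value from Table 8.1. The temperature readings are invented for illustration (they are not the book's thermometer data), and the pooled-variance formula with n1 + n2 - 2 degrees of freedom is an assumption; the Appendix IX worksheet may lay out the calculation differently.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical temperature readings in degrees Celsius (invented for this sketch).
gray = [24.0, 25.0, 26.0, 25.0, 25.0]
white = [21.0, 22.0, 23.0, 22.0, 22.0]


def pooled_t(a, b):
    """Return (|t|, degrees of freedom) for a two-sample t-test with pooled variance."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Weighted average of the two sample variances (the "data scatter" in each group).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return abs(t), df


# A few 5% critical values copied from Table 8.1, keyed by degrees of freedom.
CRITICAL_5_PERCENT = {4: 2.78, 8: 2.31, 10: 2.23, 20: 2.09}

t_stat, df = pooled_t(gray, white)
significant = t_stat > CRITICAL_5_PERCENT[df]
print(f"t = {t_stat:.2f}, df = {df}, significant at 5%: {significant}")
```

With these made-up readings there are 8 degrees of freedom and the t-statistic is well above the critical value of 2.31, so the difference would be reported as statistically significant at 5%.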
If the calculated t-statistic is greater than the critical value in the table above (for the
correct degrees of freedom) then the difference between the two means is statistically
significant at 5%. This means we’re 95% confident that the difference between the means is a
real one (i.e., not due to chance).
The ANOVA Test
The ANOVA test is used when you have several different treatments you are testing (in other
words, more than two treatments). Sometimes people do multiple t-tests instead of an ANOVA. This
is bad, bad, bad. Very bad. Ghostbusters bad. Why? Because you considerably increase the
likelihood that you’ll report a statistically significant difference when there is not one. All
of those “only 1 in 20 chances of being wrong” possibilities add up so it becomes much more
likely that you’re wrong. An ANOVA test stops that from happening.
Note that the treatments (or categories, or groups) have to be either nominal or ordinal
categories, and you have to have collected measured data about them.
If, for instance, you measured how fast three different breeds of dogs (with increasing sizes)
could run 30 m, then that would be the type of data you would do an ANOVA test on. You have
ordinal categories, and you have the times it took to cover the distance.
But why would you?
You can see the differences in the graph, can’t you?
Well, an ANOVA test allows you to figure out if the mean times to run 30 m are different enough,
given the way the data are scattered about the mean, to say with certainty that the breeds of
dogs can run at different speeds. It might seem a bit odd to “test” this, because you can see
that the means look different on a graph, but scientists care about how much data scatter there
is too, which is why these statistical tests were created! And doing an ANOVA test allows you to
be more convincing when making arguments about your findings to others (that’s why scientists do
statistical analyses: They remove some of the personal bias that might influence their
interpretations, so they become more convincing with their claims).
First, it’s important to note that the ANOVA test has some conditions that must be met (just as
the t-test did; in fact, they’re basically the same conditions, so you can look at the graphical
examples from the t-test if you need to). Here are the conditions that must be met to perform an
ANOVA test:
1. The data scatter is reasonably the same for all of the categories (in statistical terms, the
variation is close to the same).
2. There is more data scatter toward the middle of the circles than at the nearest and farthest
points away from the middle (in statistical terms, the data has a reasonably normal
distribution).
3. The data are randomly chosen (in statistical terms this means you didn’t choose only data to
include so that you showed only what you wanted to show).
4. The replicates in the treatments need to be independent of each other (one treatment cannot
be influencing another).
We’ve been using the phrase data scatter to discuss how the raw data spreads out around the
mean, but the more correct term is variance. So, an ANOVA test really is (ready for it?) an
ANalysis Of VAriance. An ANOVA test analyzes the variance around each of the means and the
overall variance to figure out how certain you can be about whether the means are different from
each other.
Okay, so let’s say that we’ve collected data looking at how fast the three different dog breeds
can run a 30 m distance, and we’ve tested five different dogs of each breed (Table 8.2). That
data would give us a graph that looks like Figure 8.5.
TABLE 8.2
Time in seconds that different dog breeds take to run 30 m
Dog     Poodle    Labrador    Doberman
1       14        17          8
2       13        10          9
3       13        16          6
4       15        8           8
5       17        9           7
Avg.    14.4 s    12 s        7.6 s
FIGURE 8.5
The time it takes different dog breeds to run 30 m, ordered by dog size
[Graph: “Time Different Breeds of Dogs Take to Run 30 m”; y-axis: # of Seconds to Travel 30 m
(0 to 18); x-axis: Type of Dog (by Size): Poodle, Labrador, Doberman]
spreadsheet. That’s because all of the data are on it, not just the average of the times each
breed ran 30 m. Because you can see the data scatter around each average (at the dot where the
line is), you can get a bit of an idea about what an ANOVA test does. Basically, it compares the
data scatter around each mean and the overall data scatter and where each mean is and figures
out if the means are different from each other.
Essentially, the ANOVA test is analyzing how much the data you see on the graph overlap in
relation to the total amount of data scatter. If the data do not overlap much, then the means
are probably different from each other. If they overlap a lot, in relation to the total amount
of data scatter, then the means probably are not different from each other. The graph in Figure
8.6 might help you picture this.
FIGURE 8.6
The time it takes different breeds of dogs to run 30 m, ordered by dog size, with an arrow
depicting the overall range of response and the gray area depicting where the data overlap for
the pairs of variables
[Graph: “Time Different Breeds of Dogs Take to Run 30 m”; y-axis: # of Seconds to Travel 30 m
(0 to 18); x-axis: Type of Dog (by Size): Poodle, Labrador, Doberman; annotations: Total Range
of Data, Overlap of Data]
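The comparison that an ANOVA test makes, between the scatter around each mean and the overall scatter, can be sketched in a few lines of Python using the Table 8.2 data. This is a minimal sketch of the standard one-way ANOVA F calculation, not a copy of the Appendix IX worksheet. The function names are hypothetical, and the critical value of 3.89 (for 2 and 12 degrees of freedom at 5%) is taken from a standard F table rather than from the book. The second function illustrates, assuming the comparisons are independent of each other, how the “1 in 20” chances of being wrong accumulate when several t-tests are run instead.

```python
from statistics import mean

# Times in seconds to run 30 m, copied from Table 8.2.
times = {
    "Poodle": [14, 13, 13, 15, 17],
    "Labrador": [17, 10, 16, 8, 9],
    "Doberman": [8, 9, 6, 8, 7],
}


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Scatter of the group means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Scatter of the raw data around each group's own mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def familywise_error(comparisons, alpha=0.05):
    """Chance of at least one false positive across independent tests (assumed)."""
    return 1 - (1 - alpha) ** comparisons


# Assumed 5% critical value for F with (2, 12) degrees of freedom, from a standard table.
F_CRITICAL_5_PERCENT = 3.89

f_stat, df_b, df_w = one_way_anova_f(list(times.values()))
significant = f_stat > F_CRITICAL_5_PERCENT
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, significant at 5%: {significant}")
print(f"Error rate for three separate t-tests: {familywise_error(3):.1%}")
```

For the dog data the F value comes out at about 8.26, which is larger than 3.89, so the means differ significantly overall. Three separate t-tests, by contrast, would carry roughly a 14% chance of at least one false alarm rather than 5%.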
bit off (as it is with the Labrador) and the test But what if you do find statistical significance
will still be valid. for the whole data set after doing an ANOVA
However, it gets a bit more complicated test? Well, then looking at the graph to help fig-
drawing conclusions from the ANOVA test ure out the paired means is completely valid.
compared to the t-test. In the t-test, there was Let’s look at our data in Figure 8.5 again.
only one pair of data, and we knew that the How would we analyze it? Let’s assume that
significant difference was just between that our ANOVA was significant. Now we have
pair. But in the ANOVA, if the means are found to figure out the differences between pairs of
to be statistically significant from each other data. We should probably look at the amount
overall, we still don’t know which means were of overlap.
statistically significant from each other; we can- When you do this, you note that the lack of
not assume that they all were. In this dog study, overlap of the two gray areas (one drawn across
for instance, we know that there is a statistical covering the poodle data, the other drawn
difference between means (see Appendix IX across covering the Doberman data) probably
for a worked-out example) but not whether means that the average times of poodles and
Dobermans are faster than poodles, Labradors Dobermans are significantly different from
are faster than poodles, or Dobermans are each other. (Note the use of hedging language
faster than Labradors (the three possible pair in that statement? Also note that we haven’t
comparisons). used the word statistically because we don’t
There are tests, called post hoc (meaning know about that specific pair statistically since
after) tests, which can be done for this, but we haven’t run a statistical test.) However,
they’re complicated enough that we’re not there was so much variation in the Labrador
going to include them here.3 data that it’s difficult to draw any strong
That does not, however, mean that you conclusions about the differences in the mean
cannot draw conclusions—we can look at the times of the different breeds. The average times
graph. The first important point is this: if you for the Labradors and the Dobermans were
do an ANOVA test and do not find a statisti- pretty far from each other, and the time data
cally significant result for the whole data set, of each breed only overlapped a little, so the
then it does not matter what the graph looks mean times are quite possibly different from
like—how far apart the means are—because each other (so, significantly different from each
there is no statistically significant difference, and other). However, the time data for the poodles
that result means a whole lot more than any and the Labradors overlapped enough that it’s
eyeballing differences. We’re emphasizing this possible that the means for those breeds are not
point because even undergraduates in science different from each other—or in other words,
have difficulty understanding this. No statisti- that there is no difference between the means
cal significance means, wait for it, waaiiittt for for the poodle and the Labrador dogs. So, from
it … no statistical significance … no difference an inspection of the data scatter on the graph it
between means. Just what it says. Okay? None. would be safe to conclude that
57
Copyright © 2014 NSTA. All rights reserved. For more information, go to www.nsta.org/permissions.
CHAPTER 8
• Dobermans are possibly faster than Labradors; and
• poodles might be the same speed as Labradors.

Without statistical testing this is a qualitative determination and therefore hedging language is used for all of the pair comparisons.4

4. A Tukey’s test at 5% indicates that there is a significant difference between the poodle and Doberman means, but no difference between the poodle-Labrador or Labrador-Doberman mean times. This reflects the broad scatter in the Labrador times.

Again, with an ANOVA analysis these differences would have a percentage certainty, or likelihood of error, associated with them just like the t-test, and in the tables provided with the worksheets (see Appendix IX) there is a 95% certainty in your answer (of statistical significance of differences in the entire data set), or a 5% possibility of error.

Correlation and Regression Analysis
This type of data analysis is done on interval-ratio measures for which you want to find out if one factor (or variable) changes when another one does. Basically, when you have a graph of data, the regression analysis (or line of best fit analysis) is determining what the best average line is through the data set, and the correlation coefficient analysis is a measure of just how good that average is (i.e., how much the data are scattered about that line). This calculation is not a significance test (as the t-test and ANOVA test were), so you’re not determining whether the slope of the line of best fit is significantly different from something else.

A correlation coefficient is a calculated statistic representing how close the data points are to the line of best fit. If you multiply the correlation coefficient by itself (see the worksheet in Appendix IX), then you obtain a value that tells you the percentage of variation in variable y as explained by variable x (if the relationship is causal). The closer this value is to 1 (or 100%, since you often multiply the product by 100), the closer the values are to the line. In Figure 8.7 (a–c), you see three lines of best fit with different amounts of data scatter around them.

FIGURE 8.7
Examples of different types of scatterplot relationships
[Four scatterplot panels labeled a, b, c, and d.]

In this example, (a) would have an r-squared value close to 100%. At the other extreme, (d), any plotted line of best fit would have an r-squared value close to 0% (in other words, there is no relationship between the two variables). There are no hard and fast rules in science as to how much of an r-squared value is needed to talk about relationships between the variables. It can vary quite considerably depending on the circumstances. However, the r-squared value does give you guidance as to how you should be using hedging language to talk about the relationship between the variables.

Remember, in Appendix IX we provide worksheets for doing t-tests and ANOVA tests as well as worked-through examples. Appendix VIII also demonstrates a t-test analysis.

As a conclusion to this chapter we are going to provide an example of a correlation analysis in the form of a case study. In this case study, you’ll find a student report on an investigation and then a teacher’s feedback on that report.
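The relationship between the correlation coefficient, r-squared, and the line of best fit described in this section can be sketched in a few lines of Python. This is only an illustration, not the Appendix IX worksheet: the function names and the two small data sets are made up for the example, and the line of best fit is assumed to be the usual least-squares line.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def best_fit_line(xs, ys):
    """Slope and intercept of the least-squares line of best fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data (not from the book).
xs = [1, 2, 3, 4, 5]
tight_ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # little scatter, like panel (a)
loose_ys = [5, 1, 9, 2, 6]             # lots of scatter, like panel (d)

for label, ys in [("tight", tight_ys), ("loose", loose_ys)]:
    r = pearson_r(xs, ys)
    slope, intercept = best_fit_line(xs, ys)
    # r multiplied by itself, times 100, is the r-squared percentage.
    print(f"{label}: y = {slope:.2f}x + {intercept:.2f}, "
          f"r = {r:.2f}, r-squared = {100 * r * r:.0f}%")
```

The tight data set gives an r-squared value close to 100% and the loose one a value close to 0%, which is the contrast between panels (a) and (d) in Figure 8.7.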
Because I had x-y data, I graphed it in a scatterplot so that I could see any pattern better (Figure 8.8).

FIGURE 8.8
A scatterplot of the data shown in Table 8.3, p. 59
[Scatterplot titled “Comparing Guinea Pig Pellet Eating With Sleep”: Minutes Sleep/24 h (y-axis, 100 to 500) against Grams of Pellets (x-axis, 30 to 90).]

I also calculated the regression formula and correlation coefficient using the worksheets you gave us, so I knew how good my calculated line of best fit represented the data.

Regression formula: y = 2.7x + 145
Correlation coefficient: 0.23 or 23%

CONCLUSION
I only have one guinea pig, so I can’t say anything about all guinea pigs, but I can say that when mine eats more it seems to sleep longer, at least reasonably often. The correlation coefficient is only 23%, which means that the line doesn’t fit the data really well, but those two high amounts of sleep on the top left of the graph might have made it weaker. Maybe I should calculate the regression line and the correlation coefficient without them because when I draw the line on the graph from the regression formula the line seems kind of high. I would do the study with more guinea pigs if I had them because maybe my pig isn’t normal and doesn’t sleep in the same way as others.

INSTRUCTOR FEEDBACK TO STUDENT
You did a good job studying your guinea pig and figuring out the relationship between the amount of food and the amount of sleep. You wrote about it really well. You described how strong the relationship is (shown by your correlation coefficient) quite effectively by using the hedging language we’ve talked about in class. You’re right, the relationship between the pellet consumption and the amount of sleep each day isn’t a strong one (as indicated by the 23% value), but it is there. We haven’t talked about this in class, but in physics that number might be really low, but when you’re describing animal behaviour and many other things, a 23% correlation is actually really good. It means you’re predicting 23% of what an animal is doing. I also think you’re right by the way, if you excluded those two values on the top left of your page (who knows why your pig slept longer on those days: maybe it ate more hay than normal, or maybe it ran on its wheel more than it normally does) then your correlation would be much higher. When I exclude them and calculate your correlation coefficient it jumps up to 53%, and for animal behaviour that is really high.

I have a question for you: Why are you sure that it is the pellet consumption that is causing the amount of sleep? We did talk about the difference between correlation and causation in class. On the one hand, it does seem reasonable. I’m always sleepy after a big dinner. On
they have alarm clocks? What do you think wakes them up? So how would we look at this? If it were the length of the day, you’d think your pig would eat more later in your study than earlier and would sleep more later in your study than at the beginning. Let’s graph this (Figure 8.9).

[Figure 8.9: two scatterplots against Days (x-axis, 0 to 40). First panel: Grams of Pellets/24 h (y-axis, 0 to 70). Second panel, “Guinea Pig Sleep by Days”: Sleep/24 h (minutes) (y-axis, 0 to 500).]

So, do you see that? The food consumption goes up over the 30 days of your study and the sleep also goes up over those 30 days. So maybe the amount of daylight is affecting both how much your pig sleeps and how much it eats. This probably means that the amount of pellets consumed is correlated with the amount of sleep, but not the cause of the amount of sleep. Other than missing that (which even I admit was pretty tricky), your study and your report were both well done.

If you want to test whether it was daylight that had an effect, you could do your study again in the spring when the length of daylight is getting longer and see if there was a decrease in the amount of time your guinea pig slept. Let me know what you find out.
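The two checks the instructor describes (recalculating the correlation without the two unusual days, and seeing whether both measurements simply rise over the days of the study) can be sketched as follows. The numbers are invented for illustration and are not the data from Table 8.3; the helper names pearson_r and drop_points are hypothetical, and the percentages are r-squared values, which is how the report uses the word "correlation."

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def drop_points(xs, ys, indices):
    """Return copies of xs and ys without the positions listed in indices."""
    kept = [i for i in range(len(xs)) if i not in indices]
    return [xs[i] for i in kept], [ys[i] for i in kept]


# Made-up 10-day log: pellets in grams per 24 h, sleep in minutes per 24 h.
days = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pellets = [40, 42, 45, 44, 48, 50, 53, 52, 56, 58]
sleep = [200, 300, 215, 310, 230, 240, 255, 250, 270, 280]

# Check 1: pellets versus sleep, with and without two unusually long sleeps
# (days 2 and 4, which sit at the top left of a pellets-sleep scatterplot).
r_all = pearson_r(pellets, sleep)
trimmed_pellets, trimmed_sleep = drop_points(pellets, sleep, {1, 3})
r_trimmed = pearson_r(trimmed_pellets, trimmed_sleep)
print(f"pellets vs. sleep, all days:     r-squared = {100 * r_all ** 2:.0f}%")
print(f"pellets vs. sleep, two excluded: r-squared = {100 * r_trimmed ** 2:.0f}%")

# Check 2: does each measurement also rise with the day of the study?
# If both do, something that changes over time may be driving both.
print(f"days vs. pellets: r = {pearson_r(days, pellets):.2f}")
print(f"days vs. sleep:   r = {pearson_r(days, sleep):.2f}")
```

Positive values in the second check only show that both measurements drift upward over the study. They cannot say what is driving the drift, which is why repeating the study in a different season is still needed.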
“Part of being able to take a more informed (some might say skeptical) view of data is
being literate in how data are manipulated and subsequently presented: how they are collected, made
into tables, and shown in pictures or graphs. Once you know how to do this the right way, such as you
might learn in a science classroom, you can start asking if someone else is doing it in a way that is
fair, or if they are distorting the data for their own purposes.”
—From the foreword to The Basics of Data Literacy
Authors Michael Bowen and Anthony Bartley have long known how important data literacy
is to informed citizens. But after years of leading workshops on data literacy, they saw just
how intimidated teachers can be at the prospect of helping students make sense of data
sets they have collected.
In response, Bowen and Bartley wrote this guide—the ideal book for teachers with
little or no statistics background. With its informal tone and easy-to-grasp examples,
The Basics of Data Literacy teaches you how to help your students collect,
summarize, and analyze data inside and outside the classroom. This book helps
you understand how to make sense of data in a way that
• is conceptually grounded in hands-on practices,
• reflects the ways scientists use and make sense of data, and
• extends the ways of understanding to simple statistical analysis.
PB343X
Grades 6–12 ISBN: 978-1-938946-03-5