Basics of Data Literacy

HELPING YOUR STUDENTS (AND YOU!)
MAKE SENSE OF DATA
MICHAEL BOWEN
ANTHONY BARTLEY

Copyright © 2014 NSTA. All rights reserved. For more information, go to www.nsta.org/permissions.
Arlington, Virginia

Claire Reinburg, Director
Wendy Rubin, Managing Editor
Andrew Cooke, Senior Editor
Amanda O’Brien, Associate Editor
Amy America, Book Acquisitions Coordinator

Art and Design


Will Thomas Jr., Director
Rashad Muhammad, Graphic Designer

Printing and Production


Catherine Lorrain, Director

National Science Teachers Association


David L. Evans, Executive Director
David Beacom, Publisher

1840 Wilson Blvd., Arlington, VA 22201


www.nsta.org/store
For customer service inquiries, please call 800-277-5300.

Copyright © 2014 by the National Science Teachers Association.


All rights reserved. Printed in the United States of America.
17 16 15 14   4 3 2 1

NSTA is committed to publishing material that promotes the best in inquiry-based science education. However, conditions of
actual use may vary, and the safety procedures and practices described in this book are intended to serve only as a guide. Additional
precautionary measures may be required. NSTA and the authors do not warrant or represent that the procedures and practices in this
book meet any safety code or standard of federal, state, or local regulations. NSTA and the authors disclaim any liability for personal
injury or damage to property arising out of or relating to the use of this book, including any of the recommendations, instructions,
or materials contained therein.

Permissions
Book purchasers may photocopy, print, or e-mail up to five copies of an NSTA book chapter for personal use only; this
does not include display or promotional use. Elementary, middle, and high school teachers may reproduce forms, sample
documents, and single NSTA book chapters needed for classroom or noncommercial, professional-development use only.
E-book buyers may download files to multiple personal devices but are prohibited from posting the files to third-party servers
or websites, or from passing files to non-buyers. For additional permission to photocopy or use material electronically from
this NSTA Press book, please contact the Copyright Clearance Center (CCC) (www.copyright.com; 978-750-8400). Please access
www.nsta.org/permissions for further information about NSTA’s rights and permissions policies.

Library of Congress Cataloging-in-Publication Data


Bowen, Michael, 1962-
The basics of data literacy : helping your students (and you!) make better sense of data / Michael Bowen, Anthony Bartley.
pages cm
Includes bibliographical references and index.
ISBN 978-1-938946-03-5 -- ISBN 978-1-938946-76-9 (e-book) 1. Science--Study and teaching. 2. Mathematics--Study and
teaching. 3. Information literacy--Study and teaching. 4. Graphic methods. 5. Science--Tables. 6. Mathematics--Tables. I.
Bartley, Anthony, 1950- II. Title.
Q181.B7216 2013
001.4071--dc23
2013028904

Cataloging-in-Publication Data for the e-book are available from the Library of Congress.

CONTENTS
Acknowledgments
Foreword

Section I: FUNDAMENTALS OF SHOWING, ANALYZING, AND DISCUSSING DATA
Chapter 1: Introduction: Data and Science
Chapter 2: An Introduction to Understanding Types of Variables and Data
Chapter 3: Data in Categories: Nominal-Level Data
Chapter 4: Data in Ordered Categories: Ordinal-Level Data
Chapter 5: Measured Data: Interval-Ratio-Level Data
Chapter 6: Structuring and Interpreting Data Tables
Chapter 7: How Scientists Discuss Their Data

Section II: MORE ADVANCED WAYS OF COLLECTING, SHOWING, AND ANALYZING DATA
Chapter 8: Simple Statistics for Science Teachers: The t-Test, ANOVA Test, and Regression and Correlation Coefficients
Chapter 9: Doing Surveys With Kids: Asking Good Questions, Making Sense of Answers
Chapter 10: Somewhat More Advanced Analysis of Survey Data

Afterword

Appendix I: Class Worksheets for Marble-Rolling Activities
Appendix II: Other Scaffolded Investigation Activities
Appendix III: Structure and Design of an “Ideal” Graph and Table
Appendix IV: Ideas for Evaluating Laboratory Reports
Appendix V: Concept Maps and Vee Maps
Appendix VI: Web Resources
Appendix VII: An Introduction to Data Management From a Mathematics Perspective, by Eva Knoll
Appendix VIII: t-Test of Example Data
Appendix IX: Worksheets for Statistical Analysis

Resources
Index

ACKNOWLEDGMENTS
The idea for this book started developing 16 years ago when I was interviewing preservice teachers and other science program graduates about their interpretations of graphs as part of my PhD research. As I explored those issues further as a science methods instructor, I began creating resources and activities on this topic with my friend and colleague Tony Bartley. We subsequently produced a workshop that we have presented many, many times at NSTA conferences over the last decade. At those workshops, we gained more insights into the data literacy issues confronting teachers, in part from the many comments we collected from participants about resources they could use when doing investigations with their own students. The content and approach used in this book arose from those observations. At those meetings and others, Tony and I sat and hashed out the various activities, use of language, and explanations offered here.
My own interests in data and representations of it began during my undergraduate studies in 1982 when I took a research methods course with Hank Davis in the psychology department at the University of Guelph. The seed that Hank planted at that time in his long and productive career may be one that took the longest to flower, but I am glad that his efforts with me in that most interesting (and somewhat bizarre) class 30 years ago have borne such fruit. Thanks, Hank. Then, my MS supervisor, John Sprague, pushed me into doing multidimensional modeling as part of my research in behavioral toxicology, and I was fortunate that software tools for the newly developed personal computers allowed me to play with graphical representations of data in his laboratory in ways that hadn’t been possible even a few years earlier. That work was influential during my PhD research in science education because it gave me insights into how individuals gain competency in working with data. John Haysom, my science methods instructor (and now author of numerous NSTA Press books), further pushed me to figure out ways to develop children’s interest in science investigations and, additionally, how to make abstract relationships more “real” to them by developing hands-on activities to help students experientially understand those relationships. Finally, my main academic influence has been Wolff-Michael Roth at the University of Victoria. My PhD work with him helped cast light (for me and others) on the role of graphical and tabular representations in science and how individuals at various educational levels gain understandings of (abstract) science concepts from those. He continues to push boundaries in these areas, and I admire his tenacity at teasing out the details of how understanding of science evolves. His academic achievements and his friendship have tremendously impacted my work on this project.
Finally, I would like to thank the many teachers and students who, through support, comments, advice, suggestions, and participation in hands-on activities with us, have helped generate this book.
It is my, and our, profound hope that you find this book useful for developing your students’, and your own, understandings of how to work with data.

Mike Bowen
Halifax, Nova Scotia


My journey in science education started in Liverpool (England!) many years ago when I completed a postgraduate certificate in Education at Saint Katherine’s College of Education (now Liverpool Hope University). It was there I realized that most students I would meet in schools had very different views about their own education and where science fit in. I became a physics and chemistry teacher, first in Staffordshire, then West London, and finally in Kent; by extension, I also became a teacher of mathematics because we worked with data and problem sets.
Many of the people I worked with in England have now retired, but I remember both
their mentorship and collegiality: Geoff Morris of the Ounsdale School (Wombourne),
Hamish Miller of Christ’s School (Richmond-upon-Thames), and Rick Armstrong of
the Eden Valley School (Edenbridge) all deserve a mention, and thanks.
I moved to Canada in 1985 and taught in Victoria, British Columbia, from 1986 to
1989. In 1989 the University of British Columbia beckoned me for a PhD; it was here
that I was lucky enough to work with Gaelen Erickson, Bob Carlisle, Jim Gaskell, and
Dave Bateson as the home faculty members; Peter Fensham, Rosalind Driver, Cam
McRobbie, and Ruth Stavy were notable visitors; and Tony Clarke and René Fountain
were magnificent in their support as they too completed the doctoral journey.
I’m now approaching my 20th year at Lakehead University in Thunder Bay, Ontario. Mike was here for 5 of these years, and we have an enduring friendship through a broad range of collaborations in science education and other overlapping interests. My colleague now and for the last 8 years at Lakehead has been Wayne Melville, whose support for open inquiry has been consistent and strong; it helps us both that we have one of the strongest schools on the continent, Sir Winston Churchill Collegiate and Vocational Institute, just a few miles away. As a university-based educator, I cannot overstate my appreciation for the school-based support that the Churchill science department, and its chair Doug Jones, have given to learning science through inquiry.
I have enjoyed this writing project and have learned a lot as Mike and I worked
through our approaches over the last few years. I hope that this book and its ideas
work for you and your students.

A. W. Bartley
Thunder Bay, Ontario

FOREWORD

If you are a scientist or a science student, data literacy matters because it helps you make sense of information you’ve collected in lab investigations. But most students aren’t going to be scientists, so why should developing data literacy be important? Isn’t it enough to get them to know science concepts, remember facts and patterns, and draw graphs on tests? Where else would they encounter scientific data other than in a laboratory?
Although it may not seem like it, we are surrounded by data. When you open the newspaper and see a graph or a table as part of an article, what you’re looking at is data. When you listen to news on the television or radio, what you’re hearing are conclusions drawn from data someone else has collected. And they’ve collected those data to understand something, argue a position, make a point, or persuade the listeners to adopt a particular view. Some of these arguments are better than others because the data have been collected, analyzed, or summarized more effectively. This book is about understanding what good data and good data analysis are so that you can make stronger arguments and better evaluate the arguments of others. It’s important to realize that everyone has an agenda of some sort, and being more data literate helps you judge whether others are making a fair argument.
Part of being able to take a more informed (some might say skeptical) view of
data is being literate in how data are manipulated and subsequently presented: how
they are collected, made into tables, and shown in pictures or graphs. Once you
know how to do this the right way, such as you might learn in a science classroom,
you can start asking if someone else is doing it in a way that is fair, or if they are
distorting the data for their own purposes.
Data literacy is important for your students even if they aren’t going to be scientists
because data are used to argue and persuade people to, among other things, vote for
political agendas, support specific types of spending within organizations, sell life
insurance, or lease a car. An improved understanding of data practices means that
better questions can be asked in all of these situations.
Even in everyday life, data collection can be important. Bakers often keep diaries
when they’re learning how to bake a new type of biscuit. Gardeners keep a log about
the growth of their gardens, and birdwatchers keep track of where and when they
see what types of birds and what the weather conditions were. Drivers keep track of
vehicle mileage, and homeowners keep track of their electrical bill month to month.
This is all real-life data. We could go on with examples like this forever, but now you
can probably think of some data that you keep track of.
The point is, data literacy is an important skill to develop in students, and science classrooms are a good place to do that because data collection and interpretation are part of the science curriculum in most jurisdictions. Almost every teacher has faced the challenge of helping students make sense of some data set; many times, that teacher has sat there, scratched his or her head, and wondered how to help the students make sense of the data they collected. In science, there are some fundamental concepts that help scientists make sense of data, particularly the messy data found in the real world, and yet these fundamentals are infrequently taught in undergraduate science courses. Teachers who have their students do inquiry lab investigations can face data analysis challenges, even in a middle school science class, that exceed what they learned in their college science courses.
Learning about how to analyze and make better sense of data also helps you learn
the best way to collect data. And learning how to collect, summarize, and analyze
data is a very important science skill, central to the newly released Next Generation
Science Standards (NGSS).
Lab investigations used to be pretty simple and straightforward (i.e., “cookbook
labs”): The teacher provided a clear set of instructions; the students all engaged in
the same activity, followed the same procedure, and were marked on getting the
same “correct” answer. Then inquiry investigations came along, and classroom
investigations got a lot more difficult. Many of us teachers didn’t have a background
sufficient for helping our students do those types of inquiry investigation activities.
The contrast between the two different types of lab activities could not be starker
(Table F.1).

TABLE F.1

Comparing traditional laboratory activities with inquiry-based science investigations

                              Traditional, structured           Inquiry-based science
                              laboratory activities             investigations
Basis of learning             behaviorist                       constructivist
Curricular goals              product-oriented (i.e.,           process-oriented (with
                              everyone gets the same answer)    some product)
Role of student               following directions              problem solver/arguer
Student participation         passive/receptive                 active
Student ownership of project  lower                             higher
Student involvement           lower responsibility              higher responsibility
Role of teacher               director/transmitter              guide/facilitator


As every teacher understands, supporting students who are doing laboratory investigations of the student-directed and open-ended type (as those in the inquiry-based column of Table F.1 usually are) is a considerable challenge and can require a lot more background knowledge than undergraduate teaching programs often provide. Some teacher preparation programs have specific courses that deal with doing inquiry, thus allowing student teachers to learn the basics of data literacy, but many do not.
What we (the authors) realized some years ago is that the challenge in encouraging teachers to do inquiry investigations exists in part because of aspects of data collection, analysis, synthesis, and presentation that teachers of science often just do not know. Nor, as far as we could tell, are there good resources geared toward helping them learn the material in a way that would be useful for their students. To address this, we developed and have presented a workshop on data literacy at the national NSTA conferences for the last several years. The workshop has been quite popular, but we have since realized that a more comprehensive resource, building on the workshop, would be useful for science teachers. This book grew out of that realization. We’ve tried to keep it approachable by using a minimum of technical language. And we’ve tried to use examples that relate to classrooms and the types of data collection activities that teachers have students do. We hope you find it useful in helping your classes become more data literate.

Who is this book for?


• Teachers who need to read government and school board documents that
present data in tables or graphs will find most chapters useful to read over
to help their understanding of those documents.
• Teachers of lower elementary grades (whenever they start students
interpreting bar charts or histograms) will find the early chapters useful.
• Middle school teachers will find the first eight chapters helpful.
• High school teachers will benefit from reading the entire book, and in
particular the later chapters if they have advanced students who need to be
challenged with more complex work.
• Individuals working on a graduate degree that involves data collection will
find this a good introduction to any research methods course they might
need to take.

The appendixes provide laboratory investigation activities (Appendixes I and II) to help you teach these data representation and analysis concepts to your students at various grade levels. In addition, there are appendixes to help you evaluate the laboratory activities your students have handed in (Appendixes III and IV) as well as a collection of data analysis worksheets with examples of quantitative data analysis (Appendixes VIII and IX) that can be used by upper-level students to help them conduct more detailed analyses of data they’ve collected in lab investigations.

Connections to the Framework and the Standards
Our work in writing this book took place at the same time as the development of
the NGSS in the United States. The guiding document for the NGSS—A Framework
for K–12 Science Education: Practices, Crosscutting Concepts, and Core Ideas (Framework;
NRC 2012)—sets out eight scientific and engineering practices, of which Analyzing
and Interpreting Data is the fourth on the list. The Framework identifies the grade 12
goals for analyzing and interpreting data as follows:

• Analyze data systematically, either to look for salient patterns or to test
whether data are consistent with an initial hypothesis.
• Recognize when data are in conflict with expectations and consider what
revisions in the initial model are needed.
• Use spreadsheets, databases, tables, charts, graphs, statistics, mathematics,
and information and computer technology to collate, summarize, and
display data and to explore relationships between variables, especially those
representing input and output.
• Evaluate the strength of a conclusion that can be inferred from any data set,
using appropriate grade-level mathematical and statistical techniques.
• Recognize patterns in data that suggest relationships worth investigating
further. Distinguish between causal and correlational relationships.
• Collect data from physical models and analyze the performance of a design
under a range of conditions. (NRC 2012, pp. 62–63)

PROGRESSIONS IN THE FRAMEWORK

This is a quick look at the Analyzing and Interpreting Data progressions found in
the Framework document.
In elementary classes, we would see students

• make a start at recording and sharing observations; and
• engage in scientific inquiry and begin collecting categorical or numerical
data for presentation in forms that facilitate interpretation, such as tables
and graphs.


In middle school, students would learn the use and justification of some of the
standard techniques for displaying, analyzing, and interpreting data, including

• different types of graphs;
• the identification of outliers in the data set; and
• averaging to reduce the effects of measurement error.
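To make the last two bullets concrete, here is a minimal Python sketch of averaging repeated trials and flagging a possible outlier. It is not taken from the Framework. The trial times are made up, and the helper name `flag_outliers` is hypothetical. The 1.5 × IQR ("Tukey fence") rule is only one common convention for flagging outliers, not the only one.

```python
from statistics import mean, quantiles

# Made-up data: times (s) for a marble to roll down the same ramp in five trials
trials = [2.31, 2.28, 2.35, 2.30, 3.10]

def flag_outliers(values, k=1.5):
    """Return the values lying outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" is used because, with only five values, the default
    # "exclusive" method lets the extreme value pull Q3 upward
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

outliers = flag_outliers(trials)
kept = [v for v in trials if v not in outliers]

print("possible outliers:", outliers)
print(f"mean of all trials:    {mean(trials):.3f} s")
print(f"mean without outliers: {mean(kept):.3f} s")
```

A flagged value is a prompt to check what happened during that trial (did the marble bump the edge of the ramp?). It is not automatic permission to discard the value. Averaging the remaining trials then reduces the effect of small, random timing errors.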

In high school, as the complexity of investigations increases, we see a broadening
of the techniques for the display and analysis of the data. To examine the
relationship between two variables, students produce x-y scatterplots or
crosstabulations.
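A crosstabulation counts how often each combination of two categorical variables occurs. Here is a minimal Python sketch using only the standard library. The survey responses and category names are made-up assumptions, chosen only for illustration.

```python
from collections import Counter

# Made-up survey responses: (grade level, favorite science subject)
responses = [
    ("grade 7", "biology"), ("grade 7", "physics"), ("grade 7", "biology"),
    ("grade 8", "chemistry"), ("grade 8", "biology"),
    ("grade 8", "physics"), ("grade 8", "physics"),
]

# Count each (row category, column category) combination
counts = Counter(responses)
rows = sorted({grade for grade, _ in responses})
cols = sorted({subject for _, subject in responses})

# Print the two-way table with a total for each row
print(" " * 10 + "".join(c.ljust(11) for c in cols) + "total")
for r in rows:
    cells = [counts[(r, c)] for c in cols]  # Counter gives 0 for combinations never seen
    print(r.ljust(10) + "".join(str(n).ljust(11) for n in cells) + str(sum(cells)))
```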

THE NGSS
The table below is taken from the NGSS (p. 9, Appendix F; NGSS Lead States 2013);
it clearly shows the significance of data literacy and the related progressions that
have been developed at the state standard level (Figure F.1).

FIGURE F.1

Progression of the practice of analyzing data in the NGSS

Grades K–2
Analyzing data in K–2 builds on prior experiences and progresses to collecting, recording, and sharing observations.
• Record information (observations, thoughts, and ideas).
• Use and share pictures, drawings, and/or writings of observations.
• Use observations (firsthand or from media) to describe patterns and/or relationships in the natural and designed world(s) in order to answer scientific questions and solve problems.
• Compare predictions (based on prior experiences) to what occurred (observable events).
• Analyze data from tests of an object or tool to determine if it works as intended.

Grades 3–5
Analyzing data in 3–5 builds on K–2 experiences and progresses to introducing quantitative approaches to collecting data and conducting multiple trials of qualitative observations. When possible and feasible, digital tools should be used.
• Represent data in tables and/or various graphical displays (bar graphs, pictographs and/or pie charts) to reveal patterns that indicate relationships.
• Analyze and interpret data to make sense of phenomena, using logical reasoning, mathematics, and/or computation.
• Compare and contrast data collected by different groups in order to discuss similarities and differences in their findings.
• Analyze data to refine a problem statement or the design of a proposed object, tool, or process.
• Use data to evaluate and refine design solutions.

Grades 6–8
Analyzing data in 6–8 builds on K–5 experiences and progresses to extending quantitative analysis to investigations, distinguishing between correlation and causation, and basic statistical techniques of data and error analysis.
• Construct, analyze, and/or interpret graphical displays of data and/or large data sets to identify linear and nonlinear relationships.
• Use graphical displays (e.g., maps, charts, graphs, and/or tables) of large data sets to identify temporal and spatial relationships.
• Distinguish between causal and correlational relationships in data.
• Analyze and interpret data to provide evidence for phenomena.
• Apply concepts of statistics and probability (including mean, median, mode, and variability) to analyze and characterize data, using digital tools when feasible.
• Consider limitations of data analysis (e.g., measurement error), and/or seek to improve precision and accuracy of data with better technological tools and methods (e.g., multiple trials).
• Analyze and interpret data to determine similarities and differences in findings.
• Analyze data to define an optimal operational range for a proposed object, tool, process or system that best meets criteria for success.

Grades 9–12
Analyzing data in 9–12 builds on K–8 experiences and progresses to introducing more detailed statistical analysis, the comparison of data sets for consistency, and the use of models to generate and analyze data.
• Analyze data using tools, technologies, and/or models (e.g., computational, mathematical) in order to make valid and reliable scientific claims or determine an optimal design solution.
• Apply concepts of statistics and probability (including determining function fits to data, slope, intercept, and correlation coefficient for linear fits) to scientific and engineering questions and problems, using digital tools when feasible.
• Consider limitations of data analysis (e.g., measurement error, sample selection) when analyzing and interpreting data.
• Compare and contrast various types of data sets (e.g., self-generated, archival) to examine consistency of measurements and observations.
• Evaluate the impact of new data on a working explanation and/or model of a proposed process or system.
• Analyze data to identify design features or characteristics of the components of a proposed process or system to optimize it relative to criteria for success.
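The grades 6–8 bullet on mean, median, mode, and variability maps directly onto Python's standard `statistics` module. Here is a minimal sketch with made-up marble-rolling distances, which are hypothetical and used only for illustration. `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode, stdev

# Made-up data: distance (cm) a marble rolled in ten trials
distances = [42, 45, 45, 47, 44, 45, 50, 43, 46, 48]

summary = {
    "mean": mean(distances),
    "median": median(distances),
    "mode(s)": multimode(distances),                # a list, since data can have several modes
    "range": max(distances) - min(distances),
    "sample standard deviation": stdev(distances),  # uses n - 1 in the denominator
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Reporting a measure of center together with a measure of spread tells students both what a "typical" trial looked like and how much the trials disagreed.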


What about math? Where does data literacy fit in there?
Two guiding documents connect math to data literacy. The Principles and Standards for
School Mathematics (NCTM 2000) has a strand entitled Data Analysis and Probability,
and the Common Core State Standards, Mathematics (NGAC and CCSSO 2010) has two
relevant strands: Measurement and Data, and Statistics and Probability.

USE OF DATA IN THE NCTM PRINCIPLES AND STANDARDS

The big ideas guiding the NCTM Principles and Standards are that all students
should be able to

• formulate questions that can be addressed with data and to collect, organize,
and display relevant data to answer them;
• select and use appropriate statistical methods to analyze data;
• develop and evaluate inferences and predictions that are based on data; and
• understand and apply basic concepts of probability.

Looking at graphing, we see the following progression:

• K–2: Represent data using concrete objects, pictures, and graphs
• 3–5: Represent data using tables and graphs such as line plots, bar graphs,
and line graphs
• 6–8: Select, create, and use appropriate graphical representations of data,
including histograms, box plots, and scatterplots
• 9–12: Understand histograms, parallel box plots, and scatterplots and use
them to display data

REFERENCES TO USE OF DATA IN THE COMMON CORE STATE STANDARDS, MATHEMATICS
The strand Measurement and Data runs from first to fifth grade, with Statistics and
Probability running from sixth grade to high school. Let’s look at the progression.

MEASUREMENT AND DATA

• Grade 1: Organize, represent, and interpret data with up to three categories
• Grade 2: Draw a picture graph and a bar graph (with single-unit scale) to
represent a data set with up to four categories
• Grade 3: Draw a scaled picture graph and a scaled bar graph to represent a
data set with several categories
• Grade 4: Make a line plot to display a data set of measurements in fractions
of a unit (½, ¼, ⅛)
• Grade 5: Use operations on fractions for this grade to solve problems
involving information presented in line plots

STATISTICS AND PROBABILITY

• Grade 6: Recognize a statistical question as one that anticipates variability in
the data related to the question and accounts for it in the answers
• Grade 7: Use measures of center and measures of variability for numerical
data from random samples to draw informal comparative inferences about
two populations
• Grade 8: (1) Construct and interpret scatter plots for bivariate measurement
data to investigate patterns of association between two quantities
• Grade 8: (2) Know that straight lines are widely used to model relationships
between two quantitative variables. For scatter plots that suggest a linear
association, informally fit a straight line, and informally assess the model fit
by judging the closeness of the data points to the line
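The Grade 8 standard asks students to fit a line informally, by eye. A formal counterpart, matching the NGSS 9–12 reference to "slope, intercept, and correlation coefficient for linear fits," is the ordinary least-squares line. Here is a minimal Python sketch. The ramp data and the function name `least_squares_fit` are made-up assumptions, not drawn from the standards.

```python
from math import sqrt

# Made-up data: ramp height (cm) and the distance (cm) a marble travels after leaving the ramp
heights = [5, 10, 15, 20, 25]
distances = [31, 58, 92, 118, 149]

def least_squares_fit(x, y):
    """Ordinary least-squares line y = slope*x + intercept, plus Pearson's r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # best-fit slope
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    r = sxy / sqrt(sxx * syy)            # how closely the points hug the line (-1 to 1)
    return slope, intercept, r

slope, intercept, r = least_squares_fit(heights, distances)
print(f"distance ≈ {slope:.2f} × height + {intercept:.2f}  (r = {r:.3f})")
```

In Python 3.10 and later, `statistics.linear_regression` and `statistics.correlation` return the same slope, intercept, and Pearson r. Writing the sums out shows students where those numbers come from.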

INTERPRETING CATEGORICAL AND QUANTITATIVE DATA

Grades 9–12: High School
• Summarize, represent, and interpret data on a single count or
measurement variable
• Summarize, represent, and interpret data on two categorical and
quantitative variables
• Interpret linear models

The above information on science and math standards was obtained directly from the documents listed in the References section (and as such, much of the text is copied directly from those documents).

References
National Council of Teachers of Mathematics (NCTM). 2000. Principles and standards for
school mathematics. Reston, VA: National Council of Teachers of Mathematics.
National Governors Association Center for Best Practices and Council of Chief State School
Officers (NGAC and CCSSO). 2010. Common core state standards. Washington, DC:
NGAC and CCSSO.
National Research Council (NRC). 2012. A framework for K–12 science education: Practices,
crosscutting concepts, and core ideas. Washington, DC: National Academies Press.
NGSS Lead States. 2013. Next Generation Science Standards: For states, by states. Washington, DC: National Academies Press. www.nextgenscience.org/next-generation-science-standards.

CHAPTER 8
SIMPLE STATISTICS FOR SCIENCE TEACHERS: THE t-TEST, ANOVA TEST, AND REGRESSION AND CORRELATION COEFFICIENTS

In earlier chapters (Chapters 3 and 4), we looked at patterns in nominal data using bar graphs and ordinal data using line graphs. We “eyeballed” differences from those graphs, looking at the size of the circles drawn around the tics, and drew conclusions that we discussed using hedging language. But is that what scientists do?
Actually yes, in the early stages of their research or as the research is progressing, but it’s not how they write their final reports. Those final reports often contain statistical analyses that allow the scientist to state with more certainty what differences and patterns they have found in their data. Remember what we mentioned earlier? That science was a probabilistic endeavor? Well, part of it being probabilistic is that scientists want to state with as much certainty as possible what the patterns and relationships are that they are looking at. Using statistics helps scientists improve the certainty of their statements so they can be as precise as possible.
In this chapter, we’re going to look at three basic statistical tests. The first is the t-test, which is used when you have nominal or ordinal data and only two treatments (or groups) you are comparing (e.g., the speed of cats and dogs). The second is the analysis of variance (ANOVA) test, for when you have nominal or ordinal data and more than two treatments (e.g., the speed of cats and dogs and pigs). The third statistical analysis is correlation and regression analysis, which is for interval-ratio data when you are comparing two things you have measured (e.g., the amount of salt in the pot and how long it takes potatoes to cook). These three basic tests cover most of the types of inquiry studies we’ve seen grade 7–12 students conduct. There are tests for more complicated designs, but this is a basic introduction, and understanding these will help you understand more complicated designs if you need to.
In a book like this, we should probably mention why this chapter is here. Statistical tests don’t seem very basic, do they? We agree that they’re not; however, we’ve seen projects by grade 7 students (at science fairs) in which they used t-tests and could describe how the tests worked and why they used them. Correctly, we should add. We’ve seen ANOVA tests used by grade 10 students in the same settings, and by grade 12 students as part of inquiry investigations in their regular classes. As a teacher you never know when you’re going to have that hyper-keen student in grade 8, so we thought you might appreciate having some resources to help you work with them. If nothing else, the worksheets we provide will give you something to give them to enhance their learning when they’ve raced ahead of the rest of the class. Besides that, this chapter might help you make sense of some of those mail-outs from boards of education with statistics in them that most of us have trouble making heads or tails of.
In Appendix IX we provide three resources: worksheets that do a step-by-step calculation of each of these types of statistical analysis, critical value tables that let you determine if there are statistically significant differences, and a worked-through example for each test from data used in previous chapters in this book. We’ll also mention that in the Resource section (Appendix VI) there are links to websites that also conduct these tests if you insert the data into them.1 This chapter is an introductory description of what these tests are doing and the conditions that should be met for doing them.
1. We also intend to provide a resource page with analysis tools at the NSTA Press website for the book.

FIGURE 8.1
Graph of temperature data with arrows depicting range of response and the gray area depicting where the data overlaps
[Figure: Temperature (Degrees Celsius) by Thermometer Bulb Color (Gray, White); a gray band marks where the data overlap.]
The t-Test
The t-test is used when you compare two means to see if they are statistically different from each other. You should not use a t-test over and over to compare many pairs of means (see the ANOVA test description for how to deal with that situation). What the t-test is doing is determining how likely it is that the difference between the two means happened by chance rather than because of the variable you tested. In simple terms, it compares how much data scatter there is for each group and then compares how different the means are in relation to that data scatter, so that the likelihood of the difference between the means being due to random chance can be determined.
It might be a bit simpler if we looked at a graph of data (Figure 8.1 here, which you might recognize from Figure 1.7, p. 10). A t-test would help you determine whether the amount of overlap in the data is small enough for the difference to be statistically significant, so that you could argue that the two means are different from each other.2
2. We’ve actually done this in Appendix IX. Go and take a look at whether the means are significantly different or not for this data set.
Every test has conditions (also known as assumptions) that must be met for the results to be valid. If you meet those conditions, statistical tests are pretty good at letting you know whether there’s a statistically significant difference between means, but if you violate those assumptions then the tests might not be accurate. Here are the assumptions that should be met to do the t-test:
1. The data scatter is reasonably the same for the two categories (in statistical terms, the variation is close to the same).
2. There is more data toward the middle of the circles than at the nearest and farthest points away from the middle (in statistical terms, the data has a reasonably normal distribution).
3. The data are randomly chosen (in statistical terms, this means you didn’t choose data to include so that you showed what you wanted to show).
4. The replicates in the two treatments need to be independent of each other. For instance, the data cannot be before and after measures on the same individuals (there’s a separate test for that called the paired t-test).
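Assumption 1 can also be checked with numbers rather than only by eye. The sketch below is a rough illustration, not the Appendix IX worksheet method: it divides the largest sample variance by the smallest. The helper name `variance_ratio` is hypothetical, the cutoff of about 4 is an assumed rule of thumb (commonly quoted, but not a value from this book), and because the thermometer data are not listed in this chapter, the poodle and Labrador times from Table 8.2 (later in this chapter) are borrowed just to have real numbers to work with.

```python
from statistics import variance


def variance_ratio(*groups):
    """Hypothetical helper: largest sample variance divided by the smallest.

    A quick numeric look at assumption 1 (similar data scatter).
    """
    variances = [variance(group) for group in groups]
    return max(variances) / min(variances)


# Assumed rough rule of thumb (not from this book): a ratio above about 4
# suggests the scatter differs enough that you should hedge your conclusions.
RULE_OF_THUMB = 4

poodle = [14, 13, 13, 15, 17]     # seconds to run 30 m, Table 8.2
labrador = [17, 10, 16, 8, 9]     # seconds to run 30 m, Table 8.2

ratio = variance_ratio(poodle, labrador)
print(f"variance ratio = {ratio:.2f}; scatter differs noticeably: {ratio > RULE_OF_THUMB}")
```

For these two columns the ratio comes out at 6.25 because the Labrador times are far more spread out, which is the kind of uneven scatter that violates assumption 1.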

It might be easier to show you what this means on a graph. Let’s look at Figure 1.7 (the gray and white thermometer data, p. 10) again in Figure 8.2.

FIGURE 8.2
Depiction of gray and white temperature data portraying even data scatter
[Figure: Temperature (Degrees Celsius) by Thermometer Bulb Color (Gray, White).]

You’ll notice that the data in this graph meet the assumptions listed above: The raw data around the two bars spread about the same distance from top to bottom, and there are more data points close to the horizontal line than far away from it.
Let’s look at a couple of extreme versions of data for the same variables that do not meet those assumptions for a t-test (Figure 8.3).

FIGURE 8.3
Depiction of gray and white temperature data portraying uneven data scatter
[Figure: Temperature (Degrees Celsius) by Thermometer Bulb Color (Gray, White).]

Notice that in Figure 8.3, the data on the right bar are much more scattered (have a greater variation) than the data on the left. This violates assumption 1. Now we’ll look at a graph that violates assumption 2 (Figure 8.4, p. 54).

FIGURE 8.4
Depiction of gray and white temperature data portraying discontinuous data scatter
[Figure: Temperature (Degrees Celsius) by Thermometer Bulb Color (Gray, White).]

Here, you’ll notice that on the left bar the data are not close to the horizontal line at all; the horizontal line is at the average of two separate clusters. This condition violates assumption 2 because the data are not normally distributed (i.e., there are not more data points near the horizontal line than away from it).
The more your data look like the last two graphs (and data could look like a combination of both of them), the less likely it is that the results of the t-test are reliable. In that situation you might hedge how you phrase your interpretation of the data analysis. For instance, if you found a significant difference (as we will describe below), you could write, “Despite finding a significant difference between the mean temperatures for the gray and white thermometers, there is still some room for doubt because of the amount of variation in the data for the white thermometer, which was much greater than that for the gray thermometer.”
However, having described the problem, realize that the t-test is a reasonably robust test and is fairly accurate even if its assumptions are moderately violated.
So, how do you calculate a t-test? The step-by-step worksheet and example in Appendix IX will show you. When you calculate your t-test statistic using the worksheet, compare your calculated value to the table value in Table 8.1.

TABLE 8.1
Critical values for the t-test statistic
5% Significance Table (two-tailed)

Degrees of freedom    Critical value
4                     2.78
5                     2.57
6                     2.45
7                     2.37
8                     2.31
9                     2.26
10                    2.23
11                    2.20
12                    2.18
13                    2.16
14                    2.15
15                    2.13
16                    2.12
18                    2.10
20                    2.09
22                    2.07
24                    2.06
26                    2.06
28                    2.05
30                    2.04
40                    2.02
60                    2.00
120                   1.98

If the t-statistic you calculated (ignoring any minus sign) is less than the critical value in the table above (for the correct degrees of freedom, which you calculate on the worksheet), then the difference between the two means is not statistically significant.
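The calculation and table comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Appendix IX worksheet: it uses the pooled (equal-variance) form of the independent two-sample t-test, which fits assumption 1, with n1 + n2 - 2 degrees of freedom, and the function names `pooled_t_test` and `critical_t` are hypothetical. Because the thermometer data are not reproduced in this chapter, it borrows the poodle and Doberman columns of Table 8.2 purely to show the arithmetic (the chapter explains later why a three-breed study calls for an ANOVA instead).

```python
import math
from statistics import mean, variance

# Critical values from Table 8.1 (5% significance, two-tailed).
# For 6 degrees of freedom the standard table value 2.45 is used.
T_CRITICAL_5PCT = {
    4: 2.78, 5: 2.57, 6: 2.45, 7: 2.37, 8: 2.31, 9: 2.26, 10: 2.23,
    11: 2.20, 12: 2.18, 13: 2.16, 14: 2.15, 15: 2.13, 16: 2.12,
    18: 2.10, 20: 2.09, 22: 2.07, 24: 2.06, 26: 2.06, 28: 2.05,
    30: 2.04, 40: 2.02, 60: 2.00, 120: 1.98,
}


def critical_t(df):
    """Hypothetical helper: look up the table value, falling back to the
    next smaller df listed (a slightly larger, more cautious value)."""
    listed = [d for d in T_CRITICAL_5PCT if d <= df]
    if not listed:
        raise ValueError("Table 8.1 starts at 4 degrees of freedom")
    return T_CRITICAL_5PCT[max(listed)]


def pooled_t_test(a, b):
    """Hypothetical helper: independent two-sample t-test assuming equal
    variances. Returns the t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by n - 1.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, df


# Poodle and Doberman columns of Table 8.2, used only to show the arithmetic.
poodle = [14, 13, 13, 15, 17]
doberman = [8, 9, 6, 8, 7]

t, df = pooled_t_test(poodle, doberman)
crit = critical_t(df)
significant = abs(t) > crit  # compare the size of t, ignoring its sign
print(f"t = {t:.2f}, df = {df}, critical value = {crit}, significant at 5%: {significant}")
```

For these two columns the sketch gives t of about 7.51 with 8 degrees of freedom, well above the table value of 2.31. When a df value is missing from the table (17, for example), `critical_t` uses the next smaller listed df; that choice errs on the side of not claiming significance.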


If the calculated t-statistic (again ignoring its sign) is greater than the critical value in the table above (for the correct degrees of freedom), then the difference between the two means is statistically significant at 5%. Informally, this is often described as being 95% confident that the difference between the means is a real one (i.e., not due to chance); more precisely, it means that if there were really no difference between the means, a difference at least this large would turn up by chance less than 5% of the time.

The ANOVA Test
The ANOVA test is used when you have several different treatments you are testing (in other words, more than two treatments). Sometimes people do multiple t-tests instead of an ANOVA. This is bad, bad, bad. Very bad. Ghostbusters bad. Why? Because you considerably increase the likelihood that you’ll report a statistically significant difference when there is not one. All of those “only 1 in 20 chances of being wrong” possibilities add up, so it becomes quite likely that at least one of your “significant” results is wrong. An ANOVA test stops that from happening.
Note that the treatments have to be nominal or ordinal categories (treatments or groups), and you have to have collected measured data for each of them. If, for instance, you measured how fast three different breeds of dogs (with increasing sizes) could run 30 m, then that would be the type of data you would do an ANOVA test on. You have ordinal categories, and you have the times it took to cover the distance.
But why would you? You can see the differences in the graph, can’t you?
Well, an ANOVA test allows you to figure out if the differences between the mean times to run 30 m are large enough, given the way the data are scattered about the mean, to say with reasonable certainty that the breeds of dogs run at different speeds. It might seem a bit odd to “test” this, because you can see that the means look different on a graph, but scientists care about how much data scatter there is too, which is why these statistical tests were created! And doing an ANOVA test allows you to be more convincing when making arguments about your findings to others (that’s why scientists do statistical analyses: They remove some of the personal bias that might influence their interpretations, so they become more convincing with their claims).
First, it’s important to note that the ANOVA test has some conditions that must be met (just as the t-test did; in fact, they’re basically the same conditions, so you can look at the graphical examples from the t-test if you need to). Here are the conditions that must be met to perform an ANOVA test:
1. The data scatter is reasonably the same for all of the categories (in statistical terms, the variation is close to the same).
2. There is more data toward the middle of the circles than at the nearest and farthest points away from the middle (in statistical terms, the data has a reasonably normal distribution).
3. The data are randomly chosen (in statistical terms, this means you didn’t choose only data to include so that you showed only what you wanted to show).
4. The replicates in the treatments need to be independent of each other (one treatment cannot be influencing another).
We’ve been using the phrase data scatter to discuss how the raw data spread out around the mean, but the more correct term is variance. So, an ANOVA test really is (ready for it?) an ANalysis Of VAriance. An ANOVA test analyzes the variance around each of the means


and the overall variance to figure out how certain you can be about whether the means are different from each other.
Okay, so let’s say that we’ve collected data looking at how fast the three different dog breeds can run a 30 m distance, and we’ve tested five different dogs of each breed (Table 8.2). That data would give us a graph that looks like Figure 8.5.

TABLE 8.2
Time (in seconds) that different dog breeds take to run 30 m
Dog     Poodle    Labrador    Doberman
1       14        17          8
2       13        10          9
3       13        16          6
4       15        8           8
5       17        9           7
Avg.    14.4 s    12 s        7.6 s

FIGURE 8.5
The time it takes different dog breeds to run 30 m, ordered by dog size
[Figure: “Time Different Breeds of Dogs Take to Run 30 m”; y-axis: # of Seconds to Travel 30 m (0–18); x-axis: Type of Dog (by Size): Poodle, Labrador, Doberman.]

Notice that this graph looks a little bit different from those you might get from a spreadsheet. That’s because all of the data are on it, not just the average of the times each breed ran 30 m. Because you can see the data scatter around each average (at the dot where the line is), you can get a bit of an idea about what an ANOVA test does. Basically, it compares the data scatter around each mean with the overall data scatter, takes into account where each mean is, and figures out whether the means are different from each other. Essentially, the ANOVA test is analyzing how much the data you see on the graph overlap in relation to the total amount of data scatter. If the data do not overlap much, then the means are probably different from each other. If they overlap a lot, in relation to the total amount of data scatter, then the means probably are not different from each other. The graph in Figure 8.6 might help you picture this.

FIGURE 8.6
The time it takes different breeds of dogs to run 30 m, ordered by dog size, with an arrow depicting the overall range of response and the gray area depicting where the data overlap for the pairs of variables
[Figure: same axes as Figure 8.5, annotated “Overlap of Data” and “Total Range of Data.”]

As with the t-test, the ANOVA test is a pretty robust test, and this means that the variation in the data scatter around the mean can be a


bit off (as it is with the Labrador) and the test will still be valid.
However, it gets a bit more complicated drawing conclusions from the ANOVA test compared to the t-test. In the t-test, there was only one pair of data, and we knew that the significant difference was just between that pair. But in the ANOVA, if the means are found to be statistically significantly different from each other overall, we still don’t know which means were statistically different from each other; we cannot assume that they all were. In this dog study, for instance, we know that there is a statistically significant difference between means (see Appendix IX for a worked-out example) but not whether Dobermans are faster than poodles, Labradors are faster than poodles, or Dobermans are faster than Labradors (the three possible pair comparisons).
There are tests, called post hoc (meaning after) tests, which can be done for this, but they’re complicated enough that we’re not going to include them here.3
3. A common post hoc test for data such as those in the dog example is called Tukey’s test.
That does not, however, mean that you cannot draw conclusions; we can look at the graph. The first important point is this: if you do an ANOVA test and do not find a statistically significant result for the whole data set, then it does not matter what the graph looks like (how far apart the means are), because there is no statistically significant difference, and that result means a whole lot more than any eyeballed differences. We’re emphasizing this point because even undergraduates in science have difficulty understanding this. No statistical significance means, wait for it, waaiiittt for it … no statistical significance … no demonstrated difference between means. Just what it says. Okay? None.
But what if you do find statistical significance for the whole data set after doing an ANOVA test? Well, then looking at the graph to help figure out which pairs of means differ is completely valid. Let’s look at our data in Figure 8.5 again. How would we analyze it? Let’s assume that our ANOVA was significant. Now we have to figure out the differences between pairs of data. We should probably look at the amount of overlap.
When you do this, you note that the lack of overlap of the two gray areas (one drawn across covering the poodle data, the other drawn across covering the Doberman data) probably means that the average times of poodles and Dobermans are significantly different from each other. (Note the use of hedging language in that statement? Also note that we haven’t used the word statistically because we don’t know about that specific pair statistically since we haven’t run a statistical test.) However, there was so much variation in the Labrador data that it’s difficult to draw any strong conclusions about how the Labradors’ mean time compares with those of the other breeds. The average times for the Labradors and the Dobermans were pretty far from each other, and the time data of each breed only overlapped a little, so the mean times are quite possibly different from each other (so, possibly significantly different from each other). However, the time data for the poodles and the Labradors overlapped enough that it’s possible that the means for those breeds are not different from each other, or in other words, that there is no difference between the means for the poodle and the Labrador dogs. So, from an inspection of the data scatter on the graph it would be safe to conclude that
• poodles are very probably slower than Dobermans;


• Dobermans are possibly faster than Labradors; and
• poodles might be the same speed as Labradors.
Without statistical testing this is a qualitative determination, and therefore hedging language is used for all of the pair comparisons.4
4. A Tukey’s test at 5% indicates that there is a significant difference between the poodle and Doberman means, but no significant difference between the poodle-Labrador or Labrador-Doberman mean times. This reflects the broad scatter in the Labrador times.
Again, with an ANOVA analysis these differences would have a percentage certainty, or likelihood of error, associated with them, just like the t-test. With the tables provided with the worksheets (see Appendix IX), there is a 95% certainty in your answer (about the statistical significance of differences in the entire data set), or a 5% error rate.

Correlation and Regression Analysis
This type of data analysis is done on interval-ratio measures for which you want to find out if one factor (or variable) changes when another one does. Basically, when you have a graph of data, the regression analysis (or line of best fit analysis) determines the best average line through the data set, and the correlation coefficient is a measure of just how good that average is (i.e., how much the data are scattered about that line). This calculation is not a significance test (as the t-test and ANOVA test were), so you’re not determining whether the slope of the line of best fit is significantly different from something else.
A correlation coefficient is a calculated statistic representing how close the data points are to the line of best fit. If you multiply the correlation coefficient by itself (see the worksheet in Appendix IX), you obtain a value (r-squared) that tells you the percentage of variation in variable y that is accounted for by its straight-line relationship with variable x (on its own, this does not show that x causes y). The closer this value is to 1 (or 100%, since you often multiply the product by 100), the closer the data points are to the line. In Figure 8.7 (a–c), you see three lines of best fit with different amounts of data scatter around them.

FIGURE 8.7
Examples of different types of scatterplot relationships
[Figure: four scatterplots, panels a, b, c, and d.]

In this example, (a) would have an r-squared value close to 100%. At the other extreme, (d), any plotted line of best fit would have an r-squared value close to 0% (in other words, there is no straight-line relationship between the two variables). There are no hard and fast rules in science as to how large an r-squared value is needed to talk about relationships between the variables. It can vary quite considerably depending on the circumstances. However, the r-squared value does give you guidance as to how you should be using hedging language to talk about the relationship between the variables.
Remember, in Appendix IX we provide worksheets for doing t-tests and ANOVA tests as well as worked-through examples. Appendix VIII also demonstrates a t-test analysis.
As a conclusion to this chapter we are going to provide an example of a correlation analysis in the form of a case study. In this case study, you’ll find a student report on an investigation and then a teacher’s feedback on that report.
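Before the case study, here is a rough sketch of the arithmetic behind the dog example, for a keen student who wants to check the worksheet results. It is an illustration under stated assumptions, not the Appendix IX procedure: the one-way ANOVA uses the standard textbook sums of squares; the critical values 3.89 (F for 2 and 12 degrees of freedom at 5%) and 3.77 (the 5% studentized range value Tukey’s test uses for 3 groups and 12 within-group degrees of freedom) are typed in from standard tables rather than computed; and the hypothetical `familywise_error` helper assumes the repeated tests are independent, which pairwise t-tests on the same data are not quite, so treat it only as a rough picture of how the “1 in 20 chances” add up.

```python
import math
from statistics import mean

# Table 8.2: seconds for five dogs of each breed to run 30 m.
dogs = {
    "poodle": [14, 13, 13, 15, 17],
    "labrador": [17, 10, 16, 8, 9],
    "doberman": [8, 9, 6, 8, 7],
}


def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_within) for a one-way ANOVA."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    # Scatter of the group means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Scatter of each value around its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_within = ss_within / df_within
    return (ss_between / df_between) / ms_within, df_between, df_within, ms_within


def familywise_error(k, alpha=0.05):
    """Chance of at least one false 'significant' result in k tests,
    assuming the tests are independent (a rough guide only)."""
    return 1 - (1 - alpha) ** k


F, df_b, df_w, ms_w = one_way_anova(list(dogs.values()))
F_CRIT = 3.89  # assumed: 5% F value for (2, 12) df, from a standard F table
print(f"F = {F:.2f} with ({df_b}, {df_w}) df; significant at 5%: {F > F_CRIT}")

# Tukey's HSD for equal group sizes. Q_CRIT is assumed from a standard
# studentized range table (5%, 3 groups, 12 within-group df).
Q_CRIT = 3.77
n_per_group = 5
hsd = Q_CRIT * math.sqrt(ms_w / n_per_group)
names = list(dogs)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        diff = abs(mean(dogs[names[i]]) - mean(dogs[names[j]]))
        print(f"{names[i]} vs {names[j]}: {diff:.1f} s apart, "
              f"beyond Tukey cutoff {hsd:.2f} s: {diff > hsd}")

print(f"Chance of a false alarm across 3 separate tests: {familywise_error(3):.0%}")
```

With the Table 8.2 times, the sketch should report an F of about 8.26 (above 3.89), and only the poodle versus Doberman difference (6.8 s) exceeds the Tukey cutoff of about 4.52 s, which agrees with footnote 4. Three separate tests at 5% carry roughly a 14% chance of at least one false alarm under the independence assumption.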


Case Study: Student Regression and Correlation Analysis with Teacher Commentary

STUDENT RESEARCH QUESTION: DOES MY GUINEA PIG SLEEP MORE WHEN IT EATS MORE?

METHOD
1. Put 100 g of pellet food in my guinea pig’s bowl each day.
2. Each morning replace the food bowl with a new one with 100 g of pellet food, pick up any pellets lying around and put them in the old food bowl, and weigh the old food bowl. Subtract the total remaining food from 100 g to get how much my guinea pig has eaten. Record the data in the data table.
3. Use a video camera with a time counter attached to my computer to record the amount of time my guinea pig sleeps in its box (I used a special camera with an infrared light that could see my guinea pig in the dark). Each day, fast-forward through the recording and keep track of how many minutes the guinea pig lies down with its eye facing the camera (mostly) closed (what looks like sleep … most guinea pigs sleep with their eyes open, but mine usually doesn’t). Record the number of minutes in the data table (Table 8.3).
4. Keep hay and water in the cage so that there is always some. The only food being tracked is the pellets.
5. Do steps 2–4 for 30 days.

TABLE 8.3
A student’s data on how much a guinea pig sleeps in relation to how much it eats
Day    Pellets eaten (g)    Sleep in 24 h (min)
1      38                   185
2      40                   220
3      48                   217
4      42                   260
5      41                   270
6      47                   235
7      50                   195
8      43                   270
9      45                   269
10     49                   258
11     53                   310
12     45                   420
13     31                   350
14     50                   310
15     42                   210
16     53                   270
17     54                   304
18     51                   331
19     60                   321
20     61                   215
21     62                   265
22     55                   254
23     65                   300
24     61                   325
25     60                   335
26     60                   355
27     68                   355
28     80                   435
29     58                   357
30     70                   330
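The line of best fit and correlation values that the student reports can be recomputed directly from Table 8.3. The sketch below uses the standard least-squares formulas; `line_of_best_fit` is a hypothetical helper name, and the worksheet in Appendix IX may organize the same arithmetic differently. It returns both the correlation coefficient r and r-squared, since the chapter distinguishes the two.

```python
import math

# Table 8.3: grams of pellets eaten and minutes of sleep, days 1-30.
pellets = [38, 40, 48, 42, 41, 47, 50, 43, 45, 49, 53, 45, 31, 50, 42,
           53, 54, 51, 60, 61, 62, 55, 65, 61, 60, 60, 68, 80, 58, 70]
sleep = [185, 220, 217, 260, 270, 235, 195, 270, 269, 258, 310, 420, 350,
         310, 210, 270, 304, 331, 321, 215, 265, 254, 300, 325, 335, 355,
         355, 435, 357, 330]


def line_of_best_fit(x, y):
    """Hypothetical helper: least-squares slope and intercept, plus r and r-squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r, r * r


slope, intercept, r, r2 = line_of_best_fit(pellets, sleep)
print(f"y = {slope:.2f}x + {intercept:.0f}; r = {r:.2f}; r-squared = {r2:.2f}")
```

On these 30 days the sketch should give a slope of about 2.77 and an intercept of about 145 (close to the student’s y = 2.7x + 145), an r of about 0.48, and an r-squared of about 0.23. So the 23% figure in the student’s report matches r-squared, not r itself.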


Because I had x-y data, I graphed it in a scatterplot so that I could see any pattern better (Figure 8.8).

FIGURE 8.8
A scatterplot of the data shown in Table 8.3, p. 59
[Figure: “Comparing Guinea Pig Pellet Eating With Sleep”; x-axis: Grams of Pellets (30–90); y-axis: Minutes Sleep/24 h (100–500).]

I also calculated the regression formula and correlation coefficient using the worksheets you gave us, so I knew how well my calculated line of best fit represented the data.
Regression formula: y = 2.7x + 145
Correlation coefficient squared (r-squared): 0.23 or 23%

CONCLUSION
I only have one guinea pig, so I can’t say anything about all guinea pigs, but I can say that when mine eats more it seems to sleep longer, at least reasonably often. The r-squared value is only 23%, which means that the line doesn’t fit the data really well, but those two high amounts of sleep on the top left of the graph might have made it weaker. Maybe I should calculate the regression line and the correlation coefficient without them, because when I draw the line on the graph from the regression formula the line seems kind of high. I would do the study with more guinea pigs if I had them, because maybe my pig isn’t normal and doesn’t sleep in the same way as others.

INSTRUCTOR FEEDBACK TO STUDENT
You did a good job studying your guinea pig and figuring out the relationship between the amount of food and the amount of sleep. You wrote about it really well. You described how strong the relationship is (shown by your r-squared value) quite effectively by using the hedging language we’ve talked about in class. You’re right, the relationship between the pellet consumption and the amount of sleep each day isn’t a strong one (as indicated by the 23% value), but it is there. We haven’t talked about this in class, but in physics that number might be considered really low, whereas when you’re describing animal behaviour and many other things, an r-squared of 23% is actually really good. It means that about 23% of the day-to-day variation in your pig’s sleep goes along with how much it ate. I also think you’re right, by the way: if you excluded those two values on the top left of your graph (who knows why your pig slept longer on those days; maybe it ate more hay than normal, or maybe it ran on its wheel more than it normally does), then your r-squared would be much higher. When I exclude them and recalculate, your r-squared jumps up to 53%, and for animal behaviour that is really high.
I have a question for you: Why are you sure that it is the pellet consumption that is causing the amount of sleep? We did talk about the difference between correlation and causation in class. On the one hand, it does seem reasonable. I’m always sleepy after a big dinner. On


the other hand, something else is also going on that might affect how the guinea pig behaves. Winter is coming, right? What else happens then? I bet if you think about it you’ll remember that when winter comes it’s darker for a longer time; the daytime is shorter. What effect might less daylight have on the amount of sleep a guinea pig would want to get? Do they have alarm clocks? What do you think wakes them up? So how would we look at this? If it were the length of the day, you’d think your pig would eat more later in your study than earlier and would sleep more later in your study than at the beginning. Let’s graph this (Figure 8.9).

FIGURE 8.9
The teacher’s graphs compare sleep and food consumption over a number of days
[Figure: two panels. “Guinea Pig Pellet Consumption by Days” (Grams of Pellets/24 h, 0–90, vs. Days, 0–40) and “Guinea Pig Sleep by Days” (Sleep/24 h in minutes, 0–500, vs. Days, 0–40).]

So, do you see that? The food consumption goes up over the 30 days of your study and the sleep also goes up over those 30 days. So maybe the amount of daylight is affecting both how much your pig sleeps and how much it eats. This suggests that the amount of pellets consumed is correlated with the amount of sleep but may not be the cause of it. Other than missing that (which even I admit was pretty tricky), your study and your report were both well done.
If you want to test whether it was daylight that had an effect, you could do your study again in the spring when the length of daylight is getting longer and see if there is a decrease in the amount of time your guinea pig sleeps. Let me know what you find out.
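The teacher’s two checks can also be sketched in code: recomputing r-squared without the two top-left points, and asking whether both pellets and sleep drift upward with the day of the study, as Figure 8.9 suggests. This is an illustrative sketch, not the teacher’s actual method. It assumes the two top-left points are day 12 (45 g, 420 min) and day 13 (31 g, 350 min), the two low-pellet, high-sleep days in Table 8.3, and the helper name `pearson_r` is hypothetical.

```python
import math

# Table 8.3: grams of pellets eaten and minutes of sleep, days 1-30.
pellets = [38, 40, 48, 42, 41, 47, 50, 43, 45, 49, 53, 45, 31, 50, 42,
           53, 54, 51, 60, 61, 62, 55, 65, 61, 60, 60, 68, 80, 58, 70]
sleep = [185, 220, 217, 260, 270, 235, 195, 270, 269, 258, 310, 420, 350,
         310, 210, 270, 304, 331, 321, 215, 265, 254, 300, 325, 335, 355,
         355, 435, 357, 330]
days = list(range(1, 31))


def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation coefficient r for paired lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Check 1: drop the two top-left points (assumed to be days 12 and 13).
keep = [i for i, day in enumerate(days) if day not in (12, 13)]
r_all = pearson_r(pellets, sleep)
r_trimmed = pearson_r([pellets[i] for i in keep], [sleep[i] for i in keep])
print(f"r-squared, all days: {r_all ** 2:.2f}; without days 12-13: {r_trimmed ** 2:.2f}")

# Check 2: do both measurements rise over the study as the days shorten?
print(f"r(day, pellets) = {pearson_r(days, pellets):.2f}")
print(f"r(day, sleep) = {pearson_r(days, sleep):.2f}")
```

The trimmed r-squared should come out at roughly 0.54, in line with the teacher’s 53%. Both day-based correlations are clearly positive (roughly 0.83 for pellets and 0.63 for sleep), which is the pattern the teacher points to in Figure 8.9: something that changes steadily over the 30 days, such as day length, could be pushing both measurements up together.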

INDEX

Abstract, laboratory report, 120
Adjective checklist, 71
Alpha calculation, 86
Ambiguities, experiencing, 6
ANOVA, 51–61, 65, 67, 134, 150–151, 153–154
  worksheet, 150–155
Assumptions, 47, 52–54, 121
  in laboratory report, 121
  t-test, 52
Axes
  x-y graph, 3
  x-y-z graph, 4
Bar charts, 23, 85, 134
Bar graphs, 1, 9, 11, 13–15, 19–21, 23, 25–26, 28, 49, 51, 65, 67, 69, 77, 121–122
Biology4all website, 131
Brief questions, 68
Calculators, 133–134
  statistical, 134
  t-test, 133
Captions, use of, 47–48, 117–118, 122
Causality vs. correlation, 38–39
Central tendency measure, 78
Chi-squared test, 77
Clear phrasing, 64
Communication, scientific, 45–48, 133
Comparative scale, 71
Concept maps, 123–129
Conclusions in science, 8–11
Confirmatory factor analysis, 86
Control, giving students, 128
Controlled variables, 21–22, 37, 93–94, 124
Controversial statements, 74
Correlation analysis, 49, 58–60, 87, 133–134, 143
  worksheet, 156–159
Correlation coefficient, 51–61, 133, 158
Correlation vs. causality, 38–39
Correlational relationship, 37, 39
  cause, effect covariance, 39
  lack of alternatives, 39
  temporal precedence, 39
Count questions, 66–67
Covarying relationships, 79–84
Cronbach’s alpha calculation, 86
Crosstabulation table, 8, 79–82, 84
Current student knowledge, 6
Curvilinear relationship, 37
Data scatter, 5, 9–11, 15, 21, 28, 38, 52–58, 140
Data table, 19–20, 31–32, 41–44, 59, 77, 79–81, 83, 87
Deterministic language states, 45
Directionality of questions, 87–88
Discrete categories questions, 42, 64–67
Discursive practices, 6
Discussing data in reports, 47–48, 121
Disorder, finding structure in, 8
Distribution, 52, 55, 135
Double-barreled questions, 71
Double-negative statements, 73–74
Dual answers, 72
Excel, 15, 131–134
Expertise of others, drawing on, 6
F-table critical values, 160–163
Face validity, 86
Feedback, 58, 60–61
Fixed sum scale, 71
Focused questions, 68, 72
Forced ranking scale, 71
Format, interval-ratio data table, 44
Frequency data analysis, 65, 79–84
Graphs, 1, 3–5, 8–11, 13–21, 23–29, 32–39, 41–44, 46–49, 51–58, 60–61, 65, 67, 69, 77–79, 83, 86, 91, 93–95, 99, 108, 110, 112,

167
Copyright © 2014 NSTA. All rights reserved. For more information, go to www.nsta.org/permissions.
Index

117–122, 124–125, 131–134, 137, 139, 141 Introduction, in laboratory report, 120
appropriateness of, 122 Investigation reports, discussing data in,
interpretation, 16–17 47–48
predicting from, 15
raw data importance, 14–15
structure, 117–118 Kruskal-Wallis test, 77
student misconceptions, 132

Laboratory reports, 47–48, 119–129


Harvard Forest Schoolyard Science Program, abstract, 120
131 assumptions, 121
Hedging language, 6, 14, 24, 30, 45–47, 51, discussion, 121
57–58, 60, 79, 82 interpretations, 121
Higher-order thinking, 7–8, 14, 16, 66, 84, 86 introduction, 120
skills, 7–8 methods, 120
uncertainty with, 8 results, 120
Higher-order variables, scaffolding toward, 16 title, 120
Histograms, 1, 19, 134 Learner communities, sharing with, 6
Levels of science inquiry, 7
Imposition of meaning, 8 Likert question, 68–71, 73, 77, 82, 84–85, 122
Independent variable types, 13, 36, 42, 117 Limitations to data, 24, 30
Inquiry levels, 7 Line graphs, 1, 14–15, 25–29, 33, 51, 67, 108,
Interpretation, 7, 15–16, 19, 36–37, 47, 64, 117, 121–122, 133
121, 136 Loaded statements, 74
bar graphs, 23–24
data analysis, 54
graphs, 16–17, 47 Mann-Whitney U test, 77
laboratory report, 121 Marble-rolling lab activity, 91–97
line graphs, 29–30 data sheet layout, 91–92
nuanced, 7 equipment setup, 91
Interval-ratio data, 13–14, 16, 19, 27, 30–39, focus questions, 93–96
42–44, 46, 49, 51, 66–67, 69, 77, 79, 86–87, Mathematics, data management, 23, 29, 132,
99, 101, 107, 113, 134 135–137
causality vs. correlation, 38–39 Meaning, imposition of, 8
correlational relationship, 39 Measure of central tendency, 78
cause, effect covariance, 39 Median, 22, 27, 36, 69, 78, 118, 135
lack of alternatives, 39 Memory, data tables as, 41–44
temporal precedence, 39 Methods, in laboratory report, 120
curvilinear relationship, 37 Missing categories, 74
interpretation, 35–38 Multichotomous questions, 66
non-discrete, 37–38 Multiple criteria, application of, 7
relationships between variables, 38 Multiple solutions, higher-order thinking, 7
traditional scatterplot, 37
trend bar, 34
x- and y- axis, 35 National Center for Education Statistics, 132
x-y data, 32 Nominal-level data, 13, 16, 19–24, 122
x-y graphs, interpreting, 33–34 controlled variables, 21
Interval-ratio variable tables, 42–44 interpreting bar graph, 23–24

168
Copyright © 2014 NSTA. All rights reserved. For more information, go to www.nsta.org/permissions.
INDEX

limitations to data, 24 Regulation of thinking process, 8


outliers, 19–20, 22, 46–47 Relationships, certainty regarding, 46
representing data, 22–23 Relationships between variables, 4–6, 38
Nominal variable tables, 41–42 Relative ordered relationship, 13
Non-discrete interval-ratio data, 37–38 Representation of data, 22–23
Nonalgorithmic thinking, 7 Research investigation reports, 47–48
Nonparametric data, 77, 86–87 Results, in laboratory reports, 120
Nonparametric equivalents, 77 Reviewing laboratory reports, 119–122
Normal distribution, 52, 55
Nuanced judgement, 7
Scaffolded investigation, 7, 16, 99–115
ball-bounce activity, 102–103
The Open Door Web Site, 131–132 effect of solutes, 100–101
Open-ended written responses, 70 elastic-stretch activity, 104–105
Open inquiry, 6–7 Scale construction, 84–87
higher-order thinking, 7–8 Scatter, data, 5, 9–11, 15, 21, 28, 38, 52–58,
laboratory activities, 6–7 140
student work, 6 Scatterplots, 1, 14, 32, 35, 37–39, 58, 60, 69,
Ordinal-level data, 13, 25–30, 35, 46, 66, 69, 79, 82–83, 87, 108, 122
85, 122 Science-class.net, 132
bar graphs, 28 Science inquiry, levels of, 7
interpreting line graph, 29–30 Self-regulation of thinking process, 8
limitations to data, 30 Semantic differential scale, 69, 71
line graphs, 28 Sharing with learner communities, 6
Ordinal scale, 71 Signed-rank test, 77
Ordinal variable tables, 41–42 Significant, vs. substantive difference, 48
Outliers, 19–20, 22, 46–47 Social desirability issues, 72
Specific questions, 65
Stapel scale, 71
Paired-comparison scale, 71 Statistical analysis, 10, 49, 51, 85, 143, 145,
Parametric data, 77, 84, 86–87 147, 149, 151, 153, 155, 157, 159, 161, 163
Post-hoc test, 57 Statistical analysis worksheets, 145–163
Probabilities, science and, 4–5, 9 Statistical calculators, 134
Statistics, 14–15, 51, 53, 55, 57, 59, 61, 118,
120–121, 131–134
Question types, 63, 67–69, 71 ANOVA test, 51–61
Questionnaires, 49, 63–68, 70, 72–73, 75, 77 case study, 59–60
Questions, in surveys, 71–75 correlation analysis, 58–60
correlation coefficient, 51–61
data scatter, 5, 9–11, 15, 21, 28, 38,
R- squared calculators, 134 52–58, 140
Randomly chosen data, 52 distribution, 52
Raw data graphing, importance of, 14–15 feedback, 60–61
Real-world data, 5–6 interpretation of data analysis, 54
Regression analysis, 49, 51, 58–60, 84, 133– normal distribution, 52
134, 156, 158 post-hoc test, 57
worksheet, 156–159 randomly chosen data, 52
Regression coefficients, 51–61 regression analysis, 58–60

169
Copyright © 2014 NSTA. All rights reserved. For more information, go to www.nsta.org/permissions.
Index

regression coefficient, 51–61 nonparametric equivalents, 77


student research question, 59–60 open-ended written responses, 70
t-test, assumptions, 52 ordinal scale, 71
t-test statistic, critical values, 54 paired-comparison scale, 71
t-tests, 51–61 parametric data, 87
temperature data graph, 52 question types, 71
Turkey’s test, 57–58 questionnaire, 73
Statistics Canada, 133 questions, 64–75
Structure in disorder, 8 scale construction, 84–87
Student research question, 59–60 semantic differential scale, 69, 71
Substantive vs. significant difference, 48 social desirability issues, 72
Substantive vs. statistical difference, 48 specific questions, 65
Summed scale, 87–88 Stapel scale, 71
Survey, 49, 63–69, 71–75, 77–81, 83–88, 122 summed scale, 87–88
Survey questions, 49, 64, 71–75, 86, 122 survey table, 68
Survey tables, 68 table questions, 67–68
Surveys, 49, 63–75, 77–88 unclear instructions, 73
adjective checklist, 71 verbal frequency scale, 71
brief questions, 68 Wilcoxon signed-rank test, 77
chi-squared test, 77 yes/no questions, 64–65
clear phrasing, 64
comparative scale, 71
confirmatory factor analysis, 86 T-tests, 49, 51–61, 65, 67, 69, 77, 84, 86–87,
controversial statements, 74 133–134, 139–141, 143–144, 146–147
count questions, 66–67 assumptions, 52
covarying relationships, 79–84 calculators, 133
Cronbach’s alpha calculation, 86 statistic, critical values, 54
crosstabulation table, 8, 79–80, 84 worksheet, 144–149
data table, 77 Tables, 1, 3–4, 7, 13, 18–20, 25, 31–32, 39,
directionality of questions, 87–88 41–44, 48–49, 51, 54–56, 58–60, 67–70,
double-barreled question, 71 77, 79–87, 91, 99, 105–106, 108–110, 112,
double-negative statements, 73–74 117–122, 124–125, 131–134, 139–140,
dual answers, 72 144–145, 147–148, 150–151, 153–154, 156,
face validity, 86 158, 160
fixed sum scale, 71 format, interval-ratio data table, 44
focus in question, 72 interval-ratio variable tables, 42–44
forced ranking scale, 71 memory, data tables as, 41–44
frequency data, 65, 79–84 nominal variable tables, 41–42
Kruskal-Wallis test, 77 ordinal variable tables, 41–42
Likert/attitude items, 68–70 questions, 67–68
Likert question, 77 structure, 117–118, 121–122
loaded statements, 74 survey, 68
Mann-Whitney U test, 77 Temperature data graph, 52
measure of central tendency, 78 Thinking process, regulation of, 8
median, 78 Three-axis x-y-z graph with labeled axes, 4
missing categories, 74 Title of laboratory report, 120
multichotomous questions, 66 Trend, 14–15, 33–34, 36–38, 47, 95–96, 122,
nonparametric data, 87 133

170
Copyright © 2014 NSTA. All rights reserved. For more information, go to www.nsta.org/permissions.
INDEX

Trend bar, 34 nominal data, 13


Turkey’s test, 57–58 nominal variables, 13
Types of independent variables, 13 ordinal data, 13
Types of questions, 63–71 predicting from graphs, 15
Types of variables, 13–24 raw data graphing, 14–15
Excel, 15 relationships between, 38
graphing, 16–21 relative ordered relationship, 13
higher-order variables, scaffolding types of independent variables, 13
toward, 16 typography of data variables, 13–14
independent variables, 13 Vee maps, 97, 119, 123–129
interpreting graphs, 16–17 Verbal frequency scale, 71
interval-ratio data, 13–14 Visual data, 132–133
nominal data, 13
nominal variables, 13
ordinal data, 13 Web resources, 89, 131–133, 143
predicting from graphs, 15 Wilcoxon signed-rank test, 77
raw data graphing, 14–15 Worksheets, 14, 51, 54, 58, 60, 89, 91, 93,
relative ordered relationship, 13 95, 97, 131, 143–145, 147, 149–151, 153,
Typography, data variables, 13–14 155–159, 161, 163
ANOVA, 150–155
correlation analysis coefficients, 156–159
Uncertainty with higher-order thinking, 8, 132 regression analysis coefficients, 156–159
Unclear instructions, 73 statistical analysis, 145–163
t-tests, 144–149

Variables, 1, 3–6, 13–32, 35–39, 41–44, 46–


47, 49, 51–53, 56, 58, 67, 78–80, 82, 84, 86, X- and y- axis, 35
93–94, 97, 99, 108, 110, 117–118, 122, 124, X-y data, 32, 60
133, 135–137, 157, 159 X-y graphs
Excel, 15 interpreting, 32–34
graphing, 16–21 with labeled axes, 3
higher-order variables, scaffolding X-y-z graph, labeled axes, 4
toward, 16
interpreting graphs, 16–17 Yes/no questions, 64–65
interval-ratio data, 13–14

171
Copyright © 2014 NSTA. All rights reserved. For more information, go to www.nsta.org/permissions.
“Part of being able to take a more informed (some might say skeptical) view of data is
being literate in how data are manipulated and subsequently presented: how they are collected, made
into tables, and shown in pictures or graphs. Once you know how to do this the right way, such as you
might learn in a science classroom, you can start asking if someone else is doing it in a way that is
fair, or if they are distorting the data for their own purposes.”
—From the foreword to The Basics of Data Literacy

Authors Michael Bowen and Anthony Bartley have long known how important data literacy
is to informed citizens. But after years of leading workshops on data literacy, they saw just
how intimidated teachers can be at the prospect of helping students make sense of data
sets they have collected.

In response, Bowen and Bartley wrote this guide—the ideal book for teachers with
little or no statistics background. With its informal tone and easy-to-grasp examples,
The Basics of Data Literacy teaches you how to help your students collect,
summarize, and analyze data inside and outside the classroom. This book helps
you understand how to make sense of data in a way that
• is conceptually grounded in hands-on practices,
• reflects the ways scientists use and make sense of data, and
• extends the ways of understanding to simple statistical analysis.

Because it is so central to many of the ideas in the Next Generation
Science Standards, the ability to work with data is an important
science skill for both you and your students. This accessible book
will help you overcome your anxiety so you can teach your students
how to evaluate messy data from their own investigations, the
internet, and the news, as well as in future negotiations with
car dealers and insurance agents.

PB343X
Grades 6–12
ISBN: 978-1-938946-03-5
