SAGE UNIVERSITY PAPERS
Series: Quantitative Applications in the Social Sciences

Series Editor: Michael S. Lewis-Beck, University of Iowa

Editorial Consultants
Richard A. Berk, Sociology, University of California, Los Angeles
William D. Berry, Political Science, Florida State University
Kenneth A. Bollen, Sociology, University of North Carolina, Chapel Hill
Linda B. Bourque, Public Health, University of California, Los Angeles
Jacques A. Hagenaars, Social Sciences, Tilburg University
Sally Jackson, Communications, University of Arizona
Richard M. Jaeger (recently deceased), Education, University of North Carolina, Greensboro
Gary King, Department of Government, Harvard University
Roger E. Kirk, Psychology, Baylor University
Helena Chmura Kraemer, Psychiatry and Behavioral Sciences, Stanford University
Peter Marsden, Sociology, Harvard University
Helmut Norpoth, Political Science, SUNY, Stony Brook
Frank L. Schmidt, Management and Organization, University of Iowa
Herbert Weisberg, Political Science, The Ohio State University

Publisher: Miller McCune, Sage Publications, Inc.

INSTRUCTIONS TO POTENTIAL CONTRIBUTORS
For guidelines on submission of a monograph proposal to this series, please write Michael S. Lewis-Beck, Editor, Sage QASS Series, Department of Political Science, University of Iowa, Iowa City, IA 52242.

Series Number 07-017

RELIABILITY AND VALIDITY ASSESSMENT

EDWARD G. CARMINES
Indiana University

RICHARD A. ZELLER
Bowling Green State University

SAGE PUBLICATIONS
The International Professional Publishers
Newbury Park  London  New Delhi

Copyright © 1979 by Sage Publications, Inc.

All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

For information address: SAGE Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: order@sagepub.com

SAGE Publications Ltd.
6 Bonhill Street
London EC2A 4PU
United Kingdom

SAGE Publications India Pvt. Ltd.
M-32 Market, Greater Kailash I
New Delhi 110 048
India

Printed in the United States of America

International Standard Book Number 0-8039-1371-0
Library of Congress Catalog Card No. L.C. 79-67629

When citing a university paper, please use the proper form. Remember to cite the Sage University Paper series title and include the paper number. One of the following formats can be adapted (depending on the style manual used):

(1) HENKEL, RAMON E. (1976) Tests of Significance. Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-004. Newbury Park, CA: Sage.

or

(2) Henkel, R. E. (1976). Tests of significance (Sage University Paper Series on Quantitative Applications in the Social Sciences, series no. 07-004). Newbury Park, CA: Sage.

CONTENTS

Editor's Introduction
1. Introduction
   Definition of Measurement
   Reliability and Validity Defined
   Random and Nonrandom Measurement Error
   Conclusion
2. Validity
   Criterion-Related Validity
   Content Validity
   Construct Validity
   Conclusion
3. Classical Test Theory
   Reliability of Measurements
   Parallel Measurements
   Conclusion
4. Assessing Reliability
   Retest Method
   Alternative-Form Method
   Split-Halves Method
   Internal Consistency Method
   Correction for Attenuation
   Conclusion
Notes
References
Appendix: The Place of Factor Analysis in Reliability and Validity Assessment
   Factor Analysis and Reliability Estimation
   Factor Analysis and Construct Validity
   Conclusion
About the Authors

1. INTRODUCTION

Definition of Measurement

The notion that measurement is crucial to science seems a commonplace and unexceptional observation. Most book-length treatments of the philosophy of science include a discussion of the topic.
And books focusing on research methods invariably have a chapter dealing with the problems associated with measurement. Yet, the widespread acknowledgment of the importance of good measurement has not—until quite recently—led to the development of systematic and general approaches to measurement in the social sciences. Quite the contrary, historically, measurement has been more of an abstract, almost ritualistic concern instead of being an integral and central aspect of the social sciences.

The coexistence of this asymmetric condition of ritualistic commitment but lack of systematic attention with regard to measurement may be partially attributable to the way in which this term is most commonly defined. The most popular definition of measurement is that provided by Stevens more than 25 years ago. "Measurement," Stevens wrote, "is the assignment of numbers to objects or events according to rules" (1951: 22). The problem with this definition, from the point of view of the social scientist, is that, strictly speaking, many of the phenomena to be measured are neither objects nor events. Rather, the phenomena to be measured are typically too abstract to be adequately characterized as either objects or events. Thus, for example, phenomena such as political efficacy, alienation, gross national product, and cognitive dissonance are too abstract to be considered "things that can be seen or touched" (the definition of an object) or merely as a "result, consequence, or outcome" (the definition of an event). In other words, Stevens's classical definition of measurement is much more appropriate for the physical than the social sciences. Indeed, it may have inadvertently impeded efforts to focus systematically on measurement in social research.

A definition of measurement that is more relevant to the social sciences is that suggested by Blalock's observation that:

Sociological theorists often use concepts that are formulated at rather high levels of abstraction.
These are quite different from the variables that are the stock-in-trade of empirical sociologists. ... The problem of bridging the gap between theory and research is then seen as one of measurement error [1968: 6, 12].

In other words, measurement is most usefully viewed as the "process of linking abstract concepts to empirical indicants" (Zeller and Carmines, forthcoming), as a process involving an "explicit, organized plan for classifying (and often quantifying) the particular sense data at hand—the indicants—in terms of the general concept in the researcher's mind" (Riley, 1963: 23).

This definition makes it clear that measurement is a process involving both theoretical as well as empirical considerations. From an empirical standpoint, the focus is on the observable response—whether it takes the form of a mark on a self-administered questionnaire, the behavior recorded in an observational study, or the answer given to an interviewer. Theoretically, interest lies in the underlying unobservable (and directly unmeasurable) concept that is represented by the response. Thus, using the above examples, the "mark" may represent one's level of self-esteem, the "behavior" may indicate one's level of personal integration during a conflict situation, and the "answer" may signify one's attitude toward President Carter. Measurement focuses on the crucial relationship between the empirically grounded indicator(s)—that is, the observable response—and the underlying unobservable concept(s). When this relationship is a strong one, analysis of empirical indicators can lead to useful inferences about the relationships among the underlying concepts. In this manner, social scientists can evaluate the empirical applicability of theoretical propositions.
On the other hand, if the theoretical concepts have no empirical referents, then the empirical tenability of the theory must remain unknown. But what of those situations in which the relationship between concepts and indicators is weak or faulty? In such instances, analysis of the indicators can possibly lead to incorrect inferences and misleading conclusions concerning the underlying concepts. Most assuredly, research based on such inadequate measurement models does not result in a greater understanding of the particular social science phenomenon under investigation. Viewed from this perspective, the auxiliary theory specifying the relationship between concepts and indicators is equally important to social research as the substantive theory linking concepts to one another.

Reliability and Validity Defined

Given the above definition of measurement, the question naturally arises as to how social scientists can determine the extent to which a particular empirical indicator (or a set of empirical indicators) represents a given theoretical concept. How, for example, can one evaluate the degree to which the four items used to measure political efficacy in The American Voter (Campbell et al., 1960) accurately represent that concept? Stated somewhat differently, what are the desirable qualities of any measuring procedure or instrument?

At the most general level, there are two basic properties of empirical measurements. First, one can examine the reliability of an indicator. Fundamentally, reliability concerns the extent to which an experiment, test, or any measuring procedure yields the same results on repeated trials. The measurement of any phenomenon always contains a certain amount of chance error. The goal of error-free measurement—while laudable—is never attained in any area of scientific investigation. Instead, as Stanley has observed, "The amount of chance error may be large or small, but it is universally present to some extent.
Two sets of measurements of the same features of the same individuals will never exactly duplicate each other" (1971: 356). Some particular sources of chance error will be discussed later in this chapter. For the moment it is simply necessary to realize that because repeated measurements never exactly equal one another, unreliability is always present to at least a limited extent.

But while repeated measurements of the same phenomenon never precisely duplicate each other, they do tend to be consistent from measurement to measurement. The person with the highest blood pressure on a first reading, for example, will tend to be among those with the highest readings on a second examination given the next day. And the same will be true among the entire group of patients whose blood pressure is being recorded: Their readings will not be exactly the same from one measurement to another but they will tend to be consistent. This tendency toward consistency found in repeated measurements of the same phenomenon is referred to as reliability. The more consistent the results given by repeated measurements, the higher the reliability of the measuring procedure; conversely, the less consistent the results, the lower the reliability.

But an indicator must be more than reliable if it is to provide an accurate representation of some abstract concept. It must also be valid. In a very general sense, any measuring device is valid if it does what it is intended to do. An indicator of some abstract concept is valid to the extent that it measures what it purports to measure. For example, the California F Scale (Adorno et al., 1950) is considered a valid measure of adherence to authoritarian beliefs to the degree that it does measure this theoretical concept rather than reflecting some other phenomenon.
Thus, while reliability focuses on a particular property of empirical indicators—the extent to which they provide consistent results across repeated measurements—validity concerns the crucial relationship between concept and indicator. This is another way of saying that there are almost always theoretical claims being made when one assesses the validity of social science measures. Indeed, strictly speaking, one does not assess the validity of an indicator but rather the use to which it is being put. For example, an intelligence test may be valid for assessing the native intellectual potential of students, but it would not necessarily be valid for other purposes, such as forecasting their level of income during adulthood (Nunnally, 1978).

Just as reliability is a matter of degree, so also is validity. Thus, the objective of attaining a perfectly valid indicator—one that represents the intended, and only the intended, concept—is unachievable. Instead, validity is a matter of degree, not an all-or-none property. Moreover, just because an indicator is quite reliable, this does not mean that it is also relatively valid. For example, let us assume that a particular yardstick does not equal 36 inches; instead, the yardstick is 40 inches long. Thus, every time this yardstick is used to determine the height of a person (or object), it systematically underestimates height by 4 inches for every 36 inches. A person who is six feet tall according to this yardstick, for example, is actually six feet eight inches in height. This particular yardstick, in short, provides an invalid indication of height. Note, however, that this error of 4 inches per yard will not affect the reliability of the yardstick since it does not lead to inconsistent results on repeated measurements. On the contrary, the results will be quite consistent although they will obviously be incorrect. In short, this particular yardstick will provide a quite reliable but totally invalid indication of height.
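The yardstick illustration lends itself to a short simulation. The sketch below is a minimal illustration of the point, not part of the original text; the heights, the noise level, and all variable names are invented. It contrasts a systematically biased but noise-free instrument (reliable, invalid) with an unbiased but noisy one (less reliable, not systematically wrong):

```python
import numpy as np

rng = np.random.default_rng(0)
true_height = rng.normal(68.0, 3.0, 500)     # true heights in inches (invented data)

# A 40-inch "yardstick" read as if it were 36 inches: every reading
# understates the true value, but does so identically on every trial.
def faulty_yardstick(height):
    return height * 36.0 / 40.0

biased_1 = faulty_yardstick(true_height)
biased_2 = faulty_yardstick(true_height)     # repeated measurement, no chance error

# An unbiased tape measure jostled by random error on each use.
noisy_1 = true_height + rng.normal(0.0, 2.0, 500)
noisy_2 = true_height + rng.normal(0.0, 2.0, 500)

# Consistency across repeated trials (a stand-in for reliability):
print(np.corrcoef(biased_1, biased_2)[0, 1])   # effectively 1.0: perfectly reliable
print(np.corrcoef(noisy_1, noisy_2)[0, 1])     # well below 1: random error lowers reliability

# Systematic error (the source of invalidity):
print(np.mean(biased_1 - true_height))         # clearly negative: biased, hence invalid
print(np.mean(noisy_1 - true_height))          # near 0: random error has no direction
```

The biased yardstick agrees with itself perfectly from trial to trial, yet every reading is wrong in the same direction—exactly the reliable-but-invalid case described above.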
Random and Nonrandom Measurement Error

There are two basic kinds of errors that affect empirical measurements: random error and nonrandom error. Random error is the term used to designate all of those chance factors that confound the measurement of any phenomenon. The amount of random error is inversely related to the degree of reliability of the measuring instrument. To take a practical example, if a scale gives grossly inaccurate indications of the weight of objects—sometimes greatly overweighing them and other times underweighing them—then the particular scale is quite unreliable. Similarly, if the shots fired from a well-anchored rifle are scattered widely about the target, then the rifle is unreliable. But if the shots are concentrated around the target, then the rifle is reliable. Thus, a highly reliable indicator of a theoretical concept is one that leads to consistent results on repeated measurements because it does not fluctuate greatly due to random error.

While a formal discussion of random error and its effect on reliability estimation will be presented later in this volume, it is important for present purposes to make two observations about random error. First, indicators always contain random error to a greater or lesser degree. That is, the very process of measurement introduces random error to at least a limited extent. The distinction among indicators, therefore, is not whether they contain random error, but rather the extent to which they contain random error. The second point that needs to be emphasized is that, as suggested above, the effects of random error are totally unsystematic in character. Referring to the earlier example of the rifle, random error would be indicated if the shots were as likely to hit above the target as below it, or as likely to hit to the right of the target as to its left.
Similarly, a scale that is affected by random error will sometimes overweigh a particular object and on other occasions underweigh it.

The specific sources of random measurement error that arise in the social sciences are too numerous to fully enumerate. In survey research, the kinds of errors that may be assumed to be random include errors due to coding, ambiguous instructions, differential emphasis on different words during an interview, interviewer fatigue, and the like. But random error is not limited to survey research. It also arises in data collected from participant observation and content analysis, as well as simulations and experiments. Random measurement error is endemic to social research, as it is to all areas of scientific investigation including the physical and biological sciences.

The second type of error that affects empirical measurements is nonrandom error. Unlike random error, nonrandom error has a systematic biasing effect on measuring instruments. Thus, a scale that always registers the weight of an object two pounds below its actual weight is affected by nonrandom measurement error. Similarly, if a thermometer always registers 10 degrees higher than it should, then it is evidencing nonrandom measurement error. A third example of nonrandom measurement error can be given by slightly altering our earlier illustration focusing on the shots fired from a well-anchored rifle. If those shots aimed at the bull's eye hit approximately the same location but not the bull's eye, then some form of nonrandom error has affected the targeting of the rifle.

Nonrandom error lies at the very heart of validity. As Althauser and Heberlein observe, "matters of validity arise when other factors—more than one underlying construct or methods factors or other unmeasured variables—are seen to affect the measures in addition to one underlying concept and random error" (1970: 152; see also Werts and Linn, 1970).
That is, invalidity arises because of the presence of nonrandom error, for such error prevents indicators from representing what they are intended to: the theoretical concept. Instead, the indicators represent something other than the intended theoretical concept—perhaps a different concept entirely. Thus, if a researcher uses a particular scale to represent ideological preference but later discovers that the scale actually taps party identification, then the scale is obviously an invalid indicator of ideology.

Just as reliability is inversely related to the amount of random error, so validity depends on the extent of nonrandom error present in the measurement process. For example, high scorers on the California F Scale (Adorno et al., 1950) have been shown to be persons who not only adhere to authoritarian beliefs but also are "yea-sayers" who agree with just about any assertion. In other words, the California F Scale seems to measure two different phenomena: adherence to authoritarian beliefs and the personality trait of acquiescence. The California F Scale, in short, is not a totally valid measure of adherence to authoritarian beliefs. However, it would be a far less valid measure of this concept if later research concluded that the scale only measured acquiescence. This is another way of saying that validity, like reliability, is a matter of degree, and that it critically depends on the extent of nonrandom error in the measurement procedure (just as reliability depends on the amount of random error).

Conclusion

Reliability and especially validity are words that have a definite positive connotation. For anything to be characterized as reliable and valid is to be described in positive terms.
So it is with any type of test, experiment, or measuring procedure. If it is reliable and valid, then it has gone a long way toward gaining scientific acceptance. Reliability concerns the degree to which results are consistent across repeated measurements. An intelligence test is quite reliable, for example, if an individual obtains approximately the same score on repeated examinations. Any measuring instrument is relatively reliable if it is minimally affected by chance disturbances (i.e., random measurement error). But empirical measures that are reliable have only come halfway toward achieving scientific acceptance. They must also be valid for the purpose for which they are being used. Reliability is basically an empirical issue, focusing on the performance of empirical measures. Validity, in contrast, is usually more of a theoretically oriented issue because it inevitably raises the question, "valid for what purpose?" Thus, a driver's test may be quite valid as an indicator of how well someone can drive an automobile but it is probably quite invalid for many other purposes, such as one's potential for doing well in college. Validity, then, is evidenced by the degree that a particular indicator measures what it is supposed to measure rather than reflecting some other phenomenon (i.e., nonrandom measurement error).

In the beginning of this chapter we noted that, following Stevens, measurement is usually defined as the assignment of numbers to objects or events according to rules. But as we have seen, for any measuring procedure to be scientifically useful, it must lead to results that are relatively reliable and valid. In other words, viewed from a scientific perspective, it is crucial that the process of assigning numbers to objects or events leads to results that are generally consistent and fulfills its explicit purpose.
The same point holds for Blalock's more social science oriented definition of measurement. Thus, for an indicator to be useful in social science research, it must lead to quite consistent results on repeated measurements and reflect its intended theoretical concept.

This chapter has outlined some basic considerations in measurement, especially in regard to the social sciences. The remaining chapters in this monograph will expand upon this discussion. Chapter 2 will consider the various types of validity that are relevant in the social sciences. Chapter 3 will outline the logical, empirical, and statistical foundations of the theory of (random) measurement error, and Chapter 4 will discuss a variety of procedures for assessing the reliability of empirical measurements. Finally, the appendix will discuss and illustrate the role of factor analysis in assessing the reliability and validity of multi-item measures.

2. VALIDITY

In Chapter 1 we defined validity as the extent to which any measuring instrument measures what it is intended to measure. However, as we pointed out in Chapter 1, strictly speaking, "One validates, not a test, but an interpretation of data arising from a specified procedure" (Cronbach, 1971: 447). The distinction is central to validation because it is quite possible for a measuring instrument to be relatively valid for measuring one kind of phenomenon but entirely invalid for assessing other phenomena. Thus, one validates not the measuring instrument itself but the measuring instrument in relation to the purpose for which it is being used.

While the definition of validity seems simple and straightforward, there are several different types of validity that are relevant in the social sciences. Each of these types of validity takes a somewhat different approach in assessing the extent to which a measure measures what it purports to.
The primary purpose of this chapter is to discuss the three most basic types of validity, pointing out their different meanings, uses, and limitations.

Criterion-Related Validity

Criterion-related validity (sometimes referred to as predictive validity) has the closest relationship to what is meant by the everyday usage of the term. That is, this type of validity has an intuitive meaning not shared by other types of validity. Nunnally has given a useful definition of criterion-related validity. Criterion-related validity, he notes, "is at issue when the purpose is to use an instrument to estimate some important form of behavior that is external to the measuring instrument itself, the latter being referred to as the criterion" (1978: 87). For example, one "validates" a written driver's test by showing that it accurately predicts how well some group of persons can operate an automobile. Similarly, one assesses the validity of college board exams by showing that they accurately predict how well high school seniors will do in college.

The operational indicator of the degree of correspondence between the test and the criterion is usually estimated by the size of their correlation. Thus, in practice, for some well-defined group of subjects, one correlates performance on the test with performance on the criterion variable (this correlation, for obvious reasons, is sometimes referred to as a validity coefficient). Obviously, the test will not be useful unless it correlates significantly with the criterion; and similarly, the higher the correlation, the more valid is this test for this particular criterion.

We have said that the degree of criterion-related validity depends on the extent of the correspondence between the test and the criterion. It is important to realize that this is the only kind of evidence that is relevant to criterion-related validity.
Thus, to take a rather unlikely example, "if it were found that accuracy in horseshoe pitching correlated highly with success in college, horseshoe pitching would be a valid measure for predicting success in college" (Nunnally, 1978: 88). The obtained correlation tells the entire story as regards criterion-related validity. Thus, criterion-related validity lends itself to being used in an atheoretical, empirically dominated manner. Nevertheless, theory usually enters the process indirectly because there must be some basis on which to select the criterion variables. Notice, further, that there is no single criterion-related validity coefficient. Instead, there are as many coefficients as there are criteria for a particular measure.

Technically, one can differentiate between two types of criterion-related validity. If the criterion exists in the present, then concurrent validity is assessed by correlating a measure and the criterion at the same point in time. For example, a verbal report of voting behavior could be correlated with participation in an election, as revealed by official voting records. Predictive validity, on the other hand, concerns a future criterion which is correlated with the relevant measure. Tests used for selection purposes in different occupations are, by nature, concerned with predictive validity. Thus, a test used to screen applicants for police work could be validated by correlating their test scores with future performance in fulfilling the duties and responsibilities associated with police work. Notice that the logic and procedures are the same for both concurrent and predictive validity; the only difference between them concerns the current or future existence of the criterion variable.
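In practice, computing a validity coefficient amounts to computing a correlation between test scores and criterion scores. A minimal sketch follows; the scores and variable names are invented for illustration (a written driver's test against a later road-performance rating):

```python
import numpy as np

# Hypothetical data: written driver's-test scores and a later
# road-performance rating (the criterion) for the same ten people.
test = np.array([55, 62, 70, 74, 78, 81, 85, 90, 93, 97], dtype=float)
criterion = np.array([50, 60, 65, 72, 70, 80, 82, 88, 95, 94], dtype=float)

# The validity coefficient is simply the Pearson correlation
# between performance on the test and performance on the criterion.
validity_coefficient = np.corrcoef(test, criterion)[0, 1]
print(round(validity_coefficient, 2))
```

A different criterion (say, college grades instead of road performance) would yield a different coefficient for the same test, which is why there is no single criterion-related validity coefficient for a measure.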
It is important to recognize that the scientific and practical utility of criterion validation depends as much on the measurement of the criterion as it does on the quality of the measuring instrument itself. This is sometimes overlooked in setting up and assessing validation procedures. Thus, in many different types of training programs, much effort and expense goes into the development of a test for predicting who will benefit from the program in terms of subsequent job performance. Take, for example, a managerial training program in which a screening test is used to select those few individuals who will be given supervisory responsibilities upon completion of the program. How is their subsequent performance—the criterion—measured? Often very little attention is given to the measurement of the criterion. Moreover, it is usually the case that subsequent performance is difficult to measure under the best of circumstances because, as Cronbach observes, "success on the job depends on nonverbal qualities that are hard to assess" (1971: 487). In short, those employing criterion validation procedures should provide independent evidence of the extent to which the measurement of the criterion is valid. Indeed, Cronbach has suggested that "all validation reports carry the warning clause, 'Insofar as the criterion is truly representative of the outcome we wish to maximize'" (1971: 488).

As we have seen, the logic underlying criterion validity is quite simple and straightforward. It has been used mainly in psychology and education for analyzing the validity of certain types of tests and selection procedures. It should be used in any situation or area of scientific inquiry in which it makes sense to correlate scores obtained on a given test with performance on a particular criterion or set of relevant criteria.

At the same time, it is important to recognize that criterion validation procedures cannot be applied to all measurement situations in the social sciences.
The most important limitation is that, for many if not most measures in the social sciences, there simply do not exist any relevant criterion variables. For example, what would be an appropriate criterion for a measure of a personality trait such as self-esteem? We know of no specific type of behavior that people with high or low self-esteem exhibit such that it could be used to validate a measure of this personality trait. Generalizing from this situation, it is not difficult to see that criterion validation procedures have rather limited usefulness in the social sciences for the simple reason that, in many situations, there are no criteria against which the measure can be reasonably evaluated. Moreover, it is clear that the more abstract the concept, the less likely one is to discover an appropriate criterion for assessing a measure of it. In sum, however desirable it may be to evaluate the criterion-related validity of social science measures, it is simply inapplicable to many of the abstract concepts used in the social sciences.

Content Validity

A second basic type of validity is content validity. This type of validity has played a major role in the development and assessment of various types of tests used in psychology and especially education but has not been employed widely by political scientists or sociologists. Fundamentally, content validity depends on the extent to which an empirical measurement reflects a specific domain of content. For example, a test in arithmetical operations would not be content valid if the test problems focused only on addition, thus neglecting subtraction, multiplication, and division. By the same token, a content-valid measure of Seeman's (1959) concept of alienation should include attitudinal items representing powerlessness, normlessness, meaninglessness, social isolation, and self-estrangement.

The above examples indicate that obtaining a content-valid measure of any phenomenon involves a number of interrelated steps.
First, the researcher must be able to specify the full domain of content that is relevant to the particular measurement situation. In constructing a spelling test for fourth graders, for example, one must specify all of the words that a fourth grader should know how to spell. Second, one must sample specific words from this collection since it would be impractical to include all of these words in a single test. While it would be possible to select the sample of words for the test by simple random procedures, it might be important under certain circumstances to "oversample" particular types of words (e.g., nouns). Thus, the person constructing the test must be careful to specify the particular sampling procedures to be employed. Finally, once the words have been selected, they must be put in a form that is testable. For example, one might use a multiple-choice procedure whereby the correct spelling of the word would be included with several incorrect spellings, with the students having to choose the former. What should emerge from this process is a spelling test that adequately reflects the domain of content that is to be measured by the test.

To take a different example, how would one go about establishing a content-valid measure of an attitude such as alienation? Presumably, one would begin by thoroughly exploring the available literature on alienation, hoping thereby to come to an understanding of the phenomenon. A thorough search and examination of the literature may suggest, for example, that alienation is properly conceived of in terms of the five dimensions proposed by Seeman: powerlessness, normlessness, meaninglessness, social isolation, and self-estrangement. In addition, it may be useful to further subdivide these dimensions. One may want to subdivide powerlessness, for example, into its political, social, and economic aspects.
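The item-sampling step described above for the spelling test can be sketched in a few lines. Everything below—the tiny word lists, the category scheme, the sample sizes—is invented purely for illustration; a real test would start from the full specified domain:

```python
import random

# Hypothetical, tiny "domain of content"; in practice the full word
# list for fourth graders would be specified before sampling.
domain = {
    "noun": ["bicycle", "library", "mountain", "neighbor", "calendar"],
    "verb": ["receive", "believe", "whistle", "measure"],
    "other": ["through", "against", "beautiful"],
}

random.seed(42)

# Simple random sampling treats every word in the domain alike ...
flat = [word for words in domain.values() for word in words]
simple_sample = random.sample(flat, 4)

# ... while "oversampling" a category (here, nouns) deliberately
# weights the draw toward that type of word.
oversampled = random.sample(domain["noun"], 3) + random.sample(
    domain["verb"] + domain["other"], 1
)

print(simple_sample)
print(oversampled)
```

Either way, the sampling procedure must be specified in advance—the point of the text is that the defensibility of the test rests on the stated domain and the stated sampling rule, not on the particular words drawn.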
It is then necessary to construct items that reflect the meaning associated with each dimension and each subdimension of alienation. It is impossible to specify exactly how many items need to be developed for any particular domain of content. But one point can be stated with confidence: It is always preferable to construct too many items rather than too few. Inadequate items can always be eliminated, but one is rarely in a position to add "good" items at a later stage in the research when the original pool of such items is inadequate.

From the above discussion, it should be clear that establishing a content-valid measure of an attitude such as alienation is far more difficult than establishing a content-valid achievement or proficiency test in some area (such as the spelling test above). There are two subtle but important differences between the two situations. First, however easy it may be to specify the domain of content relevant to a spelling test, the process is considerably more complex when dealing with the abstract concepts typically found in the social sciences. Indeed, it is difficult to think of any abstract theoretical concept, including alienation, for which there is an agreed-upon domain of content relevant to the phenomenon. Theoretical concepts in the social sciences have simply not been described with the required exactness. The second, related problem is that, in measuring most concepts in the social sciences, it is impossible to sample content. Rather, one formulates a set of items that is intended to reflect the content of a given theoretical concept. Without a random sampling of content, however, it is impossible to insure the representativeness of the particular items.

These differences reveal quite clearly the rather fundamental limitations of content validity. In content validity, as Cronbach and Meehl observe, the "acceptance of the universe of content as defining the variable to be measured is essential" (1955: 282).
As we have illustrated, however easy this may be to achieve with regard to reading or arithmetic tests, it has proved to be exceedingly difficult with respect to measures of the more abstract phenomena that tend to characterize the social sciences. Second, there is no agreed-upon criterion for determining the extent to which a measure has attained content validity. In the absence of well-defined, objective criteria, Nunnally has noted that "inevitably content validity rests mainly on appeals to reason regarding the adequacy with which important content has been sampled and on the adequacy with which the content has been cast in the form of test items" (1978: 93). Indeed, Bohrnstedt has argued that "while we enthusiastically endorse the procedures, we reject the concept of content validity on the grounds that there is no rigorous way to assess it" (forthcoming). In sum, while one should attempt to insure the content validity of any empirical measurement, these twin problems have prevented content validation from becoming fully sufficient for assessing the validity of social science measures.

Construct Validity

We have suggested that both criterion validity and content validity have limited usefulness for assessing the validity of empirical measures of theoretical concepts employed in the social sciences. It is partly for this reason that primary attention has been focused on construct validity. As Cronbach and Meehl observe, "Construct validity must be investigated whenever no criterion or universe of content is accepted as entirely adequate to define the quality to be measured" (1955: 282). Construct validity is woven into the theoretical fabric of the social sciences, and is thus central to the measurement of abstract theoretical concepts.
Indeed, as we will see, construct validation must be conceived of within a theoretical context. Fundamentally, construct validity is concerned with the extent to which a particular measure relates to other measures consistent with theoretically derived hypotheses concerning the concepts (or constructs) that are being measured.

While the logic of construct validation may at first seem complicated, it is actually quite simple and straightforward, as the following example illustrates. Suppose a researcher wanted to evaluate the construct validity of a particular measure of self-esteem, say, Rosenberg's self-esteem scale. Theoretically, Rosenberg (1965) has argued that a student's level of self-esteem is positively related to participation in school activities. Thus, the theoretical prediction is that the higher the level of self-esteem, the more active the student will be in school-related activities. One then administers Rosenberg's self-esteem scale to a group of students and also determines the extent of their involvement in school activities. These two measures are then correlated, thus obtaining a numerical estimate of the relationship. If the correlation is positive and substantial, then one piece of evidence has been adduced to support the construct validity of Rosenberg's self-esteem scale.

Construct validation involves three distinct steps. First, the theoretical relationship between the concepts themselves must be specified. Second, the empirical relationship between the measures of the concepts must be examined. Finally, the empirical evidence must be interpreted in terms of how it clarifies the construct validity of the particular measure. It should be clear that the process of construct validation is, by necessity, theory-laden.
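The second step, examining the empirical relationship, can be made concrete with a small numerical sketch. The scores below are invented for illustration (nothing here comes from Rosenberg's actual data), and `pearson_r` is simply the standard product-moment correlation: a positive, substantial coefficient would count as one piece of evidence for construct validity.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: self-esteem scale scores and counts of
# school activities for eight students.
self_esteem   = [12, 15, 9, 20, 17, 8, 14, 19]
participation = [2, 3, 1, 5, 4, 1, 2, 4]

r = pearson_r(self_esteem, participation)
print(round(r, 2))  # positive and substantial in this invented example
```

Note that the coefficient by itself completes only the second step; the third step, interpreting what the coefficient says about construct validity, remains a theoretical judgment.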
Indeed, strictly speaking, it is impossible to "validate" a measure of a concept in this sense unless there exists a theoretical network that surrounds the concept. For without this network, it is impossible to generate theoretical predictions which, in turn, lead directly to empirical tests involving measures of the concept. This should not lead to the erroneous conclusion that only formal, fully developed theories are relevant to construct validation. On the contrary, as Cronbach and Meehl observe:

The logic of construct validation is involved whether the construct is highly systematized or loose, used in ramified theory or a few simple propositions, used in absolute propositions or probability statements [1955: 284].

What is required is that one be able to state several theoretically derived hypotheses involving the particular concept.

The more elaborate the theoretical framework, of course, the more rigorous and demanding the evaluation of the construct validity of the empirical measure. Notice that in the self-esteem example discussed above, we concluded that the positive association between Rosenberg's self-esteem scale and participation in school activities provided one piece of evidence supporting the construct validity of this measure. Greater confidence in the construct validity of this measure of self-esteem would be justified if subsequent analyses revealed numerous successful predictions involving diverse, theoretically related variables. Thus, construct validity is not established by confirming a single prediction on different occasions or confirming many predictions in a single study. Instead, construct validation ideally requires a pattern of consistent findings involving different researchers using different theoretical structures across a number of different studies.

But what is a researcher to conclude if the evidence relevant to construct validity is negative?
That is, if the theoretically derived predictions and the empirical relationships are inconsistent with each other, what is the appropriate inference? Four different interpretations are possible (Cronbach and Meehl, 1955). The most typical interpretation of such negative evidence is that the measure lacks construct validity. With this interpretation, it is concluded that the indicator does not measure what it purports to measure. This does not mean, of course, that the indicator does not measure some other theoretical construct, but only that it does not measure the construct of interest. In other words, as negative evidence accumulates, the inference is usually drawn that the measure lacks construct validity as a measure of a particular theoretical concept. Consequently, it should not be used as an empirical manifestation of that concept in future research. Moreover, previous research employing that measure of the concept is also called into serious question.

Unfortunately, however, this is not the only conclusion that is consistent with negative evidence based on construct validation. Negative evidence may also support one or more of the following inferences. First, the theoretical framework used to generate the empirical predictions may be incorrect. To continue with the earlier example, it may be the case that, from a theoretical perspective, self-esteem should not be positively related to participation in school activities. Therefore, a nonpositive relationship between these variables would not undermine the construct validity of Rosenberg's self-esteem scale but rather cast doubt on the underlying theoretical perspective. Second, the method or procedure used to test the theoretically derived hypotheses may be faulty or inappropriate. Perhaps it is the case that, theoretically, self-esteem should be positively associated with participation in school activities and that the researcher has used a reliable and valid measure of self-esteem.
However, even under these circumstances, the hypothesis will still not be confirmed unless it is tested properly. Thus, to take a simple example, the negative evidence could be due to the use of an inappropriate statistical technique or to using the proper technique incorrectly. Third, the final interpretation that can be made with respect to negative evidence is that it is due to the lack of construct validity or the unreliability of some other variable(s) in the analysis. In a very real sense, whenever one assesses the construct validity of the measure of interest, one is also evaluating simultaneously the construct validity of measures of the other theoretical concepts. In the self-esteem example, it could be the case that Rosenberg's self-esteem scale has perfect construct validity but that the measure of "participation in school activities" is quite invalid or unreliable.

Unfortunately, there is no foolproof procedure for determining which one (or more) of these interpretations of negative evidence is correct in any given instance. It is the total configuration of empirical evidence that lends credence to one interpretation rather than another. The first interpretation, that the measure lacks construct validity, becomes increasingly compelling as grounds for accepting the other interpretations become untenable. Most important, to the degree possible, one should assess the construct validity of a particular measure in situations in which the other variables are well-measured (i.e., have relatively high validity and reliability). Only in these situations can one confidently conclude that negative evidence is probably due to the absence of construct validity of a particular measure of a given theoretical concept. Theoretically relevant and well-measured external variables are thus crucial to the assessment of the construct validity of empirical measurements (Curtis and Jackson, 1962; Sullivan, 1971, 1974; Balch, 1974).
The logic of construct validation usually implies that the relationships among multiple indicators designed to represent a given theoretical concept and theoretically relevant external variables should be similar in terms of direction, strength, and consistency. For example, two indicators, both of which are designed to measure social status, should have similar correlations with political interest if the latter is a theoretically appropriate external variable for the former. Conversely, if the two empirical indicators of social status relate differentially to external variables, this implies that the indicators are not representing the same theoretical concept. Instead, this pattern of empirical relationships would suggest that the two indicators represent different aspects of social status or different concepts entirely, for they do not behave in accordance with theoretical expectations. It is thus easy to see that construct validation is enhanced if one has obtained multiple indicators of all of the relevant variables.

Conclusion

In this chapter we have discussed the three basic types of validity: content validity, criterion-related validity, and construct validity. Both content validity and criterion-related validity have limited usefulness in assessing the quality of social science measures. Content validity, we argued, is not so much a specific type of validity as it is a goal to be achieved in order to obtain valid measurements of any type: namely, that the empirical measure covers the domain of content of the theoretical concept. Content validity, however, provides no method or procedure to determine the extent to which this goal is achieved in practice. Thus, in the final analysis, it is not possible to determine the specific extent to which an empirical measure should be considered content valid. On the contrary, content validity, by necessity, is an imprecise standard against which to evaluate the validity of empirical measurements.
Criterion-related validity is similarly limited regarding generalized applicability in the social sciences. This is not to argue that there are not certain practical circumstances under which it makes a good deal of sense to validate a measure by comparing performance on that measure with performance on a particular criterion variable. Thus, it is a reasonable strategy to compare airplane pilots' performance on a written examination with their ability to fly an airplane in order to validate the written exam. Yet, as we have pointed out, the vast majority of social science measures are not of this character. Instead, because they usually represent abstract theoretical concepts, there are no known criterion variables against which they can be compared.

In contrast to both content validity and criterion-related validity, construct validation has generalized applicability in the social sciences. The social scientist can assess the construct validity of an empirical measurement if the measure can be placed in a theoretical context. Thus, construct validation focuses on the extent to which a measure performs in accordance with theoretical expectations. Specifically, if the performance of the measure is consistent with theoretically derived expectations, then it is concluded that the measure is construct valid. On the other hand, if it behaves inconsistently with theoretical expectations, then it is usually inferred that the empirical measure does not represent its intended theoretical concept. Instead, it is concluded that the measure lacks construct validity for that particular concept.

This chapter has focused on the different types of validity, pointing out their different meanings, uses, and limitations. The next chapter will present a theoretical framework that can be used to assess the reliability of empirical measurements.