Measurement Problems in Criminal Justice Research: Workshop Summary (2003)

Chapter: 3. Comparison of Self-Report and Official Data for Measuring Crime

Suggested Citation:"3. Comparison of Self-Report and Official Data for Measuring Crime." National Research Council. 2003. Measurement Problems in Criminal Justice Research: Workshop Summary. Washington, DC: The National Academies Press. doi: 10.17226/10581.

3
Comparison of Self-Report and Official Data for Measuring Crime

Terence P. Thornberry and Marvin D. Krohn

There are three basic ways to measure criminal behavior on a large scale. The oldest method is to rely on official data collected by criminal justice agencies, such as data on arrests or convictions. The other two rely on social surveys. In one case, individuals are asked if they have been victims of crime; in the other, they are asked to self-report their own criminal activity. This paper reviews the history of the third method—self-report surveys—assesses its validity and reliability, and compares results based on this approach to those based on official data. The role of the self-report method in the longitudinal study of criminal careers is also examined.

HISTORICAL OVERVIEW

The development and widespread use of the self-report method of collecting data on delinquent and criminal behavior was one of the most important innovations in criminological research in the twentieth century. This method of data collection is used extensively both in the United States and abroad (Klein, 1989). Because of its common use, we often lose sight of the important impact that self-report studies have had on the study of the distribution and patterns of crime and delinquency, the etiological study of criminality, and the study of the juvenile justice and criminal justice systems.

This study was supported by the National Consortium on Violence Research.

Sellin made the simple but critically important observation that “the value of a crime rate for index purposes decreases as the distance from the crime itself in terms of procedure increases” (1931:337). Thus, prison data are less useful than court or police data as a measure of actual delinquent or criminal behavior. Moreover, the reactions of the juvenile and criminal justice systems often rely on information from victims or witnesses of crime. It does not take an expert on crime to recognize that a substantial amount of crime is not reported and, if reported, is not officially recorded. Thus, reliance on official sources introduces a number of layers of potential bias between the actual behavior and the data. Yet, through the first half of the twentieth century, our understanding of the behavior of criminals and those who reacted to crime was based almost entirely on official data.

While researchers were aware of many of these limitations, the dilemma they faced was how to obtain valid information on crime that was closer to the source of the behavior. Observing the behavior taking place would be one method of doing so, but given the illegal nature of the behavior and the potential consequences if caught committing the behavior, participants in crime are reluctant to have their behavior observed. Even when observational studies have been conducted—for example, gang studies (e.g., Thrasher, 1927)—researchers could observe only a very small portion of the crimes that took place. Hence, observational studies had limited utility in describing the distribution and patterns of criminal behavior.

If one could not observe the behavior taking place, self-reports of delinquent and criminal behavior would be the data source nearest to the actual behavior. There was great skepticism, however, about whether respondents would be willing to tell researchers about their participation in illegal behaviors. Early studies (Porterfield, 1943; Wallerstein and Wylie, 1947) found that not only were respondents willing to self-report their delinquency and criminal behavior, they did so in surprising numbers.

Since those very early studies, the self-report methodology has become much more sophisticated in design, making it more reliable and valid and extending its applicability to myriad issues. Much work has been done to improve the reliability and validity of self-reports, including the introduction of specialized techniques intended to enhance the quality of self-report data. These developments have made self-report studies an integral part of the way crime and delinquency are studied.

Although the self-report method began with the contributions of Porterfield (1943, 1946) and Wallerstein and Wylie (1947), the work of Short and Nye (1957, 1958) “revolutionized ideas about the feasibility of using survey procedures with a hitherto taboo topic” and changed how the discipline thought about delinquent behavior itself (Hindelang et al., 1981:23). Short and Nye’s research is distinguished from previous self-report studies by its attention to methodological issues, such as scale construction, reliability and validity, and sampling, and by its explicit focus on the substantive relationship between social class and delinquent behavior. A 21-item list of criminal and antisocial behaviors was used to measure delinquency, although in most of their analyses a scale composed of a subset of only seven items was employed. Focusing on the relationship between delinquent behavior and the socioeconomic status of the adolescents’ parents, Nye et al. (1958) found that relatively few of the differences in delinquent behavior among the different socioeconomic status groups were statistically significant.

Short and Nye’s work stimulated much interest in both use of the self-report methodology and the relationship between some measure of social status (socioeconomic status, ethnicity, race) and delinquent behavior. The failure to find a relationship between social status and delinquency served at once to question extant theories built on the assumption that an inverse relationship did in fact exist and to suggest that the juvenile justice system may be using extra-legal factors in making decisions concerning juveniles who misbehave. A number of studies in the late 1950s and early 1960s used self-reports to examine the relationship between social status and delinquent behavior (Akers, 1964; Clark and Wenninger, 1962; Dentler and Monroe, 1961; Empey and Erickson, 1966; Erickson and Empey, 1963; Gold, 1966; Reiss and Rhodes, 1959; Slocum and Stone, 1963; Vaz, 1966; Voss, 1966). These studies advanced the use of the self-report method by applying it to different, more ethnically diverse populations (Clark and Wenninger, 1962; Gold, 1966; Voss, 1966), attending to issues concerning validity and reliability (Clark and Tifft, 1966; Dentler and Monroe, 1961; Gold, 1966), and constructing measures of delinquency that specifically addressed issues regarding offense seriousness and frequency (Gold, 1966). These studies found that, while most juveniles engaged in some delinquency, relatively few committed serious delinquency repetitively. With few exceptions, these studies supported the general conclusion that, if there were any statistically significant relationship between measures of social status and self-reported delinquent behavior, it was weak and clearly did not mirror the findings of studies using official data sources.

During this period researchers began to recognize the true potential of the self-report methodology. By including questions concerning other aspects of an adolescent’s life as well as a delinquency scale on the same questionnaire, researchers could explore a host of etiological issues. Theoretically interesting issues concerning the family (Dentler and Monroe, 1961; Gold, 1970; Nye et al., 1958; Stanfield, 1966; Voss, 1964), peers (Erickson and Empey, 1963; Gold, 1970; Matthews, 1968; Reiss and Rhodes, 1964; Short, 1957; Voss, 1964), and school (Elliott, 1966; Gold, 1970; Kelly, 1974; Polk, 1969; Reiss and Rhodes, 1963) emerged as the central focus of self-report studies. The potential of the self-report methodology in examining etiological theories of delinquency was perhaps best displayed in Hirschi’s (1969) Causes of Delinquency.

The use of self-report studies to examine theoretical issues continued throughout the 1970s. In addition to several partial replications of Hirschi’s arguments (Conger, 1976; Hepburn, 1976; Hindelang, 1973; Jensen and Eve, 1976), other theoretical perspectives such as social learning theory (Akers et al., 1979), self-concept theory (Jensen, 1973; Kaplan, 1972), strain theory (Elliott and Voss, 1974; Johnson, 1979), and deterrence theory (Anderson et al., 1977; Jensen et al., 1978; Silberman, 1976; Waldo and Chiricos, 1972) were evaluated using data from self-report surveys.

Another development during this period was the introduction of national surveys on delinquency and drug use. Williams and Gold (1972) conducted the first nationwide survey, with a probability sample of 847 boys and girls 13 to 16 years old. Monitoring the Future (Johnston et al., 1996) is a national survey on drug use that has been conducted annually since 1975. It began as an in-school survey of a nationally representative sample of high school seniors and was expanded to include eighth- and tenth-grade students.

One of the larger undertakings on a national level is the National Youth Survey (NYS), conducted by Elliott and colleagues (1985). The NYS began in 1976 by surveying a national probability sample of 1,725 youth ages 11 through 17. The survey design was sensitive to a number of methodological deficiencies of prior self-report studies and has been instrumental in improving the self-report method. The NYS is also noteworthy because it uses a panel design, having followed the original respondents into their thirties.

Despite the expanding applications of this methodology, questions remained about what self-report instruments measure. The discrepancy in findings regarding the relationship between social status and delinquency based on self-report data versus official (and victim) data continued to perplex scholars. Early on, self-reports came under heavy criticism on a number of counts, including the selection of respondents and the selection of delinquency items. Nettler (1978:98) stated that “an evaluation of these unofficial ways of counting crime does not fulfill the promise that they would provide a better enumeration of offensive activity.” Gibbons (1979:84) was even more critical in his summary evaluation, stating:

The burst of energy devoted to self-report studies of delinquency has apparently been exhausted. This work constituted a criminological fad that has waned, probably because such studies have not fulfilled their early promise.

Two studies were particularly instrumental at that time in pointing to flaws in self-report measures. Hindelang and colleagues (1979) illustrated the problems encountered when comparing the results from studies using self-reports and those using official data or victimization data by comparing characteristics of offenders across the three data sources. They observed more similarity in those characteristics between victimization and Uniform Crime Reports data than between self-report data and the other two sources. They argued that self-report instruments did not include the more serious crimes for which people are arrested and that are included in victimization surveys. Thus, self-reports tap a different, less serious domain of behaviors than either of the other two sources, and discrepancies in observed relationships when using self-reports should not be surprising. The differential domain of crime tapped by early self-report measures could also explain the discrepancy in findings regarding the association between social status and delinquency.

Elliott and Ageton (1980) also explored the methodological shortcomings of self-reports. They observed that a relatively small number of youth commit a disproportionate number of serious offenses. However, most early self-report instruments failed to include serious offenses in the inventory and truncated the response categories for the frequency of offenses. In addition, many of the samples did not include enough high-rate offenders to clearly distinguish them from other delinquents. By allowing respondents to report the number of delinquent acts they committed rather than specifying an upper limit (e.g., 10 or more) and by focusing on high-rate offenders, Elliott and Ageton found relationships between engaging in serious delinquent behavior and race and social class that are more consistent with results from studies using official data.

Hindelang and colleagues (1979) and Elliott and Ageton (1980) suggested designing self-report studies so that they would acquire sufficient data from those high-rate, serious offenders who would be most likely to come to the attention of the authorities. They also suggested a number of changes in the way in which self-report data are measured, so that the data reflect the fact that some offenders contribute disproportionately to the rate of serious and violent delinquent acts.

The development of instruments to better measure serious offenses and the suggestion to acquire data from high-rate offenders coincided with a substantive change in the 1980s in the focus of much criminological work on the etiology of offenders. The identification of a relatively small group of offenders who commit a disproportionate amount of crime and delinquency led to a call to focus research efforts on “chronic” or “career” criminals (Blumstein et al., 1986; Wolfgang et al., 1972, 1987). Blumstein et al.’s observation that we need to study the careers of criminals, including early precursors of delinquency, maintenance through the adolescent years, and later consequences during the adult years, was particularly important in recognizing the need for examining the life-course development of high-rate offenders with self-report methodology.

The self-report methodology continues to advance in terms of both its application to new substantive areas and the improvement of its design. Gibbons’s (1979) suggestion that self-reports were just a fad, likely to disappear, has clearly proved wrong. Rather, with improvements in question design, administration technique, reliability and validity, and sample selection, this technique is being used in the most innovative research on crime and delinquency. The sections that follow describe the key methodological developments that have made such applications possible.

DEVELOPMENT OF THE SELF-REPORT METHOD

Self-report measures of delinquent behavior have advanced remarkably in the 30-odd years since their introduction by Short and Nye (1957). Considerable attention has been paid to the development and improvement of their psychometric properties. The most sophisticated and influential work was done by Elliott and colleagues (Elliott and Ageton, 1980; Elliott et al., 1985; Huizinga and Elliott, 1986) and by Hindelang, Hirschi, and Weis (1979, 1981). From their work a set of characteristics for acceptable (i.e., reasonably valid and reliable) self-report scales has emerged. Five of the most salient of these characteristics are the inclusion of (1) a wide array of offenses, including serious offenses; (2) frequency response sets; (3) screening for trivial behaviors; (4) application to a wider age range; and (5) the use of longitudinal designs. Each is discussed below.

Inclusion of a wide array of delinquency items. The domain of crime covers a wide range of behaviors, from petty theft to aggravated assault and homicide. If the general domain of delinquent and criminal behavior is to be represented in a self-report scale, it is necessary for the scale to cover that same wide array of human activity. Simply asking about a handful of these behaviors does not accurately represent the theoretical construct of crime. In addition, empirical evidence suggests that crime does not have a clear unidimensional structure that would facilitate the sampling of a small number of items from a theoretically large pool to adequately represent the entire domain.

These considerations suggest that an adequate self-report scale for delinquency will be relatively lengthy. Many individual items are required to represent the entire domain of delinquent behavior, to represent each of its subdomains, and to ensure that each subdomain (e.g., violence, drug use) is itself adequately represented.

In particular, it is essential that a general self-reported delinquency scale tap serious as well as less serious behaviors. Early self-report scales tended to ignore serious criminal and delinquent events and concentrated almost exclusively on minor forms of delinquency. Failure to include serious offenses misrepresents the domain of delinquency and contaminates comparisons with other data sources. In addition, it misrepresents the dependent variable of many delinquency theories (e.g., Elliott et al., 1985; Thornberry, 1987) that attempt to explain serious, repetitive delinquency.

Inclusion of frequency response sets. Many early self-report studies relied on response sets with a relatively small number of categories, thus tending to censor high-frequency responses. For example, Short and Nye (1957) used a four-point response with the highest category being “often.” Aggregated over many items, these limited response sets had the consequence of lumping together occasional and high-rate delinquents, rather than discriminating between these behaviorally different groups.
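The censoring effect described above can be illustrated with a minimal sketch. The offense counts and the category cut points are hypothetical, chosen only for illustration; the source specifies only that the response set had four points with “often” as the highest category.

```python
# Hypothetical yearly offense counts for five respondents (not from any study).
true_counts = [0, 1, 3, 12, 60]

def to_category(count):
    """Map a raw count onto an assumed four-point response set."""
    if count == 0:
        return 0  # "never" (assumed label)
    if count <= 2:
        return 1  # "once or twice" (assumed label)
    if count <= 5:
        return 2  # "several times" (assumed label)
    return 3      # "often", the top category named in the text

categories = [to_category(c) for c in true_counts]
print(categories)  # [0, 1, 2, 3, 3]
```

Under these assumed cut points, the respondent with 12 offenses and the respondent with 60 receive the same score, which is the lumping of occasional and high-rate delinquents that open-ended frequency responses are meant to avoid.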

Screening for trivial behaviors. Self-report questions have a tendency to elicit reports of trivial acts that are very unlikely to elicit official reactions and even acts that are not violations of the law. This occurs more frequently with the less serious offenses but also plagues responses to serious offenses. For example, respondents have reported as thefts such pranks as hiding a classmate’s books in the respondent’s locker between classes, and have reported as serious assaults events that are really roughhousing between siblings.

Some effort must be made to adjust or censor the data to remove these events if the delinquency of the subjects is to be reflected properly and if the rank order of subjects with respect to delinquency is to be portrayed properly. Two strategies are generally available. First, one can ask a series of follow-up questions designed to elicit more information about an event, such as the value of stolen property, the extent of injury to the victim, and the like. Second, one can use an open-ended question asking the respondent to describe the event and then probe to obtain the information necessary to classify the act. Both strategies have been used with some success.

Application to a wider age range. With increasing emphasis on the study of crime across the entire life course, self-report surveys have had to be developed to take into account both the deviant behavior of very young children and the criminal behavior of older adults. The behavioral manifestations of illegal behaviors or the precursors of such behavior can change depending on the stage in the life course at which the assessment takes place. For the very young child, measures have been developed that are administered to parents to assess antisocial behavior such as noncompliance, disobedience, and aggression (Achenbach, 1992). For the school-age child, Loeber and colleagues (1993) have developed a checklist that expands the range of antisocial behaviors to include such behaviors as stubbornness, lying, bullying, and other externalizing problems.

There has been less development of instruments targeted at adults. Weitekamp (1989) has criticized self-report studies for being primarily concerned with the adolescent years and simply using the same items for adults. This is particularly crucial given the concern over the small but very significant problem of chronic violent offenders.

Use of longitudinal designs. Perhaps the most significant development in the application of the self-report methodology is its use in following the same subjects over time in order to account for changes in their criminal behavior. This has enabled researchers to examine the effect of age of onset, to track the careers of offenders, to study desistance, and to apply developmental theories to study both the causes and consequences of criminal behavior over the life course.

While broadening the range of issues that can be examined, application of the self-report technique within longitudinal panel designs introduces potential threats to the reliability and validity of the data. In addition to concern over construct continuity in applying the technique to different-aged respondents, researchers need to consider the possibility of panel or testing effects.

All of these newer procedures are likely to improve the validity, and to some extent the reliability, of self-report scales since they improve our ability to identify delinquents and to discriminate among different types of delinquents. These are clearly desirable qualities.

Gaining these desirable qualities, however, requires a considerable expansion of the self-report schedule. This can be illustrated by describing the major components of the index currently being used in the Rochester Youth Development Study (Thornberry et al., in press) as well as the other two projects of the Program of Research on the Causes and Correlates of Delinquency (see Browning et al., 1999). The inventory includes 32 items that tap general delinquency and 12 that tap drug use, for a total of 44 items. For each item the subjects are asked if they committed the act since the previous interview. For all items to which they respond in the affirmative, a series of follow-up questions is asked, such as whether they had been arrested. In addition, for the most serious instance of each type of delinquency reported in the past six months, subjects are asked to describe the event by responding to the question: “Could you tell me what you did?” If that open-ended question does not elicit the information needed to describe the event adequately, a series of probes, varying in number from 2 to 14 depending on the offense, is asked.
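The branching structure of such a schedule can be sketched in a few lines. This is only an illustration of the skip logic, not the Rochester instrument itself: the item names, the `ItemResponse` class, the `administer` function, and the single follow-up question are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical follow-up list; the text names only the arrest question.
FOLLOW_UPS = ["arrested"]

@dataclass
class ItemResponse:
    item: str
    committed: bool
    follow_ups: dict = field(default_factory=dict)

def administer(item, committed, answer_follow_up):
    """Record one inventory item, asking follow-ups only after a 'yes'."""
    response = ItemResponse(item=item, committed=committed)
    if committed:
        for question in FOLLOW_UPS:
            response.follow_ups[question] = answer_follow_up(item, question)
    return response

# Illustrative answers: the respondent reports a theft but no vandalism.
answers = {"theft": True, "vandalism": False}
responses = [administer(item, yes, lambda i, q: False) for item, yes in answers.items()]
asked = sum(len(r.follow_ups) for r in responses)
print(asked)  # 1
```

Because follow-ups are triggered only by an affirmative answer, the number of questions actually asked grows with the respondent's reported delinquency rather than with the length of the inventory.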

Although most of these specific questions are skipped for most subjects since delinquency remains a rare event, this approach to measuring self-reported delinquency is a far cry from the initial days of the method, when subjects used a few categories to respond to a small number of trivial delinquencies with no follow-up items. Below we evaluate the adequacy of this approach for measuring delinquent and criminal behavior.

RELIABILITY AND VALIDITY

For any measure to be scientifically worthwhile it must possess both reliability and validity. Reliability is the extent to which a measuring procedure yields the same result on repeated trials. No measure is absolutely, perfectly reliable. Repeated use of a measuring instrument will always produce some variation from one application to another. That variation can be very slight or quite large. So the central question in assessing the reliability of a measure is not whether it is reliable but how reliable it is; reliability is always a matter of degree.

Validity is a more abstract notion. A measure is valid to the extent that it measures the concept it is intended to measure and nothing else. While reliability focuses on a particular property of the measure, namely its stability over repeated uses, validity concerns the crucial relationship between the theoretical concept one is attempting to measure and what one actually measures. As is true of reliability, the assessment of validity is not an either/or proposition. There are no perfectly valid measures, but some measures are more valid than others. We now turn to an assessment of whether self-reported measures of delinquency are psychometrically acceptable.

Assessing Reliability

There are two classic ways to assess the reliability of social science measures: test-retest reliability and internal consistency. Huizinga and Elliott (1986) make a convincing case that the test-retest approach is fundamentally more appropriate for assessing self-reported measures of delinquency.

Internal consistency means that multiple items measuring the same underlying concept should be highly intercorrelated. Although a reasonable expectation for attitudinal measures, this expectation is less reasonable for behavioral inventories such as self-report measures of delinquency. Current self-report measures typically include 30 to 40 items measuring a wide array of delinquent acts. Just because someone was truant is no reason to expect that they would be involved in theft or vandalism. Similarly, if someone reports that they have been involved in assaultive behavior, there is no reason to assume they have been involved in drug sales or loitering. Indeed, given the relative rarity of involvement in delinquent acts, it is very likely that most people will respond in the negative to most items and in the affirmative to only a few items. This is especially the case if we are asking about short reference periods (e.g., the past year or past six months). There is no strong underlying expectation that the responses will be highly intercorrelated, and therefore an internal consistency approach to assessing reliability may not be particularly appropriate. (See Huizinga and Elliott, 1986, for a more formal discussion of this point.)

Some theories of crime (e.g., Gottfredson and Hirschi, 1990; Jessor et al., 1991) assume there is an underlying construct, such as self-control, that generates versatility in offending. If so, there should be high internal consistency among self-reported delinquency items. While this result may be supportive of the theoretical assumption, it is not necessarily a good indicator of the reliability of the measures. If internal consistency were low, that might have no implication for reliability but might simply mean that this particular theoretical assumption was incorrect. Nevertheless, we do note that studies that have examined the internal consistency of self-report measures generally find acceptable alpha coefficients. For example, Hindelang and colleagues report alphas between 0.76 and 0.93 for various self-report measures (1981:80).
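For readers unfamiliar with the alpha coefficient, the following is a minimal sketch of the standard Cronbach's alpha computation, k/(k-1) times one minus the ratio of summed item variances to the variance of the total score. The two small data sets are hypothetical and serve only to contrast a versatile response pattern with one in which each respondent admits to a different single act.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: each row is one respondent, each column one item.
versatile = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]               # items move together
specialized = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]  # one act apiece

print(cronbach_alpha(versatile))
print(cronbach_alpha(specialized))
```

The second pattern yields a much lower coefficient even though nothing in it implies that respondents answered inconsistently, which is the sense in which low internal consistency need not signal low reliability for a behavioral inventory.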

We will focus our attention on the test-retest method of assessing reliability. This approach is quite straightforward. A sample of respondents is administered a self-reported delinquency inventory (the test) and then, after a short interval, the same inventory is readministered (the retest). In doing this the same questions and the same reference period should be used at both times.

The time lag between the test and the retest is also important. If it is too short, it is likely the answers provided on the retest will be a function of memory. If so, estimates of reliability would be inflated. If the time period between the test and the retest is too great, it is likely the responses given on the retest would be less accurate than those given on the test because of memory decay. In this case the reliability of the scale would be underestimated. There is no hard-and-fast rule for assessing the appropriateness of this lag, but somewhere in the range of one to four weeks appears to be optimal.
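A test-retest coefficient is usually computed as the correlation between the scores from the two administrations. The sketch below assumes a Pearson correlation on simple frequency scores; the six pairs of scores are hypothetical, and `pearson_r` is an illustrative helper rather than a procedure taken from any of the studies cited.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical frequency scores for six respondents, same reference period.
test_scores = [0, 2, 5, 1, 9, 0]
retest_scores = [0, 3, 4, 1, 8, 1]

print(round(pearson_r(test_scores, retest_scores), 2))
```

A coefficient near 1 indicates that respondents keep roughly the same rank order and spacing across the two administrations; it does not by itself show that either set of answers is accurate, which is a question of validity.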

A number of studies have assessed the test-retest reliability of self-reported delinquency measures. In general, the results indicate that these measures are acceptably reliable. The reliability coefficients vary somewhat depending on the number and types of delinquent acts included in the index and the scoring procedures used (e.g., simple frequencies or ever-variety scores), but scores well above 0.80 are common. In summarizing much of the previous literature in this area, Huizinga and Elliott (1986:300) state:

Test-retest reliabilities in the 0.85 - 0.99 range were reported by several studies employing various scoring schemes and numbers of items and using test-retest intervals of from less than one hour to over two months (Kulik et al., 1968; Belson, 1968; Hindelang et al., 1981; Braukmann et al., 1979;
