Traditionally, standardized test scores have been used to measure the percent of students who are (or are not) proficient according to statewide standards. Over the last decade, states began using test scores to track individual student growth over time. More recently, these growth data have been used to determine teacher and/or school effectiveness.
Meanwhile, in New Jersey and across the nation, teacher evaluation policies are being revised to include the use of student standardized test scores as a means of determining teacher effectiveness. The N.J. Department of Education is using student growth percentiles as a key element in the new pilot teacher evaluation program.
Determining student growth percentiles requires a complex formula, and since these figures may comprise up to 45 percent of a teacher’s evaluation, the Association asked Dr. Derek Briggs, a professor at the University of Colorado, to explain the most well-known model used to analyze student growth. Briggs also comments on the use of student test scores at the classroom and school level.
Since the passage of the No Child Left Behind Act in 2001, educators have become very familiar with the use of status-based statistics (e.g., percent of students classified as “proficient” or “advanced”). Today, however, most discussions of student data involve growth-based statistics. A key distinction between the two is that while status-based statistics tend to be strongly correlated with the pre-existing achievement and demographic characteristics of students attending a given school, growth-based statistics are only weakly correlated. In this sense, they provide for a fairer and more objective comparison among classrooms and schools because they serve to level the playing field. But as with all statistical models, there are limitations to the inferences that can be supported, and these need to be taken into consideration.
There is a diversity of opinion about how large-scale state assessment data can and should be used in accountability systems. On one extreme would be those who believe large-scale state assessment results are the only “objective” indicator and thus any judgment about educator/education quality should be based on such measures. At the other extreme are those who would argue that any use of large-scale assessment data in accountability decisions is an abuse because of the limitations associated with statistical modeling. My experience in discussing these issues in numerous contexts is that stakeholder perspectives fall in between these two extremes. And I would argue that when examined thoughtfully, the results of large-scale state assessments—particularly when examined in a longitudinal fashion—can yield important insights about the manner in which an education system is functioning.
How does the Colorado Growth Model work?
The Colorado Growth Model has been used in Colorado since the summer of 2008. While it is too soon to evaluate the impact of this model on school improvement efforts, a 2011 survey of Colorado districts and schools suggests that results from the model have been used to make decisions about topics such as the alignment of curriculum, goal setting, parent presentations, professional development needs of teachers, student learning needs, and planning for interventions.
The following example—which is based on actual data from a state using the Colorado Growth Model—illustrates the sorts of insights that can be gleaned from its use. We start in the year 2010 with a student named Pete who has just finished the fourth grade in elementary school. In the third grade, Pete received a score of 260 on the state’s math test. Given that the average score for all 12,025 third-grade students taking the state test in 2009 was 294, this score placed Pete at the 20th percentile of the full score distribution (i.e., the 12,025 test scores in the state ordered from lowest to highest). In other words, Pete performed better than 20 percent of all third-grade students taking the same test. We will refer to this baseline performance as an unconditional achievement percentile.
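The percentile computation just described can be sketched in a few lines of Python. The function name and the toy score distribution below are illustrative assumptions, not part of the state’s actual reporting system; the sketch treats a percentile rank as the percent of all scores falling strictly below the student’s score.

```python
def unconditional_percentile(score, all_scores):
    """Percent of scores in all_scores strictly below `score`.

    Illustrative only: a state's real reporting may handle ties
    and rounding differently.
    """
    below = sum(1 for s in all_scores if s < score)
    return round(100 * below / len(all_scores))

# Hypothetical 100-score distribution standing in for the 12,025
# third-grade scores in the example:
statewide = [250] * 20 + [270] * 30 + [300] * 50

print(unconditional_percentile(260, statewide))  # 20: better than 20% of scores
```

A score of 260 beats the 20 toy scores of 250, so the function returns 20—the analogue of Pete’s 20th-percentile baseline.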
Pete has four classmates named Glenn, Jody, Keith and Susan. Table 1 shows how Pete’s unconditional achievement percentile on the state test compares to those of his four classmates. We can see that Pete performed the most poorly among the five students. Note that a key feature of Table 1 is that for each student, we are given the count of other fourth-grade students in the state who earned the same score as third graders in 2009. There are 342 students who, like Pete, also had a score (260) at the 20th percentile in third grade; likewise there are 420 students who, like Glenn, also had a score (285) at the 40th percentile in third grade; etc.
Student growth percentiles
Now we turn to the test performance of these same five students when they are in the fourth grade in 2010 (Table 2). A Student Growth Percentile (SGP) is a conditional achievement percentile, and serves to contextualize these scores relative to the performance of only those students with the same test scores. In other words, say that Pete scores a 236 on the state’s math test in the fourth grade.
Instead of asking how this performance compares to all other fourth-grade students, we evaluate how it compares to a subset: the 342 students who had the same test score in the third grade. Relative to the distribution of math test scores for these 342 students, Pete’s score gives him an SGP of 6. In other words, even taking into account where he started in third grade and comparing him to students who started from the same place, Pete’s performance on the state test in fourth grade is quite poor. That is, out of 342 comparable students, about 321 will have had a higher score, placing Pete’s score at the sixth percentile. As such, a low SGP such as this raises a flag that we should be concerned not just with the level of Pete’s math achievement in a criterion-referenced sense, but also with his lack of progress in a more normative sense. Likewise, Glenn scored better than only 20 percent of the students who had the same test score in the previous grade.
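The conditional comparison above can be sketched in Python as well. Note that in practice SGPs are estimated with statistical smoothing techniques rather than by literally matching students with identical prior scores; this sketch implements only the simplified exact-match comparison the example describes, and the peer scores below are hypothetical.

```python
def student_growth_percentile(current_score, peer_current_scores):
    """Percentile of current_score among the current-year scores of
    students who had the same prior-year score (e.g., Pete's 342
    comparison students). Simplified exact-match sketch.
    """
    below = sum(1 for s in peer_current_scores if s < current_score)
    return round(100 * below / len(peer_current_scores))

# Hypothetical fourth-grade scores for the 342 students who, like Pete,
# scored 260 in third grade:
peers = [236 + i for i in range(342)]

print(student_growth_percentile(256, peers))  # 6: roughly Pete's SGP
```

Here 20 of the 342 peer scores fall below 256, so the student lands at about the 6th percentile of the conditional distribution, mirroring Pete’s SGP of 6.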
Two things about SGPs are especially important to note. First, a student’s SGP is not solely determined by his or her unconditional achievement percentile in the previous year; it is also influenced by the current-year performance of the students in his or her comparison group. For example, Keith has an unconditional percentile in third grade of 70, but his SGP in fourth grade is 29, indicating that he scored better than only 29 percent of the fourth-graders who had the same third-grade score. Likewise, it would be possible to find a situation in which a student has a very low unconditional percentile in third grade, but a high SGP in fourth. Second, the interpretation of a student’s SGP does not require that test scores from one grade to the next be placed on a vertical scale (a single continuum that allows for the computation of grade-to-grade gain scores). Indeed, since the data in this example come from a state where the test scores are not vertically scaled, knowing that a student’s score is higher from one year to the next has no direct interpretation as growth. When interpreting SGPs, growth has somewhat of a metaphorical connotation: A student who has shown growth is one who has performed better than expected relative to other students with the same baseline achievement.
School- or classroom-level growth: MGPs
To summarize information about school or classroom-level growth, we gather all the students in a school or classroom for whom we have at least two consecutive years of test score information and no unexpected “holes” in their test-taking history. Consider an example at the school-level. Imagine we have an elementary school with students in grades K through 5. Because students are not tested by the state until grade 3, SGPs would be available only for students in grades four and five. Hence a determination of school-level growth would be limited to these grade levels. For students in grade four, we use one prior test score (from grade three) to compute an SGP; for students in grade five we use two prior test scores (from grades three and four) to compute an SGP. We then remove students who have not been enrolled in the school for a full year, and then order all the remaining SGPs from lowest to highest (irrespective of the grade students are in) and take the middle value. This number represents the median growth percentile (MGP) of a subset of students in the school.
A similar process could be used to compute an MGP at the classroom-level. The basic idea is this: once an SGP has been computed for each student, it is up to stakeholders to decide the level to which they will be aggregated and the inferences that the resulting summary statistic is intended to support. A school or classroom with an MGP below the 50th percentile might be taken as an indication that students tend to be progressing more slowly on the state standardized assessment relative to similar students throughout the state.
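The aggregation steps above—pool the SGPs of full-year students across grades, then take the middle value—can be sketched as follows. The record layout (an SGP paired with a full-year enrollment flag) is a hypothetical simplification, not the state’s actual data format.

```python
import statistics

def median_growth_percentile(students):
    """Median SGP across full-year students with a valid SGP.

    `students` is a list of (sgp, full_year) pairs — a hypothetical
    record layout for illustration. Students without an SGP (e.g.,
    third graders with no prior score) carry sgp=None and are skipped,
    as are students not enrolled for the full year.
    """
    eligible = [sgp for sgp, full_year in students
                if full_year and sgp is not None]
    return statistics.median(eligible) if eligible else None

# Fourth- and fifth-graders pooled together, irrespective of grade;
# the last student was not enrolled for a full year and is excluded:
roster = [(6, True), (20, True), (55, True), (71, True), (90, False)]

print(median_growth_percentile(roster))  # 37.5
```

With four eligible SGPs (6, 20, 55, 71), the median falls between 20 and 55, giving an MGP of 37.5—below the 50th percentile, which under the interpretation above would suggest slower-than-typical progress relative to similar students statewide.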
As part of its NJ SMART Education Data System, the N.J. Department of Education has produced a 14-minute video titled “Using Student Growth Percentiles.” Go to http://survey.pcgus.com/njgrowth/player.html to watch the video.
Derek Briggs is the chair of the Research and Evaluation Methodology Program and an associate professor of education at the University of Colorado.