Much More than a Score

Science Outside
May 7, 2025

Updated: May 23, 2025

*"Not everything important is measurable, and not everything measurable is important." -Elliot Eisner*

In the world of education, few numbers carry as much weight, or cause as much confusion, as the ones students earn on their AP exams. These scores, reported on a scale of 1 to 5, are often used not just to evaluate student understanding, but to judge the effectiveness of the teachers who guide them.

But here's the truth: a test score is a data point, not a verdict.

Beyond the Numbers: What Makes a Great Teacher?

In a healthy school culture grounded in a growth mindset, we know that teacher effectiveness isn't defined by a single outcome. The best teachers:

Cultivate curiosity
Build lifelong learning habits that prioritize long-term understanding
Help students grow both academically and personally

However, in some schools, AP teacher performance is reduced to a single metric: the Weighted Average AP Exam Score. In this narrow view, teachers of students from wealthier, more resourced backgrounds are unfairly advantaged. The result? Teachers are evaluated less on their instructional skill and more on their students’ socioeconomic status.

The Pitfalls of Using AP Scores to Evaluate Teachers

Standardized tests can inform instruction, identify learning gaps, and help students prepare for college. But, as Sir Ken Robinson once said:

“Standardized tests have a place. But they should not be the dominant culture of education. They should be diagnostic. They should help.”

And AP exam scores, despite their value, fail to account for the many variables that shape student performance. Students don’t arrive in AP classrooms as blank slates. They bring with them years of life experiences and educational history, both of which shape how they engage and perform.

Many factors determine student performance on AP exams, making an honest and objective analysis of AP teacher effectiveness extremely challenging. Each AP course and teacher attracts a different population of students. The reality is that students self-select certain classes and teachers for various reasons, including subject interest, college and career goals, teacher personality, and the specific school culture. Evaluating teacher effectiveness based on AP score results requires accurately evaluating the diverse populations of students who enter their classrooms. This is evident in the wide variation between the average AP scores of students in different class periods and the fluctuations that occur with the same teacher in different years.

Socioeconomic status (SES) is strongly linked to scores on standardized tests. A Harvard-based team of researchers found that children of the wealthiest 1 percent of Americans were 13 times more likely than children of low-income families to score 1300 or higher on the SAT. To be blunt, if you are teaching students whose median family income is $200,000 or higher, you are almost certainly going to see higher scores than if you are teaching students who qualify for free or reduced-price lunch. I don’t like this truth, but the data unequivocally supports this conclusion. (Reference: New study finds wide gap in SAT/ACT scores between wealthy, lower income kids)

SES varies widely not only between students in different schools, but also between groups of students within the same school. Teachers even have markedly different student populations in different AP courses in the same school. For instance, the student populations enrolled in AP courses such as AP Physics C and AP Calculus BC are significantly different from those enrolled in AP Physics 1, AP Environmental Science, and AP Statistics. Individual school cultures, teachers, and/or curriculum pathways lead to variations in the course selection of higher-performing students. As a result, AP scores fluctuate less in classes that attract the highest-performing students, who, unsurprisingly, earn high scores year after year.

There is a very strong and consistent relationship between student performance on the SAT Suite of Assessments and AP scores for nearly all courses. The SAT Suite of Assessments includes the SAT, PSAT/NMSQT, PSAT 10, and PSAT 8/9. Research on PreACT and ACT test scores has found similar results. (Reference: PreACT and ACT Test Scores Associated with AP Exam Success)

Table 1. Correlation between SAT Suite Total Scores and AP Scores for Science Courses

AP Exam	Correlation
Biology	.752
Chemistry	.655
Environmental Science	.706
Physics 1	.647
Physics C: Electricity & Magnetism	.474
Physics C: Mechanics	.586

Source: The College Board, AP Potential: Score Correlations

Correlation values like those in Table 1 can range from -1 to +1, with absolute values of approximately 0.1 considered to represent a small relationship, absolute values of 0.3 considered to represent a moderate relationship, and absolute values of 0.5 or higher considered to represent a strong relationship. By that standard, the correlation values for AP Biology, AP Chemistry, AP Environmental Science, and AP Physics 1 are remarkable: all four exceed 0.6.
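To make the thresholds concrete, here is a minimal Python sketch that labels each Table 1 correlation and also reports r squared, which in a simple linear relationship is the share of variance the two measures have in common. The function names and the "negligible" label for values under 0.1 are my own assumptions for illustration, not College Board terminology.

```python
# Correlations copied from Table 1 (College Board, AP Potential).
TABLE_1 = {
    "Biology": 0.752,
    "Chemistry": 0.655,
    "Environmental Science": 0.706,
    "Physics 1": 0.647,
    "Physics C: Electricity & Magnetism": 0.474,
    "Physics C: Mechanics": 0.586,
}


def strength(r: float) -> str:
    """Label a correlation using the 0.1 / 0.3 / 0.5 rule of thumb."""
    size = abs(r)
    if size >= 0.5:
        return "strong"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumed label; the rule of thumb stops at 0.1


def shared_variance(r: float) -> float:
    """Return r squared, the proportion of variance shared by two measures."""
    return r * r


for exam, r in TABLE_1.items():
    print(f"{exam}: r = {r:.3f} ({strength(r)}), r^2 = {shared_variance(r):.2f}")
```

Read this way, a correlation of .752 for AP Biology means that a little over half of the variation in AP Biology scores overlaps with variation in SAT Suite scores, before any teacher enters the picture.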

“The point is not to be the best, but to be the best you can be.” -Sir Ken Robinson

Peer effects in the classroom significantly impact standardized test scores. Students tend to mirror the motivation and academic behaviors of their peers, meaning high-achieving classmates can boost performance through positive norms, collaborative learning, and peer support. Conversely, disruptive peers can hinder focus and reduce overall achievement. Teachers also adjust their instruction based on class composition, which can amplify or dampen learning outcomes. Research shows that having motivated, high-performing peers benefits all students, though even a few disruptive individuals can negatively affect the whole class.

The number of AP exams a student takes is a powerful, measurable predictor of student motivation and AP exam success. Students who score a 3 or better on an AP exam are significantly more likely to take more AP exams the following year. If you teach students who have taken few or no other AP exams, it is reasonable to expect that they will earn lower scores than they would if they had taken several other AP exams. (Reference: Giving College Credit Where It Is Due: Advanced Placement Exam Scores and College Outcomes)

Consider the following scenario: Teacher A in one school taught 57 students who had never taken a prior AP exam and averaged a 2.00 score on the AP Environmental Science exam. Teacher B, in a much different school, taught 17 students who averaged a 3.25 score on the same AP Environmental Science exam. Teacher B’s students had taken an average of five other AP exams and had a Weighted Average Score of 3.90 on those exams. Which teacher was more “effective”?

Course-to-course comparisons are particularly challenging as some courses have far higher Weighted Average Score results than others. While this has always been true, it has been exacerbated in recent years, as some AP course exams have been “recalibrated” by the College Board while other AP course exams have not. (Reference: The Great Recalibration of AP Exams)

Consider another scenario: A Physics 1 teacher taught students who earned a 2.75 mean score on the AP exam. A Calculus BC teacher taught students who earned a 3.75 mean score on the AP exam. Which teacher is more “effective”? The Weighted Average Score for Calculus BC is 3.87, while the Weighted Average Score for Physics 1 is 2.53. The differential performance of the students (the difference between their mean score and the national average in each exam subject area) is +0.22 for Physics 1 and -0.12 for Calculus BC. This suggests that, all else being equal, the Physics 1 teacher was significantly more “effective” than the Calculus BC teacher in this scenario. (Reference: 2024 AP Score Distributions)
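The arithmetic behind that comparison is simple enough to sketch in a few lines of Python. The dictionary and function names are hypothetical, chosen only for this illustration; the numbers are the ones quoted in the scenario.

```python
# Figures quoted in the scenario above.
NATIONAL_MEAN = {"Physics 1": 2.53, "Calculus BC": 3.87}
CLASS_MEAN = {"Physics 1": 2.75, "Calculus BC": 3.75}


def differential(course: str) -> float:
    """Class mean minus national mean, rounded to two decimal places."""
    return round(CLASS_MEAN[course] - NATIONAL_MEAN[course], 2)


for course in CLASS_MEAN:
    # The + in the format spec prints an explicit sign for each gap.
    print(f"{course}: {differential(course):+.2f}")
# prints:
# Physics 1: +0.22
# Calculus BC: -0.12
```

Even this adjusted number is only a starting point. It corrects for the difficulty of the exam, but not for who enrolled in each class.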

The analysis gets even more complex as we introduce more variables. Did students entering the course complete the prerequisites? Were the class sizes the same? What grade were students in when they took the course: 9th, 10th, 11th, or 12th? Did some courses have more students who participated in demanding extracurricular activities with inflexible schedules (e.g., sports, musicals) than others? How did dual enrollment impact which students chose to take the AP exam?

How should teachers weigh their efforts between the following two goals, which I think we all share but which exist in tension?

Increasing AP course access

Increasing scores on the AP exam

The College Board believes that increasing AP course enrollment is the most effective way to improve college readiness. (References: Impacts of AP: More Than a Score, A Broader View of College Readiness) They base their position on research showing that taking even one AP course in high school improves students’ first-year college GPA and chances of graduating in four years. (Reference: New Analysis of AP Scores 1 and 2)

The College Board expects science teachers to devote at least 25% of instructional time (for most teachers in the United States, this equates to a minimum of 33 hours) to scientific investigations to meet the AP Course Audit curricular requirements. Teachers sign a written statement indicating that students receiving a qualifying score on a science AP exam in their class have also mastered laboratory and field study techniques in addition to content knowledge. This requirement is in place to prove to colleges and universities that students did not merely prepare for the AP test but also engaged in the same science practices they would have encountered if they had completed the equivalent college course at their institution. It’s a matter of integrity.

There is widespread concern that an increasing focus on AP scores in science courses is leading to a reduction of instructional time devoted to authentic scientific practices. I’ll offer a personal anecdote as evidence to validate this concern.

One of my daughters recently completed an AP Biology course. She and several of her classmates reported that the students in AP Biology spent nearly all of their class time on their Chromebooks using AP Classroom to watch AP Daily videos and answer multiple-choice questions related to the videos. The students in her class reported that they experienced less than five hours of scientific practice throughout the entire course, virtually all of those after the AP exam. We talked regularly throughout the school year about students in the course yearning for more genuine scientific inquiry that includes direct observation of scientific phenomena via hands-on investigations and demonstrations. My daughter earned a 5 on the AP exam, yet her education was truly short-changed in the course. The numeric outcome was good, but the process of achieving that outcome was not. While outcomes are important, we should primarily focus on the process if we want to promote authentic learning.

"We have sold ourselves into a fast food model of education,

and it's impoverishing our spirit and our energies

as much as fast food is depleting our physical bodies."

-Sir Ken Robinson

Before taking AP Biology, my daughter wanted to be a physician. While taking the course, she decided she wanted to be an engineer. Why? She was simultaneously enrolled in AP Physics, and her AP Physics teacher inspired her with a well-rounded science experience that included regular observation of phenomena and several authentic investigations.

“Education is not the filling of a pail but the lighting of a fire.”

-William Butler Yeats

"Be the flint and steel." -Christopher Kling

School district priorities significantly impact the instructional strategies science teachers choose to employ. The incentives we choose to put in place influence human behavior in practical terms. Clear communication regarding a school district’s highest values about science instruction is essential for science teachers and guidance counselors as they resolve the following common dilemmas:

AP Science Teacher/Guidance Counselor Dilemma #1: A guidance counselor calls and requests a waiver to place a student in the class who doesn’t meet all of the prerequisites for the course (or it is after the add/drop deadline, or the class is already overenrolled) because of special circumstances that indicate it would be beneficial for the student to participate in the course. How should the teacher respond? How is the teacher incentivized by the school district leadership to respond?

AP Science Teacher/Guidance Counselor Dilemma #2: An AP student has never before taken an AP course or exam and has not yet performed on summative assessments at a level that would lead us to objectively project they would earn a 3 or better on the AP exam. Should the teacher encourage this student to take the AP exam? How does the school district leadership incentivize the teacher to respond?

If your students earn low AP scores, it does not mean you are an ineffective teacher. The converse is also true. If your students earn high AP scores, it doesn’t mean you’re an amazing teacher. The scores belong to the students who earned them. If we view our students’ AP exam scores as “our” scores, we take away from the students who earned them. Don’t get me wrong, our efforts as teachers matter a lot, but we should have the humility to acknowledge that we are only one part of a large and diverse community of people who have played a role in the education of our students. Our vital role as teachers is to catalyze curiosity and provide optimal learning conditions to promote the long-term flourishing of our students. How well we do that is the true measure of our effectiveness as teachers, and it’s not captured accurately in student AP scores.

AP and other standardized tests are designed to make norm-referenced interpretations of students' knowledge and/or skills relative to those of students nationally. We can argue about how well they achieve that goal, but they are unquestionably a poor metric to evaluate teacher effectiveness. That's not what they are intended to do. It’s like measuring temperature with a graduated cylinder. Drawing valid conclusions from AP score data requires comprehensive statistical analysis, and the conclusions that can be drawn do not encompass many critical aspects of teaching.

In a future blog post, we will explore: