On the redesigned PSAT, the gaps between female and male students are closing, especially among high-scoring students.  The bigger problem might be among male students at the other end of the scale.

When David Coleman announced the redesign of the SAT in 2014, his watchword was “opportunity.” He declared, “What this country needs is not more tests, but more opportunities,” and he acknowledged that the SAT had not succeeded in carrying out its mission to “propel students into the opportunities they have earned.” His focus, that day, was mainly on low-income and first-generation students who perform at very high levels of academic achievement but do not apply to a single selective college.  The SAT was being redesigned, Coleman announced, to make it easier for students to have their hard work in school show up in their SAT scores.

Historically, the SAT and PSAT have done less well propelling female students into opportunities. (We’ll return to issues of income and of race and ethnicity in future posts.) The PSAT has a particularly spotty track record when it comes to gender. In 1994, FairTest filed a complaint against the Educational Testing Service and the College Board with the Office for Civil Rights in the Department of Education.  The complaint alleged that the PSAT, which was and is used as the main component in awarding National Merit Scholarships, was biased against girls.  Even though more than half of the test takers were female, more than half of the National Merit Semifinalists were male.  A multiple-choice writing section was added to the PSAT in 2004 (and to the SAT in 2005), presumably because girls performed better on Writing than boys. That advantage could, the thinking went, balance out the main source of the National Merit Scholar gap on the exam:  the Math section.

Female students, on average, do less well on the Math section of the SAT, ACT, and PSAT, as we can see in the table for the class of 2017. For the SAT and PSAT, instead of comparing the Evidence-Based Reading and Writing scores, we compare the Reading Test score, which makes up half the ERW score, and the Writing and Language Test score, which makes up the other half.  This comparison has the benefit of mirroring both the ACT and the old format of the SAT, which will let us, in a moment, make a longitudinal analysis of the PSAT over almost two decades.

TEST | Math Average (Female/Male) | Reading Average (Female/Male) | English/Writing Average (Female/Male)
ACT  | 20.4/21.2 | 21.8/21.2 | 20.8/19.9
PSAT | 479/485 | 26/25 | 26/25
SAT  | 516/538 | 27/27 | 27/27

Averages are not the best way to look at differences in performance, however.  It is hard to understand what a 6-point gap between male and female students on the PSAT Math section means compared with a 0.8 gap on the ACT.  A better way to analyze how male and female students are performing on the exam is to look at how the scores are distributed across the entire range.  If you care about who wins National Merit or gets into an Ivy League school, average scores don’t come into the conversation.  We should instead look at whether a female student is as likely to score at the high end of the score range as a male student is.  Likewise, if we’re concerned about who goes to college, we need to focus on who is more likely to escape the very low end of the scale.

Properly comparing male-vs.-female performance on the PSAT, SAT, or ACT requires more than merely looking at how many students score in a particular range.  In 2013, for instance, we know that 44,008 boys scored between 70 and 80 on the PSAT (the old PSAT scale mirrored the SAT, with a zero knocked off the score on each section, so it went from 20 to 80, instead of 200 to 800).  That year, 25,065 girls scored between 70 and 80.  That ratio, 1.76 to 1, already looks bad, but it’s important to note that more girls take the tests than do boys, so it is necessary to adjust for that difference by looking at the percentage of boys and the percentage of girls who score in a particular range.  In 2013, 5.9% of male test takers scored between 70 and 80, while 3% of female test takers did. We compared those percentages by gender, dividing the percentage of male students in a particular score band by the percentage of female students. We call that number, the ratio itself, the gender gap. In 2013, for example, we divide 5.9% by 3% to get a ratio of 1.97 to 1, or a gender gap of 1.97, which is worse than the gap we would get comparing sheer numbers of test takers.
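The arithmetic above can be sketched in a few lines of Python.  This is only an illustration, not the code used to produce the graphs; the function name gender_gap is made up for the example, and the inputs are the 2013 figures quoted above.

```python
def gender_gap(pct_male: float, pct_female: float) -> float:
    """Share of male test takers in a score band divided by the share
    of female test takers in the same band (1.0 means parity)."""
    if pct_female <= 0:
        raise ValueError("female percentage must be positive")
    return pct_male / pct_female

# 2013 PSAT, 70-80 band, using the figures quoted in the text.
raw_ratio = 44_008 / 25_065        # ratio of head counts
gap_2013 = gender_gap(5.9, 3.0)    # ratio of percentages

print(f"raw count ratio: {raw_ratio:.2f}")
print(f"gender gap:      {gap_2013:.2f}")
```

The two printed values round to 1.76 and 1.97, the same numbers given in the paragraph above.  Dividing percentages rather than head counts is what corrects for the fact that more girls than boys take the test.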

The graphs below track the gender gap on the PSAT over a 16-year period.  Looking at just two or three years could mask trends already in play, as well as the degree to which the gender gap varied from year to year.

One caveat:  when the PSAT and SAT were redesigned (in 2015 and 2016, respectively), the scoring on the PSAT changed too.  The new scale on the PSAT for each section goes from 160 to 760; the SAT goes from 200 to 800. For the new PSAT, we used the Section scale (160 to 760) for Math but the Test scale (8 to 38) for the Reading and Writing and Language sections, in order to compare the new test with the old.  We used the same score range distribution that College Board uses in its reporting.  This led to some expansion of the lowest score bucket and some compression of the highest score bucket.  This is not ideal, but both genders fall victim to the effect and we are relying on the data available, so we decided to go with the compromise of comparing scales that don’t match up at every point year over year.

What is interesting is that, historically, the gender gap is at its largest at the very high end of the Math scale, 750-800, where very few students score and where the male-to-female ratio on Math is typically greater than 2 to 1.  As Nitin Sawhney notes in the comments below, the new test lifted scores for everyone, and this effect could lie behind the closing of the gender gap on the Math section.  Like a capped tube of toothpaste, the score range has been squeezed higher, pushing male and female scores up and causing compression at the top of the scale because scores cannot go higher than 760.  Since the gender gap was largest there, the top range has the greatest potential to shrink.


The two most significant findings from this analysis are almost perfectly paired opposites:

  1. The Math gap has shrunk significantly at the high end of the score range, and other ranges have moved closer to parity as well.
  2. Reading is noisy, but it’s quite close to parity at most ranges (as is Writing and Language); the growth of a gap at the low end of the range, with male students lagging behind here, will be important to watch.

It’s important to add a word of caution about the lowest score range (20-29/8-14) on the new exam. When College Board got rid of the guessing penalty on the redesigned SAT, it became harder to score in this range. Filling in (A) for all the questions on the three 2016 PSAT forms, for instance, would have gotten you a 290 on Math on one form, a 320 on another, and a 380 on a third.  For Evidence-Based Reading and Writing, the numbers are 300, 320, and 330.  As a result, there are many fewer students scoring in the lowest range, and with small numbers, differences look larger when we compare them. There were so few students in that range on the redesigned Reading section that the percentage of female and male students scoring that low was effectively 0, which is why I assigned that score band a gap number of 1.
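Here is a minimal sketch of that convention.  The function name and the cutoff for “effectively 0” are assumptions made for the example (no exact cutoff is given here), and the percentages in the second half are invented purely to show how a sparsely populated band exaggerates a difference.

```python
def band_gap(pct_male: float, pct_female: float, floor: float = 0.05) -> float:
    """Male-to-female ratio for one score band.

    `floor` is an assumed cutoff (in percentage points) below which a
    share is treated as effectively zero; it is not a published value.
    """
    if pct_male < floor and pct_female < floor:
        return 1.0  # almost nobody scores here, so treat it as parity
    if pct_female < floor:
        raise ValueError("female share too small for a meaningful ratio")
    return pct_male / pct_female

# Both shares are effectively zero: the band is assigned a gap of 1.
print(band_gap(0.01, 0.02))

# Invented shares: the same 0.4-point difference reads very differently
# in a sparse band than in a crowded one.
print(f"{band_gap(0.6, 0.2):.2f}")    # sparse band
print(f"{band_gap(20.4, 20.0):.2f}")  # crowded band
```

The point of the second pair of calls is that a ratio built on very small shares can swing widely on a handful of students, which is why the lowest band deserves caution.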

MATH

The Math trends are the most striking of all.  While much of the score range has remained relatively stable, the gap between male and female students at the high end of the range has declined dramatically, even with a slight uptick in the 700-760 range this year.

Although it is distressing that high-achieving girls continue to lag behind high-achieving boys, the attenuation of that gap is hopeful.  The gap was shockingly large 15 years ago and it remains too large today, but it’s encouraging to see it shrink.

It’s also curious.  Why has the math gender gap shrunk at the high end of the curve?

The most optimistic explanation is that the PSAT reflects steady improvements in math made by female students, perhaps as a result of changing social norms and/or of the implementation of a Common Core curriculum and other changes in pedagogical practice and policy.  This could very well be the explanation, or at least part of it, but if that were so, then we would expect to see similar improvements on the ACT.

That is not the case.  Using ACT data from 2006 to 2015 in order to graph the gender gap among students scoring 32 to 36 on each of the ACT’s sections, we can see that while the math gap has also shrunk on the ACT, it has not shrunk as much as the SAT gap has.  This suggests that some of the decrease in the math gap might be attributed not to the test takers, but to the test.

The graph of the PSAT Math gender gap reveals two moments in which the decline picks up pace:  2005 and 2015.  2015 was the year the latest redesign of the PSAT debuted.  The previous redesign was released in 2004.  It’s not clear why the effect of the redesign was delayed until 2005; perhaps it took time for test prep to catch up.  For the latest redesign, College Board released practice tests well in advance.

Why the redesign would shrink the gender gap is an important, if challenging, question.  In both revisions, the SAT math content moved closer to school math and further away from IQ-style questions that resemble schoolwork very little, if at all.  Part of the challenge of these questions is mastering their format and tricky language.  Consider the quantitative comparison question, a format that once made up 30% of the math on the PSAT.  Students had to decide which of two quantities was the larger, whether they were equal, or if it was not possible to determine which was larger.  The thing about quant comp questions is that they are easy to beat using a basic test prep technique. (See below for how to beat this question.)

It’s not clear why getting rid of this kind of question would lead to more girls doing better on the math, or why we can see a similar result with the redesign of the PSAT in 2015 and the SAT in 2016.  (We’ll share similar findings about the SAT in the future.)  It is pretty clear, however, that question design played a role in closing the gap.  ACT, in contrast, has introduced no changes to its question design in more than two decades.  ACT scoring curves have improved on Math and Science, and most tests now have six rather than seven Science passages, which might have removed some of the timing pressure from the exam.

So, if changing the style of the questions on the SAT has led to smaller disparities between male and female students, what does that say about standardized tests?  On the one hand, we could look at the redesigned exam as a better representation of reality, a clearer reflection of the actual abilities of high-achieving male and female students in math.  On the other hand, we might see it as shaping rather than reflecting reality, by increasing the number of questions that girls do better on.

I’m not sure either view is correct.  They both assume that there is some objective measure out there, an empirical measurement of intelligence (academic potential? merit?), waiting to be identified and that we just need to make a better test to capture that fact.  The truth, of course, is that intelligence (and potential and merit) are much too complicated for any test to capture perfectly.  A good test will determine what should be tested and what is the best way to test it.  The SAT’s thinking about both of these factors has changed significantly, twice, in the past 15 years, and it looks like that is leading to better outcomes for academically high-achieving female students. The ACT has undergone much less change, so it is not surprising that the gender gaps on that exam have held steadier and are significantly larger on Math and Science than on any section of the SAT.

WRITING and READING

Neither the Writing nor Reading section of the PSAT displays anything like the gap between high-scoring male and female students on the Math section.  The Writing section has maintained near-parity across almost all score ranges for 15 years.  The one exception is at the very lowest range, where male students are much more likely to score, but the disparity there is misleading.  Very, very few students score in that range, often less than 1%, so the ratios look more significant than they are.

The same cannot be said about the 30-39/15-19 range on the Reading section.  Over the past 15 years, about 20 percent of male students have scored in this range, which amounts to over 150,000 boys.  A score in that range, which is below the 20th percentile, means these students are well below the college readiness benchmark and puts them at a serious disadvantage at any school that gives SAT scores consideration in admissions.  Here, again, we can point to changes in question format on the redesigned test having an effect on a particular demographic.  The new PSAT/SAT no longer tests difficult vocabulary, and it uses more difficult reading passages on a section that is over an hour long.  Could these changes be hitting low-performing boys especially hard?

Once again, we need to ask, is the redesigned SAT revealing more clearly the deficits that were masked by the old exam, or is the exam punishing students who are already lagging behind their peers?  Or could it be that, with the growth of school-day testing and the expansion of the pool of PSAT test takers to students who might not traditionally have taken the exam, the ranks of students at the low end of the score range have swelled?  But why would that affect boys more than girls?

What matters even more than the explanation is the response to this problem.  If struggling students are doing even worse on the SAT Reading section, this situation needs attention.  Important as it is to track how the new SAT is playing out for high-scoring students–who are also disproportionately wealthy–the attention of educators needs to be spread across the range of scores to make sure that equity is equity for all.

A NOTE ON DATA

The information in this piece was assembled using public records released by the College Board between 2003 and 2015 and previously unreleased data provided upon request by the College Board.  I am grateful to the College Board for sharing this valuable information.  I hope that in the coming months and years, the College Board will return to its old ways and become a model of transparency once again.  The ACT data was shared with me by a source who works in admissions and had collected this information over the past 15 years.  ACT shares a sliver of it here, and College Board used to release it in its annual reports for both the PSAT and SAT, which remain archived on its website.  With the release of the redesigned exam, College Board stopped releasing that data, although it can be obtained by high school counselors in the College Board counselor portal. It would be to everyone’s benefit, students’ in particular, were ACT and the College Board to be more open organizations when it comes to sharing information about the exams and examinees.

How to beat the question:  Try plugging in 2 for a and 3 for b.  Now try 2 for a and -3 for b.  You get a different result, so the answer is (D).  Pretty easy, and pretty silly as a way to test math, which may be why College Board removed these question types in 2004 from the PSAT and 2005 from the SAT. (Note:  the GRE still uses quant comp questions.)
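The plug-in approach is mechanical enough to sketch in code.  The sample question itself is not reproduced in this text, so the two quantities below (a times b versus a plus b) are a hypothetical stand-in, not the original item; only the test values for a and b come from the explanation above.

```python
def compare(qty_a: float, qty_b: float) -> str:
    """Return which column is larger for one set of plugged-in values."""
    if qty_a > qty_b:
        return "A"
    if qty_b > qty_a:
        return "B"
    return "C"  # the two quantities are equal

def quant_comp(column_a, column_b, trials) -> str:
    """Plug in each (a, b) pair; if the outcomes disagree, answer (D)."""
    outcomes = {compare(column_a(a, b), column_b(a, b)) for a, b in trials}
    return outcomes.pop() if len(outcomes) == 1 else "D"

# Hypothetical columns, chosen only for illustration.
column_a = lambda a, b: a * b
column_b = lambda a, b: a + b

print(quant_comp(column_a, column_b, [(2, 3), (2, -3)]))
```

With these stand-in columns the two trials disagree, so the sketch lands on (D).  Note that agreement across a few trials only suggests (A), (B), or (C); it does not prove it, which is why the technique is strongest at ruling answers out.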

5 thoughts on “The Good (and not so good) News about the PSAT”

  1. As you note, effective range of scores on the new SAT has shrunk, from 570 per section to about 500 per section. That has shrunk the gaps between scores, and there is more compression at the top. A 560 Math on the old test, for instance, is 580 on the new test. This crowding at the top would mean that gaps between scores have shrunk — for everyone — and potentially explain much of the shrinking gap between boys and girls.

    A look at the College Board’s old SAT to new SAT Math concordance table will show you the effect in detail.

    https://collegereadiness.collegeboard.org/xls/concordance-tables-new-sat-scores-old-sat-scores.xls

    Best,
    Nitin Sawhney

    1. Hey Nitin! This is a terrific point. I edged away from getting too much into the weeds above with concordance and compression, since the big point is really the observation of the gap, and in practical terms, the why doesn’t matter for people evaluating scores for admission and merit aid. But you’ve given me a good idea for thinking about how to explain this. Thanks. Also, a question–the ACT has always had a compressed scale, since it didn’t have the guessing penalty and its mean is off-center. Thoughts on why the STEM gap persists there?

      1. At least 40% Geometry! Their new labeling confuses the issue but at least 20 of 50 questions on every test are those that we all would commonly label as geometry. If you include Trig it’s at least 25 of 60. There are good arguments for the nurture case for the (well-established) gender gap in spatial reasoning scores, but not about whether the gap exists. That has been shown in numerous studies. I could post links but there are so many.

        All best!

  2. A qualification: The gender based gaps in spatial reasoning skills seem to exist in high school and adulthood. In primary school, the research is mixed, and some studies suggest that there is no gap. This is support for the nurture case.

    I caved and found some interesting links. You hit upon a topic I find super interesting. I’ve taught and tutored Math in the US for 20 years and grew up in India, and I find the gender-based gaps in Math a very troubling American phenomenon.

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2867482/

    http://m.nautil.us/issue/32/space/men-are-better-at-maps-until-women-take-this-course

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3169128/
