A Score Is Not a Child: What Parents, Educators, and Providers Should Know About High-Stakes Testing
In my waiting room, and in conversations with children, parents, educators, and providers, I have been hearing the same concern more often:
Why are children being tested so much, and what do all these scores actually mean?
High-stakes educational testing has become a normal part of childhood. Children talk about test days, practice tests, benchmark tests, progress monitoring, state testing, and the pressure they feel when adults tell them the scores matter.
Some students are only mildly annoyed. Others become anxious. Some parents feel confused. Others worry that their child is not “measuring up.” Educators may feel pressure to raise scores while also trying to protect instructional time, relationships, and meaningful learning. Providers may see the emotional and developmental effects when children begin to believe a number says something permanent about their intelligence, future, or worth.
These concerns are not imaginary.
Depending on age and grade, a student may encounter measures such as NWEA MAP Growth, M-STEP, PSAT 8/9, PSAT 10, SAT, science and social studies assessments, local benchmark tests, progress monitoring tools, and other required measures. The names vary. The effect accumulates.
As a pediatric psychologist who has spent a significant part of my research life studying academic measurement, especially in communities labeled as having “low-performing schools,” I am concerned.
I am concerned not because measurement is bad.
I am concerned because measurement is often used too casually, too broadly, and with too little attention to what a given test was actually designed to tell us.
A score can help.
A score can also harm.
Testing Time Is Childhood Time
For individual students, the issue is not abstract. Every hour spent testing is an hour away from instruction, discussion, reading, writing, teacher feedback, peer interaction, movement, play, and the ordinary human work of learning.
Some children experience these tests as routine. Others experience them as judgment.
Parents may see a report and wonder whether their child is falling behind, whether something is wrong, or whether a single score has revealed a larger truth about who their child is.
Educators may be asked to make decisions from data that appear more precise than they really are.
Providers may be asked to interpret scores in the context of attention, anxiety, learning disorders, developmental history, trauma, language exposure, medical issues, or unequal educational opportunity.
Sometimes a score is useful. Sometimes it raises an important question. Sometimes it points to a skill area that needs more attention.
But sometimes a score creates anxiety without giving families, teachers, or providers enough context to understand what was actually measured, how precise the score is, what the score can reasonably tell us, and what should happen next.
That is where measurement matters.
Numbers Can Look More Certain Than They Are
A standardized score can look objective, scientific, and precise. But a number is not automatically valid because it is numerical.
A percentile is not automatically meaningful because it appears on a report.
A growth score is not automatically a clean measure of teaching.
A school accountability score is not automatically a fair representation of what children know, what teachers have taught, or what a community has invested in its schools.
Tests have purposes. They have design assumptions. They have technical limits. They have standard errors. They have intended uses. They also have misuses.
And children live inside the consequences.
The problem is not the existence of scores. The problem is the way scores can acquire authority beyond their validity. A number can look clean, scientific, and final even when it is only a limited estimate, produced under particular conditions, for a particular purpose, with a margin of error and a set of assumptions behind it.
A test may measure something real and still be misused.
That distinction matters.
The concern is not that tests measure nothing. The concern is that adults may ask a score to tell us more than it was designed, validated, or ethically able to tell.
The Most Important Question: Is the Score Valid for This Use?
One of the most important questions in educational testing concerns content validity.
Content validity asks whether a test actually samples the academic content it claims to measure.
For example:
Does the reading test measure reading skills that were taught?
Does the math test represent the domain of math instruction in a developmentally appropriate way?
Does the assessment reflect the curriculum, standards, and instructional opportunities students actually received?
Does the score support the conclusion parents, educators, policymakers, or providers are drawing from it?
These questions matter.
A test may be useful for one purpose and inappropriate for another. A measure designed to estimate broad academic achievement does not automatically become an appropriate tool for high-stakes decisions about a teacher, a school, a district, or a child’s future potential.
This is not a small technical concern.
It is the center of the issue.
Tests Are Not Interchangeable
MAP Growth, for example, may provide useful information when used as a broad estimate of achievement or growth over time. But using it as the primary progress-monitoring tool for intervention decisions can exceed what the measure was designed to do.
For frequent intervention monitoring in reading and math, curriculum-based measurement is often a better fit. These tools are usually more closely tied to instruction, more efficient to administer, and better suited to answering the practical question schools most need answered:
Is this intervention helping this child learn the targeted skill?
That distinction matters because tests are not interchangeable.
A test that is useful for one decision may be too broad, too infrequent, or too indirect for another. A score can be helpful when interpreted within the limits of the instrument. It becomes risky when adults ask the score to carry more meaning than the test was designed to support.
Once a score is used to make judgments about children, teachers, schools, or communities, validity is no longer just a psychometric concept.
It becomes an ethical obligation.
The Larger Measurement Problem
This same mistake shows up in the way we talk about high-stakes educational testing more broadly.
Schools are often judged by numbers that appear objective, scientific, and precise. Dashboards, percentile ranks, proficiency rates, and growth scores can create the impression that we are seeing the whole truth clearly.
But we are usually seeing only part of a much larger story.
One of the most uncomfortable truths in education is that performance on high-stakes tests is strongly shaped by family socioeconomic status, neighborhood opportunity, and the tax base of the school system. This has been known for decades in the science of educational measurement and psychometrics.
This does not mean teachers do not matter.
It does not mean schools do not matter.
It does not mean children are not learning.
It does not mean tests measure nothing.
It means that test scores emerge from a much larger opportunity structure. When that opportunity structure is unequal, the scores will often reflect inequality as much as instruction, effort, or ability.
Even when a test is technically competent, the opportunity structure feeding the test may be profoundly unequal.
And then the system often treats the resulting score as if it were an individual moral fact: effort, talent, readiness, merit, or deservingness.
That is where measurement can become a form of laundering.
The score converts social history into a number, and the number is then treated as if it emerged from the child alone.
The tragedy is that education has had decades of evidence and still often chooses the most comfortable interpretation: that the measurement problem is cleaner than the social problem.
But the measurement keeps pointing back to the social problem.
The thermometer is not the fever, but we keep using the thermometer to sort children and schools while leaving the fever untreated.
Again, none of this means that children are not learning, that teachers are not teaching, or that tests measure nothing.
It means we should be extremely careful before pretending that a high-stakes score cleanly separates school quality, teacher effectiveness, child effort, curriculum strength, family resources, neighborhood opportunity, and public investment.
The issue is not only technical.
It is ethical.
Do No Harm
When a score can be substantially predicted by the socioeconomic status of a child’s family and the resources available to a school system, we should be careful about using that score to label children, rank schools, evaluate teachers, or assign blame.
At that point, the ethical maxim “Do no harm” becomes directly relevant.
If measurement helps us understand what a child needs next, it can serve the child.
If measurement primarily tells us which children and schools have already had more opportunity, and then punishes those with less, it risks becoming harm dressed up as accountability.
When we ignore that problem, we mistake social advantage for educational merit.
We mistake resource inequality for school failure.
We mistake test performance for the whole child.
And, just as dangerously, we mistake the appearance of data for the presence of validity.
What Parents, Educators, and Providers Can Ask
When reviewing a test score, it may help to ask:
What was this test designed to measure?
Was it designed for this specific decision?
How closely does it connect to what the child was actually taught?
How precise is the score?
What else do we know about the child’s learning, development, health, stress level, language background, educational history, and access to opportunity?
What would we do differently for the child based on this score?
That last question matters especially.
If a score does not lead to better instruction, better support, better understanding, or better care, then we should be cautious about how much power we give it.
The right question is not simply, “What is the score?”
The better question is, “What does this score help us understand, and what humane, useful action should follow from it?”
A Score Is One Piece of Information
Schools, state education systems, and federal education systems absolutely need people who understand the science of psychometric measurement. Their expertise must be shared with policymakers.
Not because measurement is unimportant, but because it is too important to be used casually.
A score may raise a useful question. It may help identify a need. It may guide the next instructional step.
But a score is not a child.
It is not a teacher.
It is not a school.
It is not a community.
It is one piece of information.
And when that information enters the life of a child, we have an obligation to use it carefully, humbly, and humanely.
As always, if I can be of assistance, please reach out.
Dr. Miller