OECD: Grade the Process, Not the Product, in Science Education

When the OECD released its Digital Education Outlook 2026 in January, one recommendation stood out above the rest: schools should move toward "process-oriented assessment." The 245-page report, which examined international research on generative AI in education, concluded that "traditional assessment models that focus solely on final outputs are becoming inadequate." Instead, the OECD urged educators to evaluate "not just what students produce, but how they engage with learning to create products" (OECD, 2026).

This is not incremental policy guidance. It represents a fundamental shift in how international bodies think about student assessment in an age when AI can generate essays, solve equations, and even design experiments in seconds. For those of us at WhimsyLabs, it also represents something else: international validation of an approach we have been building since our founding.

What Exactly Does the OECD Recommend?

The OECD report is unambiguous: when AI can produce polished outputs instantly, grading those outputs tells you almost nothing about student learning. The solution, according to the report, is to assess the process instead. As summarised by the Electronic Platform for Adult Learning in Europe (EPALE), the OECD suggests that "instead of grading the final paper, teachers should evaluate how a student interacted with AI, how they critiqued its output, and how they refined their ideas over time" (EPALE, 2026).

This recommendation stems from research showing a troubling paradox: students using AI perform better on immediate tasks but worse when the AI is removed. The report cites studies showing that while AI-assisted students were 48% more successful at completing tasks, their performance dropped by 17% when tested without AI support. The OECD calls this phenomenon "cognitive offloading," where students let AI do the thinking and miss out on the mental struggle necessary for genuine learning.

The concern is not hypothetical. RTE News reported the OECD's warning that "if used as a shortcut rather than a learning tool, AI can displace cognitive effort and weaken the skills that underpin deep learning." The report explicitly warns against "the mirage of false mastery, where the impressive outputs generated by AI mask the underdevelopment of essential skills" (O'Kelly, 2026).

Why Traditional Assessment Cannot Adapt

The fundamental problem with traditional assessment is that it was designed to measure outputs at a time when outputs required human effort. When a student submitted an essay in 1990, the quality of that essay correlated reasonably well with the student's understanding. The student had to think through the argument, structure their ideas, and produce prose that reflected their comprehension. The output was, in a meaningful sense, evidence of process.

That correlation has broken. A student can now prompt an AI to generate a sophisticated essay in seconds, submit it with minor edits, and receive high marks despite having learned nothing. Teachers report spending increasing time trying to detect AI-generated work, time that could be spent on actual teaching. Meanwhile, students who work honestly may be disadvantaged compared to those who outsource their thinking.

The OECD report makes clear that detection is not the solution. AI-generated content will only become harder to identify, and the detection arms race diverts educational energy from learning to policing. The solution is not to catch cheaters but to change what we assess. If we assess the process, not the product, AI becomes a tool rather than a threat.

Why Science Labs Are the Perfect Case Study

Nowhere is the distinction between process and product more evident than in laboratory science. When a student performs a titration and reports the correct molarity, what have they actually demonstrated? They might have reasoned systematically through the procedure, made careful observations, and understood the underlying chemistry. Or they might have asked a neighbour for the answer, copied from last year's lab report, or simply got lucky. The final number alone cannot distinguish between these cases.

This problem existed before AI, but AI has made it urgent. A student can now ask ChatGPT to write a complete lab report, including plausible observations, appropriate error analysis, and well-reasoned conclusions, without ever touching the equipment. The report looks correct because AI can synthesise what correct reports look like. But the student has developed none of the skills the lab was designed to teach.

Process-oriented assessment for labs means tracking how a student approaches an experiment. Did they form a hypothesis before beginning? Did they test variables systematically or haphazardly? When results surprised them, did they investigate or ignore? Did they interpret data thoughtfully or jump to conclusions? These questions get at what laboratory education actually aims to develop: scientific thinking.

How WhimsyLabs Has Built Process Assessment

At WhimsyLabs, process-oriented assessment is not a retrofit in response to AI. It is how we designed our platform from the beginning. Our virtual labs capture every action a student takes: which equipment they select, in what order they perform steps, how they respond to unexpected results, and how their technique improves over time. This interaction logging creates a detailed record of scientific thinking in action.

Action logging is the foundation. When a student measures temperature, we record not just the final reading but when they took it, how many readings they made, and whether they waited for the thermometer to stabilise. When they pipette a solution, we track their technique, including angle, speed, and whether they pre-wet the tip. These granular details reveal whether a student is developing proper laboratory habits or simply going through the motions.
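
To make the idea of an action log concrete, here is a minimal sketch of what one logged event and a summary of thermometer use might look like. It is an illustration only, not a description of the WhimsyLabs implementation: the ActionEvent schema, the action names, and the 30-second stabilisation time are all hypothetical assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ActionEvent:
    """One logged student action (hypothetical schema, for illustration only)."""
    timestamp_s: float                           # seconds since the experiment started
    action: str                                  # e.g. "insert_thermometer", "read_thermometer"
    details: dict = field(default_factory=dict)  # free-form extras such as the reading


def summarise_temperature_readings(events, stable_after_s=30.0):
    """Summarise when and how often the thermometer was read.

    stable_after_s is an assumed stabilisation time, not a real lab constant.
    """
    reading_times = [e.timestamp_s for e in events if e.action == "read_thermometer"]
    insert_times = [e.timestamp_s for e in events if e.action == "insert_thermometer"]
    inserted_at = insert_times[0] if insert_times else None
    # The student "waited" if the first reading came late enough after insertion.
    waited = (
        inserted_at is not None
        and bool(reading_times)
        and reading_times[0] - inserted_at >= stable_after_s
    )
    return {
        "reading_count": len(reading_times),
        "first_reading_s": reading_times[0] if reading_times else None,
        "waited_for_stabilisation": waited,
    }


# Made-up example log: the first reading is taken 7 seconds after insertion.
log = [
    ActionEvent(5.0, "insert_thermometer"),
    ActionEvent(12.0, "read_thermometer", {"celsius": 21.4}),
    ActionEvent(50.0, "read_thermometer", {"celsius": 24.9}),
]
summary = summarise_temperature_readings(log)
```

In this made-up log the summary records two readings and flags that the student did not wait for the thermometer to stabilise, which is the kind of detail a final reading alone would hide.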

Technique grading builds on action logging to assess procedural competence. Our AI tutor, WhimsyCat, evaluates not just outcomes but execution. A student who reaches the correct endpoint in a titration through sloppy technique receives different feedback than one whose technique was precise but who miscalculated the molarity. Both need improvement, but in different ways.
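
The distinction between execution and outcome can be sketched as two independent checks that lead to different feedback. This is a hedged illustration rather than how WhimsyCat actually works: the function name, the 0-to-1 technique score, and both thresholds are assumptions chosen for the example.

```python
def titration_feedback(technique_score, reported_molarity, true_molarity,
                       technique_threshold=0.7, tolerance=0.02):
    """Return feedback that separates execution from the calculated result.

    technique_score is assumed to be a 0-to-1 rating of execution quality;
    technique_threshold and tolerance are illustrative values only.
    """
    technique_ok = technique_score >= technique_threshold
    relative_error = abs(reported_molarity - true_molarity) / true_molarity
    result_ok = relative_error <= tolerance

    if technique_ok and result_ok:
        return "Technique and calculation are both on track."
    if technique_ok:
        return "Technique was precise; recheck the molarity calculation."
    if result_ok:
        return "Result is correct; work on technique so it is reliable."
    return "Review both the technique and the molarity calculation."


# Precise technique, wrong calculation.
careful_but_miscalculated = titration_feedback(0.9, 0.120, 0.100)
# Correct answer, sloppy technique.
sloppy_but_correct = titration_feedback(0.4, 0.100, 0.100)
```

The two example students receive different messages even though an outcome-only mark would rank the second above the first.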

Expert pathway comparison places student behaviour in context. We have mapped how expert scientists approach common experiments, identifying the decision trees and problem-solving patterns that characterise skilled scientific thinking. When a student's approach diverges significantly from expert pathways, it signals an opportunity for targeted guidance. When their approach aligns with expert reasoning even if they reach an incorrect conclusion, it suggests their scientific thinking is developing appropriately.
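
One simple way to picture pathway comparison is as a similarity score between the student's ordered steps and an expert's. The sketch below uses Python's standard difflib.SequenceMatcher for this. It is a stand-in technique chosen for illustration, not the method WhimsyLabs uses, and the step names are hypothetical. A sequence score like this only measures how closely the order of actions lines up; it says nothing by itself about the reasoning behind them.

```python
from difflib import SequenceMatcher


def pathway_similarity(student_steps, expert_steps):
    """Return a 0-to-1 similarity between two ordered lists of step names."""
    matcher = SequenceMatcher(None, student_steps, expert_steps, autojunk=False)
    return matcher.ratio()


# Hypothetical step names for a titration.
expert = [
    "rinse_burette",
    "fill_burette",
    "record_initial_volume",
    "add_indicator",
    "titrate",
    "record_final_volume",
]
# This student skipped rinsing the burette and adding the indicator.
student = [
    "fill_burette",
    "record_initial_volume",
    "titrate",
    "record_final_volume",
]

score = pathway_similarity(student, expert)
```

A low score could be used to prompt targeted guidance, while a high score alongside a wrong final answer would point to a calculation issue rather than a flawed approach.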

What the Research Supports

The OECD recommendation aligns with decades of educational research. Black and Wiliam's landmark 1998 review showed that formative assessment (feedback provided during learning rather than after it) produces substantial learning gains across subjects and age groups. The studies they reviewed reported effect sizes between 0.4 and 0.7, larger than those of most other educational interventions (Black & Wiliam, 1998).
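
For readers unfamiliar with effect sizes, the short sketch below shows how a standardised mean difference (Cohen's d) is calculated, assuming that is the kind of effect size being reported. The scores are invented purely to show the arithmetic and are not data from any study.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treated, control):
    """Standardised mean difference using the pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treated) - mean(control)) / pooled_sd


# Invented test scores, for illustration only.
control_scores = [50, 60, 70]
treated_scores = [55, 65, 75]

d = cohens_d(treated_scores, control_scores)
```

Here the treated group's mean is 5 points higher and the pooled standard deviation is 10 points, so d is 0.5: the average treated student scores half a standard deviation above the control average.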

More recent work has focused specifically on scientific reasoning. Research on virtual labs has shown that systems which capture process data can identify misconceptions that traditional assessments miss entirely. Students who produce correct answers through incorrect reasoning receive feedback that prevents those misconceptions from becoming entrenched. Students who reason correctly but make procedural errors receive targeted help rather than generic failure messages.

Arizona State University's Dreamscape Learn programme, which we discussed in a previous post, provides a concrete example. Their immersive biology labs track every student decision, allowing faculty to assess reasoning quality independent of outcome correctness. Early results show that students develop stronger scientific thinking skills when assessed this way, precisely because feedback addresses their actual cognitive process rather than their final submission.

Implications for Schools and Teachers

The OECD's recommendation carries significant implications for how schools approach assessment. The report notes that AI can reduce time spent on administrative tasks by approximately 31%, but only if schools rethink what those tasks involve. If teachers continue to grade products rather than processes, AI provides no meaningful help, and may even increase workload through detection efforts.

Process-oriented assessment requires different infrastructure. Schools need learning environments that capture process data, which is precisely what purpose-built educational technology provides. Off-the-shelf chatbots cannot track student reasoning through a chemistry experiment. Virtual labs designed for process capture can.

The transition also requires professional development. Teachers accustomed to marking lab reports need support in interpreting interaction logs, technique metrics, and pathway analyses. The OECD report explicitly calls for "new skills pathways and training frameworks" to help educators work effectively with AI-enhanced assessment systems. This is not a criticism of teachers but a recognition that the profession is being asked to develop genuinely new capabilities.

The Challenge of Implementation

Moving from outcome to process assessment is not simple. It requires rethinking what grades represent, redesigning rubrics, and helping students understand why the journey matters as much as the destination. Students accustomed to being judged solely on final answers may initially resist assessment systems that evaluate their approach.

There are also concerns about equity. Process-oriented assessment relies on technology that captures student behaviour, which requires devices, connectivity, and technical support that not all schools currently have. The OECD report acknowledges this, noting the emergence of a "second digital divide" based not on access but on quality of use. Students in well-supported environments may use process-capturing tools as sophisticated learning aids while students in under-resourced schools lack access entirely.

WhimsyLabs has designed our platform to run on standard Chromebooks over typical school networks precisely because infrastructure constraints are real. We cannot solve the digital divide, but we can ensure that process-oriented assessment does not require specialised hardware that many schools cannot afford.

What Comes Next

The OECD's endorsement of process-oriented assessment is significant because it provides international validation for an approach that has been building momentum in educational research for years. Schools and policymakers who were waiting for authoritative guidance now have it. The question is no longer whether to shift toward process assessment but how quickly and effectively that shift can occur.

For WhimsyLabs, this moment is encouraging but not surprising. We built our platform around process assessment because the research supported it, because laboratory education demands it, and because we believed that AI would eventually make outcome-only assessment untenable. The OECD's report confirms what we have been working toward: a future where students are assessed on their thinking, not just their answers, and where technology reveals the learning process rather than hiding it.

The challenge for the sector is now implementation. International reports provide direction, but transformation happens in individual schools, with individual teachers, in individual classrooms. We are ready to be part of that transformation.

References

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7-74. https://doi.org/10.1080/0969595980050102
Electronic Platform for Adult Learning in Europe (EPALE). (2026). The Future of Learning: Key Takeaways from the OECD Digital Education Outlook 2026. https://epale.ec.europa.eu/en/blog/future-learning-key-takeaways-oecd-digital-education-outlook-2026
OECD. (2026). OECD Digital Education Outlook 2026: Exploring Effective Uses of Generative AI in Education. OECD Publishing. https://www.oecd.org/en/publications/oecd-digital-education-outlook-2026_062a7394-en.html
O'Kelly, E. (2026). Warning over uncritical AI use in education. RTE News. https://www.rte.ie/news/education/2026/0119/1553973-ai-education/
