3 evaluative features that distinguish an ESS 6 from a 7
Most IB ESS candidates at Level 5 understand feedback loops and mechanisms. The gap to 6 and 7 lies in a specific evaluative scaffold that Section B rewards — here is what it looks like and how to…
The evaluation gap in ESS Paper 2 Section B
There is a well-known plateau in ESS Paper 2 where candidates know the content, can identify relevant case study examples, and can trace feedback loops with reasonable accuracy — yet consistently land at Level 5 or low 6. The most common explanation offered by students themselves is that they need more examples, better case studies, or a broader knowledge base. In practice, that diagnosis misses the actual problem.
The distinction between a Level 6 and a Level 7 response in ESS Paper 2 Section B is almost never about content coverage. It is about a specific evaluative scaffold — a structural logic that examiners follow when reading your answer — and whether your response demonstrates it. This article examines that scaffold directly: what it contains, how it appears in sample answers, and the concrete habits you can develop to construct it reliably under exam conditions.
Why Section B evaluation works differently from Section A explanation
To understand what the evaluation scaffold does, it helps to see what it replaces. Most candidates enter the Paper 2 examination room after thorough preparation in ESS content: systems thinking, thematic case studies, resource cycles, biodiversity indicators. That preparation is necessary and will carry you to Level 5 comfortably. But Section B questions — the 12-mark extended responses — are not testing whether you can describe. They are testing whether you can judge.
That distinction matters enormously for how you approach revision. When you read a syllabus topic and ask "what do I need to know about this?", you are preparing for Section A. When you instead ask "what would I need to evaluate this?", "what perspectives could I weigh against each other?", and "what language signals that I am making a judgement rather than stating a fact?", you are preparing for Section B at the level that reaches 6 and 7.
The three-part evaluation scaffold
Examiners awarding marks in the upper bands of Section B are essentially checking for three structural features in your response. These are not formal mark scheme bullet points, and they are not independently weighted. Rather, they are the underlying logic that makes an extended response evaluative rather than merely descriptive. Understanding them as a scaffold helps because you can consciously construct your answer around them.
Perspective integration
The first structural feature is perspective integration. Section B questions at this level require you to demonstrate that you can hold multiple points of view simultaneously and evaluate them against each other. This does not mean listing stakeholder positions side by side without judgement. It means identifying what different actors value, what different timescales prioritise, and what different spatial scales emphasise — and then using those differences to generate an evaluative conclusion.
In ESS, perspective integration is particularly important because the subject sits across both biophysical and socio-economic systems. A question about a conservation strategy is not only asking about ecological outcomes — it is asking you to weigh ecological outcomes against economic ones, against the interests of indigenous communities, against the obligations of intergenerational justice. Candidates who stay within a single analytical frame — ecological only, or economic only — restrict their access to the upper mark bands.
Criterion-based evaluation
The second feature is criterion-based evaluation. When a question asks you to compare two approaches or assess the success of an intervention, the underlying task is to establish evaluative criteria and apply them. Strong candidates identify the criteria that matter in the specific context of the question, apply those criteria consistently to the options being compared, and allow the criteria to generate a ranking rather than an equal-balance summary.
The practical implication is that you cannot evaluate without having something to evaluate against. If the question asks you to evaluate a policy, you need to establish what "good" means in that context — effectiveness, equity, feasibility, sustainability — and then measure the policy against those standards. If you describe the policy without evaluating it against defined criteria, you have not yet entered the territory that examiners are rewarding.
Explicit evaluative verdict
The third feature is the most frequently missing element in Level 5 responses: an explicit evaluative verdict. The conclusion of your answer must not merely summarise what you have said. It must state which option ranks higher, which perspective carries more weight, or to what extent the evidence supports the claim in the question — and it must do so in evaluative language that signals judgement rather than description.
Phrases such as "more sustainable because," "less effective in the long term because," "preferable given that," and "the evidence suggests that" are evaluative language markers. They signal to the examiner that you are making a judgement. Without them, even a technically accurate response can read as a description with a summary appended rather than a sustained evaluation.
Five evaluation signals examiners look for in upper-band answers
Beyond the scaffold itself, experienced examiners come to recognise a set of signals that reliably indicate a Level 6 or 7 response. These signals are not secret, but candidates rarely engage with them explicitly during revision. Integrating them into your practice makes the difference between hoping your answer reads as evaluative and knowing that it will.
- Named case study evidence is cited with specific detail (location, mechanism, outcome) and explicitly connected to an evaluative claim rather than merely illustrating a descriptive point.
- Comparisons use criterion-based language — "is more effective in terms of X because" — rather than neutral listing — "X does this, Y does that."
- The conclusion makes an explicit ranking or judgement that follows from the criteria applied in the body, rather than stating that both options have strengths and weaknesses without deciding between them.
- Trade-offs are presented as conflicts to be resolved rather than as facts to be catalogued — the candidate takes a position on which trade-off matters more in the specific context.
- The response demonstrates awareness that different spatial or temporal scales can flip the evaluation — a strategy that works at local scale may fail at regional scale, or short-term gains may generate long-term costs.
Common structural patterns that keep candidates at Level 5
The most frequent structural error I see in ESS Paper 2 scripts at the Level 5 boundary is what I call the "parallel" structure. The candidate identifies the two options or perspectives, dedicates one paragraph to each, and ends with a conclusion that reads: "Both options have advantages and disadvantages." This approach is safe and often scores reasonably well on content grounds; it is not wrong. But it is rarely evaluative enough to reach the upper bands.
The reason parallel structure rarely reaches Level 6 is that it withholds the judgement. When you treat two options as equally balanced, you are declining to evaluate. The examiner cannot award marks for evaluation that is not present. The fix is not to write more content — it is to restructure the conclusion so that it ranks the options according to the criteria you have established, using evaluative language that makes your preference explicit.
Another common pattern is the "example dump." The candidate knows extensive case study material — a PES scheme in Costa Rica, REDD+ in Brazil, the Montreal Protocol, protected area expansion in Costa Rica — and includes most of it in the response. But the examples are connected only loosely to evaluative claims. They illustrate rather than prove. Strong candidates select fewer examples and connect each one tightly to a specific evaluative point, showing how the evidence supports the ranking or judgement being argued.
What Level 6 and Level 7 answers look like side by side
A concrete comparison helps. Consider a question that asks candidates to evaluate the effectiveness of two strategies for reducing biodiversity loss in a tropical forest ecosystem: a strictly protected area (PA) versus a community-managed sustainable use zone.
A typical Level 5 response identifies both strategies, describes how each works, notes that both have been used in tropical forest contexts, and concludes that both have merits. Content coverage is solid. The evaluation is present in seed form but never develops into a judgement.
A Level 6 response does something different. It establishes evaluative criteria — effectiveness in reducing habitat loss, equity for local communities, long-term financial sustainability. It applies those criteria to both options. It concludes that the community-managed zone is likely to be more effective in contexts where enforcement capacity is limited, because local buy-in reduces poaching more reliably than external guard forces. The evaluative verdict is present. The scaffold is partially visible.
A Level 7 response adds one more layer. It notes that the community-managed zone's success depends on boundary conditions — it performs well when social cohesion is high and outside logging pressure is moderate, but may fail in contexts of rapid commercialisation or weak internal governance. It compares the two strategies across multiple spatial scales, noting that strictly protected areas can serve as refugia that benefit the broader landscape even when they exclude local communities. It does not retreat into balanced neutrality — it makes an explicit recommendation based on the criteria established — but it qualifies that recommendation with contextual awareness. The evaluation is sustained, criterion-based, and explicit.
The command terms that require evaluation — and what each one demands
ESS Paper 2 questions are built around a set of command terms that define the type of evaluation required. Understanding what each command term demands is not a vocabulary exercise — it is a structural instruction that tells you how to build your answer.
| Command term | Evaluative demand | Common mistake |
|---|---|---|
| Evaluate | Make an overall judgement using evidence or reasoned argument; determine the value or merit of something relative to alternatives | Describing features without making a judgement; writing "in conclusion, this has both advantages and disadvantages" |
| Discuss | Offer a considered and balanced review that includes a range of different arguments, factors, or perspectives; show how they interact or conflict | Listing arguments in isolation without showing how they relate to each other; presenting one dominant perspective without acknowledging others |
| Examine | Inspect and investigate in detail; look closely at the structure and components of an argument, system, or process | General description without close analysis; missing the "inspect" component and staying at a surface level |
| Compare and contrast | Give an account of the similarities and differences between two or more items; must include an evaluative conclusion about which is preferable and why | Describing each item separately without making direct comparisons; stating similarities and differences without ranking |
| Justify | Give reasons to support a proposition, decision, or conclusion; defend one position against counterarguments | Describing features that support your position without acknowledging or refuting opposing views |
| To what extent | Consider the merits of a proposition; determine how far something is true or applicable; requires both supporting and opposing evidence | Agreeing or disagreeing without sufficient evidence; not engaging with the degree element of the question |
The key insight here is that every command term in Section B carries an implicit evaluative instruction. "Evaluate" and "to what extent" are the most demanding because they require an explicit verdict. "Discuss" requires you to show how perspectives interact rather than simply listing them. "Compare and contrast" requires a ranking. The common thread is that all of them demand more than description.
A practical revision strategy for building the evaluation scaffold
Developing the evaluation scaffold is not a content problem — it is a habit problem. The habits that produce Level 7 responses are learnable, but they require deliberate practice that goes beyond revising content.
Start by working backwards from the question. When you read an ESS topic during revision, whichever part of the syllabus it comes from, do not ask only "what do I know about this?" Ask also: "if I were asked to evaluate this, what would I need?" The answer is usually: one or two named examples with specific evidence, at least two evaluative criteria that are relevant to the context, and the evaluative language to signal that you are making a judgement rather than stating a fact.
Practice with past questions using self-assessment. After writing a response to a Paper 2 question, go back and check your answer against the three-part scaffold. Does your answer integrate multiple perspectives, or does it stay within a single frame? Does it establish and apply evaluative criteria, or does it describe features without ranking them? Does it conclude with an explicit evaluative verdict, or does it end with a balanced summary that declines to choose?
Focus on the conclusion first. In my experience, most candidates who are close to Level 6 but not quite there have a body that is closer to Level 7 than they realise — the evaluative moves are present but the conclusion does not crystallise them. Rewriting the conclusion to make the evaluative verdict explicit is often the single most efficient improvement you can make to a practice response.
Finally, build your bank of evaluative language. Phrases such as "more effective in terms of X because," "preferable given that Y," "the evidence suggests," "this is less convincing because," and "while Z is significant, it is outweighed by" are not filler — they are the evaluative infrastructure of a high-scoring response. When you practise writing conclusions, deliberately insert this language and check that your conclusion is making a claim rather than a summary.
Conclusion and next steps
The evaluative scaffold that separates a Level 6 from a Level 7 in ESS Paper 2 is not mysterious. It has three components: multiple perspectives integrated rather than listed, evaluative criteria applied rather than implied, and an explicit verdict delivered rather than withheld. What makes it difficult to acquire is that it requires a shift in how you approach revision — from content acquisition to structural habit formation.
You are not looking to learn more ESS content. If you are consistently scoring at Level 5, your content knowledge is sufficient. You are looking to build the habit of evaluation: identifying criteria, applying them, and stating your conclusion in language that signals judgement. That habit is built through practice, not through further reading.
If you are working through ESS Paper 2 practice questions and finding that your evaluations are landing at Level 5 despite solid content knowledge, the gap is structural. IB Courses' one-to-one ESS tutoring examines each student's Paper 2 scripts against the Level 6 and 7 descriptors, identifies where the evaluative scaffold is present but incomplete, and builds a targeted practice programme that closes that specific gap.