IB History Paper 2: the four-step evaluation framework that separates 6s from 7s
Most IB History candidates know that Paper 1 is built around sources. Fewer realise that Paper 2, which prints no sources at all, still rewards the evaluation of evidence, and that it demands a structurally different skill, one that trips up even capable students.
IB History Paper 2 presents a challenge that puzzles even diligent students: you have learned to analyse sources for Paper 1, yet Paper 2 seems to demand something subtly different, and your marks reflect that gap. The confusion rarely stems from a lack of historical knowledge. It stems from a mismatch between the evaluation skill you practised and the evaluation skill the markbands reward. Understanding that distinction is often the most effective single change you can make to your Paper 2 preparation.
Why Paper 2 source questions operate under different constraints
Paper 1 tests your ability to evaluate sources within a tight interpretive window: a single prescribed subject. You receive a handful of sources and answer structured questions that ask you to explain them, assess their value and limitations, compare them, and combine them with your own knowledge. Paper 2, by contrast, is an essay paper on the world history topics: you answer two questions, each on a different topic, and no sources are printed on the paper. Whatever evidence you evaluate, whether a treaty, a speech, a set of statistics, or a historian's argument, is evidence you have selected and recalled yourself. Its evaluation does not stand alone as an answer; it is woven through a sustained historical argument. This structural difference changes what examiners are watching for in every paragraph you write.
On Paper 1, a strong source evaluation answer typically does the following: identifies provenance features (author, date, purpose, intended audience), assesses reliability using contextual knowledge, and weighs utility by matching the source's specific claims against the question's demands. On Paper 2, that same skill must function as a building block inside an essay written in roughly 45 minutes, an essay that also requires you to demonstrate conceptual understanding, build a sustained argument, and reach an evaluative judgment. The evidence does not sit in a separate box: it is integrated into your reasoning at every stage.
The time-pressure dimension
Paper 2 gives you 1 hour 30 minutes for two essays, which is 45 minutes per response if you split the time evenly. You cannot afford to spend the first ten minutes dissecting a single document in isolation the way you might on Paper 1. The evaluation must be quick, purposeful, and directly connected to the line of argument you are developing. Students who carry their Paper 1 habits into Paper 2 often run out of time or produce disconnected commentary that fails to advance their argument.
What examiners actually assess in Paper 2 source evaluation
The current Paper 2 markbands award each essay up to 15 marks across five bands (1–3, 4–6, 7–9, 10–12 and 13–15). In the lower bands, responses are mostly narrative or descriptive: candidates summarise what happened, or what a document says, without analysing it. In the 7–9 band, responses move beyond description to some analysis or critical commentary, but that analysis is not sustained. In the 10–12 band, critical analysis is clear and coherent, most main points are substantiated, and there is some awareness and evaluation of different perspectives. In the top band (13–15), critical analysis is well developed, nearly all main points are substantiated, the conclusion is clear and consistent, and different perspectives are evaluated. Reading each piece of evidence as a construction shaped by its context, and using that understanding to qualify, nuance, or directly challenge its claims, is one of the most reliable ways to supply that evaluation of perspectives.
Notice that top-band performance depends on reading evidence as a product of its historical moment, not just as a record of historical facts. A document from a colonial administrator in the 1880s is not merely a source of information about African societies; it is also a window into the assumptions, power structures, and discursive frameworks of its time. Recognising that dimension and using it critically is what separates the top band from the band below it.
The provenance-utility distinction that matters
Provenance and utility are related but not identical concepts. Provenance refers to who created the source, when, where, why, and for whom. Utility refers to how useful that source is for answering the specific question you are addressing. A source with excellent provenance, such as a primary document from a key actor, may have limited utility if its content does not speak to the particular question being asked. Conversely, a seemingly humble source, such as a local newspaper report or an anonymous letter, may be exceptionally useful because it captures perspectives or details that high-status sources omit. Strong essays make this distinction fluently instead of simply declaring a source "reliable" or "unreliable."
A four-step evaluation framework for Paper 2 responses
The framework below is a repeatable way to handle evidence across the range of Paper 2 questions. It has four components, each of which maps onto what the markbands reward.
1. Identify the source's key claim or information in one sentence. Before you evaluate, be precise about what the source actually says. Vague paraphrasing invites vague evaluation.
2. Name the provenance factors that shape how the source was constructed. Who is speaking, what was their role, what constraints or purposes shaped their reporting? This is not a formula: you select the factors most relevant to this specific source and question.
3. Assess how these provenance factors affect the source's reliability and representativeness. Does the author's position give them privileged access to information, or does it create blind spots? Does the purpose of the source distort what it records?
4. Evaluate the source's utility by connecting your provenance analysis directly to the question. This is where your reasoning must be explicit. State clearly whether and why the source helps or limits your ability to answer the question, and note any gaps or biases the source creates.
Practising this sequence under timed conditions is essential. Many candidates can execute steps 1 and 2 fluently when they have unlimited time. Under exam pressure, they collapse steps 3 and 4 into vague statements like "this source is useful because it provides primary evidence", which keeps the analysis descriptive and the essay in the lower markbands.
HL versus SL: what does and does not change in Paper 2
Paper 2 is the same paper for HL and SL candidates, marked against the same markbands. What differs is its weight and the context around it: Paper 2 counts for 45% of the final grade at SL but 25% at HL, and HL candidates also sit Paper 3, a set of essays built on a depth study of one regional option. The dimensions below therefore apply to every candidate; they describe what typically distinguishes a middle-band Paper 2 essay from a top-band one.
| Dimension | Middle-band essay | Top-band essay |
|---|---|---|
| Source integration | Evidence used to support or illustrate arguments; clear relevance to the question demonstrated | Evidence actively interrogated; its limitations and biases used to qualify or challenge historical claims |
| Historiographical awareness | Awareness that historical accounts are constructed; basic provenance reasoning | Explicit recognition of how different historians have interpreted the same events; ability to position evidence within interpretive debates |
| Sustained evaluation | Clear evaluative conclusion; reasoning connects evidence to judgment | Evaluation threaded through the essay; multiple layers of judgment as the argument develops |
| Contextual knowledge | Mostly accurate and relevant; events generally placed in context | Accurate and relevant; context used to explain why a piece of evidence says what it says |
| Command terms | Demands of the question understood but only partly addressed | Demands and implications of the question addressed throughout |
For HL candidates, the Paper 3 depth study is not just extra content to memorise: it is additional context you can deploy in Paper 2 to assess the provenance and utility of evidence. A candidate writing about the Cold War who cites a Soviet diplomatic document will be better placed to evaluate it if they have studied a crisis such as the Cuban Missile Crisis in depth. Detailed study gives you the contextual granularity that makes source evaluation specific and compelling rather than generic.
Command terms: how their specific demands shape your response
IB History Paper 2 questions typically use command terms that require evaluative reasoning rather than descriptive responses. Three that appear regularly are evaluate, examine, and compare and contrast. Each has a specific meaning that shapes your paragraph structure.
Evaluate requires you to make a reasoned judgment, not merely a balanced description: an appraisal that weighs up strengths and limitations. You must weigh evidence, consider competing interpretations, and arrive at a conclusion that is defensible given the available evidence. A question such as "Evaluate the effectiveness of League of Nations peacekeeping measures between 1920 and 1936" does not want a description of what the League did. It wants your assessment of whether those measures were effective, supported by specific examples and explicit reasoning about why some evidence counts more than other evidence.
Examine asks you to consider an argument or concept in a way that uncovers its assumptions and interrelationships. In practice it often signals a causal reasoning task: you might be asked to examine the causes of a specific historical event or the consequences of a particular policy. Your response should trace mechanisms and identify patterns, not just catalogue events.
Compare and contrast requires you to identify similarities and differences between two or more cases, referring to each of them throughout, and then assess the significance of those patterns. (Compare on its own asks only for similarities; contrast asks only for differences.) A common error is to produce a side-by-side description and stop there. The evaluative step, explaining why the similarities or differences matter, is what the markbands are measuring.
Integrating command-term awareness into your evaluation framework
When you bring a piece of evidence into an answer to a question built around one of these command terms, the evaluation step of your framework should refer explicitly to that demand. For example, if you cite George Kennan's Long Telegram (1946) in answer to "To what extent was the Cold War inevitable?", your evaluation of it should include a direct judgment about the document's contribution to the "extent" question: not just whether it is reliable, but how much weight it carries in answering the specific evaluative demand.
Common pitfalls and how to avoid them
The following errors appear repeatedly in Paper 2 scripts across examination sessions. Each has a specific cause and a specific remedy.
Reliability treated as binary. Candidates label a source "reliable" or "unreliable" as though nothing lay in between. In historical scholarship, reliability is always contextual and partial. A source can be reliable on certain points and unreliable on others, credible in its factual claims but biased in its interpretive framework. Replace the binary judgment with a conditional one: "Source B is credible regarding the facts it reports because it comes from a contemporary witness, but its analysis of motivations is shaped by Cold War assumptions that limit its utility for understanding the actor's strategic logic."
Provenance without consequence. Students name provenance factors — "this is a government document" — without explaining how those factors affect what the source says or how much weight it should carry. Naming is not evaluating. The evaluation lives in the consequence: "A government document from this period typically omits references to domestic criticism because officials had institutional incentives to present a unified position."
Source insertion rather than source integration. Some candidates treat sources as inserts: quoted passages that appear in the middle of an argument without being integrated into the reasoning. The source should actively participate in your argument. Its claims should be weighed, tested, and used to support or qualify your evaluation, not dropped in as decoration.
Ignoring counter-evidence from sources. When sources present information that complicates your argument, strong candidates address it directly. Weaker candidates glide past uncomfortable evidence or implicitly assume it away. The evaluative quality of a Paper 2 essay is often demonstrated precisely in how it handles inconvenient evidence.
Descriptive source accounts replacing analysis. Summarising what a source says is not the same as analysing what the source reveals. A response aiming for the upper markbands must do both, and the analysis is where your marks live. Ask yourself: what does this source show about the historical situation that I could not learn from other types of evidence?
The transition from description to evaluation: building evaluative reasoning skills
For many IB History candidates, the gap between a middle-band and a top-band response is not knowledge; it is evaluative reasoning. Description says what happened. Analysis asks why it happened, how it fits a broader pattern, and what it reveals about the forces at work. Evaluation asks what weight this evidence carries relative to other evidence, and whether the interpretive framework you are applying holds up against the source's specific testimony.
Building evaluative reasoning requires deliberate practice with a specific habit: after making any factual claim about historical evidence, ask yourself what would strengthen or weaken that claim, and then engage with that question explicitly in your writing. This habit — sometimes called "testing your evidence" — is what transforms descriptive summaries into evaluative arguments.
When you read historical sources in preparation, begin actively asking: Who produced this? What were they trying to achieve? What would a source from a different perspective look like? How does this fit with other evidence I have encountered? These questions are not supplementary to historical thinking; they are the core of it. The candidates who reach the top markband are those who have internalised these questions so thoroughly that they apply them automatically in exam conditions.
Preparing for Paper 2: practical strategies
Effective Paper 2 preparation is not primarily about reading more content. It is about developing and refining a specific set of skills that you can deploy under exam conditions. The following strategies address the highest-impact preparation activities.
Practise timed evaluation under exam conditions. Take a past Paper 2 question and write a complete essay in 45 minutes, drawing only on the evidence you can recall. Then critically evaluate your own response against the markbands, not just to see where you lost marks, but to identify the specific gap (source integration, evaluative reasoning, provenance analysis) that caused the loss. Targeted feedback of this kind is far more productive than writing essays and checking them against a mark scheme without analysis.
Build a provenance bank for each world history topic. For each Paper 2 topic you study, develop a one-page summary of the typical provenance categories that appear in sources: who writes government documents in this period and why, what kinds of sources capture popular perspectives, what biases are structurally embedded in official records. This bank does not replace your own reasoning; it accelerates it under exam conditions by giving you a starting point for provenance analysis.
Read historians, not just textbooks. Textbooks summarise what historians have concluded. Historiography — the history of historical interpretation — gives you the raw material for evaluation. When you read an academic article or book chapter that discusses how different historians have interpreted the same events, you are practising exactly the evaluative reasoning Paper 2 demands. Even reading the introduction and conclusion of one monograph per topic will shift how you think about evidence.
Write one practice essay per week with a focus on source integration. Do not simply write essays and submit them. Choose one specific skill — provenance analysis, counter-evidence handling, evaluative conclusion — and make it the explicit focus of that week's practice. Vary the focus across weeks so that by the time the exam arrives, you have targeted each evaluative component deliberately.
Conclusion and next steps
Paper 2 source evaluation is not a specialised skill that applies only to a subset of questions — it is a mode of historical thinking that should inform every paragraph you write. The candidates who score at the top of the rubric are those who have moved past describing sources and begun using sources as instruments of historical reasoning: weighing them, testing them against each other, and deploying their insights and limitations to build and qualify arguments.
The four-step evaluation framework — identify, name, assess, evaluate — gives you a reliable structure for integrating sources into your essays under exam conditions. Commit it to muscle memory through deliberate practice, and it will become the scaffolding that holds your evaluative arguments together when time pressure is highest.
IB Courses' one-to-one IB History programme works through each student's Paper 2 scripts against the rubric and builds a personalised revision plan targeting the specific gaps identified in source evaluation, historiographical awareness, and evaluative reasoning. If you are preparing for the May or November examination session, targeted coaching on this specific skill area is often the highest-leverage investment you can make.
Frequently asked questions
How many sources should I reference in a Paper 2 essay?
There is no fixed number that earns marks — what matters is how purposefully you use each source. As a practical guide, referencing two or three sources with full provenance and utility analysis is far more effective than mentioning five or six sources with only surface-level commentary. Each source you bring in should advance your argument or qualify a counter-argument; sources that exist only to demonstrate that you have read widely tend to score in the middle bands.
Should I evaluate sources in a dedicated paragraph or integrate them throughout my essay?
For most question types, integrating source evaluation throughout your argument is more effective than isolating it in a separate paragraph. A dedicated "source evaluation paragraph" risks making source analysis feel like a separate task from historical reasoning. The strongest responses thread source evaluation through paragraphs that also contain substantive historical argument, so that provenance awareness and content analysis develop in parallel.
How do I handle sources that seem purely descriptive — where there appears to be little to evaluate?
Even descriptive sources offer evaluation opportunities. Ask: who is the intended audience, and what does that imply about what the source chooses to report or omit? What kind of language is used, and what does that reveal about the source's perspective or purpose? Is the description itself an interpretation — a selective account that reflects the values or priorities of its author? These questions transform even factual-looking sources into evaluative terrain.
Does the difference between HL and SL really matter for day-to-day preparation?
Yes, in two specific ways. First, HL candidates also prepare a regional depth study for Paper 3, which gives them more contextual knowledge to deploy when evaluating evidence in Paper 2, especially for understanding why certain types of sources were created and how they represent particular historical perspectives. Second, Paper 2 carries more weight at SL (45% of the final grade) than at HL (25%), which should shape how each group divides its revision time. The markbands themselves are identical, so every candidate needs to sustain evaluative reasoning throughout the essay rather than saving it for the conclusion. Building this habit during preparation rather than discovering it in the exam room is a significant advantage.
What is the most common reason Paper 2 essays plateau in the middle markbands?
Often, the plateau occurs because the evaluation stops at comparing evidence rather than interrogating it. A middle-band response compares two documents or two historians' views, notes where they agree and disagree, and arrives at a balanced conclusion. A top-band response goes further: it uses the provenance of each to explain why they disagree, deploys that explanation to qualify the evidence each provides, and arrives at a judgment that is conditioned by an understanding of how historical accounts are constructed. The move from comparison to interrogation is the critical threshold.