Imagine that you are a traveler on a long journey and have arrived at the entrance to a bridge, where a troll stops you and your fellow traveler. He says that, in order to cross, you each must answer a science question correctly. He pulls out a card and asks your friend, “How long does it take for the Earth to spin once on its axis?”
Your companion easily answers the question and walks across the bridge.
When it’s your turn, the troll asks, “How many dwarf planets are in the solar system?” You are surprised at the difficulty of the question — you have no idea and guess the wrong answer. You are told you cannot cross the bridge.
Exasperated, you exclaim, “Why was my test so hard and hers so easy?”
It’s clear that the troll doesn’t want just anyone crossing the bridge, just like you don’t want to create an exam that students think is too easy. Let’s say that you want the average traveler to be able to answer 70-75% of the questions correctly in order to be allowed to “cross the bridge.” Well, that’s where Item Response Theory (IRT) comes in. It allows us to build an End of Rotation™ exam to this pre-specified difficulty level.
What Is Item Response Theory?
The troll clearly never took a statistics class in Item Response Theory. If he had used IRT, the different questions he asked each traveler would have been equivalent in difficulty. Unlike the troll’s assumption that all his questions were equal, IRT assumes that questions have varying levels of difficulty and different probabilities of being answered correctly that are conditional on test-taker ability.
IRT is a psychometric framework (“psychometrics” being the science of measuring mental capacities) that allows us to build an exam to a pre-specified difficulty level. This is done by pre-testing 20 questions for the End of Rotation exams with cohorts of PA students. Having a sample of students answer each pre-test question allows us to estimate the probability of answering each question correctly, given a student’s ability.
In this way, the difficulty of the questions, as well as the typical ability of PA students, can be determined in advance and compared directly. This places all students on a level playing field, even when they receive different questions.
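For readers who like to see the idea in numbers, here is a minimal sketch of how a question's difficulty and a student's ability combine into a probability of answering correctly. It assumes the simplest IRT model (the one-parameter, or Rasch, model); this article does not say which IRT model the End of Rotation exams use, and the function names, form names, and difficulty values below are hypothetical illustrations, not PAEA data.

```python
import math


def p_correct(ability, difficulty):
    """Rasch (1PL) model: probability that a test taker with the given
    ability answers an item of the given difficulty correctly.
    Ability and difficulty sit on the same scale."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_percent_correct(ability, difficulties):
    """Expected percent correct on a form, given its item difficulties."""
    probs = [p_correct(ability, b) for b in difficulties]
    return 100.0 * sum(probs) / len(probs)


# Hypothetical item difficulties for two forms with different questions.
form_a = [-2.0, -1.0, -1.0, 0.0]
form_b = [-1.5, -1.5, -0.5, -0.5]

# An "average" test taker is assumed to have ability 0 on this scale.
average_ability = 0.0

for name, form in (("Form A", form_a), ("Form B", form_b)):
    pct = expected_percent_correct(average_ability, form)
    print(f"{name}: expected percent correct = {pct:.1f}")
```

In this sketch the two forms contain different questions, yet the average test taker is expected to score within the 70-75% target on both. That is the sense in which pre-tested difficulties let different forms be built to the same difficulty level.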
Use of IRT is a best practice in the assessment field. Many other well-established, high-stakes exams use IRT, including the GRE (Graduate Record Examinations), GMAT (Graduate Management Admission Test), and the PANCE (Physician Assistant National Certifying Examination).
Why You Should Care
The pre-testing process of IRT allows for stability of the exam questions and forms. This means that a student can be assured they are taking an exam that is equal in difficulty to those of their peers — even if taking a different form of the same exam.
The equivalency in difficulty also means that scores on the exams can be compared from year to year and against national averages. For faculty and program directors, this means that class performance can be compared between cohorts and against national averages as well.
And with each passing year, as more cohorts of PA students take the End of Rotation exams, the accuracy of our data increases.
Over the past two years, PA students have taken more than 30,000 End of Rotation™ exams. This has allowed PAEA to build a question bank of pre-tested, validated questions written by PA content experts. Thanks to those students and PA programs around the country, we now have enough questions and data to roll out fully functional IRT-based exams.
PAEA is constantly working to provide the best PA clinical exams for our members. Through the use of IRT, we are creating exams that are increasingly accurate as assessment tools for measuring PA student knowledge.
With the expertise of psychometricians at the National Commission on Certification of Physician Assistants (NCCPA), PAEA has instituted IRT going forward as the measurement framework for our End of Rotation exams — starting with the release of Version 3 on July 8.
In addition to statistics on national student performance averages and standard deviation — data we have provided for the last two years — we also will begin sharing standard error of measurement and reliability statistics for each of the exams. Finally, for those of you who really want to look under the hood and understand the statistical and psychometric mechanics of the exam, we will be providing more in-depth data and analysis in the coming months.
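As a rough guide to how standard error of measurement and reliability relate, the sketch below uses the textbook classical test theory formula, SEM = SD × √(1 − reliability). This is an assumption for illustration only: the article does not describe how these statistics are calculated for the End of Rotation exams, and the function name and numbers are hypothetical.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability).
    Reliability is a coefficient between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if score_sd < 0:
        raise ValueError("score_sd must be non-negative")
    return score_sd * math.sqrt(1.0 - reliability)


# Hypothetical values, not PAEA statistics.
sem = standard_error_of_measurement(score_sd=10.0, reliability=0.84)
print(f"SEM = {sem:.1f} score points")
```

Under this formula, a more reliable exam has a smaller standard error of measurement, so an individual student's score is a more precise estimate of what that student knows.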
For more information about the End of Rotation exams, visit our website at www.endofrotation.org.