Scoring and Interpretation
Statement of Purpose
The PAEA End of Curriculum exam is an objective, standardized evaluation of a student’s medical knowledge as one component of their readiness for graduation. It is important that programs have a clear understanding of the characteristics, meaning, intended interpretation, and limitations of the End of Curriculum scoring and analytical reports. Scores should only be used to evaluate a student’s medical knowledge as one component of their readiness for graduation.
Programs are cautioned against interpreting the scoring and analytical reports outside of the statement of purpose. The End of Curriculum exam should not be used as a stand-alone summative evaluation. A single multiple-choice exam is not capable of addressing and measuring the full scope of graduation readiness. Research suggests that using multiple assessments and assessment types increases confidence that programs are truly measuring the skill or knowledge they intend to measure.
Scale Scores and Form Equating
The End of Curriculum exam uses scale scores. Scale scores are scores that have been mathematically transformed from one set of numbers (i.e., the raw score) to another set of numbers (i.e., the scale score) in order to make them more easily comparable between different forms of the same exam. This process, known as equating, allows for a single performance report even though there are two forms of the exam. The primary benefit of scale scores is that all scores on all versions and forms of the End of Curriculum exam are comparable across years, because they all use the same scale metric. The scale for the End of Curriculum exam is 1200 to 1800.
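To make the idea of a raw-to-scale transformation concrete, here is a minimal sketch. It assumes a simple linear transformation per form; the text does not describe the actual equating method PAEA uses, and the slope, intercept, form names, and raw scores below are hypothetical values chosen only for illustration. Only the 1200 to 1800 range comes from the text.

```python
# Illustrative only: a hypothetical linear raw-to-scale transformation.
# The real equating procedure is not described in this document.
SCALE_MIN, SCALE_MAX = 1200, 1800  # scale range stated in the text


def to_scale_score(raw_score: int, slope: float, intercept: float) -> int:
    """Map a raw score to the reporting scale and clamp it to 1200-1800."""
    scaled = round(slope * raw_score + intercept)
    return max(SCALE_MIN, min(SCALE_MAX, scaled))


# Hypothetical parameters: form B is assumed to be slightly harder than
# form A, so a lower raw score on B maps to the same scale score.
FORM_A = {"slope": 2.0, "intercept": 1200.0}
FORM_B = {"slope": 2.0, "intercept": 1210.0}

print(to_scale_score(150, **FORM_A))  # 1500
print(to_scale_score(145, **FORM_B))  # 1500
```

Under these assumed parameters, two students with different raw scores on different forms receive the same scale score, which is the sense in which scale scores are comparable across forms.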
Policy Level Descriptors
For this exam, policy level descriptors were developed and will be identified on score reports. Policy level descriptors are “policy definitions that determine how rigorous and challenging the standards will be for the assessments. They are not linked to content but are more general statements that assert an organization’s position on the desired level of performance or rigor intended at each level.” The following policy level descriptors were developed for the End of Curriculum exam.
| |Limited Medical Knowledge|Satisfactory Medical Knowledge|Advanced Medical Knowledge|
|---|---|---|---|
|Policy Level|The learner at the limited performance level demonstrates a partial understanding of general medical knowledge.|The learner at the satisfactory performance level demonstrates a sufficient understanding of general medical knowledge.|The learner at the advanced performance level demonstrates a comprehensive understanding of general medical knowledge.|
Standard Setting
The End of Curriculum exam uses categorical performance levels to indicate whether the student has limited, satisfactory, or advanced medical knowledge. These levels are defined at a broad level by policy level descriptors. The performance standards were determined by a 21-member workgroup and approved by the PAEA Board of Directors following two rounds of pilot testing and item validation. On the scale from 1200 to 1800, the performance standard between limited and satisfactory medical knowledge is 1400 and the performance standard between satisfactory and advanced medical knowledge is 1525.
- The scale score for the learner at the limited performance level is between 1200 and 1399.
- The scale score for the learner at the satisfactory performance level is between 1400 and 1524.
- The scale score for the learner at the advanced performance level is between 1525 and 1800.
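The cut scores above can be expressed as a small lookup. This is a minimal sketch, not PAEA's reporting logic: the function name `performance_level` is hypothetical, while the cut scores (1400 and 1525) and the scale range (1200 to 1800) are taken from the text.

```python
# Cut scores and scale range as stated in the text.
SCALE_MIN, SCALE_MAX = 1200, 1800
SATISFACTORY_CUT = 1400
ADVANCED_CUT = 1525


def performance_level(scale_score: int) -> str:
    """Return the performance level for a scale score (hypothetical helper)."""
    if not SCALE_MIN <= scale_score <= SCALE_MAX:
        raise ValueError(f"scale score must be between {SCALE_MIN} and {SCALE_MAX}")
    if scale_score < SATISFACTORY_CUT:
        return "Limited Medical Knowledge"
    if scale_score < ADVANCED_CUT:
        return "Satisfactory Medical Knowledge"
    return "Advanced Medical Knowledge"


for score in (1399, 1400, 1524, 1525):
    print(score, performance_level(score))
```

The loop checks the scores on either side of each boundary, which is where an off-by-one mistake in a program's own spreadsheet or script is most likely to appear.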
Every student receives a scale score and a graphical representation of their score compared to the national average by performance level. Feedback is also provided by content and task areas, as well as by entrustable professional activity (EPA), patient care setting, life course, and Bloom’s Taxonomy level, all with national comparison data. This is a summative assessment, so keyword feedback is not provided. All of this information is included in the program’s composite and cohort performance reports, which include cohort means.
Interpret with Caution
For the PAEA End of Curriculum exam, categorical scores and policy level descriptors will be provided to programs. PAEA urges programs to interpret and use these categorical scores, as well as national comparison data, carefully and thoughtfully, as a variety of factors can influence individual and cohort-level performance. It is up to individual programs to decide how to use the categorical scores and to establish the requirements for successful completion. Individual programs also determine the weight of the assessment that is most appropriate for them.
Data on sub-scales should also be interpreted with caution, as each sub-scale may be based on a small number of items with varying levels of difficulty. Sub-scale data do, however, have value when used to assess a content section across multiple exams. For example, if sub-scores in hematology are consistently low across locally developed exams, PAEA End of Rotation exams and/or PACKRAT, and the PAEA End of Curriculum exam, it may be valuable to reflect on the program’s curriculum to determine if hematology is covered in sufficient detail during the didactic phase of the program.
Data triangulation is a useful way to validate that a student has met a specific outcome required by your program. Triangulation compares results from multiple assessment tools to check that they are consistent with one another. While these assessments are not exactly the same, a student who is ready to graduate should achieve consistent and similar scores across the assessments being compared.
For each End of Curriculum exam, scale scores can be compared between exam forms and across cohort years. We recommend that programs look at cohort data, examine cohort trends year over year, and compare them to the national exam data and national trends year over year.
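A year-over-year comparison of this kind can be sketched in a few lines. This is only an illustration: the function name `cohort_vs_national` is hypothetical, and every score and national mean below is made-up example data, not actual exam data. In practice the inputs would come from the program's cohort performance reports.

```python
from statistics import mean


def cohort_vs_national(cohort_scores, national_means):
    """Return {year: (cohort_mean, national_mean, difference)} by year.

    cohort_scores: dict mapping year -> list of student scale scores
    national_means: dict mapping year -> national mean scale score
    """
    rows = {}
    for year in sorted(cohort_scores):
        cohort_mean = round(float(mean(cohort_scores[year])), 1)
        national_mean = float(national_means[year])
        rows[year] = (cohort_mean, national_mean, round(cohort_mean - national_mean, 1))
    return rows


# Made-up example data for two cohort years.
example_cohorts = {2022: [1450, 1500, 1550], 2023: [1480, 1520, 1560]}
example_national = {2022: 1490, 2023: 1500}

for year, (cohort, national, diff) in cohort_vs_national(example_cohorts, example_national).items():
    print(f"{year}: cohort {cohort}, national {national}, difference {diff:+}")
```

Because small cohorts can shift a mean considerably, the difference column is best read as a prompt for further review rather than as a finding in itself.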