Grading Case Studies

There are as many ways to provide grades for End of Rotation™ exams and overall Supervised Clinical Practice Experiences (SCPEs) as there are programs that deliver the exams. Below are examples, provided by very different programs, that may help your program develop a strategy that works for your faculty. These case studies walk you through their whole process, from data analysis to decision-making discussions, and from communicating with students to plans for follow-up analysis.

Large accelerated program in a health professions college that has been using End of Rotation exams since 2016

Please describe the model your program uses to grade End of Rotation exams. Include how this fits into your broader SCPE evaluation.

We grade student End of Rotation exams by setting performance bands based on a calculation using the standard deviation and national average provided by PAEA for each exam:

A = +1.0 SD above the PAEA national average
A- = +0.5 SD above the PAEA national average
B+ = at the PAEA national average
B = -0.5 SD below the PAEA national average
B- = -1.0 SD below the PAEA national average
C+ = -1.5 SD below the PAEA national average
C = -2.0 SD below the PAEA national average
F = more than 2.0 SD below the PAEA national average
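
The bands above amount to a lookup on how many national standard deviations a student's score lies from the PAEA national average. The Python sketch below assumes that each listed value is the lower bound of its band, so a score exactly on a boundary earns the higher grade. The function name and the sample mean and SD are hypothetical and are not PAEA values.

```python
# Lower bound of each band, in SD units relative to the PAEA national average.
# Assumption: a score exactly on a boundary earns the higher band.
BANDS = [
    (1.0, "A"),
    (0.5, "A-"),
    (0.0, "B+"),
    (-0.5, "B"),
    (-1.0, "B-"),
    (-1.5, "C+"),
    (-2.0, "C"),
]


def letter_grade(score: float, national_mean: float, national_sd: float) -> str:
    """Return the letter grade for an exam score using SD-based bands."""
    z = (score - national_mean) / national_sd
    for lower_bound, letter in BANDS:
        if z >= lower_bound:
            return letter
    return "F"  # more than 2.0 SD below the national average


# Hypothetical exam with a national mean of 400 and an SD of 20.
print(letter_grade(425, 400, 20))  # z = 1.25  -> A
print(letter_grade(355, 400, 20))  # z = -2.25 -> F
```

Because the bands are defined in SD units, the same table applies to raw or scale scores. Only the mean and SD passed in change.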

When we release the scores to students, they can immediately determine how they did.

The final SCPE course grade is determined by successfully completing multiple course elements, including preceptor evaluations of the student, successful completion of the PAEA End of Rotation exam, completion of specialty-specific review questions from a question bank, submission of all assignments and paperwork, attendance at and participation in all SCPE activities, and professional demeanor. All students must pass the End of Rotation exam. A failing grade on the End of Rotation exam is a score more than two standard deviations below the national average. Students who do not pass the End of Rotation exam will be given a remediation assignment and a second opportunity to take the exam. If a student does not pass the second attempt, a final grade of “F” will be assigned. The student will be required to repeat the SCPE in its entirety at a date and time determined by the program.

How long has your program utilized this method?

We’ve used this method since our students started taking the PAEA End of Rotation exams in 2016.

If you changed methods to account for scale scores, how did you adjust? What rationale did you use to justify the change, both within your program and with students?

Because our published grade bands are expressed in standard deviations rather than specific scores, we didn’t have to change our methods. Once we had the standard deviation and the national average, it was easy to convert the new scale scores to letter grades, just as we did with the raw scores. Before the exams, we informed the students that this change would be occurring and that we could convert the scale score back to the raw score if they wanted that information. We told the students that PAEA had made the change to scale scores so that the exams would be similar to other national standardized exams. There was no big change on their end, and their exams went smoothly. No students were concerned about the scores or asked for their raw score.

As part of our ongoing self-assessment, we regularly review trends in student performance across multiple assessment methods, including the End of Rotation exams. We use these trends to help establish our program benchmark for End of Rotation exam performance. We also informally correlate students’ End of Rotation exam performance with their performance on didactic-year exams to check for consistency. Now that we can convert older scores, we plan to review all of the years together and do a long-term assessment.

(Editor’s Note: This program gave Version 6 of the End of Rotation exams on the first day of publication. If you have any questions about this process in particular, you may contact Nicole Dettmann at Massachusetts College of Pharmacy and Health Sciences – Manchester/Worcester.)

A mid-size program in an academic medical center that has used End of Rotation exams since 2014

Please describe the model your program uses to grade End of Rotation exams. Include how this fits into your broader SCPE evaluation.

Our clinical faculty sets a passing bar for each PAEA End of Rotation exam at the beginning of the clinical year by looking at the prior cohort’s scores on all seven End of Rotation exams and PANCE together. We convert the scores on each of those exams to Z-scores so we can compare the results on a single metric. We then evaluate these data points together on an Excel spreadsheet. The passing bar is usually very clear and has been stable, around 1.4 standard deviations below the mean, for the last several years. Based on cohort trends, our clinical faculty has been considering a higher bar and will likely move the passing bar this coming year closer to 1.0 standard deviation below the mean.

We provide the passing scores for each exam to students at the beginning of the clinical year so they know what standard they need to meet. When PAEA publishes its national means and standard deviations ahead of each new version, we use that information to calculate and update the table of passing scores that corresponds to each unique End of Rotation exam form. The passing score is currently calculated using the unique means and standard deviations for each End of Rotation exam form (mean minus 1.4 x SD).
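
The z-score conversion and the per-form passing-score table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the form names and statistics are hypothetical (not PAEA values), the scores are percent-correct raw scores, and the 1.4 multiple is simply the program's current bar, which can change from year to year.

```python
from dataclasses import dataclass

PASSING_MULTIPLE = 1.4  # current bar: mean minus 1.4 x SD


@dataclass
class FormStats:
    mean: float  # national mean for this exam form
    sd: float    # national standard deviation for this exam form


def z_score(score: float, stats: FormStats) -> float:
    """Number of SDs a score lies above (+) or below (-) the national mean."""
    return (score - stats.mean) / stats.sd


def passing_table(forms: dict[str, FormStats], k: float = PASSING_MULTIPLE) -> dict[str, float]:
    """Passing score for each form (mean - k * SD), rounded to one decimal."""
    return {name: round(s.mean - k * s.sd, 1) for name, s in forms.items()}


# Hypothetical form statistics, for illustration only.
forms = {
    "Form A": FormStats(mean=70.0, sd=8.0),
    "Form B": FormStats(mean=72.0, sd=7.5),
}
print(passing_table(forms))                       # {'Form A': 58.8, 'Form B': 61.5}
print(round(z_score(60.0, forms["Form A"]), 2))   # -1.25
```

Converting every exam, including PANCE, to z-scores puts a prior cohort's results on one scale, which is what makes a single bar such as -1.4 comparable across exams with different means.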

We have a proactive remediation program for students who fall below the passing bar. Because we publish the passing bar at the beginning of each year, students know right away if they failed an exam. If a student fails an End of Rotation exam, they meet with our clinical evaluation faculty to go over their results, including keyword feedback, to determine the most appropriate remediation plan. In addition to study, the plan typically includes using the End of Rotation exam’s keyword feedback to build two 60-question self-assessment quizzes using our online question bank service and achieve an agreed-upon score. Once they have completed the remediation, the student has to take the second form of the End of Rotation exam within 10 days.

How did your program decide on this method?

We did not want to set an arbitrary passing bar or use the mean as a passing bar. We chose the Z-score method (and the deliberative process that goes with it) as it tells us how far from the mean the score is (the number of standard deviations) and allows us to look at exams with different means on the same metric. We do a combined grade for the student’s SCPE assessment using a number of components (such as the preceptor evaluation and their End of Rotation exam). Students must pass every component of the course grade to pass the SCPE.

How long has your program utilized this method?

We’ve used this method from the very beginning (2014). The method doesn’t change, but the passing bar could be different year-to-year.

If you changed methods to account for scale scores, how did you adjust? What rationale did you use to justify the change, both within your program and with students?

We are using the scale score calculator so we can keep the previously set raw-score passing bar for the rest of this class, as there are only two SCPEs left. We’ll switch to scale scores once we can implement the new scale-based passing bar, at the start of the next cohort’s clinical year. We will use exactly the same method to determine and publish the passing bar, but we will publish scale scores going forward. This will be easier for both faculty and students, since there will be only one passing score for each exam, as opposed to a separate raw-score passing bar for each individual exam form. We also believe it will help students understand scale scores as they approach their PANCE exam.

(Editor’s Note: For any questions, please contact Jennie Coombs at University of Utah.)

A program located at an academic medical center that has been using End of Rotation exams since 2015

Please describe the model your program uses to grade End of Rotation exams. Include how this fits into your broader SCPE evaluation.

(Editor’s Note: In practice, the model this program uses is very similar to the previous pass/fail case. The differences are in the process and implementation plan.)

We grade End of Rotation exams using a pass/fail system. Our goal is to set a passing bar for each End of Rotation exam that is just high enough to identify students who have not yet picked up adequate knowledge from a core rotation to be on track to pass the PANCE. We validated this approach in part through a regression analysis using Version 1 (and later adding Version 2) scores to predict PANCE scores. We found that End of Rotation exam scores account for a substantial portion of the variance in PANCE scores for students. (That analysis, which included data from several PA programs, has been published in JPAE.)

Starting with Version 2 of the End of Rotation exams, we have chosen the passing scores for each exam by comparing the score distribution of our own students with the national score distribution. Our academic coordinator chooses a multiple of the national standard deviation (usually between 0.5 and 1.5) that is subtracted from the national mean to calculate the passing score for each End of Rotation exam version. This bar is calibrated so that only “outlier” scores will be identified as failing. (Note that our goal is to identify students whose scores are local outliers based on our previous years of graduated students, not necessarily national outliers — that’s why we consider both internal and national score distributions.)

Additionally, our passing scores for End of Rotation exams taken in the first half of the clinical year are intentionally set lower than our passing scores for exams taken in the second half of the clinical year, since most students’ scores on all core End of Rotation exams increase as they build their comprehensive fund of knowledge throughout the clinical phase of their education. (For example, we might set the first-half passing score for an End of Rotation exam at the national mean minus 1.5 SD, and the second-half passing score for the same exam at the national mean minus 1.0 SD, which we have found identifies the lowest performers.)
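
The passing-score rule (national mean minus a chosen multiple of the national SD, with a lower bar in the first half of the clinical year) can be sketched as follows. The multiples 1.5 and 1.0 come from the example in the paragraph above. The function names and the sample mean and SD are hypothetical, and in practice the academic coordinator chooses the multiple exam by exam.

```python
def passing_score(national_mean: float, national_sd: float, k: float) -> float:
    """Passing score = national mean minus k national standard deviations."""
    return national_mean - k * national_sd


# Multiples from the example in the text; the program picks k per exam.
FIRST_HALF_K = 1.5
SECOND_HALF_K = 1.0


def passing_score_for_half(national_mean: float, national_sd: float, second_half: bool) -> float:
    """Use a lower bar early in the year, when students' knowledge is still building."""
    k = SECOND_HALF_K if second_half else FIRST_HALF_K
    return passing_score(national_mean, national_sd, k)


# Hypothetical exam: national mean 400, SD 20.
print(passing_score_for_half(400, 20, second_half=False))  # 370.0
print(passing_score_for_half(400, 20, second_half=True))   # 380.0
```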

Then, to pass each SCPE, our students must pass the End of Rotation exam for that subject area and must also meet a number of other specific requirements, such as completing logging requirements, online interactive cases (for certain rotations), and evaluations. All SCPEs are graded pass/fail. Preceptor evaluations of student performance provide us with valuable information, but our program does not use them in the pass/fail decision because that approach introduces a degree of bias and subjectivity that we consider unacceptable for a high-stakes assessment.

How did your program decide on this method?

This method is an extension of the method our program has used for over 20 years. Before the PAEA End of Rotation exams with their national score distributions were available, we used student data from the most recent five to eight years of scores on internally-developed rotation exams to set the passing scores. We continue to use our internal score distribution data to set passing scores for elective rotation exams.

If you changed methods to account for scale scores, how did you adjust? What rationale did you use to justify the change, both within your program and with students?

We used the same general method for setting specific passing scores for the scale scores. The distributions for the scale scores are different from those of the raw scores, so we chose a practical “trial and error” approach to calibrating the passing scores. Specifically, we used the online scale score conversion tool to ensure that the standard deviation calculation we set for the new scale corresponds as closely as possible with appropriate “raw” failing scores on Versions 2 through 4 of the End of Rotation exams that had been taken by students who have already graduated and taken the PANCE. In other words, we identified passing bars with the new scale scores that would have resulted in failing the “correct” number of students on each exam, in retrospect.
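
The retrospective "trial and error" calibration can be sketched as a search over candidate SD multiples. For each candidate, the code counts how many past students' scale scores would have fallen below the bar, then keeps the multiple whose count is closest to the number who failed under the old raw-score bar. This is a count-matching simplification; the program describes matching the "correct" students, which could also be checked student by student. The scores, mean, SD, and target count are hypothetical.

```python
def failures(scores: list[float], mean: float, sd: float, k: float) -> int:
    """Count scores that fall below the bar mean - k * SD."""
    bar = mean - k * sd
    return sum(1 for s in scores if s < bar)


def calibrate_k(scores: list[float], mean: float, sd: float,
                target_failures: int, candidates: list[float]) -> float:
    """Pick the SD multiple whose bar fails closest to target_failures students.

    Ties go to the larger multiple (the lower, more lenient bar), in keeping
    with the goal of flagging only outliers.
    """
    return min(candidates,
               key=lambda k: (abs(failures(scores, mean, sd, k) - target_failures), -k))


# Hypothetical scale scores for graduated students on one exam, and the
# number of them who failed under the old raw-score bar.
history = [360, 372, 380, 395, 401, 410, 415, 420, 430, 445]
candidates = [round(0.5 + 0.1 * i, 1) for i in range(11)]  # 0.5, 0.6, ..., 1.5
best_k = calibrate_k(history, mean=405, sd=23, target_failures=2, candidates=candidates)
print(best_k)  # 1.4
```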

Communicating with students about the change:

The new scale score passing bars were added to the table of passing scores that we had already published for our current clinical phase students. The table has the raw score passing bars for Version 5, and we added the scale score passing bars for Version 6. We notified students by email that the PAEA End of Rotation exams were moving to scale scores, but that we were keeping the same general approach for identifying any given score as “pass” or “fail.” We consciously tried not to make a big deal out of the change, as it really shouldn’t affect students’ lives very much. The methodology is the same, but the numbers are different, just as they would be for moving between different versions in any other year.

Future plan:

We will re-evaluate how the scale score passing bars function in three or four months, at which point we may decide to adjust them. When PANCE scores are available for this cohort of students, we will have a better sense of whether the pass/fail point is set appropriately in light of our goal of only “catching” students who are at-risk.

If the system is working well, we will be able to keep the same scale score passing bars across multiple years of exams and multiple student cohorts (since all the adjustments for difficulty differences and differently-shaped score distributions are done behind the scenes by the psychometricians who calculate the scale scores).

A long-time user program in a liberal arts college with a cohort of 55

Please describe the model your program uses to grade End of Rotation exams. Include how this fits into your broader SCPE evaluation.

In the past, we set a single passing bar for all End of Rotation exams: 70% was passing, below 70% was failing and the student took the exam again using another form. If they failed the second exam, they were required to repeat the rotation.

When the End of Rotation exams changed to scale scores last summer, PAEA encouraged us not to try to calculate raw scores. We decided to use the scale score to determine the passing score, which for us was 1.5 SD below the mean for each exam, because that approximately corresponds to the program’s pass/fail benchmark for the PANCE under the old method; if we were to convert back to raw scores, the old benchmark would fit.

We have kept the same remediation policy in place for students who fall below that benchmark. We also instituted a “low pass” category for students with scores that fall between 1 and 1.5 SD below the mean. These students are required to submit “topic lists” that provide outlines for the keyword feedback topics they missed on the exam. (Ed: PAEA has copies of the instruction and template documents available upon request. Essentially, they ask for a description of the disease process for the diagnosis keyword and the related topic area that was missed.)
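
The resulting three-way outcome (pass, low pass, or fail) can be sketched as a function of the student's distance from the mean in SD units. The boundary handling is an assumption: a score exactly 1.5 SD below the mean is treated as a low pass, and a score exactly 1.0 SD below is treated as a pass. The function name and sample values are hypothetical.

```python
def eor_status(score: float, national_mean: float, national_sd: float) -> str:
    """Classify an End of Rotation exam scale score as pass, low pass, or fail."""
    z = (score - national_mean) / national_sd
    if z < -1.5:
        return "fail"      # retake with another form; a second failure means repeating the rotation
    if z < -1.0:
        return "low pass"  # passes, but must submit topic lists for missed keyword feedback
    return "pass"


# Hypothetical exam with a national mean of 400 and an SD of 20.
print(eor_status(365, 400, 20))  # z = -1.75 -> fail
print(eor_status(375, 400, 20))  # z = -1.25 -> low pass
```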

The next problem we faced was that our grading system did not accept a pass/fail designation. Furthermore, we felt that it would not be fair to give everyone the same percentage grade no matter how they scored on the End of Rotation exam.

So we created a table for each End of Rotation exam that lists which scale score would receive which percentage grade.

1.5 SD below the mean = 80%
Mean = 90%
1.5 SD above the mean = 100%
Starting with 80%, we then add 1% for each increase in the scaled score of 2–4 points. Using Internal Medicine as an example, the percentage grades would be:

370–372 = 80%
373–376 = 81%
377–379 = 82%
Up to 405 = 90%
> 439 = 100%
With the table, it takes about 10 minutes to convert the scale scores into a percentage for grading. Each student is thus given a score relative to how they did on the exam.
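
The grading table is essentially a straight-line mapping anchored at three points: mean - 1.5 SD, the mean, and mean + 1.5 SD. The sketch below interpolates between those anchors, so it approximates, but will not match at every boundary, a table built by hand in steps of 2–4 points. It returns no percentage below mean - 1.5 SD because the text does not give one for that remediation case. The default anchors (80/90/100) match the current table, and passing 70/85 reproduces the adjustment planned for next year. The mean of 405 and SD of 23 are hypothetical values chosen because they roughly fit the Internal Medicine anchors above.

```python
import math
from typing import Optional


def percent_grade(score: float, mean: float, sd: float,
                  low_pct: int = 80, mid_pct: int = 90, high_pct: int = 100) -> Optional[int]:
    """Map a scale score to a percentage grade by straight-line interpolation.

    Anchors: mean - 1.5 SD -> low_pct, mean -> mid_pct, mean + 1.5 SD -> high_pct.
    Scores below mean - 1.5 SD return None (remediation case). Scores at or
    above mean + 1.5 SD receive high_pct.
    """
    z = (score - mean) / sd
    if z < -1.5:
        return None
    if z >= 1.5:
        return high_pct
    if z < 0:
        pct = mid_pct + (z / 1.5) * (mid_pct - low_pct)
    else:
        pct = mid_pct + (z / 1.5) * (high_pct - mid_pct)
    # Round down so a percentage is earned only once its threshold is reached;
    # the small epsilon guards against floating-point results like 84.999999.
    return math.floor(pct + 1e-9)


# Hypothetical exam with mean 405 and SD 23.
print(percent_grade(405, 405, 23))                           # 90 (current table)
print(percent_grade(405, 405, 23, low_pct=70, mid_pct=85))   # 85 (planned table)
```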

We used the ExamDriver conversion tool to find the raw score for a given scale score to verify our grading scale. However, it takes a lot of work and time to do so for all of our students. We found that for each exam, the scale score 1.5 SD below the mean translated to a 68–71% raw score (depending on the exam), which matched the grading scale we had used before.

How did your program decide on this method?

The standard deviation calculations are the same; we just adjusted the numerical values to the scale for our percent-based grades. These changes were made by the faculty in a group discussion and in order to fit into our long-standing grading policies.

How long has your program utilized this method?

This specific method has been in place since September 2018 — the first clinical cohort that did not overlap the two scoring methods.

If you changed methods to account for scale scores, how did you adjust? What rationale did you use to justify the change, both within your program and with students?

After doing this now for four End of Rotation exam call-backs, I have realized that the grades are a bit more inflated than I am comfortable with. It is not terrible, and I don’t want to change it for students in the middle of the year. But, for next year I am going to adjust the grading tables to make the score 1.5 SD below the mean = 70%, and the mean = 85%. I think anyone who gets > 1.5 SD above the mean deserves a 100%.

This has been explained to the students so they understand that the percentage is not their raw score, and I have not had any pushback.

(Editor’s Note: If you have any questions about this process in particular, you may contact Scott Gardner at Kettering College.)


A long-time user program in a private university with a cohort of 25 students.

Please describe the model your program uses to grade End of Rotation exams. Include how this fits into your broader SCPE evaluation.

We grade student End of Rotation exams by setting performance bands based on a calculation using the standard deviation (SD) and national average provided by PAEA for each exam type:

A = > +1.5 SD above the PAEA national average
A- = +1.0 SD above the PAEA national average
B+ = +0.5 SD above the PAEA national average
B = at the PAEA national average and < 0.5 SD above the PAEA national average
B- = -0.5 SD below the PAEA national average
C+ = -1.0 SD below the PAEA national average
C = more than 1.5 SD but no more than 2.0 SD below the PAEA national average
F = more than 2.0 SD below the PAEA national average

Each performance band’s individual scale scores have been assigned range groupings and a percentage grade. We subjectively decided to assign a letter grade of “B” to the scale scores between the PAEA national average and 0.5 SD above it, as we feel that this letter grade represents that the student has average medical knowledge compared to peers. From here, we could move forward and calculate the performance bands and associations to letter grades above and below the national mean.

After establishing the performance bands, each letter grade must be broken down into individual scale scores and percent grade associations. Our university utilizes a 4.0 grading scale where percentages range from 0-100%. In our program, any grade below a 73% converts to a letter grade of “F.” For this reason, we utilized the percentages 65-100% to divide among the individual performance bands.

To create the individual scale score and percent grade associations, we broke down the percent grades that are included within each letter grade.

Example: Letter grade A = 100%, 99%, 98%, 97%, 96%, 95%, 94%, and 93%

Then, assign a scale score to the highest and lowest percent in the set. For 100% we use “500” and for 65% we use “300.” To determine the scale score for the “A” grade, look at your performance band definition. We define “A” as being > +1.5 SD above the PAEA national average. For the Family Medicine End of Rotation exam, this number would be “441” when rounded to a whole number (403 + 1.5 × 25 = 440.5). This is the scale score for the lowest percent that achieves an A. Once you have the highest percent in each set, repeat this in similar fashion for all performance bands.

Our next band is +1.0 SD above the national average. Because the previous band started at +1.5 SD, this band includes every scale score from +1.0 SD up to, but not including, +1.5 SD above the PAEA national average. To get the top percent grade/scale score association, subtract 1 from the lowest scale score of the performance band above it. For the Family Medicine End of Rotation exam, the top scale score for the second performance band would be “440” (441 - 1). Repeat this in similar fashion for all performance bands.

Once you have the highest and lowest scale score for each performance band, you can calculate the range of scale scores for each specific percentage. To do this, subtract the lowest scale score from the highest, then divide the result by the number of individual percent grades within that performance band.

Example:

A = > +1.5 SD above the PAEA national average

Canvas Grade    Scale Score      Count
100.00          500 (highest)    1
99.00                            2
98.00                            3
97.00                            4
96.00                            5
95.00                            6
94.00                            7
93.00           441 (lowest)     8

500 - 441 = 59
59 / 8 = 7.375 (rounded, approximately 7)

Use the rounded result of this division to calculate the range of scale scores for each individual percent grade. Start with your lowest scale score and add this number to get the top of the range. For the next range, add 1 to that top score to get its lowest number, and repeat the previous steps.

Note: There are times that you may need to adjust the number used for the addition step in calculation of the range depending on how the grading percents and letter grades are broken down.

Continue to do the above steps for all grade ranges but your lowest grade (in our case, this is an “F”). For the performance band that is in the lowest letter grade category, instead of working from the bottom up you work from the top down. Still obtain the number for addition as stated above, but this instead will be used for subtraction from the highest number.

Example:

Family Medicine EORE national scale score mean (μ): 403
Standard deviation (σ): 25

A = > +1.5 SD above the PAEA national average
Canvas Grade    Scale Score
100.00          500-497
99.00           496-489
98.00           488-481
97.00           480-473
96.00           472-465
95.00           464-457
94.00           456-449
93.00           448-441

A- = +1.0 SD above the PAEA national average
Canvas Grade    Scale Score
92.00           440-437
91.00           436-432
90.00           431-428
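
The band-splitting procedure can be sketched in Python. The sketch rounds halves up (as the 440.5 to 441 example requires; Python's built-in `round()` would give 440), divides each band's score span by its number of percent grades, and assigns ranges bottom-up, or top-down for the lowest letter band. The function names are hypothetical. For the A band it reproduces the Family Medicine table exactly. For the A- band it yields 428–432, 433–437, and 438–440, while the published table uses 428–431, 432–436, and 437–440, which reflects the hand adjustments the note above allows for.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest whole number, with .5 rounding up."""
    return math.floor(x + 0.5)


def split_band(low: int, high: int, percents: list[int],
               top_down: bool = False) -> dict[int, tuple[int, int]]:
    """Divide scale scores low..high among percent grades.

    step = (high - low) / number of percents, rounded. Bottom-up: each range
    runs from start to start + step, and the next range starts one above it;
    the highest percent takes whatever is left up to `high`. Top-down (for the
    lowest letter band) mirrors this, working down from `high`.
    """
    step = round_half_up((high - low) / len(percents))
    ranges: dict[int, tuple[int, int]] = {}
    if not top_down:
        start = low
        ordered = sorted(percents)
        for i, pct in enumerate(ordered):
            end = high if i == len(ordered) - 1 else min(start + step, high)
            ranges[pct] = (start, end)
            start = end + 1
    else:
        end = high
        ordered = sorted(percents, reverse=True)
        for i, pct in enumerate(ordered):
            start = low if i == len(ordered) - 1 else max(end - step, low)
            ranges[pct] = (start, end)
            end = start - 1
    return ranges


mean, sd = 403, 25  # Family Medicine values from the example above
a_low = round_half_up(mean + 1.5 * sd)        # 441
a_minus_low = round_half_up(mean + 1.0 * sd)  # 428
a_band = split_band(a_low, 500, list(range(93, 101)))
a_minus_band = split_band(a_minus_low, a_low - 1, [90, 91, 92])
for pct in sorted(a_band, reverse=True):
    print(pct, a_band[pct])
```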

When we release the scores to students, they are able to determine how they did immediately as End of Rotation exam scoring rubrics/charts have been made available to them.

The final SCPE course grade is determined by successfully completing multiple course elements including preceptor evaluations of the student, successful completion of the PAEA End of Rotation exam, completion of specialty-specific documentation, clinical tracking, submission of all assignments and paperwork, attendance at and participation in all SCPE activities, and professional demeanor.

All students are required to pass the End of Rotation exam. An End of Rotation exam score between 73% and 82.99% results in a mandatory keyword feedback assignment, and the student also has the option of retaking the End of Rotation exam. An End of Rotation exam score of less than 73% results in a mandatory keyword feedback assignment, and the student must also retake the failed End of Rotation exam. A second failure of the same End of Rotation exam type is grounds for dismissal from the program. End of Rotation exam scores from retaken exams are averaged with the initial score received.
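
The remediation rules and the averaging of retake scores can be sketched as two small functions. The function names and return strings are hypothetical, and the dismissal rule for a second failure is left out to keep the example short.

```python
from typing import Optional

RETAKE_REQUIRED_BELOW = 73.0  # below 73%: failing, retake required
RETAKE_OPTIONAL_BELOW = 83.0  # 73% to 82.99%: retake optional


def remediation_for(percent: float) -> str:
    """Return the remediation rule for an End of Rotation exam percentage grade."""
    if percent < RETAKE_REQUIRED_BELOW:
        return "keyword feedback + required retake"
    if percent < RETAKE_OPTIONAL_BELOW:
        return "keyword feedback + optional retake"
    return "no remediation"


def recorded_score(initial: float, retake: Optional[float] = None) -> float:
    """Average a retake score with the initial score, as the policy describes."""
    if retake is None:
        return initial
    return (initial + retake) / 2


print(remediation_for(78.0))       # keyword feedback + optional retake
print(recorded_score(70.0, 86.0))  # 78.0
```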

How long has your program utilized this method?

This method was instituted for our students who started taking the PAEA End of Rotation exams in February 2019.

If you changed methods to account for scale scores, how did you adjust? What rationale did you use to justify the change, both within your program and with students?

As we are using standard deviations, we do not have to convert or use Z-scores as done previously by our program. Our students were informed of the grading prior to the examinations, and we published our grading rubrics for each End of Rotation exam type. We also make the students aware of the national mean and standard deviation for each exam.

The change in grading method has allowed us to grade End of Rotation exams more easily, and we hope to be able to compare data across current, future, and past cohorts with greater ease and more standardization. We look forward to using End of Rotation exam results to help assess our didactic curriculum, as well as for comparative data analysis with students’ PACKRAT® and graduating classes’ PANCE results.

(Editor’s Note: If you have any questions about this process in particular, you may contact Priscilla Marsicovetere at Franklin Pierce University.)

A program in a liberal arts college with a cohort of 39 that has been using the exams for a couple of years.

Please describe the model your program uses to grade End of Rotation exams. Include how this fits into your broader SCPE evaluation.

The PAEA End of Rotation exam is just one component of our SCPE evaluation, others being preceptor evaluations, daily case logging, a rotation-specific case write-up or presentation, and professionalism work for each rotation. Students must pass all components to pass the SCPE.

Since our SCPEs are graded courses, the following are the steps we use to arrive at the grade for the exam portion:

  1. Calculate z-scores for each student, $z = \frac{S - \mu}{\sigma}$, where S is the student’s reported score, µ is the reported national average, and σ is the reported standard deviation.
  2. Estimate the grade G of the student by using the formula G = 85 + 10z.
  3. Use the following table to arrive at an actual numerical grade that is used to calculate the final SCPE grade.
Calculated student grade (G = 85 + 10z)    Assigned numerical grade
G < 70           No numerical grade; remediation and reassessment are required. If the student fails again, a grade of F is assigned as the final SCPE grade.
70 ≤ G < 75      70. The student is required to submit write-ups on the frequently missed content, which could be a content or task area or a specific diagnosis.
75 ≤ G < 80      76
80 ≤ G < 85      80
85 ≤ G < 90      85
90 ≤ G < 95      90
95 ≤ G < 100     95
G ≥ 100          100
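
The three steps can be sketched in Python: compute z, estimate G = 85 + 10z, then look up the assigned grade in the table, including the value of 76 for 75 ≤ G < 80 exactly as listed. The function names are hypothetical. The worked numbers (S = 373, µ = 405, σ = 23) are the ones the calculator example in this section uses.

```python
from typing import Optional


def z_score(score: float, national_mean: float, national_sd: float) -> float:
    """Step 1: z = (S - mu) / sigma."""
    return (score - national_mean) / national_sd


def estimated_grade(z: float) -> float:
    """Step 2: G = 85 + 10z."""
    return 85 + 10 * z


# Step 3: (lower bound of G, assigned grade), checked from the top down.
GRADE_TABLE = [(100, 100), (95, 95), (90, 90), (85, 85), (80, 80), (75, 76), (70, 70)]


def assigned_grade(g: float) -> Optional[int]:
    """Assigned numerical grade, or None when G < 70 (remediation and reassessment)."""
    for lower, grade in GRADE_TABLE:
        if g >= lower:
            return grade
    return None


g = estimated_grade(z_score(373, 405, 23))
print(round(g, 2), assigned_grade(g))  # 71.09 70 (write-up on missed content also required)
```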

How did your program decide on this method?

We chose to flag every student who scores below z = -1 because we decided to identify students for intervention who fall in approximately the lower 15th percentile nationally.

We consider those who score below z = -1.5 on the repeat exam as failing the rotation.

This calculator allows us to pick a target z-score based on a chosen percentage of examinees. For example, a program might choose to fail students who place in the lowest 10 percent of all test-takers. To arrive at the z-score that would signify this threshold, enter 10 in “Proportion of Area,” click “One-Sided,” and click “Submit.” The z-score for this target is -1.28. So, a z-score of -1.3 will approximately flag for potential failure everybody who scores in the lowest 10 percent of all test-takers nationwide.

This calculator allows one to estimate a student’s percentile rank by z-score. For example, a student scored S = 373 on an exam that has a national average of µ = 405 and standard deviation of σ = 23. His z-score is $z = \frac{S - \mu}{\sigma} = \frac{373 - 405}{23} \approx -1.39$. Enter -1.39 into the z-score field of the calculator, click “One-Sided,” and click “Submit.” This student scores in approximately the lower 8th percentile nationwide.

We chose to assign a grade of 100 to everybody with z ≥ 1.5 because these are high performers scoring in approximately the top 7 percent of all test takers.
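
The calculator lookups above can also be reproduced with the standard normal distribution in Python's standard library (`statistics.NormalDist`, Python 3.8+). Like the calculator examples, this sketch assumes that national scores are approximately normally distributed.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# z-score that cuts off the lowest 10 percent of examinees (one-sided)
cutoff = standard_normal.inv_cdf(0.10)
print(round(cutoff, 2))  # -1.28

# Percentile rank for the example student (S = 373, mu = 405, sigma = 23)
z = (373 - 405) / 23
print(round(100 * standard_normal.cdf(z), 1))  # 8.2

# Percentage of examinees at or above z = 1.5
print(round(100 * (1 - standard_normal.cdf(1.5)), 1))  # 6.7
```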

With the numbers our program has chosen, we had to “map” an interval of three standard deviations (z = -1.5 to z = +1.5) onto the grade range 70–100. For simplicity of calculation, the national average is placed right in the middle of the grade interval (85), and the simple equation G = 85 + 10z is used to estimate grades. For a number of reasons, we believe that grades from this formula alone should not be used to calculate the final SCPE grade. Not only does accreditation require evaluations that measure areas other than medical knowledge, but the reported reliability for End of Rotation exams is around 80 percent (see inset). Therefore, we use the above conversion table to calculate the grade. This table helps us flag anyone scoring below z = -1 as needing intervention. Everybody scoring above z = +0.5 will have an “A range” grade, and everybody within 0.5 standard deviations above or below the average will have a “B range” grade.

How long has your program utilized this method?

Less than a year.

If you changed methods to account for scale scores, how did you adjust? What rationale did you use to justify the change, both within your program and with students?

We did not change anything for the cohort that graduated in May, as their scale had been set when the transition was made. We used z-scores on reported raw scores, and we continued to do it with the current class using the provided grade converter. Starting next year, we will use z-scores calculated from the scale system.

Important note from PAEA Assessment: No single assessment is going to be 100 percent reliable in predicting student knowledge. Think of assessment data as pixels. The more pixels you have, the more meaningful your picture of the student’s achievement level. For a resource on interpreting item and exam statistics, see this article: Garino A, Van Rhee J. Test Item Analysis for the Physician Assistant Educator. Journal of Physician Assistant Education. Washington, DC: PAEA, 2009, 20(3), p 22-27.
To read the article in full, log into the membership portal and click on the JPAE link on the top navigation bar. Then, use the article title in quotation marks to perform a search.