The Kirkpatrick Model for Summative Evaluation of Training and Instruction
The question of how to identify the effects of training has been an ongoing
challenge for designers as well as training managers. How do we know the
training was successful? How successful was it? Was there a quantifiable
improvement in performance, or not? How is that measured? In 1975, Donald
Kirkpatrick first presented a four-level model of evaluation that has become
a classic in the training industry. These four levels provide a structure for
measuring the effectiveness of instruction depending on the complexity of the
training, the transfer of knowledge over time, and the final impact on the
trainees' organization.
These levels can be applied to technology-based training as well as to more traditional forms of delivery. Modified labels and descriptions of these steps of summative evaluation follow.
Level One: Students' Reaction
In this first level or step, students are asked to evaluate the training
after completing the program. These are sometimes called smile sheets or
happy sheets because in their simplest form they measure how well students
liked the training. However, this type of evaluation can reveal valuable data
if the questions asked are more complex. For example, a survey similar to the
one used in the formative evaluation also could be used with the full student
population. This questionnaire moves beyond how well the students liked the
training to questions about:
With technology-based training, the survey can be delivered and completed online, and then printed or e-mailed to a training manager. Because this type of evaluation is so easy and cheap to administer, it is conducted in most organizations.
Level Two: Learning Results
Level Two in the Kirkpatrick model measures learning results. In other words,
did the students actually learn the knowledge, skills, and attitudes the
program was supposed to teach? To show achievement, have students complete
a pre-test and post-test, making sure that test items or questions are truly
written to the learning objectives. By summarizing the scores of all students,
trainers can accurately see the impact that the training intervention had.
This type
of evaluation is not as widely conducted as Level One, but is still very
common.
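The pre-test and post-test comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed scoring method: the function name `summarize_learning_gain`, the 0-100 percentage scale, and the sample scores are all assumptions made for the example.

```python
from statistics import mean


def summarize_learning_gain(pre_scores, post_scores):
    """Summarize pre-test vs. post-test results for one group of students.

    Assumes scores are percentages (0-100) and that both lists are in the
    same student order, so position i in each list is the same student.
    """
    if len(pre_scores) != len(post_scores):
        raise ValueError("each student needs both a pre-test and a post-test score")
    if not pre_scores:
        raise ValueError("at least one student is required")

    # Per-student gain: post-test score minus pre-test score.
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    return {
        "mean_pre": mean(pre_scores),
        "mean_post": mean(post_scores),
        "mean_gain": mean(gains),
        "students_improved": sum(1 for g in gains if g > 0),
    }


# Hypothetical scores for five students, invented for illustration only.
pre = [40, 55, 60, 35, 70]
post = [75, 80, 85, 60, 90]
summary = summarize_learning_gain(pre, post)
print(summary)
```

A summary like this is only as meaningful as the tests behind it: if the test items are not written to the learning objectives, the gain says little about what was actually learned.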
Level Three: Behavior in the Workplace
Students typically score well on post-tests, but the real question is whether
any of the new knowledge and skills are retained and transferred back to the
job. Level Three evaluations attempt to answer whether students' behaviors
actually change as a result of new learning. Ideally, this measurement is
conducted three to six months after the training program. Allowing some time
to pass gives students the opportunity to implement new skills and makes it
possible to check retention rates. Observation surveys, sometimes called
behavioral scorecards, are used. Surveys can be completed by the student, the
student's supervisor, individuals who report directly to the student, and
even the student's customers. For example, survey questions evaluating a
sales training program might include:
Level Four: Business Results
The fourth level in this model is to evaluate the business impact of the training
program. The only scientific way to isolate training as a variable would be
to set aside a representative control group within the larger student population,
roll out the training program to the remaining students, complete the evaluation,
and compare the results against a business evaluation of the non-trained group.
Unfortunately, this is rarely done because of the difficulty of gathering the
business data and the complexity of isolating the training intervention as a
unique variable. However, even anecdotal data is worth capturing. Below are sample
training programs and the type of business impact data that can be measured.
Sales training. Measure change in sales volume, customer retention, length of sales cycle, and profitability on each sale after the training program has been implemented.
Technical training. Measure reduction in calls to the help desk; reduced time to complete reports, forms, or tasks; or improved use of software or systems.
Quality training. Measure a reduction in number of defects.
Safety training. Measure reduction in number or severity of accidents.
Management training. Measure increase in engagement levels of direct reports.
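The control-group comparison described for Level Four can be sketched as a simple difference-in-differences calculation: the change in a business metric for the trained group minus the change for the non-trained group over the same period. The text only calls for comparing the two groups. Measuring each group before and after the rollout is an added assumption here, as are the function name `training_effect` and the sample sales figures.

```python
from statistics import mean


def training_effect(trained_before, trained_after, control_before, control_after):
    """Estimate the change in a business metric attributable to training.

    Subtracts the control group's average change from the trained group's
    average change, so that shifts affecting everyone (seasonality, pricing,
    market conditions) are not credited to the training program.
    """
    trained_change = mean(trained_after) - mean(trained_before)
    control_change = mean(control_after) - mean(control_before)
    return trained_change - control_change


# Hypothetical monthly sales per representative, invented for illustration.
trained_before = [100, 120, 110]
trained_after = [130, 150, 140]
control_before = [105, 115, 110]
control_after = [115, 125, 120]

effect = training_effect(trained_before, trained_after, control_before, control_after)
print(f"Estimated effect of training on monthly sales: {effect}")
```

This sketch reports only a difference in averages. It does not test whether the difference is statistically significant, and it is only as trustworthy as the control group is representative of the trained population.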