Benchmarking: Benefits and Lessons Learned

Public-sector organizations have become increasingly committed to service quality and to measuring their own performance.  While these measurement and assessment initiatives have been very valuable, they have been primarily focused on individual programs and not on comparisons with other organizations.

Such comparisons - or benchmarking - can be extremely effective in helping organizations learn from one another. Through measurement and benchmarking, organizations can assess progress, understand areas for improvement, and identify good practice.

One of the primary tools that public- and private-sector organizations have adopted for measuring and tracking performance is the customer/client/citizen survey. Using the results of these surveys, organizations can get a picture of how citizens view and experience services. The value of such a survey is limited, however, without something to compare the results against. In short, the organization needs to benchmark its results against another set of numbers - expectations, goals, past performance, an industry standard, or the performance of peers.

In recent years, the public sector has seen a number of measurement and benchmarking initiatives related to service delivery.  For example:

  • The American Customer Satisfaction Index (ACSI), a national indicator of customer satisfaction with the quality of goods and services available to household consumers in the United States, has been adapted to measure customer satisfaction with U.S. federal government agencies.
  • The People's Panel, a representative sample of UK citizens who serve as a national focus group, has been used to measure citizen expectations and satisfaction with public-sector service delivery in the United Kingdom.
  • Citizens First, a biennial national survey, assesses the expectations, priorities, and satisfaction of Canadians with service delivery across channels and levels of government.

With several years of experience in these and other jurisdictions, now is an opportune time to take stock of what we have learned about survey measurement and benchmarking to date.  This review considers some of the lessons that have surfaced and then looks toward what steps the public sector might take to further the service agenda through benchmarking.

Survey Measurement and Benchmarking: Lessons Learned
1. Comparing Apples and Oranges.

As initiatives such as Citizens First and the American Customer Satisfaction Index build up several years of survey results, there is now a growing set of time-series data that permits internal comparison and benchmarking against past performance.  Governments, ministries, and agencies can track their progress, identify strengths and weaknesses, and set goals for improvement.  A closer look at how this data is collected, analyzed, and reported, however, suggests that cross-jurisdictional benchmarking is still challenging and that steps could be taken to improve such comparisons.

a) Difference in Questions

It is a well-worn adage that the question can define (or at least significantly influence) the answer.  Despite their common focus on customer satisfaction, survey efforts in different jurisdictions tend to use different questions.  For example, compare the questions from Citizens First 2000 and the UK People's Panel below.

Both questions are legitimate questions that will reveal valuable information.  However, where the CF question asks the respondent to rate the quality of the service, the UK question asks the respondent to rate their satisfaction with the service.  While related, impressions of quality and of satisfaction are obviously not the same.  Therefore, any comparisons would be imperfect.  The satisfaction rating of 80 (out of 100) that libraries received from the People's Panel is not perfectly comparable to the service quality score of 77 (out of 100) that public libraries received in Citizens First.  This is not to discourage such comparisons, only to highlight the limitations.

In fact, even seemingly insignificant changes to the wording of a question can affect the answers, so it is best to leave questions unchanged over time and across jurisdictions.

b) Difference in Metrics

The validity of a benchmarking exercise can also be affected if two different surveys use different metrics.  Looking back on the example used above, where the CF survey asks the respondent for a rating between 1 and 5, the UK survey does not place numbers beside any of the five descriptors.  Therefore, while the CF survey reports a mean score for each service, the UK survey should only report percentage responses for each category.

A similar problem arises when different surveys use different numeric scales. Unfortunately, results gathered using a five-point scale are not perfectly comparable to those gathered using a seven- or ten-point scale. Sophisticated mathematical formulas can be used to try to compare results gathered on different numeric scales, but such an exercise is less precise than using consistent scales.

c) Difference in Reporting

Even if identical questions and metrics are used, however, problems can still arise from how the numbers are analyzed and reported. The results of a survey that asks respondents to rate the quality of service on a scale from 1 to 5 could be reported in a number of different ways. For example, the mean (average) score could be reported, the percentage of respondents who selected 5 could be reported, or the aggregate percentage of respondents who selected either 4 or 5 (top two boxes) could be reported. Once again, none of these reporting choices is necessarily wrong. How the data is analyzed and reported is primarily a function of what knowledge the organization is interested in gaining. These different reporting options do, however, make external benchmarking more challenging.
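To make the three reporting choices concrete, the following minimal Python sketch computes all of them from one set of responses. The function name and the sample ratings are hypothetical and invented purely for illustration; they do not come from any of the surveys discussed here.

```python
from collections import Counter


def summarize(ratings, scale_max=5):
    """Return three common summaries of ratings given on a 1..scale_max scale."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    n = len(ratings)
    counts = Counter(ratings)
    return {
        # Mean (average) score on the original scale.
        "mean": sum(ratings) / n,
        # Percentage of respondents who chose the highest rating.
        "top_box_pct": 100 * counts[scale_max] / n,
        # Percentage who chose either of the two highest ratings.
        "top_two_box_pct": 100 * (counts[scale_max] + counts[scale_max - 1]) / n,
    }


# Hypothetical responses, invented for illustration only.
ratings = [5, 4, 4, 3, 5, 2, 4, 5, 1, 4]
summary = summarize(ratings)
print(summary)
```

For these ten invented responses the same service would be reported as a mean of 3.7 out of 5, a top-box score of 30%, or a top-two-box score of 70%, which is why two organizations need to agree on the reporting method before comparing numbers.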

2. Overcoming the Apples and Oranges Dilemma

There are a number of different ways to overcome the problems outlined above. In their seminal work "Citizen Surveys: How to Do Them, How to Use Them, What They Mean," Miller and Miller use a scale conversion methodology called the "Percent to Maximum" (PTM) scale to compare results from 261 diverse surveys. Using PTM, Miller and Miller took the results from surveys with different scales and different questions and converted them to a single scale. While the conversion proved robust in testing, this is clearly a complex exercise that the average public-sector manager would not want to undertake.
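This article does not reproduce the PTM formula, so the sketch below should not be read as Miller and Miller's method. It only illustrates the general idea of putting scores from different numeric scales on a common 0-100 scale, assuming the simplest possible approach: a linear rescaling between the lowest and highest points of each scale. The function name and the sample means are hypothetical.

```python
def rescale_to_100(mean_score, scale_min, scale_max):
    """Linearly map a mean score from [scale_min, scale_max] onto 0-100.

    Assumption: a plain linear rescaling. The actual PTM methodology may
    include adjustments that are not described in this article.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    if not scale_min <= mean_score <= scale_max:
        raise ValueError("mean_score must lie within the scale")
    return 100 * (mean_score - scale_min) / (scale_max - scale_min)


# Hypothetical mean scores from three surveys that used different scales.
five_point = rescale_to_100(4.0, 1, 5)
seven_point = rescale_to_100(5.5, 1, 7)
ten_point = rescale_to_100(7.75, 1, 10)
print(five_point, seven_point, ten_point)
```

Under this assumption all three invented means land at 75 out of 100. The arithmetic is easy; what it cannot fix is any difference in question wording or in how respondents interpret scales of different lengths, which is why consistent scales remain preferable.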

A more practical strategy for overcoming barriers to benchmarking is to adopt a common or standardized survey tool, methodology, and reporting structure. The Common Measurements Tool (CMT) is an example of such an effort. The CMT offers the user a ready-made bank of questions with which to construct a survey, so public-sector managers who use it can be confident of their ability to benchmark with other jurisdictions asking the same questions. The importance of analyzing and reporting the results of CMT surveys in a consistent fashion should not be underestimated, but the survey tool itself lays a methodologically sound foundation for comparison.

3. Comparability of Business Lines

Traditionally, public-sector organizations that have undertaken satisfaction surveys have used the results to compare and rank the performance of different services. Recent experience with benchmarking of satisfaction surveys, however, suggests such comparisons are not constructive. Different service areas seem to be predisposed to different score ranges. For example, the mean score for park services tends to fall between 70 and 80 (out of 100), while the mean score for tax services tends to fall between 55 and 65 (out of 100). Therefore, the fact that a jurisdiction's park service scores higher than its tax service does not mean that it is providing good park service and poor tax service. In fact, a jurisdiction whose park service scores 68 and whose tax service scores 63 may actually be providing very good tax services and very poor park services.
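The park and tax example above can be worked through by expressing each score as a position within the typical range for its own service area. The short Python sketch below does this with the ranges and scores quoted in the paragraph; the function name and the idea of a 0-to-1 "position" are illustrative assumptions, not an established benchmarking measure.

```python
def position_in_range(score, low, high):
    """Locate a score within a peer range: 0.0 is the bottom of the range,
    1.0 is the top, and values outside 0..1 fall outside the range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (score - low) / (high - low)


# Typical ranges and example scores taken from the paragraph above.
PEER_RANGES = {"parks": (70, 80), "tax": (55, 65)}
scores = {"parks": 68, "tax": 63}

relative = {
    service: position_in_range(scores[service], *PEER_RANGES[service])
    for service in scores
}
print(relative)
```

The park score of 68 comes out at -0.2, below the bottom of its typical range, while the tax score of 63 comes out at 0.8, near the top of its range. The raw scores suggest the opposite ranking, which is the point of benchmarking against peer services rather than across service areas.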

The reasons for these "predisposed" differences are not clear. While it would seem that regulatory services or services that citizens use involuntarily might score lower than services that provide benefits, recent survey work has shown that regulatory and enforcement services can also garner high client-satisfaction scores. More research is required in this area.

By comparing the scores of comparable services in several jurisdictions, we are beginning to get a sense of the ordinal ranking of services and their "predisposed" score ranges.

Table 1: Comparison of Service Quality/Satisfaction Survey Results (*)

Service (~)                  Citizens    Citizens    UK People's   ACSI      Miller and
                             First 2000  First 1998  Panel         (2000)    Miller
                                                     (2000)(^)               (1991) (+)
---------------------------------------------------------------------------------------
Fire Services                80          86          77            -         81
Libraries                    77          77          83            -         79
Museums                      73          71          76            -         76
Garbage Disposal             72          74          79            74        78
Social Insurance (Benefits)  71          69          69            84        -
Recreational Facilities      71          70          72            -         68
Parks                        71          73          75            73        72
Passport                     65          66          72            73        72
Police                       64          68          67            62        71
Local Transit                58          58          64            -         62
Public School                57          54          82            -         69
Hospitals                    55          51          75            -         64
Tax Administration           55          57          64            51
Child Support Services       55          56          47            -         56
Employment Services          54          47          64            -         -
Public Housing               51          52          57            69        54
Road Maintenance             47          45          46            -         58
Courts                       44          -           61            -         65


* All scores are reported on a scale from 0 to 100. Citizens First, Citizens First 2000, and UK People's Panel scores are reported as mean scores. ACSI scores are based on the American Customer Satisfaction Index. Miller and Miller scores are based on a scale conversion methodology called the "Percent to Maximum" (PTM).

~ Service descriptions vary from jurisdiction to jurisdiction.

^ Mean scores from the UK study were calculated from published results and exclude "No Opinion" responses.

+ Miller and Miller refers to the report Citizen Surveys: How to Do Them, How to Use Them, What They Mean, written by Thomas I. Miller and Michelle A. Miller and published by the International City/County Management Association (ICMA) in 1991. A second edition was published in 2001 but does not include updated numbers.
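The note on the UK figures describes calculating a mean from published percentage results while excluding "No Opinion" responses. The sketch below shows one way such a calculation could work. Everything specific in it is an assumption made for illustration: the category labels, the 0/25/50/75/100 values assigned to them, and the percentages are all hypothetical, and the article does not say which values were actually used.

```python
def mean_from_distribution(pct_by_category, value_by_category):
    """Weighted mean over categories that have a numeric value.

    Categories without a value (for example "No opinion") are excluded,
    and the remaining percentages are re-based to sum to 100.
    """
    rated = {c: p for c, p in pct_by_category.items() if c in value_by_category}
    total = sum(rated.values())
    if total <= 0:
        raise ValueError("no rated responses to average")
    return sum(value_by_category[c] * p for c, p in rated.items()) / total


# Hypothetical mapping of five descriptors onto a 0-100 scale.
VALUES = {
    "Very satisfied": 100,
    "Fairly satisfied": 75,
    "Neither": 50,
    "Fairly dissatisfied": 25,
    "Very dissatisfied": 0,
}

# Hypothetical published percentages, invented for illustration only.
published = {
    "Very satisfied": 30,
    "Fairly satisfied": 40,
    "Neither": 10,
    "Fairly dissatisfied": 5,
    "Very dissatisfied": 5,
    "No opinion": 10,
}

mean_score = mean_from_distribution(published, VALUES)
print(round(mean_score, 1))
```

Because the result depends on the values assigned to each descriptor, a mean derived this way is best treated as an approximation for comparison purposes rather than as a figure the original survey reported.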

 

Survey Measurement and Benchmarking: Using the Common Measurements Tool

As mentioned above, the Common Measurements Tool (CMT) is a standardized, ready-made bank of questions with which to construct a client satisfaction survey. If used in a standardized way, it also facilitates benchmarking across time and jurisdictions. The CMT was first published in 1998. Over the past three-plus years, there have been more than 50 administrations of the CMT. While these administrations met with varying degrees of success, each revealed important lessons in how to use and support a common or standardized survey instrument.

As the Institute for Citizen-Centred Service (ICCS) begins to develop a benchmarking database for CMT results, jurisdictions that have used or are using the CMT are being asked to share their datasets. The following example illustrates how jurisdictions, even in these early days, can derive value from participating in the CMT benchmarking database.

Table 2 lists the results of the CMT's "overall satisfaction" question, both as a mean score and as a top-two-box score, for several organizations that have used the CMT recently. Even with a limited number of datasets, the ICCS can begin to bring peer organizations together. For example, Western Economic Diversification (WD) and Manitoba CareerStart (MCS) are both mandated to support their clients (western businesses and students respectively) in development efforts. Both need to understand and respond to the needs of clients who are pursuing goals beyond the client-government relationship. At one level, the ICCS can bring these organizations together as two organizations focused on quality service. At another level, the benchmarking database can facilitate learning and the sharing of good practice between the organizations. WD and MCS can compare their results across the nine core questions, as well as any other questions they asked in common, to identify opportunities for service improvement. It is important to note that, even when the data is provided on an anonymous basis, organizations can still get a sense of whether they are delivering service that both meets the expectations of their clients and compares well with that of similar organizations.

Table 2: Comparison of CMT Results 

Organization                              Mean     Top Two Boxes
----------------------------------------------------------------
Human Resources Development Canada
  Employment Insurance                    77.5*    77%
  Income Security                         79.5*    79%
Western Economic Diversification          73.5*    77%
Manitoba CareerStart                      78.5*    86%
Clients Speak: Single Window Study        81.7*    82%
  Business                                79.1*    79%
  Individuals                             84.3*    85%
A Provincial Ministry of Labour           -        79.5%

* Mean Scores were calculated based on published reports from each organization.

Survey Measurement and Benchmarking: Future Directions

Having considered some of the lessons about measurement and benchmarking that have surfaced through recent efforts, it is also useful to look toward what steps the public sector might take to further the service agenda through benchmarking.

1. Benchmarking the Drivers of Satisfaction

Benchmarking is an important tool for assessing service against peers and tracking progress.  It is important to know where you stand relative to the past, to others, and to your goals.  In the end, however, the ultimate purpose of benchmarking is to help public-sector organizations improve service delivery.

By identifying organizations with the highest scores, benchmarking can be used to find good practices from which other jurisdictions can learn. Because good practices are always context-specific, however, a more valuable exercise for service improvement is to measure and benchmark the drivers of satisfaction. Citizens First and Citizens First 2000 identified five drivers of satisfaction:

  • Timeliness
  • Staff Knowledge and Competence
  • Courteous and Friendly Service that "Goes the Extra Mile"
  • Fairness
  • Outcome

The relative importance of the five drivers will inevitably vary by service area (reinforcing the need to benchmark against peer services), but Citizens First 2000 demonstrated the robustness of the five drivers across diverse service areas.  When all of these elements are present in service delivery, citizens rate service quality at 80 (out of 100) or higher.  By measuring and benchmarking how service recipients rate a service on each of these five factors, public-sector organizations can identify their strengths and weaknesses and implement service improvement plans that address them accordingly.  It is with this goal in mind that the CMT provides nine Core Questions focused on the drivers of satisfaction.

2. Benchmarking Channels

Citizens First 2000 revealed that while the drivers of satisfaction are reasonably consistent across service areas (varying in emphasis), they do vary across service channels. The drivers of satisfaction for face-to-face service delivery are different from those for telephone or Internet service delivery. This finding is especially important in a service-delivery world that is increasingly defined by its multi-channel nature.

Much more research is required in this area. Citizens First 2000, however, did offer some insight into these differences. As Figure 1 details, while the drivers of satisfaction for telephone service overlap significantly with those for face-to-face service, courtesy and fairness do not rank as statistically significant drivers for telephone services. The drivers of satisfaction for Internet services, however, differ dramatically from those for other channels: navigation, outcome, visual appeal, informativeness, and speed.

 

 

 

 


Institute for Citizen-Centred Service (ICCS) 2004
A world-class centre of expertise and a champion for citizen-centred service