Critical Assessment of the Kano Model, Part 2

This continues an explanation of why the Kano Model is an unjustified way to prioritize product features.

In Part 1, I described how the Kano Model is based on poor survey items and its theory is dubious in product areas that have continual innovation and performance improvement (such as tech products). In this second post, I review some evidence from data, challenge the "success cases" for Kano, and discuss an alternative.

As always, if you disagree, that's OK! I hope researchers will consider evidence when using methods ... and hope that I can add to the evidence and considerations.

Issue 3: Empirically, the Data are Unreliable

My colleague Mario Callegaro and I fielded several surveys to examine the reliability and validity of the Kano Model survey items and the categorical assignments.

I won't go into all of the details (see the whitepaper) but two key questions were the following:

  1. If you ask respondents an identical Kano question twice inside one survey, do they give the same answer? (This is one approach in psychometrics to assess "test-retest" reliability).

  2. If you assign a feature to its Kano category using one sample, and then you check that against another sample, does it get the same assignment? (This is another form of reliability, at the level of an aggregated sample.)

For individual test-retest assessment, we asked respondents to consider a "touch screen" feature on their next mobile phone. We thought that was as certain a "must have" (aka "table stakes") feature as one could expect ... and sure enough, 68.7% of respondents said that they expect a touch screen on their phone. (The exact item wording is in the whitepaper.)

However, when we asked a few other questions and then re-asked the same item, only 39.1% of those same respondents again said that they expect a touch screen. A larger proportion (40.5%) said they would like it.

Thus, for this simple and well-understood feature, 60.9% of respondents gave a different response (implying a different Kano category) when they were asked again. Using various metrics, we found reliability coefficients of 0.61 to 0.73, which imply low to poor reliability; "good" reliability might be 0.80 or higher. (For the complete analysis, see the whitepaper.)
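To make the idea of a test-retest check concrete, here is a minimal sketch in Python. The response data and option labels are hypothetical, and Cohen's kappa stands in for the various metrics we report in the whitepaper; it is an illustration of the kind of check, not our actual analysis.

```python
# Sketch: test-retest agreement for a repeated Kano item (hypothetical data).
# Assumes each respondent answered the same item twice within one survey.
import pandas as pd
from sklearn.metrics import cohen_kappa_score

kano_options = ["like", "expect", "neutral", "tolerate", "dislike"]

# Hypothetical responses; real data would come from the fielded survey.
df = pd.DataFrame({
    "first_ask":  ["expect", "like", "expect", "neutral", "like", "expect"],
    "second_ask": ["like",   "like", "expect", "expect",  "like", "dislike"],
})

# Simple percent agreement: how many respondents repeated their answer?
agreement = (df["first_ask"] == df["second_ask"]).mean()

# Cohen's kappa: chance-corrected agreement for categorical responses.
kappa = cohen_kappa_score(df["first_ask"], df["second_ask"], labels=kano_options)

print(f"Percent agreement: {agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```

With real survey data, one would compute these statistics per feature and compare them to conventional reliability benchmarks (around 0.80 or higher for "good" reliability).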

Now, it's conceivable that individuals are unreliable and yet the average sample response — the assigned Kano category for the group — could be stable. So we looked at that. Our procedure was to draw random subsets of different sizes from the large sample, and compare their rates of agreement. For example, if one sample of N=100 assigned a feature to the "must have" category, how often did different samples of N=100 agree with that?

The answer is that the samples did not demonstrate reliable aggregate answers until they had N=200+ respondents. Note that this is only asking about the consistency of the Kano category assignment and not whether it is correct. As noted in Part 1, there are good reasons to think that the Kano categories are not useful in rapidly changing product areas. A more immediate implication is that small-scale Kano Model studies — for example, those with N=20, 30, or even 100 respondents — are highly likely to give different answers when repeated.
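For readers who want to see what this resampling procedure looks like in practice, here is a small sketch. The "population" of responses is simulated (it is not our study's data), and the agreement rule is simplified to the modal category, but it shows how one might check whether the aggregate Kano assignment stabilizes as sample size grows.

```python
# Sketch: how stable is the aggregate Kano category across random subsamples?
# Simulated population of individual responses; not the study's actual data.
import numpy as np

rng = np.random.default_rng(42)
options = ["like", "expect", "neutral", "tolerate", "dislike"]

# Hypothetical full sample of N=2000 responses to one Kano item.
population = rng.choice(options, size=2000, p=[0.40, 0.39, 0.12, 0.05, 0.04])

def modal_category(responses):
    """Return the most frequently chosen option in a set of responses."""
    vals, counts = np.unique(responses, return_counts=True)
    return vals[np.argmax(counts)]

for n in (20, 50, 100, 200, 400):
    # Draw many subsamples of size n and record the modal category of each.
    modes = [modal_category(rng.choice(population, size=n, replace=False))
             for _ in range(500)]
    # Agreement rate: how often do subsamples agree with the most common mode?
    top = max(set(modes), key=modes.count)
    rate = modes.count(top) / len(modes)
    print(f"N={n:4d}: subsamples agree on '{top}' {rate:.0%} of the time")
```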

Issue 4: The Response Scale is Multidimensional

As noted in Part 1, one could reasonably agree with multiple options on the Kano survey item, although the item forces a single response. For example, a respondent might both expect a feature and like it.

This confusion is one explanation for the low test-retest reliability we saw above: respondents choose one answer from among several applicable feelings, and then choose a different one the next time they are asked. It also suggests that the scale is multidimensional, not a uniform scale that can be scored on a single numeric dimension as many Kano Model analyses assume. So we tested that.

To test the dimensionality of the scale, we ran surveys that relaxed the "select one" requirement on the responses and allowed respondents to "choose all that apply". Then we examined the patterns using factor analysis.
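As a rough illustration of that kind of analysis, here is a Python sketch. The multi-select responses are simulated, and it uses a plain linear factor analysis for simplicity; for real binary check-all data one would want methods suited to dichotomous items (as we discuss in the whitepaper). It shows the procedure, not our results.

```python
# Sketch: factor analysis of "check all that apply" Kano responses.
# Each row is one respondent; columns are 0/1 indicators for each scale option.
# Simulated data for illustration; the study's actual results are in the whitepaper.
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(7)
cols = ["like", "expect", "neutral", "tolerate", "dislike"]

# Simulated multi-select responses (1 = option was checked).
X = pd.DataFrame(rng.integers(0, 2, size=(300, 5)), columns=cols)

# Extract two factors and inspect the loadings of each response option.
fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(X)
loadings = pd.DataFrame(fa.components_.T, index=cols,
                        columns=["factor_1", "factor_2"])
print(loadings.round(2))
```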

The result is that the 5-point Kano response scale has at least 2 factors: a generic "positive" factor and a generic "negative" factor (see the whitepaper for details). Furthermore, those two factors are not directionally opposite — they are positively associated. Respondents who selected "I like it" were MORE likely to select "I dislike it" for the same feature. (There are questions about multi-select item response patterns that I'll set aside here; see the paper.) This suggests that the typical Kano numerical scoring process is unjustified.

If you're familiar with factor analysis results, the full factor loading matrix is in the whitepaper (if you're not familiar, feel free to skip it). The takeaway is that Like and Dislike dominate the factors; they account for a large proportion of the variance (i.e., of how people answered); and, contrary to Kano theory, they are positively correlated.

Those results are impossible to reconcile with Kano Model theory, and they strongly suggest that something is going deeply wrong with the survey items. In the whitepaper, we argue that the problem is a mixture of a multidimensional scale (which confuses respondents and leads to unreliable answers), acquiescence bias (respondents avoiding negatively worded responses), and limitations of the check-all response format that we used (as they say, more research is needed). In any case, this is another reason to suspect that results from a standard Kano Model survey will be incorrect.

But Wait, Aren't There 100s of Kano Successes?

A common anecdotal response to this research has been, "I used Kano for [some project] and got a good answer." Another has been, "I changed it [somehow] and it is better." I'm glad those researchers are happy with their answers ... but from a scientific perspective, I look for stronger evidence than self-reported, post-hoc happiness with the answers.

There are 100s of papers published about the Kano Model, but they are almost all single case studies similar to the anecdotes above. In those papers, the researchers themselves simply assert that the results of their analysis were successful. Yet we can't determine whether most of the answers were "correct" in terms of their true business outcome! (Think about it: what would count as a real-world confirmation of a Kano category?) There is also a "file drawer" problem: we don't know how many studies gave poor results. For example, we would never know if another 1000 studies went unpublished because the Kano Model gave an answer that the authors didn't like.

However, there are a few review papers that examine more than 100 published papers about the Kano Model, and they are not reassuring about the science. Some conclusions from those studies are:

"at present, there is still no clear consensus among researchers about the most appropriate [Kano Model] assessment method, and convergent validity between the different methods has not been confirmed" (Mikulić, 2007)

"too much research has simply applied the Kano methodology without discussing its implications for the theory ... it is now necessary to revisit the theoretical foundation of the theory" (Witell, Löfgren, and Dahlgaard, 2013)

"many contributions are recently applying the Kano model in specific contexts without questioning the implications. Other examples are modifying the model, without showing the differences and implications" (Hartmann and Lebherz, 2016)

In short, the published case studies do not address the concerns here. The published success cases could easily be cherry-picked, and therefore they are unconvincing.

Happy Point: There are Alternatives

So, what could one do instead? In the whitepaper, we suggest two options to assess customers' preferences for feature priorities:

  1. Use dimensional, Likert-type scales and plot those against one another.

  2. Use a metric-based ranking method such as MaxDiff.

Here's an example of how the MaxDiff alternative could work. First, use the MaxDiff method (see references) to obtain "importance" or priority scores for the features of interest. In our example, these were disguised (but real) scores from an actual study.

Next, obtain a second dimension to plot those against. For example, we might ask how well our competition is performing on the same features. Then we plot the competitive assessment against the importance of each feature, using real but disguised data from several years ago (Bahna & Chapman, 2018).
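Here is a minimal Python sketch of building that kind of chart. All of the feature names and numbers below are made up for illustration (the real, disguised data are in Bahna & Chapman, 2018), and the competitive ratings are assumed to come from a separate rating question.

```python
# Sketch: plot feature importance (e.g., MaxDiff scores) against a second
# dimension such as competitor performance. All numbers here are invented.
import matplotlib.pyplot as plt

features = ["Task A", "Task B", "Task C", "Task D", "Task E"]
importance = [32, 24, 18, 15, 11]                    # hypothetical MaxDiff scores
competitor_performance = [2.1, 2.6, 3.4, 3.9, 4.4]   # hypothetical 1-5 ratings

fig, ax = plt.subplots()
ax.scatter(importance, competitor_performance)
for name, x, y in zip(features, importance, competitor_performance):
    ax.annotate(name, (x, y), xytext=(4, 4), textcoords="offset points")

ax.set_xlabel("Feature importance (MaxDiff score)")
ax.set_ylabel("Competitor performance on the feature")
ax.set_title("Importance vs. competitive performance (illustrative data)")
plt.show()
```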

When we examine the resulting plot carefully, we see some interesting patterns. For example, the tasks where we are ahead of the competition are the tasks with the lowest importance (oh, no!), and the tasks with the highest importance are ones where we are behind the competition (oh, no!)

These kinds of survey data are just as easy to obtain as Kano Model responses; they use better survey methods; they are more flexible; and they yield similarly compelling and interesting results and charts to discuss with stakeholders.

Conclusion: Don't Use Kano

So, why would one use the Kano Model? I suggest not to, and instead to explore one of the more flexible approaches that are grounded in better survey science. Or, if you decide to use the Kano Model, please consider our suggestions in the whitepaper: pre-test the items, assess their reliability, and obtain an adequate sample size.

Thank you for reading the Quant UX Blog ... stay tuned for another post soon!

References

The ideas and illustrations in this post are from:

C Chapman and M Callegaro (May 2022). Kano analysis: A critical survey science review (whitepaper; presentation). In: Proceedings of the 2022 Sawtooth Software Conference, Orlando, FL.

For an introduction to MaxDiff as part of an alternative to Kano, see any of these:

Sawtooth Software (2023). MaxDiff. At https://sawtoothsoftware.com/maxdiff

C Chapman and K Rodden (2023). Quantitative User Experience Research, Chapter 10: "MaxDiff: Prioritizing Features and User Needs." New York: Apress.

E Bahna and C Chapman (2018). Constructed, Adaptive MaxDiff. In: Proceedings of the 2018 Sawtooth Software Conference.

The meta-analyses I cited, with literature reviews of the Kano Model, are:

J Hartmann and M Lebherz (2016). Literature Review of the Kano Model: Development Over Time (1984-2016). Whitepaper, Halmstad University.

J Mikulić (2007). The Kano Model: A Review of its Application in Marketing Research from 1984 to 2006.

L Witell, M Löfgren, and J Dahlgaard (2013). Theory of attractive quality and the Kano methodology – the past, the present, and the future. Total Quality Management & Business Excellence.

The original Kano paper (in Japanese) is:

N Kano, N Seraku, F Takahashi, and S Tsuji (1984). Attractive Quality and Must-Be Quality. Journal of the Japanese Society for Quality Control, 14, 147-156.

A popular guide to applied Kano analysis, and the survey items and scoring, is:

D Zacarias (2015). The Complete Guide to the Kano Model. Online, career.pm/briefings/kano-model