Yesterday’s post about the ORG report earned me two interesting comments from James Gilmour. They spurred me into more research, in particular into the Cragg Ross Dawson report he referred to in his first comment. I decided another post on the subject was in order; normal nonsense will resume in due course.
I’ve found two reports by Cragg Ross Dawson; I believe James referred to the ballot paper design research report, but the later STV ballot paper report also makes interesting reading.
As he said, these might not have been “focus groups” in the sense of a bunch of people sitting round a table munching on biscuits, but neither were they proper usability tests.
Usability testing is about asking people to perform tasks in a scenario as close to realistic as possible (no prompting, no helping, no detailed instructions in advance) and observing what they do and whether they succeed or fail. It gives you objective results rather than the subjective impressions recorded in the Cragg Ross Dawson reports.
Cragg Ross Dawson aren’t usability professionals; they’re a market research company. There’s a huge difference.
Some of my problems with their approach:
- The ‘Topic Guide’ in the first report suggests that test users, after trying out a ballot, were asked questions such as “is it clear to them who and what they were voting for?” and “how clearly does it explain how to use the ballot paper?”. A true usability test observes the test users to answer those questions – watch, don’t ask. People are very bad at explaining this kind of thing, often to the point of self-delusion. They’ll say things were easy when observation showed they had significant problems. When asked why they did something, they’ll invent entirely spurious explanations (not maliciously, but because they were asked and a plausible answer just pops into their head).
- It appears in this case that every test user tried every design of ballot, and then explained which one they preferred and why. This was a bad idea: from the second ballot onwards, they were more familiar with the process, so their results were biased. To get a fair view of which ballot design was easiest to use, each user should have tried only one design; the success rates of the designs could then be compared after the test was complete. (And then the best design could be modified and the test performed again with new test users to verify that the new design was better and not worse.)
- Look at section C, ‘Outcome’. In a true usability test this section would summarise the success rates for each design of ballot. It doesn’t; it just reports ‘preferences’ for one design over another. It’s full of phrases like ‘regarded as’, ‘felt that’, ‘thought that’. Which design was most successful – helped most people vote for the candidate(s) they wanted to vote for? It doesn’t say!
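To make the “one design per user” idea concrete, here is a minimal sketch of how such a test could be set up and scored. It is only an illustration: the function names, the participant labels and the design labels are all my own inventions, not anything from the Cragg Ross Dawson reports, and the outcomes would come from an observer recording whether each person produced a valid ballot for the candidate(s) they intended.

```python
import random
from collections import defaultdict

def assign_designs(participants, designs, seed=0):
    """Give each participant exactly one ballot design (hypothetical helper).

    Participants are shuffled, then dealt out round-robin, so group sizes
    differ by at most one and nobody sees a second design.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: designs[i % len(designs)] for i, p in enumerate(shuffled)}

def success_rates(assignment, outcomes):
    """Return the observed success rate per design.

    outcomes maps participant -> True if the observer saw them cast a valid
    ballot for their intended candidate(s), False otherwise.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for participant, design in assignment.items():
        totals[design] += 1
        successes[design] += bool(outcomes[participant])
    return {design: successes[design] / totals[design] for design in totals}

# Illustrative labels only; a real test would need far more people.
participants = [f"P{i}" for i in range(12)]
designs = ["A", "B", "C"]
assignment = assign_designs(participants, designs)
```

The report’s ‘Outcome’ section would then be the output of something like `success_rates`, one number per design, rather than a list of preferences.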
I did dig out some actual usability data from the reports:
- First report, section 2.1, “Initial impressions”: “on first sight of the ballot papers most voters looked initially at the list of parties and candidates; on the basis of observation by the moderators, few seemed to start at the top and read the instructions”. And that’s exactly what I would expect to see. It’s been proven time and time again: people don’t read instructions (there are always exceptions, but they are exceptions).
- Second report, Chapter 3: “despite the view that the designs were straightforward, some respondents made mistakes; 13 out of 100 ballot papers were unintentionally spoiled”. Followed by “it is worth noting that of the 13 respondents who spoilt their initial ballot papers, 9 realised their mistakes and corrected subsequent papers – many admitted they had voted before reading instructions carefully”.
That second point is damning. People said that the designs were straightforward, but the reality was different. That’s why true usability tests are so important. The fact that people corrected subsequent papers just confirms my point above: from the second ballot design onwards, they’re biased. Not to mention that in a real election they don’t get a second chance to vote.
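For a rough sense of how much uncertainty sits behind “13 out of 100”, the sketch below computes a 95% Wilson score interval for that proportion. This is my own back-of-envelope calculation, not something from the report, and it assumes the 100 papers can be treated as independent trials, which the report does not establish. The function name is hypothetical.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# 13 unintentionally spoiled papers out of 100, as quoted from the second report.
low, high = wilson_interval(13, 100)
print(f"spoil rate plausibly between {low:.1%} and {high:.1%}")
```

Under those assumptions the interval runs from roughly 8% to 21%, so even the optimistic end is a long way from “straightforward”.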
The goal of a ballot paper design is to allow voters to vote for the candidate(s) of their choice, and for that vote to be counted, as efficiently as possible. This is easy to test objectively, and to retest with improved designs, until there is sufficient confidence in the results. This wasn’t done. Market research isn’t usability testing.
In the actual election, we know that voters made marks on the ballot paper that were mostly, but not always, valid. How many people successfully voted for the candidate(s) of their choice? We have no idea.