I watched with great interest tonight as U.S. Olympian Nastia Liukin tied China's He Kexin on the uneven bars; both gymnasts had identical scores of 16.725. However, I was as surprised as everyone else to see Nastia ranked in 2nd place despite the tied score (tied to the thousandths place!). Nastia went on to take a perplexing second place (and the silver medal) in this event. That's no small feat, but also not the gold many feel she earned.
So, what the hell happened with the score? It turns out that a tie-breaking mechanism is mandated by the International Olympic Committee so that only one country wins each bronze, silver and gold medal; in this case, this mechanism favored China's Kexin.
I asked myself whether I could think of a fairer way to do the scoring, one that would properly align the incentives of the judges. (Among other issues people have with the judging, the rules state that no judge can be from any of the countries participating in the competition. This means that there are no judges from the very countries that are producing world-class athletes!)
My first thought was to try a second-price auction (a.k.a. a Vickrey auction). In this type of auction, the high bidder wins but pays the second-highest price. (These types of auctions are known to properly align incentives such that participants reveal their true valuation of the good at auction instead of bidding strategically.) In the gymnastics case, we'd want a two-sided Vickrey auction mechanism; we'd essentially replace the lowest and highest scores with the second-lowest and second-highest scores, respectively.
First, let's go over the current scoring mechanism (from the Fédération Internationale de Gymnastique's Code of Points). Each gymnast's score is the sum of a fixed difficulty score (the "A score") and a variable execution score (the "B score"). The difficulty score is determined by a panel of judges before the gymnast performs the event; in this case both Liukin and Kexin had a 7.7 on difficulty. The execution score is arrived at by deducting fractional points from the number 10. So, each gymnast could have maxed out at 17.7 if they had been absolutely perfect.
However, the judges scored each gymnast's execution as follows (sorted for each gymnast separately):
| Judge (Liukin) | Score | Judge (Kexin) | Score |
| Poland | 9.3 | Australia | 9.3 |
| Bulgaria | 9.1 | New Zealand | 9.1 |
| Australia | 9.0 | Poland | 9.1 |
| New Zealand | 9.0 | Brazil | 9.0 |
| Brazil | 9.0 | South Africa | 8.9 |
| South Africa | 8.8 | Bulgaria | 8.9 |
Dropping the lowest and highest execution scores for each gymnast, averaging the remaining four (9.025 for both) and adding the 7.7 difficulty score yields a 16.725 for both gymnasts, a tie. According to the rules (see the first article linked to above), the first tiebreaker calculation considers only the execution score, discards the highest and lowest values and takes the average of the remaining four. Because the difficulty scores were equal for both gymnasts, this resulted in another tie. The second tiebreaker involves further dropping the second-lowest score and averaging the remaining 3 scores. This final step breaks the tie; with the 7.7 difficulty score added back in, Liukin gets a 16.733 and Kexin a 16.767. Under these agreed-upon rules, Kexin is ranked before Liukin and they each get the original, tied score (16.725). That explains how they can be ranked differently but have tied scores.
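The arithmetic above can be checked with a short Python sketch. The function names are my own, and the tiebreaker logic follows only the description in this post, not the official Code of Points text; `Decimal` is used so the thousandths come out exactly.

```python
from decimal import Decimal

DIFFICULTY = Decimal("7.7")  # the "A score", identical for both gymnasts

# Execution ("B") scores from the table above, one per judge.
LIUKIN = [Decimal(s) for s in ("9.3", "9.1", "9.0", "9.0", "9.0", "8.8")]
KEXIN = [Decimal(s) for s in ("9.3", "9.1", "9.1", "9.0", "8.9", "8.9")]


def trimmed_mean(scores, drop_low=1, drop_high=1):
    """Average what is left after dropping the lowest and highest scores."""
    kept = sorted(scores)[drop_low:len(scores) - drop_high]
    return sum(kept) / len(kept)


def final_score(scores):
    """Difficulty plus the average of the middle four execution scores."""
    return DIFFICULTY + trimmed_mean(scores)


def second_tiebreaker(scores):
    """Also drop the second-lowest score, then average the remaining three."""
    return DIFFICULTY + trimmed_mean(scores, drop_low=2, drop_high=1)


thousandths = Decimal("0.001")
for name, scores in (("Liukin", LIUKIN), ("Kexin", KEXIN)):
    print(name,
          final_score(scores),
          second_tiebreaker(scores).quantize(thousandths))
```

Both gymnasts come out at 16.725 on the final score, and only the second tiebreaker separates them.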
However, if we use a double-sided second-value mechanism to make the scores at both ends less extreme, Liukin wins. That is, if we replace both of their lowest scores with their second-lowest scores, Liukin's lowest score, 8.8, becomes a 9.0 while Kexin's 8.9 stays an 8.9 (her two lowest scores are equal). We do the same with their highest scores, which replaces both gymnasts' 9.3 scores with a 9.1. Averaging all six adjusted scores and adding the 7.7 difficulty score, Liukin comes out on top with a 16.733 to Kexin's 16.717, for a minuscule difference of about 0.017.
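Here is a minimal Python sketch of that double-sided second-value calculation. The function name `second_value_mean` is made up for illustration, and the sketch assumes all six adjusted scores are averaged, which is what reproduces the numbers above.

```python
from decimal import Decimal

DIFFICULTY = Decimal("7.7")

# Execution scores from the table above.
LIUKIN = [Decimal(s) for s in ("9.3", "9.1", "9.0", "9.0", "9.0", "8.8")]
KEXIN = [Decimal(s) for s in ("9.3", "9.1", "9.1", "9.0", "8.9", "8.9")]


def second_value_mean(scores):
    """Replace the lowest score with the second-lowest and the highest
    with the second-highest, then average all of the scores."""
    ordered = sorted(scores)
    ordered[0] = ordered[1]     # lowest becomes second-lowest
    ordered[-1] = ordered[-2]   # highest becomes second-highest
    return sum(ordered) / len(ordered)


thousandths = Decimal("0.001")
liukin_total = (DIFFICULTY + second_value_mean(LIUKIN)).quantize(thousandths)
kexin_total = (DIFFICULTY + second_value_mean(KEXIN)).quantize(thousandths)
print(liukin_total, kexin_total)
```

Unlike the current rule, nothing is thrown away here; the extreme scores are pulled in toward their neighbors, so every judge still contributes to the average.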
I guess the real question is: would this have the same incentive-alignment characteristics as a Vickrey auction? Could it even, as Dan suggests in his comment, allow judges from the same country to score events in which their own gymnasts participate? It seems so. If a judge knows that their score will be replaced by another score if it's too extreme, they'll have no incentive to award an extreme score, and the score they end up awarding will be free of strategic influence. Of course, this assumes a highly competent and precise pool of judges, which has already been called into question (see the parenthetical note above about restrictions on which judges can score a given event).
UPDATE [2008-08-19T23:46:17]: Added bit from Dan and corrected calculation per commenter, "A Mom".
Your proposed mechanism and the current mechanism both imply the absence of a penalty for a referee who tries to game the system. Now, imagine a mechanism where you apply the Vickrey auction principle to every ranking given by a referee. If you give one perfect 10.0 and everybody else gets a 9.5, then the 10.0 disappears. Something like that could well allow a U.S. judge to rank a U.S. gymnast, but it would be awfully strange when none of the gymnasts learn their score until the last one has competed!
I did briefly consider a mechanism that worked across the field of gymnasts, but I felt that it would be weird if all of their scores could change after each successive performance. I mean, that competition looks nerve-racking enough! (especially for the Chinese who seem to spend their whole lives working for one moment).
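For what it's worth, one possible reading of the across-the-field idea can be sketched in a few lines of Python. This is only a guess at what was meant: the function name `cap_top_score`, the gymnast labels and the choice to cap only the top score are all assumptions of the sketch.

```python
def cap_top_score(scores_by_gymnast):
    """For a single judge, cap that judge's highest score at the
    second-highest score they gave across the whole field."""
    ranked = sorted(scores_by_gymnast.values(), reverse=True)
    second_highest = ranked[1]  # needs at least two gymnasts
    return {gymnast: min(score, second_highest)
            for gymnast, score in scores_by_gymnast.items()}


# Hypothetical judge who gives one perfect 10.0 and 9.5 to everyone else.
one_judge = {"Gymnast A": 10.0, "Gymnast B": 9.5, "Gymnast C": 9.5}
print(cap_top_score(one_judge))
```

Because the cap depends on every score the judge gives, no gymnast's score is final until the last routine has been judged, which is exactly the awkwardness discussed above.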
Consider a single judge. The judge has two mental rankings of the gymnasts: the "fan ranking" F, which measures how much the judge *wants* each gymnast to win, and the "judge ranking" J, which measures how well the judge thinks each gymnast actually *performed*. (To simplify the argument, let's assume that J is unpolluted by F, that is, that the judge's observation of the gymnasts' performance is not biased by the judge's desires.)
You want to find a mechanism that causes the judge to give scores that advantage the gymnasts according to J, while ignoring F. But no such mechanism can exist.
To see why, imagine that we have a mechanism that causes a judge to provide scores S(J,F) depending on the judge's two rankings. For our mechanism to work, S(J,F) must produce scores that advantage gymnasts according to J but not according to F (that is, according to the judge's honest appraisal of performance, and not according to who the judge wants to see win).
The problem is that a judge can always mentally swap F and J before deciding what to do. The judge behaves as if her favorite gymnasts performed the best, while pretending to be rooting for the gymnasts who actually did perform the best. She provides the output S(F,J) instead of the desired S(J,F). The result is a score that advantages gymnasts according to the judge's (hidden) rooting preference, and not according to the judge's (hidden) observation of performance.
You can't stop a judge from mentally swapping J and F, because the swap occurs entirely within the judge's head. It follows that no mechanism with the desired property can exist.
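The swap argument can be illustrated with a toy Python sketch. The mechanism below is entirely hypothetical (it simply rewards the first ranking it is given and ignores the second), and the gymnast labels are placeholders; the point is only that the mechanism sees the judge's behavior, never which hidden ranking is really which.

```python
def example_mechanism(judge_ranking, fan_ranking):
    """Hypothetical S(J, F): scores follow the first argument, best first,
    and ignore the second argument entirely."""
    top = 9.5
    return {gymnast: round(top - 0.1 * place, 1)
            for place, gymnast in enumerate(judge_ranking)}


J = ["A", "B", "C"]  # how the judge thinks they actually performed
F = ["C", "B", "A"]  # who the judge wants to win

honest = example_mechanism(J, F)   # the intended S(J, F)
swapped = example_mechanism(F, J)  # the judge mentally swaps: S(F, J)

print(honest)
print(swapped)
```

The swapped output is exactly what an honest judge with observation F and preference J would have produced, so nothing in the scores themselves can tell the two judges apart.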
The only way out of this mess is to rely on external indicators to predict a judge's fan ranking, for example by assuming that a judge will prefer gymnasts from her own country. This is essentially what the existing rules do.
So, for example, the harder the system tries to correct for a judge's fan ranking, the more the judge has an incentive to pretend to act as if she has a different fan ranking.
You can't just ask a judge to reveal J and F because they'll have an incentive to lie. They'll give high performance rankings to gymnasts they like, and they'll pretend not to like the gymnasts they actually do like.