Nastia Liukin and scoring
I watched with great interest tonight as US Olympian Nastia Liukin tied China's He Kexin on the uneven bars; both gymnasts had identical scores of 16.725. However, I was as surprised as everyone else to see Nastia ranked second despite the tied score (tied to the thousandths place!). Nastia would go on to take a perplexing second place (and the silver medal) in this event. That's no small feat, but it's also not the gold many feel she earned.
So, what the hell happened with the score? It turns out that a tie-breaking mechanism is mandated by the International Olympic Committee so that only one country wins each bronze, silver and gold medal; in this case, this mechanism favored China's Kexin.
I asked myself if I could think of a fairer way to do scoring, one that would properly align the incentives of the judges. (Among other issues people have with the judging, the rules state that no judge can be from any of the countries participating in the competition. This means that there are no judges from the very countries that are producing world-class athletes!)
My first thought was to try a second-price auction (a.k.a. a Vickrey auction). In this type of auction, the high bidder wins but pays the second-highest price. (These types of auctions are known to properly align incentives such that participants reveal their true valuation of the good at auction instead of bidding strategically.) In the gymnastics case, we'd want a two-sided Vickrey-style mechanism; we'd essentially replace the lowest and highest scores with the second-lowest and second-highest scores, respectively.
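To make the second-price rule concrete, here is a minimal sketch of a sealed-bid auction. The function name, bidder names, and bid amounts are made up for illustration.

```python
def second_price_auction(bids):
    """Return (winner, price) for a sealed-bid second-price auction.

    `bids` maps a bidder name to that bidder's bid (hypothetical data).
    The highest bidder wins but pays the second-highest bid.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bids")
    # Rank bidders from highest bid to lowest.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # the second-highest bid sets the price
    return winner, price


# Illustrative bids only; these names and numbers are not from the post.
print(second_price_auction({"alice": 10, "bob": 7, "carol": 4}))
```

Because the winner's own bid never sets the price, shading a bid can only cost a bidder the item; it never lowers what they pay.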
First, let's go over the current scoring mechanism (from the Fédération Internationale de Gymnastique's Code of Points). Each gymnast's score combines a fixed difficulty score (the "A score") with a variable execution score (the "B score"). The difficulty score is determined by a panel of judges before the gymnast performs the event; in this case both Liukin and Kexin had a 7.7 on difficulty. The execution score is arrived at by deducting fractional points from 10. So, each gymnast could have maxed out at 17.7 if they had been absolutely perfect.
However, the judges scored each gymnast's execution as follows (sorted for each gymnast separately):
Liukin judge | Liukin score | Kexin judge | Kexin score
Poland | 9.3 | Australia | 9.3
Bulgaria | 9.1 | New Zealand | 9.1
Australia | 9.0 | Poland | 9.1
New Zealand | 9.0 | Brazil | 9.0
Brazil | 9.0 | South Africa | 8.9
South Africa | 8.8 | Bulgaria | 8.9
Dropping the lowest and highest execution scores for each gymnast, averaging the remaining four, and adding the 7.7 difficulty score yields 16.725 for both gymnasts, a tie. According to the rules (see the first article linked to above), the first tiebreaker calculation considers only the execution score, discards the highest and lowest values and takes the average of the remaining four. Because the difficulty scores were equal for both gymnasts, this resulted in another tie. The second tiebreaker involves further dropping the second-lowest score (the lowest of the remaining four) and averaging the remaining three scores. This final step breaks the tie; Liukin gets a 16.733 and Kexin a 16.767. Under these agreed-upon rules, Kexin is ranked ahead of Liukin, and they each keep the original, tied score (16.725). That explains how they can be ranked differently but have tied scores.
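The arithmetic above can be checked with a short sketch. It uses the execution scores listed earlier, stored in tenths of a point so the averages stay exact. The function and variable names are my own, and for readability the difficulty score is added back in at every step, even though the tiebreakers officially look only at execution.

```python
from fractions import Fraction

# Execution ("B") scores from the post, in tenths of a point.
LIUKIN = [93, 91, 90, 90, 90, 88]
KEXIN = [93, 91, 91, 90, 89, 89]
DIFFICULTY = 77  # both gymnasts' difficulty ("A") score, in tenths


def final_score(execution, difficulty, extra_low_drops=0):
    """Drop the highest score and the lowest (1 + extra_low_drops) scores,
    average what is left, and add the difficulty score.
    Returns the result in points as an exact Fraction."""
    kept = sorted(execution)[1 + extra_low_drops:-1]
    return Fraction(sum(kept), len(kept) * 10) + Fraction(difficulty, 10)


for name, scores in (("Liukin", LIUKIN), ("Kexin", KEXIN)):
    base = final_score(scores, DIFFICULTY)
    tiebreak = final_score(scores, DIFFICULTY, extra_low_drops=1)
    print(f"{name}: score {float(base):.3f}, second tiebreak {float(tiebreak):.3f}")
# Liukin: score 16.725, second tiebreak 16.733
# Kexin: score 16.725, second tiebreak 16.767
```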
However, if we use a double-sided second-value mechanism to make the scores at both ends less extreme, Liukin wins. That is, if we replace each gymnast's lowest score with her second-lowest score, Liukin's lowest score, 8.8, becomes a 9.0 while Kexin's 8.9 stays an 8.9 (her second-lowest score is also 8.9). We do the same with their highest scores, which replaces both gymnasts' 9.3 scores with a 9.1. Averaging all six adjusted scores and adding the 7.7 difficulty score, Liukin comes out on top with a 16.733 to Kexin's 16.717, a minuscule difference of about 0.017.
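Here is a rough sketch of that double-sided replacement, again with the execution scores in tenths of a point and with names of my own choosing. Statisticians would call the result a winsorized mean: the extremes are pulled in to their nearest neighbors instead of being thrown away.

```python
from fractions import Fraction

# Execution ("B") scores from the post, in tenths of a point.
LIUKIN = [93, 91, 90, 90, 90, 88]
KEXIN = [93, 91, 91, 90, 89, 89]
DIFFICULTY = 77  # both gymnasts' difficulty ("A") score, in tenths


def second_value_score(execution, difficulty):
    """Replace the lowest score with the second-lowest and the highest with
    the second-highest, average all six, and add the difficulty score."""
    adjusted = sorted(execution)
    adjusted[0] = adjusted[1]    # lowest -> second-lowest
    adjusted[-1] = adjusted[-2]  # highest -> second-highest
    return Fraction(sum(adjusted), len(adjusted) * 10) + Fraction(difficulty, 10)


liukin = second_value_score(LIUKIN, DIFFICULTY)
kexin = second_value_score(KEXIN, DIFFICULTY)
print(f"Liukin {float(liukin):.3f}, Kexin {float(kexin):.3f}, "
      f"difference {float(liukin - kexin):.3f}")
# Liukin 16.733, Kexin 16.717, difference 0.017
```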
I guess the real question is: would this have the same incentive-alignment characteristics as a Vickrey auction? Could it even, as Dan suggests in his comment, allow judges to score events in which gymnasts from their own country participate? It seems so. If a judge knows that their score will be replaced by another judge's score if it's too extreme, they'll have no incentive to award an extreme score, and the score they end up awarding will be free of strategic influence. Of course, this assumes a highly competent and precise pool of judges, which has already been called into question (see the parenthetical note above about restrictions on which judges can score a given event).
UPDATE [2008-08-19T23:46:17]: Added bit from Dan and corrected calculation per commenter, "A Mom".