Criticisms of Michael Slepian’s Stanford study on poker tells and hand movements (published 2015)
Some places the study was featured.
The following is reposted from a 2015 piece I wrote for Bluff magazine. It was originally located at this URL but has become unavailable due to Bluff going out of business. I recently saw this study mentioned in Maria Konnikova’s book ‘The Biggest Bluff’, which reminded me of this piece; when I noticed it was offline, I wanted to share it again. A few notes on this piece:
The original title below was more negative-sounding than I liked; Bluff chose it. Also, if I could rewrite this piece now, I’d probably choose less negative-sounding phrasing in some places.
Regardless of the exact factors that might be at work in the found correlation, I realize it’s scientifically interesting that a significant correlation was found. But I also think it’s possible to draw simplistic and wrong conclusions from the study, and my piece hopefully gives more context about the factors that might be at work.
Image on left taken from Michael Slepian’s media page.
The Slepian Study on Betting Motions Doesn’t Pass Muster
A 2013 study¹ conducted at Stanford University by graduate student Michael Slepian and associates found a correlation between the “smoothness” of a betting motion and the strength of the bettor’s hand. In a nutshell, there was a positive correlation found between betting motions perceived as “smooth” and “confident” and strong hands. The quality of the betting motions was judged by having experiment participants watch short clips of players making bets (taken from the 2009 WSOP Main Event) and estimate the hand strength of those bets.
This experiment has gotten a lot of press over the last couple of years. I first heard about it on NPR. Since then, I’ve seen it referenced in poker blogs and articles and in a few mainstream news articles. I still occasionally hear people talk about it at the table when I play. I’ve had friends and family members reference it and send me links to it. It’s kind of weird how much attention it received, considering the tons of interesting studies that are constantly being done, but I guess it can be chalked up to the mystique and “sexiness” of poker tells.
The article had more than casual interest for me. I’m a former professional poker player and the author of two books on poker behavior: Reading Poker Tells and Verbal Poker Tells. I’ve been asked quite a few times about my opinion on this study, and I’ve been meaning to look at the study more closely and write up my thoughts for a while.
In this article, I’ll give some criticisms of the study and some suggestions for how this study (and similar studies) could be done better. This isn’t to denigrate the work of the experiment’s designers. I think this is an interesting study, and I hope it will encourage similar studies using poker as a means to study human behavior. But I do think it was flawed in a few ways, and it could be improved in many ways.
That’s not to say that I think their conclusion is wrong; in fact, in my own experience, I think their conclusion is correct. I do, however, think it’s a very weak general correlation and will only be practically useful if you have a player-specific behavioral baseline. My main point is that this study is not enough, on its own, to cause us to be confident about the conclusion.
I’ll give a few reasons for why I think the study is flawed, but the primary underlying reason is a common one for studies involving poker: the study’s organizers just don’t know enough about how poker works. I’ve read about several experiments involving poker where the organizers were very ignorant about some basic aspects of poker, and this affected the way the tests were set up and the conclusions that were reached (and this probably applies not just to poker-related studies but to many studies that involve an activity that requires a lot of experience to understand well).
Poker can seem deceptively simple to people first learning it, and even to people who have played it for decades. Many bad players lose money at poker while believing that they’re good, or even great players. In the same way, experiment designers may falsely believe they understand the factors involved in a poker hand, while being far off the mark.
Here are the flaws, as I see them, in this study:
1. The experimenters refer to all WSOP entrants as ‘professional poker players.’
This first mistake wouldn’t directly affect the experiment, but it does point to a basic misunderstanding of poker and the World Series of Poker, which might indirectly affect other aspects of the experiment and its conclusions.
Here are a couple examples of this from the study:
The World Series of Poker (WSOP), originating in 1970, brings together professional poker players every year (from the study’s supplemental materials)
These findings are notable because the players in the stimulus clips were highly expert professionals competing in the high-stakes WSOP tournament.
The WSOP Main Event is open to anyone and most entrants are far from being professional poker players. Categorizing someone’s poker skill can be difficult and subjective, but Kevin Mathers, a long-time poker industry worker, estimates that only 20% of WSOP Main Event entrants are professional (or professional-level) players.
This also weakens the conclusion that the results are impressive due to the players analyzed being professional-level. While the correlation found in this experiment is still interesting, it is somewhat expected that amateur players would have behavioral inconsistencies. I’d be confident in predicting that a similar study done on only video clips of bets made by professional poker players would not find such a clear correlation.
2. Hand strength is based on comparing players’ hands
This is a line from the study that explains their methodology for categorizing a player’s hand as ‘weak’ or ‘strong’:
Each player’s objective likelihood of winning during the bet was known (WSOP displays these statistics on-screen; however, we kept this information from participants by obscuring part of the screen).
They relied on the on-screen percentage graphics, which are displayed beside a player’s hand graphics in the broadcast. These graphics show the likelihood of a player’s hand winning, calculated by comparing it to the other players’ known hands. Because the bettor cannot see those hands, this is an illogical way to categorize whether a player believes he is betting a weak or strong hand.
If this isn’t clear, here’s a quick example to make my point:
A player has QQ and makes an all-in bet on a turn board of Q-10-10-8. Most people would say that this player has a strong hand and has every reason to believe he has a strong hand. But, if his opponent had 10-10, the player with Q-Q would have a 2.27% chance of winning with one card to come. According to this methodology, the player with the Q-Q would be judged as having a weak hand; if the test participants categorized that bet as representing a strong hand, they would be wrong.
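The 2.27% figure can be checked with a quick count of the unseen cards. The following is a minimal Python sketch, not anything taken from the study. The suits are my own assumption (the example above gives ranks only), and the rule that Q-Q wins only when the last queen falls is reasoned by hand rather than computed by a hand evaluator.

```python
from fractions import Fraction

RANKS = "23456789TJQKA"
SUITS = "shdc"

def unseen_cards(known):
    """Return every card of a 52-card deck that is not in `known`."""
    deck = [r + s for r in RANKS for s in SUITS]
    return [c for c in deck if c not in known]

# Suits below are assumed for illustration; only the ranks come from the example.
hero = ["Qs", "Qh"]               # queens full of tens on this board
villain = ["Ts", "Th"]            # four tens on this board
board = ["Qd", "Td", "Tc", "8s"]  # turn board, one card to come

remaining = unseen_cards(hero + villain + board)

# Hand-reasoned assumption: the only river card that lets queens full
# overtake four tens is the last queen, which makes four queens.
hero_outs = [c for c in remaining if c[0] == "Q"]

hero_equity = Fraction(len(hero_outs), len(remaining))
print(len(remaining), hero_outs, f"{float(hero_equity):.2%}")
# prints: 44 ['Qc'] 2.27%
```

One winning card out of 44 unseen cards gives 1/44, or about 2.27%, which is the number the broadcast graphic would show for a hand that any player would reasonably consider strong.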
It’s not stated in the study or the supplemental materials whether the experimenters accounted for obvious cases like this one, where relying on the percentage graphics would skew the results. It’s also not stated how the experimenters handled river (last-round) bets, when one hand has a 100 percent chance of winning and the losing hand has 0 percent (the only exception being a tie).
It’s admittedly difficult to come up with hard-and-fast rules for categorizing hand strength for the purposes of such an experiment. As someone who has thought more than most about this problem, for the purpose of analyzing and categorizing poker tells, I know it’s a difficult task. But using the known percentages of one hand beating another known hand is clearly a flawed approach.
The optimal approach would probably be to come up with a system that pits a poker hand against a logical hand range, considering the situation, or even a random hand range, and uses that percentage-of-winning to rank the player’s hand strength. If this resulted in too much hand-strength ambiguity, the experiment designers could throw out all hands where the hand strength fell within a certain medium-strength range. Such an approach would make it more likely that only strong hand bets and weak hand bets were being used and, equally important for an experiment like this, that the player believed he or she was betting either a strong or weak hand.
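To make that idea concrete, here is a rough Python sketch of one way it could work: estimate a hand's chance of winning against a random opposing hand, then label it strong or weak and throw out anything in between. This is my own illustration, not the study's method. The function names, the Monte Carlo sampling, the assumed suits, and the 35%/65% cutoffs are all hypothetical choices, and a random range is only the simplest stand-in for a "logical" range.

```python
import random
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "shdc"
DECK = [r + s for r in RANKS for s in SUITS]

def rank5(cards):
    """Score a 5-card hand as a tuple; a higher tuple is a better hand."""
    values = [RANKS.index(c[0]) + 2 for c in cards]
    # Sort rank groups by (count, rank) so pairs and trips come before kickers.
    groups = sorted(Counter(values).items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    ordered = [v for v, _ in groups]
    shape = [n for _, n in groups]
    flush = len({c[1] for c in cards}) == 1
    straight_high = 0
    if len(ordered) == 5:
        if ordered[0] - ordered[4] == 4:
            straight_high = ordered[0]
        elif ordered == [14, 5, 4, 3, 2]:  # A-2-3-4-5 plays as five-high
            straight_high = 5
    if straight_high and flush:
        return (8, [straight_high])
    if shape == [4, 1]:
        return (7, ordered)
    if shape == [3, 2]:
        return (6, ordered)
    if flush:
        return (5, ordered)
    if straight_high:
        return (4, [straight_high])
    if shape == [3, 1, 1]:
        return (3, ordered)
    if shape == [2, 2, 1]:
        return (2, ordered)
    if shape == [2, 1, 1, 1]:
        return (1, ordered)
    return (0, ordered)

def best_hand(cards):
    """Best 5-card score obtainable from 5 to 7 cards."""
    return max(rank5(combo) for combo in combinations(cards, 5))

def equity_vs_random(hero, board, trials=2000, seed=0):
    """Monte Carlo estimate of hero's win share against one random hand."""
    rng = random.Random(seed)
    unseen = [c for c in DECK if c not in hero and c not in board]
    score = 0.0
    for _ in range(trials):
        draw = rng.sample(unseen, 2 + (5 - len(board)))
        villain, runout = draw[:2], board + draw[2:]
        h, v = best_hand(hero + runout), best_hand(villain + runout)
        score += 1.0 if h > v else 0.5 if h == v else 0.0
    return score / trials

def classify(equity, weak_below=0.35, strong_above=0.65):
    """Label a hand, or return None to discard a medium-strength hand.

    The cutoffs are hypothetical; a real study would need to justify them.
    """
    if equity >= strong_above:
        return "strong"
    if equity <= weak_below:
        return "weak"
    return None

# The Q-Q example from above, with assumed suits.
eq = equity_vs_random(["Qs", "Qh"], ["Qd", "Td", "Tc", "8s"])
print(f"{eq:.1%}", classify(eq))
```

Measured this way, the Q-Q hand from the earlier example comes out as strong, which matches what the bettor would actually believe. Swapping the random opponent for a situation-specific range would mean replacing the sampling step with draws from that range.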
3. Situational factors were not used to categorize betting motions
When considering poker-related behavior, situations are very important. A small continuation-bet on the flop is different in many ways from an all-in bet on the river. One way they are different: a small bet is unlikely to cause stress in the bettor, even if the bettor has a weak hand.
Also, a player making a bet on an early round has a chance for improving his hand; whereas a player betting on the river has no chance to improve his hand. When a player bets on the river, he will almost always know whether he is bluffing or value-betting; this is often not the case on earlier rounds, when hand strength is more ambiguous and undefined.
The experimenters had no system for selecting the bets they included in the study. The usability of the clips was apparently based only on whether the clip met certain visual needs of the experiment: i.e., did the footage show the entirety of the betting action and did it show the required amount of the bettor’s body?
From the study:
Research assistants, blind to experimental hypotheses, extracted each usable video in each installment, and in total extracted 22 videos (a standard number of stimuli for such studies; Ambady & Rosenthal, 1993) for Study 2 in the main text.
Study 1 videos required a single player be in the frame from the chest-up, allowing for whole-body, face-only, and arms-only videos to be created by cropping the videos. These videos were therefore more rare, and the research assistants only acquired 20 such videos.
The fact that clips were chosen only based on what they showed is not necessarily a problem. If a hand can be accurately categorized as strong or weak, then it doesn’t necessarily matter when during a hand it occurred. If there is a correlation between perceived betting motion quality and hand strength, then it will probably make itself known no matter the context of the bet.
Choosing bets only from specific situations would have made the experiment stronger and probably would have led to more definite conclusions. It could also help address the problem of categorizing hand strength. For example, if the experiment designers had only considered bets above a certain size that had occurred on the river (when all cards are out and there are no draws or semi-bluffs to be made), then that would result in polarized hand strengths (i.e., these bets would be very likely to be made with either strong or weak hands).
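A selection rule like that is easy to state precisely. Below is a small Python sketch of what such a filter might look like. Everything in it is hypothetical: the `BetClip` fields, the half-pot cutoff, and the sample records are placeholders I made up for illustration, not data from the study.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class BetClip:
    clip_id: str
    street: str        # "preflop", "flop", "turn", or "river"
    bet_to_pot: float  # bet size as a fraction of the pot
    label: str         # "strong" or "weak", however hand strength was categorized

def select_clips(clips, min_bet_to_pot=0.5):
    """Keep only river bets at or above a size cutoff (the cutoff is an assumption)."""
    return [c for c in clips if c.street == "river" and c.bet_to_pot >= min_bet_to_pot]

def label_counts(clips):
    """Count strong and weak clips so an unbalanced sample is visible up front."""
    return Counter(c.label for c in clips)

# Placeholder records, invented purely to show the filter working.
clips = [
    BetClip("a", "flop", 0.4, "weak"),
    BetClip("b", "river", 0.8, "strong"),
    BetClip("c", "river", 0.2, "weak"),
    BetClip("d", "river", 1.0, "weak"),
]

chosen = select_clips(clips)
print([c.clip_id for c in chosen], dict(label_counts(chosen)))
# prints: ['b', 'd'] {'strong': 1, 'weak': 1}
```

Reporting the strong and weak counts alongside the selected clips would also make it obvious if the sample ended up lopsided.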
Also, the experiment’s method for picking clips sounds like it could theoretically result in all strong-hand bets being picked, or all weak-hand bets being picked. There is nothing in the experiment description that requires a certain number of weak hands or strong hands. This is not in itself bad, but it could affect the experiment in unforeseen ways.
For example, if most of the betting motion clips chosen were taken from players betting strong hands (which would not be surprising, as most significant bets, especially post-flop, are for value), then this could introduce some unforeseen bias into the experiment. One way this might happen: when a video clip shows only the betting motion (and not, for example, the bettor’s entire torso or just the face, as were shown to some study groups), this focus might emphasize the bet in the viewer’s mind and make the bet seem stronger. And if most of the hands-only betting clips were of strong-hand bets (and I…