Tennis Rating and Prediction Model and Method
The Rating Method
The first step in creating a model that can predict tennis matches as accurately as possible is to produce a rating system. Whilst many punters would use the official ATP tennis rankings, there is a lot of evidence to suggest that the ATP tennis ratings are not a good indication of current form.
Clarke (1994) produced what is called a SPARKS method (short for set-point-marks) which calculates the margin of victory of a tennis match. It gives each player one point per game won and a certain amount per set won. The difference between the two players is the marks, or the margin of victory.
For example, suppose a set is given a weighting of 2, and Lleyton Hewitt defeated Pete Sampras 6-2 6-7 6-4. Hewitt won 18 games, Sampras 13. Hewitt also won two sets to Sampras’s one, thus giving Hewitt an 18 – 13 + (2 * 2) – (1 * 2) = 7 mark victory.
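The SPARKS arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation as described here: the function name `sparks_margin` and the way the score is passed in (a list of games won per set) are my own assumptions, not part of Clarke's paper.

```python
def sparks_margin(score, set_weight=2):
    """Return the SPARKS margin from the first player's point of view.

    `score` is a list of (games_first_player, games_second_player) tuples,
    one tuple per set. `set_weight` is the number of marks awarded per set won.
    """
    games_a = sum(a for a, _ in score)
    games_b = sum(b for _, b in score)
    sets_a = sum(1 for a, b in score if a > b)
    sets_b = sum(1 for a, b in score if b > a)
    # One mark per game won, plus `set_weight` marks per set won.
    return (games_a - games_b) + set_weight * (sets_a - sets_b)


# Hewitt d. Sampras 6-2 6-7 6-4 with a set weighting of 2.
print(sparks_margin([(6, 2), (6, 7), (6, 4)]))  # 7
```

A negative result simply means the first player lost by that many marks.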
A margin of victory is important when looking at a rating system. Bedford and Clarke (2000) found that the SPARKS method produced significantly better predictions than the official ATP tennis ratings. Their method was relatively simple, though it was amongst the first to look at this area. They didn’t look at surfaces and many other factors; however, it was a definite step in the right direction.
In the Australian Open 2002, there was a lot of media coverage about why most of the top seeds were eliminated from the event early on. In a newspaper article in the Australian Financial Review, written by the head of Champion Data and myself, we outlined that a lot of the top seeds were not in the best of form, especially on the hard court surface.
So why aren’t the ATP ratings a good predictor for tennis matches?
There are a number of reasons why.
- The ATP ratings do not take into consideration the quality of the opposition.
If Lleyton Hewitt defeats Pete Sampras in the first round of the Australian Open, he would gain just as much as if he had defeated Jakub Herm-Zahlava.
- As mentioned previously, the ATP ratings do not take into consideration the margin of victory that the player won or lost by.
Lleyton Hewitt would gain just as many points in defeating Pete Sampras 6-0 6-0 6-0 as he would if he had defeated him 6-4 2-6 7-6 0-6 10-8.
- The ATP ratings give points for players who win due to the opposition retiring.
If Lleyton Hewitt was behind 2-6 2-4 and then Pete Sampras retires, Hewitt would receive points for progressing to the next round.
- The ATP ratings system gives players points for walkovers.
If Pete Sampras picked up an injury between matches and could not front up for the next game against Hewitt, Hewitt would receive points for progressing to the next round despite not playing a game.
- The ATP ratings system does not take the playing surface into account.
Many players play better or worse on certain surfaces and this has to be taken into consideration when looking at a player’s performance. Defeating Pete Sampras on grass is a much better win than defeating him on clay.
- Probably most importantly, the ATP ratings system does not look at current form.
According to the ATP ratings system, the last 12 months of tennis is taken into consideration for their ratings. This means that a player’s performance 12 months ago has just as much weighting as what that player did last week. As a good example of this, Gustavo Kuerten was ranked number 2 at the end of 2001, however he had lost his last seven games.
- The ATP ratings system does not take into consideration other important tournaments like the Davis Cup, challenger and qualifying results.
Our Ratings System
Our ratings system takes all of this into consideration and awards each player a rating based on the SPARKS method. If, for example, Hewitt is rated 13 and Sampras 9, then we would expect Hewitt to defeat Sampras by 4 marks. If Sampras wins, or loses by less than 4 marks, then his rating would go up and Hewitt’s down. If Hewitt wins by more than 4 marks, then his rating would go up and Sampras’s down. Each player also has a surface rating, which is added to their overall rating to get an expected marks victory on each particular surface. These ratings are combined using linear regression. Based on each week’s performance, a player’s ratings might go up or down, which is shown here for men and here for women. Likewise their surface rankings might also change, as shown here for men and here for women. This information on players’ preferred surfaces is only given by rank. The amount by which a player’s rating changes after a match is found by a statistical method called ‘exponential smoothing’, which changes the ratings by a percentage after each match.
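To make the updating idea concrete, here is a minimal sketch of an exponential smoothing step on the overall ratings. It is not the model's actual code: the smoothing constant `alpha` of 0.1 is a made-up value, the symmetric update (one player gains exactly what the other loses) is an assumption, and surface ratings are left out to keep it short.

```python
def update_ratings(rating_a, rating_b, actual_margin, alpha=0.1):
    """Move both ratings a fraction `alpha` of the way towards the result.

    `actual_margin` is the SPARKS margin from player A's point of view.
    `alpha` is a hypothetical smoothing constant, not the value used on the site.
    """
    expected_margin = rating_a - rating_b
    surprise = actual_margin - expected_margin  # positive if A beat expectations
    return rating_a + alpha * surprise, rating_b - alpha * surprise


# Hewitt (13) is expected to beat Sampras (9) by 4 marks but wins by 7.
hewitt, sampras = update_ratings(13, 9, actual_margin=7)
print(f"{hewitt:.1f} {sampras:.1f}")  # 13.3 8.7
```

If Hewitt had won by only 2 marks, the same function would move his rating down and Sampras's up, which matches the behaviour described above.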
Interestingly, a couple of other factors were also considered. One is the head to head approach. Many punters look at past head to heads to predict what is going to happen in a current match. However it was found that previous head to head matches have no influence on the current game. The only reason previous head to head matches might have been one sided is because of the players’ form at the time, the surface they were playing on etc.
Tennis is one of the few sports where it was found that home ground advantage has little effect. One of the reasons for this is that a player’s favoured surface takes most of this into account. Most countries specialise in one particular surface. For example England - grass, USA and Australia - hard, most of Europe and South America - clay. Therefore one player’s home ground advantage is really a surface advantage, which is taken into consideration in the model.
The theory that players will not come back well after playing a five set match, however, does have an effect. It has been shown statistically that players do tend to play below their usual performance if they last played a five set match in grand slam tournaments. This is soon to be added to the model, once the grand slam tournaments are up and running.
Head to Head Probabilities
The expected outcome can easily be converted to a probability. For example, if Hewitt was expected to win by 7 marks, he would have a greater probability of winning than if he was expected to win by 5 marks. The conversion of these expected marks to a probability is done by another statistical procedure called ‘binary logistic regression’.
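As a rough illustration of the conversion, the fitted logistic curve can be written as a small function. The `intercept` and `slope` values below are placeholders I have chosen for the example; the real coefficients come from fitting the regression to past results and are not given here.

```python
import math


def win_probability(expected_margin, intercept=0.0, slope=0.15):
    """Convert an expected SPARKS margin into a win probability.

    `intercept` and `slope` are hypothetical coefficients standing in for
    the values a fitted logistic regression would supply.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * expected_margin)))


for margin in (0, 5, 7):
    print(margin, f"{win_probability(margin):.3f}")
```

With a zero intercept an expected margin of 0 gives an even money chance, and the probability rises as the expected margin grows.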
Calculating the Tournament Probabilities
Given that we have the probabilities for head to head matches, we can simulate the entire tournament. The total number of possible outcomes of a tournament is incredibly high. For a small 32 player tournament, there are a total of 31 matches, and therefore there are 2^31 possible outcomes, which is over two billion. Given this, the best way to calculate the probabilities for a tournament is via computer simulation. Each match is simulated and the players’ ratings are adjusted after each round. This is an important step because if a little known player with a rating of 2 made the final, his rating at that time would be a lot higher than 2 due to his good form throughout the tournament so far.
So the tournament is simulated approximately 10,000 times depending on the size and the number of matches remaining.
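The simulation idea can be sketched as follows for a simple knockout draw. Everything specific here is an assumption for illustration: the player names and ratings are invented, the logistic `slope` is a placeholder, and the `form_boost` rule (a winner gains more rating for a less likely win) is only a crude stand-in for the rating adjustment the model actually makes between rounds. The draw size is assumed to be a power of two.

```python
import math
import random


def win_probability(margin, slope=0.15):
    # Hypothetical logistic curve from expected margin to win probability.
    return 1.0 / (1.0 + math.exp(-slope * margin))


def simulate_tournament(ratings, draw, rng, form_boost=1.0):
    """Play one knockout tournament and return the champion's name."""
    current = dict(ratings)  # copy, so every run starts from the same ratings
    alive = list(draw)
    while len(alive) > 1:
        next_round = []
        # Adjacent players in the draw meet each other.
        for a, b in zip(alive[0::2], alive[1::2]):
            p_a = win_probability(current[a] - current[b])
            if rng.random() < p_a:
                winner, p_win = a, p_a
            else:
                winner, p_win = b, 1.0 - p_a
            # Assumed form adjustment: bigger boost for a less expected win.
            current[winner] += form_boost * (1.0 - p_win)
            next_round.append(winner)
        alive = next_round
    return alive[0]


def title_probabilities(ratings, draw, n_sims=10_000, seed=0):
    """Estimate each player's chance of winning the tournament."""
    rng = random.Random(seed)
    wins = {name: 0 for name in draw}
    for _ in range(n_sims):
        wins[simulate_tournament(ratings, draw, rng)] += 1
    return {name: count / n_sims for name, count in wins.items()}


# Invented four player draw: A plays B, C plays D, winners meet in the final.
ratings = {"A": 13.0, "B": 9.0, "C": 6.0, "D": 2.0}
print(title_probabilities(ratings, draw=["A", "B", "C", "D"]))
```

The more simulations are run, the closer the win counts get to the true tournament probabilities implied by the head to head model.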
Using the model to gamble
Originally this was not the purpose of the model; it was just a matter of public interest. However, seeing whether the model is profitable is an important part of any statistical model that predicts sporting outcomes.
Why is this? Well, it’s quite simple. An important distinction to make is that bookmakers do not set odds based on the probabilities of winning, but rather on what the general public thinks the probabilities of winning are. Their main concern is to balance the books based on what the average Joe Blow believes.
Therefore the model could be shown to be a statistically better predictor than the general public if it is profitable against bookmakers’ odds. Given below is a step by step method of how one can gain an advantage over bookmakers and have the potential to make money by gambling on tennis matches.
The Gambling Technique
Converting Bookmakers Odds to Probabilities
By converting a bookmaker's odds to probabilities we can directly compare these to our own probabilities to see if there is a possibility of an advantage in a gamble. The inverse of the bookmaker's price is the expected probability.
For example in late March 20001 the bookmakers gave Fernando Gonzalez (CHI) odds as high as $4.50 to defeat Pete Sampras (USA). This means that the bookmakers (or the punters) believe that Fernando Gonzalez (CHI) has approximately a 1 / 4.50 = 22.2% chance of winning. We had predicted a 71.7% chance for Pete Sampras (USA) to win the match; consequently Fernando Gonzalez (CHI) has a 28.3% chance. This probability is higher than what the bookmakers have Fernando Gonzalez (CHI) at, and therefore this is where we have an advantage over the bookies and would gamble on Fernando Gonzalez (CHI) for this game.
Put simply, we have a 28.3% chance of returning $4.50 from a $1 bet, so on average our $1 bet will return 0.283 * $4.50 = $1.27. Hence an expected profit of 27%.
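The two calculations above are simple enough to check in code. The function names are my own; the numbers are the Gonzalez and Sampras figures from the example.

```python
def implied_probability(price):
    """Probability implied by a bookmaker's decimal price (its inverse)."""
    return 1.0 / price


def expected_return(our_probability, price, stake=1.0):
    """Average amount returned per bet if our probability is correct."""
    return our_probability * price * stake


print(f"{implied_probability(4.50):.1%}")     # 22.2%
print(f"${expected_return(0.283, 4.50):.2f}")  # $1.27
```

Whenever our probability is higher than the implied probability, the expected return on a $1 bet is more than $1.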
How much of an advantage do we have?
The advantage over the bookmakers, or overlay, is calculated from our probability and the bookmaker's price by the following formula:
Overlay = [Our probability * Bookies Price] – 1
Therefore in this game, we had an overlay of (0.283 * 4.50) - 1 = 27.3%
This overlay is very large, and represents a very good betting opportunity, even though we still believe Pete Sampras (USA) will win the match.
It is important though not to bet on any match that has a small overlay. To take into account some error, one should only bet on matches where a large overlay is recorded.
As shown on the graph below, betting on all games in which you have an overlay does not maximise your bank balance. Instead it shows that gambling on all matches which show an overlay of approximately 7% or greater will maximise your bank balance. You should therefore ignore any matches where the overlay is below approximately 7%.
We will not have an advantage over the bookmakers in every match, however. If the bookmaker's price is similar to our probabilities then there is no room for an advantage. This is mainly due to the fact that the bookmaker takes a margin of 5% to 8% per game.
Something that is often asked is why, when you fix your minimum overlay at a high level, say 20%, your bank balance does not get even higher. Surely you have a greater advantage, so your bank balance should increase further! The answer is quite simple: there are not as many betting opportunities. The higher the minimum overlay, the less betting you do. To show this, below is a graph outlining the %profit vs. the %overlay. A percentage profit of 10% means that for every $100 you bet, you will on average gain $10. Notice in the graph below that when the minimum overlay level is higher, the percentage profit is just as high or greater. However, do note that when the percentage overlay increases to over 30%, the %profit hovers around zero and can in fact be negative. This is no doubt because, if there is such a high overlay on a player, the odds may reflect another factor that the model is not taking into consideration. For example, the player might have recently been injured during a doubles match. Therefore some punters may not wish to gamble on games where the percentage overlay is very large, or over 30%.
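Put together, the overlay rule and the two cut-offs discussed above can be sketched as a simple filter. The 7% and 30% defaults are the approximate figures from the text, the upper limit is optional and a matter of personal preference, and the function names are my own.

```python
def overlay(our_probability, price):
    """Overlay = [our probability * bookmaker's price] - 1."""
    return our_probability * price - 1.0


def worth_betting(our_probability, price, min_overlay=0.07, max_overlay=0.30):
    """True if the overlay falls inside the band we are prepared to bet on.

    The default thresholds are the approximate values quoted in the text.
    """
    edge = overlay(our_probability, price)
    return min_overlay <= edge <= max_overlay


print(f"{overlay(0.283, 4.50):.4f}")  # 0.2735
print(worth_betting(0.283, 4.50))     # True
print(worth_betting(0.50, 2.05))      # False, overlay is only 2.5%
```

Setting `max_overlay` to a very large number switches the upper limit off for those who prefer to bet on every large overlay.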
How much should be bet?
Even if the odds are on your side, you still need to guard against losing all your bank. We can work out mathematically the percentage of your bank you should bet to maximise your rate of growth.
The amount to bet is given using a system called the 'Kelly' method, which was devised by Kelly in 1956. It uses the bookmaker's price, your probability and the amount of overlay that you have in determining how much to gamble. It is given by the following formula:
Fraction of bank to bet = Overlay / (Bookies Price – 1)
For example, suppose we have $100 in the bank set aside. We should gamble $100 * 0.273 / (4.50 – 1) = $7.80 on Fernando Gonzalez (CHI) to defeat Pete Sampras (USA).
Some people, however, see this as a very aggressive style of gambling with a large risk. To counterbalance this, the Half-Kelly system, where half the amount of the full Kelly is bet, can be used. The Half-Kelly will decrease the risk involved, but could also decrease the profits received. Likewise a quarter or third Kelly is also seen as appropriate.
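For those who want to compute the stake themselves, here is a minimal sketch using the standard Kelly fraction, which is the overlay divided by (price - 1). The function name and the `fraction` parameter for Half or Quarter Kelly are my own choices. Using the unrounded overlay it gives about $7.81 for the Gonzalez example, which agrees with the $7.80 quoted above up to rounding.

```python
def kelly_stake(bank, our_probability, price, fraction=1.0):
    """Amount to bet under the Kelly method.

    `fraction` scales the bet: 1.0 is full Kelly, 0.5 is Half-Kelly, etc.
    Returns 0.0 when there is no overlay, since Kelly says not to bet.
    """
    edge = our_probability * price - 1.0  # the overlay
    if edge <= 0:
        return 0.0
    return bank * fraction * edge / (price - 1.0)


print(f"${kelly_stake(100, 0.283, 4.50):.2f}")                # $7.81
print(f"${kelly_stake(100, 0.283, 4.50, fraction=0.5):.2f}")  # $3.91
```

Because the stake is a proportion of the current bank, bets shrink after losses and grow after wins.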
When shouldn't we bet?
Although some will say this is up to the individual, I believe there are a few times when one shouldn't bet on an event. One of these is when a player has not played many games on a particular surface. When the matches are shown on our website, there is also a column that shows 'Games (p1)' and 'Games (p2)', which refer to how many games player one and player two have played on the surface that the tournament is being played on. I would recommend that you do not bet on a match in which one of the players has played ten or fewer games on that surface. The reason for this is that when a player starts his first game, he is given a surface rating of zero, because we have little information about this player. He might be a clay court specialist, but our ratings will not reflect this. Therefore it is only recommended to gamble on a player once he has played several games on that surface and has developed a satisfactory surface rating.
Another time in which one probably should not gamble is when a player is returning from injury or from a long lapse from the game. Our model unfortunately does not take this into consideration, as some players can perform well after a break, whereas others need to work back into the game. Likewise I would refrain from gambling if a player retired recently from a singles or doubles match. The information about how many games a player has played in the last two months will shortly become available on the web page.
How have we gone so far?
So far results are extremely pleasing, with us making a constant profit. As shown below, since we started gambling we have theoretically made a substantial increase in our original $500 bank balance. The full Kelly method quite easily has the greatest bank balance, but notice that it tends to increase and decrease reasonably dramatically in certain situations. Notice that at times the bank balance under the full Kelly method has halved, and sometimes decreased to ¼ of what it was. This shows the risk involved with the full Kelly method. However, over time it has generally increased to a substantial profit, despite the fact that losing all your bank balance is a risk.
As shown further below, the half and quarter Kelly methods have less exaggerated ups and downs, and hence the amount at risk is lower. The final bank balance tends not to be as high, but there is a constant gradual increase in bank balance, which is pleasing to most. The constant Kelly method is used only for determining what percentage overlay is best to bet on, because it is independent of the current bank balance.
I hope that you enjoy my website and get the most out of it for yourself. Whether you’re a punter, or just interested in tennis, or maybe interested in sports statistics and mathematics, I’m sure that you will get something interesting out of this website.
If you have any questions, please feel free to email me at firstname.lastname@example.org. Otherwise happy punting!