Road to Rugby World Cup 2019: Rugby scores decomposition

With the Rugby World Cup 2019 Japan starting on 20th September, I thought I’d take a look at the tournament from a few different statistical angles. For this post I’ll be looking at the problem: given a rugby score, how can we decompose it into possible combinations of tries, conversions, penalties and drop goals?

Context

I have a dataframe of results for almost all professional and international rugby union matches since the 2012/13 season, more than 10,000 in total. This is nice in terms of the ‘breadth’ of the sample; in terms of depth, however, it’s a bit lacking! For each result I only have home/away team and home/away score, for example:

27/07/2019  New Zealand  South Africa  16  16

I was curious: is it possible to decompose the results into valid combinations of scoring methods? Then, perhaps as a second stage, can we estimate the probability of occurrence of each combination for a given score? I will look at the first question in this post; the second is next up in the series!

I’ve never seen a match of rugby before! What are the scoring methods you’re referring to?

TRY (5 points): awarded when an attacking player grounds the ball in the area at the end of the pitch (“in-goal area”).

CONVERSION (2 points): the team who has scored a try immediately gets to kick at goal for another 2 points before kick-off restart.

PENALTY GOAL (3 points): when an infringement is made, a penalty may be awarded to the other team who may then choose to take a penalty kick at goal.

DROP GOAL (3 points): a player may, at any time in play, drop-kick the ball over and between the posts.

PENALTY TRY (7 points): if a foul has stopped the attacking team from scoring then a penalty try is awarded, worth a full 7 points. Note: these happen fairly rarely, so I include this just for completeness and don’t refer to penalty tries from here on.

I took the videos above from World Rugby Laws of the Game which is a great resource if you want to learn more about the laws of the game.

Grouping the Elementary Scoring Methods

  • 3 points: penalty goal or drop goal.
  • 5 points: unconverted try (i.e. a try has been scored, but the conversion did not score the extra 2 points)
  • 7 points: converted try (i.e. a try has been scored, and the conversion succeeded in scoring the extra 2 points).

Starting Off the Analysis: Scores 0-7

Score  Penalties or drop goals (3pt ea)  Unconverted tries (5pt ea)  Converted tries (7pt ea)
0      0                                 0                           0
3      1                                 0                           0
5      0                                 1                           0
6      2                                 0                           0
7      0                                 0                           1

  • 0 is obviously a valid score.
  • 3, 5, 7 are obtained from elementary scores only.
  • 6 is obtained only from two penalties.
  • 1, 2 and 4 are not valid scores as they cannot be written as sums of 3, 5 and 7.
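
As a quick check of the list above, here is a minimal Python sketch (not the R script from the repo) that brute-forces which scores from 0 to 7 can be written as sums of 3, 5 and 7. The names `POINTS` and `is_valid_score` are my own, made up for illustration.

```python
from itertools import product

# Points for: penalty/drop goal, unconverted try, converted try
POINTS = (3, 5, 7)

def is_valid_score(score):
    """True if score == 3*pd + 5*ut + 7*ct for some non-negative counts."""
    # No count can exceed score // points, so the search space is tiny.
    max_counts = [score // p for p in POINTS]
    return any(
        3 * pd + 5 * ut + 7 * ct == score
        for pd, ut, ct in product(*(range(m + 1) for m in max_counts))
    )

# Collect the scores in 0..7 that cannot be reached.
invalid = [s for s in range(8) if not is_valid_score(s)]
print(invalid)  # [1, 2, 4]
```

Brute force is fine at this scale; the incremental rule in the next sections avoids repeating the search for larger scores.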

Onwards! Scores 8-10

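sc  pd  ut  ct
8   1   1   0
9   3   0   0
10  1   0   1
10  0   2   0

(sc = score, pd = penalties or drop goals, ut = unconverted tries, ct = converted tries)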

The only way forward is to score 3, 5, or 7 points! So new valid scores/combinations are {previous scores} + {3,5,7}. In the table above:

  • 8 is the row for 5 points but plus 1 penalty/drop goal.
  • 9 is the row for 6 points but plus 1 penalty/drop goal.
  • 10 is either a converted try plus 1 penalty/drop goal OR an unconverted try plus another unconverted try (hence two rows).

Scripting

With the general rule established, it is fairly easy to script it:

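for each i in 8:150
    if i - 3 in validscores, copy row(s) of i - 3, increment pen/dg field by 1 and set score to i
    if i - 5 in validscores, copy row(s) of i - 5, increment unconverted try field by 1 and set score to i
    if i - 7 in validscores, copy row(s) of i - 7, increment converted try field by 1 and set score to i
    drop any duplicate rows for score i (e.g. 10 = 7 + 3 and 10 = 3 + 7 give the same row)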

I wrote a script in R, which can be found on my github repo along with the results for scores up to 150 points.
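
For readers who prefer Python, the same rule can be sketched as below. This is an illustration under my own assumptions, not a port of the R script: the function name `build_decompositions` is made up, and the table is seeded from score 0 alone rather than from the hand-built rows for 0-7. Each row is a tuple `(pd, ut, ct)`.

```python
def build_decompositions(max_score=150):
    """Map each valid score to its list of (pd, ut, ct) combinations."""
    # Seed with score 0: no scoring events at all.
    table = {0: {(0, 0, 0)}}
    # (points added, index of the field to increment)
    steps = ((3, 0), (5, 1), (7, 2))
    for score in range(1, max_score + 1):
        combos = set()  # a set drops duplicate rows automatically
        for points, field in steps:
            # Copy every row of (score - points) and bump one field.
            for row in table.get(score - points, ()):
                new_row = list(row)
                new_row[field] += 1
                combos.add(tuple(new_row))
        if combos:  # scores with no rows (1, 2, 4) are simply left out
            table[score] = combos
    # Sort each score's rows so that the most penalties come first.
    return {s: sorted(rows, reverse=True) for s, rows in table.items()}

decomp = build_decompositions()
print(decomp[10])  # [(1, 0, 1), (0, 2, 0)]
print(decomp[16])  # [(3, 0, 1), (2, 2, 0)]
```

Using a set per score is one simple way to handle the duplicates that arise when the same combination is reached by adding the scoring events in a different order.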

What about the New Zealand vs South Africa match mentioned at the start?

sc  pd  ut  ct
16  3   0   1
16  2   2   0

Both teams scored 16 points. Both teams got there through one converted try and three penalties, corresponding to the first row of two possible ways to reach 16 points.

Are all scoring combinations equally likely?

No. For a given score, not all scoring combinations are equally likely: even if all three scoring methods were equally probable (1/3 each), they contribute different numbers of points, which makes the likelihoods of the combinations uneven!

The table below shows all of the possible scoring combinations relating to the score of 48 points.

sc  pd  ut  ct
48  16  0   0
48  12  1   1
48  11  3   0
48  9   0   3
48  8   2   2
48  7   4   1
48  6   6   0
48  5   1   4
48  4   3   3
48  3   5   2
48  2   7   1
48  2   0   6
48  1   9   0
48  1   2   5
48  0   4   4

It’s pretty unlikely that a team would ‘rack up’ so many points through 16 penalties/drop goals without scoring any tries! Equally, it’s unlikely a team would get there through tries alone. Intuitively, a team is most likely to get there through a mixture of the two.
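
To double-check how many combinations a score like 48 has, a direct enumeration is a handy cross-check on the incremental approach. The sketch below is in Python and is my own illustration (the name `combos_for` is hypothetical): it loops over the number of converted and unconverted tries and keeps a combination whenever the remaining points are a multiple of 3.

```python
def combos_for(score):
    """List every (pd, ut, ct) with 3*pd + 5*ut + 7*ct == score."""
    rows = []
    for ct in range(score // 7 + 1):
        for ut in range((score - 7 * ct) // 5 + 1):
            remainder = score - 7 * ct - 5 * ut
            # Whatever is left must be made up of 3-point kicks.
            if remainder % 3 == 0:
                rows.append((remainder // 3, ut, ct))
    return sorted(rows, reverse=True)

rows_48 = combos_for(48)
print(len(rows_48))             # 15
print(rows_48[0], rows_48[-1])  # (16, 0, 0) (0, 4, 4)
```

The two extremes are the all-kicks row `(16, 0, 0)` and the tries-only row `(0, 4, 4)`, with the mixed combinations in between.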

If we know the relative likelihood of occurrence of the three scoring methods to each other then we can calculate the probability of scoring combinations for a given score. That’s what we’ll be looking at in the next post!

Simulating the Six Nations 2019 Rugby Tournament in R: Final Round Update

In an earlier post I blogged how I had made a Monte Carlo simulation model of the Six Nations Rugby Tournament.  With the final round of the tournament approaching this Saturday, I decided to do a quick update.

Who can win at this stage?
Wales, England, or Ireland can still win.  Scotland, France and Italy do not have enough points at this stage to win.  Quite a good article from the London Evening Standard explains the detail.  The current league table is below.

Actual standings after round 4 out of 5

Who is playing who in the final round?

What is the simulation model based upon?
A random sample from a probability mass function for tries, conversions and penalties, which is combined with a pwin for each team, calculated based on the RugbyPass Index for both home and away teams.  If you want to know more, feel free to look at my previous post (linked above) or the R script (linked at the bottom).

What does the simulated league table look like after the final round?
Running a simulation for the final three games, and adding these results on to the actual points each team has achieved after round 4, we get the distribution of league points shown below.

Apologies: a box plot can be a bit odd for discrete data such as this.  Please forgive me!  If I had the time I would reform this into something like a stacked histogram which would be more accurate 🙂

It should be noted that, whilst the ‘standard’ scoring scheme applies for these final matches, i.e.

  • 4pt for a win, 2pt for a draw, 0pt for a loss.
  • plus 1 bonus pt for scoring 4 tries or more, regardless of win/lose/draw.
  • plus 1 bonus pt if a team has lost but by 7 game points or less.

…there are also 3 additional points awarded if a team wins the ‘Grand Slam’ (wins all of their matches).  Wales are the only candidate for this: they have won every match so far, and if they win their final match they get these extra points, which ensures they win the tournament.

This rule avoids the situation where a team loses one match but obtains maximum bonus points in its others, finishing up with more points overall than a team that has won every match but never obtained any bonus points.
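
The scoring scheme above can be sketched in a few lines of Python. This is an illustration, not the code used in the simulation; the function names and the match scores in the example are hypothetical, chosen only to show why the Grand Slam bonus matters.

```python
def match_league_points(points_for, points_against, tries):
    """League points one team earns from a single match."""
    if points_for > points_against:
        league = 4
    elif points_for == points_against:
        league = 2
    else:
        league = 0
    if tries >= 4:
        league += 1  # try bonus, regardless of win/lose/draw
    if points_for < points_against and points_against - points_for <= 7:
        league += 1  # losing bonus: lost by 7 game points or less
    return league

def tournament_points(match_results):
    """Total over (points_for, points_against, tries) tuples, plus the
    3-point Grand Slam bonus if every match was won."""
    total = sum(match_league_points(*m) for m in match_results)
    if match_results and all(pf > pa for pf, pa, _ in match_results):
        total += 3
    return total

# Hypothetical seasons: five wins without any bonus points, versus four
# bonus-point wins plus a narrow four-try defeat.
grand_slam = [(20, 10, 2)] * 5
rival = [(40, 10, 4)] * 4 + [(28, 30, 4)]
print(tournament_points(grand_slam))  # 23
print(tournament_points(rival))       # 22
```

Without the extra 3 points the unbeaten team would finish on 20, behind the rival's 22, which is exactly the situation the rule is designed to avoid.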

So then, what are the final standings likely to look like?
After having run a simulation of the final round, the results are below.

 

Wales are “firm favourites” to win the tournament.  England have a “reasonable chance”.  Ireland retain an “outside chance”.

How does all of this compare to expectations before the start of the tournament?


Ireland, England, and Wales were predicted to be in close contention.  Wales have outperformed the prediction (mainly due to beating England).  England have outperformed the prediction (mainly due to beating Ireland, and due to amassing a lot of bonus points).  Ireland have underperformed against the prediction (mainly due to losing to England, and then narrowly missing out on bonus points: they scored only 3 tries against Scotland, and lost to England by only 12 game points).

Scotland beat Italy with a bonus point victory, but they have only managed to pick up one bonus point in their other games.  Picking up points against England in their final match will be tough, so they are likely to underperform.  France will likely beat Italy and perform roughly as expected.  Italy are looking firm against the prediction of finishing bottom again this year (however imho they could be a team to watch in their final match, as they’ll be playing a presently disorientated France, at home in Rome).

It has been an interesting journey for me simulating sports tournaments over the past few months.  Monte Carlo approaches can help you see the wood for the trees in complex situations, which has applications not just in sport but in industry as well.

Maybe this has inspired you to have a go yourself?  If so, the code for this blog post is available via Git here.  Although if you wish to have a play or to adopt the code, the original version is much cleaner, available here.  Good luck!