Even though I had been using Tableau since early 2011, pretty much all of my creations had been for the company that I worked for, using corporate data. I knew about Tableau Public, and had checked out plenty of vizzes of the day, but I didn’t have anything really interesting to publish that was not confidential.
That all changed around 6 months ago, when I was inspired to put together a viz for ‘The Ashes Cricket’ series.
Now if you are from England or Australia (and likely any other cricket-playing nation), you pretty much know what ‘The Ashes’ are all about. They are an institution that takes centre stage in the media every 2-3 years, and have been around since 1882 (except for a couple of pauses caused by very inconvenient world wars). For the uninitiated, The Ashes is a best-of-5 Test series, hosted alternately by the two countries, with the victors winning (or retaining) The Ashes. The trophy is a tiny urn containing the ashes of a wooden bail. The full story is available at this Wikipedia page. Anyway... Australia were getting hammered by The Poms (aka The English), and I was interested in checking out the key stats of the players in each team. And what better way to do that than to put together a Tableau viz?
Well, a lot of info analysts will tell you that the hardest part of any viz is getting hold of the data and getting it into the right shape. This was certainly true for my first pass. All I wanted was the key stats for the 22 players across both teams: innings, runs, wickets, etc. Alas, an online table of data with one row per player was unsurprisingly elusive, but during my discovery mission I quickly found that the best source of raw data was http://www.espncricinfo.com/ . Locating and manually extracting the numbers from 22 separate player profile web pages into a spreadsheet took me an hour or two. I spent a further morning or so knocking up some basic ‘ranking’ views, adding some calculations to create some interesting metrics, and creating a ‘select stat’ parameter to provide a good level of interactivity. I finished off with some custom shapes for the team logos.
The result was good enough to nominate for a viz of the day – and hey presto, a couple of days later my first public viz got over a thousand visits. All up – about a day’s worth of effort – but it wasn’t really effort – it was FUN!!! Creating and publishing information about my Passion (Cricket) was all it took for me to move from a purely paid professional viz creator, to an unpaid amateur! I was pretty happy 🙂
But I knew I could do better. Iterations were needed!
Of course, the problem with reporting current sports stats is that they quickly become out of date. After the following Ashes Test match, my data was stale, so I took it upon myself to update it. Now, I am definitely not a fan of manual, labour-intensive, repetitive tasks, so I probably spent far more time trying to automate my information extraction than it would have taken to update it manually several times – but that would have been just plain boring. I had already been experimenting with web crawling using Import.io for another passion of mine (punting on the horse races) [refer earlier blog], so I went to work setting up an automated extractor that would refresh not only my original 22 players, but also the extra players that had made their way into both teams since the last game. I then had two sets of stats, before the Fourth Test and after the Fourth Test, so I was able to experiment with a ‘variance’ viz showing shifts in KPIs due to the single game.
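The before-and-after comparison behind a ‘variance’ view can be sketched outside Tableau in a few lines of Python. This is only an illustration of the idea, not the calculation used in the actual viz: the player names, stat names and numbers are all invented, and `snapshot_variance` is a hypothetical helper.

```python
# Hypothetical career snapshots taken before and after a single match.
# Player names and numbers are invented for illustration only.
before = {
    "Player A": {"innings": 40, "runs": 1600, "wickets": 3},
    "Player B": {"innings": 25, "runs": 500, "wickets": 80},
}
after = {
    "Player A": {"innings": 42, "runs": 1655, "wickets": 3},
    "Player B": {"innings": 27, "runs": 512, "wickets": 86},
    "Player C": {"innings": 2, "runs": 30, "wickets": 1},  # not in first snapshot
}


def snapshot_variance(before, after):
    """Return the per-player change in each stat between two snapshots.

    A player missing from the earlier snapshot is treated as starting
    from zero. That is an assumption that only suits a debutant; for
    anyone else the earlier figures would need to be fetched as well.
    """
    changes = {}
    for player, stats in after.items():
        earlier = before.get(player, {})
        changes[player] = {
            stat: value - earlier.get(stat, 0) for stat, value in stats.items()
        }
    return changes


variance = snapshot_variance(before, after)
for player, delta in sorted(variance.items()):
    print(player, delta)
```

Keeping both snapshots, rather than overwriting the old one, is what makes this kind of comparison possible in the first place.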
The Aussies lost badly, and after letting a few months go by to allow the wounds to heal, I was ready for another iteration. High-level aggregate career stats, one row per player, were not really good enough for ‘this series’, ‘trending’ or ‘timeline’ style analysis. What if I could get hold of the raw data, one row per innings per player? Then I would be able to do all sorts of creative stuff. So back into espncricinfo.com I went, and sure enough, after a bit of digging I found the raw innings-by-innings stats that I was after. Much to the horror of my family, I then proceeded to invest a whole lot more of my ‘at-home’ time setting up two more Import.io web crawlers (one for batting innings stats, and one for bowling innings stats).
The resulting data was quite rich, including dates, places etc., but the layout of the tabular data meant that I was having to pull in some raw columns of data that were in need of some complicated parsing. Now, I have a rule that I personally try to stick to as much as possible: Never Do Any Data Manipulation in Excel that can be done in Tableau. After all, why create a repetitive manual process step when you can automate it with a built-in calculation or three? Here are two examples of the raw data – and what I needed to do to convert it.
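As a separate, purely illustrative sketch (not one of the two examples mentioned above, and in Python rather than Tableau calculations), this is the general kind of parsing a raw scorecard column can need. It assumes a hypothetical score column that uses the common cricket notation of a trailing `*` for a not-out innings and `DNB` for "did not bat"; the actual column layouts in the source data are not reproduced here.

```python
import re

# Assumed raw formats (not taken from the original data):
#   "45*"  -> 45 runs, not out
#   "12"   -> 12 runs, dismissed
#   "DNB"  -> did not bat
SCORE_PATTERN = re.compile(r"^(\d+)(\*?)$")


def parse_score(raw):
    """Split a raw score string into (runs, not_out, batted)."""
    match = SCORE_PATTERN.match(raw.strip())
    if match is None:
        # Anything non-numeric (e.g. "DNB") is treated as no innings played.
        return None, False, False
    runs = int(match.group(1))
    not_out = match.group(2) == "*"
    return runs, not_out, True


for raw in ["45*", "12", "DNB"]:
    print(raw, "->", parse_score(raw))
```

The same split into a numeric part and a flag could equally be written as calculated fields, which is the approach the rule above argues for.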
So, once I had individual innings, dates, places etc., I was able to give the interactor a whole lot of options for drilling down to subsets of innings, i.e. down from the high-level aggregate career or series stats. There was even enough detail for a ‘one player per page’ dashboard, complete with a map! (Always good sizzle value.)
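The payoff of one row per innings is that any aggregate can be rebuilt for any subset. A minimal Python sketch of that drill-down follows, assuming invented rows and a hypothetical `batting_summary` helper; the only cricket-specific rule it relies on is that a batting average is runs scored divided by the number of dismissals.

```python
from datetime import date

# Hypothetical innings-level rows; every value here is invented.
innings = [
    {"player": "Player A", "date": date(2020, 1, 10), "ground": "Ground X",
     "runs": 30, "dismissed": True},
    {"player": "Player A", "date": date(2020, 1, 12), "ground": "Ground X",
     "runs": 50, "dismissed": False},
    {"player": "Player A", "date": date(2020, 6, 26), "ground": "Ground Y",
     "runs": 10, "dismissed": True},
    {"player": "Player B", "date": date(2020, 6, 26), "ground": "Ground Y",
     "runs": 4, "dismissed": True},
]


def batting_summary(rows, player, ground=None, since=None):
    """Aggregate one player's innings, optionally filtered by ground or date."""
    subset = [
        r for r in rows
        if r["player"] == player
        and (ground is None or r["ground"] == ground)
        and (since is None or r["date"] >= since)
    ]
    runs = sum(r["runs"] for r in subset)
    dismissals = sum(1 for r in subset if r["dismissed"])
    # Batting average is runs per dismissal; undefined with no dismissals.
    average = runs / dismissals if dismissals else None
    return {"innings": len(subset), "runs": runs, "average": average}


career = batting_summary(innings, "Player A")
at_ground_x = batting_summary(innings, "Player A", ground="Ground X")
recent = batting_summary(innings, "Player A", since=date(2020, 6, 1))
print(career, at_ground_x, recent, sep="\n")
```

Starting from the pre-aggregated career rows, none of these filtered views could be recovered, which is why the innings-level extract was worth the extra crawling effort.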
So, over my Christmas holidays, with a beer in one hand and a mouse in the other (and of course the TV tuned to the cricket), I put my skills to use and came up with something pretty cool.
It all goes to show that when you are working with data relating to a PASSION that you have, then visualizing using Tableau can be a labour of love………
Here are some samples of the finished product.