Hey there everyone!
The Twenty20 Cricket World Cup is under way, and the final 10 teams have now been named and will be competing over the next 2-3 weeks for the world title.
What better excuse to dust off my web crawler and do another iteration of Cricket Team and Player stats? So yesterday I logged back into import.io and fired up the automated data extraction crawlers I had built previously, which source their information from http://www.espncricinfo.com/.
Now, my previous cricket crawling was focussed on Test matches, so I knew I needed to make some adjustments to extract data specific to Twenty20 matches. [For the uninitiated, Twenty20 is a compressed format of 20 overs per side that takes only about half a day to play, versus the traditional five-day game.] To my pleasant surprise, I found that import.io now has a new crawling method (still in beta, but working OK for me) that specialises in pulling multiple tables of data. WOW, now I didn’t have to limit myself to the stats for one type of game: I could crawl them ALL!
Each of the 10 teams has 15 or 16 players in its squad, and it turns out that the 155 players in the tournament have more than 18,700 games of cricket recorded between them. That’s around 120 games each, and a really rich source of data.
So with the introduction of a ‘Game Type’ dimension and a few tweaks to the calculations, I have now transformed my Ashes Players Tableau viz into a Twenty20 International World Cup viz.
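For anyone wondering what a ‘Game Type’ dimension buys you, here is a minimal Python sketch. It assumes the crawled tables have been stacked into one record per player per match, each tagged with its game type. The field names, players and numbers are hypothetical, and this is not the actual calculation behind the Tableau viz. It just shows stats being split per player *and* per game type instead of lumped together.

```python
from collections import defaultdict

# Hypothetical rows, one per player per match, as they might look after the
# separate stats tables are stacked together. Field names and values are
# illustrative assumptions, not real ESPNcricinfo data.
rows = [
    {"player": "Player A", "game_type": "Test", "runs": 45},
    {"player": "Player A", "game_type": "T20I", "runs": 30},
    {"player": "Player A", "game_type": "T20I", "runs": 12},
    {"player": "Player B", "game_type": "ODI", "runs": 88},
]

def summarise_by_game_type(rows):
    """Count matches and total runs for each (player, game_type) pair."""
    totals = defaultdict(lambda: {"matches": 0, "runs": 0})
    for row in rows:
        key = (row["player"], row["game_type"])
        totals[key]["matches"] += 1
        totals[key]["runs"] += row["runs"]
    return dict(totals)

summary = summarise_by_game_type(rows)
for (player, game_type), stats in sorted(summary.items()):
    # Runs per match (not a true batting average, which divides by dismissals).
    per_match = stats["runs"] / stats["matches"]
    print(f"{player:10} {game_type:5} matches={stats['matches']} runs/match={per_match:.1f}")
```

In Tableau the same idea is simply dragging the game type field onto a shelf or filter, so each player’s numbers can be viewed for one format at a time.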
Check it out on Tableau Public.