Those who know me as a writer and speaker on data issues know that I have a bit of a thing about spreadsheets. They are regularly mentioned in my presentations and writing and my friends on Twitter know just how easily I am triggered into an Excel-fuelled rant.
In my previous post, I identified the over-reliance on spreadsheets as one of the three key indicators of the relationship that an organisation has with data. I frequently refer to the use of spreadsheets when working with clients; it’s a quick and valuable indicator and it can often give a route into identifying the deeper issues around data. But the issue with spreadsheets is complex and I think much of the broader discussion around them tends to drift unhelpfully down blind alleys.
It conquered the world
There’s no doubt that the world of data has been fundamentally changed by the development of the spreadsheet. It has been the great democratiser of data, putting powerful data functions on the desktops of millions across the globe. Spreadsheets are ubiquitous – there are around 750 million users of Microsoft Excel alone – and there is an ever-growing range of applications for the humble spreadsheet. The ability to use spreadsheets is becoming a basic life skill, taught in primary schools and considered essential for an ever-increasing range of jobs.
Faster pussycat! Kill! Kill!
As computers become ever more powerful, it seems that there is an arms race of spreadsheet functionality. My first encounter with spreadsheets was using Lotus 1-2-3 in the mid-1980s. This spreadsheet – running on a pre-Windows DOS computer – allowed data to be entered on a single grid and sums and basic functions to be performed on that data. If you changed the data you could recalculate all the formulas by pressing F7. Spreadsheets could be saved, printed and… er… that’s about it. The latest incarnation of Excel has sophisticated data manipulation and analysis capabilities, including Pivot Tables and Pivot Charts, as well as a variety of relational data functions and the ability to interact with external databases and services. It is a world away from those early spreadsheets in every way imaginable.
The creeping terror
Our endless love affair with spreadsheets is not without its critics, and many focus on the errors that are created in spreadsheets and the implications for organisations that sometimes put misplaced faith in them. The research that produced the often-quoted global spreadsheet error rate of 88% is now over a decade old and, while the error rate has a natural limit, I wonder if the ever-increasing complexity of our spreadsheets, combined with the emergence of things like multi-user real-time editing, is pushing that figure higher now. The causes of spreadsheet errors have been thoroughly analysed and explained by many, but there seems to be a larger truth that I think many are missing.
Modern spreadsheet systems deliver enormously powerful functionality through a user interface which is relatively simple and intuitive. There is therefore a tendency to equate the ability to ‘drive’ a spreadsheet with the ability to undertake high-quality data processing and analysis. However, the capabilities of the average spreadsheet user have not increased at the same rate as those of the spreadsheet systems, and it is this discrepancy that is driving the disconnect between our expectations of the value we can derive from our data and the reality.
Undertaking complex processing and analysis in a spreadsheet requires a very broad range of skills. Data skills such as data modelling and data quality assurance need to be combined with data manipulation and processing, especially in cases where spreadsheets mimic the relational data processing functionality that is found in a proper Database Management System. Statistical analysis needs to be combined with a rich understanding of the domain that is being modelled in the data in order for interpretation and insight to be meaningful and correct. It is the absence of these skills that often leads to spreadsheets being poorly created and used. It might be an uncomfortable truth, but the biggest problem with spreadsheets is the people who use them.
Andy Youell has spent over 30 years working with data and information, and with the systems and people that process it. Formerly the Director of Data Policy and Governance at the Higher Education Statistics Agency (HESA), and a member of the DfE Information Standards Board for a number of years, he now works with universities and colleges as a strategic data advisor. Follow him on Twitter @AndyYouell