Before going into battle, the Romans would knock an animal or two on the head and rummage around in their entrails looking for signs of who would win. They called it haruspicy, and they focused mainly on the liver: whether, for example, it was smooth and shiny or rough and shrunken.
Beware the fortune tellers: An idiot's guide to using data in business
The new priests of diagnosis and divination
The Economist Intelligence Unit (EIU) recently described data's potential "as a competitive differentiator, an engine for innovation and a driver of business value". That's supported by its survey of 914 executives from eight industries across 13 countries, 87 per cent of whom agree that data is now the most important competitive differentiator.
A senior executive of a large Crown entity told me recently that the demand for data analytics in NZ is insatiable. Leading US tech trainer O'Reilly Media cites a LinkedIn survey showing demand for data scientists is "off the charts", and trots out the "data scientist is the sexiest job of the 21st century" trope.
There is certainly high demand for senior data scientists and the like in NZ, and the number of graduates with degrees in mathematical sciences, behavioural sciences, economics and econometrics, and information technology has grown by 60 per cent over the past decade; some 8500 graduated in 2019. Most of the growth has been in information technology, with the other disciplines remaining steady.
But what are all these people actually doing?
In the beginning…
Blogger Sukanta Saha makes the point that data science has only emerged as a distinct field over the past decade. Like the haruspices of ancient times, the new priests of data analytics have since spread far and wide. We can be confident their message is actually scientific this time, and there is huge opportunity to improve our lives and the environment through better quality investment.
But are we clear what problems we are seeking to solve?
Sometimes it seems like the software comes first. One story I heard was of a company sold a beautiful performance dashboard that proved far too expensive to maintain, and didn't meet the needs of the board anyway. You hear that sort of story a lot.
How many angels can dance on the head of a pin?
Theologians in the Middle Ages argued about how many angels could dance on the head of a pin, and this became a metaphor for pointless debates. I was reminded of this when I discovered the debate about the definitions of business intelligence (BI) and data analytics and so on.
Leaders just want help making good decisions.
However, whether you care or not, it's probably important to understand the nomenclature. The website StitchData acknowledges that data-driven organisations use BI and data analytics interchangeably, but says they shouldn't, because BI is backwards-looking while data science predicts "what will or should happen in the future".
They aren't happy with that either though, and suggest three categories:
• Descriptive analytics: reports on what happened, like sales reports.
• Predictive analytics: forecasts, like sales forecasts but also Amazon's "buy" suggestions.
• Prescriptive analytics: advice about the likely outcomes of different actions, requiring advanced modelling.
They cite data strategist Mark van Rijmenam's view that descriptive analytics are the foundation of BI, predictive analytics are the base of big data, and prescriptive analytics are the future of big data.
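The three categories can be sketched with a toy example in Python (the sales numbers, the naive trend "model" and the stocking rule are all invented for illustration):

```python
# Descriptive, predictive and prescriptive analytics on a toy sales series.
monthly_sales = [100, 110, 125, 130, 145, 150]  # hypothetical data

# Descriptive: report what happened.
total = sum(monthly_sales)
average = total / len(monthly_sales)

# Predictive: a naive linear trend -- extend the average month-on-month change.
changes = [b - a for a, b in zip(monthly_sales, monthly_sales[1:])]
trend = sum(changes) / len(changes)
forecast_next = monthly_sales[-1] + trend

# Prescriptive: a simple decision rule layered on top of the forecast.
action = "increase stock" if forecast_next > average else "hold stock"

print(total, forecast_next, action)
```

Real prescriptive analytics would replace that one-line rule with far richer modelling, but the layering is the same: describe, then forecast, then advise.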
They don't give up on the BI vs data analytics schism though. To them, BI is something a manager with skills can do themselves with the right tools, but data analytics is so complex that it remains the sole domain of data scientists (I think this means people with postgraduate degrees in a data-related discipline?) – the new priesthood. It's also not just about what happened, but also about why it happened, a whole other level of entrail reading.
What are angels for anyway?
Christian Ofori-Boateng is a member of the Forbes Technology Council and agrees with the idea of BI looking back and data analytics looking forward, despite there being lots of confusion.
He's got a more fundamental problem with BI though:
"With the rise of software tools touting business intelligence and automation, the new norm is thinking that having these tools will make or break your business.
"Where most companies go wrong is attempting to adopt new technologies too fast across their entire organisation without having a plan in place for how they'll actually use the tools to solve a clearly defined problem."
This fits with what I've heard from dozens of organisations. Another major problem is having "data for Africa" and not knowing what's important.
To Ofori-Boateng, people and culture come first, and software comes last. His formula is straightforward:
• decide what the problem is
• understand the stakeholders
• identify the data you need and how to get it
• choose KPIs that best measure success
• establish systems and processes to turn data into action.
Tim Deskin, from US advisory firm Plante Moran, has a stark warning about expecting too much:
"The holy grail of a 'fully integrated one-system system' isn't realistic right now – new technologies are being developed constantly and the need for change is embraced slowly."
He advocates for a data-driven strategy, not a software strategy.
He also lists common problems:
• different databases and tools within an organisation, which make data analytics a lot harder
• poor data quality: things may be measured differently, and data governance may be weak
• complicated methods of pulling data, often with a related key-person risk
• serial disappointment when shiny new tools don't answer the questions you have.
The EIU targets two big challenges: better flow of internal data within an organisation so employees can extract more value; and seamless and secure sharing of data with "whole ecosystems of third-party business partners and customers".
They also lay out six commandments: mandate a data strategy from the top; share data with third parties; align and govern data with priorities like customer experience, business development and revenue growth; invest in data infrastructure; upskill internally; and use machine learning/AI for routine data processing.
Snakes and letters
Data Driven Science (DDS) train people in AI. They cite evidence that Python and R are the most popular programming tools for data science. To a layperson, they sound like twins: both are "amazingly flexible data analytics languages", free, open source, and developed in the early 1990s.
DDS is unequivocal: "for anyone interested in machine learning, working with large datasets, or creating complex data visualisations, they are absolutely essential".
Both have their disciples, and to a layperson, it becomes difficult to judge which might suit you best when both congregations start singing the praises of their favourite.
DDS attempts to explain each in contrast to the other. Python is underpinned by a philosophy of "code readability and efficiency". It is an object-oriented language which groups data and code into objects that can interact with and modify one another. They say it allows data scientists to complete tasks with better stability, modularity and code readability.
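That "objects that can interact with and modify one another" idea can be shown in a few lines of Python (the class and its readings are invented for illustration):

```python
# A minimal sketch of Python's object-oriented style: the data (readings)
# and the code that operates on it (methods) are grouped in one object.
class SensorSeries:
    def __init__(self, readings):
        self.readings = list(readings)

    def add(self, value):
        # Objects can modify their own state.
        self.readings.append(value)

    def mean(self):
        return sum(self.readings) / len(self.readings)

series = SensorSeries([2.0, 4.0])
series.add(6.0)
print(series.mean())  # the object carries both the data and the behaviour
```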
R is a procedural language which breaks a programming task down into a series of steps, procedures and subroutines. They claim this makes it relatively easy to understand how complex operations are carried out, a boon when building a dataset, though often at the expense of performance.
R, of course, was co-created by Ross Īhaka, a New Zealander, which is a great story in itself.
DDS analyses the pros and cons of each at the four stages of a data pipeline: collection, exploring for insights, modelling, and visualisation. The company concludes that Python is very versatile and easy to pick up, but R is designed specifically for data analysis and "you'll need to understand R if you want to make it far in your data science career". They recommend data scientists learn both.
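The four stages can be sketched end-to-end in a few lines of Python (the raw strings, the mean-based "model" and the text chart are all stand-ins for real pipeline steps):

```python
# A toy walk through the four pipeline stages DDS describes.
raw = ["12", "15", "", "18", "21"]           # 1. collection: messy input

values = [int(x) for x in raw if x]          # 2. exploration: clean and parse

mean = sum(values) / len(values)             # 3. modelling: a mean-based "model"

chart = ["#" * (v // 3) for v in values]     # 4. visualisation: crude text bars
print(mean, chart)
```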
An IBM blogpost says R is mainly used for statistical analysis, while Python provides a more general approach to data wrangling.
I don't think that's very helpful to our friendly layperson, but the examples they provide are: "You might use Python to build face recognition into your mobile API or for developing a machine learning application…Data scientists use R for deep statistical analysis, supported by just a few lines of code and beautiful data visualisations."
Sukanta Saha has the simplest explanation: "Python is a fully-fledged programming language…R is purely for statistics and data analysis."
Visualising another schism
Another schism – but let's not forget competition is good for consumers – is between Microsoft's Power BI and Tableau. They are both used for dashboards and other data visualisations which are a long way from old-school dotted graphs spun out of Excel.
Blogger Tamara Scott says that although there is a flood of new data visualisation tools, both Power BI and Tableau have the right mix of power, ease of use, brand recognition, and price.
To her, the difference is that Tableau is built for data analysts, while Power BI is better suited to a general audience that needs business intelligence to enhance their analytics.
There are loads of tools on the market though. Scott lists Sisense, Domo, Dundas and a bunch of others.
Trends in the datasphere
The rise of data analytics is not just about a new priesthood. Privacy is just one of many issues that come with living in a datasphere. Ben Lorica, from the O'Reilly Media website, lists seven trends.
The first is building a data culture, organisation and training ethos (recruiting and retaining data analysts is a challenge, but so is increasing data competence in executives, and establishing centres of excellence to keep up with best practice).
The second trend is migration to the cloud (and I'd add the emergence of "fog" and "mist" computing, but that's another story). The third is the need to continually invest in emergent data technologies. The fourth is to do the same for data security and privacy.
The fifth is the rapid spread of machine learning. Lorica says that most organisations do seem to be starting with the most important areas. For example, banks for risk, telcos for service. It's evolving fast though, with tools emerging that are targeted at data science platforms, metadata management, and data governance.
We don't hear quite so much about the internet of things (IoT) as we did several years ago, but cloud platforms, cheap sensors and machine learning mean it is very much the sixth trend.
Lorica's seventh trend is a meta trend – automation in data science itself, including the wonderfully named "hyperparameter tuning". I guess this is where data scientists, not just data analysts, really get to perform their rituals.
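Hyperparameter tuning, stripped of its mystique, is just trying settings and keeping the best. A toy Python sketch (the loss function stands in for training a real model and measuring validation error; real tools such as grid or Bayesian search follow the same pattern at scale):

```python
# Exhaustively try combinations of two "hyperparameters" and keep the
# combination that minimises a made-up loss function.
from itertools import product

def loss(learning_rate, depth):
    # Stand-in for: train a model with these settings, return validation error.
    return (learning_rate - 0.1) ** 2 + (depth - 3) ** 2

grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [1, 3, 5]}

best_params, best_loss = None, float("inf")
for lr, d in product(grid["learning_rate"], grid["depth"]):
    current = loss(lr, d)
    if current < best_loss:
        best_params, best_loss = (lr, d), current

print(best_params)  # the combination with the lowest loss
```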
Predicting the future
We know making better use of data will bring many benefits. The last decade has seen lots of technologies come together to enable great leaps in gathering and manipulating data (and I haven't even touched on AI), and we are certainly getting lots of new insights. It's also accelerated the growth of data analytics as a field in itself, generating ever more jobs.
There is lots of over-promising though, matched by executives overestimating the benefits and underestimating the time and cost of achieving them. Not everyone is going to learn to code, but it's important we all lift our game in understanding what data analytics is, so we can properly engage with data scientists, the new priests of examining entrails.