Big Data, Small Data, Big Company, Small Company
I’ve written several articles on Big Data recently that discussed various risks and approaches. In my research I noticed a trend that is certainly gaining traction, namely data analytics for small businesses.
The literature, however, at times confuses the so-called “size” of the data with the size of the business. It is entirely possible for small companies to have access to, and need insight from, Big Data analytics. Conversely, many large corporations would do well to invest in ways to assess and analyze Small Data in their quest for better decisions.
Before I proceed with my impressions of IBM Watson, I want to touch on this issue – Big Data is generally well defined as a concept. We all know that it refers to large, complex data sets that require a lot of processing power and smart analytics to generate insights.
Small Data, on the other hand, is a concept that is still being defined. Yes, it refers to smaller data sets that are typically within the grasp of the human mind to assess without much reduction. What is sometimes unclear is whether this means an actual database that is small in terms of fields and entries, or whether Small Data is in fact the result of Big Data analytics (i.e. charts, graphs and tables, or Big Data made small).
It is my view that Small Data must refer to the actual data sources and not the results of analytics, so that terms like “Small Data Analytics” make sense (after all, analyzing the analysis should ideally be redundant).
Watson, the Data Sidekick
I recently received access to IBM’s Watson Analytics platform. Its purpose seems to be to make analytics simple for the average user who may not be a data scientist. It does this by intelligently analyzing data sets and deciding which types of statistical analysis would be most appropriate. With a few clicks any person can upload data in CSV format and have single-, double- or multiple-variable analyses at their disposal.
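To make that workflow concrete, here is a minimal sketch of what the manual equivalent looks like: loading a small CSV and running a single-variable summary plus a two-variable correlation. The data set, column names and numbers are all invented for illustration; Watson performs this kind of selection and computation automatically.

```python
import csv
import io
import math
import statistics

# Hypothetical stand-in for a Watson-style CSV upload (values invented).
raw = """region,ad_spend,sales
North,10,120
South,12,135
East,8,95
West,15,160
"""

rows = list(csv.DictReader(io.StringIO(raw)))
spend = [float(r["ad_spend"]) for r in rows]
sales = [float(r["sales"]) for r in rows]

# Single-variable analysis: basic descriptive statistics for one column.
print("mean sales:", statistics.mean(sales))              # 127.5
print("stdev sales:", round(statistics.stdev(sales), 2))  # 27.23

# Two-variable analysis: Pearson correlation between spend and sales --
# the kind of relationship a tool like Watson surfaces automatically.
mx, my = statistics.mean(spend), statistics.mean(sales)
cov = sum((x - mx) * (y - my) for x, y in zip(spend, sales))
corr = cov / math.sqrt(sum((x - mx) ** 2 for x in spend)
                       * sum((y - my) ** 2 for y in sales))
print("correlation:", round(corr, 3))                     # 0.994
```

The point of the sketch is the division of labor: the arithmetic is trivial to automate, but deciding *which* columns to relate, and what the result means, is where the human (or Watson) comes in.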
The Watson Dashboard currently allows the user access to two tools, namely Predict and Explain, and Explore Your Data. The former allows the user to create a workbook from a data set and automatically generates a variety of statistical analyses with appropriate graphical representations. The latter allows the user to compare different data columns to each other in the form of a filtering system. I was unable to properly use the exploration tool, however – some features didn’t seem to function properly.
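For readers unfamiliar with this style of exploration, the idea behind comparing columns through a filter can be pictured with a short stand-in: filter a record set on one column and summarise another. All names and values here are invented; this is not Watson’s implementation, just the concept.

```python
# Hypothetical records standing in for an uploaded data set.
records = [
    {"region": "North", "sales": 120},
    {"region": "South", "sales": 135},
    {"region": "North", "sales": 98},
    {"region": "South", "sales": 150},
]

# "Explore" one column (sales) filtered by another (region).
north = [r["sales"] for r in records if r["region"] == "North"]
south = [r["sales"] for r in records if r["region"] == "South"]

print("North average:", sum(north) / len(north))  # 109.0
print("South average:", sum(south) / len(south))  # 142.5
```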
The Predict and Explain analyses generate colorful charts and graphical representations that allow the user to assess the data and create some predictive outcomes. For each chart or graph there is also a key insight that helps the user understand what the data means.
The intent is clear – Watson is designed to allow average users to connect data sources and create high impact stories that explain the statistical results of the data sets.
Watson is still in beta which means that several functions have not yet been released. The three most critical functions excluded are the ability to connect to an external database, the ability to export analyses and access to the tool that allows the authoring of dashboards and storyboards.
Watson has definite limitations in terms of the size of the data source (around 5 MB), and it currently relies on flat, static data files. There is a yet-to-be-released option to connect to external sources, which may bring it closer to a simplified Big Data solution; however, it is unclear whether real-time analysis of data feeds will be allowed.
In spite of a good array of colorful graphs and charts, Watson is also somewhat limited in its graphical representations. The risk is that companies will start using this tool and end up using the same graphics to explain different facets of the data. Not only could one presentation look like another (which is boring); if not carefully differentiated, one analysis could look like another, creating some confusion.
I enjoyed the fact that I could get a different perspective on some of my research with just a few clicks, but I soon realized that there is a risk in how Watson selects its analyses. Granted, it seems quite adept at selecting appropriate statistical methods, but I have yet to find a way to select a different method on a different combination of columns that would help me assess what the data is really saying. As such Watson is currently still limited, and this limitation will be reflected in the outputs.
There is also a risk associated with “average users” – individuals who may not have a background in quantitative research. Watson is very simple to use, and one can imagine how users without a data science background could find justification for decisions without really knowing what the analytics mean. Even with the pretty pictures, the implications of a statistical analysis cannot be automatically generated. In other words, as much as data can be automatically subjected to statistical analysis, interpretation is still a human skill that requires contextual insight and industry-related experience.
Something About Qualitative Analysis
With all of this fuss about Big Data we often forget that quantitative methods have limitations when it comes to the “human factor.” Predictive analytics relating to human behavior has value, that much is certain, but if it is not coupled with qualitative analysis your decisions will at some point suffer surprises. A simple example is predictive analysis of the stock market – sentiment is a qualitative concept, yet technical analysts still rely primarily on quantitative methods to predict fluctuations (a direct consequence of sentiment). Often this works, but when it doesn’t the losses can be severe.
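To see how completely sentiment drops out of such methods, consider a minimal sketch of a classic technical signal: a short versus a long simple moving average over a price series. The prices are invented and the strategy is deliberately naive; the point is that nothing qualitative enters the calculation anywhere.

```python
# Invented price series -- purely for illustration, not market data.
prices = [100, 102, 101, 105, 107, 106, 110, 108, 112, 115]

def sma(series, window):
    """Simple moving average over the last `window` values."""
    return sum(series[-window:]) / window

short = sma(prices, 3)   # recent trend
long_ = sma(prices, 7)   # longer trend

# Classic crossover rule: short average above long average reads as "buy".
signal = "buy" if short > long_ else "sell"
print(round(short, 2), long_, signal)  # 111.67 109.0 buy
```

Every input is a number; the model is blind to the news story, rumor or mood that will actually move the next price. That blind spot is exactly where qualitative analysis has to pick up.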
A Final Word on Watson
Watson as a platform has a long way to go, but I am positive about its potential and that it will find its niche. Data scientists need to take note and sign up for the beta if they can – presuming wide adoption and an increased degree of flexibility (and perhaps advanced features), knowledge of Watson could become a major skill for future jobs.