If you teach someone how to fish…
The world of analytics has exploded with a vast array of new technologies, tools, systems, training, opportunities and business models. Most people understand that analytics is powerful and have heard stories about how companies like Amazon and Google use it to drive innovation and grow their organisations. However, when it comes to your own life, it can be difficult to understand exactly how you can use it. For some, analytics feels like something akin to magic wielded by ‘data scientists’ with PhDs and decades of experience.
The reality is that analytics is being democratised by the very same technology that’s made it valuable. This has given rise to self-service analytics. After years of investment in centralising data, maturing data governance and building user-friendly software, there is now a range of options for anyone to answer their own questions using sophisticated analytical techniques.
There are a lot of tools available for anyone to do their own analytics. Some are ‘one off’ tools, like Google’s Ngram viewer, which lets you investigate how frequently specific words have been used in books, or Twitter Analytics, which lets you look over the stats for your own account. Then there are broader tools that allow you to investigate a range of different data sources. While there are many examples, I want to focus on three across the broad spectrum of options. They are Watson Analytics, Tableau and Popily.
- Watson Analytics is cloud-based, lets you explore your own data by typing natural language questions, and is available with tiered payment options starting from free.
- Tableau has desktop, cloud and server-based options, is optimised for enterprise data sources, and has free and paid options.
- Popily is a brand new offering that will continue to mature through new releases. It’s cloud-based and free, but currently only uses publicly available data.
You may recognise the name ‘Watson’ as the artificial intelligence developed by IBM that won the quiz show Jeopardy in 2011. Watson was able to understand and respond to natural language questions, beating two previous champions. Today, Watson is able to analyse large corpora of unstructured data, allowing it to help manage decisions in lung cancer treatment, find new food combinations for recipes and make music recommendations.
The Watson AI that is able to do all this is not necessarily the same ‘Watson’ you have access to as part of IBM’s cloud-based Watson Analytics offering. Watson Analytics allows you to ‘ask’ questions about your data sets in natural language by typing them in. Watson Analytics responds with the options and graphs that it has determined will best answer your question.
While there appears to be no move to provide a desktop version of Watson Analytics, IBM’s enterprise-grade business intelligence offering, Cognos, is inheriting some of Watson Analytics’ natural language processing and visualisation aesthetics. For a great overview of the product, check out this video.
Tableau is best known as a visualisation tool. Its adoption within the business community continues to grow year on year. Tableau is a mature offering and recently released version 9. It can be deployed on your local machine, on your server or from the cloud. It allows you to create beautiful, interactive graphs that quickly and intuitively tell a story or provide insight into previously unintelligible data. To get a sense of the look and feel of Tableau’s visualisations, check out their gallery.
Popily is a brand new offering released by the team behind the analytics-themed podcast Partially Derivative, who also developed CrisisNET. Popily gives non-technical people the ability to explore data without needing to know code or statistics. As a brand new offering, the cloud-based Popily can only be used to explore publicly available data sets added to their platform. I believe the release of Popily is the start of a wave of new start-ups with a focus on self-service analytics, leveraging the rise of technologies like software-as-a-service, machine learning and scalable analytics.
Let’s test them
I’ve reviewed these offerings across the following areas:
- Signing up
- Loading data
- Finding insights
The data we’re looking at has been limited to what’s currently available through Popily’s public library of data sources. We’ll use Airbnb’s data set because they share their listing information through a Creative Commons license. In fact, you can explore the data through their own visualisations here (created using Leaflet and Mapbox).
All three offerings have a free option (so feel free to jump in yourself and have a play – Watson Analytics, Tableau Public and Popily). Creating accounts for all options is straightforward, although you’ll need to download software for Tableau.
For Watson Analytics, if you pay you’ll be able to analyse more data (more rows and columns), and there’s an enterprise version where you can allocate access across a tenancy. Actual prices and packages are constantly changing (at least at the time of writing), so check out the site for the latest prices.
Tableau has paid options designed for enterprises, which are structured around the number of licensed users. For companies this means you’ll be paying for both desktop versions and a server license so that you can privately share your visualisations. Specifying users can be a bit limiting if you’re an organisation that prefers to have flexibility or plans to manage security access through Tableau Server.
Watson Analytics allows you to upload your own data and, if you upgrade, you can also connect automatically to the Twitter API (they’ll grab a 10% sample of tweets from the last 6 months based on keywords). Adding data is as simple as clicking the add button from the login dashboard. The free account is limited to 50,000 rows and 40 fields. Adding an abridged version of the Airbnb data set took about 6 minutes over a medium speed NBN connection. Once uploaded, the first thing you’ll notice is that Watson Analytics has assessed the quality of your data. When you first click on your data set you’ll get a dialog box with a series of prompt questions.
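If you need to abridge a data set yourself to fit under the free account limits, something like the following Python sketch could do it. It is only an illustration: the function name `abridge_csv` and its parameters are my own invention, and it simply keeps the first rows and the columns you name rather than sampling in any clever way.

```python
import csv

MAX_ROWS = 50_000   # free account row limit quoted above
MAX_FIELDS = 40     # free account field limit quoted above


def abridge_csv(src_path, dest_path, keep_fields=None,
                max_rows=MAX_ROWS, max_fields=MAX_FIELDS):
    """Copy a CSV, keeping at most max_fields columns and max_rows data rows.

    keep_fields is an optional list of column names to prefer; names that
    are not in the source file are ignored. Returns (rows_written, fields_kept).
    """
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        available = reader.fieldnames or []
        if keep_fields:
            chosen = [name for name in keep_fields if name in available]
        else:
            chosen = list(available)
        chosen = chosen[:max_fields]

        with open(dest_path, "w", newline="", encoding="utf-8") as dest:
            # extrasaction="ignore" drops the columns we chose not to keep
            writer = csv.DictWriter(dest, fieldnames=chosen,
                                    extrasaction="ignore")
            writer.writeheader()
            rows_written = 0
            for row in reader:
                if rows_written >= max_rows:
                    break
                writer.writerow(row)
                rows_written += 1
    return rows_written, chosen
```

Calling `abridge_csv("listings.csv", "listings_small.csv", keep_fields=["property_type", "bedrooms", "weekly_price"])` would write a trimmed copy; the file names here are placeholders, so substitute your own.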
Tableau is optimised to analyse large data sets. Tableau Public can connect to Microsoft Excel, Microsoft Access and text files. While you are limited to 1 million rows of data, this is only a limit per connection. There is a file size limit of 1 gigabyte to save to the cloud. Adding data connections is easy: you can select by source type (e.g. Excel file, database, etc.), view the data once connected, and select how you want to import the fields.
There is currently no ability to load your own data sets into Popily. This is why we’re using the Airbnb public data set already added to Popily. They are extending invitations to companies to add their data now.
The focus of this section will be looking for relationships between the price of accommodation and the number of rooms.
As we saw when we first loaded our data set, Watson Analytics is already suggesting areas that we might want to investigate. If you select the Explore option you’ll be able to ask your natural language questions. In this instance I’ve asked ‘what is the relationship between bedrooms and weekly_price?’.
Exploring these options, I found that the visualisations are not all that useful initially. Watson Analytics likes to aggregate by average, which hides a lot of the information you want to see. However, clicking on the column function on the right allows you to select exactly which fields you want and how to graph them. Using this I created the following graph.
This graph is more meaningful. I can see the relationship I’d expect to see between price and the number of rooms. But now I can also see which properties attract a higher premium per room (in this instance it’s trains and boats). You can also quickly click on the property_type field and select other relevant fields to investigate, like Country and Neighborhood. Another powerful option available through Watson Analytics is its prediction engine. To see more about this feature check out some guides here and here.
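To make the point about averages concrete, here is a small Python sketch of the same comparison. The rows are made up purely for illustration (they are not real Airbnb listings), and the field names simply mirror the `bedrooms`, `weekly_price` and `property_type` fields mentioned above. It is not how Watson Analytics works internally; it just shows why splitting by property type reveals what a plain average hides.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows for illustration only, not real Airbnb listings
listings = [
    {"property_type": "Apartment", "bedrooms": 1, "weekly_price": 700.0},
    {"property_type": "Apartment", "bedrooms": 2, "weekly_price": 1100.0},
    {"property_type": "House", "bedrooms": 3, "weekly_price": 1500.0},
    {"property_type": "House", "bedrooms": 4, "weekly_price": 2200.0},
    {"property_type": "Boat", "bedrooms": 1, "weekly_price": 1400.0},
]


def average_price_by_bedrooms(rows):
    """Average weekly price for each bedroom count (the 'default' view)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["bedrooms"]].append(row["weekly_price"])
    return {beds: mean(prices) for beds, prices in groups.items()}


def average_price_per_bedroom(rows):
    """Average weekly price per bedroom, split by property type."""
    groups = defaultdict(list)
    for row in rows:
        if row["bedrooms"] > 0:  # skip listings with no bedrooms recorded
            groups[row["property_type"]].append(
                row["weekly_price"] / row["bedrooms"])
    return {ptype: mean(values) for ptype, values in groups.items()}


print(average_price_by_bedrooms(listings))
print(average_price_per_bedroom(listings))
```

In this toy data the one-bedroom average blends a cheap apartment with an expensive boat, while the per-room view by property type keeps them apart.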
Tableau is much more hands-on than Watson Analytics or Popily. This means that when you first add your data set, you’re not going to get any automatic recommendations. However, Tableau has done a lot behind the scenes. It’s categorised each of the Airbnb fields and determined whether they are dimensions or measures. This works in your favour when deciding how to visualise your information.
From this starting screen you can start to explore your data. To explore the relationship between beds and price, you grab the fields from the lists on the left and drag them across to the row and column shelves. Tableau will automatically select the scatter plot chart which, for this investigation, is exactly what we want. We can now decide which detail we want to split the plot by. Dragging across the property type field, and aggregating by average values, we can replicate a similar graph to the one we created in Watson Analytics.
From here there’s a lot of flexibility in what you can do with this information. You can add dimensions to change size, shape and colour. You can also quickly add filters and trendlines, forecast if you have time series data, or plot data on a map.
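If you’re curious what a trendline on a beds-versus-price scatter plot represents, the Python sketch below fits an ordinary least-squares straight line. That is one common way to draw a trendline; I’m not claiming it is how Tableau calculates its own. The function name `fit_trendline` and the sample numbers are made up for illustration.

```python
def fit_trendline(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Returns (slope, intercept). Raises ValueError if there are fewer than
    two points, mismatched lengths, or no variation in x.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up example: number of beds against weekly price
beds = [1, 2, 3, 4]
prices = [300.0, 500.0, 700.0, 900.0]
slope, intercept = fit_trendline(beds, prices)
print(f"each extra bed adds about {slope:.0f} to the weekly price")
```

For these made-up points the fitted line has a slope of 200 and an intercept of 100, so each extra bed adds 200 to the weekly price.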
When you first log in to Popily you’ll see a list of recent public data sources on the right. Click on Airbnb listings and you’ll immediately be presented with a set of charts. If you scroll to the bottom you’ll see that the data source has been prepopulated with 2,421 pages of charts. You can go through and explore these pages, but it makes more sense to start limiting your search to the fields that you are interested in.
Let’s start our search with the relationship between cost and the number of rooms. You can search by fields within the yellow bordered search dialog at the top of the screen. Select monthly price and number of beds. You’ll see the number of pages has been limited to 5 and you can start exploring charts more relevant to your investigation. You’ll be presented with a chart called Average monthly price by number of beds over date cost started on AirBnB. Once again, not particularly insightful. If you scroll down you’ll see Average monthly price of number of beds.
This graph is a little more useful as we can start to see the relationship: more beds, more expensive. However, from the example picture above you’ll notice an immediate limitation of Popily’s visualisation. There are no axis headings, no legend and no labels. In fact, other than the heading, the only way to know what you are looking at is to mouse over the graph elements. Even more annoying is that if you have multiple elements on a line graph it won’t label the values (you need to guess), and you need to be very precise with how you position your mouse to get the values.
I like Tableau because it provides the most control over how you load, model and visualise insights. However, the value of self-service analytics is giving anyone the power to do meaningful analytics. From the perspective of a non-technical user I’d recommend Watson Analytics. It’s a more mature offering than Popily and doesn’t present you with the learning curve required for Tableau. I’m looking forward to seeing how these offerings continue to grow and evolve. If you agree or disagree let me know below.