## Value Driver Modelling – Part 3: Calculating value using VDM

My previous posts discussed the basics of value driver modelling (VDM) and how to build a well-designed VDM-based model. This post explores a practical implementation of VDM through a value calculator.

I’ve seen the introduction of value calculators transform the way organisations think, plan, track and report on the benefits of their projects. One of the clearest examples I’ve been involved in was developing a value calculator to model the benefits of combining two adjacent coal mines.

The owner of these two coal mines faced a challenge: how to decide whether to combine two multi-billion-dollar mines based on the potential value of 74 different but interrelated benefits. To provide certainty for the owner, the relationships between all the benefits were explicitly mapped in a benefits dependency network diagram. These relationships were then combined with the 74 benefits and modelled through a sequence of value driver trees. The resulting report showed which benefits worked best in combination with each other and what the different permutations of options were, all without double counting interrelated benefits. In this particular project, we identified \$1.4 billion in profit over 20 years.

While I have seen this tool be useful in mining operations, I’ve also seen it used successfully in other sectors, like retail, and in other functions, like procurement. I’m currently developing a VDM tool based on elements of organisational psychology to value the benefits of human capital investments, an area that has historically performed poorly when it comes to building robust business cases for change.

# What is a value calculator?

A value calculator is a tool that uses value driver trees (VDTs) to dynamically calculate the benefits arising from improvements. It allows VDM to be used across all stages of the benefits realisation process and is particularly useful when quantifying and prioritising benefits.

# What does a value calculator do?

A value calculator’s primary purpose is to value improvements across the operations of an organisation. It can provide a valuation that incorporates the constraints and dependencies unique to that organisation’s specific operations. The tool is able to value improvements as either individual changes, or as part of larger projects or programmes of change.

The development and delivery of a value calculator, as part of a project, could happen in a number of different ways. Below is an example of a generic project plan that delivers a value calculator for an organisation. The process itself is agnostic of any industry or company and could be used to deliver either small or much larger value calculators. The key elements to be mindful of with this plan are that it allows sufficient time for good design and that it builds the tool in modules so that it can be progressively validated and tested. While this plan is focussed only on delivering a value calculator, a value calculator can equally form part of a much larger project focussed on benefits realisation and cost reductions.

# The seven components of a value calculator

So now that we know what a value calculator is and what it does, let’s look at the components that go towards making it work.

1. Parameters are effectively the value drivers for the tool. That is, each parameter represents a box on your VDT.
2. Baseline data populates a realistic, “current state” of the operations you are modelling. This data is effectively the value of the inputs that go inside the boxes on your VDT.
3. The list of improvements changes the baseline values of the parameters. Changing these values reflects an improvement occurring within the operations.
4. The value stream is a series of connected VDTs, each flowing into the next. These VDTs are effectively the engine that drives the calculation of the benefit.
5. The benefits are the output of the VDTs. They could be expressed as an increase in profit, a decrease in costs or an increase in production.
6. Dependency groups are groups of improvements whose outputs are connected to each other. A dependency group applies a maximum, minimum, average or cumulative rule to a set of parameters, which means the calculator can determine how the improvements in a complex programme interact with each other.
7. The last component is the application of constraints to the benefits. As I have discussed previously, these constraints are built into the VDTs to limit an organisation’s ability to create value based on the reality of their operations.

# How to navigate a simple value calculator

Now we’ll go through an example value calculator. You can download an example created in Excel here (note that I don’t recommend building value calculators in Excel, as it’s difficult to update data, conduct advanced statistical analysis or quickly and intuitively report findings).

Open the workbook and on the second spreadsheet you’ll see a value chain for an open-pit coal mine. This will be the context for our example (keep in mind that VDM can apply to different sectors and different functions, so think through how this modelling could be used for your context).

You might recognise part of the diagram below as the components of a simple value calculator. I’ll go through and show you how these components translate into the example Excel document.

• The grey boxes show the structure of the simple value calculator. Each box is a spreadsheet in the workbook and each spreadsheet produces an output that feeds into the next spreadsheet.
• The select improvements spreadsheet allows a user to turn improvements on and off in the calculator.
• The define improvements spreadsheet lists all of the parameters and how they change with each improvement.
• The determine dependencies spreadsheet defines how different improvements interact with each other.
• The allocate improvements spreadsheet lists out the individual changes to every value driver in your VDT.
• The calculate benefit spreadsheet contains the VDTs that calculate the benefit of the changes.
• The report benefits spreadsheet summarises the benefits in a chart.

Now that you have an overview of the calculator, let’s have a look at it.

# Overview of the haul coal process

So that you can better understand the context of this tool, I’ll briefly go through the VDTs for the haul coal process. The VDTs are located on the Calculate Value spreadsheet where I have used a series of VDT tables. I have created a table for each piece of equipment operating at this mine. This is quite a small mine with only two trucks and one loader.

There are three main parts to these VDTs.

1. Time is calculated by starting with the total number of hours in a year, then subtracting all the hours lost to maintenance, repairs and other delays. The result is called Operating Time: the number of hours a piece of equipment operates productively.
2. The productivity section describes how often a piece of equipment completes a cycle of tasks. In this context, a cycle means a truck completing a dump run or a loader filling a truck.
3. Payload represents the average amount of material a piece of equipment moves.

We can multiply all these value drivers together to calculate the total amount of material moved per year. The calculator applies improvements to the value drivers in these tables in order to calculate the improved total.
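If you’d like to see the arithmetic outside Excel, the Python below is a minimal sketch of one equipment table: Operating Time multiplied by cycles per hour and payload. It isn’t the workbook’s formula set; the function names, the lost-hour categories and the truck figures are hypothetical, chosen only to make the multiplication easy to follow.

```python
# A minimal sketch of one equipment table in the haul coal VDT.
# Function names and all figures are hypothetical, not the workbook's values.

HOURS_PER_YEAR = 365 * 24  # 8,760 calendar hours


def operating_time(lost_hours: dict[str, float]) -> float:
    """Time: calendar hours minus every category of lost hours."""
    return HOURS_PER_YEAR - sum(lost_hours.values())


def tonnes_per_year(lost_hours: dict[str, float],
                    cycles_per_hour: float,
                    payload_tonnes: float) -> float:
    """Multiply the three parts together: time x productivity x payload."""
    return operating_time(lost_hours) * cycles_per_hour * payload_tonnes


# Hypothetical truck: lost hours by cause, dump runs per operating hour, tonnes per run
truck_lost_hours = {"maintenance": 900.0, "repairs": 400.0, "other_delays": 1460.0}
truck_tonnes = tonnes_per_year(truck_lost_hours, cycles_per_hour=2.5, payload_tonnes=180.0)
print(f"{truck_tonnes:,.0f} t/year")  # 2,700,000 t/year
```

Running the same function for each truck and the loader gives the per-machine tables that the calculator then improves.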

# How does the value calculator know what to improve and by how much?

Benefits extraction describes the process of identifying, defining and allocating improvements to your VDTs. You can see this process work by using the tool.

1. First, select some of the improvements from the options we have available on the select improvements spreadsheet. For example, select the two projects under the reduced casual idle time due to better people management hypothesis.
2. Next, on the define improvements spreadsheet, you can see which VDT parameters match up with each of the improvement projects, as well as the extent of each improvement. In this tool, all improvements are expressed as percentages. You’ll also notice that the values of the projects you’ve activated flow through to the calculating value column. This is also where you define the dependencies between improvements.
3. The next stage of the calculator is the allocate improvements spreadsheet. Here you’ll see that the improvement percentages from the parameters defined in the last spreadsheet are allocated to each relevant piece of equipment. These percentage improvements are applied against a baseline to calculate the new, improved performance. Because baseline productivity can differ between machines, the improvements are applied at the machine level (you could instead assume that improvements apply at a fleet level and roll up all the parameters defined here to that level). These improved figures are what flow through our VDTs.
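The allocation step can be sketched as scaling each machine’s baseline by its allocated percentage. This is a minimal sketch under a few assumptions of mine: improvements are stored as signed fractions (negative for drivers that should fall, such as standby hours), drivers without an allocated improvement pass through unchanged, and the machine and driver names are hypothetical.

```python
# A minimal sketch of allocating percentage improvements to machine-level baselines.
# Assumptions: improvements are signed fractions (negative = driver should fall),
# and all machine names, driver names and figures are hypothetical.

Drivers = dict[str, float]

baseline: dict[str, Drivers] = {
    "truck_1": {"standby_hours": 600.0, "cycles_per_hour": 2.5},
    "truck_2": {"standby_hours": 750.0, "cycles_per_hour": 2.3},
}

allocated: dict[str, Drivers] = {
    "truck_1": {"standby_hours": -0.10},
    "truck_2": {"standby_hours": -0.10, "cycles_per_hour": 0.05},
}


def apply_improvements(baseline: dict[str, Drivers],
                       allocated: dict[str, Drivers]) -> dict[str, Drivers]:
    """Return new per-machine drivers, each scaled by (1 + its improvement)."""
    improved = {}
    for machine, drivers in baseline.items():
        changes = allocated.get(machine, {})
        improved[machine] = {
            driver: value * (1 + changes.get(driver, 0.0))
            for driver, value in drivers.items()
        }
    return improved


improved = apply_improvements(baseline, allocated)
print(improved["truck_2"])
```

Returning a new dictionary rather than editing the baseline in place keeps the baseline available for comparison, which is what lets the calculator report the change.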

# How is the value of the improvements calculated?

The baseline information in this tool provides the basis from which to apply our improvements. The baseline represents a point in time for our model and should best reflect the context within which our improvements are applied. In this tool, we have baseline information for every value driver.

It’s best to establish your baseline by analysing historical data, although you may also need to fill in gaps using technical specifications, expert assumptions and your own observations. An important limitation of this particular value calculator is that it is based entirely on averages: the statistical variance inherent in operations of any description is not considered here. It is possible to design a more sophisticated model that uses VDTs driven by statistical variance instead of just the mean.

# How does the value calculator avoid double counting benefits?

On the define improvements spreadsheet is the Dependency Group column. This column allows the value calculator to group improvements together when their outcomes are dependent upon each other. In this instance, we see that the Stand in Operators and Optimise Hot Seating improvements both affect the same outcome, Truck Operating Standby. The dependency group is called Idle Time Management.

We can modify the behaviour of this dependency group on the determine dependencies spreadsheet, which lists the various behaviours the group can exhibit. In this instance the min attribute has been selected and is being used to calculate the benefit of the improvements. The min attribute means we expect only the minimum benefit from the combination of the two improvements to flow through the operation. Depending on the change, any of the four behaviours (max, min, average or cumulative) could be appropriate.
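Under the hood, a dependency group collapses several improvements that hit the same driver into one value before it reaches the VDT. Here is a minimal Python sketch of the four behaviours. The improvement sizes are hypothetical, and treating ‘cumulative’ as a simple sum is my assumption; a real calculator might compound the improvements instead.

```python
# A minimal sketch of dependency-group rules.
# Assumption: "cumulative" is a simple sum; the improvement sizes are hypothetical.
from statistics import mean

RULES = {
    "max": max,
    "min": min,
    "average": mean,
    "cumulative": sum,
}


def combine(improvements: list[float], rule: str) -> float:
    """Collapse the improvements in one dependency group into a single value."""
    if not improvements:
        return 0.0
    return RULES[rule](improvements)


# Hypothetical reductions in Truck Operating Standby, Idle Time Management group
idle_time_management = {"Stand in Operators": 0.08, "Optimise Hot Seating": 0.05}

print(combine(list(idle_time_management.values()), "min"))  # 0.05
```

Adding the two improvements independently would pass a 13% reduction through to the VDT; the min rule passes through only 5%, which is how the group stops the calculator double counting.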

# How does the value calculator visualise the improvements?

The report benefits spreadsheet visualises the culmination of all the value calculator’s analysis. For this calculator, there’s one graph showing the mine’s hauling and loading capacity. It also shows the total change in tonnes for the improvements selected.

If you turn on all the improvements, you’ll see that, were this a real operation, productivity would be constrained by the amount of material the mine can haul. Accordingly, any improvements to loading would be wasted because there wouldn’t be enough hauling capacity to match them. This demonstrates how a value calculator can clearly show which improvements, and in what combination, will actually benefit an organisation.
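The constraint behind this result can be sketched in a couple of lines: the mine moves no more coal than its tighter constraint allows. The capacities below are hypothetical annual tonnes, not figures from the workbook.

```python
# A minimal sketch: throughput is limited by the tighter of two constraints.
# Capacities are hypothetical annual tonnes, not the workbook's figures.

def mine_output(hauling_capacity: float, loading_capacity: float) -> float:
    """The mine can only move as much coal as its bottleneck allows."""
    return min(hauling_capacity, loading_capacity)


baseline = mine_output(hauling_capacity=5.4e6, loading_capacity=6.0e6)
more_loading = mine_output(hauling_capacity=5.4e6, loading_capacity=6.6e6)
more_hauling = mine_output(hauling_capacity=5.9e6, loading_capacity=6.0e6)

print(more_loading - baseline)  # 0.0: the loading improvement is wasted
print(more_hauling - baseline)  # 500000.0: the hauling improvement flows through
```

In a fuller model each capacity would itself come from a VDT like the haul coal tables, so an improvement only shows up in the total if it relaxes the current bottleneck.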

Have a play with different combinations of benefits and follow them ‘through’ the model to see how each section takes an input, transforms it and then passes it on to the next stage.

The next post will be the final for this series and will explore, in more detail, other ways that value driver modelling can be used.

## Value Driver Modelling – Part 2: The 15 fundamental principles of good VDM design

Early in my career as a management consultant, a colleague and I were working on a project for a large gold mining company. The mine had more improvement opportunities than it had funding for, and so wanted to ensure that it spent its limited capital on the best possible combination of opportunities. They needed four months’ worth of modelling completed in two weeks to meet budget deadlines. So we locked ourselves away in a tiny, windowless room with a steady supply of coffee and ingenuity, and two weeks later we had created two value calculators identifying 35 million tonnes in increased productivity.

I’ve taken what I learnt from that project, and many like it, to identify 15 principles that allow you to successfully use value driver modelling and avoid many of the pitfalls that can derail a project.

These principles group into three broad categories:

1. Building a model
2. Using correct logic
3. Working with data

## Building a model

### 1. Prioritise the model’s features

Often on a project you will have more requirements expected of you than time to deliver them. Prioritisation will help ensure that the most important features are completed within time and budget. It also allows you to understand the broader purpose of the model and not get distracted by minor problems or features.

There are many ways to prioritise a model’s requirements. I’ve used the matrix below which is based on the factors of “importance” and “ease of implementation”. An alternative approach is the MoSCoW method. This is particularly useful if you have a client who has a clear understanding of what they want.

### 2. Model in a single direction

To avoid confusion, VDT elements should only exchange inputs and outputs with other elements of the same level or one level above. This keeps the model simple and reduces unforeseen interactions between elements. To illustrate why this is important, picture a factory making widgets. The manufacturing process moves forward from the delivery of raw resources through processing, manufacturing, packaging and delivery. The creation of value (in this case, the widgets) goes from one stage to the next; widgets never go ‘back’ through the process.

### 3. Build and test a prototype model as soon as possible

It’s rare to have a model perfectly and completely documented at the start of a project. Additionally, you may have a client who doesn’t really know what they want until they see and use it. To avoid spending time and money developing a completely polished product that the client does not want, instead produce a working prototype as soon as possible. This will quickly highlight the actual requirements for the model and allow you to focus your efforts for the rest of the project. Even if the entire prototype is discarded, so long as it contributes towards progressing the project, it was still worth producing.

### 4. Develop the model as a series of interrelated modules

Overly complicated models are difficult to troubleshoot, which creates the risk of inaccurate or unpredictable outputs. They are also difficult to expand when new features need to be added. It’s best to design the model as modules that can be developed and tested independently. Modularity also allows you to reuse modules in other models that require similar features.

### 5. Structure the model flexibly so that it is responsive to change

A potential risk with models is that they quickly become outdated as circumstances change. If the model’s context is likely to change, identify the sections most at risk and spend additional time building in flexibility. Typical changes you will need to anticipate include expanding the scope and number of inputs (e.g. a new call centre is added), creating new inputs and outputs (e.g. the model is expanded to include ‘delivery activities’ for a factory) and updating data (e.g. a new set of cost figures).

### 6. Use assumptions to ensure the model is both deliverable and useful

Not all of a client’s operations can be understood to the level of detail required to model them. However, complex issues can often be resolved by making reasonable assumptions. For example, does the model need to output daily figures, or can results be aggregated by month?

When making assumptions, there are some key issues you should consider:

• Have the client and/or the subject matter experts signed off on the assumptions?
• Are the assumptions clearly recorded along with their value and rationale?
• Do you understand how the assumptions impact the model’s output?

### 7. Elicit clear requirements for specific end users

‘Use cases’ are an intuitive way of working out exactly what a model needs to do. Use cases like the example below explain the relationship between a user, their requirement and the resulting benefits from using your model.

## Using correct logic

### 8. Use well conditioned formulas

A poorly designed model can produce outputs that are very sensitive to small changes in its inputs. For example, the diagram below for widgets calculates the Widget Conversion Rate by subtracting the Statistical Rate from the Historical Rate. Because the two rates are close together, a 1% change in the Historical Rate drives a dramatic increase in Hourly Widget Production. This is called ill-conditioning, and there is no automatic way of detecting the problem, nor is there an obvious solution. However, thorough testing should highlight it. Once it is found, an assumption may solve the issue; in this example, that could mean changing the Widget Conversion Rate to a static value.
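To see why the subtraction is the culprit, here is a minimal Python sketch with hypothetical rates (0.52 and 0.50) and a hypothetical sheet productivity of 1,000 sheets per hour. The relative sensitivity of x − y to a change in x is |x / (x − y)|, here 0.52 / 0.02 = 26, so a 1% nudge to the historical rate moves the output by about 26%.

```python
# A minimal sketch of an ill-conditioned VDT formula.
# The rates and the sheet productivity figure are hypothetical.

def widget_conversion_rate(historical_rate: float, statistical_rate: float) -> float:
    # Subtracting two nearly equal numbers amplifies relative changes
    return historical_rate - statistical_rate


def hourly_widget_production(conversion_rate: float, sheets_per_hour: float) -> float:
    return conversion_rate * sheets_per_hour


SHEETS_PER_HOUR = 1_000  # hypothetical sheet productivity

base = hourly_widget_production(widget_conversion_rate(0.52, 0.50), SHEETS_PER_HOUR)
bumped = hourly_widget_production(widget_conversion_rate(0.52 * 1.01, 0.50), SHEETS_PER_HOUR)

change = bumped / base - 1
print(f"1% change in historical rate -> {change:.0%} change in output")
```

Perturbing each input by a small amount and comparing the relative change in the output is one practical way to do the thorough testing described above.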

### 9. Ensure rounding is consistent

The prevailing level of accuracy is limited by your least accurate input. This means that even if some value drivers carry many decimal places, the coarsest rounding in the chain sets the accuracy of the result. For example, the widget conversion rate inherits its level of rounding from the historical rate. In turn, the hourly widget production value driver inherits its level of rounding from sheet productivity.

### 10. Select the appropriate method for modelling the distribution of your inputs

There are different ways of modelling an organisation’s variability. Simple value driver models take a conventional approach and use averages. More sophisticated models can use statistical methods to simulate the changes in productivity that real organisations face. These advanced models can also simulate the dependencies between value drivers, highlighting the interrelationships between key parts of the organisation.
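One reason averages can mislead is that VDTs often contain constraints. The Python below is a minimal Monte Carlo sketch, assuming normally distributed daily hauling and loading capacities with made-up parameters. Because daily output is the smaller of the two capacities, averaging first and then applying the constraint never understates, and usually overstates, the result compared with applying the constraint day by day.

```python
# A minimal Monte Carlo sketch: averages vs simulation under a constraint.
# The normal distributions and their parameters are hypothetical.
import random
from statistics import mean

random.seed(42)  # fixed seed so the run is repeatable
DAYS = 10_000

# Hypothetical daily capacities in tonnes, each varying from day to day
haul = [random.gauss(15_000, 2_000) for _ in range(DAYS)]
load = [random.gauss(15_500, 2_500) for _ in range(DAYS)]

# Average-based model: average first, then apply the constraint
average_model = min(mean(haul), mean(load))

# Simulation-based model: apply the constraint each day, then average
simulated = mean(min(h, l) for h, l in zip(haul, load))

print(f"average-based estimate: {average_model:,.0f} t/day")
print(f"simulated estimate:     {simulated:,.0f} t/day")
```

The gap between the two estimates is a rough measure of how much value an averages-only model may be promising that the operation cannot actually deliver.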

### 11. Avoid feedback loops

Avoid inputs that become their own outputs. While the example below is overly obvious, in more complicated models it is important to know how your inputs are being calculated and whether those assumptions affect the accuracy of your results.
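Feedback loops are easy to catch automatically if you describe the VDT as a graph of which drivers each driver is calculated from. Here is a minimal sketch using Python’s standard `graphlib` module (Python 3.9+); the driver names and the deliberately circular `lost_hours` entry are hypothetical. A tree without loops also gets a calculation order that runs strictly from inputs to outputs, which is the single direction that principle 2 asks for.

```python
# A minimal sketch: detect feedback loops in a VDT and derive a calculation order.
# Each driver maps to the drivers it is calculated from; names are hypothetical.
from graphlib import CycleError, TopologicalSorter
from typing import Optional

vdt = {
    "tonnes_per_year": {"operating_hours", "tonnes_per_hour"},
    "operating_hours": {"calendar_hours", "lost_hours"},
    "tonnes_per_hour": {"cycles_per_hour", "payload"},
    # Deliberate feedback loop: lost hours derived from the tree's own output
    "lost_hours": {"tonnes_per_year"},
}


def find_feedback_loop(tree: dict[str, set[str]]) -> Optional[list[str]]:
    """Return the drivers in one feedback loop, or None if there is no loop."""
    try:
        TopologicalSorter(tree).prepare()
    except CycleError as err:
        return err.args[1]  # list of drivers; first and last are the same
    return None


def calculation_order(tree: dict[str, set[str]]) -> list[str]:
    """Order drivers so every input is calculated before its output."""
    return list(TopologicalSorter(tree).static_order())


loop = find_feedback_loop(vdt)
if loop:
    print("Feedback loop:", " -> ".join(loop))
```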

## Working with data

This final section deals with a series of universal truths concerning the data you use in your models.

### 12. You cannot get all the data you need

You can never get all of the data you need. A complete set of the data you require for a project will not exist, and the data you do receive will most likely have been collected for purposes extraneous to your own. However, you can use the principles we’ve already discussed, like use cases, assumptions and prioritisation, to overcome this issue.

### 13. You cannot use all the data you have

Despite not being able to get all the data you need, you may be overloaded by the data you do have. Choosing which information to use is very important, as it will form your model’s point of view. Information from different sources will at best be slightly different and at worst contradictory. Make sure you understand the limitations and assumptions behind your data so that it matches your reasons for using it. Lastly, when dealing with massive data sets, you can improve the performance of your model by loading only the data you need. However, ensure that the model is flexible enough to broaden the scope of the data in case requirements change.

### 14. You will need to produce your own data

You will always have to develop some of your own data. Not all the data required to build the model will already exist in a system. You will need to work with the client, key stakeholders and subject matter experts. You may even need to go out into the field to ensure that information critical to the model is collected accurately and completely.

### 15. You will need to synthesise your own data

You will always have to synthesise some data to meet the needs of the model. To bridge the gap left by data that you cannot collect or that does not already exist, you will need to make assumptions or synthesise data. This allows the model to operate even though some of its components are currently unknowable. Ensure that these assumptions are well documented and understood by you and the key stakeholders.

The next post in the value driver modelling training series will show you how to create your own value calculator and use it to decide how best to improve your organisation.

## Value Driver Modelling – Part 1: What are value driver trees?

At the peak of the mining boom in Australia, it was in vogue to use value driver trees to analyse your mining operations and to answer questions like “should I buy more trucks or am I better off with more excavators?”.

These days there are fewer consulting dollars floating around for value driver modelling, but that hasn’t stopped it from being a fantastic way to visualise and analyse the flow of value from one part of your organisation to another, regardless of your industry.

This is the first post in a series introducing the concept of value driver modelling (VDM) and providing some practical examples of how to use it.

## What ways can VDM be used?

Throughout this series I’ll show you examples of how VDM can:

• identify where the biggest constraints are in an organisation’s ability to create value.
• be used in conjunction with sensitivity analysis to show which areas are at greatest risk for failing to deliver value.
• value a range of different investment options to find the optimal combination for creating value.
• provide transparency at the individual employee level to see how they contribute to the creation of value.
• allow you to benchmark operations that were previously too different to compare through traditional benchmarking methodologies.

## What is value?

Before we get to the diagrams, let’s start with what we mean by ‘value’. Value can be understood as “something of importance, worth, or the usefulness of something”. Value can also be described as profit or stakeholder wealth. So, if organisations create value, when and where does this value come from? To visualise this value creation process, we can use Porter’s value chain.

A value chain diagram shows a chain of activities for an organisation operating in a specific industry. The chain of activities gives products or services more value than the sum of the independent activities’ values. The important distinction here is between primary and support activities. The primary activities are where all the value is directly created. The support activities, while critical for sustaining the business, do not add value directly to the product that the customer ultimately buys.

While there are some legitimate criticisms of Porter’s value chain (not least the failure of his consultancy), it still provides a straightforward framework for understanding how organisations might create value.

## What are value driver trees?

Value Driver Trees (VDTs) are the main type of diagram used in VDM. VDTs are essentially a picture of the ‘gears’ or ‘value drivers’ that power a business. Here we have an example of the basic building block of a VDT.

This building block forms part of a much larger VDT, which in this case is for the productivity of a dozer. The individual elements of this building block are simple: it has a heading, units and a value. The value in this context is the number of hectares cleared per annum.

Now if I expand the tree, we see more elements to the VDT. The boxes are connected in a relationship, and that relationship is described mathematically, in this case with multiplication signs. The lower-level elements multiply together to equal our starting element. This is the fundamental function of a VDT: to transparently show the relationship between different elements of value.
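The building block and its expansion map neatly onto a small recursive data structure. This is a minimal Python sketch, assuming every parent is the product of its children (as in the dozer example); the two child drivers and their values are hypothetical, and a fuller version would also need to support other operators such as addition or subtraction.

```python
# A minimal sketch of a VDT node: a heading, units and either a value (leaf)
# or child drivers that multiply together (parent). Figures are hypothetical.
from dataclasses import dataclass, field
from math import prod
from typing import Optional


@dataclass
class Driver:
    heading: str
    units: str
    value: Optional[float] = None  # set on leaf drivers only
    children: list["Driver"] = field(default_factory=list)

    def evaluate(self) -> float:
        """Leaves return their own value; parents return the product of their children."""
        if not self.children:
            return self.value
        return prod(child.evaluate() for child in self.children)


# Hypothetical dozer tree: hectares cleared = operating hours x clearing rate
hectares_cleared = Driver("Hectares cleared", "ha/annum", children=[
    Driver("Operating hours", "h/annum", 5_000.0),
    Driver("Clearing rate", "ha/h", 0.4),
])

print(hectares_cleared.evaluate())
```

Because each child can itself have children, breaking the tree down into more detailed steps only means nesting more `Driver` objects; the evaluation logic stays the same.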

It’s possible for this tree to keep breaking down into ever more detailed steps. What’s important to remember is that we are interested in those elements that are directly contributing value to the final outcome of the VDT.

## What are the different types of VDTs?

The rest of this post will go through the different ways VDTs can be used and visualised. To interact with the VDTs, please download the Excel workbook here.

### Benefit Realisation VDT

Benefits Realisation VDTs can be built from Benefits Dependency Networks. A Benefits Dependency Network diagram (as above) shows the inter-relationships between enablers, business changes, benefits and investment objectives.

At its most fundamental level, the purpose of a benefit dependency network diagram is to ensure that you don’t double count benefits from interrelated improvements or investments you make to an organisation.

A Benefits Realisation VDT can quantify the benefit as well as show the relationship between the business benefits, operational assumptions and the investment objectives. The matching colours between the above diagrams show where these common elements are.

### Revenue VDT

Revenue VDTs show the flow of value through the primary activities of an organisation. This model constrains the flow of value based on key inputs. These constraints can show us where improvements could contribute additional value to the organisation. This is an overly simplistic VDT but shows you the basics of what a revenue VDT can do.

### Cost VDT

Similar to a Revenue VDT, a cost VDT can use the same inputs to determine what the cost will be to an organisation. This VDT splits costs between fixed and variable. The variable costs are driven by the same input assumptions as the revenue model.

### Profit VDT

A profit VDT is a simple combination of the Revenue and Cost VDTs. This allows us to change a single input and see how it affects the overall value to the organisation. It shows the trade-off between improving productivity and the impact this also has on cost.

As an experiment, download the workbook and increase the number of employees working in the organisation from 14 to 15 (at cell O37). You might have assumed that an additional employee would allow the organisation to create more value (in this case, gross profit). However, as you can see, the additional employee also increases the total cost of labour, and without a corresponding increase in productivity for the manufacturing plant itself (see Total Production Input (L) at cell L24), the potential value from the new employee is wasted and you’ve reduced the organisation’s profitability by more than \$100k.
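The experiment boils down to revenue and cost sharing the same inputs while production is capped by the plant. Here is a minimal Python sketch of that logic; every figure (output per employee, plant capacity, price, costs) is hypothetical and chosen for round numbers, so it reproduces the shape of the result rather than the workbook’s figures.

```python
# A minimal sketch of a profit VDT where revenue and cost share the same inputs.
# All figures are hypothetical, not the example workbook's values.

def profit(employees: int,
           units_per_employee: float = 1_000.0,
           plant_capacity: float = 14_000.0,
           price: float = 200.0,
           variable_cost: float = 40.0,
           salary: float = 90_000.0,
           fixed_cost: float = 50_000.0) -> float:
    # Production is capped by the plant, however many people are employed
    production = min(employees * units_per_employee, plant_capacity)
    revenue = production * price
    cost = fixed_cost + employees * salary + production * variable_cost
    return revenue - cost


print(profit(14))  # 930000.0
print(profit(15))  # 840000.0: the extra salary buys no extra output
```

Raising `plant_capacity` alongside the headcount is the change that would let the fifteenth employee add value.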

### Financial VDT

A Financial VDT can be used to assess the financial performance of an organisation. For this example, we are measuring the Economic Value Added (EVA) of an organisation. EVA is a measure of whether a company is earning more than its cost of capital. These financial inputs can be changed to see the impact on the organisation’s EVA. Here I have also plugged in some standard accounting ratios to show how you can track financial performance at the same time.
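The top of this tree is one subtraction: net operating profit after tax (NOPAT) minus a capital charge equal to the weighted average cost of capital (WACC) multiplied by the capital invested. A minimal Python sketch, with hypothetical inputs:

```python
# A minimal sketch of the EVA calculation at the top of a Financial VDT.
# The input figures are hypothetical.

def economic_value_added(nopat: float, wacc: float, invested_capital: float) -> float:
    """EVA = NOPAT - (WACC x invested capital)."""
    capital_charge = wacc * invested_capital
    return nopat - capital_charge


eva = economic_value_added(nopat=1_200_000, wacc=0.09, invested_capital=10_000_000)
print(f"EVA: {eva:,.0f}")
```

A positive EVA means the organisation is earning more than its cost of capital; a negative one means it is not. Each argument can in turn be the output of its own branch of the VDT.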

### Reporting VDT

The elements of a VDT can contain whatever information you wish. For a reporting VDT, we can allocate people to be responsible for specific value drivers and show how their achievements depend on one another. Here we have cascaded KPIs through the organisational hierarchy to specific individuals, so we can report on how each of them is affecting the creation of value.

For example, in the diagram below, you can see that Susan Grace is doing well by keeping the average cost per employee down, but Luke King is affecting value overall because the average shift is greater than 8 hours. Without this level of detail, you might have only known that Margaret Gold’s KPI was on track and not seen the underlying issue of an overworked workforce.

### Table VDT

A table VDT contains the same mathematical logic as a VDT; however, since it is in table form, we can easily show information over time. In this instance we are using it to measure changes in planned performance. By incorporating time, you can analyse seasonal trends or forecast future production.

If this table was to be represented as a diagram, it would look like the VDT below.

### Longitudinal VDT

This final example of a VDT shows the flexibility that VDTs have in presenting and calculating value. In this instance, we are showing the change in value drivers over time. For example, we see that variable costs increased around the same time as production costs. We could hypothesise that, when production increases, economies of scale should see a decrease in costs, so we could focus on this area for investigation.

The table VDT above can then be visualised as a diagram, as below.

In the next post, I’ll go through some of the fundamental principles you should follow for great VDM design.

# To a person with a hammer, every problem is a nail…

Right now I’m going through a phase where my hammer is Tableau and everything can be fixed through a decent dashboard. To that end, I developed a dashboard that visualizes my team’s backlog of work.

My team isn’t strictly in ‘software development’, but we’ve come to use Agile as the foundation for managing our projects. Its flexibility allows us to manage all different types of work (e.g. scheduled reports, analytical projects and user support) using Atlassian’s Jira (with its Agile, GreenHopper and Zephyr plugins).

The challenge I’ve faced is that, as we operate in a shared service model, we have lots of competing requests from our customers, and these have to be balanced at our monthly prioritization meetings. As pseudo-product owners, these customers determine what we do, but with so many competing agendas, how do we get consensus on where we’ll focus our effort?

Enter the prioritization dashboard.

The purpose of the dashboard is to answer the main questions our customers ask:

1. What are my requests?
2. How important are my requests compared to everyone else’s?

Generally, if these questions can be answered quickly and transparently (that is, each customer can see everyone else’s requests), it becomes very apparent which requests should be done before others. For example, should we prioritize a complicated enhancement for a report going to 5 people or a simple bug fix going to 5,000?

# What does the dashboard include?

As you can see from the dashboard (nicknamed the petri dish), you can quickly get a sense of where the more important requests are (the top right-hand corner). Using the filters on the right, you can click on your team name (‘Learning’, for example) to highlight those requests on the matrix and list their details in the table below. You can also get a feel for the size of each task and the nature of the request.

The following are the definitions for the custom fields used.

Components: the nature and type of request

• Group initiatives are requests from senior executive stakeholders
• Customer initiatives are requests from our stakeholders
• Continuous improvement are requests from the team
• Scheduled reports are reports that recur regularly
• Fast track are requests prioritized for immediate delivery

T-Shirt Size: An approximate estimate of effort based on an initial assessment of a request before elaboration.

• XS ≤ 1 day
• S = 2 days
• M = 5 days
• L = 10 days
• XL > 10 days
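For rough capacity planning, the sizes can be converted into days and summed across a backlog. A minimal sketch, assuming each size maps to the day figure in the list above (XS counted as 1 day) and treating XL as unknown (`NA`) because it has no upper bound; `tshirt_days` and the sample backlog are hypothetical:

```r
# Approximate effort in days per T-shirt size (figures from the list above).
# XL is open-ended, so it is mapped to NA and flagged rather than guessed.
tshirt_days <- c(XS = 1, S = 2, M = 5, L = 10, XL = NA)

# Hypothetical backlog: one T-shirt size per request
backlog_sizes <- c("S", "M", "XS", "L", "XL")

# Look up the day estimate for each request
effort <- unname(tshirt_days[backlog_sizes])

# Sum the known estimates and count requests that need elaborating first
known_days <- sum(effort, na.rm = TRUE)
needs_estimate <- sum(is.na(effort))

cat("Known effort:", known_days, "days;", needs_estimate, "XL request(s) to elaborate\n")
```

Flagging XL as `NA` keeps the open-ended size from silently inflating or deflating the total.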

Benefits Score: A measure of benefit value from 1 to 5. Benefits that may be identified include reduced costs, reduced risks or an improved employee value proposition.

Strategy Score: A measure of strategic alignment from 1 to 5. A strategically aligned item uses the strategic technology stack and is in line with the team’s development roadmap.
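The post doesn’t state which fields form the matrix axes. Assuming the Benefits Score and Strategy Score are the two axes (an assumption), each request’s quadrant can be derived as in this minimal sketch; the midpoint of 3, the quadrant labels and the sample requests are all hypothetical:

```r
# Hypothetical requests with the two 1-5 scores defined above
requests <- data.frame(
  key      = c("REQ-1", "REQ-2", "REQ-3", "REQ-4"),
  benefits = c(5, 2, 4, 1),
  strategy = c(4, 5, 2, 1),
  stringsAsFactors = FALSE
)

# Assumed midpoint splitting the 1-5 scale into low and high halves
midpoint <- 3

# Label each request by quadrant; "top right" = high on both scores
requests$quadrant <- with(requests, ifelse(
  benefits >= midpoint & strategy >= midpoint, "top right",
  ifelse(benefits >= midpoint, "high benefit only",
         ifelse(strategy >= midpoint, "high strategy only", "bottom left"))
))

# Top-right requests first, then by combined score (highest first)
requests <- requests[order(requests$quadrant != "top right",
                           -(requests$benefits + requests$strategy)), ]
print(requests)
```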

## Why build a dashboard when you could use Jira’s Agile plan view?

Despite Jira’s useful reporting functionality, it’s hard to show our less Jira-savvy customers intuitively what exactly is going on with our backlog. While a list is great when you’re going through iteration planning, there isn’t enough information available without constantly drilling down into each issue. It’s also not clear which requests came from which customers. Plus, you know, the whole hammer thing…

## How does it update?

The dashboard is delivered through Tableau. Currently, to get the latest request information, I manually export it from Jira as a CSV and refresh the dashboard. This is fine because the dashboard only needs updating monthly and the refresh only takes a few minutes. That said, if you wanted ‘live’ data, Jira does have a RESTful API that you can plug into.
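As a starting point for pulling ‘live’ data, a minimal sketch of building a request URL for Jira’s REST issue search with a JQL query. The site address and project key are hypothetical, and the `/rest/api/2/search` path is the version 2 search endpoint; check the REST documentation for your Jira version, as endpoints differ between versions and between Cloud and Server:

```r
# Hypothetical Jira site and project key: replace with your own
jira_base <- "https://yourcompany.atlassian.net"
jql <- "project = ABC AND resolution = Unresolved"

# Build the search URL; reserved = TRUE also percent-encodes
# characters such as "=" inside the JQL query
search_url <- paste0(
  jira_base, "/rest/api/2/search",
  "?jql=", URLencode(jql, reserved = TRUE),
  "&maxResults=100"
)
print(search_url)
```

The URL could then be fetched with an HTTP client such as httr’s `GET()`, with authentication set up to match your Jira instance.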

## What’s next?

In true Agile fashion, the dashboard has some enhancements waiting in the backlog. The most important is maturing from the benefits score and T-shirt size to actual costs and benefits measured in dollars. Even though the team doesn’t currently have a chargeback model, speaking in ‘dollars’ is something all customers understand. It can also tell a compelling story (for example, your enhancement will cost \$10,000 of the team’s available effort – is that a wise investment compared to the benefits you expect?)
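The dollar conversion itself is simple arithmetic. A minimal sketch, assuming a hypothetical blended day rate of \$2,000, so that a 5-day (M) request costs \$10,000 of effort, and a hypothetical expected benefit of \$25,000:

```r
# Hypothetical blended day rate for the team (an assumption, not a real figure)
day_rate <- 2000

# Convert an effort estimate in days into dollars of team capacity
effort_cost <- function(days, rate = day_rate) days * rate

# Example: a medium (M = 5 days) enhancement with a hypothetical expected benefit
cost <- effort_cost(5)
expected_benefit <- 25000

# Benefit-to-cost ratio: above 1 means the expected benefit exceeds the effort cost
ratio <- expected_benefit / cost
cat("Cost:", cost, " Benefit/cost ratio:", ratio, "\n")
```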

I’ve added a version of the dashboard with mocked-up data to Tableau here.

## UPDATED: Sentiment Analysis with “sentiment”

I was looking for a quick way to do sentiment analysis for comments from an employee survey. I came across this post here by Gaston Sanchez.

The guide is a little dated now (the “sentiment” package needs to be downloaded manually, ggplot2 has been updated, setting up the Twitter API has changed, etc.). Since I found Gaston’s guide useful, I’ve included some updated steps that reproduce the output of the original guide.

This example looks at the sentiment of tweets about #UCLfinal.

NOTE: This was run on R version 3.1.2 through RStudio.

Step 1 – Install packages

You will only be required to install these packages the first time.

`# Required packages for the plots`
`install.packages(c("plyr", "ggplot2", "wordcloud", "RColorBrewer", "httr", "slam", "mime", "R6", "Rcpp"))`

`# Required packages to connect to your Twitter API`
`install.packages(c("twitteR", "bit", "bit64", "rjson", "DBI"))`

`# Required packages for sentiment`
`install.packages(c("NLP", "tm", "Rstem"))`
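Since the packages only need installing once, a small check can skip the ones already present. A minimal sketch (`missing_pkgs` is a hypothetical helper name), demonstrated with base packages so it runs offline; the install call is left as a comment:

```r
# Return the packages from `pkgs` that are not yet installed
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Base packages are always installed, so nothing is missing here
print(missing_pkgs(c("stats", "utils")))

# Intended use (commented out so the sketch runs offline):
# pkgs <- c("plyr", "ggplot2", "wordcloud", "twitteR", "tm")
# to_install <- missing_pkgs(pkgs)
# if (length(to_install) > 0) install.packages(to_install)
```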

Step 2 – Install ‘sentiment’

The sentiment package is no longer available from CRAN (it has been archived), so you need to install it manually. Download “sentiment_0.2.tar.gz” from http://cran.r-project.org/src/contrib/Archive/sentiment/

`# Replace [directory] with the folder where you saved "sentiment_0.2.tar.gz"`
`install.packages("[directory]/sentiment_0.2.tar.gz", repos = NULL, type = "source")`

Step 3 – Load packages

You will need to load these packages for each new session.

`library(plyr)`
`library(ggplot2)`
`library(wordcloud)`
`library(RColorBrewer)`
`library(httr)`
`library(slam)`
`library(mime)`
`library(R6)`
`library(twitteR)`
`library(bit)`
`library(bit64)`
`library(rjson)`
`library(DBI)`
`library(tm)`
`library(Rstem)`
`library(NLP)`
`library(sentiment)`
`library(Rcpp)`
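Rather than writing one `library()` call per package, the same list can be attached in a loop. A minimal sketch (`load_pkgs` is a hypothetical helper name), demonstrated with packages that ship with R so it runs anywhere:

```r
# Attach each package named in a character vector.
# character.only = TRUE tells library() to treat the argument as a string.
load_pkgs <- function(pkgs) {
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Demonstrated with packages that ship with R; swap in the list above
load_pkgs(c("stats", "utils", "tools"))
```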

Step 4 – Create a Twitter application

Click on ‘Create New App’.

Complete the compulsory fields, accept the Developer Agreement (note: you can enter a placeholder website if you don’t have one) and click ‘Create your Twitter Application’.

Step 5 – Connect to Twitter

Enter your application’s authentication details below:

`# Authenticate with Twitter`

`api_key <- "[your key]"`
`api_secret <- "[your secret]"`
`token <- "[your token]"`
`token_secret <- "[your token secret]"`
`setup_twitter_oauth(api_key, api_secret, token, token_secret)`

If you get the following prompt:

`[1] "Using direct authentication"`
`Use a local file to cache OAuth access credentials between R sessions?`
`1: Yes`
`2: No`

Type 1 and press Enter to save a local copy of the OAuth access credentials.
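To avoid hard-coding keys in a script you might share, the four values could instead be read from environment variables before passing them to `setup_twitter_oauth()`. A minimal sketch; the variable name `TWITTER_API_KEY` and the helper `get_credential` are hypothetical:

```r
# Read a credential from an environment variable and fail early if it is missing
get_credential <- function(var) {
  value <- Sys.getenv(var)
  if (!nzchar(value)) stop("Environment variable ", var, " is not set")
  value
}

# Demonstration only: normally these are set outside R (e.g. in ~/.Renviron)
Sys.setenv(TWITTER_API_KEY = "demo-key")
api_key <- get_credential("TWITTER_API_KEY")
print(api_key)
```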

Step 6 – Harvest tweets

Now it’s time to harvest the tweets for analysis. Note: if you’re sitting behind a firewall, this may not work; if so, you may need to adjust your firewall or proxy settings. It might also take a minute to harvest the tweets.

`# harvest some tweets`
`some_tweets = searchTwitter("uclfinal", n=1500, lang="en")`

`# get the text`
`some_txt = sapply(some_tweets, function(x) x$getText())`

Step 7 – Prepare text for sentiment analysis

`# remove retweet entities`
`some_txt = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", some_txt)`

`# remove @mentions`
`some_txt = gsub("@\\w+", "", some_txt)`

`# remove punctuation`
`some_txt = gsub("[[:punct:]]", "", some_txt)`

`# remove numbers`
`some_txt = gsub("[[:digit:]]", "", some_txt)`

`# remove html links`
`some_txt = gsub("http\\w+", "", some_txt)`

`# remove unnecessary spaces (collapse runs of spaces, then trim the ends)`
`some_txt = gsub("[ \t]{2,}", " ", some_txt)`
`some_txt = gsub("^\\s+|\\s+$", "", some_txt)`

`# define "tolower error handling" function `
`try.error = function(x)`
`{`
`   # create missing value`
`   y = NA`
`   # tryCatch error`
`   try_error = tryCatch(tolower(x), error=function(e) e)`
`   # if not an error`
`   if (!inherits(try_error, "error"))`
`   y = tolower(x)`
`   # result`
`   return(y)`
`}`

`# lower case using try.error with sapply `
`some_txt = sapply(some_txt, try.error)`

`# remove NAs in some_txt`
`some_txt = some_txt[!is.na(some_txt)]`
`names(some_txt) = NULL`
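To check the cleaning steps without a Twitter connection, they can be bundled into one function and run on a made-up tweet. A minimal sketch: `clean_tweet` is a hypothetical name, the retweet pattern is run with `perl = TRUE` so the `(?:...)` group is handled by PCRE, runs of spaces are collapsed to a single space (so words aren’t joined), and `tolower()` is called directly rather than through the `try.error` wrapper above, which guards against strings `tolower()` can’t handle:

```r
clean_tweet <- function(x) {
  x <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", x, perl = TRUE)  # retweet markers
  x <- gsub("@\\w+", "", x)          # @mentions
  x <- gsub("[[:punct:]]", "", x)    # punctuation (turns links into "httptco...")
  x <- gsub("[[:digit:]]", "", x)    # numbers
  x <- gsub("http\\w+", "", x)       # links, already stripped of punctuation
  x <- gsub("[ \t]{2,}", " ", x)     # collapse runs of spaces and tabs
  x <- gsub("^\\s+|\\s+$", "", x)    # trim leading and trailing spaces
  tolower(x)
}

print(clean_tweet("RT @user: Great goal!!! http://t.co/abc123 #UCLfinal 2015"))
```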

Step 8 – Perform sentiment analysis

Please note that classifying the polarity and emotion of the tweets may take a few minutes.

`# classify emotion`
`class_emo = classify_emotion(some_txt, algorithm="bayes", prior=1.0)`

`# get emotion best fit`
`emotion = class_emo[,7]`

`# substitute NA's by "unknown"`
`emotion[is.na(emotion)] = "unknown"`

`# classify polarity`
`class_pol = classify_polarity(some_txt, algorithm="bayes")`

`# get polarity best fit`
`polarity = class_pol[,4]`
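The sentiment package classifies with a Bayesian approach (`algorithm="bayes"` above). To show what a polarity label means without installing it, here is a deliberately simpler technique, a word-counting lexicon score, which is not what `classify_polarity()` does; the word lists and `toy_polarity` are hypothetical:

```r
# Tiny hypothetical word lists, for illustration only
positive_words <- c("great", "brilliant", "win", "happy")
negative_words <- c("awful", "poor", "lose", "sad")

# Score = positive matches minus negative matches; label by the sign of the score
toy_polarity <- function(txt) {
  words <- strsplit(txt, "\\s+")
  vapply(words, function(w) {
    score <- sum(w %in% positive_words) - sum(w %in% negative_words)
    if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
  }, character(1))
}

print(toy_polarity(c("great goal brilliant win", "awful defending", "kick off soon")))
```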

Step 9 – Create a data frame in order to plot the results

`# data frame with results`
`sent_df = data.frame(text=some_txt, emotion=emotion,`
`polarity=polarity, stringsAsFactors=FALSE)`

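`# sort data frame: order the emotion levels from most to least frequent`
`sent_df = within(sent_df, emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))`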

This is what the first 5 rows of data in sent_df may look like.
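The sorting step is meant to reorder the emotion factor levels by frequency so the bar chart lists the most common emotion first. A minimal sketch of that reordering on made-up labels:

```r
# Made-up emotion labels standing in for the classifier output
sent_df <- data.frame(
  emotion = c("joy", "unknown", "joy", "anger", "unknown", "joy"),
  stringsAsFactors = FALSE
)

# Reorder the factor levels from most to least frequent
sent_df <- within(sent_df,
  emotion <- factor(emotion, levels = names(sort(table(emotion), decreasing = TRUE))))

print(levels(sent_df$emotion))
```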

Step 10 – Plot the emotions and polarity of the tweets

`# plot distribution of emotions`
`ggplot(sent_df, aes(x = emotion)) +`
`  geom_bar(aes(fill = emotion)) +`
`  scale_fill_brewer(palette = "Dark2") +`
`  labs(x = "emotion categories", y = "number of tweets",`
`       title = "Sentiment Analysis of Tweets about UCL Final\n(classification by emotion)") +`
`  theme(plot.title = element_text(size = 12))`

`# plot distribution of polarity`
`ggplot(sent_df, aes(x = polarity)) +`
`  geom_bar(aes(fill = polarity)) +`
`  scale_fill_brewer(palette = "RdGy") +`
`  labs(x = "polarity categories", y = "number of tweets",`
`       title = "Sentiment Analysis of Tweets about UCL Final\n(classification by polarity)") +`
`  theme(plot.title = element_text(size = 12))`
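Alongside the charts, the same counts can be reported as percentages, which is handy when comparing runs with different numbers of tweets. A minimal sketch on made-up polarity labels:

```r
# Made-up polarity labels standing in for the classifier output
polarity <- c("positive", "positive", "negative", "neutral")

# Percentage of tweets in each polarity category, rounded to one decimal place
pct <- round(100 * prop.table(table(polarity)), 1)
print(pct)
```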

`# separate text by emotion`
`emos = levels(factor(sent_df$emotion))`
`nemo = length(emos)`
`emo.docs = rep("", nemo)`

`for (i in 1:nemo)`
`{`
`   tmp = some_txt[emotion == emos[i]]`
`   emo.docs[i] = paste(tmp, collapse = " ")`
`}`

`# remove stopwords`
`emo.docs = removeWords(emo.docs, stopwords("english"))`

`# create corpus`
`corpus = Corpus(VectorSource(emo.docs))`
`tdm = TermDocumentMatrix(corpus)`
`tdm = as.matrix(tdm)`
`colnames(tdm) = emos`

`# comparison word cloud`
`comparison.cloud(tdm, colors = brewer.pal(nemo, "Dark2"), scale = c(3, .5), random.order = FALSE, title.size = 1.5)`
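The comparison cloud compares how often each word appears in each emotion’s combined text. As a base-R sketch of what the grouping loop and tm’s `TermDocumentMatrix` produce (made-up tweets, no stopword removal), `tapply()` builds one document per emotion and a table of word counts gives the term-by-emotion matrix:

```r
# Made-up cleaned tweets and their emotion labels
some_txt <- c("great goal great save", "awful defending", "what a save", "awful awful ending")
emotion  <- c("joy", "anger", "joy", "anger")

# One document per emotion: paste together all tweets with that label
emo_docs <- tapply(some_txt, emotion, paste, collapse = " ")
emos <- names(emo_docs)

# Split each emotion document into words
words <- setNames(strsplit(as.character(emo_docs), "\\s+"), emos)
terms <- sort(unique(unlist(words)))

# Term-document matrix: rows are terms, columns are emotions, cells are counts
tdm <- sapply(words, function(w) table(factor(w, levels = terms)))
print(tdm)
```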