MAKE ANALYTICS GREAT AGAIN

After the surprisingly accurate result of the Brexit referendum tweet tracker, we decided to roll it out to the US Presidential Election.

This posed a bit more of a challenge. Unlike Brexit, where each side used some fairly obvious hashtags, a large number of the tweets we collected under the hashtag ‘Election2016’ were neutral, and many attacked a candidate rather than showing support for one.

To address this, we had to build a custom corpus, manually classifying a few hundred tweets and then using the lessons from that exercise to automate the process. Once this corpus was built up, we could classify tweets as they were collected.
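The post does not say how the automated classification worked, so the following is only a minimal sketch: a small supervised model trained on the hand-labelled corpus, where the CSV file name, the label set and the choice of scikit-learn classifier are all assumptions.

    # Minimal sketch only: the original post does not name a classifier or file format.
    # Assumes a hypothetical CSV of hand-labelled tweets with columns: text, label
    # (labels such as "pro_trump", "pro_clinton", "neutral", "attack").
    import csv

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts, labels = [], []
    with open("labelled_tweets.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            texts.append(row["text"])
            labels.append(row["label"])

    # TF-IDF n-gram features plus a Naive Bayes classifier is a reasonable baseline
    # for a corpus of only a few hundred tweets.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
    model.fit(texts, labels)

    # Classify new tweets as they arrive.
    print(model.predict(["#Election2016 get out and vote!"]))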

The base tool was configured within three days to analyse the data and present the findings live. It scaled to over 150 tweets/second.

[Figures: count of positive tweets for Donald Trump and Hillary Clinton; word clouds for election day for each candidate; sentiment map.]

How did we do it?

As in the run-up to the EU Referendum, we wanted to analyse mass public opinion, this time on the two presidential candidates. We took the concept from idea to live in 10 working days, with some modifications to the system. The tool is a reusable solution that organisations can customise to their needs: it is low cost, easy to use and quick to deliver.

The results are simply a reflection of the tweets we collected. This election has been particularly susceptible to Twitter bots, and that is consistent with the results we have seen. The Twitter data stream is potentially very biased, so the work here is not meant to be a forecast of the election result.

Day 1

  • Planned and agreed the desired outcome, agreed on feature prioritisation, and set deadlines for our minimum viable product (MVP).
  • Commenced UX Research to create initial wireframes.

Day 2-4

  • Set up the core social analytics engine.
  • Analysed Twitter, searching for specific keywords (around 15 keywords related to the target topic).
  • Applied sentiment analysis to the Twitter results, based on a sentiment lexicon, calculated on a per-tweet basis immediately after each tweet was collected.
  • Saved data into a database (Amazon Aurora DB), including: date of tweet, language, author, content and, of course, sentiment. A simplified sketch of this pipeline follows this list.
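The exact implementation is not shown in the post; the sketch below only illustrates the general shape of such a pipeline, with a hypothetical keyword list, a toy lexicon, an assumed table schema and the pymysql driver standing in for whatever the live system actually used.

    # Illustrative sketch only, not the production code: filters one tweet against a
    # keyword list, scores it with a small sentiment lexicon, and stores it in a
    # MySQL-compatible database such as Amazon Aurora. The keyword list, lexicon,
    # table schema and connection details are all hypothetical.
    import pymysql

    KEYWORDS = ["trump", "clinton", "election2016"]             # around 15 in the real system
    LEXICON = {"great": 1, "win": 1, "sad": -1, "crooked": -2}  # toy lexicon

    def keyword_match(text):
        lowered = text.lower()
        return any(keyword in lowered for keyword in KEYWORDS)

    def lexicon_sentiment(text):
        # Sum lexicon scores over the tweet's tokens; unknown words score zero.
        return sum(LEXICON.get(token, 0) for token in text.lower().split())

    def store_tweet(conn, tweet):
        # Discard tweets that do not mention any of the tracked keywords.
        if not keyword_match(tweet["text"]):
            return
        with conn.cursor() as cursor:
            cursor.execute(
                "INSERT INTO tweets (created_at, lang, author, content, sentiment) "
                "VALUES (%s, %s, %s, %s, %s)",
                (tweet["created_at"], tweet["lang"], tweet["author"],
                 tweet["text"], lexicon_sentiment(tweet["text"])),
            )
        conn.commit()

    if __name__ == "__main__":
        # Hypothetical connection details; Aurora is MySQL-compatible, so pymysql works.
        conn = pymysql.connect(host="aurora-endpoint", user="sparck",
                               password="secret", database="election")
        store_tweet(conn, {"created_at": "2016-11-08 12:00:00", "lang": "en",
                           "author": "example_user",
                           "text": "Great day to vote #Election2016"})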

Day 5-6

  • To link the front-end web page and the database, we made use of two Amazon Web Services tools: API Gateway and Lambda.
  • In API Gateway we created a URL that, when requested, triggers a Lambda function written in Python. The Lambda function queries the database, does some light calculations, and returns the results as JSON (see the sketch after this list).
  • These results make up the body of the URL’s response, which the front-end code on the webpage then works with and displays.
  • Created interactive front-end UI from UX wireframes.
  • Integrated front-end code into SPARCK Live site.
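As a rough illustration of the API Gateway and Lambda step, the handler below shows one plausible shape for such a function; the table and column names, the aggregation and the environment variables are assumptions rather than the actual SPARCK code.

    # Sketch of the kind of Lambda handler described above; the "tweets" table, its
    # "candidate" and "sentiment" columns, and the environment variables are assumptions.
    import json
    import os

    import pymysql

    def lambda_handler(event, context):
        conn = pymysql.connect(
            host=os.environ["DB_HOST"],
            user=os.environ["DB_USER"],
            password=os.environ["DB_PASSWORD"],
            database=os.environ["DB_NAME"],
        )
        try:
            with conn.cursor() as cursor:
                # Light aggregation: count positive tweets per candidate.
                cursor.execute(
                    "SELECT candidate, COUNT(*) FROM tweets "
                    "WHERE sentiment > 0 GROUP BY candidate"
                )
                counts = {candidate: count for candidate, count in cursor.fetchall()}
        finally:
            conn.close()

        # With the Lambda proxy integration, API Gateway returns the "body" field
        # verbatim as the HTTP response body, which the front-end parses as JSON.
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(counts),
        }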

Day 9

  • Linked the front-end code to the live system and ran it against a live stream of more than 300 tweets per second to verify integrity under load; a rough sketch of this kind of check is shown below.
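The post does not describe the test harness, so the following is only a rough sketch of a rate-controlled replay of synthetic tweets against a placeholder ingestion function; both the rate control and the process_tweet stub are assumptions.

    # Rough sketch of a load check: replay synthetic tweets at roughly 300 per second
    # and time how long the pipeline takes to absorb them. process_tweet is a
    # placeholder for the real ingestion entry point, which the post does not show.
    import time

    def process_tweet(tweet):
        pass  # placeholder for keyword filtering, sentiment scoring and the DB write

    TARGET_RATE = 300  # tweets per second
    DURATION = 60      # seconds

    start = time.time()
    sent = 0
    while time.time() - start < DURATION:
        process_tweet({"text": f"synthetic tweet {sent}"})
        sent += 1
        # Sleep just long enough to hold the target rate.
        time.sleep(max(0.0, start + sent / TARGET_RATE - time.time()))

    print(f"Replayed {sent} tweets in {time.time() - start:.1f}s")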

Day 10

  • Launched SPARCK LAB to the public through http://sparck.io/lab