Using Data Science to predict Brexit

We tracked sentiment towards the EU Referendum on Twitter, for both Leave and Remain.

The Twitter data from the week leading up to the referendum, including polling day, showed a clear bias towards Leave, contrary to most traditional polls.

The base tool was configured within three days to analyse the data and present the findings live. It scaled to over 150 tweets/second. Interact with the graphs below to see how sentiment changed over time, and how turbulent the Referendum night itself was.

[Interactive chart: average sentiment on polling day, Remain vs Leave]

[Interactive chart: sentiment over time]

[Interactive chart: count of tweets]

How did we do it?

In the run-up to the EU Referendum, we wanted to use the public’s opinion from across the globe to see if it was possible to create a low-cost, real-time analytics engine. We took the concept from idea to live in under 10 working days.

The tool is a reusable solution that organisations can customise to their needs: it is low cost, easy to use and quick to deliver. It lets organisations harness the power of Big Data and social media themselves, without engaging expensive digital agencies. We set out to see if we could quickly and cheaply disrupt this market.

In under two weeks, and for less than £10K, we built a reusable solution that can be set up within a few hours to monitor and display the results of a real-time event.

Day 1

  • Planned and agreed the desired outcome; agreed on feature prioritisation and set deadlines for our minimum viable product (MVP).
  • Commenced UX Research to create initial wireframes.

Day 2-4

  • Set up the core social analytics engine.
  • Analysed Twitter, searching for around 15 keywords related to the target topic.
  • Ran sentiment analysis over the results using a sentiment lexicon, scoring each tweet immediately after it was collected (a sketch of this step follows the list).
  • Saved the data into a database (AWS Aurora DB), including: date of tweet, language, author, content and, of course, sentiment.
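
As a minimal sketch of that collection-and-scoring step, the snippet below filters a tweet against a keyword list, scores it with a tiny sentiment lexicon, and writes the row to a MySQL-compatible database such as Aurora via PyMySQL. The keywords, lexicon entries, table schema and connection details are all illustrative assumptions, not the ones we actually used.

```python
import datetime
import pymysql  # Aurora is MySQL-compatible; PyMySQL is one possible client

# Hypothetical examples -- the real deployment tracked ~15 referendum
# keywords and used a full sentiment lexicon.
KEYWORDS = {"brexit", "euref", "voteleave", "remain"}
LEXICON = {"great": 1, "good": 1, "hope": 1, "bad": -1, "fear": -1, "lies": -1}

def matches_keywords(text: str) -> bool:
    """True if the tweet mentions any tracked keyword."""
    return bool(set(text.lower().split()) & KEYWORDS)

def lexicon_sentiment(text: str) -> int:
    """Sum lexicon scores per word; computed per tweet at collection time."""
    return sum(LEXICON.get(word, 0) for word in text.lower().split())

def save_tweet(conn, tweet):
    """Insert one scored tweet; the 'tweets' table and columns are assumptions."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO tweets (created_at, lang, author, content, sentiment) "
            "VALUES (%s, %s, %s, %s, %s)",
            (tweet["created_at"], tweet["lang"], tweet["author"],
             tweet["text"], lexicon_sentiment(tweet["text"])),
        )
    conn.commit()

if __name__ == "__main__":
    conn = pymysql.connect(host="aurora-endpoint", user="app",
                           password="secret", database="sparck")
    tweet = {"created_at": datetime.datetime.utcnow(), "lang": "en",
             "author": "@example", "text": "Brexit fills me with hope"}
    if matches_keywords(tweet["text"]):
        save_tweet(conn, tweet)
```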

Day 5-6

  • To link the front-end web page and the database, we used two Amazon Web Services tools: API Gateway and Lambda.
  • Through API Gateway we created a URL that, when requested, triggers a Lambda function written in Python. The Lambda function queries the database, does some light calculations, and returns the results as JSON (a sketch follows this list).
  • These results make up the body of the URL’s response, which the front-end then parses and renders on the webpage.
  • Created interactive front-end UI from UX wireframes.
  • Integrated front-end code into SPARCK Live site.
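
Below is a minimal sketch of what such a Lambda function could look like, assuming API Gateway’s Lambda proxy integration and the same hypothetical PyMySQL connection and tweets table as in the earlier sketch; the per-hour aggregation stands in for the “light calculations” mentioned above.

```python
import json
import pymysql

def lambda_handler(event, context):
    """Triggered via API Gateway: query the database, return JSON for the charts."""
    conn = pymysql.connect(host="aurora-endpoint", user="app",
                           password="secret", database="sparck")
    try:
        with conn.cursor() as cur:
            # Light aggregation: average sentiment and tweet count per hour.
            cur.execute(
                "SELECT DATE(created_at) AS day, HOUR(created_at) AS hr, "
                "AVG(sentiment), COUNT(*) FROM tweets "
                "GROUP BY day, hr ORDER BY day, hr"
            )
            rows = [{"day": str(day), "hour": hr,
                     "avg_sentiment": float(avg), "count": n}
                    for day, hr, avg, n in cur.fetchall()]
    finally:
        conn.close()

    # With the Lambda proxy integration, this dict becomes the HTTP response;
    # the JSON body is what the front-end charts consume.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(rows),
    }
```

The front-end then only needs to issue a GET request to the API Gateway URL and plot the JSON it receives back.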

Day 9

  • Linked front-end code to the live system, and tested it against a live stream of >300 tweets per second to verify integrity under load (a sketch of this kind of throughput check follows).
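
For illustration only, a throughput check in the same spirit could replay synthetic tweets through stand-ins for the scoring helpers from the first sketch and report the achieved rate; the real Day 9 test exercised the full live system, whereas this only times the scoring path, and the target rate is taken from the figure above.

```python
import time

# Tiny stand-ins for the keyword filter and lexicon scorer from the first sketch.
KEYWORDS = {"brexit", "euref", "voteleave", "remain"}
LEXICON = {"hope": 1, "fear": -1}

def score(text: str) -> int:
    """Keyword-gate the tweet, then sum its lexicon scores."""
    words = text.lower().split()
    if not KEYWORDS & set(words):
        return 0
    return sum(LEXICON.get(w, 0) for w in words)

def replay(tweets, target_rate=300):
    """Replay synthetic tweets through the scoring path and report the rate."""
    start = time.perf_counter()
    for text in tweets:
        score(text)
    elapsed = time.perf_counter() - start
    print(f"Scored {len(tweets)} tweets in {elapsed:.2f}s "
          f"({len(tweets) / elapsed:.0f}/s against a {target_rate}/s target)")

if __name__ == "__main__":
    replay(["Brexit fills me with hope"] * 100_000)
```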

Day 10

  • Launched SPARCK LAB to the public at http://sparck.io/lab