Using Data Science to predict BREXIT

In 2016, we teamed up with our Data Science colleagues to track how the EU Referendum polled on Twitter, both for Leave and Remain, with interesting results!

Twitter data from the week leading up to the referendum, including polling day, showed a clear bias towards Leave, contrary to most traditional polls.
The base tool was configured in just 3 days to analyse the data and present the findings live, and it scaled to over 150 tweets/second! See for yourself: interact with the graphs below to watch how sentiment changed over time, and how turbulent referendum night itself was.

[Interactive chart: Average Sentiment on Polling Day, Remain vs Leave]

[Interactive chart: Sentiment Over Time, count of tweets]

How did we do it?

During the run-up to the EU Referendum, we wanted to see whether we could use public opinion from across the globe to build a low-cost, real-time analytics engine. We took the concept from idea to live in less than 10 working days.

The tool is a reusable solution that organisations can customise to their needs: it is low cost, easy to use and quick to deliver. It lets organisations harness the power of Big Data and social media themselves, without needing to engage expensive digital agencies. We set out to see if we could disrupt this market quickly and cheaply.

In under 2 weeks, and for less than £10K, we built a reusable solution that can be set up within a few hours to monitor and display the results of a real-time event.

Day 1

  • Planned and agreed the desired outcome.
  • Agreed on feature prioritisation and set deadlines for our minimum viable product (MVP).
  • Commenced UX research to create initial wireframes.

Day 2-4

  • Set up the core social analytics engine.
  • Analysed Twitter, searching for specific keywords (around 15 keywords related to the target topic).
  • Ran lexicon-based sentiment analysis on the tweets, scoring each one immediately after it was collected.
  • Saved the data into a database (AWS Aurora), including: date of tweet, language, author, content and, of course, sentiment. A sketch of this pipeline follows the list.
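
For illustration, here is a minimal sketch of what that collect-score-store pipeline might look like. The post doesn't name the libraries, keywords or schema, so the use of tweepy and pymysql, the tiny lexicon and all table and column names below are assumptions, not our production code.

```python
# Sketch: stream keyword-matched tweets, score each against a sentiment
# lexicon, and persist the result. Libraries, keywords, lexicon and
# schema are illustrative assumptions.
import tweepy
import pymysql

KEYWORDS = ["brexit", "euref", "voteleave", "strongerin"]  # ~15 in the real run

# Toy lexicon: word -> polarity. The real lexicon was far larger.
LEXICON = {"great": 1, "hope": 1, "win": 1, "bad": -1, "fear": -1, "chaos": -1}

def score(text: str) -> float:
    """Average lexicon polarity over the tweet's words (0 if no hits)."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

db = pymysql.connect(host="aurora-endpoint", user="app",
                     password="secret", database="referendum")

class RefStream(tweepy.Stream):  # tweepy v4-style streaming client
    def on_status(self, status):
        # Score and save each tweet as soon as it arrives.
        with db.cursor() as cur:
            cur.execute(
                "INSERT INTO tweet (created_at, lang, author, content, sentiment)"
                " VALUES (%s, %s, %s, %s, %s)",
                (status.created_at, status.lang, status.user.screen_name,
                 status.text, score(status.text)),
            )
        db.commit()

RefStream("consumer_key", "consumer_secret",
          "access_token", "access_token_secret").filter(track=KEYWORDS)
```

In practice the inserts would be batched, or written through a queue, to sustain hundreds of tweets per second rather than committing one row at a time.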

Day 5-6

  • To link the front-end web page and the database, we made use of two Amazon Web Services tools: API Gateway and Lambda.
  • In API Gateway, we created a URL which, when requested, triggers a Lambda function written in Python. The Lambda function queries the database, does some light calculations and returns the results as JSON (a sketch of such a handler follows this list).
  • These results make up the body of the URL's response, which the front-end then uses to render the charts on the webpage.
  • Created interactive front-end UI from UX wireframes.
  • Integrated front-end code into SPARCK Live site.
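
The Lambda-behind-API-Gateway pattern is as described above, but the actual query and payload aren't published, so this is a hedged sketch of what such a handler could look like; the table, column and field names are assumptions.

```python
# Sketch of a Lambda handler behind an API Gateway URL: query the
# database, aggregate per minute, return JSON. Query and payload shape
# are assumptions; only the overall pattern comes from the write-up.
import json
import pymysql

def lambda_handler(event, context):
    db = pymysql.connect(host="aurora-endpoint", user="app",
                         password="secret", database="referendum")
    try:
        with db.cursor() as cur:
            cur.execute(
                "SELECT DATE_FORMAT(created_at, '%Y-%m-%d %H:%i') AS minute,"
                "       AVG(sentiment), COUNT(*)"
                " FROM tweet GROUP BY minute ORDER BY minute"
            )
            rows = [{"minute": m, "avg_sentiment": float(s), "tweets": n}
                    for m, s, n in cur.fetchall()]
    finally:
        db.close()
    # With Lambda proxy integration, API Gateway expects this envelope;
    # the "body" string becomes the body of the URL's response.
    return {"statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(rows)}
```

In a real deployment the connection would typically be created outside the handler so that warm invocations can reuse it.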

Day 7-9

  • Linked the front-end code to the live system, and verified integrity under load against a live stream of over 300 tweets per second (a simple load-generation sketch follows).
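
We tested against a live Twitter stream, so the harness below is purely illustrative: one simple way to generate a comparable synthetic load, with a hypothetical ingest endpoint, payload and worker count.

```python
# Sketch: fire synthetic tweets at a target rate to exercise the ingest
# path. Endpoint, payload and pool size are hypothetical.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

TARGET_RATE = 300  # tweets per second, matching the tested load
ENDPOINT = "https://example.com/ingest"  # hypothetical ingest URL

def send(i: int) -> None:
    requests.post(ENDPOINT, json={"content": f"synthetic tweet {i}",
                                  "lang": "en", "author": "loadtest"})

# Pace submissions so the pool issues ~TARGET_RATE requests per second.
with ThreadPoolExecutor(max_workers=50) as pool:
    start = time.monotonic()
    for i in range(10_000):
        pool.submit(send, i)
        time.sleep(max(0.0, start + (i + 1) / TARGET_RATE - time.monotonic()))
```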

Day 10

  • Launched to the public on our SPARCK LAB.