Cambridge Analytica and the folly of Big Data

    Thursday, 05 December 2019

    When the documentary ‘The Great Hack’ premiered at the Sundance Film Festival in January 2019, the headlines and reviews echoed the film’s sensationalism, pointing towards a high-profile case of mass manipulation of the highest order.

    The belaboured premise is that Cambridge Analytica, a political consulting firm based in England and owned by Trump donor Robert Mercer, had built psychological profiles based on all of our deepest, darkest secrets and applied shadowy algorithms to create a disinformation campaign so powerful that it led to both Brexit and the election of Donald Trump. The timing of the release coincided loosely with the publication of the Mueller report, the culmination of a two-year-long enquiry into the shady dealings of President Trump.

    The resulting media narrative has been an incoherent mess aimed at vilifying any number of actors that may have been responsible for either of these two political catastrophes.

    The catchline of the trailer, that “Data” surpassed oil in value at some point in recent years, doesn’t stand up to scrutiny. In fact, the list of largest companies by revenue shows that seven of the world’s ten largest companies are energy companies, two are car companies and the biggest is Walmart. The twisting, turning investigative style mirrors the reporting of Carole Cadwalladr, the journalist who broke the initial story in 2017.

    Cadwalladr fanned the flames of a Russian conspiracy, writing: “someone close to the intelligence select committee tells me that ‘work is being done’ on potential Russian interference in the referendum.”

    The Scandal

    The scandal is an ethical one surrounding Cambridge Analytica. It is claimed that they used the Facebook ‘like’ data of 20 million users without consent.

    Researcher Alex Kogan at Cambridge University was contacted to perform basic survey design, and he was eventually paid for mining the personal data of around 300,000 Facebook users who filled in a personality survey, which also provided the data that Facebook held on their friends. This was all done in accordance with Facebook’s privacy policy and with the consent of the survey takers. He then legally sold this data to Cambridge Analytica.

    Cambridge Analytica claims to have then used this data to create profiles with hundreds of thousands of dimensions on every person in the US. This claim was made by CEO Alexander Nix at the Concordia Conference in 2016. This permitted them to assign users to one of the Big 5 personality traits and suggest “microtargeting” strategies to the aforementioned campaigns.

    A word on Personality Traits 

    Before we go any further we need to examine this notion of the Big 5 personality traits, a branch of psychology that finds its origins in the work of Francis Galton, AKA the Father of Eugenics. The idea that some typology of personality exists, and that it has a causal impact on behaviour, is a product of the flickering embers of the behaviourist movement and of psychological pseudoscience linked to the phrenology movement of the 19th century.

    The most convincing work done on the subject was by Walter Mischel, who in the late 1960s thoroughly disproved the ancient notion that “Personality” has any predictable impact on behaviour. These personality traits are total hokum.

    Microtargeting and its Discontents

    Microtargeting is a method used by political campaigns to identify potentially politically active people (voters, donors, volunteers) by using combinations of dimensions or traits that are thought to be linked, indicative of particular types of voting behaviour, and highly concentrated.

    Content specific to these profiles is then communicated to these microtargeted groups. An example would be combining Facebook likes, gender, shopping behaviour, average income and voter registration in a neighbourhood to try to find out how many wealthy independents bought yoga mats, and then tailoring and delivering content to people thought to have common political sensibilities on the basis of these shared traits.

    Professor Eitan Hersh, in his 2015 book Hacking The Electorate, analyses the efficacy of ‘Microtargeting’. He discusses the quality of the available data, which differs notably from state to state depending on what type of information is publicly available, and how this already leads to major challenges when trying to create microtargeted groups. He claims not only that these analyses are limited by the differences in the type of publicly available data, but also that they are limited in their predictive ability. Political campaigns have been relying on publicly available information for decades, and this information is still the most indicative of political affiliation, but it has not been shown to increase the efficacy of microtargeting.

    Regardless of how effective microtargeting is, or the data models that tell decision-makers which microtargeted groups to aim their advertising at, the most critical and overlooked issue is that, ultimately, a human being must decide what data to feed to the model and accept the information they are presented with as a result.

    Algorithms, Fire and Dangerous Things

    As for the data analytics and underlying data models, in his book Outnumbered, mathematician David Sumpter describes his attempts to apply advanced statistical models to a similar dataset. Having confirmed his suspicion that it seemed impossible to make such assertions, he then reached out to Kogan.

    In Outnumbered, Kogan confesses his own scepticism that Cambridge Analytica was capable of making such accurate predictions based on the data that was passed on to them. When users had enough likes (ideally more than 50) and voter registration was known, regression models were able to link some ‘liking’ behaviours to supposed political affiliation, but when voter registration information was effectively unknown the models performed only slightly better than chance at identifying political affiliation.

    Even this is rendered useless when the fluidity of voter behaviour is taken into account, as well as the evolution of regional politics and, perhaps most importantly, the fact that voter turnout for both Brexit and the 2016 US election was around 60%.

    When confronted, Alexander Nix confessed to the same, and when David Carroll of The Great Hack made a data protection request to Cambridge Analytica they sent him his age, gender, the district he voted in, the fact that he had voted in the Democratic primary and was thus a registered Democrat, and their own assessment that he was likely to vote and to continue to vote Democrat.

    This information is the standard political data that political analysts have been using for years. All the empirical evidence points to Cambridge Analytica simply not being capable of doing what they claimed, instead making gross exaggerations as to their capabilities and piggybacking on the success of their clients while ignoring all their failures.

    This, of course, is standard behaviour for a consulting firm, especially one selling the snake oil of “Data Science.”

    The Algorithms Did not Save us from Hillary

    The microtargeting that CA was accused of conducting was nothing new to politics. In fact, Obama acquired Facebook like data for 190 million users based on 1 million app downloads. In this case, app downloaders were aware that they were going to be pandered to, but their Facebook friends were not.

    Obama’s campaign conducted microtargeting based on Facebook likes and the aforementioned publicly available data. So why didn’t Hillary do the same, you may be asking yourself. She did.

    Hillary Clinton, who recently suggested that Mark Zuckerberg should “pay the price” for the damage he’s done to democracy, resurfaced several months after losing the 2016 election to start pointing fingers at everyone but herself. At the Recode Technology conference, the unemployed and embittered Clinton said: “I get the nomination. So I’m now the nominee of the Democratic Party. I inherit nothing from the Democratic Party.”

    This, of course, is patently false, as former DNC chair Donna Brazile claims in “Hacks”, her exposé of Clinton’s glaring failures, which shows how the Clinton campaign had been covering the DNC’s debt as early as August 2015, months before a presidential run was even announced. The leaked Podesta emails that led to the resignation of Brazile’s predecessor, Debbie Wasserman-Schultz, showed how the Clinton campaign had taken total control of the DNC and weaponized it against the viral popularity of Bernie Sanders.

    At their lowest point, Hillary’s top aides suggested going after Sanders’s Jewish heritage, a tactic similar to the racist dog-whistle she had used against Obama in the 2008 Democratic primary.

    The second part of her statement refers to “Vertica”, the database application that was used to organise and store data for the Obama campaign in 2012.

    This prompted Hillary to set out to create the ‘most advanced’ data-driven campaign the world had ever seen. Campaign manager Robby Mook built his campaign around a computer model named Ada, designed by Elan Kriegel, that was to lay the foundation for every strategic decision the campaign took.

    In the run-up to the 2016 election, campaign staffers bragged about their “invisible guiding hand” and planned to release Ada to the public the day after the election. Ada was fed data from all over the country and executed four hundred thousand simulations every day, guiding the campaign in every decision it made. It was said that no decision was made without consulting Ada, and that it could only be accessed by a cadre of top aides.

    The plan was to microtarget based on a number of different dimensions collected for every voter in the database. The campaign famously said something to the effect of ‘for every coal miner we lose in western Pennsylvania we’ll pick up two suburban moms in Philadelphia’. It was an act of hubris made all the more astonishing by her having lost the Michigan primary to Sanders, despite America’s erstwhile foremost analyst, Nate Silver of the 538 blog, giving him a less than 1% chance of winning the state six months earlier.

    This failure illustrates the biggest issue with data science today. If these systems are going to predict human behaviour accurately and consistently, and if actionable insights are to be derived from their analyses, then a much deeper understanding of human cognition and nature is required, and finding a way to capture that knowledge in data is a prerequisite for any model capable of such analysis.

    The Falcon cannot hear the Falconer

    Hillary’s campaign spent over $1.1 billion, $500 million more than Trump’s. It is said that Trump barely even organised in Florida, and despite the Clinton campaign’s microtargeting of Spanish speakers in Florida, Trump still carried the majority of Cuban-American voters.

    During the Republican primaries, Trump was able to put away one data-driven campaign after another. Jeb Bush, who spent $160 million, was written off by a catty remark at a debate. Ted Cruz, who was working with Cambridge Analytica at the time and was the second most extreme candidate in the primaries, was humiliated time and time again despite his penchant for well-crafted debate and microtargeting.

    As for the Democratic ticket, when grassroots organisers pleaded with Clinton HQ to send support, Robby Mook and Ada ignored them. Was Ada consulted when crowds streamed out of a Clinton rally in Ohio, at which Jay-Z had performed, as soon as she began speaking? After months of pleading, Ada finally suggested that Michigan was in play, but even then buses of volunteers heading for Michigan were turned back by the Clinton campaign. DNC chair Donna Brazile pleaded incessantly for Hillary to heed these warnings, but her campaign, in its algorithmic arrogance, refused not only to listen but also to release funds the DNC had itself raised in support of her campaign.

    This is the folly of data analytics. It is something known as the drunkard’s dilemma, whereby a drunkard limits the search for their lost keys to the area under a lamppost because that is where the light is shining. In business slang the expression is “garbage in, garbage out”, and Hillary’s entire billion-dollar campaign, with its 60 mathematicians and statisticians, was a big stinking pile of garbage.

    Perhaps more troubling than the misplaced faith in misused machines was the fact that Ada predicted that Pennsylvania and Florida were close calls and the Clinton campaign subsequently spent tremendous resources securing these states, but still lost both states in dramatic fashion. This puts the issue in stark relief, namely that it doesn’t matter if you identify every possible voter, it doesn’t matter how much money you spend to reach them, and it doesn’t matter how much of a bumbling idiot your opponent is. You have to listen to voters, and not talk at them. A messaging strategy that echoes their concerns and sentiments is the nature of our democratic processes.

    Could it have been the Russians?

    Russia has interfered in elections across Europe, most notably with Russian banks providing €9.4 million in loans to Marine Le Pen’s far-right party when other European banks refused. In the US, investigations showed that an organisation likely linked to the Russian government had spent about $150,000 on Facebook ads, and ran groups with as many as 300,000 members.

    The Clinton and Trump campaigns spent about $81 million on social media ads by contrast.

    Russian attempts to meddle were largely supportive of Donald Trump, but upon closer inspection were as befuddling as they were “expansive”. Russian ad buys, generally averaging in the $25-50 range, carried messages as diverse as ‘like if you love Jesus’, advertisements for sex toys, Facebook groups with as few as several hundred members primarily oriented towards providing skin and hair care advice under the name of “Woke Blacks”, and even promotion of anti-Trump rallies. Celebrities like Sarah Silverman, who spoke at the Democratic National Convention in support of Hillary Clinton, and Daily Show host Trevor Noah retweeted Russian trolls themselves!

    The Russians were likely also responsible for the email leaks, not because of some complex hacking scenario but because of some simple phishing emails. Campaign chairman John Podesta is thought to have let hackers in despite the campaign being aware for weeks that there were major phishing attempts. In the end, the tragedy wasn’t the hacked emails, it was the content of the emails.

    They ranged from premeditated antisemitism and mocking Catholicism, to wariness of “refugees” due to a lack of vetting (ironically the exact same language Trump used publicly), to emails describing how they were manufacturing a pro-green, anti-Keystone pipeline agenda while being careful to hide that Clinton’s campaign and the Clinton Foundation were receiving millions of dollars from Keystone backers and the fossil fuel industry.

    If the emails had been leaked by a Clinton staffer it would have been called whistleblowing, and this might go some way towards explaining why “microtargeting” and big data were disastrous for Hillary.

    Hillary Clinton glances at the Zahir

    Jorge Luis Borges writes about the Zahir, an old Arabic myth which holds that at any given time there exists on the planet an object that takes on the powers of the Zahir. Whoever glances at the Zahir will have their thoughts slowly consumed by it until they are rendered insane. The Ada computer model was thus something of a Zahir for Hillary Clinton.

    The Clinton campaign simply had no idea what voters were saying because they weren’t listening. Instead, they built a machine that repeated back to them daily how she would be president, and in looking into this machine Hillary’s thoughts became consumed with the idea of her own presidency. So much so that 3 years down the line she still seems fixated on the all-consuming vastness of her unpopularity that has become the Zahir. First it was the Russians, then it was Cambridge Analytica, and now it’s Facebook itself. All of these entities are nefarious and power-hungry, all deserving of tremendous scorn and wariness, but no, it wasn’t the algorithms that doomed us, it was the humans who built them, fed them and interpreted their results.

    Alexandre d’Hoore