Privacy in the Age of Big Data

05/02/2013

Social media, Web 2.0, the Internet of Things, mobile computing, and the proliferation of sensors are allowing more information to be gathered than ever before. Coupled with Big Data analytics, that information offers unprecedented insight into the needs, patterns, and habits of users, citizens, and consumers. Though it leads to better service, convenience, and efficiency, the recent deluge of personal information also raises complex privacy concerns that have no simple technical, legal, or behavioral solution, because they go beyond simply securing personal information and stem from the sometimes counter-intuitive nature of Big Data itself.

Despite the name, Big Data means more than just a large amount of information. The TechAmerica Foundation explains Big Data as “a phenomenon defined by the rapid acceleration in the expanding volume of high velocity, complex, and diverse types of data. Big Data is often defined along three dimensions — volume, velocity, and variety” and added, critically, that it requires “advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of the information.” Because of the type as well as the size of the data now being collected and integrated – unstructured or semi-structured text, clickstreams, video, audio, images, etc. – it can be difficult or even impossible for users and analysts to know in advance what insights they can draw when that data is combined and analyzed through advanced technical methods.

This is a problem for privacy policy, as most of the data collected is what the security world would call “sensitive but unclassified”: information that may be entirely innocuous alone but can be augmented to prove highly revealing. Much of the data gathered is anonymized so that, on its own, it cannot be linked to a specific individual. When it is combined with other data sets and analyzed further, however, those links can often be re-established with little effort. A recent study of mobile phone location data, recording only which cellular tower was closest and the time of the call, found that four data points were enough to uniquely identify 95% of individuals. With just a little additional information, such as a person’s daily commute and an event they attended, you can tie an identity to an anonymous user. The same study references another experiment in which a medical database was combined with a voter list to extract the health records of the governor of Massachusetts. Another example came from the Central Intelligence Agency’s Chief Technology Officer Gus Hunt, who described in a presentation how accelerometer data gathered by the health and fitness tracker Fitbit can be used to identify individuals with perfect accuracy based on the way they walk.
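To make the idea of combining data sets concrete, here is a minimal sketch of a linkage attack in Python. It is an illustration only, not the method used in any of the studies mentioned: the records, the names, and the choice of ZIP code, birth date, and sex as the shared fields are all assumptions made for the example. The point is simply that a record with no name on it can be tied to a person when its remaining fields match exactly one entry in a second, named list.

```python
from collections import defaultdict

# Fields assumed to appear in both data sets (an assumption for this sketch).
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# Hypothetical "anonymized" medical records: names removed, other fields kept.
medical_records = [
    {"zip": "00001", "birth_date": "1960-03-02", "sex": "F", "diagnosis": "asthma"},
    {"zip": "00002", "birth_date": "1975-11-20", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "00003", "birth_date": "1988-06-09", "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public list that includes names (for example, a voter roll).
voter_list = [
    {"name": "Alice Example", "zip": "00001", "birth_date": "1960-03-02", "sex": "F"},
    {"name": "Bob Example", "zip": "00002", "birth_date": "1975-11-20", "sex": "M"},
    {"name": "Carl Example", "zip": "00002", "birth_date": "1975-11-20", "sex": "M"},
    {"name": "Dana Example", "zip": "00004", "birth_date": "1991-01-30", "sex": "F"},
]


def key_of(row):
    """Return the tuple of quasi-identifier values for one row."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def link_records(anonymized, identified):
    """Pair each anonymized record with a name when the match is unambiguous."""
    # Group the named people by their quasi-identifier values.
    people_by_key = defaultdict(list)
    for person in identified:
        people_by_key[key_of(person)].append(person["name"])

    links = []
    for record in anonymized:
        candidates = people_by_key.get(key_of(record), [])
        # Only a single candidate counts as a re-identification.
        if len(candidates) == 1:
            links.append((candidates[0], record["diagnosis"]))
    return links


if __name__ == "__main__":
    for name, diagnosis in link_records(medical_records, voter_list):
        print(f"{name} -> {diagnosis}")
```

In this toy data only the first record is re-identified: the second matches two people and the third matches no one. The fewer people who share a given combination of values, the more records can be linked this way, which is why seemingly harmless fields matter.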

Most of us are not comfortable with this level of exposure, but at the moment there is little we can do about it. Just by browsing the web we can pick up hundreds of tracking cookies gathering seemingly insignificant bits of information about where we click, which sites we visit next, what type of stories we like to read, and what items are in our shopping carts. All of this can later be sold, collected, and aggregated into a more meaningful profile. Few social media users check the constantly changing privacy policies of services such as Facebook to see how their personal information is used, even with their consent. Our increasingly prevalent mobile devices gather even more information. Simply to run my new tablet, I had to sign off on countless essential applications that would store and record information such as my search history and location. A recent look at the 100 most popular free applications, the top ten in each of five common categories on iOS and on Android, found that all of the iOS apps and 92% of the Android apps sent and received user data unencrypted, that 60% of iOS apps and 42% of Android apps collected location data, and that 54% of iOS apps and 20% of Android apps collected contacts or e-mail addresses.

Numerous solutions have been proposed to address the new privacy concerns raised by Big Data. The simplest are legal, with new norms and laws governing how information is collected, stored, and used. A February World Economic Forum report, for example, suggests focusing regulation on how data is used rather than how it is collected and holding organizations accountable. Any legal or regulatory solution would require technical measures, however, so others have proposed tools to help us control the Big Data that we generate. M.I.T.’s Human Dynamics Lab, for example, is developing a framework called openPDS, “a privacy-preserving implementation of an information repository which allows the user to collect, store, and give access to his data.” Lastly, pragmatists remind us that we only have as much privacy as we create for ourselves and recommend thoughtful use of social media and new technology.

All of these approaches have merit but struggle with the challenge at the heart of Big Data privacy: because innovative technology and methodology are required to generate insight from Big Data, we don’t immediately and automatically know what we should and should not be sharing. Privacy, like all forms of security, is a trade-off, and because Big Data is by definition not intuitive, users are ill-equipped to make informed decisions about what they’re comfortable sharing. Since Big Data analytics derive non-obvious connections between seemingly trivial data points, it may be tempting to err on the side of caution and not share anything. By clamping down on Big Data, however, we would be missing out on some of the greatest advances in modern information technology. Opting out entirely would not only be inconvenient, costing us our GPS, helpful applications, better reading and viewing lists, relevant ads, and hundreds of dollars in targeted coupons, but would also stifle innovation. Advertising aside, government and industry use Big Data to craft better solutions for users, customers, and citizens, ranging from search engines to cancer diagnosis.

Big Data privacy is a unique but pressing challenge with no obvious solution: there is no single technological fix, no common-sense practice, and no strict regulation that can guarantee everybody a reasonable expectation of privacy without strangling innovation. New laws and policies will need to be implemented to ensure data is not used in an invasive manner, and solutions will need to be developed to comply with, monitor, and enforce those policies. Users, meanwhile, will need to educate themselves and make smart choices about the data they share. Still, individuals must decide for themselves how open or restrictive they want to be with all their data, even when it seems to be of little value. That was already a complicated problem, and Big Data collection, storage, and analysis techniques compound it.
