Ruben Verborgh on data & privacy

How we will regain control of our personal data

Data is the ‘new oil’ that drives our economy, but recent data breaches indicate that we should perhaps be wary about sharing our data. According to Ruben Verborgh, expert in semantic web technology, the way to regain control over our personal data is to return to a decentralized web, a win-win scenario for both consumers and companies.

Trading data for services

In March 2018, it became clear that Cambridge Analytica had used the personal data of millions of Facebook users for political purposes. The scandal made it painfully clear how little thought we had given to our personal data until then. In fact, our lack of control over our own data is one of the concerns that Tim Berners-Lee, inventor of the World Wide Web, voiced in 2017, when he argued that the web is no longer as open as it used to be.

Although originally intended as a ‘free space open to everyone’, the web has evolved into a centralized system governed by a handful of internet giants such as Google and Facebook.

What they have in common is a shared business model that seems very attractive at first sight: they do not charge us for their services. However, if we were to read the small print in their data policy, we would soon realize that we are paying – not with money, but with our personal data.

The catchphrase of the decade is ‘data is the new oil’: the more data companies have, the more powerful they become. For Facebook, for instance, the data of an average user is said to generate roughly 20 dollars in annual revenue. That seems a considerable amount, especially given that the site has 2.27 billion active users, but from our own perspective as consumers it is actually not that much at all. Let’s turn it around: suppose you had lost all your private messages and all your photos. How much would you be willing to pay to get them back? Surely, 20 dollars is a small price to pay to regain control over your data.

We are continuously creating data about our personal life, professional life, shopping habits, etc. For a long time, nobody seemed to be particularly worried about the amount of information we 'give' to companies such as Facebook, Google, and others. But in the near future, people will want to regain control of their personal data.

The business model of today’s centralized web not only impacts our privacy but also stifles innovation. There can only be one winner: the one who has the most data. And in many sectors, those winners have already been established, which leaves them with little incentive to innovate. After all, why would you change a winning game?

Social media companies are not the only players in the big data game. Almost every company collects information on its clients and is – in a way – a big data company. Even supermarkets track and analyze what you are buying and send you personalized ads based on your profile. For many of these companies, data is merely a means to an end: they need data to optimize their services or sales, but they actually have no direct interest in owning or managing it. In addition, owning data is increasingly becoming a liability for companies, especially since the GDPR (General Data Protection Regulation) came into effect: if they store it, they also have to be able to protect it. 

Imagining a decentralized web

By 2035, we will have regained control of our personal data. What this means is that we will get the choice to store our data wherever we want, independently of the applications we use. Our personal data will be stored in so-called ‘data pods’ that we control ourselves. In an ideal world, we will have multiple data pods per user, e.g. one for personal data, one for work-related information, one for official documents, etc. Everything we post will be saved in one of our own data pods, which are stored on a server of our choice. 

To enable something like a social network, we would of course need to share some of our data. Only, in the decentralized web of 2035, we will get to choose – for each application separately – which data pod we want to make available. 

We will also remain the owners of our data: no copies will be made. Instead, we will simply link the application to the parts of our data pod we are willing to open up. To make this possible, one important condition has to be met: applications must be able to reuse data created by other applications.
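As a rough illustration of this idea, the sketch below models a data pod whose owner decides, per application, which parts may be read. It is a minimal sketch under stated assumptions: the `DataPod` class, its method names, and the resource paths are hypothetical and invented for this example. They are not the API of Solid or of any existing pod server.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataPod:
    """A personal data store controlled by its owner (hypothetical model)."""

    owner: str
    # Resource path -> content. The data lives here, not inside any application.
    resources: dict[str, str] = field(default_factory=dict)
    # Application name -> set of resource paths that application may read.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def write(self, path: str, content: str) -> None:
        """Store or update a resource in the pod."""
        self.resources[path] = content

    def grant(self, app: str, path: str) -> None:
        """Allow one application to read one resource."""
        self.grants.setdefault(app, set()).add(path)

    def revoke(self, app: str, path: str) -> None:
        """Withdraw a previously granted permission."""
        self.grants.get(app, set()).discard(path)

    def read(self, app: str, path: str) -> str:
        """Return a resource to an application, but only if access was granted."""
        if path not in self.grants.get(app, set()):
            raise PermissionError(f"{app} may not read {path}")
        return self.resources[path]


# Usage: the owner opens up one post to one (hypothetical) application.
pod = DataPod(owner="alice")
pod.write("posts/hello", "Hello from my own pod")
pod.write("documents/passport", "private")
pod.grant("app-a", "posts/hello")
print(pod.read("app-a", "posts/hello"))
```

The application only ever holds a permission to read, never a copy it owns: once the owner calls `revoke`, the next `read` fails.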

To explain this with a present-day example: suppose all your friends are on Facebook, but you prefer a less popular alternative like Ello. In 2019, choosing Ello would imply that you cannot interact with any of your friends (unless you are a trendsetter and can convince them to join you on Ello, but then we are merely replacing one point of centralization with another). In 2035, it will not matter which social networking site your friends use. As applications will no longer own any data, they will merely become different interfaces for viewing the information in your friends’ data pods. So, on Ello, you would be able to see, and interact with, the posts your friends created through Facebook. Switching between applications would thus become easy, and competition between them would be based on service quality rather than data ownership.
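The notion of applications as interchangeable interfaces over the same data can be sketched in a few lines. This is only an illustration under assumptions: the `Pod` class and the two view functions are hypothetical stand-ins for two competing social apps, and access control is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """A friend's data pod holding their posts (hypothetical model)."""

    owner: str
    posts: list = field(default_factory=list)


def collect_posts(friend_pods):
    """Read posts straight from the pods; no application owns a copy."""
    return [(pod.owner, text) for pod in friend_pods for text in pod.posts]


def timeline_view(friend_pods):
    """Hypothetical app A: a flat timeline, one line per post."""
    return [f"{owner}: {text}" for owner, text in collect_posts(friend_pods)]


def digest_view(friend_pods):
    """Hypothetical app B: the same posts, grouped per friend."""
    digest = {}
    for owner, text in collect_posts(friend_pods):
        digest.setdefault(owner, []).append(text)
    return digest


# Both apps show the same posts, whichever app was used to write them.
friends = [Pod("ann", ["Off to the coast"]), Pod("ben", ["New job!", "Coffee?"])]
print(timeline_view(friends))
print(digest_view(friends))
```

Because both views read from the same pods, a user could switch from one to the other without losing a single post, so the apps can only compete on how well they present and handle the data.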

Currently, half of the internet is hidden in data silos belonging to a few companies. It is almost as if your browser determined which websites you can access. Imagine Safari users being unable to see any website created by a Google Chrome user. This makes as little sense as a social network site determining which friends or contacts you can connect with.

In a decentralized vision, data storage and applications will become two independent markets. Social network sites will try to outperform each other with the user-friendliness of their services rather than with the sheer number of users they have. There will also be a new market for servers that offer data storage.

The end of the big data age: then what?

Decentralizing the web disrupts the big data business model. It changes the way applications compete: instead of attracting new members by already having a lot of data about current members, they will have to convince clients with innovation, e.g. a user-friendly interface, the best customer service, flawless data security, etc. And, more importantly, they will also have to make money another way, e.g. by charging an annual subscription fee. Or perhaps some will give you the option to pay a fee or to (consciously) sell your data. The point is that this way of thinking creates choice and diversity, and thereby opportunity.

The effects of reclaiming control over our data will of course go far beyond the reach of social media. For example, most consumers today do not really know what information their supermarket has on them, let alone exert any control over it. In a decentralized system, customers will own all their data. At first sight, this might seem a big disadvantage for supermarkets, but in fact it is also an opportunity to get access to more data than they could possibly collect on their own. It becomes a trading game: customers might agree to share all their personal shopping data, from every supermarket, in return for personalized bargains. If supermarket Y notices that you always buy yoghurt in a competitor’s store, it could offer you a personalized discount on its dairy products. Access to this data allows supermarkets to analyze their clients’ complete shopping behavior, giving them valuable insights to optimize their offer.

Why is now a good time to reclaim our privacy?

Moving, or actually returning, towards a decentralized web is first and foremost a matter of convincing people that it is important to do so. The growing concern regarding data privacy is definitely a step in the right direction. In 2018, the European Union also sent an important signal when its General Data Protection Regulation (GDPR) came into effect. The regulation aims to give individuals more control over their personal data.

Regulations like the GDPR also make it more expensive for companies to own data, which could stimulate them to move towards a different business model. This holds especially for companies that only use big data as a means to an end, not as a central component of their business model.

Companies will need to start thinking differently about data. Like oil, data should not be kept and stored, but rather become the substance that makes the engines run. 

Interestingly, giving people back control over their own data does not mean there will be less data available. In fact, in a decentralized view, there will be even more, because companies will also be able to request access to data harvested by competitors.

 

How is imec contributing to this future?

Of course, there are also a number of technological challenges associated with a decentralized web. Sharing and owning big data is easy if everything is stored in the same place (as it is now). But how do we enable a big data application – like a social network – if users’ information and posts are scattered across millions of individual servers? 

The main challenge is that a decentralized web requires significantly more processing power and network bandwidth, because its algorithms are more complex. At the same time, individual data stores are obviously a lot less powerful than large-scale data centers. The solution lies in collaboration: every single personal data store as well as every device (e.g. computers, smartphones, tablets, etc.) has its own CPUs. Today, these are actually idle most of the time. At imec, we are exploring how these CPUs can work together – when they are not being used by their owner – to enable a decentralized web. If they collaborate, their collective computational power and bandwidth would be enough to meet the increased complexity of decentralized algorithms. To make sure this collaboration happens in a trustworthy way, we are exploring a system of distributed ledgers based on blockchain technology.
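To give a feel for what collaboration between many small data stores could look like, the sketch below lets every pod compute a small partial result on its own posts, after which the partial results are merged. This is a simplified illustration under assumptions and not imec's actual approach: the function names and the hashtag-counting task are hypothetical, threads merely stand in for separate devices, and the trust layer (distributed ledgers) is left out entirely.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_hashtag_counts(posts):
    """Work done by one pod on its own data: count hashtags in its posts."""
    counts = Counter()
    for post in posts:
        for word in post.split():
            if word.startswith("#"):
                counts[word.lower()] += 1
    return counts


def combined_hashtag_counts(pods):
    """Let every pod compute its share, then merge the small partial results.

    `pods` maps an owner name to that owner's list of posts. Threads stand in
    for the separate devices that would do this work in a real deployment.
    """
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_hashtag_counts, pods.values()))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


pods = {
    "ann": ["Reading about #Solid", "Long live the #web"],
    "ben": ["Trying out #solid today"],
}
print(combined_hashtag_counts(pods).most_common())
```

Only the compact counts leave each pod, not the posts themselves, which hints at how scattered data could still feed an application that needs an overall picture.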


Want to know more? 

  • For more information on Ruben Verborgh’s view on the decentralized web, take a look at his personal blog.
     

This article is part of a special edition of imec magazine. To celebrate imec's 35th anniversary, we try to envisage how technology will have transformed our society in 2035.

About Ruben Verborgh

Ruben Verborgh is a researcher in semantic hypermedia at IDLab, an imec research group at UGent, where he obtained his PhD in Computer Science in 2014. He explores the connection between Semantic Web technologies and the Web's architectural properties, with the ultimate goal of building more intelligent clients. Along the way, he became fascinated by Linked Data, REST/hypermedia, Web APIs, and related technologies. He is a co-author of two books on Linked Data, and has written several publications on Web-related topics in international journals. In 2018, he joined Inrupt, the organization created by Sir Tim Berners-Lee, which aims to create an ecosystem – called Solid – that enables the decentralization of the web.
 
