In this blog post I want to give an example of how declarative programming can not only make developers' lives easier, but also help users keep their private data private. The gory technical details behind this post can be found in our Onward! 15 paper.
Who Owns our Data?
The cloudalion website runs on WordPress, hosted by WordPress.com. In total, I have spent quite some time putting content into this website, and I expect that effort not to go to waste: if I decide to move to another provider, or maybe even another platform, I do not want to find myself having to write everything all over again.
WordPress.com allows me to download the contents of my website as a big .xml file. At one point I tried uploading this file to another WordPress provider, and failed due to some bug. I believe that particular bug can be fixed, but the experience made one thing very clear: even when the cloud service provider has the best of intentions, ownership of the data is a real problem.
On the cloudalion website, all content is intended to be shared with the world. Even drafts (which are private) are written so that one day they will be made public. So if a bug in WordPress caused a draft I wrote to be published without my consent, it would not be the end of the world for me. At worst, you would see a truncated post with a few more typos than usual, but probably nothing I would really regret writing. Social media, in contrast, is full of exactly the kind of content we would regret seeing exposed. People use social networks to talk to friends, gossip, congratulate, criticize, etc. We write things to people in confidence, things not intended for the world to see. When we do, we want a guarantee that what we write will not go beyond the circle we intended to share it with.
We do not always know, understand or appreciate the implications of the privacy policies of the websites we use. These are typically long legal documents that most websites do not take the time to explain in layman's terms.
Conclusion: be careful with what you share online…
OK, sorry, got carried away a bit. This is not that kind of post. I’m not trying to scare you or anything. I wrote the above just to make the point that ownership of data is a real problem, one that affects us all. What can we do about it? This is what this post is all about.
The Three-Tier Problem
Traditionally, when we think about data in Web applications, we think mostly about numbers and strings, stored in a database that helps us retrieve them but not interpret them. Because the database does not interpret the data, it does not know by itself which pieces of data a user is allowed to see, and which modifications a user is allowed to make.
These responsibilities traditionally rest with the business logic of the application. In the three-tier architecture, the business logic is the part that runs custom code on the server side. This is in contrast to the data tier, which (typically) runs on an off-the-shelf database, and the presentation tier, which runs on the client. One of the most prominent roles of the business logic tier is to decide what a user is and is not allowed to do.
Because the database holds plain data, the business logic has unrestricted access to that data, and can read or write whatever it sees fit. It therefore acts as the gatekeeper, deciding what the client can or cannot do, and what it can or cannot see.
However, with great power comes great responsibility. By having unrestricted access to the data, the business logic tier has unlimited opportunities for bugs that leak sensitive information or unlawfully modify the data.
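To make the gatekeeper role concrete, here is a minimal, hypothetical sketch in Python (the data layout and the function names `visible_posts` and `search_posts` are invented for illustration). The "data tier" is a plain dictionary that enforces nothing, so every business-logic entry point must remember to check permissions itself, and a single forgotten check is enough to leak private data.

```python
# Hypothetical three-tier sketch: the "data tier" is a plain dictionary.
# It stores values but knows nothing about who may read them.
DATABASE = {
    "posts": [
        {"id": 1, "author": "alice", "text": "Hello world!", "public": True},
        {"id": 2, "author": "alice", "text": "Just between us...", "public": False},
    ]
}


def visible_posts(user):
    """Business logic: decides which posts `user` may see."""
    return [
        p for p in DATABASE["posts"]
        if p["public"] or p["author"] == user
    ]


def search_posts(user, word):
    """Another business-logic entry point that forgot the access check.

    Because the data tier enforces nothing, this one omission leaks
    private posts to anyone who searches for the right word.
    """
    return [p for p in DATABASE["posts"] if word in p["text"]]


print([p["id"] for p in visible_posts("bob")])            # → [1]
print([p["id"] for p in search_posts("bob", "between")])  # → [2] (a leak)
```

Nothing in the data tier can catch the bug in `search_posts`; correctness depends entirely on every piece of custom server code getting the check right.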
Just Like Politics
One can compare the presumption of getting access control right in the business logic to the presumption that a small group of people knows what is good for an entire nation. This model was practiced for centuries in monarchies, where kings and queens, apart from caring for themselves and their families, usually acted out of what they truly believed to be the best interest of their kingdom.
Under communism, the Communist Party was a body that collectively made all policy decisions for the country, based on the assumption that the people are driven by self-interest while the party is driven by the common good.
In both cases, the group in power has full control over the country. They have access to all information, and the power to use the armed forces and law enforcement at their will.
These models have two major flaws. First, we can only hope the group in power actually works in the people's best interest. There is no mechanism to ensure that even a majority of the people are satisfied with even some of what their government is doing. The second flaw is that even if the king or head of the party has the best intentions at heart, there is no mechanism to oversee the execution of their orders, other than the mechanisms they set up themselves. Unfortunately, many kings in history preferred to hide the fact that their family members conspired against them, just to spare themselves the shame.
In the cloud, this second flaw becomes a problem of enforcement. How do we know the software actually does what the privacy policy says? Can we trust the company to allocate enough resources to enforce its policy well enough?
In politics, these two problems are typically addressed by democratic regimes. The first problem is solved by elections, in which the people, either directly or indirectly, elect their leaders for time-limited terms. The need to be elected motivates leaders to at least appear to represent the interests of their voters.
The solution to the second problem is twofold. One part is separation of powers, in which the government itself is split into (typically) three branches, elected in different ways, with different authorities, and with the responsibility to scrutinize one another. The other is transparency, in the form of free speech, freedom of the press and freedom of information, which gives citizens the ability to scrutinize their elected leaders and hold them to what they promised to do once in office.
There are a lot of flaws in democracy and in how it is implemented in many countries. Nevertheless, it is hard to argue that it is not a big improvement over the alternatives. We therefore ask: can we apply democracy to the cloud?
The Democratic Cloud
So how can we allow Web applications to share information between people, while keeping some information private, without giving them the power to corrupt?
Just like in politics, our answer is separation of powers. Remember the pre-cloud era, when people bought (or downloaded, or copied) software and installed it on their PCs? Back then there was a clear separation of powers between those who created the software (e.g., Microsoft) and those who ran it (ourselves). Today these two roles are played by the same party: Google, Facebook and Twitter each develop their software and also provide it as a service. This duality makes them the owners of our data, and hinders transparency with regard to what they do with it.
So our first step in making a democratic cloud is separating these roles into two different legal entities (e.g., two companies). One will provide the software (SaaS), and the other will serve it, on top of a platform it provides (PaaS).
In a way, the cloud already operates this way. Many SaaS providers already base their services on PaaS offered by others. However, in the existing model, the PaaS is totally hidden from the end user, and the SaaS provider has full control over both the data and the software.
In the model we propose, both the SaaS provider and the end user are customers of the PaaS provider. Under this model, the PaaS provider is paid both to uphold the profitability of the SaaS and to enforce the rights of the end users. This balance between the two powers is, we believe, the key to a “democratic cloud”.
At this point, you must be asking yourself two major questions. One is technical: how can we implement business logic in a way in which no one knows everything, and no one is able to do everything? The other is of a business nature: what could motivate today's SaaS providers to move to such a model, in which they lose their control over the data? In the remainder of this post I'll give a brief answer to each question.
Facts and Rules as Messages
The answer to the technical question is that we treat our data and our business logic as facts and rules, an idea I have advocated on this blog time and again. But we do not stop there. We put each fact and each rule in an “envelope”, and on this envelope we write the name of the recipient, as well as the name (or names) of the sender(s). Then we drop it into a big tank, where it mixes with all the other envelopes.
We only allow users to post “messages” (facts or rules in envelopes) in which they list themselves among the senders. When a user queries the database, we only show him or her results for which he or she is the recipient. All we need to add to make this a complete access control system is the logic of what happens when we apply a rule to a fact. Without going into the details (those are in the paper), if we define this sensibly, we get a simple mechanism that allows a third party (the PaaS provider) to enforce access control without knowing anything about the application, and in fact without making any special case for it. The SaaS provider is just another user who publishes rules…
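The two enforcement rules above (post only as a sender, see only what is addressed to you) are simple enough to sketch. The following Python sketch is a hypothetical illustration, not the cloudalion implementation: the `Message` and `Tank` names are invented here, content is just a string, and the logic for applying rules to facts is deliberately omitted since the post leaves it to the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    """A fact or rule in an "envelope" (hypothetical representation)."""
    content: str          # the fact or rule itself, kept opaque here
    senders: frozenset    # one or more senders
    recipient: str        # a single recipient


@dataclass
class Tank:
    """The shared pool of envelopes, run by the PaaS provider.

    It never inspects `content`, so it needs no knowledge of the
    application. Rule application is NOT modeled in this sketch.
    """
    messages: list = field(default_factory=list)

    def post(self, user, message):
        # A user may only post messages that list them among the senders.
        if user not in message.senders:
            raise PermissionError(f"{user} is not a sender of this message")
        self.messages.append(message)

    def query(self, user):
        # A user only sees messages addressed to them.
        return [m for m in self.messages if m.recipient == user]


tank = Tank()
tank.post("alice", Message("alice likes cats", frozenset({"alice"}), "bob"))
print([m.content for m in tank.query("bob")])    # → ['alice likes cats']
print(tank.query("carol"))                       # → []

try:
    # mallory cannot forge a message in alice's name
    tank.post("mallory", Message("forged", frozenset({"alice"}), "bob"))
except PermissionError as err:
    print(err)                                   # → mallory is not a sender of this message
```

Note that the SaaS provider would use exactly the same `post` and `query` calls as any other user; the tank has no special case for it.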
The Business Incentive
It is pretty obvious why users would prefer this model, but why would SaaS providers?
Suppose your favourite social network uses non-anonymized user data for analytics. It has to make that data available to itself somehow, which it can do by putting it in an envelope addressed to itself (or to someone who performs analytics on its behalf). However, if it does this, users can see it by looking at the logs provided by the PaaS provider, and report this abuse of private data.
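As a rough illustration of how such logs could expose this, here is a hypothetical Python sketch. The post does not specify a log format, so the `Delivery` record, the `audit` function and the `analytics-co` recipient are all assumptions made for this example: each entry records who sent the data and who received it, and a user filters for deliveries of their data outside the circle they meant to share with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delivery:
    """One hypothetical log entry: a message whose senders include
    `senders` was delivered to `recipient`."""
    senders: frozenset
    recipient: str
    content: str


def audit(log, user, intended_recipients):
    """Return deliveries of `user`'s data to anyone outside the circle
    `user` intended to share with."""
    return [
        d for d in log
        if user in d.senders and d.recipient not in intended_recipients
    ]


log = [
    Delivery(frozenset({"alice"}), "bob", "alice likes cats"),
    Delivery(frozenset({"alice"}), "analytics-co", "alice likes cats"),
    Delivery(frozenset({"carol"}), "dave", "carol is on vacation"),
]

for d in audit(log, "alice", {"alice", "bob"}):
    print(f"{d.content!r} was delivered to {d.recipient}")
# → 'alice likes cats' was delivered to analytics-co
```

The point of the sketch is that the check requires no cooperation from the SaaS provider: the log is kept by the PaaS provider, so any envelope the SaaS addresses to itself shows up there.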
This can lead SaaS providers to improve their privacy practices. But why would they want the trouble?
I think the biggest selling point is that it takes a big hassle off their backs. Today SaaS providers are legally responsible for preserving user privacy. Legal requirements vary from country to country, but at the very least, SaaS providers can be sued for not complying with their own privacy policies.
By transferring control over user data to the PaaS providers, they shed much of this burden. They are not responsible for user data, because they do not have access to it (except on the client side, where they do not retain it). The burden moves to the PaaS provider, who solves the problem once, for all applications.
There is a way to make the Internet more democratic. It is not something one can do in a day, or even in a year. This transition requires a change of mindset more than anything else, and such changes tend to be tough to make. However, making this transition has a lot of benefits for both developers and users. Creating the software to support it is one of the top priorities of the cloudalion project, but getting the software there is just part of the deal. People need to be convinced of the benefits before they embark on such an adventure. To gain this confidence we need to get the software out there and start using it for gradually bigger projects. This too is our goal, which we hope to achieve with the help of the community.