In this blog I will discuss two computer languages, Cloudlog and Cedalion. The former is a query language and the latter is a programming language, but both are purely declarative. This means that neither language has commands. Instead, they have declarations, definitions and rules of different kinds. We will discuss these languages in depth in future posts, but in this post I would like to talk about what declarative languages are, and why we need them.
What Are Declarative Languages?
The most common answer you’ll find is that unlike an imperative language (such as C, C++, Java, Python and probably most other programming languages you know), where you define how your program will work, in a declarative language you define what your program needs to do. I find this answer somewhat misleading, as it implies that declarative languages are magical, able to solve problems that are typically solved by a programmer. You just feed in the “what” (the spec for your software), and they will take care of the “how” for you.
This interpretation of the term is obviously not true. As a programmer writing code in declarative programming languages you should be aware of the “how” just as much as when you are writing code in imperative languages. To explain what declarative programming languages really are all about, I’ll use a cake recipe example.
A Declarative Cake Recipe
Cake recipes are typically imperative. They provide step-by-step instructions for how to bake a cake. They start with a (somewhat declarative) list of input ingredients, and then tell you to do this and then that, and then if it is so then add that, otherwise don’t, and at the end, put it in the oven. Recipes work like this because we do. We usually find it easier to follow step-by-step instructions than to plan our actions based on a description of what a cake is.
An alternative to a step-by-step recipe is describing the cake by what is sometimes called a product tree. For example:
Cake = Dough, baked at 180 deg. Celsius for 45 minutes.
Dough = Dry mass + wet mass, mixed for 2 minutes in a mixer.
Dry mass = 1.5 cups flour + 0.5 cups sugar + 1 tablespoon baking powder
Wet mass = ...
This declarative recipe conveys all the information we need to make a cake, just like the step-by-step instructions. However, it does not tell us what to do before what. For example, it does not tell us whether to first mix the dry mass or the wet mass. It just tells us what needs to be done, and leaves it to us to figure out the sequence. Readers familiar with graph algorithms can see that this description is a tree (or a DAG, in the more general case where one ingredient can be used more than once in a recipe), and the algorithm we need to apply to this tree (or DAG) in order to get the step-by-step instructions is topological sort, which is a simple and efficient algorithm. As people, we find it hard to follow such algorithms, and therefore having the step-by-step instructions spelled out for us is usually a good thing. However, in programming, we are not the ones baking the cake. The computer is. We are the ones writing the recipe. So this raises the question: are we really better than a computer at computing a topological sort on a graph?
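To make the point concrete, here is a minimal Python sketch of turning the product tree into step-by-step order. The dictionary layout and the names PRODUCT_TREE and recipe_order are my own assumptions for illustration, quantities and processing steps are left out, and the wet mass has no ingredients listed because the recipe above elides them. The sort is a plain depth-first, post-order traversal, which is one common way to implement topological sort.

```python
# Each product maps to the ingredients it is made from (assumed layout).
# "wet mass" has no entries because the recipe above leaves them out.
PRODUCT_TREE = {
    "cake": ["dough"],
    "dough": ["dry mass", "wet mass"],
    "dry mass": ["flour", "sugar", "baking powder"],
    "wet mass": [],
}


def recipe_order(tree, target):
    """Return items ordered so that every ingredient precedes its product."""
    order = []
    done = set()
    in_progress = set()

    def visit(item):
        if item in done:
            return
        if item in in_progress:
            raise ValueError(f"cyclic recipe at {item!r}")
        in_progress.add(item)
        # Raw ingredients (flour, sugar, ...) have no entry in the tree.
        for ingredient in tree.get(item, []):
            visit(ingredient)
        in_progress.discard(item)
        done.add(item)
        order.append(item)  # emitted only after all of its ingredients

    visit(target)
    return order


steps = recipe_order(PRODUCT_TREE, "cake")
print(steps)
# ['flour', 'sugar', 'baking powder', 'dry mass', 'wet mass', 'dough', 'cake']
```

The order between the dry mass and the wet mass falls out of the traversal, not out of anything the recipe author wrote. Any order that puts ingredients before products would be equally valid.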
It’s an Imperative World. But Should it Be?
Up until von Neumann, the computing world was declarative. To this day, digital design is done using languages such as Verilog and VHDL, which are declarative. They define the output of a digital system as a function of its inputs, using logic gates. State (an essential part of imperative programming) appeared when memory was invented. Still, the abstractions invented to reason about the state of digital systems, such as state machines, were declarative.
Imperative programming was not discovered. It was invented, with the invention of the machine instruction by John von Neumann in the 1940s. His greatest contribution to computing was his “stored program” model, the idea that the program does not have to be hard-wired into the machine running it. It can be stored in the computer’s memory as data. Von Neumann needed a simple way to express the program, and found a sequence of instructions to be just that. The idea was simple: instructions are sets of bits stored in memory. A register (the program counter, or PC) points to the next instruction to be evaluated, and the digital hardware evaluates each instruction, modifying memory, registers and potentially the PC itself, which by default is incremented one instruction forward.
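The fetch-and-evaluate cycle can be sketched in a few lines of Python. This is only an illustration of the idea, not a model of any real machine: the instruction names (LOAD, ADD, STORE, JUMP, HALT), the single accumulator register and the memory layout are all made up for this example.

```python
def run(memory):
    """Fetch-execute loop for a tiny, made-up instruction set.

    `memory` is one list holding both instructions (tuples) and data (ints),
    which is the essence of the stored-program idea.
    """
    pc = 0   # program counter: address of the next instruction
    acc = 0  # a single accumulator register
    while True:
        op, arg = memory[pc]  # fetch the instruction the PC points to
        pc += 1               # by default, advance one instruction forward
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "JUMP":
            pc = arg          # an instruction may overwrite the PC itself
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction {op!r}")


program = [
    ("LOAD", 6),     # 0: acc = memory[6]
    ("JUMP", 3),     # 1: skip the instruction at address 2
    ("ADD", 7),      # 2: never executed
    ("ADD", 7),      # 3: acc += memory[7]
    ("STORE", 8),    # 4: memory[8] = acc
    ("HALT", None),  # 5: stop
    2,               # 6: data
    3,               # 7: data
    0,               # 8: the result is stored here
]

result = run(program)
print(result[8])  # prints 5
```

Note that the program and its data live in the same list, and that the meaning of the program depends entirely on the order in which the instructions are visited.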
Von Neumann’s model was a huge leap for computing, making one machine capable of doing multiple different things, and reducing the amount of hardware needed to perform complicated tasks. As computing evolved over the course of the 70 years that followed, new programming languages placed more and more abstractions over machine instructions. However, the step-by-step nature of programming is still at the heart and soul of practically all popular programming languages.
Von Neumann’s model was the right thing for the 1940s, when the price of hardware was the limiting factor in computing. Today we have pretty good software running in data centers with hundreds or thousands of computers, each performing billions of instructions per second. Our bottlenecks have shifted. Today the two main limiting factors in computing are the size of the data and the size of the software.
With cloud suppliers such as AWS and Google, it is easy (and relatively cheap) to add more computers to scale your application. If your application is a Web application or a mobile application that stores its data on a server, you can solve most of your scalability problems by just adding more servers to serve more requests. Most of your problems, not all of them. The one piece of your application that will not scale by default is the database. This is where traffic from all the sessions and all the users will meet. This is the single point of failure of your application and the one thing that really cares about how big your data is. Solutions that run super fast for 100 users can be slow for 10,000 users, and will probably just not work for 1,000,000 users. The need to cope with data size was the main incentive behind the NoSQL movement. NoSQL databases came to replace the mostly-declarative SQL, and to provide performance they require de-normalization that is typically done in an imperative fashion. To me, this is a stride in the wrong direction. I’ll discuss this more in future posts when talking about NoDatalog and Cloudlog.
Program size is another limiting factor in computing. As software grows in size, it becomes more complex. This complexity is caused by interactions between different software components, and by assumptions made early in the process that later turn out to be wrong or inaccurate, and that we then have to work around to avoid rewriting big chunks of code. Throughout history, the software industry has come up with ways to break large pieces of software into smaller modules. Object-oriented programming (OOP) had that as one of its primary goals, and today we see this in microservices, which are becoming the new way of building large software.
When breaking software into smaller modules we reduce the complexity that comes from interactions within a software unit at the cost of having to design the right interfaces for our modules. These interfaces, often called APIs, have a crucial role in determining the success of our design. If we get these APIs wrong and expose too much (or the wrong things), it makes it harder for our design to evolve.
Abstraction is the key to simplifying complex systems. Placing modules behind interfaces is just one form of abstraction. The programming language we use is by itself an abstraction over the machine on which the program runs, and it gives us the ability to define further abstractions, such as functions, classes and even variables (anything that can be named).
Domain-specific languages (DSLs) are computer languages that provide abstractions for a specific problem domain. HTML, for example, provides abstractions for parts of a web page, while CSS provides abstractions for styling features. Traditionally, building languages (DSLs included) is considered hard. However, in the functional-programming community there is a long tradition of so-called embedded or internal DSLs. These are DSLs written from within a general-purpose host language (such as Lisp or Haskell), so that the compiler or interpreter of the host language can be leveraged, and so can editors, debuggers etc. At the same time, such languages provide (within the limitations of the host language) syntax and semantics that resemble the problem domain. Interestingly, these DSLs are mostly declarative, and so are (often) the languages hosting them. In a future post I will discuss the Cedalion programming language, and how it takes the best of DSLs to allow complex software to be written in a (relatively) simple way.
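As a rough illustration of what an internal DSL looks like, here is a small sketch that borrows the cake example. It uses Python as the host language rather than Lisp or Haskell, and everything in it (the Product class, the called and leaves methods, the use of the + operator) is a hypothetical design made up for this post, not taken from any existing library or from Cedalion. Quantities are left out to keep it short.

```python
class Product:
    """A node in a product tree; '+' combines ingredients declaratively."""

    def __init__(self, name, parts=(), process=None):
        self.name = name          # None marks a combination not yet named
        self.parts = list(parts)
        self.process = process

    def __add__(self, other):
        # Flatten chains such as a + b + c into one unnamed combination.
        left = self.parts if self.name is None else [self]
        right = other.parts if other.name is None else [other]
        return Product(None, left + right)

    def called(self, name, process=None):
        """Give a combination a name and, optionally, a processing step."""
        return Product(name, self.parts, process)

    def leaves(self):
        """List the raw ingredients under this product, left to right."""
        if not self.parts:
            return [self.name]
        return [leaf for part in self.parts for leaf in part.leaves()]


flour = Product("flour")
sugar = Product("sugar")
baking_powder = Product("baking powder")
wet_mass = Product("wet mass")  # its contents are elided in the recipe

# These lines read much like the product tree itself.
dry_mass = (flour + sugar + baking_powder).called("dry mass")
dough = (dry_mass + wet_mass).called("dough", process="mixed for 2 minutes")
cake = Product("cake", [dough], process="baked at 180 deg. Celsius for 45 minutes")

print(cake.leaves())
# ['flour', 'sugar', 'baking powder', 'wet mass']
```

The host language supplies the parser, the operators and the tooling. The DSL only supplies the vocabulary, and the definitions say what a cake is made of without saying in which order to make it.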
OK, I wrote a lot of words, but did not say much… except give some forward pointers to posts I haven’t yet written…
The bottom line is, we live in an imperative world. It is not so because of the laws of nature or because some God has made it so. It is so because we made it so, because imperative programming was an easier way to get hardware size down in the 1940s and 1950s. Today, when applications are massively parallel and distributed, when data sizes are well beyond what a single computer can handle and program size is often enormous, our challenges are different. Taking a cake product tree and turning it into a recipe is an easy task for a computer. Why should we do it ourselves?