In my previous post I wrote about the two insights that lead to the development of Cedalion, and the first of them was realizing the true power of logic programming. In this post I would like to explain this.
I am assuming most reader have heard about logic programming, and maybe some of the readers even studied Prolog at one point in their life, but I cannot assume readers remember much of it. So I’ll try to take it slow…
It’s all very Logical, Really…
There are a few ways one can program a computer. You can tell the computer what to do, in a step-by-step manner, like this:
x = 10 s = 0 while x > 0: x = x - 1 s = s + x
This is called imperative programming. Alternatively, you can tell the computer what is the value of an expression, like this:
square(x) = x * x quad(x) = square(x) * square(x)
This is called functional programming, which is inspired by Algebra. Logic programming is yet another way to program a computer. This time, the inspiration comes from Mathematical Logic.
I don’t want to go into Mathematical Logic since it is a really big and deep field of study, and most of it is irrelevant to computing, because it is simply too abstract. However, one sub-field, named First Order Logic was simple and practical enough for computer scientists in the 1970’s to come up with a programming paradigm and a programming language that were based on it.
In this post I don’t want to dive too deep into the mathematics. I just want to give the intuition of how it works. In logic, a theory is a set of axioms — sentences taken to be true. For example, Euclidean Geometry can be considered a theory with five axioms, one of which states that one can draw a straight line from any point to any point. Logic then defines how a set of axioms can derive theorems. For example, the theorem that two triangles with respectively-equal edges also have respectively-equal angles is a theorem that can be derived from the axioms of Euclidean Geometry. How is it derived? Well, through logic… Most commonly, through one important derivation rule named Modus Ponens (or MP for short). It states that:
If P implies Q and P is true, then Q is true.
For example, if you know for fact (axiom or theorem) that every banana is yellow (for every X, X is a banana implies X is yellow), and you know that Y is a banana, then you know that Y is yellow.
The axioms of Euclidean Geometry can be stated as “P implies Q” statements for some P and Q. For example:
X is a point and Y is a point implies there exist line Z such that Z goes through X and Y.
Prolog: A Gentle Introduction
Prolog was the first logic programming language, and is by far the most popular one. A Prolog program corresponds, more or less, to a logic theory. This means it consists of axioms. Some of these axioms are facts, and some are rules. Running a program is done by presenting a goal — something Prolog will have to try to determine if it is a theorem (i.e., can be proven) or not (cannot be proven).
For example, consider a “theory” consisting of parent-child relationships between people, say Star Wars characters:
father(anakin, luke). father(anakin, lea). mother(lea, ben). % Revealed in Episode VII
These are Prolog facts. They are written in a function-like manner, because this is Prolog’s syntax, but they should be read as:
anakin is luke's father. anakin is lea's father. lea is ben's mother.
The function-like syntax is simply easier for Prolog to understand. What we actually did was defined two predicates: father and mother. A predicate is a relationship between things, in this case, two Star Wars characters.
In Episode V, after Darth Vader (Anakin) told Luke he was his father, confused Luke could have loaded this program to a Prolog interpreter (they already had them in 1980 when the movie came out) on one of the Millennium Falcon‘s computers, and type:
?- father(darth, luke).
The answer would have been:
This is because we do not have the axioms to tell Luke that Darth Vader is indeed Anaking Skywalker.
alias(darth, anakin). father(X, Y) :- alias(X, X1), father(X1, Y).
OK, so we added a fact saying Darth is actually Anakin (in a new predicate named alias), but what about the second line? This is a rule.
A Prolog rule has the form:
Head :- Body
The head is a goal, and the body consists of one or more goals. A goal is kind of like a fact, but instead of being something we say to be true, a goal is a question we ask. The rule:
father(X, Y) :- alias(X, X1), father(X1, Y).
can be read as:
X is Y's father if X is an alias of X1, and X1 is Y's father
Basically, “:-” can be read as “if” and “,” can be read as “and”. X, X1 and Y are what we call logic variables. Unlike the names of our characters, which we start with a small letter (“luke” and not “Luke”), variables start with capital letters. Variables are placeholders that can be replaced by any value. In our example, X will be replaced by darth, Y will be replaced by luke, and X1 will be replaced by anakin.
Rules, just like facts, are axioms. They are part of our theory (program) and Prolog takes them into account when answering questions (goals). So now, when Luke asks:
?- father(darth, luke).
Prolog will answer:
How did it do it? Simple! It tried to match father(darth, luke) with all the axiom it had. None of the facts matched, so it tried the rule. It tried to match the head of the rule: father(X, Y) to father(darth, luke), and guess what: it succeeded. Variables can match any value, so X matched darth, and Y matched luke.
But we are not done. For this to be true, the body of the rule must be true as well. The body of this rule consists of two goals, connected with an “and” relationship, meaning they both must be true for the body to be true. The first goal is
but X is darth, so we are actually looking at
which looks for all names for which darth is an alias. This matches the fact:
So we record that X1 is anakin and move forward to the next goal:
which is actually:
because X1 is anakin, and Y is luke. Now we are in luck, because there is a fact just like that.
Doing Something Useful
So far, Prolog does not seem all that useful. It helps us decide if things are “true” or not (with our subjective definition of truth), but it does not allow us to learn new things. We cannot calculate things, we cannot discover new relationships, etc.
Or can’t we? For example, instead of asking whether Darth is Luke’s father, Luke could have asked “who is my father”? Like so:
?- father(X, luke). X = anakin ; X = darth ; false.
By supplying a variable (X) in place of a concrete value (darth), Luke could get all values that match. So Prolog gives us both answers: Anakin and Darth. The “false” at the end tells us there are no more answers.
We can, for example, define what an ancestor is:
ancestor(X, X). ancestor(X, Z) :- fatherA(X, Y), ancestor(Y, Z). ancestor(X, Z) :- mother(X, Y), ancestor(Y, Z).
And then we can ask who are Ben’s ancestors:
?- ancestor(X, ben). X = ben ; X = anakin ; X = darth ; X = lea ; false.
And get four answers, representing three different people.
How is That Useful for Languages?
OK, so in the previous post I said Prolog was good for defining Domain-Specific Languages (DSLs), because of two unique properties. The second property was non-determinism: the property that a questions can have any number of answers, including no answer at all. Indeed, the answers we got from Prolog varied from
meaning no answers, to
X = ben ; X = anakin ; X = darth ; X = lea ; false.
meaning four different answers.
This is called non determinism, because it comes in contrast to the determinism of all other programming paradigms. In imperative programming, for example, each command that is executed completes its execution exactly once. Each functional expression evaluates to exactly one value. In logic programming, a goal can have zero or more results.
The first property was that code and data are the same. Look at our example: what is the data here? One can say that facts are data, and rules are code. This can make sense, but is it really true?
Look at the following fact from our program:
It is a fact, because it has no Head :- Body structure, but can we consider it as data? It is most certainly part of our program: the part that states that any person is his or her own ancestor.
So in Prolog queries are non-deterministic and there is no separation of “data” and “code”. But how is it useful to define languages (DSLs)?
If you were to implement a DSL, how would you have done it?
This is not an easy question to answer, especially for people who have not done this before. But I think the most common answer would have been — implement an interpreter. An interpreter is a program that takes as input another program and executes it. A Python interpreter (CPython, for example) is a program that takes as input a Python program, and executes it.
If we were to implement an interpreter in Prolog, how would we have done it? This depends heavily on how much freedom we have over the syntax of the language this interpreter needs to implement. If we have absolute freedom, and we decide we are OK with Prolog’s syntax, we implement this DSL as an internal DSL.
Since in Prolog data and code are the same, the input program (which is data) can be Prolog code. Want an example? No problems. Let’s consider a simple DSL that actually is a tiny functional programming language. In that language, an expression can either be a constant of the form c(X), where X is a number (e.g., c(2)), or a sum of two expressions (A+B). For example, the expression c(1)+c(2) evaluates to 3.
The following implements this tiny DSL:
eval(c(X), X). eval(A+B, C) :- eval(A, A1), eval(B, B1), C is A1 + B1.
This definition actually defines a predicate named eval. eval defines the relation between an expression in our DSL and its corresponding value. The first axiom defines the relation between a constant and its value, and the second axiom defines the relation between a sum and its value. In the latter case, this is a rule, where we recursively evaluate the operands (the two expressions to the left and right of the ‘+’), and then calculate the sum of the two values using Prolog’s ‘is’ predicate. With this definition we can evaluate expressions:
?- eval(c(1)+c(2)+c(3), X). X = 6.
But is this a DSL? Well, yes and no. Yes, we have a “language” of expressions we can evaluate; but no, it is not yet a programming language, as we cannot “program” anything.
In order to program, we need to be able to define abstractions. In the C programming language, programmers can define functions (which are actually procedures). Ours is a functional programming language, so we wish to allow developers to define functions (of a true mathematical nature).
So we write this:
:- op(700, xfx, ':='). eval(X, Z) :- (X := Y), eval(Y, Z).
The first line is a directive telling Prolog’s parser to accept := as an infix operator. This is how Prolog does its best to provide us with syntax suitable for DSLs. The second line is a normal rule, which gives yet-another solution to the eval predicate. It states that X evaluates to Z, if a definition of the form X:=Y exists in the program, and Y evaluates to Z.
With this in place, we can start defining functions:
f(X) := X+c(2).
This defines the function f(X), which resolves to X+2. We can then use f(X) in an expression:
?- eval(f(c(3)), X). X = 5 ; false.
We evaluated f(X) with (constant) 3 as argument, and got 5! Wow, that was easy!
What Just Happened Here?
OK, so we just created our first DSL in Prolog! I know, It’s not a very useful one. We cannot use recursion at this point because we don’t have conditionals (and without conditional, you cannot terminate a recursion), so we cannot do loops and stuff. The need to wrap values with c() is annoying, and oh, we only have the + operator. But all these things are solvable, and with a little more Prolog fu we can get a nice functional (pun intended) and maybe even useful DSL…
So how did we create a DSL (a small programming language) so easily? What was the trick?
The trick was inversion of control. Most people look at a Prolog program as a set of predicate definitions, just like our Star Wars example above. In that example we defined mother, father, alias and ancestor. Then we executed the program by asking a question, e.g., ancestor(X, ben).
When we turned to implementing a DSL, however, we inverted things. The DSL definition does not have all the answers anymore. Instead, it turned to the DSL program for some of the answers.
In the DSL definition, we used the rule:
eval(X, Z) :- (X := Y), eval(Y, Z).
This rule is part of the definition of the eval predicate, and it uses the := predicate as part of its definition. But the := predicate is not defined anywhere in the DSL definition. Instead, it is defined in the DSL program:
f(X) := X+c(2).
The DSL definition did not define the := predicate in the Prolog sense, but it did provide it with meaning. It defined how it is used (as part of eval’s definition), and by that it gave it power. Users can now define := in their programs, and different definitions will result in different programs. Kind of like the C programming language “defines” the main() function by calling it at when a program starts, but it is up to us to implement it, so that every C program has a different main() function.
We could not have done this without the two unique Prolog features: non-determinism and data-as-code. Although our DSL is supposed to be deterministic (it is, after all, a functional programming language), we used non-determinism in the process. The eval has both concrete solutions (for c(X) and A+B) and a generic solution (looking for definitions, using the := predicate). For a language to allow this, it has to account for the possibility of having more than one solution to a question.
We used the data-as-code principle by implementing our DSL program as a set of Prolog axioms (in our case, one axiom: f(X):=X+c(2)). The DSL definition and DSL program are actually parts of the same Prolog program, where the DSL definition can be considered a software library, and the DSL program is, well, a program using the library. This is what internal DSLs are all about.
The hardest thing in writing (a post, an academic paper, etc) is to imagine how it will be perceived by the reader. As I wrote at the beginning of this post, I do not expect the average reader to have much or any prior understanding of logic programming and Prolog. If nothing else, I hope this post can serve as a short tutorial to the basics of logic programming. Readers interested in learning more about how to create DSLs in Prolog can refer to Tim Menzies’ course on the subject, which is a nice read on DSLs in general, before getting specifically to Prolog DSLs.
To me, understanding the ideas presented in this post was a breakthrough that helped me realize how programming can be done differently. Logic DSLs are easy to implement, and in turn allow ordinary programmers to make their own languages to solve the problems they are facing. But it does not stop at programming languages. I further realized that the same ideas can be applied the state of an application — the data it stores in its database (remember, data is code!). I hinted about this when posting about NoDatalog. I think this realization can change the world of Web applications forever. But this is a topic for another post.