Why Are There So Many Programming Languages?

In my previous posts I talked about why declarative programming, and logic programming in particular, is relevant to software developers everywhere.  In this post I will take a step back and try to look at the bigger picture: programming languages in general, and ask why.  Why, after we had Fortran, did we need Cobol, and why, after we had Python, was there still room for Ruby?  Why are there so many programming languages? Is it possible to have one language to rule them all?

To answer these questions we will step away from programming, and see how other kinds of languages evolve and adapt to their surroundings, just like living things…

Days of the Radio

Let’s talk about radio communication for a bit.  At this point I’m not even talking about binary communication used by cellphones or Bluetooth.  I’m talking about good-old walkie-talkie-style radio.

Radio is used because of its high availability.  The air is never down…  so communication always works, as long as the end-device is working.  This is why it is used for things that must work – when human life is on the line.  This is why it is used by the police, and by pilots to talk to air-traffic controllers (ATCs).

Radio is great, but it has two major limitations:

  1. Radio is noisy.  Bad weather, “static” and other factors cause noise on the line.  Add to that the end equipment that is not tuned for quality audio (to say the least) and the environment on both sides of the line – a noisy cockpit or a busy street…
  2. Radio is a narrow channel. “Narrow” in terms of “bandwidth” – a term that originally came from radio…  It means that the amount of communication that can pass on a single channel is limited.  Limited mainly by the speed at which a human can speak or understand speech.

These two limitations are common to both police and aviation radio, as well as radio used in other areas, and they are taken as a fact of life.  What is interesting is that both police and aviation adopted their own “radio languages” as means of overcoming them.

Radio Languages

Noise is usually overcome by adding redundancy.  In radio, redundancy is achieved by sticking to a strict set of words.  For example, when the ATC wants to tell a pilot he or she is allowed to take off, they have a specific way of saying this.  Look at the following conversation, taken from here:

(1) Pilot: “Langley Tower, ABC is ready for takeoff, request back-track.”
(2) Controller:  “ABC, back-track approved.  Cleared takeoff Runway 19.”
(3) Pilot:  “Cleared takeoff Runway 19, ABC.”

In this conversation, the controller tells the pilot “Cleared takeoff” (line 2) and the runway number.  The controller is not allowed to say “you may take off” or “take off whenever you are ready”.  The fact that there is only one way to say something reduces the chances of confusion.  The pilot expects either “Cleared takeoff” or one of a short list of other phrases that can come while waiting lined-up on the runway.  After the pilot receives this message he or she repeats it, in a different order (line 3).  This is another form of redundancy.  It gives the ATC a chance to say something in the rare case that the pilot misinterpreted the message.

In the spirit of narrowing the language, multi-digit numbers are broken into their digits, and names are broken into letters, where each letter is represented by a word.  To further avoid confusion, some of the digit names are slightly modified to reduce the chance they sound like something else.  For this reason the digit 3 is pronounced (in some conventions) as “Tree”, and 9 is pronounced as “Niner”.  So if the ATC tells you (the pilot) to change altitude to flight level “Tree Niner”, you know you understood them correctly.  But if you hear “Three Nine”, they either forgot the code they were supposed to use, or they said something else.  Either way, you should ask…

In digital communication over radio (or noisy channels in general) redundancy is also used.  For example, adding a parity bit to a message (the XOR of all bits in the message) catches any single flipped bit, or more generally any odd number of flipped bits.  Adding a checksum is a more advanced solution used in many modern communication protocols.
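To make the parity idea concrete, here is a minimal Python sketch.  The function names and the sample message are my own, made up for illustration, and this is not taken from any particular protocol.  It appends one parity bit to a message and shows which kinds of damage the receiver can and cannot notice:

```python
from functools import reduce
from operator import xor

def parity_bit(bits):
    """Return the XOR of all bits: 1 if the number of 1s is odd, else 0."""
    return reduce(xor, bits, 0)

def encode(bits):
    """Append a parity bit so the transmitted word has an even number of 1s."""
    return bits + [parity_bit(bits)]

def looks_intact(received):
    """An undamaged word (message + parity bit) always XORs to 0."""
    return parity_bit(received) == 0

# Hypothetical 7-bit message, chosen only for illustration.
message = [1, 0, 1, 1, 0, 0, 1]
sent = encode(message)

# Simulate noise: flip one bit, then flip a second one.
one_flip = sent.copy()
one_flip[2] ^= 1
two_flips = one_flip.copy()
two_flips[5] ^= 1

print(looks_intact(sent))       # undamaged word passes the check
print(looks_intact(one_flip))   # a single flip is detected
print(looks_intact(two_flips))  # two flips cancel out and go unnoticed
```

The last case is the reason stronger schemes such as checksums exist: one extra bit buys only a little certainty.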

As we saw, the answer for noise was redundancy – saying more to express less, in order to be more certain that what you heard is what the other side actually said.  But what about the other limitation: limited bandwidth?  How do police and the aviation world handle the limited bandwidth of radio?  They take the phrases they use often, and make sure they are short.  Things that are seldom said can take longer to say.

This principle guided police forces in the US to adopt a language of codes for common responses or other phrases often said over the radio.  For example, 10-4 (ten-four) means “got it”, and 187 (one eighty-seven) means “homicide”.

The exact same principle of shortening frequently used phrases is used in compression algorithms such as Huffman Coding, which assigns fewer bits to characters that are used frequently, and more bits to characters that are rarely used.

In a way, our solutions for noise and bandwidth may seem contradictory.  In the former, we made our transmissions longer, and in the latter we made them shorter.  But in fact they do not contradict each other.  One way of looking at it is that by adding redundancy we convert the noise problem into a bandwidth problem, and then we solve the latter by using a specialized code.
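The following Python sketch shows the principle at work.  It is a bare-bones Huffman construction that only computes how many bits each symbol would get, and the “radio log” it runs on is invented for illustration.  The helper name huffman_code_lengths is my own, not a library function:

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built
    from the frequencies observed in `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A lone symbol still needs one bit to be transmitted.
        return {sym: 1 for sym in freq}
    # Heap entries are (weight, tie_breaker, {symbol: depth so far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; everything inside them
        # moves one level deeper, i.e. gets one more bit.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical radio log: what was said, and how often.
radio_log = ["acknowledged"] * 8 + ["traffic stop"] * 3 + ["homicide"] * 1

lengths = huffman_code_lengths(radio_log)
total_bits = sum(lengths[phrase] for phrase in radio_log)

print(lengths)     # the most frequent phrase gets the shortest code
print(total_bits)  # compare with 2 bits per phrase for a fixed-length code
```

With this log the frequent “acknowledged” costs a single bit while the rare phrases cost two, so the twelve messages take 16 bits instead of the 24 a fixed two-bit code would need.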

Back to Programming…

In programming, as in radio, we also have a narrow and noisy channel, between the computer and ourselves.  While computers are able to process megabytes or gigabytes of input in a second, we – the programmers – cannot.  We are bound by the limits of human cognition, and by the “state of the art” in human-machine interfaces (typically, a screen, a mouse and a keyboard).

So why is this channel narrow? It is narrow because there are only so many lines of code a human can write per minute, and only so many lines one can read (programming is a two-way street: we usually read more lines of code than we write).  Why is it noisy?  Because we are human, and because being human means making mistakes.  These mistakes can go in both directions – either not writing what we mean, or not understanding the written code correctly.

Programming languages are made to address exactly these two problems.  Before “high-level” languages were invented, programmers used binary machine code to write programs.  They used tables listing the different instructions and how each instruction is encoded (where do its arguments go, etc).  To someone not using these tables, a program would look like a list of random numbers, like these:

457f 464c 0102 0001 0000 0000 0000 0000
0001 003e 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0290 0000 0000 0000
0000 0000 0040 0000 0000 0040 000d 000a
4855 e589 00bf 0000 e800 0000 0000 c35d
6568 6c6c 2c6f 7720 726f 646c 0000 4347
3a43 2820 6255 6e75 7574 3420 392e 322e
312d 7530 7562 746e 3175 2933 3420 392e
322e 0000 0000 0000 0014 0000 0000 0000
7a01 0052 7801 0110 0c1b 0807 0190 0000
001c 0000 001c 0000 0000 0000 0010 0000
4100 100e 0286 0d43 4b06 070c 0008 0000
2e00 7973 746d 6261 2e00 7473 7472 6261
2e00 6873 7473 7472 6261 2e00 6572 616c
742e 7865 0074 642e 7461 0061 622e 7373
2e00 6f72 6164 6174 2e00 6f63 6d6d 6e65
0074 6e2e 746f 2e65 4e47 2d55 7473 6361
006b 722e 6c65 2e61 6865 665f 6172 656d

(the above is a fragment from a “hello world” object file)

Understanding these numbers is both time consuming and error prone (a narrow and noisy channel).  The first step in the evolution of programming languages was the conception of assembly language: a textual language that is still a one-to-one translation of machine language, but takes the heavy lifting of understanding how instructions are encoded away from the programmer.  Assembly code is much easier to look at.  For example, the famous “hello world” C program can be written in assembly like this (output of gcc -S hello.c on my machine):

    .file    "hello.c"
    .section    .rodata
.LC0:
    .string    "hello, world"
    .text
    .globl    main
    .type    main, @function
main:
.LFB0:
    .cfi_startproc
    pushq    %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $.LC0, %edi
    call    puts
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size    main, .-main
    .ident    "GCC: (Ubuntu 4.9.2-10ubuntu13) 4.9.2"
    .section    .note.GNU-stack,"",@progbits

Much easier on the eyes…  however… still not what we can consider easy to read.  There are a lot of details that are just not important.  We do not care about registers, for example.   So why should we specify them explicitly? We don’t care about the stack, so why should we perform push and pop operations ourselves?

Assembly code is easier to read and write compared to machine code, but it still gives us complete control over the machine.  However, we usually do not need that much control, and instead, we would like our narrow channel to be used more efficiently.  For example, calling a C function is a complex operation on the machine level.  It involves manipulation of the stack, saving registers, and more.  However, calling functions is something we do often.  This is why the C programming language provides us with special syntax for that.  A function definition is another example.  By providing special syntax for common tasks we make the code that uses them shorter and easier to read over the narrow channel.  This way, the C equivalent of the above assembly code is much shorter and easier to read:

#include <stdio.h>
int main() {
  printf("hello, world\n");
}

One important thing a language like C adds is the ability to give meaningful names to things like variables, functions and types.  These meaningful names take more bandwidth than the numbers they are often replacing, but they are there to avoid errors.  Chances are we will not treat a variable named X as the vertical axis of the screen (that is often named Y).  So meaningful names are a form of redundancy that allows languages to overcome noise.

Types are another form of redundancy.  When we write:

int i = 0;

we record the fact that the type of i is “int” twice: once explicitly, and once implicitly by initializing it with 0, which is of type “int”.  This redundant information is there to help us avoid mistakes.  If for example we write:

int i = "hello";

the compiler will tell us we did something wrong.

So Why So Many Languages?

As with “radio languages”, which differ between police and aviation, programming is not one thing.  Sometimes we implement Web applications, sometimes we implement complex machine learning algorithms for big data, and sometimes we program a tiny IoT sensor.  For each such programming task there are those things that are important to us, and those that are not.  When implementing an IoT device, we probably care about every byte of memory and every clock cycle.  In such cases we would probably choose between C and assembly.  However, for a Web application we do not need such fine-grained control, and do not want to pay the bandwidth cost.  Instead, we prefer languages that give us syntax for more advanced concepts.  One example I like is list comprehensions in Python.  The following Python code creates a list named squares, that holds the squares of all even numbers in numbers:

squares = [num**2 for num in numbers if num % 2 == 0]

How many lines would be needed to write something like that in C?  C++? Java? (think before lambdas were introduced in Java 8 and C++11)…
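As a rough hint at the answer, here is the same computation spelled out with an explicit loop, still in Python.  The input list is a made-up sample.  A C version would additionally need to manage the array’s memory and length by hand:

```python
# Hypothetical input, chosen only for illustration.
numbers = [1, 2, 3, 4, 5, 6]

# The long way: create the list, loop, test, append.
squares_loop = []
for num in numbers:
    if num % 2 == 0:
        squares_loop.append(num ** 2)

# The short way: the same four ideas in a single line.
squares = [num**2 for num in numbers if num % 2 == 0]

print(squares_loop == squares)  # both spell out the same list
```

The loop takes four lines to say what the comprehension says in one, and each extra line is one more thing to read over the narrow channel.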

Another reason for choosing one language over the other is the cost of an error.  In a Web application the cost of an error is often small.  Some user will not see the page he or she wants…  For mission critical software such as automotive or aircraft software, errors are much less acceptable.  Languages differ in the amount of redundancy they offer so that developers can choose how much redundancy they need and how much bandwidth they are willing to sacrifice.  Types are a good example of such redundancy, as we saw earlier.  Using a dynamically-typed language saves us the bandwidth that types cost, but leaves it to us to test our code thoroughly to avoid type errors at runtime.
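Here is a small Python sketch of that trade-off.  The function total_price and its arguments are invented for this example.  The type annotations are optional in Python and are not enforced when the program runs, so a mistake shows up only when the offending line is actually executed, or worse, does not show up at all:

```python
def total_price(quantity: int, unit_price: float) -> float:
    # The annotations document our intent, but Python does not
    # check them at runtime.
    return quantity * unit_price

# Intended use.
ok = total_price(3, 2.5)

# A string slipped in (say, straight from a Web form).  No error:
# "3" * 2 is string repetition, so we silently get "33".
silently_wrong = total_price("3", 2)

# Only this combination fails, and only when the line is executed.
try:
    total_price("3", 2.5)
    failed_at_runtime = False
except TypeError:
    failed_at_runtime = True

print(ok, repr(silently_wrong), failed_at_runtime)
```

In a statically-typed language the compiler would reject both bad calls before the program ever ran.  That is the redundancy we give up in exchange for shorter code.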

Is There an Ideal Language Out There?

Can we say one language is objectively better than another? And if so, can one language be better than any other language?  If the answer to both questions is “yes”, then there can be a language out there to “rule them all”… and there will no longer be a need for other languages…

My answers to these two questions are “No” and “Probably no”.  “No”, because as we discussed above, there is no single correct answer to how much redundancy we need to add to our code.  It’s a matter of how much noise we can tolerate.  “Probably”, because for the bandwidth problem I believe we can get close to “ideal”, and that ideal is far beyond where our traditional programming languages are.

Most of today’s languages are tuned towards a certain usage pattern, just like police and aviation radio made their own short words for things they say often.  Languages like C and Python are fixed.  We can define functions, types and variables and name them, but this is where our power ends.  Too often we find ourselves writing boilerplate code, and using design patterns instead of just writing what we mean…

In digital communication, bandwidth limitations are often addressed using data compression algorithms.  These algorithms differ in the insights they leverage to compress the data, but they all use the same principle of giving shorter representation for pieces of data that repeat multiple times.  The vast majority of these algorithms, unlike radio languages, do not use a fixed code (like the police codes).  Instead, they look at the data they need to compress and derive the code from that data.  English text has different letter frequencies than French text, and therefore it will be compressed differently.  So why do we try to “compress” programs like they are all the same?
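To illustrate why the code should be derived from the data, here is a Python sketch under some loud assumptions.  The two sample sentences are made up, the helper names are mine, and instead of a real compressor it uses the textbook ideal of spending -log2(p) bits on a symbol whose probability is p.  It measures what it costs to encode an English sentence with a code tuned to that sentence, and with a code tuned to a French one:

```python
import math
from collections import Counter

def model(sample, alphabet, smoothing=1):
    """Estimate symbol probabilities from `sample`.  With smoothing > 0,
    symbols the sample never used still get a small, nonzero share."""
    counts = Counter(sample)
    total = len(sample) + smoothing * len(alphabet)
    return {s: (counts[s] + smoothing) / total for s in alphabet}

def ideal_bits(text, probs):
    """Bits needed for `text` if each symbol costs -log2(probability)."""
    return sum(-math.log2(probs[s]) for s in text)

# Hypothetical samples, chosen only for illustration.
english = "the cat sat on the mat"
french = "le chat est sur le tapis"
alphabet = set(english) | set(french)

# Code derived from the data itself (no smoothing needed: every
# symbol we encode was seen when building the model).
own = ideal_bits(english, model(english, alphabet, smoothing=0))

# Code derived from different data, then reused as a fixed code.
borrowed = ideal_bits(english, model(french, alphabet))

print(round(own, 1), round(borrowed, 1))  # the borrowed code costs more bits
```

A code fitted to the data can never lose this comparison, which is exactly why most compression algorithms look at their input before deciding on a code.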

In future posts I will discuss Cedalion, a programming language that is designed to address this.  It allows its users (programmers) to add new language concepts, give them the syntax they want and define their meaning, all in a non-threatening way (you do not need to implement compiler extensions or anything like that).

Until then, have a great 2016!
