Yesterday I decided to go forward with an ambitious plan to implement a new workbench for the Cedalion language. I even posted this tweet:
This move is a major one. Up until today, the Cedalion language was inseparable from its Eclipse-based editor. With this new project, there will be two different editors capable of editing Cedalion code. At some point, the Eclipse-based one will be deprecated and only the Web-based one will be supported.
In this post I’d like to explain this decision, including some of the design considerations.
The “Write it From Scratch” Syndrome
Often, when software engineers face a difficult engineering problem that involves a “legacy” system, they tend to think about re-implementing the system from scratch rather than fixing the existing one. As a software engineer, I often fall into the same trap of thinking that re-implementing something is easier or better than fixing it. Is this the case here?
Just because something comes from our gut does not mean it is untrue. The more experience we have, the better our gut becomes at guiding us. Nevertheless, in both academia and industry, we should rely on science, not gut feeling, when making such decisions.
While software engineering, as a science, is far from giving us disciplined, quantitative methods to estimate the trade-offs between fixing legacy software and re-implementing it, it does give us some guidance: terminology that allows us to reason about the risks and benefits of either approach.
Internal vs. External Bugs
Many years ago I read a great article (which, unfortunately, I could not find today) about why Netscape fell. It was not just about Microsoft bullying them out of the game. That, of course, played a part, but it wasn’t the only reason. The reason described in that article was pure engineering: the ambitious Netscape 6 project that was never completed.
According to that article, Netscape 5 was the result of an evolution that started in the very first days of the Web, with the first Netscape Navigator browser. It was designed with certain assumptions in mind, assumptions that were valid in the mid-90s but that the rapid evolution of the Web rendered invalid. For the first four major revisions, the Netscape team struggled to update the same code-base so that modern Web pages would display properly. But at some point they decided that enough is enough, and it was time for a rewrite.
To make a long story short, this rewrite turned out to be a huge mistake. The part the team did not anticipate is what the article called external bugs.
Internal bugs are “normal” bugs, caused by the software not conforming to its specifications, if such exist. For Netscape, the specifications were mainly HTTP, HTML, and CSS. External bugs, however, are not in your code-base but in the outside world: HTML pages that do not fully conform to the spec, yet need to be displayed nevertheless. When building a browser, you need to make sure you don’t break the Web. And the Web is not perfect.
Cedalion is not the Web
The parallel of the Web, in our case, is the existing body of Cedalion programs. Existing programs must not be broken. I don’t know whether this is a good thing or a bad thing, but to date, only so much code has been written in Cedalion. This means we can expect fewer external bugs, at least from this direction.
Another direction external bugs can come from is the underlying technology. Bugs in the underlying technology, or places where it is not a 100% fit for the problem, can cause a lot of pain. Making good choices at the early stages is crucial for the success of such a project. When I started the implementation of the existing Cedalion Workbench, I chose Eclipse as the underlying technology. This was a good (exciting, even) choice back then (2006). Today, Eclipse is a bit old-school. Downloading hundreds of megabytes to your computer and making sure you have a compatible Java installed is not something people do every day. Today people try to avoid installing anything on their computers. Today, everything is in the cloud…
If you’ve seen the DSL tutorial, you know I’m a big believer in TDD. Ever since I started using this methodology, the quality of my software has increased significantly. When Cedalion started ten years ago, TDD was a concept known to few, and practiced by only a handful of people. I was not one of them (I only learned about it in 2008, and back then I thought it was weird and impractical), so my idea of testing for the Eclipse-based Cedalion Workbench was user testing. After I implemented a feature, I simply saw that it worked. And if it didn’t, I fixed the implementation until it did. I had very few unit tests, which I seldom ran, and no practical way to test for regressions.
Today I know that this lack of discipline is one of the reasons for the poor user experience the Cedalion Workbench gives its users. A complete rewrite, done test-first, is an opportunity to build quality in from the start. The alternative approach, fixing the existing code-base by writing unit tests for known bugs and making sure they all pass, has two major problems:
- Code not written TDD-style was not designed for testability. In our case, the Cedalion code base is coupled with the Eclipse platform.
- This approach handles existing bugs and makes sure they do not return, but during the process we may introduce new bugs, against which we have no unit tests.
As I wrote above, choosing the right technology has a significant impact on the project’s success. While there is no guarantee I won’t look back at one choice or another ten years from now and ask myself why I chose that of all things, there are some guidelines that can reduce the risk of that happening:
- Follow industry trends
- Use open-source technology
- Only use projects with massive engagement
But the most important guideline is:
- Keep these technologies segregated.
This applies both to keeping these technologies segregated from each other and to keeping them segregated from our core logic.
For the rest of this post I’d like to go over the specific technology I’m considering for this project, and explain why.
While not an external technology, I think it’s worth mentioning that my intention is to implement most of the new workbench in Cedalion itself. Even today’s workbench is partially implemented in Cedalion, but this time it will be different.
In the existing implementation, the Eclipse plug-in consults the Cedalion program being edited whenever such consultation is needed. This applies to how projection works (Eclipse asks Cedalion how to project one concept or another), to the context menu (Eclipse asks Cedalion what entries to present), and so on. The drawback is that a bug in the Cedalion program (after all, we are editing it…) can crash the whole workbench.
In the new design there are two Cedalion programs. One is the workbench itself, which is supposed to be stable. The other is the program being edited, which lives inside a “Cedalion container”: an abstraction that allows Cedalion programs to co-exist in the same environment, where the external one can access the internal one, but not vice versa. Using such a container, and consulting the executed program only for the things we need from it, has the following advantages:
- It makes it easier to protect against non-termination (by applying a timeout) and against unexpected behavior.
- It keeps the image minimal. The program being edited does not have to carry parts of the implementation of the workbench.
- It maintains a better separation between the workbench and the language.
- It allows multiple different programs to be edited by the same running instance.
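The first of these advantages can be sketched in code. The snippet below is an illustrative sketch (in TypeScript, not Cedalion) of the timeout-guard idea: the workbench consults the contained program, but a non-terminating or crashing query falls back to a safe default instead of taking the workbench down. All names here (`consultContained`, `Query`) are hypothetical, invented for this example.

```typescript
// Hypothetical sketch: consulting the edited program through a "container"
// boundary, guarded by a timeout. All names are illustrative.

type Query<T> = () => Promise<T>;

// Run a query against the contained (edited) program, but never let a
// non-terminating or misbehaving query take down the workbench itself.
async function consultContained<T>(
  query: Query<T>,
  timeoutMs: number,
  fallback: T
): Promise<T> {
  // A timer that resolves to the fallback value if the query takes too long.
  const timeout = new Promise<T>((resolve) =>
    setTimeout(() => resolve(fallback), timeoutMs)
  );
  try {
    // Whichever settles first wins: the query's answer or the fallback.
    return await Promise.race([query(), timeout]);
  } catch {
    // A bug in the edited program must not crash the workbench.
    return fallback;
  }
}
```

The same guard covers both failure modes mentioned above: non-termination (the timer wins the race) and unexpected behavior (the exception is caught and replaced by the fallback).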
SWI-Prolog
This is the only external dependency that existed in the original workbench and carries over to the new one. SWI-Prolog is probably the most popular Prolog implementation in the world. It is open-source, feature-rich, high-performance, and very stable. We use it to run Cedalion programs, which are translated to Prolog.
Git
I don’t think I need to explain why I want to use Git. It is popular. If GitHub can get away with supporting only Git, so can we. The plan is to persist the user’s code in an external Git repository (e.g., on GitHub), while a local copy is temporarily maintained on the server’s local file system.
Docker
Docker simplifies deployment. A single “docker run” command can install and launch a complete server, with multiple dependencies, on your machine. I’m using it to package the Cedalion Workbench today. For an HTTP server, installation and deployment will be even easier.
Node.JS + Cedalion
The cloudlog1 project, implementing the world’s first NoDatalog database, is based on a framework we call “Nodalion”: a combination of Node.js and Cedalion. It provides a programming model that allows Cedalion programs to define impure predicates: predicates that, unlike normal Cedalion predicates, can depend on state and modify it. This allows such Cedalion programs to do real things and react to real things, and it combines nicely with Node’s asynchronous behavior. One of the nice things about this model is that we can test impure predicates in pure, logic-only settings, by simulating the behavior of the external code.
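The idea of testing impure logic purely can be illustrated outside Cedalion as well. The sketch below (TypeScript, with hypothetical names such as `Clock` and `isExpired`) shows the general pattern: the impure computation touches the outside world only through an interface, so a test can substitute a deterministic simulation for the real thing.

```typescript
// Illustrative sketch of testing "impure" logic in a pure setting: the
// computation reaches external state only through an interface, so a test
// can simulate that state deterministically. Names are hypothetical.

interface Clock {
  now(): number; // milliseconds since some epoch
}

// An "impure" computation: its answer depends on external state (the time).
function isExpired(deadline: number, clock: Clock): boolean {
  return clock.now() > deadline;
}

// In production we would pass the real clock...
const realClock: Clock = { now: () => Date.now() };

// ...but in a test we simulate the external behavior with a fixed value.
const fakeClock: Clock = { now: () => 1000 };
```

With `fakeClock`, `isExpired(500, fakeClock)` is always true and `isExpired(2000, fakeClock)` is always false, regardless of when the test runs; that determinism is what makes the impure predicate testable.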
React
Facebook’s React framework is one of two major client-side template frameworks rapidly growing in popularity (the other being Google’s Angular.js). Both frameworks provide (relatively) declarative ways to specify how data translates to HTML, both do this on the client side, and both do it amazingly fast. It was not easy to choose between them. I have had good experience with Angular (1.X), and I very much like the thinking behind it; there is no reason I wouldn’t like 2.X. But business is business, and React simply looks more promising with regard to the performance it can provide for the projectional-editing use case.
Projecting code is somewhat different from projecting data. Data is typically structured in a relatively flat manner (no more than three or four nesting levels). Code, however, is represented as a tree. Even a list in Cedalion is a tree and not an array (think of a linked list, where each node holds a value and the next node in the list).
The main problem is not the depth by itself, but the kinds of editing operations performed on these trees. For example, when inserting an element into a list, a portion of the tree grows in depth. Neither Angular nor React is equipped to handle this kind of mutation efficiently. With some changes to our projection logic we can (hopefully) make it more React-friendly; it is less likely we will be able to make it Angular-friendly.
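The list-as-tree point above can be made concrete with a small sketch (TypeScript rather than Cedalion; the `ConsList` type and helper names are invented for illustration). A cons-style list nests each element inside the previous one, so the tree's depth equals the list's length, and prepending a single element deepens every existing node by one level:

```typescript
// Illustrative sketch: a linked-list representation of a list is a nested
// tree, not a flat array. Type and function names are hypothetical.

type ConsList<T> = { head: T; tail: ConsList<T> } | null;

function cons<T>(head: T, tail: ConsList<T>): ConsList<T> {
  return { head, tail };
}

// The depth of the tree equals the length of the list.
function depth<T>(list: ConsList<T>): number {
  return list === null ? 0 : 1 + depth(list.tail);
}

// [1, 2, 3] as a tree of depth 3...
const three = cons(1, cons(2, cons(3, null)));

// ...and inserting 0 at the front wraps the whole existing tree in a new
// node, pushing every existing node one level deeper.
const four = cons(0, three);
```

This is exactly the kind of mutation a flat, data-oriented diffing strategy is not optimized for: a single edit changes the depth of an entire sub-tree.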
Both React and Angular have extensive mechanisms, hidden for the most part from the user, that do one simple thing: modify the display when the underlying data changes. To do so, they take two very different approaches. Angular traverses the data tree (the model) and tries to find the parts of it that changed. React, on the other hand, generates a “virtual DOM” based on the model data and compares it to the existing version. It then applies to the actual DOM only the delta between the two.
In our new design, the “model” for the client side is actually a representation of the view: the projection of the AST as generated by the projection logic, which runs on the server side. This tree moves to the client side and needs to be rendered. When something changes, the server sends the entire changed sub-tree to the client, and the client is supposed to replace it as a whole.
From Angular’s point of view, this is a replacement of the entire sub-tree; Angular will not even attempt to compare the new tree to the old one. React, on the other hand, works well with replacement. It will create a virtual DOM based on the new content and find the minimal set of changes to the real DOM that converts the existing DOM into the desired one.
Unfortunately, React is making some simplifying assumptions that make it faster for “data” but not for projection of code. We therefore need to find ways to make our projection model more “React-friendly”.
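The diffing idea described above can be sketched in a few lines. This is a toy model, not React's actual reconciliation algorithm: it compares an old and a new tree node by node and emits only the operations needed to turn one into the other. The `VNode` type and the textual operation format are invented for this example.

```typescript
// Toy sketch of virtual-DOM-style diffing: compare two trees and emit only
// the operations needed to convert the old one into the new one. This is a
// simplification; real React reconciliation is far more sophisticated.

type VNode = { tag: string; children: VNode[] };

function diff(
  oldNode: VNode | undefined,
  newNode: VNode,
  path: string,
  ops: string[]
): void {
  if (!oldNode) {
    ops.push(`insert ${newNode.tag} at ${path}`);
    return;
  }
  if (oldNode.tag !== newNode.tag) {
    // Different tags: replace the whole sub-tree, like React does.
    ops.push(`replace ${path} with ${newNode.tag}`);
    return;
  }
  // Same tag: recurse into the children and diff them pairwise.
  const n = Math.max(oldNode.children.length, newNode.children.length);
  for (let i = 0; i < n; i++) {
    const oldChild = oldNode.children[i];
    const newChild = newNode.children[i];
    if (!newChild) {
      ops.push(`remove child ${i} of ${path}`);
      continue;
    }
    diff(oldChild, newChild, `${path}/${i}`, ops);
  }
}
```

Note how a server-side "replace this whole sub-tree" message fits this scheme well: even when the client receives a complete replacement, the diff walks both versions and touches only what actually changed, which is exactly why React suits the new design better than Angular's model-traversal approach.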
So if you have read all the way down here, you probably (1) understand why I want a complete rewrite, and (2) have some clue about what I’m up against. Implementing a workbench from scratch is not easy, and requires a massive time investment. However, Cedalion is the key to everything the cloudalion project is pushing for. You can’t have a brave new way to create Web applications if the development environment is not usable.

As a PhD student who is also employed part-time in industry, it is hard to allocate the time for such an ambitious project, especially when it is not novel enough to write academic papers about. Academia will tell you how to achieve software quality, but it will not reward you for building quality software. From an academic point of view, almost anything one could learn from the new workbench could be learned from the old one (maybe there are opportunities for UX-related research, but this is not my field). Nevertheless, I’ll try to allocate time for this project. Scarcity of time, as I’ve seen before, will force me to focus on what I want to achieve, and will make sure I think before I do. It is not going to be easy, but it is going to be…
Of course, if anyone wants to get involved, I’m sure we can find ways to do this, even without knowing Cedalion. I’ll post my progress on Twitter (@cloudalion), so stay tuned.