RASCL Design Notes

Author: Matt Gushee
Date: June 10, 2006
Copyright: This document is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 2.5 license

1   Warning

This document will not tell you what RASCL is. For that, please read the specification and/or examples. This document exists chiefly to explain why I thought it was a good idea to create RASCL. As such it is highly opinionated, even biased. If you disagree with my opinions, then you disagree. Feel free to ignore my views, or RASCL itself.

2   Why RASCL?

My primary goal in creating RASCL was to have a configuration language for desktop applications written in OCaml--one that, in simple cases, would be readable and even editable by users with minimal technical skills and no knowledge of OCaml. Further, I wanted a single syntax definition that would support both the simplest configuration (a plain sequence of key-value pairs) and arbitrarily nested structures. I would have been happy to reuse an existing syntax, but for reasons described below, I could not find anything that I felt was quite suitable.
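
To make the "single syntax definition" goal concrete, here is a minimal OCaml sketch of a data model in which a flat list of key-value pairs and an arbitrarily nested structure are values of the same recursive type. The type `value`, its constructors, and the `lookup` helper are hypothetical illustrations, not the data model defined by the RASCL specification. Note that every scalar is simply a string.

```ocaml
(* Hypothetical data model; NOT the type defined by the RASCL specification. *)
type value =
  | Str of string                   (* every scalar is just a string *)
  | List of value list              (* an ordered sequence of values *)
  | Table of (string * value) list  (* named entries, possibly nested *)

(* The simplest configuration: a flat sequence of key-value pairs. *)
let flat = Table [ ("font", Str "Sans"); ("size", Str "12") ]

(* A nested configuration uses exactly the same constructors. *)
let nested =
  Table
    [ ("editor", Table [ ("font", Str "Sans"); ("size", Str "12") ]);
      ("recent_files", List [ Str "notes.txt"; Str "todo.txt" ]) ]

(* Follow a path of keys through nested tables; None if any step is missing. *)
let rec lookup path v =
  match path, v with
  | [], _ -> Some v
  | key :: rest, Table pairs ->
      (match List.assoc_opt key pairs with
       | Some child -> lookup rest child
       | None -> None)
  | _ :: _, (Str _ | List _) -> None
```

For example, `lookup ["editor"; "font"] nested` evaluates to `Some (Str "Sans")`. A flat configuration is just the special case of a `Table` whose values are all `Str`, so one reader handles both.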

2.1   What is a configuration language?

While at first glance this question may not seem to require any discussion, I have found that people hold widely differing views of what a 'configuration language' should do; hence, there are widely differing implementations of things called 'configuration [files/languages]'. There is one school of thought that conflates configuration with scripting. So Lua, for example, is sometimes described as a configuration language. I take strong exception to this view. Not that there's anything wrong with scripting; in fact, it's a wonderful thing for an application to have scripting capability, if the users are prepared to take advantage of it. But anyway:

  • Configuration is not programming

Again:

  • Configuration is not programming!

Oh, and did I mention?

  • Configuration is NOT programming!

Thus, a configuration language is not a programming language, and vice versa.

So what is it, then?

My definition of a configuration language would be something like:

A simple declarative language for describing the appearance and behavior of a program.

Describing, not implementing.

2.2   Wouldn't it be useful to support arbitrary data types?

This is undoubtedly useful for some applications. There are many more where it is superfluous, and supporting a variety of data types would vastly increase the complexity of RASCL and RASCL parsers. It is worth bearing in mind in any case that data types are abstractions. They don't really 'exist' in a text file or any other storage format; rather, strings are read and may be interpreted as instances of one or another data type. So if you really need to be able to represent types that RASCL doesn't support, there is no reason why you can't layer readers and writers for your types on top of RASCL. Or use a different language. RASCL is, and will remain, A Simple Configuration Language.
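
As a minimal sketch of such layering, assume a reader that hands back key-value pairs with every value still an uninterpreted string; typed accessors can then be written in a few lines of OCaml. The names `get_string`, `get_int`, `get_bool`, `of_int`, and `of_bool`, and the set of accepted boolean spellings, are hypothetical choices for illustration, not part of RASCL.

```ocaml
(* Hypothetical input: key-value pairs as a reader might return them,
   with every value still an uninterpreted string. *)
let settings = [ ("width", "800"); ("fullscreen", "yes"); ("title", "My App") ]

(* Raw lookup: the string exactly as stored, or None if the key is absent. *)
let get_string settings key = List.assoc_opt key settings

(* Reader: interpret a value as an integer; None if missing or malformed. *)
let get_int settings key =
  match get_string settings key with
  | Some s -> int_of_string_opt (String.trim s)
  | None -> None

(* Reader: interpret a value as a boolean, accepting a few friendly
   spellings (an illustrative choice, not something RASCL prescribes). *)
let get_bool settings key =
  match
    Option.map (fun s -> String.lowercase_ascii (String.trim s))
      (get_string settings key)
  with
  | Some ("true" | "yes" | "on" | "1") -> Some true
  | Some ("false" | "no" | "off" | "0") -> Some false
  | _ -> None

(* Writers: turn typed values back into strings for saving. *)
let of_int n = string_of_int n
let of_bool b = if b then "yes" else "no"
```

With the sample settings, `get_int settings "width"` returns `Some 800`, while `get_int settings "title"` returns `None` because "My App" is not a number. The application decides what each string means; the file format itself never needs to know.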

2.3   Power through simplicity

There are clearly benefits to the scripting-as-configuration paradigm. Those with the necessary knowledge of Emacs Lisp can work magic with startup scripts, such that a single program called 'emacs' can appear in an infinite variety of forms, which may or may not resemble the well-known text editor. This is great for geeks who need or want to spend a lot of time with the program in question.

But there is a downside, of course, which is that you need to learn the language. For those with prior experience with Lisp or Scheme, or perhaps with other functional languages, Emacs Lisp may be easy to learn. But for normal people, it takes considerable effort to learn the language, and even those with the inclination to learn may not have the time. For non-technical users, of course, it is simply a non-starter.

So what should we provide for the rest of the world? One popular and semi-obvious answer is to create a 'user-friendly' configuration GUI, tell users to use that, and not worry about the complexity of the configuration files--or perhaps, dispense with configuration files entirely in favor of some obscure binary database.

Now, I have no objection to the existence of such configuration GUIs. They are indeed often helpful. But I also strongly believe they cannot be considered a complete or universal solution, for several reasons:

  • GUI config tools sometimes fail

    They might fail to save the desired configuration, or fail to save anything at all. They might even mangle the user's configuration files, making applications unusable. Now, this is generally due to bugs in the tools themselves, which in principle are fixable. But in practice, it can be weeks or months between the time a bug first appears and the time a fix becomes readily available to end users.

    So there needs to be a reasonable alternative to the GUI tools.

  • GUIs aren't always friendly

    Okay, so you've developed a super-cool configuration GUI full of tree widgets and tabs. Are you sure your users can navigate it? Sometimes it's much easier to find things in a text file than in a complex GUI.

  • Some options should be hidden

    What if your application is supposed to be accessible to the masses, but also has experimental or 'advanced' features? One option might be to create 'beginner' and 'advanced' modes, but that approach creates issues of its own (e.g. if someone packages your app for a Linux distribution, what should the default mode be?).

    If you have config files including all the options, but make only the safe ones available through the GUI, then you have a more-or-less self-policing system for keeping dangerous tweaks out of reach of naive users.

    This implies that your config files should be reasonably accessible for those who are prepared to hand-edit them and accept the consequences.

  • Some people just prefer to edit text files

    Unless you're prepared to use a Windows Registry-type framework (or the Windows Registry itself), you can't really prevent people from hand-editing configuration files, and there will always be users who prefer to do it that way--though they may not have much real technical knowledge. Why fight it? Instead of using a fragile auto-generated file which comes with warnings like:

    # DO NOT EDIT THIS FILE!
    

    Put in a little bit of extra effort to make the config files work both ways. Auto-generated by default, but also easy to read and hard to mess up.

    Of course, no matter how simple your config format is, there will always be a few users who insist on tweaking parameters that they don't understand. But locking down your configuration with an "idiot-proof" GUI-only approach takes a great deal of effort, and since users have no alternative, the GUI must always work. By using a simple and accessible config file format, you can get very good, though imperfect, results with much less effort.
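
The self-policing arrangement described under "Some options should be hidden" can be sketched in OCaml: every option is written to the config file, but the GUI is built only from those marked safe. The `option_spec` record, its `gui_visible` field, the sample option names, and the `key = value` line format are all hypothetical placeholders; the last in particular is not RASCL syntax.

```ocaml
(* Hypothetical description of one configuration option. *)
type option_spec = {
  key : string;
  default : string;     (* stored as a string, like everything in the file *)
  gui_visible : bool;   (* false for experimental or "advanced" tweaks *)
}

let all_options = [
  { key = "font"; default = "Sans"; gui_visible = true };
  { key = "autosave"; default = "yes"; gui_visible = true };
  { key = "experimental_renderer"; default = "no"; gui_visible = false };
]

(* The GUI builds its widgets only from the safe subset. *)
let gui_options = List.filter (fun o -> o.gui_visible) all_options

(* The config file lists every option, so hand-editors can reach the
   advanced ones. "key = value" is a placeholder format, not RASCL syntax. *)
let config_lines =
  List.map (fun o -> Printf.sprintf "%s = %s" o.key o.default) all_options
```

Naive users who stick to the GUI never see `experimental_renderer`; anyone willing to open the file can find it and accept the consequences.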

3   Why not just use an existing language?

This is a reasonable question. Why not, indeed? My feeling about this is that using an existing language may be appropriate in many situations, but didn't work too well for my case--a simple language for OCaml-based applications. Let's take Lua, for example:

  • Is it implemented in OCaml?

    Sort of. Actually, there is a partial implementation of Lua 2.0 in OCaml, whereas the current version of Lua is 5.03. Personally, I'm not comfortable with using an implementation that old, nor am I sure I want to create a new implementation.

  • What are you committing to by saying 'this application uses Lua for configuration'? What will people assume you are committing to?

    In the narrowest technical sense, 'Lua' (or whatever language name) refers merely to a syntax. But if you name the language without qualification, people tend to assume (quite reasonably, in my opinion) that you are talking not just about the syntax but also the standard library.

    So, do we want to implement the Lua standard library or not? Obviously, it would be a lot of work--which might be worth doing for its own sake. But remember, the goal here is just to come up with a simple configuration file format. Implementing the Lua standard library sounds like a good way to avoid ever finishing the application that you set out to create in the first place.

    Or we could make sure to pedantically state 'Lua syntax'. We're still not out of the woods, though. As noted above, the existing implementation of Lua in OCaml is way out of date. I haven't studied the differences between versions 2.0 and 5.0 of Lua, but there surely are some. And if you don't take care to specify 'Lua 2.0', people who already know Lua will reasonably expect to write code in a much newer version of the language and have it work, which it might or might not.

  • Does using Lua significantly benefit users?

    If you accept the premise that configuration is different from programming or scripting, probably not. Sure, there are people who already know Lua. But there are many more who do not. And though it's a fairly easy language to learn, it still takes some effort; furthermore, since Lua is a general-purpose scripting language, its syntax is stricter and more complex than is really required for simple configuration files. Why should users have to learn this stuff?

So, why bother? Using Lua (or Python, or Ruby, or whatever) probably doesn't help users a whole lot. In the case of OCaml, it probably also entails creating a new implementation, as well as documentation informing users which features of the language are and are not supported. And it creates expectations that you are probably not willing to meet.

3.1   What about XML?

Ugh. Don't get me started about XML configuration files. Oops, too late. First let me say that I spent a couple of years as an XML consultant/trainer/developer. Since I have deliberately not kept up to date, I probably no longer qualify as an XML expert, but I still have some idea what I'm talking about. I should also mention that I got into XML in its infancy, when it was considered to be mainly for documents--and that's my bias. I was never a big fan of XML as a universal data exchange format.

So anyway, I've always maintained that XML for configuration files was, in most cases, just plain silly. Yes, there are cases where XML would make a lot of sense--Apache comes to mind--but for most applications it is over-over-overkill. First of all, most XML parsers entail a performance hit. Usually not a show-stopper, but something to be considered. Furthermore, depending on the programming language, using XML can create a dependency on an external library[*]. If you're a corporate developer, you may not think that matters, but if, like me, you're interested in developing desktop applications for non-technical users (small businesses and home users), dependencies are a big issue. Ideally, end users should not have to install anything other than the application they want to use.

Finally, and most crucially, XML is not perceived as a friendly language. Many people, including competent developers, find it downright intimidating. You don't think they should? I don't either, but they do.

[*] Of course, RASCL requires an external library, too. But it's a tiny language, and a RASCL parser should be tiny, too: small enough that it is reasonable to simply embed it in your application, which is not always the case with XML parsers.

3.2   What about YAML?

YAML is interesting, and I came close to adopting it as a config file format. There were two reasons I didn't. First, like XML, YAML has many features (data types and structures) that are probably not needed for most configuration files. Second, given that people should be able to edit config files without learning much of anything, the Pythonesque indentation in YAML is a nightmare waiting to happen. Not that it's hard--but it's far from obvious, and there's a strong possibility of naive users royally screwing up the indentation.

3.3   What about JSON (JavaScript Object Notation)?

That's more like it. I came very close to using JSON, and RASCL syntax is in fact largely based on JSON. The main reason I didn't use JSON itself was that all strings have to be quoted. This is probably a good idea for a data format designed to be directly evaluated in JavaScript or some other language. But configuration files are not meant to be evaluated that way (at least, I don't think they should be), since it's hard to support a read-file-and-evaluate model while guarding against malicious code.

Anyway, all the double-quoting is tedious, creates opportunities for error, and seems mostly unnecessary in a simple configuration language.
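
Because a configuration file is parsed rather than evaluated, quoting can be optional. The following OCaml sketch of a scalar reader accepts a value either bare or in double quotes and returns it as plain data in both cases. It is hypothetical: it does not reproduce RASCL's actual lexical rules, and it ignores escape sequences inside quotes.

```ocaml
(* Hypothetical scalar reader; NOT RASCL's actual lexical rules.
   A value may be written bare or in double quotes; either way it comes
   back as plain data and is never evaluated as code. Escape sequences
   inside quotes are deliberately ignored to keep the sketch short. *)
let read_scalar raw =
  let s = String.trim raw in
  let n = String.length s in
  if n >= 2 && s.[0] = '"' && s.[n - 1] = '"' then String.sub s 1 (n - 2)
  else s
```

Both `read_scalar "Sans Serif"` and `read_scalar "\"Sans Serif\""` yield the same string, so users who quote out of habit lose nothing, and users who don't are not punished for it.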