Harrison Totty

A Language Design Analysis of HolyC

Mar 15 2019


I was recently introduced to the story of Terry A. Davis, a schizophrenic programmer who independently designed the free operating system TempleOS. This article will not delve into the story of Terry or TempleOS, but instead the programming language Terry wrote specifically for developing the operating system - a language he dubbed “HolyC”.

HolyC, as the name would imply, is a C-like programming language with a number of key differences and improvements. Like C, it’s whitespace independent and compiles to assembly. However, as Terry describes in the OS’s own documentation for the built-in assembly language:

TempleOS uses nonstandard opcodes. Asm is kind-of a bonus and I made changes to make the assembler simpler. For opcodes which can have different numbers of args, I separated them out – Like IMUL and IMUL2. The assembler will not report certain invalid forms. Get an Intel datasheet and learn which forms are valid.

This article won’t cover the assembly layer of TempleOS, but the above is interesting nonetheless.

Numeric Types in HolyC

HolyC allows the following numeric types: U0, I8, U8, I16, U16, I32, U32, I64, U64, and F64. As you probably would guess, a U vs I prefix denotes a signed/unsigned integer/float and the numerical value represents the number of bits associated with the type. The two most interesting types are U0 and F64.

U0 is essentially void but with zero size. In regular C, void is actually considered a Unit Type and thus when computing sizeof(void) in GCC, you will find that it resolves to 1. U0 is actually closer to a Bottom Type, like ! in Rust.

F64 is interesting because it is the only float available in HolyC. I can’t find a specified reason in the source code, but I presume it is a combination of “if you’re going to do floating-point operations, wouldn’t you pretty much always want maximum precision by definition?” and maybe the fact that

All values are extended to 64-bit when accessed. Intermediate calculations are done with 64-bit values.

The example given is:

U0 Main()
{
    I16 i1;
    I32 j1;
    j1=i1=0x12345678;  // i1 is 0x5678 but j1 is 0x12345678
    
    I64 i2=0x8000000000000000;
    Print("%X\n", i2>>1);  // Prints 0xC0000000000000000
    
    U64 u3=0x8000000000000000;
    Print("%X\n", u3>>1);  // Prints 0x40000000000000000
    
    I32 i4=0x80000000;     // This is loaded into a 64-bit register variable.
    Print("%X\n", i4>>1);  // Prints 0x40000000
    
    I32 i5=-0x80000000;
    Print("%X\n", i5>>1);  // Prints 0xFFFFFFFFC0000000
}

Functions

Functions are where you start to see some of the more drastic differences. For starters, functions that are invoked without arguments (or without overriding any default arguments) may be syntactically shortened to just the function name followed by a semicolon.

// The following are equivalent.

x = Foo();  // C
y = Foo;    // HolyC

Speaking of default arguments, in HolyC it’s a-okay to have default args at any point in the function definition like so:

// ----- Function Definition -----
I32 Foo(I32 i=8, I32 j)
{
    return (i + j);
}

// ----- Invocation -----
I32 x;
x = Foo(,6);

Note the prepended comma in the function invocation arguments. At first this seems pretty useless (why wouldn’t you just re-order your args?) but it does allow you to write functions with some logical order:

// Copies the files in "source" path to the "destination" path
U0 CopyTo(char *source="T:/Doc/Files", char *dest)
{
    // ...
}

CopyTo(,"T:/Doc/Files2");

Similarly to Python and other modern languages, functions can have variable argument counts, here specified with (...) in the function definition. The function body may then access its arguments by utilizing the built-in argc and argv variables:

I64 Sum(...)
{
    I64 i,tot = 0;
    for (i = 0; i < argc; i++)
        tot += argv[i];
    return tot;
}

I64 x = Sum(3, 4, 5); // x = 12

Note that for loops don’t require curly braces if they only perform one operation. This is true in both C and HolyC.

Finally, HolyC does not have a required Main() function. Expressions outside of functions are simply evaluated from top to bottom in source. This also allows the programming language to act like a shell, and in-fact is the shell of TempleOS.

Switch Statements

Terry explained multiple times how switch statements are the most powerful constructs in HolyC. In the language, switch statements always utilize jump tables in assembly (and thus the documentation mentions to not use them in cases with large/sparse value ranges). The ones in HolyC offer quite a range of convenience improvements over their C counterparts. For starters, HolyC offers experienced programmers an unchecked variant of the switch expression, denoted via switch [foo] instead of switch (foo). In addition, the language also has implicit case values and even case ranges!

I64 i;
switch (i) {
    case: "zero\n"; break;         // Implicit case statements start at 0
    case: "one\n"; break;          // ... and increment by 1 each time.
    case: "two\n"; break;
    case 3: "three\n"; break;      // Explicit cases work as you would expect.
    case 4...8: "others\n"; break; // Cases 4 through 8 will print "others\n".
}

Note that in the above example I technically skipped explaining another quirk with HolyC, which is that constant (literal) string expressions all by themselves will automatically be sent to Print. This lets you do neat things like:

U0 PrintMessage(char *first, char *last)
{
    "Hello person!\n";
    "Your name is %s %s.\n", first, last;
}

Back to switch statements, they may actually be nested into what are known as “sub_switch” statements via the start and end keywords. Below is the example code included in TempleOS for this functionality:

U0 SubSwitch ()
{
    I64 i;
    for (i=0;i<10;i++)
        switch(i) {
            case 0: "Zero ";     break;
            case 2: "Two ";      break;
            case 4: "Four ";     break;
            start:
                "[";
                case 1: "One";   break;
                case 3: "Three"; break;
                case 5: "Five";  break;
            end:
                "] ";
                break;
        }
    '\n';
}

SubSwitch;

The above code will print Zero [One] Two [Three] Four [Five] to the command line.

The #exe {} Expression

This is my personal favorite feature. This expression allows you to write code or execute programs whose output is embedded into the rest of your source code at compile time. This let you do things like:

#include #exe { /* code to find location of library */ }

This is essentially HolyC’s solution to macros.

Misc Features & Quirks