
Survey of Built-in Concurrency in Programming Languages

Jun 23, 2022

Over the years, I’ve written quite a bit of C/C++ and Python code to support CLI (command-line interpreter) based factory (and developer) testing of many types of devices (usually via a serial port). I’ve always found CLI commands to be a little too limited. So, about a year ago, I started designing a small scripting language for MCUs with very little memory and few resources that could be used in place of CLI commands. It has a REPL (read-eval-print loop), and can also be “compiled” into bytecode and uploaded into memory to be executed. The language was far from complete and still somewhat limited; it also took a little too much RAM and flash, and leaned on malloc more than I’d like. I put the project aside for a bit, mostly because I wasn’t sure it really made sense to continue with it, and I wanted to think it through a little more.

In addition to having a CLI, for most of the projects I’ve worked on from scratch over the past few years, I’ve been developing a sort of skeleton/framework of reusable code for MCU development that I can just pop in place for new projects and build upon. The skeleton includes functionality for things I tend to use over and over again: a simple “thread” class with notifications (using FreeRTOS), a 3rd-party console library for the CLI, an auto-mutex class that can be used to share access between threads to, say, an I2C bus with multiple devices on it, some interrupt handling and callback classes, error handling code and a hardfault handler, logging, and a whole bunch of other stuff that you’d normally need for pretty much every kind of device you would develop firmware for.

A few weeks ago, I took another look at my little factory-testing scripting language. I wondered if it would make sense to make the language far more fully featured, so that it could be used for actual development or prototyping and not just for factory/developer testing of a device. I thought it might be really nice to build some of the common things my skeleton provides right into the language itself. Now, I know there are a ton of existing languages out there. However, none of them except maybe C and C++ are really that usable on MCUs with limited memory and flash (yeah, maybe Rust too, but I find Rust a very cumbersome language). Some of the things you need to do in embedded device code, like handling IRQs, don’t seem to be accounted for in any language I know of (at least not in a simple or elegant way). However, there are a LOT of languages that have built-in concurrency (i.e. multi-threading).

I decided that if I really wanted to make this a real language, with simple and elegant concurrency, I should at least look around and take a survey of existing languages that build concurrency into the language itself (rather than providing it externally via a runtime or library/package) and see how they do it. This article is all about what I found. Let’s get started!

The languages I examined were:

  • Occam
  • E
  • Ada
  • Go
  • Elixir
  • Esterel
  • Lua
  • Python, Java, Haskell, OCaml and Concurrent ML

Some of these are “research languages” and some are quite old. However, many of them seem to borrow ideas from the others.

Occam

We’ll start with Occam, since from what I can tell, many of the other languages seem to have based their model of concurrency on it.

Occam was developed for “transputers”, an early kind of highly parallel computer architecture. In Occam, every statement is considered a “process”; whether each one runs as an actual separate process seems to be left to the implementation, and from what I can tell most examples don’t treat them that way. Occam provides a “SEQ” construct that takes a block of code that must run sequentially; it ends when the last statement finishes. It also provides a “PAR” construct that takes a block of code whose individual statements may all run in parallel; the block ends when all of its statements have finished executing. E.g.

SEQ
    var2 := var1 + 1
    var3 := var2 + 2

PAR
    var2 := var1 + 1
    var3 := var1 + 2

(note how in the SEQ we’re allowed to use var2 in the calculation for var3 but not in the PAR)

Each of these can be “replicated” using a FOR loop, e.g.:

SEQ i = 0 FOR 5
    some bunch of statements here

PAR i = 0 FOR 5
    some bunch of statements here

SEQ runs the iterations sequentially, one after the other, while PAR runs them all in parallel. The variable “i” acts as a sort of “pid” that you can use for, say, printing out which process you’re on, or as an index into an array storing per-process data.
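For a rough comparison, a replicated PAR maps fairly naturally onto spawning goroutines in a loop in Go (more on Go below). This is just my sketch, not anything from Occam’s spec; PAR’s “block ends when all branches end” semantics are reproduced here with a done channel:

```go
package main

import "fmt"

// par runs n copies of body concurrently, passing each its index i
// (playing the role of Occam's replicator variable), and blocks until
// all have finished -- a replicated PAR only ends when every branch ends.
func par(n int, body func(i int)) {
	done := make(chan struct{})
	for i := 0; i < n; i++ {
		go func(i int) {
			body(i)
			done <- struct{}{}
		}(i)
	}
	for i := 0; i < n; i++ {
		<-done // join: wait for each branch to signal completion
	}
}

func main() {
	results := make([]int, 5)
	par(5, func(i int) {
		results[i] = i * i // each "process" writes only its own slot
	})
	fmt.Println(results) // → [0 1 4 9 16]
}
```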

Synchronization between “processes” is done using unidirectional, point-to-point “channels”:

CHAN OF INT in1, out2

To wait for a “signal” on a channel, you can do something like:

in1 ? var1

This will block until in1 is signalled and the integer value from in1 will be stored in var1.
To signal a channel, you can do something like:

out2 ! var1

This will block until out2 is available for signalling and the integer value in var1 will be sent as the signal via out2.
Here’s an example of how this could be used:

CHAN OF INT common :
PAR
    INT var1 := 1
    SEQ
        do some calculation with var1 here
        common ! var1
    INT var2 :
    SEQ
        common ? var2
        do some calculation with var2 here or output it or such

This uses the channel “common” to synchronize the two parallel “SEQ” blocks. I.e. both SEQ blocks will immediately start running concurrently. The 2nd one will block waiting for a signal on “common”. The first will block waiting for “common” to be available for signalling. Since “common” is already available for signalling, the 1st block will immediately send var1’s value “1” through it. The 2nd block will then receive that value and stick it in var2.
Deadlock is actually very possible here and you have to design with that in mind. There is no order of priority inside a PAR, i.e. they all start immediately. You can force order using PRI PAR instead, which causes each subprocess to be started in the order it’s written in the code.
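As an aside, Go (covered later in this article) borrowed this channel model almost directly, so the example above can be sketched with an unbuffered Go channel, where send and receive both block until the two sides meet:

```go
package main

import "fmt"

// rendezvous mirrors the Occam example: one goroutine computes a value
// and sends it on an unbuffered channel; the receiver blocks until the
// send happens, exactly like `common ! var1` meeting `common ? var2`.
// The calculation here is just a stand-in.
func rendezvous() int {
	common := make(chan int) // unbuffered: send and receive synchronize
	go func() {
		var1 := 1
		var1 = var1 * 10 // "do some calculation with var1 here"
		common <- var1   // blocks until the other side receives
	}()
	var2 := <-common // blocks until the sender is ready
	return var2
}

func main() {
	fmt.Println(rendezvous()) // → 10
}
```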

Now, you may be saying, what if I need to block on multiple signals? Occam provides for this too, via the ALT statement.
ALT is like a switch statement for signals, e.g.

CHAN OF INT chan1, chan2, chan3 :
TIMER Time :
INT var, T :
ALT
    chan1 ? var
        do something here
    chan2 ? var
        do something else here
    chan3 ? var
        do something else here
    Time ? AFTER T
        do some timeout thing here

What the ALT statement does is wait for any one of the channels to be signaled. Without the “Time” branch, it would block until one of them was signaled; it then handles that signal and exits the block. If multiple channels are signaled simultaneously, it’s arbitrary which one will be handled. Typically an ALT block is run in a loop over and over (sort of like a main loop for a thread). The weird Time ? AFTER T syntax provides a way to time out the ALT block when nothing happens for some amount of time (T is an absolute time, so you’d normally read the timer first and add a delay to it). I actually think this timeout syntax is really ugly and could’ve been done better; not only that, but as we’ll see later, a lot of other languages based their multi-signal synchronization method on this and repeated the ugliness of it (why?!?).

E

Next, let’s look at the E language. E uses “promises” via an “eventually” operator “<-” to provide concurrency. Without concurrency, i.e. in normal operation, you would call a function on an object like:

pid.calculate()
println("Calculated")

and it would execute the function and return.

Instead you can have it execute asynchronously:

pid <- calculate()
println("Calculating")

and the <- operator will queue a call to the calculate method on the pid object and return immediately moving on to the next statement (the println).
In this case, the pid object could even be on a different computer on the network.
Multiple promises are “sent” (or queued) in order. However, this does not mean they will be “received” in that order on the other end.

You can also set up a callback for when the “promise” has “resolved” (finished executing), using a “when” block:

def pidValue := pid <- calculate()
when (pidValue) -> {
    println('Pid calculated value $pidValue')
} catch prob {
    println('Could not calculate pid value, $prob')
} finally {
    println('Always gets here after resolution')
}
println('Continuing')

pidValue is a “promise” here. You cannot use it outside a “when” block or you get an error.
The “when” block sets up a callback that executes asynchronously and immediately moves on to the next statement (in this case println(‘Continuing’)).

When the call to calculate is “resolved” successfully, it executes the first block (which in this case prints the value calculated). If it timed out or some other problem happened, it executes the “catch” block. In either case, whether it successfully resolved or failed to resolve, the “finally” block is executed (but only after resolution).

E allows you to have multiple promises in the when clause, e.g. when (foo, bar, baz).

There is somewhat of a naming convention to tell the difference between a function that returns a promise as opposed to one that executes immediately, e.g. calculateVow(). It didn’t appear to be used consistently in the documentation I read on this subject.
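There’s no direct analogue to E’s promises among the constructs surveyed here, but the when/catch/finally mechanics can be sketched in Go with a channel carrying a result-or-error value. The names below (result, calculate) are mine, not E’s:

```go
package main

import (
	"errors"
	"fmt"
)

// result plays the role of a resolved promise: either a value or a "problem".
type result struct {
	value int
	err   error
}

// calculate runs asynchronously and "resolves" the returned channel,
// loosely like `pid <- calculate()` producing a promise.
func calculate(fail bool) <-chan result {
	promise := make(chan result, 1)
	go func() {
		if fail {
			promise <- result{err: errors.New("could not calculate")}
			return
		}
		promise <- result{value: 42}
	}()
	return promise
}

func main() {
	promise := calculate(false)
	fmt.Println("Continuing") // runs before the promise resolves, as in E
	// The receive below is the "when" block: wait for resolution, then
	// branch on success (when) or failure (catch).
	if r := <-promise; r.err != nil {
		fmt.Println("catch:", r.err)
	} else {
		fmt.Println("when:", r.value)
	}
	fmt.Println("finally: always runs after resolution")
}
```

Unlike E, the receive here blocks the current goroutine rather than registering a callback, but the resolve/smash split maps cleanly onto the value/error pair.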

Unfortunately, I did not see anything in the documentation about how you would implement the other end, i.e. the “receiver”, although there are examples of how to be a middleman between a receiver and a sender (i.e. you need to wait for some service to give you a value before you can give a value to a caller of your own service). I won’t go into that here, except to say that E provides “Ref.promise()”, e.g. def [fooPromise, fooResolver] := Ref.promise(), which returns a promise and a resolver; you can either call resolve on the resolver to mark it successful, or call smash on it to mark it a failure.

E also provides some methods to prevent the program from exiting while waiting for promises to resolve, including blockAtTop, continueAtTop, and waitAtTop(promise). I didn’t bother reading enough to understand these as they weren’t important for my survey.

Ada

Ada has built-in concurrency via “task” objects. A task starts executing in the background as soon as the block that declares it reaches its begin. The main program won’t end until all tasks finish. Synchronization is done using a “rendezvous”, which is basically a synchronized function call, e.g.:

procedure Main is
    task After is
        entry Go (Text : String);
    end After;
    task body After is
    begin
        accept Go (Text : String) do
            Put_Line ("After: " & Text);
        end Go;
    end After;
begin
    Put_Line ("Before");
    After.Go ("Main");
end Main;

In this example, the “After” task body executes once, blocking (via “accept”) until some other task or main calls the rendezvous “Go” on the task object (which main does right after outputting “Before” here). Calling a rendezvous will block if there is no task currently accepting it, and the caller stays blocked until the accept body finishes.
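A rendezvous can be approximated in Go with a channel per entry: the caller’s send blocks until the task’s receive (the accept), and a second channel holds the caller until the accept body completes. This is my sketch of the Main/After example, not Ada’s mechanism:

```go
package main

import "fmt"

// goCall models one call to the rendezvous entry Go: the argument plus
// a done channel the accepting task closes when the accept body ends.
type goCall struct {
	text string
	done chan struct{}
}

// callGo performs the rendezvous from the caller's side: it blocks until
// the task is accepting, then blocks again until the accept body finishes.
func callGo(entry chan goCall, text string) {
	call := goCall{text: text, done: make(chan struct{})}
	entry <- call // blocks until someone is accepting
	<-call.done   // blocks until the accept body has run
}

func main() {
	entry := make(chan goCall)

	// The "After" task: blocks in its accept until someone calls Go.
	go func() {
		call := <-entry // accept Go (Text : String) do ...
		fmt.Println("After:", call.text)
		close(call.done) // end Go: release the caller
	}()

	fmt.Println("Before")
	callGo(entry, "Main") // After.Go ("Main")
}
```

This prints “Before” then “After: Main”, matching the Ada program’s behavior.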

Ada provides a way to handle multiple rendezvous at once, very similarly to how Occam does with channels, using a “select” statement, e.g.

task body Counter is
    Value : Integer := 0;
    begin
        loop
            select
                accept Increment do
                    Value := Value + 1;
                end Increment;
            or
                accept Decrement do
                    Value := Value - 1;
                end Decrement;
            or
                accept Get (Result : out Integer) do
                    Result := Value;
                end Get;
            or
                delay 5.0;
                Put_Line ("Exiting Counter task…");
                exit;
            end select;
        end loop;
    end Counter;

In this example, the Counter task loops forever (well, sort of), waiting for another task to call one of its rendezvous entries: Increment, Decrement, or Get. It executes whichever one gets called and exits the select. As you can see, Ada provides a way to time out a select (and it is just as ugly and confusing as Occam’s). The “delay 5.0” in this select statement will time out the select after 5 seconds if no rendezvous entry is called: the select starts a timer immediately on entry and waits for a call; when the timer expires, it executes the statements in the timeout branch and exits (normally just the select, but in this case the whole task via “exit”). The thing to remember is that if any of the rendezvous entries is called, the timer is abandoned, and on the next loop iteration it starts over as soon as the select is executed. The order in which incoming calls are accepted is set via a compiler pragma and defaults to first-in, first-out.

Ada also has “protected” types, where access to any operation on an object of a protected type is synchronized on the object (much like Java’s synchronized methods, which we’ll see later). Protected types can also have “entries”: an entry is like a procedure, but includes a “barrier” condition that must be true before the body is allowed to execute. E.g.

package Binary_Semaphores is
    protected type Binary_Semaphore is
        entry Wait;
        procedure Signal;
    private
        Signaled : Boolean := False;
    end Binary_Semaphore;
end Binary_Semaphores;
package body Binary_Semaphores is
    protected body Binary_Semaphore is
        entry Wait when Signaled is
        begin
            Signaled := False;
        end Wait;
        procedure Signal is
        begin
            Signaled := True;
        end Signal;
    end Binary_Semaphore;
end Binary_Semaphores;

In this example, a Binary_Semaphore object is synchronized via the “Wait” entry. Calling “Wait” will block until the field “Signaled” becomes True, which happens when another task calls the “Signal” procedure. Note that multiple calls to Signal while nothing is “Wait”ing will still only release the first later “Wait” (until Signal is called again); the extra signals are lost, since Signaled is just a Boolean. There’s no race here, though: all procedures and entries of a protected object execute under mutual exclusion, so Signal cannot run while Wait is setting Signaled back to False.
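For comparison, the same binary semaphore can be sketched in Go using a channel of capacity one to play the role of the Signaled flag, with the channel’s own atomicity providing the mutual exclusion (my sketch, not Ada’s mechanism):

```go
package main

import "fmt"

// binarySemaphore: a channel of capacity 1 stands in for the Signaled
// Boolean. A token in the channel means "signaled".
type binarySemaphore chan struct{}

func newBinarySemaphore() binarySemaphore {
	return make(binarySemaphore, 1)
}

// Signal sets the semaphore; extra signals while it is already set are
// dropped, matching the Ada version where Signaled is just a Boolean.
func (s binarySemaphore) Signal() {
	select {
	case s <- struct{}{}:
	default: // already signaled; do nothing
	}
}

// Wait blocks until signaled, then consumes the signal (the entry body
// setting Signaled back to False).
func (s binarySemaphore) Wait() {
	<-s
}

func main() {
	sem := newBinarySemaphore()
	go sem.Signal()
	sem.Wait() // blocks until the goroutine signals
	fmt.Println("released")
}
```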

Go

Go has some standard library packages that provide access to atomic variables of different types (using the underlying platform’s atomic operations). It also has packages built on top of these for synchronization, like sync.WaitGroup, etc. I won’t cover those here because I don’t consider them built into the language syntax.

Built into the Go language are “goroutines”, which are not coroutines but lightweight threads, multiplexed onto OS threads by the Go runtime. You can call any function asynchronously by using the “go” keyword, e.g.:

package main

import "fmt"

func f(n int) {
    fmt.Println(n)
}
func main() {
    go f(0)
    // You can create thousands of them coz they're supposedly lightweight
    for i := 1; i < 10; i++ {
        go f(i)
    }
    var input string
    fmt.Scanln(&input)
}

Note, the fmt.Scanln in the main function is to cause main to block (until you type something) and let the other threads continue running, otherwise it would exit and kill any running threads.

Go provides synchronization via “channels”, which are just like Occam channels. You can define a channel with an element type, e.g.:

var c chan string = make(chan string)

Then you can use that channel for both output and input with the same “<-” operator: c <- v sends v into the channel, and v := <-c receives from it. Channels block unless they’re ready, i.e. a send on a channel will block until something receives from it, and vice versa. Here’s an example:

package main

import (
    "fmt"
    "time"
)

func pinger(c chan string) {
    for i := 0; ; i++ {
        c <- "ping"
    }
}
func ponger(c chan string) {
    for i := 0; ; i++ {
        c <- "pong"
    }
}
func printer(c chan string) {
    for {
        msg := <- c
        fmt.Println(msg)
        time.Sleep(time.Second * 1)
    }
}
func main() {
    var c chan string = make(chan string)
    go pinger(c)
    go ponger(c)
    go printer(c)
    var input string
    fmt.Scanln(&input)
}

In this example, the “printer” thread loops forever waiting for a signal on string channel “c” and then just prints the value. The pinger/ponger threads continually send “ping” and “pong” strings through the channel to the printer thread. Since the printer thread sleeps for a second every loop, pinger and ponger block until it’s ready. It’s arbitrary which will get through first.

Channels can also be used as streams/queues, e.g. make(chan string, 10), which creates a 10-item string queue. Sends to these channels only block when the queue is full, and receives from them only block while it’s empty.
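A quick sketch of that buffered behavior:

```go
package main

import "fmt"

func main() {
	q := make(chan string, 2) // capacity 2: sends don't block until full
	q <- "first"
	q <- "second" // still doesn't block; the queue is now full
	// A third send here would block until something receives.
	fmt.Println(<-q) // → first (FIFO order)
	fmt.Println(<-q) // → second
}
```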

You can also wait for multiple channels at once. Go provides the “select” statement, which is basically the same as Ada’s “select” and Occam’s “ALT”. E.g.

select {
    case msg1 := <- c1:
        fmt.Println("Message 1", msg1)
    case msg2 := <- c2:
        fmt.Println("Message 2", msg2)
    case <- time.After(time.Second):
        fmt.Println("timeout")
    default:
        fmt.Println("nothing ready")
}

And note that it has an even uglier and more confusing way of doing a timeout on the select than Ada or Occam. The “default” clause runs if no other case is ready; beware that with a default present, select never blocks, so in this example the time.After case would never actually fire. In practice you’d use one or the other.

Elixir

Elixir is a functional language built on top of Erlang’s BEAM VM, so it provides similar concurrency to Erlang (which I won’t cover here).

Elixir has the “spawn” function, which creates a new process, immediately starts it, and continues execution with the statement after the spawn.

Synchronization is done via message passing. You can “send” a process (via its pid) any variable/expression asynchronously. The receiving process does a “receive do” (with or without standard functional pattern matching) and blocks until it receives a message. It checks each message against the patterns; if none match, the message stays in the mailbox and the next one is checked. Future “receive do” statements will start over again at the beginning of the mailbox (i.e. with any messages that didn’t match a previous receive’s patterns). E.g.:

pid = spawn(fn ->
  # ... other statements ...
  receive do
    "pattern1" -> IO.puts("matched pattern1")
    "pattern2" -> IO.puts("matched pattern2")
  end
  # ... more statements ...
end)

send(pid, "pattern1")
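Go has no selective receive, but the spawn/send half of this model looks like a goroutine reading from its own channel as a mailbox. A loose sketch (the mailbox re-queuing of unmatched messages is not reproduced, and the switch is only a crude stand-in for pattern matching):

```go
package main

import "fmt"

// spawn starts a "process" whose mailbox is a buffered channel; the
// returned channel plays the role of the pid that send() targets.
func spawn(handler func(msg string)) chan<- string {
	mailbox := make(chan string, 16)
	go func() {
		for msg := range mailbox {
			handler(msg)
		}
	}()
	return mailbox
}

func main() {
	done := make(chan string)
	pid := spawn(func(msg string) {
		switch msg { // crude stand-in for Elixir's pattern matching
		case "pattern1":
			done <- "matched pattern1"
		default:
			done <- "no match"
		}
	})
	pid <- "pattern1" // send(pid, "pattern1")
	fmt.Println(<-done)
}
```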

Esterel

Esterel is a very interesting language, built from the ground up to be a “reactive” programming language (which is similar to event-driven but not quite the same). The core of the language is built to use and handle “signals”. However, trying to concisely describe how it does this would take a whole book (way too much for a single article like this), so I will not be talking about it further here. For more info you can read the primer here: https://cseweb.ucsd.edu/classes/wi17/cse237A-a/handouts/Esterelv5Primer.pdf

Lua

Lua has built-in coroutines. They do NOT run in parallel. Instead, as soon as you “resume” a created coroutine, the function that resumed it is suspended and the coroutine function runs until it “yields”. Synchronization is done using yield/resume, also allowing you to pass data back and forth. E.g.

c = coroutine.create(function (v)
    local v1 = 100
    print("coroutine starting", v, v1)
    v1 = coroutine.yield(v+1, v1+1)
    print("coroutine post yield", v1)
    return v1+3
end)
print("main", coroutine.resume(c, 3))
print("main", coroutine.resume(c, 5))

This creates a coroutine. The first resume passes 3 as v into the coroutine’s function. The function prints “coroutine starting 3 100” and then yields, returning 4 and 101 to the main routine (resume also returns a success flag first, so main actually prints “main true 4 101”). The main routine then resumes again, passing 5 back. The coroutine function receives that 5 into v1 as the result of the first yield and prints “coroutine post yield 5”. It then returns the value 8 back to main, which prints “main true 8”. Control is passed back and forth between the coroutine and main via resume/yield pairs.
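Go has no yield/resume, but the same lock-step transfer of control can be mimicked with two unbuffered channels, one for each direction. My sketch below simplifies Lua’s multiple return values down to a single int per transfer:

```go
package main

import "fmt"

// newCoroutine emulates Lua's coroutine.create: toCo carries the values
// passed into resume, fromCo carries the values yielded (or returned).
// Each resume/yield pair is a strict handoff, so only one side runs at a time.
func newCoroutine(body func(resumeVal int, yield func(int) int) int) (resume func(int) int) {
	toCo := make(chan int)
	fromCo := make(chan int)
	go func() {
		first := <-toCo // block until the first resume, like Lua
		yield := func(v int) int {
			fromCo <- v   // hand control back to the caller
			return <-toCo // block until resumed again
		}
		fromCo <- body(first, yield) // the body's final return value
	}()
	return func(v int) int {
		toCo <- v
		return <-fromCo
	}
}

func main() {
	resume := newCoroutine(func(v int, yield func(int) int) int {
		v1 := yield(v + 1) // first resume's value arrives as v
		return v1 + 3      // second resume's value arrives as v1
	})
	fmt.Println("main", resume(3)) // → main 4
	fmt.Println("main", resume(5)) // → main 8
}
```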

I personally don’t find coroutines at all useful, especially for event driven applications.

The rest

The remaining languages I looked at either have concurrency built in but are widely discussed elsewhere, or were just not interesting enough for me to go into in detail. I’ll mention them only briefly.

Python has “threads”, but (in CPython, thanks to the global interpreter lock) they do not run Python code in parallel. There are many, many other sites that explain Python threads.

Java has real threads built into the language. Shared data access is synchronized via the “synchronized” keyword. Synchronization can apply to a block, a method, etc., and is usually done on a per-object basis, e.g. synchronized(someObject) { code synchronized on that object; }. There are many other articles/books written about Java threading.

Haskell’s concurrency feels like it was bolted on after the fact and seems like an ugly hack to me. It’s basically just threads (forked in the IO monad) with atomic mutable variables (MVars) for synchronization/communication. There are also some GHC packages that provide synchronization, like STM.

OCaml has some different methods of doing concurrency, but they are not built into the language and are instead provided by 3rd-party packages.

As for Concurrent ML, I was only able to find papers on it and no real documentation. The papers I could find suggest it’s based somewhat on the same ideas as Occam, but with really messy syntax. I did not bother pursuing this further.

Final Notes

Out of all the languages I researched, I particularly like how Ada’s tasks and synchronization work (except, again, for the ugly timeout syntax). I plan to design something similar into the language I’m working on. I’ll post more about this language later. Thanks for reading!
