During my work with the course-ware of a new Erlang course, I experimented with one of the programming assignments to compare the threading performance of Erlang versus Java. The assignment is one of the classical programs from teaching concurrent programming: How to generate prime numbers using a pipeline of sieve threads. The algorithm is based on the Sieve of Eratosthenes, but modified to use threads instead.
The primes in [2..N]
An initial version sends the numbers [3..N] to the first sieve thread representing the prime number 2. Each seive thread receives a number and checks if it’s divisable with the prime number of the sieve. If not, the number is forward to the next sieve. If there is no next seive, it is created using the current number as its prime number. That means, we have a pipeline of sieve threads that grows with each prime number found. The initial version of the solution finds all prime numbers in the given interval [2..N]. When the generator has sent all numbers, it sends a ‘done’ marker, which is forwarded throughout the pipe and each sieve terminates.
Here are the relevant portions of the code (the full source code can be found at the end of this blog post)
generator(N, interval, Stats) ->
Next = sieve_create(1, 0, 2, stats_create(Stats)),
iterate(3, N, send_and_forget(Next));
send_and_forget(Next) ->
fun
(done) -> Next ! {done, []}, done;
(Number) -> Next ! Number
end.
iterate(N, N, Send) ->
Send(done);
iterate(Number, MaxNumber, Send) when Number < MaxNumber ->
case Send(Number) of
done -> iterate(MaxNumber , MaxNumber, Send);
Number -> iterate(Number + 1, MaxNumber, Send)
end.
sieve_create(Ordinal, MaxOrdinal, Prime, Stats) ->
spawn(?MODULE, sieve_init, [Ordinal, MaxOrdinal, Prime, stats_create(Stats)]).
sieve_init(Ordinal, MaxOrdinal, Prime, Stats) ->
StatsNew = stats_print(Ordinal, Prime, Stats),
sieve_loop(Ordinal, MaxOrdinal, Prime, none, StatsNew).
sieve_loop(Ordinal, MaxOrdinal, Prime, none, Stats) ->
receive
{done, PrimeNumbers} ->
generator ! {done, [Prime | PrimeNumbers]};
Number when (Number rem Prime) == 0 ->
sieve_loop(Ordinal, MaxOrdinal, Prime, none, Stats);
Number when (Number rem Prime) /= 0 ->
Next = sieve_create(Ordinal+1, MaxOrdinal, Number, Stats),
sieve_loop(Ordinal, MaxOrdinal, Prime, Next, Stats)
end;
sieve_loop(Ordinal, MaxOrdinal, Prime, Next, Stats) ->
receive
{done, PrimeNumbers} ->
Next ! {done, [Prime | PrimeNumbers]};
Number when (Number rem Prime) == 0 ->
sieve_loop(Ordinal, MaxOrdinal, Prime, Next, Stats);
Number when (Number rem Prime) /= 0 ->
Next ! Number,
sieve_loop(Ordinal, MaxOrdinal, Prime, Next, Stats)
end.
The drawback of this solution from a concurrency point of view is that it does not create particular many concurrent threads. At the time the generator sends the done marker, the pipeline is still under construction. Here is snippet showing it in action
$ erl -noshell -pa target/beam -run prime exe 10000 interval stats 1) 2 [processes 1/1, messages 18/18] 2) 3 [processes 2/2, messages 31/31] 3) 5 [processes 3/3, messages 29/31] 4) 7 [processes 4/4, messages 31/31] 5) 11 [processes 5/5, messages 31/31] . . . 38) 163 [processes 38/38, messages 23/31] . . . 72) 359 [processes 72/72, messages 31/31] 73) 367 [processes 58/72, messages 31/31] . . . 728) 5507 [processes 31/72, messages 31/120] . . 1226) 9941 [processes 1/72, messages 4/349] 1227) 9949 [processes 1/72, messages 3/349] 1228) 9967 [processes 1/72, messages 2/349] 1229) 9973 [processes 1/72, messages 1/349] Prime Numbers [2,3,5,7,11,13,17,...,9941,9949,9967,9973] $
In this case, I’m computing the prime numbers in the interval [2..10000]. Each line shows the ordinal number, the prime number and some stats. The stats shows the actual and maximum number of threads created and the length of the input queue of each sieve. At most 72 concurrent threads are running at the same time and most of the time essentially fewer threads.
The first N primes
More interesting is generating the N first prime numbers, because it will create N threads as well. That means I can study the threading and messaging performance. Here are the missing portions of the program
run(N, Mode, Stats) ->
register(generator, self()),
generator(N, Mode, Stats),
Result = receive
{done, PrimeNumbers} -> lists:reverse(PrimeNumbers)
end,
unregister(generator),
Result.
generator(N, count, Stats) ->
Next = sieve_create(1, N, 2, stats_create(Stats)),
iterate(3, infinity, send_and_receive(Next)).
send_and_receive(Next) ->
fun
(done) -> done;
(Number) ->
receive
done -> Next ! {done, []}, done
after 0 -> Next ! Number
end
end.
sieve_loop(MaxOrdinal, MaxOrdinal, Prime, none, Stats) ->
receive
{done, PrimeNumbers} ->
generator ! {done, [Prime | PrimeNumbers]};
_Number ->
sieve_loop(MaxOrdinal, MaxOrdinal, Prime, none, Stats)
end;
However, I decided to re-write the program to avoid the list reverse in run/3, but also to make the stats more accurate. I show you the relevant portions of the new version below. The complete source can be found at the end of this post.
generator_loop(Number, Next) ->
receive
done ->
Next ! {done, self()},
unregister(generator),
receive
{result, Primes, StatsLst} -> {Primes, StatsLst}
end
after 0 ->
Next ! Number,
generator_loop(Number + 1, Next)
end.
sieve_create(Prime, Ordinal, MaxOrdinal, Verbose) ->
print_progress(Prime, Ordinal, Verbose),
check_if_last(Ordinal, MaxOrdinal),
spawn(?MODULE, sieve_loop, [Prime, Ordinal, MaxOrdinal, none, Verbose, stats_create()]).
check_if_last(N, N) -> generator ! done;
check_if_last(N, M) when N < M -> ok.
sieve_loop(Prime, Ordinal, MaxOrdinal, Next, Verbose, Stats)
when Ordinal < MaxOrdinal ->
receive
Number when (Number rem Prime) == 0 ->
sieve_loop(Prime, Ordinal, MaxOrdinal, Next, Verbose, stats_update(Stats));
Number when (Number rem Prime) /= 0 ->
NextNew = case Next of
none -> sieve_create(Number, Ordinal+1, MaxOrdinal, Verbose);
Pid -> Pid ! Number, Pid
end,
sieve_loop(Prime, Ordinal, MaxOrdinal, NextNew, Verbose, stats_update(Stats));
{done, Prev} ->
Next ! {done, self()},
receive
{result, Primes, StatsLst} ->
Prev ! {result, [Prime | Primes], [stats_id(Stats, Ordinal, Prime) | StatsLst]}
end
end;
sieve_loop(Prime, MaxOrdinal, MaxOrdinal, _Next, _Verbose, Stats) ->
receive
Number when is_integer(Number) ->
sieve_loop(Prime, MaxOrdinal, MaxOrdinal, none, none, stats_update(Stats));
{done, Prev} ->
Prev ! {result, [Prime], [stats_id(Stats, MaxOrdinal, Prime)]}
end.
When the last sieve is created (Ordinal == MaxOrdinal) it sends the done marker to the generator. The generator the sends its own PID to its next seive and each sieve forwards the done marker together with its PID. When the done marker hits the last seive, it creates the last item of the resulting prime number list and sends it to its predecessor. Each sieve prepends its own prime and finally the complete list of prime numbers arrives to the generator. Finally, we are done with the preludium of this blog post. Let’s study its execution.
Running the Erlang version
Here is a first run with the dedicated ‘N first primes’ version.
jens@spooky:~/workspace/Prime$ /usr/lib/erlang/bin/erl -pa target/beams/ Erlang (BEAM) emulator version 5.6.5 [64-bit] [smp:2] [async-threads:0] [kernel-poll:false] Eshell V5.6.5 (abort with ^G) 1> prime:run(10). 1) 2 2) 3 3) 5 4) 7 5) 11 6) 13 7) 17 8 ) 19 9) 23 10) 29 Elapsed time 0.002 secs [2,3,5,7,11,13,17,19,23,29] 2> prime:run(1000). 1) 2 2) 3 3) 5 . . . 999) 7907 1000) 7919 Elapsed time 5.886 secs [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73, 79,83,89,97,101,103,107,109|...] 3>
Spawning 1000 threads in Erlang is a no-brainer and we can see it performs pretty good. The program also collects some statistics of each sieve input queue and writes it out to a CSV file. Here comes the interesting part
| Thread | Min | Average | Max | Count |
| 1-2 | 0 | 37 | 147 | 86410 |
| 2-3 | 0 | 11 | 36 | 43204 |
| 3-5 | 0 | 8 | 33 | 28802 |
| 4-7 | 0 | 6 | 35 | 23041 |
| 5-11 | 0 | 5 | 40 | 19749 |
| 6-13 | 0 | 5 | 40 | 17953 |
| 7-17 | 0 | 5 | 41 | 16572 |
| 8-19 | 0 | 4 | 42 | 15597 |
| 9-23 | 0 | 4 | 42 | 14776 |
| 10-29 | 0 | 4 | 52 | 14137 |
| 11-31 | 0 | 4 | 52 | 13655 |
| . . . | ||||
| 993-7867 | 0 | 13 | 65 | 7410 |
| 994-7873 | 0 | 12 | 42 | 7409 |
| 995-7877 | 0 | 12 | 51 | 7408 |
| 996-7879 | 0 | 12 | 52 | 7407 |
| 997-7883 | 0 | 12 | 52 | 7406 |
| 998-7901 | 0 | 11 | 42 | 7405 |
| 999-7907 | 0 | 11 | 67 | 7404 |
| 1000-7919 | 0 | 10 | 28 | 7403 |
We can see that at most 147 messages was in the message queue of sieve(1,2), but in average the queue length is way more moderate, fluctuating around 10. Let’s do another round, this time with 10.000 threads, before starting the discussion.
$ erl -pa target/beams/ Erlang (BEAM) emulator version 5.6.5 [64-bit] [smp:2] [async-threads:0] [kernel-poll:false] Eshell V5.6.5 (abort with ^G) 1> prime:run(10000, no). Elapsed time 272.172 secs [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73, 79,83,89,97,101,103,107,109|...] 2>
Here are the message queue stats:
| Thread | Min | Average | Max | Count |
| 1-2 | 0 | 42 | 803 | 483183 |
| 2-3 | 0 | 12 | 30 | 241591 |
| 3-5 | 0 | 8 | 33 | 161060 |
| 4-7 | 0 | 7 | 28 | 128847 |
| 5-11 | 0 | 6 | 28 | 110440 |
| 6-13 | 0 | 5 | 28 | 100400 |
| 7-17 | 0 | 5 | 28 | 92677 |
| 8-19 | 0 | 5 | 28 | 87225 |
| . . . | ||||
| 9994-104693 | 0 | 3 | 20 | 30258 |
| 9995-104701 | 0 | 3 | 20 | 30257 |
| 9996-104707 | 0 | 3 | 20 | 30256 |
| 9997-104711 | 0 | 3 | 20 | 30255 |
| 9998-104717 | 0 | 3 | 20 | 30254 |
| 9999-104723 | 0 | 3 | 20 | 30253 |
| 10000-104729 | 0 | 3 | 20 | 30252 |
Even this time we can see that the message queue length is very moderate. In both cases the first sieve receives the major hit of messages and after that for the rest of the pipeline, the queue length is around 10 or less.
If you knew the concurrency model of Erlang, but not its implementation, this is suspicious. Erlang messages are sent by asynchronous message passing, which means that a thread loops and sends an unbounded number of messages until the thread is put on hold for a while (i.e., it runs out of reduction ticks).
What we should have seen; is a large amount of messages in the beginning of the pipeline. However, that doesn’t happen because the Erlang runtime system takes care of the fine-tuning of its thread scheduling. A thread is allowed to perform a certain number of reductions (e.g. function calls and pattern matching) before it is put on hold and inserted last in the ready-queue. This queue contains all threads that do not wait for some condition, like a receive statement. So instead of running for a certain amount of time it runs for a certain number of operations. The consequence is paramount; a thread cannot send too many messages before the recipient has a chance to ‘receive’ them. In addition, the Erlang runtime system tries to level the send rate so the recipient’s message queue doesn’t get flooded. To fully understand this and fully appreciate it, let’s study the same program in Java.
The Java version
I will just show you fragments of the Java code. You can find all sources at the end of this post.
Class Sieve
public class Sieve extends Thread {
public static int maxOrdinal = 0;
private int ordinal, prime;
private Sieve next;
private LinkedList<Integer> numbersInput = new LinkedList<Integer>();
private volatile boolean running = true;
public Sieve(int ordinal, int prime) {
this.ordinal = ordinal;
this.prime = prime;
start();
}
public void run() {
if (ordinal == maxOrdinal) {Generator.done();}
int n;
while ((n = getNumber()) > 0) {
if (ordinal == maxOrdinal) continue;
if (n % prime != 0) {
if (next == null) {
next = new Sieve(ordinal+1, n);
} else {
next.putNumber(n);
}
}
}
if (next != null) next.putNumber(0);
}
public synchronized void putNumber(int n) {
if (n > 0) {
numbersInput.addLast(n);
} else {
running = false;
}
notify();
}
private int getNumber() {
int n;
synchronized (this) {
while (running && numbersInput.isEmpty()) {
try{ wait(); }catch (InterruptedException e) {}
}
n = running ? numbersInput.removeFirst() : 0;
}
return n;
}
}
Inside the run() method we recognizes the sieve algorithm. Java do have support for message passing, so we have to implement it. I’m using a linked-list and synchronized methods. The public putNumber() method is called by the predecessor and inserts a message last in the queue/list. The private getNumber() is used by the sieve loop to fetch the next number. If the input queue is empty, it waits until notify() is called in putNumber(). The last sieve calls the done() method of the generator, which sends ’0′ to all sieves.
Class Generator
public class Generator extends Thread {
private static volatile boolean running = true;
private Sieve next = new Sieve(1, 2);
public void run() {
for (int n = 3; running; ++n) {
next.putNumber(n);
}
next.putNumber(0);
}
public static void done() {running = false;}
//. . .
}
I have removed some code from the generator, but the important part is retained. In the run() method, the generator thread sends an unbounded sequence of numbers to its successor. When the last seive calls done() it breaks the loop and sends ’0′. You might wonder why done() is not synchronized. That is because it’s exactly one thread that will call done() and it’s harmless to assign ‘running’ a new value.
Running the Java version
Let’s start with an easy one; the 100 first prime numbers.
$ java -cp target/classes/ Prime -y 0 -b 0 -n 100 1) 2 2) 3 3) 5 4) 7 . . . 75) 379 76) 383 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space Exception in thread "Thread-1" java.lang.OutOfMemoryError: Java heap space Exception in thread "Thread-16" java.lang.OutOfMemoryError: Java heap space at java.lang.Integer.valueOf(Integer.java:601) at Sieve.putNumber(Sieve.java:56) at Sieve.run(Sieve.java:39) $
Whoa!!! It crashed after just 76 threads! Can this be true? What is happening? I managed to run it to completion, when starting a second time.
$ java -cp target/classes/ Prime -y 0 -b 0 -n 100 1) 2 2) 3 . . . 99) 523 100) 541 [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541] Elapsed time: 35.872 secs $
Hmmm, the Erlang version running up to 1000 sieves took 5 secs and the Java version running a just up to 100 threads took 35 secs. Let’s see what is going on in the Java program and the true side of asynchronous message passing – as it had been seen in Erlang if not its runtime system had been so helpful.
| Thread | Min | Average | Max | Count |
| 1-2 | 5326 | 2463146 | 5983878 | 2584638 |
| 2-3 | 0 | 19077 | 128598 | 1292319 |
| 3-5 | 0 | 9645 | 55008 | 851453 |
| 4-7 | 0 | 8728 | 41121 | 681161 |
| 5-11 | 0 | 7216 | 38952 | 583852 |
| 6-13 | 0 | 7294 | 41757 | 530775 |
| 7-17 | 0 | 6292 | 30771 | 489945 |
| . . . | ||||
| 95-499 | 0 | 26395 | 59904 | 133446 |
| 96-503 | 0 | 23891 | 51488 | 113883 |
| 97-509 | 0 | 6376 | 32185 | 81409 |
| 98-521 | 0 | 14685 | 38199 | 81215 |
| 99-523 | 0 | 20810 | 57653 | 81029 |
| 100-541 | 0 | 25095 | 58695 | 78282 |
Five million (5,983,878) messages in the first sieve and in average 50,000 messages in each sieve’s message queue! We can clearly see why it’s so important that the Erlang runtime system manages and levels how many messages are permitted to be sent, to prevent it ending up in the situation our Java version is in.
Java is using native threads, which on my Ubuntu Linux machine means POSIX Threads (NPTL). The threads are scheduled using interrupt-driven preemptive scheduling, which means at a regular frequency a clock interrupt switches the currently executing thread. This can happen as a ‘bolt from the blue’.
To compare with cooperative scheduling, let’s introduce a helper class that invokes yield() and call it inside each loop of the generator and the sieve.
public class ThreadSwitcher {
private long count = 0;
private int factor = 1;
public ThreadSwitcher(int factor) {this.factor = factor;}
public void yield() {
if ((factor > 0) && (++count % factor) == 0) {
Thread.yield();
}
}
}
If I run it with factor=10, it completes in 18 secs.
$ java -cp target/classes/ Prime -y 10 -b 0 -n 100 1) 2 2) 3 . . . 99) 523 100) 541 [2, 3, 5, 7, 11, 13, ..., 523, 541] Elapsed time: 18.495 secs
If I decrease the factor=1, meaning a switch after each round, we get even better figures.
$ java -cp target/classes/ Prime -y 1 -b 0 -n 100 1) 2 2) 3 . . . 98) 521 99) 523 100) 541 [2, 3, 5, 7, ...,521, 523, 541] Elapsed time: 7.467 secs
Now, look at the messages stats below. Much better numbers, although not as good as Erlang.
| Thread | Min | Average | Max | Count |
| 1-2 | 821 | 299267 | 448928 | 347981 |
| 2-3 | 0 | 7028 | 28651 | 173982 |
| 3-5 | 0 | 2650 | 9117 | 115988 |
| 4-7 | 0 | 706 | 5871 | 92790 |
| 5-11 | 0 | 223 | 5229 | 78868 |
| 6-13 | 0 | 220 | 4942 | 71698 |
| 7-17 | 0 | 210 | 4919 | 66181 |
| 8-19 | 0 | 200 | 4747 | 62289 |
| 9-23 | 0 | 195 | 4680 | 59010 |
| . . . | ||||
| 95-499 | 0 | 4319 | 6752 | 28060 |
| 96-503 | 0 | 44 | 604 | 28034 |
| 97-509 | 0 | 7412 | 20373 | 28007 |
| 98-521 | 11657 | 19766 | 27877 | 16222 |
| 99-523 | 0 | 15582 | 16218 | 26 |
| 100-541 | 0 | 0 | 0 | 25 |
Is there anything else we can do? Yes, we can change the message sending semantics into a blocked (synchronous) implementation. I took the liberty to realize synchronous message passing in the Java program. You can find the sources at the end of this blog post. Now, running it without yield, but with a max length of each message queue of 100 gives this output.
jens@spooky:~/workspace/PrimeJava$ java -cp target/classes/ Prime -y 0 -b 100 -n 100 1) 2 2) 3 3) 5 . . . 98) 521 99) 523 100) 541 [2, 3, 5, 7, 11, ..., 523, 541] Elapsed time: 0.455 secs
Yiiiha, Java rocks!. Looking at the message stats below shows something of a déjà-vu. Where have I seen numbers like this before? Hmm, let’s see. Aha, now I remember; when running the Erlang version.
| Thread | Min | Average | Max | Count |
| 1-2 | 0 | 89 | 100 | 15244 |
| 2-3 | 0 | 87 | 100 | 7523 |
| 3-5 | 0 | 89 | 100 | 4917 |
| 4-7 | 0 | 92 | 100 | 3835 |
| 5-11 | 0 | 94 | 99 | 3188 |
| 6-13 | 36 | 99 | 100 | 2800 |
| 7-17 | 0 | 99 | 100 | 2485 |
| 8-19 | 0 | 2 | 99 | 2339 |
| 9-23 | 0 | 2 | 95 | 2216 |
| . . . | ||||
| 96-503 | 0 | 99 | 100 | 414 |
| 97-509 | 0 | 99 | 100 | 314 |
| 98-521 | 0 | 99 | 100 | 214 |
| 99-523 | 0 | 99 | 100 | 114 |
| 100-541 | 0 | 0 | 0 | 113 |
OK, time to get serious again. I became curious about how the performance varies with the settings of the yield-factor (y) and the bound-buffer factor (b). Here is a table showing the elapsed time of 1000 sieve threads, as a function of some values of y and b.
| y | b | Seconds |
| 0 | 1000 | 43.007 |
| 0 | 500 | 28.778 |
| 0 | 100 | 13.315 |
| 0 | 50 | 9.167 |
| 0 | 10 | 8.941 |
| 0 | 5 | 8.614 |
| 0 | 1 | 6.028 |
| 1 | 100 | 93.844 |
| 1 | 10 | 14.071 |
| 1 | 1 | 4.779 |
| 10 | 100 | 50.055 |
| 10 | 10 | 8.056 |
| 10 | 1 | 8.351 |
| 100 | 100 | 51.479 |
| 100 | 10 | 8.655 |
| 100 | 1 | 8.439 |
It’s quite stunning to see that the elapsed time ranges from 93 seconds down to 4, just by changing how often the thread switches and capping the message queue length. In this case, we can see that the buffer-length contributes most and we can draw the conclusion that synchronous sending with buffer length 1 performes best. This is often namned rendezvous.
After running the series above using the Java version, I switched back to the Erlang version for comparison. Here are two runs with 1000 and 10,000 threads. Additional comments are superfluous.
$ erl -pa target/beams/ -noshell -run prime exe 1000 no Elapsed time 2.213 secs $ erl -pa target/beams/ -noshell -run prime exe 10000 no Elapsed time 211.147 secs
The Java version still has a long way to go, when it comes to the number of threads. Above 1000 threads it starts to struggle. Running with 5000 threads takes 350 seconds and with 7500 threads it takes 852 seconds.
Some other day I will re-implement the program in C++ using pthreads and compare with both the Java and Erlang versions. Perhaps that will happen when I am teaching realtime systems programming in C++ on Linux in September for Ericsson in Gothenburg. You bet, I will be using the prime number pipeline as an exercise there too.
Source Code
PrimeNumbers Source Code (6.5 KiB, 47 hits)
Final words
This Erlang version is part of the exercises for a course in Erlang Basics I have been developing during the last couple of weeks. My Erlang course is marketed by Informator in Sweden and for a start intended for Ericsson. If you are interested in attending the course, please don’t hesitate to contact Informator.


Anybody still looking in here?
I’m looking for a very tiny version of ‘first N prime numbers’ in Erlang.
Can someone help me out? :/
Would love to see the sources but am getting a 403 Forbidden
Sorry for that. Have been moving around the site a couple of times and in the process got rid of my subversion repo.
I will try to find the sources during the w/e and publish it here.
Thanks for pointing this out for me.
I have now fixed the missing sources problem. Now there is a ZIP file with all the source code, bith Erlang and Java.
Thanks Ulf, for your comments.
I actually started with a receiver method, but removed it because it prevents the out-of-band message ‘done’ to propagate ahead of all numbers in the queue when it’s time to stop the pipeline. A receive method imposes a strict FIFO order on the messages. On the other hand, we have seen that the message queue never get that long compared to the extremes of the Java version, so in practice it may not be a problem.
You’re having fun, I see.
One tiny observation: Since you don’t expect to buffer unknown messages, it will be slightly more efficient and semantically clearer if you write
case next_message() of
{done, PrimeNumbers} ->
…
end
next_message() ->
receive
Msg -> Msg
end.
This way, the process will crash if you get an unexpected message, rather than buffering it and slowing down subsequent receive operations.
There are of course many ways to overwhelm a receiver even in Erlang, but of course you know that. A subtle point is that the reduction-counting scheduler will actually punish the sender with extra reductions if the receiver’s message queue is very large. This slows the sender down even more. While this was a ‘clever’ optimization on single-core, it is not obviously a good idea on multi-core, since it implies synchronous delivery and locking.