Pnut: A C to POSIX shell compiler you can trust

by feeley on 7/24/24, 12:22 AM with 118 comments
by 1vuio0pswjnm7 on 7/24/24, 4:47 AM

"Because Pnut can be distributed as a human-readable shell script (`pnut.sh`), it can serve as the basis for a reproducible build system. With a POSIX compliant shell, `pnut.sh` is sufficiently powerful to compile itself and, with some effort, [TCC](https://bellard.org/tcc/). Because TCC can be used to bootstrap GCC, this makes it possible to bootstrap a fully featured build toolchain from only human-readable source files and a POSIX shell.

Because Pnut doesn't support certain C features used in TCC, Pnut features a native code backend that supports a larger subset of C99. We call this compiler `pnut-exe`, and it can be compiled using `pnut.sh`. This makes it possible to compile `pnut-exe.c` using `pnut.sh`, and then compile TCC, all from a POSIX shell."

Is there anywhere we can see a step-by-step demo of this process?

Curious if the authors tried NetBSD or OpenBSD, or using another small C compiler, e.g., pcc.

Historically, tcc was problematic for NetBSD and its forks. Not sure about today, but tcc is still in NetBSD pkgsrc WIP which suggests problems remain.

by theamk on 7/24/24, 4:11 AM

If you are wondering how it handles C-only functions.. it does not.

open(..., O_RDWR | O_EXCL) -> runtime error, "echo "Unknow file mode" ; exit 1"

lseek(fd, 1, SEEK_HOLE); -> invalid code (uses undefined _lseek)

socket(AF_UNIX, SOCK_STREAM, 0); -> same (uses undefined _socket)

Looking closer at the "cp" and "cat" examples, the write() call does not handle errors at all. Forget about partial writes, it does not even return -1 on failures.
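
For reference, the two things found missing here (returning -1 on failure and coping with partial writes) are usually handled in C with a small retry loop. A minimal sketch in plain POSIX C follows; `write_all` is a hypothetical helper name, not something taken from Pnut or its examples, and it has not been run through Pnut:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all len bytes to fd.
 * Returns len on success, or -1 with errno set on failure.
 */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    assert(buf != NULL || len == 0);

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before any byte was written: retry */
            return -1;         /* real failure: report it to the caller */
        }
        p += n;                /* partial write: skip what was already written */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

A caller would check the result, e.g. `if (write_all(fd, buf, n) < 0) { perror("write"); return 1; }`, which is the kind of check the generated shell code would need to preserve.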

"Compiler you can Trust", indeed... maybe you can trust it to get all the details wrong?

by cozzyd on 7/24/24, 1:48 AM

Can finally port systemd to shell to quell the rebellion.

by okaleniuk on 7/24/24, 9:30 AM

I love things like these because they shake our perception of normal loose. And who said our perception of normal doesn't deserve a good shake?

A C to shell compiler might seem impractical, but you know what is even more impractical? Having a separate language for a build system. And yet, here we are. Using Shell, Make or CMake to build a C program is only acceptable because it has always been so. It's a "perceived normality" in the C world.

There is no good reason, however, why CMake isn't a C library. With the build system being a library, we could write, read, and, most importantly, debug build scripts just like any other part of the buildable. We already have includeOS, why not includeMake?

by wahern on 7/24/24, 3:23 AM

This is very cool, regardless of how serious it was intended to be taken. Before base-64 encoders/decoders became more common as preinstalled commands in the environments I found myself on, I wrote a base64 utility in mostly pure POSIX shell:

  https://25thandClement.com/~william/2023/base64.sh

If this project had existed I might have opted to compile my C-based base-64 encoder and decoder routines, suitably tweaked for pnut's limitations.

I say base64.sh is mostly pure not because it relies on shell extensions, but because the only non-builtins it depends on are od(1) or, alternatively, dd(1) to assist with binary I/O. And preferably od(1), as reading certain control characters, like NUL, into a shell variable is especially dubious. The encoder is designed to operate on a stream of decimal encoded bytes. (See decimals_fast for using od to encode stdin to decimals, and decimals_slow for using dd for the same.)
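
To give an idea of what a C-based encoder routine of that sort might look like, here is a minimal sketch. It is not the code behind base64.sh; `b64_encode` and `B64` are hypothetical names, and it is ordinary C99 that has not been tried with pnut. Since it uses `unsigned char` and a `[]` array, it would presumably need tweaking to fit pnut's subset.

```c
#include <assert.h>
#include <stddef.h>

/* Standard base64 alphabet (RFC 4648). */
static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/*
 * Hypothetical helper: encode n bytes from src into dst and NUL-terminate it.
 * dst must have room for 4 * ((n + 2) / 3) + 1 bytes.
 * Returns the number of characters written, not counting the NUL.
 */
size_t b64_encode(const unsigned char *src, size_t n, char *dst)
{
    size_t i = 0, o = 0;

    assert(dst != NULL && (src != NULL || n == 0));

    while (i < n) {
        size_t rem = n - i;                        /* bytes left, at least 1 */
        unsigned long v = (unsigned long)src[i] << 16;
        if (rem > 1) v |= (unsigned long)src[i + 1] << 8;
        if (rem > 2) v |= (unsigned long)src[i + 2];

        /* Each 24-bit group becomes four 6-bit symbols, padded with '='. */
        dst[o++] = B64[(v >> 18) & 63];
        dst[o++] = B64[(v >> 12) & 63];
        dst[o++] = rem > 1 ? B64[(v >> 6) & 63] : '=';
        dst[o++] = rem > 2 ? B64[v & 63] : '=';

        i += rem > 2 ? 3 : rem;
    }
    dst[o] = '\0';
    return o;
}
```

The routine works on a byte buffer already in memory, so it sidesteps the binary I/O question entirely; getting raw bytes in and out of a shell is the hard part either way.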

It looks like pnut uses `read -r` for reading input. In addition to NULs and related raw byte issues, I was worried about chunking issues (e.g. truncation or errors) on binary data, e.g. no newlines within LINE_BUF bytes. Have you tested binary I/O much? Relatedly, how many different shell implementations have you tested your core scheme with? In addition to bash, dash, and various incarnations of /bin/sh on the BSDs, I also tested base64.sh with Solaris' system shells (ksh88 and ksh93 derivatives), as well as AIX's (ksh88 derivative). AIX had some odd quirks with pipelines even with plain text I/O. (Unfortunately Polar Home is gone, now, so I have no easy way to play with AIX; maybe that's for the better.)

by voidUpdate on 7/24/24, 7:03 AM

When I'm told that "I can trust" something that I feel like I had no reason to distrust, it makes me feel even more suspicious of it

by akoboldfrying on 7/24/24, 1:59 AM

I was puzzled by the example C function containing pointers. Do I understand correctly that you implement pointers in shell by having a shell variable _0 for the first "byte" of "memory", a shell variable _1 for the second, etc.?

by rubicks on 7/24/24, 2:55 AM

I can't wait to see the shell equivalents for ptrace, setjmp, and dlopen.

by metadat on 7/24/24, 4:16 PM

Also see this related submission from May, 2024:

Amber: Programming language compiled to Bash https://news.ycombinator.com/item?id=40431835 (318 comments)

---

Pnut doesn't seem to differentiate between `int' and `int*' function parameters. That's weird, and doesn't come across as trustworthy at all! Shouldn't the use of pointers be disallowed instead?

  int test1(int a, int len) {
    return a;
  }
  
  int test2(int* a, int len) {
    return a;
  }

Both compile to the exact same thing:

  : $((len = a = 0))
  _test1() { let a $2; let len $3
    : $(($1 = a))
    endlet $1 len a
  }
  
  : $((len = a = 0))
  _test2() { let a $2; let len $3
    : $(($1 = a))
    endlet $1 len a
  }

The "runtime library" portion at the bottom of every script is nigh unreadable.

Even still, it's a cool concept.

by teo_zero on 7/24/24, 6:41 AM

Just to be clear, the input must be written in a subset of C, because many constructs are not recognized, like unsigned types, static variables, [] arrays, etc.

Is there a plan to remove such limitations?

by itvision on 7/24/24, 12:53 PM

Instantly make your C code 200 times slower without any effort!

by andrewf on 7/24/24, 2:30 AM

Looking forward to the point where this can build autoconf. It's great that the generated ./configure script is portable but if I want to make substantial changes to the project I need to find a binary for my machine (and version differences can be quite substantial)

by kazinator on 7/25/24, 6:24 PM

This is not useful if it doesn't call external libraries.

Even POSIX standard ones. Chokes on:

  #include <glob.h>

  int main()  // must be (); (void) results in syntax error.
  {
    glob_t gb; // syntax error here
    glob("abc", 0, NULL, &gb);
    return 0;
  }

Nobody needs entirely self-contained C programs with no libraries to be turned into shell scripts; Unix people switch to C when there is a library function they need to call for which there is no command in /bin or /usr/bin.

If I reduce it to:

  #include <glob.h>

  int main()
  {
    glob("abc", 0, NULL, 0);
    return 0;
  }

it "compiles" into something with a main function like:

  _main() {
    defstr __str_0 "abc"
    _glob __ $__str_0 0 $_NULL 0
    : $(($1 = 0))
  }

but what good is that without a definition of _glob?

by forrestthewoods on 7/24/24, 2:27 AM

Hrmmm. But why?

Quite frankly I think Bash scripting is awful and frequently wish shell scripts were written in a real and debuggable language. For anything non-trivial that is.

I feel like I’d rather write C and compile it with Cosmopolitan C to give me a cross-platform binary than this.

Neat project. Definitely clever. But it’s headed in the opposite direction from what I’d prefer...

by vermon on 7/24/24, 7:56 AM

If the end goal is portability for C, would Cosmopolitan Libc be a better choice because it supports a lot more features and probably runs faster?

by iod on 7/24/24, 3:26 PM

I am sorry if this comes off as negative, but every example provided on the site, when compiled and then fed into ShellCheck¹, generates warnings about non-portable and ambiguous constructs in the script. What exactly are we supposed to trust?

¹ https://www.shellcheck.net

by osmsucks on 7/24/24, 7:32 AM

I'm writing something similar, but it's based on its own scripting language. The idea of transpiling C sounds appealing but impractical: how do they plan to compile, say, things using mmap, setjmp, pthreads, ...? It would be better to clearly promise only a restricted subset of C.

by kxndnenfn on 7/24/24, 4:20 AM

This is quite interesting! Without having dug deeper into it, seeing the human readable output I assume quite different semantics from C?

The C to shell transpiler I'm aware of will output unreadable code (elvm using 8cc with sh backend)

by dsp_person on 7/24/24, 6:08 AM

I use linux-vt-setcolors in my startup, which would be a bit more convenient if it was a shell script instead of C, but it uses ioctl.

Trying to compile with this tool fails with "comp_glo_decl: unexpected declaration"

by Retr0id on 7/24/24, 1:19 PM

Can it do wrapping arithmetic?

The `sum` example doesn't seem to do wrapping, but signed int overflow is technically UB so I guess they're fine not to.

Switching it to `unsigned int` gives me:

code.c:1:1 syntax error: unsupported type
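
For comparison, this is roughly what wrapping arithmetic looks like in standard C, where unsigned overflow is defined to wrap and signed overflow is not. It is only a sketch: `wrapping_add_u32`, `wrapping_add_i32` and `check_wrapping` are hypothetical names, and given the error above, none of it is expected to compile with Pnut as-is.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned arithmetic wraps modulo 2^32; this is defined behavior in C. */
uint32_t wrapping_add_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b);
}

/*
 * Signed overflow is undefined, so do the addition in unsigned and map the
 * result back into the int32_t range by hand, without any out-of-range
 * conversion.
 */
int32_t wrapping_add_i32(int32_t a, int32_t b)
{
    uint32_t r = (uint32_t)((uint32_t)a + (uint32_t)b);

    if (r <= (uint32_t)INT32_MAX)
        return (int32_t)r;
    /* r is in [2^31, 2^32 - 1]: shift it down to [INT32_MIN, -1]. */
    return (int32_t)(r - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}

/* Small self-check showing the intended wrap-around results. */
void check_wrapping(void)
{
    assert(wrapping_add_u32(UINT32_MAX, 1u) == 0u);
    assert(wrapping_add_i32(INT32_MAX, 1) == INT32_MIN);
    assert(wrapping_add_i32(INT32_MIN, -1) == INT32_MAX);
}
```

A compiler targeting shell arithmetic would have to emit explicit masking to get the same results, since the width of `$(( ))` arithmetic depends on the shell.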

by yencabulator on 7/25/24, 6:29 PM

It seems to have practically no error checking. Try compiling

    int why(int unused) {
      wat_why_does_this_compile;
      no_error_checking();
    }

by atilaneves on 7/24/24, 1:14 PM

I'm still figuring out why anyone would want to write a shell script in C. That sounds like torture to me.

by JoshTriplett on 7/24/24, 5:34 PM

Several times I've found myself wishing for the reverse: a shell-to-binary compiler or JIT.

by layer8 on 7/24/24, 12:06 PM

Can you trust that it faithfully reproduces undefined behavior? ;)

by gojomybeloved on 7/24/24, 6:17 AM

Love this!

by o11c on 7/24/24, 1:36 AM

It's a bad sign when I immediately look at the screenshot and see quoting bugs.