Writing a minimal Lua implementation with a virtual machine from scratch in Rust

by finite_jest on 1/16/22, 1:54 AM with 29 comments

by jstimpfle on 1/16/22, 10:11 AM

Since tokens are the leaves of the AST, there are a lot of them and they can take up a lot of space. To save memory, it is a good idea to store only a file location instead of a full token. Whenever token information is needed, just lex again from that file location to get the full token. This works only for languages with a context-free lexical syntax, of course (I'm not entirely sure "context-free" is the right term here, but you get what I mean).
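A rough sketch of the "store an offset, lex again on demand" idea, in Rust. The `Token` enum and the `lex_at` function are hypothetical names made up for this illustration, and the lexer only handles identifiers, integers and single-character symbols. It assumes that a token can be recognized from its starting offset alone, without any earlier lexer state.

```rust
/// Token kinds for a tiny, hypothetical Lua-like lexer (illustration only).
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Identifier(&'a str),
    Number(&'a str),
    Symbol(char),
}

/// Re-lex exactly one token starting at byte `offset` in `source`.
/// Assumes `offset` points at the first byte of a token; returns `None`
/// for out-of-range offsets or offsets that point at whitespace.
fn lex_at(source: &str, offset: usize) -> Option<Token<'_>> {
    let rest = source.get(offset..)?;
    let first = rest.chars().next()?;
    if first.is_ascii_alphabetic() || first == '_' {
        // Identifier: runs until the first non-identifier character.
        let len = rest
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .unwrap_or(rest.len());
        Some(Token::Identifier(&rest[..len]))
    } else if first.is_ascii_digit() {
        // Integer literal: runs until the first non-digit.
        let len = rest
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(rest.len());
        Some(Token::Number(&rest[..len]))
    } else if first.is_whitespace() {
        None
    } else {
        Some(Token::Symbol(first))
    }
}

fn main() {
    let source = "local x = 42";
    // The syntax tree keeps only these offsets (u32 keeps each leaf at 4 bytes).
    let leaves: [u32; 4] = [0, 6, 8, 10];
    for &offset in &leaves {
        println!("{:?}", lex_at(source, offset as usize));
    }
}
```

The trade-off is memory for time: every lookup pays for a small re-lex, which is usually acceptable when token details are only needed for diagnostics or tooling.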

Storing row/column in file location data is wasteful - just a file offset should be enough. Whenever the row/column coordinates are needed (normally only in user messages) they can be quickly recomputed.
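Recomputing the coordinates could look something like this in Rust. `line_col` is a hypothetical helper name. The sketch assumes `\n` line endings and 1-based line and column numbers, with columns counted in characters rather than bytes.

```rust
/// Convert a byte offset into 1-based (line, column) coordinates.
/// Assumes `\n` line endings; the column counts characters, not bytes.
/// An offset that is out of range (or not on a character boundary)
/// is treated as pointing at the end of the file.
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let before = source.get(..offset).unwrap_or(source);
    // Every newline before the offset starts a new line.
    let line = before.matches('\n').count() + 1;
    // The current line starts right after the last newline, or at byte 0.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    (line, column)
}

fn main() {
    let source = "local x = 1\nprint(x)\n";
    let (line, column) = line_col(source, 18);
    println!("offset 18 is at line {}, column {}", line, column);
}
```

This is a linear scan per lookup. If many lookups are needed, a precomputed table of line-start offsets with a binary search would be the usual refinement.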

In effect, parsed tokens can be stored as just an offset - a 4 or 8 byte integer.

by da39a3ee on 1/16/22, 5:11 AM

The article looks great and I’m looking forward to reading it; this comment is not a criticism of the article.

This API is the only bad thing about Rust!

  .expect("Could not read file")

It’s so unfortunate to have an API that reads

  .expect("thing we don't expect")

I think we should all just forget it’s there and use

  .unwrap_or_else(|_| panic!("thing we don't expect"))
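For anyone comparing the two forms, here is a small self-contained sketch. The helper names `parse_operand_expect` and `parse_operand_verbose` are invented for this example, and integer parsing stands in for the file read so the snippet runs without touching the filesystem. Note that on a `Result` the closure passed to `unwrap_or_else` receives the error value, which the panic message can then include.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: panics with a fixed message if parsing fails.
fn parse_operand_expect(text: &str) -> i64 {
    text.parse::<i64>().expect("Could not parse operand")
}

/// Hypothetical helper: same behavior on success, but the closure gets
/// the error, so the panic message can say what actually went wrong.
fn parse_operand_verbose(text: &str) -> i64 {
    text.parse::<i64>()
        .unwrap_or_else(|err: ParseIntError| panic!("Could not parse {:?}: {}", text, err))
}

fn main() {
    println!("{}", parse_operand_expect("42"));
    println!("{}", parse_operand_verbose("-7"));
}
```

Both helpers panic on bad input; the only difference is how the message is written and what it can contain.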

by duped on 1/16/22, 9:51 AM

Working on tokenization and parsing, there have been two "lights clicking on" moments that I think every dev working on a PL implementation should have:

- Tokens are the leaves of your syntax trees

- File locations are relative, not absolute

It's easier to build a parser that doesn't buy into these things, but it's way harder to build tooling and good error messaging if you don't.

by eatonphil on 1/16/22, 6:37 AM

Hey folks just saw this, author here. Happy to answer questions!

by xvilka on 1/16/22, 4:29 AM

There's also Luster[1].

[1] https://github.com/kyren/luster

by cgoto89798 on 1/16/22, 5:04 AM

Does Rust have computed goto, which really helps interpreter speed?

It basically means you can do something like "goto opcode_table[*(++ip)];"

GCC offers it as a non-standard extension to C.

  https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

FORTRAN has had it since 1957. But Pascal and C purged "evil computed GOTO" and only offered non-computed goto. Then Java etc. purged non-computed goto.
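As far as I know, stable Rust has nothing like labels-as-values. The usual approximations are a `match` inside a loop or a table of function pointers indexed by opcode. Below is a minimal sketch of the table approach. The three-instruction bytecode, the `Vm` struct and all the names are made up for illustration, and whether it gets close to computed-goto performance is something you would have to measure.

```rust
/// Hypothetical three-instruction bytecode, for illustration only.
const OP_PUSH: u8 = 0; // followed by one operand byte
const OP_ADD: u8 = 1;
const OP_HALT: u8 = 2;

struct Vm<'a> {
    code: &'a [u8],
    ip: usize,
    stack: Vec<i64>,
    running: bool,
}

type Handler = fn(&mut Vm);

fn op_push(vm: &mut Vm) {
    // Read the operand byte that follows the opcode.
    let operand = vm.code[vm.ip];
    vm.ip += 1;
    vm.stack.push(i64::from(operand));
}

fn op_add(vm: &mut Vm) {
    let b = vm.stack.pop().expect("stack underflow");
    let a = vm.stack.pop().expect("stack underflow");
    vm.stack.push(a + b);
}

fn op_halt(vm: &mut Vm) {
    vm.running = false;
}

/// Indexed by opcode; plays the role of `opcode_table` in the C snippet.
const OPCODE_TABLE: [Handler; 3] = [op_push, op_add, op_halt];

fn run(code: &[u8]) -> Vec<i64> {
    let mut vm = Vm { code, ip: 0, stack: Vec::new(), running: true };
    while vm.running {
        // Fetch the opcode, advance, then jump through the table.
        // Panics on an unknown opcode or on code without OP_HALT.
        let opcode = vm.code[vm.ip];
        vm.ip += 1;
        OPCODE_TABLE[opcode as usize](&mut vm);
    }
    vm.stack
}

fn main() {
    let program = [OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT];
    println!("{:?}", run(&program));
}
```

Unlike computed goto, every instruction here still returns to a single dispatch point in the loop, so this shows the table lookup but not the per-instruction indirect jump.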

by debdut on 1/16/22, 5:17 AM

Thanks for sharing! A great learning