Ohm v18: Exciting Changes to CST Representation
Hey everyone, let's dive into some exciting updates coming to Ohm in version 18! The big one is a major change to how the Concrete Syntax Tree (CST) is represented, and I'm here to break it down for you. If you're using Ohm to build your grammars, you'll want to pay attention: the goal is a more streamlined and efficient way to work with your syntax trees. So buckle up, because we're about to explore the future of Ohm!
The Current State of CST in Ohm
If you're working with Ohm today, you're probably familiar with wrapper nodes. When you write an operation or attribute for your grammar, the arguments passed to your actions are wrapper nodes, which give you access to the properties of the underlying CST nodes. This approach gets clunky in a few places, and one of the main goals for Ohm v18 is a more intuitive and efficient way to interact with the CST.
We'll go through the specific pain points one by one. But first, the motivation.
Why the Change? Paving the Way for WebAssembly
So, why the big changes? The primary motivation for these updates is to pave the way for Ohm v18's support for compiling grammars to WebAssembly (Wasm). This is a huge step forward! Wasm allows you to run your code at near-native speeds in the browser and other environments. In order to make this happen, the language-specific APIs for processing a match result need to be as lean as possible. The current wrapper-based approach adds a layer of abstraction that isn't ideal for Wasm. By directly exposing the CST nodes, we can reduce overhead and make Ohm even more performant.
Addressing the Pain Points: Key Issues and Solutions
Now, let's get into the specifics. Here's a breakdown of the challenges we're tackling and how Ohm v18 will offer solutions:
1. Iteration Nodes: Bringing Order to the Chaos
One of the first issues is the way iteration nodes are handled. When you have repetition or optional elements in your grammar, and you do a straightforward traversal of a node's children, you might find yourself visiting nodes in an order that doesn't match the order they appear in the input. For example, consider this grammar: start = (letter digit)+. If you input a1b2, the action for start would receive two IterNode arguments: one containing the letters and another containing the digits. This can be a bit confusing and make it harder to process the input in the intended order.
In Ohm v18, repetition and optionals will generate their own distinct nodes. So instead of dealing with IterNode wrappers, you'll get individual nodes for each repetition or optional element. This change will make it much easier to traverse and work with the CST in a way that matches the structure of your input.
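To make the ordering problem concrete, here is a minimal TypeScript sketch for the (letter digit)+ example on the input a1b2. It does not use Ohm itself: SketchNode, leaf, branch, and leafText are hypothetical names, and the "v18-style" tree is only my assumption about what one node per iteration could look like, not the final API.

```typescript
// Hypothetical, simplified node shape for illustration; not Ohm's actual types.
interface SketchNode {
  kind: string;
  text?: string; // set on leaves only
  children: SketchNode[];
}

const leaf = (text: string): SketchNode => ({ kind: "terminal", text, children: [] });
const branch = (kind: string, children: SketchNode[]): SketchNode => ({ kind, children });

// Straightforward depth-first, left-to-right traversal that concatenates leaf text.
function leafText(node: SketchNode): string {
  if (node.text !== undefined) return node.text;
  return node.children.map(leafText).join("");
}

// Current layout as described above: one IterNode per column (letters, then digits).
const columnLayout = branch("start", [
  branch("iter", [leaf("a"), leaf("b")]),
  branch("iter", [leaf("1"), leaf("2")]),
]);

// Assumed v18-style layout: the repetition is its own node with one child per iteration.
const rowLayout = branch("start", [
  branch("repetition", [
    branch("seq", [leaf("a"), leaf("1")]),
    branch("seq", [leaf("b"), leaf("2")]),
  ]),
]);

const columnOrder = leafText(columnLayout); // "ab12": does not match the input order
const rowOrder = leafText(rowLayout); // "a1b2": matches the input order
```

The same naive traversal gives the wrong order on the first tree and the right order on the second, which is the whole point of the change.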
2. Lookahead: No More Sneaky Bindings
Another tricky area is lookahead (&). In the current version, positive lookahead creates a binding, which means the same input text can be captured by multiple nodes. Imagine this grammar: start = &letter any. If you input a, the action for start receives two arguments: one for the letter and another for the any. The problem is that, at runtime, it's not always obvious that a node comes from a lookahead. This can create confusion and make it harder to understand the structure of your CST.
In Ohm v18, positive lookahead will behave like negative lookahead, meaning it won't create a binding. This simplifies things and makes it easier to reason about the structure of your CST.
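Here is a toy TypeScript model of the start = &letter any example on the input a. It is not Ohm code and does no real matching: it assumes every term spans exactly one character, and SketchTerm, SketchBinding, and collectBindings are hypothetical names. It only shows how the number of bindings changes once lookahead stops binding.

```typescript
// Hypothetical shapes for illustration; not Ohm's actual types.
interface SketchTerm {
  rule: string;
  lookahead: boolean; // true for a term written with a leading &
}

interface SketchBinding {
  rule: string;
  startIdx: number;
  endIdx: number;
}

// Walks the terms of a sequence, assuming each one spans a single character.
// A lookahead term never advances the position; whether it produces a
// binding is controlled by lookaheadBinds.
function collectBindings(terms: SketchTerm[], lookaheadBinds: boolean): SketchBinding[] {
  const bindings: SketchBinding[] = [];
  let pos = 0;
  for (const term of terms) {
    const span: SketchBinding = { rule: term.rule, startIdx: pos, endIdx: pos + 1 };
    if (term.lookahead) {
      if (lookaheadBinds) bindings.push(span);
    } else {
      bindings.push(span);
      pos += 1;
    }
  }
  return bindings;
}

// start = &letter any
const startBody: SketchTerm[] = [
  { rule: "letter", lookahead: true },
  { rule: "any", lookahead: false },
];

// Current behavior: two bindings that cover the same input span [0, 1).
const currentBindings = collectBindings(startBody, true);

// v18 behavior as described above: only the binding for any remains.
const v18Bindings = collectBindings(startBody, false);
```

In the first result, nothing in the bindings themselves tells you that letter came from a lookahead; in the second, every binding corresponds to input that was actually consumed.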
3. Accessing Skipped Spaces: The Missing Piece
Finally, the current version lacks an easy way to access the CST nodes associated with skipped spaces. This can be a problem if you need to preserve whitespace or spacing information. While the current version offers an approach for accessing spaces, it is not direct.
In Ohm v18, terminal and nonterminal nodes will have a leadingSpaces?: NonterminalNode property. This makes it easier to access and work with spaces. To access the trailing spaces, you could potentially write a _root action, which would give you access to the top-level sequence.
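As a rough illustration of what an optional leadingSpaces property makes possible, the TypeScript sketch below rebuilds the original text, whitespace included, from a flat list of nodes. Only the name leadingSpaces comes from the description above; SketchSpaces, SketchToken, the sourceString field on them, and withLeadingSpaces are assumptions made for this example.

```typescript
// Stand-in for the NonterminalNode that holds the skipped spaces (hypothetical shape).
interface SketchSpaces {
  sourceString: string;
}

// Stand-in for a terminal or nonterminal node (hypothetical shape).
interface SketchToken {
  sourceString: string;
  leadingSpaces?: SketchSpaces; // absent when nothing was skipped before the node
}

// Concatenates each node's skipped leading whitespace followed by its own text.
function withLeadingSpaces(tokens: SketchToken[]): string {
  return tokens
    .map((t) => (t.leadingSpaces?.sourceString ?? "") + t.sourceString)
    .join("");
}

// Nodes for the input "x  = 1".
const tokens: SketchToken[] = [
  { sourceString: "x" },
  { sourceString: "=", leadingSpaces: { sourceString: "  " } },
  { sourceString: "1", leadingSpaces: { sourceString: " " } },
];

const rebuilt = withLeadingSpaces(tokens); // "x  = 1"
```

Trailing whitespace after the last node is not covered by this sketch, which is why something like the _root action mentioned above would still be needed for it.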
Additional Changes: Pseudo-Rules and Macros
Beyond these core CST changes, there are a few other updates worth mentioning. We're introducing the concept of pseudo-rules or macros. Some of the built-in rules, like any, end, and caseInsensitive, currently produce a TerminalNode. This can be a little confusing because they behave like applications but produce terminal nodes.
To address this, we're planning to give these pseudo-rules a distinct look: @any, @end, @caseInsensitive. This means they can't be overridden by user-defined rules, which reduces the risk of conflicts and makes grammar definitions more predictable and less prone to errors.
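One way to picture why an @ prefix rules out overriding: assuming user-defined rule names are ordinary identifiers (a letter or underscore followed by letters, digits, or underscores), a name such as @any can never appear on the left-hand side of a rule definition. The TypeScript check below is a hypothetical sketch of that idea; the regular expression, PSEUDO_RULES, and canDefine are illustrative and not part of Ohm.

```typescript
// The pseudo-rule names proposed above.
const PSEUDO_RULES: string[] = ["@any", "@end", "@caseInsensitive"];

// Assumed shape of a user-definable rule name: an ordinary identifier.
const USER_RULE_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Returns true if a grammar author could write a rule with this name.
function canDefine(ruleName: string): boolean {
  return USER_RULE_NAME.test(ruleName);
}

// An unprefixed name is a valid identifier, so a user rule could shadow it.
const plainNameDefinable = canDefine("any"); // true

// None of the prefixed names is a valid identifier, so none can be redefined.
const pseudoRulesProtected = PSEUDO_RULES.every((name) => !canDefine(name)); // true
```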
Benefits of These Changes
So, what are the benefits of all these changes?
- Improved Performance: By streamlining the CST representation and removing unnecessary wrappers, Ohm v18 will be more efficient, especially when compiling to WebAssembly.
- Enhanced Clarity: The new CST structure will be more intuitive and easier to reason about, making it simpler to understand and work with your grammars.
- Greater Flexibility: The changes will give you more control over your CST and make it easier to extract the information you need.
- Simplified APIs: The overall goal is to make the APIs for working with the CST cleaner and more straightforward.
Conclusion: The Future is Bright
These changes represent a significant step forward for Ohm. By addressing the current pain points and paving the way for WebAssembly support, we're making Ohm even more powerful, efficient, and user-friendly. I'm excited to see how you all use these new features to build amazing grammars and applications. Stay tuned for more updates, and happy parsing!
This is just a preview of the changes coming in Ohm v18. I hope it helps you understand how the CST representation is changing and how that will improve your experience with Ohm. If you have questions, I'm happy to answer them. Thanks for tuning in!