syntree/
lib.rs

1//! [<img alt="github" src="https://img.shields.io/badge/github-udoprog/syntree-8da0cb?style=for-the-badge&logo=github" height="20">](https://github.com/udoprog/syntree)
2//! [<img alt="crates.io" src="https://img.shields.io/crates/v/syntree.svg?style=for-the-badge&color=fc8d62&logo=rust" height="20">](https://crates.io/crates/syntree)
3//! [<img alt="docs.rs" src="https://img.shields.io/badge/docs.rs-syntree-66c2a5?style=for-the-badge&logoColor=white&logo=" height="20">](https://docs.rs/syntree)
4//!
5//! A memory efficient syntax tree.
6//!
7//! This crate provides a tree structure which always is contiguously stored and
8//! manipulated in memory. It provides similar APIs as [`rowan`] and is intended
9//! to be an efficient replacement for it (read more below).
10//!
11//! Anything can be stored in the tree as long as it implements `Copy`.
12//!
13//! <br>
14//!
15//! ## Usage
16//!
17//! Add `syntree` to your crate:
18//!
19//! ```toml
20//! syntree = "0.18.0"
21//! ```
22//!
23//! If you want a complete sample for how `syntree` can be used for parsing, see
24//! the [calculator example][calculator].
25//!
26//! <br>
27//!
28//! ## Syntax trees
29//!
30//! This crate provides a way to efficiently model [abstract syntax trees]. The
31//! nodes of the tree are typically represented by variants in an enum, but
32//! [could be whatever you want][any-syntax].
33//!
34//! Each tree consists of *nodes* and *tokens*. Siblings are intermediary
35//! elements in the tree which encapsulate zero or more other nodes or tokens,
36//! while tokens are leaf elements representing exact source locations.
37//!
38//! An example tree for the simple expression `256 / 2 + 64 * 2` could be
39//! represented like this:
40//!
41//! ```text
42//! Operation@0..16
43//!   Number@0..3
44//!     Number@0..3 "256"
45//!   Whitespace@3..4 " "
46//!   Operator@4..5
47//!     Div@4..5 "/"
48//!   Whitespace@5..6 " "
49//!   Number@6..7
50//!     Number@6..7 "2"
51//!   Whitespace@7..8 " "
52//!   Operator@8..9
53//!     Plus@8..9 "+"
54//!   Whitespace@9..10 " "
55//!   Operation@10..16
56//!     Number@10..12
57//!       Number@10..12 "64"
58//!     Whitespace@12..13 " "
59//!     Operator@13..14
60//!       Mul@13..14 "*"
61//!     Whitespace@14..15 " "
62//!     Number@15..16
63//!       Number@15..16 "2"
64//! ```
65//!
66//! > Try it for yourself with:
67//! >
68//! > ```sh
69//! > cargo run --example calculator -- "256 / 2 + 64 * 2"
70//! > ```
71//!
72//! The primary difference between `syntree` and [`rowan`] is that *we don't
73//! store the original source* in the syntax tree. Instead, the user of the
74//! library is responsible for providing it as necessary. Like when calling
75//! [`print_with_source`].
76//!
77//! The API for constructing a syntax tree is provided through [`Builder`] which
78//! implements streaming builder methods. Internally the builder is represented
79//! as a contiguous slab of memory. Once a tree is built the structure of the
80//! tree can be queried through the [`Tree`] type.
81//!
82//! Note that [`syntree::tree!`] is only a helper which simplifies building
83//! trees for examples. It corresponds exactly to performing [`open`],
84//! [`close`], and [`token`] calls on [`Builder`] as specified.
85//!
86//! ```
87//! use syntree::{Builder, Span};
88//!
89//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
90//! enum Syntax {
91//!     Number,
92//!     Lit,
93//!     Nested,
94//! }
95//!
96//! use Syntax::*;
97//!
98//! let mut tree = Builder::new();
99//!
100//! tree.open(Number)?;
101//! tree.token(Lit, 1)?;
102//! tree.token(Lit, 3)?;
103//!
104//! tree.open(Nested)?;
105//! tree.token(Lit, 1)?;
106//! tree.close()?;
107//!
108//! tree.close()?;
109//!
110//! let tree = tree.build()?;
111//!
112//! let expected = syntree::tree! {
113//!     Number => {
114//!         (Lit, 1),
115//!         (Lit, 3),
116//!         Nested => {
117//!             (Lit, 1)
118//!         }
119//!     }
120//! };
121//!
122//! assert_eq!(tree, expected);
123//!
124//! let number = tree.first().ok_or("missing number")?;
125//! assert_eq!(number.span(), Span::new(0, 5));
126//! # Ok::<_, Box<dyn core::error::Error>>(())
127//! ```
128//!
129//! Note how the resulting [`Span`] for `Number` corresponds to the full span of
130//! its `Lit` children. Including the ones within `Nested`.
131//!
132//! Trees are usually constructed by parsing an input. This library encourages
133//! the use of a [handwritten pratt parser][pratt]. See the [calculator
134//! example][calculator] for a complete use case.
135//!
136//! <br>
137//!
138//! ## Compact or empty spans
139//!
140//! Spans by default uses `u32`-based indexes and `usize`-based pointers, this
141//! can be changed from its default using the [`Builder::new_with`] constructor:
142//!
143//! ```
144//! use syntree::{Builder, Span, Tree};
145//!
146//! syntree::flavor! {
147//!     struct FlavorU16 {
148//!         type Index = usize;
149//!         type Width = u16;
150//!     }
151//! };
152//!
153//! syntree::flavor! {
154//!     struct FlavorU32 {
155//!         type Index = usize;
156//!         type Width = u32;
157//!     }
158//! };
159//!
160//! let mut tree = Builder::<_, FlavorU16>::new_with();
161//!
162//! tree.open("root")?;
163//! tree.open("child")?;
164//! tree.token("token", 100)?;
165//! tree.close()?;
166//! tree.open("child2")?;
167//! tree.close()?;
168//! tree.close()?;
169//!
170//! let tree = tree.build()?;
171//!
172//! let expected: Tree<_, FlavorU32> = syntree::tree_with! {
173//!     "root" => {
174//!         "child" => { ("token", 100) },
175//!         "child2" => {}
176//!     }
177//! };
178//!
179//! assert_eq!(tree, expected);
180//! assert_eq!(tree.span(), Span::new(0, 100));
181//! # Ok::<_, Box<dyn core::error::Error>>(())
182//! ```
183//!
184//! Combined with [`Empty`], this allows for building trees without spans, if
185//! that is something you want to do:
186//!
187//! ```
188//! use syntree::{Builder, Empty, EmptyVec, TreeIndex, Tree};
189//!
190//! syntree::flavor! {
191//!     struct FlavorEmpty {
192//!         type Index = Empty;
193//!         type Indexes = EmptyVec<TreeIndex<Self>>;
194//!     }
195//! };
196//!
197//! let mut tree = Builder::<_, FlavorEmpty>::new_with();
198//!
199//! tree.open("root")?;
200//! tree.open("child")?;
201//! tree.token("token", Empty)?;
202//! tree.close()?;
203//! tree.open("child2")?;
204//! tree.close()?;
205//! tree.close()?;
206//!
207//! let tree = tree.build()?;
208//!
209//! let expected: Tree<_, FlavorEmpty> = syntree::tree_with! {
210//!     "root" => {
211//!         "child" => { "token" },
212//!         "child2" => {}
213//!     }
214//! };
215//!
216//! assert_eq!(tree, expected);
217//! assert!(tree.span().is_empty());
218//! # Ok::<_, Box<dyn core::error::Error>>(())
219//! ```
220//!
221//! <br>
222//!
223//! ## Why not `rowan`?
224//!
225//! I love [`rowan`]. It's the reason why I started this project. But this crate
226//! still exists for a few philosophical differences that would be hard to
227//! reconcile directly in `rowan`.
228//!
229//! `rowan` only supports adding types which in some way can be coerced into an
230//! `repr(u16)` as part of the syntax tree. I think this decision is reasonable,
231//! but it precludes you from designing trees which contain anything else other
232//! than source references without having to perform some form of indirect
233//! lookup. This is something needed in order to move [Rune] to lossless syntax
234//! trees (see [the representation of `Kind::Str` variant][kind-str]).
235//!
236//! To exemplify this scenario consider the following syntax:
237//!
238//! ```
239//! #[derive(Debug, Clone, Copy)]
240//! enum Syntax {
241//!     /// A string referenced somewhere else using the provided ID.
242//!     Synthetic(usize),
243//!     /// A literal string from the source.
244//!     Lit,
245//!     /// Whitespace.
246//!     Whitespace,
247//!     /// A lexer error.
248//!     Error,
249//! }
250//! ```
251//!
252//! You can see the [full `synthetic_strings` example][synthetic_strings] for
253//! how this might be used. But not only can the `Synthetic` token correspond to
254//! some source location (as it should because it was expanded from one!). It
255//! also directly represents that it's *not* a literal string referencing a
256//! source location.
257//!
258//! In [Rune] this became needed once we started [expanding
259//! macros][rune-macros]. Because macros expand to things which do not reference
260//! source locations so we need some other mechanism to include what the tokens
261//! represent in the syntax trees.
262//!
263//! You can try a *very* simple lex-time variable expander in the
264//! [`synthetic_strings` example][synthetic_strings]:
265//!
266//! ```sh
267//! cargo run --example synthetic_strings -- "Hello $world"
268//! ```
269//!
270//! Which would output:
271//!
272//! ```text
273//! Tree:
274//! Lit@0..5 "Hello"
275//! Whitespace@5..6 " "
276//! Synthetic(0)@6..12 "$world"
277//! Eval:
278//! 0 = "Hello"
279//! 1 = "Earth"
280//! ```
281//!
282//! So in essence `syntree` doesn't believe you need to store strings in the
283//! tree itself. Even if you want to deduplicate string storage. All of that can
284//! be done on the side and encoded into the syntax tree as you wish.
285//!
286//! <br>
287//!
288//! ### Errors instead of panics
289//!
290//! Another point where this crate differs is that we rely on propagating a
291//! [`Error`] during tree construction if the API is used incorrectly
292//! *instead* of panicking.
293//!
294//! While on the surface this might seem like a minor difference in opinion on
295//! whether programming mistakes should be errors or not. In my experience
296//! parsers tend to be part of a crate in a larger project. And errors triggered
297//! by edge cases in user-provided input that once encountered can usually be
298//! avoided.
299//!
300//! So let's say [Rune] is embedded in [OxidizeBot] and there's a piece of code
301//! in a user-provided script which triggers a bug in the rune compiler. Which
302//! in turn causes an illegal tree to be constructed. If tree construction
303//! simply panics, the whole bot will go down. But if we instead propagate an
304//! error this would have to be handled in [OxidizeBot] which could panic if it
305//! wanted to. In this instance it's simply more appropriate to log the error
306//! and unload the script (and hopefully receive a bug report!) than to crash
307//! the bot.
308//!
309//! Rust has great diagnostics for indicating that results should be handled,
310//! and while it is [more awkward to deal with results][syntree-math] than [to
311//! simply panic][rowan-math] I believe that the end result is more robust
312//! software.
313//!
314//! <br>
315//!
316//! ## Performance and memory use
317//!
318//! The only goal in terms of performance is to be as performant as `rowan`. And
319//! cursory testing shows `syntree` to be a bit faster on synthetic workloads
320//! which can probably be solely attributed to storing and manipulating the
321//! entire tree in a single contiguous memory region. This might or might not
322//! change in the future, depending on if the needs for `syntree` changes. While
323//! performance is important, it *is not* a primary focus.
324//!
325//! I also expect (but haven't verified) that `syntree` could end up having a
326//! similarly low memory profile as `rowan` which performs node deduplication.
327//! The one caveat is that it depends on how the original source is stored and
328//! queried. Something which `rowan` solves for you, but `syntree` leaves as an
329//! exercise to the reader.
330//!
331//! [`Builder::new_with`]: https://docs.rs/syntree/latest/syntree/struct.Builder.html#method.new_with
332//! [`Builder`]: https://docs.rs/syntree/latest/syntree/struct.Builder.html
333//! [`close`]: https://docs.rs/syntree/latest/syntree/struct.Builder.html#method.close
334//! [`Empty`]: https://docs.rs/syntree/latest/syntree/struct.Empty.html
335//! [`Error`]: https://docs.rs/syntree/latest/syntree/enum.Error.html
336//! [`open`]: https://docs.rs/syntree/latest/syntree/struct.Builder.html#method.open
337//! [`print_with_source`]: https://docs.rs/syntree/latest/syntree/print/fn.print_with_source.html
338//! [`rowan`]: https://docs.rs/rowan/latest/rowan/
339//! [`Span`]: https://docs.rs/syntree/latest/syntree/struct.Span.html
340//! [`syntree::tree!`]: https://docs.rs/syntree/latest/syntree/macro.tree.html
341//! [`token`]: https://docs.rs/syntree/latest/syntree/struct.Builder.html#method.token
342//! [`Tree`]: https://docs.rs/syntree/latest/syntree/struct.Tree.html
343//! [abstract syntax trees]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
344//! [any-syntax]: https://github.com/udoprog/syntree/blob/main/examples/iterator.rs
345//! [calculator]: https://github.com/udoprog/syntree/blob/main/examples/calculator
346//! [kind-str]: https://github.com/rune-rs/rune/blob/e97a32e/crates/rune/src/ast/generated.rs#L4359
347//! [OxidizeBot]: https://github.com/udoprog/OxidizeBot
348//! [pratt]: https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html
349//! [rowan-math]: https://github.com/rust-analyzer/rowan/blob/master/examples/math.rs
350//! [rune-macros]: https://github.com/rune-rs/rune/blob/main/crates/rune-modules/src/core.rs#L36
351//! [Rune]: https://github.com/rune-rs/rune
352//! [synthetic_strings]: https://github.com/udoprog/syntree/blob/main/examples/synthetic_strings.rs
353//! [syntree-math]: https://github.com/udoprog/syntree/blob/main/examples/math.rs
354
355#![deny(missing_docs)]
356#![cfg_attr(docsrs, feature(doc_cfg))]
357#![no_std]
358
359extern crate alloc;
360
361#[cfg(feature = "std")]
362extern crate std;
363
364#[macro_use]
365mod macros;
366mod builder;
367
368#[cfg(feature = "std")]
369pub mod edit;
370
371mod empty;
372mod error;
373#[macro_use]
374mod flavor;
375mod index;
376mod links;
377pub mod node;
378pub mod pointer;
379pub mod print;
380mod span;
381mod tree;
382
383#[doc(inline)]
384pub use self::builder::{Builder, Checkpoint};
385#[doc(inline)]
386pub use self::empty::{Empty, EmptyVec};
387#[doc(inline)]
388pub use self::error::Error;
389#[doc(inline)]
390pub use self::flavor::{Flavor, FlavorDefault, Storage};
391#[doc(inline)]
392pub use self::index::{Index, Length, TreeIndex};
393#[doc(inline)]
394pub use self::node::node_impl::Node;
395#[doc(inline)]
396pub use self::pointer::{Pointer, Width};
397#[doc(inline)]
398pub use self::span::Span;
399#[doc(inline)]
400pub use self::tree::Tree;
401
402#[doc(hidden)]
403pub mod macro_support {
404    use crate::index::TreeIndex;
405
406    #[cfg(feature = "alloc")]
407    pub type Vec<T> = alloc::vec::Vec<T>;
408
409    #[cfg(not(feature = "alloc"))]
410    pub type Vec<T> = crate::empty::EmptyVec<T>;
411
412    pub type DefaultIndexes<F> = crate::macro_support::Vec<TreeIndex<F>>;
413}