similar/lib.rs
//! This crate implements diffing utilities. It attempts to provide an abstraction
//! over different types of diffing algorithms. The design of the
//! library is inspired by pijul's diff library by Pierre-Étienne Meunier and
//! also inherits the patience diff algorithm from there.
//!
//! The API of the crate is split into high and low level functionality. Most
//! of what you probably want to use is available at the top level. Additionally
//! the following submodules exist:
//!
//! * [`algorithms`]: This implements the different types of diffing algorithms.
//!   It provides both low level access to the algorithms with the minimal
//!   trait bounds necessary, as well as a generic interface.
//! * [`udiff`]: Unified diff functionality.
//! * [`utils`]: Utilities for common diff-related operations. This module
//!   provides additional diffing functions for working with text diffs.
//!
//! # Sequence Diffing
//!
//! If you want to diff sequences or, more generally, indexable things, you can
//! use the [`capture_diff`] and [`capture_diff_slices`] functions. They directly
//! diff an indexable object or slice and return a vector of [`DiffOp`] objects.
//!
//! ```rust
//! use similar::{Algorithm, capture_diff_slices};
//!
//! let a = vec![1, 2, 3, 4, 5];
//! let b = vec![1, 2, 3, 4, 7];
//! // `ops` describes the differences as ranges of indices into `a` and `b`.
//! let ops = capture_diff_slices(Algorithm::Myers, &a, &b);
//! ```
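//!
//! To make the shape of such a result concrete, here is a deliberately naive,
//! std-only sketch that derives an edit script for two slices from a
//! longest-common-subsequence table. It is not how `similar` implements its
//! algorithms (the table costs O(n * m) time and memory). The `Tag` enum
//! and the `lcs_diff` function are hypothetical names used only for
//! illustration:
//!
//! ```rust
//! /// Hypothetical per-item tag used only in this sketch.
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Tag {
//!     Equal,
//!     Delete,
//!     Insert,
//! }
//!
//! /// Naive LCS diff: returns one tag per item of the edit script.
//! fn lcs_diff<T: PartialEq>(old: &[T], new: &[T]) -> Vec<Tag> {
//!     let (n, m) = (old.len(), new.len());
//!     // table[i][j] = length of the LCS of old[i..] and new[j..]
//!     let mut table = vec![vec![0usize; m + 1]; n + 1];
//!     for i in (0..n).rev() {
//!         for j in (0..m).rev() {
//!             table[i][j] = if old[i] == new[j] {
//!                 table[i + 1][j + 1] + 1
//!             } else {
//!                 table[i + 1][j].max(table[i][j + 1])
//!             };
//!         }
//!     }
//!     // Walk the table from the front, preferring deletions on ties.
//!     let (mut i, mut j) = (0, 0);
//!     let mut tags = Vec::new();
//!     while i < n && j < m {
//!         if old[i] == new[j] {
//!             tags.push(Tag::Equal);
//!             i += 1;
//!             j += 1;
//!         } else if table[i + 1][j] >= table[i][j + 1] {
//!             tags.push(Tag::Delete);
//!             i += 1;
//!         } else {
//!             tags.push(Tag::Insert);
//!             j += 1;
//!         }
//!     }
//!     // Anything left over on one side is a pure deletion or insertion.
//!     tags.extend(std::iter::repeat(Tag::Delete).take(n - i));
//!     tags.extend(std::iter::repeat(Tag::Insert).take(m - j));
//!     tags
//! }
//! ```
//!
//! For `[1, 2, 3, 4, 5]` versus `[1, 2, 3, 4, 7]` this sketch produces four
//! `Equal` tags followed by one `Delete` and one `Insert`.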
//!
//! # Text Diffing
//!
//! Similar provides helpful utilities for text (and more specifically line) diff
//! operations. The main type you want to work with is [`TextDiff`], which
//! uses the underlying diff algorithms to expose a convenient API to work with
//! texts:
//!
//! ```rust
//! # #[cfg(feature = "text")] {
//! use similar::{ChangeTag, TextDiff};
//!
//! let diff = TextDiff::from_lines(
//!     "Hello World\nThis is the second line.\nThis is the third.",
//!     "Hallo Welt\nThis is the second line.\nThis is life.\nMoar and more",
//! );
//!
//! for change in diff.iter_all_changes() {
//!     let sign = match change.tag() {
//!         ChangeTag::Delete => "-",
//!         ChangeTag::Insert => "+",
//!         ChangeTag::Equal => " ",
//!     };
//!     print!("{}{}", sign, change);
//! }
//! # }
//! ```
//!
//! ## Trailing Newlines
//!
//! When working with line diffs (and unified diffs in general) there are two
//! "philosophies" for looking at lines. One is to diff lines without their
//! newline character, the other is to diff them with the newline character.
//! Typically the latter is done because text files do not _have_ to end in a
//! newline character. As a result there is a difference between `foo\n` and
//! `foo` as far as diffs are concerned.
//!
//! In `similar` this is handled on the [`Change`] or [`InlineChange`] level. If
//! a diff was created via [`TextDiff::from_lines`], the text diffing system is
//! instructed to check for missing newlines ([`TextDiff::newline_terminated`]
//! returns `true`).
//!
//! In any case the [`Change`] object has a convenience method called
//! [`Change::missing_newline`] which returns `true` if the change is missing
//! a trailing newline. Armed with that information, the caller can either
//! render a virtual newline at that position or indicate the missing newline
//! in some other way. For instance, the unified diff code renders the
//! special `\ No newline at end of file` marker.
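//!
//! The following std-only sketch imitates this handling without the crate.
//! The helpers `split_lines`, `lacks_newline` and `render` are hypothetical
//! and not part of `similar`'s API. Lines are split so that they keep their
//! `\n`. When a line without one is rendered, a virtual newline and the
//! marker are emitted after it.
//!
//! ```rust
//! /// Splits text into lines that keep their trailing `\n` (if any).
//! fn split_lines(text: &str) -> Vec<&str> {
//!     text.split_inclusive('\n').collect()
//! }
//!
//! /// Returns true if the line is not terminated by a newline.
//! fn lacks_newline(line: &str) -> bool {
//!     !line.ends_with('\n')
//! }
//!
//! /// Renders lines with a one-character prefix, unified-diff style.
//! fn render(prefix: char, lines: &[&str]) -> String {
//!     let mut out = String::new();
//!     for line in lines {
//!         out.push(prefix);
//!         out.push_str(line);
//!         if lacks_newline(line) {
//!             // Virtual newline, then the marker on its own line.
//!             out.push_str("\n\\ No newline at end of file\n");
//!         }
//!     }
//!     out
//! }
//! ```
//!
//! Because the lines keep their terminators, `split_lines("foo\n")` and
//! `split_lines("foo")` compare as different, which is the distinction
//! described above.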
//!
//! ## Bytes vs Unicode
//!
//! This crate concerns itself with a looser definition of "text" than you would
//! normally see in Rust. While by default it can only operate on [`str`] types,
//! enabling the `bytes` feature adds support for byte slices, with some
//! caveats.
//!
//! A lot of text diff functionality assumes that what is being diffed constitutes
//! text, but in the real world it can often be challenging to ensure that this is
//! all valid UTF-8. Because of this the crate is built so that most functionality
//! also still works with bytes as long as they are roughly ASCII compatible.
//!
//! This means you will be successful in creating a unified diff from Latin-1
//! encoded bytes, but if you try to do the same with EBCDIC encoded bytes you
//! will only get garbage.
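//!
//! Why ASCII compatibility is the deciding factor can be seen in a small
//! std-only sketch. The hypothetical `byte_lines` helper below (not part of
//! `similar`) splits raw bytes on the ASCII newline byte `0x0A`. Latin-1
//! encodes the newline as that same byte, so line boundaries are found even
//! though the input is not valid UTF-8. EBCDIC uses a different byte for its
//! line terminator, so the same split would not find the lines there.
//!
//! ```rust
//! /// Splits bytes into lines, keeping each trailing `b'\n'`. This only
//! /// works for encodings in which the newline is the single byte 0x0A.
//! fn byte_lines(bytes: &[u8]) -> Vec<&[u8]> {
//!     let mut lines = Vec::new();
//!     let mut start = 0;
//!     for (i, &b) in bytes.iter().enumerate() {
//!         if b == b'\n' {
//!             lines.push(&bytes[start..=i]);
//!             start = i + 1;
//!         }
//!     }
//!     // A final line without a trailing newline.
//!     if start < bytes.len() {
//!         lines.push(&bytes[start..]);
//!     }
//!     lines
//! }
//! ```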
//!
//! # Ops vs Changes
//!
//! Because two compared sequences very commonly match for the most part, this
//! crate splits its functionality into two layers.
//!
//! Changes are first encoded as [diff operations](crate::DiffOp). These
//! describe the differences as ranges of indices into the old and new
//! sequences rather than as individual items. Because ranges can be
//! cumbersome to work with, a separate method [`DiffOp::iter_changes`]
//! (and [`TextDiff::iter_changes`] when working with text diffs) is provided
//! which expands an operation into its individual changes, item by item.
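//!
//! The relationship between the two layers can be sketched in plain Rust. The
//! `Op` enum and the `expand` function below are hypothetical simplifications
//! loosely modeled on [`DiffOp`], not the crate's own types. For instance,
//! there is no `Replace` variant, and `Equal` only records the old-side index.
//! Each op stores index ranges only, and `expand` recovers the per-item
//! changes by looking the values up in the original sequences.
//!
//! ```rust
//! /// Hypothetical, simplified operation: ranges only, no values.
//! #[derive(Debug, Clone, Copy)]
//! enum Op {
//!     Equal { old_index: usize, len: usize },
//!     Delete { old_index: usize, old_len: usize },
//!     Insert { new_index: usize, new_len: usize },
//! }
//!
//! /// Expands one op into `(sign, value)` pairs by indexing into the inputs.
//! fn expand<T: Clone>(op: Op, old: &[T], new: &[T]) -> Vec<(char, T)> {
//!     let (sign, items) = match op {
//!         Op::Equal { old_index, len } => (' ', &old[old_index..old_index + len]),
//!         Op::Delete { old_index, old_len } => ('-', &old[old_index..old_index + old_len]),
//!         Op::Insert { new_index, new_len } => ('+', &new[new_index..new_index + new_len]),
//!     };
//!     items.iter().map(|v| (sign, v.clone())).collect()
//! }
//! ```
//!
//! Expanding the ops equal, delete, insert, equal for `["a", "b", "c"]` versus
//! `["a", "x", "c"]` yields the pairs `(' ', "a")`, `('-', "b")`, `('+', "x")`
//! and `(' ', "c")`.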
//!
//! Since the [`TextDiff::grouped_ops`] method can isolate clusters of changes,
//! pairing it with [`TextDiff::iter_changes`] works even for very long files.
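//!
//! The idea behind grouping can be sketched independently of the crate. Find
//! the positions that changed, keep a window of `radius` items of context
//! around each of them, and merge windows that overlap or touch. The
//! `group_changes` function below is a hypothetical illustration that works on
//! one flag per item (`true` meaning changed). It is not the implementation of
//! [`TextDiff::grouped_ops`].
//!
//! ```rust
//! use std::ops::Range;
//!
//! /// Returns the index ranges ("hunks") that cover every changed item plus
//! /// `radius` items of context on each side, merging adjacent hunks.
//! fn group_changes(changed: &[bool], radius: usize) -> Vec<Range<usize>> {
//!     let mut hunks: Vec<Range<usize>> = Vec::new();
//!     for (i, &is_changed) in changed.iter().enumerate() {
//!         if !is_changed {
//!             continue;
//!         }
//!         let start = i.saturating_sub(radius);
//!         let end = (i + radius + 1).min(changed.len());
//!         // Extend the previous hunk if this window overlaps or touches it.
//!         if let Some(last) = hunks.last_mut() {
//!             if start <= last.end {
//!                 last.end = last.end.max(end);
//!                 continue;
//!             }
//!         }
//!         hunks.push(start..end);
//!     }
//!     hunks
//! }
//! ```
//!
//! Take ten items that changed at positions 1 and 8. A radius of 1 yields the
//! two hunks `0..3` and `7..10`, while a radius of 3 merges them into `0..10`.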
//!
//! # Deadlines and Performance
//!
//! For large and very distinct inputs the algorithms as implemented can take
//! a very, very long time to execute, often too long to be practical.
//! To work around this issue all diffing algorithms also provide a version
//! that accepts a deadline, which is the point in time as defined by an
//! [`Instant`] after which the algorithm should give up. What giving up means
//! depends on the algorithm. For instance, due to the recursive, divide and
//! conquer nature of Myers' diff you will still get a pretty decent diff in
//! many cases when a deadline is reached. The LCS diff, on the other hand,
//! is unlikely to give any decent results in such a situation.
//!
//! The [`TextDiff`] type also lets you configure a deadline and/or timeout
//! when performing a text diff.
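//!
//! The pattern itself can be illustrated with the standard library alone. In
//! this hypothetical sketch (neither `common_prefix_with_deadline` nor
//! `deadline_from_timeout` is part of `similar`), an optional
//! `std::time::Instant` is checked inside the loop. Once it has passed, the
//! function gives up and returns what it has computed so far, together with a
//! flag saying that the result is incomplete. A timeout is converted into a
//! deadline by adding it to the current instant.
//!
//! ```rust
//! use std::time::{Duration, Instant};
//!
//! /// Length of the common prefix of two slices. The `bool` is `false` if
//! /// the deadline passed before the scan finished.
//! fn common_prefix_with_deadline<T: PartialEq>(
//!     old: &[T],
//!     new: &[T],
//!     deadline: Option<Instant>,
//! ) -> (usize, bool) {
//!     let mut len = 0;
//!     for (a, b) in old.iter().zip(new) {
//!         // Reading the clock has a cost; real code may check less often.
//!         if let Some(deadline) = deadline {
//!             if Instant::now() >= deadline {
//!                 return (len, false);
//!             }
//!         }
//!         if a != b {
//!             break;
//!         }
//!         len += 1;
//!     }
//!     (len, true)
//! }
//!
//! /// Turns a relative timeout into an absolute deadline (`None` on overflow).
//! fn deadline_from_timeout(timeout: Duration) -> Option<Instant> {
//!     Instant::now().checked_add(timeout)
//! }
//! ```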
//!
//! Note that on wasm targets calling [`Instant::now`] will result in a panic
//! unless you enable the `wasm32_web_time` feature. By default `similar`
//! silently disables the deadline checks internally unless that feature is
//! enabled.
//!
//! # Feature Flags
//!
//! By default the crate does not have any dependencies. However, for some use
//! cases it's useful to pull in extra functionality. Likewise you can turn
//! off some functionality.
//!
//! * `text`: this feature is enabled by default and enables the text based
//!   diffing types such as [`TextDiff`].
//!   If the crate is used without default features it's removed.
//! * `unicode`: when this feature is enabled the text diffing functionality
//!   gains the ability to diff on a grapheme instead of character level. This
//!   is particularly useful when working with text containing emojis. This
//!   pulls in some relatively complex dependencies for working with the Unicode
//!   database.
//! * `bytes`: this feature adds support for working with byte slices in text
//!   APIs in addition to Unicode strings. This pulls in the
//!   [`bstr`] dependency.
//! * `inline`: this feature gives access to additional functionality of the
//!   text diffing to provide inline information about which values changed
//!   in a line diff. This currently also enables the `unicode` feature.
//! * `serde`: this feature enables serialization for some types in this
//!   crate. For enums without payload, deserialization is then also supported.
//! * `wasm32_web_time`: this feature swaps out the use of [`std::time`] for
//!   the `web_time` crate. Because this is a change to the public interface,
//!   this feature must be used with care. The instant type for this crate is
//!   then re-exported from the top-level module.
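//!
//! To see why grapheme-level diffing matters, consider an emoji with a skin
//! tone modifier. It renders as a single symbol but consists of two `char`s.
//! This std-only sketch uses a hypothetical `common_char_prefix` helper (not
//! part of `similar`) to count the leading `char`s two strings share. For two
//! such emojis that differ only in their skin tone the answer is 1. A plain
//! character-level diff would therefore keep half of the visible symbol and
//! report only the modifier as changed.
//!
//! ```rust
//! /// Returns how many leading `char`s two strings have in common.
//! fn common_char_prefix(a: &str, b: &str) -> usize {
//!     a.chars().zip(b.chars()).take_while(|(x, y)| x == y).count()
//! }
//! ```
//!
//! For example, `"\u{1F44D}\u{1F3FD}"` and `"\u{1F44D}\u{1F3FF}"` (the same
//! thumbs-up emoji with two different skin tone modifiers) each contain two
//! `char`s and share a character prefix of length 1.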
#![warn(missing_docs)]
pub mod algorithms;
pub mod iter;
#[cfg(feature = "text")]
pub mod udiff;
#[cfg(feature = "text")]
pub mod utils;

mod common;
mod deadline_support;
#[cfg(feature = "text")]
mod text;
mod types;

pub use self::common::*;
#[cfg(feature = "text")]
pub use self::text::*;
pub use self::types::*;

// re-export the type for web-time feature
#[cfg(feature = "wasm32_web_time")]
pub use deadline_support::Instant;