docs: more architecture guide cleanups

erikgrinaker · erikgrinaker · commit fd9b93d9e964 · 2025-05-11T20:45:17.000+02:00
diff --git a/docs/architecture/sql-data.md b/docs/architecture/sql-data.md
@@ -1,38 +1,45 @@
 # SQL Data Model
 
-The SQL data model is toyDB's representation of user data. It is made up of data types and schemas.
+The SQL data model represents user data in tables and rows. It is made up of data types and schemas,
+defined in the [`sql::types`](https://github.com/erikgrinaker/toydb/tree/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/types)
+module.
 
 ## Data Types
 
-toyDB supports four basic scalar data types as `sql::types::DataType`: booleans, floats, integers,
+toyDB supports four basic scalar data types as `sql::types::DataType`: booleans, integers, floats,
 and strings.
 
 https://github.com/erikgrinaker/toydb/blob/b2fe7b76ee634ca6ad31616becabfddb1c03d34b/src/sql/types/value.rs#L15-L27
 
-Concrete values are represented as `sql::types::Value`, using corresponding Rust types. toyDB also
-supports SQL `NULL` values, i.e. unknown values, following the rules of
+Specific values are represented as `sql::types::Value`, using the corresponding Rust types. toyDB
+also supports SQL `NULL` values, i.e. unknown values, following the rules of
 [three-valued logic](https://en.wikipedia.org/wiki/Three-valued_logic).
 
 https://github.com/erikgrinaker/toydb/blob/b2fe7b76ee634ca6ad31616becabfddb1c03d34b/src/sql/types/value.rs#L40-L64
 
-The `Value` type provides basic formatting, conversion, and mathematical operations. It also
-specifies comparison and ordering semantics, but these are subtly different from the SQL semantics.
-For example, in Rust code `Value::Null == Value::Null` yields `true`, while in SQL `NULL = NULL`
-yields `NULL`.  This mismatch is necessary for the Rust code to properly detect and process `Null`
-values, and the desired SQL semantics are implemented higher up in the SQL execution engine (we'll
-get back to this later).
+The `Value` type provides basic formatting, conversion, and mathematical operations.
+
+https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/types/value.rs#L68-L79
+
+https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/types/value.rs#L164-L370
+
+It also specifies comparison and ordering semantics, but these are subtly different from the SQL
+semantics. For example, in Rust code `Value::Null == Value::Null` yields `true`, while in SQL
+`NULL = NULL` yields `NULL`. This mismatch is necessary for the Rust code to properly detect and
+process `Null` values, and the desired SQL semantics are implemented during expression evaluation,
+which we'll cover below.
 
 https://github.com/erikgrinaker/toydb/blob/b2fe7b76ee634ca6ad31616becabfddb1c03d34b/src/sql/types/value.rs#L91-L162
 
-During execution, a row of values will be represented as `sql::types::Row`, with multiple rows
-emitted as `sql::types::Rows` row iterators:
+During execution, a row of values is represented as `sql::types::Row`, with multiple rows emitted
+via `sql::types::Rows` row iterators:
 
 https://github.com/erikgrinaker/toydb/blob/b2fe7b76ee634ca6ad31616becabfddb1c03d34b/src/sql/types/value.rs#L378-L388
 
 ## Schemas
 
-toyDB schemas support a single object: a table. There's only a single, unnamed database, and no
-named indexes, constraints, or other schema objects.
+toyDB schemas only support tables. There are no named indexes or constraints, and there's only a
+single unnamed database.
 
 Tables are represented by `sql::types::Table`:
 
@@ -47,42 +54,41 @@ The table name serves as a unique identifier, and can't be changed later. In fac
 are entirely static: they can only be created or dropped (there are no schema changes).
 
 Table schemas are stored in the catalog, represented by the `sql::engine::Catalog` trait. We'll
-revisit the implementation of this trait in the storage section below.
+revisit the implementation of this trait in the SQL storage section.
 
 https://github.com/erikgrinaker/toydb/blob/0839215770e31f1e693d5cccf20a68210deaaa3f/src/sql/engine/engine.rs#L60-L79
 
-Table schemas are validated (e.g. during creation) via the `Table::validate()` method, which
-enforces invariants and internal consistency. It uses the catalog to look up information about other
-tables, e.g. that foreign key references point to a valid target column.
+Table schemas are validated when created via `Table::validate()`, which enforces invariants and
+internal consistency. It uses the catalog to look up information about other tables, e.g. that
+foreign key references point to a valid target column in a different table.
 
 https://github.com/erikgrinaker/toydb/blob/c2b0f7f1d6cbf6e2cdc09fc0aec7b050e840ec21/src/sql/types/schema.rs#L98-L170
 
-It also has a `Table::validate_row()` method which is used to validate that a given
-`sql::types::Row` conforms to the schema (e.g. that the value data types match the column data
-types). It uses a `sql::engine::Transaction` to look up other rows in the database, e.g. to check
-for primary key conflicts (we'll get back to this below).
+Table rows are validated via `Table::validate_row()`, which ensures that a `sql::types::Row`
+conforms to the schema (e.g. that value types match the column data types). It uses a
+`sql::engine::Transaction` to look up other rows in the database, e.g. to check for primary key
+conflicts (we'll get back to this later).
 
 https://github.com/erikgrinaker/toydb/blob/c2b0f7f1d6cbf6e2cdc09fc0aec7b050e840ec21/src/sql/types/schema.rs#L172-L236
 
 ## Expressions
 
 During SQL execution, we also have to model _expressions_, such as `1 + 2 * 3`. These are
-represented as values and operations on them. They can be nested arbitrarily as a tree to represent
-compound operations.
+represented as values and operations on them, and can be nested as a tree to represent compound
+operations.
 
 https://github.com/erikgrinaker/toydb/blob/9419bcf6aededf0e20b4e7485e2a5fa3e975d79f/src/sql/types/expression.rs#L11-L64
 
 
-For example:
+For example, the expression `1 + 2 * 3` (taking [precedence](https://en.wikipedia.org/wiki/Order_of_operations)
+into account) is represented as:
 
 ```rust
-// 1 + 2 * 3 is represented as:
-//
-//             +
-//            / \
-//           1   *
-//              /  \
-///            2    3
+//    +
+//   / \
+//  1   *
+//     /  \
+//    2    3
 Expression::Add(
     Expression::Constant(Value::Integer(1)),
     Expression::Multiply(
@@ -97,8 +103,8 @@ An `Expression` can contain two kinds of values: constant values as
 references. The latter will fetch a `sql::types::Value` from a `sql::types::Row` at the specified
 index during evaluation.
 
-We'll see later how the SQL parser and planner transforms text expressions like `1 + 2 * 3` into
-this `Expression` form, and how it resolves column names to row indexes -- e.g. `price * 0.25` to
+We'll see later how the SQL parser and planner transform text expressions like `1 + 2 * 3` into an
+`Expression`, and how the planner resolves column names to row indexes, e.g. `price * 0.25` to
 `row[3] * 0.25`.
 
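-Expressions are evaluated recursively via `Expression::evalute()`, given a `sql::types::Row` with
+Expressions are evaluated recursively via `Expression::evaluate()`, given a `sql::types::Row` with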
diff --git a/docs/architecture/sql-parser.md b/docs/architecture/sql-parser.md
@@ -1,7 +1,7 @@
 # SQL Parsing
 
-And so we finally arrive at SQL. The SQL parser is the first stage in processing SQL
-queries and statements, located in the [`src/sql/parser`](https://github.com/erikgrinaker/toydb/tree/39c6b60afc4c235f19113dc98087176748fa091d/src/sql/parser)
+We finally arrive at SQL. The SQL parser is the first stage in processing SQL queries and
+statements, located in the [`sql::parser`](https://github.com/erikgrinaker/toydb/tree/39c6b60afc4c235f19113dc98087176748fa091d/src/sql/parser)
 module.
 
 The SQL parser's job is to take a raw SQL string and turn it into a structured form that's more
@@ -99,7 +99,7 @@ string are well-formed. For example, the following input string:
 Will result in these tokens:
 
 ```
-String("foo"), CloseParen, Number("3.14"), Keyword(Select), Plus, Ident("x")
+String("foo") CloseParen Number("3.14") Keyword(Select) Plus Ident("x")
 ```
 
 Tokens and keywords are represented by the `sql::parser::Token` and `sql::parser::Keyword` enums
@@ -137,12 +137,14 @@ kinds of SQL statements that we support, along with their contents:
 
 https://github.com/erikgrinaker/toydb/blob/39c6b60afc4c235f19113dc98087176748fa091d/src/sql/parser/ast.rs#L6-L145
 
-The nested tree structure is particularly apparent with _expressions_ -- these represent values and
-operations which will eventually _evaluate_ to a single value. For example, the expression
-`2 * 3 - 4 / 2`, which evaluates to the value `4`.
+The nested tree structure is particularly apparent with expressions, which represent values and
+operations on them. For example, consider the expression `2 * 3 - 4 / 2`, which evaluates to `4`.
 
-These expressions are represented as `sql::parser::ast::Expression`, and can be nested indefinitely
-into a tree structure.
+We've seen in the data model section how such expressions are represented as
+`sql::types::Expression`, but before we get there we have to parse them. The parser has its own
+representation, `sql::parser::ast::Expression`. This is necessary because, among other things, the
+AST represents columns by name rather than by numeric index: we don't yet know which columns exist
+or what their names are, and we'll resolve them during planning.
 
 https://github.com/erikgrinaker/toydb/blob/39c6b60afc4c235f19113dc98087176748fa091d/src/sql/parser/ast.rs#L147-L170
 
@@ -215,7 +217,7 @@ than that of the operators preceding them (hence "precedence climbing"). For exa
 2 * 3 - 4 / 2
 ```
 
-The algorithm is documented in more detail on `Parser::parse_expression`:
+The algorithm is documented in more detail on `Parser::parse_expression()`:
 
 https://github.com/erikgrinaker/toydb/blob/39c6b60afc4c235f19113dc98087176748fa091d/src/sql/parser/parser.rs#L501-L696
 
diff --git a/docs/architecture/sql-planner.md b/docs/architecture/sql-planner.md
@@ -1,8 +1,8 @@
 # SQL Planning
 
-The SQL planner in [`sql/planner`](https://github.com/erikgrinaker/toydb/tree/c64012e29c5712d6fe028d3d5375a98b8faea266/src/sql/planner)
-takes a SQL statement AST from the parser and generates an execution plan for it. We won't actually
-execute it just yet though, only figure out how to execute it.
+The SQL planner in the [`sql::planner`](https://github.com/erikgrinaker/toydb/tree/c64012e29c5712d6fe028d3d5375a98b8faea266/src/sql/planner)
+module takes a SQL statement AST from the parser and generates an execution plan for it. We won't
+actually execute it just yet though, only figure out how to execute it.
 
 ## Execution Plan
 
@@ -16,7 +16,7 @@ emits a stream of SQL rows as output, and may take streams of input rows from ch
 
 https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/planner/plan.rs#L106-L175
 
-Here is an example (taken from the `Plan` code comment above):
+Here is an example, taken from the `Plan` code comment above:
 
 ```sql
 SELECT title, released, genres.name AS genre
@@ -63,12 +63,13 @@ the `Order` node still needs access to the column data to sort by it).
 
 The planner uses a `sql::planner::Scope` to keep track of which column names are currently visible,
 and which column indexes they refer to. For each node the planner builds, starting from the leaves,
-it creates a new `Scope` that tracks how columns are modified and rearranged by the node.
+it creates a new `Scope` that contains the currently visible columns, tracking how they are modified
+and rearranged by each node.
 
 https://github.com/erikgrinaker/toydb/blob/6f6cec4db10bc015a37ee47ff6c7dae383147dd5/src/sql/planner/planner.rs#L577-L610
 
-When an expression refers to a column name, the planner can use `Scope::lookup_column` to find out
-which column number the expression should take its input value from.
+When an AST expression refers to a column name, the planner can use `Scope::lookup_column()` to find
+out which column number the expression should take its input value from.
 
 https://github.com/erikgrinaker/toydb/blob/6f6cec4db10bc015a37ee47ff6c7dae383147dd5/src/sql/planner/planner.rs#L660-L686
 
@@ -154,24 +155,24 @@ https://github.com/erikgrinaker/toydb/blob/6f6cec4db10bc015a37ee47ff6c7dae383147
 
 https://github.com/erikgrinaker/toydb/blob/6f6cec4db10bc015a37ee47ff6c7dae383147dd5/src/sql/planner/planner.rs#L283-L289
 
-`Planner::build_from` first encounters the `ast::From::Join` item, which joins `movies` and
+`Planner::build_from()` first encounters the `ast::From::Join` item, which joins `movies` and
 `genres`. This will build a `Node::NestedLoopJoin` plan node for the join, which is the simplest and
 most straightforward join algorithm -- it simply iterates over all rows in the `genres` table for
 every row in the `movies` table and emits the joined rows (we'll see how to optimize it with a
 better join algorithm later).
 
 https://github.com/erikgrinaker/toydb/blob/6f6cec4db10bc015a37ee47ff6c7dae383147dd5/src/sql/planner/planner.rs#L319-L344
 
-It first recurses into `Planner::build_from` to build each of the `ast::From::Table` nodes for each
-table.  This will look up the table schemas in the catalog, add them to the current scope, and build
-a `Node::Scan` node which will emit all rows from each table. The `Node::Scan` nodes are placed into
-the `Node::NestedLoopJoin` above.
+It first recurses into `Planner::build_from()` to build the `ast::From::Table` item for each table.
+This will look up the table schemas in the catalog, add them to the current scope, and build a
+`Node::Scan` node which will emit all rows from each table. The `Node::Scan` nodes are placed into
+the `Node::NestedLoopJoin` above.
 
 https://github.com/erikgrinaker/toydb/blob/6f6cec4db10bc015a37ee47ff6c7dae383147dd5/src/sql/planner/planner.rs#L312-L317
 
 While building the `Node::NestedLoopJoin`, it also needs to convert the join expression
 `movies.genre_id = genres.id` into a proper `sql::types::Expression`. This is done by
-`Planner::build_expression`:
+`Planner::build_expression()`:
 
 https://github.com/erikgrinaker/toydb/blob/6f6cec4db10bc015a37ee47ff6c7dae383147dd5/src/sql/planner/planner.rs#L493-L568
 
@@ -228,9 +229,6 @@ sorts them by the order expression.
 https://github.com/erikgrinaker/toydb/blob/6f6cec4db10bc015a37ee47ff6c7dae383147dd5/src/sql/planner/planner.rs#L245-L252
 
 And that's it. The `Node::Order` is placed into the root `Plan::Select`, and we have our final plan.
-We'll see how to execute it soon, but first we should optimize it to see if we can make it run
-faster -- in particular, to see if we can avoid reading all movies from storage, and if we can do
-better than the very slow nested loop join.
 
 ```
 Select
@@ -242,6 +240,10 @@ Select
             └─ Scan: genres
 ```
 
+We'll see how to execute it soon, but first we should optimize it to see if we can make it run
+faster -- in particular, to see if we can avoid reading all movies from storage, and if we can do
+better than the very slow nested loop join.
+
 ---
 
 <p align="center">
diff --git a/docs/architecture/sql-raft.md b/docs/architecture/sql-raft.md
@@ -1,8 +1,8 @@
 # SQL Raft Replication
 
-toyDB uses Raft to replicate SQL storage across a cluster of nodes (see the Raft section above for
-details). All nodes will store a copy of the SQL database, and the Raft leader will replicate writes
-across nodes and execute reads.
+toyDB uses Raft to replicate SQL storage across a cluster of nodes (see the Raft section for
+details). All nodes will store a full copy of the SQL database, and the Raft leader will replicate
+writes across nodes and execute reads.
 
 Recall the Raft state machine interface `raft::State`:
 
@@ -44,20 +44,39 @@ send requests to the local Raft node (we'll see how this plumbing works in the s
 
 https://github.com/erikgrinaker/toydb/blob/c2b0f7f1d6cbf6e2cdc09fc0aec7b050e840ec21/src/sql/engine/raft.rs#L80-L95
 
-The channel takes a `raft::Request` containing binary Raft client requests, and also a return
-channel where the Raft node can send back a `raft::Response`. The Raft engine has a few convenience
-methods to send requests and receive responses, for both read and write requests:
+The channel takes a `raft::Request` containing binary Raft client requests and a return channel
+where the Raft node can send back a `raft::Response`. The Raft engine has a few convenience methods
+to send requests and receive responses, for both read and write requests:
 
 https://github.com/erikgrinaker/toydb/blob/c2b0f7f1d6cbf6e2cdc09fc0aec7b050e840ec21/src/sql/engine/raft.rs#L114-L135
 
-And the implementation of the `Engine` and `Transaction` traits simply send requests via Raft:
+And the implementation of the `sql::engine::Engine` and `sql::engine::Transaction` traits simply
+send these requests via Raft:
 
 https://github.com/erikgrinaker/toydb/blob/c2b0f7f1d6cbf6e2cdc09fc0aec7b050e840ec21/src/sql/engine/raft.rs#L194-L276
 
 One thing to note here is that we don't support streaming data via Raft, so e.g. the
 `Transaction::scan` method will buffer the entire result in a `Vec`. With a full table scan, this
 will load the entire table into memory -- that's unfortunate, but we keep it simple.
 
+To summarize, this is what happens when `Transaction::insert()` is called to insert a row via Raft:
+
+1. `sql::engine::raft::Transaction::insert()`: called to insert a row.
+2. `sql::engine::raft::Write::Insert`: enum representation of the insert command.
+3. `raft::Request::Write`: Raft request containing the Bincode-encoded `Write::Insert` command.
+4. `sql::engine::raft::Engine::tx`: sends the `Request::Write` and response channel to Raft.
+5. `raft::Node::step()`: the `Request::Write` is given to Raft in a `Message::ClientRequest`.
+6. Raft does its replication thing, and commits the command's log entry.
+7. `raft::State::apply()`: the Bincode-encoded `Write::Insert` is passed to the state machine.
+8. `sql::engine::raft::State::apply()`: decodes the command to a `Write::Insert`.
+9. `sql::engine::raft::State::local`: contains the `Local` engine on each node.
+10. `sql::engine::local::Engine::resume()`: called to obtain the SQL/MVCC transaction.
+11. `sql::engine::local::Transaction::insert()`: the row is inserted into the local engine.
+12. `raft::RawNode::tx`: the `Ok(())` result is sent as a Bincode-encoded `Message::ClientResponse`.
+13. `sql::engine::raft::Transaction::insert()`: receives the result and returns it to the caller.
+
+The plumbing here will be covered in more detail in the server section.
+
 ---
 
 <p align="center">
diff --git a/docs/architecture/sql-storage.md b/docs/architecture/sql-storage.md
diff --git a/docs/architecture/sql.md b/docs/architecture/sql.md