erikgrinaker
diff --git a/‎docs/architecture/sql-data.md‎
Lines changed: 40 additions & 34 deletions b/‎docs/architecture/sql-data.md‎
diff --git a/‎docs/architecture/sql-execution.md‎
Lines changed: 14 additions & 13 deletions b/‎docs/architecture/sql-execution.md‎
diff --git a/‎docs/architecture/sql-optimizer.md‎
Lines changed: 22 additions & 25 deletions b/‎docs/architecture/sql-optimizer.md‎
diff --git a/‎docs/architecture/sql-parser.md‎
Lines changed: 11 additions & 9 deletions b/‎docs/architecture/sql-parser.md‎
@@ -1,38 +1,45 @@
 # SQL Data Model
 
-The SQL data model is toyDB's representation of user data. It is made up of data types and schemas.
+The SQL data model represents user data in tables and rows. It is made up of data types and schemas,
+in the [`sql::types`](https://github.com/erikgrinaker/toydb/tree/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/types)
+module.
 
 ## Data Types
 
-toyDB supports four basic scalar data types as `sql::types::DataType`: booleans, floats, integers,
+toyDB supports four basic scalar data types as `sql::types::DataType`: booleans, integers, floats,
 and strings.
 
 https://github.com/erikgrinaker/toydb/blob/b2fe7b76ee634ca6ad31616becabfddb1c03d34b/src/sql/types/value.rs#L15-L27
 
-Concrete values are represented as `sql::types::Value`, using corresponding Rust types. toyDB also
-supports SQL `NULL` values, i.e. unknown values, following the rules of
+Specific values are represented as `sql::types::Value`, using the corresponding Rust types. toyDB
+also supports SQL `NULL` values, i.e. unknown values, following the rules of
 [three-valued logic](https://en.wikipedia.org/wiki/Three-valued_logic).
 
 https://github.com/erikgrinaker/toydb/blob/b2fe7b76ee634ca6ad31616becabfddb1c03d34b/src/sql/types/value.rs#L40-L64
 
-The `Value` type provides basic formatting, conversion, and mathematical operations. It also
-specifies comparison and ordering semantics, but these are subtly different from the SQL semantics.
-For example, in Rust code `Value::Null == Value::Null` yields `true`, while in SQL `NULL = NULL`
-yields `NULL`.  This mismatch is necessary for the Rust code to properly detect and process `Null`
-values, and the desired SQL semantics are implemented higher up in the SQL execution engine (we'll
-get back to this later).
+The `Value` type provides basic formatting, conversion, and mathematical operations.
+
+https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/types/value.rs#L68-L79
+
+https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/types/value.rs#L164-L370
+
+It also specifies comparison and ordering semantics, but these are subtly different from the SQL
+semantics. For example, in Rust code `Value::Null == Value::Null` yields `true`, while in SQL
+`NULL = NULL` yields `NULL`. This mismatch is necessary for the Rust code to properly detect and
+process `Null` values, and the desired SQL semantics are implemented during expression evaluation,
+which we'll cover below.
 
 https://github.com/erikgrinaker/toydb/blob/b2fe7b76ee634ca6ad31616becabfddb1c03d34b/src/sql/types/value.rs#L91-L162
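To illustrate the mismatch, here is a minimal sketch. The `Value` enum below is a simplified stand-in for `sql::types::Value`, and `sql_equal` is a hypothetical helper invented for this example, not a toyDB function. It assumes that Rust-level equality is plain structural equality, while SQL-level equality propagates `NULL`:

```rust
/// A simplified stand-in for `sql::types::Value` (hypothetical, for
/// illustration only; floats and strings are omitted).
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Null,
    Boolean(bool),
    Integer(i64),
}

/// SQL-style equality using three-valued logic: if either side is NULL the
/// result is NULL (unknown), otherwise it is a boolean.
/// `sql_equal` is a hypothetical name, not part of toyDB.
pub fn sql_equal(lhs: &Value, rhs: &Value) -> Value {
    match (lhs, rhs) {
        (Value::Null, _) | (_, Value::Null) => Value::Null,
        (l, r) => Value::Boolean(l == r),
    }
}
```

With these definitions, `Value::Null == Value::Null` is `true` in Rust, while `sql_equal(&Value::Null, &Value::Null)` returns `Value::Null`.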
 
-During execution, a row of values will be represented as `sql::types::Row`, with multiple rows
-emitted as `sql::types::Rows` row iterators:
+During execution, a row of values is represented as `sql::types::Row`, with multiple rows emitted
+via `sql::types::Rows` row iterators:
 
 https://github.com/erikgrinaker/toydb/blob/b2fe7b76ee634ca6ad31616becabfddb1c03d34b/src/sql/types/value.rs#L378-L388
 
 ## Schemas
 
-toyDB schemas support a single object: a table. There's only a single, unnamed database, and no
-named indexes, constraints, or other schema objects.
+toyDB schemas only support tables. There are no named indexes or constraints, and there's only a
+single unnamed database.
 
 Tables are represented by `sql::types::Table`:
 
@@ -47,42 +54,41 @@ The table name serves as a unique identifier, and can't be changed later. In fac
 are entirely static: they can only be created or dropped (there are no schema changes).
 
 Table schemas are stored in the catalog, represented by the `sql::engine::Catalog` trait. We'll
-revisit the implementation of this trait in the storage section below.
+revisit the implementation of this trait in the SQL storage section.
 
 https://github.com/erikgrinaker/toydb/blob/0839215770e31f1e693d5cccf20a68210deaaa3f/src/sql/engine/engine.rs#L60-L79
 
-Table schemas are validated (e.g. during creation) via the `Table::validate()` method, which
-enforces invariants and internal consistency. It uses the catalog to look up information about other
-tables, e.g. that foreign key references point to a valid target column.
+Table schemas are validated when created via `Table::validate()`, which enforces invariants and
+internal consistency. It uses the catalog to look up information about other tables, e.g. that
+foreign key references point to a valid target column in a different table.
 
 https://github.com/erikgrinaker/toydb/blob/c2b0f7f1d6cbf6e2cdc09fc0aec7b050e840ec21/src/sql/types/schema.rs#L98-L170
 
-It also has a `Table::validate_row()` method which is used to validate that a given
-`sql::types::Row` conforms to the schema (e.g. that the value data types match the column data
-types). It uses a `sql::engine::Transaction` to look up other rows in the database, e.g. to check
-for primary key conflicts (we'll get back to this below).
+Table rows are validated via `Table::validate_row()`, which ensures that a `sql::types::Row`
+conforms to the schema (e.g. that value types match the column data types). It uses a
+`sql::engine::Transaction` to look up other rows in the database, e.g. to check for primary key
+conflicts (we'll get back to this later).
 
 https://github.com/erikgrinaker/toydb/blob/c2b0f7f1d6cbf6e2cdc09fc0aec7b050e840ec21/src/sql/types/schema.rs#L172-L236
 
 ## Expressions
 
 During SQL execution, we also have to model _expressions_, such as `1 + 2 * 3`. These are
-represented as values and operations on them. They can be nested arbitrarily as a tree to represent
-compound operations.
+represented as values and operations on them, and can be nested as a tree to represent compound
+operations.
 
 https://github.com/erikgrinaker/toydb/blob/9419bcf6aededf0e20b4e7485e2a5fa3e975d79f/src/sql/types/expression.rs#L11-L64
 
 
-For example:
+For example, the expression `1 + 2 * 3` (taking [precedence](https://en.wikipedia.org/wiki/Order_of_operations)
+into account) is represented as:
 
 ```rust
-// 1 + 2 * 3 is represented as:
-//
-//             +
-//            / \
-//           1   *
-//              /  \
-///            2    3
+//    +
+//   / \
+//  1   *
+//     / \
+//    2   3
 Expression::Add(
     Expression::Constant(Value::Integer(1)),
     Expression::Multiply(
@@ -97,8 +103,8 @@ An `Expression` can contain two kinds of values: constant values as
 references. The latter will fetch a `sql::types::Value` from a `sql::types::Row` at the specified
 index during evaluation.
 
-We'll see later how the SQL parser and planner transforms text expressions like `1 + 2 * 3` into
-this `Expression` form, and how it resolves column names to row indexes -- e.g. `price * 0.25` to
+We'll see later how the SQL parser and planner transform text expressions like `1 + 2 * 3` into an
+`Expression`, and how they resolve column names to row indexes, e.g. `price * 0.25` to
 `row[3] * 0.25`.
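As a rough sketch of how constants and column references might be evaluated against a row, consider the simplified types below. They are assumptions made for this example: the variant set is reduced, the error type is a plain `String`, the `numeric` helper is hypothetical, and the explicit `Box` wrappers are this sketch's choice rather than a statement about toyDB's actual definitions.

```rust
/// Simplified stand-ins for `sql::types::Value` and `sql::types::Row`.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Integer(i64),
    Float(f64),
}

pub type Row = Vec<Value>;

/// A reduced expression tree with constants and column references.
pub enum Expression {
    Constant(Value),
    /// A column reference, as an index into the row.
    Column(usize),
    Add(Box<Expression>, Box<Expression>),
    Multiply(Box<Expression>, Box<Expression>),
}

/// Applies a numeric operation, promoting integers to floats when mixed.
/// This helper is hypothetical and only exists for this sketch.
fn numeric(
    lhs: Value,
    rhs: Value,
    int_op: fn(i64, i64) -> Option<i64>,
    float_op: fn(f64, f64) -> f64,
) -> Result<Value, String> {
    match (lhs, rhs) {
        (Value::Integer(a), Value::Integer(b)) => {
            int_op(a, b).map(Value::Integer).ok_or_else(|| "integer overflow".to_string())
        }
        (Value::Integer(a), Value::Float(b)) => Ok(Value::Float(float_op(a as f64, b))),
        (Value::Float(a), Value::Integer(b)) => Ok(Value::Float(float_op(a, b as f64))),
        (Value::Float(a), Value::Float(b)) => Ok(Value::Float(float_op(a, b))),
    }
}

impl Expression {
    /// Recursively evaluates the expression, fetching column values from `row`.
    pub fn evaluate(&self, row: &Row) -> Result<Value, String> {
        match self {
            Self::Constant(value) => Ok(value.clone()),
            Self::Column(index) => {
                row.get(*index).cloned().ok_or_else(|| format!("no column at index {index}"))
            }
            Self::Add(lhs, rhs) => {
                numeric(lhs.evaluate(row)?, rhs.evaluate(row)?, i64::checked_add, |a, b| a + b)
            }
            Self::Multiply(lhs, rhs) => {
                numeric(lhs.evaluate(row)?, rhs.evaluate(row)?, i64::checked_mul, |a, b| a * b)
            }
        }
    }
}
```

Here `price * 0.25` would be written as `Multiply(Column(3), Constant(Float(0.25)))`, assuming `price` is the fourth column of the row.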
 
 Expressions are evaluated recursively via `Expression::evaluate()`, given a `sql::types::Row` with
 
@@ -1,44 +1,45 @@
 # SQL Execution
 
-Ok, now that the planner and optimizer has done all the hard work of figuring out how to execute a
+Now that the planner and optimizer have done all the hard work of figuring out how to execute a
 query, it's time to actually execute it.
 
 ## Plan Executor
 
 Plan execution is done by `sql::execution::Executor` in the
 [`sql::execution`](https://github.com/erikgrinaker/toydb/tree/9419bcf6aededf0e20b4e7485e2a5fa3e975d79f/src/sql/execution)
-module, using a `sql::engine::Transaction` to perform read/write operations on the SQL engine.
+module, using a `sql::engine::Transaction` to access the SQL storage engine.
 
 https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L14-L49
 
 The executor takes a `sql::planner::Plan` as input, and will return an `ExecutionResult` depending
 on the statement type.
 
-https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L330-L338
+https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/execution/executor.rs#L331-L339
 
 When executing the plan, the executor will branch off depending on the statement type:
 
-https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L56-L100
+https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/execution/executor.rs#L57-L101
 
-We'll focus on `SELECT` queries here, which is the most interesting.
+We'll focus on `SELECT` queries here, which are the most interesting.
 
-toyDB uses the iterator model (also known as the volcano model) for query execution. In the case
-of a `SELECT` query, the result is a result row iterator, and pulling from this iterator by calling
-`next()` will drive the entire execution pipeline. This maps very naturally onto Rust's iterators,
-and we leverage these to construct the execution pipeline as nested iterators.
+toyDB uses the iterator model (also known as the volcano model) for query execution. In the case of
+a `SELECT` query, the result is a row iterator, and pulling from this iterator by calling `next()`
+will drive the entire execution pipeline by recursively calling `next()` on the child node results.
+This maps very naturally onto Rust's iterators, and we leverage these to construct the execution
+pipeline as nested iterators.
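The idea can be sketched with a toy plan. Everything below is a simplified assumption for illustration: rows are plain integer vectors, errors are ignored, and the `Node` variants are invented for this example rather than taken from `sql::planner::Node`.

```rust
/// Simplified rows and row iterators (no error handling).
pub type Row = Vec<i64>;
pub type Rows = Box<dyn Iterator<Item = Row>>;

/// A toy plan with three hypothetical node types.
pub enum Node {
    /// Emits rows from an in-memory table.
    Scan { rows: Vec<Row> },
    /// Keeps rows where the given column is at least `min`.
    Filter { source: Box<Node>, column: usize, min: i64 },
    /// Keeps only the given columns, in order.
    Projection { source: Box<Node>, columns: Vec<usize> },
}

/// Recursively builds a nested iterator for the node. Nothing is processed
/// until the caller pulls from the returned iterator with `next()`.
pub fn execute_node(node: Node) -> Rows {
    match node {
        Node::Scan { rows } => Box::new(rows.into_iter()),
        Node::Filter { source, column, min } => {
            let source = execute_node(*source);
            Box::new(source.filter(move |row| row[column] >= min))
        }
        Node::Projection { source, columns } => {
            let source = execute_node(*source);
            Box::new(source.map(move |row| columns.iter().map(|&i| row[i]).collect::<Row>()))
        }
    }
}
```

Each call to `next()` on the outer `Projection` iterator pulls a row from the `Filter` iterator, which in turn pulls from the `Scan` iterator until it finds a matching row.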
 
 Execution itself is fairly straightforward, since we're just doing exactly what the planner tells us
 to do in the plan. We call `Executor::execute_node()` recursively on each `sql::planner::Node`,
 starting with the root node. Each node returns a result row iterator; the parent node pulls its
 input rows from it, processes them, and outputs the resulting rows via its own row iterator (with
 the root node's iterator being returned to the caller):
 
-https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L102-L103
+https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/execution/executor.rs#L103-L104
 
-`Executor::execute_node` will simply look at the type of `Node`, recursively call
-`Executor::execute_node` on any child nodes, and then process the rows accordingly.
+`Executor::execute_node()` will simply look at the type of `Node`, recursively call
+`Executor::execute_node()` on any child nodes, and then process the rows accordingly.
 
-https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L102-L211
+https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/execution/executor.rs#L103-L212
 
 We won't discuss every plan node in detail, but let's consider the movie plan we've looked at
 previously:
 
@@ -93,15 +93,14 @@ Additionally, `ConstantFolding` also short-circuits logical expressions. For exa
 
 https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/planner/optimizer.rs#L58-L84
 
-As the code comment mentions though, this doesn't fold as far as possible. It doesn't attempt to
-rearrange expressions, which would require knowledge of precedence rules. For example,
-`(1 + foo) - 2` could be folded into `foo - 1` by first rearranging it as `foo + (1 - 2)`, but we
-don't do this currently.
+As the code comment mentions though, this doesn't fold optimally: it doesn't attempt to rearrange
+expressions, which would require knowledge of precedence rules. For example, `(1 + foo) - 2` could
+be folded into `foo - 1` by first rearranging it as `foo + (1 - 2)`, but we don't do this currently.
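A minimal sketch of this kind of bottom-up folding is shown below. It uses a reduced, integer-only `Expression` type invented for this example and does not reproduce the actual `ConstantFolding` code. A node is folded only when both of its children fold to constants, which is why `(1 + foo) - 2` is left unchanged.

```rust
/// A reduced, integer-only expression type (hypothetical, for illustration).
#[derive(Clone, Debug, PartialEq)]
pub enum Expression {
    Constant(i64),
    Column(usize),
    Add(Box<Expression>, Box<Expression>),
    Subtract(Box<Expression>, Box<Expression>),
}

impl Expression {
    /// Folds constant subexpressions bottom-up. An operation is replaced by
    /// a constant only if both operands are constants after folding; no
    /// rearranging is attempted. Overflow handling is omitted for brevity.
    pub fn fold(self) -> Expression {
        match self {
            Self::Add(lhs, rhs) => match ((*lhs).fold(), (*rhs).fold()) {
                (Self::Constant(a), Self::Constant(b)) => Self::Constant(a + b),
                (lhs, rhs) => Self::Add(Box::new(lhs), Box::new(rhs)),
            },
            Self::Subtract(lhs, rhs) => match ((*lhs).fold(), (*rhs).fold()) {
                (Self::Constant(a), Self::Constant(b)) => Self::Constant(a - b),
                (lhs, rhs) => Self::Subtract(Box::new(lhs), Box::new(rhs)),
            },
            expr => expr,
        }
    }
}
```

With `foo` as `Column(0)`, folding `(1 + 2) - foo` yields `3 - foo`, but `(1 + foo) - 2` comes back as-is because `1 + foo` is not a constant.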
 
 ## Filter Pushdown
 
 The `FilterPushdown` optimizer attempts to push filter predicates as far down into the plan as
-possible, to reduce the amount of work we do.
+possible, to reduce the number of rows each node has to process.
 
 https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/planner/optimizer.rs#L90-L95
 
@@ -117,10 +116,10 @@ Select
             └─ Scan: genres
 ```
 
-Even though we're filtering on `release >= 2000`, the `Scan` node still has to read all of them
-from disk and send them via Raft, and the `NestedLoopJoin` node still has to join all of them.
-It would be nice if we could push this filtering into into the `NestedLoopJoin` and `Scan` nodes
-and avoid this work, which is exactly what `FilterPushdown` does.
+Even though we're filtering on `release >= 2000`, the `Scan` node still has to read all movies from
+disk and send them via Raft, and the `NestedLoopJoin` node still has to join all of them. It would
+be nice if we could push this filtering into the `NestedLoopJoin` and `Scan` nodes and avoid this
+extra work, and this is exactly what `FilterPushdown` does.
 
 The only plan nodes that have predicates that can be pushed down are `Filter` nodes and
 `NestedLoopJoin` nodes, so we recurse through the plan tree and look for these nodes, attempting
@@ -159,10 +158,9 @@ discussed previously. This allows us to examine and push down each AND part in i
 has the same effect regardless of whether it is evaluated in the `NestedLoopJoin` node or one of
 the source nodes. Our expression is already in conjunctive normal form, though.
 
-We then look at each AND part, and check which side of the join they have column references for.
-If they only reference one of the sides, then the expression can be pushed down into it. We also
-make some effort here to move primary/foreign key constants across to both sides, but we'll gloss
-over that.
+We then look at each AND part, and check which side of the join it has column references for. If it
+only references one of the sides, then the expression can be pushed down into it. We also make some
+effort here to move primary/foreign key constants across to both sides, but we'll gloss over that.
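The following sketch shows the two steps described above: splitting a predicate into its AND parts, and deciding per part whether it can move into one side of the join. The types and the names `into_and_parts`, `columns`, `Target`, and `pushdown_target` are assumptions made for this example. It also assumes the join row is the left row followed by the right row, so column indexes below `left_size` belong to the left source. Re-indexing right-hand columns after pushdown and the primary/foreign key handling are left out.

```rust
/// A reduced predicate expression type (hypothetical, for illustration).
#[derive(Clone, Debug, PartialEq)]
pub enum Expression {
    Constant(i64),
    Column(usize),
    And(Box<Expression>, Box<Expression>),
    Equal(Box<Expression>, Box<Expression>),
    GreaterThan(Box<Expression>, Box<Expression>),
}

impl Expression {
    /// Splits a predicate into its AND parts: `a AND (b AND c)` becomes `[a, b, c]`.
    pub fn into_and_parts(self) -> Vec<Expression> {
        match self {
            Self::And(lhs, rhs) => {
                let mut parts = lhs.into_and_parts();
                parts.extend(rhs.into_and_parts());
                parts
            }
            expr => vec![expr],
        }
    }

    /// Collects the row indexes of all column references in the expression.
    pub fn columns(&self, out: &mut Vec<usize>) {
        match self {
            Self::Constant(_) => {}
            Self::Column(index) => out.push(*index),
            Self::And(lhs, rhs) | Self::Equal(lhs, rhs) | Self::GreaterThan(lhs, rhs) => {
                lhs.columns(out);
                rhs.columns(out);
            }
        }
    }
}

/// Where an AND part can be evaluated.
#[derive(Debug, PartialEq)]
pub enum Target {
    Left,
    Right,
    /// References both sides (or no columns at all), so it stays in the join.
    Join,
}

/// Decides where a single AND part can go, given that the left source
/// contributes the first `left_size` columns of the joined row.
pub fn pushdown_target(expr: &Expression, left_size: usize) -> Target {
    let mut columns = Vec::new();
    expr.columns(&mut columns);
    let left = columns.iter().any(|&i| i < left_size);
    let right = columns.iter().any(|&i| i >= left_size);
    match (left, right) {
        (true, false) => Target::Left,
        (false, true) => Target::Right,
        _ => Target::Join,
    }
}
```

For a predicate such as `movies.genre_id = genres.id AND movies.release > 1999`, the first part references both sides and stays in the join, while the second only references the left side and can be pushed into it.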
 
 https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/planner/optimizer.rs#L155-L247
 
@@ -213,7 +211,7 @@ The code is as outlined above:
 
 https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/planner/optimizer.rs#L254-L303
 
-Helped by `Expression::is_column_lookup` and `Expression::into_column_values`:
+Helped by `Expression::is_column_lookup()` and `Expression::into_column_values()`:
 
 https://github.com/erikgrinaker/toydb/blob/9419bcf6aededf0e20b4e7485e2a5fa3e975d79f/src/sql/types/expression.rs#L363-L421
 
@@ -227,14 +225,13 @@ A [nested loop join](https://en.wikipedia.org/wiki/Nested_loop_join) is a very i
 algorithm, which iterates over all rows in the right source for each row in the left source to see
 if they match. However, it is completely general, and can join on arbitrarily complex predicates.
 
-In the common case where the join predicate is an equality check (i.e. an
-[equijoin](https://en.wikipedia.org/wiki/Relational_algebra#θ-join_and_equijoin)), such as
-`movies.genre_id = genres.id`, then we can instead use a
-[hash join](https://en.wikipedia.org/wiki/Hash_join). This scans the right table once, builds an
-in-memory hash table from it, and for each left row it looks up any right rows in the hash table.
-This is a much more efficient O(n) algorithm.
+In the common case where the join predicate is an equality comparison such as
+`movies.genre_id = genres.id` (i.e. an [equijoin](https://en.wikipedia.org/wiki/Relational_algebra#θ-join_and_equijoin)),
+then we can instead use a [hash join](https://en.wikipedia.org/wiki/Hash_join). This scans the right
+table once, builds an in-memory hash table from it, and for each left row it looks up any right rows
+in the hash table. This is a much more efficient O(n) algorithm.
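The build and probe phases can be sketched as follows. This is a simplified inner equijoin over in-memory integer rows. The `hash_join` function and its signature are assumptions for this example, not the executor's actual interface, which works on row iterators.

```rust
use std::collections::HashMap;

pub type Row = Vec<i64>;

/// Inner equijoin on `left[left_col] == right[right_col]`, emitting the left
/// row followed by the matching right row.
pub fn hash_join(left: Vec<Row>, left_col: usize, right: Vec<Row>, right_col: usize) -> Vec<Row> {
    // Build phase: scan the right source once and index its rows by join key.
    let mut table: HashMap<i64, Vec<Row>> = HashMap::new();
    for row in right {
        table.entry(row[right_col]).or_default().push(row);
    }

    // Probe phase: look up each left row's join key in the hash table.
    let mut output = Vec::new();
    for left_row in left {
        if let Some(matches) = table.get(&left_row[left_col]) {
            for right_row in matches {
                let mut joined = left_row.clone();
                joined.extend(right_row.iter().copied());
                output.push(joined);
            }
        }
    }
    output
}
```

Each source is scanned only once, in contrast to the nested loop join, which rescans the right source for every left row. The trade-off is that the right source has to fit in memory.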
 
-In our previous movie example, we are in fact doing an equijoin, and so our `NestedLoopJoin`:
+In our previous movie example, we are in fact doing an equijoin:
 
 ```
 Select
@@ -245,7 +242,7 @@ Select
          └─ Scan: genres
 ```
 
-Will be replaced by a `HashJoin`:
+And so our `NestedLoopJoin` can be replaced by a `HashJoin`:
 
 ```
 Select
@@ -263,15 +260,15 @@ hash table), but we keep it simple.
 https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/planner/optimizer.rs#L309-L348
 
 Of course there are many other join algorithms out there, and one of the harder problems in SQL
-optimization is how to efficiently perform deep multijoins. We don't attempt to tackle these
+optimization is how to efficiently perform large N-way multijoins. We don't attempt to tackle these
 problems here -- the `HashJoin` optimizer is just a very simple example of such join optimization.
 
 ## Short Circuiting
 
 The `ShortCircuit` optimizer tries to find nodes that can't possibly do any useful work, and either
 removes them from the plan, or replaces them with trivial nodes that don't do anything. It is kind
-of similar to the `ConstantFolding` optimizer in spirit, but works at the plan node level rather
-than the expression node level.
+of similar to the `ConstantFolding` optimizer in spirit, but works on plan nodes rather than
+expression nodes.
 
 https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/planner/optimizer.rs#L350-L354
 
 
@@ -1,7 +1,7 @@
 # SQL Parsing
 
-And so we finally arrive at SQL. The SQL parser is the first stage in processing SQL
-queries and statements, located in the [`src/sql/parser`](https://github.com/erikgrinaker/toydb/tree/39c6b60afc4c235f19113dc98087176748fa091d/src/sql/parser)
+We finally arrive at SQL. The SQL parser is the first stage in processing SQL queries and
+statements, located in the [`sql::parser`](https://github.com/erikgrinaker/toydb/tree/39c6b60afc4c235f19113dc98087176748fa091d/src/sql/parser)
 module.
 
 The SQL parser's job is to take a raw SQL string and turn it into a structured form that's more
@@ -99,7 +99,7 @@ string are well-formed. For example, the following input string:
 Will result in these tokens:
 
 ```
-String("foo"), CloseParen, Number("3.14"), Keyword(Select), Plus, Ident("x")
+String("foo") CloseParen Number("3.14") Keyword(Select) Plus Ident("x")
 ```
 
 Tokens and keywords are represented by the `sql::parser::Token` and `sql::parser::Keyword` enums
@@ -137,12 +137,14 @@ kinds of SQL statements that we support, along with their contents:
 
 https://github.com/erikgrinaker/toydb/blob/39c6b60afc4c235f19113dc98087176748fa091d/src/sql/parser/ast.rs#L6-L145
 
-The nested tree structure is particularly apparent with _expressions_ -- these represent values and
-operations which will eventually _evaluate_ to a single value. For example, the expression
-`2 * 3 - 4 / 2`, which evaluates to the value `4`.
+The nested tree structure is particularly apparent with expressions, which represent values and
+operations on them. For example, the expression `2 * 3 - 4 / 2` evaluates to the value `4`.
 
-These expressions are represented as `sql::parser::ast::Expression`, and can be nested indefinitely
-into a tree structure.
+We've seen in the data model section how such expressions are represented as
+`sql::types::Expression`, but before we get there we have to parse them. The parser has its own
+representation `sql::parser::ast::Expression` -- this is necessary e.g. because in the AST, we
+represent columns as names rather than numeric indexes (we don't know yet which columns exist or
+what their names are; we'll get to that during planning).
 
 https://github.com/erikgrinaker/toydb/blob/39c6b60afc4c235f19113dc98087176748fa091d/src/sql/parser/ast.rs#L147-L170
 
@@ -215,7 +217,7 @@ than that of the operators preceding them (hence "precedence climbing"). For exa
 2 * 3 - 4 / 2
 ```
 
-The algorithm is documented in more detail on `Parser::parse_expression`:
+The algorithm is documented in more detail on `Parser::parse_expression()`:
 
 https://github.com/erikgrinaker/toydb/blob/39c6b60afc4c235f19113dc98087176748fa091d/src/sql/parser/parser.rs#L501-L696
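For intuition, here is a compact sketch of precedence climbing over a pre-lexed token list. It only handles integers and the four left-associative binary operators. The `Token`, `Expr`, and `Parser` types and the precedence values are assumptions for this example and are much simpler than the real parser, which also handles prefix and postfix operators, parentheses, and more.

```rust
use std::iter::Peekable;
use std::vec::IntoIter;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Token { Number(i64), Plus, Minus, Asterisk, Slash }

#[derive(Debug, PartialEq)]
pub enum Expr {
    Number(i64),
    Binary(Box<Expr>, Token, Box<Expr>),
}

/// Returns the precedence of a binary operator, or None for non-operators.
fn precedence(token: Token) -> Option<u8> {
    match token {
        Token::Plus | Token::Minus => Some(1),
        Token::Asterisk | Token::Slash => Some(2),
        Token::Number(_) => None,
    }
}

pub struct Parser { tokens: Peekable<IntoIter<Token>> }

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Self { tokens: tokens.into_iter().peekable() }
    }

    /// Parses an expression whose operators all have precedence >= `min_prec`.
    pub fn parse_expression(&mut self, min_prec: u8) -> Result<Expr, String> {
        let mut lhs = match self.tokens.next() {
            Some(Token::Number(n)) => Expr::Number(n),
            token => return Err(format!("expected number, found {token:?}")),
        };
        // Keep consuming operators while their precedence is high enough.
        while let Some(&op) = self.tokens.peek() {
            let prec = match precedence(op) {
                Some(prec) if prec >= min_prec => prec,
                Some(_) => break, // lower precedence: let the caller handle it
                None => return Err(format!("expected operator, found {op:?}")),
            };
            self.tokens.next();
            // Left-associative: the right-hand side may only contain
            // operators of strictly higher precedence.
            let rhs = self.parse_expression(prec + 1)?;
            lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
        }
        Ok(lhs)
    }
}

/// Evaluates a parsed expression (integer division; panics on division by zero).
pub fn evaluate(expr: &Expr) -> i64 {
    match expr {
        Expr::Number(n) => *n,
        Expr::Binary(lhs, op, rhs) => {
            let (l, r) = (evaluate(lhs), evaluate(rhs));
            match op {
                Token::Plus => l + r,
                Token::Minus => l - r,
                Token::Asterisk => l * r,
                Token::Slash => l / r,
                Token::Number(_) => unreachable!("numbers are not operators"),
            }
        }
    }
}
```

Parsing the tokens for `2 * 3 - 4 / 2` with `parse_expression(1)` groups them as `(2 * 3) - (4 / 2)`, since `-` has lower precedence than the operators on either side of it.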