Commit 615f9df

docs: more architecture guide cleanups
1 parent 686d397 commit 615f9df

8 files changed

+184
-155
lines changed

docs/architecture/sql-data.md

Lines changed: 40 additions & 34 deletions
@@ -1,38 +1,45 @@
11
# SQL Data Model
22

3-
The SQL data model is toyDB's representation of user data. It is made up of data types and schemas.
3+
The SQL data model represents user data in tables and rows. It is made up of data types and schemas,
4+
in the [`sql::types`](https://github.com/erikgrinaker/toydb/tree/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/types)
5+
module.
46

57
## Data Types
68

7-
toyDB supports four basic scalar data types as `sql::types::DataType`: booleans, floats, integers,
9+
toyDB supports four basic scalar data types as `sql::types::DataType`: booleans, integers, floats,
810
and strings.
911

1012
https://github.com/erikgrinaker/toydb/blob/b2fe7b76ee634ca6ad31616becabfddb1c03d34b/src/sql/types/value.rs#L15-L27
1113

12-
Concrete values are represented as `sql::types::Value`, using corresponding Rust types. toyDB also
13-
supports SQL `NULL` values, i.e. unknown values, following the rules of
14+
Specific values are represented as `sql::types::Value`, using the corresponding Rust types. toyDB
15+
also supports SQL `NULL` values, i.e. unknown values, following the rules of
1416
[three-valued logic](https://en.wikipedia.org/wiki/Three-valued_logic).
1517

1618
https://github.com/erikgrinaker/toydb/blob/b2fe7b76ee634ca6ad31616becabfddb1c03d34b/src/sql/types/value.rs#L40-L64
1719

18-
The `Value` type provides basic formatting, conversion, and mathematical operations. It also
19-
specifies comparison and ordering semantics, but these are subtly different from the SQL semantics.
20-
For example, in Rust code `Value::Null == Value::Null` yields `true`, while in SQL `NULL = NULL`
21-
yields `NULL`. This mismatch is necessary for the Rust code to properly detect and process `Null`
22-
values, and the desired SQL semantics are implemented higher up in the SQL execution engine (we'll
23-
get back to this later).
20+
The `Value` type provides basic formatting, conversion, and mathematical operations.
21+
22+
https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/types/value.rs#L68-L79
23+
24+
https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/types/value.rs#L164-L370
25+
26+
It also specifies comparison and ordering semantics, but these are subtly different from the SQL
27+
semantics. For example, in Rust code `Value::Null == Value::Null` yields `true`, while in SQL
28+
`NULL = NULL` yields `NULL`. This mismatch is necessary for the Rust code to properly detect and
29+
process `Null` values, and the desired SQL semantics are implemented during expression evaluation,
30+
which we'll cover below.
2431

2532
https://github.com/erikgrinaker/toydb/blob/b2fe7b76ee634ca6ad31616becabfddb1c03d34b/src/sql/types/value.rs#L91-L162
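
To make the mismatch between Rust equality and SQL equality concrete, here is a minimal sketch. It is not taken from the toyDB source: the `Value` enum below is a simplified stand-in for `sql::types::Value`, and `sql_eq` is a hypothetical helper that only illustrates the three-valued rule for `=`.

```rust
/// Simplified stand-in for `sql::types::Value` (an assumption for this sketch).
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Null,
    Boolean(bool),
    Integer(i64),
    Float(f64),
    String(String),
}

/// Hypothetical helper: SQL-style equality, where any comparison that
/// involves NULL yields NULL rather than true or false.
pub fn sql_eq(lhs: &Value, rhs: &Value) -> Value {
    match (lhs, rhs) {
        // Unknown compared with anything is unknown.
        (Value::Null, _) | (_, Value::Null) => Value::Null,
        // Otherwise fall back to the derived Rust equality.
        (l, r) => Value::Boolean(l == r),
    }
}
```

With the derived `PartialEq`, `Value::Null == Value::Null` is `true` in Rust, which lets code detect `Null` directly, while `sql_eq(&Value::Null, &Value::Null)` returns `Value::Null`.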
2633

27-
During execution, a row of values will be represented as `sql::types::Row`, with multiple rows
28-
emitted as `sql::types::Rows` row iterators:
34+
During execution, a row of values is represented as `sql::types::Row`, with multiple rows emitted
35+
via `sql::types::Rows` row iterators:
2936

3037
https://github.com/erikgrinaker/toydb/blob/b2fe7b76ee634ca6ad31616becabfddb1c03d34b/src/sql/types/value.rs#L378-L388
3138

3239
## Schemas
3340

34-
toyDB schemas support a single object: a table. There's only a single, unnamed database, and no
35-
named indexes, constraints, or other schema objects.
41+
toyDB schemas only support tables. There are no named indexes or constraints, and there's only a
42+
single unnamed database.
3643

3744
Tables are represented by `sql::types::Table`:
3845

@@ -47,42 +54,41 @@ The table name serves as a unique identifier, and can't be changed later. In fac
4754
are entirely static: they can only be created or dropped (there are no schema changes).
4855

4956
Table schemas are stored in the catalog, represented by the `sql::engine::Catalog` trait. We'll
50-
revisit the implementation of this trait in the storage section below.
57+
revisit the implementation of this trait in the SQL storage section.
5158

5259
https://github.com/erikgrinaker/toydb/blob/0839215770e31f1e693d5cccf20a68210deaaa3f/src/sql/engine/engine.rs#L60-L79
5360

54-
Table schemas are validated (e.g. during creation) via the `Table::validate()` method, which
55-
enforces invariants and internal consistency. It uses the catalog to look up information about other
56-
tables, e.g. that foreign key references point to a valid target column.
61+
Table schemas are validated when created via `Table::validate()`, which enforces invariants and
62+
internal consistency. It uses the catalog to look up information about other tables, e.g. that
63+
foreign key references point to a valid target column in a different table.
5764

5865
https://github.com/erikgrinaker/toydb/blob/c2b0f7f1d6cbf6e2cdc09fc0aec7b050e840ec21/src/sql/types/schema.rs#L98-L170
5966

60-
It also has a `Table::validate_row()` method which is used to validate that a given
61-
`sql::types::Row` conforms to the schema (e.g. that the value data types match the column data
62-
types). It uses a `sql::engine::Transaction` to look up other rows in the database, e.g. to check
63-
for primary key conflicts (we'll get back to this below).
67+
Table rows are validated via `Table::validate_row()`, which ensures that a `sql::types::Row`
68+
conforms to the schema (e.g. that value types match the column data types). It uses a
69+
`sql::engine::Transaction` to look up other rows in the database, e.g. to check for primary key
70+
conflicts (we'll get back to this later).
6471

6572
https://github.com/erikgrinaker/toydb/blob/c2b0f7f1d6cbf6e2cdc09fc0aec7b050e840ec21/src/sql/types/schema.rs#L172-L236
6673

6774
## Expressions
6875

6976
During SQL execution, we also have to model _expressions_, such as `1 + 2 * 3`. These are
70-
represented as values and operations on them. They can be nested arbitrarily as a tree to represent
71-
compound operations.
77+
represented as values and operations on them, and can be nested as a tree to represent compound
78+
operations.
7279

7380
https://github.com/erikgrinaker/toydb/blob/9419bcf6aededf0e20b4e7485e2a5fa3e975d79f/src/sql/types/expression.rs#L11-L64
7481

7582

76-
For example:
83+
For example, the expression `1 + 2 * 3` (taking [precedence](https://en.wikipedia.org/wiki/Order_of_operations)
84+
into account) is represented as:
7785

7886
```rust
79-
// 1 + 2 * 3 is represented as:
80-
//
81-
//   +
82-
//  / \
83-
// 1   *
84-
//    / \
85-
///  2   3
87+
//   +
88+
//  / \
89+
// 1   *
90+
//    / \
91+
///  2   3
8692
Expression::Add(
8793
    Expression::Constant(Value::Integer(1)),
8894
    Expression::Multiply(
@@ -97,8 +103,8 @@ An `Expression` can contain two kinds of values: constant values as
97103
references. The latter will fetch a `sql::types::Value` from a `sql::types::Row` at the specified
98104
index during evaluation.
99105

100-
We'll see later how the SQL parser and planner transforms text expressions like `1 + 2 * 3` into
101-
this `Expression` form, and how it resolves column names to row indexes -- e.g. `price * 0.25` to
106+
We'll see later how the SQL parser and planner transform text expressions like `1 + 2 * 3` into an
107+
`Expression`, and how they resolve column names to row indexes, e.g. `price * 0.25` to
102108
`row[3] * 0.25`.
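
As a rough illustration of constants versus column references, the sketch below evaluates a tiny expression tree against a row. It is an assumption-laden simplification, not toyDB's implementation: `Value` only has `Null` and `Integer`, `Row` is a plain `Vec<Value>`, the `Column` variant name is hypothetical, and errors are plain strings. Child expressions are boxed because a recursive enum needs indirection.

```rust
/// Simplified stand-in for `sql::types::Value` (assumption: integers and NULL only).
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Null,
    Integer(i64),
}

/// Simplified stand-in for `sql::types::Row`.
pub type Row = Vec<Value>;

/// A reduced expression tree. `Column` is a hypothetical name for a
/// column reference by row index.
#[derive(Clone, Debug)]
pub enum Expression {
    Constant(Value),
    Column(usize),
    Add(Box<Expression>, Box<Expression>),
    Multiply(Box<Expression>, Box<Expression>),
}

impl Expression {
    /// Recursively evaluates the expression against a row.
    pub fn evaluate(&self, row: &Row) -> Result<Value, String> {
        Ok(match self {
            // Constants evaluate to themselves.
            Expression::Constant(value) => value.clone(),
            // Column references fetch the value at the given row index.
            Expression::Column(index) => row
                .get(*index)
                .cloned()
                .ok_or_else(|| format!("column index {} out of bounds", index))?,
            // Arithmetic evaluates both children first; NULL operands yield NULL.
            Expression::Add(lhs, rhs) => match (lhs.evaluate(row)?, rhs.evaluate(row)?) {
                (Value::Integer(l), Value::Integer(r)) => {
                    Value::Integer(l.checked_add(r).ok_or("integer overflow")?)
                }
                _ => Value::Null,
            },
            Expression::Multiply(lhs, rhs) => match (lhs.evaluate(row)?, rhs.evaluate(row)?) {
                (Value::Integer(l), Value::Integer(r)) => {
                    Value::Integer(l.checked_mul(r).ok_or("integer overflow")?)
                }
                _ => Value::Null,
            },
        })
    }
}
```

Under these assumptions, the tree for `1 + 2 * 3` evaluates to `Value::Integer(7)` for any row, while `Multiply(Column(3), Constant(Integer(2)))` reads the fourth value of the row it is given.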
103109

104110
Expressions are evaluated recursively via `Expression::evaluate()`, given a `sql::types::Row` with

docs/architecture/sql-execution.md

Lines changed: 33 additions & 32 deletions
@@ -1,46 +1,47 @@
11
# SQL Execution
22

3-
Ok, now that the planner and optimizer has done all the hard work of figuring out how to execute a
3+
Now that the planner and optimizer have done all the hard work of figuring out how to execute a
44
query, it's time to actually execute it.
55

66
## Plan Executor
77

88
Plan execution is done by `sql::execution::Executor` in the
99
[`sql::execution`](https://github.com/erikgrinaker/toydb/tree/9419bcf6aededf0e20b4e7485e2a5fa3e975d79f/src/sql/execution)
10-
module, using a `sql::engine::Transaction` to perform read/write operations on the SQL engine.
10+
module, using a `sql::engine::Transaction` to access the SQL storage engine.
1111

1212
https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L14-L49
1313

1414
The executor takes a `sql::planner::Plan` as input, and will return an `ExecutionResult` depending
1515
on the statement type.
1616

17-
https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L330-L338
17+
https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/execution/executor.rs#L331-L339
1818

1919
When executing the plan, the executor will branch off depending on the statement type:
2020

21-
https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L56-L100
21+
https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/execution/executor.rs#L57-L101
2222

23-
We'll focus on `SELECT` queries here, which is the most interesting.
23+
We'll focus on `SELECT` queries here, which are the most interesting.
2424

25-
toyDB uses the iterator model (also known as the volcano model) for query execution. In the case
26-
of a `SELECT` query, the result is a result row iterator, and pulling from this iterator by calling
27-
`next()` will drive the entire execution pipeline. This maps very naturally onto Rust's iterators,
28-
and we leverage these to construct the execution pipeline as nested iterators.
25+
toyDB uses the iterator model (also known as the volcano model) for query execution. In the case of
26+
a `SELECT` query, the result is a row iterator, and pulling from this iterator by calling `next()`
27+
will drive the entire execution pipeline by recursively calling `next()` on the child nodes' row
28+
iterators. This maps very naturally onto Rust's iterators, and we leverage these to construct the
29+
execution pipeline as nested iterators.
2930

3031
Execution itself is fairly straightforward, since we're just doing exactly what the planner tells us
3132
to do in the plan. We call `Executor::execute_node` recursively on each `sql::planner::Node`,
3233
starting with the root node. Each node returns a result row iterator; the parent node pulls its
3334
input rows from it, processes them, and outputs the resulting rows via its own row iterator (with the
3435
root node's iterator being returned to the caller):
3536

36-
https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L102-L103
37+
https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/execution/executor.rs#L103-L104
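
The nested-iterator structure described above can be sketched as follows. This is a simplified illustration rather than toyDB's executor: the `Node` variants, their fields, and the integer-only `Row` type are all assumptions made for the example, and out-of-range column indexes simply panic.

```rust
/// Simplified row type (assumption: integer columns only).
pub type Row = Vec<i64>;

/// A boxed row iterator, standing in for `sql::types::Rows`.
pub type Rows = Box<dyn Iterator<Item = Row>>;

/// A reduced plan node type with hypothetical fields.
pub enum Node {
    /// Emits a fixed set of rows, standing in for a table scan.
    Scan { rows: Vec<Row> },
    /// Keeps rows whose value in `column` is at least `min`.
    Filter { source: Box<Node>, column: usize, min: i64 },
    /// Emits only the given columns of each row.
    Projection { source: Box<Node>, columns: Vec<usize> },
}

/// Recursively builds a pipeline of nested iterators. No rows flow until
/// the caller pulls from the returned iterator with `next()`.
pub fn execute_node(node: Node) -> Rows {
    match node {
        Node::Scan { rows } => Box::new(rows.into_iter()),
        Node::Filter { source, column, min } => {
            let source = execute_node(*source);
            Box::new(source.filter(move |row| row[column] >= min))
        }
        Node::Projection { source, columns } => {
            let source = execute_node(*source);
            Box::new(source.map(move |row| columns.iter().map(|&c| row[c]).collect::<Row>()))
        }
    }
}
```

Each call to `next()` on the root iterator pulls one row through `Projection`, which pulls from `Filter`, which pulls from `Scan`, so rows are processed one at a time without materializing intermediate results.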
3738

38-
`Executor::execute_node` will simply look at the type of `Node`, recursively call
39-
`Executor::execute_node` on any child nodes, and then process the rows accordingly.
39+
`Executor::execute_node()` will simply look at the type of `Node`, recursively call
40+
`Executor::execute_node()` on any child nodes, and then process the rows accordingly.
4041

41-
https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L102-L211
42+
https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/execution/executor.rs#L103-L212
4243

43-
We won't discuss every plan node in details, but let's consider the movie plan we've looked at
44+
We won't discuss every plan node in detail, but let's consider the movie plan we've looked at
4445
previously:
4546

4647
```
@@ -52,62 +53,62 @@ Select
5253
└─ Scan: genres
5354
```
5455

55-
We'll recursively call `execute_node` until we end up in the two `Scan` nodes. These simply
56-
call through to the SQL engine (either using Raft or local disk) via `Transaction::scan`, passing
56+
We'll recursively call `execute_node()` until we end up in the two `Scan` nodes. These simply
57+
call through to the SQL engine (either using Raft or local disk) via `Transaction::scan()`, passing
5758
in the scan predicate if any, and return the resulting row iterator:
5859

59-
https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L202-L203
60+
https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/execution/executor.rs#L203-L204
6061

6162
`HashJoin` will then join the output rows from the `movies` and `genres` iterators by using a
6263
hash join. This builds an in-memory table for `genres` and then iterates over `movies`, joining
6364
the rows:
6465

65-
https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L127-L140
66+
https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/execution/executor.rs#L128-L141
6667

6768
https://github.com/erikgrinaker/toydb/blob/889aef9f24c0fa4d58e314877fa17559a9f3d5d2/src/sql/execution/join.rs#L103-L183
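
The build-then-probe structure of a hash join can be sketched as below. This is a minimal inner-join illustration under stated assumptions, not the code in `join.rs`: rows are `Vec<String>`, the join columns are passed as plain indexes, and the `hash_join` signature is hypothetical.

```rust
use std::collections::HashMap;

/// Simplified row type (assumption: string columns only).
pub type Row = Vec<String>;

/// Hypothetical inner hash join: builds an in-memory table from `right`,
/// then streams `left` and emits `left ++ right` for every matching pair.
pub fn hash_join(
    left: impl Iterator<Item = Row>,
    left_column: usize,
    right: impl Iterator<Item = Row>,
    right_column: usize,
) -> impl Iterator<Item = Row> {
    // Build phase: index all right rows by their join column value.
    let mut table: HashMap<String, Vec<Row>> = HashMap::new();
    for row in right {
        table.entry(row[right_column].clone()).or_default().push(row);
    }

    // Probe phase: look up each left row's join value in the table.
    left.flat_map(move |left_row| {
        let matches = table.get(&left_row[left_column]).cloned().unwrap_or_default();
        matches.into_iter().map(move |right_row| {
            let mut joined = left_row.clone();
            joined.extend(right_row);
            joined
        })
    })
}
```

In the movie example, `genres` would be passed as `right` (the built side) and `movies` as `left` (the streamed side), so only the smaller table needs to fit in memory.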
6869

6970
The `Projection` node will simply evaluate the (trivial) column expressions using each joined
7071
row as input:
7172

72-
https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L178-L185
73+
https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/execution/executor.rs#L179-L186
7374

7475
And finally the `Order` node will sort the results (which requires buffering them all in memory):
7576

76-
https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L172-L176
77+
https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/execution/executor.rs#L173-L177
7778

78-
https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/sql/execution/executor.rs#L297-L327
79+
https://github.com/erikgrinaker/toydb/blob/686d3971a253bfc9facc2ba1b0e716cff5c109fb/src/sql/execution/executor.rs#L298-L328
7980

80-
The output row iterator of `Order` is returned to the caller via `ExecutionResult::Select`, and
81-
it can now go ahead and pull its query result.
81+
The output row iterator of `Order` is returned via `ExecutionResult::Select`, and the caller can now
82+
go ahead and pull the resulting rows from it.
8283

8384
## Session Management
8485

8586
The entry point to the SQL engine is the `sql::execution::Session`, which represents a single user
86-
session. It is obtained via `sql::engine::Engine::session`.
87+
session. It is obtained via `sql::engine::Engine::session()`.
8788

8889
https://github.com/erikgrinaker/toydb/blob/0839215770e31f1e693d5cccf20a68210deaaa3f/src/sql/execution/session.rs#L14-L21
8990

90-
The session takes a series of raw SQL statement strings as input, then parses, plans, and executes
91-
them against the engine.
91+
The session takes a series of raw SQL statement strings as input and parses them:
9292

93-
https://github.com/erikgrinaker/toydb/blob/0839215770e31f1e693d5cccf20a68210deaaa3f/src/sql/execution/session.rs#L29-L30
93+
https://github.com/erikgrinaker/toydb/blob/0839215770e31f1e693d5cccf20a68210deaaa3f/src/sql/execution/session.rs#L29-L33
9494

9595
For each statement, it returns a result depending on the kind of statement:
9696

9797
https://github.com/erikgrinaker/toydb/blob/0839215770e31f1e693d5cccf20a68210deaaa3f/src/sql/execution/session.rs#L132-L148
9898

99-
In particular, the session performs transaction control. It handles `BEGIN`, `COMMIT`, and
100-
`ROLLBACK` statements itself, and modifies the transaction accordingly.
99+
The session itself performs transaction control. It handles `BEGIN`, `COMMIT`, and `ROLLBACK`
100+
statements, and modifies the transaction accordingly.
101101

102102
https://github.com/erikgrinaker/toydb/blob/0839215770e31f1e693d5cccf20a68210deaaa3f/src/sql/execution/session.rs#L34-L70
103103

104104
Any other statements are processed by the SQL planner, optimizer, and executor as we've seen in
105-
previous sections. These statements are always executed using the session's current transaction. If
106-
there is no active transaction, the session will create a new, implicit transaction for each
107-
statement.
105+
previous sections.
108106

109107
https://github.com/erikgrinaker/toydb/blob/0839215770e31f1e693d5cccf20a68210deaaa3f/src/sql/execution/session.rs#L77-L83
110108

109+
These statements are always executed using the session's current transaction. If there is no active
110+
transaction, the session will create a new, implicit transaction for each statement.
111+
111112
https://github.com/erikgrinaker/toydb/blob/0839215770e31f1e693d5cccf20a68210deaaa3f/src/sql/execution/session.rs#L87-L112
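
The explicit versus implicit transaction handling can be sketched as a small state machine. Everything here is hypothetical and heavily simplified: the `Statement` enum, the string log standing in for the engine, and the choice to commit an implicit transaction immediately after its statement are assumptions for illustration, not toyDB's `Session` code.

```rust
/// Hypothetical, reduced statement type.
#[derive(Debug, PartialEq)]
pub enum Statement {
    Begin,
    Commit,
    Rollback,
    Other(String),
}

/// Hypothetical session that records what it would do in a log.
#[derive(Default)]
pub struct Session {
    /// The explicit transaction started by BEGIN, if any.
    txn: Option<u64>,
    next_id: u64,
    pub log: Vec<String>,
}

impl Session {
    /// Starts a new transaction and returns its ID.
    fn begin(&mut self) -> u64 {
        self.next_id += 1;
        self.log.push(format!("txn {}: begin", self.next_id));
        self.next_id
    }

    pub fn execute(&mut self, statement: Statement) -> Result<(), String> {
        match statement {
            // Transaction control is handled by the session itself.
            Statement::Begin if self.txn.is_some() => Err("already in a transaction".to_string()),
            Statement::Begin => {
                let id = self.begin();
                self.txn = Some(id);
                Ok(())
            }
            Statement::Commit => {
                let id = self.txn.take().ok_or("not in a transaction")?;
                self.log.push(format!("txn {}: commit", id));
                Ok(())
            }
            Statement::Rollback => {
                let id = self.txn.take().ok_or("not in a transaction")?;
                self.log.push(format!("txn {}: rollback", id));
                Ok(())
            }
            // Other statements use the current transaction, or an implicit
            // one that is committed right after the statement (assumption).
            Statement::Other(sql) => match self.txn {
                Some(id) => {
                    self.log.push(format!("txn {}: {}", id, sql));
                    Ok(())
                }
                None => {
                    let id = self.begin();
                    self.log.push(format!("txn {}: {}", id, sql));
                    self.log.push(format!("txn {}: commit", id));
                    Ok(())
                }
            },
        }
    }
}
```

Running a lone statement through this sketch logs a begin, the statement, and a commit under one transaction ID, whereas statements between `BEGIN` and `COMMIT` share the explicit transaction.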
112113

113114
And with that, we have a fully functional SQL engine!
