Commit 8199aa1

Merge pull request #83 from stackhpc/mkdocs
docs: Add mkdocs documentation
2 parents b611e9f + 8b08609 commit 8199aa1

16 files changed: 799 additions and 644 deletions.

.github/workflows/publish-docs.yml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+---
+name: Publish documentation
+on:
+  push:
+    branches:
+      - main
+jobs:
+  docs-publish:
+    name: Publish documentation
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
+        with:
+          python-version: 3.x
+      - run: pip install -r docs/requirements.txt
+      - run: mkdocs gh-deploy --force

.github/workflows/pull-request.yml

Lines changed: 12 additions & 0 deletions
@@ -15,6 +15,18 @@ jobs:
       - name: Build
         run: make build
   docs:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+      - name: Setup Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.x
+      - name: Install mkdocs
+        run: pip install -r docs/requirements.txt
+      - run: mkdocs build --strict
+  rustdocs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout

CONTRIBUTING.md

Lines changed: 0 additions & 12 deletions
This file was deleted.

README.md

Lines changed: 20 additions & 271 deletions
@@ -9,289 +9,38 @@ The work is funded by the
 and is done in collaboration with the
 [University of Reading](http://www.reading.ac.uk/).
 
-Documentation is available on [docs.rs](https://docs.rs/reductionist/latest/reductionist/).
+Documentation for the Reductionist application is hosted on [GitHub](https://stackhpc.github.io/reductionist-rs).
+Documentation for the source code is available on [docs.rs](https://docs.rs/reductionist/latest/reductionist/).
 
 This is a performant implementation of the active storage server.
 The original Python functional prototype is available [here](https://github.com/stackhpc/s3-active-storage-prototype).
 
-## Concepts
+Note: The original S3 Active Storage project was renamed to Reductionist, to avoid confusion due to overuse of the term Active Storage.
 
-The Reductionist server supports the application of reductions to S3 objects that contain numeric binary data. These reductions are specified by making a HTTP post request to the active storage proxy service.
+## Features
 
-The Reductionist server does not attempt to infer the datatype - it must be told the datatype to use based on knowledge that the client already has about the S3 object.
+Reductionist provides the following features:
 
-For example, if the original object has the following URL:
+* HTTP(S) API with JSON request data
+* Access to data stored in S3-compatible storage
+* Basic numerical operations on multi-dimensional arrays (count, min, max, select, sum)
+* Perform calculations on a selection/slice of an array
+* Perform calculations allowing for missing data
+* Compressed data (GZip, Zlib)
+* Filtered data (byte shuffle)
+* Data with non-native byte order (endianness)
+* Server resource (CPU, memory, files) management
+* [Prometheus](https://prometheus.io/) metrics
+* Tracing with an option to send data to [Jaeger](https://www.jaegertracing.io/)
+* Ansible-based containerised deployment
 
-```
-http[s]://s3.example.org/my-bucket/path/to/object
-```
+## Related projects
 
-Then Reductionist server could be used by making post requests to specfic reducer endpoints:
-
-```
-http[s]://s3-proxy.example.org/v1/{reducer}/
-```
-
-with a JSON payload of the form:
-
-```
-{
-    // The URL for the S3 source
-    // - required
-    "source": "https://s3.example.com/,
-
-    // The name of the S3 bucket
-    // - required
-    "bucket": "my-bucket",
-
-    // The path to the object within the bucket
-    // - required
-    "object": "path/to/object",
-
-    // The data type to use when interpreting binary data
-    // - required
-    "dtype": "int32|int64|uint32|uint64|float32|float64",
-
-    // The byte order (endianness) of the data
-    // - optional, defaults to native byte order of Reductionist server
-    "byte_order": "big|little",
-
-    // The offset in bytes to use when reading data
-    // - optional, defaults to zero
-    "offset": 0,
-
-    // The number of bytes to read
-    // - optional, defaults to the size of the entire object
-    "size": 128,
-
-    // The shape of the data (i.e. the size of each dimension)
-    // - optional, defaults to a simple 1D array
-    "shape": [20, 5],
-
-    // Indicates whether the data is in C order (row major)
-    // or Fortran order (column major, indicated by 'F')
-    // - optional, defaults to 'C'
-    "order": "C|F",
-
-    // An array of [start, end, stride] tuples indicating the data to be operated on
-    // (if given, you must supply one tuple per element of "shape")
-    // - optional, defaults to the whole array
-    "selection": [
-        [0, 19, 2],
-        [1, 3, 1]
-    ],
-
-    // Algorithm used to compress the data
-    // - optional, defaults to no compression
-    "compression": {"id": "gzip|zlib"},
-
-    // List of algorithms used to filter the data
-    // - optional, defaults to no filters
-    "filters": [{"id": "shuffle", "element_size": 4}],
-
-    // Missing data description
-    // - optional, defaults to no missing data
-    // - exactly one of the keys below should be specified
-    // - the values should match the data type (dtype)
-    "missing": {
-        "missing_value": 42,
-        "missing_values": [42, -42],
-        "valid_min": 42,
-        "valid_max": 42,
-        "valid_range": [-42, 42],
-    }
-}
-```
-
-The currently supported reducers are `max`, `min`, `sum`, `select` and `count`. All reducers return the result using the same datatype as specified in the request except for `count` which always returns the result as `int64`.
-
-The proxy returns the following headers to the HTTP response:
-
-* `x-activestorage-dtype`: The data type of the data in the response payload. One of `int32`, `int64`, `uint32`, `uint64`, `float32` or `float64`.
-* `x-activestorage-byte-order`: The byte order of the data in the response payload. Either `big` or `little`.
-* `x-activestrorage-shape`: A JSON-encoded list of numbers describing the shape of the data in the response payload. May be an empty list for a scalar result.
-* `x-activestorage-count`: The number of non-missing array elements operated on while performing the requested reduction. This header is useful, for example, to calculate the mean over multiple requests where the number of items operated on may differ between chunks.
-
-[//]: <> (TODO: No OpenAPI support yet).
-[//]: <> (For a running instance of the proxy server, the full OpenAPI specification is browsable as a web page at the `{proxy-address}/redoc/` endpoint or in raw JSON form at `{proxy-address}/openapi.json`.)
-
-## Caveats
-
-This is a very early-stage project, and as such supports limited functionality.
-
-In particular, the following are known limitations which we intend to address:
-
-* Error handling and reporting is minimal
-* No support for missing data
-* No support for encrypted objects
-
-## Running
-
-There are various ways to run the Reductionist server.
-
-### Production deployment
-
-Reductionist provides an Ansible playbook to easily deploy it and supporting
-services to one or more hosts. See the [deployment
-README](deployment/README.md) for details.
-
-### Running in a container
-
-The simplest method is to run it in a container using a pre-built image:
-
-```sh
-docker run -it --detach --rm --net=host --name reductionist ghcr.io/stackhpc/reductionist-rs:latest
-```
-
-Images are published to [GitHub Container Registry](https://github.com/stackhpc/reductionist-rs/pkgs/container/reductionist-rs) when the project is released.
-The `latest` tag corresponds to the most recent release, or you can use a specific release e.g. `0.1.0`.
-
-This method does not require access to the source code.
-
-### Building a container image
-
-If you need to use unreleased changes, but still want to run in a container, it is possible to build an image.
-First, clone this repository:
-
-```sh
-git clone https://github.com/stackhpc/reductionist-rs.git
-cd reductionist-rs
-```
-
-```sh
-make build
-```
-
-The image will be tagged as `reductionist`.
-The image may be pushed to a registry, or deployed locally.
-
-```sh
-make run
-```
-
-## Build
-
-If you prefer not to run the Reductionist server in a container, it will be necessary to build a binary.
-Building locally may also be preferable during development to take advantage of incremental compilation.
-
-### Prerequisites
-
-This project is written in Rust, and as such requires a Rust toolchain to be installed in order to build it.
-The Minimum Supported Rust Version (MSRV) is 1.66.1, due to a dependency on the [AWS SDK](https://github.com/awslabs/aws-sdk-rust).
-It may be necessary to use [rustup](https://rustup.rs/) rather than the OS provided Rust toolchain to meet this requirement.
-See the [Rust book](https://doc.rust-lang.org/book/ch01-01-installation.html) for toolchain installation.
-
-### Build and run Reductionist
-
-First, clone this repository:
-
-```sh
-git clone https://github.com/stackhpc/reductionist-rs.git
-cd reductionist-rs
-```
-
-Next, use Cargo to build the package:
-
-```sh
-cargo build --release
-```
-
-The active storage server may be run using Cargo:
-
-```sh
-cargo run --release
-```
-
-Or installed to the system:
-
-```sh
-cargo install --path . --locked
-```
-
-Then run:
-
-```sh
-reductionist
-```
-
-## Testing
-
-For simple testing purposes Minio is a convenient object storage server.
-
-### Deploy Minio object storage
-
-Start a local [Minio](https://min.io/) server which serves the test data:
-
-```sh
-./scripts/minio-start
-```
-
-The Minio server will run in a detached container and may be stopped:
-
-```sh
-./scripts/minio-stop
-```
-
-Note that object data is not preserved when the container is stopped.
-
-### Upload some test data
-
-A script is provided to upload some test data to minio.
-In a separate terminal, set up the Python virtualenv then upload some sample data:
-
-```sh
-# Create a virtualenv
-python3 -m venv ./venv
-# Activate the virtualenv
-source ./venv/bin/activate
-# Install dependencies
-pip install scripts/requirements.txt
-# Upload some sample data to the running minio server
-python ./scripts/upload_sample_data.py
-```
-
-### Compliance test suite
-
-Proxy functionality can be tested using the [S3 active storage compliance suite](https://github.com/stackhpc/s3-active-storage-compliance-suite).
-
-### Making requests to active storage endpoints
-
-Request authentication is implemented using [Basic Auth](https://en.wikipedia.org/wiki/Basic_access_authentication) with the username and password consisting of your S3 Access Key ID and Secret Access Key, respectively. These credentials are then used internally to authenticate with the upstream S3 source using [standard AWS authentication methods](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-auth-using-authorization-header.html)
-
-A basic Python client is provided in `scripts/client.py`.
-First install dependencies in a Python virtual environment:
-
-```sh
-# Create a virtualenv
-python3 -m venv ./venv
-# Activate the virtualenv
-source ./venv/bin/activate
-# Install dependencies
-pip install scripts/requirements.txt
-```
-
-Then use the client to make a request:
-```sh
-venv/bin/python ./scripts/client.py sum --server http://localhost:8080 --source http://localhost:9000 --username minioadmin --password minioadmin --bucket sample-data --object data-uint32.dat --dtype uint32
-```
-
----
-
-## Documentation
-
-The source code is documented using [rustdoc](https://doc.rust-lang.org/rustdoc/what-is-rustdoc.html).
-Documentation is available on [docs.rs](https://docs.rs/reductionist/latest/reductionist/).
-It is also possible to build the documentation locally:
-
-```sh
-cargo doc --no-deps
-```
-
-The resulting documentation is available under `target/doc`, and may be viewed in a web browser using file:///path/to/reductionist/target/doc/reductionist/index.html.
+* [PyActiveStorage](https://github.com/valeriupredoi/PyActiveStorage) is a Python library which performs reductions on numerical data in data sources such as netCDF4. It has support for delegating computation to Reductionist when the data is stored in an S3-compatible object store.
 
 ## Contributing
 
-See [CONTRIBUTING.md](CONTRIBUTING.md) for information about contributing to Reductionist.
+See the [contributor guide](https://stackhpc.github.io/reductionist-rs/contributing.html) for information about contributing to Reductionist.
 
 ## License
 
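For readers who no longer have the removed README text to hand, the request format it described (a POST to `/v1/{reducer}/` with a JSON payload, authenticated with Basic Auth using the S3 access key and secret key) could be sketched in Python roughly as follows. This is a minimal illustration and not the project's `scripts/client.py`: the function name `build_reduction_request` is hypothetical, the `Content-Type` header is an assumption, and the server, source, bucket and credential values are taken from the example client invocation in the removed text.

```python
import base64
import json
import urllib.request


def build_reduction_request(server, reducer, payload, access_key, secret_key):
    """Build (but do not send) a POST request for a reducer endpoint.

    Hypothetical helper for illustration only.
    """
    url = f"{server.rstrip('/')}/v1/{reducer}/"
    # Basic Auth: username is the S3 Access Key ID, password the Secret Access Key.
    token = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Basic {token}",
            # Assumption: the server expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Only the four required payload fields are set here; the optional fields
# (byte_order, offset, size, shape, order, selection, ...) are omitted.
payload = {
    "source": "http://localhost:9000",
    "bucket": "sample-data",
    "object": "data-uint32.dat",
    "dtype": "uint32",
}
request = build_reduction_request(
    "http://localhost:8080", "sum", payload, "minioadmin", "minioadmin"
)
```

With a Reductionist server and the Minio test data running, the request could then be sent with `urllib.request.urlopen(request)`.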
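The removed README text also noted that the `x-activestorage-count` response header can be used to calculate a mean over multiple requests. A rough sketch of that idea, using only the Python standard library, is given below. It makes several assumptions: the shape header is spelled `x-activestorage-shape` (the removed text spells it `x-activestrorage-shape`, which looks like a typo), the count header holds a plain integer, and the response body is a packed array of the stated dtype. The helpers `decode_response` and `mean_over_chunks` are hypothetical, and the responses are simulated rather than fetched from a server.

```python
import json
import struct

# Mapping from the dtype names in the response header to struct format codes.
_STRUCT_CODES = {
    "int32": "i", "int64": "q",
    "uint32": "I", "uint64": "Q",
    "float32": "f", "float64": "d",
}


def decode_response(headers, body):
    """Decode a response body using the x-activestorage-* headers."""
    code = _STRUCT_CODES[headers["x-activestorage-dtype"]]
    prefix = "<" if headers["x-activestorage-byte-order"] == "little" else ">"
    shape = json.loads(headers["x-activestorage-shape"])
    count = int(headers["x-activestorage-count"])
    n_items = len(body) // struct.calcsize(prefix + code)
    values = list(struct.unpack(f"{prefix}{n_items}{code}", body))
    return values, shape, count


def mean_over_chunks(chunk_results):
    """Combine (sum, count) pairs from per-chunk `sum` requests into one mean."""
    total = sum(s for s, _ in chunk_results)
    count = sum(c for _, c in chunk_results)
    return total / count if count else float("nan")


def fake_sum_response(value, count):
    """Simulate a scalar `sum` response (illustration only)."""
    headers = {
        "x-activestorage-dtype": "uint32",
        "x-activestorage-byte-order": "little",
        "x-activestorage-shape": "[]",
        "x-activestorage-count": str(count),
    }
    return headers, struct.pack("<I", value)


chunk_a = decode_response(*fake_sum_response(10, 4))
chunk_b = decode_response(*fake_sum_response(26, 5))
mean = mean_over_chunks([(chunk_a[0][0], chunk_a[2]), (chunk_b[0][0], chunk_b[2])])
```

Weighting by the count rather than averaging per-chunk means matters when chunks contain different numbers of non-missing elements, which is the case the removed text called out.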