Commit abb378e

pcmoritz and ericl authored
[REP] Refining the Ray AIR Surface API (#36)
* [REP] Refining the Ray AIR Surface API Signed-off-by: Philipp Moritz <[email protected]> Co-authored-by: Eric Liang <[email protected]>
1 parent 6ba5e26 commit abb378e

1 file changed: +172 -0 lines changed

reps/2023-07-08-air-surface-syntax.md

Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
# Refining the Ray AIR Surface API

## Summary

Disband the `ray.air` namespace and get rid of Ray AIR sessions.

### General Motivation

Ray AIR has made it significantly easier to use Ray's scalable machine learning
libraries
- Ray Data for batch inference and last mile data processing and ingestion,
- Ray Train for machine learning training and
- Ray Serve for model and application serving

together.

One piece of feedback we have frequently received from users is that they are confused about how Ray AIR
relates to the individual libraries. In particular:
- When should I use AIR's abstractions (e.g. should I use `BatchPredictor` or use Ray Data's map functionality,
should I use `PredictorDeployment` or deploy my model with Ray Serve directly?) and
- How does the `ray.air` namespace relate to `ray.data`, `ray.train` and `ray.serve`?

The `ray.air` namespace contains both low-level common utilities and higher-level
abstractions, which adds to this confusion. We have also learned that the higher-level abstractions we
originally introduced for Ray AIR have become unnecessary: the same functionality can be achieved
with the libraries themselves by making the libraries a little more interoperable.

We have already implemented this strategy by replacing `BatchPredictor` with Ray Data native functionality
(see https://github.com/ray-project/enhancements/blob/main/reps/2023-03-10-batch-inference.md and
https://docs.ray.io/en/master/data/batch_inference.html) and by
improving Ray Train's ingestion APIs
(https://github.com/ray-project/enhancements/blob/main/reps/2023-03-15-train-api.md and
https://docs.ray.io/en/master/ray-air/check-ingest.html).

As a result of these changes, the `ray.air` namespace has become less and less relevant, and in this
REP we propose to go all the way and remove it altogether, in line with the Zen of Python:
```
There should be one -- and preferably only one -- obvious way to do it.
```
This resolves the confusion mentioned above and makes the Ray AIR APIs more coherent and focused on
the critical workloads (`ray.data` for batch inference, `ray.train` for training and `ray.serve` for serving).

### Should this change be within `ray` or outside?

Within the main `ray` project. Changes are made to Ray Train, Tune and AIR.

## Stewardship

### Required Reviewers
The proposal will be open to the public, but please suggest a few experienced Ray contributors in this technical domain whose comments will help this proposal. Ideally, the list should include Ray committers.

@matthewdeng, @krfricke

### Shepherd of the Proposal (should be a senior committer)
To make the review process more productive, the owner of each proposal should identify a **shepherd** (should be a senior Ray committer). The shepherd is responsible for working with the owner and making sure the proposal is in good shape (with necessary information) before marking it as ready for broader review.

@ericl

## Details of the API changes

Concretely, we replace the Ray AIR session with a training context to
1. avoid user confusion about what a `session` is (and the need to explain it in the documentation) and
2. bring the API in line with other Ray APIs like `get_runtime_context` as well as Ray Data's `DataContext`.

The API changes are:
```
from ray import air, train

# Ray Train methods and classes:

air.session.report -> train.report
air.session.get_dataset_shard -> train.get_dataset_shard
air.session.get_checkpoint -> train.get_checkpoint
air.Checkpoint -> train.Checkpoint
air.Result -> train.Result

# Ray Train configurations:

air.config.CheckpointConfig -> train.CheckpointConfig
air.config.FailureConfig -> train.FailureConfig
air.config.RunConfig -> train.RunConfig
air.config.ScalingConfig -> train.ScalingConfig

# Ray TrainContext methods:

air.session.get_experiment_name -> train.get_context().get_experiment_name
air.session.get_trial_name -> train.get_context().get_trial_name
air.session.get_trial_id -> train.get_context().get_trial_id
air.session.get_trial_resources -> train.get_context().get_trial_resources
air.session.get_trial_dir -> train.get_context().get_trial_dir
air.session.get_world_size -> train.get_context().get_world_size
air.session.get_world_rank -> train.get_context().get_world_rank
air.session.get_local_rank -> train.get_context().get_local_rank
air.session.get_local_world_size -> train.get_context().get_local_world_size
air.session.get_node_rank -> train.get_context().get_node_rank

del air
```

These changes are ready to try out with https://github.com/ray-project/ray/pull/36706 and we encourage user feedback on the changes.
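Because every rename in this section maps one dotted name to another, the migration can in principle be applied mechanically. The following is a minimal sketch of such a rewrite using only the Python standard library; it is not a tool shipped with Ray. The helper name `migrate_source` and the module-level tables are hypothetical, and the sketch assumes user code spells the old names as `air.…` or `ray.air.…`, exactly as in the mapping listed in this section.

```python
import re

# Direct renames, copied from the mapping in this REP.
DIRECT_RENAMES = {
    "air.session.report": "train.report",
    "air.session.get_dataset_shard": "train.get_dataset_shard",
    "air.session.get_checkpoint": "train.get_checkpoint",
    "air.Checkpoint": "train.Checkpoint",
    "air.Result": "train.Result",
    "air.config.CheckpointConfig": "train.CheckpointConfig",
    "air.config.FailureConfig": "train.FailureConfig",
    "air.config.RunConfig": "train.RunConfig",
    "air.config.ScalingConfig": "train.ScalingConfig",
}

# air.session getters that move onto the object returned by train.get_context().
CONTEXT_GETTERS = [
    "get_experiment_name", "get_trial_name", "get_trial_id",
    "get_trial_resources", "get_trial_dir", "get_world_size",
    "get_world_rank", "get_local_rank", "get_local_world_size",
    "get_node_rank",
]

RENAMES = dict(DIRECT_RENAMES)
for _getter in CONTEXT_GETTERS:
    RENAMES[f"air.session.{_getter}"] = f"train.get_context().{_getter}"

# Match an old name only when it is not the tail of a longer identifier or
# attribute path; an optional leading "ray." is captured and preserved.
_PATTERN = re.compile(
    r"(?<![\w.])(ray\.)?(" + "|".join(re.escape(old) for old in RENAMES) + r")\b"
)


def migrate_source(source: str) -> str:
    """Return `source` with the old AIR names replaced by the new Train names."""
    def _swap(match):
        prefix = match.group(1) or ""
        return prefix + RENAMES[match.group(2)]

    return _PATTERN.sub(_swap, source)
```

This sketch leaves import statements untouched, so a line such as `from ray import air` still has to be changed to import `train` by hand, and code that imports `session` directly is not covered.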

## Open Questions

We are likely going to remove `PredictorWrapper` and `PredictorDeployment` and migrate the examples to use Ray Serve deployments
directly, and we are also likely going to move `air.integrations` to `train.integrations`.

For the `PredictorDeployment` removal, the user code will change from
```python
from ray import serve
from ray.serve import PredictorDeployment
from ray.serve.http_adapters import pandas_read_json
from ray.train.xgboost import XGBoostPredictor

# checkpoint = ...

serve.run(
    PredictorDeployment.options(name="XGBoostService").bind(
        XGBoostPredictor, checkpoint, http_adapter=pandas_read_json
    )
)
```
to
```python
import pandas as pd
from starlette.requests import Request
from ray import serve
from ray.train.xgboost import XGBoostTrainer

# checkpoint = ...

@serve.deployment
class XGBoostService:
    def __init__(self, checkpoint):
        self.model = XGBoostTrainer.get_model(checkpoint)

    async def __call__(self, http_request: Request):
        input = await http_request.body()
        data = pd.read_json(input.decode(), **http_request.query_params)
        return self.model.predict(data)

serve.run(XGBoostService.bind(checkpoint))
```

This is almost as simple but a lot more explicit, removes the magic, and can
be easily adapted to different settings. Furthermore it is more unified with
the Ray Serve documentation and the way Ray Serve is typically used.

## Internal changes

As part of this effort, we also recommend removing the `air` namespace
completely for internal use (just to make things clearer
for developers). This work does not need to be tied to a specific release.
Here is an idea of where things could go:

- `air.examples` -- don't have the examples in the source tree, instead put
  them into the `ray/doc` folder
- `air.execution` -- due to the layering of Tune depending on Train but not
  the other way around, most likely `train._internal` is the right place for these.
- `air.util` -- the tensor extension functionality should go into `ray.data`,
  the torch related functions into `ray.train.torch`.

If there are any other common internal utilities that are unaccounted for,
most likely `train._internal` is a good place to put them.

## Migration Plan

We acknowledge that these kinds of API changes are very taxing on our users, and we took special care that the migration can be done
easily as a simple text substitution without needing large changes to existing code bases. To enable a smooth migration, both APIs will
work in the Ray 2.7 release.

Examples and documentation will be fully converted by Ray 2.7, and the old versions of the APIs will print deprecation warnings together
with instructions on how the user code needs to be upgraded.
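
As an illustration of the "old API keeps working but warns" behaviour described in this section, here is a minimal sketch of a deprecation shim built on the standard `warnings` module. It is not Ray's actual implementation: `deprecated_alias`, `report` and `old_report` are hypothetical stand-ins, and only the old and new names quoted in the warning message come from this REP.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name, new_name):
    """Wrap `new_func` so calls made through the old name still work but warn."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"`{old_name}` is deprecated; use `{new_name}` instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return new_func(*args, **kwargs)

    return wrapper


# Hypothetical stand-in for the new API; it simply returns a copy of the metrics.
def report(metrics):
    return dict(metrics)


# The old entry point forwards to the new one and tells the user how to upgrade.
old_report = deprecated_alias(report, "air.session.report", "train.report")
```

Forwarding to the new function keeps a single implementation, so both spellings behave the same during the transition release and the old one can later be removed without touching the logic.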
