@@ -185,3 +185,59 @@ the interface as describe in the :ref:`Custom Table Provider <io_custom_table_pr
section. This is an advanced topic, but a
`user example <https://github.com/apache/datafusion-python/tree/main/examples/ffi-table-provider>`_
is provided in the DataFusion repository.
+
+ Catalog
+ =======
+
+ A common technique for organizing tables is a three-level hierarchy in which each table belongs
+ to a schema and each schema belongs to a catalog. DataFusion supports this form of organization
+ through the :py:class:`~datafusion.catalog.Catalog`, :py:class:`~datafusion.catalog.Schema`,
+ and :py:class:`~datafusion.catalog.Table` classes. By default, a
+ :py:class:`~datafusion.context.SessionContext` comes with a single Catalog and a single Schema
+ named ``datafusion`` and ``public``, respectively.
+
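+ To make the hierarchy concrete, the stand-alone sketch below models catalogs, schemas, and
+ tables as nested dictionaries and resolves one-, two-, or three-part table names, roughly
+ mirroring how bare (``my_table``) and partially qualified (``my_schema_name.my_table``)
+ references fall back to a session's default catalog and schema. It does not use DataFusion:
+ ``resolve_table`` and the nested-dictionary layout are hypothetical, the defaults are passed
+ in explicitly, and SQL identifier normalization and quoting are ignored.
+
```python
# Toy model of a three-level catalog -> schema -> table hierarchy.
# Not DataFusion's implementation: resolve_table and the nested-dict
# layout are hypothetical, and SQL identifier normalization is ignored.
from typing import Any, Dict

Hierarchy = Dict[str, Dict[str, Dict[str, Any]]]


def resolve_table(
    hierarchy: Hierarchy,
    name: str,
    default_catalog: str,
    default_schema: str,
) -> Any:
    """Resolve 'table', 'schema.table', or 'catalog.schema.table'."""
    parts = name.split(".")
    if len(parts) == 1:
        # Bare name: fall back to both the default catalog and schema.
        catalog, schema, table = default_catalog, default_schema, parts[0]
    elif len(parts) == 2:
        # Partially qualified: only the catalog falls back to the default.
        catalog = default_catalog
        schema, table = parts
    elif len(parts) == 3:
        # Fully qualified: no defaults needed.
        catalog, schema, table = parts
    else:
        raise ValueError(f"invalid table reference: {name!r}")
    try:
        return hierarchy[catalog][schema][table]
    except KeyError:
        raise KeyError(f"table not found: {catalog}.{schema}.{table}") from None


# One catalog containing one schema containing one "table" (a list here).
tables: Hierarchy = {
    "my_catalog_name": {"my_schema_name": {"my_table": [1, 2, 3]}},
}
defaults = {"default_catalog": "my_catalog_name", "default_schema": "my_schema_name"}

print(resolve_table(tables, "my_catalog_name.my_schema_name.my_table", **defaults))  # [1, 2, 3]
print(resolve_table(tables, "my_table", **defaults))  # [1, 2, 3]
```
+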
+ The default implementation stores catalogs and schemas in memory. You can also create
+ additional in-memory catalogs and schemas and register them with a session, as in the
+ following example:
+
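+ .. code-block:: python
+
+     from datafusion import SessionContext
+     from datafusion.catalog import Catalog, Schema
+
+     ctx = SessionContext()
+
+     # Create an empty in-memory catalog and an empty in-memory schema
+     my_catalog = Catalog.memory_catalog()
+     my_schema = Schema.memory_schema()
+
+     # Attach the schema to the catalog under the name "my_schema_name"
+     my_catalog.register_schema("my_schema_name", my_schema)
+
+     # Make the catalog visible to the session as "my_catalog_name"
+     ctx.register_catalog("my_catalog_name", my_catalog)
+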
+ You could then register tables in ``my_schema`` and access them either through the DataFrame
+ API or via SQL queries such as ``SELECT * FROM my_catalog_name.my_schema_name.my_table``.
+
+ User Defined Catalog and Schema
+ -------------------------------
+
+ If the in-memory catalogs are insufficient for your use case, there are two approaches you
+ can take to implement a custom catalog and/or schema. The discussion below describes how to
+ implement a Catalog, but the approach for a Schema is nearly identical.
+
+ DataFusion supports Catalogs written in either Rust or Python. If you write a Catalog in Rust,
+ you will need to export it as a Python library via PyO3. A complete example of a
+ catalog implemented this way is available in the
+ `examples folder <https://github.com/apache/datafusion-python/tree/main/examples/>`_
+ of our repository. Writing catalog providers in Rust can often provide significant
+ performance improvements over the Python-based approach.
+
+ To implement a Catalog in Python, you will need to inherit from the abstract base class
+ :py:class:`~datafusion.catalog.CatalogProvider`. The
+ `unit tests <https://github.com/apache/datafusion-python/tree/main/python/tests>`_
+ include examples of a basic Catalog implemented in Python that simply keeps a dictionary
+ of the registered Schemas.
+
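+ As a rough sketch of the dictionary-based approach, the class below keeps registered schemas in
+ a plain ``dict``. So that it runs without DataFusion installed, it subclasses a hypothetical
+ stand-in, ``CatalogProviderSketch``, rather than :py:class:`~datafusion.catalog.CatalogProvider`.
+ The method names (``schema_names``, ``schema``, ``register_schema``, ``deregister_schema``) are
+ assumptions modeled on the catalog operations described in this section; check the
+ ``CatalogProvider`` API reference for the exact abstract methods and signatures before
+ subclassing the real class.
+
```python
# Sketch of a dictionary-backed catalog provider.
# CatalogProviderSketch is a HYPOTHETICAL stand-in for
# datafusion.catalog.CatalogProvider so this runs without DataFusion;
# its method names are assumptions, not the verified DataFusion API.
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Set


class CatalogProviderSketch(ABC):
    """Hypothetical abstract interface for a catalog of schemas."""

    @abstractmethod
    def schema_names(self) -> Set[str]: ...

    @abstractmethod
    def schema(self, name: str) -> Optional[Any]: ...

    @abstractmethod
    def register_schema(self, name: str, schema: Any) -> None: ...

    @abstractmethod
    def deregister_schema(self, name: str, cascade: bool) -> None: ...


class DictCatalogProvider(CatalogProviderSketch):
    """Keeps registered schemas in a plain dictionary."""

    def __init__(self) -> None:
        self._schemas: Dict[str, Any] = {}

    def schema_names(self) -> Set[str]:
        return set(self._schemas)

    def schema(self, name: str) -> Optional[Any]:
        # Return None for unknown schemas instead of raising.
        return self._schemas.get(name)

    def register_schema(self, name: str, schema: Any) -> None:
        self._schemas[name] = schema

    def deregister_schema(self, name: str, cascade: bool) -> None:
        # This sketch ignores `cascade` and simply removes the entry.
        self._schemas.pop(name, None)


catalog = DictCatalogProvider()
catalog.register_schema("my_schema_name", {"my_table": [1, 2, 3]})
print(sorted(catalog.schema_names()))  # ['my_schema_name']
```
+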
+ One important note for developers: when a Catalog is defined in Python, there are two
+ different ways of accessing it. First, we register the Catalog with a Rust wrapper, which
+ allows any Rust-based code to call the Python functions as necessary. Second, if the user
+ accesses the Catalog via the Python API, we detect this and return the original Python
+ object that implements the Catalog. This distinction matters for developers because we do
+ *not* return a Python wrapper around the Rust wrapper of the original Python object.
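+
+ The toy model below, written in plain Python with hypothetical names (``RustCatalogWrapper``,
+ ``ToySession``), illustrates the two access paths: the session stores a wrapper that other code
+ calls through, but when the user asks for the catalog back, the wrapper is unwrapped and the
+ original Python object is returned, so an identity check with ``is`` succeeds. It only models
+ the behavior described here and is not DataFusion's actual implementation.
+
```python
# Toy model of the two access paths for a Python-defined catalog.
# RustCatalogWrapper and ToySession are HYPOTHETICAL stand-ins; they only
# mimic the behavior described in the text, not DataFusion internals.
from typing import Any, Dict, Set


class MyPythonCatalog:
    """A user-defined catalog implemented in Python."""

    def schema_names(self) -> Set[str]:
        return {"my_schema_name"}


class RustCatalogWrapper:
    """Stands in for the Rust-side wrapper that holds the Python object."""

    def __init__(self, py_catalog: Any) -> None:
        self.py_catalog = py_catalog

    def schema_names(self) -> Set[str]:
        # Engine-side code calls back into the Python implementation.
        return self.py_catalog.schema_names()


class ToySession:
    def __init__(self) -> None:
        self._catalogs: Dict[str, RustCatalogWrapper] = {}

    def register_catalog(self, name: str, catalog: Any) -> None:
        # Path 1: store the catalog behind a wrapper for engine-side use.
        self._catalogs[name] = RustCatalogWrapper(catalog)

    def catalog(self, name: str) -> Any:
        # Path 2: hand the user back the original Python object,
        # not a wrapper around the wrapper.
        return self._catalogs[name].py_catalog


ctx = ToySession()
original = MyPythonCatalog()
ctx.register_catalog("my_catalog_name", original)
print(ctx.catalog("my_catalog_name") is original)  # True
```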