
Commit 6db6f94

Introduce file-based database export and import (#758)
## Usage and product changes

Introduce interfaces to export databases into schema definition and data files and to import databases using these files. Database import supports files exported from both TypeDB 2.x and TypeDB 3.x.

Both operations are blocking and may take a significant amount of time to execute for large databases. Use parallel connections to continue operating with the server and its other databases.

Usage examples in Rust:

```rust
// export
let db = driver.databases().get(db_name).await.unwrap();
db.export_to_file(schema_file_path, data_file_path).await.unwrap();

// import
let schema = read_to_string(schema_file_path).unwrap();
driver.databases().import_from_file(db_name2, schema, data_file_path).await.unwrap();
```

Usage examples in Python:

```py
# export
database = driver.databases.get(db_name)
database.export_to_file(schema_file_path, data_file_path)

# import
with open(schema_file_path, 'r', encoding='utf-8') as f:
    schema = f.read()
driver.databases.import_from_file(db_name2, schema, data_file_path)
```

Usage examples in Java:

```java
// export
Database database = driver.databases().get(dbName);
database.exportToFile(schemaFilePath, dataFilePath);

// import
String schema = Files.readString(Path.of(schemaFilePath));
driver.databases().importFromFile(dbName2, schema, dataFilePath);
```

## Implementation

Implemented the updated [protocol](typedb/typedb-protocol#224). Since both operations work with streaming, the implementation is similar to transactions: the behavior is split into file-processing logic and networking (specialized for sync and async modes). The exposed interfaces present only the file-based versions, but additional interfaces for working with streams directly can be added in future updates.

In Rust, paths are accepted as Rust `Path`s. The other languages, which work through the C interface, pass paths as plain strings for transmission through the C layer.

### Database export

Implemented through the `database` interface; accepts two target file paths for export. No specific file naming format is required (so the files are not necessarily `.typeql` or `.typedb`). If either of the target files already exists, an error is returned.

The export operation consists of these steps (sketched in the code below):
* prepare the output files
* open a unidirectional gRPC stream from the server to the client
* "block" on server response listening until an error or a "done" message is received (blocking is implemented through a loop that resolves a `listen` promise provided by the network layer)
* if a schema message arrives, write it to the schema file and flush it right away
* if a data items message arrives, encode the items and write them to the data file
* in case of an error, delete the output files (we own them, since we created them at the beginning)

The network layer is essentially a task that listens on the gRPC stream and forwards the converted messages to the processing loop.
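For illustration, a minimal Rust sketch of this processing loop. `ExportMessage` and the `listen` callback are hypothetical stand-ins for the internal network-layer types (not the driver's actual API), and the data items are assumed to arrive already encoded for brevity:

```rust
use std::{
    fs,
    io::{BufWriter, Write},
    path::Path,
};

// Hypothetical stand-in for the messages forwarded by the network-layer task.
enum ExportMessage {
    Schema(String),      // schema definition text
    Items(Vec<Vec<u8>>), // already-encoded data items
    Done,
    Error(String),
}

fn run_export(schema_path: &Path, data_path: &Path, mut listen: impl FnMut() -> ExportMessage) -> Result<(), String> {
    let result = export_loop(schema_path, data_path, &mut listen);
    if result.is_err() {
        // We created the output files, so we are free to delete them on failure.
        let _ = fs::remove_file(schema_path);
        let _ = fs::remove_file(data_path);
    }
    result
}

fn export_loop(schema_path: &Path, data_path: &Path, listen: &mut impl FnMut() -> ExportMessage) -> Result<(), String> {
    // Prepare the output files; `create_new` fails if either file already exists.
    let open = |path: &Path| {
        fs::OpenOptions::new().write(true).create_new(true).open(path).map(BufWriter::new).map_err(|e| e.to_string())
    };
    let mut schema_file = open(schema_path)?;
    let mut data_file = open(data_path)?;

    // "Block" on server responses until an error or a "done" message arrives.
    loop {
        match listen() {
            ExportMessage::Schema(schema) => {
                schema_file.write_all(schema.as_bytes()).map_err(|e| e.to_string())?;
                schema_file.flush().map_err(|e| e.to_string())?; // flush the schema right away
            }
            ExportMessage::Items(items) => {
                for item in items {
                    data_file.write_all(&item).map_err(|e| e.to_string())?;
                }
            }
            ExportMessage::Done => {
                data_file.flush().map_err(|e| e.to_string())?;
                return Ok(());
            }
            ExportMessage::Error(err) => return Err(err),
        }
    }
}
```

`create_new` mirrors the "error if the target files already exist" rule, and the outer wrapper owns cleanup so that any failure path removes the partially written output.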
### Database import

Implemented through the `database_manager` interface; accepts a database name, a schema definition query string (which can be read from the exported schema file), and an exported data file. Again, there are no file naming requirements.

The import operation consists of these steps (sketched in the code below):
* open the input file
* open a bidirectional gRPC stream between the server and the client and send the initial request carrying the database's name and schema
* eagerly read and decode data items from the input file one by one, buffering up to 250 items (this number can easily be changed)
* once the buffer is full, attempt to send the buffered items: this operation checks for a potential early error signal from the server, then sends the batch and returns to processing the rest of the file
* once the file is fully read, send a "done" message and block until the server responds with either an error or its own "done" message

The network layer consists of a blocking task for client-side requests and a listening task waiting for a one-shot signal from the server (either an error or a "done" message). Errors can be received at any point during processing, while "done" is expected only after a client-side "done" request. When a response is received, it is delivered to either an async or a sync sink, which should be checked before any client-side network operation to ensure proper interruption.
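A minimal Rust sketch of the batching logic, with the file and network layers abstracted behind callbacks. `decode_next_item`, `check_for_server_error`, `send_batch`, `send_done_and_wait`, and the `BATCH_SIZE` constant are illustrative assumptions, not the driver's internal API:

```rust
use std::{fs::File, io::BufReader, path::Path};

const BATCH_SIZE: usize = 250; // items buffered before each send attempt

fn run_import(
    data_path: &Path,
    // Decodes the next item from the file, or returns Ok(None) at end of file.
    mut decode_next_item: impl FnMut(&mut BufReader<File>) -> Result<Option<Vec<u8>>, String>,
    // Returns an error if the server has already reported a failure.
    mut check_for_server_error: impl FnMut() -> Option<String>,
    // Sends one batch of decoded items over the bidirectional stream.
    mut send_batch: impl FnMut(Vec<Vec<u8>>) -> Result<(), String>,
    // Sends the client-side "done" request and blocks for the server's reply.
    send_done_and_wait: impl FnOnce() -> Result<(), String>,
) -> Result<(), String> {
    let mut reader = BufReader::new(File::open(data_path).map_err(|e| e.to_string())?);
    let mut buffer: Vec<Vec<u8>> = Vec::with_capacity(BATCH_SIZE);

    // Eagerly read and decode items, flushing the buffer every BATCH_SIZE items.
    while let Some(item) = decode_next_item(&mut reader)? {
        buffer.push(item);
        if buffer.len() == BATCH_SIZE {
            // An early error from the server interrupts the import before more file work is done.
            if let Some(err) = check_for_server_error() {
                return Err(err);
            }
            send_batch(std::mem::take(&mut buffer))?;
        }
    }

    // Send the remaining items, then "done", and block until the server answers.
    if !buffer.is_empty() {
        if let Some(err) = check_for_server_error() {
            return Err(err);
        }
        send_batch(buffer)?;
    }
    send_done_and_wait()
}
```

The error check before each batch mirrors the "check a potential early error signal" step above; it keeps the client from decoding and sending the rest of a large file after the server has already rejected the import.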
1 parent 5c1454d commit 6db6f94

89 files changed: +2811 -188 lines


.factory/automation.yml

Lines changed: 8 additions & 2 deletions
```diff
@@ -80,11 +80,17 @@ build:
     find "${DOCS_DIRS[@]}" -type f ! -name 'api-reference.adoc' -exec rm -f {} \;
     tool/docs/update.sh
     git add "${DOCS_DIRS[@]}"
-    git diff --exit-code HEAD "${DOCS_DIRS[@]}" || echo "Failed to verify docs files: please update it manually and verify the changes"
+    git diff --exit-code HEAD "${DOCS_DIRS[@]}" || {
+      echo "Failed to verify docs files: please update it manually and verify the changes"
+      exit 1
+    }

     tool/docs/update_readme.sh
     git add .
-    git diff --exit-code || echo "Failed to verify README files: please update it manually and verify the changes"
+    git diff --exit-code || {
+      echo "Failed to verify README files: plese update it manually and verify the changes"
+      exit 1
+    }

 test-rust-unit-integration:
   image: typedb-ubuntu-20.04 # Ubuntu 20.04 has GLIBC version 2.31 (2020) which we should verify to compile against
```

Cargo.lock

Lines changed: 5 additions & 5 deletions
Generated file; diff not rendered by default.

c/Cargo.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -31,7 +31,7 @@ features = {}

 [dependencies.chrono]
 features = ["alloc", "android-tzdata", "clock", "default", "iana-time-zone", "js-sys", "now", "oldtime", "serde", "std", "wasm-bindgen", "wasmbind", "winapi", "windows-link"]
-version = "0.4.40"
+version = "0.4.41"
 default-features = false

 [dependencies.itertools]
```

c/src/connection.rs

Lines changed: 1 addition & 2 deletions
```diff
@@ -19,14 +19,13 @@

 use std::{ffi::c_char, path::Path};

-use itertools::Itertools;
 use typedb_driver::{Credentials, DriverOptions, TypeDBDriver};

 use super::{
     error::{try_release, unwrap_void},
     memory::{borrow, free, string_view},
 };
-use crate::memory::{release, string_array_view};
+use crate::memory::release;

 const DRIVER_LANG: &'static str = "c";
```
c/src/database.rs

Lines changed: 19 additions & 2 deletions
```diff
@@ -17,7 +17,7 @@
  * under the License.
  */

-use std::{ffi::c_char, ptr::addr_of_mut, sync::Arc};
+use std::{ffi::c_char, path::Path, ptr::addr_of_mut, sync::Arc};

 use typedb_driver::{box_stream, info::ReplicaInfo, Database};

@@ -26,7 +26,7 @@ use super::{
     iterator::{iterator_next, CIterator},
     memory::{borrow, borrow_mut, free, release, release_optional, release_string, take_ownership},
 };
-use crate::memory::{decrement_arc, take_arc};
+use crate::memory::{decrement_arc, string_view, take_arc};

 /// Frees the native rust <code>Database</code> object
 #[no_mangle]
@@ -58,6 +58,23 @@ pub extern "C" fn database_type_schema(database: *const Database) -> *mut c_char
     try_release_string(borrow(database).type_schema())
 }

+/// Export a database into a schema definition and a data files saved to the disk.
+/// This is a blocking operation and may take a significant amount of time depending on the database size.
+///
+/// @param database The <code>Database</code> object to export from.
+/// @param schema_file The path to the schema definition file to be created.
+/// @param data_file The path to the data file to be created.
+#[no_mangle]
+pub extern "C" fn database_export_to_file(
+    database: *const Database,
+    schema_file: *const c_char,
+    data_file: *const c_char,
+) {
+    let schema_file_path = Path::new(string_view(schema_file));
+    let data_file_path = Path::new(string_view(data_file));
+    unwrap_void(borrow(database).export_to_file(schema_file_path, data_file_path))
+}
+
 // /// Iterator over the <code>ReplicaInfo</code> corresponding to each replica of a TypeDB cloud database.
 // pub struct ReplicaInfoIterator(CIterator<ReplicaInfo>);
 //
```

c/src/database_manager.rs

Lines changed: 26 additions & 6 deletions
```diff
@@ -17,7 +17,7 @@
  * under the License.
  */

-use std::{ffi::c_char, ptr::addr_of_mut, sync::Arc};
+use std::{ffi::c_char, path::Path, ptr::addr_of_mut, sync::Arc};

 use typedb_driver::{box_stream, Database, TypeDBDriver};

@@ -28,7 +28,7 @@ use super::{
 };
 use crate::{error::try_release_arc, iterator::iterator_arc_next};

-/// An <code>Iterator</code> over databases present on the TypeDB server
+/// An <code>Iterator</code> over databases present on the TypeDB server.
 pub struct DatabaseIterator(CIterator<Arc<Database>>);

 /// Forwards the <code>DatabaseIterator</code> and returns the next <code>Database</code> if it exists,
@@ -38,27 +38,47 @@ pub extern "C" fn database_iterator_next(it: *mut DatabaseIterator) -> *const Da
     unsafe { iterator_arc_next(addr_of_mut!((*it).0)) }
 }

-/// Frees the native rust <code>DatabaseIterator</code> object
+/// Frees the native rust <code>DatabaseIterator</code> object.
 #[no_mangle]
 pub extern "C" fn database_iterator_drop(it: *mut DatabaseIterator) {
     free(it);
 }

-/// Returns a <code>DatabaseIterator</code> over all databases present on the TypeDB server
+/// Returns a <code>DatabaseIterator</code> over all databases present on the TypeDB server.
 #[no_mangle]
 pub extern "C" fn databases_all(driver: *mut TypeDBDriver) -> *mut DatabaseIterator {
     try_release(
         borrow_mut(driver).databases().all().map(|dbs| DatabaseIterator(CIterator(box_stream(dbs.into_iter())))),
     )
 }

-/// Create a database with the given name
+/// Create a database with the given name.
 #[no_mangle]
 pub extern "C" fn databases_create(driver: *mut TypeDBDriver, name: *const c_char) {
     unwrap_void(borrow_mut(driver).databases().create(string_view(name)));
 }

-/// Checks if a database with the given name exists
+/// Create a database with the given name based on previously exported another database's data
+/// loaded from a file.
+/// This is a blocking operation and may take a significant amount of time depending on the database
+/// size.
+///
+/// @param driver The <code>TypeDBDriver</code> object.
+/// @param name The name of the database to be created.
+/// @param schema The schema definition query string for the database.
+/// @param data_file The exported database file path to import the data from.
+#[no_mangle]
+pub extern "C" fn databases_import_from_file(
+    driver: *mut TypeDBDriver,
+    name: *const c_char,
+    schema: *const c_char,
+    data_file: *const c_char,
+) {
+    let data_file_path = Path::new(string_view(data_file));
+    unwrap_void(borrow_mut(driver).databases().import_from_file(string_view(name), string_view(schema), data_file_path))
+}
+
+/// Checks if a database with the given name exists.
 #[no_mangle]
 pub extern "C" fn databases_contains(driver: *mut TypeDBDriver, name: *const c_char) -> bool {
     unwrap_or_default(borrow_mut(driver).databases().contains(string_view(name)))
```

dependencies/typedb/artifacts.bzl

Lines changed: 1 addition & 1 deletion
```diff
@@ -25,7 +25,7 @@ def typedb_artifact():
         artifact_name = "typedb-all-{platform}-{version}.{ext}",
         tag_source = deployment["artifact"]["release"]["download"],
         commit_source = deployment["artifact"]["snapshot"]["download"],
-        commit = "d83f3b2b40c673c30d00b377ce327ac0ff233056"
+        commit = "a45d7b0003bb95e7b36ab097be468acf2398991b"
     )

 #def typedb_cloud_artifact():
```

dependencies/typedb/repositories.bzl

Lines changed: 3 additions & 3 deletions
```diff
@@ -21,19 +21,19 @@ def typedb_dependencies():
     git_repository(
         name = "typedb_dependencies",
         remote = "https://github.com/typedb/typedb-dependencies",
-        commit = "ab777bf067b1930e35146fd8e25a76a4a360aa74", # sync-marker: do not remove this comment, this is used for sync-dependencies by @typedb_dependencies
+        commit = "4ffeaabde31c41cee271cbb563f17168f4229a93", # sync-marker: do not remove this comment, this is used for sync-dependencies by @typedb_dependencies
     )

 def typedb_protocol():
     git_repository(
         name = "typedb_protocol",
         remote = "https://github.com/typedb/typedb-protocol",
-        tag = "3.2.0", # sync-marker: do not remove this comment, this is used for sync-dependencies by @typedb_protocol
+        commit = "f6528beec33d6e3c31cb5dafc30e8f0f097c3f82", # sync-marker: do not remove this comment, this is used for sync-dependencies by @typedb_protocol
     )

 def typedb_behaviour():
     git_repository(
         name = "typedb_behaviour",
         remote = "https://github.com/typedb/typedb-behaviour",
-        commit = "8f9345de853ad7d0ae66e7afefd16be2cfa3dced", # sync-marker: do not remove this comment, this is used for sync-dependencies by @typedb_behaviour
+        commit = "65258a4a3ad80be5918f33b74b1b08e25ee6fd7b", # sync-marker: do not remove this comment, this is used for sync-dependencies by @typedb_behaviour
     )
```

docs/modules/ROOT/partials/java/connection/Database.adoc

Lines changed: 34 additions & 0 deletions
```diff
@@ -27,6 +27,40 @@ Deletes this database.
 database.delete()
 ----

+[#_Database_exportToFile_java_lang_String_java_lang_String]
+==== exportToFile
+
+[source,java]
+----
+void exportToFile(java.lang.String schemaFilePath,
+                  java.lang.String dataFilePath)
+           throws TypeDBDriverException
+----
+
+Export a database into a schema definition and a data files saved to the disk. This is a blocking operation and may take a significant amount of time depending on the database size.
+
+
+[caption=""]
+.Input parameters
+[cols=",,"]
+[options="header"]
+|===
+|Name |Description |Type
+a| `schemaFilePath` a| The path to the schema definition file to be created a| `java.lang.String`
+a| `dataFilePath` a| The path to the data file to be created a| `java.lang.String`
+|===
+
+[caption=""]
+.Returns
+`void`
+
+[caption=""]
+.Code examples
+[source,java]
+----
+database.exportToFile("schema.typeql", "data.typedb")
+----
+
 [#_Database_name_]
 ==== name
```

docs/modules/ROOT/partials/java/connection/DatabaseManager.adoc

Lines changed: 39 additions & 3 deletions
```diff
@@ -16,7 +16,7 @@ java.util.List<Database> all()
         throws TypeDBDriverException
 ----

-Retrieves all databases present on the TypeDB server
+Retrieves all databases present on the TypeDB server.


 [caption=""]
@@ -40,7 +40,7 @@ boolean contains(java.lang.String name)
         throws TypeDBDriverException
 ----

-Checks if a database with the given name exists
+Checks if a database with the given name exists.


 [caption=""]
@@ -72,7 +72,7 @@ void create(java.lang.String name)
      throws TypeDBDriverException
 ----

-Create a database with the given name
+Create a database with the given name.


 [caption=""]
@@ -128,5 +128,41 @@ a| `name` a| The name of the database to retrieve a| `java.lang.String`
 driver.databases().get(name)
 ----

+[#_DatabaseManager_importFromFile_java_lang_String_java_lang_String_java_lang_String]
+==== importFromFile
+
+[source,java]
+----
+void importFromFile(java.lang.String name,
+                    java.lang.String schema,
+                    java.lang.String dataFilePath)
+             throws TypeDBDriverException
+----
+
+Create a database with the given name based on previously exported another database's data loaded from a file. This is a blocking operation and may take a significant amount of time depending on the database size.
+
+
+[caption=""]
+.Input parameters
+[cols=",,"]
+[options="header"]
+|===
+|Name |Description |Type
+a| `name` a| The name of the database to be created a| `java.lang.String`
+a| `schema` a| The schema definition query string for the database a| `java.lang.String`
+a| `dataFilePath` a| The exported database file path to import the data from a| `java.lang.String`
+|===
+
+[caption=""]
+.Returns
+`void`
+
+[caption=""]
+.Code examples
+[source,java]
+----
+driver.databases().importFromFile(name, schema, "data.typedb")
+----
+
 // end::methods[]
```

docs/modules/ROOT/partials/java/transaction/QueryOptions.adoc

Lines changed: 54 additions & 0 deletions
```diff
@@ -82,5 +82,59 @@ a| `includeInstanceTypes` a| Whether to include instance types in ConceptRow ans
 options.includeInstanceTypes(includeInstanceTypes);
 ----

+[#_QueryOptions_prefetchSize_]
+==== prefetchSize
+
+[source,java]
+----
+@CheckReturnValue
+public java.util.Optional<java.lang.Integer> prefetchSize()
+----
+
+Returns the value set for the prefetch size in this ``QueryOptions`` object. If set, specifies the number of extra query responses sent before the client side has to re-request more responses. Increasing this may increase performance for queries with a huge number of answers, as it can reduce the number of network round-trips at the cost of more resources on the server side.
+
+
+[caption=""]
+.Returns
+`public java.util.Optional<java.lang.Integer>`
+
+[caption=""]
+.Code examples
+[source,java]
+----
+options.prefetchSize();
+----
+
+[#_QueryOptions_prefetchSize_int]
+==== prefetchSize
+
+[source,java]
+----
+public QueryOptions prefetchSize(int prefetchSize)
+----
+
+Explicitly set the prefetch size. If set, specifies the number of extra query responses sent before the client side has to re-request more responses. Increasing this may increase performance for queries with a huge number of answers, as it can reduce the number of network round-trips at the cost of more resources on the server side. Minimal value: 1.
+
+
+[caption=""]
+.Input parameters
+[cols=",,"]
+[options="header"]
+|===
+|Name |Description |Type
+a| `prefetchSize` a| Whether to include instance types in ConceptRow answers. a| `int`
+|===
+
+[caption=""]
+.Returns
+`public QueryOptions`
+
+[caption=""]
+.Code examples
+[source,java]
+----
+options.prefetchSize(prefetchSize);
+----
+
 // end::methods[]
```
