Commit 85f36e8

docs: update docs and examples (#141)
* docs: update README
* docs: update User and Contributor Guides
* docs: update SSH examples
* docs: update rest api examples
1 parent 56245a6 commit 85f36e8

5 files changed: +69 -114 lines changed

README.md

Lines changed: 1 addition & 2 deletions
@@ -6,8 +6,7 @@
 
 API to interact with a few AIND databases. We have two primary databases:
 
-1. A document database (DocDB) to store
-   unstructured json documents. The DocDB contains AIND metadata.
+1. A document database (DocDB) to store unstructured JSON documents. The DocDB contains AIND metadata.
 2. A relational database to store structured tables.
 
 ## Installation

docs/source/Contributing.rst

Lines changed: 2 additions & 2 deletions
@@ -32,7 +32,7 @@ For development,
 off of ``main``
 
 Consult the `Branches and Pull Requests <#branches-and-pull-requests>`__
-and `Release Cycles <#release-cycles>`__ for more details.
+and `Release Cycles <#release-cycles>`__ sections for more details.
 
 From the root directory, run:
 
@@ -164,7 +164,7 @@ Hotfixes
 ~~~~~~~~
 
 - A ``hotfix`` branch is created off of ``main``
-- A Pull Request into is ``main`` is opened, reviewed, and merged into
+- A Pull Request into ``main`` is opened, reviewed, and merged into
   ``main``
 - A new ``tag`` with a patch bump is created, and a new ``release`` is
   deployed
Lines changed: 11 additions & 93 deletions
@@ -1,12 +1,16 @@
-Examples - DocDB Direct Connection
-==================================
+Examples - DocDB Direct Connection (SSH)
+========================================
 
-This page provides examples for interact with the Document Database (DocDB)
-using the provided Python client.
+This page provides examples to interact with the Document Database (DocDB)
+using the provided Python client and SSH tunneling.
 
 It is assumed that the required credentials are set in environment.
 Please refer to the User Guide for more details.
 
+.. note::
+
+    It is recommended to use the REST API client instead of the SSH client for ease of use.
+
 
 Querying Metadata
 ~~~~~~~~~~~~~~~~~~~~~~
@@ -43,7 +47,7 @@ Filter Example 1: Get records with a certain subject_id
 
 
 With projection (recommended):
-
+
 .. code:: python
 
     with DocumentDbSSHClient(credentials=credentials) as doc_db_client:
@@ -115,94 +119,8 @@ Aggregation Example 1: Get all subjects per breeding group
             doc_db_client.collection.aggregate(pipeline=agg_pipeline)
         )
         print(f"Total breeding groups: {len(result)}")
-        print(f"First 3 breeding groups and corresponding subjects:")
+        print("First 3 breeding groups and corresponding subjects:")
         print(json.dumps(result[:3], indent=3))
 
 For more info about aggregations, please see MongoDB documentation:
-https://www.mongodb.com/docs/manual/aggregation/
-
-
-Updating Metadata
-~~~~~~~~~~~~~~~~~~~~~~
-
-Below is an example of how to update records in DocDB using ``DocumentDbSSHClient``.
-
-.. code:: python
-
-    import logging
-
-    from aind_data_access_api.document_db_ssh import (
-        DocumentDbSSHClient,
-        DocumentDbSSHCredentials,
-    )
-
-    logging.basicConfig(level="INFO")
-
-    def _process_docdb_record(record: dict, doc_db_client: DocumentDbSSHClient, dryrun: bool) -> None:
-        """
-        Process record. This example updates the data_description.name field
-        if it does not match the record.name field.
-
-        Parameters
-        ----------
-        record : dict
-
-        """
-        _id = record.get("_id")
-        name = record.get("name")
-        location = record.get("location")
-        if _id:
-            if record.get("data_description") and record["data_description"].get("name") != name:
-                # update specific fields(s) only
-                new_fields = {
-                    "data_description.name": name
-                }
-                update_docdb_record_partial(record_id=_id, new_fields=new_fields, doc_db_client=doc_db_client, dryrun=dryrun)
-            # else:
-            #     logging.info(f"Record for {location} does not need to be updated.")
-        else:
-            logging.warning(f"Record for {location} does not have an _id field! Skipping.")
-
-
-    def update_docdb_record_partial(record_id: str, new_fields: dict, doc_db_client: DocumentDbSSHClient, dryrun: bool) -> None:
-        """
-        Update record in docdb by updating specific fields only.
-        Parameters
-        ----------
-        record_id : str
-            The _id of the record to update.
-        new_fields : dict
-            New fields to update. E.g. {"data_description.name": "new_name"}
-
-        """
-        if dryrun:
-            logging.info(f"(dryrun) doc_db_client.collection.update_one (partial): {record_id}")
-        else:
-            logging.info(f"doc_db_client.collection.update_one (partial): {record_id}")
-            response = doc_db_client.collection.update_one(
-                {"_id": record_id},
-                {"$set": new_fields},
-                upsert=False,
-            )
-            logging.info(response.raw_result)
-
-
-    if __name__ == "__main__":
-        credentials = DocumentDbSSHCredentials()  # credentials in environment
-        dryrun = True
-        filter = {"location": {"$regex": ".*s3://aind-open-data.*"}}
-        projection = None
-
-        with DocumentDbSSHClient(credentials=credentials) as doc_db_client:
-            db_name = doc_db_client.database_name
-            col_name = doc_db_client.collection_name
-            # count = doc_db_client.collection.count_documents(filter)
-            # logging.info(f"{db_name}.{col_name}: Found {count} records with {filter}")
-
-            logging.info(f"{db_name}.{col_name}: Starting to scan for {filter}.")
-            records = doc_db_client.collection.find(
-                filter=filter,
-            )
-            for record in records:
-                _process_docdb_record(record=record, doc_db_client=doc_db_client, dryrun=dryrun)
-            logging.info(f"{db_name}.{col_name}:Finished scanning through DocDb.")
+https://www.mongodb.com/docs/manual/aggregation/

docs/source/ExamplesDocDBRestApi.rst

Lines changed: 40 additions & 9 deletions
@@ -1,7 +1,7 @@
 Examples - DocDB REST API
 ==================================
 
-This page provides examples for interact with the Document Database (DocDB)
+This page provides examples to interact with the Document Database (DocDB)
 REST API using the provided Python client.
 
 
@@ -46,7 +46,7 @@ Filter Example 1: Get records with a certain subject_id
 
 
 With projection (recommended):
-
+
 .. code:: python
 
     filter = {"subject.subject_id": "731015"}
@@ -110,13 +110,13 @@ Aggregation Example 1: Get all subjects per breeding group
                 "subject_ids": {"$addToSet": "$subject.subject_id"},
                 "count": {"$sum": 1},
             }
-        }
+        }
     ]
     result = docdb_api_client.aggregate_docdb_records(
         pipeline=agg_pipeline
     )
     print(f"Total breeding groups: {len(result)}")
-    print(f"First 3 breeding groups and corresponding subjects:")
+    print("First 3 breeding groups and corresponding subjects:")
     print(json.dumps(result[:3], indent=3))
 
 For more info about aggregations, please see MongoDB documentation:
@@ -125,7 +125,7 @@ https://www.mongodb.com/docs/manual/aggregation/
 Advanced Example: Custom Session Object
 -------------------------------------------
 
-It's possible to attach a custom Session to retry certain requests errors
+It's possible to attach a custom Session to retry certain requests errors:
 
 .. code:: python
 
@@ -157,6 +157,31 @@ It's possible to attach a custom Session to retry certain requests errors
     ) as docdb_api_client:
         records = docdb_api_client.retrieve_docdb_records(limit=10)
 
+Utility Methods
+---------------
+
+A few utility methods are provided in the :mod:`aind_data_access_api.utils` module
+to help with interacting with the DocDB API.
+
+For example, to fetch records that match any value in a list of subject IDs:
+
+.. code:: python
+
+    from aind_data_access_api.utils import fetch_records_by_filter_list
+
+    records = fetch_records_by_filter_list(
+        docdb_api_client=docdb_api_client,
+        filter_key="subject.subject_id",
+        filter_values=["731015", "741137", "789012"],
+        projection={
+            "name": 1,
+            "location": 1,
+            "subject.subject_id": 1,
+            "data_description.project_name": 1,
+        },
+    )
+    print(f"Found {len(records)} records. First 3 records:")
+    print(json.dumps(records[:3], indent=3))
 
 
 Updating Metadata
@@ -166,7 +191,10 @@ Updating Metadata
 2. **Query DocDB**: Filter for the records you want to update.
 3. **Update DocDB**: Use ``upsert_one_docdb_record`` or ``upsert_list_of_docdb_records`` to update the records.
 
-Please note that records are read and written as dictionaries from DocDB (not Pydantic models).
+.. note::
+
+    Records must be read and written as dictionaries from DocDB (not Pydantic models).
+
 For example, to update the "instrument" and "session" metadata of a record in DocDB:
 
 .. code:: python
@@ -214,7 +242,10 @@ You can also make updates to individual nested fields:
     )
     print(response.json())
 
-Please note that while DocumentDB supports fieldnames with special characters ("$" and "."), they are not recommended.
-There may be issues querying or updating these fields.
+.. note::
+
+    While DocumentDB supports fieldnames with special characters ("$" and "."), they are not recommended.
+    There may be issues querying or updating these fields.
 
-It is recommended to avoid these special chars in dictionary keys, e.g. ``{"abc.py": "data"}`` can be written as ``{"filename": "abc.py", "some_file_property": "data"}`` instead.
+It is recommended to avoid these special chars in dictionary keys. E.g. ``{"abc.py": "data"}`` can be
+written as ``{"filename": "abc.py", "some_file_property": "data"}`` instead.
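
The note above recommends keeping "$" and "." out of dictionary keys. The following is a minimal sketch of how a record could be checked for such keys before it is written. It is not part of this commit or of the library: ``find_special_keys`` and the sample records are hypothetical names used only for illustration.

```python
from typing import Any, List

# Characters the note above recommends avoiding in dictionary keys.
SPECIAL_CHARS = ("$", ".")


def find_special_keys(doc: Any, path: str = "") -> List[str]:
    """Return the paths of all dictionary keys that contain "$" or "."."""
    found: List[str] = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            key_path = f"{path}/{key}" if path else str(key)
            if any(char in str(key) for char in SPECIAL_CHARS):
                found.append(key_path)
            # Recurse into nested values to catch deeper keys.
            found.extend(find_special_keys(value, key_path))
    elif isinstance(doc, list):
        for index, item in enumerate(doc):
            found.extend(find_special_keys(item, f"{path}[{index}]"))
    return found


# Hypothetical records: the first uses a file name as a key,
# the second stores the file name as a value instead.
record = {"name": "example", "files": {"abc.py": "data"}}
safe_record = {
    "name": "example",
    "files": [{"filename": "abc.py", "some_file_property": "data"}],
}

print(find_special_keys(record))
print(find_special_keys(safe_record))
```

Only keys are inspected; values such as ``"abc.py"`` may contain these characters freely, which is why the second record passes the check.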

docs/source/UserGuide.rst

Lines changed: 15 additions & 8 deletions
@@ -8,7 +8,7 @@ with AIND databases.
 We have two primary databases:
 
 1. A `document database (DocDB) <#document-database-docdb>`__ to store
-   unstructured json documents. The DocDB contains AIND metadata.
+   unstructured JSON documents. The DocDB contains AIND metadata.
 2. A `relational database <#rds-tables>`__ to store structured tables.
 
 Document Database (DocDB)
@@ -30,7 +30,7 @@ The DocDB can be accessed through a public read-only REST API or
 through a direct connection using SSH. For a direct connection,
 it is assumed you have the appropriate credentials.
 
-REST API (Read-Only)
+REST API
 ~~~~~~~~~~~~~~~~~~~~~~
 
 1. A GET request to ``https://api.allenneuraldynamics.org/v1/metadata_index/data_assets``
@@ -47,7 +47,7 @@ REST API (Read-Only)
        response = requests.get(URL, params={"filter": json.dumps(filter), "limit": limit})
        print(response.json())
 
-2. Alternatively, we provide a Python client:
+2. **We provide a Python client (recommended):**
 
    .. code:: python
 
@@ -78,7 +78,7 @@ with our document database.
 
 To connect:
 
-1. If provided a temporary SSH password, please first run ``ssh {ssh-username}@{ssh_host}``
+1. If provided a temporary SSH password, please first run ``ssh {ssh_username}@{ssh_host}``
    and set a new password.
 2. Download the full version of `MongoDB Compass <https://www.mongodb.com/try/download/compass>`__.
 3. When connecting, click “Advanced Connection Options” and use the configurations below.
@@ -107,7 +107,7 @@ To connect:
        - SSL/TLS Connection
        - OFF
      * - Proxy/SSH
-       - SSH Tunnel/ Proxy Method
+       - SSH Tunnel/ Proxy Method
        - SSH with Password
      * -
        - SSH Hostname
@@ -119,10 +119,16 @@ To connect:
        - SSH Username
        - ``ssh_username``
      * -
-       - SSH Username
+       - SSH Password
        - ``ssh_password``
-
-4. You should be able to see the home page with the ``metadata-index`` database.
+     * - (Optional) Advanced
+       - Read Preference
+       - Secondary Preferred
+     * -
+       - Replica Set Name
+       - rs0
+
+4. You should be able to see the home page with the ``metadata_index`` database.
    It should have 1 single collection called ``data_assets``.
 5. If provided with a temporary DocDB password, please change it using the embedded
    mongo shell in Compass, and then reconnect.
@@ -180,6 +186,7 @@ To use the client:
 
 RDS Tables
 ------------------
+
 We have some convenience methods to interact with our Relational Database. You can create a client by
 explicitly setting credentials, or downloading from AWS Secrets Manager.
 
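
For reference, the REST API section of UserGuide.rst touched by this commit passes the query as a JSON string in the ``filter`` parameter, alongside ``limit``. The sketch below shows how such a URL could be assembled with the standard library, without sending a request. ``build_query_url`` is a hypothetical helper, the ``limit`` value is an arbitrary illustration, and the encoding is only assumed to be close to what ``requests`` produces for ``params``.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint quoted in the User Guide diff above.
URL = "https://api.allenneuraldynamics.org/v1/metadata_index/data_assets"


def build_query_url(filter_query: dict, limit: int) -> str:
    """Encode a MongoDB-style filter and a limit as query parameters."""
    params = {"filter": json.dumps(filter_query), "limit": limit}
    return f"{URL}?{urlencode(params)}"


# Subject ID taken from the examples in this commit; limit is illustrative.
url = build_query_url({"subject.subject_id": "731015"}, limit=10)
print(url)

# Decoding the query string recovers the original filter.
decoded = parse_qs(urlsplit(url).query)
print(json.loads(decoded["filter"][0]))
```

Serializing the filter with ``json.dumps`` keeps nested operators intact inside a single query parameter.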
