Best practice for large numbers of images per item (>1000) #1357
That's a really interesting question and I haven't heard of it before, so this is my speculation. I think you are right that all the frames belong under a single item.
That seems like the right approach to me, but it depends on how you expect people to use the data:
I am wondering what you currently think would be in the STAC asset metadata that the user would be interested in. Is it something they would be interested in when searching the catalog, or just something they need in order to access the data or once they have loaded it?
This is super interesting to think through! I really appreciate you taking the time to write it up! I should also mention that people are more than happy to talk about questions like this at the biweekly STAC Community Meetings.
Ok, so is your current need to list every asset individually driven by some of your data products using certain frames and others using different frames? Are there multiple products derived from any given frame?
Zooming out: if you are worried about response size and you already have an index file, then this is starting to sound a bit like stac-geoparquet (if you squint). I am wondering if you could look to that work for inspiration? From what I've seen, stac-geoparquet has been pretty focused on item collections, whereas what you are talking about is more of a per-item asset collection. In this setup you would have one parquet file per item, with one row per asset, and it would hold the information from the index file as well as all the STAC asset information.
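To make the "one row per asset" idea concrete, here is a minimal sketch of what those rows might look like. It is only an illustration: the index column names, the `href` pattern, the bucket name, and the `image/tiff` media type are all assumptions rather than anything from the actual product, and the rows are built with the standard library only, not written to Parquet.

```python
import csv
import io

# Hypothetical index file: one row per frame (column names are assumptions).
INDEX_CSV = """frame,acquired,exposure_ms
frame1,2024-01-01T10:00:00Z,2.5
frame2,2024-01-01T10:00:01Z,2.5
"""


def asset_rows(item_id, index_csv, href_template):
    """Return one row per asset, merging index metadata with STAC asset fields."""
    rows = []
    for record in csv.DictReader(io.StringIO(index_csv)):
        row = {
            "item_id": item_id,
            "asset_key": record["frame"],
            "href": href_template.format(frame=record["frame"]),
            "type": "image/tiff",  # assumed media type for the frames
            "roles": "data",
        }
        # Carry the per-frame index columns alongside the asset fields.
        row.update({k: v for k, v in record.items() if k != "frame"})
        rows.append(row)
    return rows


rows = asset_rows("ABC123", INDEX_CSV, "s3://example-bucket/ABC123/{frame}.tif")
```

Rows shaped like this could then be handed to a Parquet writer of your choice; that step is left out here because it depends on the library you settle on.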
Not so much that - it's that this is a low-level data product intended to give a user something close to the raw data, and in some cases, that's a lot of frames!
On the processing side of things, there's a separate database with its own structure to handle the linking between the various processing outputs; at that level, we've not found a need to be as verbose as STAC is. The STAC comes in when it's packaged up as a data product to go into the customer-facing catalog. The data processing side packages up the data at a few different processing levels into STAC items, and hands these over to the customer-facing API team. Each STAC item is then used in the STAC API that faces customers for data exploration and ordering, to feed our own frontend (via the API), to ship data to customers when they order it (order item ABC123, we send you all of the assets linked to that item), etc.
Having had a chat internally today, with this discussion in mind, the approach we're planning to take is to zip up all the frames for a specific item into a single asset on that data processing side, and to then ship that zip to customers. The thoughts are:
Zips only come in as a convenient data container that everything under the sun can deal with - the compression isn't going to do anything to the frames. It's more .tar than .tar.gz.
On GeoParquet, that might be something to consider if we were using our STAC API to provide direct data access as well. In that hypothetical world, I think my main concern would be the barrier to entry for some of our data consumers, particularly if they're using existing third-party software.
The index file is currently just a CSV, one row per frame, to keep things nice and simple. That concept came from how other satellite data providers have presented similar products - just roll a die to choose between CSV, XML, or JSON.
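For anyone following along, a rough sketch of the "zip as a plain container" idea using Python's standard `zipfile` module with `ZIP_STORED`, which stores members without compressing them. The file names, the index columns, and the dummy frame bytes are placeholders, not our real layout.

```python
import io
import zipfile


def pack_frames(frames, index_csv):
    """Pack frames and the index into an uncompressed zip held in memory."""
    buffer = io.BytesIO()
    # ZIP_STORED writes members without compression, so the zip is only a container.
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_STORED) as archive:
        archive.writestr("index.csv", index_csv)
        for name, data in frames.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Placeholder frame contents and index rows (assumptions for illustration).
frames = {"frame1.tif": b"\x00" * 1024, "frame2.tif": b"\x01" * 1024}
payload = pack_frames(frames, "frame,filename\nframe1,frame1.tif\nframe2,frame2.tif\n")

# Read the archive back to confirm the members and that nothing was compressed.
with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    names = archive.namelist()
    stored = all(info.compress_type == zipfile.ZIP_STORED for info in archive.infolist())
```

Skipping compression keeps packaging cheap, on the assumption that the frames would not shrink meaningfully anyway.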
Hi all,
I'm currently considering the problem of how best to arrange STAC metadata for a data product with a large number (25-1500) of images. It's a low-level data product providing the consumer with the individual frames from a frame camera system during a single acquisition, and so it feels like they belong together as an "item" (this is also how we intend to distribute them, not as individual frames).
The challenging part is how (if at all) to represent them as assets. Things which make that a bit difficult are:
frame1 present, frame2 missing, then frame3 present wouldn't be very friendly, but frame1, frame2, and no further frames is OK.
We do already have an "index" file giving further metadata on a per-frame basis, and one option could be to push everything to do with the frames down to that level, but a data consumer wouldn't then be able to rely on the STAC metadata.
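To illustrate the two options being weighed, here is a rough sketch comparing an item that lists every frame as its own asset with one that only points at the index file. The dicts are deliberately minimal and are not complete, valid STAC Items; the asset keys, `href` layout, and `image/tiff` media type are assumptions for illustration.

```python
import json


def build_item(item_id, frame_count, per_frame_assets):
    """Build a minimal STAC-like item dict (not a complete, validated STAC Item)."""
    assets = {
        "index": {"href": f"{item_id}/index.csv", "type": "text/csv", "roles": ["metadata"]}
    }
    if per_frame_assets:
        # One asset per frame, numbered contiguously from frame1.
        for n in range(1, frame_count + 1):
            assets[f"frame{n}"] = {
                "href": f"{item_id}/frame{n}.tif",
                "type": "image/tiff",  # assumed media type
                "roles": ["data"],
            }
    return {"type": "Feature", "id": item_id, "assets": assets}


verbose = build_item("ABC123", 1500, per_frame_assets=True)
compact = build_item("ABC123", 1500, per_frame_assets=False)

# Serialised size gives a rough feel for the API response cost of each option.
verbose_bytes = len(json.dumps(verbose))
compact_bytes = len(json.dumps(compact))
```

Comparing `verbose_bytes` with `compact_bytes` shows how quickly the item grows when every frame is an asset, which is the cost to weigh against consumers being able to rely on the STAC metadata alone.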
Are there any examples of how others have approached this sort of problem?