Overview
This design introduces tiered storage support in Osiris, allowing stream segment files and index files to be offloaded from local disk to alternative storage backends. The write path is unchanged: writes continue to operate exactly as before. The read path is extended to retrieve segment data from multiple sources, including both local disk and cold storage. The system is designed with a pluggable architecture, enabling integration with arbitrary cold storage backends.
Goals
- Enable storage tiering for stream segment and index files
- Minimize changes to existing Osiris segment file management, writing and reading logic
- Provide native support for tiered storage for local mounted drives
- Provide pluggable support for tiered storage for other storage backends (cloud storage, future extensions)
- Maintain a per-stream manifest file that tracks the location and tier of each segment (and index) file, along with metadata such as file size and age
- Support more than one level of tiered storage via a simple tier index
- Writes and segment data replication between members remain unchanged
Terminology
- Drive: A pluggable module implementing storage access. Could represent local disk, cloud storage, mounted NAS, EBS, or any other cold storage layer.
- Manifest: A binary Erlang term describing which drive each segment is located on, along with metadata such as offset, size, timestamp, and tier level.
Architecture
Manifest File
Each stream has a corresponding manifest file on disk, encoded as an Erlang term. It is stored and updated locally by the node and can be replicated using RabbitMQ’s existing distribution mechanisms.
Example structure:
#{
<<"segment-00001">> =>
#{offset => 0,
drive => local,
path => "/var/lib/rabbitmq/segments/segment-00001",
size => 52428800, %% in bytes
timestamp => 1712500000, %% epoch seconds
tier => 0},
<<"segment-00002">> =>
#{offset => 50000,
drive => s3,
path => "s3://rabbitmq-bucket/streams/segment-00002",
size => 52428800,
timestamp => 1712503600,
tier => 1}
}.
Manifest Module
A new osiris_manifest module will:
- Load and cache the manifest on stream startup?
- Allow efficient lookups of segment metadata by name
- Provide functions to update metadata entries
- Serialize and flush the manifest to disk
API Sketch:
-spec get(SegmentId) -> SegmentMeta.
-spec update(SegmentId, SegmentMeta) -> ok.
-spec delete(SegmentId) -> ok.
-spec all() -> #{SegmentId => SegmentMeta}.
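As an illustration of how these functions could be backed, here is a minimal sketch using a plain map serialized with term_to_binary (the arities, file layout, and atomic-rename approach are assumptions, not the final design):

```erlang
%% Hypothetical sketch of osiris_manifest; names and layout are
%% illustrative only. The manifest is a map from segment id to
%% metadata, serialized as a binary Erlang term.
-module(osiris_manifest).
-export([load/1, get/2, update/3, flush/2]).

%% Load the manifest term from disk, or start empty if none exists.
load(Path) ->
    case file:read_file(Path) of
        {ok, Bin} -> binary_to_term(Bin);
        {error, enoent} -> #{}
    end.

get(SegmentId, Manifest) ->
    maps:get(SegmentId, Manifest).

update(SegmentId, SegmentMeta, Manifest) ->
    Manifest#{SegmentId => SegmentMeta}.

%% Serialize and flush atomically: write to a temporary file,
%% then rename it over the previous manifest.
flush(Path, Manifest) ->
    Tmp = Path ++ ".tmp",
    ok = file:write_file(Tmp, term_to_binary(Manifest)),
    ok = file:rename(Tmp, Path).
```

The write-temp-then-rename pattern keeps the on-disk manifest consistent even if the node crashes mid-flush.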
Drive Abstraction
Drives implement a behaviour for reading segments, given a manifest entry. Example callbacks:
-callback open(File, Opts) -> {ok, io_device()} | {error, term()}.
-callback read_file_info(File) -> {ok, FileInfo} | {error, Reason}.
This approach allows Osiris to keep using the file module just as it does today, regardless of where the io_device fetches the data from.
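As an illustration, a local-disk drive could simply delegate to the standard file module (the module names, including the osiris_drive behaviour module, are assumptions):

```erlang
%% Hypothetical local-disk implementation of the drive behaviour;
%% it delegates straight to the standard file module.
-module(osiris_drive_local).
-behaviour(osiris_drive). %% assumed behaviour module name
-export([open/2, read_file_info/1]).

open(File, Opts) ->
    file:open(File, Opts).

read_file_info(File) ->
    file:read_file_info(File).
```

A cloud-storage drive would implement the same callbacks but return an io_device backed by, for example, a local cache of the remote object.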
Storage Policy
- New messages are always written to Tier 0.
- Older, less-accessed segments migrate to higher tiers based on retention configuration.
- This can start out reusing the current max-age and max-length-bytes settings, but moving the affected segment files one tier up instead of deleting them.
- Reads check the manifest for the segment’s tier and retrieve data accordingly.
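The age-based part of the policy could be sketched as follows, reusing a max-age style threshold to select the segments that should move one tier up (the function name and the use of manifest fields here are assumptions):

```erlang
%% Hypothetical selection of segments due for migration: any segment
%% currently on the given tier whose timestamp (epoch seconds) is
%% older than MaxAge seconds.
segments_to_migrate(Manifest, Tier, MaxAge, NowSecs) ->
    maps:filter(
      fun(_SegmentId, #{tier := T, timestamp := Ts}) ->
              T =:= Tier andalso NowSecs - Ts > MaxAge;
         (_SegmentId, _Meta) ->
              false
      end, Manifest).
```

Migration itself would then copy each selected file to the next tier's drive, update its manifest entry (drive, path, tier), and delete the local copy.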
Integration with Osiris
- When Osiris opens a segment, it asks the manifest for the path and drive type.
- Osiris uses the drive module to open the file and read from it.
- Writing remains unchanged: writes go to local disk, and segments may be offloaded to higher tier drives based on retention settings.
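Put together, the read-path glue could look roughly like this (the drive-tag-to-module mapping and open options are assumptions):

```erlang
%% Hypothetical read path: look up the segment in the manifest,
%% resolve the drive tag to its module, and open the file through it.
open_segment(SegmentId, Manifest) ->
    #{drive := Drive, path := Path} = maps:get(SegmentId, Manifest),
    DriveMod = drive_module(Drive),
    DriveMod:open(Path, [read, raw, binary]).

%% Map the drive tag stored in the manifest to its module.
drive_module(local) -> osiris_drive_local;
drive_module(s3)    -> osiris_drive_s3.
```

Because every drive returns an io_device, the existing segment-reading code downstream of open needs no changes.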
Replication
- The manifest file is updated only by the stream leader
- Manifests are replicated to each member (the mechanism is an open question)
Benefits
- Minimal disruption to existing Osiris write logic
- Extensible to any number of tier levels
- No impact on publishers; only affects stream consumers
- Supports heterogeneous clusters with multiple storage strategies
- Flexible manifest metadata including offset, size, timestamp, and tier level
Open Questions
- Instead of reading from a higher tier, we could ‘move’ segment files back to tier 0 storage when they are read
Future Work
- UI/CLI support for viewing and managing segment tiers