HPE Blog, Poland
PiotrDrag

Key-value store - a perfect choice for object storage


Storing objects as simple key‑value pairs makes it easy to save and retrieve each item by a unique name, keeping the object’s data and its small amount of descriptive information together. This straightforward approach matches how people and applications normally use object storage, focusing on direct saves and reads rather than complex searches. Because the system treats each object independently, it can spread data across many machines to grow smoothly and stay reliable even when parts fail. The result is a storage model that is fast, durable, and easy for developers and operators to work with.

What is a key-value (KV) store

A key-value store is a type of NoSQL database that stores data as simple pairs: a unique key and an associated value. Think of it like a dictionary or hash table where you look up a value by providing its key.

A key-value store uses a simple data model in which each record consists of a unique key and an associated value, enabling straightforward data organization and retrieval. These systems are optimized for fast lookups, providing low-latency GET, PUT, and DELETE operations based on keys. The values they store are highly flexible, capable of holding blobs, JSON documents, binary objects, or structured data without requiring a fixed schema.
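The dictionary-like model above can be sketched in a few lines. This is a minimal in-memory illustration, not any product's API; the class and method names are invented for the example:

```python
# Minimal in-memory key-value store sketch. A real KV store adds
# persistence, replication, and concurrency control on top of this model.
class KVStore:
    def __init__(self):
        self._data = {}  # key -> value; values are schema-free

    def put(self, key, value):
        self._data[key] = value  # insert or overwrite (PUT semantics)

    def get(self, key, default=None):
        return self._data.get(key, default)  # point lookup by key

    def delete(self, key):
        self._data.pop(key, None)  # idempotent delete

store = KVStore()
store.put("user:42", {"name": "Ada", "roles": ["admin"]})  # JSON-like value
store.put("img:1", b"\x89PNG...")                          # binary blob
print(store.get("user:42")["name"])                        # prints "Ada"
```

Note that the same store holds a JSON-like document and a binary blob side by side: the value is opaque to the store, which is exactly the schema flexibility described above.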

Designed for horizontal scalability, many key-value architectures distribute, or "shard," data across multiple nodes to handle growing workloads efficiently. While they excel at rapid direct access, they generally provide minimal query capabilities, lacking built-in support for complex queries, joins, or secondary indexes unless such features are added externally. Additionally, different implementations offer varying consistency and replication models, balancing trade-offs among consistency, availability, and latency—such as choosing between strong and eventual consistency depending on application needs.

The main limitations of key-value stores are that they are not well-suited for complex queries across multiple fields, performing relational joins, or running analytics without adding additional processing layers or complementary systems. Furthermore, their support for range scans and ordered queries can be limited, unless the specific implementation includes mechanisms for maintaining ordered keys.

Key-value stores are commonly used for caching (for example, Redis or Memcached), session stores for web applications, and managing feature flags and configuration data. They’re also ideal for fast lookup tables and user profile storage, serving as metadata and indexing backends for object storage, and supporting high-throughput ingestion and real-time analytics where low-latency key-based access is required.

Common operations:

  • PUT/SET: store or update a value under a key.
  • GET: retrieve the value for a key.
  • DELETE/REMOVE: remove the key and its value.
  • Optional: conditional updates (compare-and-swap), TTL (time-to-live), and atomic counters.
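The optional features in the list — TTL and atomic counters — can be sketched as follows. `TTLStore`, `incr`, and the lazy-expiry-on-read strategy are illustrative choices for this example, not a specific system's behavior:

```python
import threading
import time

# Sketch of optional KV features: TTL (time-to-live) expiry and atomic
# counters. Expired keys are reclaimed lazily, on the next read.
class TTLStore:
    def __init__(self):
        self._data = {}           # key -> (value, expiry_time or None)
        self._lock = threading.Lock()

    def put(self, key, value, ttl=None):
        expiry = time.monotonic() + ttl if ttl is not None else None
        with self._lock:
            self._data[key] = (value, expiry)

    def get(self, key):
        with self._lock:
            item = self._data.get(key)
            if item is None:
                return None
            value, expiry = item
            if expiry is not None and time.monotonic() >= expiry:
                del self._data[key]   # lazy expiry on read
                return None
            return value

    def incr(self, key, delta=1):
        # Atomic counter: the read-modify-write happens under one lock.
        with self._lock:
            value, expiry = self._data.get(key, (0, None))
            value += delta
            self._data[key] = (value, expiry)
            return value
```

This is why KV stores are popular for caching and session data: expiry and counting are one-call operations rather than application-level bookkeeping.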

Key-value (KV) store - an ideal fit for object storage

Object storage maps naturally to a KV model: each object (data + metadata) can be stored and retrieved by a unique key. The simplicity of PUT/GET semantics aligns with common object store operations and supports scale, performance, durability and operational simplicity.

KV stores natively represent the one-to-one mapping required by object storage without forcing a hierarchy or complex schema. Metadata can be embedded in the value or stored alongside as a small side‑value, preserving atomicity of an object’s data+metadata. KV stores are designed to scale horizontally via consistent hashing or range partitioning. That enables even data distribution, automatic re‑sharding, and predictable scale-out for billions of objects. Stateless routing (key-based) simplifies load balancing and node addition/removal.
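The consistent hashing mentioned above can be illustrated with a small hash ring. `HashRing`, the vnode count, and the use of MD5 are illustrative choices for the sketch; production systems differ in hash function and placement policy:

```python
import bisect
import hashlib

# Consistent-hashing sketch: nodes and keys are hashed onto a ring, and a
# key is owned by the next node clockwise. Adding or removing a node only
# remaps a small fraction of keys. Virtual nodes smooth the distribution.
class HashRing:
    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for v in range(vnodes):
                self._ring.append((self._hash(f"{node}#{v}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def node_for(self, key):
        # Find the first ring position at or after the key's hash (wrapping).
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("bucket/object-001"))  # deterministic node choice
```

Because routing depends only on the key, any front-end can compute the owning node without shared state — the stateless routing property mentioned above.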

KV stores provide simple and efficient access paths to data. Object workloads are dominated by direct lookups (PUT/GET/DELETE). KV stores optimize for low-latency point reads and writes, avoiding overhead of secondary indexing or complex query planning. Sequential or range operations are uncommon for object stores; KV systems can omit heavy range-index machinery and optimize for the common case.


Replication and anti-entropy mechanisms in KV systems ensure high availability and durability of objects across failures. Many KV implementations support configurable replication factors, quorum writes/reads, and eventual or strong consistency trade-offs, matching different object-store SLAs. They also support integration of erasure coding and tiering for cost-effective capacity management. Inline compression and deduplication strategies can be applied per-value to reduce footprint, fitting object access patterns.
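The quorum trade-off above is often reasoned about with the Dynamo-style rule: with replication factor N, write quorum W, and read quorum R, every read overlaps the latest successful write when W + R > N. A tiny helper makes the arithmetic explicit (the function name is invented for this sketch):

```python
# Quorum sketch: with N replicas, a write acknowledged by W nodes and a read
# served by R nodes are guaranteed to overlap when W + R > N, so the read
# sees the latest write. Smaller quorums trade consistency for latency.
def is_strongly_consistent(n, w, r):
    return w + r > n

# Typical settings for a 3-way replicated object store:
print(is_strongly_consistent(3, 2, 2))  # True: quorums must overlap
print(is_strongly_consistent(3, 1, 1))  # False: eventual consistency only
print(is_strongly_consistent(3, 3, 1))  # True: slow writes, fast reads
```

Choosing W and R per bucket or per request is how a single KV-backed object store can serve different SLAs from the same replica set.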

KV stores also deliver efficient write/read performance. Log-structured and append-optimized backends (common in KV systems) convert random small writes into sequential disk activity, improving throughput for object ingestion, and their compaction behavior suits immutable or append-only object patterns. Caching layers and local hot-key optimizations reduce read latency for frequently accessed objects.

KV-based object storage is also simple to manage. Lightweight APIs (PUT/GET/DELETE/LIST) mirror KV interfaces, enabling simpler client libraries and lower cognitive load for developers and operators, and key-based routing simplifies monitoring, repair, and rebalancing workflows. KV stores often expose atomic operations (compare-and-swap, conditional puts) which facilitate safe object versioning, optimistic concurrency control, and multipart upload patterns without complex transaction systems. Lightweight metadata can be attached to values or stored in a companion KV entry, and external indexing or secondary services can provide search capabilities without complicating the core object store.
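The compare-and-swap primitive mentioned above can be sketched as a versioned store plus an optimistic retry loop. All names here (`VersionedStore`, `update_with_retry`) are invented for the example:

```python
import threading

# Compare-and-swap (CAS) sketch: each key carries a version number, and an
# update only succeeds if the caller's expected version is still current.
class VersionedStore:
    def __init__(self):
        self._data = {}   # key -> (version, value); version 0 = absent
        self._lock = threading.Lock()

    def get(self, key):
        return self._data.get(key, (0, None))  # (version, value)

    def compare_and_swap(self, key, expected_version, new_value):
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False  # a concurrent writer won; caller must retry
            self._data[key] = (version + 1, new_value)
            return True

def update_with_retry(store, key, fn, attempts=10):
    # Optimistic concurrency: read, transform, CAS; retry on conflict.
    for _ in range(attempts):
        version, value = store.get(key)
        if store.compare_and_swap(key, version, fn(value)):
            return True
    return False
```

This is the pattern behind safe object versioning: no locks are held between read and write, so conflicting writers simply retry instead of blocking each other.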

There are, however, workloads for which KV stores are less ideal. Rich queries across object contents and complex secondary indexing are not a KV strength; they need separate search/index layers. Strongly hierarchical directory semantics (POSIX-like) are better served by file systems or object front-ends that emulate directory behavior. Range-scan-dependent workloads require KV systems that support ordered keys; otherwise additional design is needed.

Log-structured key-value store

A log-structured data layout is a storage design where all updates are written sequentially to an append-only log, rather than overwriting data in place. It treats storage as a continuous stream of immutable records (log entries) and organizes reads/writes around that sequential append pattern.

How it works

  • Writes: New or updated data is appended to the end of the log as a new record. The old copy remains until reclaimed.
  • Indexing: A separate index or memtable maps keys to the log locations (offsets) so reads can find the latest record for a key.
  • Reads: To read a key, the system consults the index to locate the most recent log entry and reads it from disk.
  • Garbage collection/compaction: Because updates create multiple versions, background processes scan the log, identify obsolete records (older versions or deleted items), and rewrite live records into new segments, freeing space from old segments.
  • Checkpointing and snapshots: Periodic checkpoints consolidate state (e.g., flush memtable to disk) so recovery only needs to replay recent log segments.
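The steps above can be condensed into a small Bitcask-style sketch. An in-memory buffer stands in for the append-only log file, and `LogKV` and its record format are invented for the example; real systems add checksums, tombstone records, and segmented logs:

```python
import io
import struct

# Log-structured KV sketch: writes append records to a log, an in-memory
# index maps each key to the offset of its latest record, and compaction
# rewrites only live records into a fresh log to reclaim space.
class LogKV:
    def __init__(self):
        self._log = io.BytesIO()   # stands in for an append-only file
        self._index = {}           # key -> offset of the latest record

    def _append(self, key, value):
        k = key.encode()
        offset = self._log.seek(0, io.SEEK_END)
        # Record layout: 4-byte key length, 4-byte value length, key, value.
        self._log.write(struct.pack(">II", len(k), len(value)) + k + value)
        return offset

    def put(self, key, value):
        self._index[key] = self._append(key, value)  # old copy stays on disk

    def get(self, key):
        offset = self._index.get(key)
        if offset is None:
            return None
        self._log.seek(offset)
        klen, vlen = struct.unpack(">II", self._log.read(8))
        self._log.seek(klen, io.SEEK_CUR)   # skip the key bytes
        return self._log.read(vlen)

    def delete(self, key):
        self._index.pop(key, None)  # a real store appends a tombstone record

    def compact(self):
        # Garbage collection: rewrite only the latest version of each live
        # key into a new log, dropping obsolete versions and freeing space.
        live = {k: self.get(k) for k in self._index}
        self._log, self._index = io.BytesIO(), {}
        for k, v in live.items():
            self.put(k, v)
```

Every `put` is a sequential append, even when it overwrites an existing key — that is the core conversion of random writes into sequential I/O described next.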

Using a log or append‑only approach turns many small random writes into large sequential writes, which are much faster on spinning disks and still beneficial on flash by reducing extra write work and improving throughput. Because data is appended rather than updated in place, the system can sustain very high write rates without complex locking or coordination. The log also provides a clear, durable history of operations that makes crash recovery straightforward—replaying recent entries can rebuild state. This model is especially simple and effective for workloads that treat objects as immutable or follow append‑only patterns.

The log-structured approach brings trade-offs: background garbage collection and compaction can consume CPU and I/O, create extra writes, and need careful tuning, while data fragmentation and temporary multiple versions can lead to extra reads and higher space use until cleanup finishes. Those maintenance tasks can also cause occasional latency spikes and add implementation complexity for efficient, low‑impact compaction and indexing. Despite these costs, this approach is well suited to workloads that demand very high write throughput—such as time‑series data, logs, or ingestion pipelines—and to systems that work with many immutable or append‑only objects or that need fast crash recovery. It also fits designs that can tolerate or hide background compaction effects by using caching, indexing, or other mitigations to manage read amplification.

Common forms and implementations

  • Log-structured file systems (LFS): Store files as sequences of log records; use segment cleaning to reclaim space.
  • Log-structured merge-trees (LSM-trees): Used by RocksDB, LevelDB, Cassandra — memtable + immutable SSTable files appended/compacted in levels (a variant of log-style writes with more structure).
  • Append-only object stores and write-ahead logs: Many databases and storage systems use append-only logs for durability and then compact or checkpoint.
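The LSM-tree pattern from the list can be reduced to a toy sketch, with plain dicts standing in for the memtable and the on-disk SSTables. `MiniLSM` is an invented name and the sketch omits the write-ahead log, bloom filters, and leveled compaction that real engines such as RocksDB use:

```python
# Toy LSM-tree: writes land in an in-memory memtable; when it fills, it is
# flushed as an immutable, sorted SSTable. Reads check the memtable first,
# then SSTables from newest to oldest, so the latest version always wins.
class MiniLSM:
    def __init__(self, memtable_limit=4):
        self._memtable = {}
        self._sstables = []          # immutable sorted tables, newest last
        self._limit = memtable_limit

    def put(self, key, value):
        self._memtable[key] = value
        if len(self._memtable) >= self._limit:
            # Flush: sort and freeze the memtable as an immutable SSTable.
            self._sstables.append(dict(sorted(self._memtable.items())))
            self._memtable = {}

    def get(self, key):
        if key in self._memtable:
            return self._memtable[key]
        for table in reversed(self._sstables):  # newest first
            if key in table:
                return table[key]
        return None
```

The flush step is the "log-style write with more structure": the memtable absorbs random updates in memory, and only sorted, sequential batches ever reach storage.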

In short, a log-structured layout trades in-place updates for sequential appends and background compaction. That yields major write-performance and recovery benefits but requires careful handling of compaction, indexing, and resource management to limit read/write amplification and latency impacts. Combined with a key-value store, it is an ideal solution for use cases where vast amounts of data are stored in object storage, especially workloads that treat objects as immutable or follow append‑only patterns. One such workload is Generative AI.

Key-value-based object storage – a perfect solution for Generative AI Data Management

Generative AI (GenAI) applications are driving unprecedented growth in unstructured data, including text, code, multimedia, and synthetic data, resulting in a soaring demand for new storage capacity. As enterprises scale these applications, they not only generate vast new data but also consume large volumes of input data such as prompts, vectors, and training artifacts. Unlike traditional storage approaches, the nature of GenAI workloads requires storage systems that prioritize rapid data access and scale rather than complex data manipulation. One example of object storage based on a log-structured key-value store is HPE Alletra Storage MP X10000.

Key-value-based object storage — distinguished by its flat namespace architecture — is ideally suited for this purpose, enabling massive scalability to hundreds of billions of objects and offering integrated data intelligence. This intelligence enriches the stored data with custom metadata and enables fine-grained metadata querying, thereby transforming storage from a passive repository into an active platform that optimizes data discovery and retrieval for AI pipelines. Such storage systems eliminate the overhead and complexity of hierarchical file systems, which are less effective for GenAI due to their reliance on features like byte-range locking that GenAI workloads rarely use.

Processing for GenAI primarily occurs on powerful server CPUs and GPUs, with storage serving mainly as a fast, scalable reservoir for raw and preprocessed data, rather than manipulating that data directly. Modern GenAI architectures use intermediate storage such as NVMe SSDs within GPU servers to handle vector representations, while data preprocessing happens in-memory on separate compute nodes using tools like Apache Spark. This separation of computation from storage highlights why key-value object storage (especially those supporting multi-protocol access like S3) is optimal for GenAI use cases, including large context caching needed by advanced language models.

Organizations should consider deploying key-value-based storage with integrated intelligence to improve cost-efficiency and performance for all phases of GenAI workloads, from training to inference and fine-tuning. Furthermore, migrating non-transactional workloads that do not require file locking from traditional file-based systems to intelligent key-value object stores can streamline operations, reduce complexity, and better support the explosive growth in GenAI-generated unstructured data anticipated over the coming decade.

Summary

A key-value store provides a simple, high-performance way to store and retrieve data by key, making it ideal for use cases that need speed, scalability, and operational simplicity. The KV model is an ideal foundational architecture for object storage because it directly reflects the object abstraction (key → immutable payload + metadata), scales horizontally, provides robust durability/availability primitives, and optimizes the common access patterns (direct PUT/GET/DELETE). For use cases emphasizing simple object retrieval, massive scale, and operational efficiency, a KV store offers a concise, performant, and maintainable implementation substrate.

About the author

PiotrDrag

HPE Storage for Unstructured Data and AI Category & Business Development Manager for Central Europe. Passionate about primary storage, data protection, cloud computing, scale-out storage systems, and the Internet of Things.