Valkey: Prometheus

Prometheus metrics

👋 Welcome to the Stackhero documentation!

Stackhero offers a ready-to-use Valkey cloud solution that provides a host of benefits, including:

  • Redis Commander web UI included.
  • Unlimited message size and transfers.
  • Effortless updates with just a click.
  • Optimal performance and robust security powered by a private and dedicated VM.

Save time and simplify your life: it only takes 5 minutes to try Stackhero's Valkey cloud hosting solution!

Stackhero lets you retrieve metrics in Prometheus format for each of your services. When returned to Prometheus, these metrics use the valkey_ prefix, which makes them easy to identify and integrate with your monitoring tools.
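To show how the prefix can be used, here is a minimal Python sketch that keeps only the valkey_ samples from a Prometheus text exposition. It assumes you already have the scraped text as a string; fetching it from your Stackhero service is not shown. The sample payload and the function name are hypothetical, and the parser only handles simple sample lines. It is not a full implementation of the Prometheus exposition format.

```python
import re

# Simple Prometheus text-format sample line:
#   metric_name{optional="labels"} value [optional_timestamp]
# Simplification: label values containing "}" are not supported.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?P<labels>\{[^}]*\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+\S+)?$"
)


def valkey_metrics(exposition: str, prefix: str = "valkey_") -> dict[str, float]:
    """Return {name_without_prefix[+labels]: value} for every prefixed sample."""
    result: dict[str, float] = {}
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = SAMPLE_RE.match(line)
        if match is None or not match["name"].startswith(prefix):
            continue  # not a sample line, or not a Valkey metric
        key = match["name"][len(prefix):] + (match["labels"] or "")
        result[key] = float(match["value"])  # float() also accepts NaN/+Inf/-Inf
    return result


# Hypothetical scrape output mixing Valkey and non-Valkey metrics.
SAMPLE = """\
# HELP valkey_connected_clients Number of client connections
# TYPE valkey_connected_clients gauge
valkey_connected_clients 12
valkey_used_memory 1048576
node_load1 0.42
"""

print(valkey_metrics(SAMPLE))  # {'connected_clients': 12.0, 'used_memory': 1048576.0}
```

In a regular setup Prometheus does this parsing itself; a script like this is mostly handy for quick checks.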

Below is a detailed overview of each metric available in Stackhero for Valkey.


  • shutdown_in_milliseconds: The maximum remaining time in milliseconds for replicas to catch up with replication before the shutdown sequence is completed. This field is only present during the shutdown process.

  • connected_clients: The number of client connections (excluding connections from replicas).

  • cluster_connections: An approximation of the number of sockets used by the cluster bus.

  • maxclients: The value of the maxclients configuration directive. It represents the upper limit for the sum of connected_clients, connected_slaves, and cluster_connections.

  • client_recent_max_input_buffer: The largest input buffer size among the currently connected clients.

  • client_recent_max_output_buffer: The largest output buffer size among the currently connected clients.

  • blocked_clients: The number of clients waiting on a blocking call such as BLPOP, BRPOP, BRPOPLPUSH, BLMOVE, BZPOPMIN, or BZPOPMAX.

  • tracking_clients: The number of clients that are currently tracked (CLIENT TRACKING).

  • clients_in_timeout_table: The number of clients in the timeout table.

  • used_memory: The total amount of memory (in bytes) allocated by Valkey using its chosen allocator (be it standard libc, jemalloc, or an alternative such as tcmalloc).

  • used_memory_rss: The number of bytes allocated by Valkey as seen by the operating system (also known as the resident set size).

  • used_memory_peak: The peak memory consumed by Valkey (in bytes).

  • used_memory_peak_perc: used_memory expressed as a percentage of used_memory_peak.

  • used_memory_overhead: The total overhead in bytes allocated by the server for managing its internal data structures.

  • used_memory_startup: The initial amount of memory (in bytes) consumed by Valkey at startup.

  • used_memory_dataset: The size in bytes of the dataset (calculated by subtracting used_memory_overhead from used_memory).

  • used_memory_dataset_perc: The percentage of used_memory_dataset relative to the net memory usage (used_memory minus used_memory_startup).

  • total_system_memory: The total amount of memory (in bytes) available on the Valkey host.

  • used_memory_lua: The number of bytes used by the Lua engine.

  • used_memory_scripts: The number of bytes occupied by cached Lua scripts.

  • maxmemory: The value of the maxmemory configuration directive.

  • maxmemory_policy: The value of the maxmemory-policy configuration directive.

  • mem_fragmentation_ratio: The ratio between used_memory_rss and used_memory. Note that this ratio includes not just fragmentation but also other process overheads (see the allocator_* metrics) along with overheads for code, shared libraries, the stack, etc.

  • mem_fragmentation_bytes: The difference in bytes between used_memory_rss and used_memory. When this value is low (only a few megabytes), a high ratio (for example, 1.5 or above) does not necessarily indicate a problem.

  • allocator_frag_ratio: The ratio between allocator_active and allocator_allocated. This is a measure of true (external) fragmentation (unlike mem_fragmentation_ratio).

  • allocator_frag_bytes: The difference in bytes between allocator_active and allocator_allocated. Refer to the note for mem_fragmentation_bytes.

  • allocator_rss_ratio: The ratio between allocator_resident and allocator_active. This metric often indicates pages that the allocator can soon release back to the OS.

  • allocator_rss_bytes: The difference in bytes between allocator_resident and allocator_active.

  • rss_overhead_ratio: The ratio between used_memory_rss (the process RSS) and allocator_resident. This includes RSS overheads that are not related to the allocator or heap.

  • rss_overhead_bytes: The difference in bytes between used_memory_rss (the process RSS) and allocator_resident.

  • allocator_allocated: The total bytes allocated by the allocator, including internal fragmentation. This value is normally the same as used_memory.

  • allocator_active: The total bytes in the allocator's active pages, including external fragmentation.

  • allocator_resident: The total resident bytes (RSS) in the allocator, including pages that can be released back to the OS (by MEMORY PURGE or inactivity).

  • mem_not_counted_for_evict: The used memory not counted for key eviction. This predominantly includes transient replica and AOF buffers.

  • mem_clients_slaves: The memory used by replica clients. Since replica buffers share memory with the replication backlog, this field might show 0 when replicas do not trigger an increase in memory usage.

  • mem_clients_normal: The memory used by normal clients.

  • mem_cluster_links: The memory used by connections to peers on the cluster bus when cluster mode is active.

  • mem_aof_buffer: The transient memory used for AOF and AOF rewrite buffers.

  • mem_replication_backlog: The memory used by the replication backlog.

  • mem_total_replication_buffers: The total memory consumed for replication buffers.

  • mem_allocator: The memory allocator selected at compile time.

  • active_defrag_running: When active defragmentation is enabled, this metric indicates whether defragmentation is currently active and the CPU percentage it intends to use.

  • lazyfree_pending_objects: The number of objects waiting to be freed lazily (due to operations such as UNLINK or asynchronous FLUSHDB/FLUSHALL).

  • lazyfreed_objects: The number of objects that have been freed lazily.

  • loading: A flag indicating if a dump file is currently being loaded.

  • async_loading: Indicates if the replication dataset is being loaded asynchronously while serving old data. This occurs when repl-diskless-load is enabled and set to swapdb.

  • current_cow_peak: The peak size in bytes of copy-on-write memory during a child fork operation.

  • current_cow_size: The size in bytes of copy-on-write memory during a child fork operation.

  • current_cow_size_age: The age in seconds of the current_cow_size value.

  • current_fork_perc: The percentage progress of the current fork process. For AOF and RDB forks, it represents the percentage of current_save_keys_processed out of current_save_keys_total.

  • current_save_keys_processed: The number of keys processed in the current save operation.

  • current_save_keys_total: The total number of keys at the start of the current save operation.

  • rdb_bgsave_in_progress: A flag indicating that an RDB save is in progress.

  • rdb_last_save_time: The epoch timestamp of the last successful RDB save.

  • rdb_last_bgsave_status: The status of the last RDB save operation.

  • rdb_last_bgsave_time_sec: The duration in seconds of the last RDB save operation.

  • rdb_current_bgsave_time_sec: The duration in seconds of an ongoing RDB save operation, if any.

  • rdb_last_cow_size: The size in bytes of copy-on-write memory during the last RDB save operation.

  • rdb_last_load_keys_expired: The number of volatile keys deleted during the last RDB load.

  • rdb_last_load_keys_loaded: The number of keys loaded during the last RDB load.

  • aof_enabled: A flag indicating that AOF logging is activated.

  • aof_rewrite_in_progress: A flag showing that an AOF rewrite operation is in progress.

  • aof_rewrite_scheduled: A flag indicating that an AOF rewrite operation will be scheduled once an ongoing RDB save is complete.

  • aof_last_rewrite_time_sec: The duration, in seconds, of the last AOF rewrite operation.

  • aof_current_rewrite_time_sec: The duration, in seconds, of an ongoing AOF rewrite operation, if any.

  • aof_last_bgrewrite_status: The status of the last AOF rewrite operation.

  • aof_last_write_status: The status of the last write to the AOF.

  • aof_last_cow_size: The size in bytes of copy-on-write memory during the last AOF rewrite operation.

  • module_fork_in_progress: A flag indicating that a module fork is in progress.

  • module_fork_last_cow_size: The size in bytes of copy-on-write memory during the last module fork operation.

  • aof_current_size: The current size of the AOF file, in bytes.

  • aof_base_size: The AOF file size, in bytes, at the time of the last startup or rewrite.

  • aof_pending_rewrite: A flag indicating that an AOF rewrite operation will be scheduled once the current RDB save completes.

  • aof_buffer_length: The size of the AOF buffer.

  • aof_pending_bio_fsync: The number of fsync jobs pending in the background I/O queue.

  • aof_delayed_fsync: The counter for delayed fsync operations.

  • loading_start_time: The epoch timestamp marking the start of the load operation.

  • loading_total_bytes: The total size of the file being loaded.

  • loading_rdb_used_mem: The memory usage of the server that generated the RDB file at the time of its creation.

  • loading_loaded_bytes: The number of bytes that have already been loaded.

  • loading_loaded_perc: The percentage of the file that has been loaded.

  • loading_eta_seconds: The estimated time in seconds remaining for the load to complete.

  • instantaneous_ops_per_sec: The number of commands processed per second.

  • instantaneous_input_kbps: The network read rate in KB/sec.

  • instantaneous_output_kbps: The network write rate in KB/sec.

  • instantaneous_input_repl_kbps: The network read rate in KB/sec for replication purposes.

  • instantaneous_output_repl_kbps: The network write rate in KB/sec for replication purposes.

  • sync_full: The number of full resynchronizations with replicas.

  • sync_partial_ok: The number of accepted partial resynchronization requests.

  • sync_partial_err: The number of denied partial resynchronization requests.

  • expired_stale_perc: The percentage of keys that have probably expired.

  • expired_time_cap_reached_count: The number of times active expiry cycles have stopped early.

  • expire_cycle_cpu_milliseconds: The cumulative time in milliseconds spent on active expiry cycles.

  • evicted_clients: The number of clients evicted due to the maxmemory-clients limit.

  • pubsub_channels: The total number of pub/sub channels with active client subscriptions.

  • pubsub_patterns: The total number of pub/sub patterns with active client subscriptions.

  • pubsubshard_channels: The total number of pub/sub shard channels with active client subscriptions.

  • latest_fork_usec: The duration in microseconds of the most recent fork operation.

  • migrate_cached_sockets: The number of sockets open for MIGRATE purposes.

  • slave_expires_tracked_keys: The number of keys tracked for expiry purposes (applicable only to writable replicas).

  • active_defrag_hits: The number of value reallocations successfully performed by the active defragmentation process.

  • active_defrag_misses: The number of value reallocations that were aborted by the active defragmentation process.

  • active_defrag_key_hits: The number of keys that were actively defragmented.

  • active_defrag_key_misses: The number of keys that were skipped during the active defragmentation process.

  • tracking_total_keys: The total number of keys being tracked by the server.

  • tracking_total_items: The total number of tracked items (this is the sum of the number of clients per key).

  • tracking_total_prefixes: The number of tracked prefixes in the server's prefix table (only applicable in broadcast mode).

  • role: Returns "master" if the instance is not a replica, or "slave" if it is replicating from a master. Note that a replica may act as a master for another replica (chained replication).

  • master_failover_state: The current state of an ongoing failover, if one exists.

  • master_replid: The replication ID of the Valkey server.

  • master_replid2: The secondary replication ID used for PSYNC after a failover.

  • master_repl_offset: The current replication offset of the server.

  • second_repl_offset: The offset up to which replication IDs are accepted.

  • repl_backlog_active: A flag indicating if the replication backlog is active.

  • repl_backlog_size: The total size in bytes of the replication backlog buffer.

  • repl_backlog_first_byte_offset: The master offset corresponding to the first byte in the replication backlog buffer.

  • repl_backlog_histlen: The size in bytes of data contained in the replication backlog buffer.

  • master_host: The host or IP address of the master instance.

  • master_port: The TCP port on which the master is listening.

  • master_link_status: The status of the link (up or down).

  • master_sync_in_progress: Indicates whether the master is currently syncing with a replica.

  • slave_read_repl_offset: The replication offset up to which data has been read by the replica.

  • slave_repl_offset: The current replication offset of the replica instance.

  • slave_priority: The candidate priority of the instance for failover.

  • slave_read_only: A flag indicating whether the replica is in read-only mode.

  • replica_announced: A flag indicating if the replica has been announced by Sentinel.

  • master_sync_total_bytes: The total number of bytes that need to be transferred during synchronization. This value might be 0 when the size is unknown (for example, when using the repl-diskless-sync configuration directive).

  • master_sync_read_bytes: The number of bytes that have already been transferred.

  • master_sync_left_bytes: The number of bytes remaining to be transferred before synchronization is complete (this value may be negative when master_sync_total_bytes is 0).

  • master_sync_perc: The percentage of bytes transferred (master_sync_read_bytes) from the total (master_sync_total_bytes), or an approximation that uses loading_rdb_used_mem when master_sync_total_bytes is 0.

  • connected_slaves: The number of connected replicas.

  • min_slaves_good_slaves: The number of replicas currently considered good for the purpose of replication.

  • current_eviction_exceeded_time: The time (in milliseconds) since used_memory last exceeded maxmemory.

  • current_active_defrag_time: The time (in milliseconds) since memory fragmentation last exceeded its limit.

  • master_last_io_seconds_ago: The number of seconds since the last interaction with the master.

  • master_sync_last_io_seconds_ago: The number of seconds since the last transfer I/O during a SYNC operation.

  • master_link_down_since_seconds: The number of seconds since the master link went down.

  • total_eviction_exceeded_time: The total time (in milliseconds) that used_memory has been greater than maxmemory since server startup.

  • rdb_changes_since_last_save: The number of changes recorded since the last dump.

  • total_connections_received: The total number of connections accepted since the server started.

  • total_commands_processed: The total number of commands processed by the server.

  • total_net_input_bytes: The total number of bytes read from the network.

  • total_net_output_bytes: The total number of bytes written to the network.

  • total_net_repl_input_bytes: The total number of bytes read from the network for replication purposes.

  • total_net_repl_output_bytes: The total number of bytes written to the network for replication purposes.

  • rejected_connections: The number of connections rejected because the maxclients limit was reached.

  • expired_keys: The total number of key expiration events.

  • evicted_keys: The number of keys evicted due to the maxmemory limit.

  • keyspace_hits: The number of successful lookups of keys in the main dictionary.

  • keyspace_misses: The number of failed lookups of keys in the main dictionary.

  • used_cpu_sys: The system CPU time (in seconds) consumed by Valkey, summing the usage of all threads (main and background).

  • used_cpu_user: The user CPU time (in seconds) consumed by Valkey, summing the usage of all threads.

  • used_cpu_sys_children: The system CPU time (in seconds) consumed by background processes.

  • used_cpu_user_children: The user CPU time (in seconds) consumed by background processes.

  • used_cpu_sys_main_thread: The system CPU time (in seconds) consumed by the main thread of the Valkey server.

  • used_cpu_user_main_thread: The user CPU time (in seconds) consumed by the main thread of the Valkey server.

  • unexpected_error_replies: The number of unexpected error replies, typically arising during AOF loads or replication errors.

  • total_error_replies: The total number of error replies issued. This value includes both errors before command execution (rejected commands) and errors occurring during command execution (failed commands).

  • total_reads_processed: The total number of read events processed.

  • total_writes_processed: The total number of write events processed.

  • io_threaded_reads_processed: The number of read events handled by both the main and I/O threads.

  • io_threaded_writes_processed: The number of write events handled by both the main and I/O threads.

  • dump_payload_sanitizations: The total number of deep integrity validations performed on dump payloads (as configured in sanitize-dump-payload).

  • total_forks: The total number of fork operations since the server started.

  • total_active_defrag_time: The total time (in milliseconds) that memory fragmentation has exceeded the set limit.

  • aof_rewrites: The number of AOF rewrite operations performed since startup.

  • rdb_saves: The number of RDB snapshots performed since startup.
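Several memory metrics in this list are defined in terms of others. The following Python sketch recomputes used_memory_dataset, used_memory_dataset_perc, mem_fragmentation_ratio and mem_fragmentation_bytes from those definitions, then applies the rule of thumb given for mem_fragmentation_bytes (a high ratio matters only when the byte difference is not tiny). The 10 MiB threshold standing in for "a few megabytes", the function names, and the sample values are all hypothetical.

```python
# Hypothetical threshold standing in for "a few megabytes" of fragmentation.
FRAG_BYTES_THRESHOLD = 10 * 1024 * 1024


def memory_breakdown(m: dict[str, float]) -> dict[str, float]:
    """Recompute derived memory metrics from raw values (names without the valkey_ prefix)."""
    used = m["used_memory"]
    rss = m["used_memory_rss"]
    dataset = used - m["used_memory_overhead"]
    net = used - m["used_memory_startup"]  # net memory usage
    return {
        "used_memory_dataset": dataset,
        "used_memory_dataset_perc": 100.0 * dataset / net if net else 0.0,
        "mem_fragmentation_ratio": rss / used if used else 0.0,
        "mem_fragmentation_bytes": rss - used,
    }


def fragmentation_worth_checking(derived: dict[str, float]) -> bool:
    """Flag a high ratio (1.5 or above) only when the absolute difference is large too."""
    return (
        derived["mem_fragmentation_ratio"] >= 1.5
        and derived["mem_fragmentation_bytes"] > FRAG_BYTES_THRESHOLD
    )


sample = {  # made-up values, in bytes
    "used_memory": 100_000_000,
    "used_memory_overhead": 20_000_000,
    "used_memory_startup": 10_000_000,
    "used_memory_rss": 160_000_000,
}
derived = memory_breakdown(sample)
print(derived)
print(fragmentation_worth_checking(derived))  # True
```

Keeping the byte threshold separate from the ratio check avoids alerting on small instances, where a few megabytes of overhead can push the ratio well above 1.5.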
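Some progress metrics in this list are also ratios of other counters. This Python sketch derives master_sync_left_bytes and master_sync_perc from master_sync_total_bytes and master_sync_read_bytes, and current_fork_perc from current_save_keys_processed and current_save_keys_total. How the server approximates master_sync_perc when master_sync_total_bytes is 0 is an assumption here (read bytes divided by loading_rdb_used_mem), and the helper names are hypothetical.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole, or 0.0 when whole is 0."""
    return 100.0 * part / whole if whole else 0.0


def sync_progress(total_bytes: float, read_bytes: float, rdb_used_mem: float) -> tuple[float, float]:
    """Return (master_sync_left_bytes, master_sync_perc) for a syncing replica."""
    left = total_bytes - read_bytes  # negative when total_bytes is 0 (size unknown)
    if total_bytes > 0:
        perc = percent(read_bytes, total_bytes)
    else:
        # Assumption: approximate the total size with loading_rdb_used_mem.
        perc = percent(read_bytes, rdb_used_mem)
    return left, perc


def fork_progress(keys_processed: float, keys_total: float) -> float:
    """current_fork_perc for AOF/RDB forks: processed keys out of total keys."""
    return percent(keys_processed, keys_total)


print(sync_progress(1000, 250, 0))  # (750, 25.0)
print(sync_progress(0, 300, 1200))  # (-300, 25.0)
print(fork_progress(40, 160))       # 25.0
```

The second call shows why master_sync_left_bytes can be negative: with diskless sync the total is reported as 0, so "left" is just the negated number of bytes read.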