Valkey: Prometheus

Prometheus metrics

👋 Welcome to the Stackhero documentation!

Stackhero offers a ready-to-use Valkey cloud solution that provides a host of benefits, including:

Redis Commander web UI included.

Unlimited message size and transfers.

Effortless updates with just a click.

Optimal performance and robust security powered by a private and dedicated VM.

Save time and simplify your life: it only takes 5 minutes to try Stackhero's Valkey cloud hosting solution!

Stackhero lets you retrieve metrics in Prometheus format for each of your services. Every metric returned to Prometheus carries the valkey_ prefix, which makes Valkey metrics easy to identify and integrate with your monitoring tools.
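
To show how the prefix helps, here is a minimal Python sketch that keeps only the valkey_ metrics from a scrape response and strips the prefix. It assumes the response uses the standard Prometheus text exposition format, with one sample per line and # HELP / # TYPE comment lines. How you fetch the response from Stackhero (URL, authentication) is not covered here. The function parse_valkey_metrics and the SAMPLE payload are hypothetical, and the values in SAMPLE are made up.

```python
# Hypothetical scrape body in the Prometheus text exposition format.
# The values are illustrative only; they are not real Stackhero output.
SAMPLE = """\
# HELP valkey_connected_clients Number of client connections
# TYPE valkey_connected_clients gauge
valkey_connected_clients 12
valkey_used_memory 1048576
valkey_keyspace_hits 900
process_cpu_seconds_total 3.5
"""


def parse_valkey_metrics(text: str, prefix: str = "valkey_") -> dict:
    """Map each prefixed metric name (prefix stripped) to its float value.

    Minimal sketch: skips HELP/TYPE comments, drops any {label="..."} block,
    ignores optional timestamps, and keeps the last sample when a name repeats.
    Label values containing spaces are not supported.
    """
    metrics = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        parts = line.split()
        name = parts[0].split("{", 1)[0]  # remove the label block, if present
        if not name.startswith(prefix):
            continue  # not a Valkey metric
        # parts[1] is the value; parts[2], if present, is a timestamp we ignore
        metrics[name[len(prefix):]] = float(parts[1])
    return metrics


if __name__ == "__main__":
    metrics = parse_valkey_metrics(SAMPLE)
    print(metrics["connected_clients"])  # 12.0
```

For anything beyond a quick check, the official prometheus_client Python package provides a complete parser (prometheus_client.parser.text_string_to_metric_families); this sketch only illustrates the prefix filtering.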

Each metric available on Stackhero for Valkey is described below. Names are listed without the valkey_ prefix; for example, connected_clients is returned to Prometheus as valkey_connected_clients.

shutdown_in_milliseconds: The maximum remaining time in milliseconds for replicas to catch up with replication before the shutdown sequence is completed. This field is only present during the shutdown process.
connected_clients: The number of client connections (excluding connections from replicas).
cluster_connections: An approximation of the number of sockets used by the cluster bus.
maxclients: The value of the maxclients configuration directive. It represents the upper limit for the sum of connected_clients, connected_slaves, and cluster_connections.
client_recent_max_input_buffer: The largest input buffer size among the currently connected clients.
client_recent_max_output_buffer: The largest output buffer size among the currently connected clients.
blocked_clients: The number of clients waiting on a blocking call such as BLPOP, BRPOP, BRPOPLPUSH, BLMOVE, BZPOPMIN, or BZPOPMAX.
tracking_clients: The number of clients that are currently tracked (CLIENT TRACKING).
clients_in_timeout_table: The number of clients in the timeout table.
used_memory: The total amount of memory (in bytes) allocated by Valkey using its chosen allocator (be it standard libc, jemalloc, or an alternative such as tcmalloc).
used_memory_rss: The number of bytes allocated by Valkey as seen by the operating system (also known as the resident set size).
used_memory_peak: The peak memory (in bytes) consumed by Valkey.
used_memory_peak_perc: used_memory expressed as a percentage of used_memory_peak.
used_memory_overhead: The total overhead in bytes allocated by the server for managing its internal data structures.
used_memory_startup: The initial amount of memory (in bytes) consumed by Valkey at startup.
used_memory_dataset: The size in bytes of the dataset (calculated by subtracting used_memory_overhead from used_memory).
used_memory_dataset_perc: The percentage of used_memory_dataset relative to the net memory usage (used_memory minus used_memory_startup).
total_system_memory: The total amount of memory available on the Valkey host.
used_memory_lua: The number of bytes used by the Lua engine.
used_memory_scripts: The number of bytes occupied by cached Lua scripts.
maxmemory: The value of the maxmemory configuration directive.
maxmemory_policy: The value of the maxmemory-policy configuration directive.
mem_fragmentation_ratio: The ratio between used_memory_rss and used_memory. Note that this ratio includes not just fragmentation but also other process overheads (see the allocator_* metrics) along with overheads for code, shared libraries, the stack, etc.
mem_fragmentation_bytes: The difference in bytes between used_memory_rss and used_memory. When this value is low (only a few megabytes), a high ratio (for example, 1.5 or above) does not necessarily indicate a problem.
allocator_frag_ratio: The ratio between allocator_active and allocator_allocated. This is a measure of true (external) fragmentation (unlike mem_fragmentation_ratio).
allocator_frag_bytes: The difference in bytes between allocator_active and allocator_allocated. Refer to the note for mem_fragmentation_bytes.
allocator_rss_ratio: The ratio between allocator_resident and allocator_active. This metric often indicates pages that the allocator can soon release back to the OS.
allocator_rss_bytes: The difference in bytes between allocator_resident and allocator_active.
rss_overhead_ratio: The ratio between used_memory_rss (the process RSS) and allocator_resident. This includes RSS overheads that are not related to the allocator or heap.
rss_overhead_bytes: The difference in bytes between used_memory_rss (the process RSS) and allocator_resident.
allocator_allocated: The total bytes allocated by the allocator, including internal fragmentation. This value is normally the same as used_memory.
allocator_active: The total bytes in the allocator's active pages, including external fragmentation.
allocator_resident: The total resident bytes (RSS) in the allocator, including pages that can be released back to the OS (by MEMORY PURGE or inactivity).
mem_not_counted_for_evict: The used memory not counted for key eviction. This predominantly includes transient replica and AOF buffers.
mem_clients_slaves: The memory used by replica clients. Since replica buffers share memory with the replication backlog, this field might show 0 when replicas do not trigger an increase in memory usage.
mem_clients_normal: The memory used by normal clients.
mem_cluster_links: The memory used by connections to peers on the cluster bus when cluster mode is active.
mem_aof_buffer: The transient memory used for AOF and AOF rewrite buffers.
mem_replication_backlog: The memory used by the replication backlog.
mem_total_replication_buffers: The total memory consumed for replication buffers.
mem_allocator: The memory allocator selected at compile time.
active_defrag_running: When active defragmentation is enabled, this metric indicates whether defragmentation is currently active and the CPU percentage it intends to use.
lazyfree_pending_objects: The number of objects waiting to be freed lazily (due to operations such as UNLINK or asynchronous FLUSHDB/FLUSHALL).
lazyfreed_objects: The number of objects that have been freed lazily.
loading: A flag indicating if a dump file is currently being loaded.
async_loading: Indicates if the replication dataset is being loaded asynchronously while serving old data. This occurs when repl-diskless-load is enabled and set to swapdb.
current_cow_peak: The peak size in bytes of copy-on-write memory during a child fork operation.
current_cow_size: The size in bytes of copy-on-write memory during a child fork operation.
current_cow_size_age: The age in seconds of the current_cow_size value.
current_fork_perc: The percentage progress of the current fork process. For AOF and RDB forks, it represents the percentage of current_save_keys_processed out of current_save_keys_total.
current_save_keys_processed: The number of keys processed in the current save operation.
current_save_keys_total: The total number of keys at the start of the current save operation.
rdb_bgsave_in_progress: A flag indicating that an RDB save is in progress.
rdb_last_save_time: The epoch timestamp of the last successful RDB save.
rdb_last_bgsave_status: The status of the last RDB save operation.
rdb_last_bgsave_time_sec: The duration in seconds of the last RDB save operation.
rdb_current_bgsave_time_sec: The duration in seconds of an ongoing RDB save operation, if any.
rdb_last_cow_size: The size in bytes of copy-on-write memory during the last RDB save operation.
rdb_last_load_keys_expired: The number of volatile keys deleted during the last RDB load.
rdb_last_load_keys_loaded: The number of keys loaded during the last RDB load.
aof_enabled: A flag indicating that AOF logging is activated.
aof_rewrite_in_progress: A flag showing that an AOF rewrite operation is in progress.
aof_rewrite_scheduled: A flag indicating that an AOF rewrite operation will be scheduled once an ongoing RDB save is complete.
aof_last_rewrite_time_sec: The duration, in seconds, of the last AOF rewrite operation.
aof_current_rewrite_time_sec: The duration, in seconds, of an ongoing AOF rewrite operation, if any.
aof_last_bgrewrite_status: The status of the last AOF rewrite operation.
aof_last_write_status: The status of the last write to the AOF.
aof_last_cow_size: The size in bytes of copy-on-write memory during the last AOF rewrite operation.
module_fork_in_progress: A flag indicating that a module fork is in progress.
module_fork_last_cow_size: The size in bytes of copy-on-write memory during the last module fork operation.
aof_current_size: The current size of the AOF file.
aof_base_size: The AOF file size at the time of the last startup or rewrite.
aof_pending_rewrite: A flag indicating that an AOF rewrite operation will be scheduled once the current RDB save completes.
aof_buffer_length: The size of the AOF buffer.
aof_pending_bio_fsync: The number of fsync jobs pending in the background I/O queue.
aof_delayed_fsync: The counter for delayed fsync operations.
loading_start_time: The epoch timestamp marking the start of the load operation.
loading_total_bytes: The total size of the file being loaded.
loading_rdb_used_mem: The memory usage of the server that generated the RDB file at the time of its creation.
loading_loaded_bytes: The number of bytes that have already been loaded.
loading_loaded_perc: The percentage of the file that has been loaded.
loading_eta_seconds: The estimated time in seconds remaining for the load to complete.
instantaneous_ops_per_sec: The number of commands processed per second.
instantaneous_input_kbps: The network read rate in KB/sec.
instantaneous_output_kbps: The network write rate in KB/sec.
instantaneous_input_repl_kbps: The network read rate in KB/sec for replication purposes.
instantaneous_output_repl_kbps: The network write rate in KB/sec for replication purposes.
sync_full: The number of full resynchronisations with replicas.
sync_partial_ok: The number of accepted partial resynchronisation requests.
sync_partial_err: The number of denied partial resynchronisation requests.
expired_stale_perc: The percentage of keys that have probably expired.
expired_time_cap_reached_count: The number of times active expiry cycles have stopped early.
expire_cycle_cpu_milliseconds: The cumulative time in milliseconds spent on active expiry cycles.
evicted_clients: The number of clients evicted due to the maxmemory-clients limit.
pubsub_channels: The total number of pub/sub channels with active client subscriptions.
pubsub_patterns: The total number of pub/sub patterns with active client subscriptions.
pubsubshard_channels: The total number of pub/sub shard channels with active client subscriptions.
latest_fork_usec: The duration in microseconds of the most recent fork operation.
migrate_cached_sockets: The number of sockets open for MIGRATE purposes.
slave_expires_tracked_keys: The number of keys tracked for expiry purposes (applicable only to writable replicas).
active_defrag_hits: The number of value reallocations successfully performed by the active defragmentation process.
active_defrag_misses: The number of value reallocations that were aborted by the active defragmentation process.
active_defrag_key_hits: The number of keys that were actively defragmented.
active_defrag_key_misses: The number of keys that were skipped during the active defragmentation process.
tracking_total_keys: The total number of keys being tracked by the server.
tracking_total_items: The total number of tracked items (this is the sum of the number of clients per key).
tracking_total_prefixes: The number of tracked prefixes in the server's prefix table (only applicable in broadcast mode).
role: Returns "master" if the instance is not a replica, or "slave" if it is replicating from a master. Note that a replica may act as a master for another replica (chained replication).
master_failover_state: The current state of an ongoing failover, if one exists.
master_replid: The replication ID of the Valkey server.
master_replid2: The secondary replication ID used for PSYNC after a failover.
master_repl_offset: The current replication offset of the server.
second_repl_offset: The offset up to which replication IDs are accepted.
repl_backlog_active: A flag indicating if the replication backlog is active.
repl_backlog_size: The total size in bytes of the replication backlog buffer.
repl_backlog_first_byte_offset: The master offset corresponding to the first byte in the replication backlog buffer.
repl_backlog_histlen: The size in bytes of data contained in the replication backlog buffer.
master_host: The host or IP address of the master instance.
master_port: The TCP port on which the master is listening.
master_link_status: The status of the link (up or down).
master_sync_in_progress: Indicates whether the replica is currently synchronising with its master.
slave_read_repl_offset: The replication offset up to which data has been read by the replica.
slave_repl_offset: The current replication offset of the replica instance.
slave_priority: The candidate priority of the instance for failover.
slave_read_only: A flag indicating whether the replica is in read-only mode.
replica_announced: A flag indicating if the replica has been announced by Sentinel.
master_sync_total_bytes: The total number of bytes that need to be transferred during synchronisation. This value might be 0 when the size is unknown (for example, when using the repl-diskless-sync configuration directive).
master_sync_read_bytes: The number of bytes that have already been transferred.
master_sync_left_bytes: The number of bytes remaining to be transferred before synchronisation is complete (this value may be negative when master_sync_total_bytes is 0).
master_sync_perc: The percentage of bytes transferred (master_sync_read_bytes) from the total (master_sync_total_bytes), or an approximation that uses loading_rdb_used_mem when master_sync_total_bytes is 0.
connected_slaves: The number of connected replicas.
min_slaves_good_slaves: The number of replicas currently considered good, that is, replicas whose lag is within the min-replicas-max-lag limit used by the min-replicas-to-write setting.
current_eviction_exceeded_time: The time (in milliseconds) since used_memory last exceeded maxmemory.
current_active_defrag_time: The time (in milliseconds) since memory fragmentation last exceeded its limit.
master_last_io_seconds_ago: The number of seconds since the last interaction with the master.
master_sync_last_io_seconds_ago: The number of seconds since the last transfer I/O during a SYNC operation.
master_link_down_since_seconds: The number of seconds since the master link went down.
total_eviction_exceeded_time: The total time (in milliseconds) that used_memory has been greater than maxmemory since server startup.
rdb_changes_since_last_save: The number of changes recorded since the last dump.
total_connections_received: The total number of connections accepted since the server started.
total_commands_processed: The total number of commands processed by the server.
total_net_input_bytes: The total number of bytes read from the network.
total_net_output_bytes: The total number of bytes written to the network.
total_net_repl_input_bytes: The total number of bytes read from the network for replication purposes.
total_net_repl_output_bytes: The total number of bytes written to the network for replication purposes.
rejected_connections: The number of connections rejected because the maxclients limit was reached.
expired_keys: The total number of key expiration events.
evicted_keys: The number of keys evicted due to the maxmemory limit.
keyspace_hits: The number of successful lookups of keys in the main dictionary.
keyspace_misses: The number of failed lookups of keys in the main dictionary.
used_cpu_sys: The system CPU time (in seconds) consumed by Valkey, summing the usage of all threads (main and background).
used_cpu_user: The user CPU time (in seconds) consumed by Valkey, summing the usage of all threads.
used_cpu_sys_children: The system CPU time (in seconds) consumed by background processes.
used_cpu_user_children: The user CPU time (in seconds) consumed by background processes.
used_cpu_sys_main_thread: The system CPU time (in seconds) consumed by the main thread of the Valkey server.
used_cpu_user_main_thread: The user CPU time (in seconds) consumed by the main thread of the Valkey server.
unexpected_error_replies: The number of unexpected error replies, typically arising during AOF loads or replication errors.
total_error_replies: The total number of error replies issued. This value includes both errors before command execution (rejected commands) and errors occurring during command execution (failed commands).
total_reads_processed: The total number of read events processed.
total_writes_processed: The total number of write events processed.
io_threaded_reads_processed: The number of read events handled by the main thread and the I/O threads.
io_threaded_writes_processed: The number of write events handled by the main thread and the I/O threads.
dump_payload_sanitizations: The total number of deep integrity validations performed on dump payloads (as configured in sanitize-dump-payload).
total_forks: The total number of fork operations since the server started.
total_active_defrag_time: The total time (in milliseconds) that memory fragmentation has exceeded the set limit.
aof_rewrites: The number of AOF rewrite operations performed since startup.
rdb_saves: The number of RDB snapshots performed since startup.
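
As described above, maxclients caps the sum of connected_clients, connected_slaves and cluster_connections. The following minimal Python sketch turns those four metrics into remaining headroom and a usage percentage. The function name connection_headroom is hypothetical, and the inputs are assumed to be the values read from the corresponding valkey_ metrics.

```python
def connection_headroom(connected_clients: int, connected_slaves: int,
                        cluster_connections: int, maxclients: int) -> tuple:
    """Return (free connection slots, percent of maxclients in use)."""
    # maxclients limits the sum of these three connection counts
    in_use = connected_clients + connected_slaves + cluster_connections
    return maxclients - in_use, 100.0 * in_use / maxclients


if __name__ == "__main__":
    # Illustrative numbers only.
    free, used_perc = connection_headroom(9_500, 2, 0, 10_000)
    print(free, round(used_perc, 2))  # 498 95.02
```

When the free count reaches zero, new connections are refused and rejected_connections starts increasing, so alerting on the usage percentage gives earlier warning.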
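
The used_memory_dataset and used_memory_dataset_perc descriptions above are simple arithmetic on other memory metrics. This minimal Python sketch reproduces that arithmetic so the relationship is explicit. The function name dataset_stats is hypothetical, and the zero guard right after startup is a choice made for the sketch, not documented Valkey behaviour.

```python
def dataset_stats(used_memory: int, used_memory_overhead: int,
                  used_memory_startup: int) -> tuple:
    """Return (used_memory_dataset, used_memory_dataset_perc) per the definitions above."""
    dataset = used_memory - used_memory_overhead
    net_usage = used_memory - used_memory_startup  # memory used beyond startup
    # Sketch choice: avoid dividing by zero when nothing was allocated since startup.
    perc = 100.0 * dataset / net_usage if net_usage > 0 else 0.0
    return dataset, perc


if __name__ == "__main__":
    # Illustrative numbers: a dataset of 8,000,000 bytes, about 88.9 % of net usage.
    print(dataset_stats(10_000_000, 2_000_000, 1_000_000))
```

A low percentage means most memory goes to server overhead (client buffers, replication backlog and similar) rather than to your keys.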
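
The maxmemory and mem_not_counted_for_evict metrics above can be combined into a rough measure of how close the instance is to evicting keys. This minimal Python sketch assumes, as the mem_not_counted_for_evict description suggests, that eviction compares used_memory minus mem_not_counted_for_evict against maxmemory. A maxmemory of 0 means no limit is configured. The function name eviction_pressure is hypothetical.

```python
from typing import Optional


def eviction_pressure(used_memory: int, mem_not_counted_for_evict: int,
                      maxmemory: int) -> Optional[float]:
    """Percent of maxmemory consumed by memory that counts towards eviction.

    Returns None when maxmemory is 0, which means no memory limit is set.
    """
    if maxmemory == 0:
        return None
    # Assumption: transient replica/AOF buffers are excluded from the comparison.
    counted = used_memory - mem_not_counted_for_evict
    return 100.0 * counted / maxmemory


if __name__ == "__main__":
    print(eviction_pressure(900, 50, 1_000))  # 85.0
```

Values at or above 100 usually coincide with growing evicted_keys and current_eviction_exceeded_time, depending on maxmemory_policy.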
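
The memory fragmentation metrics above are ratios or differences of a few raw byte counts. The following Python sketch recomputes them from those counts and applies the rule of thumb from the mem_fragmentation_bytes description: a high ratio only matters when the absolute overhead is also large. MemorySample, fragmentation_ratios, fragmentation_is_concerning and the 100 MiB cut-off are hypothetical choices for illustration, not Valkey or Stackhero settings.

```python
from dataclasses import dataclass

# Hypothetical cut-off: below this many bytes of RSS overhead, a high ratio is
# treated as harmless. 100 MiB is an illustrative choice, not a Valkey setting.
MIN_CONCERNING_BYTES = 100 * 1024 * 1024


@dataclass
class MemorySample:
    used_memory: float          # bytes allocated by Valkey's allocator
    used_memory_rss: float      # bytes resident as seen by the OS
    allocator_allocated: float  # bytes allocated, including internal fragmentation
    allocator_active: float     # bytes in active pages, including external fragmentation
    allocator_resident: float   # resident bytes held by the allocator


def fragmentation_ratios(m: MemorySample) -> dict:
    """Recompute the ratios using the definitions given in the metric list."""
    return {
        "mem_fragmentation_ratio": m.used_memory_rss / m.used_memory,
        "allocator_frag_ratio": m.allocator_active / m.allocator_allocated,
        "allocator_rss_ratio": m.allocator_resident / m.allocator_active,
        "rss_overhead_ratio": m.used_memory_rss / m.allocator_resident,
    }


def fragmentation_is_concerning(m: MemorySample, ratio_limit: float = 1.5) -> bool:
    """Flag only when the ratio is high AND the absolute overhead is large."""
    ratio = m.used_memory_rss / m.used_memory
    overhead_bytes = m.used_memory_rss - m.used_memory  # mem_fragmentation_bytes
    return ratio >= ratio_limit and overhead_bytes >= MIN_CONCERNING_BYTES


if __name__ == "__main__":
    MiB = 1024 * 1024
    small = MemorySample(2 * MiB, 6 * MiB, 2 * MiB, 3 * MiB, 5 * MiB)
    print(fragmentation_is_concerning(small))  # False: ratio 3.0 but only 4 MiB
```

Multiplying allocator_frag_ratio, allocator_rss_ratio and rss_overhead_ratio gives used_memory_rss / allocator_allocated. When allocator_allocated equals used_memory, the three ratios therefore break mem_fragmentation_ratio down into true allocator fragmentation, allocator pages not yet returned to the OS, and RSS overhead outside the allocator.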
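
The master_sync_total_bytes, master_sync_read_bytes, master_sync_left_bytes and master_sync_perc descriptions above define how synchronisation progress is reported, including the fallback to loading_rdb_used_mem when the total size is unknown. This minimal Python sketch follows those descriptions. The function name sync_progress is hypothetical, and capping the estimate at 100 is a choice made for the sketch; Valkey's exact rounding or capping may differ.

```python
def sync_progress(master_sync_total_bytes: int, master_sync_read_bytes: int,
                  loading_rdb_used_mem: int) -> tuple:
    """Return (master_sync_left_bytes, master_sync_perc) following the descriptions above."""
    # Negative when the total is unknown and therefore reported as 0.
    left = master_sync_total_bytes - master_sync_read_bytes
    if master_sync_total_bytes > 0:
        perc = 100.0 * master_sync_read_bytes / master_sync_total_bytes
    elif loading_rdb_used_mem > 0:
        # Total unknown (e.g. diskless sync): approximate against the memory the
        # master used when it produced the RDB. Capping at 100 is a sketch choice.
        perc = min(100.0, 100.0 * master_sync_read_bytes / loading_rdb_used_mem)
    else:
        perc = 0.0  # nothing to estimate against
    return left, perc


if __name__ == "__main__":
    print(sync_progress(0, 300, 1_200))  # (-300, 25.0)
```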
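
The replication offset metrics above can be compared to estimate how far a replica is behind and whether the data it still needs is held in the replication backlog. This Python sketch is a rough approximation under stated assumptions: master_repl_offset comes from the master and slave_repl_offset from the replica, and a partial resynchronisation needs the replica's next byte to fall inside the backlog window described by repl_backlog_first_byte_offset and repl_backlog_histlen. The function names are hypothetical.

```python
def replication_lag_bytes(master_repl_offset: int, slave_repl_offset: int) -> int:
    """Bytes of the replication stream the replica has not processed yet."""
    return max(0, master_repl_offset - slave_repl_offset)


def offset_in_backlog(slave_repl_offset: int, repl_backlog_first_byte_offset: int,
                      repl_backlog_histlen: int) -> bool:
    """Rough check that the next byte the replica needs is still in the backlog.

    Sketch only: it ignores replication IDs (master_replid / master_replid2),
    which must also match for a partial resynchronisation to be accepted.
    """
    next_needed = slave_repl_offset + 1
    # One past the last byte held; requesting it means the replica is caught up.
    window_end = repl_backlog_first_byte_offset + repl_backlog_histlen
    return repl_backlog_first_byte_offset <= next_needed <= window_end


if __name__ == "__main__":
    print(replication_lag_bytes(10_000, 9_000), offset_in_backlog(9_000, 5_000, 6_000))  # 1000 True
```

Because the two offsets are scraped from different instances at slightly different times, treat the lag as an estimate. If the replica falls outside the backlog window, a full resynchronisation (counted in sync_full) becomes likely.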
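
keyspace_hits and keyspace_misses are counters that only grow until they are reset (for example after a restart), so a hit ratio is more useful when computed over an interval between two scrapes. This minimal Python sketch does that and treats a decreasing value as a counter reset, similar to how Prometheus handles resets. The function names counter_delta and keyspace_hit_ratio are hypothetical; in PromQL the same idea is usually expressed with rate() over valkey_keyspace_hits and valkey_keyspace_misses.

```python
from typing import Optional


def counter_delta(previous: float, current: float) -> float:
    """Increase of a monotonic counter between two scrapes.

    A drop means the counter restarted from zero, so the current value is
    taken as the increase.
    """
    return current if current < previous else current - previous


def keyspace_hit_ratio(prev_hits: float, prev_misses: float,
                       cur_hits: float, cur_misses: float) -> Optional[float]:
    """Fraction of key lookups that hit during the interval, or None if idle."""
    hits = counter_delta(prev_hits, cur_hits)
    misses = counter_delta(prev_misses, cur_misses)
    lookups = hits + misses
    return hits / lookups if lookups else None


if __name__ == "__main__":
    print(keyspace_hit_ratio(1_000, 200, 1_900, 300))  # 0.9
```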
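
used_cpu_sys and used_cpu_user are cumulative CPU seconds, so CPU usage is the increase between two scrapes divided by the elapsed wall-clock time. This minimal Python sketch shows the calculation. The function name cpu_percent is hypothetical, and it assumes both samples come from the same server run (counter resets are not handled).

```python
def cpu_percent(prev_sys: float, prev_user: float, cur_sys: float,
                cur_user: float, interval_seconds: float) -> float:
    """Average CPU use (percent of one core) between two scrapes.

    used_cpu_sys and used_cpu_user sum all Valkey threads, so the result can
    exceed 100 when several threads are busy. Counter resets are not handled.
    """
    cpu_seconds = (cur_sys - prev_sys) + (cur_user - prev_user)
    return 100.0 * cpu_seconds / interval_seconds


if __name__ == "__main__":
    print(cpu_percent(10.0, 20.0, 10.5, 21.0, 10.0))  # 15.0
```

Using the *_main_thread variants instead isolates the main thread, which is often the more telling figure because command execution happens there.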
