It occurred to me that we could provide a Prometheus exporter that you could scrape from your own monitoring, whether you run it in a VPS or elsewhere.
For vpsAdmin, it could look something like this:
# GET https://api.vpsfree.cz/metrics?token=<unique access token>
# HELP 0 = online, 1 = down
# TYPE gauge
vpsadmin_node_status{location_id, location_name, node_id, node_name}
# HELP 0 = no, 1 = under maintenance
# TYPE gauge
vpsadmin_node_maintenance_on{location_id, location_name, node_id, node_name}
# HELP CPU idle in percent
# TYPE gauge
vpsadmin_node_cpu_idle_percent{location_id, location_name, node_id, node_name}
# HELP 0 = online, 1 = degraded, ...?
# TYPE gauge
vpsadmin_node_pool_status{location_id, location_name, node_id, node_name, pool_id, pool_name}
# HELP 0 = none, 1 = scrub, 2 = resilver
# TYPE gauge
vpsadmin_node_pool_scan{location_id, location_name, node_id, node_name, pool_id, pool_name}
# HELP Scan progress
# TYPE gauge
vpsadmin_node_pool_scan_percent{location_id, location_name, node_id, node_name, pool_id, pool_name}
# HELP 1 if the VPS is running, 0 if stopped
# TYPE gauge
vpsadmin_vps_running{vps_id}
# HELP Number of seconds since the VPS was started
# TYPE gauge
vpsadmin_vps_boot_time_seconds{vps_id}
# HELP Load averages
# TYPE gauge
vpsadmin_vps_load1{vps_id}
vpsadmin_vps_load5{vps_id}
vpsadmin_vps_load15{vps_id}
# HELP Number of processes
# TYPE gauge
vpsadmin_vps_processes_pids{vps_id}
# TYPE gauge
vpsadmin_vps_memory_used_bytes{vps_id}
# TYPE gauge
vpsadmin_vps_memory_total_bytes{vps_id}
# TYPE gauge
vpsadmin_vps_swap_used_bytes{vps_id}
# TYPE gauge
vpsadmin_vps_swap_total_bytes{vps_id}
# TYPE gauge
vpsadmin_vps_cpu_cores{vps_id}
# TYPE gauge
vpsadmin_vps_cpu_percent{vps_id, mode=user|system|idle}
# TYPE counter
vpsadmin_vps_cpu_nanoseconds_total{vps_id, mode=user|system|idle}
# HELP Number of transferred bytes
# TYPE counter
vpsadmin_vps_transferred_bytes{vps_id, netif_id, netif_name, direction=sent|received, year, month}
# HELP Number of transferred packets
# TYPE counter
vpsadmin_vps_transferred_packets{vps_id, netif_id, netif_name, direction=sent|received, year, month}
# HELP Number of available bytes in a dataset
# TYPE gauge
vpsadmin_dataset_avail_bytes{vps_id, dataset_id, dataset_name}
# HELP Number of used bytes in a dataset
# TYPE gauge
vpsadmin_dataset_used_bytes{vps_id, dataset_id, dataset_name}
# HELP Number of referenced bytes in a dataset
# TYPE gauge
vpsadmin_dataset_referenced_bytes{vps_id, dataset_id, dataset_name}
# HELP Dataset quota in bytes
# TYPE gauge
vpsadmin_dataset_quota_bytes{vps_id, dataset_id, dataset_name}
# HELP Dataset reference quota in bytes
# TYPE gauge
vpsadmin_dataset_refquota_bytes{vps_id, dataset_id, dataset_name}
# HELP Compression ratio of used bytes
# TYPE gauge
vpsadmin_dataset_compressratio{vps_id, dataset_id, dataset_name}
# HELP Compression ratio of referenced bytes
# TYPE gauge
vpsadmin_dataset_refcompressratio{vps_id, dataset_id, dataset_name}
# HELP Number of OOM reports
# TYPE counter
vpsadmin_oom_report_count{vps_id, cgroup, invoked_by_process, killed_process}
# HELP Number of incident reports
# TYPE counter
vpsadmin_incident_report_count{vps_id, codename}
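To make the listing above a bit more concrete: in the real Prometheus text exposition format, every HELP and TYPE line also names the metric, and each label carries a quoted value. Here is a minimal Python sketch of how one of the proposed metrics might be rendered. The helper names (`escape_label_value`, `render_metric`) and the VPS ID 1234 are made up for illustration, and this is not how the exporter would necessarily be implemented.

```python
def escape_label_value(value: str) -> str:
    # The text format requires backslash, double quote and newline
    # to be escaped inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, help_text, metric_type, samples):
    # samples is a list of (labels_dict, value) pairs.
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(str(val))}"' for key, val in labels.items()
        )
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical sample: VPS 1234 is running.
text = render_metric(
    "vpsadmin_vps_running",
    "1 if the VPS is running, 0 if stopped",
    "gauge",
    [({"vps_id": 1234}, 1)],
)
print(text, end="")
```

This would print the HELP and TYPE lines followed by `vpsadmin_vps_running{vps_id="1234"} 1`.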
And for status.vpsf.cz like this:
# GET https://status.vpsf.cz/metrics
# HELP 1 if the service is up, 0 if down
# TYPE gauge
vpsfstatus_vpsadmin_up{service=api|console|webui}
# HELP 1 if vpsAdmin on node is up, 0 if down
# TYPE gauge
vpsfstatus_node_vpsadmin_up{location_id, location_label, node_id, node_name}
# HELP 0 if node is responding to ping, 1 if there is packet loss, 2 if it is not responding
# TYPE gauge
vpsfstatus_node_ping{location_id, location_label, node_id, node_name}
# HELP 1 if node is under maintenance, 0 if not
# TYPE gauge
vpsfstatus_node_maintenance_on{location_id, location_label, node_id, node_name}
# HELP 1 if pool status is known, 0 if not
# TYPE gauge
vpsfstatus_node_pool_up{location_id, location_label, node_id, node_name}
# HELP 0 = online, 1 = degraded
# TYPE gauge
vpsfstatus_node_pool_state{location_id, location_label, node_id, node_name}
# HELP 0 = none, 1 = scrub, 2 = resilver
# TYPE gauge
vpsfstatus_node_pool_scan{location_id, location_label, node_id, node_name}
# HELP Scan progress
# TYPE gauge
vpsfstatus_node_pool_scan_percent{location_id, location_label, node_id, node_name}
# HELP 0 = responding to ping, 1 = packet loss, 2 = not responding
# TYPE gauge
vpsfstatus_dns_resolver_ping{name}
# HELP 1 = DNS lookup operational, 0 = not working
# TYPE gauge
vpsfstatus_dns_resolver_lookup{name}
# HELP 0 = operational, 1 = not working
# TYPE gauge
vpsfstatus_web_service_status{name}
# HELP 0 = responding to ping, 1 = packet loss, 2 = not responding
# TYPE gauge
vpsfstatus_nameserver_ping{name}
# HELP 1 = DNS lookup operational, 0 = not working
# TYPE gauge
vpsfstatus_nameserver_lookup{name}
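As a rough idea of how such output could be consumed without a full Prometheus setup, here is a small Python sketch that parses exposition text and lists nodes whose ping metric is non-zero. The sample text, node names and function names are invented for illustration. Nothing is fetched over the network here, since the endpoint is only a proposal. The parser is deliberately simplistic: it ignores optional timestamps and does not unescape label values.

```python
import re

# Matches "name{labels} value" or "name value"; lines with a trailing
# timestamp are not handled and are simply skipped.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def nodes_with_ping_problems(samples):
    # Per the proposal: 0 = responding, 1 = packet loss, 2 = not responding.
    return [
        labels["node_name"]
        for name, labels, value in samples
        if name == "vpsfstatus_node_ping" and value != 0
    ]


# Hypothetical scrape output with made-up locations and nodes.
SAMPLE_TEXT = """\
# HELP vpsfstatus_node_ping 0 if node is responding to ping, 1 if there is packet loss, 2 if it is not responding
# TYPE vpsfstatus_node_ping gauge
vpsfstatus_node_ping{location_id="1",location_label="loc-a",node_id="10",node_name="node-a"} 0
vpsfstatus_node_ping{location_id="1",location_label="loc-a",node_id="11",node_name="node-b"} 2
"""

problem_nodes = nodes_with_ping_problems(parse_samples(SAMPLE_TEXT))
print(problem_nodes)
```

With the sample above, only `node-b` is reported. In practice you would more likely let Prometheus scrape the endpoint and express the same check as an alerting rule.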
Would anyone have a use for this? Or are there any other metrics you would find useful?