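Monitoring Metrics API
The Monitoring Metrics API provides query, statistics, and analysis operations on system monitoring metrics, and supports retrieving and processing metric data across multiple dimensions.
1. Base Path
The base path for all Monitoring Metrics API endpoints is: /api/v1/metrics
2. Authentication
All Monitoring Metrics API endpoints require authentication with either a JWT token or an API key.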
Request headers:
- JWT Token:
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
- Or API Key:
Authorization: ApiKey ak_1234567890abcdef1234567890abcdef12345678
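The two header forms above can be produced by a small helper. This is a minimal sketch; the function name `build_auth_headers` and the rule that exactly one credential is supplied are assumptions for illustration, not part of the API.

```python
from typing import Dict, Optional


def build_auth_headers(jwt_token: Optional[str] = None,
                       api_key: Optional[str] = None) -> Dict[str, str]:
    """Return an Authorization header for exactly one credential type."""
    # Assumption: a request carries either a JWT token or an API key, not both.
    if (jwt_token is None) == (api_key is None):
        raise ValueError("provide exactly one of jwt_token or api_key")
    if jwt_token is not None:
        return {"Authorization": f"Bearer {jwt_token}"}
    return {"Authorization": f"ApiKey {api_key}"}


# Placeholder credential, not a real key.
headers = build_auth_headers(api_key="ak_example")
```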
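3. Metric Query Endpoints
3.1 List Metrics
Request URL: /api/v1/metrics
Request method: GET
Request parameters:
- page: page number; default 1
- page_size: number of items per page; default 50
- name: search by metric name; optional; supports fuzzy matching
- category: filter by metric category; optional
- source: filter by metric source; optional (e.g. server, k8s, database)
- sort_by: sort field; default created_at
- sort_order: sort direction; optional; asc or desc; default desc
Response example: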
{
"success": true,
"code": 200,
"message": "查询成功",
"data": {
"total": 200,
"page": 1,
"page_size": 50,
"metrics": [
{
"id": "1",
"name": "cpu_usage",
"display_name": "CPU使用率",
"category": "system",
"source": "server",
"type": "gauge",
"unit": "%",
"description": "CPU使用率百分比",
"created_at": "2023-01-01T00:00:00Z",
"last_updated": "2023-05-01T00:00:00Z"
},
{
"id": "2",
"name": "memory_usage",
"display_name": "内存使用率",
"category": "system",
"source": "server",
"type": "gauge",
"unit": "%",
"description": "内存使用率百分比",
"created_at": "2023-01-01T00:00:00Z",
"last_updated": "2023-05-01T00:00:00Z"
},
...
]
}
}
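To walk through every page of the list, a client can derive the page count from `total` and `page_size` in the response. The sketch below only builds the request URLs; the helper names `list_metrics_url` and `page_count` are hypothetical.

```python
import math
from urllib.parse import urlencode

BASE_PATH = "/api/v1/metrics"


def list_metrics_url(page=1, page_size=50, **filters):
    """Build the list URL, leaving out filters that were not set."""
    params = {"page": page, "page_size": page_size}
    params.update({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_PATH}?{urlencode(params)}"


def page_count(total, page_size):
    """Number of pages needed to cover `total` items."""
    return math.ceil(total / page_size)


# With total=200 and page_size=50 (as in the sample response) this yields 4 URLs.
urls = [list_metrics_url(page=p, category="system")
        for p in range(1, page_count(200, 50) + 1)]
```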
3.2 Get Metric Details
Request URL: /api/v1/metrics/{metric_id}
Request method: GET
Path parameters:
- metric_id: metric ID; required
Response example:
{
"success": true,
"code": 200,
"message": "查询成功",
"data": {
"id": "1",
"name": "cpu_usage",
"display_name": "CPU使用率",
"category": "system",
"source": "server",
"type": "gauge",
"unit": "%",
"description": "CPU使用率百分比",
"tags": ["host", "cpu_id"],
"min_value": 0,
"max_value": 100,
"default_interval": 60,
"created_at": "2023-01-01T00:00:00Z",
"last_updated": "2023-05-01T00:00:00Z",
"data_points_count": 100000
}
}
3.3 Query Metric Data
Request URL: /api/v1/metrics/data
Request method: POST
Request body:
{
"queries": [
{
"metric": "cpu_usage",
"tags": {
"host": ["server-1", "server-2"],
"cpu_id": ["0", "1"]
},
"aggregator": "avg",
"fill": "null",
"rate": false
},
{
"metric": "memory_usage",
"tags": {
"host": ["server-1", "server-2"]
},
"aggregator": "max",
"fill": "previous",
"rate": false
}
],
"start_time": "2023-05-01T00:00:00Z",
"end_time": "2023-05-01T12:00:00Z",
"interval": 60,
"format": "time_series"
}
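Parameters:
- queries: list of queries; required
  - metric: metric name; required
  - tags: tag filters; optional
  - aggregator: aggregation function; optional (avg, sum, min, max, median, count, stddev, etc.)
  - fill: how missing data is filled; optional (null, zero, previous, etc.)
  - rate: whether to compute the rate; optional; default false
- start_time: start time; required
- end_time: end time; required
- interval: time interval in seconds; optional; default 60
- format: response format; optional (time_series, table, json); default time_series
Response example: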
{
"success": true,
"code": 200,
"message": "查询成功",
"data": {
"start_time": "2023-05-01T00:00:00Z",
"end_time": "2023-05-01T12:00:00Z",
"interval": 60,
"results": [
{
"metric": "cpu_usage",
"tags": {"host": "server-1", "cpu_id": "0"},
"values": [
["2023-05-01T00:00:00Z", 25.5],
["2023-05-01T00:01:00Z", 26.2],
...
]
},
{
"metric": "memory_usage",
"tags": {"host": "server-1"},
"values": [
["2023-05-01T00:00:00Z", 65.3],
["2023-05-01T00:01:00Z", 65.8],
...
]
}
]
}
}
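A client usually wants the `time_series` response as typed points per series. The following sketch assumes the response shape shown above; `series_key` and `parse_time_series` are illustrative names, and the sample data is a shortened copy of the example.

```python
from datetime import datetime


def series_key(result):
    """Identify a series by metric name plus its sorted tag pairs."""
    return (result["metric"], tuple(sorted(result["tags"].items())))


def parse_time_series(response):
    """Map each series key to a list of (datetime, value) points."""
    parsed = {}
    for result in response["data"]["results"]:
        parsed[series_key(result)] = [
            # "Z" is rewritten so older Python versions can parse the timestamp.
            (datetime.fromisoformat(ts.replace("Z", "+00:00")), value)
            for ts, value in result["values"]
        ]
    return parsed


sample = {
    "data": {
        "results": [
            {
                "metric": "cpu_usage",
                "tags": {"host": "server-1", "cpu_id": "0"},
                "values": [
                    ["2023-05-01T00:00:00Z", 25.5],
                    ["2023-05-01T00:01:00Z", 26.2],
                ],
            }
        ]
    }
}
series = parse_time_series(sample)
```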
4. Metric Aggregation and Statistics Endpoints
4.1 Latest Metric Values
Request URL: /api/v1/metrics/latest
Request method: POST
Request body:
{
"metrics": [
{
"name": "cpu_usage",
"tags": {
"host": ["server-1", "server-2"]
},
"aggregator": "avg"
},
{
"name": "disk_usage",
"tags": {
"host": ["server-1"],
"mount_point": ["/", "/data"]
},
"aggregator": "max"
}
],
"limit": 100
}
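Parameters:
- metrics: list of metrics; required
  - name: metric name; required
  - tags: tag filters; optional
  - aggregator: aggregation function; optional
- limit: maximum number of results to return; optional; default 100
Response example: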
{
"success": true,
"code": 200,
"message": "查询成功",
"data": {
"results": [
{
"metric": "cpu_usage",
"tags": {"host": "server-1"},
"value": 25.5,
"timestamp": "2023-05-01T12:34:56Z"
},
{
"metric": "cpu_usage",
"tags": {"host": "server-2"},
"value": 30.2,
"timestamp": "2023-05-01T12:34:56Z"
},
{
"metric": "disk_usage",
"tags": {"host": "server-1", "mount_point": "/"},
"value": 45.3,
"timestamp": "2023-05-01T12:34:56Z"
}
]
}
}
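One common use of the latest values is a quick threshold check on the client. This is a small sketch over the `results` array from the example above; the function names and the threshold of 30.0 are assumptions chosen for illustration.

```python
from collections import defaultdict


def group_latest(results):
    """Group latest-value entries by metric name."""
    grouped = defaultdict(list)
    for item in results:
        grouped[item["metric"]].append(item)
    return dict(grouped)


def over_threshold(results, metric, limit):
    """Return the tag sets of one metric whose latest value exceeds `limit`."""
    return [r["tags"] for r in results
            if r["metric"] == metric and r["value"] > limit]


results = [
    {"metric": "cpu_usage", "tags": {"host": "server-1"}, "value": 25.5},
    {"metric": "cpu_usage", "tags": {"host": "server-2"}, "value": 30.2},
    {"metric": "disk_usage", "tags": {"host": "server-1", "mount_point": "/"}, "value": 45.3},
]
busy_hosts = over_threshold(results, "cpu_usage", 30.0)
```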
4.2 Metric Statistics
Request URL: /api/v1/metrics/stats
Request method: POST
Request body:
{
"metric": "cpu_usage",
"tags": {
"host": ["server-1", "server-2"]
},
"start_time": "2023-05-01T00:00:00Z",
"end_time": "2023-05-01T12:00:00Z",
"stats": ["min", "max", "avg", "sum", "count", "median", "p95", "p99"],
"group_by": ["host"]
}
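Parameters:
- metric: metric name; required
- tags: tag filters; optional
- start_time: start time; required
- end_time: end time; required
- stats: list of statistical functions; required
- group_by: list of fields to group by; optional
Response example: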
{
"success": true,
"code": 200,
"message": "查询成功",
"data": {
"results": [
{
"tags": {"host": "server-1"},
"stats": {
"min": 5.2,
"max": 85.3,
"avg": 35.6,
"sum": 7892.1,
"count": 222,
"median": 32.1,
"p95": 65.8,
"p99": 78.5
}
},
{
"tags": {"host": "server-2"},
"stats": {
"min": 8.1,
"max": 92.4,
"avg": 42.3,
"sum": 9456.7,
"count": 223,
"median": 38.9,
"p95": 72.1,
"p99": 85.2
}
}
]
}
}
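To make the meaning of the `stats` names concrete, here is a local sketch that computes the same set over a list of values. It uses the nearest-rank percentile definition, which is an assumption: the source does not say which percentile method the server uses, so server results may differ slightly.

```python
import math
import statistics


def percentile(sorted_values, p):
    """Nearest-rank percentile: the value at rank ceil(p% of n), 1-based."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(values):
    """Compute the statistics named in the request's `stats` list."""
    ordered = sorted(values)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "avg": statistics.mean(ordered),
        "sum": math.fsum(ordered),
        "count": len(ordered),
        "median": statistics.median(ordered),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
    }


summary = summarize(range(1, 101))
```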
4.3 Metric Trend Analysis
Request URL: /api/v1/metrics/trends
Request method: POST
Request body:
{
"metric": "cpu_usage",
"tags": {
"host": ["server-1"]
},
"start_time": "2023-04-01T00:00:00Z",
"end_time": "2023-05-01T00:00:00Z",
"interval": 86400,
"aggregator": "avg",
"compare_to": {
"days": 7
}
}
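Parameters:
- metric: metric name; required
- tags: tag filters; optional
- start_time: start time; required
- end_time: end time; required
- interval: time interval in seconds; optional; default 86400 (1 day)
- aggregator: aggregation function; optional; default avg
- compare_to: comparison settings; optional
  - days: number of days back to compare against; optional
  - start/end: explicit time range to compare against; optional
Response example: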
{
"success": true,
"code": 200,
"message": "查询成功",
"data": {
"current": [
{"time": "2023-04-25", "value": 35.2},
{"time": "2023-04-26", "value": 36.5},
{"time": "2023-04-27", "value": 34.8},
{"time": "2023-04-28", "value": 38.9},
{"time": "2023-04-29", "value": 37.1},
{"time": "2023-04-30", "value": 39.2},
{"time": "2023-05-01", "value": 40.5}
],
"previous": [
{"time": "2023-04-18", "value": 32.1},
{"time": "2023-04-19", "value": 33.4},
{"time": "2023-04-20", "value": 31.8},
{"time": "2023-04-21", "value": 35.9},
{"time": "2023-04-22", "value": 34.1},
{"time": "2023-04-23", "value": 36.2},
{"time": "2023-04-24", "value": 37.5}
],
"change_percentage": 8.2,
"trend_direction": "increasing"
}
}
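The source does not state how `change_percentage` or `trend_direction` are derived. The sketch below shows one plausible definition (relative change between the means of the two periods), so its output need not match the sample value. The labels "decreasing" and "stable" and the `tolerance` parameter are hypothetical; only "increasing" appears in the example.

```python
from statistics import mean


def change_percentage(current, previous):
    """Relative change of the current-period mean versus the previous-period mean."""
    cur = mean(point["value"] for point in current)
    prev = mean(point["value"] for point in previous)
    if prev == 0:
        raise ValueError("previous-period mean is zero; relative change is undefined")
    return (cur - prev) / prev * 100


def trend_direction(change, tolerance=1.0):
    """Classify a percentage change; changes within +/- tolerance count as stable."""
    if change > tolerance:
        return "increasing"
    if change < -tolerance:
        return "decreasing"
    return "stable"


change = change_percentage([{"value": 110}], [{"value": 100}])
direction = trend_direction(change)
```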
4.4 Metric Anomaly Detection
Request URL: /api/v1/metrics/anomalies
Request method: POST
Request body:
{
"metric": "cpu_usage",
"tags": {
"host": ["server-1"]
},
"start_time": "2023-05-01T00:00:00Z",
"end_time": "2023-05-01T12:00:00Z",
"algorithm": "zscore",
"threshold": 3.0,
"historical_data_days": 14
}
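Parameters:
- metric: metric name; required
- tags: tag filters; optional
- start_time: start time; required
- end_time: end time; required
- algorithm: anomaly detection algorithm; optional (zscore, iqr, isolation_forest, prophet, etc.)
- threshold: anomaly threshold; optional; default 3.0
- historical_data_days: number of days of historical data to use; optional; default 7
Response example: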
{
"success": true,
"code": 200,
"message": "查询成功",
"data": {
"anomalies": [
{
"timestamp": "2023-05-01T08:30:00Z",
"value": 85.3,
"normal_range": [30.0, 70.0],
"score": 3.2,
"severity": "high"
},
{
"timestamp": "2023-05-01T10:15:00Z",
"value": 92.1,
"normal_range": [30.0, 70.0],
"score": 4.5,
"severity": "critical"
}
],
"summary": {
"total_anomalies": 2,
"anomaly_rate": 0.03,
"detection_algorithm": "zscore",
"threshold_used": 3.0
}
}
}
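For intuition about the `zscore` option, the sketch below applies a textbook z-score test: a point is flagged when its distance from the historical mean exceeds `threshold` standard deviations. It is an illustration only; how the service computes `score`, `normal_range`, and `severity` is not described in the source.

```python
from statistics import mean, pstdev


def zscore_anomalies(history, points, threshold=3.0):
    """Flag (timestamp, value) points that lie more than `threshold` sigmas from the mean."""
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation of the baseline
    if sigma == 0:
        return []  # a constant baseline gives no usable scale
    anomalies = []
    for timestamp, value in points:
        score = abs(value - mu) / sigma
        if score > threshold:
            anomalies.append(
                {"timestamp": timestamp, "value": value, "score": round(score, 2)}
            )
    return anomalies


history = [40, 50, 60] * 10  # made-up baseline values
found = zscore_anomalies(
    history,
    [("2023-05-01T08:30:00Z", 55.0), ("2023-05-01T10:15:00Z", 92.1)],
)
```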
5. Metric Management Endpoints
5.1 Create a Custom Metric
Request URL: /api/v1/metrics
Request method: POST
Request body:
{
"name": "custom_business_metric",
"display_name": "自定义业务指标",
"category": "business",
"source": "application",
"type": "gauge",
"unit": "count",
"description": "自定义业务指标描述",
"tags": ["service", "region"],
"min_value": 0,
"max_value": null,
"default_interval": 60,
"retention_days": 30
}
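Parameters:
- name: metric name; required; must be unique
- display_name: display name; optional
- category: metric category; optional
- source: metric source; optional
- type: metric type; optional (gauge, counter, histogram, summary)
- unit: unit; optional
- description: description; optional
- tags: list of tag names; optional
- min_value: minimum value; optional
- max_value: maximum value; optional
- default_interval: default collection interval in seconds; optional
- retention_days: number of days to retain data; optional
Response example: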
{
"success": true,
"code": 200,
"message": "指标创建成功",
"data": {
"id": "100",
"name": "custom_business_metric",
"display_name": "自定义业务指标",
"category": "business",
"source": "application",
"type": "gauge",
"unit": "count",
"description": "自定义业务指标描述",
"created_at": "2023-05-01T12:00:00Z"
}
}
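A client can catch obvious mistakes before sending the create request. The checks below follow the parameter list above (required `name`, the four listed `type` values) plus one assumed rule, that `min_value` should not exceed `max_value`; the server's actual validation is not documented here.

```python
ALLOWED_TYPES = {"gauge", "counter", "histogram", "summary"}


def validate_metric_definition(payload):
    """Return a list of problems; an empty list means these checks passed."""
    errors = []
    if not payload.get("name"):
        errors.append("name is required")
    metric_type = payload.get("type")
    if metric_type is not None and metric_type not in ALLOWED_TYPES:
        errors.append(f"unsupported type: {metric_type}")
    low, high = payload.get("min_value"), payload.get("max_value")
    # Assumption: a null bound means "unbounded", as max_value does in the example.
    if low is not None and high is not None and low > high:
        errors.append("min_value must not exceed max_value")
    return errors


problems = validate_metric_definition(
    {"name": "custom_business_metric", "type": "gauge", "min_value": 0, "max_value": None}
)
```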
5.2 Write Metric Data
Request URL: /api/v1/metrics/write
Request method: POST
Request body:
{
"metrics": [
{
"name": "custom_business_metric",
"value": 123.45,
"tags": {
"service": "order",
"region": "cn-east-1"
},
"timestamp": "2023-05-01T12:00:00Z"
},
{
"name": "custom_business_metric",
"value": 234.56,
"tags": {
"service": "payment",
"region": "cn-north-1"
},
"timestamp": "2023-05-01T12:01:00Z"
}
]
}
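Parameters:
- metrics: list of metric data points; required
  - name: metric name; required
  - value: metric value; required
  - tags: tags; optional
  - timestamp: timestamp; optional; defaults to the current time
Response example: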
{
"success": true,
"code": 200,
"message": "指标数据写入成功",
"data": {
"written_count": 2,
"failed_count": 0
}
}
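When many points are reported, it can help to split them into several write requests. The sketch below builds request bodies in the shape shown above; `make_point`, `batch_payloads`, and the batch size of 500 are assumptions, since the source does not state a per-request limit.

```python
from datetime import datetime, timezone


def make_point(name, value, tags=None, timestamp=None):
    """Build one data point with a UTC timestamp in the format used above."""
    # A naive datetime is interpreted as local time by astimezone().
    ts = (timestamp or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "name": name,
        "value": value,
        "tags": tags or {},
        "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def batch_payloads(points, batch_size=500):
    """Yield request bodies containing at most `batch_size` points each."""
    for start in range(0, len(points), batch_size):
        yield {"metrics": points[start:start + batch_size]}


noon = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
points = [make_point("custom_business_metric", i, {"service": "order"}, noon)
          for i in range(5)]
payloads = list(batch_payloads(points, batch_size=2))
```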
5.3 Delete a Custom Metric
Request URL: /api/v1/metrics/{metric_id}
Request method: DELETE
Path parameters:
- metric_id: metric ID; required
Request body:
{
"delete_data": false
}
Parameters:
- delete_data: whether to also delete historical data; optional; default false
Response example:
{
"success": true,
"code": 200,
"message": "指标删除成功",
"data": {
"deleted_metric": "custom_business_metric",
"delete_data": false
}
}
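Because this endpoint takes a JSON body on a DELETE request, the HTTP client must be told the method explicitly. The sketch below builds (but does not send) such a request with the standard library; the host `monitor.example.com`, the function name, and the `Content-Type` header are assumptions.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_delete_request(base_url, metric_id, token, delete_data=False):
    """Build a DELETE request carrying the delete_data flag as a JSON body."""
    body = json.dumps({"delete_data": delete_data}).encode("utf-8")
    return Request(
        url=f"{base_url}/api/v1/metrics/{quote(str(metric_id), safe='')}",
        data=body,
        method="DELETE",  # without this, urllib would send POST because data is set
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed; not stated in the source
        },
    )


# Placeholder host and token; pass the result to urllib.request.urlopen to send it.
request = build_delete_request("https://monitor.example.com", "100", "TOKEN")
```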
6. Batch Operation Endpoints
6.1 Batch Get Metrics
Request URL: /api/v1/metrics/batch
Request method: POST
Request body:
{
"metric_ids": ["1", "2", "3", "4", "5"]
}
Parameters:
- metric_ids: list of metric IDs; required
Response example:
{
"success": true,
"code": 200,
"message": "查询成功",
"data": {
"metrics": [
{
"id": "1",
"name": "cpu_usage",
"display_name": "CPU使用率",
"category": "system",
"source": "server",
"type": "gauge",
"unit": "%",
"description": "CPU使用率百分比"
},
{
"id": "2",
"name": "memory_usage",
"display_name": "内存使用率",
"category": "system",
"source": "server",
"type": "gauge",
"unit": "%",
"description": "内存使用率百分比"
},
...
]
}
}
6.2 Batch Update Metric Configuration
Request URL: /api/v1/metrics/batch-update
Request method: PUT
Request body:
{
"updates": [
{
"metric_id": "1",
"display_name": "CPU使用率(%)",
"description": "更新后的CPU使用率描述",
"retention_days": 60
},
{
"metric_id": "2",
"display_name": "内存使用率(%)",
"description": "更新后的内存使用率描述",
"retention_days": 60
}
]
}
Parameters:
- updates: list of updates; required
  - metric_id: metric ID; required
  - other updatable fields: display_name, description, retention_days, etc.
Response example:
{
"success": true,
"code": 200,
"message": "批量更新成功",
"data": {
"updated_count": 2,
"failed_count": 0
}
}
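The `updates` array can be assembled from a mapping of metric ID to changed fields. This sketch only accepts the three fields named above; the source says "etc.", so the real set of updatable fields may be larger, and `build_batch_update` is a hypothetical helper name.

```python
# Only the fields explicitly named in the parameter list; the API may allow more.
UPDATABLE_FIELDS = {"display_name", "description", "retention_days"}


def build_batch_update(changes):
    """Turn {metric_id: {field: value}} into the batch-update request body."""
    updates = []
    for metric_id, fields in changes.items():
        unknown = set(fields) - UPDATABLE_FIELDS
        if unknown:
            raise ValueError(f"fields not known to be updatable: {sorted(unknown)}")
        updates.append({"metric_id": metric_id, **fields})
    return {"updates": updates}


body = build_batch_update({
    "1": {"retention_days": 60},
    "2": {"retention_days": 60, "description": "updated"},
})
```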
7. Advanced Features
7.1 Metric Query Builder
Request URL: /api/v1/metrics/query-builder
Request method: POST
Request body:
{
"data_source": "prometheus",
"query": "sum by (instance) (rate(node_cpu_seconds_total{mode!='idle'}[5m]))",
"start_time": "2023-05-01T00:00:00Z",
"end_time": "2023-05-01T12:00:00Z",
"interval": 60,
"max_points": 1000,
"format": "json"
}
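Parameters:
- data_source: data source type; optional (prometheus, influxdb, timescaledb, etc.)
- query: raw query statement; required
- start_time: start time; required
- end_time: end time; required
- interval: time interval in seconds; optional
- max_points: maximum number of data points; optional
- format: response format; optional
Response example: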
{
"success": true,
"code": 200,
"message": "查询成功",
"data": {
"results": [
{
"metric": "node_cpu_seconds_total",
"tags": {"instance": "server-1:9100"},
"values": [
["2023-05-01T00:00:00Z", 0.25],
["2023-05-01T00:01:00Z", 0.26],
...
]
},
...
]
}
}
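7.2 Metric Tag Management
Request URL: /api/v1/metrics/tags
Request method: GET
Request parameters:
- metric: metric name; required
- tag_name: tag name; optional
- limit: maximum number of results to return; optional; default 100
Response example: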
{
"success": true,
"code": 200,
"message": "查询成功",
"data": {
"metric": "cpu_usage",
"tags": {
"host": ["server-1", "server-2", "server-3", "server-4", "server-5"],
"cpu_id": ["0", "1", "2", "3", "4", "5", "6", "7"]
}
}
}
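The tag values returned here can be used to enumerate candidate tag filters for later data queries. The sketch below expands the sample `tags` object into every host and cpu_id pairing; `tag_combinations` is a hypothetical helper, and not every combination is guaranteed to have data on the server.

```python
from itertools import product


def tag_combinations(tags):
    """Expand {tag_name: [values]} into a list of single-valued tag dicts."""
    names = sorted(tags)
    return [
        dict(zip(names, combo))
        for combo in product(*(tags[name] for name in names))
    ]


tags = {
    "host": ["server-1", "server-2", "server-3", "server-4", "server-5"],
    "cpu_id": ["0", "1", "2", "3", "4", "5", "6", "7"],
}
combos = tag_combinations(tags)  # 5 hosts x 8 CPU ids
```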
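8. Common Issues
8.1 Query Timeouts
Problem: a metric query request returns a 504 error
Possible causes:
- The query time range is too large
- Too many metrics are queried at once
- The tag filters are not specific enough
- The server is under heavy load
Solutions:
- Narrow the query time range
- Reduce the number of metrics per query
- Add more specific tag filters
- Use a larger interval parameter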
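A larger interval reduces the number of points a query has to return. The sketch below picks the smallest multiple of a base interval that keeps a time range under a point budget; the default budget of 1000 points and the helper names are assumptions, not server limits.

```python
import math
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 timestamp with a trailing Z, as used in this API."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def pick_interval(start_time, end_time, max_points=1000, base_interval=60):
    """Smallest multiple of base_interval giving at most max_points per series."""
    span = (parse_ts(end_time) - parse_ts(start_time)).total_seconds()
    if span <= 0:
        raise ValueError("end_time must be after start_time")
    needed = math.ceil(span / max_points)  # seconds per point at the budget
    return max(base_interval, math.ceil(needed / base_interval) * base_interval)


# 12 hours at 60 s is 720 points, so the default interval already fits.
short_range = pick_interval("2023-05-01T00:00:00Z", "2023-05-01T12:00:00Z")
# 30 days needs a coarser interval to stay within 1000 points.
long_range = pick_interval("2023-04-01T00:00:00Z", "2023-05-01T00:00:00Z")
```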
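8.2 Inaccurate Data
Problem: the metric data returned by a query does not match expectations
Possible causes:
- An unsuitable aggregation function was chosen
- The time interval is set inappropriately
- The metric collector is misconfigured
- Data is missing or collection failed
Solutions:
- Choose a suitable aggregation function (avg, max, sum, etc.)
- Adjust the interval parameter
- Check the status and configuration of the metric collector
- Use the fill parameter to handle missing data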
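The effect of the fill modes can be seen on a small series with a gap. This sketch is one interpretation of `null`, `zero`, and `previous`; the exact server-side semantics (for example, how a gap at the very start is handled) are not specified in the source.

```python
def apply_fill(values, mode="null"):
    """Fill missing (None) values in a list of (timestamp, value) pairs."""
    if mode not in ("null", "zero", "previous"):
        raise ValueError(f"unsupported fill mode: {mode}")
    filled = []
    last_seen = None
    for timestamp, value in values:
        if value is None:
            if mode == "zero":
                value = 0
            elif mode == "previous":
                value = last_seen  # stays None if no earlier value exists
        else:
            last_seen = value
        filled.append((timestamp, value))
    return filled


gappy = [("00:00", 1.0), ("00:01", None), ("00:02", 3.0)]
carried = apply_fill(gappy, "previous")
zeroed = apply_fill(gappy, "zero")
```

Note that `previous` hides a gap by repeating the last reading while `zero` pulls down an average, so the choice interacts with the aggregation function.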
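8.3 Rate Limiting
Problem: API calls return a 429 error
Possible causes:
- The API call rate exceeds the limit
- A single user is sending too many query requests
Solutions:
- Reduce the API call rate
- Combine multiple queries into batch operations
- Add caching to avoid repeated queries
- Contact the administrator to raise the API quota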
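Reducing the call rate is often implemented as retry with exponential backoff when a 429 is received. In this sketch, `send` is any caller-supplied function returning a `(status_code, body)` pair; the attempt count and delays are assumptions, and the source does not say whether the service returns a `Retry-After` header.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1 s, 2 s, 4 s, ...
    return status, body


# Demonstration with a fake sender and a recording sleep, so nothing actually waits.
waits = []
replies = iter([(429, None), (429, None), (200, "ok")])
outcome = call_with_backoff(lambda: next(replies), sleep=waits.append)
```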
