Istio Observability Series: Metrics Are Not Being Reported
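Background
istio version: 1.13.4
aeraki version: 1.1.5
During testing, the business team reported that the istio_requests_total metric could not be found. A check in Prometheus showed that the metric was being scraped and reported normally for only some of the services.
Use istioctl ps to get an overview of the proxies in the cluster: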
# ./istioctl ps
NAME CLUSTER CDS LDS EDS RDS ISTIOD VERSION
test-1.istio-system NOT SENT NOT SENT NOT SENT NOT SENT istiod-66bd9d59d8-k6qzk 65536.65536.65536
test-1.istio-system NOT SENT NOT SENT NOT SENT NOT SENT istiod-66bd9d59d8-k6qzk 65536.65536.65536
components-comsumer-sfkt-default-test-rfgds.components-nacos Kubernetes SYNCED SYNCED SYNCED SYNCED istiod-66bd9d59d8-k6qzk 1.14-dev
components-provider-sfkt-default-test-6gw8b.components-nacos Kubernetes SYNCED SYNCED SYNCED SYNCED istiod-66bd9d59d8-k6qzk 1.14-dev
components-server-sfkt-default-test-frjwm.components-nacos Kubernetes SYNCED SYNCED SYNCED SYNCED istiod-66bd9d59d8-k6qzk 1.14-dev
istio-egressgateway-c5b57c584-rz9rw.istio-system Kubernetes SYNCED SYNCED SYNCED NOT SENT istiod-66bd9d59d8-k6qzk 1.13.4
istio-ingressgateway-7767c959b4-jkbgb.istio-system Kubernetes SYNCED SYNCED SYNCED NOT SENT istiod-66bd9d59d8-k6qzk 1.13.4
mesh-demo-dubbo-consumer-test-rpzzh.pulsar-manager Kubernetes SYNCED SYNCED SYNCED SYNCED istiod-66bd9d59d8-k6qzk 1.14-dev
mesh-demo-dubbo-pv1111-test-rfmtg.pulsar-manager Kubernetes SYNCED SYNCED SYNCED SYNCED istiod-66bd9d59d8-k6qzk 1.14-dev
mesh-demo-dubbo-pv2-test-gjhwb.pulsar-manager Kubernetes SYNCED SYNCED SYNCED SYNCED istiod-66bd9d59d8-k6qzk 1.14-dev
mesh-demo-httpv1-test-8b7f5.pulsar-manager Kubernetes SYNCED SYNCED SYNCED SYNCED istiod-66bd9d59d8-k6qzk 1.14-dev
mesh-demo-httpv2-test-qlkhb.pulsar-manager Kubernetes SYNCED SYNCED SYNCED SYNCED istiod-66bd9d59d8-k6qzk 1.14-dev
sl-job-agentd-sl-job-agentd-test-fskpl-lg8hq.sl-devops Kubernetes SYNCED SYNCED SYNCED SYNCED istiod-66bd9d59d8-k6qzk 1.12-dev
sl-job-agentd-sl-job-server-test-fskpl-z5b4t.sl-devops Kubernetes SYNCED SYNCED SYNCED SYNCED istiod-66bd9d59d8-k6qzk 1.12-dev
sl-job-web-api-sl-job-web-api-dev-2pm6n.sl-devops Kubernetes SYNCED SYNCED SYNCED SYNCED istiod-66bd9d59d8-k6qzk 1.12-dev
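The output shows that several proxy versions coexist in the Istio cluster. 1.13.4 is the version from the regular Istio installation. Pods whose proxy is 1.12-dev report the istio_requests_total metric normally, while pods whose proxy is 1.14-dev do not. So where do 1.12-dev and 1.14-dev come from? Inspecting these pods shows that their istio-proxy container uses the meta-protocol-proxy image from the aeraki project, and that the image version differs from pod to pod. After changing the istio-proxy image on the pods that could not report metrics, the metric became visible.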
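The version check described above can be automated when the proxy list is long. The following is a minimal sketch, assuming the plain-text output of istioctl ps has been captured into a string; the function name group_proxies_by_version is hypothetical and the sample rows are copied from the listing above.

```python
from collections import defaultdict


def group_proxies_by_version(ps_output: str) -> dict[str, list[str]]:
    """Group proxy names from `istioctl ps` text output by proxy version."""
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        # The first token is the proxy name and the last one is its version.
        # The middle columns are ignored because "NOT SENT" contains a space.
        groups[fields[-1]].append(fields[0])
    return dict(groups)


sample = """\
NAME CLUSTER CDS LDS EDS RDS ISTIOD VERSION
istio-ingressgateway-7767c959b4-jkbgb.istio-system Kubernetes SYNCED SYNCED SYNCED NOT SENT istiod-66bd9d59d8-k6qzk 1.13.4
mesh-demo-httpv1-test-8b7f5.pulsar-manager Kubernetes SYNCED SYNCED SYNCED SYNCED istiod-66bd9d59d8-k6qzk 1.14-dev
sl-job-web-api-sl-job-web-api-dev-2pm6n.sl-devops Kubernetes SYNCED SYNCED SYNCED SYNCED istiod-66bd9d59d8-k6qzk 1.12-dev
"""

for version, names in sorted(group_proxies_by_version(sample).items()):
    print(version, names)
```

Reading only the first and last token of each row keeps the parser independent of the status columns, at the cost of ignoring the sync state entirely.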
# kubectl get pod -n ns sl-starter-mesh-demo-sfkt-deve-default-deve-8hlgb -o yaml|more
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/default-container: sl-starter-mesh-demo-hades
    kubectl.kubernetes.io/default-logs-container: sl-starter-mesh-demo-hades
    lifecycle.apps.kruise.io/timestamp: "2023-01-03T07:45:40Z"
    lxcfs-admission-webhook.aliyun.com/status: mutated
    prometheus.io/path: /stats/prometheus
    prometheus.io/port: "15020"
    prometheus.io/scrape: "true"
    sidecar.istio.io/bootstrapOverride: aeraki-bootstrap-config
    sidecar.istio.io/inject: "true"
    sidecar.istio.io/proxyImage: aeraki/meta-protocol-proxy-debug:1.1.2
...
...
# kubectl get pod -n ns components-comsumer-sfkt-default-test-rfgds -o yaml|more
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/default-container: components-comsumer
    kubectl.kubernetes.io/default-logs-container: components-comsumer
    lifecycle.apps.kruise.io/timestamp: "2023-01-12T05:32:51Z"
    lxcfs-admission-webhook.aliyun.com/status: mutated
    prometheus.io/path: /stats/prometheus
    prometheus.io/port: "15020"
    prometheus.io/scrape: "true"
    sidecar.istio.io/bootstrapOverride: aeraki-bootstrap-config
    sidecar.istio.io/inject: "true"
    sidecar.istio.io/proxyImage: aeraki/meta-protocol-proxy-debug:1.1.5
...
...
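Comparing the sidecar.istio.io/proxyImage annotation pod by pod is tedious by hand. The following is a rough sketch, assuming the output of kubectl get pods -o json has already been parsed into a Python dict; the helper name proxy_images is hypothetical, and the sample data contains only the fields used here.

```python
PROXY_IMAGE_ANNOTATION = "sidecar.istio.io/proxyImage"


def proxy_images(pod_list: dict) -> dict[str, str]:
    """Map pod name to its proxy image override, for pods that set one."""
    result = {}
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        # The annotations key may be missing or null when a pod has none.
        annotations = meta.get("annotations") or {}
        image = annotations.get(PROXY_IMAGE_ANNOTATION)
        if image is not None:
            result[meta.get("name", "")] = image
    return result


# Trimmed-down stand-in for `kubectl get pods -o json` output.
pods = {
    "items": [
        {"metadata": {
            "name": "sl-starter-mesh-demo-sfkt-deve-default-deve-8hlgb",
            "annotations": {PROXY_IMAGE_ANNOTATION: "aeraki/meta-protocol-proxy-debug:1.1.2"},
        }},
        {"metadata": {
            "name": "components-comsumer-sfkt-default-test-rfgds",
            "annotations": {PROXY_IMAGE_ANNOTATION: "aeraki/meta-protocol-proxy-debug:1.1.5"},
        }},
        {"metadata": {"name": "pod-without-override"}},
    ]
}

for name, image in proxy_images(pods).items():
    print(f"{name}: {image}")
```

Pods without the annotation are left out of the result, since they use whatever proxy image the injector is configured with.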
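Root cause
According to the official Istio documentation, Istio's default metrics are also collected through EnvoyFilter resources. The cluster runs Istio 1.13.4, which by default installs EnvoyFilters for proxy versions 1.11, 1.12, and 1.13, but none for 1.14. The aeraki/meta-protocol-proxy-debug:1.1.5 image, however, is built on Istio 1.14. Because the cluster had no EnvoyFilter for 1.14, proxies at version 1.12 reported metrics normally while proxies at version 1.14 did not. The two listings below show the EnvoyFilters in istio-system before and after the fix: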
# kubectl get envoyfilter -n istio-system
NAME AGE
stats-filter-1.11 69d
stats-filter-1.12 69d
stats-filter-1.13 69d
tcp-stats-filter-1.11 69d
tcp-stats-filter-1.12 69d
tcp-stats-filter-1.13 69d
# kubectl get envoyfilter -n istio-system
NAME AGE
stats-filter-1.11 69d
stats-filter-1.12 69d
stats-filter-1.13 69d
stats-filter-1.14 2m15s
tcp-stats-filter-1.11 69d
tcp-stats-filter-1.12 69d
tcp-stats-filter-1.13 69d
tcp-stats-filter-1.14 2m8s
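The version gap described above can be checked mechanically. The sketch below assumes that each stats filter applies only to proxies whose version matches a per-minor regular expression of the form ^1\.13.* (this mirrors my understanding of the proxyVersion match in Istio's bundled stats filters and should be verified against the actual EnvoyFilter specs). The function names are hypothetical.

```python
import re


def filter_patterns(envoyfilter_names):
    """Build one version regex per stats filter, keyed by its minor version."""
    patterns = {}
    for name in envoyfilter_names:
        m = re.fullmatch(r"(?:tcp-)?stats-filter-(\d+)\.(\d+)", name)
        if m:
            major, minor = m.group(1), m.group(2)
            # Assumed match rule: the proxy version must start with this minor.
            patterns[f"{major}.{minor}"] = re.compile(rf"^{major}\.{minor}.*")
    return patterns


def uncovered_versions(proxy_versions, envoyfilter_names):
    """Return the proxy versions that no stats filter would apply to."""
    patterns = filter_patterns(envoyfilter_names).values()
    return sorted({v for v in proxy_versions
                   if not any(p.match(v) for p in patterns)})


before = ["stats-filter-1.11", "stats-filter-1.12", "stats-filter-1.13",
          "tcp-stats-filter-1.11", "tcp-stats-filter-1.12", "tcp-stats-filter-1.13"]
after = before + ["stats-filter-1.14", "tcp-stats-filter-1.14"]
proxies = ["1.13.4", "1.12-dev", "1.14-dev"]

print("before:", uncovered_versions(proxies, before))
print("after: ", uncovered_versions(proxies, after))
```

With the filter names and proxy versions from this cluster, only 1.14-dev is left uncovered before the fix, which is consistent with the symptom.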
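After adding the EnvoyFilters for version 1.14 (stats-filter-1.14 and tcp-stats-filter-1.14 in the listing above), the metrics are reported normally: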
# curl 10.98.211.214:15020/stats/prometheus|grep istio_request|more
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 13666 0 13666 0 0 23879 0 --:--:-- --:--:-- --:--:-- 23849
# TYPE istio_requests_total counter
istio_requests_total{response_code="200",reporter="destination",source_workload="components-server-sfkt-default-test",source_workload_namespace="components-nacos",source_principal="spiffe://kubeyy.com/ns/components-nacos/sa/default",source_app="components-server-sfkt-default-test",source_version="unknown",source_cluster="Kubernetes",destination_workload="components-comsumer-sfkt-default-test",destination_workload_namespace="components-nacos",destination_principal="spiffe://kubeyy.com/ns/components-nacos/sa/default",destination_app="components-comsumer-sfkt-default-test",destination_version="unknown",destination_service="components-comsumer-test-ps.components-nacos.svc.kubeyy.com",destination_service_name="components-comsumer-test-ps",destination_service_namespace="components-nacos",destination_cluster="Kubernetes",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="mutual_tls",source_canonical_service="components-server-sfkt-default-test",destination_canonical_service="components-comsumer-sfkt-default-test",source_canonical_revision="latest",destination_canonical_revision="latest"} 699
istio_requests_total{response_code="200",reporter="destination",source_workload="unknown",source_workload_namespace="unknown",source_principal="unknown",source_app="unknown",source_version="unknown",source_cluster="unknown",destination_workload="components-comsumer-sfkt-default-test",destination_workload_namespace="components-nacos",destination_principal="unknown",destination_app="components-comsumer-sfkt-default-test",destination_version="unknown",destination_service="svc-metrics-components-nacos-test.components-nacos.svc.kubeyy.com",destination_service_name="svc-metrics-components-nacos-test",destination_service_namespace="components-nacos",destination_cluster="Kubernetes",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="none",source_canonical_service="unknown",destination_canonical_service="components-comsumer-sfkt-default-test",source_canonical_revision="latest",destination_canonical_revision="latest"} 2
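The raw samples are hard to read because of the number of labels. The following is a minimal sketch that sums istio_requests_total by one label from the text returned by the /stats/prometheus endpoint. It is a simplified parser written for this illustration (it ignores optional timestamps and is not a full implementation of the Prometheus exposition format), the function name requests_by_label is hypothetical, and the sample lines are abbreviated to a few labels.

```python
import re

# metric_name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
# label="value", allowing backslash-escaped characters inside the value
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def requests_by_label(metrics_text, label, metric="istio_requests_total"):
    """Sum the values of one metric, grouped by the given label."""
    totals = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# TYPE" / "# HELP" comment lines.
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if not m or m.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels")))
        key = labels.get(label, "")
        totals[key] = totals.get(key, 0.0) + float(m.group("value"))
    return totals


# Abbreviated versions of the two samples shown above.
sample = """\
# TYPE istio_requests_total counter
istio_requests_total{response_code="200",reporter="destination",source_workload="components-server-sfkt-default-test",connection_security_policy="mutual_tls"} 699
istio_requests_total{response_code="200",reporter="destination",source_workload="unknown",connection_security_policy="none"} 2
"""

for workload, count in requests_by_label(sample, "source_workload").items():
    print(workload, count)
```

Grouping by source_workload separates mesh traffic from the requests whose source is unknown, such as those with connection_security_policy="none" in the output above.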