Istio可观测性系列-性能指标
Categories:
Istio 指标
Istio 自己的 Metrics
标准指标说明
参考:https://istio.io/latest/docs/reference/config/metrics/
Metrics
对于 HTTP、HTTP/2 和 GRPC 流量,Istio 默认生成以下指标:
- Request Count (
istio_requests_total): This is aCOUNTERincremented for every request handled by an Istio proxy. - Request Duration (
istio_request_duration_milliseconds): This is aDISTRIBUTIONwhich measures the duration of requests. - Request Size (
istio_request_bytes): This is aDISTRIBUTIONwhich measures HTTP request body sizes. - Response Size (
istio_response_bytes): This is aDISTRIBUTIONwhich measures HTTP response body sizes. - gRPC Request Message Count (
istio_request_messages_total): This is aCOUNTERincremented for every gRPC message sent from a client. - gRPC Response Message Count (
istio_response_messages_total): This is aCOUNTERincremented for every gRPC message sent from a server.
对于 TCP 流量,Istio 生成以下指标:
- Tcp Bytes Sent (
istio_tcp_sent_bytes_total): This is aCOUNTERwhich measures the size of total bytes sent during response in case of a TCP connection. - Tcp Bytes Received (
istio_tcp_received_bytes_total): This is aCOUNTERwhich measures the size of total bytes received during request in case of a TCP connection. - Tcp Connections Opened (
istio_tcp_connections_opened_total): This is aCOUNTERincremented for every opened connection. - Tcp Connections Closed (
istio_tcp_connections_closed_total): This is aCOUNTERincremented for every closed connection.
Prometheus 的 Labels
Reporter: This identifies the reporter of the request. It is set to
destinationif report is from a server Istio proxy andsourceif report is from a client Istio proxy or a gateway.Source Workload: This identifies the name of source workload which controls the source, or “unknown” if the source information is missing.
Source Workload Namespace: This identifies the namespace of the source workload, or “unknown” if the source information is missing.
Source Principal: This identifies the peer principal of the traffic source. It is set when peer authentication is used.
Source App: This identifies the source application based on
applabel of the source workload, or “unknown” if the source information is missing.Source Version: This identifies the version of the source workload, or “unknown” if the source information is missing.
Destination Workload: This identifies the name of destination workload, or “unknown” if the destination information is missing.
Destination Workload Namespace: This identifies the namespace of the destination workload, or “unknown” if the destination information is missing.
Destination Principal: This identifies the peer principal of the traffic destination. It is set when peer authentication is used.
Destination App: This identifies the destination application based on
applabel of the destination workload, or “unknown” if the destination information is missing.Destination Version: This identifies the version of the destination workload, or “unknown” if the destination information is missing.
Destination Service: This identifies destination service host responsible for an incoming request. Ex:
details.default.svc.cluster.local.Destination Service Name: This identifies the destination service name. Ex: “details”.
Destination Service Namespace: This identifies the namespace of destination service.
Request Protocol: This identifies the protocol of the request. It is set to request or connection protocol.
Response Code: This identifies the response code of the request. This label is present only on HTTP metrics.
Connection Security Policy: This identifies the service authentication policy of the request. It is set to
mutual_tlswhen Istio is used to make communication secure and report is from destination. It is set tounknownwhen report is from source since security policy cannot be properly populated.Response Flags: Additional details about the response or connection from proxy. In case of Envoy, see
%RESPONSE_FLAGS%in Envoy Access Log for more detail.Canonical Service: A workload belongs to exactly one canonical service, whereas it can belong to multiple services. A canonical service has a name and a revision so it results in the following labels.
source_canonical_service source_canonical_revision destination_canonical_service destination_canonical_revisionDestination Cluster: This identifies the cluster of the destination workload. This is set by:
global.multiCluster.clusterNameat cluster install time.Source Cluster: This identifies the cluster of the source workload. This is set by:
global.multiCluster.clusterNameat cluster install time.gRPC Response Status: This identifies the response status of the gRPC. This label is present only on gRPC metrics.
使用
istio-proxy 与应用的 Metrics 整合输出
参考:https://istio.io/v1.14/docs/ops/integrations/prometheus/#option-1-metrics-merging
Istio 能够完全通过 prometheus.io annotations 来控制抓取。虽然 prometheus.io annotations 不是 Prometheus 的核心部分,但它们已成为配置抓取的事实标准。
此选项默认启用,但可以通过在 安装 期间传递 --set meshConfig.enablePrometheusMerge=false 来禁用。启用后,将向所有数据平面 pod 添加适当的 prometheus.io annotations 以设置抓取。如果这些注释已经存在,它们将被覆盖。使用此选项,Envoy sidecar 会将 Istio 的指标与应用程序指标合并。合并后的指标将从 /stats/prometheus:15020 中抓取。
此选项以明文形式公开所有指标。
定制:为 Metrics 增加维度
如,增加端口、与 HTTP HOST 头 维度。
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
values:
telemetry:
v2:
prometheus:
configOverride:
inboundSidecar:
metrics:
- name: requests_total
dimensions:
destination_port: string(destination.port)
request_host: request.host
outboundSidecar:
metrics:
- name: requests_total
dimensions:
destination_port: string(destination.port)
request_host: request.host
gateway:
metrics:
- name: requests_total
dimensions:
destination_port: string(destination.port)
request_host: request.host
- 使用以下命令将以下 annotation 应用到所有注入的 pod,其中包含要提取到 Prometheus 时间序列 的维度列表:
仅当您的维度不在 [DefaultStatTags 列表] 中时才需要此步骤(https://github.com/istio/istio/blob/release-1.14/pkg/bootstrap/config.go)
apiVersion: apps/v1
kind: Deployment
spec:
template: # pod template
metadata:
annotations:
sidecar.istio.io/extraStatTags: destination_port,request_host
要在网格范围内启用额外 Tag ,您可以将 extraStatTags 添加到网格配置中:
meshConfig:
defaultConfig:
extraStatTags:
- destination_port
- request_host
参考 : https://istio.io/latest/docs/reference/config/proxy_extensions/stats/#MetricConfig
定制:加入 request / response 元信息维度
可以把 request 或 repsonse 里一些基础信息 加入到 指标的维度。如,URL Path,这在需要为相同服务分隔统计不同 REST API 的指标时,相当有用。
参考 : https://istio.io/latest/docs/tasks/observability/metrics/classify-metrics/
工作原理
istio stat filter 使用
Istio 在自己的定制版本 Envoy 中,加入了 stats-filter 插件,用于计算 Istio 自己想要的指标:
$ k -n istio-system get envoyfilters.networking.istio.io stats-filter-1.14 -o yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
annotations:
labels:
install.operator.istio.io/owning-resource-namespace: istio-system
istio.io/rev: default
operator.istio.io/component: Pilot
operator.istio.io/version: 1.14.3
name: stats-filter-1.14
namespace: istio-system
spec:
configPatches:
- applyTo: HTTP_FILTER
match:
context: SIDECAR_OUTBOUND
listener:
filterChain:
filter:
name: envoy.filters.network.http_connection_manager
subFilter:
name: envoy.filters.http.router
proxy:
proxyVersion: ^1\.14.*
patch:
operation: INSERT_BEFORE
value:
name: istio.stats
typed_config:
'@type': type.googleapis.com/udpa.type.v1.TypedStruct
type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
value:
config:
configuration:
'@type': type.googleapis.com/google.protobuf.StringValue
value: |
{
"debug": "false",
"stat_prefix": "istio"
}
root_id: stats_outbound
vm_config:
code:
local:
inline_string: envoy.wasm.stats
runtime: envoy.wasm.runtime.null
vm_id: stats_outbound
- applyTo: HTTP_FILTER
match:
context: SIDECAR_INBOUND
listener:
filterChain:
filter:
name: envoy.filters.network.http_connection_manager
subFilter:
name: envoy.filters.http.router
proxy:
proxyVersion: ^1\.14.*
patch:
operation: INSERT_BEFORE
value:
name: istio.stats
typed_config:
'@type': type.googleapis.com/udpa.type.v1.TypedStruct
type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
value:
config:
configuration:
'@type': type.googleapis.com/google.protobuf.StringValue
value: |
{
"debug": "false",
"stat_prefix": "istio",
"disable_host_header_fallback": true,
"metrics": [
{
"dimensions": {
"destination_cluster": "node.metadata['CLUSTER_ID']",
"source_cluster": "downstream_peer.cluster_id"
}
}
]
}
root_id: stats_inbound
vm_config:
code:
local:
inline_string: envoy.wasm.stats
runtime: envoy.wasm.runtime.null
vm_id: stats_inbound
- applyTo: HTTP_FILTER
match:
context: GATEWAY
listener:
filterChain:
filter:
name: envoy.filters.network.http_connection_manager
subFilter:
name: envoy.filters.http.router
proxy:
proxyVersion: ^1\.14.*
patch:
operation: INSERT_BEFORE
value:
name: istio.stats
typed_config:
'@type': type.googleapis.com/udpa.type.v1.TypedStruct
type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
value:
config:
configuration:
'@type': type.googleapis.com/google.protobuf.StringValue
value: |
{
"debug": "false",
"stat_prefix": "istio",
"disable_host_header_fallback": true
}
root_id: stats_outbound
vm_config:
code:
local:
inline_string: envoy.wasm.stats
runtime: envoy.wasm.runtime.null
vm_id: stats_outbound
priority: -1
istio stat Plugin 实现
https://github.com/istio/proxy/blob/release-1.14/extensions/stats/plugin.cc
内置的 Metric:
const std::vector<MetricFactory>& PluginRootContext::defaultMetrics() {
static const std::vector<MetricFactory> default_metrics = {
// HTTP, HTTP/2, and GRPC metrics
MetricFactory{"requests_total", MetricType::Counter,
[](::Wasm::Common::RequestInfo&) -> uint64_t { return 1; },
static_cast<uint32_t>(Protocol::HTTP) |
static_cast<uint32_t>(Protocol::GRPC),
count_standard_labels, /* recurrent */ false},
MetricFactory{"request_duration_milliseconds", MetricType::Histogram,
[](::Wasm::Common::RequestInfo& request_info) -> uint64_t {
return request_info.duration /* in nanoseconds */ /
1000000;
},
static_cast<uint32_t>(Protocol::HTTP) |
static_cast<uint32_t>(Protocol::GRPC),
count_standard_labels, /* recurrent */ false},
MetricFactory{"request_bytes", MetricType::Histogram,
[](::Wasm::Common::RequestInfo& request_info) -> uint64_t {
return request_info.request_size;
},
static_cast<uint32_t>(Protocol::HTTP) |
static_cast<uint32_t>(Protocol::GRPC),
count_standard_labels, /* recurrent */ false},
MetricFactory{"response_bytes", MetricType::Histogram,
[](::Wasm::Common::RequestInfo& request_info) -> uint64_t {
return request_info.response_size;
},
static_cast<uint32_t>(Protocol::HTTP) |
static_cast<uint32_t>(Protocol::GRPC),
count_standard_labels, /* recurrent */ false},
...
https://github.com/istio/proxy/blob/release-1.14/extensions/stats/plugin.cc#L591
void PluginRootContext::report(::Wasm::Common::RequestInfo& request_info,
bool end_stream) {
...
map(istio_dimensions_, outbound_, peer_node_info.get(), request_info);
for (size_t i = 0; i < expressions_.size(); i++) {
if (!evaluateExpression(expressions_[i].token,
&istio_dimensions_.at(count_standard_labels + i))) {
LOG_TRACE(absl::StrCat("Failed to evaluate expression: <",
expressions_[i].expression, ">"));
istio_dimensions_[count_standard_labels + i] = "unknown";
}
}
auto stats_it = metrics_.find(istio_dimensions_);
if (stats_it != metrics_.end()) {
for (auto& stat : stats_it->second) {
if (end_stream || stat.recurrent_) {
stat.record(request_info);
}
LOG_DEBUG(
absl::StrCat("metricKey cache hit ", ", stat=", stat.metric_id_));
}
cache_hits_accumulator_++;
if (cache_hits_accumulator_ == 100) {
incrementMetric(cache_hits_, cache_hits_accumulator_);
cache_hits_accumulator_ = 0;
}
return;
}
...
}
关于 Istio 的指标原理,这是一个很好的参考文章:https://blog.christianposta.com/understanding-istio-telemetry-v2/
Envoy 内置的 Metrics
Istio 默认用 istio-agent 去整合 Envoy 的 metrics。 而 Istio 默认打开的 Envoy 内置 Metrics 很少:
见:https://istio.io/latest/docs/ops/configuration/telemetry/envoy-stats/
cluster_manager
listener_manager
server
cluster.xds-grpc
定制 Envoy 内置的 Metrics
参考:https://istio.io/latest/docs/ops/configuration/telemetry/envoy-stats/
如果要配置 Istio Proxy 以记录 其它 Envoy 原生的指标,您可以将 ProxyConfig.ProxyStatsMatcher 添加到网格配置中。 例如,要全局启用断路器、重试和上游连接的统计信息,您可以指定 stats matcher,如下所示:
代理需要重新启动以获取统计匹配器配置。
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
meshConfig:
defaultConfig:
proxyStatsMatcher:
inclusionRegexps:
- ".*circuit_breakers.*"
inclusionPrefixes:
- "upstream_rq_retry"
- "upstream_cx"
您还可以使用 proxy.istio.io/config annotation 为个别代码指定配置。 例如,要配置与上面相同的统计信息,您可以将 annotation 添加到 gateway proxy 或 workload,如下所示:
metadata:
annotations:
proxy.istio.io/config: |-
proxyStatsMatcher:
inclusionRegexps:
- ".*circuit_breakers.*"
inclusionPrefixes:
- "upstream_rq_retry"
- "upstream_cx"
原理
下面,看看 Istio 默认配置下,如何配置 Envoy。
istioctl proxy-config bootstrap fortio-server | yq eval -P > envoy-config-bootstrap-default.yaml
输出:
bootstrap:
...
statsConfig:
statsTags: # 从指标名中抓取 Tag(prometheus label)
- tagName: cluster_name
regex: ^cluster\.((.+?(\..+?\.svc\.cluster\.local)?)\.)
- tagName: tcp_prefix
regex: ^tcp\.((.*?)\.)\w+?$
- tagName: response_code
regex: (response_code=\.=(.+?);\.;)|_rq(_(\.d{3}))$
- tagName: response_code_class
regex: _rq(_(\dxx))$
- tagName: http_conn_manager_listener_prefix
regex: ^listener(?=\.).*?\.http\.(((?:[_.[:digit:]]*|[_\[\]aAbBcCdDeEfF[:digit:]]*))\.)
...
useAllDefaultTags: false
statsMatcher:
inclusionList:
patterns: # 选择要记录的指标
- prefix: reporter=
- prefix: cluster_manager
- prefix: listener_manager
- prefix: server
- prefix: cluster.xds-grpc ## 只记录 xDS cluster. 即不记录用户自己服务的 cluster !!!
- prefix: wasm
- suffix: rbac.allowed
- suffix: rbac.denied
- suffix: shadow_allowed
- suffix: shadow_denied
- prefix: component
这时,如果修改 pod 的定义为:
annotations:
proxy.istio.io/config: |-
proxyStatsMatcher:
inclusionRegexps:
- "cluster\\..*fortio.*" #proxy upstream(outbound)
- "cluster\\..*inbound.*" #proxy upstream(inbound,这里一般就是指到同一 pod 中运行的应用了)
- "http\\..*"
- "listener\\..*"
产生新的 Envoy 配置:
"stats_matcher": {
"inclusion_list": {
"patterns": [
{
"prefix": "reporter="
},
{
"prefix": "cluster_manager"
},
{
"prefix": "listener_manager"
},
{
"prefix": "server"
},
{
"prefix": "cluster.xds-grpc"
},
{
"safe_regex": {
"google_re2": {},
"regex": "cluster\\..*fortio.*"
}
},
{
"safe_regex": {
"google_re2": {},
"regex": "cluster\\..*inbound.*"
}
},
{
"safe_regex": {
"google_re2": {},
"regex": "http\\..*"
}
},
{
"safe_regex": {
"google_re2": {},
"regex": "listener\\..*"
}
},
总结:Istio-Proxy 指标地图
要做好监控,首先要深入了解指标原理。而要了解指标原理,当然要知道指标是产生流程中的什么位置,什么组件。看完上面关于 Envoy 与 Istio 的指标说明后。可以大概得到以下结论:

本节的实验环境说明见于: {ref}`appendix-lab-env/appendix-lab-env-base:简单分层实验环境`
本文出自https://istio-insider.readthedocs.io/