metrics-server is in CrashLoopBackOff on a FRESH rke installation
I have installed the cluster at least 10 times over the last few days, and the result is the same every time. Everything else works fine, but metrics-server is in CrashLoopBackOff.
As I understand it, the settings below are missing from the pod YAML and need to be added to the deployment.
I'm new to Kubernetes and have 2 questions:
I'm using rke to install a Rancher cluster, so why don't the pods include the following settings needed to run metrics-server?
Command: /metrics-server
--cert-dir=/tmp
--secure-port=4443
--kubelet-preferred-address-types=InternalIP
--kubelet-insecure-tls
What is the best way to add these lines? I'm new to this, so I need some guidance.
Cluster information:
Kubernetes version:
[rke@rke19-master1 ~]$ kubectl get nodes
NAME           STATUS   ROLES                AGE   VERSION
192.168.0.56   Ready    controlplane,etcd    17m   v1.19.10
192.168.0.57   Ready    controlplane,etcd    17m   v1.19.10
192.168.0.58   Ready    controlplane,etcd    17m   v1.19.10
192.168.0.59   Ready    worker               17m   v1.19.10
192.168.0.60   Ready    worker               17m   v1.19.10
[rke@rke19-master1 ~]$
[rke@rke19-master1 ~]$ kubectl get pods metrics-server-5b6d79d4f4-ggl57 -n kube-system -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/podIP: 10.42.4.3/32
    cni.projectcalico.org/podIPs: 10.42.4.3/32
  creationTimestamp: "2021-08-16T23:00:42Z"
  generateName: metrics-server-5b6d79d4f4-
  labels:
    k8s-app: metrics-server
    pod-template-hash: 5b6d79d4f4
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName: {}
        f:labels:
          .: {}
          f:k8s-app: {}
          f:pod-template-hash: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"fb15b257-4a9d-478b-b461-8b61c165e3db"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:affinity:
          .: {}
          f:nodeAffinity:
            .: {}
            f:requiredDuringSchedulingIgnoredDuringExecution:
              .: {}
              f:nodeSelectorTerms: {}
        f:containers:
          k:{"name":"metrics-server"}:
            .: {}
            f:args: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:livenessProbe:
              .: {}
              f:failureThreshold: {}
              f:httpGet:
                .: {}
                f:path: {}
                f:port: {}
                f:scheme: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:name: {}
            f:ports:
              .: {}
              k:{"containerPort":4443,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
            f:readinessProbe:
              .: {}
              f:failureThreshold: {}
              f:httpGet:
                .: {}
                f:path: {}
                f:port: {}
                f:scheme: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:resources: {}
            f:securityContext:
              .: {}
              f:readOnlyRootFilesystem: {}
              f:runAsNonRoot: {}
              f:runAsUser: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/tmp"}:
                .: {}
                f:mountPath: {}
                f:name: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:priorityClassName: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext: {}
        f:serviceAccount: {}
        f:serviceAccountName: {}
        f:terminationGracePeriodSeconds: {}
        f:tolerations: {}
        f:volumes:
          .: {}
          k:{"name":"tmp-dir"}:
            .: {}
            f:emptyDir: {}
            f:name: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-08-16T23:00:42Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:cni.projectcalico.org/podIP: {}
          f:cni.projectcalico.org/podIPs: {}
    manager: calico
    operation: Update
    time: "2021-08-16T23:00:47Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          k:{"type":"ContainersReady"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"Initialized"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Ready"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
        f:containerStatuses: {}
        f:hostIP: {}
        f:phase: {}
        f:podIP: {}
        f:podIPs:
          .: {}
          k:{"ip":"10.42.4.3"}:
            .: {}
            f:ip: {}
        f:startTime: {}
    manager: kubelet
    operation: Update
    time: "2021-08-16T23:00:54Z"
  name: metrics-server-5b6d79d4f4-ggl57
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: metrics-server-5b6d79d4f4
    uid: fb15b257-4a9d-478b-b461-8b61c165e3db
  resourceVersion: "5775"
  selfLink: /api/v1/namespaces/kube-system/pods/metrics-server-5b6d79d4f4-ggl57
  uid: af8d4e07-aa3f-4efe-8169-feb37cfd97df
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: beta.kubernetes.io/os
            operator: NotIn
            values:
            - windows
          - key: node-role.kubernetes.io/worker
            operator: Exists
  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=4443
    - --kubelet-insecure-tls
    - --kubelet-preferred-address-types=InternalIP
    - --logtostderr
    image: 192.168.0.35:5000/rancher/metrics-server:v0.3.6
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /livez
        port: https
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: metrics-server
    ports:
    - containerPort: 4443
      name: https
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /readyz
        port: https
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    securityContext:
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: tmp-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: metrics-server-token-78b6h
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: 192.168.0.59
  preemptionPolicy: PreemptLowerPriority
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: metrics-server
  serviceAccountName: metrics-server
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    operator: Exists
  - effect: NoSchedule
    operator: Exists
  volumes:
  - emptyDir: {}
    name: tmp-dir
  - name: metrics-server-token-78b6h
    secret:
      defaultMode: 420
      secretName: metrics-server-token-78b6h
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-08-16T23:00:43Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-08-16T23:00:43Z"
    message: 'containers with unready status: [metrics-server]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-08-16T23:00:43Z"
    message: 'containers with unready status: [metrics-server]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-08-16T23:00:43Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://344c587a7edd3abed035c12bfc16b9dbd0da3f26ba9101aa246bf4793648d380
    image: 192.168.0.35:5000/rancher/metrics-server:v0.3.6
    imageID: docker-pullable://192.168.0.35:5000/rancher/metrics-server@sha256:c9c4e95068b51d6b33a9dccc61875df07dc650abbf4ac1a19d58b4628f89288b
    lastState:
      terminated:
        containerID: docker://e28b6812965786cd2f520a20dd2adf6cbe9c6a720de905ce16992ed0f4cd7c9e
        exitCode: 2
        finishedAt: "2021-08-16T23:21:47Z"
        reason: Error
        startedAt: "2021-08-16T23:21:18Z"
    name: metrics-server
    ready: false
    restartCount: 12
    started: true
    state:
      running:
        startedAt: "2021-08-16T23:26:52Z"
  hostIP: 192.168.0.59
  phase: Running
  podIP: 10.42.4.3
  podIPs:
  - ip: 10.42.4.3
  qosClass: BestEffort
  startTime: "2021-08-16T23:00:43Z"
[rke@rke19-master1 ~]$ kubectl describe pods metrics-server-5b6d79d4f4-ggl57 -n kube-system
Name:                 metrics-server-5b6d79d4f4-ggl57
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 192.168.0.59/192.168.0.59
Start Time:           Tue, 17 Aug 2021 00:00:43 +0100
Labels:               k8s-app=metrics-server
                      pod-template-hash=5b6d79d4f4
Annotations:          cni.projectcalico.org/podIP: 10.42.4.3/32
                      cni.projectcalico.org/podIPs: 10.42.4.3/32
Status:               Running
IP:                   10.42.4.3
IPs:
  IP:           10.42.4.3
Controlled By:  ReplicaSet/metrics-server-5b6d79d4f4
Containers:
  metrics-server:
    Container ID:  docker://74ea122709aefc07b89dcbd3514e86fdff9874627b87413571d1624a55c32baa
    Image:         192.168.0.35:5000/rancher/metrics-server:v0.3.6
    Image ID:      docker-pullable://192.168.0.35:5000/rancher/metrics-server@sha256:c9c4e95068b51d6b33a9dccc61875df07dc650abbf4ac1a19d58b4628f89288b
    Port:          4443/TCP
    Host Port:     0/TCP
    Args:
      --cert-dir=/tmp
      --secure-port=4443
      --kubelet-insecure-tls
      --kubelet-preferred-address-types=InternalIP
      --logtostderr
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 17 Aug 2021 00:27:18 +0100
      Finished:     Tue, 17 Aug 2021 00:27:47 +0100
    Ready:          False
    Restart Count:  13
    Liveness:       http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get https://:https/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /tmp from tmp-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from metrics-server-token-78b6h (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  metrics-server-token-78b6h:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-server-token-78b6h
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     :NoExecute op=Exists
                 :NoSchedule op=Exists
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  28m                   default-scheduler  Successfully assigned kube-system/metrics-server-5b6d79d4f4-ggl57 to 192.168.0.59
  Normal   Pulling    28m                   kubelet            Pulling image "192.168.0.35:5000/rancher/metrics-server:v0.3.6"
  Normal   Pulled     28m                   kubelet            Successfully pulled image "192.168.0.35:5000/rancher/metrics-server:v0.3.6" in 4.687484656s
  Warning  Unhealthy  28m                   kubelet            Readiness probe failed: Get "https://10.42.4.3:4443/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  28m                   kubelet            Liveness probe failed: Get "https://10.42.4.3:4443/livez": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  27m                   kubelet            Readiness probe failed: Get "https://10.42.4.3:4443/readyz": dial tcp 10.42.4.3:4443: connect: connection refused
  Warning  Unhealthy  27m (x5 over 28m)     kubelet            Readiness probe failed: HTTP probe failed with statuscode: 404
  Warning  Unhealthy  27m (x5 over 28m)     kubelet            Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    27m (x2 over 27m)     kubelet            Container metrics-server failed liveness probe, will be restarted
  Normal   Created    27m (x3 over 28m)     kubelet            Created container metrics-server
  Normal   Started    27m (x3 over 28m)     kubelet            Started container metrics-server
  Normal   Pulled     8m14s (x10 over 27m)  kubelet            Container image "192.168.0.35:5000/rancher/metrics-server:v0.3.6" already present on machine
  Warning  BackOff    3m15s (x97 over 25m)  kubelet            Back-off restarting failed container
[rke@rke19-master1 ~]$
[rke@rke19-master1 ~]$ ^C
[rke@rke19-master1 ~]$ kubectl logs metrics-server-5b6d79d4f4-ggl57 -n kube-system
I0816 23:27:20.011598 1 secure_serving.go:116] Serving securely on [::]:4443
[rke@rke19-master1 ~]$
2 Answers
The "Failed with statuscode: 404" message suggests that the probes are requesting an address that does not exist.
We can see that you are pulling tag v0.3.6 of the metrics-server image. Although the image comes from Rancher, we can assume they follow the upstream versioning.
Checking the upstream changelogs, we see that /livez and /readyz were introduced in version 0.4.0, see: https://github.com/kubernetes-sigs/metrics-server/releases/tag/v0.4.0 .
I would suggest trying to query the /healthz URL instead, which was removed in version 0.4.0. Alternatively, replace the httpGet probes with tcpSocket probes. Or: try upgrading metrics-server to the latest version?
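A minimal sketch of the first two suggestions, written as the probe section of the metrics-server Deployment's container spec. It assumes, as this answer suggests, that the v0.3.6 image still serves /healthz on the secure port, and it reuses the port name https from the pod spec in the question. Use one variant, not both.

```yaml
# Fragment of spec.template.spec.containers[0] in the metrics-server Deployment.
# Variant A (assumption: v0.3.6 serves the pre-0.4.0 /healthz endpoint).
livenessProbe:
  httpGet:
    path: /healthz
    port: https
    scheme: HTTPS
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /healthz
    port: https
    scheme: HTTPS
  periodSeconds: 10
  failureThreshold: 3
# Variant B (use instead of A): only check that the named port accepts TCP connections.
# livenessProbe:
#   tcpSocket:
#     port: https
# readinessProbe:
#   tcpSocket:
#     port: https
```

The tcpSocket variant only proves that something is listening on port 4443, so it will not catch a server that is up but unhealthy.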
I found that the probe timeouts on metrics-server are too aggressive:
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /readyz
    port: https
    scheme: HTTPS
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /livez
    port: https
    scheme: HTTPS
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
What happens is that metrics-server does return "ok" for both /livez and /readyz, but it takes more than one second to handle each request:
$ time curl -k https://SNIPPED:4443/livez
ok
real 0m3.081s
user 0m0.031s
sys 0m0.005s
$ time curl -k https://SNIPPED:4443/readyz
ok
real 0m3.206s
user 0m0.020s
sys 0m0.013s
Since 3 seconds is longer than the 1-second timeout, the pod is considered neither "live" nor "ready". I have no idea why the response takes 3 seconds, but that is the root cause of the CrashLoopBackOff.
The workaround is to raise the timeout to, say, 5 or 10 seconds. Use kubectl edit deployment metrics-server -n kube-system to change it in place.
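For illustration, this is roughly how the probe section could look after the edit. It assumes a 5-second timeout gives enough headroom over the roughly 3-second responses measured above; every other field keeps the value from the original manifest.

```yaml
# Fragment of spec.template.spec.containers[0] in the metrics-server Deployment.
# timeoutSeconds: 5 is an assumed value; the answer suggests 5 or 10 seconds.
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /livez
    port: https
    scheme: HTTPS
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5   # was 1; responses were measured at about 3 s
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /readyz
    port: https
    scheme: HTTPS
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5   # was 1
```

Because RKE deploys metrics-server as a system add-on, an in-place edit like this may be overwritten the next time rke up reapplies the add-on manifests, so it is worth re-checking the probes after each run.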