I installed Prometheus a few months ago.

When I came back to it after a long break, though, several pods were stuck in ImagePullBackOff.

controller-0:~/Jane# kubectl get po -n jane-infra-monitoring
NAME                                                      READY   STATUS             RESTARTS   AGE
alertmanager-jane-prometheus-kube-promet-alertmanager-0   2/2     Running            0          24m
jane-prometheus-grafana-5d7d5b55dd-hrdcn                  0/3     ImagePullBackOff   0          4m59s
jane-prometheus-grafana-66b98448b8-ddb7d                  3/3     Running            0          23m
jane-prometheus-kube-promet-operator-5579fff8bf-rbtcl     1/1     Running            0          23m
jane-prometheus-kube-promet-operator-6c84f4b45c-f2p89     0/1     ImagePullBackOff   0          4m59s
jane-prometheus-kube-state-metrics-59c698f9d6-zjl6f       1/1     Running            0          20m
jane-prometheus-prometheus-node-exporter-rbmgx            1/1     Running            0          10d
jane-prometheus-prometheus-node-exporter-szcnj            1/1     Running            0          157m
jane-prometheus-prometheus-node-exporter-twlwt            1/1     Running            0          24d
prometheus-jane-prometheus-kube-promet-prometheus-0       0/2     CrashLoopBackOff   0          4m51s

 

Running describe on one of the pods showed that the image could not be found.

controller-0:~/Jane# kubectl describe po -n jane-infra-monitoring jane-prometheus-grafana-5d7d5b55dd-hrdcn
Warning Failed 89s (x3 over 2m9s) kubelet, controller-1 Failed to pull image "registry.infra.jane.cluster.local:19092/quay.io/prometheus/prometheus:v2.33.1": rpc error: code = NotFound desc = failed to pull and unpack image "registry.infra.jane.cluster.local:19092/quay.io/prometheus/prometheus:v2.33.1": failed to resolve reference "registry.infra.jane.cluster.local:19092/quay.io/prometheus/prometheus:v2.33.1": registry.infra.jane.cluster.local:19092/quay.io/prometheus/prometheus:v2.33.1: not found

 

So I pulled the images from a different registry and updated the image tags in the Deployments to match.

(Related post: https://countrymouse.tistory.com/entry/docker )

 

 

However, the last pod (prometheus-jane-prometheus-kube-promet-prometheus) belonged to a StatefulSet, not a Deployment.

So I edited that as well:

controller-0:~$ kubectl edit statefulsets.apps -n jane-infra-monitoring prometheus-jane-prometheus-kube-promet-prometheus
statefulset.apps/prometheus-jane-prometheus-kube-promet-prometheus edited


No luck. As soon as I saved the edit, the image tag reverted to its original value.

So the pod still would not start.
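
A quick way to confirm the revert is to read the image straight off the StatefulSet's pod template before and after editing. This is only a sketch: `sts_images` is a hypothetical helper name, and the namespace and StatefulSet name in the example are the ones from the pod listing above.

```shell
# Hypothetical helper: print the container images in a StatefulSet's pod template.
# Usage: sts_images <namespace> <statefulset-name>
sts_images() {
  kubectl get sts "$2" -n "$1" \
    -o jsonpath='{.spec.template.spec.containers[*].image}'
}

# Example (names assumed from the pod listing above):
#   sts_images jane-infra-monitoring prometheus-jane-prometheus-kube-promet-prometheus
```

If the tag printed after `kubectl edit` is the old one again, something other than you is rewriting the StatefulSet.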

 

 

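Solution

In the end, the fix was to change the tag directly in the CR (custom resource). The StatefulSet is owned by a Prometheus custom resource, so the Prometheus Operator regenerates it from the CR and overwrites any manual edits.

1. Check the StatefulSet's owner: kind Prometheus, name jane-prometheus-kube-promet-prometheus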

controller-0:~# kubectl get sts -n jane-infra-monitoring  prometheus-jane-prometheus-kube-promet-prometheus -oyaml
  ownerReferences:
  - apiVersion: monitoring.coreos.com/v1
    blockOwnerDeletion: true
    controller: true
    kind: Prometheus
    name: jane-prometheus-kube-promet-prometheus
    uid: fb999cb7-55e2-4756-8f74-eeeba6e943c1
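
Instead of scanning the full YAML, the owner can probably be pulled out with a jsonpath query. A minimal sketch, assuming the same namespace and StatefulSet name as above; `sts_owner` is a hypothetical helper name.

```shell
# Hypothetical helper: print "Kind/name" for each owner of a StatefulSet.
# Usage: sts_owner <namespace> <statefulset-name>
sts_owner() {
  kubectl get sts "$2" -n "$1" \
    -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}{"/"}{.name}{"\n"}{end}'
}

# Example (names assumed from the output above):
#   sts_owner jane-infra-monitoring prometheus-jane-prometheus-kube-promet-prometheus
```

For the StatefulSet in this post, the ownerReferences shown above suggest this would print the Prometheus kind together with the CR name.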

 

2. Edit the CR: change the image tag
# kubectl edit Prometheus  -n jane-infra-monitoring

...
spec:
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: jane-prometheus-kube-promet-alertmanager
      namespace: jane-infra-monitoring
      pathPrefix: /
      port: http-web
  enableAdminAPI: false
  externalUrl: http://jane-prometheus-kube-promet-prometheus.jane-infra-monitoring:9090
  image: registry.infra.jane.cluster.local:19092/quay.io/prometheus/prometheus:v2.22.1
  imagePullSecrets:
  - name: registrykey
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      release: jane-prometheus
  portName: http-web
  probeNamespaceSelector: {}
  probeSelector:
    matchLabels:
      release: jane-prometheus
  replicas: 1
  retention: 10d
  routePrefix: /
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      release: jane-prometheus
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: jane-prometheus-kube-promet-prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: jane-prometheus
  shards: 1
  version: v2.22.1   # changed here
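
The same change can likely be made non-interactively with `kubectl patch` rather than `kubectl edit`. The sketch below is an assumption-laden example, not the method used in this post: `build_patch` and `set_prometheus_tag` are hypothetical helper names, and it assumes that `spec.image` and `spec.version` should both carry the same tag, as they do in the CR above. A merge patch is used because strategic merge patches are not supported for custom resources.

```shell
# Hypothetical helper: build a JSON merge patch that sets image and version.
# Usage: build_patch <image-repository> <tag>
build_patch() {
  printf '{"spec":{"image":"%s:%s","version":"%s"}}' "$1" "$2" "$2"
}

# Hypothetical helper: apply the patch to a Prometheus custom resource.
# Usage: set_prometheus_tag <namespace> <cr-name> <image-repository> <tag>
set_prometheus_tag() {
  kubectl patch prometheus "$2" -n "$1" --type merge -p "$(build_patch "$3" "$4")"
}

# Example (values assumed from the CR above):
#   set_prometheus_tag jane-infra-monitoring jane-prometheus-kube-promet-prometheus \
#     registry.infra.jane.cluster.local:19092/quay.io/prometheus/prometheus v2.22.1
```

Once the CR changes, the operator should regenerate the StatefulSet with the new tag, so no separate StatefulSet edit is needed.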

 
