잘 돌던 Gitlab에서 발생한 500 Error

1. 개요

gitlab에서 간헐적으로 500에러가 발생했다.

gitlab웹화면을 왔다갔다하다보면 500에러 화면이 나타났다 사라졌다 한다. 심지어 ci/cd파이프라인이 작동하고 나면 관련된 gitlab 파이프라인 페이지가 업데이트가 안됐다. (ex. 종료된 파이프라인인데 계속 실행 중인 것으로 조회되는 등..)

오늘은 관련해서 처리한 내용을 포스팅한다.


2. 원인을 찾자

로그 테이블을 마운트시켜둔 위치에서 익셉션 로그 파일을 확인한다.

$ vi logs/gitlab-rails/exceptions_json.log


2.1. 에러로그

{"severity":"ERROR","time":"2021-10-18T01:31:03.373Z","correlation_id":null,"exception.class":"IOError","exception.message":"Can't reserve 1048576 bytes for memory-mapped file in /dev/shm/gitlab/puma/histogram_puma_14-0.db","exception.backtrace":["config/initializers/7_prometheus_metrics.rb:62:in `block in \u003ctop (required)\u003e'","lib/gitlab/cluster/lifecycle_events.rb:151:in `block in call'","lib/gitlab/cluster/lifecycle_events.rb:150:in `each'","lib/gitlab/cluster/lifecycle_events.rb:150:in `call'","lib/gitlab/cluster/lifecycle_events.rb:115:in `do_worker_start'"],"user.username":null,"tags.program":"web","tags.locale":"en","tags.feature_category":null,"tags.correlation_id":null}
{"severity":"ERROR","time":"2021-10-18T01:48:32.613Z","correlation_id":null,"exception.class":"IOError","exception.message":"Can't reserve 1048576 bytes for memory-mapped file in /dev/shm/gitlab/puma/histogram_puma_9-0.db","exception.backtrace":["config/initializers/7_prometheus_metrics.rb:62:in `block in \u003ctop (required)\u003e'","lib/gitlab/cluster/lifecycle_events.rb:151:in `block in call'","lib/gitlab/cluster/lifecycle_events.rb:150:in `each'","lib/gitlab/cluster/lifecycle_events.rb:150:in `call'","lib/gitlab/cluster/lifecycle_events.rb:115:in `do_worker_start'"],"user.username":null,"tags.program":"web","tags.locale":"en","tags.feature_category":null,"tags.correlation_id":null}
{"severity":"ERROR","time":"2021-10-18T01:53:12.871Z","correlation_id":null,"exception.class":"IOError","exception.message":"Can't reserve 1048576 bytes for memory-mapped file in /dev/shm/gitlab/puma/histogram_puma_24-0.db","exception.backtrace":["config/initializers/7_prometheus_metrics.rb:62:in `block in \u003ctop (required)\u003e'","lib/gitlab/cluster/lifecycle_events.rb:151:in `block in call'","lib/gitlab/cluster/lifecycle_events.rb:150:in `each'","lib/gitlab/cluster/lifecycle_events.rb:150:in `call'","lib/gitlab/cluster/lifecycle_events.rb:115:in `do_worker_start'"],"user.username":null,"tags.program":"web","tags.locale":"en","tags.feature_category":null,"tags.correlation_id":null}
{"severity":"ERROR","time":"2021-10-18T01:56:14.267Z","correlation_id":null,"exception.class":"IOError","exception.message":"Can't reserve 1048576 bytes for memory-mapped file in /dev/shm/gitlab/puma/histogram_puma_37-0.db","exception.backtrace":["config/initializers/7_prometheus_metrics.rb:62:in `block in \u003ctop (required)\u003e'","lib/gitlab/cluster/lifecycle_events.rb:151:in `block in call'","lib/gitlab/cluster/lifecycle_events.rb:150:in `each'","lib/gitlab/cluster/lifecycle_events.rb:150:in `call'","lib/gitlab/cluster/lifecycle_events.rb:115:in `do_worker_start'"],"user.username":null,"tags.program":"web","tags.locale":"en","tags.feature_category":null,"tags.correlation_id":null}
{"severity":"ERROR","time":"2021-10-18T01:57:34.534Z","correlation_id":null,"exception.class":"IOError","exception.message":"Can't reserve 1048576 bytes for memory-mapped file in /dev/shm/gitlab/puma/histogram_puma_19-0.db","exception.backtrace":["config/initializers/7_prometheus_metrics.rb:62:in `block in \u003ctop (required)\u003e'","lib/gitlab/cluster/lifecycle_events.rb:151:in `block in call'","lib/gitlab/cluster/lifecycle_events.rb:150:in `each'","lib/gitlab/cluster/lifecycle_events.rb:150:in `call'","lib/gitlab/cluster/lifecycle_events.rb:115:in `do_worker_start'"],"user.username":null,"tags.program":"web","tags.locale":"en","tags.feature_category":null,"tags.correlation_id":null}
{"severity":"ERROR","time":"2021-10-18T01:58:16.581Z","correlation_id":null,"exception.class":"IOError","exception.message":"Can't reserve 1048576 bytes for memory-mapped file in /dev/shm/gitlab/puma/histogram_puma_29-0.db","exception.backtrace":["config/initializers/7_prometheus_metrics.rb:62:in `block in \u003ctop (required)\u003e'","lib/gitlab/cluster/lifecycle_events.rb:151:in `block in call'","lib/gitlab/cluster/lifecycle_events.rb:150:in `each'","lib/gitlab/cluster/lifecycle_events.rb:150:in `call'","lib/gitlab/cluster/lifecycle_events.rb:115:in `do_worker_start'"],"user.username":null,"tags.program":"web","tags.locale":"en","tags.feature_category":null,"tags.correlation_id":null}
{"severity":"ERROR","time":"2021-10-18T02:03:57.601Z","correlation_id":null,"exception.class":"IOError","exception.message":"Can't reserve 1048576 bytes for memory-mapped file in /dev/shm/gitlab/puma/histogram_puma_31-0.db","exception.backtrace":["config/initializers/7_prometheus_metrics.rb:62:in `block in \u003ctop (required)\u003e'","lib/gitlab/cluster/lifecycle_events.rb:151:in `block in call'","lib/gitlab/cluster/lifecycle_events.rb:150:in `each'","lib/gitlab/cluster/lifecycle_events.rb:150:in `call'","lib/gitlab/cluster/lifecycle_events.rb:115:in `do_worker_start'"],"user.username":null,"tags.program":"web","tags.locale":"en","tags.feature_category":null,"tags.correlation_id":null}
{"severity":"ERROR","time":"2021-10-18T02:07:02.135Z","correlation_id":null,"exception.class":"IOError","exception.message":"Can't reserve 1048576 bytes for memory-mapped file in /dev/shm/gitlab/puma/histogram_puma_58-0.db","exception.backtrace":["config/initializers/7_prometheus_metrics.rb:62:in `block in \u003ctop (required)\u003e'","lib/gitlab/cluster/lifecycle_events.rb:151:in `block in call'","lib/gitlab/cluster/lifecycle_events.rb:150:in `each'","lib/gitlab/cluster/lifecycle_events.rb:150:in `call'","lib/gitlab/cluster/lifecycle_events.rb:115:in `do_worker_start'"],"user.username":null,"tags.program":"web","tags.locale":"en","tags.feature_category":null,"tags.correlation_id":null}

위와 같이 로그가 남아있었는데, 가장 의심가는 내용은 해당 로그였다.

Can't reserve 1048576 bytes for memory-mapped file in /dev/shm/gitlab/puma/histogram_puma_14-0.db

로그 내용에 따르면, /dev/shm/gitlab 하위 디렉토리의 파일에 데이터를 write하지 못한 듯 하다.


3. 해결

gitlab 이슈에서 관련된 레퍼런스를 찾을 수 있었다.

gitlab 도커이미지는 그라파나 및 프로메테우스 같은 오픈소스를 내장해 모니터링 기능을 제공한다. 기본 설정은 해당 모니터링 기능이 켜져있는데, 주기적으로 프로메테우스 메트릭을 저장하다가 발생한 이슈였다.

gitlab을 재기동하면 임시적으로 정상화되나 시간이 지나면 결국 재발생하는 이슈로, 모니터링 기능을 끄거나 shm사이즈를 늘려주어야 한다.

따라서 아래 두가지 방법 중 하나로 처리 가능한데, 나같은 경우 아예 모니터링 기능을 비활성화해 해결했다.


3.1. docker-compose수정

gitlab:
  image: 'gitlab/gitlab-ee:latest'
  restart: always
  hostname: 'localhost'
  shm_size: '2gb'

위와 같이 shm_size사이즈를 지정해준다.


3.2. 프로메테우스 disable

Root계정으로 접속한 후, Admin Area > Settings > Metrics and profiling > Metrics - Prometheus에서 아래와 같이 프로메테우스 메트릭을 비활성화한다.

비활성화 후에는 gitlab을 재기동시켜주어야 한다.

grafana.png