Bug #1179

agir.april.org out of memory

Added by Loïc Dachary almost 9 years ago. Updated over 2 years ago.

Status: Closed
Priority: Immediate
Assignee:
Category: Task
Target version:
Start date: 01/31/2013
Due date:
% Done: 100%
Estimated time:
Spent time:
Difficulty: 2 Easy

Description

A redmine process is stuck in a loop and consumes all available RAM, triggering OOMs that affect the vserver host (a problem already identified in the past). As a consequence, a number of servers lose their network interface. The isolation between vservers is not as good as one might hope.

Fixes:

  • Restart the vservers that lost their IP, in this order: dns, nginx, lamp, mail, spamvir, harmine
  • Add an upper bound on the memory a process can take on the redmine machine, so that the runaway process eating all the RAM is stopped before it does any damage: ulimit -v 1048576 (the value is in KiB, i.e. 1 GiB of virtual address space)
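
As a minimal sketch of what the second fix does: the ulimit -v builtin takes its value in KiB, while the kernel reports the resulting limit in bytes. The function names kib_to_bytes and limited_vsz are hypothetical helpers introduced for illustration; they are not part of the configuration on the redmine machine.

```shell
#!/bin/bash
# Convert a `ulimit -v` value (expressed in KiB) to bytes.
kib_to_bytes() {
    echo $(( $1 * 1024 ))
}

# Apply the limit inside a subshell and print the limit now in effect there.
# The calling shell keeps its own limits, so this is safe to try by hand.
limited_vsz() {
    ( ulimit -v "$1" && ulimit -v )
}

kib_to_bytes 1048576    # prints 1073741824
limited_vsz 1048576 || echo "could not lower the limit" >&2
```

Because a lowered limit is inherited by child processes and cannot be raised again by an unprivileged process, setting it in the file that the service's startup script sources is enough to cover every worker it spawns.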

Files

oom.png (46.7 KB) Loïc Dachary, 01/31/2013 09:31 AM

Related issues

Related to Admins - Bug #1176: redmine has been at 100% CPU for several hours (Closed)

Related to Admins - Bug #1234: oom killer on pavot (Closed, 03/14/2013 to 03/16/2013)

History

#1

Updated by Loïc Dachary almost 9 years ago

  • File oom.png added
  • Status changed from New to In progress
  • Priority changed from Normal to Immediate
#2

Updated by Loïc Dachary almost 9 years ago

The Passenger directive that limits memory does not exist in the free software edition of this wonderful two-tier software: http://www.modrails.com/documentation/Users%20guide%20Apache.html#PassengerMemoryLimit

#3

Updated by Loïc Dachary almost 9 years ago

Before:

root@amphetamine:/proc/4024# cat limits | grep 'Max address'
Max address space         unlimited            unlimited            bytes

root@amphetamine:/etc# git show 
Author: Loic Dachary <loic@dachary.org>
Date:   Thu Jan 31 10:09:24 2013 +0100

    Limit memory to 1GB

diff --git a/default/apache2 b/default/apache2
index ffabf86..5d88e00 100644
--- a/default/apache2
+++ b/default/apache2
@@ -24,3 +24,7 @@ HTCACHECLEAN_PATH=/var/cache/apache2/mod_disk_cache
 ## -n : be nice
 ## -t : remove empty directories
 HTCACHECLEAN_OPTIONS="-n" 
+
+# make sure no process goes above 1GB of VSZ
+# https://agir.april.org/issues/1179
+ulimit -v 1048576

After:

root@amphetamine:/proc/4024# cat limits | grep 'Max address'
Max address space         1073741824           1073741824           bytes
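
The before/after check above can be wrapped in a small helper. This is only a sketch, assuming a Linux /proc filesystem; max_address_space is a hypothetical name, and the argument is a PID (such as 4024 above) or the literal word self.

```shell
#!/bin/bash
# Print the soft "Max address space" limit (in bytes, or "unlimited")
# of the process whose PID is given as the first argument.
max_address_space() {
    # Columns in /proc/<pid>/limits: name (3 words), soft limit, hard limit, unit
    awk '/^Max address space/ { print $4 }' "/proc/$1/limits"
}

# "self" resolves to the awk process itself, which inherits the limits
# of the shell that started it.
max_address_space self
```

Running it against the PID of an Apache or Passenger worker after a restart shows whether the new limit was actually inherited.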
#4

Updated by Loïc Dachary almost 9 years ago

  • Status changed from In progress to Resolved
  • % Done changed from 0 to 100
#5

Updated by Loïc Dachary almost 9 years ago

The nagios oom killer alert worked as expected:

root@pavot:~# cat /tmp/nagios_oom_killer 
Jan 31 05:18:50 pavot kernel: [4442301.095555] Out of memory: kill process postgres(5882:#32) score 178969 or a child
Jan 31 05:25:00 pavot kernel: [4442671.138448] Out of memory: kill process postgres(1488:#32) score 178357 or a child
Jan 31 05:26:41 pavot kernel: [4442772.087112] Out of memory: kill process postgres(3846:#32) score 177268 or a child
Jan 31 05:26:52 pavot kernel: [4442782.519378] Out of memory: kill process postgres(3863:#32) score 177011 or a child
Jan 31 05:27:42 pavot kernel: [4442832.994136] Out of memory: kill process postgres(3904:#32) score 176755 or a child
Jan 31 05:27:58 pavot kernel: [4442848.985565] Out of memory: kill process postgres(3923:#32) score 176762 or a child
Jan 31 05:28:14 pavot kernel: [4442864.973732] Out of memory: kill process postgres(3929:#32) score 176757 or a child
Jan 31 05:28:21 pavot kernel: [4442871.645366] Out of memory: kill process postgres(3937:#32) score 176770 or a child
Jan 31 05:28:38 pavot kernel: [4442888.684548] Out of memory: kill process postgres(3945:#32) score 176755 or a child
Jan 31 05:28:48 pavot kernel: [4442899.207214] Out of memory: kill process postgres(3947:#32) score 176434 or a child
Jan 31 05:28:59 pavot kernel: [4442909.622834] Out of memory: kill process postgres(3955:#32) score 176436 or a child
Jan 31 05:29:24 pavot kernel: [4442935.255507] Out of memory: kill process postgres(3961:#32) score 176434 or a child
Jan 31 05:29:37 pavot kernel: [4442947.554322] Out of memory: kill process postgres(3983:#32) score 176473 or a child
Jan 31 05:29:42 pavot kernel: [4442952.296553] Out of memory: kill process postgres(3987:#32) score 176503 or a child
Jan 31 05:29:42 pavot kernel: [4442952.462890] Out of memory: kill process postgres(3991:#32) score 176483 or a child
Jan 31 05:29:42 pavot kernel: [4442952.864119] Out of memory: kill process postgres(3997:#32) score 176506 or a child
Jan 31 05:29:42 pavot kernel: [4442952.975678] Out of memory: kill process postgres(3997:#32) score 176506 or a child
Jan 31 05:31:42 pavot kernel: [4443072.521801] Out of memory: kill process postgres(5184:#32) score 178200 or a child
Jan 31 05:32:13 pavot kernel: [4443104.214183] Out of memory: kill process postgres(5282:#32) score 176919 or a child
Jan 31 05:32:26 pavot kernel: [4443116.711460] Out of memory: kill process postgres(5292:#32) score 176923 or a child
Jan 31 05:32:37 pavot kernel: [4443127.964506] Out of memory: kill process postgres(5315:#32) score 176903 or a child
Jan 31 05:32:58 pavot kernel: [4443148.638188] Out of memory: kill process postgres(5334:#32) score 176864 or a child
Jan 31 05:33:08 pavot kernel: [4443159.185321] Out of memory: kill process postgres(5340:#32) score 176733 or a child
Jan 31 05:33:19 pavot kernel: [4443169.730754] Out of memory: kill process postgres(5352:#32) score 176741 or a child
Jan 31 05:33:29 pavot kernel: [4443180.049845] Out of memory: kill process postgres(5358:#32) score 176726 or a child
Jan 31 05:33:44 pavot kernel: [4443194.658865] Out of memory: kill process postgres(5365:#32) score 176725 or a child
Jan 31 05:33:50 pavot kernel: [4443201.168928] Out of memory: kill process postgres(5367:#32) score 176477 or a child
Jan 31 05:33:54 pavot kernel: [4443205.066457] Out of memory: kill process postgres(5375:#32) score 176469 or a child
Jan 31 05:34:33 pavot kernel: [4443244.183768] Out of memory: kill process postgres(5384:#32) score 176469 or a child
Jan 31 05:34:34 pavot kernel: [4443245.187935] Out of memory: kill process postgres(5399:#32) score 176469 or a child
Jan 31 05:34:35 pavot kernel: [4443245.331541] Out of memory: kill process postgres(5404:#32) score 176469 or a child
Jan 31 05:34:35 pavot kernel: [4443245.557086] Out of memory: kill process postgres(5411:#32) score 176470 or a child
Jan 31 05:34:35 pavot kernel: [4443245.661566] Out of memory: kill process postgres(5411:#32) score 176470 or a child
Jan 31 07:05:36 pavot kernel: [4448707.208880] Out of memory: kill process postgres(32131:#32) score 176480 or a child
Jan 31 07:05:37 pavot kernel: [4448707.610944] Out of memory: kill process postgres(32137:#32) score 176499 or a child
Jan 31 07:05:37 pavot kernel: [4448707.734098] Out of memory: kill process postgres(32137:#32) score 176524 or a child
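
To get an overview of a log like the one above, the kills can be counted per hour and per process name. The sketch below assumes the syslog line layout shown in this comment (timestamp in the third field, process in the twelfth); oom_summary is a hypothetical helper, not part of the nagios check.

```shell
#!/bin/bash
# Read kernel log lines on stdin and print "<hour>h <process> <count>"
# for every "Out of memory: kill process" entry.
oom_summary() {
    awk '/Out of memory: kill process/ {
        split($3, t, ":")          # $3 is the HH:MM:SS timestamp
        name = $12                 # e.g. postgres(5882:#32)
        sub(/\(.*/, "", name)      # keep only the process name
        count[t[1] "h " name]++
    }
    END { for (k in count) print k, count[k] }' | sort
}

# Usage, with the file from the transcript above:
#   oom_summary < /tmp/nagios_oom_killer
```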

#6

Updated by Loïc Dachary over 8 years ago

commit 993cec72393b9fc30c2fe3794e5f2f4be111f97a
Author: root <root@april.org>
Date:   Sat Apr 27 06:25:12 2013 +0200

    daily autocommit

diff --git a/default/apache2 b/default/apache2
index 5d88e00..38b8299 100644
--- a/default/apache2
+++ b/default/apache2
@@ -27,4 +27,4 @@ HTCACHECLEAN_OPTIONS="-n" 

 # make sure no process goes above 1GB of VSZ
 # https://agir.april.org/issues/1179
-ulimit -v 1048576
+ulimit -v 524288
#7

Updated by Quentin Gibeaux over 2 years ago

  • Status changed from Resolved to Closed
