Anomalie #4447
ferméCoupures des connexions au service Mumble April
0%
Description
Ce jeudi 30 avril 2020 à 11h, avait lieu une réunion d'adhérents sur le Mumble April, moins d'une dizaine de personnes.
Entre 10h59 et 11h09 (à peu prêt), constat de déconnexions répétées.
La solution a été de migrer vers le Mumble Chapril.
Demande : essayer de voir d'où c'est venu.
Fichiers
Mis à jour par Christian P. Momon il y a plus de 4 ans
Sur Icinga1, section « Event log » (merci PoluX) :
Service Critical[2020-04-30 09:58:40] SERVICE ALERT: drupal6;Check Web april.org: https form action;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds Service Ok [2020-04-30 09:59:32] SERVICE ALERT: drupal6;Check Web april.org: https form action;OK;SOFT;2;HTTP OK: Status line output matched "200" - 48922 bytes in 0.149 second response time April 30, 2020 09:00 Service Warning [2020-04-30 10:02:20] SERVICE ALERT: drupal6;LOAD;WARNING;SOFT;1;WARNING - load average: 6.18, 6.85, 3.65 Service Warning [2020-04-30 10:03:21] SERVICE ALERT: drupal6;LOAD;WARNING;SOFT;2;WARNING - load average: 2.27, 5.60, 3.42 Service Ok [2020-04-30 10:04:20] SERVICE ALERT: drupal6;LOAD;OK;SOFT;3;OK - load average: 0.98, 4.62, 3.22 Service Warning [2020-04-30 10:28:10] SERVICE ALERT: calamus;LOAD;WARNING;SOFT;1;WARNING - load average: 5.32, 3.62, 2.87 Service Ok [2020-04-30 10:29:10] SERVICE ALERT: calamus;LOAD;OK;SOFT;2;OK - load average: 4.29, 3.67, 2.94 Service Critical[2020-04-30 10:29:20] SERVICE ALERT: drupal6;LOAD;CRITICAL;SOFT;1;CRITICAL - load average: 11.26, 9.59, 5.03 Service Warning [2020-04-30 10:30:01] SERVICE ALERT: virola;IOWAIT DRBD0;WARNING;SOFT;1;WARNING - iowait stats iowait=19.38 Service Critical[2020-04-30 10:30:20] SERVICE ALERT: bastion;Check Web Secure Redirection wiki.april.org;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds Service Critical[2020-04-30 10:30:20] SERVICE ALERT: bastion;Check Web Redirection wiki.april.org;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds Service Warning [2020-04-30 10:30:20] SERVICE ALERT: drupal6;LOAD;WARNING;SOFT;2;WARNING - load average: 5.83, 8.30, 4.87 Service Ok [2020-04-30 10:31:01] SERVICE ALERT: virola;IOWAIT DRBD0;OK;SOFT;2;OK - iowait stats iowait=7.09 Service Ok [2020-04-30 10:31:10] SERVICE ALERT: bastion;Check Web Secure Redirection wiki.april.org;OK;SOFT;2;HTTP OK: Status line output matched "301,Location: https://wiki.april.org/w/Accueil" - 446 bytes in 0.075 second response time Service Ok [2020-04-30 10:31:11] SERVICE ALERT: bastion;Check Web Redirection wiki.april.org;OK;SOFT;2;HTTP OK: Status line output matched "301,Location: https://wiki.april.org/w/Accueil" - 446 bytes in 0.079 second response time Service Warning [2020-04-30 10:31:20] SERVICE ALERT: drupal6;LOAD;WARNING;SOFT;3;WARNING - load average: 3.25, 7.12, 4.68 Service Warning [2020-04-30 10:32:21] SERVICE ALERT: drupal6;LOAD;WARNING;HARD;4;WARNING - load average: 2.18, 6.10, 4.47 Service Critical[2020-04-30 10:42:20] SERVICE ALERT: drupal6;LOAD;CRITICAL;HARD;4;CRITICAL - load average: 13.45, 7.67, 5.59 Service Warning [2020-04-30 10:43:31] SERVICE ALERT: calamus;IOWAIT DRBD1;WARNING;SOFT;1;WARNING - iowait stats iowait=20.33 Service Ok [2020-04-30 10:44:32] SERVICE ALERT: calamus;IOWAIT DRBD1;OK;SOFT;2;OK - iowait stats iowait=13.59 Service Critical[2020-04-30 10:46:01] SERVICE ALERT: virola;IOWAIT DRBD0;CRITICAL;SOFT;1;CRITICAL - iowait stats iowait=45.66 Service Ok [2020-04-30 10:47:01] SERVICE ALERT: virola;IOWAIT DRBD0;OK;SOFT;2;OK - iowait stats iowait=7.27 Service Ok [2020-04-30 10:47:20] SERVICE ALERT: drupal6;LOAD;OK;HARD;4;OK - load average: 1.19, 3.88, 4.57 Service Warning [2020-04-30 10:49:32] SERVICE ALERT: calamus;IOWAIT DRBD1;WARNING;SOFT;1;WARNING - iowait stats iowait=21.91 Service Ok [2020-04-30 10:50:31] SERVICE ALERT: calamus;IOWAIT DRBD1;OK;SOFT;2;OK - iowait stats iowait=7.57 Auto-save retention data[2020-04-30 10:51:59] Auto-save of retention data completed successfully. Service Warning [2020-04-30 10:55:31] SERVICE ALERT: calamus;IOWAIT DRBD1;WARNING;SOFT;1;WARNING - iowait stats iowait=16.49 Service Ok [2020-04-30 10:56:32] SERVICE ALERT: calamus;IOWAIT DRBD1;OK;SOFT;2;OK - iowait stats iowait=10.33 Service Critical[2020-04-30 10:57:01] SERVICE ALERT: virola;IOWAIT DRBD0;CRITICAL;SOFT;1;CRITICAL - iowait stats iowait=39.41 Service Ok [2020-04-30 10:58:01] SERVICE ALERT: virola;IOWAIT DRBD0;OK;SOFT;2;OK - iowait stats iowait=12.41 Service Warning [2020-04-30 10:58:51] SERVICE ALERT: virola;LOAD;WARNING;SOFT;1;WARNING - load average: 8.37, 5.66, 4.16 Service Critical[2020-04-30 10:59:11] SERVICE ALERT: calamus;LOAD;CRITICAL;SOFT;1;CRITICAL - load average: 11.29, 5.84, 4.28 Service Critical[2020-04-30 10:59:51] SERVICE ALERT: virola;LOAD;CRITICAL;SOFT;2;CRITICAL - load average: 12.54, 7.53, 4.90 April 30, 2020 10:00 Service Warning [2020-04-30 11:00:12] SERVICE ALERT: calamus;LOAD;WARNING;SOFT;2;WARNING - load average: 9.23, 6.21, 4.50 Service Critical[2020-04-30 11:00:50] SERVICE ALERT: virola;LOAD;CRITICAL;SOFT;3;CRITICAL - load average: 12.68, 8.43, 5.38 Service Warning [2020-04-30 11:01:11] SERVICE ALERT: calamus;LOAD;WARNING;SOFT;3;WARNING - load average: 9.34, 6.85, 4.84 Service Critical[2020-04-30 11:01:31] SERVICE ALERT: calamus;IOWAIT DRBD1;CRITICAL;SOFT;1;CRITICAL - iowait stats iowait=45.59 Service Critical[2020-04-30 11:01:50] SERVICE ALERT: virola;LOAD;CRITICAL;HARD;4;CRITICAL - load average: 11.58, 8.98, 5.77 Service Warning [2020-04-30 11:02:10] SERVICE ALERT: calamus;LOAD;WARNING;HARD;4;WARNING - load average: 7.54, 6.88, 4.97 Service Critical[2020-04-30 11:02:31] SERVICE ALERT: calamus;IOWAIT DRBD1;CRITICAL;SOFT;2;CRITICAL - iowait stats iowait=34.38 Service Critical[2020-04-30 11:03:01] SERVICE ALERT: virola;IOWAIT DRBD0;CRITICAL;SOFT;1;CRITICAL - iowait stats iowait=75.85 Service Critical[2020-04-30 11:03:31] SERVICE ALERT: calamus;IOWAIT DRBD1;CRITICAL;SOFT;3;CRITICAL - iowait stats iowait=39.13 Service Warning [2020-04-30 11:04:01] SERVICE ALERT: mail;Check SRS procs;WARNING;SOFT;1;PROCS WARNING: 19 processes with command name 'postsrsd' Service Ok [2020-04-30 11:04:01] SERVICE ALERT: virola;IOWAIT DRBD0;OK;SOFT;2;OK - iowait stats iowait=6.06 Service Critical[2020-04-30 11:04:32] SERVICE ALERT: calamus;IOWAIT DRBD1;CRITICAL;HARD;4;CRITICAL - iowait stats iowait=46.75 Service Warning [2020-04-30 11:05:00] SERVICE ALERT: mail;Check SRS procs;WARNING;SOFT;2;PROCS WARNING: 19 processes with command name 'postsrsd' Service Critical[2020-04-30 11:05:11] SERVICE ALERT: bastion;Check Secure Web libre-en-fete.net;CRITICAL;SOFT;1;HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/1.1 429 Too Many Requests Service Ok [2020-04-30 11:06:00] SERVICE ALERT: mail;Check SRS procs;OK;SOFT;3;PROCS OK: 8 processes with command name 'postsrsd' Service Critical[2020-04-30 11:06:10] SERVICE ALERT: bastion;Check Secure Web libre-en-fete.org;CRITICAL;SOFT;1;HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/1.1 429 Too Many Requests Service Ok [2020-04-30 11:06:11] SERVICE ALERT: bastion;Check Secure Web libre-en-fete.net;OK;SOFT;2;HTTP OK: Status line output matched "200" - 15330 bytes in 0.067 second response time Service Ok [2020-04-30 11:07:10] SERVICE ALERT: bastion;Check Secure Web libre-en-fete.org;OK;SOFT;2;HTTP OK: Status line output matched "200" - 15330 bytes in 0.122 second response time Service Ok [2020-04-30 11:09:32] SERVICE ALERT: calamus;IOWAIT DRBD1;OK;HARD;4;OK - iowait stats iowait=5.76 Service Warning [2020-04-30 11:11:50] SERVICE ALERT: virola;LOAD;WARNING;HARD;4;WARNING - load average: 2.85, 5.53, 6.23 Service Ok [2020-04-30 11:12:10] SERVICE ALERT: calamus;LOAD;OK;HARD;4;OK - load average: 2.88, 4.51, 4.94 Service Critical[2020-04-30 11:13:22] SERVICE ALERT: calamus;Check if Fail2ban DB is purged;CRITICAL;SOFT;1;CRITICAL : sqlite returned an error Service Ok [2020-04-30 11:14:20] SERVICE ALERT: calamus;Check if Fail2ban DB is purged;OK;SOFT;2;OK : database is purged Service Warning [2020-04-30 11:19:01] SERVICE ALERT: virola;IOWAIT DRBD0;WARNING;SOFT;1;WARNING - iowait stats iowait=16.07 Service Warning [2020-04-30 11:20:01] SERVICE ALERT: virola;IOWAIT DRBD0;WARNING;SOFT;2;WARNING - iowait stats iowait=29.51 Service Warning [2020-04-30 11:21:00] SERVICE ALERT: mail;Check SRS procs;WARNING;SOFT;1;PROCS WARNING: 22 processes with command name 'postsrsd' Service Warning [2020-04-30 11:21:01] SERVICE ALERT: virola;IOWAIT DRBD0;WARNING;SOFT;3;WARNING - iowait stats iowait=18.95 Service Warning [2020-04-30 11:22:01] SERVICE ALERT: mail;Check SRS procs;WARNING;SOFT;2;PROCS WARNING: 22 processes with command name 'postsrsd' Service Ok [2020-04-30 11:22:01] SERVICE ALERT: virola;IOWAIT DRBD0;OK;SOFT;4;OK - iowait stats iowait=13.96 Service Warning [2020-04-30 11:23:01] SERVICE ALERT: mail;Check SRS procs;WARNING;SOFT;3;PROCS WARNING: 22 processes with command name 'postsrsd' Service Warning [2020-04-30 11:24:01] SERVICE ALERT: mail;Check SRS procs;WARNING;HARD;4;PROCS WARNING: 21 processes with command name 'postsrsd' Service Ok [2020-04-30 11:26:51] SERVICE ALERT: virola;LOAD;OK;HARD;4;OK - load average: 3.44, 3.39, 4.39 Service Ok [2020-04-30 11:29:00] SERVICE ALERT: mail;Check SRS procs;OK;HARD;4;PROCS OK: 9 processes with command name 'postsrsd' Service Ok [2020-04-30 11:51:59] Auto-save of retention data completed successfully.
Mis à jour par Christian P. Momon il y a plus de 4 ans
Sur Icinga2 April :
virola.april.org Service: Ping Remote LAN WARNING 10:59:33 [ 1/5 ] PING WARNING - Packet loss = 0%, RTA = 196.49 ms
Mis à jour par Christian P. Momon il y a plus de 4 ans
- Fichier graphique-virola.png graphique-virola.png ajouté
Des graphiques de virola par Scaleway. Le pic de 11h est à comparer au pic de 21h la veille, moment de la réunion adminsys sur le même Mumble.
Mis à jour par Christian P. Momon il y a plus de 4 ans
Ce vendredi 22 mai 2020 à 11h00 pile, une petite dizaine de personnes en réunion sur le Mumble April se sont vus déconnectés, une fois.
Cela rappelle exactement le problème qui a ouvert ce ticket. Même horaire, pas le même jour.
<W>2020-05-22 10:59:58.495 1 => <146:(-1)> New connection: 90.XX.XX.XX:44080 <W>2020-05-22 10:59:59.377 1 => <146:(-1)> Client version 1.3.0 (X11: 1.3.0~git20190125.440b173+dfsg-2) <W>2020-05-22 10:59:59.514 1 => <146:lonugem(-1)> Authenticated <W>2020-05-22 11:00:17.129 1 => <146:lonugem(-1)> Moved lonugem:146(-1) to Terrasse Sud[5:0] <W>2020-05-22 11:00:17.531 1 => <147:(-1)> New connection: 91.XX.XX.XX:27936 <W>2020-05-22 11:00:19.251 1 => <138:isa(-1)> Connection closed: The remote host closed the connection [1] <W>2020-05-22 11:00:21.948 1 => <145:Kévin(-1)> Connection closed: The remote host closed the connection [1] <W>2020-05-22 11:00:22.630 1 => <146:lonugem(-1)> Connection closed: The remote host closed the connection [1] <W>2020-05-22 11:00:23.084 1 => <141:mo(-1)> Connection closed: The remote host closed the connection [1] <W>2020-05-22 11:00:23.878 1 => <139:madix(-1)> Connection closed: The remote host closed the connection [1] <W>2020-05-22 11:00:24.113 1 => <140:Christian(Cpm)(-1)> Connection closed: The remote host closed the connection [1] <W>2020-05-22 11:00:24.598 1 => <147:(-1)> Client version 1.2.19 (X11: 1.2.19-1ubuntu1) <W>2020-05-22 11:00:24.971 1 => <147:Didier(-1)> Authenticated <W>2020-05-22 11:00:26.393 1 => <148:(-1)> New connection: 93.XX.XX.XX:46384 <W>2020-05-22 11:00:27.018 1 => <149:(-1)> New connection: 90.XX.XX.XX:49896 <W>2020-05-22 11:00:27.227 1 => <148:(-1)> Client version 1.3.0 (X11: 1.3.0~git20190125.440b173+dfsg-2) <W>2020-05-22 11:00:27.441 1 => <148:isa(-1)> Authenticated <W>2020-05-22 11:00:27.693 1 => <149:(-1)> Client version 1.3.0 (X11: 1.3.0) <W>2020-05-22 11:00:27.886 1 => <149:Kévin(-1)> Authenticated
Ça serait bien de trouver d'où vient le problème.
Deux fois au même horaire, le problème est-il reproductible ?
Mis à jour par Christian P. Momon il y a plus de 4 ans
Dans bastion:/var/log/syslog, détection d'un incident au même moment :
May 22 10:59:40 bastion icinga2[510]: [2020-05-22 10:59:40 +0200] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min); May 22 10:59:50 bastion icinga2[510]: [2020-05-22 10:59:50 +0200] information/RemoteCheckQueue: items: 0, rate: 0/s (12/min 60/5min 180/15min); May 22 11:00:00 bastion icinga2[510]: [2020-05-22 11:00:00 +0200] information/RemoteCheckQueue: items: 0, rate: 0/s (6/min 30/5min 90/15min); May 22 11:00:00 bastion kernel: [1027919.996688] warn_alloc: 62 callbacks suppressed May 22 11:00:00 bastion kernel: [1027919.996692] ksoftirqd/0: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null) May 22 11:00:00 bastion kernel: [1027919.996693] ksoftirqd/0 cpuset=/ mems_allowed=0 May 22 11:00:00 bastion kernel: [1027919.996698] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.19.0-9-amd64 #1 Debian 4.19.118-2 May 22 11:00:00 bastion kernel: [1027919.996699] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 May 22 11:00:00 bastion kernel: [1027919.996700] Call Trace: May 22 11:00:00 bastion kernel: [1027919.996719] dump_stack+0x66/0x90 May 22 11:00:00 bastion kernel: [1027919.996726] warn_alloc.cold.120+0x6c/0xec May 22 11:00:00 bastion kernel: [1027919.996730] __alloc_pages_slowpath+0xc73/0xcb0 May 22 11:00:00 bastion kernel: [1027919.996733] __alloc_pages_nodemask+0x28b/0x2b0 May 22 11:00:00 bastion kernel: [1027919.996734] page_frag_alloc+0x124/0x150 May 22 11:00:00 bastion kernel: [1027919.996737] __napi_alloc_skb+0x88/0xe0 May 22 11:00:00 bastion kernel: [1027919.996740] ? dev_gro_receive+0x39b/0x660 May 22 11:00:00 bastion kernel: [1027919.996755] page_to_skb+0x62/0x330 [virtio_net] May 22 11:00:00 bastion kernel: [1027919.996758] receive_buf+0x61d/0x14c0 [virtio_net] May 22 11:00:00 bastion kernel: [1027919.996762] ? enqueue_entity+0x108/0x640 May 22 11:00:00 bastion kernel: [1027919.996766] ? vring_unmap_one+0x16/0x70 [virtio_ring] May 22 11:00:00 bastion kernel: [1027919.996768] ? detach_buf+0x69/0x110 [virtio_ring] May 22 11:00:00 bastion kernel: [1027919.996770] virtnet_poll+0x143/0x328 [virtio_net] May 22 11:00:00 bastion kernel: [1027919.996773] net_rx_action+0x149/0x3b0 May 22 11:00:00 bastion kernel: [1027919.996776] __do_softirq+0xde/0x2d8 May 22 11:00:00 bastion kernel: [1027919.996779] ? sort_range+0x20/0x20 May 22 11:00:00 bastion kernel: [1027919.996782] run_ksoftirqd+0x26/0x40 May 22 11:00:00 bastion kernel: [1027919.996785] smpboot_thread_fn+0xc5/0x160 May 22 11:00:00 bastion kernel: [1027919.996788] kthread+0x112/0x130 May 22 11:00:00 bastion kernel: [1027919.996789] ? kthread_bind+0x30/0x30 May 22 11:00:00 bastion kernel: [1027919.996791] ret_from_fork+0x35/0x40 May 22 11:00:00 bastion kernel: [1027919.996795] warn_alloc_show_mem: 9 callbacks suppressed May 22 11:00:00 bastion kernel: [1027919.996796] Mem-Info: May 22 11:00:00 bastion kernel: [1027919.996799] active_anon:3924 inactive_anon:5484 isolated_anon:0 May 22 11:00:00 bastion kernel: [1027919.996799] active_file:11966 inactive_file:10683 isolated_file:0 May 22 11:00:00 bastion kernel: [1027919.996799] unevictable:0 dirty:2253 writeback:0 unstable:0 May 22 11:00:00 bastion kernel: [1027919.996799] slab_reclaimable:10120 slab_unreclaimable:9080 May 22 11:00:00 bastion kernel: [1027919.996799] mapped:7645 shmem:454 pagetables:452 bounce:0 May 22 11:00:00 bastion kernel: [1027919.996799] free:1485 free_pcp:81 free_cma:0 May 22 11:00:00 bastion kernel: [1027919.996802] Node 0 active_anon:15696kB inactive_anon:21936kB active_file:47864kB inactive_file:42732kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:30580kB dirty:9012kB writeback:0kB shmem:1816kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no May 22 11:00:00 bastion kernel: [1027919.996803] Node 0 DMA free:2004kB min:140kB low:172kB high:204kB active_anon:372kB inactive_anon:500kB active_file:2352kB inactive_file:4644kB unevictable:0kB writepending:64kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:256kB pagetables:36kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB May 22 11:00:00 bastion kernel: [1027919.996806] lowmem_reserve[]: 0 184 184 184 184 May 22 11:00:00 bastion kernel: [1027919.996808] Node 0 DMA32 free:3936kB min:1664kB low:2080kB high:2496kB active_anon:15324kB inactive_anon:21436kB active_file:45508kB inactive_file:38088kB unevictable:0kB writepending:8940kB present:245604kB managed:220820kB mlocked:0kB kernel_stack:1680kB pagetables:1772kB bounce:0kB free_pcp:324kB local_pcp:324kB free_cma:0kB May 22 11:00:00 bastion kernel: [1027919.996811] lowmem_reserve[]: 0 0 0 0 0
Cela pourrait être un problème de mémoire insuffisante sur la machine bastion. Ticket à suivre : #4497.
Mis à jour par Christian P. Momon il y a plus de 4 ans
- Statut changé de Nouveau à En cours de traitement
- Assigné à mis à Christian P. Momon
Mis à jour par Christian P. Momon il y a plus de 4 ans
- Statut changé de En cours de traitement à Résolu
La concomitance avec le ticket #4497 est cohérente donc on va dire que c'est ça, un problème de mémoire sur la vm bastion.