Bug #2494

Instabilities on maine's main network card

Added by Quentin Gibeaux almost 7 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
14/06/2017
Due date:
% Done:

0%

Estimated time:

Description

I launched a few backups this evening, and maine stopped responding to ping…
After a hardware reset, I found this in the logs:

Jun 14 21:17:17 maine kernel: [27713.793425] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jun 14 21:17:17 maine kernel: [27713.793425]   TDH                  <64>
Jun 14 21:17:17 maine kernel: [27713.793425]   TDT                  <71>
Jun 14 21:17:17 maine kernel: [27713.793425]   next_to_use          <71>
Jun 14 21:17:17 maine kernel: [27713.793425]   next_to_clean        <5f>
Jun 14 21:17:17 maine kernel: [27713.793425] buffer_info[next_to_clean]:
Jun 14 21:17:17 maine kernel: [27713.793425]   time_stamp           <100689089>
Jun 14 21:17:17 maine kernel: [27713.793425]   next_to_watch        <64>
Jun 14 21:17:17 maine kernel: [27713.793425]   jiffies              <100689350>
Jun 14 21:17:17 maine kernel: [27713.793425]   next_to_watch.status <0>
Jun 14 21:17:17 maine kernel: [27713.793425] MAC Status             <80083>
Jun 14 21:17:17 maine kernel: [27713.793425] PHY Status             <796d>
Jun 14 21:17:17 maine kernel: [27713.793425] PHY 1000BASE-T Status  <3800>
Jun 14 21:17:17 maine kernel: [27713.793425] PHY Extended Status    <3000>
Jun 14 21:17:17 maine kernel: [27713.793425] PCI Status             <10>
Jun 14 21:17:20 maine kernel: [27716.032971] ------------[ cut here ]------------
Jun 14 21:17:20 maine kernel: [27716.032998] WARNING: CPU: 0 PID: 0 at /build/linux-cKfWAz/linux-4.9.30/net/sched/sch_generic.c:316 dev_watchdog+0x22d/0x230
Jun 14 21:17:20 maine kernel: [27716.033002] NETDEV WATCHDOG: enp0s31f6 (e1000e): transmit queue 0 timed out
Jun 14 21:17:20 maine kernel: [27716.033006] Modules linked in: vhost_net vhost macvtap macvlan ebtable_nat drbd ip6t_REJECT nf_reject_ipv6 xt_comment xt_nat xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables xt_multiport iptable_filter intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm hci_uart btbcm btqca irqbypass btintel bluetooth evdev intel_lpss_acpi intel_lpss mfd_core mei_me mei iTCO_wdt serio_raw crct10dif_pclmul crc32_pclmul rfkill i915 ghash_clmulni_intel iTCO_vendor_support drm_kms_helper drm sg shpchp i2c_algo_bit video acpi_als kfifo_buf pcspkr industrialio wmi button acpi_pad lru_cache ip_tables
Jun 14 21:17:20 maine kernel: [27716.033097]  x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear dm_mod raid1 md_mod sd_mod crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ahci libahci xhci_pci psmouse libata xhci_hcd i2c_i801 i2c_smbus scsi_mod e1000e usbcore usb_common ptp pps_core i2c_hid hid [last unloaded: drbd]
Jun 14 21:17:20 maine kernel: [27716.033158] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-1
Jun 14 21:17:20 maine kernel: [27716.033161] Hardware name: FUJITSU  /D3401-H2, BIOS V5.0.0.12 R1.5.0 for D3401-H2x                     02/27/2017
Jun 14 21:17:20 maine kernel: [27716.033165]  0000000000000000 ffffffff98f28634 ffff930aee403e30 0000000000000000
Jun 14 21:17:20 maine kernel: [27716.033171]  ffffffff98c76eae 0000000000000000 ffff930aee403e88 ffff930ac4b34000
Jun 14 21:17:20 maine kernel: [27716.033177]  0000000000000000 ffff930ac5266e80 0000000000000001 ffffffff98c76f2f
Jun 14 21:17:20 maine kernel: [27716.033183] Call Trace:
Jun 14 21:17:20 maine kernel: [27716.033186]  <IRQ> 
Jun 14 21:17:20 maine kernel: [27716.033193]  [<ffffffff98f28634>] ? dump_stack+0x5c/0x78
Jun 14 21:17:20 maine kernel: [27716.033200]  [<ffffffff98c76eae>] ? __warn+0xbe/0xe0
Jun 14 21:17:20 maine kernel: [27716.033206]  [<ffffffff98c76f2f>] ? warn_slowpath_fmt+0x5f/0x80
Jun 14 21:17:20 maine kernel: [27716.033213]  [<ffffffff9912998d>] ? dev_watchdog+0x22d/0x230
Jun 14 21:17:20 maine kernel: [27716.033217]  [<ffffffff99129760>] ? qdisc_rcu_free+0x40/0x40
Jun 14 21:17:20 maine kernel: [27716.033224]  [<ffffffff98ce3e80>] ? call_timer_fn+0x30/0x110
Jun 14 21:17:20 maine kernel: [27716.033234]  [<ffffffff98ce43be>] ? run_timer_softirq+0x1ce/0x420
Jun 14 21:17:20 maine kernel: [27716.033240]  [<ffffffff98cb3953>] ? rebalance_domains+0x253/0x2b0
Jun 14 21:17:20 maine kernel: [27716.033246]  [<ffffffff99207d55>] ? __do_softirq+0x105/0x290
Jun 14 21:17:20 maine kernel: [27716.033251]  [<ffffffff98c7cf5e>] ? irq_exit+0xae/0xb0
Jun 14 21:17:20 maine kernel: [27716.033254]  [<ffffffff99207662>] ? reschedule_interrupt+0x82/0x90
Jun 14 21:17:20 maine kernel: [27716.033255]  <EOI> 
Jun 14 21:17:20 maine kernel: [27716.033263]  [<ffffffff990ca3aa>] ? cpuidle_enter_state+0x11a/0x2b0
Jun 14 21:17:20 maine kernel: [27716.033268]  [<ffffffff98cb94f4>] ? cpu_startup_entry+0x154/0x240
Jun 14 21:17:20 maine kernel: [27716.033274]  [<ffffffff99938f57>] ? start_kernel+0x443/0x463
Jun 14 21:17:20 maine kernel: [27716.033278]  [<ffffffff99938120>] ? early_idt_handler_array+0x120/0x120
Jun 14 21:17:20 maine kernel: [27716.033282]  [<ffffffff99938408>] ? x86_64_start_kernel+0x14c/0x170
Jun 14 21:17:20 maine kernel: [27716.033287] ---[ end trace ca3d4362d9132a0d ]---
Jun 14 21:17:20 maine kernel: [27716.033317] e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
Jun 14 21:17:24 maine kernel: [27720.467706] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jun 14 21:17:32 maine kernel: [27728.801335] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jun 14 21:17:32 maine kernel: [27728.801335]   TDH                  <0>
Jun 14 21:17:32 maine kernel: [27728.801335]   TDT                  <4>
Jun 14 21:17:32 maine kernel: [27728.801335]   next_to_use          <4>
Jun 14 21:17:32 maine kernel: [27728.801335]   next_to_clean        <0>
Jun 14 21:17:32 maine kernel: [27728.801335] buffer_info[next_to_clean]:
Jun 14 21:17:32 maine kernel: [27728.801335]   time_stamp           <100689f80>
Jun 14 21:17:32 maine kernel: [27728.801335]   next_to_watch        <1>
Jun 14 21:17:32 maine kernel: [27728.801335]   jiffies              <10068a1f8>
Jun 14 21:17:32 maine kernel: [27728.801335]   next_to_watch.status <0>
Jun 14 21:17:32 maine kernel: [27728.801335] MAC Status             <80083>
Jun 14 21:17:32 maine kernel: [27728.801335] PHY Status             <796d>
Jun 14 21:17:32 maine kernel: [27728.801335] PHY 1000BASE-T Status  <7800>
Jun 14 21:17:32 maine kernel: [27728.801335] PHY Extended Status    <3000>
Jun 14 21:17:32 maine kernel: [27728.801335] PCI Status             <10>
Jun 14 21:17:34 maine kernel: [27730.785311] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jun 14 21:17:34 maine kernel: [27730.785311]   TDH                  <0>
Jun 14 21:17:34 maine kernel: [27730.785311]   TDT                  <4>
Jun 14 21:17:34 maine kernel: [27730.785311]   next_to_use          <4>
Jun 14 21:17:34 maine kernel: [27730.785311]   next_to_clean        <0>
Jun 14 21:17:34 maine kernel: [27730.785311] buffer_info[next_to_clean]:
Jun 14 21:17:34 maine kernel: [27730.785311]   time_stamp           <100689f80>
Jun 14 21:17:34 maine kernel: [27730.785311]   next_to_watch        <1>
Jun 14 21:17:34 maine kernel: [27730.785311]   jiffies              <10068a3e8>
Jun 14 21:17:34 maine kernel: [27730.785311]   next_to_watch.status <0>
Jun 14 21:17:34 maine kernel: [27730.785311] MAC Status             <80083>
Jun 14 21:17:34 maine kernel: [27730.785311] PHY Status             <796d>
Jun 14 21:17:34 maine kernel: [27730.785311] PHY 1000BASE-T Status  <7800>
Jun 14 21:17:34 maine kernel: [27730.785311] PHY Extended Status    <3000>
Jun 14 21:17:34 maine kernel: [27730.785311] PCI Status             <10>
Jun 14 21:17:36 maine kernel: [27732.801299] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jun 14 21:17:36 maine kernel: [27732.801299]   TDH                  <0>
Jun 14 21:17:36 maine kernel: [27732.801299]   TDT                  <4>
Jun 14 21:17:36 maine kernel: [27732.801299]   next_to_use          <4>
Jun 14 21:17:36 maine kernel: [27732.801299]   next_to_clean        <0>
Jun 14 21:17:36 maine kernel: [27732.801299] buffer_info[next_to_clean]:
Jun 14 21:17:36 maine kernel: [27732.801299]   time_stamp           <100689f80>
Jun 14 21:17:36 maine kernel: [27732.801299]   next_to_watch        <1>
Jun 14 21:17:36 maine kernel: [27732.801299]   jiffies              <10068a5e0>
Jun 14 21:17:36 maine kernel: [27732.801299]   next_to_watch.status <0>
Jun 14 21:17:36 maine kernel: [27732.801299] MAC Status             <80083>
Jun 14 21:17:36 maine kernel: [27732.801299] PHY Status             <796d>
Jun 14 21:17:36 maine kernel: [27732.801299] PHY 1000BASE-T Status  <7800>
Jun 14 21:17:36 maine kernel: [27732.801299] PHY Extended Status    <3000>
Jun 14 21:17:36 maine kernel: [27732.801299] PCI Status             <10>
(…) block repeated indefinitely
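
A note for reading these dumps: as far as I understand the e1000e driver, `time_stamp` is the jiffies value recorded when the buffer was queued for transmission, and `jiffies` is the kernel tick counter at the time of the dump, so their difference shows how long the oldest pending transmit had been waiting. Below is a minimal Python sketch that extracts that difference from a saved log. The function name `stalled_jiffies` and the trimmed `sample` are illustrative and not part of the original report.

```python
import re

# Matches the "time_stamp" and "jiffies" field lines of an e1000e hang dump.
FIELD = re.compile(r"\]\s+(time_stamp|jiffies)\s+<([0-9a-f]+)>")


def stalled_jiffies(log_text):
    """Return jiffies - time_stamp for each hang dump found in log_text."""
    stalls = []
    stamp = None
    for line in log_text.splitlines():
        m = FIELD.search(line)
        if not m:
            continue
        name, value = m.group(1), int(m.group(2), 16)  # values are hexadecimal
        if name == "time_stamp":
            stamp = value
        elif stamp is not None:
            stalls.append(value - stamp)
            stamp = None
    return stalls


# Field lines copied from the first two dumps in this report.
sample = """\
Jun 14 21:17:17 maine kernel: [27713.793425]   time_stamp           <100689089>
Jun 14 21:17:17 maine kernel: [27713.793425]   next_to_watch        <64>
Jun 14 21:17:17 maine kernel: [27713.793425]   jiffies              <100689350>
Jun 14 21:17:32 maine kernel: [27728.801335]   time_stamp           <100689f80>
Jun 14 21:17:32 maine kernel: [27728.801335]   next_to_watch        <1>
Jun 14 21:17:32 maine kernel: [27728.801335]   jiffies              <10068a1f8>
"""

print(stalled_jiffies(sample))  # prints [711, 632]
```

Converting these values to seconds depends on the kernel's HZ setting, which the report does not state. The dumps at 21:17:32 and 21:17:34 are about 1.98 s apart and their `jiffies` values differ by 0x1f0 (496), which would be consistent with HZ=250.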

There seems to be a problem with the network card and/or its driver…
https://serverfault.com/questions/616485/e1000e-reset-adapter-unexpectedly-detected-hardware-unit-hang
There they recommend disabling the TSO option (I don't know what it does), so I'm trying that…

History

#1

Updated by Quentin Gibeaux almost 7 years ago

  • Priority changed from Normal to High
#2

Updated by Quentin Gibeaux almost 7 years ago

  • Status changed from New to Resolved

There have been no further problems since TSO was disabled (along with GSO and GRO):
ethtool -K enp0s31f6 gso off gro off tso off
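
Settings applied with `ethtool -K` are runtime-only and do not survive a reboot, so it can be worth checking afterwards that the three offloads are still off. Below is a minimal Python sketch that parses the text printed by `ethtool -k enp0s31f6` (lowercase `-k` lists the features). The helper name `offloads_still_on` and the `sample` text are hypothetical, written for illustration and not captured from maine.

```python
import re

# Features that the command in this issue switches off (tso, gso, gro).
WANTED_OFF = (
    "tcp-segmentation-offload",
    "generic-segmentation-offload",
    "generic-receive-offload",
)


def offloads_still_on(ethtool_k_output):
    """Return the names in WANTED_OFF that the given output reports as 'on'."""
    state = {}
    for line in ethtool_k_output.splitlines():
        m = re.match(r"\s*([\w-]+):\s+(on|off)\b", line)
        if m:
            state[m.group(1)] = m.group(2)
    return [name for name in WANTED_OFF if state.get(name) == "on"]


# Hypothetical output, shaped like what `ethtool -k` prints.
sample = """\
Features for enp0s31f6:
tcp-segmentation-offload: off
\ttx-tcp-segmentation: off
generic-segmentation-offload: off
generic-receive-offload: on
"""

print(offloads_still_on(sample))  # prints ['generic-receive-offload']
```

On the real host the input could come from `subprocess.run(["ethtool", "-k", "enp0s31f6"], capture_output=True, text=True).stdout`; an empty result would mean none of the three offloads has been re-enabled.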

#3

Updated by Christian P. Momon over 4 years ago

  • Assignee set to Quentin Gibeaux
#4

Updated by Quentin Gibeaux over 4 years ago

  • Status changed from Resolved to Closed
#5

Updated by Christian P. Momon over 4 years ago

  • Project changed from Chapril to Infra Chapril
