Bug #2494
Instability of maine's primary network card
Start date:
14/06/2017
Due date:
% Done:
0%
Estimated time:
Description
I ran a few backups this evening, and maine stopped responding to ping…
After a hardware reset, I found this in the logs:
Jun 14 21:17:17 maine kernel: [27713.793425] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jun 14 21:17:17 maine kernel: [27713.793425] TDH <64>
Jun 14 21:17:17 maine kernel: [27713.793425] TDT <71>
Jun 14 21:17:17 maine kernel: [27713.793425] next_to_use <71>
Jun 14 21:17:17 maine kernel: [27713.793425] next_to_clean <5f>
Jun 14 21:17:17 maine kernel: [27713.793425] buffer_info[next_to_clean]:
Jun 14 21:17:17 maine kernel: [27713.793425] time_stamp <100689089>
Jun 14 21:17:17 maine kernel: [27713.793425] next_to_watch <64>
Jun 14 21:17:17 maine kernel: [27713.793425] jiffies <100689350>
Jun 14 21:17:17 maine kernel: [27713.793425] next_to_watch.status <0>
Jun 14 21:17:17 maine kernel: [27713.793425] MAC Status <80083>
Jun 14 21:17:17 maine kernel: [27713.793425] PHY Status <796d>
Jun 14 21:17:17 maine kernel: [27713.793425] PHY 1000BASE-T Status <3800>
Jun 14 21:17:17 maine kernel: [27713.793425] PHY Extended Status <3000>
Jun 14 21:17:17 maine kernel: [27713.793425] PCI Status <10>
Jun 14 21:17:20 maine kernel: [27716.032971] ------------[ cut here ]------------
Jun 14 21:17:20 maine kernel: [27716.032998] WARNING: CPU: 0 PID: 0 at /build/linux-cKfWAz/linux-4.9.30/net/sched/sch_generic.c:316 dev_watchdog+0x22d/0x230
Jun 14 21:17:20 maine kernel: [27716.033002] NETDEV WATCHDOG: enp0s31f6 (e1000e): transmit queue 0 timed out
Jun 14 21:17:20 maine kernel: [27716.033006] Modules linked in: vhost_net vhost macvtap macvlan ebtable_nat drbd ip6t_REJECT nf_reject_ipv6 xt_comment xt_nat xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables xt_multiport iptable_filter intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm hci_uart btbcm btqca irqbypass btintel bluetooth evdev intel_lpss_acpi intel_lpss mfd_core mei_me mei iTCO_wdt serio_raw crct10dif_pclmul crc32_pclmul rfkill i915 ghash_clmulni_intel iTCO_vendor_support drm_kms_helper drm sg shpchp i2c_algo_bit video acpi_als kfifo_buf pcspkr industrialio wmi button acpi_pad lru_cache ip_tables
Jun 14 21:17:20 maine kernel: [27716.033097] x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear dm_mod raid1 md_mod sd_mod crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ahci libahci xhci_pci psmouse libata xhci_hcd i2c_i801 i2c_smbus scsi_mod e1000e usbcore usb_common ptp pps_core i2c_hid hid [last unloaded: drbd]
Jun 14 21:17:20 maine kernel: [27716.033158] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-1
Jun 14 21:17:20 maine kernel: [27716.033161] Hardware name: FUJITSU /D3401-H2, BIOS V5.0.0.12 R1.5.0 for D3401-H2x 02/27/2017
Jun 14 21:17:20 maine kernel: [27716.033165] 0000000000000000 ffffffff98f28634 ffff930aee403e30 0000000000000000
Jun 14 21:17:20 maine kernel: [27716.033171] ffffffff98c76eae 0000000000000000 ffff930aee403e88 ffff930ac4b34000
Jun 14 21:17:20 maine kernel: [27716.033177] 0000000000000000 ffff930ac5266e80 0000000000000001 ffffffff98c76f2f
Jun 14 21:17:20 maine kernel: [27716.033183] Call Trace:
Jun 14 21:17:20 maine kernel: [27716.033186] <IRQ>
Jun 14 21:17:20 maine kernel: [27716.033193] [<ffffffff98f28634>] ? dump_stack+0x5c/0x78
Jun 14 21:17:20 maine kernel: [27716.033200] [<ffffffff98c76eae>] ? __warn+0xbe/0xe0
Jun 14 21:17:20 maine kernel: [27716.033206] [<ffffffff98c76f2f>] ? warn_slowpath_fmt+0x5f/0x80
Jun 14 21:17:20 maine kernel: [27716.033213] [<ffffffff9912998d>] ? dev_watchdog+0x22d/0x230
Jun 14 21:17:20 maine kernel: [27716.033217] [<ffffffff99129760>] ? qdisc_rcu_free+0x40/0x40
Jun 14 21:17:20 maine kernel: [27716.033224] [<ffffffff98ce3e80>] ? call_timer_fn+0x30/0x110
Jun 14 21:17:20 maine kernel: [27716.033234] [<ffffffff98ce43be>] ? run_timer_softirq+0x1ce/0x420
Jun 14 21:17:20 maine kernel: [27716.033240] [<ffffffff98cb3953>] ? rebalance_domains+0x253/0x2b0
Jun 14 21:17:20 maine kernel: [27716.033246] [<ffffffff99207d55>] ? __do_softirq+0x105/0x290
Jun 14 21:17:20 maine kernel: [27716.033251] [<ffffffff98c7cf5e>] ? irq_exit+0xae/0xb0
Jun 14 21:17:20 maine kernel: [27716.033254] [<ffffffff99207662>] ? reschedule_interrupt+0x82/0x90
Jun 14 21:17:20 maine kernel: [27716.033255] <EOI>
Jun 14 21:17:20 maine kernel: [27716.033263] [<ffffffff990ca3aa>] ? cpuidle_enter_state+0x11a/0x2b0
Jun 14 21:17:20 maine kernel: [27716.033268] [<ffffffff98cb94f4>] ? cpu_startup_entry+0x154/0x240
Jun 14 21:17:20 maine kernel: [27716.033274] [<ffffffff99938f57>] ? start_kernel+0x443/0x463
Jun 14 21:17:20 maine kernel: [27716.033278] [<ffffffff99938120>] ? early_idt_handler_array+0x120/0x120
Jun 14 21:17:20 maine kernel: [27716.033282] [<ffffffff99938408>] ? x86_64_start_kernel+0x14c/0x170
Jun 14 21:17:20 maine kernel: [27716.033287] ---[ end trace ca3d4362d9132a0d ]---
Jun 14 21:17:20 maine kernel: [27716.033317] e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
Jun 14 21:17:24 maine kernel: [27720.467706] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jun 14 21:17:32 maine kernel: [27728.801335] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jun 14 21:17:32 maine kernel: [27728.801335] TDH <0>
Jun 14 21:17:32 maine kernel: [27728.801335] TDT <4>
Jun 14 21:17:32 maine kernel: [27728.801335] next_to_use <4>
Jun 14 21:17:32 maine kernel: [27728.801335] next_to_clean <0>
Jun 14 21:17:32 maine kernel: [27728.801335] buffer_info[next_to_clean]:
Jun 14 21:17:32 maine kernel: [27728.801335] time_stamp <100689f80>
Jun 14 21:17:32 maine kernel: [27728.801335] next_to_watch <1>
Jun 14 21:17:32 maine kernel: [27728.801335] jiffies <10068a1f8>
Jun 14 21:17:32 maine kernel: [27728.801335] next_to_watch.status <0>
Jun 14 21:17:32 maine kernel: [27728.801335] MAC Status <80083>
Jun 14 21:17:32 maine kernel: [27728.801335] PHY Status <796d>
Jun 14 21:17:32 maine kernel: [27728.801335] PHY 1000BASE-T Status <7800>
Jun 14 21:17:32 maine kernel: [27728.801335] PHY Extended Status <3000>
Jun 14 21:17:32 maine kernel: [27728.801335] PCI Status <10>
Jun 14 21:17:34 maine kernel: [27730.785311] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jun 14 21:17:34 maine kernel: [27730.785311] TDH <0>
Jun 14 21:17:34 maine kernel: [27730.785311] TDT <4>
Jun 14 21:17:34 maine kernel: [27730.785311] next_to_use <4>
Jun 14 21:17:34 maine kernel: [27730.785311] next_to_clean <0>
Jun 14 21:17:34 maine kernel: [27730.785311] buffer_info[next_to_clean]:
Jun 14 21:17:34 maine kernel: [27730.785311] time_stamp <100689f80>
Jun 14 21:17:34 maine kernel: [27730.785311] next_to_watch <1>
Jun 14 21:17:34 maine kernel: [27730.785311] jiffies <10068a3e8>
Jun 14 21:17:34 maine kernel: [27730.785311] next_to_watch.status <0>
Jun 14 21:17:34 maine kernel: [27730.785311] MAC Status <80083>
Jun 14 21:17:34 maine kernel: [27730.785311] PHY Status <796d>
Jun 14 21:17:34 maine kernel: [27730.785311] PHY 1000BASE-T Status <7800>
Jun 14 21:17:34 maine kernel: [27730.785311] PHY Extended Status <3000>
Jun 14 21:17:34 maine kernel: [27730.785311] PCI Status <10>
Jun 14 21:17:36 maine kernel: [27732.801299] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jun 14 21:17:36 maine kernel: [27732.801299] TDH <0>
Jun 14 21:17:36 maine kernel: [27732.801299] TDT <4>
Jun 14 21:17:36 maine kernel: [27732.801299] next_to_use <4>
Jun 14 21:17:36 maine kernel: [27732.801299] next_to_clean <0>
Jun 14 21:17:36 maine kernel: [27732.801299] buffer_info[next_to_clean]:
Jun 14 21:17:36 maine kernel: [27732.801299] time_stamp <100689f80>
Jun 14 21:17:36 maine kernel: [27732.801299] next_to_watch <1>
Jun 14 21:17:36 maine kernel: [27732.801299] jiffies <10068a5e0>
Jun 14 21:17:36 maine kernel: [27732.801299] next_to_watch.status <0>
Jun 14 21:17:36 maine kernel: [27732.801299] MAC Status <80083>
Jun 14 21:17:36 maine kernel: [27732.801299] PHY Status <796d>
Jun 14 21:17:36 maine kernel: [27732.801299] PHY 1000BASE-T Status <7800>
Jun 14 21:17:36 maine kernel: [27732.801299] PHY Extended Status <3000>
Jun 14 21:17:36 maine kernel: [27732.801299] PCI Status <10>
(…) block repeated indefinitely
There appears to be a problem with the network card and/or its driver…
https://serverfault.com/questions/616485/e1000e-reset-adapter-unexpectedly-detected-hardware-unit-hang
There they recommend disabling the TSO option (I don't know what it does); I'm trying that…
History
Updated by Quentin Gibeaux almost 7 years ago
- Status changed from New to Resolved
The problem has not recurred since TSO was disabled:
ethtool -K enp0s31f6 gso off gro off tso off
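To check that the setting is actually in effect, the feature list printed by ethtool -k (lowercase, read-only) can be filtered for the three offloads named in the command above. This is only a minimal sketch: the function name offloads_still_on is made up for illustration and is not part of ethtool, and it assumes the usual "feature-name: on/off" output layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of ethtool): reads `ethtool -k <iface>`
# output on stdin and prints the name of each of the three offloads
# (TSO, GSO, GRO) that is still reported as "on".
offloads_still_on() {
    awk '/^(tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload): on/ {
        sub(/:.*/, "")   # keep only the feature name
        print
    }'
}

# Intended use on the host (interface name taken from this ticket):
#   ethtool -k enp0s31f6 | offloads_still_on
```

Empty output means all three offloads are off. Note that settings applied with ethtool -K do not survive a reboot on their own, so they would need to be reapplied by the network configuration at boot.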