Mysterious 11.4 crash - machine is running, but SSH and local login are impossible

martin12345 · 12.11.2011

Hello Linux community,

As someone returning to Linux, I set up a file server with Samba as a small project; it also serves as an Apache and MySQL server. After a few minor hiccups the installation went fine, and the server ran stably and did its job reliably.

Today, however, several problems occurred: they started with a brief SSH outage (lasting only about 1 minute), continued with problems loading XML data for PHPsysinfo, and finally, after some Samba stuttering, ended in a crash.
The server was then unreachable via port 80, via SSH, and via ping.
A trip to the basement (armed with a monitor and keyboard) revealed a server humming along, but after I connected the VGA and USB cables there was no picture on the monitor and the Num Lock LED did not light up.
With a heavy heart I had to hard-reset it, and it has now been running normally again for an hour.

Can you help me analyze the crash and avoid problems like this in the future?

To start with, here is "/var/log/messages" for the relevant period:

Nov 12 21:40:28 fileserver dhclient: XMT: Solicit on eth4, interval 118030ms.
Nov 12 21:42:26 fileserver dhclient: XMT: Solicit on eth4, interval 112090ms.
Nov 12 21:44:19 fileserver dhclient: XMT: Solicit on eth4, interval 119420ms.
Nov 12 21:46:18 fileserver dhclient: XMT: Solicit on eth4, interval 119300ms.
Nov 12 21:48:17 fileserver dhclient: XMT: Solicit on eth4, interval 113670ms.
Nov 12 21:50:11 fileserver dhclient: XMT: Solicit on eth4, interval 109920ms.
Nov 12 21:51:23 fileserver nmbd[2317]: [2011/11/12 21:51:23.748679, 0] nmbd/nmbd_browsesync.c:350(find_domain_master_name_query_fail)
Nov 12 21:51:23 fileserver nmbd[2317]: find_domain_master_name_query_fail:
Nov 12 21:51:23 fileserver nmbd[2317]: Unable to find the Domain Master Browser name WORKGROUP<1b> for the workgroup WORKGROUP.
Nov 12 21:51:23 fileserver nmbd[2317]: Unable to sync browse lists in this workgroup.
Nov 12 21:52:01 fileserver dhclient: XMT: Solicit on eth4, interval 123530ms.
Nov 12 21:54:05 fileserver dhclient: XMT: Solicit on eth4, interval 113630ms.
Nov 12 21:55:03 fileserver smartd[4337]: Device: /dev/sda [SAT], SMART Usage Attribute: 199 UDMA_CRC_Error_Count changed from 197 to 186
Nov 12 21:55:03 fileserver smartd[4337]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 125 to 124
Nov 12 21:55:04 fileserver smartd[4337]: Device: /dev/sdd [SAT], 1 Currently unreadable (pending) sectors
Nov 12 21:55:04 fileserver smartd[4337]: Device: /dev/sdd [SAT], 1 Offline uncorrectable sectors
Nov 12 21:55:04 fileserver smartd[4337]: Device: /dev/sdd [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 54 to 53
Nov 12 21:55:04 fileserver smartd[4337]: Device: /dev/sdd [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 54 to 53
Nov 12 21:55:58 fileserver kernel: [174739.110435] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 12 21:55:58 fileserver kernel: [174739.110445] ata5.00: ST-ATA: DRQ=0 without device error, dev_stat 0x0
Nov 12 21:55:58 fileserver kernel: [174739.110455] ata5.00: failed command: SMART
Nov 12 21:55:58 fileserver kernel: [174739.110472] ata5.00: cmd b0/d1:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Nov 12 21:55:58 fileserver kernel: [174739.110476] res d0/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation)
Nov 12 21:55:58 fileserver kernel: [174739.110485] ata5.00: status: { Busy }
Nov 12 21:55:58 fileserver kernel: [174739.110502] ata5: hard resetting link
Nov 12 21:55:59 fileserver dhclient: XMT: Solicit on eth4, interval 110530ms.
Nov 12 21:55:59 fileserver kernel: [174739.567182] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Nov 12 21:55:59 fileserver kernel: [174739.574623] ata5.00: configured for UDMA/133
Nov 12 21:55:59 fileserver kernel: [174739.574685] ata5: EH complete
Nov 12 21:57:49 fileserver dhclient: XMT: Solicit on eth4, interval 118560ms.
Nov 12 21:59:48 fileserver dhclient: XMT: Solicit on eth4, interval 127720ms.
Nov 12 22:01:56 fileserver dhclient: XMT: Solicit on eth4, interval 124340ms.
Nov 12 22:04:00 fileserver dhclient: XMT: Solicit on eth4, interval 124160ms.
Nov 12 22:06:04 fileserver dhclient: XMT: Solicit on eth4, interval 130760ms.
Nov 12 22:06:23 fileserver nmbd[2317]: [2011/11/12 22:06:23.701264, 0] nmbd/nmbd_browsesync.c:350(find_domain_master_name_query_fail)
Nov 12 22:06:23 fileserver nmbd[2317]: find_domain_master_name_query_fail:
Nov 12 22:06:23 fileserver nmbd[2317]: Unable to find the Domain Master Browser name WORKGROUP<1b> for the workgroup WORKGROUP.
Nov 12 22:06:23 fileserver nmbd[2317]: Unable to sync browse lists in this workgroup.
Nov 12 22:08:15 fileserver dhclient: XMT: Solicit on eth4, interval 131780ms.
Nov 12 22:10:27 fileserver dhclient: XMT: Solicit on eth4, interval 122380ms.
Nov 12 22:12:30 fileserver dhclient: XMT: Solicit on eth4, interval 121530ms.
Nov 12 22:14:31 fileserver dhclient: XMT: Solicit on eth4, interval 108590ms.
Nov 12 22:16:20 fileserver dhclient: XMT: Solicit on eth4, interval 120860ms.
Nov 12 22:18:21 fileserver dhclient: XMT: Solicit on eth4, interval 120720ms.
Nov 12 22:20:22 fileserver dhclient: XMT: Solicit on eth4, interval 123670ms.
Nov 12 22:21:23 fileserver nmbd[2317]: [2011/11/12 22:21:23.702703, 0] nmbd/nmbd_browsesync.c:350(find_domain_master_name_query_fail)
Nov 12 22:21:23 fileserver nmbd[2317]: find_domain_master_name_query_fail:
Nov 12 22:21:23 fileserver nmbd[2317]: Unable to find the Domain Master Browser name WORKGROUP<1b> for the workgroup WORKGROUP.
Nov 12 22:21:23 fileserver nmbd[2317]: Unable to sync browse lists in this workgroup.
Nov 12 22:22:25 fileserver dhclient: XMT: Solicit on eth4, interval 129220ms.
Nov 12 22:22:36 fileserver ntfs-3g[572]: ntfs_mst_post_read_fixup: magic: 0x00000002 size: 1024 usa_ofs: 0 usa_count: 65535: Invalid argument
Nov 12 22:22:36 fileserver ntfs-3g[572]: ntfs_mst_post_read_fixup: magic: 0x00000002 size: 1024 usa_ofs: 0 usa_count: 65535: Invalid argument
Nov 12 22:24:35 fileserver dhclient: XMT: Solicit on eth4, interval 113030ms.
Nov 12 22:25:03 fileserver smartd[4337]: Device: /dev/sda [SAT], SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 243 to 244
Nov 12 22:25:03 fileserver smartd[4337]: Device: /dev/sda [SAT], SMART Usage Attribute: 199 UDMA_CRC_Error_Count changed from 186 to 199
Nov 12 22:25:03 fileserver smartd[4337]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 124 to 126
Reboot
Nov 12 22:46:40 fileserver kernel: imklog 5.6.3, log source = /proc/kmsg started.
Nov 12 22:46:40 fileserver rsyslogd: [origin software="rsyslogd" swVersion="5.6.3" x-pid="882" x-info="http://www.rsyslog.com"] start
Nov 12 22:46:40 fileserver kernel: [ 45.011104] powernow: This module only works with AMD K7 CPUs
Nov 12 22:46:40 fileserver rc.cpufreq: CPU frequency scaling is not supported by your processor.
Nov 12 22:46:40 fileserver rc.cpufreq: boot with 'CPUFREQ=no' in to avoid this warning.
Nov 12 22:46:40 fileserver kernel: [ 45.124521] ip6_tables: (C) 2000-2006 Netfilter Core Team
Nov 12 22:46:40 fileserver kernel: [ 45.182795] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
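
Most of the excerpt above consists of periodic dhclient and nmbd messages, so a small filter can make the kernel, smartd and ntfs-3g entries easier to spot. The following Python sketch is only an illustration: the function name `interesting_lines` and the choice of which programs count as "noisy" are assumptions, not something taken from the server itself.

```python
import re

# Traditional syslog layout: "Nov 12 21:55:58 host program[pid]: message"
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<prog>[^\s:\[]+)(?:\[\d+\])?:\s(?P<msg>.*)$"
)

# Assumption: these programs only produce routine chatter in this log.
NOISY = frozenset({"dhclient", "nmbd"})


def interesting_lines(lines, noisy=NOISY):
    """Yield (timestamp, program, message) for every syslog line
    that does not come from one of the noisy programs.
    Lines that do not look like syslog entries are skipped."""
    for line in lines:
        match = SYSLOG_RE.match(line.rstrip("\n"))
        if match and match.group("prog") not in noisy:
            yield match.group("ts"), match.group("prog"), match.group("msg")
```

To use it, pass an open file object for /var/log/messages (or any list of lines) to `interesting_lines` and print the resulting tuples.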

Here is some additional information:

Code:

Hostname localhost resolves to 2 IPs. Only scanned 127.0.0.1
Not shown: 991 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
25/tcp   open  smtp
80/tcp   open  http
111/tcp  open  rpcbind
139/tcp  open  netbios-ssn
443/tcp  open  https
445/tcp  open  microsoft-ds
631/tcp  open  ipp
3306/tcp open  mysql

Nmap done: 1 IP address (1 host up) scanned in 0.27 seconds

Code:

cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.40GHz
stepping        : 5
cpu MHz         : 2392.226
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips        : 4784.45
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 32 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.40GHz
stepping        : 5
cpu MHz         : 2392.226
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 1
initial apicid  : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips        : 4783.72
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 32 bits virtual
power management:

bitmuncher · 13.11.2011

The messages file is actually less interesting in this case. What does the syslog look like, and what was the server's load average at the time of the crash? Is ACPI enabled? What is the SMART status of the disks?
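
Since a frozen machine cannot be queried afterwards, the load average would have to be recorded before the next crash. On Linux it can be read from /proc/loadavg; note that it is a count of runnable (and uninterruptibly waiting) tasks, not a percentage. A minimal Python sketch of parsing that file follows; the function name `parse_loadavg` and the dictionary keys are illustrative assumptions.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg, e.g. "0.20 0.18 0.12 1/80 11206".

    Returns the 1, 5 and 15 minute load averages plus the number of
    currently runnable and total scheduling entities."""
    fields = text.split()
    one, five, fifteen = (float(value) for value in fields[:3])
    runnable, total = (int(value) for value in fields[3].split("/"))
    return {
        "1min": one,
        "5min": five,
        "15min": fifteen,
        "runnable": runnable,
        "total": total,
    }
```

Called periodically (for example from a cron job that appends the result to a file), this would leave a record of the load shortly before the machine stops responding.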

Lord_x · 13.11.2011

bitmuncher wrote:
The messages file is actually less interesting in this case.... What is the SMART status of the disks?

But that is right there in the log file above?!

Code:

Nov 12 21:55:03 fileserver smartd[4337]: Device: /dev/sda [SAT], SMART Usage Attribute: 199 UDMA_CRC_Error_Count changed from 197 to 186
Nov 12 21:55:03 fileserver smartd[4337]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 125 to 124
Nov 12 21:55:04 fileserver smartd[4337]: Device: /dev/sdd [SAT], 1 Currently unreadable (pending) sectors
Nov 12 21:55:04 fileserver smartd[4337]: Device: /dev/sdd [SAT], 1 Offline uncorrectable sectors
Nov 12 21:55:04 fileserver smartd[4337]: Device: /dev/sdd [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 54 to 53
Nov 12 21:55:04 fileserver smartd[4337]: Device: /dev/sdd [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 54 to 53
Nov 12 21:55:58 fileserver kernel: [174739.110435] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 12 21:55:58 fileserver kernel: [174739.110445] ata5.00: ST-ATA: DRQ=0 without device error, dev_stat 0x0
Nov 12 21:55:58 fileserver kernel: [174739.110455] ata5.00: failed command: SMART
Nov 12 21:55:58 fileserver kernel: [174739.110472] ata5.00: cmd b0/d1:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Nov 12 21:55:58 fileserver kernel: [174739.110476] res d0/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation)

I would test all of the hard disks.
Start a quick test:

Code:

sudo smartctl -H /dev/sda

and display the result after about 3-4 minutes:

Code:

sudo smartctl -a /dev/sda
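
A note on the two commands above: as far as I know, `smartctl -H` only prints the drive's overall health assessment and does not start a test. A short self-test is started with `smartctl -t short`, and its outcome appears in the self-test log (`smartctl -l selftest`, which is also part of the `-a` output). The Python sketch below merely builds these command lines for several drives; the function names and the device list are assumptions for illustration.

```python
def start_short_test_commands(devices):
    """Return one argv list per device that starts a short SMART self-test."""
    return [["smartctl", "-t", "short", device] for device in devices]


def selftest_log_commands(devices):
    """Return one argv list per device that prints the SMART self-test log."""
    return [["smartctl", "-l", "selftest", device] for device in devices]


# Hypothetical device list; adjust it to the disks actually present.
DEVICES = ["/dev/sda", "/dev/sdb", "/dev/sdd"]
```

Each list can be passed to `subprocess.run` as root: first the start commands, then, a few minutes later, the log commands.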

martin12345 · 13.11.2011

smartctl -H returns "PASSED" for all disks.

smartctl -a returns the following output (rather long, hence on pastebin): http://pastebin.com/iVCTHyup

I can't get into the BIOS right now (the server is running again); dmesg shows the following about ACPI:

Code:

[    0.000000]  BIOS-e820: 00000000f7ffc000 - 00000000f7fff000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000f7fff000 - 00000000f8000000 (ACPI NVS)
[    0.000000] DMI: System Manufacturer System Name/PU-DLS  , BIOS ASUS PU-DLS/533 ACPI BIOS Revision 1006 08/25/2003
[    0.000000] ACPI: RSDP 000f5a70 00014 (v00 ASUS  )
[    0.000000] ACPI: RSDT f7ffc000 00034 (v01 ASUS   PU-DLS   42302E31 MSFT 31313031)
[    0.000000] ACPI: FACP f7ffc145 00074 (v01 ASUS   PU-DLS   42302E31 MSFT 31313031)
[    0.000000] ACPI: DSDT f7ffc1b9 025B6 (v01   ASUS PU-DLS   00001000 MSFT 0100000B)
[    0.000000] ACPI: FACS f7fff000 00040
[    0.000000] ACPI: BOOT f7ffc034 00028 (v01 ASUS   PU-DLS   42302E31 MSFT 31313031)
[    0.000000] ACPI: SPCR f7ffc05c 0004D (v01 ASUS   PU-DLS   42302E31 MSFT 31313031)
[    0.000000] ACPI: APIC f7ffc0a9 00080 (v01 ASUS   PU-DLS   42302E31 MSFT 31313031)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] could not find any ACPI SRAT memory areas.
[    0.000000] ACPI: PM-Timer IO Port: 0xe408
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
[    0.000000] ACPI: IOAPIC (id[0x09] address[0xfec80000] gsi_base[24])
[    0.000000] ACPI: IOAPIC (id[0x0a] address[0xfec80400] gsi_base[48])
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl edge)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.002799] ACPI: Core revision 20101013
[    0.100676] ACPI: bus type pci registered
[    0.106782] ACPI: EC: Look up EC in DSDT
[    0.112233] ACPI: Interpreter enabled
[    0.112245] ACPI: (supports S0 S1 S4 S5)
[    0.112294] ACPI: Using IOAPIC for interrupt routing
[    0.117395] ACPI Exception: AE_NOT_FOUND, Evaluating _PRW (20101013/scan-743)
[    0.120810] ACPI Exception: AE_NOT_FOUND, Evaluating _PRW (20101013/scan-743)
[    0.122882] ACPI Exception: AE_NOT_FOUND, Evaluating _PRW (20101013/scan-743)
[    0.125153] ACPI Exception: AE_NOT_FOUND, Evaluating _PRW (20101013/scan-743)
[    0.125578] ACPI Exception: AE_NOT_FOUND, Evaluating _PRW (20101013/scan-743)
[    0.125786] ACPI Exception: AE_NOT_FOUND, Evaluating _PRW (20101013/scan-743)
[    0.127033] ACPI: No dock devices found.
[    0.127042] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
[    0.127273] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.128487] pci 0000:00:1f.0: quirk: [io  0xe400-0xe47f] claimed by ICH4 ACPI/GPIO/TCO
[    0.130353] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[    0.130522] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1.PCI2._PRT]
[    0.130703] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1.PCI3._PRT]
[    0.130959] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI4._PRT]
[    0.135076] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
[    0.135249] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
[    0.135449] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
[    0.135641] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 11 *12 14 15)
[    0.135804] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
[    0.136008] ACPI: PCI Interrupt Link [LNKF] (IRQs *3 4 5 6 7 9 10 11 12 14 15)
[    0.136210] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
[    0.136403] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
[    0.137177] PCI: Using ACPI for IRQ routing
[    0.141367] pnp: PnP ACPI init
[    0.141423] ACPI: bus type pnp registered
[    0.141843] pnp 00:00: Plug and Play ACPI device, IDs PNP0c01 (active)
[    0.142682] pnp 00:01: Plug and Play ACPI device, IDs PNP0a03 (active)
[    0.142970] pnp 00:02: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.143226] pnp 00:03: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.146873] pnp 00:04: Plug and Play ACPI device, IDs PNP0200 (active)
[    0.147034] pnp 00:05: Plug and Play ACPI device, IDs PNP0b00 (active)
[    0.147144] pnp 00:06: Plug and Play ACPI device, IDs PNP0800 (active)
[    0.147262] pnp 00:07: Plug and Play ACPI device, IDs PNP0c04 (active)
[    0.147865] pnp 00:08: Plug and Play ACPI device, IDs PNP0700 (active)
[    0.149382] pnp 00:09: Plug and Play ACPI device, IDs PNP0401 (active)
[    0.150451] pnp 00:0a: Plug and Play ACPI device, IDs PNP0501 (active)
[    0.151453] pnp 00:0b: Plug and Play ACPI device, IDs PNP0303 (active)
[    0.151734] pnp 00:0c: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.151764] pnp: PnP ACPI: found 13 devices
[    0.151768] ACPI: ACPI bus type pnp unregistered
[    1.029606] ACPI: acpi_idle registered with cpuidle
[    1.029664] ACPI: Invalid PBLK length [5]
[   15.073487] ACPI: Power Button [PWRB]
[   15.073856] ACPI: Power Button [PWRF]
[   15.737568] parport_pc 00:09: reported by Plug and Play ACPI

The load should have been at 50% at most, probably less.

Syslog is /var/log/messages, correct? I included that in my original post.

Always-Godlike · 13.11.2011

I assume this is openSUSE 11.4. There, /var/log/messages is the syslog.
How large are the RAM and the swap partition? I know this behavior from my own server when it runs out of memory. (This is only a guess, though.)
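
To check this guess, RAM and swap usage can be read from /proc/meminfo, where the values are given in kB. The following Python sketch parses that file's text and summarizes it; the function names and the decision to count buffers and page cache as available memory are my own assumptions.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a {field_name: kilobytes} dict."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def memory_summary(info):
    """Summarize RAM and swap usage in MiB.

    Assumption: buffers and page cache are treated as reclaimable,
    so they are added to the free memory figure."""
    free_kb = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "ram_total_mib": info["MemTotal"] // 1024,
        "ram_free_mib": free_kb // 1024,
        "swap_used_mib": (info["SwapTotal"] - info["SwapFree"]) // 1024,
    }
```

Logging this summary at regular intervals would show whether memory or swap actually fills up before a freeze.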

martin12345 · 13.11.2011

Quite right, openSUSE 11.4 is the OS. RAM and swap should not be the problem; neither is anywhere near full.
The board is fitted with the maximum 12 GB of RAM it supports (I had it lying around anyway). Could that be the problem? At the moment I only have one CPU installed; since this is a multi-CPU board, could it be that it cannot address all of the RAM correctly?

marcellus · 13.11.2011

Am I the only one who thinks that

Code:

Nov 12 22:25:03 fileserver smartd[4337]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 124 to 126

looks a bit high? I think 126 °C is quite a lot for a hard disk. The question, of course, is whether the sensor is broken or the disk really gets that hot.

gropiuskalle · 13.11.2011

No, you see values like that regularly, and I don't know how to interpret them either.

Code:

Nov 13 16:39:03 hoppers smartd[2696]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 108 to 107
Nov 13 16:39:03 hoppers smartd[2696]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 110 to 109
Nov 13 17:09:03 hoppers smartd[2696]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 107 to 108
[...]

In any case, disk analysis via S.M.A.R.T. should be taken with a grain of salt.

martin12345 · 13.11.2011

marcellus wrote:
Am I the only one who thinks that

Code:

Nov 12 22:25:03 fileserver smartd[4337]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 124 to 126

looks a bit high? I think 126 °C is quite a lot for a hard disk. The question, of course, is whether the sensor is broken or the disk really gets that hot.

I think it comes from the odd way smartctl splits up the values. At the moment it looks like this, for example:

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       34

Only the last value is relevant, though, i.e. RAW_VALUE = 34. Apparently it is passed to syslog incorrectly.
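
As far as I understand it, smartd by default tracks the normalized VALUE column rather than the raw value, which would explain why the syslog figures (124 to 126) do not look like degrees Celsius. The Python sketch below splits one line of the smartctl attribute table into its columns so that the normalized and raw values can be compared; the function name `parse_attribute_line` and the returned keys are assumptions for illustration.

```python
def parse_attribute_line(line):
    """Split one row of the smartctl attribute table into its columns.

    Expected column order:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    The raw value is kept as a string because some drives append
    extra text to it (for example minimum/maximum readings)."""
    parts = line.strip().split(None, 9)
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "value": int(parts[3]),   # normalized value
        "worst": int(parts[4]),
        "thresh": int(parts[5]),
        "raw": parts[9],          # vendor-specific raw value
    }
```

For the temperature line shown above, `value` is 253 and `raw` is "34", so only the raw column corresponds to a plausible temperature.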

martin12345 · 19.11.2011

Unfortunately I had to hard-reset the server again this morning... /var/log/messages is unremarkable once more... urgently looking for tips...
