S.M.A.R.T on a MegaRAID array

Hi,
I have a HW RAID 1 (MegaRAID), and while backing up an LV snapshot I caught this output:

DESTROY all the LVM snaphots
  /dev/databases/sql-server_snapshot: read failed after 0 of 4096 at 118111535104: Input/output error
  /dev/databases/sql-server_snapshot: read failed after 0 of 4096 at 118111592448: Input/output error
  /dev/databases/sql-server_snapshot: read failed after 0 of 4096 at 0: Input/output error
  /dev/databases/sql-server_snapshot: read failed after 0 of 4096 at 4096: Input/output error

So I ran smartctl on both disks:
smartctl -a -d megaraid,6 /dev/sdb

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   424188540        0         0  424188540          0       1404.456           0
write:         0        0         0         0          0        612.763           0
verify: 2700427945        0         0  2700427945          0       2070.401           0

Non-medium error count:        2

smartctl -a -d megaraid,7 /dev/sdb

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   420704234        0         0  420704234          0       1532.549           0
write:         0        0         0         0          0        612.736           0
verify: 2155469642        0         0  2155469642          0       1897.588           0

Non-medium error count:        8

Non-medium error count: does that mean errors other than on the medium/platter? That is, electronics or communication with the controller (cable)?

Thanks
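For comparing several disks at a glance, a small shell sketch like the following can pull out the two numbers that tend to matter in this output: the last column of the error counter table (total uncorrected errors, which is 0 in the outputs above) and the non-medium error count. As far as the SCSI log page definition goes, the non-medium counter sums recoverable error events other than read, write, or verify failures, so it does point away from the platters. The function name `summarize_smart` is made up for illustration; the device IDs 6 and 7 and `/dev/sdb` are taken from the commands above.

```shell
#!/bin/sh
# Summarize the SCSI error counters from `smartctl -a` output read on stdin.
# summarize_smart is a hypothetical helper name, not part of smartmontools.
summarize_smart() {
    awk '
        # Rows of the "Error counter log" table; the last column is
        # "Total uncorrected errors".
        /^[ \t]*(read|write|verify):/ {
            op = $1
            sub(/:$/, "", op)
            print op " uncorrected=" $NF
        }
        # Line of the form "Non-medium error count:        2"
        /^[ \t]*Non-medium error count:/ { print "non_medium=" $NF }
    '
}

# Usage sketch (needs root and a MegaRAID controller):
# for id in 6 7; do
#     echo "== megaraid,$id =="
#     smartctl -a -d megaraid,"$id" /dev/sdb | summarize_smart
# done
```

The large "ECC fast" numbers are deliberately ignored here, since they are corrected on the fly and only the uncorrected column signals data the drive could not recover.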

I also started a PATROL READ on it:
megacli -AdpPR -Start -aALL

Once it finishes, I'll look at the output:
megacli -AdpEventLog -GetSinceReboot -warning -fatal -aALL

This might be useful to someone:
http://fibrevillage.com/storage/176-megaraid-patrol-read-detail

Hmm, on a relatively fresh server I have:

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

..

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   3849552949        4         0  3849552953          4      11801,563           0
write:         0        0         0         0          0       6361,595           0
verify: 1579853104        0         0  1579853104          0      18033,382           0

Non-medium error count:       37

On another disk:

Non-medium error count: 116

And on yet another:

Non-medium error count: 41

So I'm not sure what it means, but I take it that as long as the RAID doesn't kick the disk out, it is coping with it somehow.
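
To see whether the controller itself has noticed anything on a disk before it kicks it out, the per-drive counters from `megacli -PDList -aALL` can be condensed to one line per drive. This is only a sketch: `pd_summary` is a hypothetical helper name, and it assumes that the `Slot Number`, `Media Error Count`, `Other Error Count`, `Predictive Failure Count` and `Firmware state` lines appear in that order for each drive, which is how MegaCli usually prints them.

```shell
#!/bin/sh
# Condense `megacli -PDList -aALL` output (read on stdin) to one line per drive.
# pd_summary is a hypothetical helper name, not part of MegaCli.
# Assumption: "Firmware state" is printed after the three counters of a drive.
pd_summary() {
    awk -F': *' '
        /^Slot Number:/              { slot = $2 }
        /^Media Error Count:/        { media = $2 }
        /^Other Error Count:/        { other = $2 }
        /^Predictive Failure Count:/ { pred = $2 }
        # The firmware state line closes the record for the current drive.
        /^Firmware state:/ {
            print "slot=" slot " media=" media " other=" other \
                  " predictive=" pred " state=" $2
        }
    '
}

# Usage sketch (needs root and a MegaRAID controller):
# megacli -PDList -aALL | pd_summary
```

A drive that still reports `Online` but shows a growing media error or predictive failure count is probably worth watching more closely than one with only a non-zero non-medium count in smartctl.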

Root.cz Forum

Main topics => Server => Topic started by: ZAJDAN, 13. 06. 2018, 16:58:47