Root.cz Forum
Main topics => Server => Topic started by: ZAJDAN, 13. 06. 2018, 16:58:47
-
Hi,
I have a hardware RAID 1 (MegaRAID), and while backing up an LV snapshot I caught this output:
DESTROY all the LVM snaphots
/dev/databases/sql-server_snapshot: read failed after 0 of 4096 at 118111535104: Input/output error
/dev/databases/sql-server_snapshot: read failed after 0 of 4096 at 118111592448: Input/output error
/dev/databases/sql-server_snapshot: read failed after 0 of 4096 at 0: Input/output error
/dev/databases/sql-server_snapshot: read failed after 0 of 4096 at 4096: Input/output error
So I ran smartctl on both disks:
smartctl -a -d megaraid,6 /dev/sdb
Error counter log:
              Errors Corrected by             Total   Correction     Gigabytes        Total
                   ECC         rereads/      errors    algorithm     processed  uncorrected
              fast | delayed   rewrites   corrected  invocations  [10^9 bytes]       errors
read:    424188540         0          0   424188540            0      1404.456            0
write:           0         0          0           0            0       612.763            0
verify: 2700427945         0          0  2700427945            0      2070.401            0
Non-medium error count: 2
smartctl -a -d megaraid,7 /dev/sdb
Error counter log:
              Errors Corrected by             Total   Correction     Gigabytes        Total
                   ECC         rereads/      errors    algorithm     processed  uncorrected
              fast | delayed   rewrites   corrected  invocations  [10^9 bytes]       errors
read:    420704234         0          0   420704234            0      1532.549            0
write:           0         0          0           0            0       612.736            0
verify: 2155469642         0          0  2155469642            0      1897.588            0
Non-medium error count: 8
Non-medium error count: does that mean errors other than on the medium/platter, i.e. the electronics or communication with the controller (cable)?
Thanks
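As far as I know, smartctl takes this value from the drive's SCSI non-medium error log page, which counts recoverable error events other than read, write, or verify failures. That would fit the guess above, though the counter alone does not say which component is at fault. To compare the two disks at a glance, here is a minimal sketch that reduces `smartctl -a` output to the uncorrected-error totals and the non-medium count. The function name `summarize_smart` is made up for illustration, and it assumes the output layout shown in this post.

```shell
# Hypothetical helper: reads `smartctl -a` output on stdin and prints the
# last column (total uncorrected errors) of the read/write/verify rows
# plus the non-medium error count. "?" marks a value that was not found.
summarize_smart() {
    awk '
        BEGIN { r = w = v = n = "?" }
        $1 == "read:"              { r = $NF }
        $1 == "write:"             { w = $NF }
        $1 == "verify:"            { v = $NF }
        /Non-medium error count:/  { n = $NF }
        END {
            printf "uncorrected read=%s write=%s verify=%s non-medium=%s\n", r, w, v, n
        }
    '
}

# Usage on the live system (device IDs 6 and 7 are the ones from this thread):
# for id in 6 7; do
#     smartctl -a -d megaraid,"$id" /dev/sdb | summarize_smart
# done
```

Run against the two outputs above, this would show zero uncorrected errors on both disks, with non-medium counts of 2 and 8.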
-
I also ran a PATROL READ on it:
megacli -AdpPR -Start -aALL
Once it finishes, I'll look at the output:
megacli -AdpEventLog -GetSinceReboot -warning -fatal -aALL
This might be useful to someone:
http://fibrevillage.com/storage/176-megaraid-patrol-read-detail
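The event log can get long, so here is a rough sketch for condensing it into a count per event type. It assumes each log entry contains a line of the form `Event Description: <text>` and that the command prints the log to standard output; check both on your controller before relying on it. `count_events` is a made-up helper name.

```shell
# Hypothetical helper: reads MegaCli event log text on stdin and prints
# "<count><TAB><description>", most frequent first.
# Assumption: every entry has a line starting with "Event Description: ".
count_events() {
    awk '
        /^Event Description:/ {
            # Keep everything after the first ": " as the description.
            desc = substr($0, index($0, ":") + 2)
            count[desc]++
        }
        END { for (d in count) printf "%d\t%s\n", count[d], d }
    ' | sort -rn
}

# Usage with the command from this post:
# megacli -AdpEventLog -GetSinceReboot -warning -fatal -aALL | count_events
```

Grouping by description makes it easier to see whether the same warning keeps repeating after the patrol read.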
-
Hmm, on a relatively fresh server I have:
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
..
Error counter log:
              Errors Corrected by             Total   Correction     Gigabytes        Total
                   ECC         rereads/      errors    algorithm     processed  uncorrected
              fast | delayed   rewrites   corrected  invocations  [10^9 bytes]       errors
read:   3849552949         4          0  3849552953            4     11801,563            0
write:           0         0          0           0            0      6361,595            0
verify: 1579853104         0          0  1579853104            0     18033,382            0
Non-medium error count: 37
on another disk
Non-medium error count: 116
and on yet another
Non-medium error count: 41
So I'm not sure what it means, but I take it that as long as the RAID doesn't kick the disk out, it copes with it somehow.
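To see the controller's own view of each disk, rather than waiting for it to drop one, the output of `megacli -PDList -aALL` can be reduced to one line per slot. This is only a sketch: `pd_summary` is a made-up name, and it assumes the usual `Slot Number`, `Media Error Count`, `Other Error Count`, `Predictive Failure Count` and `Firmware state` lines are present. Whether `Other Error Count` corresponds to smartctl's non-medium count is not something I can confirm.

```shell
# Hypothetical helper: reads `megacli -PDList -aALL` output on stdin and
# prints one summary line per physical disk. "?" marks a missing field.
pd_summary() {
    awk -F': *' '
        function emit() {
            if (slot != "")
                printf "slot=%s state=\"%s\" media=%s other=%s predictive=%s\n", \
                       slot, state, media, other, pred
        }
        # A new "Slot Number" line starts the next disk record.
        /^Slot Number:/              { emit(); slot = $2; state = media = other = pred = "?" }
        /^Media Error Count:/        { media = $2 }
        /^Other Error Count:/        { other = $2 }
        /^Predictive Failure Count:/ { pred = $2 }
        /^Firmware state:/           { state = $2 }
        END { emit() }
    '
}

# Usage on the live system:
# megacli -PDList -aALL | pd_summary
```

A disk the controller still trusts should report a state such as `Online, Spun Up`; any other state or rising media and predictive-failure counts would be worth a closer look.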