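A disk in the hardware RAID of a Lenovo System x3650 M5, recently migrated from VMware ESXi to Proxmox VE 8, has failed. This is a record of the work: detecting the fault, identifying the bad disk with IMM2, replacing it, and rebuilding the array.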
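The first sign: an email reporting a SMART error
A disk in the hardware RAID of a Lenovo System x3650 M5, which I had moved from the VMware ESXi hypervisor to Proxmox VE 8, started showing ominous signs.
It began when the Proxmox VE node sent me an unfamiliar email like the one below.
As the subject line suggests, though, it is easy to dismiss this as just another SMART error and let it slide.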
```
SMART error (FailedReadSmartSelfTestLog) detected on host: pve25
SMART error (FailedReadSmartData) detected on host: pve25
```
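Since these subjects are easy to overlook, a small filter can help flag them. The following is a minimal Python sketch that pulls the error kind and host name out of subject lines shaped like the two above. The function names and the triage rule (treating the two "FailedRead..." kinds as a reason to check the RAID controller) are assumptions made for illustration, not part of smartmontools.

```python
import re

# Matches subject lines of the form quoted in the email above.
SUBJECT_RE = re.compile(
    r"SMART error \((?P<kind>\w+)\) detected on host: (?P<host>\S+)"
)

# Hypothetical triage rule: both kinds seen in this incident mean that
# smartd could not read data from the disk at all.
READ_FAILURES = {"FailedReadSmartData", "FailedReadSmartSelfTestLog"}


def parse_smart_subject(subject):
    """Return (kind, host) for a smartd warning subject, or None."""
    m = SUBJECT_RE.search(subject)
    if m is None:
        return None
    return m.group("kind"), m.group("host")


def needs_controller_check(subject):
    """True when the subject reports one of the read-failure kinds."""
    parsed = parse_smart_subject(subject)
    return parsed is not None and parsed[0] in READ_FAILURES
```

In this incident both subjects would be flagged, which is the cue to look at IMM2 rather than wait for more mail.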
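IMM2 status
The same email kept arriving day after day, so I finally checked the IMM2 status, and only then did I notice the disk fault.
Near the bottom of the System Status page, Local Storage was marked Critical, so I clicked it to see the details.
Both Disk0 and Disk1 of the RAID1 array are shown as Critical, but the message text makes it clear that the failing disk is Disk1.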
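Front panel status and disk replacement
I grabbed a spare HDD and headed straight to the server room. The HDD in the second slot on the front of the machine was lit amber, and the display also announced HDD 2 fault.
The drives are hot-swappable, so I pulled the disk and replaced it with the server still online. The rebuild appeared to start automatically, so I left the server room at that point.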
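IMM2 status
I opened IMM2 again to check the rebuild progress (the disk is 300 GB).
The RAID Logs also contained an entry showing the rebuild in progress.
The rebuild finished after about an hour, and I confirmed that everything was healthy.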
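What about the SMART errors?
The rebuild should have been complete, yet the email notifications did not stop. I checked the logs and the state of the smartd service from a terminal on the Proxmox VE node.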
```
root@pve25:~# systemctl status smartd
● smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon
     Loaded: loaded (/lib/systemd/system/smartmontools.service; enabled; preset: enabled)
     Active: active (running) since Fri 2024-11-15 06:37:31 HKT; 3 months 2 days ago
       Docs: man:smartd(8)
             man:smartd.conf(5)
   Main PID: 1155 (smartd)
     Status: "Next check of 8 devices will start at 13:37:31"
      Tasks: 1 (limit: 76733)
     Memory: 5.7M
        CPU: 31.354s
     CGroup: /system.slice/smartmontools.service
             └─1155 /usr/sbin/smartd -n -q never

Feb 12 13:37:40 pve25 smartd[1155]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Feb 12 13:37:40 pve25 smartd[1155]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
Feb 13 13:37:40 pve25 smartd[1155]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Feb 13 13:37:40 pve25 smartd[1155]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
Feb 14 14:07:40 pve25 smartd[1155]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Feb 14 14:07:40 pve25 smartd[1155]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
Feb 15 14:07:41 pve25 smartd[1155]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Feb 15 14:07:41 pve25 smartd[1155]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
Feb 16 14:37:40 pve25 smartd[1155]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Feb 16 14:37:40 pve25 smartd[1155]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
```
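The journal excerpt above shows one warning per day from the same long-running smartd process. As a rough illustration, the Python sketch below counts "Sending warning" entries per calendar day from journal lines in that format. The function name is hypothetical, and the pattern only assumes the line layout visible in the excerpt.

```python
import re
from collections import Counter

# Matches only the "Sending warning via ..." entries, so the paired
# "Warning via ...: successful" lines are not counted twice.
WARNING_RE = re.compile(
    r"^(?P<day>\w{3} +\d{1,2}) \d{2}:\d{2}:\d{2} \S+ smartd\[\d+\]: "
    r"Sending warning via "
)


def warnings_per_day(lines):
    """Count smartd warning mails per day from journal-style lines."""
    counts = Counter()
    for line in lines:
        m = WARNING_RE.match(line)
        if m:
            counts[m.group("day")] += 1
    return counts
```

A steady count of one per day that continues after the rebuild suggests smartd is still reporting on stale device state rather than on a new fault.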
Restarting the smartd service, which makes it re-read the device configuration, resolved the problem.
```
root@pve25:~# systemctl stop smartd
root@pve25:~# systemctl start smartd
root@pve25:~# systemctl status smartd
● smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon
     Loaded: loaded (/lib/systemd/system/smartmontools.service; enabled; preset: enabled)
     Active: active (running) since Mon 2025-02-17 13:29:26 HKT; 17s ago
       Docs: man:smartd(8)
             man:smartd.conf(5)
   Main PID: 2022825 (smartd)
     Status: "Next check of 8 devices will start at 13:59:26"
      Tasks: 1 (limit: 76733)
     Memory: 1.3M
        CPU: 33ms
     CGroup: /system.slice/smartmontools.service
             └─2022825 /usr/sbin/smartd -n -q never

Feb 17 13:29:17 pve25 smartd[2022825]: Monitoring 0 ATA/SATA, 8 SCSI/SAS and 0 NVMe devices
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_03], state written to /var/lib/smartmontools/smartd.LENOVO-X>
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_04], state written to /var/lib/smartmontools/smartd.LENOVO-X>
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_05], state written to /var/lib/smartmontools/smartd.LENOVO-X>
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_06], state written to /var/lib/smartmontools/smartd.LENOVO-X>
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_07], state written to /var/lib/smartmontools/smartd.LENOVO-X>
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_08], state written to /var/lib/smartmontools/smartd.LENOVO-X>
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_11], state written to /var/lib/smartmontools/smartd.IBM-ESXS>
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_12], state written to /var/lib/smartmontools/smartd.IBM-ESXS>
Feb 17 13:29:26 pve25 systemd[1]: Started smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon.
```
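One quick sanity check after the restart is the "Monitoring ..." summary line in the output above. The Python sketch below extracts the three device counts from text in that format. The function name is hypothetical; the text to search could come from something like `journalctl -u smartmontools`, which is an assumption about how you collect it.

```python
import re

# Matches the summary line smartd logs once device scanning is finished.
MONITORING_RE = re.compile(
    r"Monitoring (\d+) ATA/SATA, (\d+) SCSI/SAS and (\d+) NVMe devices"
)


def monitored_counts(log_text):
    """Return the monitored device counts by type, or None if absent."""
    m = MONITORING_RE.search(log_text)
    if m is None:
        return None
    ata, scsi, nvme = (int(n) for n in m.groups())
    return {"ata": ata, "scsi": scsi, "nvme": nvme}
```

On this server the expected result is 8 SCSI/SAS devices, matching the eight physical disks listed in the same output.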
Reading the log to see how smartd behaved after the restart:
```
Feb 17 13:29:12 pve25 smartd[2022825]: smartd 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-2-pve] (local build)
Feb 17 13:29:12 pve25 smartd[2022825]: Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
Feb 17 13:29:12 pve25 smartd[2022825]: Opened configuration file /etc/smartd.conf
Feb 17 13:29:12 pve25 smartd[2022825]: Drive: DEVICESCAN, implied '-a' Directive on line 21 of file /etc/smartd.conf
Feb 17 13:29:12 pve25 smartd[2022825]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Feb 17 13:29:12 pve25 smartd[2022825]: Device: /dev/sda, opened
Feb 17 13:29:12 pve25 smartd[2022825]: Device: /dev/sda, [IBM ServeRAID M5210 4.29], lu id: 0x600605b00aed8ba02eab31728088151b, S/N: 001b1588807231ab2ea08bed0ab00506, 298 GB
Feb 17 13:29:12 pve25 smartd[2022825]: Device: /dev/sda, IE (SMART) not enabled, skip device
Feb 17 13:29:12 pve25 smartd[2022825]: Try 'smartctl -s on /dev/sda' to turn on SMART features
Feb 17 13:29:12 pve25 smartd[2022825]: Device: /dev/sdb, opened
Feb 17 13:29:12 pve25 smartd[2022825]: Device: /dev/sdb, [IBM ServeRAID M5210 4.29], lu id: 0x600605b00aed8ba02eab31f388356743, S/N: 0043673588f331ab2ea08bed0ab00506, 998 GB
Feb 17 13:29:12 pve25 smartd[2022825]: Device: /dev/sdb, IE (SMART) not enabled, skip device
Feb 17 13:29:12 pve25 smartd[2022825]: Try 'smartctl -s on /dev/sdb' to turn on SMART features
Feb 17 13:29:12 pve25 smartd[2022825]: Device: /dev/sdc, opened
Feb 17 13:29:12 pve25 smartd[2022825]: Device: /dev/sdc, [IBM ServeRAID M5210 4.29], lu id: 0x600605b00aed8ba02eab32478d40e481, S/N: 0081e4408d4732ab2ea08bed0ab00506, 998 GB
Feb 17 13:29:12 pve25 smartd[2022825]: Device: /dev/sdc, IE (SMART) not enabled, skip device
Feb 17 13:29:12 pve25 smartd[2022825]: Try 'smartctl -s on /dev/sdc' to turn on SMART features
Feb 17 13:29:12 pve25 smartd[2022825]: Device: /dev/sdd, opened
Feb 17 13:29:12 pve25 smartd[2022825]: Device: /dev/sdd, [IBM ServeRAID M5210 4.29], lu id: 0x600605b00aed8ba02eab32a893031578, S/N: 0078150393a832ab2ea08bed0ab00506, 998 GB
Feb 17 13:29:12 pve25 smartd[2022825]: Device: /dev/sdd, IE (SMART) not enabled, skip device
Feb 17 13:29:12 pve25 smartd[2022825]: Try 'smartctl -s on /dev/sdd' to turn on SMART features
Feb 17 13:29:12 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_03], opened
Feb 17 13:29:12 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_03], [LENOVO-X ST91000640SS LD2L], lu id: 0x5000c50084a75f9f, S/N: 9XG9MNQK0000C6172HMC, 1.00 TB
Feb 17 13:29:13 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_03], is SMART capable. Adding to "monitor" list.
Feb 17 13:29:13 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_03], state read from /var/lib/smartmontools/smartd.LENOVO-X-ST91000640SS-9XG9MNQK0000C6172HMC.scsi.state
Feb 17 13:29:13 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_04], opened
Feb 17 13:29:13 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_04], [LENOVO-X ST91000640SS LD2L], lu id: 0x5000c500849a705b, S/N: 9XG9LFGE0000C61667CM, 1.00 TB
Feb 17 13:29:14 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_04], is SMART capable. Adding to "monitor" list.
Feb 17 13:29:14 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_04], state read from /var/lib/smartmontools/smartd.LENOVO-X-ST91000640SS-9XG9LFGE0000C61667CM.scsi.state
Feb 17 13:29:14 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_05], opened
Feb 17 13:29:14 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_05], [LENOVO-X ST91000640SS LD2L], lu id: 0x5000c50084a74073, S/N: 9XG9MP4B0000C6172G35, 1.00 TB
Feb 17 13:29:15 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_05], is SMART capable. Adding to "monitor" list.
Feb 17 13:29:15 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_05], state read from /var/lib/smartmontools/smartd.LENOVO-X-ST91000640SS-9XG9MP4B0000C6172G35.scsi.state
Feb 17 13:29:15 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_06], opened
Feb 17 13:29:15 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_06], [LENOVO-X ST91000640SS LD2L], lu id: 0x5000c500848e8c97, S/N: 9XG9KW6P0000C6165K3W, 1.00 TB
Feb 17 13:29:15 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_06], is SMART capable. Adding to "monitor" list.
Feb 17 13:29:15 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_06], state read from /var/lib/smartmontools/smartd.LENOVO-X-ST91000640SS-9XG9KW6P0000C6165K3W.scsi.state
Feb 17 13:29:15 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_07], opened
Feb 17 13:29:15 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_07], [LENOVO-X ST91000640SS LD2L], lu id: 0x5000c50084a79393, S/N: 9XG9MMXN0000C6172HK3, 1.00 TB
Feb 17 13:29:16 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_07], is SMART capable. Adding to "monitor" list.
Feb 17 13:29:16 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_07], state read from /var/lib/smartmontools/smartd.LENOVO-X-ST91000640SS-9XG9MMXN0000C6172HK3.scsi.state
Feb 17 13:29:16 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_08], opened
Feb 17 13:29:16 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_08], [LENOVO-X ST91000640SS LD2L], lu id: 0x5000c50084938303, S/N: 9XG9L6JC0000C6168KCQ, 1.00 TB
Feb 17 13:29:17 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_08], is SMART capable. Adding to "monitor" list.
Feb 17 13:29:17 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_08], state read from /var/lib/smartmontools/smartd.LENOVO-X-ST91000640SS-9XG9L6JC0000C6168KCQ.scsi.state
Feb 17 13:29:17 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_11], opened
Feb 17 13:29:17 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_11], [IBM-ESXS CBRCA300C3ETS0 N C370], lu id: 0x5000cca00a31f494, S/N: PDVWGP7E, 300 GB
Feb 17 13:29:17 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_11], is SMART capable. Adding to "monitor" list.
Feb 17 13:29:17 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_11], state read from /var/lib/smartmontools/smartd.IBM-ESXS-CBRCA300C3ETS0_N-PDVWGP7E.scsi.state
Feb 17 13:29:17 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_12], opened
Feb 17 13:29:17 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_12], [IBM-ESXS CBRCA300C3ETS0 N C370], lu id: 0x5000cca00a352168, S/N: PDVY6UAE, 300 GB
Feb 17 13:29:17 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_12], is SMART capable. Adding to "monitor" list.
Feb 17 13:29:17 pve25 smartd[2022825]: Monitoring 0 ATA/SATA, 8 SCSI/SAS and 0 NVMe devices
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_03], state written to /var/lib/smartmontools/smartd.LENOVO-X-ST91000640SS-9XG9MNQK0000C6172HMC.scsi.state
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_04], state written to /var/lib/smartmontools/smartd.LENOVO-X-ST91000640SS-9XG9LFGE0000C61667CM.scsi.state
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_05], state written to /var/lib/smartmontools/smartd.LENOVO-X-ST91000640SS-9XG9MP4B0000C6172G35.scsi.state
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_06], state written to /var/lib/smartmontools/smartd.LENOVO-X-ST91000640SS-9XG9KW6P0000C6165K3W.scsi.state
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_07], state written to /var/lib/smartmontools/smartd.LENOVO-X-ST91000640SS-9XG9MMXN0000C6172HK3.scsi.state
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_08], state written to /var/lib/smartmontools/smartd.LENOVO-X-ST91000640SS-9XG9L6JC0000C6168KCQ.scsi.state
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_11], state written to /var/lib/smartmontools/smartd.IBM-ESXS-CBRCA300C3ETS0_N-PDVWGP7E.scsi.state
Feb 17 13:29:26 pve25 smartd[2022825]: Device: /dev/bus/0 [megaraid_disk_12], state written to /var/lib/smartmontools/smartd.IBM-ESXS-CBRCA300C3ETS0_N-PDVY6UAE.scsi.state
```
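smartd treated the RAID logical disks (/dev/sda through /dev/sdd) as outside the scope of SMART monitoring and skipped them, while correctly detecting the physical disks behind the controller.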
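To check that split without reading every line, the startup log can be sorted into skipped and monitored devices. The following Python sketch does that for lines in the format shown in the log; the function name is hypothetical and the two patterns assume only the message wording visible there.

```python
import re

# Logical drives exposed by the RAID controller are skipped by smartd.
SKIP_RE = re.compile(r"Device: (\S+), IE \(SMART\) not enabled, skip device")
# Physical disks behind the controller are added to the monitor list.
ADD_RE = re.compile(r"Device: (.+?), is SMART capable\.")


def classify_devices(lines):
    """Split smartd startup log lines into (skipped, monitored) devices."""
    skipped, monitored = [], []
    for line in lines:
        m = SKIP_RE.search(line)
        if m:
            skipped.append(m.group(1))
            continue
        m = ADD_RE.search(line)
        if m:
            monitored.append(m.group(1))
    return skipped, monitored
```

For the log in this post, the skipped list would hold the four logical drives and the monitored list the eight `megaraid_disk_NN` entries.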
SMART errors alone do not feel conclusive enough to act on, so I plan to install MegaCli on the Proxmox VE node at some point.
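In the meantime, smartctl itself can query individual physical disks behind a MegaRAID controller with its `-d megaraid,N` device type. This is a substitute for the MegaCli approach mentioned above, not the same thing. The Python sketch below builds and runs such health checks; the disk IDs are the ones from the smartd log, while the function names and the use of `/dev/sda` as the device node are assumptions for illustration.

```python
import subprocess

# Physical disk IDs as reported by smartd in the log above.
MEGARAID_IDS = [3, 4, 5, 6, 7, 8, 11, 12]


def smartctl_command(disk_id, device="/dev/sda"):
    """Build a smartctl health query for one disk behind the controller.

    The device argument is an assumption: any logical drive exposed by
    the same controller is commonly used here.
    """
    return ["smartctl", "-H", "-d", f"megaraid,{disk_id}", device]


def check_all(ids=MEGARAID_IDS):
    """Run the health query for each disk and collect the exit statuses.

    smartctl's exit status is a bitmask; 0 means no problem was reported.
    Requires root privileges and smartctl on the PATH.
    """
    results = {}
    for disk_id in ids:
        proc = subprocess.run(
            smartctl_command(disk_id), capture_output=True, text=True
        )
        results[disk_id] = proc.returncode
    return results
```

Calling `check_all()` as root on the node returns a mapping from disk ID to exit status, so any non-zero entry points at a specific physical disk.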











