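An HDD in the RAID1 array of an IBM x3650M3 serving as a VMware ESXi host failed. A new SSD of the same capacity was fitted as the replacement, but it turned out that the onboard IBM ServeRAID M5014 controller does not support mixing HDDs and SSDs within an array.
Rebuild won't start after swapping the disk
Last time, I temporarily silenced the alarm on this server that was warning of a failed HDD in its RAID1 array. A replacement disk has finally been arranged.
The ServeRAID M5014 controller in the IBM x3650M3 supports hot-swapping disks, so a disk can be replaced while the system keeps running. Normally, a rebuild should start automatically once the swap is done.
However, the activity LED never lit up no matter how long we waited, and only at this point was I called in to check on site.
When I pulled out the newly inserted disk to look at it, it turned out to be a brand-new SATA SSD of the same capacity.
Checking the ServeRAID M5014 specifications with MegaCli
During a transition from HDD to SSD storage, for example, there are realistic cases where you would want to mix the two within a RAID array, but apparently some RAID controllers do not support this.
Let's check the information for the RAID controller currently in use with MegaCLI.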
```
[root@vm21:/opt/lsi/MegaCLI] ./MegaCli -Adpallinfo -Aall

Adapter #0

==============================================================================
Versions
================
Product Name : ServeRAID M5014 SAS/SATA Controller
Serial No : SV02008590
FW Package Build: 12.0.1-0090

Mfg. Data
================
Mfg. Date : 05/16/10
Rework Date : 00/00/00
Revision No :
Battery FRU : N/A

Image Versions in Flash:
================
BIOS Version : 3.09.00
FW Version : 2.0.13-0748
Preboot CLI Version: 02.00-015:#%00008
WebBIOS Version : 3.0-22-e_12-Rel
NVDATA Version : 2.02.0038
Boot Block Version : 2.00.00.00-0018
BOOT Version : 01.250.04.219

Pending Images in Flash
================
None

PCI Info
================
Controller Id : 0000
Vendor Id : 1000
Device Id : 0079
SubVendorId : 1014
SubDeviceId : 03c7

Host Interface : PCIE

Link Speed : 0

Number of Frontend Port: 0
Device Interface : PCIE

Number of Backend Port: 8
Port : Address
0 4433221102000000
1 4433221103000000
2 0000000000000000
3 0000000000000000
4 0000000000000000
5 0000000000000000
6 0000000000000000
7 0000000000000000

HW Configuration
================
SAS Address : 500605b002343e50
BBU : Absent
Alarm : Present
NVRAM : Present
Serial Debugger : Present
Memory : Present
Flash : Present
Memory Size : 256MB
TPM : Absent
On board Expander: Absent
Upgrade Key : Absent
Temperature sensor for ROC : Absent
Temperature sensor for controller : Absent

Settings
================
Current Time : 7:52:58 4/18, 2023
Predictive Fail Poll Interval : 300sec
Interrupt Throttle Active Count : 16
Interrupt Throttle Completion : 50us
Rebuild Rate : 90%
PR Rate : 30%
BGI Rate : 30%
Check Consistency Rate : 30%
Reconstruction Rate : 30%
Cache Flush Interval : 4s
Max Drives to Spinup at One Time : 2
Delay Among Spinup Groups : 12s
Physical Drive Coercion Mode : 1GB
Cluster Mode : Disabled
Alarm : Disabled
Auto Rebuild : Enabled
Battery Warning : Enabled
Ecc Bucket Size : 15
Ecc Bucket Leak Rate : 1440 Minutes
Restore HotSpare on Insertion : Disabled
Expose Enclosure Devices : Enabled
Maintain PD Fail History : Enabled
Host Request Reordering : Enabled
Auto Detect BackPlane Enabled : SGPIO/i2c SEP
Load Balance Mode : Auto
Use FDE Only : Yes
Security Key Assigned : No
Security Key Failed : No
Security Key Not Backedup : No
Default LD PowerSave Policy : Controller Defined
Maximum number of direct attached drives to spin up in 1 min : 0
Auto Enhanced Import : No
Any Offline VD Cache Preserved : No
Allow Boot with Preserved Cache : No
Disable Online Controller Reset : No
PFK in NVRAM : No
Use disk activity for locate : No
POST delay : 90 seconds
BIOS Error Handling : Stop On Errors
Current Boot Mode :Normal

Capabilities
================
RAID Level Supported : RAID0, RAID1, RAID5, RAID00, RAID10, RAID50, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span
Supported Drives : SAS, SATA

Allowed Mixing:

Mix in Enclosure Allowed

Status
================
ECC Bucket Count : 0

Limitations
================
Max Arms Per VD : 32
Max Spans Per VD : 8
Max Arrays : 128
Max Number of VDs : 64
Max Parallel Commands : 1008
Max SGE Count : 80
Max Data Transfer Size : 8192 sectors
Max Strips PerIO : 42
Max LD per array : 16
Min Strip Size : 8 KB
Max Strip Size : 1.0 MB
Max Configurable CacheCade Size: 0 GB
Current Size of CacheCade : 0 GB
Current Size of FW Cache : 0 MB

Device Present
================
Virtual Drives : 1
Degraded : 1
Offline : 0
Physical Devices : 3
Disks : 2
Critical Disks : 0
Failed Disks : 0

Supported Adapter Operations
================
Rebuild Rate : Yes
CC Rate : Yes
BGI Rate : Yes
Reconstruct Rate : Yes
Patrol Read Rate : Yes
Alarm Control : Yes
Cluster Support : No
BBU : Yes
Spanning : Yes
Dedicated Hot Spare : Yes
Revertible Hot Spares : Yes
Foreign Config Import : Yes
Self Diagnostic : Yes
Allow Mixed Redundancy on Array : No
Global Hot Spares : Yes
Deny SCSI Passthrough : No
Deny SMP Passthrough : No
Deny STP Passthrough : No
Support Security : No
Snapshot Enabled : No
Support the OCE without adding drives : No
Support PFK : No
Support PI : No
Support Boot Time PFK Change : No
Disable Online PFK Change : No
Support Shield State : No
Block SSD Write Disk Cache Change: No

Supported VD Operations
================
Read Policy : Yes
Write Policy : Yes
IO Policy : Yes
Access Policy : Yes
Disk Cache Policy : Yes
Reconstruction : Yes
Deny Locate : No
Deny CC : No
Allow Ctrl Encryption: No
Enable LDBBM : No
Support Breakmirror : No
Power Savings : No

Supported PD Operations
================
Force Online : Yes
Force Offline : Yes
Force Rebuild : Yes
Deny Force Failed : No
Deny Force Good/Bad : No
Deny Missing Replace : No
Deny Clear : No
Deny Locate : No
Support Temperature : No
Disable Copyback : No
Enable JBOD : No
Enable Copyback on SMART : No
Enable Copyback to SSD on SMART Error : Yes
Enable SSD Patrol Read : No
PR Correct Unconfigured Areas : Yes
Enable Spin Down of UnConfigured Drives : No
Disable Spin Down of hot spares : No
Spin Down time : 0
T10 Power State : No

Error Counters
================
Memory Correctable Errors : 0
Memory Uncorrectable Errors : 0

Cluster Information
================
Cluster Permitted : No
Cluster Active : No

Default Settings
================
Phy Polarity : 0
Phy PolaritySplit : 0
Background Rate : 30
Strip Size : 128kB
Flush Time : 4 seconds
Write Policy : WB
Read Policy : None
Cache When BBU Bad : Disabled
Cached IO : No
SMART Mode : Mode 6
Alarm Disable : No
Coercion Mode : 1GB
ZCR Config : Unknown
Dirty LED Shows Drive Activity : No
BIOS Continue on Error : 0
Spin Down Mode : None
Allowed Device Type : SAS/SATA Mix
Allow Mix in Enclosure : Yes
Allow HDD SAS/SATA Mix in VD : No
Allow SSD SAS/SATA Mix in VD : No
Allow HDD/SSD Mix in VD : No
Allow SATA in Cluster : No
Max Chained Enclosures : 16
Disable Ctrl-R : Yes
Enable Web BIOS : Yes
Direct PD Mapping : No
BIOS Enumerate VDs : Yes
Restore Hot Spare on Insertion : No
Expose Enclosure Devices : Yes
Maintain PD Fail History : Yes
Disable Puncturing : Yes
Zero Based Enclosure Enumeration : No
PreBoot CLI Enabled : Yes
LED Show Drive Activity : No
Cluster Disable : Yes
SAS Disable : No
Auto Detect BackPlane Enable : SGPIO/i2c SEP
Use FDE Only : Yes
Enable Led Header : No
Delay during POST : 4
EnableCrashDump : No
Disable Online Controller Reset : No
EnableLDBBM : No
Un-Certified Hard Disk Drives : Allow
Treat Single span R1E as R10 : No
Max LD per array : 16
Power Saving option : All power saving options are enabled
Default spin down time in minutes: 0
Enable JBOD : No
Time taken to detect CME : 60s

Exit Code: 0x00
```
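Near the end of this long output, the Default Settings section contains Allow HDD/SSD Mix in VD : No, which shows that this RAID controller cannot mix HDDs and SSDs within the same array. This specification is why the rebuild did not start when the SSD was inserted.
Incidentally, some other RAID controllers apparently do support this kind of mixing. For example, the Capabilities section of an LSI 2108 MegaRAID lists Mix of SSD/HDD in VD Allowed: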
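Because the output is so long, it can help to filter it down to the mixing-related keys and show the section each one belongs to. That makes it obvious whether a key sits under Capabilities or Default Settings. The following is a minimal sketch, not part of the original procedure. The function name `mix_policy` is hypothetical, and it assumes each section title line is immediately followed by a line of `=` characters, as in the log above.

```shell
#!/bin/sh
# Hypothetical helper: reads `MegaCli -AdpAllInfo -aAll` output on stdin and prints
# every line containing "Mix", prefixed with the section title it appears under.
mix_policy() {
    awk '
        # A line made only of "=" marks the previous line as a section title.
        /^[ \t]*=+[ \t]*$/ { section = prev; next }
        /Mix/ {
            line = $0
            sub(/^[ \t]+/, "", line)            # drop leading indentation
            printf "[%s] %s\n", section, line
        }
        # Remember the current line (trimmed) as a potential section title.
        { prev = $0; sub(/^[ \t]+/, "", prev); sub(/[ \t]+$/, "", prev) }
    '
}

# Usage (assumes the MegaCli path used in this article):
#   /opt/lsi/MegaCLI/MegaCli -AdpAllInfo -aAll | mix_policy
```

Applied to the output above, this should print the Mix in Enclosure Allowed line under [Capabilities] and the four Allow ... Mix lines under [Default Settings].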
```
Adapter #0

==============================================================================
Versions
================
Product Name : LSI 2108 MegaRAID
Serial No :
FW Package Build: 12.15.0-0239

Capabilities
================
RAID Level Supported : RAID0, RAID1, RAID5, RAID6, RAID00, RAID10, RAID50, RAID60, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span
Supported Drives : SAS, SATA

Allowed Mixing:

Mix in Enclosure Allowed
Mix of SAS/SATA of HDD type in VD Allowed
Mix of SAS/SATA of SSD type in VD Allowed
Mix of SSD/HDD in VD Allowed

Default Settings
================
Allowed Device Type : SAS/SATA Mix
Allow Mix in Enclosure : Yes
Allow HDD SAS/SATA Mix in VD : No
Allow SSD SAS/SATA Mix in VD : No
Allow HDD/SSD Mix in VD : No
Allow SATA in Cluster : No
```
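What happens when you put an SSD into an HDD array
Once you know why the rebuild won't start, it makes perfect sense. But no error message or warning is raised, so it actually took me quite a while to notice. Below is the log of my wrestling with MegaCLI.
First, I checked the RAID array configuration with MegaCLI. The virtual drive shows State : Degraded: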
```
[root@vm21:/opt/lsi/MegaCLI] ./MegaCli -CfgDsply -a0

==============================================================================
Adapter: 0
Product Name: ServeRAID M5014 SAS/SATA Controller
Memory: 256MB
BBU: Absent
Serial No: SV02008590
==============================================================================
Number of DISK GROUPS: 1

DISK GROUP: 0
Number of Spans: 1
SPAN: 0
Span Reference: 0x00
Number of PDs: 2
Number of VDs: 1
Number of dedicated Hotspares: 0
Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 930.390 GB
Sector Size : 512
Mirror Data : 930.390 GB
State : Degraded
Strip Size : 128 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disabled
Encryption Type : None
Is VD Cached: No
Physical Disk Information:
-snip-

Exit Code: 0x00
```
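The line that matters here is State : Degraded. For routine checks, a filter could reduce this output to one line per virtual drive and return a non-zero status when any drive is not Optimal. This is a hypothetical helper (`vd_states` is not a MegaCli feature). It assumes the `Virtual Drive: N (Target Id: N)` and `State : ...` line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads `MegaCli -CfgDsply -a0` output on stdin, prints
# "VD <n>: <state>" for each virtual drive, and exits 1 if any state is not Optimal.
vd_states() {
    awk '
        # "Virtual Drive: 0 (Target Id: 0)" -> third field is the VD number.
        /^[ \t]*Virtual Drive:/ { vd = $3 }
        /^[ \t]*State[ \t]*:/ {
            state = $0
            sub(/^[^:]*:[ \t]*/, "", state)     # keep only the value after the colon
            printf "VD %s: %s\n", vd, state
            if (state != "Optimal") bad = 1
        }
        END { exit bad }                        # uninitialized bad -> exit 0
    '
}

# Usage (assumes the MegaCli path used in this article):
#   /opt/lsi/MegaCLI/MegaCli -CfgDsply -a0 | vd_states || echo "array needs attention"
```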
Next, let's look at the state of the SSD that replaced the failed HDD.
```
[root@vm21:/opt/lsi/MegaCLI] ./MegaCli -PDInfo -PhysDrv [252:2] -a0

Adapter #0

Enclosure Device ID: 252
Slot Number: 1
Enclosure position: N/A
Device Id: 17
WWN:
Sequence Number: 11
Media Error Count: 0
Other Error Count: 1
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 931.512 GB [0x74706db0 Sectors]
Non Coerced Size: 931.012 GB [0x74606db0 Sectors]
Coerced Size: 930.390 GB [0x744c8000 Sectors]
Sector Size: 0
Firmware state: Unconfigured(good), Spun Up
Device Firmware Level: 0400
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x4433221102000000
Connected Port Number: 0(path0)
Inquiry Data: 2205FN454601 WD Green 2.5 1000GB UW210400
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Solid State Device
Drive: Not Certified
Drive Temperature : N/A
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No

Exit Code: 0x00
```
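Two fields in this output explain the situation. Firmware state: Unconfigured(good), Spun Up means the drive is healthy but not assigned to any array, and Media Type: Solid State Device marks it as an SSD. A minimal sketch that extracts just those two fields. `pd_summary` is a hypothetical name, and the parsing assumes the `Key: Value` layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads `MegaCli -PDInfo -PhysDrv [E:S] -aN` output on stdin
# and prints "<media type>|<firmware state>" for that drive.
pd_summary() {
    awk -F': *' '
        /^[ \t]*Media Type:/     { media = $2 }
        /^[ \t]*Firmware state:/ { state = $2 }
        END { printf "%s|%s\n", media, state }
    '
}

# Usage (assumes the MegaCli path used in this article):
#   /opt/lsi/MegaCLI/MegaCli -PDInfo -PhysDrv [252:1] -a0 | pd_summary
```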
Trying to take it offline fails:
```
[root@vm21:/opt/lsi/MegaCLI] ./MegaCli -PDOffline -PhysDrv [252:1] -a0

Adapter: 0: Failed to change PD state at EnclId-252 SlotId-1.

Exit Code: 0x01
```
The only entry in the missing-disk list is the failed HDD, which has already been removed.
```
[root@vm21:/opt/lsi/MegaCLI] ./MegaCli -Pdgetmissing -aAll

Adapter 0 - Missing Physical drives

No.   Array   Row   Size Expected
0     0       0     952720 MB

Exit Code: 0x00
```
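The Array and Row columns of this list are exactly what -PdReplaceMissing expects as -ArrayN -rowN. A minimal dry-run sketch that turns the missing-drive list into the corresponding command line without executing anything. The helper name is hypothetical, the replacement drive `252:1` in the usage note is an illustrative assumption, and the column layout is assumed to match the log above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: reads `MegaCli -PdGetMissing -aAll` output on stdin
# and prints one -PdReplaceMissing command per missing drive. Nothing is executed.
#   $1 = replacement drive as "Enclosure:Slot" (e.g. 252:1)
#   $2 = adapter number (e.g. 0)
replace_missing_cmds() {
    awk -v pd="$1" -v ad="$2" '
        # Data rows start with three integers: No., Array, Row.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            printf "./MegaCli -PdReplaceMissing -PhysDrv [%s] -Array%s -row%s -a%s\n", pd, $2, $3, ad
        }
    '
}

# Usage (assumes the MegaCli path used in this article):
#   /opt/lsi/MegaCLI/MegaCli -PdGetMissing -aAll | replace_missing_cmds 252:1 0
```

This only assembles the command line. Whether the controller accepts the replacement drive is decided by the firmware.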
Trying to replace the missing disk produces a slightly more revealing error message.
```
[root@vm21:/opt/lsi/MegaCLI] ./MegaCli -PdReplaceMissing -PhysDrv [252:1] -Array0 -row0 -a0

Adapter: 0: Failed to replace Missing PD at Array 0, Row 0.

FW error description:
The specified physical disk does not have the appropriate attributes to complete the requested command.

Exit Code: 0x26
```
Naturally, a manual rebuild cannot be started either.
```
[root@vm21:/opt/lsi/MegaCLI] ./MegaCli -PDRbld -Start -PhysDrv[252:1] -a0

Cannot Rebuild Physical Drive at Enclosure - 252, Slot - 1.

FW error description:
The specified device is in a state that doesn't support the requested command.

Exit Code: 0x32
```
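The three failed commands in this log end with different exit codes (0x01, 0x26 and 0x32). When scripting around MegaCli, it may be handy to translate them. The sketch below covers only the codes and messages observed in this log, not the full list of MegaCli exit codes. `describe_megacli_exit` is a hypothetical name. It assumes the process exit status matches the printed Exit Code line, which the shell's `$?` reports in decimal (0x26 = 38, 0x32 = 50).

```shell
#!/bin/sh
# Hypothetical helper: maps the MegaCli exit codes seen in this article to the
# messages printed alongside them. Takes the decimal value of $?.
describe_megacli_exit() {
    case "$1" in
        0)  echo "0x00: success" ;;
        1)  echo "0x01: command failed (e.g. 'Failed to change PD state')" ;;
        38) echo "0x26: drive lacks the attributes required for the command" ;;
        50) echo "0x32: drive is in a state that does not support the command" ;;
        *)  printf '0x%02X: not observed in this log\n' "$1" ;;
    esac
}

# Usage (assumes the MegaCli path used in this article):
#   ./MegaCli -PDRbld -Start -PhysDrv [252:1] -a0
#   describe_megacli_exit $?
```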
Just in case, I checked whether any leftover (foreign) configuration remained on the RAID controller, but there was none at all.
```
[root@vm21:/opt/lsi/MegaCLI] ./MegaCli -CfgForeign -Scan -aAll

There is no foreign configuration on controller 0.

Exit Code: 0x00
```
However, HDDs and SSDs are allowed to coexist on the same controller. So this SSD can be made a global hot spare, even though it cannot join the existing HDD array.
```
[root@vm21:/opt/lsi/MegaCLI] ./MegaCli -PDHSP -Set -PhysDrv [252:1] -a0

Adapter: 0: Set Physical Drive at EnclId-252 SlotId-1 as Hot Spare Success.

Exit Code: 0x00
[root@vm21:/opt/lsi/MegaCLI] ./MegaCli -PDList -PhysDrv [252:1] -a0 | egrep 'Adapter|Enclosure|Slot|Inquiry|Foreign|Firmware'
Adapter #0
Enclosure Device ID: 252
Slot Number: 2
Enclosure position: N/A
Firmware state: Unconfigured(good), Spun Up
Device Firmware Level: 0400
Inquiry Data: 2205FN454601 WD Green 2.5 1000GB UW210400
Foreign State: None
Enclosure Device ID: 252
Slot Number: 0
Enclosure position: N/A
Firmware state: Hotspare, Spun Up
Device Firmware Level: BE24
Inquiry Data: 9XG179M7ST91000640NS 81Y9731 81Y3829IBM BE24
Foreign State: None
```
(Extracting keywords with egrep makes even this very information-heavy output easy to read, so the same approach looks useful for routine monitoring too.)
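Taking the egrep idea a step further, a periodic check could list every physical drive with its slot, media type and firmware state, and flag drives that are neither array members nor hot spares. A minimal sketch, assuming `MegaCli -PDList -aAll` output in which each drive block begins with an Enclosure Device ID line. The function name and the choice of Online/Hotspare as the "healthy" states are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical monitoring helper: reads `MegaCli -PDList -aAll` output on stdin and
# prints "enclosure:slot  media  state" per drive, appending "  <-- check" when the
# firmware state is neither Online nor Hotspare.
pd_report() {
    awk -F': *' '
        function emit_drive() {
            if (slot == "") return              # nothing collected yet
            flag = (state ~ /^(Online|Hotspare)/) ? "" : "  <-- check"
            printf "%s:%s  %s  %s%s\n", encl, slot, media, state, flag
            slot = ""; media = ""; state = ""
        }
        /^Enclosure Device ID:/ { emit_drive(); encl = $2 }
        /^Slot Number:/         { slot = $2 }
        /^Media Type:/          { media = $2 }
        /^Firmware state:/      { state = $2 }
        END { emit_drive() }
    '
}

# Usage (assumes the MegaCli path used in this article):
#   /opt/lsi/MegaCLI/MegaCli -PDList -aAll | pd_report
```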
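Can a Default Setting be changed?
In the controller information at the beginning, Allow HDD/SSD Mix in VD : No appeared under Default Settings rather than Capabilities, so I suspected it might be changeable. Searching around, I found this thread.
It apparently cannot be changed even with StorCLI, MegaCLI's successor, so perhaps this item really belongs under Capabilities.
In the end, we will probably procure one more of the same SSD, build a new RAID array from the two SSDs, register it on the ESXi host as a new datastore, and migrate the virtual machines over one by one.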
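For reference, creating the new two-SSD RAID1 array with MegaCli would use -CfgLdAdd. The sketch below only prints the command (a dry run) so the enclosure and slot numbers can be double-checked before anything is written to the controller. The helper name is hypothetical, and the slots 252:1 and 252:2 in the usage note are assumptions, since the second SSD has not been installed yet. Registering the datastore on the ESXi side is not covered.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints a MegaCli command that would create a RAID1
# virtual drive from two drives in the same enclosure. Nothing is executed.
#   $1 = enclosure ID, $2 = first slot, $3 = second slot, $4 = adapter number
raid1_create_cmd() {
    printf './MegaCli -CfgLdAdd -r1 [%s:%s,%s:%s] -a%s\n' "$1" "$2" "$1" "$3" "$4"
}

# Usage (slot numbers are assumptions):
#   raid1_create_cmd 252 1 2 0
#   -> ./MegaCli -CfgLdAdd -r1 [252:1,252:2] -a0
```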