More on AIX...
We recently brought some new ports online on the 3PAR and began rebalancing our connected hosts onto them. We assumed that with AIX MPIO we could remove and add paths hot and online, as long as the system always had live paths to work with. The assumption held, and we were able to move systems around without immediate interruption... except that one server crashed about 45 minutes after the moves were done!
IBM support reviewed the dump files and pointed the finger at the 3PAR kernel extension. The real problem, however, was that the AIX system did not have the FAST_FAIL and DYNAMIC_TRACKING settings enabled on its HBAs, as both the AIX and 3PAR documentation call for (why these settings aren't on by default baffles me).
3PAR support responded to my issue with this well written, well referenced, accurate report. Compared to the support I have recently been getting from both Symantec and Microsoft, I am pleased and impressed with the level of detail, which made it crystal clear what my problem was and how to fix it.
Case Analysis Report
SR#: 482626-181966951
Reported Symptom: 3par_pcmke kernel extension causes AIX servers to crash
Reported By: Pier 1 Services Company
Description:
About an hour after new MPIO paths were successfully added to an AIX 5.3 server and the old ones deleted, the server crashed. Crash dumps were collected and sent to IBM, who responded with the following:
Subject: PMR 18921,004,000 3par_pcmke kernel extension
CRASH INFORMATION: CPU 0 CSA F00000002FF47600 at time of crash, error code for LEDs: 30000000
pvthread+03AB00 STACK: [04167B70]3par_pcmke:pcmSelectIoctlPath+0000DC (F1000110104AD350, F100011010433800)
--
The problem is due to some issue in the 3par_pcmke kernel extension.
The owner of this kernel extension is 3PAR company.
Findings:
3PAR investigation shows that a similar kernel extension crash was reported to 3PAR engineering and was determined to be caused by the HBA dynamic tracking and fast fail attribute settings.
Per 3PAR engineering, when dynamic tracking is not enabled on the HBA, the 3PAR MPIO path pointers can get null values, which can cause problems similar to what you reported.
Further research with IBM reveals that, for host systems that run an AIX® 5.2 or later operating system, the fast fail and dynamic tracking attributes must be enabled.
See link:
IBM Aix Config for Fast Fail and Dynamic Tracking
Reviewing the log lsattr_fscsi.out you provided, we confirmed that the dynamic tracking and fast fail attributes are not enabled on this host as recommended.
From lsattr_fscsi.out:
### fscsi0
attach       switch       How this adapter is CONNECTED         False
dyntrk       no           Dynamic Tracking of FC Devices        True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id      0xa30024     Adapter SCSI ID                       False
sw_fc_class  3            FC Class for Fabric                   True
…
### fscsi2
attach       switch       How this adapter is CONNECTED         False
dyntrk       no           Dynamic Tracking of FC Devices        True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id      0xa20020     Adapter SCSI ID                       False
sw_fc_class  3            FC Class for Fabric                   True
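A listing like this one can be filtered to show only the attributes that are not at the recommended values. What follows is a minimal sketch and not part of the 3PAR report: the function name check_fscsi_attrs is made up for illustration, and it assumes each adapter's attributes are preceded by a "### fscsiN" header line, as in lsattr_fscsi.out.

```shell
#!/bin/sh
# check_fscsi_attrs (hypothetical helper): reads lsattr-style output on stdin,
# where each adapter section starts with a "### <device>" header line, and
# prints one line per attribute that is not at the recommended value.
check_fscsi_attrs() {
    awk '
        $1 == "###"                               { dev = $2; next }
        $1 == "dyntrk"       && $2 != "yes"       { print dev ": dyntrk=" $2 }
        $1 == "fc_err_recov" && $2 != "fast_fail" { print dev ": fc_err_recov=" $2 }
    '
}

# Assumed usage on AIX, building the same "### <device>" layout:
#   for d in $(lsdev -C -F name | grep '^fscsi'); do
#       echo "### $d"; lsattr -El "$d"
#   done | check_fscsi_attrs
```

Fed the listing from this report, the function would flag both attributes on fscsi0 and on fscsi2. An adapter that is already set correctly produces no output.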
When dynamic tracking of FC devices is enabled, the FC adapter driver can detect when the Fibre Channel N_Port ID of a device changes and reroute traffic destined for that device to the new address while the devices are still online.
The 3PAR Implementation Guide for AIX also contains additional information on these settings, including a list of events that can cause the N_Port ID to change. See section 3.2.5 of the attached AIX implementation guide.
Solution:
Dynamic tracking and fast fail can be enabled by running these commands:
chdev -l fscsi0 -a fc_err_recov=fast_fail
chdev -l fscsi0 -a dyntrk=yes
Notes:
1. Change the settings on all applicable HBAs in the system.
2. A reboot may be required for these changes to take effect.
*** Please follow all necessary precautions before rebooting your host. ***
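To cover note 1 without typing the commands once per adapter, the change can be generated for every fscsi device. This is a minimal sketch under stated assumptions: gen_chdev_cmds is a hypothetical helper that only prints the commands for review rather than running them, and it adds chdev's -P flag, which records the change in the device database so that it is applied at the next reboot. Whether deferring the change that way suits a given host is a judgment call; the IBM documentation covers the alternatives.

```shell
#!/bin/sh
# gen_chdev_cmds (hypothetical helper): reads device names on stdin, one per
# line, and prints (does not run) a chdev command for each fscsi adapter.
# -P is an addition to the commands in the report: it defers the change
# until the next reboot instead of applying it to the live device.
gen_chdev_cmds() {
    while read -r dev; do
        case "$dev" in
            fscsi[0-9]*)
                echo "chdev -l $dev -a fc_err_recov=fast_fail -a dyntrk=yes -P"
                ;;
        esac
    done
}

# Assumed usage on AIX: review the output first, then pipe it to sh to apply.
#   lsdev -C -F name | gen_chdev_cmds
```

Printing the commands instead of executing them keeps the step reviewable, which seems prudent on a host that has already crashed once.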
For further details and other considerations please refer to the IBM documentation on how to implement these changes.
IBM Aix Config Fast Fail and Dynamic Tracking