Patch-ID# 105459-13
Keywords: cvm suncluster
Synopsis: Sun Cluster 2.1: Patch for CVM 2.2.1
Date: Feb/27/2001

Solaris Release: 2.5.1 2.6

SunOS Release: 5.5.1 5.6

Unbundled Product: Sun Cluster

Unbundled Release: 2.1

Xref: This patch available for SunOS 5.6 as patch 107958

Topic: Sun Cluster 2.1: Patch for CVM 2.2.1

Relevant Architectures: sparc

BugId's fixed with this patch: 4104327 4114085 4124356 4124916 4126657 4126726 4129451 4144143 4144806 4151066 4155764 4183493 4190362 4195113 4221522 4227590 4235825 4263564 4306191 4310145 4314433 4325092 4372113

Changes incorporated in this version: 4372113

Patches accumulated and obsoleted by this patch:

Patches which conflict with this patch:

Patches required with this patch:

Obsoleted by:

Files included with this patch:

/kernel/drv/vxio.SunOS_5.5.1
/kernel/drv/vxio.SunOS_5.6
/kernel/drv/vxio.conf
/kernel/drv/vxspec.SunOS_5.5.1
/kernel/drv/vxspec.SunOS_5.6
/kernel/drv/vxspec.conf
/opt/SUNWvxva/bin/vxva
/opt/SUNWvxvm/include/volintf.h
/opt/SUNWvxvm/lib/libvxvm.a
/opt/SUNWvxvm/man/man1m/vxassist.1m
/opt/SUNWvxvm/man/man1m/vxvol.1m
/sbin/vxconfigd.SunOS_5.5.1
/sbin/vxconfigd.SunOS_5.6
/scripts/fix_lib/sbin/vxconfigd
/scripts/fix_lib/usr/lib/vxvm/type/gen/vxmake
/scripts/fix_lib/usr/lib/vxvm/type/raid5/vxmake
/scripts/fix_lib/usr/sbin/vxclust
/scripts/fix_lib/usr/sbin/vxmake
/scripts/upgrade_finish
/scripts/upgrade_start
/usr/lib/vxvm/bin/egettxt
/usr/lib/vxvm/bin/lookup_node_name
/usr/lib/vxvm/bin/strtovoff
/usr/lib/vxvm/bin/vxcheckda
/usr/lib/vxvm/bin/vxckdiskrm
/usr/lib/vxvm/bin/vxedvtoc
/usr/lib/vxvm/bin/vxparms
/usr/lib/vxvm/bin/vxslicer
/usr/lib/vxvm/bin/vxspare
/usr/lib/vxvm/diag.d/vxconfigdump
/usr/lib/vxvm/diag.d/vxkprint
/usr/lib/vxvm/diag.d/vxprivutil
/usr/lib/vxvm/type/fsgen/fs.d/vxfs/vxsync
/usr/lib/vxvm/type/fsgen/vxplex
/usr/lib/vxvm/type/gen/vxinfo
/usr/lib/vxvm/type/gen/vxmake
/usr/lib/vxvm/type/gen/vxmend
/usr/lib/vxvm/type/gen/vxplex
/usr/lib/vxvm/type/gen/vxsd
/usr/lib/vxvm/type/gen/vxvol
/usr/lib/vxvm/type/raid5/vxmake
/usr/lib/vxvm/type/raid5/vxmend
/usr/lib/vxvm/type/raid5/vxplex
/usr/lib/vxvm/type/raid5/vxsd
/usr/lib/vxvm/type/raid5/vxvol
/usr/sbin/vxassist
/usr/sbin/vxclust
/usr/sbin/vxdctl
/usr/sbin/vxdg
/usr/sbin/vxdisk
/usr/sbin/vxedit
/usr/sbin/vxinfo
/usr/sbin/vxiod
/usr/sbin/vxmake
/usr/sbin/vxmend
/usr/sbin/vxnotify
/usr/sbin/vxplex
/usr/sbin/vxprint
/usr/sbin/vxrecover
/usr/sbin/vxrecover.wrap
/usr/sbin/vxsd
/usr/sbin/vxserial
/usr/sbin/vxstat
/usr/sbin/vxtrace
/usr/sbin/vxvol

Problem Description:

4372113 cvm hang in volcvm_log_copy_enable

Veritas Incident 51863 - IO hanging after reconfiguration.

4314433 remapped device minor number affects another node after volume create

Veritas Incident 31005 - Panic in vxio driver caused by kmem_free() being invoked with a bogus size. The problem was root-caused to a missing 'return' statement.

Veritas Incident 39880 - vxassist tries to enforce disk alignment even for DRL. The check for alignment was inappropriate and has been removed.

4310145 Cluster 2.1 panics in volcvm_reset_join

Veritas Incident 39356 (case no 222386) - System panic with a recursive mutex in volcvm_reset_join.

4306191 SC 2.1 master_takeover failed with "Disk group has no valid configuration copies"

Veritas Incident 25581 - Config and log copy state not distributed.

4190362 E10K hangs in VXFS 3.2.5

Veritas Incident 26860 (case no 224121) - System panic in voliomem_more_memory() when voliomem_free_page_alloc is NULL.

4325092 SC2.1: `vxdg -s -C import` fails with reason "another host uses"

Veritas Incident 41727 (case no 236825) - CVM `vxdg -s -C import` fails because the dg is in use by another host.

4263564 adding drl failed: vxvm: vxassist errors: cannot allocate for 17000 block log

Veritas Incident i31005 - Panic in vxio driver caused by kmem_free() being invoked with a bogus size. The problem was root-caused to a missing 'return' statement.

Veritas Incident i39880 - vxassist tries to enforce disk alignment even for DRL.
The check for alignment was inappropriate and has been removed.

4221522 vxrecover process takes a long time recovering volumes on SC 2.1 OPS

Veritas Incident i21679 - While testing and tuning various GLOCK protocol based parameters for Plex attach timing, a hang occurs with some settings of vol_kmsg_max_send and vol_kmsg_send_period. The Plex Attach/Resync freezes and never completes.

4227590 removing disks from shared group causes problems on slave node

Veritas Incident i30046 - When disks are removed from a Shared disk group, the Slave node still shows a removed disk as existing in the group. From the Master, vxdisk list shows the disks as removed, but from the Slave node, vxdisk list shows one of the disks as still available. Running vxdctl enable on the Slave does not change anything, and the problem remains. Killing and restarting vxconfigd on the Slave disables the Shared disk group on the Slave.

4235825 reconfiguration time differs between master down and slave down

This bug fix includes: i30161, i30338
------
Veritas Incident i30161 - In some cases, CVM does not set the maximum number of dirty bits correctly in a volume's DRL. It ends up being 3 on the Slave node, whereas it should be 128 by default.
------
Veritas Incident i30338 - I/O for Oracle REDO logs is sequential, so the default number of DRL bits is not required for such sequential volumes; a much lower number of bits would suffice. To speed up the recovery of such sequential volumes, the number of dirty bits at a time can be set from the command line.

4183493 PDB 1.2 / Oracle hang - volsioq_idle_start head/tail NULL

This problem has been isolated, root-caused, and fixed. The problem was that vol_mem_allocsio_start did not return a value. As a result, the function itself would put the memallocsio on the idle_start queue, and if the undefined value it returned (garbage from the stack) happened to be the wrong one, the caller would also put the memallocsio on the idle_start queue.
Putting one sio on multiple queues at the same time can cause entries in the queue to disappear instantly. The symptom can be seen in the core by examining the volsioq_idle_start structure, which contains NULL head/tail pointers but a non-zero counter; that is, there should be work on the queue, but the work is not there.
--------------------------------------------------------------------------
How to identify this problem:

Go into adb and check CVM's global external structure volsioq_idle_start:
---------------------
volsioq_idle_start/3X
volsioq_idle_start:
volsioq_idle_start:     0       0       2
---------------------
The first two values (both NULL) are the head and tail pointers. Since the head and tail pointers are NULL, the last value, a counter, should also be 0. The counter is not 0; it is 2. So we presume that two I/Os, which are trackable only through the volsioq_idle_start queue, are now missing, hence the hang. In any case, the structure is inconsistent, and this is a bug.
--------------------------------------------------------------------------

4126726 vxdisk list shows duplicate entries following A5000 disk replacement

When a disk drive in an A5000 is replaced, "vxdisk list" prints duplicate lines for the same DEVICE after "vxdctl enable". The duplicated DEVICE is different from the replaced disk. While the duplication remains, vxva aborts with a segmentation fault when it is executed. Before this patch, the duplication could be cleared by rebooting the machine. vxconfigd has been changed to handle this situation properly.
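The 4183493 description combines two ideas: a function with no return value lets its caller queue the same memallocsio a second time, and the queue header's head/tail/counter fields then disagree, as in the adb output. The C sketch below illustrates both under stated assumptions; it is not CVM source. The types struct sio and struct sioq, the head/tail/count layout (inferred only from the three words printed by volsioq_idle_start/3X), and the functions sioq_append, sioq_consistent and allocsio_start are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical I/O request (sio); only the link field matters here. */
struct sio {
    struct sio *next;
};

/* Hypothetical queue header: head, tail, count, assumed to match the
 * three words printed by "volsioq_idle_start/3X". */
struct sioq {
    struct sio *head;
    struct sio *tail;
    unsigned long count;
};

/* Append s to the tail of q. */
void sioq_append(struct sioq *q, struct sio *s)
{
    assert(q != NULL && s != NULL);
    s->next = NULL;
    if (q->tail == NULL)
        q->head = s;
    else
        q->tail->next = s;
    q->tail = s;
    q->count++;
}

/* Return true if head, tail and count agree with each other. The walk is
 * bounded by count, so a self-linked element cannot loop forever. */
bool sioq_consistent(const struct sioq *q)
{
    if (q->head == NULL || q->tail == NULL)
        return q->head == NULL && q->tail == NULL && q->count == 0;

    unsigned long n = 0;
    const struct sio *last = NULL;
    for (const struct sio *s = q->head; s != NULL; s = s->next) {
        if (++n > q->count)
            return false;   /* more links than counted, or a cycle */
        last = s;
    }
    return last == q->tail && n == q->count;
}

/* Fixed pattern (hypothetical): every path returns an explicit value, so
 * the caller knows whether the sio was already queued and must not queue
 * it again. must_wait stands in for whatever condition makes the real
 * function defer the request. */
bool allocsio_start(struct sioq *idle, struct sio *s, bool must_wait)
{
    if (must_wait) {
        sioq_append(idle, s);
        return true;    /* queued here */
    }
    return false;       /* not queued; the caller still owns s */
}
```

Appending the same sio twice to one queue, the double queueing described for 4183493, links the element to itself, and sioq_consistent reports the header as inconsistent instead of looping; a header reading 0 0 2, like the adb output, is rejected as well.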
4144143 reconfig fails if using drl on each plex of mirror

If there are two DRL logs (one per plex) associated with the data plexes, and the node on which they are imported crashes during I/O, then during reimport "vxvol startall" fails (dumps core) with the following messages:
-----------------------------------------------------------------------
vxvm:vxvol: ERROR: Volume junkdg.1: Invalid active log magic number
vxvm:vxvol: ERROR: Volume junkdg.2: Invalid active log magic number
vxvm:vxvol: ERROR: gen/vxvol killed by signal 11, core dumped
-----------------------------------------------------------------------
To compound the problem, "vxrecover -sb" fails to start the volumes in the diskgroup, yet exits with status zero and does not report any error message. So you would not notice the problem until you try to access the volume, which results in a cluster abort.

4195113 vxclust changes vxiod to 10 on every cluster reconf or startcluster

When SC2.1 is started or goes through any reconfiguration, vxclust resets vxiod to 10 (the default), overwriting any customized setting for vxiod.

4155764 Mirror plex data mismatch under certain circumstances following hardware fault

When plexes are detached as a result of I/O failure on one or more disks, the volume manager stores this information in the config copies and/or logs. If the diskgroup is deported before the failure is corrected, then when the detached disks come back online they might be auto-imported, and in the process the old/stale config/log copies overwrite the new/clean ones. This could lead to loss of configuration changes and to data corruption (the record that the disk was detached is lost, so no recovery takes place).

4151066 vxconfigd: Cannot get record from

Volumes with DRL logs created in a specific manner fail to start with the error "Cannot get record from ". Under such circumstances, no volume reconfiguration is possible.
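In 4144143, vxvol reports a bad log magic number and then dies with signal 11, while vxrecover exits with status zero although volumes did not start. The C sketch below shows the general defensive pattern those symptoms call for: validate a log header's magic number before trusting the rest of the log, and fold per-volume failures into a non-zero overall status. It is a hypothetical illustration, not VxVM code; struct drl_header, DRL_MAGIC and its value, drl_check and check_all_logs are invented for this example, and the real on-disk DRL format is not described in this README.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical magic value; not the real VxVM DRL constant. */
#define DRL_MAGIC 0x44524c31u

/* Hypothetical on-disk DRL log header; only the magic number is modeled. */
struct drl_header {
    uint32_t magic;
};

/* Return 0 if the header is usable, or -1 (with a message) if it is not,
 * instead of going on to use data from an invalid log. */
int drl_check(const char *volname, const struct drl_header *h)
{
    if (h == NULL || h->magic != DRL_MAGIC) {
        fprintf(stderr, "volume %s: bad DRL log magic number\n", volname);
        return -1;
    }
    return 0;
}

/* Check every volume's log header and return a non-zero status if any
 * check failed, so a calling script can tell that volumes were skipped. */
int check_all_logs(const char *names[], const struct drl_header *logs[], int n)
{
    assert(n >= 0);
    int failures = 0;
    for (int i = 0; i < n; i++)
        if (drl_check(names[i], logs[i]) != 0)
            failures++;
    return failures == 0 ? 0 : 1;
}
```

Returning a status rather than stopping at the first bad header lets every volume be checked while still making the overall failure visible to the caller.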
4144806 view shared dg window does not get updated when creating/adding vol on E

The VxVA GUI display is not updated when disks, volumes, plexes or subdisks are added to or removed from the diskgroup.

4129451 intermittent error mesg. from vxconfigd during step 9

When nodes are joining the cluster, the following message appears on the MASTER node during reconfiguration step 9:

vxvm:vxconfigd: Cannot issue internal transaction: Configuration daemon error -1

This message was informational (with no consequence) and has been dropped.

4126657 vxdg import of disk groups failed during takeover

Sometimes a private diskgroup import fails with the message "Internal configuration daemon error" during reconfiguration step 11, and as a result the logical host does not come up. This can happen if the private diskgroup import takes place in the middle of a shared diskgroup transaction: the transaction initiated by the diskgroup import is rejected because another transaction is already in progress, and the rejection is reported as "Internal configuration daemon error".

4124916 Erroneous warning in volume manager upgrade_start script

Running the upgrade_start script on a Solaris 2.6 machine reports:

*** WARNING *** You are running Solaris version less than 2.3.

4124356 Cannot use fcs cvm version (vxvm2.2src/cvm2.2.1-sc2.1ga) with encapsulated root

The root disk cannot be encapsulated using CVM. The second reboot after vxinstall fails with the message "Cannot mount root on /pseudo/vxio@0:0 fstype ufs".

4114085 vxconfigd dying during disk group import

Sometimes vxconfigd dumps core on the SLAVE node during reconfiguration step 11. This can happen if a private diskgroup import takes place in the middle of a shared diskgroup transaction.

4104327 vxva dumps core on Solaris 2.6 and sun4u

The visual administrator GUI (vxva) dumps core if you start it under certain fault conditions. It continues to run if it was started before the fault (such as a disk or controller failure) occurred, and it can be (re)started once the fault condition has been corrected.
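The README does not say what was wrong with the version check in upgrade_start that made a Solaris 2.6 machine report a version "less than 2.3". As a hedged illustration only, the C sketch below shows one robust way to compare dotted release numbers: split on the dots and compare each component numerically, so that "2.6" sorts after "2.3" and a release such as "2.10" is not mistaken for an older one, as a plain text comparison would do. The function name release_cmp is hypothetical; upgrade_start itself is a script and is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Compare dotted release strings such as "2.3", "2.6" or "2.5.1" one
 * numeric component at a time. Returns <0, 0 or >0, like strcmp.
 * A missing component counts as 0, so "2.5" < "2.5.1". */
int release_cmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' || *b != '\0') {
        char *ea;
        char *eb;
        long x = strtol(a, &ea, 10);
        long y = strtol(b, &eb, 10);
        if (x != y)
            return x < y ? -1 : 1;
        if (ea == a && eb == b)
            break;              /* neither side made progress: stop */
        a = (*ea == '.') ? ea + 1 : ea;
        b = (*eb == '.') ? eb + 1 : eb;
    }
    return 0;
}
```

With a helper like this, the warning would be printed only when release_cmp on the running release and "2.3" returns a negative value, which is never the case on 2.5.1 or 2.6.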
Patch Installation Instructions:
--------------------------------
Refer to the Install.info file for instructions on using the generic 'installpatch' and 'backoutpatch' scripts provided with each patch.

Any other special or non-generic installation instructions should be described below as special instructions.

Special Install Instructions:
-----------------------------
You must use the installpatch that is shipped with this patch to install this patch. Do not use patchadd to install this patch.

This patch MUST be installed before attempting to encapsulate the root disk. The system should be rebooted after the patch has been installed.

If you have encapsulated your root disk, you must run upgrade_start, install this patch, reboot, run upgrade_finish, and reboot again to complete the upgrade.

If you NEED to back out this patch, you MUST first unencapsulate the root disk (using the upgrade_start script); failure to do so will prevent you from booting off this disk (see BugId 4124356 above). If you back out this patch, do not run upgrade_finish (as that would attempt to encapsulate the root disk) until you have (re)installed this or an earlier revision of this patch.

README -- Last modified date: Tuesday, February 27, 2001