OBSOLETE Patch-ID# 101946-07
Keywords: uppc procfs sockmod nfs RDBMS savecore hang crash fcntl strrput
Synopsis: SunOS 5.4_x86: jumbo patch for kernel (includes libsocket, procfs, nfs)
Date: Sep/30/1994

Install Requirements: None

Solaris Release: 2.4_x86

SunOS Release: 5.4_x86

Unbundled Product:

Unbundled Release:

Xref: This patch available for SPARC as patch 101945

Topic: SunOS 5.4_x86: jumbo patch for kernel (includes libsocket, procfs, nfs)

Relevant Architectures: i386

BugId's fixed with this patch: 1120225 1151364 1152710 1157053 1160112 1165687 1167235 1169686 1169791 1169909 1171478 1171939 1172009 1172243 1172542 1172979 1173969 1174830 1176467 1177091 1177572 1177578

Changes incorporated in this version: 1177091 1177578 1176467 1172243

Patches accumulated and obsoleted by this patch: 101903-01 101919-01 102113-02

Patches which conflict with this patch:

Patches required with this patch:

Obsoleted by:

Files included with this patch:

/kernel/drv/cn
/kernel/drv/tl
/kernel/fs/nfs
/kernel/fs/procfs
/kernel/mach/uppc
/kernel/strmod/sockmod
/kernel/sys/nfs
/kernel/unix
/usr/kvm/crash
/usr/lib/libsocket.a
/usr/lib/libsocket.so.1

Problem Description:

1177091 prgetstatus can generate pagefault holding p_lock, can deadlock if freemem is 0
1177578 strmakemsg/strgeterr causes panic in strrput due to NULL mblk ptr
1176467 fcntl system call fails in process run by rcmd
1172243 Customer runs application from dumb terminal and system crashes.

The system can freeze under heavy swapping pressure due to procfs holding a critical lock when it takes a page fault.

Doing I_SETSIG on a console window through serial line and exiting the process could cause a system panic.

Kernel panic in putnext/ptcwrite.

A socket endpoint not created through the socket library (by dup() of a socket endpoint for example) may experience some failures on fcntl()/ioctl() calls.
(This bug is limited to the 2.4 release.)

(from 101946-06)

1177572 installing Solaris 2.4 ON patch 101945-05 and running OW causes machine to panic

The patch for bug ID 1151364 broke OW's consolidation. This happened because releasef() changed to take an extra argument. OW shouldn't have been dependent on releasef(), which is private to the ON consolidation. Since this problem was not discovered until after the patch was made, it made more sense for ON to produce a new patch which restores releasef() to its old interface. The interface changed for kaio. A new interface called areleasef() is added, which is used only by kaio.

(from 101946-05)

1174830 savecore on diskless machine didn't generate unix, vmcore is trash
1151364 asynchronous I/O in the user level hurts RDBMS performance

This is a performance improvement for applications that use libaio for async I/O to raw files or devices. There are no API changes; only a new version of libaio.so.1 is installed. One side benefit of this fix is that async I/O to tape should now work. (This patch for bug 1151364 requires installation of libaio/kaio patch 102021-01 or later.)

Kernel crash dumps generated on diskless sun4m, sun4d or i86pc systems are not complete.

(from 101946-04)

1172243 Customer runs application from dumb terminal and system crashes
1169686 4.1.3 system on network goes down, hangs 2.3 system

The problem shows up when a "ps" thread is running through the virtual memory area to get the address space size for a mapped file. The address space lock is held and a get attributes function is called. This initiates an nfs get attribute request. If the machine that the request is made to is not responding, the nfs request will block. The address space lock held by the blocked ps thread can then block other processes on the local machine.

Typically when a server goes down all nfs file system activity is blocked on any clients. The nfs operation resumes once the server comes up.
In this situation a server is powered down and causes a client to hang. The hang is due to a process pile-up. The client is doing a ps and its thread is holding the address space lock (as_lock) for a running process, call it A. Process A has a file mapped from the server. The client ps thread path has reached rm_assize(), which needs the file size, so it calls VOP_GETATTR(), which goes across the wire to the server. This operation goes nowhere because the server is not running. The as_lock held by the ps process is blocking other processes such as init.

The solution is not to go over the wire but to return a cached entry for the file size. The change is to define a new attribute flag in vnode.h called ATTR_HINT. The rm_assize() function will use this flag when it calls VOP_GETATTR(). The nfs getattr function will see that the size of the file is requested and that the passed-in flag is ATTR_HINT. It will return the file size from the rnode rather than make a request to the server.

Running applications that do I_SETSIG on the console, when the console is the serial port (i.e. not the frame buffer), causes the system to crash when attempting to send a signal to a process.

(from 101946-03)

1169909 Running xlib code in Realtime class causes code to block in poll()
1167235 panic data fault in strioctl - apparently doing TIOCSPGRP

Protect with a mutex the testing and setting of the session and controlling terminal related flags in the streamhead.

Real time stream threads will block in a poll.

(from 101946-02)

1172979 spurious SIGALRM received in test program that forks child processes
1172009 recv() on sockets should return the error only once for SunOS 4.X compatibility
1165687 non-blocking reads on sockets block under Solaris 2.3
1160112 socket library accidentally closes file descriptor on error
1120225 recv() returns EPIPE when called with MSG_PEEK
1152710 socket lib in 2.3/2.2 have problems with not clearing bad connections and errno
1171478 socket recv() calls fail with EINVAL due to bad fix in 494

AF_UNIX and AF_INET sockets can sometimes get EPIPE errors for recv(MSG_PEEK). When the socket library sees the EPIPE error it will in some cases close the file descriptor, causing the application to get EBADF errors for subsequent operations.

An AF_UNIX listening socket can get into a permanent error state (returning EPIPE or ECONNRESET) for any operation until the socket is closed.

The non-blocking attribute of a socket endpoint is not transferred from a non-blocking listener endpoint to an accepting endpoint. This causes some non-blocking socket programs to block. This patch fixes the problem by setting the accepting endpoint's non-blocking attribute if the listener was non-blocking.
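
The non-blocking fix above can be illustrated at application level. The following is only a sketch in Python, not the libsocket change itself: the helper names accept_propagating_nonblocking() and accepted_is_nonblocking() are hypothetical, and the code assumes a loopback TCP connection is available. It copies the listener's blocking mode onto the accepted endpoint explicitly, which is the behavior the patch describes for the socket library.

```python
import select
import socket


def accept_propagating_nonblocking(listener, timeout=5.0):
    """Accept one connection and copy the listener's blocking mode onto it.

    Hypothetical helper: an application-level analogue of the behavior
    described for the patched socket library.
    """
    # Wait until a connection is queued, so accept() cannot fail with
    # BlockingIOError on a non-blocking listener.
    readable, _, _ = select.select([listener], [], [], timeout)
    if not readable:
        raise TimeoutError("no pending connection on listener")
    conn, peer = listener.accept()
    # Propagate the attribute explicitly rather than relying on the OS.
    conn.setblocking(listener.getblocking())
    return conn, peer


def accepted_is_nonblocking(listener_nonblocking):
    """Return True if the accepted endpoint ends up non-blocking."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    listener.setblocking(not listener_nonblocking)
    client = socket.create_connection(listener.getsockname())
    try:
        conn, _ = accept_propagating_nonblocking(listener)
        try:
            return not conn.getblocking()
        finally:
            conn.close()
    finally:
        client.close()
        listener.close()
```

Calling accepted_is_nonblocking(True) exercises the case from bug 1165687: a non-blocking listener whose accepted endpoint must also be non-blocking. Programs that cannot rely on the library doing this can set the attribute themselves in the same way.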

In SunOS 4.X sockets, when a read() or recv*() call returns an error the application can do another read()/recv*() and get an EOF. This patch applies this subtle aspect of socket semantics to SunOS 5.X.

This specification of signal actions from the signal(5) manual page was being violated:

Setting a signal action to SIG_IGN for a signal that is pending causes the pending signal to be discarded, whether or not it is blocked. Any queued values pending are also discarded, and the resources used to queue them are released and made available to queue other signals.

The condition under which the pending signal was not being discarded was the specific case of SIGALRM signals generated by the setitimer(ITIMER_REAL) interface. The malfunction happens in a narrow race condition which is triggered by intensive alternation between installing a signal handler and setting the action to SIG_IGN while the itimer is active.

(from 101946-01)

1173969 MT process doesn't stop on multi processor systems

dbx appears to malfunction when controlling a multithreaded process that does many fork1()s. The bug is in the system, not dbx. Also, stopping dbx with a job control signal from the terminal, ^Z, while it is controlling a multithreaded process will cause the multithreaded process to become permanently stopped.

(from 101903-01)

1172542 gettimeofday() returns negative nanosecond value on x86
1171939 Process dump core at random on loaded systems
1169791 processes often getting killed with SIGABRT and core dumped on MP IntelExpress

The gettimeofday() call can at times return a negative nsec value.

Processes can dump core on heavily loaded systems.

(from 101919-01)

1157053 System panics when doing a copy to NFS file system mounted across FDDI-S

The cause of the problem is non-aligned transfers. The memory address alignment trap happened in xdr_writeargs() when copying data in a loop. The address was not on a long word boundary; it was on a word boundary.
nfs_feedback() can adjust the transfer address and size for a request, such as for a retransmission. xdr_writeargs() can make use of bcopy(). xdr_writeargs() is in file nfs_xdr.c. There are a few other functions in this file that do a similar copy operation and should be changed to use bcopy().

Patch Installation Instructions:
--------------------------------
Generic 'installpatch' and 'backoutpatch' scripts are provided within each patch package with instructions appended to this section. Other specific or unique installation instructions may also be necessary and should be described below.

Special Install Instructions:
-----------------------------
none

README -- Last modified date: Tuesday, January 7, 2003