-----------------------------------------------------------------------
 NOV-PER2.DOC -- 19971114 -- Email thread on NetWare Performance Aspects
-----------------------------------------------------------------------
Feel free to add or edit this document and then email it back to
faq@jelyon.com

Date: Wed, 28 May 1997 18:44:06 -0600
From: Joe Doupnik
Subject: Redesigning a server for faster/busier use, random thoughts

I'd like to share my "random thoughts" on redesigning one particular
student open lab which uses INW 4.11. There may be items of interest
when you face similar problems. The msg is rather long so feel free to
skip it.

The setup: INW 4.11 server, 486-33 EISA bus, three NE-3200 EISA bus
Ethernet boards of interest (plus two NE-2000's not of interest here),
64MB, 4GB mirrored. Clients are Pentium-90's using NE-2000 clones on
coax, put onto those three nets as 12, 16, and 20 machines (that's a
physical constraint from when the wiring was done), no hard disks on
clients.

The problem is we are beating the living daylights out of the wires.
Data transfer rates average about 30-40% of capacity when viewed over
5 min. That means I ask a server lan adapter for its byte count now,
and again five minutes later, divide the difference by 300 seconds to
get bytes/sec, and factor that into 10Mbps for regular Ethernet. This
says, on paper and phosphor, the traffic peaks are strongly clipped and
packets wait in line a lot. It says in practice things get slow when
folks get busy. Just by way of context, almost all the traffic is
server to client, with less than 10% of data bytes flowing to scratch
areas on the server or being packet ACKs.

I've been planning and saving funds over the past year to go to 100Mbps
Ethernet all round. I now have the money and it looks as if the costs
will just fit the budget (or close enough to sneak by). Fine, fine, but
will it work or even be constructive? Let's look at some technical
numbers.
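The five-minute byte-counter arithmetic above can be sketched in a few
lines of Python (the counter values here are assumed for illustration;
real ones come from the server adapter's statistics):

```python
# The 5-minute utilization estimate: sample the adapter's byte counter
# twice and compare the resulting byte rate against wire capacity.
WIRE_BPS = 10_000_000  # 10Mbps Ethernet, in bits per second

def utilization(bytes_then, bytes_now, interval_s=300):
    """Fraction of wire capacity used between two byte-counter samples."""
    byte_rate = (bytes_now - bytes_then) / interval_s  # bytes per second
    return byte_rate * 8 / WIRE_BPS                    # fraction of the wire

# Example: 135MB moved in five minutes is 36% of a 10Mbps wire.
u = utilization(0, 135_000_000)
```

A sustained 30-40% reading on a five-minute average, as reported above,
implies much higher instantaneous peaks, hence the clipped peaks and
queueing Joe describes.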
To move say 350KB of data from server to a client takes about 350
millisec. That's figuring 1 byte every microsecond, which is a good
round number in practice and close enough to a full 10Mbps Ethernet.
That satisfactory but not outstanding data transfer rate is 1/3 of a
wire, just for one client. Clearly lots of clients make the traffic
saturate the wire and things queue and queue, hence slowness when folks
are busy.

Step one is to say "100Mbps Ethernet is the solution." Sure, right.
Why? Well, the same packets now consume only 3% of the wire because the
bit rate is ten times greater. Yup, that opens capacity on the wire
very nicely indeed.

Step two is to try it. If one uses a so-so board in the client and a
similar one in the server then throughput goes up some, about twice or
so. We are being throttled by movement of bytes from buffers, across
system buses, to lan adapters on both ends, plus the delays in
servicing boards by drivers. But this environment keeps server CPU
utilization moderate (packets are spaced out in time, even though each
is shorter in duration). Not the best solution but a step in the right
direction.

Step two bis. Put in better 100Mbps Ethernet boards. Wow! Throughput
goes up by about a factor of three to four compared to the original
situation. This is relieving the machine bottlenecks. The way to go.
But, and there is always a "but" in engineering work, the server
utilization goes through the roof from the increased packet rate and
because the server is using the same "client boards" as the clients.
One station can cause the server to sustain 100% utilization. Oops. I
know this because the experiments were run.

Step three. Well, that suggests we must redesign the server side to
stand the increased traffic rate. The way this is done is to use
smarter server boards, not client-style boards, which have a processor
present to do what most drivers do via the server's CPU. Yup, that is
just right. For example.
Those original NE-2000 clones could drive the NE-3200 bus master boards
hard, and all one client could do was cause a max of 20% server CPU
utilization. Put an NE-2000 in the server instead and the server would
run flat out servicing one client; bus master boards can help a great
deal. NE-2000 clients could not pump packets fast enough; the NE-3200
could easily keep up and still catch up on sleep. Thus lots of clients
could bang away and at least the server felt fine. Replace both boards
with faster units but without the co-CPU part and the server works
harder handling the traffic. Mind you, things work faster but the
server is huffing and puffing dangerously.

Step four is then to use smart server boards. Clearly today those
boards will be PCI bus units. Oh my, but my server is an EISA bus
machine.

Step five says replace the server motherboard with a decent PCI bus
unit. True. Motherboards are dirt cheap, Intel CPU's are still overly
expensive, memory is moderately expensive. After shopping and thinking
I selected a Pentium Pro 200 system, which is near enough to the same
price as any decent Pentium system at this time. Plan for the future
when the server is busier yet with web server and friends, and so on.
Plan to save two of four SIMM sockets for more memory when NW 6 or 7 is
released. Ensure the PCI chipset can cache memory above 64MB, and
ensure the memory system can run in ECC mode (true parity, correcting).
Those PCI bus constraints say use a 430HX chipset for Pentiums or the
440FX chipset (or similar) for Pentium Pros. The popular Pentium PCI
430TX chipset cannot cache memory above 64MB; leave that for desktop
boxes.

Now we have extra horses in the stable for intensive requests when
needed but not expected often, fast low-CPU-loading server Ethernet
boards to deal with dense 100Mbps traffic, decent client boards to move
packets without slowing down mice or dropping packets, and capacity on
the wire to sustain many clients at one time.
And the server is ready to run another three or four years without an
engine change.

Finally, 100Mbps Ethernet runs only over twisted pair wiring. I'm
selecting hubs which are 24, 24, and 12 connections, spreading clients
over them, and connecting each hub to a server Ethernet board. This
avoids reintroducing massive congestion, which would occur if the three
hubs were cascaded onto one wire to the server. Etherswitching is of
zero benefit here, because the traffic heads to one destination anyway
(same as cascading hubs, but far more expensively and slowly). Smaller
hubs would require more server Ethernet boards and we don't have more
motherboard slots for them. (Quick slot count: three PCI Ethernet
boards, PCI SCSI board, no more PCI slots, two NE-2000's for other
things plus a video adapter, sums to one full motherboard.)

Note something here. I could have stayed with 10Mbps Ethernet on each
client. But that solves little because the wire would be just as
congested as before, and this is the fundamental problem to be solved.
100Mbps Ethernet opens up wire capacity, better boards use more of that
capacity than before (users get better**2 performance), and we must
amplify the server to sustain the now greater load.

We see how the simple congestion problem grows into a fairly complete
and balanced system redesign. Throughput is expected to increase by at
least a factor of four under heavy load, and two to three under light
load.

Joe D.

---------
Date: Thu, 29 May 1997 10:50:34 +0100
From: Phil Randal
Subject: Re: Redesigning a server for faster/busier use, random thou

> Step three. Well, that suggests we must redesign the server
>side to stand the increased traffic rate. The way this is done is use
>smarter server boards, not client-style boards, which have a processor
>present to do what most drivers do via the server's cpu. Yup, that is
>just right. For example.
> Those original NE-2000 clones could drive
>the NE-3200 bus master boards hard and all one client could do is
>cause max of 20% server cpu utilization. Put an NE-2000 in the server
>instead and the server would run flat out servicing one client; bus
>master boards can help a great deal. NE-2000 clients could not pump
>packets fast enough, the NE-3200 could easily keep up and still catch
>up on sleep. Thus lots of clients could bang away and at least the server
>felt fine. Replace both boards with faster units but without the co-cpu
>part and the server works harder handling the traffic. Mind you, things
>work faster but the server is huffing and puffing dangerously.

Our experience here backs this up - we had a 486DX4/100 VL Bus server
(not ideal, I know, but everything here is under tight budgetary
constraints), 9GB HDD (16K block size) and 96MB RAM with 4 Intel
EtherExpress Pro 10 cards (with 120-odd clients), and server
utilisation rarely exceeded 60%. We upgraded to a Pentium 133 with 2
dual-ported SMC bus-mastering PCI ethernet cards, upped the RAM to
128MB, and added a PCI SCSI card. Combined with a growing number of
Pentium PCs with PCI Ethernet cards, peak utilisation shot up,
frequently to 80+%. Perceived performance got worse, according to many
users. Packet burst made things even worse, not better, so we had to
turn that off in the clients. My guess is that as we removed
bottlenecks (faster SCSI card and server LAN adapters), the server
processor load dramatically increased.

> Step five says replace the server motherboard with a decent PCI
>bus unit. True. Motherboards are dirt cheap, Intel cpu's are still overly
>expensive, memory is moderately expensive. After shopping and thinking
>I selected a Pentium Pro 200 system, which is near enough to the same
>price as any decent Pentium system at this time. Plan for the future
>when the server is busier yet with web server and friends, and so on.
>Plan to save two of four SIMM sockets for more memory when NW 6 or 7 is
>released. Ensure the PCI chipset can cache memory above 64MB, and ensure
>the memory system can run in ECC mode (true parity, correcting). Those PCI
>bus constraints say use a 430HX chipset for Pentiums or the 440FX chipset
>(or similar) for Pentium Pros. The popular Pentium PCI 430TX chipset cannot
>cache memory above 64MB; leave that for desktop boxes.

It is well worth considering a 430HX chipset motherboard with an AMD
K6/200. It may well outperform the equivalent Pentium Pro, and is
certainly more cost-effective. Note too that some HX motherboards
(e.g. GigaByte 586HX) need extra tag RAM in order to cache above 64MB.

> We see how the simple congestion problem grows into a fairly
>complete and balanced system redesign. Throughput is expected to increase
>by at least a factor of four or more under heavy load, and two to three
>under light load.

I couldn't agree more.

---------
Date: Thu, 29 May 1997 11:02:41 -0600
From: Joe Doupnik
Subject: Re: Redesigning a server, cont'd

To reinforce my comments to the list yesterday evening, here is a
message stolen from NEWS along the same lines. Note, an Intel
EtherExpress 32 is a licensed clone of the Novell NE-3200 EISA board.

Putting matters into an order for simple understanding: on servers,
buses count much more heavily than cpu speed. ISA bus is slow, EISA bus
is fast, PCI bus can be four times faster than EISA bus. Bus master
boards generally work faster than non-bus master boards. Smart boards
are not only bus master but also offload many driver housekeeping
duties to a processor on the adapter, thus freeing the server's cpu for
other chores. Smart boards are as fast as and less taxing than bus
master boards. Thus we go from client boards to bus master boards to
smart boards, and from ISA to EISA to PCI buses (MCA is about the same
or slightly better than EISA).
The lab being refurbished here will use Intel EtherExpress Pro 100B PCI
client boards and three Intel EE Pro/Server PCI boards (smart) in the
server. That's what the fellow below is referencing (i960 stuff). By
the way, this is not a recommendation from me. It is what I am doing,
not necessarily what you should be doing under your circumstances.

Joe D.

>Newsgroups: comp.os.netware.connectivity
>Subject: Re: Server NICs. Is Intel smartadapter worth it?
>From: philip@aleytys.pc.my (Philip Chee)
>Date: Wed, 28 May 97 14:13:21 GMT
>
>>Having recently replaced our aging novell servers I have been looking into
>>NICs and whether different manufacturers' NICs provide any significant
>>advantage from a speed/utilisation perspective.
>>
>>Primarily I have been looking at the Intel EtherExpress smartadapter 100tx
>>which claims to significantly reduce cpu utilisation on the server via the use
>>of an onboard Intel i960 processor for packet processing.
>>
>>What I was wondering is if anybody has actually compared this NIC to other
>>devices from 3com etc to see if it does actually live up to its claims and
>>whether it would be worthwhile investing in this product for our main server.
>
>We were using a HP Vectra server with a 486DX33 cpu and an Intel
>EtherExpress32 (the predecessor to the smartadapter). With about 48
>client PCs hitting the server, the max cpu utilisation I've seen is 30%.
>
>We have just upgraded to a HP LH Pro [1] with a Pentium 200 and an Intel
>Nitro smart server adapter. With the same clients logging in, this
>server is barely peaking at 5%.
>
>By the way there are sufficient spare CPU cycles on the smart adapter's
>i960 to run certain network daemons such as SNMP, further offloading
>processing from the server CPU. The smart adapter 100TX costs
>significantly more than the average 100baseT NIC but I feel it's worth
>every penny (YMMV).
---------
Date: Thu, 29 May 1997 12:40:25 -0600
From: Joe Doupnik
Subject: Re: Re[2]: Redesigning a server, cont'd

>>>three Intel EE Pro/Server PCI boards (smart) in the server. That's
>>>what the fellow below is referencing (i960 stuff).
>
>Gee, Joe's got some serious money to spend--three Intel Smart PCI (960)
>boards? Those aren't exactly dirt cheap. It's nice to see people sometimes
>do get the money they need to set things up appropriately.

That was my impression too, a year ago. So I skimped and squeezed and
waited a year for 100Mbps prices to drop, and saved up about US$15K for
the upgrade. No trips to Comdex or Interop, etc. If one listens to
sales persons and instead opts for Etherswitches, ATM or FDDI, or even
a Netframe server, then the price would have been far higher and out of
reach. The smart boards are under $600 each (ouch!) while the
alternatives are much more expensive, and non-smart boards are spending
money while not solving problems. The real expense is in client boards,
because there are so many of them, and in the hubs, so I pursue an
aggressive purchasing policy to push down those costs. Notice I'm
buying a motherboard, not an HP/Compaq dream machine at ten times the
cost. That's risky but has paid off here because I am picky enough (do
homework long term) to not make many mistakes here, emphasis on "many."

>Oh, and thanks for the nice "overview", it's always a good idea to have
>certain concepts restated now and again.

Thanks. I thought it would be worthwhile to see engineering design done
openly, as an example of how to think/measure through the process, as
contrasted to throwing money at problems. I've withheld most of my
numbers as boring and requiring too much explanation, and possibly
resulting in too much legal personal exposure.

>Harry Campbell
>Applied Microsystems Corp
> (I wonder if Intel used our i960 emulator to design the Smart NICs?)

Interesting question.
We suspect Intel puts that overkill processor on the adapters to soak
up fab and design group time, as they do in other areas. An NE-3200
gets away nicely with an x186 (or is it x188). I might add another
observation that there is a movement across the industry to make
smarter peripherals, from the awful Universal Serial Bus, to SCSI and
video adapters, to say the I2O API on lan adapters, and more.
Stretching this some, the Novell Wolf Mountain project is demonstrating
clustered NetWare servers (many servers sharing a disk farm, looking
like one server to the user). And jokingly, this includes us people as
necessary evil peripherals to keep the system fed and happy.

Joe D.

---------
Date: Sat, 31 May 1997 13:09:05 -0600
From: Joe Doupnik
Subject: Lan design, cont'd

For those wishing to dig deeper into the topic of Ethernet switching,
especially as contrasted to my recent design of using 100Mbps boards
and hubs only, there is a decent readable discussion of the issues on
Intel's web pages. Contact http://www.intel.com/ and choose "Product
Info", "EtherExpress PRO/100 Fast Ethernet Adapters", "Express Ethernet
and Fast Ethernet Switches", "White Papers", "The Advantages of Fast
Ethernet for New Networks."

While that paper looks fine on my screen and even print previews well,
it fails to print on my HP 4M LaserJet printer. Good luck with
printing. I suspect that other vendors have similar white papers worth
reading. I stumbled across this paper today while looking for better
written descriptions than mine.

For those who want only the executive summary punch line, it is: using
all 100Mbps boards and hubs is faster and cheaper than adding switches.

Joe D.

---------
Date: Tue, 3 Jun 1997 22:31:45 -0600
From: Joe Doupnik
Subject: Re: lan design, summary

We spent many messages examining the lan design example presented last
week. I've listed some conceptual items below which might be useful as
they stand, or as strawmen for more incisive descriptions.
They are clearly oversimplifications, merely to make them easy to use
and remember. The problem is basically allocating bandwidth along the
pathways used by nodes, up to economic limits.

Making a connection (link) between two end nodes "creates" bandwidth
(carrying capacity). Bandwidth is "exploited" (consumed) by passing
packets between computers/nodes/lan adapters. We call this traffic.

Adding boards to a server creates bandwidth, by creating new end nodes.
But they must be able to sustain the traffic, or the link must be
derated to match the board's capabilities. A frequent mistake here is
to use a weak Ethernet board in a server.

The end to end bandwidth of a link is the smallest bandwidth along the
path, diluted by competition from sharing links with other sources and
destinations. It's congestion again.

Where a computer cannot fully utilize the bandwidth of its link, other
computers can share the link, with small to large delays to interleave
packets (congestion, competition). The number of computers in a shared
domain often exceeds the capability of the link to sustain peak
transfer rates by all such computers at one time. The degree of
overcommitment is a judgment by the designer based on average usage and
acceptable delays when peak demands occur. This is the most difficult
problem to manage.

An Etherswitch does not create bandwidth; it distributes traffic. A
fast uplink port operates either at the packet rate of the slower side,
when traffic is only between those two ports, or at a higher rate if
the switch is able to buffer fast rate packets for simultaneous
delivery to several low rate ports, or vice versa. A switch shares its
backplane amongst ports, and the backplane can be considered the
fastest communal but hidden link. Thus a switch can aggregate traffic
(many sources to a few destinations) or sustain bandwidth if crossing
pathways are all different.
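The degree of overcommitment described above can be expressed as a
simple ratio. A minimal sketch in Python (the function name and the
example figures are illustrative, drawn loosely from the lab described
earlier in the thread):

```python
def overcommitment(n_clients, client_peak_kbps, link_kbps):
    """Ratio of aggregate peak client demand to shared-link capacity.
    1.0 is fully provisioned; above that, simultaneous peaks must queue."""
    return n_clients * client_peak_kbps / link_kbps

# Twelve clients, each able to move about 350KBps, sharing one 10Mbps
# wire carrying roughly 1000KBps of user data:
ratio = overcommitment(12, 350, 1000)  # 4.2x overcommitted
```

How much overcommitment is acceptable is exactly the designer's
judgment call named above: it depends on average usage and tolerable
delay during peaks, not on the ratio alone.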
A two port bridge conserves small amounts of bandwidth on each wire by
preventing unnecessary traffic from reaching the other side. Bridges
add one packet delay.

A hub does not create bandwidth; it aggregates traffic. Its capacity is
that of the slowest port. It creates one logical wire from many
physical wires. A speed changing hub (with a fast uplink port) operates
the fastest wire at the packet rate of the slower links even though the
bit rate is higher (only one transmitter can be active at a time on a
hub).

Speed changing devices (10/100MHz bridges, many Etherswitches, and some
hubs with a fast Ethernet uplink port) introduce packet delay. The
amount of delay depends on the direction of travel and is one packet's
time on the exit side of the device. Thus a 10MHz client transmission
to a 100MHz server adds 1/10 of the 10MHz transmission time, or the
time to send the 100MHz version of the packet. Delays reduce
throughput. Delay is necessary to prevent a fast transmitter from
prematurely exhausting a slow source (a "DMA underrun" situation). Even
hubs introduce a small amount of delay, typically 8 byte times or less
if Ethernet preamble is regenerated.

Packet delays reduce throughput by extending the time of arrival of
permission to send the next packet. When streaming protocols such as
TCP or IPX Packet Burst are used, delay is diluted over the duration of
the streaming transmission, provided transmission windows are not
exhausted (they aren't on LANs but can be on long WANs). Packet Burst
is a short stream, to avoid overrunning client boards which typically
have small buffers and slow response times.

The design goals are to balance available bandwidth (most often
overcommitted) against its exploitation across the network, and to
minimize costs. Balancing is creating new bandwidth (parallel links,
faster links) versus aggregating traffic (hubs and switches), weighted
by costs.

Joe D.
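The one-packet exit-side delay rule for speed-changing devices in the
message above works out as follows. A small Python sketch (1514 bytes,
the standard maximum Ethernet frame, is used as an assumed example):

```python
def exit_packet_delay_us(packet_bytes, exit_mbps):
    """Store-and-forward delay added by a speed-changing device:
    one packet time on the exit side, in microseconds."""
    return packet_bytes * 8 / exit_mbps  # bits divided by bits-per-microsecond

# A full 1514-byte frame leaving on the 100Mbps side adds ~121us, one
# tenth of the ~1211us it took to arrive on the 10Mbps side.
d_fast = exit_packet_delay_us(1514, 100)
d_slow = exit_packet_delay_us(1514, 10)
```

This is why the added cost is small for 10-to-100 traffic (1/10 of the
slow-side transmission time) but a full slow-side packet time in the
other direction.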
------------------------------
Date: Fri, 13 Jun 1997 21:03:18 -0600
From: Joe Doupnik
Subject: Lan performance, cont'd, fragmentary story

Since this seems to be the season to tell tales of hardware success
(and failure), here is a short unorganized report on my 100Mbps
installation thus far.

In a quiet lab (term ended, next is yet to come) a single Pentium 90
client with an NE-2000 board reached a maximum throughput of 800KB/sec
to a 486-33 EISA bus file server. The server uses NE-3200 smart boards.
It reached that 800KBps figure with Perform3 on long file lengths, and
server utilization was about 20%. (Normally we can get only about
350KBps under average usage conditions, meaning with other folks on the
wire and not all transfers being of long length.)

As more stations joined the same wire, aggregate throughput went up to
1000KBps, or one very full 10Mbps Ethernet. Each station shared equally
in that throughput, thus dividing 1000KBps by the number of stations.
Twelve, the max tested yesterday, together yielded a wonderful 80KBps
each, or about floppy speed. The point is we could fill the wire with
traffic if two or more clients worked at the same time, assuming the
file server could keep up as it did here, and each station got only a
proportional share.

Then we lashed together the new Pentium Pro 200 server. It had an Intel
EtherExpress PRO 100 PCI client board (more on why below). The clients
had the same Intel board and were again the Pentium 90's. One station
alone got over 6000KBps throughput. Server utilization was about 15%.
Two or more stations saturated the 100Mbps wire (say 10000KBps) and
again divided traffic evenly. Server utilization was about 20% with
four stations (max wired tonight). We see the same point being
repeated, but scaled up by the 1:10 wire capacity.

One limit is wire capacity, and that applies when many stations share
it. Another limit is how fast a client can move bytes. The third is how
much punishment the server can take.
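The even-sharing behavior reported above is simple division; a sketch
in Python (the function name is illustrative, the figures are the ones
from the report):

```python
def per_station_kbps(aggregate_kbps, stations):
    """Even split of a saturated shared wire among active stations."""
    return aggregate_kbps / stations

# Twelve stations on a full 10Mbps wire: the ideal split is ~83 KBps
# each; the lab above measured about 80 KBps, roughly floppy speed.
share_10mbps = per_station_kbps(1000, 12)

# Four stations on a full 100Mbps wire: 2500 KBps each.
share_100mbps = per_station_kbps(10000, 4)
```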
Both client and server can move more bytes with faster cpus, up to the
limits imposed by the Ethernet board and system bus. PCI bus isn't
saturated by 100Mbps Ethernet, EISA is working up a sweat, and ISA is
overwhelmed. To see the EISA sweat item: the EISA version of the Intel
client board in a server pushed the 486-33 server to 100% utilization
from a single PCI client, and throughput was under 4000KBps. A PCI bus
server with more horses raised the single station throughput to
6000KBps at 15% server utilization. And 6000KBps seems to be the limit
of a Pentium 90 PCI bus client (vs 800KBps with an ISA bus NE-2000 in
the same machine). I was surprised to learn the Intel client board
could push a full 100Mbps Ethernet, but with a PPro 200 providing
excessive free cycles it managed to do it. A Pentium 90 client managed
only 60% of that rate. One significant difference between servers is
the bus, EISA vs PCI, at least a 1:4 difference in capacity (and hence
lower cpu utilization on the faster bus because the job finishes more
quickly).

The new server will receive Intel EtherExpress PRO 100/Server smart
boards. But when I plugged together items yesterday the Ethernet board
was not recognized by the machine, nor could NetWare use it. Oh boy,
the bleeding edge is present. Call to Intel: incident filed,
engineering will call back within a week or so. Msg to ASUS: please fix
your BIOS; no response yet, of course. Repeat msg to the ASUS NEWS
group for emphasis. Dig around the ASUS ftp site (not the www site),
discover a beta BIOS, flash same. Lo, the Ethernet board is recognized
at cold boot time, but now IRQ's are forced into conflict with the SCSI
adapter and the board won't run NW. Ok, it's an unmarked beta. So,
between ASUS and Intel the problem remains to be solved. The Ethernet
board is a PCI to PCI bridge affair with an i960 coprocessor, and hence
complicated. But then so is the Adaptec 3490 SCSI board (no big square
hot cpu though) and it works ok.
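The ISA/EISA/PCI comparison above can be put in rough numbers. A
sketch in Python using nominal peak burst rates (approximate textbook
figures, not measurements from this thread):

```python
# Nominal peak burst transfer rates in MB/s (approximate textbook values):
BUS_MBYTES_PER_S = {"ISA": 8, "EISA": 33, "PCI": 132}  # 32-bit/33MHz PCI
FAST_ETHERNET_MBYTES_PER_S = 100 / 8                   # 12.5 MB/s of raw wire

# Fraction of each bus one full-rate 100Mbps Ethernet stream would consume:
load = {bus: FAST_ETHERNET_MBYTES_PER_S / rate
        for bus, rate in BUS_MBYTES_PER_S.items()}
# ISA cannot keep up at all (load > 1), EISA works up a sweat (~38%),
# and PCI barely notices (~9%).
```

These are bus-limit figures only; real throughput is lower because the
bus also carries disk and other traffic, which is part of why the PCI
headroom matters in a server.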
Could we run in production mode with that client Intel board in the server? No, the server has to pay attention to disk and other boards; perform3 lets the "file" remain tiny enough to always be in cache. Those smart boards are needed to free time to do real disk work and printing and all those other jobs we load onto servers. And someday the clients will be faster than Pentium 90's. Joe D. --------- Date: Mon, 16 Jun 1997 11:57:37 -0600 From: Joe Doupnik Subject: Re: Lan performance, cont'd, fragmentary story Following up my message last week on bringing 100Mbps to the desktop, cheaply and effectively. Here are some numbers to think about. You can repeat the experiments locally, by obtaining perform3 from netlab2.usu.edu directory apps, or equivalently from netlab1.usu.edu directory pub/mirror/apps. ... below are some Perform3 test results. These were gathered using VLMs on Pentium 90 clients. No client side caching. Packet Burst was active. The first set is 10Mbps Ethernet. The server was a 486-33 EISA bus unit with an NE-3200. Server cpu utilization reached 20% at full load. Clients used NE-2000 clones. The second set is 100Mbps Ethernet. The server was a Pentium Pro 200 PCI bus unit with an Intel EtherExpress PRO/100 client-style board. Server cpu utilization reached 20% at full load. Clients used the same Intel board. A hub joined server and clients. Perform3 writes and reads small files, 8KB and smaller in steps in this case, so they remain in server cache rather than being slowed down by accessing the server's hard drive. Small file sizes reveal system overhead in opening/closing files etc. 10Mbps Ethernet has a user-data capacity of say 1MBps (leaving room for packet headers and so on). 100Mbps Ethernet has a user-data capacity of say 10MBps. "file length" below is user-data capacity. KBps is kilobytes per second. Mbps is megabits per second. 10Mbps Ethernet. One station alone, limited by client capability. file length r/w speed total on wire 8192 bytes. 
745.86 KBps. 745.86 Aggregate KBps. 7680 bytes. 727.46 KBps. 727.46 Aggregate KBps. 7168 bytes. 712.85 KBps. 712.85 Aggregate KBps. 6656 bytes. 706.47 KBps. 706.47 Aggregate KBps. 6144 bytes. 683.21 KBps. 683.21 Aggregate KBps. 5632 bytes. 669.92 KBps. 669.92 Aggregate KBps. 5120 bytes. 634.39 KBps. 634.39 Aggregate KBps. 4608 bytes. 585.60 KBps. 585.60 Aggregate KBps. 4096 bytes. 603.51 KBps. 603.51 Aggregate KBps. 3584 bytes. 570.47 KBps. 570.47 Aggregate KBps. 3072 bytes. 527.32 KBps. 527.32 Aggregate KBps. 2560 bytes. 489.77 KBps. 489.77 Aggregate KBps. 2048 bytes. 424.54 KBps. 424.54 Aggregate KBps. 1536 bytes. 337.22 KBps. 337.22 Aggregate KBps. 1024 bytes. 301.34 KBps. 301.34 Aggregate KBps. 512 bytes. 196.07 KBps. 196.07 Aggregate KBps. 745.86 Maximum KBps. 557.25 Average KBps. 10Mbps Ethernet. Two stations, sharing nearly full wire. 8192 bytes. 502.68 KBps. 1005.37 Aggregate KBps. 7680 bytes. 490.77 KBps. 981.54 Aggregate KBps. 7168 bytes. 469.80 KBps. 939.01 Aggregate KBps. 6656 bytes. 498.82 KBps. 997.23 Aggregate KBps. 6144 bytes. 479.70 KBps. 957.88 Aggregate KBps. 5632 bytes. 475.25 KBps. 950.50 Aggregate KBps. 5120 bytes. 475.67 KBps. 951.34 Aggregate KBps. 4608 bytes. 463.97 KBps. 927.94 Aggregate KBps. 4096 bytes. 461.07 KBps. 922.48 Aggregate KBps. 3584 bytes. 442.79 KBps. 885.57 Aggregate KBps. 3072 bytes. 418.54 KBps. 837.68 Aggregate KBps. 2560 bytes. 410.86 KBps. 821.73 Aggregate KBps. 2048 bytes. 403.52 KBps. 807.05 Aggregate KBps. 1536 bytes. 312.84 KBps. 625.54 Aggregate KBps. 1024 bytes. 261.07 KBps. 522.23 Aggregate KBps. 512 bytes. 176.89 KBps. 353.69 Aggregate KBps. 1005.37 Maximum KBps. 842.92 Average KBps. 10Mbps Ethernet. Three stations (the wire fills completely) 8192 bytes. 339.60 KBps. 1018.12 Aggregate KBps. 7680 bytes. 333.47 KBps. 1001.05 Aggregate KBps. 7168 bytes. 314.77 KBps. 943.71 Aggregate KBps. 6656 bytes. 335.36 KBps. 1005.53 Aggregate KBps. 6144 bytes. 325.17 KBps. 975.50 Aggregate KBps. 5632 bytes. 338.67 KBps. 
(continuation of the preceding table: 10Mbps Ethernet, three stations)
     Bytes      KBps     Aggregate KBps
      ....      ......        1016.02
      5120      332.63         999.16
      4608      319.38         958.89
      4096      333.56        1001.01
      3584      319.46         958.66
      3072      295.47         886.41
      2560      298.91         896.47
      2048      272.65         817.78
      1536      255.83         763.40
      1024      238.59         716.19
       512      168.96         506.46
   Maximum KBps: 1018.12    Average KBps: 904.02

10Mbps Ethernet. Four stations. Note division of capacity.
     Bytes      KBps     Aggregate KBps
      8192      255.70        1021.69
      7680      250.42        1002.30
      7168      236.66         947.82
      6656      252.69        1011.41
      6144      245.13         980.03
      5632      255.16        1020.64
      5120      251.05        1003.14
      4608      240.48         961.08
      4096      252.35        1009.06
      3584      243.12         970.74
      3072      228.52         914.85
      2560      244.34         976.93
      2048      205.54         822.98
      1536      167.51         670.02
      1024      215.78         873.16
       512      131.00         523.70
   Maximum KBps: 1021.69    Average KBps: 919.35

100Mbps Ethernet. One station. Limited by client capability.
     Bytes      KBps     Aggregate KBps
      8192     6361.24        6361.24
      7680     6101.50        6101.50
      7168     5853.59        5853.59
      6656     5708.93        5708.93
      6144     5422.06        5422.06
      5632     5221.55        5221.55
      5120     4967.42        4967.42
      4608     4613.44        4613.44
      4096     4290.48        4290.48
      3584     3933.92        3933.92
      3072     3516.79        3516.79
      2560     3151.42        3151.42
      2048     2620.87        2620.87
      1536     2053.92        2053.92
      1024     1474.35        1474.35
       512      804.55         804.55
   Maximum KBps: 6361.24    Average KBps: 4131.00

100Mbps Ethernet. Two stations. Full wire, shared by stations.
     Bytes      KBps     Aggregate KBps
      8192     5459.73       10925.50
      7680     5376.47       10752.94
      7168     5185.99       10371.39
      6656     5418.12       10841.33
      6144     5209.23       10416.95
      5632     4889.09        9777.72
      5120     4792.61        9581.20
      4608     4490.18        8980.37
      4096     4297.65        8594.63
      3584     3904.32        7808.35
      3072     3474.66        6949.33
      2560     3101.51        6203.65
      2048     2584.06        5168.29
      1536     2021.22        4042.07
      1024     1454.45        2908.98
       512      797.91        1599.13
   Maximum KBps: 10925.50   Average KBps: 7807.61

100Mbps Ethernet. Three stations, sharing full wire.
     Bytes      KBps     Aggregate KBps
      8192     3655.03       10967.11
      7680     3629.19       10873.08
      7168     3573.99       10722.57
      6656     3631.17       10895.46
      6144     3580.45       10779.93
      5632     3594.38       10786.62
      5120     3585.99       10758.39
      4608     3524.44       10604.71
      4096     3550.67       10637.18
      3584     3479.74       10437.46
      3072     3112.00        9339.51
      2560     2823.62        8471.89
      2048     2571.45        7711.70
      1536     2012.67        6037.88
      1024     1455.45        4361.67
       512      801.47        2404.07
   Maximum KBps: 10967.11   Average KBps: 9111.83

100Mbps Ethernet. Four stations, sharing full wire.
     Bytes      KBps     Aggregate KBps
      8192     2744.30       10942.28
      7680     2721.27       10867.45
      7168     2690.77       10734.31
      6656     2723.78       10885.42
      6144     2696.98       10737.58
      5632     2712.16       10778.52
      5120     2661.63       10595.06
      4608     2624.50       10440.60
      4096     2671.81       10625.50
      3584     2651.72       10568.12
      3072     2576.93       10248.71
      2560     2519.51       10073.62
      2048     2481.38        9922.82
      1536     1907.43        7626.27
      1024     1454.03        5816.36
       512      798.16        3195.04
   Maximum KBps: 10942.28   Average KBps: 9628.60

        Joe D.
---------
Date: Mon, 16 Jun 1997 13:23:06 -0600
From: Joe Doupnik
Subject: Re: Lan performance, cont'd, fragmentary story

Just a brief addendum to the lan performance material. You can run perform3 against local drives too. Trying it may be informative. As examples, on a Pentium 100 EISA bus desktop machine:

Local drive, Seagate Hawk (2GB, 5400 RPM, SCSI), Adaptec 2742 SCSI EISA bus controller, no caches. That should be a swift configuration, more so than IDE drives or ISA bus stuff. Max throughput is 2MBps.

Ram drive. Max throughput is 6MBps. And this says perform3 itself maxes out at that speed, the same as we saw for 100Mbps Ethernet and one station.

100Mbps Ethernet can produce faster transfers than the local drive, and match the RAM drive. Throughput from the server will then be limited by the disk system on it, which can be made fast indeed, or the capacity of the wire.
        Joe D.
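The sort of quick local-drive probe described above is easy to improvise. A rough Python sketch in the spirit of the perform3/iozone tests (sequential write, then repeated re-reads, of a single file). The function name and sizes are my own illustration, and on a modern OS the filesystem cache will dominate the read numbers, much as the server's cache did for perform3's small test file:

```python
import os
import tempfile
import time

def crude_throughput_test(size_bytes=4 * 1024 * 1024, passes=3):
    """Write one file, fsync it, then re-read it; return (write, read) KBps.

    A very rough analogue of the perform3/iozone style of test: the
    write is forced to the medium, while re-reads mostly hit the cache.
    """
    data = os.urandom(size_bytes)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        t0 = time.perf_counter()
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # force the write to the medium
        write_secs = time.perf_counter() - t0
    try:
        t0 = time.perf_counter()
        for _ in range(passes):
            with open(path, "rb") as f:
                while f.read(64 * 1024):   # sequential 64KB reads
                    pass
        read_secs = time.perf_counter() - t0
    finally:
        os.unlink(path)
    return (size_bytes / 1024.0 / write_secs,
            size_bytes * passes / 1024.0 / read_secs)
```

As with perform3 against a RAM drive, a run like this mostly reveals the ceiling of the test machinery itself rather than the raw device.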
------------------------------
Date: Sat, 7 Jun 1997 20:58:15 +0200
From: "Arthur B."
Subject: Re: Performance Gain from Replacing Hub with Switch

>For 30 users, our LAN has become slow (Compaq Proliant 800, 80
>Meg RAM, 4 Gig duplexed HD's). Question--will replacing the hub
>with a switch (10BaseT, but 100BaseT to the server) likely create
>a noticeable difference in speed by itself?

It will. Users and servers get full-duplex (send and receive at the same time) instead of half-duplex. And traffic goes only where it is needed. It's a nice device but not often used in a 30 PC environment. The big question is... why do you think it's needed, and is it really? There are other options you might want to consider.

------------------------------
Date: Thu, 19 Jun 1997 10:04:39 -0600
From: Joe Doupnik
Subject: Re: Load balancing on local LAN with IPXRTR

>We are experiencing some bottlenecks on the NIC of one of our main
>4.10 servers, and are looking for ways to improve/eliminate this
>condition. The server is on its own segment, connected directly to
>our backbone Ethernet switch. I know that Novell has the
>ability to "load balance" the traffic using the IPXRTR NLM. By
>putting multiple NIC's in the server, binding the same IPX network
>number and address to both cards, and loading the IPXRTR NLM you
>would be able to distribute the traffic over the NIC's.
>
>My question: Is anyone else already doing this? Has it helped?
>Are there any other options?
--------------
Let's take apart the problem. Lots of traffic on the wire is fine, provided the queueing delays don't become too long nor lan adapters get smashed by too high a packet rate. Some numbers are often helpful in defining size adjectives. So we can have a wire that is just plain too full for decent response times. And we can have a server lan adapter that is overwhelmed by the traffic, or both. What does load balancing do for us?
The idea is to split the traffic going out of the server across one or more lan adapters and somehow put it onto one wire, or if money is there then onto multiple ports on an Etherswitch. That is, the "balance" part is sensed from the outgoing queue lengths. If only one wire is available then simply get a better lan adapter and forget the complexity. If multiple wires are available then my inclination is to connect them to separate hubs and hence truly multiply your lan bandwidth: one server lan adapter per wire.

Load balancing is an IPX affair, not an IP one. Splitting the wiring into parallel paths, with no load balancing software, is simpler and works on all kinds of traffic. It does, however, introduce IP subnetting issues. Clearly, load balancing and multiple lan adapters feeding a clogged wire does nothing good at all. A decent lan adapter in a server can deal with a totally full Ethernet wire, without load balancing software.

Again, if only one wire is available then consider making that wire 100Mbps Ethernet to gain capacity. Many hubs have such an "uplink" port available. Match the wire with a decent lan adapter (I happen to like the Intel EtherExpress PRO/100 boards at this time for reasons of outstanding performance; the low price doesn't hurt either). Whatever you purchase, ensure it is a technically satisfactory product.

If the traffic situation is generally awful all around then it's time to redesign the topology and strategy. Segregating traffic is the normal first step, and we use bridges and routers for the task. Each extracts a toll of reduced throughput (but still a net gain from more open time on the wires) from the one-packet transit time and the expense of buying the boxes. Sometimes a better hub concentration plan does the trick without spending much money, and we couple that with more wires to the server to gain bandwidth at minimal expense.
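Putting numbers on "too full" takes only a pair of byte-counter readings from a lan adapter, sampled some interval apart. A minimal Python sketch of that arithmetic (the function name and figures are illustrative, not measurements from these posts):

```python
# Back-of-envelope wire utilization from two byte-counter readings:
# sample a lan adapter's byte count, wait (say) five minutes, sample
# again, and compare the bits moved against the wire's raw capacity.
def wire_utilization(bytes_then, bytes_now, interval_secs=300,
                     wire_bits_per_sec=10_000_000):
    """Fraction of the wire's capacity used over the interval."""
    bits_moved = (bytes_now - bytes_then) * 8
    return bits_moved / (interval_secs * wire_bits_per_sec)

# Example: 112,500,000 bytes moved in 5 minutes on 10Mbps Ethernet
# is 375,000 B/s, i.e. 3,000,000 bits/s, i.e. 30% of the wire.
print(wire_utilization(0, 112_500_000))  # -> 0.3
```

A sustained 30-40% reading over five minutes, as in the open-lab case, implies heavily clipped peaks and packets waiting in queues.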
Clustering clients behind NW servers rather than putting everyone on a backbone is a normal curative; that's a router style solution. If there is only one wire and it is busy from traffic not involving the NW server then an Etherswitch is the item to employ, to let traffic cross over point to point without clogging the wire to the server. An Etherswitch doesn't do any good if nearly all the traffic goes to one place, and it can merely slow down the net with no benefit. Please do read the fine print on Etherswitches (such as backplane speed, MAC table lengths per port, whether the MAC tables automatically refresh if a station moves from wire A to wire B, and so on); they are not created equal.

Carrying the general topology plan to greater lengths suggests merging 10Mbps streams into backbone fat pipes, either 100Mbps Ethernet or FDDI or ATM. Before spending lots of money one needs traffic measurements.
        Joe D.
---------
Date: Thu, 19 Jun 1997 11:19:21 -0700
From: Andrew Bynum
Subject: Re: Load balancing on local LAN with IPXRTR

Yes, it does work, but it depends on what the users are accessing the segment for. How many users do you have logging on to the server, and what are they doing once they are there? The problem with putting a server directly onto a switch is that the buffering mechanism that is built into ethernet (collision domains, with partitioning hubs) is no longer effective at doing its job. If you are saying that the load reported by the server statistics is higher than what you want, it very well may be that your baseline is not accurate for your configuration. I did some stress testing on switches for Intel, utilizing multiple NICs in Novell servers (3.12 - 4.11), and found no problem. Are people not able to login? Are print jobs taking longer to print? These are the indications that show you have a configuration problem.
Whenever you plug a NIC directly into a switch, whether it be a server, client, or printer, you are always going to have higher statistics reported at that node. Where are you getting your statistics from? NIC's were designed with CSMA/CD in mind, but switches deny them much of this function (although on occasion it does come into use). What kind of switch are you using, and does it have any kind of reporting features built in?

------------------------------
Date: Fri, 4 Jul 1997 01:41:34 +0200
From: "Arthur B."
Subject: Re: .....utilization.......

>Hello......when looking at and considering "utilization" which is
>"more" important, packets per second and packet size or bytes per second?
>Given the "real world" throughput of about 5MB/Sec, what is a good way to
>think about and measure what's on the "wire."

Packets/second. Each packet should claim the entire wire for a single moment in time, during which other packets shouldn't be transmitted. The size of the packet is not important for this. Too many packets and clients need to be more patient in transmitting their packets (and wait longer for the answer to arrive), resulting in a loss of performance.

In worst cases a lot of excessive collisions may occur, resulting in too much fragmentation and jamming signals being transmitted on the wire (which tell every NIC to "shut up" for a while). After which almost every NIC connected to the wire has the need to transmit its packets, since they have waited long enough as far as they are concerned... but increasing the chance of another excessive collision. After a while this behaviour will fade out if overall network utilization lowers.

Another thing to watch is the 'network utilization': the overall workload on your wire. The higher peaks are moments of performance loss. Too many of them and users are not happy. The average workload determines the chance of getting the higher peaks.
I like an average utilization below 3% and most of the peaks not above 35% (the problem is getting there, if that's achievable at all). Try to pinpoint the processes/NICs that are responsible for the most peaks and the ones that boost your average utilization by a steady stream of packets. If you can lower their transmitting behaviour enough you should get noticeable results. If all else fails you may wish to separate them (e.g. a segmented hub).

Example: pinpointing a bunch of printers that are searching for jobs with an interval of 1 second (increase their interval to 5 seconds and you just lowered your average utilization) -or- replacing the widely used but not network-friendly app with a calmer one (average and peak go down) -or even- pinpointing a process that checks for the existence of certain files every so often but is probing target directories *and* the entire search path (do a SET PATH= just before starting that process and average utilization just went down again).

---------
Date: Thu, 3 Jul 1997 18:55:10 -0600
From: Joe Doupnik
Subject: Re: .....utilization.......

>>Hello......when looking at and considering "utilization" which is
>>"more" important, packets per second and packet size or bytes per second?
>>Given the "real world" throughput of about 5MB/Sec, what is a good way to
>>think about and measure what's on the "wire."
>
>Packets/second.
>Each packet should claim the entire wire for a single moment in time in
>which time other packets shouldn't be transmitted. Size of the packet is
>not important for this. Too many packets and clients need to be more
>patient in transmitting their packets (and wait longer for the answer
>to arrive). Thus resulting in loss of performance.

Let's be much simpler here. If the person is concerned about utilization of the wire then that is insufficient verbiage to define a problem. For example: client talks to server and moves big files. The wire can run at 80-90% capacity with just that traffic and things are just perfect.
I can and have shown on this list just such data for 10 and 100Mbps Ethernet to INW 4.11. The wire doesn't care. I'll explain more below.

There is a concern about packets per second, however, as a consequence of spending cpu time processing each packet. On slow machines this becomes a dominant concern. With awkward or slow lan adapters, or with slow buses, this becomes *the* dominant factor. It's part of the overhead of doing business. Another part is packet headers using time on the wire, and it too can be horrid if tinygrams are employed.

>In worst cases a lot of excessive collisions may occur, resulting in too
>much fragmentation and following jamming signals being transmitted on the
>wire (which tell every NIC to "shut up" for a while). After which almost
>every NIC connected on the wire has the need to transmit their packets
>since they have waited long enough as far as they are concerned,
>increasing the chance of another excessive collision. After a while this
>behaviour will fade out if overall network utilization lowers.

Not quite. Yes, there will be collisions if multiple parties try to transmit at the same time; that's normal. Collision pieces are tiny, a dozen bytes or less, because the distances are tiny (the speed of light is finite, etc.). The transmitter sensing a collision may continue to send jam info to fill 64 bytes (because the controller can be made that simple). That is very, very little time on the wire. Stations separate themselves via the Ethernet truncated binary exponential backoff algorithm. Up to 1024 stations can contend, successfully, for the wire, and we put nothing like that number in one contention domain.

>Another thing to watch is the 'network utilization'. The overall workload
>on your wire. The higher peaks are moments of performance loss. Too many
>of them and users are not happy. The average workload determines the
>chance of getting the higher peaks.
>I like an average utilization
>below 3% and most of the peaks not above 35% (problem is getting there
>if at all achievable that is).

That's not the way I see things. High wire utilization is performance, period. Not a loss. Wire utilization itself tells us little other than that the wire is doing its job. One Pentium 90 client can fill 60% of a 100Mbps Ethernet all by itself. A Pentium 200 can use it all. That's good, not bad. It means they can get the job done quickly and leave the wire free for other machines, and users like things which happen quickly (or, um, many such things). It also means stations don't let the wire go idle unnecessarily and thus increase the time to complete a job. 100% utilization peaks mean you have very fine stations on the wire, including the server. It can also mean you have a lot of maybe-fine stations hammering on a faithful server, and each gets only a fraction of the available total capacity (wire and server, not just wire). To rub in the salt, when a packet is on the wire the utilization is 100%, by definition. Smaller wire utilization means time goes by with no transmission, and that is not doing anything useful.

Now let us see what the underlying reason is for nets slowing down when the wires become busy. It's called queueing. If packets need to go out but can't right now because the wire is occupied then they form a queue. Simple queueing theory for exponentially distributed packet generation (Poisson process) by many stations and exponentially distributed packet servicing (by their disappearance onto the wire) yields the interesting result that the average time a packet spends in the system (queued plus being sent) is

    T = 1/(service - arrival)    packet times

where "service" is the average service rate in packets per possible packet time (think of packet slots for convenience), and "arrival" the average rate of packets joining the queue per packet time (see, for example, Andrew Tanenbaum's "Computer Networks").
Notice a couple of things. If the "service" and "arrival" rates are equal the average queue delay is infinite. Yikes. The number of packets in the system is T * "arrival" packets (that's Little's result), which also goes infinite. Remember, we are dealing with statistical averages, where idle servers mean wire capacity is lost forever without recovery. If "service" is twice as large as "arrival" then on average each packet waits twice as long compared to the "arrival" = 0 (idle wire) or the "service" = infinity case. Fast service means packets are carried away in the ether very quickly. Infinity is slightly faster than, say, Petabit Ethernet. Some quiet reflection says when a queue becomes busy the line (and delay) grow very quickly indeed. This is the traffic jam effect. There has been no mention of collisions so far, but to accommodate them make retransmission attempts be fresh arrivals in the sending queue.

Half or three-quarter capacity utilization means things will wait to get onto the wire, and the busier the wire the longer the waiting time. We can lower the delay by increasing the service rate, "service." I did just that by going from 10 to 100Mbps Ethernet. Microsoft could lower the delay by making the "arrival" rate (program size) smaller.

The other part of the service rate is what the receiving end needs to do with a packet to keep the conversation going. Sluggish servers will make the system slow regardless of wire speed. The receiver is another queue in series with that of the wire. Sharp readers will instantly recall the "packet receive buffers" value in Monitor climbing when the server becomes blocked by other events, and that's one important part of the receiver queue. If the server is really swift one needs very few "packet receive buffers." Overall, looking at only one component of the network is silly and misleading.
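The queueing arithmetic above is easy to play with numerically. A small Python sketch of the delay formula and Little's result (function names and sample loads are my own illustration):

```python
# Average delay in a simple M/M/1 queue, in packet (slot) times,
# following the formula in the message: T = 1/(service - arrival).
def mm1_delay(service, arrival):
    """Mean time a packet spends in the system (queue + transmission)."""
    if arrival >= service:
        return float("inf")   # queue grows without bound on average
    return 1.0 / (service - arrival)

# Little's result: average number of packets in the system.
def mm1_occupancy(service, arrival):
    return arrival * mm1_delay(service, arrival)

# Delay blows up as the wire approaches saturation:
for load in (0.25, 0.5, 0.9, 0.99):
    print(load, mm1_delay(1.0, load))
# At load 0.5 the delay is twice the idle-wire value;
# at load 0.99 it is 100 times the idle-wire value.
```

The traffic jam effect drops straight out of the formula: moving from 10 to 100Mbps Ethernet raises "service" tenfold, which collapses the queueing delay for the same arrival rate.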
The network is a set of coupled systems: clients (many), wires, delays in bridges/switches/routers, delays/queues in the server, queues for disk activity, and so on. Good networks are balanced to put the major delays at either extremity of the system (client and server) where software can sense what is going on and take constructive steps to behave appropriately (not to mention provide queue buffer space). Bad nets have bottlenecks in the middle, where the only recourse to overload is dropping packets. Ethernet Carrier Sense is just dandy as a local throttle to keep material queued at home until the wire is free. Ask meaningful questions, such as how fast can my clients get their job done, and the answers are revealed through analysis and measurement of all the components (not neglecting bloated programs and poorly written ones doing small transfers per packet).
        Joe D.

P.S. Apologies for the long message. This is part of the foundation of computer networking and ought to be understood by professionals.

>Try to pinpoint the processes/NICs that are responsible for the most peaks
>and the ones that boost your average utilization by a steady stream of
>packets. If you can lower their transmitting behaviour enough you should
>get noticeable results. If all else fails you may wish to separate them
>(e.g. a segmented hub).
>
>Example: pinpointing a bunch of printers that are searching for jobs with
>an interval of 1 second (increase their interval to 5 seconds and you
>just lowered your average utilization) -or- replacing the widely used but
>not network-friendly app with a calmer one (average and peak go down)
>-or even- pinpointing a process that checks for the existence of certain
>files every so often but is probing target directories *and* the entire
>search path (do a SET PATH= just before starting that process and average
>utilization just went down again).
>* Arthur B.
------------------------------
Date: Wed, 6 Aug 1997 21:50:16 +0100
From: Richard Letts
Subject: Re: Netware 4.11 high utilization

>We had a netware 4.11 server with 550 concurrent users, 2 nic 3com
>3c59x at 10 Mbps and NLSP with LOAD BALANCE set to "ON";
>the processor utilization in the server was 10%.
>
>Now, when we install switches 100basex in our LAN and set the nics in
>the 4.11 server to 100mbps, the processor utilization goes up to 80%.

Ten times the network throughput, ten times the cpu load (about).

>AES processes call backs

These are the hooks that get called when a packet arrives. All network cards are not equal: some are bus-mastering, others have on-board processors to off-load the main CPU. The 59x series of cards rely on the main CPU to do a lot of the work; Joe'll be along any moment to recommend you try Intel's PRO/100 Server boards, as these have on-board i960 CPUs.

---------
Date: Wed, 6 Aug 1997 15:07:24 -0600
From: Joe Doupnik
Subject: Re: Netware 4.11 high utilization

Is that my cue? Ok, 80% utilization for two 100Mbps lan adapters is indeed high compared to at least one other adapter maker. Tests here using full 100Mbps Ethernets show the Intel EE Pro 100B (a client board, but in the server) to consume about 20% of the cpu per wire, on a PPro 200 motherboard server. That too is INW 4.11. I have three such boards in the server, and two NE-2000's for printer wiring and for connection to the outside world. The Intel boards are PCI based, full bus master units, and their cost has dropped to attractive levels.

Keep in mind that your load balancing requires lots of decisions to be made, for each outgoing packet. I turn off NLSP completely and have no such overhead. In fact load balancing is questionable in many situations. Instead I split the network into separate collision domains and assign a server board to each. The Intel board is quite able to drive a full 100Mbps wire all by itself, at the stated 20% cpu loading.
The better Intel EE PRO/100 Server board should drop the utilization down to the 5% range; it has bigger buffers and an i960 cpu to keep the lan adapter happy. Alas I have motherboard versus PRO/100 Server board troubles at the moment, so the client style boards are doing the work.
        Joe D.

------------------------------
Date: Thu, 7 Aug 1997 20:51:37 -0500
From: "John H. Lederer"
Subject: Raid Arrays

I thought I might share some of the research/materials I found. A lot of the info comes from Gibson, Patterson, Katz (the original researchers on Raid) and their subsequent associates.
===
First, and most interesting, they regard Raid 1 as a specific instance of Raid 5. Think about it and you will see what they mean.
===
Second, in terms of first order effects, and assuming that all disks cost the same and have the same performance, the following table gives a fairly good explication of Raid performance. The table is one for a ratio of cost to performance, with a Raid 0 disk unit equalling 1. A number of 1/2 means that the performance/cost ratio is 1/2 that of a single Raid 0 drive in terms of I/O per second -- either you have twice as many drive units and the same i/o performance, or the same number of drive units and half the i/o performance.

For Raid 1 and Raid 5, N = number of drive units:

                 Small Read   Small Write   Large Read   Large Write   Storage efficiency
   Raid 1, 5     1            1/N (*)       1            (N-1)/N       (N-1)/N

   (*) but never less than 1/4

This implies that if small writes are an important performance criterion, then use Raid 1; if large writes are important, then use Raid 5 with many disks. The chart does not account for duplexing, which will double read performance. To me this factor is critical, because I can get duplexing out of Novell for "free" (or some cpu load). As a practical matter I don't have the money to duplex Raid 5.
============
Reliability

There are a number of scenarios under which any Raid array can have data loss (e.g. two drives fail).
In a very rough way, the probability of the major ones of these scenarios increases as the number of disks increases. Therefore, for instance, from a reliability point of view, a 3 disk Raid 5 array is more reliable than a 7 disk Raid 5 array.
===========
My thoughts:

As I said, duplexing made up my mind (and my environment has a large number of small writes). However, there is a second factor. If I do duplexed Raid 1 I can use high quality "standard" components. I can, for instance, buy high quality disks at a reasonable price. If I do Raid 5, I normally end up with proprietary components that are very high priced, e.g. the same drive in a plastic tray with a different plug and a fancy label costs me 1.5-2.5x as much. This makes those ratios even worse. Duplexing and standard drives give Raid 1 almost a 3- or 4-to-1 performance/cost advantage.

I know some of you will suggest that caching makes this all wrong. It might, and I don't have good figures. However, as a general matter, I think cache is better as a single large cache (higher probability of hits). Thus I suspect that I do nearly as well or better to increase system memory than to buy cache for the controller (particularly if the cache is proprietary and expensive).

---------
Date: Sat, 9 Aug 1997 01:25:38 GMT
From: "Eric E. Allen"
Subject: Re: RAID

Check out the RAID Advisory Board web page: www.raid-advisory.com. If you are a member you are able to receive the booklets on RAID at no cost. However, they are priced very reasonably if you or your company are not a member.

------------------------------
Date: Mon, 11 Aug 1997 21:19:48 GMT
From: "Eric E. Allen"
Subject: Re: Raid Arrays

>I would love to get some detail on cache performance. The old saw is
>that cache is best kept in a large single location (e.g. the server
>cache) because one's probability of an early hit in cache increases.

It is true that for Novell OS's the best performance boost for caching is to add more memory to the server in the beginning.
But as the Dirty Cache Buffers rise (which usually means there is a bottleneck in the I/O), that is where added cache on the controller helps the most. RAID 5 write performance can be made equivalent to RAID 3 by increasing the cache. The large cache offering from the manufacturers is to help Windows NT Server get better performance, since it does not do a very good job at caching.

>One disadvantage that I would see for controller cache/write
>verification is that it would not meet the requirements of some for "end
>to end" verification. The controller would verify that it wrote to disk
>what it got from the server -- but cannot verify that what it got from
>the server is what the server sent.

Array controllers have the ability to perform read/write/read verification of the data at the array level. Most users sacrifice this function for performance.

>A Compaq salesman made the claim that Compaq drives (third party
>relabeled) were built and tested to a higher specification. When I
>pressed for detail (what is the higher spec and how does it differ from
>the manufacturer's normal spec, does the disk manufacturer run a
>separate line, etc.) I was unable to obtain detail.

This area depends on the relationship the company has with the drive manufacturer. If it is a good relationship, then drives that the company receives from the drive manufacturer can be rejected for having too large a permanent bad block table, or for any other quality benchmark that is set in place.

------------------------------
Date: Sun, 26 Oct 1997 11:01:27 -0600
From: Joe Doupnik
Subject: Re: Netware slower than DOS -- an update

>Some of you may recall a couple of messages I have posted here
>recently, in which I described apparent disk I/O limitations of
>NetWare 3.12 servers.
>
>To summarize, I found that raw disk I/O speed of NetWare 3.12 was
>several times slower than that of DOS running on the _same_ _machine_,
>and that this result was _independent_ of the type of hard drive used.
>I confirmed this on more than one differently configured machine,
>although I was not able to do NetWare 3.12 tests on any name-brand
>server-grade systems.
>
>I have just done some tests which I think are rather interesting. On
>one of my test machines, I installed NetWare 4.11 and ran IOZONE on
>it. During this test, dirty cache buffers never got above about 500,
>and IOZONE's throughput remained steady at about 500,000 bytes per
>second. The dirty cache buffer observed write rate was around 250 per
>second. NCOPY of a 10MB file took about 5 seconds.
>
>Booting this same machine under 3.12 was a rather different story.
>Dirty Cache Buffers easily reached in the vicinity of 3,000, and
>IOZONE's throughput dropped as low as 60,000 at one point. The dirty
>cache buffer observed write rate was around 25 per second. I didn't
>try the 10MB copy this time, but tests in the past, even using the NW4
>NCOPY on a 3.12 system, took over a minute.
>
>So it looks like NetWare 4.11 has nearly an order of magnitude better
>disk I/O than Netware 3.12. Yes, I am incredulous. I'd love to hear
>of tests that prove me wrong, or explanations for this dramatic
>difference that has not to my knowledge been previously reported.
--------
Perhaps I can explain all this and put matters back into perspective. The transfer rates you are seeing, say 0.5MB/sec, are determined by the network, not by the server's hard disk. Some tests and insights.

Run Novell's Perform3 to your server. It first creates a file of a few KB (the size of the test) and then repeatedly reads it (NCP request of Read File Handle N, offset of 0 bytes) for the test duration (12 secs default). That file is small enough to stay in the server's cache memory and hence eliminate the disk from the end to end situation. Thus Perform3 is basically a test of the network component. Decent 10Mbps Ethernet adapters and drivers can reach up to 0.6 to 0.7MB/sec on long transfers using Packet Burst, half that if PBurst is not used.
10Mbps Ethernet can carry about 1+MB/sec. Two or more clients doing the same test together can fully use a conventional 10Mbps Ethernet. Decent 100Mbps Ethernet adapters and drivers can reach 10 times these values: 6MB/sec for a Pentium 90 client, a full 10+MB/sec for two or more (or just a faster client).

Applying iozone to a local disk drive yields numbers in the range of, say, 2.5MB/sec. Iozone first writes a long file and then reads it back. It then (in "auto" mode) repeats this with double the file length. When using a file server the whole file goes to the server's disk, and only part of it is likely to remain in server cache for the read-back. Transfer rates to decent servers over 100Mbps Ethernet are about 3.5MB/sec. That's faster than a local hard disk. Let's say this again: a 100Mbps lan plus a NW 4 server and decent disks can be markedly faster than local disks. If the network is slower than a 3.5MB/sec transfer rate then clearly the iozone test to a server will be appropriately slower too, and that is what you observed with 10Mbps Ethernet. Max throughput of 100Mbps Ethernet is about 10+MB/sec of user data.

Further, NW permits using logical disk units (disk allocation units) larger than 4KB. NW 3 servers are typically constructed with 4KB units to reduce loss of space from tag ends. NW 4 servers have subblock allocation, and thus we operate them with typically 64KB allocation units and let the subblock stuff reuse tag ends. The difference in allocation unit size means the server's disk system works a great deal less hard with larger rather than smaller allocation units. Think of this as block i/o with larger blocks to reduce the per-call busy-work overhead, which of course is precisely the idea of the matter. Depending on the lan adapters involved, the server can fall behind by spending lots of cpu cycles on the lan adapter rather than the disk drive, or by spending too many cycles beating on, say, IDE drives and not keeping up with the network.
Smart stuff really counts, so use SCSI. Server bus kind counts for a great deal too: ISA is slow, EISA is much faster, PCI is faster yet.

Finally, to repeat a word of caution on Packet Burst. It can and does become unstable under heavy stress. That first reduces throughput and then can overwhelm a receiver as PB tries to recover by sending as fast as physically possible. Instability depends on the lan adapters involved, on the speed of the client cpu, and probably the phase of the moon, but happen it certainly does. Use a good wire monitor, say Novell's LZFW, to observe it. Unstable PBurst can yield throughput numbers which get worse rather than better as transfers become longer.

If you were to compare a NW 3 with a NW 4 file server with the same disk and lan setup then one could make remarks about the efficiency of the o/s. But I'll wager that is not the way you did things. Instead, my guess is you used 4KB disk allocation units on NW 3 and 64KB with subblock allocation on NW 4. Try again with identical allocation unit sizes. Try NW 4 further with and without suballocation, and see that the feature likely costs time yet saves disk space.
        Joe D.

---------
Date: Mon, 27 Oct 1997 11:04:34 -0600
From: Joe Doupnik
Subject: Re: Netware slower than DOS -- an update

>>The transfer rates you are seeing, say 0.5MB/sec, are determined
>>by the network, not by the server's hard disk.
>
>For the peak rates, that's clearly true, Joe. But it doesn't explain
>why transfer rates would dip so much lower on 3.12 as dirty buffers
>climb (packet burst?).

Packet Burst, yes, also a constipated server. Novell's Lanalyzer is especially good about the latter by displaying a "server overload" alarm. That alarm really means an NCP request was heard two or more times while the first instance had not been satisfied, and the server reports back "yes, yes, I heard that, please be patient," which then triggers the LZFW alarm. Clearly, a sluggish disk farm can lead to these blockages too.
>And it says nothing about the dramatic _differences_ in the dirty
>cache buffer counters and write rates between 3.12 and 4.11. These
>are critically important to running a busy server with a large,
>randomly-accessed data base.

Actually I wonder about that. The dirty buffers simply reflect the
number of items waiting in the disk-write queue. That means they are
available, in principle anyway, for being part of the disk cache and
thus readable from that cache before hitting the disk drives.

>A server that's busy writing because of slow disk I/O will have much
>poorer read performance on uncached data, which will be frequent with
>large, active data bases.

That's complicated and I'm not ready to speculate on it.

>For these tests comparing disk read/write rates on 3.12 and 4.11, I
>pretty much ignored the issue of absolute transfer rate as being
>irrelevant (they _were_ close to identical with small, easily-cached
>files). Instead, I concentrated on the relative rates between them.
>I guess I assumed everyone else would do the same. Sorry if I was
>unclear.

Still isn't clear. There are yet two more items not mentioned but
hopefully were controlled. One is the disk read-after-write checking.
NW 3 has it on by default, NW 4 has it off by default. The difference
is about a factor of two in write time, and hence is reflected in
vastly different queue lengths (dirty cache buffers). And the two
O/S's can differ in their default number of simultaneous disk writes,
so align those guys too.

Through all this I am not stating one O/S is necessarily faster than
the other, but one suspects NW 4 does have faster paths based on
learning experiences from NW 3. However, I am saying 10Mbps Ethernet
itself is the dominant limiting factor for client-server exchanges,
because those exchanges go dramatically faster with 100Mbps Ethernet,
and the disk farm is the limiting factor with 100Mbps links. Add
PBurst instability and the numbers can go all over the map (use LZFW
to observe this item).
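The link between per-write time and the dirty-buffer count can be
illustrated with a toy queueing model (Little's law: queue length =
arrival rate x time in system; the rates below are invented for
illustration, not measured NetWare figures):

```python
def dirty_buffers(writes_per_sec, secs_per_write):
    """Little's law, L = lambda * W: mean number of blocks sitting
    in the dirty-cache (write) queue at steady state."""
    return writes_per_sec * secs_per_write

# Invented figures: 200 dirty blocks/sec arriving, 10 ms per write
# with read-after-write checking off, ~2x that with it on (the NW 3
# default). Doubling the write time doubles the queue.
off = dirty_buffers(200, 0.010)
on = dirty_buffers(200, 0.020)
print(off, on)
```

In practice the time a block spends queued grows much faster than
linearly as the disk approaches saturation, which is why the observed
dirty-buffer gap between the two defaults can be far more than 2x.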
I'm delighted to see continued probes of system performance. We learn
about all kinds of non-obvious impediments this way. My grad computer
networks class conducted very similar experiments last week as one of
their lab assignments: 10 and 100Mbps Ethernets to the same server,
Perform3 and iozone tests, etc. They observed the same numbers I
reported from my own tests. An INW 4.11 server in this case rather
than dual NW 3 and 4 servers, and VLMs rather than Client32 (yet
another item to watch when folks do testing).
        Joe D.
---------
Date: Tue, 28 Oct 1997 04:08:17 GMT
From: Ken Wallewein
Subject: Re: Netware slower than DOS -- an update

I just had to respond briefly (sorta) to a couple of points. Anything
more would be better served with fresh data.

>>And it says nothing about the dramatic _differences_ in the dirty
>>cache buffer counters and write rates between 3.12 and 4.11. These
>>are critically important to running a busy server with a large,
>>randomly-accessed data base.
>
> Actually I wonder about that. The dirty buffers simply reflect
>the number of items waiting in the disk-write queue. That means they
>are available, in principle anyway, for being part of the disk cache
>and thus readable from that cache before hitting the disk drives.

Agreed, for data in the dirty cache buffers. But what about reading
data not currently in RAM? Semi-casual observation (not careful tests,
granted), as well as NetWare documentation on side effects of tuning
the dirty cache buffer concurrent write rate, tend to support my
analysis (a server with many dirties gets dog slow reading the disk).
And it only makes sense that slow disk I/O will exacerbate both
conditions.

> There are yet two more items not mentioned but hopefully were
>controlled. One is the disk read-after-write checking. NW 3 has it on
>by default, NW 4 has it off by default.
>The difference is about a factor of two in write time, and hence is
>reflected in vastly different queue lengths

Actually, I tried that on a mirrored server, Joe. I was disappointed
by the results. Does mirroring affect it much?

>(dirty cache buffers). And the two O/S's can differ in their default
>number of simultaneous disk writes, so align those guys too.

I'm not sure I would want to do that (see below).

>Through all this I am not stating one O/S is necessarily faster
>than the other, but one suspects NW 4 does have faster paths based on
>learning experiences from NW 3. However, I am saying 10Mbps Ethernet
>itself is the dominant limiting factor for client-server exchanges,
>because those exchanges

Several thousand dirty cache buffers generated by a single 10baseT
client running IOZONE doesn't sound like a network bottleneck to me.

>go dramatically faster with 100Mbps Ethernet and the disk farm is the
>limiting factor with 100Mbps links. Add PBurst instability and the
>numbers can go all over the map (use LZFW to observe this item).

Joe, you make a number of excellent points. But you've got me
thinking. For these tests, I did fresh installs of NetWare 3.12 and
4.11 on the same machine. By and large, I used the default settings.
Before I test a changed configuration, I want to think carefully about
whether I even care what the results would be.

My objective in all of this, after all, is not to determine the
relative speed of the respective OS's core I/O performance. It is to
optimize the performance of mission-critical production business
systems. OS internals are interesting only over beer or to the extent
that they can be applied.
---------
Date: Tue, 28 Oct 1997 09:33:23 -0600
From: Joe Doupnik
Subject: Re: Netware slower than DOS -- an update

>I just had to respond briefly (sorta) to a couple of points. Anything
>more would be better served with fresh data.
>
>>>And it says nothing about the dramatic _differences_ in the dirty
>>>cache buffer counters and write rates between 3.12 and 4.11. These
>>>are critically important to running a busy server with a large,
>>>randomly-accessed data base.
>>
>> Actually I wonder about that. The dirty buffers simply reflect
>>the number of items waiting in the disk-write queue. That means they
>>are available, in principle anyway, for being part of the disk cache
>>and thus readable from that cache before hitting the disk drives.
>
>Agreed, for data in the dirty cache buffers. But what about reading
>data not currently in RAM? Semi-casual observation (not careful
>tests, granted), as well as NetWare documentation on side effects of
>tuning the dirty cache buffer concurrent write rate, tend to support
>my analysis (a server with many dirties gets dog slow reading the
>disk). And it only makes sense that slow disk I/O will exacerbate
>both conditions.

Clearly, if a disk is way behind with a huge write queue and then
someone asks to read from the disk itself then there will be
competition for sequencing of events. That's a system strategy item.
Rather than get wound up in things I can't change I tend to stand back
and ask simpler questions: is all this because I have a weak system,
and could it be improved by changing components? For example, disk
controllers, the age of their drivers, the tuning of the drivers, the
strength of the disk drive itself, all count a lot. Some disk drives
don't work well with lots of outstanding requests (tagged queueing in
Adaptec-speak), so the driver needs tuning to reduce the load; Adaptec
says this in their docs. In this case I suggest, if you can, using the
latest drivers rather than those shipped on the very old original NW
3.12 media, and change the disk drive to a current production unit.
This "should," add quotes here, improve matters. As yet another
example on drives.
Our likable Seagate Barracuda SCSI units (7200 RPM) differ by a factor
of about four in sustained throughput from early to current models.
And that's from higher bit densities on the disk surface; the
mechanicals are about the same. These are drives with decent SCSI
implementations.

>> There are yet two more items not mentioned but hopefully were
>>controlled. One is the disk read-after-write checking. NW 3 has it on
>>by default, NW 4 has it off by default. The difference is about a
>>factor of two in write time, and hence is reflected in vastly
>>different queue lengths
>
>Actually, I tried that on a mirrored server, Joe. I was disappointed
>by the results. Does mirroring affect it much?

Mirroring costs a little time, yes. Same controller, twice the data to
move, with disk rotation waiting. If SCSI disconnect is defeated then
the SCSI bus is tied up waiting. Flogging a weak disk system isn't
productive.
        Joe D.
------------------------------
Date: Tue, 28 Oct 1997 12:04:48 -0600
From: Joe Doupnik
Subject: Re: mirroring/duplexing performance

>>>Actually, I tried that on a mirrored server, Joe. I was disappointed
>>>by the results. Does mirroring affect it much?
>>
>> Mirroring costs a little time, yes. Same controller, twice the
>>data to move, with disk rotation waiting. If SCSI disconnect is
>>defeated then the SCSI bus is tied up waiting.
>
>Does duplexing avoid this performance issue of writing the same
>amount of data twice on a write?
>
>I know reads are faster from duplexing...
-------
Duplexing still requires the disk driver to send the data twice, once
to each controller. But the controllers themselves will normally deal
with moving data across the system bus into their own silicon. That
means the driver issues the commands twice and then gets out of the
way of the bus master controllers.
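That difference can be put in a toy timing model (the 12 ms write time
and the command-issue cost below are invented for illustration):
mirroring pushes both copies through one controller in sequence, while
duplexing overlaps the two writes on separate bus-master controllers.

```python
def mirrored_write_time(t_write):
    """One controller, same data moved twice, back to back."""
    return 2 * t_write

def duplexed_write_time(t_write, t_issue=0.0005):
    """Driver issues the command to each controller, then the two
    bus-master controllers overlap their i/o in parallel."""
    return 2 * t_issue + t_write

t = 0.012   # nominal 12 ms per physical write (made-up figure)
print(mirrored_write_time(t), duplexed_write_time(t))
```

Reads gain too, since either spindle can satisfy them, as the question
above notes.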
If one wants to be cheap and clever and fast at the same time without
going to full-blown hardware RAID then adding spindles to a NW volume
creates a striping situation where bits and pieces of files are
scattered across each spindle. That can increase disk farm throughput,
at very obvious risk factors. A safer move is simply to use the
fastest disks with excellent SCSI controllers and a PCI bus, with
large disk allocation units (and optionally subblock allocation to
avoid space wastage).

I use mirroring only as a crutch to solve some immediate hardware
problems (drive goes off line every now and then). Remirroring is a
very heavy load and a long process when the server goes down
abnormally (ABEND). I have not had the money nor free slots to play
the duplexing game.
        Joe D.
------------------------------
Date: Wed, 5 Nov 1997 17:11:57 -0600
From: Joe Doupnik
Subject: Re: ..packets per second.....

>I know there are hundreds of variables, but what would be a reasonable
>figure for packets per second and bytes per second in and out on a 200
>WKS, 10 server, IP and IPX network?...is 200 to 500 PK/s good, bad,
>OK...should it be 80 to 120 PK/S ??...according to my ManageWise
>system my network seems to run in the 100-200 PK/S range...is this
>then the answer for my system?...is there no "normal" or "usual."
>What about across a router ?...packets in and packets out
>?...20-80, or 200-300 ?
---------
First, I always like to know if there is a problem. So we break apart
the pkts/sec question into two parts. The first part is how many
packets we can squeeze onto the wire, realizing that the wire does not
care if it is very busy all day. The second part is what consequences
there are for hammering a machine, say a server or even a client, with
too many packets in a short time interval.

The busier the wire the more your network is earning its keep. Do some
arithmetic to see how many packets the wire could carry at the max.
What are min/max Ethernet frame sizes?
What separation must occur between each? Please do not pay any
attention to that man labeled "collision" standing behind the magic
curtain.

What happens when a frame arrives at its destination station? An
interrupt from the lan adapter to the cpu+driver is what. And that
means time to deal with the arrived frame. How much of that do we
think a machine can take? Good question, and the answer is not a
terrific amount and still move the mouse cursor. It all depends, as we
are wont to say, on the robustness and speed of the receiver and what
else is competing for resources.

Rather than reteach a course on networking I'll make two additional
comments. One is that for regular 10Mbps Ethernet 1000 pkts/sec is a
lot; multiply by 10 for Fast Ethernet. The other is to experiment to
discover what sustained loading does between a client and a server,
and what it does to other stations trying to reach the same server.
Perform3 and iozone make nifty test tools, as we have discussed on the
list many times. Visit netlab2.usu.edu, cd apps, for both. I warmly
recommend testing, and thinking about the results.
        Joe D.
---------
Date: Wed, 5 Nov 1997 17:50:42 -0600
From: Joe Doupnik
Subject: Re: ..packets per second.....
---------
To see graphs of throughput on 10/100Mbps Ethernet you might wish to
look at the visuals-only portion of a presentation I put together this
summer. It's an Office 97 PowerPoint thingy for Win95, archived as
file fasteth.zip in directory misc on netlab2.usu.edu. Some careful
looking reveals stations competing on the wire with Perform3, and what
fraction of it each gets. Pentium-90 clients to a PPro 200 INW 4.11
server.

On the matter of bytes/sec, first figure out the carrying capacity of
Ethernet. Then look at the many sites now running MRTG monitoring
software to graph bytes/sec or bits/sec on a five minute average
basis. Clicking on netlab1.usu.edu will show one example.
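Joe D.'s frame-size and separation questions a few paragraphs up work
out as follows (a sketch using the classical Ethernet figures: 64-byte
minimum and 1518-byte maximum frames, 8 bytes of preamble, and a
12-byte-time inter-frame gap):

```python
def max_pps(bits_per_sec, frame_bytes):
    """Upper bound on frames/sec: each frame occupies its own bytes
    plus 8 of preamble plus a 12-byte-time inter-frame gap."""
    wire_bytes = frame_bytes + 8 + 12
    return bits_per_sec / (wire_bytes * 8)

for mbps in (10, 100):
    print(mbps, round(max_pps(mbps * 1e6, 64)), round(max_pps(mbps * 1e6, 1518)))
# 10Mbps tops out near 14,880 pps at minimum frames but only ~813 pps
# at maximum frames, so a sustained 1000 pkts/sec of full-size frames
# is indeed "a lot"; multiply by 10 for Fast Ethernet.
```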
Server edu-usu-engrlab is a student lab, INW 4.11, upon which the
above presentation experiments were run (when quiet).
        Joe D.
------------------------------
Date: Thu, 6 Nov 1997 21:18:00 -0600
From: Joe Doupnik
Subject: Re: Anyone using Adaptec AHA2940, NW4.1 more than one HDD?

>I have a problem with the "speed" of hard disk access on our system.
>
>We used to have 1 Seagate ST32430N on the server. About 6 months ago
>I added 3 ST12400N's to the same SCSI bus, mirroring a pair of each.
>The result has been a general slowing of disk I/O by (maybe) a factor
>of 3-4. (Each physical drive is 2GB.)

If I count correctly that's seven (7) drives on a single SCSI bus. If
you made your own ribbon connector the attachments must be more than
just a few inches apart to prevent reflections/impedance problems from
the lumped load of a drive. If the drives are external then the cable
length must be short, not ten feet or more, and the cables must be of
high quality (translation: fat and expensive). And the final
terminator ought to be of the active variety.

Further, ensure two more things. 1. Caching on the drives is turned
off (please) with a Seagate-supplied DOS program. 2. The controller is
set to not map sectors for large drives (NW does not use the PC bios
after server.exe starts). I need not add that the drives must be kept
cool.

>I made a "rash" attempt to upgrade the 7800 Adaptec drivers and the
>associated NLM's (NBI NWPA NWPALOAD) but the result was a disaster.
>Had a few abends and it took ages to remirror the "mess". Per a
>Novell technote I also decreased the bus speed and disabled tags, but
>the drives then deactivated maybe three times a week. I am now back
>to what my setup was previously!

Then there may be much more wrong in your system than just drives.
Take a very careful look at PCI latency; less means get off the bus
faster when asked, and generally less is better. Your lan driver can
hog the PCI bus too.
>I now want to make a more ordered upgrade even starting as low as the
>SCSI BIOS (currently 1.11)

You may wish to consider the dual SCSI channel version, Adaptec 3940,
to gain more SCSI bandwidth. I'm assuming those are SCSI narrow drives
and so is the controller. Mixing wide and narrow is not a good idea.

>I'd like to contact anyone that has a configuration similar to this
>or if you have experienced the kind of problems I am talking about.
>
>Server has 48MB RAM, SYS volume has 16K blocks and compression and
>suball is turned on (woops). VOL1 has 64K blocks and likewise has
>comp and suball on. I have actually disabled overnight compression
>though. Cache hits are always 96% plus.

Well, may I recommend turning off compression and never turning it on
again. It is trouble waiting to happen. Sub-block allocation is a good
thing; use it. Go for the full 64KB disk allocation units to improve
disk channel efficiency.
        Joe D.
------------------------------
Date: Fri, 7 Nov 1997 13:52:00 +0200
From: "David W. Hanson"
Subject: Re: LZFW shows errors 300/sec with no other packets moving.

>We have a 10baseT ethernet network which has an averaged packet rate
>of 500 to 1500 pkts/sec giving an average utilization of 20%, spread
>over 5 hubs with fibre links. We have a single NW 3.11 server, not
>running packet burst, and a number of unix SGI and SUNs. For the past
>week, at random intervals and times all packets cease to flow, LZFW
>shows errors at 300 pkts/s, the vast majority being fragments, with
>all other packets ceasing for the same period of time; in fact the
>graphs are an inverse of each other. Both rise and fall are steep
>with a flat plateau of between 2 and 10 minutes. We have tried
>disconnecting fibre to our main hub, but because of the randomness of
>the occurrences we have been unable to identify which particular
>branch of our network contains the faulty hardware, as we assume it
>to be hardware related.
>Another feature has been a steady flow of oversized packet errors,
>but we are not certain if this is related other than that they also
>feature during the seizure of other packets.

First, capture the oversized packets and use the MAC address to locate
their source. A "steady flow" of them should not be tolerated.

Try pointing Lanalyzer at all segments and constantly capturing all
packets, overwriting the buffer when it fills. Then, as soon as you
have a failure, stop the captures and pick through the fragments to
see if you can identify a MAC address. Most likely it is an
intermittent NIC failure. BTW, what do you do to get things going
again, or does it fix itself?
---------
Date: Fri, 7 Nov 1997 12:01:00 -0700
From: Hansang Bae
Subject: Re: LZFW shows errors 300/sec with no other packets moving.

You don't by any chance have any duplex NICs, do you? I've seen this
happen when a card and/or hub port decides to go from full to simplex
operation. Since there is NO media contention in a full duplex
operation, you can imagine the problems that can show up. Capture the
traffic and see who the source MAC address is. You may have to go to
each segment to find out. 300 pkts/s of errors, most being
fragments... hmm, this smacks of faulty collision detection going
on.... Are there any usable captures?
------------------------------
Date: Thu, 13 Nov 1997 14:30:15 -0500
From: "Brien K. Meehan"
Subject: Re: Compression, problem?

>I've stumbled across some info/opinion that indicates that Novell's
>file compression available in Intranetware shouldn't be used. That's
>fine, but I was curious as to why.

I always advise against using Netware's compression. I believe it's
poorly designed, because it depends on the file being decompressed on
the disk for it to be presented to the client (which negates any
usefulness it might have, in my mind), and because it's a
CPU-intensive task that they tried to cram into a low priority thread.
I've had lots of problems with it.
They were caused by demands for compression during periods of high
utilization. Compression is a CPU hog that tries to run as a low
priority thread. During high utilization periods, other processes wind
up waiting for compression. Other functions try to start, and wind up
starting a new service process because the old ones are busy waiting.
It tends to cascade. On a good day, the server would run out of
service processes and return an error to the requesting thread. The
server would usually ABEND, though.

Also, I've run into this problem often: a server administrator notices
that they are running out of disk space, so they start looking around
for files to delete. They delete a bunch, and wind up with less disk
space than they had! They've just uncompressed their old files. So,
they start working with current files that can't be decompressed due
to lack of disk space ... chaos ensues.
---------
Date: Thu, 13 Nov 1997 17:18:00 -0500
From: David Weaver
Subject: Re: Compression, problem?

>>I've stumbled across some info/opinion that indicates that Novell's
>>file compression available in Intranetware shouldn't be used. That's
>>fine, but I was curious as to why.
>>Can anyone shed some light on this?
>
>Once managed a 4.02 server w/ compression turned on. At any given
>moment, had 600 users connected to it (1000 user license). Did not
>have many problems w/ compression. HOWEVER, disks are cheap.
>Compression just adds yet another layer of complexity. On top of
>that, you have to keep an eye out during backups and restores, have
>to ensure that files can be decompressed, etc etc. One other thing,
>it takes CPU cycles to compress and decompress files.... cycles that
>can be used for other things.

Today's processors, namely the P-II, are so fast that the performance
bottleneck is in the nic or disk sub-system. With a 386, admins would
have to be cautious of how much of a load certain processes put on the
CPU, but still the impact of compression is minimal.
---------
Date: Thu, 13 Nov 1997 18:56:10 -0500
From: "Brien K. Meehan"
Subject: Re: Compression

>Today's processors, namely the P-II, are so fast that the performance
>bottleneck is in the nic or disk sub-system. With a 386, admins would
>have to be cautious of how much of a load certain processes put on
>the CPU, but still the impact of compression is minimal.

Well, that sounds lovely. (Does anyone remember the good old days,
when we'd use XOR AX,AX instead of MOV AX,0 because it was faster?)

I've compared utilization on a server with a compressed volume to an
uncompressed volume, using a Pentium Pro 200. I think that's
sufficiently "up to date." In both circumstances, the volume total was
8GB. The volume supporting compression was 90% full, 50% of which was
compressed files, which would have taken 6GB without compression. The
volume not supporting compression was also 90% full. I tried attaching
with 2 workstations.

I've had "bad" results by trying to perform compression, full virus
scan, and a tape backup at the same time, using default settings. So
that's what I used to test them.

With the volume supporting compression, over a sample period of 30
minutes, the average CPU utilization was 95%, and would often "peg" at
100. The number of service processes increased throughout the period
(but didn't max out, unlike my production servers). The workstations
were unable to get a response from the server about 25% of the time.

With the volume not supporting compression, over the same sample
period, the average CPU utilization was 6%, peaking at about 20%. The
number of service processes did not increase. The workstations were
always able to get a response from the server.

I performed this non-scientific benchmark to show a department why I
was buying them so much new disk space for an upgrade. So, even in
light of the Intel hype, it's still my "opinion" that Netware
Compression Kills Servers.
---------
Date: Thu, 13 Nov 1997 17:23:37 -0800
From: Alan Rowe
Subject: Re: Compression

In that situation I agree compression kills. Here we live off it. We
do a lot of graphic production and use compression for seldom-used
files etc. With settings like "Wait 30 days before compression" I am
pretty sure that any normally used files are not compressed and the
server never takes a big hit decompressing files. Like anything else,
used with care it can be a helpful tool, but taken to an extreme it
can kill file servers.
---------
Date: Fri, 14 Nov 1997 09:35:31 -0500
From: James E Borchart
Subject: A differing opinion on compression

My company would not be surviving too well if it weren't for Novell's
compression, and we make drives! Using Novell's defaults to only
compress at night, we have never had a significant problem with
compression. It works very well for large volumes with zillions of
files. Our users, like most users, will not delete any of their own
old files. I have never seen an abend, and have never seen other
processes wait due to compression.

We have one server with a 27GB RAID array and 8GB mirrored (that's
available partition size after mirroring or RAID). This server has
1400 people on it during the day and 500 at night in a 24 hour
factory. It only stores applications, and every imaginable version of
every imaginable application is stored on it; 80% of the files rarely
get used and are compressed. It's never had a compression problem.

3.12 servers, on the other hand, don't ever have enough disk space.
They constantly get filled up with old files and we don't know what to
delete. After moving to 4.x we don't have disk fill-up problems
anymore.
------------------------------