-----------------------------------------------------------------------
NOV-PER3.DOC -- 19980303 -- Email thread on NetWare Performance Aspects
-----------------------------------------------------------------------
Feel free to add or edit this document and then email it back to
faq@jelyon.com

Date: Mon, 1 Dec 1997 14:00:39 -0600
From: Joe Doupnik
Subject: Re: Client32 vs Netx vs VLMs

>I still have the question of performance.
>
>We are running netware 3.12 and workstations vary from 286 to
>Pentium133. The 286 are using DOS, but all others are either Win3.1
>or Win95. On the Win95 I am using Client32 quite happily. I am
>questioning about the Win3.1 units. I did a test on a 386dx25 with
>8Mb RAM and 120Mb HD. I compared the speed of NETx, VLMs and
>Client32 using PERFORM3. NETx : approx 120Kbs, VLMs: approx
>300+Kbs, and Client32 : 230 Kbs. I was expecting Client32 to be the
>fastest as it's the newest. The test was run under Win3.1 on the
>same PC with no other configuration changes at the slowest time on my
>network.
---------
It is easy to diagnose. NETX does not support Novell's Packet Burst mode,
so it uses Stop and Wait exchanges (send pkt, wait for response, send next
pkt, etc). VLMs and later do support PBurst. VLMs are real mode affairs and
the code nicely fits the Intel architecture so they can be very fast. C32
is a protected mode business, 32 bits and all that sales stuff, and
consequently it must struggle very hard to compete with the speed of VLMs.

Ask folks about the history of NT. In the beginning it was built by ex-DEC
people to be extremely robust. To accomplish that goal only the kernel ran
in unrestricted Intel Ring 0 and all other components ran in restricted
Ring 3. The crossing between kernel and "user" modes, Rings 0 and 3, is
costly in time. NT ran like a dog. When the reason became apparent more and
more material was moved into Ring 0 where context switches and buffer
copies etc could be removed. C32 has similar hoops to jump through. NT ran
faster, and lasted much shorter times between crashes. It seems that
trusted agents, helpers in kernel space, were not so trustworthy after all.
The push is on to move more into kernel space for performance reasons, and
that's not a blessing. So the cleanest simplest tool for the job can often
be the fastest.

As to your numbers, they aren't all that great. Recall that regular 10Mbps
Ethernet can carry 1MB/sec of user data (the remainder is in packet headers
etc). Better lan adapters can use all that capacity and ask for more.
Increasing the bit rate to 100Mbps gives them more, and traffic can easily
saturate it at 10+MB/sec (does so here). Tuning PBurst can help in
difficult cases, but the need for tuning is a sign that the real difficulty
is in the lan adapter and its driver. Finally, there are two ends to the
comms pathway so one must consider both in the throughput problem.

MS Windows (any version) is an actively hostile environment for
communications. That's not a grudge but rather an observation of
architecture. To sense lan adapter and driver component performance never
dream of testing under Windows, unless Windows is to be part of the overall
equation.
        Joe D.

------------------------------

Date: Mon, 1 Dec 1997 17:49:47 -0600
From: Joe Doupnik
Subject: Re: Perform3 - How to use?

>I just used perform3 for the first time and don't really understand it.
>I ran with the parameters:
>
>   PERFORM3 RESULTS 12 1 4096 128
>
>and got:
>
> 4096 bytes.  6057.81 KBps.  6057.81 Aggregate KBps.
> 3968 bytes.  6061.56 KBps.  6061.56 Aggregate KBps.
> 3840 bytes.  5787.91 KBps.  5787.91 Aggregate KBps.
> 3712 bytes.  5669.17 KBps.  5669.17 Aggregate KBps.
>[snip]
>  768 bytes.   272.89 KBps.   272.89 Aggregate KBps.
>  640 bytes.   523.12 KBps.   523.12 Aggregate KBps.
>  512 bytes.   739.85 KBps.   739.85 Aggregate KBps.
>  384 bytes.   557.36 KBps.   557.36 Aggregate KBps.
>  256 bytes.   366.25 KBps.   366.25 Aggregate KBps.
>  128 bytes.   184.00 KBps.   183.99 Aggregate KBps.
> 6061.56 Maximum KBps.  1943.75 Average KBps.
>
>I also ran Lanalyzer next to it, on the same workstation, and was
>surprised to see that I wasn't emitting any traffic on the LAN,
>other than normal Pegasus mail checking, etc., as was expected.
>So, what is perform3 doing? I thought it was going to send
>progressively larger files to someone on the LAN, like the broadcast,
>my default server, etc. Lanalyzer seems to say my workstation wasn't
>speaking to the LAN at all.
---------
Perform3 is a simple thing. It reads/writes a sync file and a short test
file on THE CURRENT DRIVE LETTER. Other stations running Perform3 at the
same time look for the sync file and wait for it to be changed by the
master station (the one creating the sync file, first to be started). All
then read the short test file, often only a few KB. Because that fits well
within NW server cache the test measures memory to lan to client memory
speeds.
        Joe D.

------------------------------

Date: Thu, 18 Dec 1997 20:12:58 +0200
From: Henrik Olsen
Subject: True Commit/Write Cache in client 32, some numbers (long)

After reading lots of guesses and more or less accurate impressions on the
effect of using true commit and write cache under win95, I decided to do
some testing to get my numbers right.

First of all, the tools:
Server:  NW 3.12, patched to the second latest set, EISA bus, 486DX 33MHz,
         16 MB ram, Adaptec 1742 scsi controller, NE-3200 net card.
Clients: PCI bus Pentium 90MHz, 32 MB ram, NE-2000 compatible noname/3com
         netcards, 3 running win95 with Client 32, 1 running dos with netx,
         and one 486DX2-66 with 8 MB ram running dos with netx.
Net:     Thin coax ethernet.

The tests:
First I ran a series of tests using perform3 running on all clients at
once. This gives a good indication of the peak performance of the net as a
whole, and the server's ability to serve packets. The result of this was a
sustained throughput of around 1000 KBps for the net as a whole, with an
interesting pattern in the relative throughputs of the different systems.
The two dos clients, and one win95 client which had packet burst disabled,
had a throughput of between 1/3 and 1/2.5 (around 120KBps) of the two win95
clients with packet burst enabled (around 320), making me conclude that
with a strong netcard on the server, packet burst will give a performance
increase when multiple clients are competing for the bandwidth.

Running perform3 with only one client, with packet burst on and off, gave
averages of 722 KBps and 423 KBps respectively, which leads me to the
conclusion that packet burst (with a strong netcard in the server) will
increase the performance of the net as seen from the user, with the
greatest increase when the clients are in a mixed pb and non pb environment
and have to fight for bandwidth.

During this set of tests, the server sat quietly, twiddling its thumbs at
20% cpu utilisation. :)

The second set of tests I ran was meant to test the performance loss of
turning true commit and write caching on or off.
These tests were run using the iozone program, which (in this test) wrote a
1 MB file as 2048 blocks of 512 bytes, then read it again the same way,
timing each operation with the dos timer running at approx 18.3 ticks per
second (the latter is mentioned since some of the operations took less than
2 seconds, making the numbers somewhat vague).

First, the dos/netx clients:
  486     : write 164 kBps, read 140 kBps
  pentium : write 289 kBps, read 290 kBps

Then one of the win95 machines, with packet burst on except where noted:

With True commit off, write caching on:
  write 9,532 kBps, read 617 kBps; large insecurity on the write time,
  as it was done in less time than the timer could distinguish.
With True commit off, write caching off:
  write 245 kBps, read 637 kBps; dirty buffers jumped temporarily on monitor.
With True commit on, write caching on:
  write 5.7 kBps, read 579 kBps; never more than 1 dirty buffer.
With True commit on, write caching off:
  write 5.7 kBps, read 636 kBps; never more than 1 dirty buffer.
With True commit on, write caching off and Packet Burst off:
  write 5.7 kBps, read 415 kBps; never more than 1 dirty buffer.

Since someone mentioned the "NCP File Commit" console parameter, I repeated
the previous set of tests with NCP File Commit=OFF. This didn't change the
results in any significant way.

What this shows is that for read performance, packet burst is the
significant factor, whereas for writing it's True commit and, if that's
off, write cache.

My conclusion from this set of tests is that you can turn Write cache on if
you trust windows 95 (I don't), and that you will avoid a very large
performance loss by not turning True commit on, which you can do if you
trust your server (I do).

My personal conclusions are that the server is seriously starved for
memory, something I can't help unless I find someone who supplies EISA
memory boards, and possibly that it could do with an upgrade to a 2740 scsi
card instead of the old 1742 it has now. I'm pinching pennies in
preparation for buying a new server:)

I hope this helps, it definitely decided what settings we have now:)
---------
Date: Thu, 18 Dec 1997 21:03:52 +0200
From: Henrik Olsen
Subject: True Commit/Write Cache in client 32, some numbers (update)

Just a quick update: running iozone in auto mode, which causes it to do the
writes and reads in different block sizes, gave the following results when
run with True Commit on:

IOZONE: auto-test mode
        MB   reclen   bytes/sec written   bytes/sec read
         1      512                5631           354248
         1     1024                9990           354248
         1     2048               18232           347210
         1     4096               32086           347210
         1     8192               43473           579323
         2      512                5526           647269
         2     1024               10397           647269
         2     2048               18339           659481
         2     4096               31895           647269
         2     8192               40928           762600

As you can see, the size of the written blocks has an extremely large
influence when true commit is on, leading me to conclude that the vast
majority of time is spent waiting for the handshake to confirm that the
block truly has been written.
---------
Date: Thu, 18 Dec 1997 14:49:12 -0600
From: Joe Doupnik
Subject: Re: True Commit/Write Cache in client 32, some numbers (update)

Very interesting and well done experiments Henrik. True Commit involves
waiting for the disk drive to report back success in detail (read back the
data just written), and thus rotational delays etc become dominant.
Normally we don't see this delay because the server promises to check for
us and leaves data in the disk write queue. NE-2000's aren't the swiftest
10Mbps Ethernet boards available and with them it is easy to run in
cpu-cycle starved mode on clients.
As an example, using a swift board lets a client saturate a 10Mbps Ethernet
with ease, and use 60% of 100Mbps Ethernet. That's a Pentium-90 client.
Putting these boards into a server readily shows the cpu consumption
effect. Your tests largely avoided server memory so I don't think you can
conclude server starvation from them. EISA machines use the same memory
SIMMs as other machines, unless the machine is rather peculiar. A 2740 SCSI
controller will be faster than a 1740, by tests here. PBurst makes a big
difference on large block transfers. This is easily seen with a network
monitor.
        Joe D.

------------------------------

Date: Wed, 7 Jan 1998 11:51:54 -0700
From: Joe Doupnik
Subject: More on diskless clients

>What about compromising with a shared install, local HDD for swap? If you
>had an image of the workstation HDD on the server I have to believe you
>could copy what little is on it down quickly. My problem in this install
>has been getting the "my documents" folder in the right place.
-------
Engineer hat placed to state = on.

Swapping is infrequent if the machine has say 32MB. Thus swapping is not a
major concern for such installations. I put the swap file on the server in
a per-station (not per username) dedicated space, and in practice this has
worked well indeed. 32MB clients. In three years' time more client memory
will be needed for the same results. Win95 registry stuff goes to a
small-ish C: ramdrive. The server is a lot faster about disk work than a
local hard drive.

The fundamental goal is a clean environment, and today that means starting
with a fresh partitioning of a local hard drive (else trouble). Just
copying files (Ghost, PC-Rdist) is insufficient.

On monetary issues: look at the cost of local disk drives, over several
years, and the much larger cost of maintaining them, and the times when
maintenance would occur (salary, finding folks to work the hours). I think
the answer is local hard drives are rather expensive and still do not meet
my clean-fast-guaranteed requirements. If they did I would use them.
        Joe D.

------------------------------

Date: Mon, 19 Jan 1998 18:12:31 -0700
From: Joe Doupnik
Subject: Re: Removing a corrupt NDS

>You may be experiencing an "intermittent data transfer reliability
>issue." Make sure the hardware is Novell Labs Tested and Approved.
>Also, does your SCSI drive require low-level formatting? If you've
>changed the settings in your SCSI configuration menus, you may also
>need to do this.
>
>Sometimes the BIOS needs to be updated (very often on clones) to
>resolve these types of issues. Certain adapters, such as the Adaptec
>2940UW, work best on IRQ 10. When mixed with an outdated BIOS, the
>results can be unpredictable. Don't you just love PCI (grumble
>grumble groan...)?
>
>I recommend you stress-test your system overnight with a utility
>that can dump large amounts of data (so as to fill the disk cache
>at least twice), then read it back and compare.
---------
That's good advice. One tool to stress test the disk system is IOZONE.EXE,
available widely including from netlab2.usu.edu in dir apps, mirrored to
netlab1.usu.edu pub/mirror/apps. This includes C source code so you can
change it and also see its simple write-file, read-file strategy.

Even then this isn't enough to deal with some gotcha's, at least in my
experience. We need to trash and consume cache to make it work hard. So
while running iozone on a few machines, on others I run DOS DIR /S from the
root in a loop.
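A minimal sketch of such a loop, as a DOS batch file (the drive letter F:
is just an example; point it at whichever mapped volume you want to beat
on):

    @echo off
    rem  dirloop.bat -- walk the whole volume over and over to churn the
    rem  server's directory handling.  Press Ctrl-Break to stop it.
    f:
    cd \
    :again
    dir /s > nul
    goto again

Run that on a handful of stations while iozone runs on the others.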
That's a start and one can easily complicate this with file transfers and
deltrees etc. The idea is to keep the disk drive light on as much as
possible while beating the daylights out of fancy computer science stuff in
the server.

Finally, my favorite stress test is tape backups. I use BackupExec which
has an accelerator module (compresses over the wire) and that puts quite a
load on both servers involved (the one with disks being backed up and the
other with the tape drive). This procedure also exhausts cache as well as
stressing NCP, SPX, and lan driver components.
        Joe D.

------------------------------

Date: Tue, 17 Feb 1998 17:53:22 -0700
From: Joe Doupnik
Subject: Re: Suballocation and performance

>I have a quick question on whether turning block sub-allocation on a volume
>actually improves performance or not. At a major site, I suggested they go
>with a separate volume for holding print queues and turn compression and
>sub-allocation off on that volume (what's the point in sub-allocating for
>print jobs?). However, the in-house IT guy claims to have heard from
>someone at Novell that turning sub-allocation off will degrade performance
>because now the heads have to travel more.
>
>I can convince myself that heads could possibly travel more with
>sub-allocation off (hard to believe though that that type of movement will
>lead to performance degradation), but the overall savings in the extra load
>on the server from having to compute/track the sub-allocations should
>improve performance. Any thoughts or maybe even hard numbers (probably
>hoping for too much!) ?
--------
Just talking about issues such as suballocation does not lead to
conclusions. Analysis and modeling can, particularly the analysis part.
Suffice it to say, suballocation uses available space better, and therefore
there is less wasted platter motion skipping over unused tag ends. CPU time
is minimal in all cases, and nearly the same. Track changes occur anyway
from simple reuse of the disk space, and large allocation units reduce such
motion by fragmenting less.

What counts for performance is the large allocation units. Suballoc takes
care of the very last piece, as I understand it. Well, a piece is a piece
and must be found and read. Thus suballocation saves space but not
necessarily time, nor does it necessarily consume measurable time. It is
easy to run controlled experiments, so if the question really bothers you
give it a trial and report back. In the meanwhile note that other major
file systems have been using subblock allocation for a rather long time,
with those for Unix being the best known to readers here.

Finally, one fine day there will be a mad rush to drives dealing with 4KB
sectors rather than today's 512B sectors, for reasons of performance. That
is more important than wide SCSI, but the two go hand in hand.
        Joe D.

------------------------------

Date: Thu, 19 Feb 1998 12:40:37 -0700
From: Joe Doupnik
Subject: Re: 3.12/2940W/PPro -PROBLEMS- (Longish)

>My recent activities have included a rigorous scan for viruses on the DOS
>partition of the server boot drive with Mcafee's latest (found nothing, of
>course), swapping my SCSI cabling for top-of-the-line Granite Digital
>cabling, and updating the motherboard BIOS to the latest revision. For
>what it's worth, I am for all intents and purposes running a name-brand
>system, albeit cobbled together from pieces so that I don't have a system
>vendor to lean on. The motherboard is an Intel-issue VS440FX pentium pro
>board, now running BIOS revision 16.
The twin (duplexed) 2940W Adaptec >SCSI adapters are both running the latest drivers from Adaptec, and I have >tried every trick in the book to make them and the disks happy. This >includes double checking the termination, backing off the SCSI bus speed >from 10Mhz to 8Mhz, disabling tag queuing, verifying that write caching on >the drives is disabled, etc. etc. (Plus, my autoexec.ncf and startup.ncf >have been refined with suggestions from JoeD- thanks!) ---------- Strange. Your figure below of flushing 50 4KB buffers/sec is close to waiting for the disk to rotate round, writing a buffer, waiting for the disk to rotate round, reading it back. (7200 RPM = 120 revs/sec). I don't see this terrible performance here with the same motherboard, but with mirrored Seagate Barracudas (4GB, much more modern drives than your first issue variety), Adaptec 3940 controller. INW 4.11, not NW 3.12, 64KB allocation units, not 4KB. I see sustained very long transfer write rates of 3.5MB/sec, and the dirty cache buffer count stays well below 1000. The Wide SCSI stuff does not make disks go faster, of course, and instead simply lets the SCSI bus use less time to get the work done. Given that the disk is the slowest part anyway, and the cpu has little to nothing to do with SCSI bus details, Wide/Narrow does not make much difference. One presumes your server has been told it can do many disk writes in one go. Console SET stuff. A huge number of pending writes indicates the server can't get at the disk system because something else is clamoring for attention (say the network adatper). A suggestion is break the mirror and see what happens. Simlarly, change network adapters. Joe D. >I am very interested to see that your analysis very closely parallels my >own observations, right down to the critical factor of determining the >speed by watching the dirty cache buffers flush at the tail end of a large >transfer. I have loads of memory on this machine (128MB), and I can get >all the way up to 19,000+ dirty cache buffers before it puts a speed bump >on the network wire. This equates to exactly 75% of available buffers, so >it must be the safety threshold Novell has set. Even on this fast >processor & using fast/wide SCSI drives (4Gig Seagate Barracuda model >15150W), I see dirty cache buffers declining at a peak rate of only >50/second, which with 4K buffers is dead-on the performance you cite, >approximating 200KB/sec. A very similar setup (identical adaptor cards & >drives) on an NT machine in my office, talking across the very same network >wiring, can transfer files at consistent rates as high as 4.5MB/sec for the >whole of a 500MB file, including the read-after-write verify at the target >drive, fed from a workstation with "only" an EIDE drive. > >The only encouraging aspect to me is that I only noticed this apparent lack >of performance as part of troubleshooting why the remirroring process was >hogging my server enough to completely ignore connection requests; if this >throughput problem you have so well documented is normal, then my gremlin >may lie elsewhere. > >Nonetheless, I too find it troubling that an OS that is theoretically >optimized as a fileserver is not capable of talking a little faster to the >disks than 200KB/sec. Why bother with fast/wide SCSI, if it doesn't make >any difference? (Hey Novell! What gives?) > >Again, thanks for your response, and if I'm able to cast a little more >light on this topic, I'll forward it on. 
> >For anybody else who has done more thorough testing on 4.11 (as opposed to >the 3.12 I'm saddled with) please feel free to chime in. > >-Knute >Knute, your problem sounds a lot like one I was working on a while back. > >I did various tests, and came to the conclusion that, so far as I could >determine on any system I had available for testing, in any device >configuration, NetWare 3.12 does disk I/O significantly (i.e. 2-3x or more) >slower that MSDOS would on the same hardware. Or NT, for that matter. [...] >The most indicative metric I found was to watch the rate at which the dirty >cache buffer count decrements after client processes stop sending data to >the server's drive. Under NetWare 3.12, this would rarely exceed 50 >buffers/second. Under NetWare 4.x, (and this was usually on faster >hardware; such is life) I have seen this rate easily exceed 250. [...] >Ken Wallewein --------- Date: Thu, 19 Feb 1998 18:40:35 -0700 From: Joe Doupnik Subject: Re: 3.12/2940W/PPro -PROBLEMS- (Longish) >I am very interested to see that your analysis very closely parallels my >own observations, right down to the critical factor of determining the >speed by watching the dirty cache buffers flush at the tail end of a large >transfer. I have loads of memory on this machine (128MB), and I can get >all the way up to 19,000+ dirty cache buffers before it puts a speed bump >on the network wire. This equates to exactly 75% of available buffers, so >it must be the safety threshold Novell has set. Even on this fast >processor & using fast/wide SCSI drives (4Gig Seagate Barracuda model >15150W), I see dirty cache buffers declining at a peak rate of only >50/second, which with 4K buffers is dead-on the performance you cite, >approximating 200KB/sec. A very similar setup (identical adaptor cards & >drives) on an NT machine in my office, talking across the very same network >wiring, can transfer files at consistent rates as high as 4.5MB/sec for the >whole of a 500MB file, including the read-after-write verify at the target >drive, fed from a workstation with "only" an EIDE drive. ------- Sorry to continue responding to the same message, but this one has my curiosity aroused. 19000+ dirty cache buffers translates into about 76MB of material queued for disk. If we assume 10Mbps Ethernet we get max 1MB/sec of data and that means nothing at all reaches the disk for 76sec. If 100Mbps Ethernet the delay shrinks to 7.6 sec or greater. Clearly something is drastically wrong. I ran a quick check against my NW 3.12 server, Adaptec 2742AT EISA controller, Barracuda 4GB drive, 486-33 EISA bus, 64MB. Iozone was the quicky tool. Dirty cache buffers got up to 150. Throughput was 500- 800KB/sec, for the 10Mbps Ethernet link. Disk read after write checking is ON. Client32, Pburst active, no client caching. Intel EE PRO lan adapters. On another server, INW 4.11, with mirrored 4GB Barracudas (3940 PCI controller) using 100Mbps Ethernet the dirty buffer count gets larger, many hundreds peak and the throughput is about 3.5MB/sec. I do not run virus checking software. I throttle PCI bus latency to 32 clock tics, or so. I run the SCSI adapters at full speed. Joe D. --------- Date: Fri, 20 Feb 1998 18:18:58 -0800 From: Knute Ream Subject: Re: 3.12/2940W/PPro -PROBLEMS- (Longish) OK, after a little more testing, (well, actually a lot) below are my observations. 
For those of you just tuning in, the issue was that my 3.12 server (PentiumPro 200 on an Intel "Venus" VS440FX motherboard (BIOS rev 16), twin Adaptec 2940W SCSI controllers, twin F/W Seagate Barracudas (ST15150W),128MB RAM, and an Intel EtherExpress Pro 100B network adaptor) was exhibiting truly slow performance in writing to the disk. When testing by copying a very large file across the wire, after saturating the dirty cache buffers I was seeing the buffers decline at a peak rate of ~50/sec, which with 4K block size (and 4K cache buffers) equates to 200KB/sec. I have also tried different network adaptors (a wide range including a good old NE2000 (ISA) and a 3COM 3C905 (PCI, 1O), with no change other than slower wire transfer rates and slightly higher processor utilization. As stated before, I have run VERY thorough checks for proper SCSI cabling/termination, and am completely current on _all_ relevant patches from Novell, Intel & Adaptec. Joe Doupnik pointed out something that got me thinking: > Strange. Your figure below of flushing 50 4KB buffers/sec is close >to waiting for the disk to rotate round, writing a buffer, waiting for the >disk to rotate round, reading it back. (7200 RPM = 120 revs/sec). I don't >see this terrible performance here with the same motherboard, but with >mirrored Seagate Barracudas (4GB, much more modern drives than your first >issue variety), Adaptec 3940 controller. INW 4.11, not NW 3.12, 64KB >allocation units, not 4KB. I see sustained very long transfer write rates >of 3.5MB/sec, and the dirty cache buffer count stays well below 1000. > The Wide SCSI stuff does not make disks go faster, of course, and >instead simply lets the SCSI bus use less time to get the work done. Given >that the disk is the slowest part anyway, and the cpu has little to nothing >to do with SCSI bus details, Wide/Narrow does not make much difference. > One presumes your server has been told it can do many disk writes >in one go. Console SET stuff. A huge number of pending writes indicates >the server can't get at the disk system because something else is clamoring >for attention (say the network adatper). > A suggestion is break the mirror and see what happens. Simlarly, >change network adapters. > Joe D. For starters, playing with the settings for "maximum concurrent disk cache writes" had absolutely zero impact on performance for any number between 50 and 300. (I tried a bunch of different sizes in that range). The disk would start doing its business as the transfer started, but because it couldn't keep up, the dirty cache buffers would simply fill until it reached a threshold and sent out "server busy" packets to the client (assuming that's what it does to back off the transfer). Although the "current disk requests" shown on the monitor could be raised by changing the "max" settings, more was not detectably faster. But Joe's observation about the timing related to disk RPM and read/write cycle suggested that changing the block size should make a difference; the disk read after write process appears to verify each individual block before writing the next one, and if the disk has to spin all the way around to read it again, things slow to a crawl. So I rebuilt the server with larger block sizes, (8K instead of 4K) and noticed no performance difference. So I rebuilt it again with a block size of 32K (by now I'm getting really good at using my backup software), then set the cache buffer size to the maximum setting of 16K. WOW! Big difference. 
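For reference, the knobs involved look roughly like this (the values are
illustrative; in NW 3.x the cache buffer size can only be set in
STARTUP.NCF, before volumes are mounted, and must not be larger than the
block size of any volume or that volume will refuse to mount):

In STARTUP.NCF:

    set cache buffer size = 16384

At the console or in AUTOEXEC.NCF:

    set maximum concurrent disk cache writes = 100

The volume block size itself is chosen when the volume is created in
INSTALL, which is why changing it means rebuilding the volume from a
backup.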
The dirty cache buffers still decline at the same rate (~48/sec), and I still have no performance increase from setting max concurrent disk cache write requests above 50, but now each buffer is dumping four times as much data to the disk, which equates to a peak rate of about 800K/sec; this improvement from 200K to 800K is directly proportional to the change in cache block size (16K from 4K) and not partition block size (32K from 4K), so I doubt I am getting any benefit from the 32K partition blocks as opposed to 16K. Now back to Ken's earlier question about why the transfers are so much faster under DOS- it's the read after write verify. When I turn off read after write, even with the max concurrent disk cache writes down around 50 (little or no benefit again for over 50 when writing one big sequential file) I can now easily attain a write through of about 2,900KB (2.9MB)/sec, which is pretty respectable considering that that's coming in over the wire and rather than as a local transfer. If only I could sleep well at night without read after write; I sure lust after the performance... The SQLBase client/server database software that we use (the main reason for the existence of our server) exhibited a dramatically positive improvement today in production use as a result of these changes, cutting the time for complex processes (such as a full reschedule of our manufacturing floor) by nearly 30%. So, in summary, for anybody concerned about max throughput on disk cache writes in an environment substantially similar to mine, increase your partition block & cache block size to 16 (the default is 4) and don't look back. Hope that helps somebody! Now about that problem I have that started this whole thing, in which the remirroring process blocks any attempt from a user to log in, I am still searching for a solution. Something changed along the way, because it DEFINITELY used to let users in while remirroring in the background, and I can't for the life of me figure out why it won't anymore. I have tested setting concurrent remirror requests to any number between 4 and 30, and no matter what the setting, the server utilization stays at near zero, the full remirror takes about 4-1/2 hours, and nobody can do _anything_ with the server until it's done. --------- Date: Fri, 20 Feb 1998 21:21:00 -0700 From: Joe Doupnik Subject: Re: 3.12/2940W/PPro -PROBLEMS- (Longish) >For starters, playing with the settings for "maximum concurrent disk cache >writes" had absolutely zero impact on performance for any number between 50 >and 300. (I tried a bunch of different sizes in that range). The disk >would start doing its business as the transfer started, but because it >couldn't keep up, the dirty cache buffers would simply fill until it >reached a threshold and sent out "server busy" packets to the client >(assuming that's what it does to back off the transfer). Although the >"current disk requests" shown on the monitor could be raised by changing >the "max" settings, more was not detectably faster. > >But Joe's observation about the timing related to disk RPM and read/write >cycle suggested that changing the block size should make a difference; the >disk read after write process appears to verify each individual block >before writing the next one, and if the disk has to spin all the way around >to read it again, things slow to a crawl. So I rebuilt the server with >larger block sizes, (8K instead of 4K) and noticed no performance >difference. 
So I rebuilt it again with a block size of 32K (by now I'm >getting really good at using my backup software), then set the cache buffer >size to the maximum setting of 16K. WOW! Big difference. The dirty cache >buffers still decline at the same rate (~48/sec), and I still have no >performance increase from setting max concurrent disk cache write requests >above 50, but now each buffer is dumping four times as much data to the >disk, which equates to a peak rate of about 800K/sec; this improvement >from 200K to 800K is directly proportional to the change in cache block >size (16K from 4K) and not partition block size (32K from 4K), so I doubt I >am getting any benefit from the 32K partition blocks as opposed to 16K. >Now back to Ken's earlier question about why the transfers are so much >faster under DOS- it's the read after write verify. When I turn off read >after write, even with the max concurrent disk cache writes down around 50 >(little or no benefit again for over 50 when writing one big sequential >file) I can now easily attain a write through of about 2,900KB (2.9MB)/sec, >which is pretty respectable considering that that's coming in over the wire >and rather than as a local transfer. If only I could sleep well at night >without read after write; I sure lust after the performance... >So, in summary, for anybody concerned about max throughput on disk cache >writes in an environment substantially similar to mine, increase your >partition block & cache block size to 16 (the default is 4) and don't look >back. >-Knute -------- Excellent work! That's what we need around here. For what it's worth dept. NW 4 lacks SET command to change the file cache buffer size. We may infer from disk throughput experiments that it is effectively at least as large as the disk allocation unit. We know that overall memory management in NW 4 is very much improved over NW 3, and NW 3 is vastly improved over venerable NW 2, but there are things they forgot to tell us that can make a big difference. You found another one. Comparing your new settings for NW 3.12 with what I see for NW 4.11 on similar hardware I think 4.11 is using large block writes, as large as necessary. When I drive 4.11 with 100Mbps Ethernet the disk dirty cache buffer count does go up but only to 1000 or so (somewhat more if read checking is enabled). On read-after-write-check. I too am of uncertain position on this item. My sound instincts say turn it on for safety (disks do go bad these days which is why we mirror and duplex), and my go-fast instinct says the opposite. Recent NetWares have sided with the go-fast approach as the new default. If disk drive prices continue to plummet we may want to turn on the checking to survive the loss in quality. In the near future looms NSS with its full journaling file system and treed lookups and instant volume mounting irregardless of volume size, and more. Playing with the current NSS material shows it to be pretty fast, and to have survived at least one server crash here (Moab, natch). I don't know yet what that will do to our go-safe/go-fast decision. My UnixWare machine netlab1.usu.edu uses a full journaling file system without read after write checking (none that I know of anyway) and it is nearly bulletproof to crashes (yup, this is Unix). But I also mirror drives for safety: old habits die hard. Tomorrow, when the machine becomes free I will crank up file cache buffer size on my NW 3.12 server netlab2 and do some iozones. 
Given its good performance as an EISA bus machine (Intel EE PRO lan adapter) there may not be much difference, but we shall see. Knute: Novell sales will look daggers at you. You've just put life into NW 3.12 that they never expected. Good on you. Joe D. --------- Adding an important footnote to this topic, based on experiments this morning. The file cache buffer size may not be larger than the disk allocation unit on any volume. If it is set larger then the volume will not be mounted. If volumes have different allocation unit sizes then file cache buffer size must not be larger than the smallest of the units. This is for NW 3 only. NW 4 handles matters differently. Joe D. --------- Date: Sun, 22 Feb 1998 11:35:29 -0700 From: Joe Doupnik Subject: Re: 3.12/2940W/PPro - PROBLEMS- (Longish) To make another step forward through this performance problem I note that my NW 3.12 server does not show disk lagging effects nor slow performance. It uses Adaptec's EISA bus controller, an Adaptec 2742AT, narrow SCSI at that.4KB file cache buffers, 8KB and 4KB disk allocation units. Given the available information my suggestion to folks using the 2940W controller is to a) look into a controller BIOS upgrade (if that's possible) and b) look into the tagged queueing material for it too. Tagged queueing is, as the name suggests, a way of getting the controller to handle many requests in disk-surface-serial-order. Adaptec's tagged queueing has a checkered history and for all we know the 2940W may have it turned off by default. See the docs for that controller for command line options. Joe D. --------- Date: Sun, 22 Feb 1998 21:58:47 -0800 From: Knute Ream Subject: Re: 3.12/2940W/PPro - PROBLEMS- (Longish) Joe Doupnik wrote: > To make another step forward through this performance problem >I note that my NW 3.12 server does not show disk lagging effects nor slow >performance. It uses Adaptec's EISA bus controller, an Adaptec 2742AT, >narrow SCSI at that.4KB file cache buffers, 8KB and 4KB disk allocation units. > Given the available information my suggestion to folks using the >2940W controller is to a) look into a controller Bios upgrade (if that's >possible) and b) look into the tagged queueing material for it too. Tagged >queueing is, as the name suggests, a way of getting the controller to handle >many requests in disk-surface-serial-order. Adaptec's tagged queueing has >a checkered history and for all we know the 2940W may have it turned off by >default. See the docs for that controller for command line options. > Joe D. Hmmm. I'm assuming that Joe had "read after write verify" set to on, which would correspond to my testing. My brief investigation into BIOS upgrades for the 2940W controller suggests that while it may be possible (apparently depends on the specific revision of the controller) it will require an EPROM burner. I'll pursue that, and report back if I find anything noteworthy related to changes in the BIOS. As far as tagged queing, the 2940W uses Adaptec's AIC7870 driver under Netware 3.12, and the current revision of this driver does enable taq queuing by default. As soon as I get a chance to bang things around with no users on the network, I'll generate some performance results with and without tagged queing, using either of 4K or 16K cache block sizes. 
(I believe in my current configuration tagged queuing is disabled due to
references from a Novell TID warning about potential problems, but enabling
it with 4K cache buffers did not seem to solve my earlier problem of
slowness; I'll try it again with 16K and see what happens.)

The other loose end which may relate (mentioned by Ken) is the actual
mechanics of the read after write verify. My performance gain in changing
the cache buffer size from 4K to 16K appears directly related to a process
of each cache buffer being read and verified prior to the next write.
Netware 3.12 has three options on the monitor console for read after write:
"disabled", "hardware", and "software." In my installation, I am unable to
select "software" (it simply won't select), but it occurs to me that I am
also a little in the dark on where exactly this "read after write" is
handled even if I choose "hardware".

If I used tagged queuing, which would theoretically make the disk more
efficient, and used "software" read after write verify, I might be able to
keep the disk much busier since it wouldn't necessarily have to do all the
writes and reads in rigorous serial order. Conversely, if each write must
be immediately read for verification under "hardware" verify, it doesn't
seem likely that I will see any benefit from tagged queuing in a "write"
cycle even if reads are faster.

So, can anyone cast any more light on the actual mechanics of the read
after write verify, and what levels of software/hardware actually control
the process? (i.e. how does the "software" do it; is it part of the Netware
OS or specific to the disk driver software? How involved is the SCSI
controller card? Does the disk itself handle the "hardware" read after
write verify?) If I could nail this down, then I'd be a lot more
comfortable understanding Joe's very different results on what should
intuitively (at least to me) be a slower platform.

Some day I'll finally figure this out!

------------------------------

Date: Fri, 20 Feb 1998 14:18:33 +0000
From: Richard Letts
Subject: Re: Ethernet Utilization

>I'm trying to do some testing of traffic on a segment of my network and
>want to flood it with packets. Does someone have a way of doing this or
>a utility that will do it? I guess I could run a batch file to
>repeatedly do a directory listing of the server but workstation caching
>may affect this.

I would like to share a story with people, which may prove instructive.

We had a network here which was of the classical bridge-in-the-middle type.
Fileservers on 100base-T and labs on 10base-T hubs. The network would
perform well, and then suffer periodic slow-downs for no apparent reason.
The network was poorly documented, which made trouble-shooting a pain. We
suspected loops in the network, where two hubs were joined to each other
and also back to the switch. Turning on spanning-tree helped identify some
of these, and during the vacation we carried out stress-testing of the
network by repeatedly loading and unloading windows on the diskless lab
machines. Our aim was to load windows in less than 5 minutes in a lab of 30
machines. During testing the slowest machine took 90 seconds.

Come the start of term the network went to tortoise mode. In the end it
turned out that the 100base-T interface in the switch was set to
full-duplex and the 100base-T interface in the server was set to
half-duplex, despite the driver reporting it was full-duplex. Changing the
switch to half duplex fixed the problem. Changing the server would have
required reloading it.
Both pieces of equipment were manufactured by the same supplier.

The difference between the stress-testing we did and the real-world
situation was one of randomness: we were generating repeating fixed bursts
of traffic. Because any collisions delayed a station, all 100+ machines
self-synchronised. In real life the data flows were random and interfered
with each other, causing performance problems.

In short: if you want to test your network properly you need to generate
random bursts of traffic. People are the best sort of randomness you can
find.

You can view graphs of network traffic on this network at:
http://www.salford.ac.uk/ais/Network/newton and
http://www.salford.ac.uk/ais/Network/newton-pkt.
[Note the backup on Tuesday night over TCP has a data/ack packet ratio of
2:1 which netware clients don't get near.]

Anyway, if all you are after is a packet generator, perform3 from various
sites is what you want.

------------------------------

Date: Fri, 20 Feb 1998 08:43:16 -0600
From: Dan Alexander
Subject: Re: Analyzing network traffic (NW312)

>1) Is some NLM available to analyze incoming and outgoing network traffic
>for Netware 312? (Will not have IPX bind to the network card. Using
>Token-ring, ROUTE.NLM and NW for SAA).

The netware tracking screen may give you the information you're looking for
here. From the system console, type TRACK ON to enable it. TRACK OFF stops
it.

>2) Is some analysis tool available to correlate network names and MAC
>addresses for _other_ servers on the network? For finding another server on
>the network that has conflicting network numbers for certain protocols.

The most bang for the buck I've seen is LANalyzer. I've also heard of a
sniffer program called SNIFFIT (or something like that). It's free and it
runs on Linux (also free). I've had absolutely no experience with this
program, so if you try it, please post a review.

------------------------------

Date: Fri, 20 Feb 1998 16:31:58 -0700
From: Joe Doupnik
Subject: Re: Volume size/# recommendations

>I'm going to be upgrading our primary file server in the near future and am
>looking at reorganizing how our volumes are configured. I'll be using an HP
>Netserver with 4 18GB drives in RAID 5. We have 3 volumes, SYS, VOL1, and a
>volume for Win95 apps that require LFNs (ugh).
>
>For organizational purposes, I wanted to divide up the volumes further and
>create a separate HOME and SHARE volume for user data, along with APPS,
>32APPS, and an additional volume for a special database application. The
>biggest advantage to me is that I can easily configure my backup save sets
>to schedule the volumes differently. The only recommendations I've found
>from Novell were in the Installation red book and those were to separate
>volumes with different namespaces for ease of troubleshooting (DOSVOL,
>MACVOL, LFNVOL, etc.), and recommendations for duplexing, mirroring, and
>all that other drive-oriented mumbo jumbo. However, I can't find any
>recommendations on volume sizes for efficiency, limits, or anything else.
>
>Does anyone know of sources for this information? Would it behove me to
>just create one SYS and one volume for everything else? I have looked all
>over Novell's site, the FAQ, AppNotes, and Red Books and haven't been able
>to find this set of specifications or recommendations. I am looking at a
>2GB SYS volume and up to 52GB allocated for everything else (current user
>data is around 12GB).
---------- Volume size itself does not make much difference to memory consumption in the server because that depends on the total number of disk allocation units, summed over all volumes. Volume size again fades away when speed is considered because what counts is the layout of the file system. How many top level directories, how files are balanced in the tree beneath and so on. This is manipulating directory structures rather than sizes of real files. The bigger the tree the longer it takes to walk it. You might want to look at the wire with a packet monitor to gauge how much effort goes into walking your file tree. Volume size does come back in two ways though. When walking the FAT lists more volumes makes that job faster than fewer. It's called factoring by making parallel but shorter structures rather than fewer but longer ones. The second way is sectioning the disk surfaces by volume so head motion is reduced while staying within a given volume. This is the reasoning behind Unix disk "slices" and modern file system architecture (regionalize to keep things close together on the disk surface). How this works out with RAID I can't say. Probably the dominant consideration is not what the server does but rather what the customer senses and has to do to perform work. And that depends very strongly on the file system rather than the disk/volume system. Joe D. --------- Date: Sat, 21 Feb 1998 23:47:48 -0800 From: Ken Eshelby Subject: Re: Volume size/# recommendations > Volume size does come back in two ways though. When walking the FAT >lists more volumes makes that job faster than fewer. It's called factoring >by making parallel but shorter structures rather than fewer but longer ones. >The second way is sectioning the disk surfaces by volume so head motion is >reduced while staying within a given volume. This is the reasoning behind >Unix disk "slices" and modern file system architecture (regionalize to keep >things close together on the disk surface). How this works out with RAID I >can't say. Excellent stuff. When we finally get the 18G drives from HP in, I will do some packet watching. By using more volumes, I would be able to shorten trees (VOL1:HOME\ESHELBYK becomes HOME:\ESHELBYK etc.) and if I can determine a reasonable performance gain over the SYS and huge VOL1 scenario I can head to part 2. Thank goodness for a new server to test with! > Probably the dominant consideration is not what the server does >but rather what the customer senses and has to do to perform work. Part 2. Just about everything we do seems to cause grief to at least someone here, and ironically our help desk seems to take the longest to learn new things. However, the word is out that with the upgrade to NW411 some things will be different and I can decide how much paradigm shift the user will see-we've used two drive letter mappings to one volume for a long time, so users are at least trained to look in multiple places for their data. --------- Date: Sun, 22 Feb 1998 17:24:14 +0000 From: Richard Letts Subject: Re: Volume size/# recommendations >> Volume size does come back in two ways though. When walking the FAT >>lists more volumes makes that job faster than fewer. It's called factoring >>by making parallel but shorter structures rather than fewer but longer ones. >>The second way is sectioning the disk surfaces by volume so head motion is >>reduced while staying within a given volume. 
This is the reasoning behind
>>Unix disk "slices" and modern file system architecture (regionalize to keep
>>things close together on the disk surface). How this works out with RAID I
>>can't say.
>
> Excellent stuff. When we finally get the 18G drives from HP in, I will do
>some packet watching. By using more volumes, I would be able to shorten trees
>(VOL1:HOME\ESHELBYK becomes HOME:\ESHELBYK etc.) and if I can determine a
>reasonable performance gain over the SYS and huge VOL1 scenario I can head to
>part 2. Thank goodness for a new server to test with!

Personally I'd try and keep volumes down to the size you can fit on one
backup tape, unless you have a robot tape library and proper software to
drive it. Otherwise every time a user wants a file restored from a couple
of months ago you'll need a stack of tapes.

You'll also need to look at some of the other limitations, like the maximum
number of directory entries per volume, which was limited to 4 million at
one version of netware.

We have a robot tape library; even so, our largest user volumes are ~8GB.
http://www.salford.ac.uk/ais/Network/filestore

------------------------------

Date: Sun, 22 Feb 1998 18:34:01 +0000
From: Richard Letts
Subject: Re: FYI: Performance of store-and-forward ethernet switches

>I look forward to trying this test with a cut-through switch. And I
>didn't try a 100MB hub; maybe I can do that today. Note that, so far as I
>know, there is no such thing as a non-switching hub that can do either full
>duplex or 10/100 speed conversions.

Full duplex relies upon ignoring the carrier sense from the ethernet
transceiver (which is part of the UTP interface). You can do this if there
are exactly two devices on the cable, since the traffic is either yours
down the cable, or theirs travelling in the opposite direction.

I have also found negotiation of full/half duplex operation to be fraught
and recommend using manual configuration.

------------------------------

Date: Sun, 22 Feb 1998 16:41:18 -0700
From: Joe Doupnik
Subject: Re: MS Client for Windows 'v' MS Client for Netware

>> This is an interesting issue. In all fairness, I wouldn't compare
>>TCP/IP against Microsoft's implementation of IPX/SPX. TCP/IP does
>>have its advantages, while IPX/SPX is stronger in other areas, so
>>you have to determine what suits your situation best.
>
>Well, SPX is forever subject to the ping-pong effect. SPXII is supposed
>to fix this by adding a sliding window. I haven't seen any apps use
>SPXII as of yet so I can't verify the validity of the above statement.
----------
Arguing generalities like this is fruitless. SPX is a poor man's imitation
of TCP, without the depth of experience and thought of TCP. IPX is similar
to UDP/IP in many regards. SPX1 was a bad joke; SPX2 is much better indeed.
As Hansang correctly stated a few messages ago, the quality of the code
makes all the difference in the world. Code is the only living example of
what the protocols can do for normal users, and proficiency of code authors
is highly variable.

Folks who wed themselves to TCP/IP and stop thinking are dangerous to the
rest of us. Bandying about labels as if they were substance is silly and
wastes time of the readership.
        Joe D.

------------------------------

Date: Mon, 2 Mar 1998 13:03:46 +0000
From: Steve Kurz
Subject: Re: 10Mbit Segmenting Vs. 100Mbit backbone, Which is faster ??

>Network design question:
>
>Assuming ALL OTHER variables being equal, which is faster ??
> >A 4.1x server with 5 dual-channel EISA ethernet cards serving 10 >10Mbit segments > >or > >The same 4.1x server with a single EISA 100Mbit card attached to >the 10Mbit segments via a 10/100 Mbit switch Sounds like deja vu all over again! I just did this exact change in the infrastructure of a customer. The original configuration was a Compaq Proliant 1500 running 4.1 with 6 10Mbit ethernet cards, each with IPX and TCP/IP enabled. There was a SCO server on one of the segments, which meant that anyone not on that segment had to route through the 4.1 server. I installed a 3Com 3300 10/100 switch, with each of the servers having a 10/100 full duplex capability, and each of the other segments routed through an external Compatible Systems four port router. This was done to isolate the segments, which were a fiber run to out-buildings. Also, the customer did not want to make any changes to the IP configuration at the desktop, nor did they want to make any changes to the six terminal servers. I configured the servers to run at 100MB full duplex, and the routers at 10MB for each segment. The end result was that the server utilization went from about 40-50% to less than 10%, and the throughput on the network segments went up. I cannot quantify the amount, but the reaction of the endusers was extermely positive. The changeover was done over a weekend, so the users were not aware of the final configuration. --------- Date: Mon, 2 Mar 1998 15:17:29 -0600 From: James E Borchart Subject: Re: 10Mbit Segmenting Vs. 100Mbit backbone, Which is faster ?? I have to agree with Steve Kurtz that (b) is the correct answer in most situations. The server can more efficiently handle the one card. The switch is much better at dealing with multiple segments than a file server, since its dedicated hardware (unless it is a crappy switch). Also, if it is a good network card and a good switch, you will get full duplex on the 100Mbps segment, which means no collisions and a realistic sustainable throughput exceeding 50Mbps. It would be VERY difficult to sustain 50Mbps on ten 10BaseT shared channels. There are a few switches that support switched FDDI and ethernet in the same chassis. These are truly awesome performers. Processor utilization on the server drops down significantly because the FDDI card does all the work. It is amazing what top-of-the-line NIC's and top-of-the-line switches can do for you today. --------- Date: Mon, 2 Mar 1998 15:11:01 -0700 From: Joe Doupnik Subject: Re: 10Mbit Segmenting Vs. 100Mbit backbone, Which is faster ?? >I have to agree with Steve Kurtz that (b) is the correct answer in most >situations. ----------- That seems wishy washy reasoning to me. FDX Ethernet has collisions unless the ENTIRE path is FDX and contention free. In this case they occur in the switch, with no feedback. Translation: not good once buffers overflow. There is simply no way around this because 10 < 100. We've gone over this ground extensively. EISA bus is pushing it on 100MHz Ethernet, but it does work. PCI bus is superior and yields less cpu loading (faster bus<->board transfers). 10Mbps Ethernet links are limited to 1MByte/sec and cannot stretch to consume unused capacity on a 100Mbps link. This is yet another example of classical fixed capacity buckets being multiplexed onto a faster conveyor, and they lose when compared to competing for a uniform higher speed pipe thoroughout. See Tanenbaum, Computer Networks, for amplification of this queueing situation. 
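As a back-of-the-envelope check on that 1MByte/sec figure (the frame
overheads are standard Ethernet and IPX numbers, the rest is rounding):

    10Mbps                 = 1,250,000 bytes/sec of raw capacity
    max frame on the wire  = 1518 byte frame + 8 preamble + 12 interframe
                             gap = 1538 byte times
    user data per frame    = 1500 byte payload - roughly 40 bytes of
                             IPX/NCP headers, about 1460 bytes
    ceiling                = 1,250,000 x 1460/1538, a bit under 1.2MB/sec

Protocol turnaround and less-than-full frames bring practical transfers
down to about 1MB/sec, and the same arithmetic scaled by ten gives the
10+MB/sec quoted for 100Mbps.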
The proper answer is remove the switch and hence both its one packet delay and its cost, replace with multiple 100Mbps boards and 100Mbps hubs. Faster than this one cannot go. Generally, avoid FDX: there is no free lunch. Joe D. --------- Date: Mon, 2 Mar 1998 17:07:30 -0600 From: James E Borchart Subject: Re: 10Mbit Segmenting Vs. 100Mbit backbone, Which is faster ?? Odd, I have never disagreed with you in any way before, and I don't disagree with any of your facts below. I believe you may have misread Mr. Gunnings request, Joe. >>> FDX Ethernet has collisions unless the ENTIRE path is FDX and >>>contention free. In this case they occur in the switch, with no feedback. The entire path is contention free with no feedback, Mr. Gunning and I both assume a wire from a server to a switch, no other equipment connected. The switch handles queueing from other sources. The server recieves and transmits just on the one line. >>>Translation: not good once buffers overflow. There is simply no way around >>>this because 10 < 100. We've gone over this ground extensively. Of course we have. The messages on this subject assume that the switch had AT LEAST ten 10BaseT ports. All (or at least most) requests are going to and from the server, If the server can service the requests quickly, with fewer retries, then there is a net of less traffic on the LAN, and less buffering at the server and the switch. Remember that Mr. Gunning is asking for real-world tests, and I am replying with my experience. >>> EISA bus is pushing it on 100MHz Ethernet, but it does work. PCI >>>bus is superior and yields less cpu loading (faster bus<->board transfers). I agree, except that I don't think fast ethernet is 100Mhz. FDDI certainly isn't, but that is a nit unrelated to the main discussion. I would always recommend PCI if possible. >>> 10Mbps Ethernet links are limited to 1MByte/sec and cannot stretch >>>to consume unused capacity on a 100Mbps link. This is yet another example >>>of classical fixed capacity buckets being multiplexed onto a faster conveyor, >>>and they lose when compared to competing for a uniform higher speed pipe >>>thoroughout. See Tanenbaum, Computer Networks, for amplification of this >>>queueing situation. Of course not, I am contending that the server is less busy and can therefore service requests faster, with fewer retries and less overhead. The queue is emptied faster, that's all. Everyone (except switch salesmen) would agree that maxed out pipes will not multiplex well. Mr. Gunnings question is what actually happens in a real-world situation, where the pipes are NOT maxed out and we have something like a "normal" traffic load. >>> The proper answer is remove the switch and hence both its one packet >>>delay and its cost, replace with multiple 100Mbps boards and 100Mbps hubs. >>>Faster than this one cannot go. >>> Generally, avoid FDX: there is no free lunch. FDX is unquestionably not a free lunch, what is does is to offer a contention-free connection from server to switch, as I stated. I feel that you can get over 50Mbps out of a theoretical 200Mbps line! This sucks on paper, but the server NIC can just keep pumping out packets, without queuing them up for sending to the switch. I have certainly never seen over 100Mbps on a Full-duplex switched 100 port, but I can get over 50Mbps. Mr. Gunning did not list multiple 100Mbps boards as an option, he had five dual-10Mbps boards in his request. 100Mbps hubs have many flaws, and the limits of 100Mbps geometry limitations come into play. 
Here is another way to look at my contention, forgetting all of the above:
software drivers on a NetWare server are inherently slower at managing
queued traffic than the dedicated hardware in an ASIC-based switch. They
make up for this by having a bigger queue, but this bigger queue takes even
more overhead (RAM and processor). A fast switch combined with reduced
overhead on the server improves server performance. The queue never gets
filled, if you are lucky.

Let's say that ten 10BaseT segments are all offering a 3Mbps load. What I
see with a high-end server and a high-end switch is fast service, low total
server utilization, and low collisions compared to a number of ethernet
cards in the same server. The high-end switch market wouldn't exist today
if this weren't true. The equipment is just too expensive.

---------

Date: Mon, 2 Mar 1998 18:17:01 -0700
From: Joe Doupnik
Subject: Re: 10Mbit Segmenting Vs. 100Mbit backbone, Which is faster ??

[Previous 3 emails snipped]
--------
Continuing the discussion...
A half-duplex 100Mbps Ethernet board, a decent one, can saturate the
100Mbps wire without causing the server to work hard. I've shown that with
Intel EE PRO/100B client-style boards and the fancier /Server server
boards. By saturate I mean fill the wire, over 10MB/sec of data, not
counting packet overhead. Server cpu utilization: 20% or less for the
client board, a few percent for the fancy board.

There is no question that hardware can relay frames faster than software,
but that misses the point that software must deal with them at both ends of
the connection. Weak links and all that jazz. The server is not acting as a
relay box, like a switch.

Server-client traffic tends to be highly unidirectional. Thus FDX is buying
nothing but trouble (flow control) unless there are lots of clients moving
things both ways. Even then this is problematic when clients share a
collision domain. Thus FDX is largely fluff, and dangerous (see below).

Taking what we have so far, a decent server can fully drive several 100Mbps
Ethernet links and not work hard. I've tested that here. All hubs, no
switches. 100Mbps Ethernet does not have 200Mbps bandwidth.

Switches add a one-packet delay which can and does dramatically reduce
throughput on all but streaming transfers. One cuts throughput in half, two
in series to 1/3, and so on. This isn't just theory. Switches have a very
serious queueing problem going from fast to slow ports, and eventually the
buffer space is exhausted, which leads to packet discards on FDX links: no
way of feeding backpressure. How much traffic a speed-converting relay can
absorb depends on its design details, and I doubt we will discover those
other than by local stress testing. Slow channels throttle performance for
all but trivially short transfers (the buffer problem above).

Folks buy switches for a variety of reasons, often from marketing
statements about panaceas. They are of course bridges and thus separators
of directly addressed frames (broadcasts leak everywhere). If traffic is
all clients to one server there is no point in such separation, because
eventually there is a bottleneck which requires feedback to the originators
(or frames are lost). Short bursts to random clients allow 10Mbps clients
to reach the server via a 100Mbps concentrator, but long bursts encounter
the concentrator buffer problem. HDX has natural feedback, FDX has none and
the concentrator is the bottleneck.
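The half and one-third figures can be sketched with a little arithmetic.
This is a minimal model, assuming strictly stop-and-wait request/reply
traffic, equal link speeds everywhere, 1500-byte packets, and zero
propagation and processing time; packet-burst style streaming keeps packets
in flight and so hides the extra delay:

  # Minimal sketch of the store-and-forward ("one packet delay") effect on
  # a strictly stop-and-wait exchange.  All values are illustrative.
  def pingpong_MBps(payload_bytes, link_mbps, switches):
      """Data rate for alternating request/reply packets crossing
      `switches` store-and-forward hops between client and server."""
      serialize = payload_bytes * 8 / (link_mbps * 1e6)  # sec per wire hop
      hops = switches + 1              # each packet is re-clocked at every hop
      round_trip = 2 * hops * serialize  # one request plus one reply
      return payload_bytes / round_trip / 1e6

  for n in (0, 1, 2):
      print(n, "switch(es):", round(pingpong_MBps(1500, 100, n), 2), "MB/sec")
  # 6.25, 3.12, 2.08 MB/sec: half, then a third, the pattern described
  # above.  A streaming transfer sees added latency but not this throughput
  # cut.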
To gain more throughput, add parallel networks. To keep congestion under
control, provide feedback to the traffic originators. That's what I am
saying, and I hope it helps elucidate matters.

>Let's say that ten 10BaseT segments are all offering a 3Mbps load. What I
>see with a high-end server and a high-end switch is fast service, low total
>server utilization, and low collisions compared to a number of ethernet
>cards in the same server. The high-end switch market wouldn't exist today
>if this weren't true. The equipment is just too expensive.

I must admit difficulty understanding your statement here. 30Mbps aggregate
load isn't much. A server can create that traffic rate with no difficulty,
and ten times that rate with little effort. Collision counts don't count
unless they reach extreme values. What I suspect might be present in your
statement is ten slow 10Mbps boards compared to one efficient 100Mbps board
(efficient on the driver etc. side, not the wire bit rate). That's as close
as I can come to understanding the statement.
Joe D.

---------

Date: Tue, 3 Mar 1998 09:08:19 -0400
From: Jon Dustin
Subject: Re: 10Mbit Segmenting Vs. 100Mbit backbone, Which is faster ??

I just read a paper produced by the Tolly Group that discussed a similar
situation: which is more suitable, 10Mbit switched ethernet (with a fast
ethernet uplink to the server), or 100Mbit repeated (all clients and
servers running at 100Mbit/sec)?

Here is a description blurb from the Tolly Group web page
http://www.tolly.com

Test Summary Overviews
Building Efficient Networks Using Switched 10 Mbit/s and Shared 100 Mbit/s
Ethernet
Test Summary Document Number: 7299

Abstract: Ethernet's architecture makes it susceptible to diminishing
returns on performance as more users are added and as applications demand
more bandwidth. The result is that many of today's shared 10 Mbit/s
Ethernet LANs have reached the breaking point. Customers are looking for an
alternative, and many see Switched 10 Mbit/s Ethernet and Shared 100 Mbit/s
Ethernet as equal-cost solutions to their current network bottlenecks. Both
are potentially powerful technologies that can alleviate network
congestion, but only when properly deployed. Many customers are reluctant
to make large purchases of 10 Mbit/s Ethernet switches or 100 Mbit/s
Ethernet concentrators until they first gather empirical performance data
to indicate under what conditions each one performs best. This white paper
is intended to provide that performance data.

This paper is pretty good, and does not seem to favor Compaq's products
(Compaq commissioned the study). My interpretation of the results: the best
solution depends on your traffic mix. The results are available to the
public after registering with their system.

---------

Date: Tue, 3 Mar 1998 08:31:52 -0600
From: "Thomas M. Bonvillain"
Subject: Re: 10Mbit Segmenting Vs. 100Mbit backbone, Which is faster ??

>>I have to agree with Steve Kurtz that (b) is the correct answer in most
>>situations. The server can more efficiently handle the one card. The
>>switch is much better at dealing with multiple segments than a file server,
>>since it's dedicated hardware (unless it is a crappy switch).
>>
>>Also, if it is a good network card and a good switch, you will get full
>>duplex on the 100Mbps segment, which means no collisions and a realistic
>>sustainable throughput exceeding 50Mbps. It would be VERY difficult to
>>sustain 50Mbps on ten 10BaseT shared channels.
>>
>>There are a few switches that support switched FDDI and ethernet in the
>>same chassis. These are truly awesome performers.
>>Processor utilization on the server drops down significantly because the
>>FDDI card does all the work.

I have to agree with the above statement about switched FDDI and bridged
ethernet in the same chassis/card. We have a flat network with over 2000
ports of 10BaseT bridged onto an FDDI ring, with two NetWare servers on the
ring: one a 4.1 500-user server (a Pentium 100 with 128MB of RAM and a 10GB
RAID stack) and the other a 3.12 250-user server. We normally have over 400
users on our administrative server (the 4.1) and over 200 users on our
academic server. We run Mercury on the server and NetWare for SAA to
connect to the IBM mainframe, and FreeBSD on two workstations to do DNS and
route Internet email. The router is on the ring and we normally have 700 IP
addresses leased out through DHCP or RARP. Server utilization is usually
around 20%. Can't find much use for NT except for Web and database stuff.

------------------------------