-----------------------------------------------------------------------
 NOV-PER2.DOC -- 19971114 -- Email thread on NetWare Performance Aspects
-----------------------------------------------------------------------
Feel free to add or edit this document and then email it back to
faq@jelyon.com

Date: Wed, 28 May 1997 18:44:06 -0600
From: Joe Doupnik
Subject: Redesigning a server for faster/busier use, random thoughts

I'd like to share my "random thoughts" on redesigning one particular
student open lab which uses INW 4.11. There may be items of interest
when you face similar problems. The msg is rather long so feel free to
skip it.

The setup: INW 4.11 server, 486-33 EISA bus, three NE-3200 EISA bus
Ethernet boards of interest (plus two NE-2000's not of interest here),
64MB, 4GB mirrored. Clients are Pentium-90's using NE-2000 clones on
coax, put onto those three nets as 12, 16, and 20 machines (that's a
physical constraint from when the wiring was done), no hard disks on
clients.

The problem is we are beating the living daylights out of the wires.
Data transfer rates average about 30-40% of capacity when viewed over
5 min. That means I ask a server lan adapter for its byte count now,
and again five minutes later, divide the difference by 300 seconds to
get bytes/sec, and factor that into 10Mbps for regular Ethernet. This
says, on paper and phosphor, the traffic peaks are strongly clipped and
packets wait in line a lot. It says in practice things get slow when
folks get busy. Just by way of context, almost all the traffic is
server to client, with less than 10% of data bytes flowing to scratch
areas on the server or being packet ACKs.

I've been planning and saving funds over the past year to go to 100Mbps
Ethernet all round. I now have the money and it looks as if the costs
will just fit the budget (or close enough to sneak by). Fine, fine, but
will it work or even be constructive? Let's look at some technical
numbers.
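The five-minute byte-counter arithmetic above can be sketched in a few
lines of Python (the counter values here are assumed for illustration;
real ones come from the server adapter's statistics):

```python
# The 5-minute utilization estimate: sample the adapter's byte counter
# twice and compare the resulting byte rate against wire capacity.
WIRE_BPS = 10_000_000  # 10Mbps Ethernet, in bits per second

def utilization(bytes_then, bytes_now, interval_s=300):
    """Fraction of wire capacity used between two byte-counter samples."""
    byte_rate = (bytes_now - bytes_then) / interval_s  # bytes per second
    return byte_rate * 8 / WIRE_BPS                    # fraction of the wire

# Example: 135MB moved in five minutes is 36% of a 10Mbps wire.
u = utilization(0, 135_000_000)
```

A sustained 30-40% reading on a five-minute average, as reported above,
implies much higher instantaneous peaks, hence the clipped peaks and
queueing Joe describes.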
To move say 350KB of data from server to a client takes about 350
millisec. That's figuring 1 byte every microsecond, which is a good
round number in practice and close enough to a full 10Mbps Ethernet.
That satisfactory but not outstanding data transfer rate is 1/3 of a
wire, just for one client. Clearly lots of clients make the traffic
saturate the wire and things queue and queue, hence slowness when folks
are busy.

Step one is to say "100Mbps Ethernet is the solution." Sure, right.
Why? Well, the same packets now consume only 3% of the wire because the
bit rate is ten times greater. Yup, that opens capacity on the wire
very nicely indeed.

Step two is to try it. If one uses a so-so board in the client and a
similar one in the server then throughput goes up some, about twice or
so. We are being throttled by movement of bytes from buffers, across
system buses, to lan adapters on both ends, plus the delays in
servicing boards by drivers. But this environment keeps server CPU
utilization moderate (packets are spaced out in time, even though each
is shorter in duration). Not the best solution but a step in the right
direction.

Step two bis. Put in better 100Mbps Ethernet boards. Wow! Throughput
goes up by about a factor of three to four compared to the original
situation. This is relieving the machine bottlenecks. The way to go.
But, and there is always a "but" in engineering work, the server
utilization goes through the roof from the increased packet rate and
because the server is using the same "client boards" as the clients.
One station can cause the server to sustain 100% utilization. Oops. I
know this because the experiments were run.

Step three. Well, that suggests we must redesign the server side to
stand the increased traffic rate. The way this is done is to use
smarter server boards, not client-style boards, which have a processor
present to do what most drivers do via the server's CPU. Yup, that is
just right. For example.
Those original NE-2000 clones could drive the NE-3200 bus master boards
hard, and all one client could do was cause a max of 20% server CPU
utilization. Put an NE-2000 in the server instead and the server would
run flat out servicing one client; bus master boards can help a great
deal. NE-2000 clients could not pump packets fast enough; the NE-3200
could easily keep up and still catch up on sleep. Thus lots of clients
could bang away and at least the server felt fine. Replace both boards
with faster units but without the co-CPU part and the server works
harder handling the traffic. Mind you, things work faster but the
server is huffing and puffing dangerously.

Step four is then to use smart server boards. Clearly today those
boards will be PCI bus units. Oh my, but my server is an EISA bus
machine.

Step five says replace the server motherboard with a decent PCI bus
unit. True. Motherboards are dirt cheap, Intel CPU's are still overly
expensive, memory is moderately expensive. After shopping and thinking
I selected a Pentium Pro 200 system, which is near enough to the same
price as any decent Pentium system at this time. Plan for the future
when the server is busier yet with web server and friends, and so on.
Plan to save two of four SIMM sockets for more memory when NW 6 or 7 is
released. Ensure the PCI chipset can cache memory above 64MB, and
ensure the memory system can run in ECC mode (true parity, correcting).
Those PCI bus constraints say use a 430HX chipset for Pentiums or the
440FX chipset (or similar) for Pentium Pros. The popular Pentium PCI
430TX chipset cannot cache memory above 64MB; leave that for desktop
boxes.

Now we have extra horses in the stable for intensive requests when
needed but not expected often, fast low-CPU-loading server Ethernet
boards to deal with dense 100Mbps traffic, decent client boards to move
packets without slowing down mice or dropping packets, and capacity on
the wire to sustain many clients at one time.
And the server is ready to run another three or four years without an
engine change.

Finally, 100Mbps Ethernet runs only over twisted pair wiring. I'm
selecting hubs which are 24, 24, and 12 connections, spreading clients
over them, and connecting each hub to a server Ethernet board. This
avoids reintroducing massive congestion, which would occur if the three
hubs were cascaded onto one wire to the server. Etherswitching is of
zero benefit here, because the traffic heads to one destination anyway
(same as cascading hubs, but far more expensively and slowly). Smaller
hubs would require more server Ethernet boards and we don't have more
motherboard slots for them. (Quick slot count: three PCI Ethernet
boards, PCI SCSI board, no more PCI slots, two NE-2000's for other
things plus a video adapter, sums to one full motherboard.)

Note something here. I could have stayed with 10Mbps Ethernet on each
client. But that solves little because the wire would be just as
congested as before, and this is the fundamental problem to be solved.
100Mbps Ethernet opens up wire capacity, better boards use more of that
capacity than before (users get better**2 performance), and we must
amplify the server to sustain the now greater load.

We see how the simple congestion problem grows into a fairly complete
and balanced system redesign. Throughput is expected to increase by at
least a factor of four under heavy load, and two to three under light
load.

Joe D.

---------
Date: Thu, 29 May 1997 10:50:34 +0100
From: Phil Randal
Subject: Re: Redesigning a server for faster/busier use, random thou

> Step three. Well, that suggests we must redesign the server
>side to stand the increased traffic rate. The way this is done is use
>smarter server boards, not client-style boards, which have a processor
>present to do what most drivers do via the server's cpu. Yup, that is
>just right. For example.
> Those original NE-2000 clones could drive
>the NE-3200 bus master boards hard and all one client could do is
>cause max of 20% server cpu utilization. Put an NE-2000 in the server
>instead and the server would run flat out servicing one client; bus
>master boards can help a great deal. NE-2000 clients could not pump
>packets fast enough, the NE-3200 could easily keep up and still catch
>up on sleep. Thus lots of clients could bang away and at least the server
>felt fine. Replace both boards with faster units but without the co-cpu
>part and the server works harder handling the traffic. Mind you, things
>work faster but the server is huffing and puffing dangerously.

Our experience here backs this up - we had a 486DX4/100 VL Bus server
(not ideal, I know, but everything here is under tight budgetary
constraints), 9GB HDD (16K block size) and 96MB RAM with 4 Intel
EtherExpress Pro 10 cards (with 120-odd clients), and server
utilisation rarely exceeded 60%. We upgraded to a Pentium 133 with 2
dual-ported SMC bus-mastering PCI ethernet cards, upped the RAM to
128MB, and added a PCI SCSI card. Combined with a growing number of
Pentium PCs with PCI Ethernet cards, peak utilisation shot up,
frequently to 80+%. Perceived performance got worse, according to many
users. Packet burst made things even worse, not better, so we had to
turn that off in the clients. My guess is that as we removed
bottlenecks (faster SCSI card and server LAN adapters), the server
processor load dramatically increased.

> Step five says replace the server motherboard with a decent PCI
>bus unit. True. Motherboards are dirt cheap, Intel cpu's are still overly
>expensive, memory is moderately expensive. After shopping and thinking
>I selected a Pentium Pro 200 system, which is near enough to the same
>price as any decent Pentium system at this time. Plan for the future
>when the server is busier yet with web server and friends, and so on.
>Plan to save two of four SIMM sockets for more memory when NW 6 or 7 is
>released. Ensure the PCI chipset can cache memory above 64MB, and ensure
>the memory system can run in ECC mode (true parity, correcting). Those PCI
>bus constraints say use a 430HX chipset for Pentiums or the 440FX chipset
>(or similar) for Pentium Pros. The popular Pentium PCI 430TX chipset cannot
>cache memory above 64MB; leave that for desktop boxes.

It is well worth considering a 430HX chipset motherboard with an AMD
K6/200. It may well outperform the equivalent Pentium Pro, and is
certainly more cost-effective. Note too that some HX motherboards
(e.g. GigaByte 586HX) need extra tag RAM in order to cache above 64MB.

> We see how the simple congestion problem grows into a fairly
>complete and balanced system redesign. Throughput is expected to increase
>by at least a factor of four or more under heavy load, and two to three
>under light load.

I couldn't agree more.

---------
Date: Thu, 29 May 1997 11:02:41 -0600
From: Joe Doupnik
Subject: Re: Redesigning a server, cont'd

To reinforce my comments to the list yesterday evening, here is a
message stolen from NEWS along the same lines. Note, an Intel
EtherExpress 32 is a licensed clone of the Novell NE-3200 EISA board.

Putting matters into an order for simple understanding: on servers,
buses count much more heavily than cpu speed. ISA bus is slow, EISA bus
is fast, PCI bus can be four times faster than EISA bus. Bus master
boards generally work faster than non-bus master boards. Smart boards
are not only bus master but also offload many driver housekeeping
duties to a processor on the adapter, thus freeing the server's cpu for
other chores. Smart boards are as fast as and less taxing than bus
master boards. Thus we go from client boards to bus master boards to
smart boards, and from ISA to EISA to PCI buses (MCA is about the same
or slightly better than EISA).
The lab being refurbished here will use Intel EtherExpress Pro 100B PCI
client boards and three Intel EE Pro/Server PCI boards (smart) in the
server. That's what the fellow below is referencing (i960 stuff). By
the way, this is not a recommendation from me. It is what I am doing,
not necessarily what you should be doing under your circumstances.

Joe D.

>Newsgroups: comp.os.netware.connectivity
>Subject: Re: Server NICs. Is Intel smartadapter worth it?
>From: philip@aleytys.pc.my (Philip Chee)
>Date: Wed, 28 May 97 14:13:21 GMT
>
>>Having recently replaced our aging novell servers I have been looking into
>>NICs and whether different manufacturers' NICs provide any significant
>>advantage from a speed/utilisation perspective.
>>
>>Primarily I have been looking at the Intel EtherExpress smartadapter 100tx
>>which claims to significantly reduce cpu utilisation on the server via the use
>>of an onboard Intel i960 processor for packet processing.
>>
>>What I was wondering is if anybody has actually compared this NIC to other
>>devices from 3com etc to see if it does actually live up to its claims and
>>whether it would be worthwhile investing in this product for our main server.
>
>We were using a HP Vectra server with a 486DX33 cpu and an Intel
>EtherExpress32 (the predecessor to the smartadapter). With about 48
>client PCs hitting the server, the max cpu utilisation I've seen is 30%.
>
>We have just upgraded to a HP LH Pro [1] with a Pentium 200 and an Intel
>Nitro smart server adapter. With the same clients logging in, this
>server is barely peaking at 5%.
>
>By the way there are sufficient spare CPU cycles on the smart adapter's
>i960 to run certain network daemons such as SNMP, further offloading
>processing from the server CPU. The smart adapter 100TX costs
>significantly more than the average 100baseT NIC but I feel it's worth
>every penny (YMMV).
---------
Date: Thu, 29 May 1997 12:40:25 -0600
From: Joe Doupnik
Subject: Re: Re[2]: Redesigning a server, cont'd

>>>three Intel EE Pro/Server PCI boards (smart) in the server. That's
>>>what the fellow below is referencing (i960 stuff).
>
>Gee, Joe's got some serious money to spend--three Intel Smart PCI (960)
>boards? Those aren't exactly dirt cheap. It's nice to see people sometimes
>do get the money they need to set things up appropriately.

That was my impression too, a year ago. So I skimped and squeezed and
waited a year for 100Mbps prices to drop, and saved up about US$15K for
the upgrade. No trips to Comdex or Interop, etc. If one listens to
sales persons and instead opts for Etherswitches, ATM or FDDI, or even
a Netframe server, then the price would have been far higher and out of
reach. The smart boards are under $600 each (ouch!) while the
alternatives are much more expensive, and non-smart boards are spending
money while not solving problems. The real expense is in client boards,
because there are so many of them, and in the hubs, so I pursue an
aggressive purchasing policy to push down those costs. Notice I'm
buying a motherboard, not an HP/Compaq dream machine at ten times the
cost. That's risky but has paid off here because I am picky enough (do
homework long term) to not make many mistakes here, emphasis on "many."

>Oh, and thanks for the nice "overview", it's always a good idea to have
>certain concepts restated now and again.

Thanks. I thought it would be worthwhile to see engineering design done
openly, as an example of how to think/measure through the process, as
contrasted to throwing money at problems. I've withheld most of my
numbers as boring and requiring too much explanation, and possibly
resulting in too much legal personal exposure.

>Harry Campbell
>Applied Microsystems Corp
> (I wonder if Intel used our i960 emulator to design the Smart NICs?)

Interesting question.
We suspect Intel puts that overkill processor on the adapters to soak
up fab and design group time, as they do in other areas. An NE-3200
gets away nicely with an x186 (or is it x188). I might add another
observation that there is a movement across the industry to make
smarter peripherals, from the awful Universal Serial Bus, to SCSI and
video adapters, to say the I2O API on lan adapters, and more.
Stretching this some, the Novell Wolf Mountain project is demonstrating
clustered NetWare servers (many servers sharing a disk farm, looking
like one server to the user). And jokingly, this includes us people as
necessary evil peripherals to keep the system fed and happy.

Joe D.

---------
Date: Sat, 31 May 1997 13:09:05 -0600
From: Joe Doupnik
Subject: Lan design, cont'd

For those wishing to dig deeper into the topic of Ethernet switching,
especially as contrasted to my recent design of using 100Mbps boards
and hubs only, there is a decent readable discussion of the issues on
Intel's web pages. Contact http://www.intel.com/ and choose "Product
Info", "EtherExpress PRO/100 Fast Ethernet Adapters", "Express Ethernet
and Fast Ethernet Switches", "White Papers", "The Advantages of Fast
Ethernet for New Networks."

While that paper looks fine on my screen and even print previews well,
it fails to print on my HP 4M LaserJet printer. Good luck with
printing. I suspect that other vendors have similar white papers worth
reading. I stumbled across this paper today while looking for better
written descriptions than mine.

For those who want only the executive summary punch line, it is: using
all 100Mbps boards and hubs is faster and cheaper than adding switches.

Joe D.

---------
Date: Tue, 3 Jun 1997 22:31:45 -0600
From: Joe Doupnik
Subject: Re: lan design, summary

We spent many messages examining the lan design example presented last
week. I've listed some conceptual items below which might be useful as
they stand, or as strawmen for more incisive descriptions.
They are clearly oversimplifications, merely to make them easy to use
and remember. The problem is basically allocating bandwidth along the
pathways used by nodes, up to economic limits.

Making a connection (link) between two end nodes "creates" bandwidth
(carrying capacity). Bandwidth is "exploited" (consumed) by passing
packets between computers/nodes/lan adapters. We call this traffic.

Adding boards to a server creates bandwidth, by creating new end nodes.
But they must be able to sustain the traffic, or the link must be
derated to match the board's capabilities. A frequent mistake here is
to use a weak Ethernet board in a server.

The end to end bandwidth of a link is the smallest bandwidth along the
path, diluted by competition from sharing links with other sources and
destinations. It's congestion again.

Where a computer cannot fully utilize the bandwidth of its link, other
computers can share the link, with small to large delays to interleave
packets (congestion, competition). The number of computers in a shared
domain often exceeds the capability of the link to sustain peak
transfer rates by all such computers at one time. The degree of
overcommitment is a judgment by the designer based on average usage and
acceptable delays when peak demands occur. This is the most difficult
problem to manage.

An Etherswitch does not create bandwidth; it distributes traffic. A
fast uplink port operates either at the packet rate of the slower side,
when traffic is only between those two ports, or at a higher rate if
the switch is able to buffer fast rate packets for simultaneous
delivery to several low rate ports, or vice versa. A switch shares its
backplane amongst ports, and the backplane can be considered the
fastest communal but hidden link. Thus a switch can aggregate traffic
(many sources to a few destinations) or sustain bandwidth if crossing
pathways are all different.
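The degree of overcommitment described above can be expressed as a
simple ratio. A minimal sketch in Python (the function name and the
example figures are illustrative, drawn loosely from the lab described
earlier in the thread):

```python
def overcommitment(n_clients, client_peak_kbps, link_kbps):
    """Ratio of aggregate peak client demand to shared-link capacity.
    1.0 is fully provisioned; above that, simultaneous peaks must queue."""
    return n_clients * client_peak_kbps / link_kbps

# Twelve clients, each able to move about 350KBps, sharing one 10Mbps
# wire carrying roughly 1000KBps of user data:
ratio = overcommitment(12, 350, 1000)  # 4.2x overcommitted
```

How much overcommitment is acceptable is exactly the designer's
judgment call named above: it depends on average usage and tolerable
delay during peaks, not on the ratio alone.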
A two port bridge conserves small amounts of bandwidth on each wire by
preventing unnecessary traffic from reaching the other side. Bridges
add one packet delay.

A hub does not create bandwidth; it aggregates traffic. Its capacity is
that of the slowest port. It creates one logical wire from many
physical wires. A speed changing hub (with a fast uplink port) operates
the fastest wire at the packet rate of the slower links even though the
bit rate is higher (only one transmitter can be active at a time on a
hub).

Speed changing devices (10/100MHz bridges, many Etherswitches, and some
hubs with a fast Ethernet uplink port) introduce packet delay. The
amount of delay depends on the direction of travel and is one packet's
time on the exit side of the device. Thus a 10MHz client transmission
to a 100MHz server adds 1/10 of the 10MHz transmission time, or the
time to send the 100MHz version of the packet. Delays reduce
throughput. Delay is necessary to prevent a fast transmitter from
prematurely exhausting a slow source (a "DMA underrun" situation). Even
hubs introduce a small amount of delay, typically 8 byte times or less
if Ethernet preamble is regenerated.

Packet delays reduce throughput by extending the time of arrival of
permission to send the next packet. When streaming protocols such as
TCP or IPX Packet Burst are used, delay is diluted over the duration of
the streaming transmission, provided transmission windows are not
exhausted (they aren't on LANs but can be on long WANs). Packet Burst
is a short stream, to avoid overrunning client boards which typically
have small buffers and slow response times.

The design goals are to balance available bandwidth (most often
overcommitted) against its exploitation across the network, and to
minimize costs. Balancing is creating new bandwidth (parallel links,
faster links) versus aggregating traffic (hubs and switches), weighted
by costs.

Joe D.
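The one-packet exit-side delay rule for speed-changing devices in the
message above works out as follows. A small Python sketch (1514 bytes,
the standard maximum Ethernet frame, is used as an assumed example):

```python
def exit_packet_delay_us(packet_bytes, exit_mbps):
    """Store-and-forward delay added by a speed-changing device:
    one packet time on the exit side, in microseconds."""
    return packet_bytes * 8 / exit_mbps  # bits divided by bits-per-microsecond

# A full 1514-byte frame leaving on the 100Mbps side adds ~121us, one
# tenth of the ~1211us it took to arrive on the 10Mbps side.
d_fast = exit_packet_delay_us(1514, 100)
d_slow = exit_packet_delay_us(1514, 10)
```

This is why the added cost is small for 10-to-100 traffic (1/10 of the
slow-side transmission time) but a full slow-side packet time in the
other direction.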
------------------------------
Date: Fri, 13 Jun 1997 21:03:18 -0600
From: Joe Doupnik
Subject: Lan performance, cont'd, fragmentary story

Since this seems to be the season to tell tales of hardware success
(and failure), here is a short unorganized report on my 100Mbps
installation thus far.

In a quiet lab (term ended, next is yet to come) a single Pentium 90
client with an NE-2000 board reached a maximum throughput of 800KB/sec
to a 486-33 EISA bus file server. The server uses NE-3200 smart boards.
It reached that 800KBps figure with Perform3 on long file lengths, and
server utilization was about 20%. (Normally we can get only about
350KBps under average usage conditions, meaning with other folks on the
wire and not all transfers being of long length.)

As more stations joined the same wire, aggregate throughput went up to
1000KBps, or one very full 10Mbps Ethernet. Each station shared equally
in that throughput, thus dividing 1000KBps by the number of stations.
Twelve, the max tested yesterday, together yielded a wonderful 80KBps
each, or about floppy speed. The point is we could fill the wire with
traffic if two or more clients worked at the same time, assuming the
file server could keep up as it did here, and each station got only a
proportional share.

Then we lashed together the new Pentium Pro 200 server. It had an Intel
EtherExpress PRO 100 PCI client board (more on why below). The clients
had the same Intel board and were again the Pentium 90's. One station
alone got over 6000KBps throughput. Server utilization was about 15%.
Two or more stations saturated the 100Mbps wire (say 10000KBps) and
again divided traffic evenly. Server utilization was about 20% with
four stations (max wired tonight). We see the same point being
repeated, but scaled up by the 1:10 wire capacity.

One limit is wire capacity, and that applies when many stations share
it. Another limit is how fast a client can move bytes. The third is how
much punishment the server can take.
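The even-sharing behavior reported above is simple division; a sketch
in Python (the function name is illustrative, the figures are the ones
from the report):

```python
def per_station_kbps(aggregate_kbps, stations):
    """Even split of a saturated shared wire among active stations."""
    return aggregate_kbps / stations

# Twelve stations on a full 10Mbps wire: the ideal split is ~83 KBps
# each; the lab above measured about 80 KBps, roughly floppy speed.
share_10mbps = per_station_kbps(1000, 12)

# Four stations on a full 100Mbps wire: 2500 KBps each.
share_100mbps = per_station_kbps(10000, 4)
```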
Both client and server can move more bytes with faster cpus, up to the
limits imposed by the Ethernet board and system bus. PCI bus isn't
saturated by 100Mbps Ethernet, EISA is working up a sweat, and ISA is
overwhelmed. To see the EISA sweat item: the EISA version of the Intel
client board in a server pushed the 486-33 server to 100% utilization
from a single PCI client, and throughput was under 4000KBps. A PCI bus
server with more horses raised the single station throughput to
6000KBps at 15% server utilization. And 6000KBps seems to be the limit
of a Pentium 90 PCI bus client (vs 800KBps with an ISA bus NE-2000 in
the same machine). I was surprised to learn the Intel client board
could push a full 100Mbps Ethernet, but with a PPro 200 providing
excessive free cycles it managed to do it. A Pentium 90 client managed
only 60% of that rate. One significant difference between servers is
the bus, EISA vs PCI, at least a 1:4 difference in capacity (and hence
lower cpu utilization on the faster bus because the job finishes more
quickly).

The new server will receive Intel EtherExpress PRO 100/Server smart
boards. But when I plugged together items yesterday the Ethernet board
was not recognized by the machine, nor could NetWare use it. Oh boy,
the bleeding edge is present. Call to Intel: incident filed,
engineering will call back within a week or so. Msg to ASUS: please fix
your BIOS; no response yet, of course. Repeat msg to the ASUS NEWS
group for emphasis. Dig around the ASUS ftp site (not the www site),
discover a beta BIOS, flash same. Lo, the Ethernet board is recognized
at cold boot time, but now IRQ's are forced into conflict with the SCSI
adapter and the board won't run NW. Ok, it's an unmarked beta. So,
between ASUS and Intel the problem remains to be solved. The Ethernet
board is a PCI to PCI bridge affair with an i960 coprocessor, and hence
complicated. But then so is the Adaptec 3490 SCSI board (no big square
hot cpu though) and it works ok.
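The ISA/EISA/PCI comparison above can be put in rough numbers. A
sketch in Python using nominal peak burst rates (approximate textbook
figures, not measurements from this thread):

```python
# Nominal peak burst transfer rates in MB/s (approximate textbook values):
BUS_MBYTES_PER_S = {"ISA": 8, "EISA": 33, "PCI": 132}  # 32-bit/33MHz PCI
FAST_ETHERNET_MBYTES_PER_S = 100 / 8                   # 12.5 MB/s of raw wire

# Fraction of each bus one full-rate 100Mbps Ethernet stream would consume:
load = {bus: FAST_ETHERNET_MBYTES_PER_S / rate
        for bus, rate in BUS_MBYTES_PER_S.items()}
# ISA cannot keep up at all (load > 1), EISA works up a sweat (~38%),
# and PCI barely notices (~9%).
```

These are bus-limit figures only; real throughput is lower because the
bus also carries disk and other traffic, which is part of why the PCI
headroom matters in a server.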
Could we run in production mode with that client Intel board in the server? No, the server has to pay attention to disk and other boards; perform3 lets the "file" remain tiny enough to always be in cache. Those smart boards are needed to free time to do real disk work and printing and all those other jobs we load onto servers. And someday the clients will be faster than Pentium 90's. Joe D. --------- Date: Mon, 16 Jun 1997 11:57:37 -0600 From: Joe Doupnik Subject: Re: Lan performance, cont'd, fragmentary story Following up my message last week on bringing 100Mbps to the desktop, cheaply and effectively. Here are some numbers to think about. You can repeat the experiments locally, by obtaining perform3 from netlab2.usu.edu directory apps, or equivalently from netlab1.usu.edu directory pub/mirror/apps. ... below are some Perform3 test results. These were gathered using VLMs on Pentium 90 clients. No client side caching. Packet Burst was active. The first set is 10Mbps Ethernet. The server was a 486-33 EISA bus unit with an NE-3200. Server cpu utilization reached 20% at full load. Clients used NE-2000 clones. The second set is 100Mbps Ethernet. The server was a Pentium Pro 200 PCI bus unit with an Intel EtherExpress PRO/100 client-style board. Server cpu utilization reached 20% at full load. Clients used the same Intel board. A hub joined server and clients. Perform3 writes and reads small files, 8KB and smaller in steps in this case, so they remain in server cache rather than being slowed down by accessing the server's hard drive. Small file sizes reveal system overhead in opening/closing files etc. 10Mbps Ethernet has a user-data capacity of say 1MBps (leaving room for packet headers and so on). 100Mbps Ethernet has a user-data capacity of say 10MBps. "file length" below is user-data capacity. KBps is kilobytes per second. Mbps is megabits per second. 10Mbps Ethernet. One station alone, limited by client capability. file length r/w speed total on wire 8192 bytes. 
745.86 KBps. 745.86 Aggregate KBps. 7680 bytes. 727.46 KBps. 727.46 Aggregate KBps. 7168 bytes. 712.85 KBps. 712.85 Aggregate KBps. 6656 bytes. 706.47 KBps. 706.47 Aggregate KBps. 6144 bytes. 683.21 KBps. 683.21 Aggregate KBps. 5632 bytes. 669.92 KBps. 669.92 Aggregate KBps. 5120 bytes. 634.39 KBps. 634.39 Aggregate KBps. 4608 bytes. 585.60 KBps. 585.60 Aggregate KBps. 4096 bytes. 603.51 KBps. 603.51 Aggregate KBps. 3584 bytes. 570.47 KBps. 570.47 Aggregate KBps. 3072 bytes. 527.32 KBps. 527.32 Aggregate KBps. 2560 bytes. 489.77 KBps. 489.77 Aggregate KBps. 2048 bytes. 424.54 KBps. 424.54 Aggregate KBps. 1536 bytes. 337.22 KBps. 337.22 Aggregate KBps. 1024 bytes. 301.34 KBps. 301.34 Aggregate KBps. 512 bytes. 196.07 KBps. 196.07 Aggregate KBps. 745.86 Maximum KBps. 557.25 Average KBps. 10Mbps Ethernet. Two stations, sharing nearly full wire. 8192 bytes. 502.68 KBps. 1005.37 Aggregate KBps. 7680 bytes. 490.77 KBps. 981.54 Aggregate KBps. 7168 bytes. 469.80 KBps. 939.01 Aggregate KBps. 6656 bytes. 498.82 KBps. 997.23 Aggregate KBps. 6144 bytes. 479.70 KBps. 957.88 Aggregate KBps. 5632 bytes. 475.25 KBps. 950.50 Aggregate KBps. 5120 bytes. 475.67 KBps. 951.34 Aggregate KBps. 4608 bytes. 463.97 KBps. 927.94 Aggregate KBps. 4096 bytes. 461.07 KBps. 922.48 Aggregate KBps. 3584 bytes. 442.79 KBps. 885.57 Aggregate KBps. 3072 bytes. 418.54 KBps. 837.68 Aggregate KBps. 2560 bytes. 410.86 KBps. 821.73 Aggregate KBps. 2048 bytes. 403.52 KBps. 807.05 Aggregate KBps. 1536 bytes. 312.84 KBps. 625.54 Aggregate KBps. 1024 bytes. 261.07 KBps. 522.23 Aggregate KBps. 512 bytes. 176.89 KBps. 353.69 Aggregate KBps. 1005.37 Maximum KBps. 842.92 Average KBps. 10Mbps Ethernet. Three stations (the wire fills completely) 8192 bytes. 339.60 KBps. 1018.12 Aggregate KBps. 7680 bytes. 333.47 KBps. 1001.05 Aggregate KBps. 7168 bytes. 314.77 KBps. 943.71 Aggregate KBps. 6656 bytes. 335.36 KBps. 1005.53 Aggregate KBps. 6144 bytes. 325.17 KBps. 975.50 Aggregate KBps. 5632 bytes. 338.67 KBps. 
(continuation of the preceding table: 10Mbps Ethernet, three stations)
     Bytes      KBps     Aggregate KBps
      ....      ......        1016.02
      5120      332.63         999.16
      4608      319.38         958.89
      4096      333.56        1001.01
      3584      319.46         958.66
      3072      295.47         886.41
      2560      298.91         896.47
      2048      272.65         817.78
      1536      255.83         763.40
      1024      238.59         716.19
       512      168.96         506.46
   Maximum KBps: 1018.12    Average KBps: 904.02

10Mbps Ethernet. Four stations. Note division of capacity.
     Bytes      KBps     Aggregate KBps
      8192      255.70        1021.69
      7680      250.42        1002.30
      7168      236.66         947.82
      6656      252.69        1011.41
      6144      245.13         980.03
      5632      255.16        1020.64
      5120      251.05        1003.14
      4608      240.48         961.08
      4096      252.35        1009.06
      3584      243.12         970.74
      3072      228.52         914.85
      2560      244.34         976.93
      2048      205.54         822.98
      1536      167.51         670.02
      1024      215.78         873.16
       512      131.00         523.70
   Maximum KBps: 1021.69    Average KBps: 919.35

100Mbps Ethernet. One station. Limited by client capability.
     Bytes      KBps     Aggregate KBps
      8192     6361.24        6361.24
      7680     6101.50        6101.50
      7168     5853.59        5853.59
      6656     5708.93        5708.93
      6144     5422.06        5422.06
      5632     5221.55        5221.55
      5120     4967.42        4967.42
      4608     4613.44        4613.44
      4096     4290.48        4290.48
      3584     3933.92        3933.92
      3072     3516.79        3516.79
      2560     3151.42        3151.42
      2048     2620.87        2620.87
      1536     2053.92        2053.92
      1024     1474.35        1474.35
       512      804.55         804.55
   Maximum KBps: 6361.24    Average KBps: 4131.00

100Mbps Ethernet. Two stations. Full wire, shared by stations.
     Bytes      KBps     Aggregate KBps
      8192     5459.73       10925.50
      7680     5376.47       10752.94
      7168     5185.99       10371.39
      6656     5418.12       10841.33
      6144     5209.23       10416.95
      5632     4889.09        9777.72
      5120     4792.61        9581.20
      4608     4490.18        8980.37
      4096     4297.65        8594.63
      3584     3904.32        7808.35
      3072     3474.66        6949.33
      2560     3101.51        6203.65
      2048     2584.06        5168.29
      1536     2021.22        4042.07
      1024     1454.45        2908.98
       512      797.91        1599.13
   Maximum KBps: 10925.50   Average KBps: 7807.61

100Mbps Ethernet. Three stations, sharing full wire.
     Bytes      KBps     Aggregate KBps
      8192     3655.03       10967.11
      7680     3629.19       10873.08
      7168     3573.99       10722.57
      6656     3631.17       10895.46
      6144     3580.45       10779.93
      5632     3594.38       10786.62
      5120     3585.99       10758.39
      4608     3524.44       10604.71
      4096     3550.67       10637.18
      3584     3479.74       10437.46
      3072     3112.00        9339.51
      2560     2823.62        8471.89
      2048     2571.45        7711.70
      1536     2012.67        6037.88
      1024     1455.45        4361.67
       512      801.47        2404.07
   Maximum KBps: 10967.11   Average KBps: 9111.83

100Mbps Ethernet. Four stations, sharing full wire.
     Bytes      KBps     Aggregate KBps
      8192     2744.30       10942.28
      7680     2721.27       10867.45
      7168     2690.77       10734.31
      6656     2723.78       10885.42
      6144     2696.98       10737.58
      5632     2712.16       10778.52
      5120     2661.63       10595.06
      4608     2624.50       10440.60
      4096     2671.81       10625.50
      3584     2651.72       10568.12
      3072     2576.93       10248.71
      2560     2519.51       10073.62
      2048     2481.38        9922.82
      1536     1907.43        7626.27
      1024     1454.03        5816.36
       512      798.16        3195.04
   Maximum KBps: 10942.28   Average KBps: 9628.60

        Joe D.
---------
Date: Mon, 16 Jun 1997 13:23:06 -0600
From: Joe Doupnik
Subject: Re: Lan performance, cont'd, fragmentary story

Just a brief addendum to the lan performance material. You can run perform3 against local drives too. Trying it may be informative. As examples, on a Pentium 100 EISA bus desktop machine:

Local drive, Seagate Hawk (2GB, 5400 RPM, SCSI), Adaptec 2742 SCSI EISA bus controller, no caches. That should be a swift configuration, more so than IDE drives or ISA bus stuff. Max throughput is 2MBps.

Ram drive. Max throughput is 6MBps. And this says perform3 itself maxes out at that speed, the same as we saw for 100Mbps Ethernet and one station.

100Mbps Ethernet can produce faster transfers than the local drive, and match the RAM drive. Throughput from the server will then be limited by the disk system on it, which can be made fast indeed, or the capacity of the wire.
        Joe D.
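The sort of quick local-drive probe described above is easy to improvise. A rough Python sketch in the spirit of the perform3/iozone tests (sequential write, then repeated re-reads, of a single file). The function name and sizes are my own illustration, and on a modern OS the filesystem cache will dominate the read numbers, much as the server's cache did for perform3's small test file:

```python
import os
import tempfile
import time

def crude_throughput_test(size_bytes=4 * 1024 * 1024, passes=3):
    """Write one file, fsync it, then re-read it; return (write, read) KBps.

    A very rough analogue of the perform3/iozone style of test: the
    write is forced to the medium, while re-reads mostly hit the cache.
    """
    data = os.urandom(size_bytes)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        t0 = time.perf_counter()
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # force the write to the medium
        write_secs = time.perf_counter() - t0
    try:
        t0 = time.perf_counter()
        for _ in range(passes):
            with open(path, "rb") as f:
                while f.read(64 * 1024):   # sequential 64KB reads
                    pass
        read_secs = time.perf_counter() - t0
    finally:
        os.unlink(path)
    return (size_bytes / 1024.0 / write_secs,
            size_bytes * passes / 1024.0 / read_secs)
```

As with perform3 against a RAM drive, a run like this mostly reveals the ceiling of the test machinery itself rather than the raw device.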
------------------------------
Date: Sat, 7 Jun 1997 20:58:15 +0200
From: "Arthur B."
Subject: Re: Performance Gain from Replacing Hub with Switch

>For 30 users, our LAN has become slow (Compaq Proliant 800, 80
>Meg RAM, 4 Gig duplexed HD's). Question--will replacing the hub
>with a switch (10BaseT, but 100BaseT to the server) likely create
>a noticeable difference in speed by itself?

It will. Users and servers get full-duplex (send and receive at the same time) instead of half-duplex. And traffic goes only where it is needed. It's a nice device but not often used in a 30 PC environment. The big question is... why do you think it's needed, and is it really? There are other options you might want to consider.

------------------------------
Date: Thu, 19 Jun 1997 10:04:39 -0600
From: Joe Doupnik
Subject: Re: Load balancing on local LAN with IPXRTR

>We are experiencing some bottlenecks on the NIC of one of our main
>4.10 servers, and are looking for ways to improve/eliminate this
>condition. The server is on its own segment, connected directly to
>our backbone Ethernet switch. I know that Novell has the
>ability to "load balance" the traffic using the IPXRTR NLM. By
>putting multiple NIC's in the server, binding the same IPX network
>number and address to both cards, and loading the IPXRTR NLM you
>would be able to distribute the traffic over the NIC's.
>
>My question: Is anyone else already doing this? Has it helped?
>Are there any other options?
--------------
Let's take apart the problem. Lots of traffic on the wire is fine, provided the queueing delays don't become too long nor lan adapters get smashed by too high a packet rate. Some numbers are often helpful in defining size adjectives. So we can have a wire that is just plain too full for decent response times. And we can have a server lan adapter that is overwhelmed by the traffic, or both. What does load balancing do for us?
The idea is to split the traffic going out of the server across one or more lan adapters and somehow put it onto one wire, or if money is there then onto multiple ports on an Etherswitch. That is, the "balance" part is sensed from the outgoing queue lengths. If only one wire is available then simply get a better lan adapter and forget the complexity. If multiple wires are available then my inclination is to connect them to separate hubs and hence truly multiply your lan bandwidth: one server lan adapter per wire.

Load balancing is an IPX affair, not an IP one. Splitting the wiring into parallel paths, with no load balancing software, is simpler and works on all kinds of traffic. It does, however, introduce IP subnetting issues. Clearly, load balancing and multiple lan adapters feeding a clogged wire does nothing good at all. A decent lan adapter in a server can deal with a totally full Ethernet wire, without load balancing software.

Again, if only one wire is available then consider making that wire 100Mbps Ethernet to gain capacity. Many hubs have such an "uplink" port available. Match the wire with a decent lan adapter (I happen to like the Intel EtherExpress PRO/100 boards at this time for reasons of outstanding performance; the low price doesn't hurt either). Whatever you purchase, ensure it is a technically satisfactory product.

If the traffic situation is generally awful all around then it's time to redesign the topology and strategy. Segregating traffic is the normal first step, and we use bridges and routers for the task. Each extracts a toll of reduced throughput (but still a net gain from more open time on the wires) from the one-packet transit time and the expense of buying the boxes. Sometimes a better hub concentration plan does the trick without spending much money, and we couple that with more wires to the server to gain bandwidth at minimal expense.
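Putting numbers on "too full" takes only a pair of byte-counter readings from a lan adapter, sampled some interval apart. A minimal Python sketch of that arithmetic (the function name and figures are illustrative, not measurements from these posts):

```python
# Back-of-envelope wire utilization from two byte-counter readings:
# sample a lan adapter's byte count, wait (say) five minutes, sample
# again, and compare the bits moved against the wire's raw capacity.
def wire_utilization(bytes_then, bytes_now, interval_secs=300,
                     wire_bits_per_sec=10_000_000):
    """Fraction of the wire's capacity used over the interval."""
    bits_moved = (bytes_now - bytes_then) * 8
    return bits_moved / (interval_secs * wire_bits_per_sec)

# Example: 112,500,000 bytes moved in 5 minutes on 10Mbps Ethernet
# is 375,000 B/s, i.e. 3,000,000 bits/s, i.e. 30% of the wire.
print(wire_utilization(0, 112_500_000))  # -> 0.3
```

A sustained 30-40% reading over five minutes, as in the open-lab case, implies heavily clipped peaks and packets waiting in queues.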
Clustering clients behind NW servers rather than putting everyone on a backbone is a normal curative; that's a router style solution. If there is only one wire and it is busy from traffic not involving the NW server then an Etherswitch is the item to employ, to let traffic cross over point to point without clogging the wire to the server. An Etherswitch doesn't do any good if nearly all the traffic goes to one place, and it can merely slow down the net with no benefit. Please do read the fine print on Etherswitches (such as backplane speed, MAC table lengths per port, whether the MAC tables automatically refresh if a station moves from wire A to wire B, and so on); they are not created equal.

Carrying the general topology plan to greater lengths suggests merging 10Mbps streams into backbone fat pipes, either 100Mbps Ethernet or FDDI or ATM. Before spending lots of money one needs traffic measurements.
        Joe D.
---------
Date: Thu, 19 Jun 1997 11:19:21 -0700
From: Andrew Bynum
Subject: Re: Load balancing on local LAN with IPXRTR

Yes, it does work, but it depends on what the users are accessing the segment for. How many users do you have logging on to the server, and what are they doing once they are there? The problem with putting a server directly onto a switch is that the buffering mechanism that is built into ethernet (collision domains, with partitioning hubs) is no longer effective at doing its job. If you are saying that the load reported by the server statistics is higher than what you want, it very well may be that your baseline is not accurate for your configuration. I did some stress testing on switches for Intel, utilizing multiple NICs in Novell servers (3.12 - 4.11), and found no problem. Are people not able to login? Are print jobs taking longer to print? These are the indications that show you have a configuration problem.
Whenever you plug a NIC directly into a switch, whether it be a server, client, or printer, you are always going to have higher statistics reported at that node. Where are you getting your statistics from? NIC's were designed with CSMA/CD in mind, but switches deny them much of this function (although on occasion it does come into use). What kind of switch are you using, and does it have any kind of reporting features built in?

------------------------------
Date: Fri, 4 Jul 1997 01:41:34 +0200
From: "Arthur B."
Subject: Re: .....utilization.......

>Hello......when looking at and considering "utilization" which is
>"more" important, packets per second and packet size or bytes per second?
>Given the "real world" throughput of about 5MB/Sec, what is a good way to
>think about and measure what's on the "wire."

Packets/second. Each packet should claim the entire wire for a single moment in time, during which other packets shouldn't be transmitted. The size of the packet is not important for this. Too many packets and clients need to be more patient in transmitting their packets (and wait longer for the answer to arrive), resulting in a loss of performance.

In worst cases a lot of excessive collisions may occur, resulting in too much fragmentation and jamming signals being transmitted on the wire (which tell every NIC to "shut up" for a while). After which almost every NIC connected to the wire has the need to transmit its packets, since they have waited long enough as far as they are concerned... but increasing the chance of another excessive collision. After a while this behaviour will fade out if overall network utilization lowers.

Another thing to watch is the 'network utilization': the overall workload on your wire. The higher peaks are moments of performance loss. Too many of them and users are not happy. The average workload determines the chance of getting the higher peaks.
I like an average utilization below 3% and most of the peaks not above 35% (the problem is getting there, if that's achievable at all). Try to pinpoint the processes/NICs that are responsible for the most peaks and the ones that boost your average utilization by a steady stream of packets. If you can lower their transmitting behaviour enough you should get noticeable results. If all else fails you may wish to separate them (e.g. a segmented hub).

Example: pinpointing a bunch of printers that are searching for jobs with an interval of 1 second (increase their interval to 5 seconds and you just lowered your average utilization) -or- replacing the widely used but not network-friendly app with a calmer one (average and peak go down) -or even- pinpointing a process that checks for the existence of certain files every so often but is probing target directories *and* the entire search path (do a SET PATH= just before starting that process and average utilization just went down again).

---------
Date: Thu, 3 Jul 1997 18:55:10 -0600
From: Joe Doupnik
Subject: Re: .....utilization.......

>>Hello......when looking at and considering "utilization" which is
>>"more" important, packets per second and packet size or bytes per second?
>>Given the "real world" throughput of about 5MB/Sec, what is a good way to
>>think about and measure what's on the "wire."
>
>Packets/second.
>Each packet should claim the entire wire for a single moment in time in
>which time other packets shouldn't be transmitted. Size of the packet is
>not important for this. Too many packets and clients need to be more
>patient in transmitting their packets (and wait longer for the answer
>to arrive). Thus resulting in loss of performance.

Let's be much simpler here. If the person is concerned about utilization of the wire then that is insufficient verbiage to define a problem. For example: client talks to server and moves big files. The wire can run at 80-90% capacity with just that traffic and things are just perfect.
I can and have shown on this list just such data for 10 and 100Mbps Ethernet to INW 4.11. The wire doesn't care. I'll explain more below.

There is a concern about packets per second, however, as a consequence of spending cpu time processing each packet. On slow machines this becomes a dominant concern. With awkward or slow lan adapters, or with slow buses, this becomes *the* dominant factor. It's part of the overhead of doing business. Another part is packet headers using time on the wire, and it too can be horrid if tinygrams are employed.

>In worst cases a lot of excessive collisions may occur, resulting in too
>much fragmentation and following jamming signals being transmitted on the
>wire (which tell every NIC to "shut up" for a while). After which almost
>every NIC connected on the wire has the need to transmit their packets
>since they have waited long enough as far as they are concerned,
>increasing the chance of another excessive collision. After a while this
>behaviour will fade out if overall network utilization lowers.

Not quite. Yes, there will be collisions if multiple parties try to transmit at the same time; that's normal. Collision pieces are tiny, a dozen bytes or less, because the distances are tiny (the speed of light is finite, etc.). The transmitter sensing a collision may continue to send jam info to fill 64 bytes (because the controller can be made that simple). That is very, very little time on the wire. Stations separate themselves via the Ethernet truncated binary exponential backoff algorithm. Up to 1024 stations can contend, successfully, for the wire, and we put nothing like that number in one contention domain.

>Another thing to watch is the 'network utilization'. The overall workload
>on your wire. The higher peaks are moments of performance loss. Too many
>of them and users are not happy. The average workload determines the
>chance of getting the higher peaks.
>I like an average utilization
>below 3% and most of the peaks not above 35% (problem is getting there
>if at all achievable that is).

That's not the way I see things. High wire utilization is performance, period. Not a loss. Wire utilization itself tells us little other than that the wire is doing its job. One Pentium 90 client can fill 60% of a 100Mbps Ethernet all by itself. A Pentium 200 can use it all. That's good, not bad. It means they can get the job done quickly and leave the wire free for other machines, and users like things which happen quickly (or, um, many such things). It also means stations don't let the wire go idle unnecessarily and thus increase the time to complete a job. 100% utilization peaks mean you have very fine stations on the wire, including the server. It can also mean you have a lot of maybe-fine stations hammering on a faithful server, and each gets only a fraction of the available total capacity (wire and server, not just wire). To rub in the salt, when a packet is on the wire the utilization is 100%, by definition. Smaller wire utilization means time goes by with no transmission, and that is not doing anything useful.

Now let us see what the underlying reason is for nets slowing down when the wires become busy. It's called queueing. If packets need to go out but can't right now because the wire is occupied then they form a queue. Simple queueing theory for exponentially distributed packet generation (Poisson process) by many stations and exponentially distributed packet servicing (by their disappearance onto the wire) yields the interesting result that the average time a packet spends in the system (queued plus being sent) is

    T = 1/(service - arrival)    packet times

where "service" is the average service rate in packets per possible packet time (think of packet slots for convenience), and "arrival" the average rate of packets joining the queue per packet time (see, for example, Andrew Tanenbaum's "Computer Networks").
Notice a couple of things. If the "service" and "arrival" rates are equal the average queue delay is infinite. Yikes. The number of packets in the system is T * "arrival" packets (that's Little's result), which also goes infinite. Remember, we are dealing with statistical averages, where idle servers mean wire capacity is lost forever without recovery. If "service" is twice as large as "arrival" then on average each packet waits twice as long compared to the "arrival" = 0 (idle wire) or the "service" = infinity case. Fast service means packets are carried away in the ether very quickly. Infinity is slightly faster than, say, Petabit Ethernet. Some quiet reflection says when a queue becomes busy the line (and delay) grow very quickly indeed. This is the traffic jam effect. There has been no mention of collisions so far, but to accommodate them make retransmission attempts be fresh arrivals in the sending queue.

Half or three-quarter capacity utilization means things will wait to get onto the wire, and the busier the wire the longer the waiting time. We can lower the delay by increasing the service rate, "service." I did just that by going from 10 to 100Mbps Ethernet. Microsoft could lower the delay by making the "arrival" rate (program size) smaller.

The other part of the service rate is what the receiving end needs to do with a packet to keep the conversation going. Sluggish servers will make the system slow regardless of wire speed. The receiver is another queue in series with that of the wire. Sharp readers will instantly recall the "packet receive buffers" value in Monitor climbing when the server becomes blocked by other events, and that's one important part of the receiver queue. If the server is really swift one needs very few "packet receive buffers." Overall, looking at only one component of the network is silly and misleading.
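The queueing arithmetic above is easy to play with numerically. A small Python sketch of the delay formula and Little's result (function names and sample loads are my own illustration):

```python
# Average delay in a simple M/M/1 queue, in packet (slot) times,
# following the formula in the message: T = 1/(service - arrival).
def mm1_delay(service, arrival):
    """Mean time a packet spends in the system (queue + transmission)."""
    if arrival >= service:
        return float("inf")   # queue grows without bound on average
    return 1.0 / (service - arrival)

# Little's result: average number of packets in the system.
def mm1_occupancy(service, arrival):
    return arrival * mm1_delay(service, arrival)

# Delay blows up as the wire approaches saturation:
for load in (0.25, 0.5, 0.9, 0.99):
    print(load, mm1_delay(1.0, load))
# At load 0.5 the delay is twice the idle-wire value;
# at load 0.99 it is 100 times the idle-wire value.
```

The traffic jam effect drops straight out of the formula: moving from 10 to 100Mbps Ethernet raises "service" tenfold, which collapses the queueing delay for the same arrival rate.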
The network is a set of coupled systems: clients (many), wires, delays in bridges/switches/routers, delays/queues in the server, queues for disk activity, and so on. Good networks are balanced to put the major delays at either extremity of the system (client and server) where software can sense what is going on and take constructive steps to behave appropriately (not to mention provide queue buffer space). Bad nets have bottlenecks in the middle, where the only recourse to overload is dropping packets. Ethernet Carrier Sense is just dandy as a local throttle to keep material queued at home until the wire is free. Ask meaningful questions, such as how fast can my clients get their job done, and the answers are revealed through analysis and measurement of all the components (not neglecting bloated programs and poorly written ones doing small transfers per packet).
        Joe D.

P.S. Apologies for the long message. This is part of the foundation of computer networking and ought to be understood by professionals.

>Try to pinpoint the processes/NICs that are responsible for the most peaks
>and the ones that boost your average utilization by a steady stream of
>packets. If you can lower their transmitting behaviour enough you should
>get noticeable results. If all else fails you may wish to separate them
>(e.g. a segmented hub).
>
>Example: pinpointing a bunch of printers that are searching for jobs with
>an interval of 1 second (increase their interval to 5 seconds and you
>just lowered your average utilization) -or- replacing the widely used but
>not network-friendly app with a calmer one (average and peak go down)
>-or even- pinpointing a process that checks for the existence of certain
>files every so often but is probing target directories *and* the entire
>search path (do a SET PATH= just before starting that process and average
>utilization just went down again).
>* Arthur B.
------------------------------
Date: Wed, 6 Aug 1997 21:50:16 +0100
From: Richard Letts
Subject: Re: Netware 4.11 high utilization

>We had a netware 4.11 server with 550 concurrent users, 2 nic 3com
>3c59x at 10 Mbps and NLSP with LOAD BALANCE set to "ON";
>the processor utilization in the server was 10%.
>
>Now, when we install switches 100basex in our LAN and set the nics in
>the 4.11 server to 100mbps, the processor utilization goes up to 80%.

Ten times the network throughput, ten times the cpu load (about).

>AES processes call backs

These are the hooks that get called when a packet arrives. All network cards are not equal: some are bus-mastering, others have on-board processors to off-load the main CPU. The 59x series of cards rely on the main CPU to do a lot of the work; Joe'll be along any moment to recommend you try Intel's PRO/100 Server boards, as these have on-board i960 CPUs.

---------
Date: Wed, 6 Aug 1997 15:07:24 -0600
From: Joe Doupnik
Subject: Re: Netware 4.11 high utilization

Is that my cue? Ok, 80% utilization for two 100Mbps lan adapters is indeed high compared to at least one other adapter maker. Tests here using full 100Mbps Ethernets show the Intel EE Pro 100B (a client board, but in the server) to consume about 20% of the cpu per wire, on a PPro 200 motherboard server. That too is INW 4.11. I have three such boards in the server, and two NE-2000's for printer wiring and for connection to the outside world. The Intel boards are PCI based, full bus master units, and their cost has dropped to attractive levels.

Keep in mind that your load balancing requires lots of decisions to be made, for each outgoing packet. I turn off NLSP completely and have no such overhead. In fact load balancing is questionable in many situations. Instead I split the network into separate collision domains and assign a server board to each. The Intel board is quite able to drive a full 100Mbps wire all by itself, at the stated 20% cpu loading.
The better Intel EE PRO/100 Server board should drop the utilization down to the 5% range; it has bigger buffers and an i960 cpu to keep the lan adapter happy. Alas I have motherboard versus PRO/100 Server board troubles at the moment, so the client style boards are doing the work.
        Joe D.

------------------------------
Date: Thu, 7 Aug 1997 20:51:37 -0500
From: "John H. Lederer"
Subject: Raid Arrays

I thought I might share some of the research/materials I found. A lot of the info comes from Gibson, Patterson, Katz (the original researchers on Raid) and their subsequent associates.
===
First, and most interesting, they regard Raid 1 as a specific instance of Raid 5. Think about it and you will see what they mean.
===
Second, in terms of first order effects, and assuming that all disks cost the same and have the same performance, the following table gives a fairly good explication of Raid performance. The table is one for a ratio of cost to performance, with a Raid 0 disk unit equalling 1. A number of 1/2 means that the performance/cost ratio is 1/2 that of a single Raid 0 drive in terms of I/O per second -- either you have twice as many drive units and the same i/o performance, or the same number of drive units and half the i/o performance.

For Raid 1 and Raid 5, N = number of drive units:

                 Small Read   Small Write   Large Read   Large Write   Storage efficiency
   Raid 1, 5     1            1/N (*)       1            (N-1)/N       (N-1)/N

   (*) but never less than 1/4

This implies that if small writes are an important performance criterion, then use Raid 1; if large writes are important, then use Raid 5 with many disks. The chart does not account for duplexing, which will double read performance. To me this factor is critical, because I can get duplexing out of Novell for "free" (or some cpu load). As a practical matter I don't have the money to duplex Raid 5.
============
Reliability

There are a number of scenarios under which any Raid array can have data loss (e.g. two drives fail).
In a very rough way, the probability of the major ones of these scenarios increases as the number of disks increases. Therefore, for instance, from a reliability point of view, a 3 disk Raid 5 array is more reliable than a 7 disk Raid 5 array.
===========
My thoughts:

As I said, duplexing made up my mind (and my environment has a large number of small writes). However, there is a second factor. If I do duplexed Raid 1 I can use high quality "standard" components. I can, for instance, buy high quality disks at a reasonable price. If I do Raid 5, I normally end up with proprietary components that are very high priced, e.g. the same drive in a plastic tray with a different plug and a fancy label costs me 1.5-2.5x as much. This makes those ratios even worse. Duplexing and standard drives give Raid 1 almost a 3- or 4-to-1 performance/cost advantage.

I know some of you will suggest that caching makes this all wrong. It might, and I don't have good figures. However, as a general matter, I think cache is better as a single large cache (higher probability of hits). Thus I suspect that I do nearly as well or better to increase system memory than to buy cache for the controller (particularly if the cache is proprietary and expensive).

---------
Date: Sat, 9 Aug 1997 01:25:38 GMT
From: "Eric E. Allen"
Subject: Re: RAID

Check out the RAID Advisory Board web page: www.raid-advisory.com. If you are a member you are able to receive the booklets on RAID at no cost. However, they are priced very reasonably if you or your company are not a member.

------------------------------
Date: Mon, 11 Aug 1997 21:19:48 GMT
From: "Eric E. Allen"
Subject: Re: Raid Arrays

>I would love to get some detail on cache performance. The old saw is
>that cache is best kept in a large single location (e.g. the server
>cache) because one's probability of an early hit in cache increases.

It is true that for Novell OS's the best performance boost for caching is to add more memory to the server in the beginning.
But as the Dirty Cache Buffers rise (which usually means there is a bottleneck in the I/O), that is where added cache on the controller helps the most. RAID 5 write performance can be made equivalent to RAID 3 by increasing the cache. The large cache offering from the manufacturers is to help Windows NT Server get better performance, since it does not do a very good job at caching.

>One disadvantage that I would see for controller cache/write
>verification is that it would not meet the requirements of some for "end
>to end" verification. The controller would verify that it wrote to disk
>what it got from the server -- but cannot verify that what it got from
>the server is what the server sent.

Array controllers have the ability to perform read/write/read verification of the data at the array level. Most users sacrifice this function for performance.

>A Compaq salesman made the claim that Compaq drives (third party
>relabeled) were built and tested to a higher specification. When I
>pressed for detail (what is the higher spec and how does it differ from
>the manufacturer's normal spec, does the disk manufacturer run a
>separate line, etc.) I was unable to obtain detail.

This area depends on the relationship the company has with the drive manufacturer. If it is a good relationship, then drives that the company receives from the drive manufacturer can be rejected for having too large a permanent bad block table, or for any other quality benchmark that is set in place.

------------------------------
Date: Sun, 26 Oct 1997 11:01:27 -0600
From: Joe Doupnik
Subject: Re: Netware slower than DOS -- an update

>Some of you may recall a couple of messages I have posted here
>recently, in which I described apparent disk I/O limitations of
>NetWare 3.12 servers.
>
>To summarize, I found that raw disk I/O speed of NetWare 3.12 was
>several times slower than that of DOS running on the _same_ _machine_,
>and that this result was _independent_ of the type of hard drive used.
>I confirmed this on more than one differently configured machine,
>although I was not able to do NetWare 3.12 tests on any name-brand
>server-grade systems.
>
>I have just done some tests which I think are rather interesting. On
>one of my test machines, I installed NetWare 4.11 and ran IOZONE on
>it. During this test, dirty cache buffers never got above about 500,
>and IOZONE's throughput remained steady at about 500,000 bytes per
>second. The dirty cache buffer observed write rate was around 250 per
>second. NCOPY of a 10MB file took about 5 seconds.
>
>Booting this same machine under 3.12 was a rather different story.
>Dirty Cache Buffers easily reached in the vicinity of 3,000, and
>IOZONE's throughput dropped as low as 60,000 at one point. The dirty
>cache buffer observed write rate was around 25 per second. I didn't
>try the 10MB copy this time, but tests in the past, even using the NW4
>NCOPY on a 3.12 system, took over a minute.
>
>So it looks like NetWare 4.11 has nearly an order of magnitude better
>disk I/O than Netware 3.12. Yes, I am incredulous. I'd love to hear
>of tests that prove me wrong, or explanations for this dramatic
>difference that has not to my knowledge been previously reported.
--------
Perhaps I can explain all this and put matters back into perspective. The transfer rates you are seeing, say 0.5MB/sec, are determined by the network, not by the server's hard disk. Some tests and insights.

Run Novell's Perform3 to your server. It first creates a file of a few KB (the size of the test) and then repeatedly reads it (NCP request of Read File Handle N, offset of 0 bytes) for the test duration (12 secs default). That file is small enough to stay in the server's cache memory and hence eliminate the disk from the end to end situation. Thus Perform3 is basically a test of the network component. Decent 10Mbps Ethernet adapters and drivers can reach up to 0.6 to 0.7MB/sec on long transfers using Packet Burst, half that if PBurst is not used.
10Mbps Ethernet can carry about 1+MB/sec. Two or more clients doing the same test together can fully use a conventional 10Mbps Ethernet. Decent 100Mbps Ethernet adapters and drivers can reach 10 times these values: 6MB/sec for a Pentium 90 client, a full 10+MB/sec for two or more (or just a faster client).

Applying iozone to a local disk drive yields numbers in the range of, say, 2.5MB/sec. Iozone first writes a long file and then reads it back. It then (in "auto" mode) repeats this with double the file length. When using a file server the whole file goes to the server's disk, and only part of it is likely to remain in server cache for the read-back. Transfer rates to decent servers over 100Mbps Ethernet are about 3.5MB/sec. That's faster than a local hard disk. Let's say this again: a 100Mbps lan plus a NW 4 server and decent disks can be markedly faster than local disks. If the network is slower than a 3.5MB/sec transfer rate then clearly the iozone test to a server will be appropriately slower too, and that is what you observed with 10Mbps Ethernet. Max throughput of 100Mbps Ethernet is about 10+MB/sec of user data.

Further, NW permits using logical disk units (disk allocation units) larger than 4KB. NW 3 servers are typically constructed with 4KB units to reduce loss of space from tag ends. NW 4 servers have subblock allocation, and thus we operate them with typically 64KB allocation units and let the subblock stuff reuse tag ends. The difference in allocation unit size means the server's disk system works a great deal less hard with larger rather than smaller allocation units. Think of this as block i/o with larger blocks to reduce the per-call busy-work overhead, which of course is precisely the idea of the matter. Depending on the lan adapters involved, the server can fall behind by spending lots of cpu cycles on the lan adapter rather than the disk drive, or by spending too many cycles beating on, say, IDE drives and not keeping up with the network.
Smart stuff really counts, so use SCSI. Server bus kind counts for a great deal too: ISA is slow, EISA is much faster, PCI is faster yet.

Finally, to repeat a word of caution on Packet Burst. It can and does become unstable under heavy stress. That first reduces throughput and then can overwhelm a receiver as PB tries to recover by sending as fast as physically possible. Instability depends on the lan adapters involved, on the speed of the client cpu, and probably the phase of the moon, but happen it certainly does. Use a good wire monitor, say Novell's LZFW, to observe it. Unstable PBurst can yield throughput numbers which get worse rather than better as transfers become longer.

If you were to compare a NW 3 with a NW 4 file server with the same disk and lan setup then one could make remarks about the efficiency of the o/s. But I'll wager that is not the way you did things. Instead, my guess is you used 4KB disk allocation units on NW 3 and 64KB with subblock allocation on NW 4. Try again with identical allocation unit sizes. Try NW 4 further with and without suballocation, and see that the feature likely costs time yet saves disk space.
        Joe D.

---------
Date: Mon, 27 Oct 1997 11:04:34 -0600
From: Joe Doupnik
Subject: Re: Netware slower than DOS -- an update

>>The transfer rates you are seeing, say 0.5MB/sec, are determined
>>by the network, not by the server's hard disk.
>
>For the peak rates, that's clearly true, Joe. But it doesn't explain
>why transfer rates would dip so much lower on 3.12 as dirty buffers
>climb (packet burst?).

Packet Burst, yes, also a constipated server. Novell's Lanalyzer is especially good about the latter by displaying a "server overload" alarm. That alarm really means an NCP request was heard two or more times while the first instance had not been satisfied, and the server reports back "yes, yes, I heard that, please be patient," which then triggers the LZFW alarm. Clearly, a sluggish disk farm can lead to these blockages too.
>And it says nothing about the dramatic _differences_ in the dirty
>cache buffer counters and write rates between 3.12 and 4.11. These
>are critically important to running a busy server with a large,
>randomly-accessed data base.

Actually I wonder about that. The dirty buffers simply reflect the
number of items waiting in the disk-write queue. That means they are
available, in principle anyway, for being part of the disk cache and
thus readable from that cache before hitting the disk drives.

>A server that's busy writing because of slow disk I/O will have much
>poorer read performance on uncached data, which will be frequent with
>large, active data bases.

That's complicated and I'm not ready to speculate on it.

>For these tests comparing disk read/write rates on 3.12 and 4.11, I
>pretty much ignored the issue of absolute transfer rate as being
>irrelevant (they _were_ close to identical with small, easily-cached
>files). Instead, I concentrated on the relative rates between them.
>I guess I assumed everyone else would do the same. Sorry if I was
>unclear.

Still isn't clear. There are yet two more items not mentioned but
hopefully were controlled. One is the disk read-after-write checking.
NW 3 has it on by default, NW 4 has it off by default. The difference
is about a factor of two in write time, and hence is reflected in
vastly different queue lengths (dirty cache buffers). And the two
O/S's can differ in their default number of simultaneous disk writes,
so align those guys too.

Through all this I am not stating one O/S is necessarily faster than
the other, but one suspects NW 4 does have faster paths based on
learning experiences from NW 3. However, I am saying 10Mbps Ethernet
itself is the dominant limiting factor for client-server exchanges,
because those exchanges go dramatically faster with 100Mbps Ethernet,
and the disk farm is the limiting factor with 100Mbps links. Add
PBurst instability and the numbers can go all over the map (use LZFW
to observe this item).
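The link between per-write time and the dirty-buffer count can be
illustrated with a toy queueing model (Little's law: queue length =
arrival rate x time in system; the rates below are invented for
illustration, not measured NetWare figures):

```python
def dirty_buffers(writes_per_sec, secs_per_write):
    """Little's law, L = lambda * W: mean number of blocks sitting
    in the dirty-cache (write) queue at steady state."""
    return writes_per_sec * secs_per_write

# Invented figures: 200 dirty blocks/sec arriving, 10 ms per write
# with read-after-write checking off, ~2x that with it on (the NW 3
# default). Doubling the write time doubles the queue.
off = dirty_buffers(200, 0.010)
on = dirty_buffers(200, 0.020)
print(off, on)
```

In practice the time a block spends queued grows much faster than
linearly as the disk approaches saturation, which is why the observed
dirty-buffer gap between the two defaults can be far more than 2x.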
I'm delighted to see continued probes of system performance. We learn
about all kinds of non-obvious impediments this way. My grad computer
networks class conducted very similar experiments last week as one of
their lab assignments: 10 and 100Mbps Ethernets to the same server,
Perform3 and iozone tests, etc. They observed the same numbers I
reported from my own tests. An INW 4.11 server in this case rather
than dual NW 3 and 4 servers, and VLMs rather than Client32 (yet
another item to watch when folks do testing).
        Joe D.
---------
Date: Tue, 28 Oct 1997 04:08:17 GMT
From: Ken Wallewein
Subject: Re: Netware slower than DOS -- an update

I just had to respond briefly (sorta) to a couple of points. Anything
more would be better served with fresh data.

>>And it says nothing about the dramatic _differences_ in the dirty
>>cache buffer counters and write rates between 3.12 and 4.11. These
>>are critically important to running a busy server with a large,
>>randomly-accessed data base.
>
> Actually I wonder about that. The dirty buffers simply reflect
>the number of items waiting in the disk-write queue. That means they
>are available, in principle anyway, for being part of the disk cache
>and thus readable from that cache before hitting the disk drives.

Agreed, for data in the dirty cache buffers. But what about reading
data not currently in RAM? Semi-casual observation (not careful tests,
granted), as well as NetWare documentation on side effects of tuning
the dirty cache buffer concurrent write rate, tend to support my
analysis (a server with many dirties gets dog slow reading the disk).
And it only makes sense that slow disk I/O will exacerbate both
conditions.

> There are yet two more items not mentioned but hopefully were
>controlled. One is the disk read-after-write checking. NW 3 has it on
>by default, NW 4 has it off by default.
>The difference is about a factor of two in write time, and hence is
>reflected in vastly different queue lengths

Actually, I tried that on a mirrored server, Joe. I was disappointed
by the results. Does mirroring affect it much?

>(dirty cache buffers). And the two O/S's can differ in their default
>number of simultaneous disk writes, so align those guys too.

I'm not sure I would want to do that (see below).

>Through all this I am not stating one O/S is necessarily faster
>than the other, but one suspects NW 4 does have faster paths based on
>learning experiences from NW 3. However, I am saying 10Mbps Ethernet
>itself is the dominant limiting factor for client-server exchanges,
>because those exchanges

Several thousand dirty cache buffers generated by a single 10baseT
client running IOZONE doesn't sound like a network bottleneck to me.

>go dramatically faster with 100Mbps Ethernet and the disk farm is the
>limiting factor with 100Mbps links. Add PBurst instability and the
>numbers can go all over the map (use LZFW to observe this item).

Joe, you make a number of excellent points. But you've got me
thinking. For these tests, I did fresh installs of NetWare 3.12 and
4.11 on the same machine. By and large, I used the default settings.
Before I test a changed configuration, I want to think carefully about
whether I even care what the results would be.

My objective in all of this, after all, is not to determine the
relative speed of the respective OS's core I/O performance. It is to
optimize the performance of mission-critical production business
systems. OS internals are interesting only over beer or to the extent
that they can be applied.
---------
Date: Tue, 28 Oct 1997 09:33:23 -0600
From: Joe Doupnik
Subject: Re: Netware slower than DOS -- an update

>I just had to respond briefly (sorta) to a couple of points. Anything
>more would be better served with fresh data.
>
>>>And it says nothing about the dramatic _differences_ in the dirty
>>>cache buffer counters and write rates between 3.12 and 4.11. These
>>>are critically important to running a busy server with a large,
>>>randomly-accessed data base.
>>
>> Actually I wonder about that. The dirty buffers simply reflect
>>the number of items waiting in the disk-write queue. That means they
>>are available, in principle anyway, for being part of the disk cache
>>and thus readable from that cache before hitting the disk drives.
>
>Agreed, for data in the dirty cache buffers. But what about reading
>data not currently in RAM? Semi-casual observation (not careful
>tests, granted), as well as NetWare documentation on side effects of
>tuning the dirty cache buffer concurrent write rate, tend to support
>my analysis (a server with many dirties gets dog slow reading the
>disk). And it only makes sense that slow disk I/O will exacerbate
>both conditions.

Clearly, if a disk is way behind with a huge write queue and then
someone asks to read from the disk itself then there will be
competition for sequencing of events. That's a system strategy item.
Rather than get wound up in things I can't change I tend to stand back
and ask simpler questions: is all this because I have a weak system,
and could it be improved by changing components? For example, disk
controllers, the age of their drivers, the tuning of the drivers, the
strength of the disk drive itself, all count a lot. Some disk drives
don't work well with lots of outstanding requests (tagged queueing in
Adaptec-speak), so the driver needs tuning to reduce the load; Adaptec
says this in their docs. In this case I suggest, if you can, using the
latest drivers rather than those shipped on the very old original NW
3.12 media, and change the disk drive to a current production unit.
This "should," add quotes here, improve matters. As yet another
example on drives.
Our likable Seagate Barracuda SCSI units (7200 RPM) differ by a factor
of about four in sustained throughput from early to current models.
And that's from higher bit densities on the disk surface; the
mechanicals are about the same. These are drives with decent SCSI
implementations.

>> There are yet two more items not mentioned but hopefully were
>>controlled. One is the disk read-after-write checking. NW 3 has it on
>>by default, NW 4 has it off by default. The difference is about a
>>factor of two in write time, and hence is reflected in vastly
>>different queue lengths
>
>Actually, I tried that on a mirrored server, Joe. I was disappointed
>by the results. Does mirroring affect it much?

Mirroring costs a little time, yes. Same controller, twice the data to
move, with disk rotation waiting. If SCSI disconnect is defeated then
the SCSI bus is tied up waiting. Flogging a weak disk system isn't
productive.
        Joe D.
------------------------------
Date: Tue, 28 Oct 1997 12:04:48 -0600
From: Joe Doupnik
Subject: Re: mirroring/duplexing performance

>>>Actually, I tried that on a mirrored server, Joe. I was disappointed
>>>by the results. Does mirroring affect it much?
>>
>> Mirroring costs a little time, yes. Same controller, twice the
>>data to move, with disk rotation waiting. If SCSI disconnect is
>>defeated then the SCSI bus is tied up waiting.
>
>Does duplexing avoid this performance issue of writing the same
>amount of data twice on a write?
>
>I know reads are faster from duplexing...
-------
Duplexing still requires the disk driver to send the data twice, once
to each controller. But the controllers themselves will normally deal
with moving data across the system bus into their own silicon. That
means the driver issues the commands twice and then gets out of the
way of the bus master controllers.
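That difference can be put in a toy timing model (the 12 ms write time
and the command-issue cost below are invented for illustration):
mirroring pushes both copies through one controller in sequence, while
duplexing overlaps the two writes on separate bus-master controllers.

```python
def mirrored_write_time(t_write):
    """One controller, same data moved twice, back to back."""
    return 2 * t_write

def duplexed_write_time(t_write, t_issue=0.0005):
    """Driver issues the command to each controller, then the two
    bus-master controllers overlap their i/o in parallel."""
    return 2 * t_issue + t_write

t = 0.012   # nominal 12 ms per physical write (made-up figure)
print(mirrored_write_time(t), duplexed_write_time(t))
```

Reads gain too, since either spindle can satisfy them, as the question
above notes.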
If one wants to be cheap and clever and fast at the same time without
going to full-blown hardware RAID then adding spindles to a NW volume
creates a striping situation where bits and pieces of files are
scattered across each spindle. That can increase disk farm throughput,
at very obvious risk factors. A safer move is simply to use the
fastest disks with excellent SCSI controllers and a PCI bus, with
large disk allocation units (and optionally subblock allocation to
avoid space wastage).

I use mirroring only as a crutch to solve some immediate hardware
problems (drive goes off line every now and then). Remirroring is a
very heavy load and a long process when the server goes down
abnormally (ABEND). I have not had the money nor free slots to play
the duplexing game.
        Joe D.
------------------------------
Date: Wed, 5 Nov 1997 17:11:57 -0600
From: Joe Doupnik
Subject: Re: ..packets per second.....

>I know there are hundreds of variables, but what would be a reasonable
>figure for packets per second and bytes per second in and out on a 200
>WKS, 10 server, IP and IPX network?...is 200 to 500 PK/s good, bad,
>OK...should it be 80 to 120 PK/S ??...according to my ManageWise
>system my network seems to run in the 100-200 PK/S range...is this
>then the answer for my system?...is there no "normal" or "usual."
>What about across a router ?...packets in and packets out
>?...20-80, or 200-300 ?
---------
First, I always like to know if there is a problem. So we break apart
the pkts/sec question into two parts. The first part is how many
packets we can squeeze onto the wire, realizing that the wire does not
care if it is very busy all day. The second part is what consequences
there are for hammering a machine, say a server or even a client, with
too many packets in a short time interval.

The busier the wire the more your network is earning its keep. Do some
arithmetic to see how many packets the wire could carry at the max.
What are min/max Ethernet frame sizes?
What separation must occur between each? Please do not pay any
attention to that man labeled "collision" standing behind the magic
curtain.

What happens when a frame arrives at its destination station? An
interrupt from the lan adapter to the cpu+driver is what. And that
means time to deal with the arrived frame. How much of that do we
think a machine can take? Good question, and the answer is not a
terrific amount and still move the mouse cursor. It all depends, as we
are wont to say, on the robustness and speed of the receiver and what
else is competing for resources.

Rather than reteach a course on networking I'll make two additional
comments. One is that for regular 10Mbps Ethernet 1000 pkts/sec is a
lot; multiply by 10 for Fast Ethernet. The other is to experiment to
discover what sustained loading does between a client and a server,
and what it does to other stations trying to reach the same server.
Perform3 and iozone make nifty test tools, as we have discussed on the
list many times. Visit netlab2.usu.edu, cd apps, for both. I warmly
recommend testing, and thinking about the results.
        Joe D.
---------
Date: Wed, 5 Nov 1997 17:50:42 -0600
From: Joe Doupnik
Subject: Re: ..packets per second.....
---------
To see graphs of throughput on 10/100Mbps Ethernet you might wish to
look at the visuals-only portion of a presentation I put together this
summer. It's an Office 97 PowerPoint thingy for Win95, archived as
file fasteth.zip in directory misc on netlab2.usu.edu. Some careful
looking reveals stations competing on the wire with Perform3, and what
fraction of it each gets. Pentium-90 clients to a PPro 200 INW 4.11
server.

On the matter of bytes/sec, first figure out the carrying capacity of
Ethernet. Then look at the many sites now running MRTG monitoring
software to graph bytes/sec or bits/sec on a five minute average
basis. Clicking on netlab1.usu.edu will show one example.
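Joe D.'s frame-size and separation questions a few paragraphs up work
out as follows (a sketch using the classical Ethernet figures: 64-byte
minimum and 1518-byte maximum frames, 8 bytes of preamble, and a
12-byte-time inter-frame gap):

```python
def max_pps(bits_per_sec, frame_bytes):
    """Upper bound on frames/sec: each frame occupies its own bytes
    plus 8 of preamble plus a 12-byte-time inter-frame gap."""
    wire_bytes = frame_bytes + 8 + 12
    return bits_per_sec / (wire_bytes * 8)

for mbps in (10, 100):
    print(mbps, round(max_pps(mbps * 1e6, 64)), round(max_pps(mbps * 1e6, 1518)))
# 10Mbps tops out near 14,880 pps at minimum frames but only ~813 pps
# at maximum frames, so a sustained 1000 pkts/sec of full-size frames
# is indeed "a lot"; multiply by 10 for Fast Ethernet.
```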
Server edu-usu-engrlab is a student lab, INW 4.11, upon which the
above presentation experiments were run (when quiet).
        Joe D.
------------------------------
Date: Thu, 6 Nov 1997 21:18:00 -0600
From: Joe Doupnik
Subject: Re: Anyone using Adaptec AHA2940, NW4.1 more than one HDD?

>I have a problem with the "speed" of hard disk access on our system.
>
>We used to have 1 Seagate ST32430N on the server. About 6 months ago
>I added 3 ST12400N's to the same SCSI bus, mirroring a pair of each.
>The result has been a general slowing of disk I/O by (maybe) a factor
>of 3-4. (Each physical drive is 2GB.)

If I count correctly that's seven (7) drives on a single SCSI bus. If
you made your own ribbon connector the attachments must be more than
just a few inches apart to prevent reflections/impedance problems from
the lumped load of a drive. If the drives are external then the cable
length must be short, not ten feet or more, and the cables must be of
high quality (translation: fat and expensive). And the final
terminator ought to be of the active variety.

Further, ensure two more things. 1. Caching on the drives is turned
off (please) with a Seagate-supplied DOS program. 2. The controller is
set to not map sectors for large drives (NW does not use the PC bios
after server.exe starts). I need not add that the drives must be kept
cool.

>I made a "rash" attempt to upgrade the 7800 Adaptec drivers and the
>associated NLM's (NBI NWPA NWPALOAD) but the result was a disaster.
>Had a few abends and it took ages to remirror the "mess". Per a
>Novell technote I also decreased the bus speed and disabled tags, but
>the drives then deactivated maybe three times a week. I am now back
>to what my setup was previously!

Then there may be much more wrong in your system than just drives.
Take a very careful look at PCI latency; less means get off the bus
faster when asked, and generally less is better. Your lan driver can
hog the PCI bus too.
>I now want to make a more ordered upgrade even starting as low as the
>SCSI BIOS (currently 1.11)

You may wish to consider the dual SCSI channel version, Adaptec 3940,
to gain more SCSI bandwidth. I'm assuming those are SCSI narrow drives
and so is the controller. Mixing wide and narrow is not a good idea.

>I'd like to contact anyone that has a configuration similar to this
>or if you have experienced the kind of problems I am talking about.
>
>Server has 48MB RAM, SYS volume has 16K blocks and compression and
>suball is turned on (woops). VOL1 has 64K blocks and likewise has
>comp and suball on. I have actually disabled overnight compression
>though. Cache hits are always 96% plus.

Well, may I recommend turning off compression and never turning it on
again. It is trouble waiting to happen. Sub-block allocation is a good
thing; use it. Go for the full 64KB disk allocation units to improve
disk channel efficiency.
        Joe D.
------------------------------
Date: Fri, 7 Nov 1997 13:52:00 +0200
From: "David W. Hanson"
Subject: Re: LZFW shows errors 300/sec with no other packets moving.

>We have a 10baseT ethernet network which has an averaged packet rate
>of 500 to 1500 pkts/sec giving an average utilization of 20%, spread
>over 5 hubs with fibre links. We have a single NW 3.11 server, not
>running packet burst, and a number of unix SGI and SUNs. For the past
>week, at random intervals and times all packets cease to flow, LZFW
>shows errors at 300 pkts/s, the vast majority being fragments, with
>all other packets ceasing for the same period of time; in fact the
>graphs are an inverse of each other. Both rise and fall are steep
>with a flat plateau of between 2 and 10 minutes. We have tried
>disconnecting fibre to our main hub, but because of the randomness of
>the occurrences we have been unable to identify which particular
>branch of our network contains the faulty hardware, as we assume it
>to be hardware related.
>Another feature has been a steady flow of oversized packet errors,
>but we are not certain if this is related other than that they also
>feature during the seizure of other packets.

First, capture the oversized packets and use the MAC address to locate
their source. A "steady flow" of them should not be tolerated.

Try pointing Lanalyzer at all segments and constantly capturing all
packets, overwriting the buffer when it fills. Then, as soon as you
have a failure, stop the captures and pick through the fragments to
see if you can identify a MAC address. Most likely it is an
intermittent NIC failure. BTW, what do you do to get things going
again, or does it fix itself?
---------
Date: Fri, 7 Nov 1997 12:01:00 -0700
From: Hansang Bae
Subject: Re: LZFW shows errors 300/sec with no other packets moving.

You don't by any chance have any duplex NICs, do you? I've seen this
happen when a card and/or hub port decides to go from full to simplex
operation. Since there is NO media contention in a full duplex
operation, you can imagine the problems that can show up. Capture the
traffic and see who the source MAC address is. You may have to go to
each segment to find out. 300 pkts/s of errors, most being
fragments... hmm, this smacks of faulty collision detection going
on.... Are there any usable captures?
------------------------------
Date: Thu, 13 Nov 1997 14:30:15 -0500
From: "Brien K. Meehan"
Subject: Re: Compression, problem?

>I've stumbled across some info/opinion that indicates that Novell's
>file compression available in Intranetware shouldn't be used. That's
>fine, but I was curious as to why.

I always advise against using Netware's compression. I believe it's
poorly designed, because it depends on the file being decompressed on
the disk for it to be presented to the client (which negates any
usefulness it might have, in my mind), and because it's a
CPU-intensive task that they tried to cram into a low priority thread.
I've had lots of problems with it.
They were caused by demands for compression during periods of high
utilization. Compression is a CPU hog that tries to run as a low
priority thread. During high utilization periods, other processes wind
up waiting for compression. Other functions try to start, and wind up
starting a new service process because the old ones are busy waiting.
It tends to cascade. On a good day, the server would run out of
service processes and return an error to the requesting thread. The
server would usually ABEND, though.

Also, I've run into this problem often: a server administrator notices
that they are running out of disk space, so they start looking around
for files to delete. They delete a bunch, and wind up with less disk
space than they had! They've just uncompressed their old files. So,
they start working with current files that can't be decompressed due
to lack of disk space ... chaos ensues.
---------
Date: Thu, 13 Nov 1997 17:18:00 -0500
From: David Weaver
Subject: Re: Compression, problem?

>>I've stumbled across some info/opinion that indicates that Novell's
>>file compression available in Intranetware shouldn't be used. That's
>>fine, but I was curious as to why.
>>Can anyone shed some light on this?
>
>Once managed a 4.02 server w/ compression turned on. At any given
>moment, had 600 users connected to it (1000 user license). Did not
>have many problems w/ compression. HOWEVER, disks are cheap.
>Compression just adds yet another layer of complexity. On top of
>that, you have to keep an eye out during backups and restores, have
>to ensure that files can be decompressed, etc etc. One other thing,
>it takes CPU cycles to compress and decompress files.... cycles that
>can be used for other things.

Today's processors, namely the P-II, are so fast that the performance
bottleneck is in the nic or disk sub-system. With a 386, admins would
have to be cautious of how much of a load certain processes put on the
CPU, but still the impact of compression is minimal.
---------
Date: Thu, 13 Nov 1997 18:56:10 -0500
From: "Brien K. Meehan"
Subject: Re: Compression

>Today's processors, namely the P-II, are so fast that the performance
>bottleneck is in the nic or disk sub-system. With a 386, admins would
>have to be cautious of how much of a load certain processes put on
>the CPU, but still the impact of compression is minimal.

Well, that sounds lovely. (Does anyone remember the good old days,
when we'd use XOR AX,AX instead of MOV AX,0 because it was faster?)

I've compared utilization on a server with a compressed volume to an
uncompressed volume, using a Pentium Pro 200. I think that's
sufficiently "up to date." In both circumstances, the volume total was
8GB. The volume supporting compression was 90% full, 50% of which was
compressed files, which would have taken 6GB without compression. The
volume not supporting compression was also 90% full. I tried attaching
with 2 workstations.

I've had "bad" results by trying to perform compression, full virus
scan, and a tape backup at the same time, using default settings. So
that's what I used to test them.

With the volume supporting compression, over a sample period of 30
minutes, the average CPU utilization was 95%, and would often "peg" at
100. The number of service processes increased throughout the period
(but didn't max out, unlike my production servers). The workstations
were unable to get a response from the server about 25% of the time.

With the volume not supporting compression, over the same sample
period, the average CPU utilization was 6%, peaking at about 20%. The
number of service processes did not increase. The workstations were
always able to get a response from the server.

I performed this non-scientific benchmark to show a department why I
was buying them so much new disk space for an upgrade. So, even in
light of the Intel hype, it's still my "opinion" that Netware
Compression Kills Servers.
---------
Date: Thu, 13 Nov 1997 17:23:37 -0800
From: Alan Rowe
Subject: Re: Compression

In that situation I agree compression kills. Here we live off it. We
do a lot of graphic production and use compression for seldom-used
files etc. With settings like "Wait 30 days before compression" I am
pretty sure that any normally used files are not compressed and the
server never takes a big hit decompressing files. Like anything else,
used with care it can be a helpful tool, but taken to an extreme it
can kill file servers.
---------
Date: Fri, 14 Nov 1997 09:35:31 -0500
From: James E Borchart
Subject: A differing opinion on compression

My company would not be surviving too well if it weren't for Novell's
compression, and we make drives! Using Novell's defaults to only
compress at night, we have never had a significant problem with
compression. It works very well for large volumes with zillions of
files. Our users, like most users, will not delete any of their own
old files. I have never seen an abend, and have never seen other
processes wait due to compression.

We have one server with a 27GB RAID array and 8GB mirrored (that's
available partition size after mirroring or RAID). This server has
1400 people on it during the day and 500 at night in a 24 hour
factory. It only stores applications, and every imaginable version of
every imaginable application is stored on it; 80% of the files rarely
get used and are compressed. It's never had a compression problem.

3.12 servers, on the other hand, don't ever have enough disk space.
They constantly get filled up with old files and we don't know what to
delete. After moving to 4.x we don't have disk fill-up problems
anymore.
------------------------------