Linux Benchmarking HOWTO

by André D. Balsa, andrewbalsa@usa.net <mailto:andrewbalsa@usa.net>

v0.12, 15 August 1997

The Linux Benchmarking HOWTO discusses some issues associated with the
benchmarking of Linux systems and presents a basic benchmarking
toolkit, as well as an associated report form, which together enable
one to produce significant benchmarking information in a couple of
hours. Perhaps it will also help diminish the number of useless
articles in comp.os.linux.hardware...

______________________________________________________________________

Table of Contents

 

1. Introduction

1.1 Why is benchmarking so important ?

1.2 Invalid benchmarking considerations

2. Benchmarking procedures and interpretation of results

2.1 Understanding benchmarking choices

2.1.1 Synthetic vs. applications benchmarks

2.1.2 High-level vs. low-level benchmarks

2.2 Standard benchmarks available for Linux

2.3 Links and references

3. The Linux Benchmarking Toolkit (LBT)

3.1 Rationale

3.2 Benchmark selection

3.3 Test duration

3.4 Comments

3.4.1 Kernel 2.0.0 compilation:

3.4.2 Whetstone:

3.4.3 Xbench-0.2:

3.4.4 UnixBench version 4.01:

3.4.5 BYTE Magazine's BYTEmark benchmarks:

3.5 Possible improvements

3.6 LBT Report Form

3.7 Network performance tests

3.8 SMP tests

4. Example run and results

5. Pitfalls and caveats of benchmarking

5.1 Comparing apples and oranges

5.2 Incomplete information

5.3 Proprietary hardware/software

5.4 Relevance

6. FAQ

7. Copyright, acknowledgments and miscellaneous

7.1 How this document was produced

7.2 Copyright

7.3 New versions of this document

7.4 Feedback

7.5 Acknowledgments

7.6 Disclaimer

7.7 Trademarks

______________________________________________________________________

1. Introduction

 

"What we cannot speak about we must pass over in silence."

Ludwig Wittgenstein (1889-1951), Austrian philosopher

 

Benchmarking means measuring the speed with which a computer system

will execute a computing task, in a way that will allow comparison

between different hard/software combinations. It does not involve

user-friendliness, aesthetic or ergonomic considerations or any other

subjective judgment.

Benchmarking is a tedious, repetitive task, and takes attention to

details. Very often the results are not what one would expect, and

subject to interpretation (which actually may be the most important

part of a benchmarking procedure).

Finally, benchmarking deals with facts and figures, not opinion or

approximation.

1.1. Why is benchmarking so important ?

 

Apart from the reasons pointed out in the BogoMips Mini-HOWTO (section

7, paragraph 2), one occasionally is confronted with a limited budget

and/or minimum performance requirements while putting together a Linux

box. In other words, when confronted with the following questions:

o How do I maximize performance within a given budget ?

o How do I minimize costs for a required minimum performance level ?

o How do I obtain the best performance/cost ratio (within a given

budget or given performance requirements)?

one will have to examine, compare and/or produce benchmarks.

Minimizing costs with no performance requirements usually involves

putting together a machine with leftover parts (that old 386SX-16 box

lying around in the garage will do fine) and does not require

benchmarks, and maximizing performance with no cost ceiling is not a

realistic situation (unless one is willing to put a Cray box in

his/her living room - the leather-covered power supplies around it

look nice, don't they ?).

Benchmarking per se is senseless, a waste of time and money; it is

only meaningful as part of a decision process, i.e. if one has to make

a choice between two or more alternatives.

Usually another parameter in the decision process is cost, but it

could be availability, service, reliability, strategic considerations

or any other rational, measurable characteristic of a computer system.

When comparing the performance of different Linux kernel versions, for

example, stability is almost always more important than speed.

1.2. Invalid benchmarking considerations

 

Very often read in newsgroups and mailing lists, unfortunately:

1. Reputation of manufacturer (unmeasurable and meaningless).

 

2. Market share of manufacturer (meaningless and irrelevant).

3. Irrational parameters (for example, superstition or prejudice:

would you buy a processor labeled 131313ZAP and painted pink ?)

4. Perceived value (meaningless, unmeasurable and irrational).

5. Amount of marketing hype: this one is the worst, I guess. I

personally am fed up with the "XXX inside" or "kkkkkws compatible"

logos (now the "aaaaaPowered" has joined the band - what next ?).

IMHO, the billions of dollars spent on such campaigns would be

better used by research teams on the design of new, faster,

(cheaper :-) bug-free processors. No amount of marketing hype will

remove a floating-point bug in the FPU of the brand-new processor

you just plugged into your motherboard, but exchanging it for a
redesigned processor will.

6. "You get what you pay for" opinions are just that: opinions. Give

me the facts, please.

2. Benchmarking procedures and interpretation of results

 

A few semi-obvious recommendations:

1. First and foremost, identify your benchmarking goals. What is it

you are exactly trying to benchmark ? In what way will the

benchmarking process help later in your decision making ? How much

time and resources are you willing to put into your benchmarking

effort ?

2. Use standard tools. Use a current, stable kernel version, standard,

current gcc and libc and a standard benchmark. In short, use the

LBT (see below).

3. Give a complete description of your setup (see the LBT report form

below).

4. Try to isolate a single variable. Comparative benchmarking is more

informative than "absolute" benchmarking. I cannot stress this

enough.

5. Verify your results. Run your benchmarks a few times and verify the

variations in your results, if any. Unexplained variations will

invalidate your results.

6. If you think your benchmarking effort produced meaningful

information, share it with the Linux community in a precise and

concise way.

7. Please forget about BogoMips. I promise myself I shall someday

implement a very fast ASIC with the BogoMips loop wired in. Then we

shall see what we shall see !

2.1. Understanding benchmarking choices

 

2.1.1. Synthetic vs. applications benchmarks

 

Before spending any amount of time on benchmarking chores, a basic

choice must be made between "synthetic" benchmarks and "applications"

benchmarks.

Synthetic benchmarks are specifically designed to measure the

performance of individual components of a computer system, usually by

exercising the chosen component to its maximum capacity. An example of

a well-known synthetic benchmark is the Whetstone suite, originally

programmed in 1972 by Harold Curnow in FORTRAN (or was that ALGOL ?)

and still in widespread use nowadays. The Whetstone suite measures
the floating-point performance of a CPU.

The main criticism that can be made of synthetic benchmarks is that they

do not represent a computer system's performance in real-life

situations. Take for example the Whetstone suite: the main loop is

very short and will easily fit in the primary cache of a CPU, keeping

the FPU pipeline constantly filled and so exercising the FPU to its

maximum speed. We cannot really criticize the Whetstone suite if we

remember it was programmed 25 years ago (its design dates even earlier

than that !), but we must make sure we interpret its results with

care, when it comes to benchmarking modern microprocessors.

Another very important point to note about synthetic benchmarks is

that, ideally, they should tell us something about a specific aspect

of the system being tested, independently of all other aspects: a

synthetic benchmark for Ethernet card I/O throughput should result in

the same or similar figures whether it is run on a 386SX-16 with 4

MBytes of RAM or a Pentium 200 MMX with 64 MBytes of RAM. Otherwise,

the test will be measuring the overall performance of the

CPU/Motherboard/Bus/Ethernet card/Memory subsystem/DMA combination:

not very useful since the variation in CPU will cause a greater impact

than the change in Ethernet network card (this of course assumes we

are using the same kernel/driver combination, which could cause an

even greater variation)!

Finally, a very common mistake is to average various synthetic

benchmarks and claim that such an average is a good representation of

real-life performance for any given system.

Here is a comment on FPU benchmarks quoted with permission from the

Cyrix Corp. Web site:

"A Floating Point Unit (FPU) accelerates software designed

to use floating point mathematics : typically CAD programs,

spreadsheets, 3D games and design applications. However,

today's most popular PC applications make use of both float-

ing point and integer instructions. As a result, Cyrix chose

to emphasize "parallelism" in the design of the 6x86 proces-

sor to speed up software that intermixes these two instruc-

tion types.

 

 

The x86 floating point exception model allows integer

instructions to issue and complete while a floating point

instruction is executing. In contrast, a second floating

point instruction cannot begin execution while a previous

floating point instruction is executing. To remove the per-

formance limitation created by the floating point exception

model, the 6x86 can speculatively issue up to four floating

point instructions to the on-chip FPU while continuing to

issue and execute integer instructions. As an example, in a

code sequence of two floating point instructions (FLTs) fol-

lowed by six integer instructions (INTs) followed by two

FLTs, the 6x86 processor can issue all ten instructions to

the appropriate execution units prior to completion of the

first FLT. If none of the instructions fault (the typical

case), execution continues with both the integer and float-

ing point units completing instructions in parallel. If one

of the FLTs faults (the atypical case), the speculative exe-

cution capability of the 6x86 allows the processor state to

be restored in such a way that it is compatible with the x86

floating point exception model.

 

 

Examination of benchmark tests reveals that synthetic float-

ing point benchmarks use a pure floating point-only code

stream not found in real-world applications. This type of

benchmark does not take advantage of the speculative execu-

tion capability of the 6x86 processor. Cyrix believes that

non-synthetic benchmarks based on real-world applications

better reflect the actual performance users will achieve.

Real-world applications contain intermixed integer and

floating point instructions and therefore benefit from the

6x86 speculative execution capability."

 

So, the recent trend in benchmarking is to choose common applications

and use them to test the performance of complete computer systems. For

example, SPEC, the non-profit corporation that designed the well-known

SPECINT and SPECFP synthetic benchmark suites, has launched a project

for a new applications benchmark suite. But then again, it is very

unlikely that such commercial benchmarks will ever include any Linux

code.

Summarizing, synthetic benchmarks are valid as long as you understand

their purposes and limitations. Applications benchmarks will better

reflect a computer system's performance, but none are available for

Linux.

2.1.2. High-level vs. low-level benchmarks

 

Low-level benchmarks will directly measure the performance of the

hardware: CPU clock, DRAM and cache SRAM cycle times, hard disk

average access time, latency, track-to-track stepping time, etc...

This can be useful in case you bought a system and are wondering what

components it was built with, but a better way to check these figures

would be to open the case, list whatever part numbers you can find and

somehow obtain the data sheet for each part (usually on the Web).

Another use for low-level benchmarks is to check that a kernel driver

was correctly configured for a specific piece of hardware: if you have

the data sheet for the component, you can compare the results of the

low-level benchmarks to the theoretical, printed specs.

High-level benchmarks are more concerned with the performance of the

hardware/driver/OS combination for a specific aspect of a

microcomputer system, for example file I/O performance, or even for a

specific hardware/driver/OS/application performance, e.g. an Apache

benchmark on different microcomputer systems.

Of course, all low-level benchmarks are synthetic. High-level

benchmarks may be synthetic or applications benchmarks.

2.2. Standard benchmarks available for Linux

 

IMHO a simple test that anyone can do while upgrading any component in

his/her Linux box is to launch a kernel compile before and after the

hard/software upgrade and compare compilation times. If all other

conditions are kept equal then the test is valid as a measure of

compilation performance and one can confidently say that:

"Changing A to B led to an improvement of x % in the compile

time of the Linux kernel under such and such conditions".

No more, no less !

Since kernel compilation is a very common task under Linux, and since

it exercises most functions that get exercised by normal benchmarks

(except floating-point performance), it constitutes a rather good

individual test. In most cases, however, results from such a test

cannot be reproduced by other Linux users because of variations in

hard/software configurations and so this kind of test cannot be used

as a "yardstick" to compare dissimilar systems (unless we all agree on

a standard kernel to compile - see below).

Unfortunately, there are no Linux-specific benchmarking tools, except

perhaps the Byte Linux Benchmarks which are a slightly modified

version of the Byte Unix Benchmarks dating back to May 1991 (Linux

mods by Jon Tombs, original authors Ben Smith, Rick Grehan and Tom

Yager).

There is a central Web site for the Byte Linux Benchmarks.

An improved, updated version of the Byte Unix Benchmarks was put

together by David C. Niemi. It is called UnixBench 4.01 to avoid

confusion with earlier versions. Here is what David wrote about his

mods:

"The original and slightly modified BYTE Unix benchmarks are

broken in quite a number of ways which make them an unusu-

ally unreliable indicator of system performance. I inten-

tionally made my "index" values look a lot different to

avoid confusion with the old benchmarks."

 

David has set up a majordomo mailing list for discussion of

benchmarking on Linux and competing OSs. Join with "subscribe bench"

sent in the body of a message to majordomo@wauug.erols.com

<mailto:majordomo@wauug.erols.com>. The Washington Area Unix User

Group is also in the process of setting up a Web site for Linux

benchmarks.

Also recently, Uwe F. Mayer, mayer@math.vanderbilt.edu
<mailto:mayer@math.vanderbilt.edu>, ported the BYTE BYTEmark suite to

Linux. This is a modern suite carefully put together by Rick Grehan at

BYTE Magazine to test the CPU, FPU and memory system performance of

modern microcomputer systems (these are strictly processor-performance

oriented benchmarks, no I/O or system performance is taken into

account).

Uwe has also put together a Web site with a database of test results

for his version of the Linux BYTEmark benchmarks.

While searching for synthetic benchmarks for Linux, you will notice

that sunsite.unc.edu carries few benchmarking tools. To test the

relative speed of X servers and graphics cards, the xbench-0.2 suite

by Claus Gittinger is available from sunsite.unc.edu, ftp.x.org and

other sites. XFree86.org refuses (wisely) to carry or recommend any

benchmarks.

The XFree86-benchmarks Survey is a Web site with a database of xbench
results.

For pure disk I/O throughput, the hdparm program (included with most

distributions, otherwise available from sunsite.unc.edu) will measure

transfer rates if called with the -t and -T switches.
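
For example, a minimal sketch of such a run (the device name /dev/hda
is only an example; substitute your own disk and run on an otherwise
idle system):

     # -T measures reads from the buffer cache, -t measures sustained
     # reads from the device itself; both need root privileges.
     hdparm -T -t /dev/hda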

There are many other tools freely available on the Internet to test

various performance aspects of your Linux box.

2.3. Links and references

 

The comp.benchmarks.faq by Dave Sill is the standard reference for

benchmarking. It is not Linux specific, but recommended reading for

anybody serious about benchmarking. It is available from a number of

FTP and web sites and lists 56 different benchmarks, with links to FTP

or Web sites that carry them. Some of the benchmarks listed are

commercial (SPEC for example), though.

I will not go through each one of the benchmarks mentioned in the

comp.benchmarks.faq, but there is at least one low-level suite which I

would like to comment on: the lmbench suite, by Larry McVoy. Quoting

David C. Niemi:

"Linus and David Miller use this a lot because it does some

useful low-level measurements and can also measure network

throughput and latency if you have 2 boxes to test with. But

it does not attempt to come up with anything like an overall

"figure of merit"..."

 

A rather complete FTP site for freely available benchmarks was put

together by Alfred Aburto. The Whetstone suite used in the LBT can be

found at this site.

There is a multipart FAQ by Eugene Miya that gets posted regularly to

comp.benchmarks; it is an excellent reference.

3. The Linux Benchmarking Toolkit (LBT)

 

I will propose a basic benchmarking toolkit for Linux. This is a

preliminary version of a comprehensive Linux Benchmarking Toolkit, to

be expanded and improved. Take it for what it's worth, i.e. as a

proposal. If you don't think it is a valid test suite, feel free to

email me your criticisms and I will be glad to make the changes and
improve it if I can. Before getting into an argument, however, read
this HOWTO and the references mentioned above: informed criticism is
welcome, empty criticism is not.

3.1. Rationale

 

This is just common sense:

1. It should not take a whole day to run. When it comes to comparative

benchmarking (various runs), nobody wants to spend days trying to

figure out the fastest setup for a given system. Ideally, the

entire benchmark set should take about 15 minutes to complete on an

average machine.

2. All source code for the software used must be freely available on

the Net, for obvious reasons.

3. Benchmarks should provide simple figures reflecting the measured

performance.

4. There should be a mix of synthetic benchmarks and application

benchmarks (with separate results, of course).

5. Each synthetic benchmark should exercise a particular subsystem to

its maximum capacity.

6. Results of synthetic benchmarks should not be averaged into a

single figure of merit (that defeats the whole idea behind

synthetic benchmarks, with considerable loss of information).

7. Applications benchmarks should consist of commonly executed tasks

on Linux systems.

3.2. Benchmark selection

 

I have selected five different benchmark suites, trying as much as

possible to avoid overlap in the tests:

1. Kernel 2.0.0 (default configuration) compilation using gcc.

2. Whetstone version 10/03/97 (latest version by Roy Longbottom).

3. xbench-0.2 (with fast execution parameters).

4. UnixBench benchmarks version 4.01 (partial results).

5. BYTE Magazine's BYTEmark benchmarks beta release 2 (partial

results).

For tests 4 and 5, "(partial results)" means that not all results

produced by these benchmarks are considered.

3.3. Test duration

 

1. Kernel 2.0.0 compilation: 5 - 30 minutes, depending on the real

performance of your system.

2. Whetstone: 100 seconds.

3. Xbench-0.2: < 1 hour.

4. UnixBench benchmarks version 4.01: approx. 15 minutes.

5. BYTE Magazine's BYTEmark benchmarks: approx. 10 minutes.

3.4. Comments

 

3.4.1. Kernel 2.0.0 compilation:

 

o What: it is the only application benchmark in the LBT.

o The code is widely available (i.e. I finally found some use for my

old Linux CD-ROMs).

o Most linuxers recompile the kernel quite often, so it is a

significant measure of overall performance.

o The kernel is large and gcc uses a large chunk of memory: this
attenuates the L2 cache size bias that affects smaller tests.

o It does frequent I/O to disk.

o Test procedure: get a pristine 2.0.0 source, compile with default

options (make config, press Enter repeatedly). The reported time

should be the time spent on compilation i.e. after you type make

zImage, not including make dep, make clean. Note that the default

target architecture for the kernel is the i386, so if compiled on

another architecture, gcc too should be set to cross-compile, with

i386 as the target architecture (see the sketch after this list).

o Results: compilation time in minutes and seconds (please don't

report fractions of seconds).
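
A minimal sketch of the procedure above, assuming the pristine 2.0.0
source tree has been unpacked in /usr/src/linux (the path is an
assumption, adjust it to your setup):

     # Configure with the default answers, then time only the zImage build.
     cd /usr/src/linux
     make config              # press Enter at every prompt for the defaults
     make dep && make clean   # preparation steps, not part of the reported time
     time make zImage         # report the elapsed ("real") time in min:sec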

3.4.2. Whetstone:

 

o What: measures pure floating point performance with a short, tight

loop. The source (in C) is quite readable and it is very easy to

see which floating-point operations are involved.

o Shortest test in the LBT :-).

o It's an "Old Classic" test: comparable figures are available, its

flaws and shortcomings are well known.

o Test procedure: the newest C source should be obtained from

Aburto's site. Compile and run in double precision mode. Specify

gcc and -O2 as compiler and compiler options, and define POSIX 1
to specify the machine type (see the sketch after this list).

o Results: a floating-point performance figure in MWIPS.
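
A sketch of the build and run, assuming the downloaded source file is
called whets.c (the file name is an assumption; check the comments at
the top of the source you actually get from Aburto's site):

     # Build with gcc -O2 and link the math library, then run interactively.
     gcc -O2 -o whetstone whets.c -lm
     ./whetstone    # at the prompts: double precision, compiler "gcc",
                    # options "-O2", machine type "POSIX 1"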

3.4.3. Xbench-0.2:

 

o What: measures X server performance.

o The xStones measure provided by xbench is a weighted average of

several tests indexed to an old Sun station with a single-bit-depth

display. Hmmm... it is questionable as a test of modern X servers,

but it's still the best tool I have found.

o Test procedure: compile with -O2. We specify a few options for a

shorter run: ./xbench -timegoal 3 >

results/name_of_your_linux_box.out. To get the xStones rating, we

must run an awk script; the simplest way is to type make

summary.ms. Check the summary.ms file: the xStone rating for your

system is in the last column of the line with your machine name

specified during the test.

o Results: an X performance figure in xStones.

o Note: this test, as it stands, is outdated. It should be re-coded.

3.4.4. UnixBench version 4.01:

 

o What: measures overall Unix performance. This test will exercise
the file I/O and kernel multitasking performance.

o I have discarded all arithmetic test results, keeping only the

system-related test results.

o Test procedure: make with -O2. Execute with ./Run -1 (run each test

once). You will find the results in the ./results/report file.

Calculate the geometric mean of the EXECL THROUGHPUT, FILECOPY 1,

2, 3, PIPE THROUGHPUT, PIPE-BASED CONTEXT SWITCHING, PROCESS

CREATION, SHELL SCRIPTS and SYSTEM CALL OVERHEAD indexes (see the
sketch after this list).

o Results: a system index.
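
A minimal sketch of the run and of the geometric mean calculation;
the figures piped into awk are placeholders only, to be replaced by
the nine indexes copied from ./results/report:

     # Run each test once; the report ends up in ./results/report.
     ./Run -1
     # Geometric mean of N values = Nth root of their product.
     echo 56.1 48.0 51.2 43.7 60.3 39.8 44.5 52.6 58.9 | \
     awk '{ p = 1; for (i = 1; i <= NF; i++) p *= $i; print "system index:", p ^ (1.0 / NF) }'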

3.4.5. BYTE Magazine's BYTEmark benchmarks:

 

o What: provides a good measure of CPU performance. Here is an

excerpt from the documentation: "These benchmarks are meant to

expose the theoretical upper limit of the CPU, FPU, and memory

architecture of a system. They cannot measure video, disk, or

network throughput (those are the domains of a different set of

benchmarks). You should, therefore, use the results of these tests

as part, not all, of any evaluation of a system."

o I have discarded the FPU test results since the Whetstone test is

just as representative of FPU performance.

o I have split the integer tests in two groups: those more

representative of memory-cache-CPU performance and the CPU integer

tests.

o Test procedure: make with -O2. Run the test with ./nbench >

myresults.dat or similar. Then, from myresults.dat, calculate the
geometric mean of the STRING SORT, ASSIGNMENT and BITFIELD test

indexes; this is the memory index; calculate the geometric mean of

NUMERIC SORT, IDEA, HUFFMAN and FP EMULATION test indexes; this is

the integer index (see the sketch after this list).

o Results: a memory index and an integer index calculated as

explained above.
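
A similar sketch for the two BYTEmark indexes, again with placeholder
figures standing in for the values read from myresults.dat:

     # Memory index: geometric mean of STRING SORT, ASSIGNMENT and BITFIELD.
     echo 1.8 2.4 3.1 | \
     awk '{ p = 1; for (i = 1; i <= NF; i++) p *= $i; print "memory index:", p ^ (1.0 / NF) }'
     # Integer index: geometric mean of NUMERIC SORT, IDEA, HUFFMAN and FP EMULATION.
     echo 1.2 1.6 1.4 1.9 | \
     awk '{ p = 1; for (i = 1; i <= NF; i++) p *= $i; print "integer index:", p ^ (1.0 / NF) }'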

3.5. Possible improvements

 

The ideal benchmark suite would run in a few minutes, with synthetic

benchmarks testing every subsystem separately and applications

benchmarks providing results for different applications. It would also

automatically generate a complete report and eventually email the

report to a central database on the Web.

We are not really interested in portability here, but it should at

least run on all recent (> 2.0.0) versions and flavours (i386, Alpha,

Sparc...) of Linux.

If anybody has any idea about benchmarking network performance in a

simple, easy and reliable way, with a short (less than 30 minutes to

setup and run) test, please contact me.

3.6. LBT Report Form

 

Besides the tests, the benchmarking procedure would not be complete

without a form describing the setup, so here it is (following the

guidelines from comp.benchmarks.faq):

______________________________________________________________________

LINUX BENCHMARKING TOOLKIT REPORT FORM

______________________________________________________________________

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

______________________________________________________________________

CPU

==

Vendor:

Model:

Core clock:

Motherboard vendor:

Mbd. model:

Mbd. chipset:

Bus type:

Bus clock:

Cache total:

Cache type/speed:

SMP (number of processors):

______________________________________________________________________

 

 

______________________________________________________________________

RAM

====

Total:

Type:

Speed:

______________________________________________________________________

 

 

______________________________________________________________________

Disk

====

Vendor:

Model:

Size:

Interface:

Driver/Settings:

______________________________________________________________________

 

 

______________________________________________________________________

Video board

===========

Vendor:

Model:

Bus:

Video RAM type:

Video RAM total:

X server vendor:

X server version:

X server chipset choice:

Resolution/vert. refresh rate:

Color depth:

______________________________________________________________________

 

 

______________________________________________________________________

Kernel

=====

Version:

Swap size:

______________________________________________________________________

 

 

______________________________________________________________________

gcc

===

Version:

Options:

libc version:

______________________________________________________________________

 

 

______________________________________________________________________

Test notes

==========

______________________________________________________________________

 

 

______________________________________________________________________

RESULTS

========

Linux kernel 2.0.0 Compilation Time: (minutes and seconds)

Whetstones: results are in MWIPS.

Xbench: results are in xstones.

Unixbench Benchmarks 4.01 system INDEX:

BYTEmark integer INDEX:

BYTEmark memory INDEX:

______________________________________________________________________

 

 

______________________________________________________________________

Comments*

=========

* This field is included for possible interpretations of the results, and as

such, it is optional. It could be the most significant part of your report,

though, especially if you are doing comparative benchmarking.

______________________________________________________________________

 

 

3.7. Network performance tests

 

Testing network performance is a challenging task since it involves at

least two machines, a server and a client machine, hence twice the

time to set up and many more variables to control, etc... On an
Ethernet network, I guess your best bet would be the ttcp package. (to

be expanded)
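
As a starting point, here is one possible way to run ttcp between two
machines; the host name is an example and the exact option set may
vary between ttcp builds, so check the man page of the version you
install:

     # On the receiving machine (the "server"):
     ttcp -r -s
     # On the transmitting machine (the "client"), pointing at the receiver:
     ttcp -t -s receiver.example.com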

3.8. SMP tests

 

SMP tests are another challenge, and any benchmark specifically

designed for SMP testing will have a hard time proving itself valid in

real-life settings, since algorithms that can take advantage of SMP

are hard to come by. It seems later versions of the Linux kernel (>

2.1.30 or around that) will do "fine-grained" multiprocessing, but I

have no more information than that for the moment.

According to David Niemi, "... shell8 [part of the UnixBench 4.01
benchmarks] does a good job at comparing similar hardware/OS in SMP
and UP modes."

 

 

 

4. Example run and results

 

The LBT was run on my home machine, a Pentium-class Linux box that I

put together myself and that I used to write this HOWTO. Here is the

LBT Report Form for this system:

LINUX BENCHMARKING TOOLKIT REPORT FORM

 

 

CPU

 

 

==

 

 

Vendor: Cyrix/IBM

 

 

Model: 6x86L P166+

 

 

Core clock: 133 MHz

 

 

Motherboard vendor: Elite Computer Systems (ECS)

 

 

Mbd. model: P5VX-Be

 

 

Mbd. chipset: Intel VX

 

 

Bus type: PCI

 

 

Bus clock: 33 MHz

 

 

Cache total: 256 KB

 

 

Cache type/speed: Pipeline burst 6 ns

 

 

SMP (number of processors): 1

 

 

RAM

 

====

 

 

Total: 32 MB

 

 

Type: EDO SIMMs

 

 

Speed: 60 ns

 

 

Disk

 

 

====

 

 

Vendor: IBM

 

 

Model: IBM-DAQA-33240

 

 

Size: 3.2 GB

 

 

Interface: EIDE

 

 

Driver/Settings: Bus Master DMA mode 2

 

 

Video board

 

 

===========

 

 

Vendor: Generic S3

 

 

Model: Trio64-V2

 

 

Bus: PCI

 

 

Video RAM type: EDO DRAM

Video RAM total: 2 MB

 

 

X server vendor: XFree86

 

 

X server version: 3.3

 

 

X server chipset choice: S3 accelerated

 

 

Resolution/vert. refresh rate: 1152x864 @ 70 Hz

 

 

Color depth: 16 bits

 

 

Kernel

 

 

=====

 

 

Version: 2.0.29

 

 

Swap size: 64 MB

 

 

gcc

 

 

===

 

 

Version: 2.7.2.1

 

 

Options: -O2

 

 

libc version: 5.4.23

 

 

Test notes

 

 

==========

Very light load. The above tests were run with some of the special

Cyrix/IBM 6x86 features enabled with the setx86 program: fast ADS,

fast IORT, Enable DTE, fast LOOP, fast Lin. VidMem.

 

 

RESULTS

 

 

========

 

 

Linux kernel 2.0.0 Compilation Time: 7m12s

 

 

Whetstones: 38.169 MWIPS.

 

 

Xbench: 97243 xStones.

 

 

BYTE Unix Benchmarks 4.01 system INDEX: 58.43

 

 

BYTEmark integer INDEX: 1.50

 

 

BYTEmark memory INDEX: 2.50

 

 

Comments

 

 

=========

 

 

This is a very stable system with homogeneous performance, ideal

for home use and/or Linux development. I will report results

with a 6x86MX processor as soon as I can get my hands on one!

 

 

5. Pitfalls and caveats of benchmarking

 

After putting together this HOWTO I began to understand why the words

"pitfalls" and "caveats" are so often associated with benchmarking...

5.1. Comparing apples and oranges

 

Or should I say Apples and PCs ? This is so obvious and such an old

dispute that I won't go into any details. I doubt the time it takes to

load Word on a Mac compared to an average Pentium is a real measure of

anything. Likewise booting Linux and Windows NT, etc... Try as much as

possible to compare identical machines with a single modification.

5.2. Incomplete information

 

A single example will illustrate this very common mistake. One often

reads in comp.os.linux.hardware the following or similar statement: "I

just plugged in processor XYZ running at nnn MHz and now compiling the

linux kernel only takes i minutes" (adjust XYZ, nnn and i as

required). This is irritating, because no other information is given,

i.e. we don't even know the amount of RAM, size of swap, other tasks

running simultaneously, kernel version, modules selected, hard disk

type, gcc version, etc... I recommend you use the LBT Report Form,

which at least provides a standard information framework.

5.3. Proprietary hardware/software

 

A well-known processor manufacturer once published results of

benchmarks produced by a special, customized version of gcc. Ethical

considerations apart, those results were meaningless, since 100% of

the Linux community would go on using the standard version of gcc. The

same goes for proprietary hardware. Benchmarking is much more useful

when it deals with off-the-shelf hardware and free (in the GNU/GPL

sense) software.

5.4. Relevance

 

We are talking Linux, right ? So we should forget about benchmarks

produced on other operating systems (this is a special case of the

"Comparing apples and oranges" pitfall above). Also, if one is going

to benchmark Web server performance, do not quote FPU performance and

other irrelevant information. In such cases, less is more. Also, you

do not need to mention the age of your cat, your mood while

benchmarking, etc..

6. FAQ

 

Q1.

Is there any single figure of merit for Linux systems ?

A: No, thankfully nobody has yet come up with a Lhinuxstone (tm)

measurement. And if there was one, it would not make much sense:

Linux systems are used for many different tasks, from heavily

loaded Web servers to graphics workstations for individual use.

No single figure of merit can describe the performance of a

Linux system under such different situations.

Q2.

Then, how about a dozen figures summarizing the performance of

diverse Linux systems ?

A: That would be the ideal situation. I would like to see that come

true. Anybody volunteers for a Linux Benchmarking Project ? With

a Web site and an on-line, complete, well-designed reports

database ?

Q3.

... BogoMips ... ?

A: BogoMips has nothing to do with the performance of your system.

Check the BogoMips Mini-HOWTO.

Q4.

What is the "best" benchmark for Linux ?

A: It all depends on which performance aspect of a Linux system one

wants to measure. There are different benchmarks to measure the

network (Ethernet sustained transfer rates), file server (NFS),

disk I/O, FPU, integer, graphics, 3D, processor-memory

bandwidth, CAD performance, transaction time, SQL performance,

Web server performance, real-time performance, CD-ROM

performance, Quake performance (!), etc ... AFAIK no benchmark

suite exists for Linux that supports all these tests.

Q5.

What is the fastest processor under Linux ?

A: Fastest at what task ? If one is heavily number-crunching

oriented, a very high clock rate Alpha (600 MHz and going)

should be faster than anything else, since Alphas have been

designed for that kind of performance. If, on the other hand,

one wants to put together a very fast news server, it is

probable that the choice of a fast hard disk subsystem and lots

of RAM will result in higher performance improvements than a

change of processor, for the same amount of $.

Q6.

Let me rephrase the last question, then: is there a processor

that is fastest for general purpose applications ?

A: This is a tricky question but it takes a very simple answer: NO.

One can always design a faster system even for general purpose

applications, independent of the processor. Usually, all other

things being equal, higher clock rates will result in higher

performance systems (and more headaches too). Taking out an old

100 MHz Pentium from an (usually not) upgradable motherboard,

and plugging in the 200 MHz version, one should feel the extra

"hummph". Of course, with only 16 MBytes of RAM, the same

investment would have been more wisely spent on extra SIMMs...

Q7.

So clock rates influence the performance of a system ?

A: For most tasks except for NOP empty loops (BTW these get removed

by modern optimizing compilers), an increase in clock rate will

not give you a linear increase in performance. Very small

processor intensive programs that will fit entirely in the

primary cache inside the processor (the L1 cache, usually 8 or

16 K) will have a performance increase equivalent to the clock

rate increase, but most "true" programs are much larger than

that, have loops that do not fit in the L1 cache, share the L2

(external) cache with other processes, depend on external

components and will give much smaller performance increases.

This is because the L1 cache runs at the same clock rate as the

processor, whereas most L2 caches and all other subsystems

(DRAM, for example) will run asynchronously at lower clock

rates.

Q8.

OK, then, one last question on that matter: which is the

processor with the best price/performance ratio for general

purpose Linux use ?

A: Defining "general purpose Linux use" is not an easy thing ! For

any particular application, there is always a processor with THE

BEST price/performance ratio at any given time, but it changes

rather frequently as manufacturers release new processors, so

answering "Processor XYZ running at n MHz" would be a snapshot

answer. However, the price of the processor is insignificant

when compared to the price of the whole system one will be

putting together. So, really, the question should be how can one

maximize the price/performance ratio for a given system ? And

the answer to that question depends heavily on the minimum

performance requirements and/or maximum cost established for the

configuration being considered. Sometimes, off-the-shelf

hardware will not meet minimum performance requirements and

expensive RISC systems will be the only alternative. For home

use, I recommend a balanced, homogeneous system for overall

performance (now go figure what I mean by balanced and

homogeneous :-); the choice of a processor is an important

decision, but no more than choosing hard disk type and

capacity, amount of RAM, video card, etc...

Q9.

What is a "significant" increase in performance ?

A: I would say that anything under 1% is not significant (could be

described as "marginal"). We, humans, will hardly perceive the

difference between two systems with a 5 % difference in response

time. Of course some hard-core benchmarkers are not humans and

will tell you that, when comparing systems with 65.9 and 66.5

performance indexes, the latter is "definitely faster".

Q10.

How do I obtain "significant" increases in performance at the

lowest cost ?

A: Since most source code is available for Linux, careful

examination and algorithmic redesign of key subroutines could

yield order-of-magnitude increases in performance in some cases.

If one is dealing with a commercial project and does not wish to

delve deeply into C source code, a Linux consultant should be

called in. See the Consultants-HOWTO.

 

7. Copyright, acknowledgments and miscellaneous

 

7.1. How this document was produced

 

The first step was reading section 4 "Writing and submitting a HOWTO"

of the HOWTO Index by Tim Bynum.

I knew absolutely nothing about SGML or LaTeX, but was tempted to use

an automated documentation generation package after reading the

various comments about SGML-Tools. However, inserting tags manually in

a document reminds me of the days I hand-assembled a 512-byte monitor
program for a now defunct 8-bit microprocessor, so I got hold of the
LyX sources, compiled them, and used LyX's LinuxDoc mode. Highly

recommended combination: LyX and SGML-Tools.

7.2. Copyright

 

The Linux Benchmarking HOWTO is copyright (C) 1997 by André D. Balsa.

Linux HOWTO documents may be reproduced and distributed in whole or in

part, in any medium physical or electronic, as long as this copyright

notice is retained on all copies. Commercial redistribution is allowed

and encouraged; however, the author would like to be notified of any

such distributions.

All translations, derivative works, or aggregate works incorporating

any Linux HOWTO documents must be covered under this copyright notice.

That is, you may not produce a derivative work from a HOWTO and impose

additional restrictions on its distribution. Exceptions to these rules

may be granted under certain conditions; please contact the Linux

HOWTO coordinator at the address given below.

In short, we wish to promote dissemination of this information through

as many channels as possible. However, we do wish to retain copyright

on the HOWTO documents, and would like to be notified of any plans to

redistribute the HOWTOs.

If you have questions, please contact Tim Bynum, the Linux HOWTO

coordinator, at linux-howto@sunsite.unc.edu via email.

7.3. New versions of this document

 

New versions of the Linux Benchmarking-HOWTO will be placed on

sunsite.unc.edu and mirror sites. There are other formats, such as
PostScript and DVI versions in the other-formats directory. The Linux

Benchmarking-HOWTO is also available for WWW clients such as Grail, a

Web browser written in Python. It will also be posted regularly to

comp.os.linux.answers.

7.4. Feedback

 

Suggestions, corrections, additions wanted. Contributors wanted and

acknowledged. Flames not wanted.

I can always be reached at andrewbalsa@usa.net.

7.5. Acknowledgments

 

David Niemi, the author of the Unixbench suite, has proved to be an

endless source of information and (valid) criticism.

I also want to thank Greg Hankins, one of the main contributors to the

SGML-tools package, Linus Torvalds and the entire Linux community.

This HOWTO is my way of giving back.

7.6. Disclaimer

 

Your mileage may, and will, vary. Be aware that benchmarking is a

touchy subject and a great time-and-energy consuming activity.

7.7. Trademarks

 

Pentium and Windows NT are trademarks of Intel and Microsoft

Corporations respectively.

BYTE and BYTEmark are trademarks of McGraw-Hill, Inc.

Cyrix and 6x86 are trademarks of Cyrix Corporation.

Linux is not a trademark, hopefully never will be.

