RAMspeed is a free, open source command line utility for measuring the cache and memory performance of computer systems. It has evolved from v1.00, released in November 2002 as a personal amusement of mine and containing about 100 lines of C code that produced one simple benchmark, to the latest versions, written mostly in assembly language. Three hardware platforms are supported (i386, amd64, alpha), along with several of the most popular UNIX-like operating systems. A quite popular DOS (and Windows) version exists as well. The software now offers 18 cache and memory benchmarks for i386 and amd64 machines, but only 6 for alpha ones.
So far, RAMspeed has been tested to compile and run with assembly level optimisations on:
· Linux (i386, amd64, alpha)
· FreeBSD (i386, amd64, alpha)
· NetBSD (i386, amd64, alpha)
· Digital UNIX (alpha)
The benchmarking algorithms implemented in RAMspeed are not explained in depth here; see the supplied documentation and the source code for details. In general, there are *mark benchmarks such as INTmark, FLOATmark, MMXmark and SSEmark. They operate on linear (sequential) data streams passed through the ALU, FPU, MMX and SSE units respectively. They allocate a certain amount of memory and then either write to or read from it in contiguous blocks whose sizes are powers of 2, from 1 KB up to the array boundary. This simple algorithm shows how fast both the cache and memory subsystems are.
There are also *mem benchmarks such as INTmem, FLOATmem, MMXmem and SSEmem. These are meant to illustrate actual read/write memory performance. Each of them includes four subtests called Copy, Scale, Add and Triad. They are synthetic simulations, but they correlate with many real world applications. You may have seen them already in STREAM and SiSoft Sandra. All *mem benchmarks support the BatchRun mode, which enables high-precision memory performance measurement through multiple passes, with averages calculated per pass and per run.
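For readers who have not met the four subtests before, here is a minimal C sketch of the Copy, Scale, Add and Triad kernels as the STREAM benchmark defines them. It is not RAMspeed's own code, which is written mostly in assembly. The function names and the `bandwidth_mb_s` helper are hypothetical, and the helper's choice of 1 MB = 2^20 bytes is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Copy: c[i] = a[i]  (one array read, one array written) */
void stream_copy(double *c, const double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i];
}

/* Scale: b[i] = q * c[i]  (one array read, one array written) */
void stream_scale(double *b, const double *c, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = q * c[i];
}

/* Add: c[i] = a[i] + b[i]  (two arrays read, one array written) */
void stream_add(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Triad: a[i] = b[i] + q * c[i]  (two arrays read, one array written) */
void stream_triad(double *a, const double *b, const double *c, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}

/* Hypothetical helper: bandwidth in MB/s for a kernel that touches
   `arrays` arrays of n doubles in `seconds` seconds.
   Assumption: 1 MB = 2^20 bytes. */
double bandwidth_mb_s(int arrays, size_t n, double seconds)
{
    assert(arrays > 0 && seconds > 0.0);
    double bytes = (double)arrays * (double)n * (double)sizeof(double);
    return bytes / (1024.0 * 1024.0) / seconds;
}
```

In the STREAM convention, Copy and Scale count two arrays of traffic per pass, while Add and Triad count three. A caller would time each kernel over arrays much larger than the last-level cache and pass the elapsed time to the helper.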
There are also non-temporal versions of the MMX and SSE benchmarks. They have been coded with special instructions to minimise cache pollution on memory reads and to eliminate it completely on memory writes. In addition, they use a built-in aggressive data prefetching algorithm, though the actual behaviour depends heavily on the hardware. In fact, non-temporal code allows for significant performance improvements over the regular MMX and SSE benchmarks. In some cases, non-temporal MMXmark and SSEmark can deliver almost 100% of theoretical bandwidth while reading.
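The idea behind non-temporal code can be sketched in C with SSE2 intrinsics. This is an illustration only, not RAMspeed's assembly. The function name `copy_nontemporal` and the 512-byte prefetch distance are assumptions made for the example, and the code falls back to `memcpy` when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _mm_prefetch and _mm_sfence */
#endif

/* Hypothetical helper, not RAMspeed code: copy n bytes from src to dst.
   dst must be 16-byte aligned and n must be a multiple of 16. */
void copy_nontemporal(void *dst, const void *src, size_t n)
{
    assert(((uintptr_t)dst % 16) == 0 && n % 16 == 0);
#if defined(__SSE2__)
    const char *s = (const char *)src;
    char *d = (char *)dst;
    const size_t ahead = 512;  /* prefetch distance in bytes: an arbitrary guess */

    for (size_t i = 0; i < n; i += 16) {
        /* Once per 64 bytes, hint that upcoming source data is
           non-temporal, to limit cache pollution on reads. */
        if (i % 64 == 0 && i + ahead < n)
            _mm_prefetch(s + i + ahead, _MM_HINT_NTA);

        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        /* Streaming store: the data is written around the cache hierarchy. */
        _mm_stream_si128((__m128i *)(d + i), v);
    }
    _mm_sfence();  /* order the streaming stores before later stores */
#else
    memcpy(dst, src, n);  /* portable fallback without non-temporal hints */
#endif
}
```

The best prefetch distance, and whether the hints help at all, varies between processors, which matches the caveat above about hardware dependence.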
There is also RAMspeed/SMP for multiprocessor machines running UNIX-like operating systems. More precisely, there are two distinct branches: 2.x.x supports POSIX threads, while 3.x.x uses System V shared memory for IPC (Inter-Process Communication) and operates with multiple processes. RAMspeed/SMP v2.x.x is no longer developed due to numerous compatibility and performance issues.
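The multi-process approach of the 3.x.x branch can be illustrated with a small C sketch: a parent creates a System V shared memory segment, forks several workers, and collects one result from each. This shows the general pattern only and is not taken from RAMspeed/SMP. The function `run_workers` and the placeholder values written by the workers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork nproc workers that each store one result in a
   System V shared memory segment. Returns the sum of the results, or -1
   on error. */
long run_workers(int nproc)
{
    assert(nproc > 0);

    int shmid = shmget(IPC_PRIVATE, (size_t)nproc * sizeof(long),
                       IPC_CREAT | 0600);
    if (shmid == -1)
        return -1;

    long *slots = shmat(shmid, NULL, 0);
    if (slots == (void *)-1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }

    int started = 0;
    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            /* Child: the attached segment is inherited across fork().
               A real benchmark would run its test here and store a
               measured value; this is a placeholder. */
            slots[i] = (long)(i + 1) * 100;
            _exit(0);
        }
        started++;
    }

    /* Parent: wait for every worker that was started. */
    for (int i = 0; i < started; i++)
        wait(NULL);

    long total = -1;
    if (started == nproc) {
        total = 0;
        for (int i = 0; i < nproc; i++)
            total += slots[i];
    }

    shmdt(slots);
    shmctl(shmid, IPC_RMID, NULL);  /* remove the segment */
    return total;
}
```

With four workers the placeholder values are 100, 200, 300 and 400, so `run_workers(4)` returns 1000. Separate processes avoid depending on a particular threads implementation, at the cost of explicit shared memory setup and cleanup.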
Download: RAMspeed 2.6.0