'Mmap3' Exerciser

The mmap3 exerciser is one of the programs we use to test the system. It maps a region of its address space backed by the NMS system, then scans the region from beginning to end, initializing each page with address-dependent data. A second, read-only scan verifies the data written. Finally, it generates random addresses uniformly distributed over the region; at each address it reads and verifies the data, then writes the same data back.
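The three phases can be sketched roughly as follows. This is an illustrative sketch only, not the mmap3 source: it uses an anonymous mapping in place of an NMS-backed one, and the names `pattern` and `exercise`, the 8-byte word written at the start of each page, and the fixed random seed are all assumptions made for the example.

```python
import mmap
import random
import struct

PAGE_SIZE = 8192  # 8K pages, as on the Alpha client described in this page


def pattern(offset):
    """Address-dependent data (assumed form): the offset itself as 8 bytes."""
    return struct.pack("<Q", offset)


def exercise(region_bytes, random_probes, seed=0):
    """Run the three phases over an anonymous mapping; return the error count."""
    # Anonymous memory stands in for the NMS-backed region (assumption).
    region = mmap.mmap(-1, region_bytes)
    pages = region_bytes // PAGE_SIZE
    errors = 0

    # Phase 1: sequential scan that touches and initializes every page.
    for off in range(0, region_bytes, PAGE_SIZE):
        region[off:off + 8] = pattern(off)

    # Phase 2: sequential read-only scan that verifies what was written.
    for off in range(0, region_bytes, PAGE_SIZE):
        if region[off:off + 8] != pattern(off):
            errors += 1

    # Phase 3: uniformly random pages; read, verify, write the same data back.
    rng = random.Random(seed)
    for _ in range(random_probes):
        off = rng.randrange(pages) * PAGE_SIZE
        data = region[off:off + 8]
        if data != pattern(off):
            errors += 1
        region[off:off + 8] = data

    region.close()
    return errors
```

Calling `exercise(64 * PAGE_SIZE, 1000)` runs all three phases over a small region and should report zero errors on a healthy system.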

The first graph below shows the page faults taken during an approximately four-minute run of mmap3. An Alpha with memory limited by the OS to 256MB was used as the client. Two Compaq PCs were used as servers. The amount of memory provided by each server for the NMS system was limited to about 700MB due to an unresolved bug that causes the server to crash if more than 700MB is used. The mmap3 client mapped a 1200MB region of NMS-backed address space in this run. Each 8K Alpha page is split into 4K halves for transmission between client and server, in order to comply with MTU limitations of the network.
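The sizes quoted above can be put side by side with a little arithmetic. The sketch below assumes that "MB" and "K" are binary units (2^20 and 2^10 bytes) and that essentially all of the client's 256MB is available for caching pages, which overstates the real figure somewhat; `split_page` is a hypothetical helper, not the actual transmission code.

```python
KB = 1024
MB = 1024 * 1024

PAGE_SIZE = 8 * KB        # Alpha page size
HALF_SIZE = 4 * KB        # unit sent between client and server
REGION = 1200 * MB        # NMS-backed region mapped by mmap3
CLIENT_MEMORY = 256 * MB  # OS-imposed limit on the client
SERVER_LIMIT = 700 * MB   # per-server limit imposed by the crash bug
SERVERS = 2

region_pages = REGION // PAGE_SIZE           # pages in the mapped region
resident_pages = CLIENT_MEMORY // PAGE_SIZE  # upper bound on pages cached locally
server_capacity = SERVERS * SERVER_LIMIT     # total space offered by the servers


def split_page(page):
    """Split one 8K page into the two 4K halves that are sent separately."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected exactly one page")
    return page[:HALF_SIZE], page[HALF_SIZE:]


print(region_pages, resident_pages, server_capacity >= REGION)
```

Under these assumptions the region is 153600 pages, of which at most 32768 fit in client memory, and the two servers together offer 1400MB, enough to hold the whole 1200MB region.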

In the first segment of the run, which lasts about 10 seconds, the system handles page faults by allocating zero-filled pages of memory. At the end of this segment, the available memory on the system has been exhausted, and paging begins. There is a gap of several seconds with no activity while the system attempts to identify available NMS servers. Once the two available servers have been contacted, the initialization phase continues until a total of about 100 seconds have elapsed. During this phase, page faults are still handled by allocating zero-filled pages, but because memory is short, each allocation must be matched by pageout activity to the servers, which results in a somewhat slower rate of execution.

A little more than 100 seconds into the run, the initialization is complete and verification begins. In the first segment of verification, the system handles page faults by paging in from servers, and at the same time pages are freed by paging out to servers. After slightly more than 200MB has been verified, all the dirty pages have been cleaned, and only paging in is required. Consequently, the execution rate increases somewhat. At just over 200 seconds into the run, verification is complete, and the random exercising begins. For the first 20 seconds or so of this phase, the effect of the last portion of the address space, which remains cached at the end of the verification phase, can be seen.

The next graph shows the paging rate during the run. Initially, zero-filled pages are allocated at a rate of about 3700 pages per second. When pageout starts, it occurs at a rate of about 1400 pages per second. Allocation of zero-filled pages is limited during this interval to the rate at which pages are cleaned, so this also occurs at 1400 pages per second. During the initial part of the verification phase, pagein and pageout each occur at a rate of about 1000 pages per second. Once all dirty pages are cleaned, pagein occurs at about 1800 pages per second. Finally, once the random exercising begins, after some transient effects, paging appears to settle down to about 1300 pages per second in, and 1300 pages per second out.
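As a rough cross-check, the rates above can be combined with the region size to estimate how long each phase should take. The sketch assumes binary megabytes, that about 256MB of pages are zero-filled before paging starts, that exactly 200MB is verified before the dirty pages run out, and that every page of the region faults once per scan; none of these are measured values.

```python
MB = 1024 * 1024
PAGE_SIZE = 8 * 1024

region_pages = 1200 * MB // PAGE_SIZE
resident_pages = 256 * MB // PAGE_SIZE  # assumed: all client memory holds pages

# Rates quoted in the text, in pages per second.
ZERO_FILL_RATE = 3700
INIT_PAGEOUT_RATE = 1400
VERIFY_MIXED_RATE = 1000
VERIFY_PAGEIN_RATE = 1800

# Initialization: fast zero-fill until memory is full, then pageout-limited.
zero_fill_s = resident_pages / ZERO_FILL_RATE
init_paging_s = (region_pages - resident_pages) / INIT_PAGEOUT_RATE
init_total_s = zero_fill_s + init_paging_s

# Verification: mixed pagein/pageout for ~200MB, then pagein only.
mixed_pages = 200 * MB // PAGE_SIZE
verify_mixed_s = mixed_pages / VERIFY_MIXED_RATE
verify_clean_s = (region_pages - mixed_pages) / VERIFY_PAGEIN_RATE
verify_total_s = verify_mixed_s + verify_clean_s


def mb_per_second(pages_per_second):
    """Convert a paging rate into data moved per second, in MB."""
    return pages_per_second * PAGE_SIZE / MB


print(round(init_total_s, 1), round(verify_total_s, 1))
print(round(mb_per_second(VERIFY_PAGEIN_RATE), 1))
```

The estimate comes to roughly 95 seconds for initialization (before adding the several-second gap for server discovery) and roughly 97 seconds for verification, which is consistent with the phase boundaries at about 100 and 200 seconds described earlier.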

The third graph shows page fault service latencies throughout the run. Both minimum and average latencies are shown for each second of the run. Allocation of new zero-filled pages occurs at a minimum latency of about 30 microseconds, with an average of about 50 microseconds. Paging in from a server incurs a minimum latency of about 230 microseconds and an average latency of about 300 microseconds.

The difference between the minimum and average pagein latency is caused by an unresolved bug that regularly causes the first of the two 4K half pages sent from server to client to be lost. A workaround has been implemented: the client recognizes the situation when it receives the second of the two halves before receiving the first, and it immediately requests a retransmission of both halves. With high probability the retransmission succeeds, but the extra time taken causes the average pagein latency to be somewhat higher than the minimum.

The spikes in the average pagein latency occur when a requested page fails to arrive completely from a server within the 800 microsecond window during which the client spins in the kernel waiting for the page. After 800 microseconds, the spin loop terminates and the waiting process is put to sleep. Once a process is put to sleep, it can be tens of milliseconds before it runs again. I do not know for sure why this would occur on a system without other compute-bound activities, but it is possibly a bug in the Linux scheduler. The spin loop lets the client avoid this unexplained scheduling latency nearly all the time.
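The retransmission workaround and the spin-then-sleep wait can be illustrated with a small user-level sketch. The real logic lives in the kernel; the names `PageAssembler`, `wait_for_page`, `request_retransmit` and `sleep_until_complete` are hypothetical, and discarding the early second half before retransmission is an assumption based on the description that both halves are requested again.

```python
import time

SPIN_WINDOW_S = 800e-6  # the 800 microsecond spin window described above


class PageAssembler:
    """Collects the two 4K halves of one page (hypothetical helper)."""

    def __init__(self, request_retransmit):
        self.halves = [None, None]
        self.request_retransmit = request_retransmit

    def receive(self, index, data):
        """Accept half 0 or 1; return True once both halves are present."""
        if index == 1 and self.halves[0] is None:
            # The second half arrived before the first: treat the first as
            # lost and immediately ask the server to send both halves again.
            self.halves = [None, None]
            self.request_retransmit()
            return False
        self.halves[index] = data
        return all(half is not None for half in self.halves)


def wait_for_page(is_complete, sleep_until_complete, clock=time.perf_counter):
    """Poll for up to SPIN_WINDOW_S, then fall back to sleeping."""
    deadline = clock() + SPIN_WINDOW_S
    while clock() < deadline:
        if is_complete():
            return "spin"  # page arrived within the spin window
    # Slow path: giving up the CPU risks a long, unexplained wakeup delay.
    sleep_until_complete()
    return "sleep"
```

The design choice here is to spend up to 800 microseconds of CPU time polling in exchange for avoiding a wakeup delay that can reach tens of milliseconds. Because the common-case pagein completes in about 300 microseconds, the fast path is taken nearly always.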


Last modified: Sat Jun 7 06:37:15 EDT 2003