Performance of 'Visible Human' Application

The "Visible Human" application is a raycasting renderer for one of the 3D datasets in the National Library of Medicine's Visible Human program. The dataset was originally created by slicing the body of a male cadaver into 1mm-wide slices, and then digitizing an image of each slice at 0.33mm resolution and 24-bit color. The result is a 2048x1216x1878 full-color three-dimensional representation of the human body. Our viewer loads a user-specified subset of the 1878 slices and then computes a projection of the data onto a viewing window for display to the user. The view computed is determined by the position and orientation of an "eye", which can be changed by the user. The projection is computed by casting a ray from the eye through the model to each pixel of the viewing window. The color information accumulated as each ray traverses voxels of the model becomes the color of the corresponding pixel of the viewing window. The tracing of each ray is pruned outside the volume actually occupied by model data.
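The pruning and accumulation steps described above can be illustrated with a minimal sketch. This is not the viewer's actual code: the function names, the `volume[z][y][x]` layout of `(r, g, b)` tuples, the fixed step size, and the choice to average the sampled colors are all assumptions made for illustration, since the text does not specify how colors are accumulated. The clipping step uses the standard "slab" test for intersecting a ray with an axis-aligned box.

```python
def clip_ray_to_box(origin, direction, box_min, box_max):
    """Return (t_enter, t_exit) such that origin + t*direction lies inside
    the box for t_enter <= t <= t_exit, or None if the ray misses the box."""
    t_enter, t_exit = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of faces: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter = max(t_enter, t0)
        t_exit = min(t_exit, t1)
        if t_enter > t_exit:
            return None
    return t_enter, t_exit


def cast_ray(volume, origin, direction, step=1.0):
    """Average the voxel colors sampled along one ray (assumed accumulation rule).

    volume[z][y][x] is an (r, g, b) tuple; origin and direction are (x, y, z).
    """
    if not any(direction):
        raise ValueError("direction must be non-zero")
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    # Prune the ray to the part that lies inside the volume's bounding box.
    span = clip_ray_to_box(origin, direction, (0.0, 0.0, 0.0), (nx, ny, nz))
    if span is None:
        return (0, 0, 0)  # ray never touches the data: background color
    t, t_exit = span
    t += step / 2.0  # sample at the middle of each step
    total, count = [0, 0, 0], 0
    while t < t_exit:
        x, y, z = (int(o + t * d) for o, d in zip(origin, direction))
        if 0 <= x < nx and 0 <= y < ny and 0 <= z < nz:
            for channel in range(3):
                total[channel] += volume[z][y][x][channel]
            count += 1
        t += step
    if count == 0:
        return (0, 0, 0)
    return tuple(value // count for value in total)
```

In a full renderer, `cast_ray` would be called once per pixel of the viewing window, with the direction running from the eye through that pixel.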

We configured the "Visible Human" application to run, in batch mode, a fixed series of raycasting rendering operations. This workload was then run under a variety of conditions, both with the network memory system and without it. The computation was performed on a 500MHz Alpha 21164 processor equipped with 1GB RAM, running the Linux 2.2.17 kernel. The boot flags given to Linux were adjusted to software-limit the amount of system RAM to various values less than 1GB. Roughly 60MB of RAM was consumed by the OS, leaving the rest available for user applications. The dataset was loaded into a 3-D volume buffer consisting of 2048x1216x130 = 323,747,840 "voxels". Three bytes of data were stored for each voxel: one for red, one for green, and one for blue, giving a total of 971,243,520 bytes of data. The volume buffer was organized in memory as 130 "slices", where each slice consisted of three 2048x1216 byte arrays. The size of the dataset was chosen so that it would fit within the available RAM when run with the full 1GB of system RAM. A viewing window of size 768x456 was used for these runs.
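The buffer arithmetic above can be checked with a short sketch. The dimensions and the 60MB OS figure come from the text; the `voxel_offset` helper, the row-major ordering inside each 2048x1216 array, the assumption that slices are stored contiguously, and the reading of "MB" as 2^20 bytes are illustrative assumptions only.

```python
WIDTH, HEIGHT, SLICES = 2048, 1216, 130   # volume dimensions from the text
CHANNELS = 3                              # one byte each for red, green, blue
MB = 2 ** 20                              # assumption: "MB" means 2^20 bytes

voxels = WIDTH * HEIGHT * SLICES
total_bytes = voxels * CHANNELS


def voxel_offset(x, y, z, channel):
    """Byte offset of one color channel of voxel (x, y, z).

    Assumed layout: slice z holds three consecutive WIDTH x HEIGHT byte
    arrays (red, green, blue), each stored row by row.
    """
    plane_size = WIDTH * HEIGHT
    return (z * CHANNELS + channel) * plane_size + y * WIDTH + x


def fits_in_ram(system_ram_mb, os_overhead_mb=60):
    """True if the whole buffer fits in the RAM left over after the OS."""
    return total_bytes <= (system_ram_mb - os_overhead_mb) * MB


print(voxels, total_bytes)
print(fits_in_ram(1024), fits_in_ram(896))
```

Under these assumptions the buffer fits in the 1GB configuration but not in the 896MB one, which is consistent with paging appearing as soon as system RAM is reduced.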

In the runs where network memory was not used, measurement consisted of recording the wall-clock times taken to perform each of the rendering operations in the fixed workload. In the runs where network memory was used, detailed execution traces were extracted, from which a variety of performance measures can be calculated. These detailed execution traces consist of a series of "events" that occurred during the computation, with each event timestamped to microsecond-level precision. There are a very small number of gaps in the traces where the system could not record events as fast as they were generated. Since these gaps are very few in number, and do not even account for one percent of the total events, we have simply ignored them. Of the 20-odd raycasting operations in the fixed workload, a representative operation was then selected for examination in detail. This representative was chosen far enough into the calculation that page-cleaning operations had ceased by that time in all of the runs where paging actually occurred. The tables and graphs below concern the performance during this representative section of the calculation.

The first table shows the performance of the "Visible Human" application when network memory was not used. When run with 1GB of system RAM, the computation completes in 27.2 seconds without paging. However, when system RAM is reduced slightly to 896MB, very significant paging is required, and the computation takes 1388 seconds to complete when the system disk is used as the paging device. Though we did not measure the number of page faults directly for this run, we did measure it when the same calculation was performed using network memory with the same amount of system RAM, and we infer that the number of page faults in the disk-paging run was essentially the same. If we assume that total execution time is the sum of CPU time and total page fault service time, we can conclude that the total page fault service time for this run is 1388s - 27s = 1361s, and that the average time to service a page fault is thus 1361s/117,756 = 11.6ms.
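The estimate above is a one-line calculation. A small sketch makes the assumption explicit (execution time = CPU time + page fault service time); the function name is ours, and the figures are those quoted in the text, with CPU time rounded to 27s as in the paragraph.

```python
def avg_fault_service_time(total_s, cpu_s, page_faults):
    """Average page fault service time in seconds, assuming that
    total execution time = CPU time + total page fault service time."""
    return (total_s - cpu_s) / page_faults


# 896MB run paging to the system disk; fault count inferred from the
# network memory run with the same amount of system RAM.
disk_service_s = avg_fault_service_time(total_s=1388, cpu_s=27, page_faults=117_756)
print(f"{disk_service_s * 1000:.1f} ms per page fault")
```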

Visible Human without Network Memory

System RAM (MB)         1024    896
Page Faults             0       117,756 (inferred)
Execution Time (sec)    27.2    1388
Effective Rate (%)      100     2.0

The next table shows the performance of the Visible Human application when run using network memory. If we again assume that execution time equals CPU time plus page fault service time, for the 128MB run we calculate an average page fault service time of 333 microseconds. The actual measured value was about 295 microseconds, which agrees fairly closely, but indicates that there is approximately 10 percent additional overhead that is not accounted for by the page fault service latency measurements. One likely place this overhead was spent is the activity of the system daemon that is responsible for maintaining the pool of free memory pages.

Visible Human with Network Memory

System RAM (MB)         1024    896      768      512      384      256      192      128
Page Faults                     117,756  130,202  126,804  134,973  152,817  175,618  257,815
Execution Time (sec)            64.7     70.0     69.0     71.0     77.6     85.0     113
Effective Rate (%)      100     42.0     38.9     39.4     38.3     35.1     32.0     24.1
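The figures in this table can be cross-checked with a short sketch. It assumes, as the quoted numbers suggest, that the effective rate is the no-paging execution time (27.2 seconds) divided by the measured execution time, and it reuses the assumption that execution time equals CPU time plus page fault service time. The dictionary layout and function names are ours.

```python
CPU_TIME_S = 27.2  # execution time of the 1GB run, which does not page

# System RAM (MB) -> (page faults, execution time in seconds), from the table.
network_runs = {
    896: (117_756, 64.7),
    768: (130_202, 70.0),
    512: (126_804, 69.0),
    384: (134_973, 71.0),
    256: (152_817, 77.6),
    192: (175_618, 85.0),
    128: (257_815, 113.0),
}


def effective_rate_pct(exec_time_s):
    """Share of the no-paging speed retained (assumed definition)."""
    return 100.0 * CPU_TIME_S / exec_time_s


def implied_service_time_us(page_faults, exec_time_s):
    """Average service time per fault if all extra time is fault service."""
    return (exec_time_s - CPU_TIME_S) / page_faults * 1e6


for ram_mb, (faults, seconds) in network_runs.items():
    print(ram_mb,
          round(effective_rate_pct(seconds), 1),
          round(implied_service_time_us(faults, seconds)))
```

For the 128MB run this gives the 333 microsecond figure quoted above, compared with the measured value of about 295 microseconds.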

The graphs below give a more detailed look into the behavior of the system during the runs that used network memory. The graphs labeled "Page Fault History" plot the page number (i.e. the virtual address in units of the 8KB page size) for each page fault versus time. The graphs labeled "Page Fault Rate" show the number of page faults that were handled during each second of the calculation. The graphs labeled "Page Fault Service Latency" show the minimum and average times to service a page fault during each second. This data was obtained by recording the current time when control first enters the network memory subsystem from the Linux page fault handler, again recording the current time when control leaves the network memory subsystem after servicing the page fault, and using the difference between the two as a measurement of the page fault service time. The graphs labeled "Waiting Time" show the actual amount of time spent waiting for data to be delivered by a memory server, expressed as a percentage of real time over each second. That is, a waiting time of 30% for a given second means that 300ms during that second of real time were spent waiting for data from a server. Since no useful calculation is performed by the application during such periods, from the waiting time information one can conclude an upper bound on the CPU utilization by the application.
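The per-second quantities plotted in these graphs can be derived from the event traces roughly as follows. This is a sketch only: the trace record format `(enter_us, leave_us, wait_us)` is hypothetical, and each fault is attributed entirely to the second in which it begins, which is a simplification.

```python
from collections import defaultdict

US_PER_SECOND = 1_000_000


def per_second_stats(faults):
    """Summarize page faults per second of real time.

    faults: iterable of (enter_us, leave_us, wait_us) tuples, where enter_us
    and leave_us are the timestamps at entry to and exit from the network
    memory subsystem, and wait_us is the time spent waiting for a server.
    """
    buckets = defaultdict(list)
    for enter_us, leave_us, wait_us in faults:
        second = enter_us // US_PER_SECOND
        buckets[second].append((leave_us - enter_us, wait_us))

    stats = {}
    for second, items in sorted(buckets.items()):
        latencies = [latency for latency, _ in items]
        waiting_pct = 100.0 * sum(wait for _, wait in items) / US_PER_SECOND
        stats[second] = {
            "fault_rate": len(items),                   # "Page Fault Rate"
            "min_latency_us": min(latencies),           # "Page Fault Service Latency"
            "avg_latency_us": sum(latencies) / len(latencies),
            "waiting_pct": waiting_pct,                 # "Waiting Time"
            "max_cpu_pct": 100.0 - waiting_pct,         # upper bound on CPU utilization
        }
    return stats


# Synthetic example: 1000 faults in one second, each waiting 300 microseconds.
example = per_second_stats((i * 1000, i * 1000 + 350, 300) for i in range(1000))
print(example[0])
```

In the synthetic example, 300ms of the second is spent waiting, so the waiting time is 30% and the application's CPU utilization for that second can be at most 70%.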

Last modified: Fri Jun 20 04:57:26 EDT 2003