Abstract: OS Support for Improving Data Locality on CC-NUMA Compute Servers
The dominant architecture for the next generation of shared-
memory multiprocessors is CC-NUMA (cache-coherent non-
uniform memory architecture). These machines are attractive as
compute servers because they provide transparent access to local
and remote memory. However, the access latency to remote
memory is 3 to 5 times the latency to local memory. CC-NOW
machines provide the benefits of cache coherence to networks of
workstations, at the cost of even higher remote access latency.
Given the large remote access latencies of these architectures, data
locality is potentially the most important performance issue. Using
realistic workloads, we study the performance improvements
provided by OS-supported dynamic page migration and
replication. Analyzing our kernel-based implementation, we
provide a detailed breakdown of the costs. We show that sampling
of cache misses can be used to reduce cost without compromising
performance, and that TLB misses may not be a consistent
approximation for cache misses. Finally, our experiments show
that dynamic page migration and replication can substantially
increase application performance, by as much as 30%, and reduce
contention for resources in the NUMA memory system.
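
To make the locality argument concrete, the following is a minimal sketch of a back-of-the-envelope latency model, not anything taken from the study itself. It assumes every cache miss is served either from local memory (cost 1 unit) or from remote memory (cost `remote_ratio` units, where the abstract's 3 to 5 times range supplies the ratio), and it ignores contention and queuing. The function name and parameters are hypothetical.

```python
def average_miss_latency(local_fraction: float, remote_ratio: float) -> float:
    """Mean cache-miss latency, in units of one local-memory access.

    Assumption (illustrative only): a miss is served either locally at
    cost 1 or remotely at cost `remote_ratio`; contention is ignored.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    if remote_ratio < 1.0:
        raise ValueError("remote_ratio must be at least 1")
    remote_fraction = 1.0 - local_fraction
    return local_fraction * 1.0 + remote_fraction * remote_ratio


if __name__ == "__main__":
    # Sweep the 3x and 5x remote/local ratios quoted in the abstract.
    for ratio in (3.0, 5.0):
        for local_fraction in (0.25, 0.5, 0.75, 1.0):
            latency = average_miss_latency(local_fraction, ratio)
            print(f"ratio={ratio:.0f}x local={local_fraction:.2f} "
                  f"mean latency={latency:.2f}")
```

Under this simplified model, at a remote-to-local ratio of 3, raising the locally served fraction of misses from one half to three quarters lowers the mean miss latency from 2.0 to 1.5 local-access units, which is the kind of gain that migration and replication aim for.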
Last modified 3/4/95 by Ben Verghese, webmaster@www-flash.stanford.edu.