True Sharing and False Sharing
Computer systems with multiple multicore processors have become increasingly common. In these systems, each core has its own L1 and L2 caches, but main memory is shared, with an interconnect linking the processors to it. When working with shared-memory systems, we encounter both true sharing and false sharing. This article discusses both topics in detail.
True Sharing
True sharing happens when multiple cores access the same variable on the same cache line.
Example
Assume a cache line holds 2 words. Core0 and core1 both modify the same address in memory, say 0xA00. Since the cache line is 2 words, both core0 and core1 will load the range 0xA00 – 0xA08 into their caches. When core0 modifies the variable, the coherence protocol invalidates core1's copy of the cache line, and vice versa. What if core0 has stored a value and needs to load it back, but in the meantime core1 has modified the same variable? Core0 must then re-fetch the line. This phenomenon is called true sharing. Let's discuss the problems that occur with true sharing.
Issues in true sharing
Cache Coherence & Cache Line Invalidation
The caches in different cores hold copies of the same cache line containing the shared memory location. When one core modifies that location, the coherence protocol must invalidate the copies held by the other caches so that every core sees a consistent view of the shared data. Consequently, the other cores lose their copies of the cache line even if they did not write to the shared location themselves. This cache line invalidation introduces performance overhead.
Data race condition
Concurrent access to a shared memory location without synchronization can cause data race conditions. There are two types of data race:
· Read-Write Race: occurs when one thread/core writes to a shared memory location while another core reads it simultaneously, leading to non-deterministic behavior or corrupted data.
· Write-Write Race: occurs when multiple threads write to the same shared memory location concurrently; the final value is uncertain, depending on which thread's write completes last.
Performance Impact
True sharing can lead to performance degradation due to the overhead of cache line invalidation and coordination between cores. Increased cache coherence traffic, serialization, and contention for memory resources can reduce performance and scalability in multicore systems.
Solutions for true sharing
Synchronization and Consistency
To ensure correctness and prevent data races, appropriate synchronization mechanisms, such as locks, atomic operations, or memory barriers, must be employed. These mechanisms ensure that different cores access the shared memory location in a mutually exclusive or properly coordinated manner.
Atomic Operations
Atomic variables ensure that updates to a shared variable cannot be interrupted by other threads. Each access is performed in a thread-safe, indivisible manner, mitigating data races and maintaining consistency.
False Sharing
False sharing happens when different cores/threads access different variables that reside on the same cache line. This condition is called false sharing because the threads are not actually sharing the same variable; they merely share the cache line.
Example
Let's assume we have an array of N elements, where N is no larger than the number of words a cache line can store; in other words, the entire array fits in a single cache line. When the program runs in parallel, different threads access different but adjacent elements. Although distinct variables are accessed, every write still invalidates the shared cache line and triggers the cache coherence protocol. That's how false sharing happens.
Issues in false sharing
Cache Line Invalidation and Updates
When a thread modifies a variable in its cache line, the entire cache line is typically invalidated and updated, even if the other variables in the cache line are unchanged. This introduces additional cache coherence traffic and overhead.
Performance Degradation
False sharing can significantly decrease performance when different variables in the same cache line are accessed in a tight loop. Even though the threads modify different variables, cache coherence protocols still require synchronization and coordination among the caches. The resulting cache misses, serialization, and contention for cache lines leave threads waiting for access, limiting parallelism and reducing overall performance.
Cache Thrashing
False sharing can cause cache thrashing. When multiple threads modify different variables in the same cache coherence domain, cache lines are frequently invalidated and updated. This constant churn can result in increased cache misses, reduced cache utilization, and increased memory access latency, negatively impacting performance.
Scalability Issues
False sharing also limits the scalability of concurrent programs. As the number of threads grows, false sharing increases cache coherence traffic and contention, resulting in diminishing returns.
Solutions for false sharing
Padding
With padding, we add extra space between the variables, enough to ensure that each variable resides on a separate cache line. Proper padding can eliminate false sharing.
Thread Affinity
Assign each thread its own memory region. Having separate, independent memory regions per thread reduces the likelihood of multiple threads writing to the same cache line, eliminating false sharing.
Compiler Optimizations
Modern compilers offer optimizations that reduce false sharing. For example, they can pad or lay out data so that independently written variables do not land on the same cache line. Enabling these compiler optimizations can also be a solution.