True Sharing and False Sharing
Computer systems with multiple multicore processors have become increasingly common. In these systems, each core has its own L1 and L2 caches, but main memory is shared, with an interconnect linking the processors to it. When working with shared-memory systems, we encounter both true sharing and false sharing. This article discusses both topics in detail.
True Sharing
True sharing happens when multiple cores access the same variable on the same cache line.
Example
Assume a cache line holds 2 words. Core0 and core1 both modify the same address in memory, say 0xA00. Since the cache line is 2 words, both core0 and core1 will load the range 0xA00 – 0xA08 into their caches. When core0 modifies the variable, the coherence protocol invalidates core1's copy of the cache line, and vice versa. What if core0 has stored a value and needs to load it back, but in the meantime core1 has modified the same variable? Core0 must then re-fetch the line. This phenomenon is called true sharing. Let's discuss the problems that occur with true sharing.
Issues in true sharing
Cache Coherence & Cache Line Invalidation
The caches in different cores hold copies of the same cache line containing the shared memory location. When one core modifies that location, the coherence protocol must invalidate the copies held by the other caches so that every core sees a consistent view of the shared data. Consequently, the other cores lose their copies of the cache line even if they did not write to the shared location themselves. This cache line invalidation introduces performance overhead.
Data race condition
Concurrent access to a shared memory location without synchronization can cause data race conditions. There are two types of data race:
· Read-Write Race: occurs when one thread/core writes to a shared memory location while another core reads it simultaneously, leading to non-deterministic behavior or corrupted data.
· Write-Write Race: occurs when multiple threads write to the same shared memory location concurrently; the final value is uncertain, depending on which thread's write completes last.
Performance Impact
True sharing can lead to performance degradation due to the overhead of cache line invalidation and coordination between cores. Increased cache coherence traffic, serialization, and contention for memory resources can reduce performance and scalability in multicore systems.
Solutions for true sharing
Synchronization and Consistency
To ensure correctness and prevent data races, appropriate synchronization mechanisms, such as locks, atomic operations, or memory barriers, must be employed. These mechanisms ensure that different cores access the shared memory location in a mutually exclusive or properly coordinated manner.
Atomic Operations
Atomic variables ensure that updates to a shared variable cannot be interrupted by other threads. Each access is performed in a thread-safe, indivisible manner, mitigating data races and maintaining consistency.
False Sharing
False sharing happens when different cores/threads access different variables that reside on the same cache line. This condition is called false sharing because the threads are not actually sharing the same variable; they merely share the cache line.
Example
Let's assume we have an array of N elements, where N is no larger than the number of words a cache line can store; in other words, the entire array fits in a single cache line. When the program runs in parallel, different threads access different but adjacent elements. Although distinct variables are accessed, every write still invalidates the shared cache line and triggers the cache coherence protocol. That's how false sharing happens.
Issues in false sharing
Cache Line Invalidation and Updates
When a thread modifies a variable in its cache line, the entire cache line is typically invalidated and updated, even if the other variables in the cache line are unchanged. This introduces additional cache coherence traffic and overhead.
Performance Degradation
False sharing can significantly decrease performance when different variables in the same cache line are accessed in a tight loop. Even though the threads modify different variables, cache coherence protocols still require synchronization and coordination among the caches. The resulting cache misses, serialization, and contention for cache lines leave threads waiting for access, limiting parallelism and reducing overall performance.
Cache Thrashing
False sharing can cause cache thrashing. When multiple threads modify different variables in the same cache coherence domain, cache lines are frequently invalidated and updated. This constant churn can result in increased cache misses, reduced cache utilization, and increased memory access latency, negatively impacting performance.
Scalability Issues
False sharing also limits the scalability of concurrent programs. As the number of threads grows, false sharing increases cache coherence traffic and contention, resulting in diminishing returns.
Solutions for false sharing
Padding
With padding, we add extra space between the variables, enough to ensure that each variable resides on a separate cache line. Proper padding can eliminate false sharing.
Thread Affinity
Assign each thread its own memory region. Having separate, independent memory regions per thread reduces the likelihood of multiple threads writing to the same cache line, eliminating false sharing.
Compiler Optimizations
Modern compilers offer optimizations that reduce false sharing. For example, they can pad or lay out data so that independently written variables do not land on the same cache line. Enabling these compiler optimizations can also be a solution.