====== gem5-gpu's Memory Simulation ======

gem5-gpu, for the most part, eschews GPGPU-Sim's separate functional simulation and instead uses gem5's execute-in-execute model. Memory is therefore read or updated at the moment a load or store is executed. There is no separate functional path.

(As a side note, this isn't strictly true. Due to some peculiarities in Ruby, the memory is functionally simulated instead of being held in the cache hierarchy. However, for the way you're looking at the simulator this shouldn't matter at all.)

===== Lifetime of a memory access =====

Here is a trace of a memory operation through gem5:

  - GPGPU-Sim executes a ld/st (see gpgpu-sim/gpgpu-sim/shader.cc: ldst_unit::memory_cycle_gem5).
  - The warp-wide instruction is converted into lane operations and sent to the LSQ unit (see gem5-gpu/src/gpu/gpgpu-sim/cuda_core.cc: CudaCore::executeMemOp).
  - The LSQ receives the lane requests, coalesces them, and sends the resulting requests to the memory subsystem, which is Ruby in this case (see gem5-gpu/src/gpu/shader_lsq.cc: ShaderLSQ::injectCacheAccesses).
  - Ruby receives the request and simulates the cache hierarchy and memory (both timing and functional). The starting point for this is in gem5/src/mem/ruby/system/RubyPort.cc and Sequencer.cc. The actual code that simulates the caches is automatically generated from the SLICC files. In the case of VI_hammer those are gem5-gpu/src/mem/protocol/VI_hammer*.
  - Ruby returns the result to the LSQ after some amount of time, and the LSQ in turn (on a load) returns the data to the CudaCore.
  - Finally, the CudaCore in gem5-gpu forwards the data back to the actual core model in gpgpu-sim, which (on a load) writes the data into a register.
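
To make the coalescing step in the trace above more concrete, here is a minimal Python sketch of the general idea: per-lane addresses from one warp instruction are grouped by the cache line they fall in, so that each distinct line produces a single request to the memory subsystem. This is not the ShaderLSQ code. The function name ''coalesce'', the 128-byte line size, and the use of ''None'' for inactive lanes are all assumptions made for illustration only.

```python
from collections import OrderedDict

# Assumed cache-line size in bytes (hypothetical; not read from any gem5-gpu config).
LINE_SIZE = 128


def coalesce(lane_addrs, line_size=LINE_SIZE):
    """Group per-lane byte addresses by cache line.

    lane_addrs holds one address per lane, or None for an inactive lane.
    Returns an ordered mapping: line base address -> list of lane indices.
    Illustrative sketch only; the real LSQ logic may differ.
    """
    requests = OrderedDict()
    for lane, addr in enumerate(lane_addrs):
        if addr is None:
            # Inactive lanes generate no memory traffic.
            continue
        line_addr = addr - (addr % line_size)  # align down to the line base
        requests.setdefault(line_addr, []).append(lane)
    return requests


# 32 lanes reading consecutive 4-byte words: all fall in one 128-byte line.
unit_stride = coalesce([0x1000 + 4 * lane for lane in range(32)])

# 32 lanes each striding by a full line: every lane needs its own request.
strided = coalesce([0x1000 + 128 * lane for lane in range(32)])

print(len(unit_stride), len(strided))
```

Under these assumptions, the unit-stride access collapses into one request while the line-strided access needs one request per lane, which is why the access pattern has such a large effect on the traffic Ruby sees.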