f2e8d419 | 30-May-2023 |
sfencevma <[email protected]> |
LQ: fix select oldest inst & remove bank conf. block to avoid deadlock (#2100)
* LoadQueueReplay: fix worst case, all oldest instructions are allocated to the same bank,
and the number of instruct
LQ: fix select oldest inst & remove bank conf. block to avoid deadlock (#2100)
* LoadQueueReplay: fix worst case, all oldest instructions are allocated to the same bank,
and the number of instructions is greater than the number of stages in load unit.
* Remove bank conflict block
* Increase priority for data replay
The deadlock scenario is as follows:
The LoadQueueReplay entry will not be released immediately after the instruction
is replayed from LoadQueueReplay. For example, after instruction a is replayed from
LoadQueueReplay, entry 1 is still valid. If instruction a still needs to be replayed,
Entry 1 will be updated again, otherwise entry 1 can be released.
If only the time of the first enqueue is used to select replay instructions (age matrix),
when there are too many instructions (in LoadQueueReplay) to be replay, some
instructions may not be selected.
Using the pointer ldWbPtr of the oldest instruction, when the saved lqIdx of the
instruction is equal to ldWbPtr and can be replayed, LoadQueueReplay will give
priority to the instruction instead of using the selection result of the age matrix.
To select older instructions, LoadQueueReplay will calculate pointers such as
ldWbPtr, ldWbPtr+1, ldWbPtr+2, ldWbPtr+3..., and if the lqIdx of the instruction
is in these results, it will be selected first.
When the pointer is compared, there will be an n-bit long mask, and LoadQueueReplay
will be from 0 to n-1. When i th bit is valid, select i th instruction.
The stride of the pointer comparison is larger than the number of pipeline stages
of the load unit, and the selected instruction still needs to be replayed after the
first replay (for example, the data is not ready). Worse, in the bit of the mask
generated by pointer comparison, the instructions (lqIdx is ldWbPtr+1, ldWbPtr+2, ...)
after the oldest instruction (lqIdx is equal to ldWbPtr) are in the lower bit and the
oldest instruction is in the higher bit. It cannot select the oldest instruction.
show more ...
|
cea46230 | 23-May-2023 |
sfencevma <[email protected]> |
lsu, uncache buffer: fix uncache buffer writeback loadOut is incorrectly held (#2087)
* fix uncache buffer writeback fsm
* fix uncache buffer writeback fsm
* fix uncache buffer writeback contr
lsu, uncache buffer: fix uncache buffer writeback loadOut is incorrectly held (#2087)
* fix uncache buffer writeback fsm
* fix uncache buffer writeback fsm
* fix uncache buffer writeback control
---------
Co-authored-by: Lyn <[email protected]>
show more ...
|
62dfd6c3 | 19-Mar-2023 |
happy-lx <[email protected]> |
Fix replay logic in unified load queue (#1966)
* difftest: monitor cache miss latency
* lq, ldu, dcache: remove lq's data
* lq's data is no longer used
* replay cache miss load from lq (use c
Fix replay logic in unified load queue (#1966)
* difftest: monitor cache miss latency
* lq, ldu, dcache: remove lq's data
* lq's data is no longer used
* replay cache miss load from lq (use counter to delay)
* if dcache's mshr gets refill data, wake up lq's missed load
* uncache load will writeback to ldu using ldout_0
* ldout_1 is no longer used
* lq, ldu: add forward port
* forward D and mshr in load S1, get result in S2
* remove useless code logic in loadQueueData
* misc: revert monitor
* lq: change replay cycle
* lq: change replay cycle
* change cycle to 11 36 10 10
* Revert "lq: change replay cycle"
This reverts commit 3ca74b63eaeef7792016cd270b77f8a14f588981.
And change replay cycles
* lq: change replay cycle according to dramsim
* change Reselectlen to 7
* change replay cycle to (11, 18, 127, 17) to fit refill delay (14, 36,
188)
* lq: change replay cycle
* change block_cycles_cache to (7, 0, 32, 51)
* lq: change replay cycle
* change block_cycles_cache to (7, 0, 126, 95)
* lq: fix replay ptr update logic
* fix priority of updating ptr
* revert block_cycles_cache
* lq: change tlb replay cycle
* change tlbReplayDelayCycleCtrl to (15, 0, 126, 0)
show more ...
|
683c1411 | 28-Dec-2022 |
happy-lx <[email protected]> |
lq: Remove LQ data (#1862)
This PR remove data in lq.
All cache miss load instructions will be replayed by lq, and the forward path to the D channel
and mshr is added to the pipeline.
Special t
lq: Remove LQ data (#1862)
This PR remove data in lq.
All cache miss load instructions will be replayed by lq, and the forward path to the D channel
and mshr is added to the pipeline.
Special treatment is made for uncache load. The data is no longer stored in the datamodule
but stored in a separate register. ldout is only used as uncache writeback, and only ldout0
will be used. Adjust the priority so that the replayed instruction has the highest priority in S0.
Future work:
1. fix `milc` perf loss
2. remove data from MSHRs
* difftest: monitor cache miss latency
* lq, ldu, dcache: remove lq's data
* lq's data is no longer used
* replay cache miss load from lq (use counter to delay)
* if dcache's mshr gets refill data, wake up lq's missed load
* uncache load will writeback to ldu using ldout_0
* ldout_1 is no longer used
* lq, ldu: add forward port
* forward D and mshr in load S1, get result in S2
* remove useless code logic in loadQueueData
* misc: revert monitor
show more ...
|