3c02ee8f | 25-Dec-2022 |
wakafa <[email protected]> |
Separate Utility submodule from XiangShan (#1861)
* misc: add utility submodule
* misc: adjust to new utility framework
* bump utility: revert resetgen
* bump huancun |
89515a3b | 14-Dec-2022 |
ZhangZifei <[email protected]> |
Merge remote-tracking branch 'origin/master' into rf-after-issue
More changes: load-rs in the master branch does not replay load instructions, but in the rf-after-issue branch it still does. rf-after-issue does not use params to control whether to replay or not, so the "param control" is re-added.
|
a760aeb0 | 02-Dec-2022 |
happy-lx <[email protected]> |
Replay all load instructions from LQ (#1838)
This intermediate architecture replays all load instructions from LQ.
An independent load replay queue will be added later.
Performance loss caused by changing of load replay sequences will be
analyzed in the future.
* memblock: load queue based replay
* replay load from load queue rather than RS
* use counters to delay replay logic
* memblock: refactor priority
* lsq-replay has higher priority than try pointchasing
* RS: remove load store rs's feedback port
* ld-replay: a new path for fast replay
* when fast replay is needed, wire it to loadqueue and it will be selected
this cycle and replayed to load pipeline s0 in the next cycle
* memblock: refactor load S0
* move all the select logic from lsq to load S0
* split a tlbReplayDelayCycleCtrl out of loadqueue to speed up
generating emu
* loadqueue: parameterize replay
|
eb163ef0 | 17-Nov-2022 |
Haojin Tang <[email protected]> |
top-down: introduce top-down counters and scripts (#1803)
* top-down: add initial top-down features
* rob600: enlarge queue/buffer size
* :art: After git pull
* :sparkles: Add BranchResteers->CtrlBlock
* :sparkles: Cg BranchResteers after pending
* :sparkles: Add robflush_bubble & ldReplay_bubble
* :ambulance: Fix loadReplay->loadReplay.valid
* :art: Dlt printf
* :sparkles: Add stage2_redirect_cycles->CtrlBlock
* :sparkles: CtrlBlock:Add s2Redirect_when_pending
* :sparkles: ID:Add ifu2id_allNO_cycle
* :sparkles: Add ifu2ibuffer_validCnt
* :sparkles: Add ibuffer_IDWidth_hvButNotFull
* :sparkles: Fix ifu2ibuffer_validCnt
* :ambulance: Fix ibuffer_IDWidth_hvButNotFull
* :sparkles: Fix ifu2ibuffer_validCnt->stop
* feat(buggy): parameterize load/store pipeline, etc.
* fix: use LoadPipelineWidth rather than LoadQueueSize
* fix: parameterize `rdataPtrExtNext`
* fix(SBuffer): fix idx update logic
* fix(Sbuffer): use `&&` to generate flushMask instead of `||`
* fix(atomic): parameterize atomic logic in `MemBlock`
* fix(StoreQueue): update allow enqueue requirement
* chore: update comments, requirements and assertions
* chore: refactor some Mux to meet original logic
* feat: reduce `LsMaxRsDeq` to 2 and delete it
* feat: support one load/store pipeline
* feat: parameterize `EnsbufferWidth`
* chore: resharp codes for better generated name
* top-down: add initial top-down features
* rob600: enlarge queue/buffer size
* top-down: add l1, l2, l3 and ddr loads bound perf counters
* top-down: dig into l1d loads bound
* top-down: move memory related counters to `Scheduler`
* top-down: add 2 Ldus and 2 Stus
* top-down: v1.0
* huancun: bump HuanCun to a version with top-down
* chore: restore parameters and update `build.sc`
* top-down: use ExcitingUtils instead of BoringUtils
* top-down: add switch of top-down counters
* top-down: add top-down scripts
* difftest: enlarge stuck limit cycles again
Co-authored-by: gaozeyu <[email protected]>
|
fe2fd136 | 26-Oct-2022 |
ZhangZifei <[email protected]> |
issue: remove delayedSrc for fpReg at RSStd
The SlowPort of fpWakeup crossing ExuBlock is RegNext-ed, but fpBusyTable is not. This causes an error when delayedSrc is removed, so the RegNext is also removed. |
c15d13ad | 23-Oct-2022 |
ZhangZifei <[email protected]> |
issue: delete fma midState related code |
448ed776 | 20-Oct-2022 |
ZhangZifei <[email protected]> |
issue: add other types rs child-class
Include: FMA/FMisc/Load/Mul/Sta/Std Add RSMisc for mid-state type, such as MemAddr: Load/Sta some trait for [not]dropOnDirect and so on. |
d16f4ea4 | 15-Oct-2022 |
ZhangZifei <[email protected]> |
issue: add alu and jump[csr] rs
More modifications:
1. parameter RSMod to generate different submodules
add case class RSMod for a list of the rs submodules' generator methods
2. remove [submodule]RSIO
remove ALU[Jump..]RSIO, add RSExtraIO to contain all the extra io of the different child classes. Ugly code. Assign DontCare to the extra io.
3. Same as 2. The submodule's io should contain all the io.
For jump: move pcMem part code into JumpRS from BaseRS
For jump and alu: add immExtractorGen for jump/alu and other child class
|
54034ccd | 13-Oct-2022 |
ZhangZifei <[email protected]> |
issue: add submodule for each type rs, not actually implemented
There are several kinds of reservation station. Name them after the corresponding exu name:
1. ALU
2. Jump[/CSR/i2f/fence]
3. Mul[Div]
4. Load
5. Sta
6. Std
7. FMA[c]
8. FMisc
They differ from each other only slightly; the main body of the rs is the same. To make the rs easier to read and understand, we keep the 'common body' in BaseRS and move the differences into the submodules.
|
b0b91ecd | 01-Sep-2022 |
Yinan Xu <[email protected]> |
rs: optimize load balance algorithm |
43d10b70 | 01-Sep-2022 |
Yinan Xu <[email protected]> |
rs: move bypass network to deq stage for fp RS |
ad879770 | 01-Sep-2022 |
Yinan Xu <[email protected]> |
ld,rs: optimize load-load forward timing (#1762)
Move imm addition to stage 0. |
3102ffdd | 31-Aug-2022 |
Yinan Xu <[email protected]> |
rs: don't update midResult when flushed (#1758)
This commit fixes a bug when FMA partially issues but is flushed
just after it is issued. In this case, a new instruction will enter
the RS and write the data array. However, previously midResult
from FMA was written into the data array two cycles after issue.
This may cause the wrong data to be written into the data array.
This is a rare case because usually instructions enter RS in-order,
unless dispatch2 is blocked.
|
c3b763d0 | 22-Aug-2022 |
Yinan Xu <[email protected]> |
rs,mem: optimize load-load forwarding timing (#1742)
This commit optimizes the timing of load-load forwarding by making
it speculatively issue requests to TLB/dcache.
When load_s0 does not have a valid instruction and load_s3 writes
a valid instruction back, we speculatively bypass the writeback
data to load_s0 and assume there will be a pointer chasing instruction
following it. A pointer chasing instruction has a base address that
comes from a previous instruction with a small offset. To avoid timing
issues, now only when the offset does not change the cache set index,
we reduce its latency by speculatively issuing it.
|
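The condition described in c3b763d0 (speculate only when the small offset leaves the cache set index unchanged) can be illustrated with a minimal Python model. This is a sketch, not the RTL: the cache geometry (64-byte lines, 64 sets) and the names `set_index` and `can_issue_speculatively` are assumptions made for illustration and do not come from the commit.

```python
# Hypothetical cache geometry, chosen only for illustration.
LINE_OFFSET_BITS = 6   # assumed 64-byte cache lines
SET_INDEX_BITS = 6     # assumed 64 sets


def set_index(addr: int) -> int:
    """Extract the cache set index bits from an address."""
    return (addr >> LINE_OFFSET_BITS) & ((1 << SET_INDEX_BITS) - 1)


def can_issue_speculatively(base: int, offset: int) -> bool:
    """True when adding the offset to the forwarded base address
    leaves the cache set index unchanged, so the request can be
    sent with the index taken from the base alone."""
    return set_index(base + offset) == set_index(base)
```

With these assumed parameters, `can_issue_speculatively(0x1000, 8)` holds because the access stays in the same line, while `can_issue_speculatively(0x103C, 8)` does not because the sum crosses into the next set.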
9b3d9e59 | 17-Aug-2022 |
Yinan Xu <[email protected]> |
rs: fix not_select_entries performance counter |
7d12b265 | 16-Aug-2022 |
Yinan Xu <[email protected]> |
rs: re-pipeline stage0 and stage1
Move selection to stage1. Should benefit the timing for function units. |
01feb937 | 15-Aug-2022 |
Yinan Xu <[email protected]> |
rs: optimize deqResp timing
Separate deqResp for selectPtr/allocatePtr/oldestPtr. |
6a9c441d | 15-Aug-2022 |
Yinan Xu <[email protected]> |
rs: optimize data select timing
Separate selection into dispatch/issueSelect/oldestSelect. |
36e3f470 | 10-Aug-2022 |
Yinan Xu <[email protected]> |
rs: duplicate dispatch registers to reduce fanout |
c9ddacac | 09-Aug-2022 |
Yinan Xu <[email protected]> |
rs: optimize timing for interfaces (#1722)
* rs,status: simplify deqRespSucc condition
This commit optimizes the logic of deqResp in StatusArray of RS.
We use ParallelMux instead of Mux1H to ensure that deqRespSucc is
asserted only when deq.valid. This reduces one logic level of AND.
* rs,select: optimize update logic of age matrix
* fdivSqrt: add separated registers for data selection
Optimize the fanout of sel valid bits.
* fu: reduce fanout of emptyVec in InputBuffer
|
5c2fef75 | 09-Aug-2022 |
Yinan Xu <[email protected]> |
exu: add more copies of redirect registers (#1716) |
9af29e01 | 08-Aug-2022 |
Yinan Xu <[email protected]> |
rs: add registers for fma mid-results (#1712) |
dff7ca56 | 28-Jul-2022 |
Yinan Xu <[email protected]> |
rs,select: optimize oldest compare timing (#1691)
No need to OHToUInt. |
b56f947e | 18-Jul-2022 |
Yinan Xu <[email protected]> |
ftq,ctrl: add copies for pc and jalr_target data modules (#1661)
* ftq, ctrl: remove pc/target backend read ports, and remove redirectGen in ftq
* ctrl: add data modules for pc and jalr_target
This commit adds two data modules for pc and jalr_target respectively.
They are the same as data modules in frontend. Should benefit timing.
* jump: reduce pc and jalr_target read latency
* ftq: add predecode redirect update target interface, valid only on ifuRedirect
* ftq, ctrl: add second write port logic of jalrTargetMem, and delay write of pc/target mem for two cycles
Co-authored-by: Lingrui98 <[email protected]>
|
9e4583a2 | 15-Jul-2022 |
Yinan Xu <[email protected]> |
rs: optimize allocation ready gen and perf counter timing (#1647)
* scheduler: fix performance counter timing
* rs: optimize allocation ready gen timing |