c0ef164e | 14-Jul-2022 |
Yinan Xu <[email protected]> |
rs: fix enqBypass when numEnq > 2 (#1653)
Balance between the first numDeq ports. Possible IPC increase? |
74515c5a | 12-Jul-2022 |
Yinan Xu <[email protected]> |
jump: delay pc and jalr_target for one cycle (#1640) |
bcce877b | 12-Jul-2022 |
Yinan Xu <[email protected]> |
rs: optimize timing for dispatch and wakeup (#1621)
This commit optimizes the timing of reservation stations.
* dispatched uops are latched and bypassed to s1_out
* wakeup from slowPorts are latched and bypassed to s1_data
* rs: optimize allocation selection
Change the select policy for allocation. This should avoid issuing
just-dispatched instructions in some cases.
* rs: disable load balance for load units
|
fa9d712c | 27-Jun-2022 |
Yinan Xu <[email protected]> |
dp2: add a pipeline for load/store (#1597)
* dp2: add a pipeline for load/store
Load/store Dispatch2 has bad timing because it requires the fuType
to distinguish the out ports. This causes timing issues because the
instruction has to read busyTable after the port arbitration.
This commit adds a pipeline in dp2Ls, which may cause performance
degradation. Instructions are dispatched according to out, and they
leave dp2 in the next cycle.
* bump difftest trying to fix vcs
|
46f74b57 | 06-May-2022 |
Haojin Tang <[email protected]> |
feat: parameterize load store (#1527)
* feat: parameterize load/store pipeline, etc.
* fix: use LoadPipelineWidth rather than LoadQueueSize
* fix: parameterize `rdataPtrExtNext`
* SBuffer: fix idx update logic
* atomic: parameterize atomic logic in `MemBlock`
* StoreQueue: update allow enque requirement
* feat: support one load/store pipeline
* feat: parameterize `EnsbufferWidth`
* chore: reshape code for better generated names
|
9658ce50 | 25-Mar-2022 |
LinJiawei <[email protected]> |
Bump chisel to 3.5.0 |
783011be | 24-Feb-2022 |
Yinan Xu <[email protected]> |
std: delay fp regfile read for one cycle (#1473) |
fd7603d9 | 15-Dec-2021 |
Yinan Xu <[email protected]> |
rename: add fused lui and load (#1356)
This commit adds fused load support by bypassing LUI results to load.
For better timing, detection is done at the rename stage. Imm is stored
in psrc(1), psrc(0) and imm.
|
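The address arithmetic behind the fused LUI + load in #1356 can be illustrated with a small behavioral sketch. It assumes RV64 semantics for `lui` (the 20-bit immediate is placed in bits 31..12 and sign-extended) and a 12-bit signed load offset; the function names are hypothetical, and the packing of the immediate into psrc(1), psrc(0) and imm is not modeled.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def lui_result(imm20: int) -> int:
    """RV64 LUI: imm20 goes to bits 31..12, sign-extended from bit 31."""
    return sign_extend((imm20 & 0xFFFFF) << 12, 32)


def fused_load_address(imm20: int, imm12: int, xlen: int = 64) -> int:
    """Address of `lui rd, imm20; ld rx, imm12(rd)` computed without reading rd."""
    return (lui_result(imm20) + sign_extend(imm12, 12)) & ((1 << xlen) - 1)


if __name__ == "__main__":
    print(hex(fused_load_address(0x12345, 0x678)))
```

Because both immediates are known at rename, the load does not have to wait for the LUI result to be written back, which is the bypass the commit message describes.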
1ca0e4f3 | 10-Dec-2021 |
Yinan Xu <[email protected]> |
core: refactor hardware performance counters (#1335)
This commit optimizes the coding style and timing for hardware
performance counters.
By default, performance counters are RegNext(RegNext(_)). |
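As a rough illustration of what `RegNext(RegNext(_))` means for a counter input, the following behavioral sketch models two back-to-back registers, so an event becomes visible two cycles after it occurs. The class name is hypothetical and this is not the Chisel code from the commit.

```python
class TwoStageDelay:
    """Behavioral model of RegNext(RegNext(x)): the output lags the input by 2 cycles."""

    def __init__(self, init: int = 0):
        self.stage1 = init  # first register
        self.stage2 = init  # second register, drives the output

    def step(self, value: int) -> int:
        """Return the output seen in this cycle, then clock both registers."""
        out = self.stage2
        self.stage2 = self.stage1
        self.stage1 = value
        return out


if __name__ == "__main__":
    delay = TwoStageDelay()
    print([delay.step(x) for x in [1, 0, 1, 1, 0]])
```

The extra registers only shift when an event is counted, not how many events are counted, which is why they can be added freely to relax timing.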
6ab6918f | 09-Dec-2021 |
Yinan Xu <[email protected]> |
core: refactor writeback parameters (#1327)
This commit adds WritebackSink and WritebackSource parameters for
multiple modules. These traits hide implementation details from
other modules by defining IO-related functions in modules.
By using WritebackSink, ROB is able to choose the writeback sources.
Now fflags and exceptions are connected from exe units to reduce write
ports and optimize timing.
Further optimizations on write-back to RS and better coding style to
be added later.
|
2234af84 | 06-Dec-2021 |
Yinan Xu <[email protected]> |
rs: optimize issue grant timing with age (#1312)
This commit optimizes the issue grant timing when age is enabled.
Select from age and SelectPolicy are processed in parallel. |
64886eef | 30-Nov-2021 |
William Wang <[email protected]> |
mem: disable l2l forward by default (#1283) |
9d4e1137 | 30-Nov-2021 |
Yinan Xu <[email protected]> |
rs: delay fp regfile read and wakeup for store data (#1274) |
980c1bc3 | 23-Nov-2021 |
William Wang <[email protected]> |
mem,mdp: use robIdx instead of sqIdx (#1242)
* mdp: implement SSIT with sram
* mdp: use robIdx instead of sqIdx
Dispatch refactor moves lsq enq to dispatch2; as a result, mdp cannot
get the correct sqIdx in dispatch. Unlike robIdx, it is hard to maintain a
"speculatively assigned" sqIdx, as it is hard to track store insts in
dispatch queue. Yet we can still use "speculatively assigned" robIdx
for memory dependency predictor.
For now, memory dependency predictor uses "speculatively assigned"
robIdx to track inflight stores.
However, sqIdx is still used to track those stores whose addr is valid
but whose data is not valid. When load insts try to get forward data from
those stores, they will get that store's sqIdx and wait in RS.
They will not be woken up until store data with that sqIdx is issued.
* mdp: add track robIdx recover logic
|
0e1ce320 | 22-Nov-2021 |
Yinan Xu <[email protected]> |
rs: fix counter for not-selected entries (#1251) |
35de2a4c | 22-Oct-2021 |
Yinan Xu <[email protected]> |
rs: wrap data selection logic in module (#1160) |
45f497a4 | 21-Oct-2021 |
happy-lx <[email protected]> |
asid: add asid, mainly work when hit check, not in sfence.vma (#1090)
add mmu's asid support.
1. put asid inside sram (if the entry is sram), or it will take too many sources.
2. when sfence, just flush it all, don't care asid.
3. when hit check, check asid.
4. when asid changed, flush all the inflight ptw req for safety
5. simple asid unit test:
asid 1 write, asid 2 read and check, asid 2 write, asid 1 read and check. same va, different pa
* ASID: make satp's asid bits configurable to RW
* use AsidLength to control it
* ASID: implement asid refilling and hit checking
* TODO: sfence flush with asid
* ASID: implement sfence with asid
* TODO: extract asid from SRAMTemplate
* ASID: extract asid from SRAMTemplate
* all is done
* TODO: test
* fix write to asid
* Sfence: support rs2 of sfence and fix Fence Unit
* rs2 of Sfence should be Reg and pass it to Fence Unit
* judge the value of reg instead of the index in Fence Unit
* mmu: re-write asid
now, asid is stored inside sram, so sfence just flushes it
it's a complex job to handle the case where asid is changed but
no sfence.vma is executed. when asid is changed, all the inflight
mmu reqs are flushed but entries in storage are not affected.
so the inflight reqs do not need to record asid, just use satp.asid
* tlb: fix bug of refill mask
* ci: add asid unit test
Co-authored-by: ZhangZifei <[email protected]>
|
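The numbered ASID rules in #1090 (store the ASID with each entry, flush everything on sfence, compare the ASID on hit check, drop inflight requests when the ASID changes) can be sketched as a toy model. `TinyTlb` and its method names are hypothetical; the real design stores entries in SRAM and is considerably more involved.

```python
class TinyTlb:
    """Toy TLB keyed by (asid, vpn); not the actual XiangShan structure."""

    def __init__(self):
        self.entries = {}    # (asid, vpn) -> ppn, asid is stored with the entry
        self.inflight = []   # vpns of outstanding page-table-walk requests
        self.satp_asid = 0

    def write_satp_asid(self, asid: int) -> None:
        """Changing the ASID drops inflight requests but keeps stored entries."""
        if asid != self.satp_asid:
            self.inflight.clear()
        self.satp_asid = asid

    def refill(self, vpn: int, ppn: int) -> None:
        """Refills are tagged with the current satp.asid."""
        self.entries[(self.satp_asid, vpn)] = ppn

    def lookup(self, vpn: int):
        """Hit check compares both the vpn and the current ASID."""
        return self.entries.get((self.satp_asid, vpn))

    def sfence(self) -> None:
        """sfence flushes everything, regardless of ASID."""
        self.entries.clear()


if __name__ == "__main__":
    tlb = TinyTlb()
    tlb.write_satp_asid(1)
    tlb.refill(0x10, 0xA)
    tlb.write_satp_asid(2)
    print(tlb.lookup(0x10))  # miss: same va, different asid
```

The sequence in the `__main__` block follows the shape of the unit test described in the commit message: one ASID writes, the other reads the same va and must not see that mapping.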
f4b2089a | 16-Oct-2021 |
Yinan Xu <[email protected]> |
core: use redirect ports for flush (#1121)
This commit removes flush IO for every module. Flush now re-uses
redirect ports to flush the instructions. |
d1fe0262 | 16-Oct-2021 |
William Wang <[email protected]> |
Add strict mode to reduce mdp mispredict (#1113)
* storeset: fix waitForSqIdx generate logic
Now the correct waitForSqIdx is generated for an earlier store in the
same dispatch bundle.
* mdp: add strict wait mode
When loadWaitStrict && loadWaitBit, the load will wait in rs until all
older store address calculations are finished.
* chore: add storeset_load_strict_wait counter
|
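The strict wait condition from #1113 can be written down as a small predicate. The strict branch follows the commit message; the non-strict branch (waiting only for the store identified by waitForSqIdx) is an assumption made for contrast, and the function and parameter names are hypothetical.

```python
def load_must_wait(load_wait_bit, load_wait_strict, older_store_addr_ready,
                   wait_for_sq_idx=None):
    """Decide whether a load stays in the reservation station.

    older_store_addr_ready maps the sqIdx of each older store to whether
    its address calculation has finished.
    """
    if not load_wait_bit:
        return False  # the predictor did not mark this load
    if load_wait_strict:
        # Strict mode: wait for every older store address.
        return not all(older_store_addr_ready.values())
    # Assumed non-strict behavior: wait only for the predicted store.
    return not older_store_addr_ready.get(wait_for_sq_idx, True)


if __name__ == "__main__":
    older = {3: True, 4: False}
    print(load_must_wait(True, True, older, wait_for_sq_idx=3))
    print(load_must_wait(True, False, older, wait_for_sq_idx=3))
```

Strict mode trades some load latency for fewer memory dependency mispredictions, since a load marked by both bits can no longer slip past any unresolved older store.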
485648fa | 12-Oct-2021 |
Yinan Xu <[email protected]> |
rs: add IOs for performance counters (#1109)
This commit adds IOs for performance counters in reservation stations.
Only `full` is included for now. |
c7160cd3 | 12-Oct-2021 |
William Wang <[email protected]> |
mem: update block load logic (#1035)
* mem: update block load logic
Now load will be selected as soon as the store it depends on is ready,
which is predicted by Store Sets
* mem: opt block load logic
A load blocked by an invalid std will wait for that std to issue
A load blocked by a load violation will wait for that sta to issue
* csr: add 2 extra storeset config bits
Following bits were added to slvpredctl:
- storeset_wait_store
- storeset_no_fast_wakeup
* storeset: fix waitForSqIdx generate logic
Now the correct waitForSqIdx is generated for an earlier store in the
same dispatch bundle
|
33177a7c | 12-Oct-2021 |
Yinan Xu <[email protected]> |
core: update dispatch port parameters (#1103)
This commit changes how dispatch ports (regfile ports) are connected to
reservation station ports:
INT regfile:
* INT(0-1) --> ALU0, MUL0, JUMP
* INT(2-3) --> ALU1, MUL0
* INT(4-5) --> ALU2, MUL1
* INT(6-7) --> ALU3, MUL1
* INT(8) --> LOAD0
* INT(9) --> LOAD1
* INT(10) --> STA0
* INT(11) --> STA1
* INT(12) --> STD0
* INT(13) --> STD1
FP regfile:
* FP(0-2) --> FMA0, FMISC0
* FP(3-5) --> FMA1, FMISC0
* FP(6-8) --> FMA2, FMISC1
* FP(9-11) --> FMA3, FMISC1
* FP(12) --> STD0
* FP(13) --> STD1
|
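The port assignment listed in #1103 can be captured as a lookup table, which makes it easy to ask which regfile ports feed a given reservation station port. The table contents are copied from the commit message; `PORT_MAP` and `ports_for` are hypothetical names, not identifiers from the repository.

```python
# (first_port, last_port) -> reservation station ports, per regfile.
PORT_MAP = {
    "INT": [
        ((0, 1), ["ALU0", "MUL0", "JUMP"]),
        ((2, 3), ["ALU1", "MUL0"]),
        ((4, 5), ["ALU2", "MUL1"]),
        ((6, 7), ["ALU3", "MUL1"]),
        ((8, 8), ["LOAD0"]),
        ((9, 9), ["LOAD1"]),
        ((10, 10), ["STA0"]),
        ((11, 11), ["STA1"]),
        ((12, 12), ["STD0"]),
        ((13, 13), ["STD1"]),
    ],
    "FP": [
        ((0, 2), ["FMA0", "FMISC0"]),
        ((3, 5), ["FMA1", "FMISC0"]),
        ((6, 8), ["FMA2", "FMISC1"]),
        ((9, 11), ["FMA3", "FMISC1"]),
        ((12, 12), ["STD0"]),
        ((13, 13), ["STD1"]),
    ],
}


def ports_for(unit: str) -> dict:
    """Return the regfile ports listed for a unit, grouped by regfile."""
    result = {}
    for regfile, groups in PORT_MAP.items():
        ports = [p for (lo, hi), targets in groups if unit in targets
                 for p in range(lo, hi + 1)]
        if ports:
            result[regfile] = ports
    return result


if __name__ == "__main__":
    print(ports_for("MUL0"))
    print(ports_for("STD0"))
```

Querying `STD0` shows that store data is listed under both the INT and FP regfiles, while each MUL and FMISC unit appears under two port groups.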
d87b76aa | 11-Oct-2021 |
William Wang <[email protected]> |
Speed up dcache bank conflict feedback (#1081)
Make bank conflict feedback 1 cycle earlier |
3feeca58 | 10-Oct-2021 |
zfw <[email protected]> |
riscv-crypto: support K extension (#1102)
* This commit adds the RISC-V cryptography extension subsets (zknd, zkne, zknh, zksed, zksh)
- Rename bmu to bku
- Add crypto instruction in Mdu -> bku
- Store immediate into mdu RS
* ci: add riscv-crypto test
|
2b4e8253 | 01-Oct-2021 |
Yinan Xu <[email protected]> |
core: update parameters and module organizations (#1080)
This commit moves load/store reservation stations into the first
ExuBlock (also called IntegerBlock). The unnecessary dispatch module
is also removed from CtrlBlock.
Now the module organization becomes:
* ExuBlock: Int RS, Load/Store RS, Int RF, Int FUs
* ExuBlock_1: Fp RS, Fp RF, Fp FUs
* MemBlock: Load/Store FUs
In addition, the load queue now has 80 entries and the store queue has 64 entries.
|