#
b6982e83 |
| 11-Oct-2021 |
Lemover <[email protected]> |
pmp: add pmp support (#1092)
* [WIP] PMP: add pmp to tlb & csr(ptw part is not added)
* pmp: add pmp, unified
* pmp: add pmp, distributed but same cycle
* pmp: pmp resp next cycle
* [WIP
pmp: add pmp support (#1092)
* [WIP] PMP: add pmp to tlb & csr(ptw part is not added)
* pmp: add pmp, unified
* pmp: add pmp, distributed but same cycle
* pmp: pmp resp next cycle
* [WIP] PMP: add l2tlb missqueue pmp support
* pmp: add pmp to ptw and regnext pmp for frontend
* pmp: fix bug of napot-match
* pmp: fix bug of method aligned
* pmp: when write cfg, update mask
* pmp: fix bug of store af getting in store unit
* tlb: fix bug, add af check(access fault from ptw)
* tlb: af may have higher priority than pf when ptw has af
* ptw: fix bug of sending paddr to pmp and recv af
* ci: add pmp unit test
* pmp: change PMPPlatformGrain to 6 (512bits)
* pmp: fix bug of read_addr
* ci: re-add pmp unit test
* l2tlb: lazymodule couldn't use @chiselName
* l2tlb: fix bug of l2tlb missqueue duplicate req's logic
filt the duplicate req:
old: when enq, change enq state to different state
new: enq + mem.req.fire, more robust
* pmp: pmp checker now supports samecycle & regenable
show more ...
|
#
3feeca58 |
| 10-Oct-2021 |
zfw <[email protected]> |
riscv-crypto: support K extension (#1102)
* This commit add risc-v cryptography extension subset(zknd zkne zknh zksed zksh)
- Rename bmu to bku
- Add crypto instruction in Mdu -> bku
- Store imme
riscv-crypto: support K extension (#1102)
* This commit add risc-v cryptography extension subset(zknd zkne zknh zksed zksh)
- Rename bmu to bku
- Add crypto instruction in Mdu -> bku
- Store immediate into mdu RS
* ci: add riscv-crypto test
show more ...
|
#
7b441e5e |
| 04-Oct-2021 |
Yinan Xu <[email protected]> |
alu: fix maxu/minu/rol/ror results (#1085)
* bump difftest
* alu: fix max and maxu result
* alu: fix src1 generated by opcode
Co-authored-by: Zhangfw <[email protected]>
|
#
2b4e8253 |
| 01-Oct-2021 |
Yinan Xu <[email protected]> |
core: update parameters and module organizations (#1080)
This commit moves load/store reservation stations into the first
ExuBlock (or calling it IntegerBlock). The unnecessary dispatch module
is
core: update parameters and module organizations (#1080)
This commit moves load/store reservation stations into the first
ExuBlock (or calling it IntegerBlock). The unnecessary dispatch module
is also removed from CtrlBlock.
Now the module organization becomes:
* ExuBlock: Int RS, Load/Store RS, Int RF, Int FUs
* ExuBlock_1: Fp RS, Fp RF, Fp FUs
* MemBlock: Load/Store FUs
Besides, load queue has 80 entries and store queue has 64 entries now.
show more ...
|
#
675acc68 |
| 25-Sep-2021 |
Yinan Xu <[email protected]> |
backend: optimize aluOpType to 7 bits (#1061)
This commit optimizes ALUOpType to 7 bits. Alu timing will be checked
later.
We also apply some misc changes including:
* Move REVB, PACK, PACKH,
backend: optimize aluOpType to 7 bits (#1061)
This commit optimizes ALUOpType to 7 bits. Alu timing will be checked
later.
We also apply some misc changes including:
* Move REVB, PACK, PACKH, PACKW to ALU
* Add fused logicZexth, addwZext, addwSexth
* Add instruction fusion test cases to CI
show more ...
|
#
07596dc6 |
| 25-Sep-2021 |
zfw <[email protected]> |
Bmu: support zbk* instruction (#1059)
* Bmu: support zbk* instructions
* ci: add zbk* instruction test
|
#
a58e3351 |
| 23-Sep-2021 |
Li Qianruo <[email protected]> |
Integer SRT16 Divider (#1019)
* New SRT4 divider that may improve timing
See "Digital reurrence dividers with reduced logical depth"
* SRT16 Int Divider that is working properly
* Fix bug r
Integer SRT16 Divider (#1019)
* New SRT4 divider that may improve timing
See "Digital reurrence dividers with reduced logical depth"
* SRT16 Int Divider that is working properly
* Fix bug related to div 1
* Timing improved version of SRT16 int divider
* Add copyright and made some minor changes
* Fix bugs related to div 0
* Fix another div 0 bug
* Fix another special case bug
show more ...
|
#
ebb8ebf8 |
| 18-Sep-2021 |
Yinan Xu <[email protected]> |
core: add timer counters for important stages (#1045)
This commit adds timer counters for some important pipeline stages,
including rename, dispatch, dispatch2, select, issue, execute, commit.
We
core: add timer counters for important stages (#1045)
This commit adds timer counters for some important pipeline stages,
including rename, dispatch, dispatch2, select, issue, execute, commit.
We add performance counters for different types of instructions to see
the latency in different pipeline stages.
show more ...
|
#
c88c3a2a |
| 13-Sep-2021 |
Yinan Xu <[email protected]> |
backend: clean up exception vector usages (#1026)
This commit cleans up exception vector usages in backend.
Previously the exception vector will go through the pipeline with the
uop. However, in
backend: clean up exception vector usages (#1026)
This commit cleans up exception vector usages in backend.
Previously the exception vector will go through the pipeline with the
uop. However, instructions with exceptions will enter ROB when they are
dispatched. Thus, actually we don't need the exception vector when an
instruction enters a function unit.
* exceptionVec, flushPipe, replayInst are reset when an instruction
enters function units.
* For execution units that don't have exceptions, we reset their output
exception vectors to avoid ROB to record them.
* Move replayInst to CtrlSignals.
show more ...
|
#
a792bcf1 |
| 12-Sep-2021 |
Yinan Xu <[email protected]> |
backend: add 3-bit shift fused instructions (#1022)
This commit adds 3-bit shift fused instructions. When the program
tries to add 8-byte index, these may be used.
List of fused instructions add
backend: add 3-bit shift fused instructions (#1022)
This commit adds 3-bit shift fused instructions. When the program
tries to add 8-byte index, these may be used.
List of fused instructions added in this commit:
* szewl3: `slli r1, r0, 32` + `srli r1, r0, 29`
* sr29add: `srli r1, r0, 29` + `add r1, r1, r2`
show more ...
|
#
c9ebdf90 |
| 11-Sep-2021 |
Yinan Xu <[email protected]> |
rs,status: simplify logic to optimize timing (#1020)
This commit simplifies status logic in reservations stations. Module
StatusArray is mostly rewritten.
The following optimizations are applied
rs,status: simplify logic to optimize timing (#1020)
This commit simplifies status logic in reservations stations. Module
StatusArray is mostly rewritten.
The following optimizations are applied:
* Wakeup now has higher priority than enqueue. This reduces the length
of the critical path of ALU back-to-back wakeup.
* Don't compare fpWen/rfWen if the reservation station does not have
float/int operands.
* Ignore status.valid or redirect for srcState update. For data capture,
these are necessary and not changed.
* Remove blocked and scheduled conditions in issue logic when the
reservation station does not have loadWait bit and feedback.
show more ...
|
#
88825c5c |
| 09-Sep-2021 |
Yinan Xu <[email protected]> |
backend: support instruction fusion cases (#1011)
This commit adds some simple instruction fusion cases in decode stage.
Currently we only implement instruction pairs that can be fused into
RV64GC
backend: support instruction fusion cases (#1011)
This commit adds some simple instruction fusion cases in decode stage.
Currently we only implement instruction pairs that can be fused into
RV64GCB instructions.
Instruction fusions are detected in the decode stage by FusionDecoder.
The decoder checks every two instructions and marks the first
instruction fused if they can be fused into one instruction. The second
instruction is removed by setting the valid field to false.
Simple fusion cases include sh1add, sh2add, sh3add, sexth, zexth, etc.
Currently, ftq in frontend needs every instruction to commit. However,
the second instruction is removed from the pipeline and will not commit.
To solve this issue, we temporarily add more bits to isFused to indicate
the offset diff of the two fused instruction. There are four
possibilities now. This feature may be removed later.
This commit also adds more instruction fusion cases that need changes
in both the decode stage and the funtion units. In this commit, we add
some opcode to the function units and fuse the new instruction pairs
into these new internal uops.
The list of opcodes we add in this commit is shown below:
- szewl1: `slli r1, r0, 32` + `srli r1, r0, 31`
- szewl2: `slli r1, r0, 32` + `srli r1, r0, 30`
- byte2: `srli r1, r0, 8` + `andi r1, r1, 255`
- sh4add: `slli r1, r0, 4` + `add r1, r1, r2`
- sr30add: `srli r1, r0, 30` + `add r1, r1, r2`
- sr31add: `srli r1, r0, 31` + `add r1, r1, r2`
- sr32add: `srli r1, r0, 32` + `add r1, r1, r2`
- oddadd: `andi r1, r0, 1`` + `add r1, r1, r2`
- oddaddw: `andi r1, r0, 1`` + `addw r1, r1, r2`
- orh48: mask off the first 16 bits and or with another operand
(`andi r1, r0, -256`` + `or r1, r1, r2`)
Furthermore, this commit adds some complex instruction fusion cases to
the decode stage and function units. The complex instruction fusion cases
are detected after the instructions are decoded into uop and their
CtrlSignals are used for instruction fusion detection.
We add the following complex instruction fusion cases:
- addwbyte: addw and mask it with 0xff (extract the first byte)
- addwbit: addw and mask it with 0x1 (extract the first bit)
- logiclsb: logic operation and mask it with 0x1 (extract the first bit)
- mulw7: andi 127 and mulw instructions.
Input to mul is AND with 0x7f if mulw7 bit is set to true.
show more ...
|
#
bd278897 |
| 05-Sep-2021 |
Yinan Xu <[email protected]> |
backend,exu: load balance between issue ports (#947)
This commit adds support for load balance between different issue ports
when the function unit is not pipelined and the reservation station has
backend,exu: load balance between issue ports (#947)
This commit adds support for load balance between different issue ports
when the function unit is not pipelined and the reservation station has
more than one issue ports.
We use a ping pong bit to decide which port to issue the instruction. At
every clock cycle, the bit is flipped.
show more ...
|
#
4b65fc7e |
| 04-Sep-2021 |
Jiawei Lin <[email protected]> |
FMA: separate fmul/fadd/fma (#996)
* FMA: spearate fadd/fmul/fma
* exu: enable fast uop out from fmacExeUnit
Co-authored-by: Yinan Xu <[email protected]>
|
#
c3d7991b |
| 03-Sep-2021 |
Jiawei Lin <[email protected]> |
Multiplier: adjust pipeline (#993)
* Multiplier: adjust pipeline
|
#
6cdd85d9 |
| 03-Sep-2021 |
Yinan Xu <[email protected]> |
backend,fu: add InputBuffer for fdivSqrt (#990)
This commit adds an 8-entry buffer for fdivSqrt function unit input.
Set hasInputBuffer to true to enable input buffers for other function
units.
|
#
e174d629 |
| 01-Sep-2021 |
Jiawei Lin <[email protected]> |
IntToFP: support fully pipelined work mode (#983)
* IntToFP: support fully pipelined mode
|
#
b2482bc1 |
| 01-Sep-2021 |
Yinan Xu <[email protected]> |
backend, fu: support fastUopOut for pipelined fu (#966)
This commit adds fastUopOut support for pipelined function units via
implementing fastUopOut in trait HasPipelineReg.
The following functi
backend, fu: support fastUopOut for pipelined fu (#966)
This commit adds fastUopOut support for pipelined function units via
implementing fastUopOut in trait HasPipelineReg.
The following function units now support fastUopOut:
- MUL
- FMA
- F2I
- F2F
show more ...
|
#
28c18878 |
| 31-Aug-2021 |
zfw <[email protected]> |
Alu: optimize timing for bitmanip (#979)
* Alu: optimize timing
This pull request optimizes timing by adding a 32bit adder for addw and changing the encode.
|
#
f83b578a |
| 27-Aug-2021 |
Yinan Xu <[email protected]> |
backend,fu: allow early arbitration via fastUopOut (#962)
This commit adds a fastUopOut option to function units. This allows the
function units to give valid and uop one cycle before its output da
backend,fu: allow early arbitration via fastUopOut (#962)
This commit adds a fastUopOut option to function units. This allows the
function units to give valid and uop one cycle before its output data is
ready. FastUopOut lets writeback arbitration happen one cycle before
data is ready and helps optimize the timing.
Since some function units are not ready for this new feature, this
commit adds a fastImplemented option to allow function units to have
fastUopOut but the data is still at the same cycle as uop. This option
will delay the data for one cycle and may cause performance degradation.
FastImplemented should be true after function units support fastUopOut.
show more ...
|
#
184a1958 |
| 26-Aug-2021 |
zfw <[email protected]> |
Alu: optimize timing for bitmanip (#959)
* separate the Alu instructions by 64bit data instructions and w-suffix instructions
* optimize select logic of instructions result
|
#
1a0f06ee |
| 23-Aug-2021 |
Yinan Xu <[email protected]> |
exu: add suggestName to function units (#944)
|
#
85b4cd54 |
| 21-Aug-2021 |
Yinan Xu <[email protected]> |
backend: separate store address and data (#921)
This commit separates store address and store data in backend, including both reservation stations and function units. This commit also changes how st
backend: separate store address and data (#921)
This commit separates store address and store data in backend, including both reservation stations and function units. This commit also changes how stIssuePtr is updated. stIssuePtr should only be updated when both store data and address issue.
show more ...
|
#
ee8ff153 |
| 17-Aug-2021 |
zfw <[email protected]> |
Support RISC-V bitmanip extension v1.0 (#919)
* Add bitmanip v1.0 instructions into decede table
* Fix some instructions' name
* Add basic instructions into Alu
* Add clz, ctz, cpop, clmul Instru
Support RISC-V bitmanip extension v1.0 (#919)
* Add bitmanip v1.0 instructions into decede table
* Fix some instructions' name
* Add basic instructions into Alu
* Add clz, ctz, cpop, clmul Instruction into MulDivExeUnit
show more ...
|
#
adb5df20 |
| 04-Aug-2021 |
Yinan Xu <[email protected]> |
backend: add ExuBlock to wrap execution units and RS (#903)
Backend --> ExuBlock --> FuBlock --> Exu --> Function Units
--> --> Scheduler --> RS
|