#
72951335 |
| 15-Nov-2021 |
Li Qianruo <[email protected]> |
Trigger Implementation for Debug Mode (#1170)
* Untested Trigger Implementation
Co-authored-by: William Wang <[email protected]>
Co-authored-by: Lingrui98 <[email protected]>
Co-autho
Trigger Implementation for Debug Mode (#1170)
* Untested Trigger Implementation
Co-authored-by: William Wang <[email protected]>
Co-authored-by: Lingrui98 <[email protected]>
Co-authored-by: rvcoresjw <[email protected]>
show more ...
|
#
4d0a7d51 |
| 13-Nov-2021 |
Steve Gou <[email protected]> |
Merge pull request #1223 from OpenXiangShan/tage-fh-merge
implement folded global histories for tage-sc/ittage
|
#
cbe9a847 |
| 12-Nov-2021 |
Yinan Xu <[email protected]> |
difftest: add basic difftest features for releases (#1219)
* difftest: add basic difftest features for releases
This commit adds basic difftest features for every release, no matter
it's for sim
difftest: add basic difftest features for releases (#1219)
* difftest: add basic difftest features for releases
This commit adds basic difftest features for every release, no matter
it's for simulation or physical design. The macro SYNTHESIS is used to
skip these logics when synthesizing the design. This commit aims at
allowing designs for physical design to be verified.
* bump ready-to-run
* difftest: add int and fp writeback data
show more ...
|
#
b3d79b37 |
| 12-Nov-2021 |
Yinan Xu <[email protected]> |
top: add seip and meip bits from plic (#1221)
|
#
dd6c0695 |
| 12-Nov-2021 |
Lingrui98 <[email protected]> |
bpu: bring folded history into use, and use previous ghr to do difftest; move tage and ittage config to top
|
#
c2ad24eb |
| 11-Nov-2021 |
Lingrui98 <[email protected]> |
bpu: use circular buffer as global history register, and
* use compressed info to do redirects * implement folded history class
|
#
d200f594 |
| 27-Oct-2021 |
William Wang <[email protected]> |
mem: simplify software prefetch logic (#1176)
* mem: update lsu op encoding
* decode: remove prefetch bits from CtrlSignals
* mem: simplify software prefetch logic in loadpipe
* mem: fix wrong dc
mem: simplify software prefetch logic (#1176)
* mem: update lsu op encoding
* decode: remove prefetch bits from CtrlSignals
* mem: simplify software prefetch logic in loadpipe
* mem: fix wrong dcacheShouldResp assertion
show more ...
|
#
af2f7849 |
| 27-Oct-2021 |
happy-lx <[email protected]> |
Svinval (#1055)
* Svinval: implement Svinval * add three new instructions(SINVAL_VMA SFENCE_W_INVAL SFENCE_INVAL_IR) * TODO : test
* Prevent illegal software code by adding an assert * make sure th
Svinval (#1055)
* Svinval: implement Svinval * add three new instructions(SINVAL_VMA SFENCE_W_INVAL SFENCE_INVAL_IR) * TODO : test
* Prevent illegal software code by adding an assert * make sure the software runs as follow: begin instruction of svinval extension svinval xxxx svinval xxxx ... end instruction of svinval extension
* Svinval: add an CSR to control it and some annotations
* Roq: fix assert bug of Svinval
* Svinval: fix svinval.vma's rs2 type * make it reg instead of imm
* Svinval: change assert logic and fix bug * fix the condition judging Svinval.vma instruction * using doingSvinval in assert
* ci: add rv64mi-p-svinval to ci
* fix typo
* fix bug that lost ','
* when svinval disable, raise illegal instr excep
* CSR: mv svinval ctl to srnctl(1)
* rob: when excep, do not set dosvinval
* decode: when disable svinval, do not set flushpipe
* bump ready-to-run
Co-authored-by: ZhangZifei <[email protected]>
show more ...
|
#
c3abb8b6 |
| 22-Oct-2021 |
Yinan Xu <[email protected]> |
rob: optimize bits width in storage (#1155)
This PR optimizes out isFused and crossPageIPFFix usages in Rob's DispatchData. They will not be stored in ROB. Now DispatchData has only 38 bits.
* is
rob: optimize bits width in storage (#1155)
This PR optimizes out isFused and crossPageIPFFix usages in Rob's DispatchData. They will not be stored in ROB. Now DispatchData has only 38 bits.
* isFused is merged with commitType (2 bits reduced)
* crossPageIPFFix is used only in ExceptionGen (1 bit reduced)
* rename: reduce ldest usages
* decode: set isMove to false if ldest is zero
show more ...
|
#
67682d05 |
| 22-Oct-2021 |
William Wang <[email protected]> |
Add ld-ld violation check (#1140)
* mem: support ld-ld violation check
* mem: do not fast wakeup if ld vio check failed
* mem: disable ld-ld vio check after core reset
|
#
e19f7967 |
| 21-Oct-2021 |
William Wang <[email protected]> |
mem: add CSR based l1 cache instructions (#1116)
|
#
45f497a4 |
| 21-Oct-2021 |
happy-lx <[email protected]> |
asid: add asid, mainly work when hit check, not in sfence.vma (#1090)
add mmu's asid support.
1. put asid inside sram (if the entry is sram), or it will take too many sources.
2. when sfence, just
asid: add asid, mainly work when hit check, not in sfence.vma (#1090)
add mmu's asid support.
1. put asid inside sram (if the entry is sram), or it will take too many sources.
2. when sfence, just flush it all, don't care asid.
3. when hit check, check asid.
4. when asid changed, flush all the inflight ptw req for safety
5. simple asid unit test:
asid 1 write, asid 2 read and check, asid 2 write, asid 1 read and check. same va, different pa
* ASID: make satp's asid bits configurable to RW
* use AsidLength to control it
* ASID: implement asid refilling and hit checking
* TODO: sfence flush with asid
* ASID: implement sfence with asid
* TODO: extract asid from SRAMTemplate
* ASID: extract asid from SRAMTemplate
* all is down
* TODO: test
* fix write to asid
* Sfence: support rs2 of sfence and fix Fence Unit
* rs2 of Sfence should be Reg and pass it to Fence Unit
* judge the value of reg instead of the index in Fence Unit
* mmu: re-write asid
now, asid is stored inside sram, so sfence just flush it
it's a complex job to handle the problem that asid is changed but
no sfence.vma is executed. when asid is changed, all the inflight
mmu reqs are flushed but entries in storage is not influenced.
so the inflight reqs do not need to record asid, just use satp.asid
* tlb: fix bug of refill mask
* ci: add asid unit test
Co-authored-by: ZhangZifei <[email protected]>
show more ...
|
#
d1fe0262 |
| 16-Oct-2021 |
William Wang <[email protected]> |
Add strict mode to reduce mdp mispredict (#1113)
* storeset: fix waitForSqIdx generate logic
Now right waitForSqIdx will be generated for earlier store in the same
dispatch bundle.
* mdp: add
Add strict mode to reduce mdp mispredict (#1113)
* storeset: fix waitForSqIdx generate logic
Now right waitForSqIdx will be generated for earlier store in the same
dispatch bundle.
* mdp: add strict wait mode
When loadWaitStrict && loadWaitBit, load will wait in rs until all
older store addr calculation are finished.
* chore: add storeset_load_strict_wait counter
show more ...
|
#
c7160cd3 |
| 12-Oct-2021 |
William Wang <[email protected]> |
mem: update block load logic (#1035)
* mem: update block load logic
Now load will be selected as soon as the store it depends on is ready,
which is predicted by Store Sets
* mem: opt block lo
mem: update block load logic (#1035)
* mem: update block load logic
Now load will be selected as soon as the store it depends on is ready,
which is predicted by Store Sets
* mem: opt block load logic
Load blocked by std invalid will wait for that std to issue
Load blocked by load violation wait for that sta to issue
* csr: add 2 extra storeset config bits
Following bits were added to slvpredctl:
- storeset_wait_store
- storeset_no_fast_wakeup
* storeset: fix waitForSqIdx generate logic
Now right waitForSqIdx will be generated for earlier store in the same
dispatch bundle
show more ...
|
#
b6982e83 |
| 11-Oct-2021 |
Lemover <[email protected]> |
pmp: add pmp support (#1092)
* [WIP] PMP: add pmp to tlb & csr(ptw part is not added)
* pmp: add pmp, unified
* pmp: add pmp, distributed but same cycle
* pmp: pmp resp next cycle
* [WIP
pmp: add pmp support (#1092)
* [WIP] PMP: add pmp to tlb & csr(ptw part is not added)
* pmp: add pmp, unified
* pmp: add pmp, distributed but same cycle
* pmp: pmp resp next cycle
* [WIP] PMP: add l2tlb missqueue pmp support
* pmp: add pmp to ptw and regnext pmp for frontend
* pmp: fix bug of napot-match
* pmp: fix bug of method aligned
* pmp: when write cfg, update mask
* pmp: fix bug of store af getting in store unit
* tlb: fix bug, add af check(access fault from ptw)
* tlb: af may have higher priority than pf when ptw has af
* ptw: fix bug of sending paddr to pmp and recv af
* ci: add pmp unit test
* pmp: change PMPPlatformGrain to 6 (512bits)
* pmp: fix bug of read_addr
* ci: re-add pmp unit test
* l2tlb: lazymodule couldn't use @chiselName
* l2tlb: fix bug of l2tlb missqueue duplicate req's logic
filt the duplicate req:
old: when enq, change enq state to different state
new: enq + mem.req.fire, more robust
* pmp: pmp checker now supports samecycle & regenable
show more ...
|
#
d87b76aa |
| 11-Oct-2021 |
William Wang <[email protected]> |
Speed up dcache bank conflict feedback (#1081)
Make bank conflict feedback 1 cycle earlier
|
#
3f4ec46f |
| 10-Oct-2021 |
CODE-JTZ <[email protected]> |
add softprefetch (prefetch.r & prefetch.w). (#1099)
* add soft prefetch
Add the softprefetch. Actually, prefetch.r&w are an ORI which's ldest is x0, we distinguish it in decodeUnit and send it to l
add softprefetch (prefetch.r & prefetch.w). (#1099)
* add soft prefetch
Add the softprefetch. Actually, prefetch.r&w are an ORI which's ldest is x0, we distinguish it in decodeUnit and send it to ld func unit. Then, we modified some interaction signals in ordinary Load steps.
show more ...
|
#
20edb3f7 |
| 09-Oct-2021 |
William Wang <[email protected]> |
Add runahead debug signals (#1082)
* runahead: add runahead support (WIP)
* runahead: fix redirect event
* difftest: bump difftest
* runahead: bump version
Note: current runahead does no
Add runahead debug signals (#1082)
* runahead: add runahead support (WIP)
* runahead: fix redirect event
* difftest: bump difftest
* runahead: bump version
Note: current runahead does not support instruction fusion, disable that
in XiangShan if runahead is needed
* runahead: bump version
* difftest: bump version to support runahead
* chore: bump huancun to make ci happy
* chore: fix wrong submodule url
* difftest: bump version
BREAKING CHANGE: nemu update_config api has changed
show more ...
|
#
2b4e8253 |
| 01-Oct-2021 |
Yinan Xu <[email protected]> |
core: update parameters and module organizations (#1080)
This commit moves load/store reservation stations into the first
ExuBlock (or calling it IntegerBlock). The unnecessary dispatch module
is
core: update parameters and module organizations (#1080)
This commit moves load/store reservation stations into the first
ExuBlock (or calling it IntegerBlock). The unnecessary dispatch module
is also removed from CtrlBlock.
Now the module organization becomes:
* ExuBlock: Int RS, Load/Store RS, Int RF, Int FUs
* ExuBlock_1: Fp RS, Fp RF, Fp FUs
* MemBlock: Load/Store FUs
Besides, load queue has 80 entries and store queue has 64 entries now.
show more ...
|
#
9aca92b9 |
| 28-Sep-2021 |
Yinan Xu <[email protected]> |
misc: code clean up (#1073)
* rename Roq to Rob
* remove trailing whitespaces
* remove unused parameters
|
#
1f0e2dc7 |
| 27-Sep-2021 |
Jiawei Lin <[email protected]> |
128KB L1D + non-inclusive L2/L3 (#1051)
* L1D: provide independent meta array for load pipe
* misc: reorg files in cache dir
* chore: reorg l1d related files
* bump difftest: use clang to c
128KB L1D + non-inclusive L2/L3 (#1051)
* L1D: provide independent meta array for load pipe
* misc: reorg files in cache dir
* chore: reorg l1d related files
* bump difftest: use clang to compile verialted files
* dcache: add BankedDataArray
* dcache: fix data read way_en
* dcache: fix banked data wmask
* dcache: replay conflict correctly
When conflict is detected:
* Report replay
* Disable fast wakeup
* dcache: fix bank addr match logic
* dcache: add bank conflict perf counter
* dcache: fix miss perf counters
* chore: make lsq data print perttier
* dcache: enable banked ecc array
* dcache: set dcache size to 128KB
* dcache: read mainpipe data from banked data array
* dcache: add independent mainpipe data read port
* dcache: revert size change
* Size will be changed after main pipe refactor
* Merge remote-tracking branch 'origin/master' into l1-size
* dcache: reduce banked data load conflict
* MainPipe: ReleaseData for all replacement even if it's clean
* dcache: set dcache size to 128KB
BREAKING CHANGE: l2 needed to provide right vaddr index to probe l1,
and it has to help l1 to avoid addr alias problem
* chore: fix merge conflict
* Change L2 to non-inclusive / Add alias bits in L1D
* debug: hard coded dup data array for debuging
* dcache: fix ptag width
* dcache: fix amo main pipe req
* dcache: when probe, use vaddr for main pipe req
* dcache: include vaddr in atomic unit req
* dcache: fix get_tag() function
* dcache: fix writeback paddr
* huancun: bump version
* dcache: erase block offset bits in release addr
* dcache: do not require probe vaddr != 0
* dcache: opt banked data read timing
* bump huancun
* dcache: fix atom unit pipe req vaddr
* dcache: simplify main pipe writeback_vaddr
* bump huancun
* dcache: remove debug data array
* Turn on all usr bits in L1
* Bump huancun
* Bump huancun
* enable L2 prefetcher
* bump huancun
* set non-inclusive L2/L3 + 128KB L1 as default config
* Use data in TLBundleB to hint ProbeAck beeds data
* mmu.l2tlb: mem_resp now fills multi mq pte buffer
mq entries can just deq without accessing l2tlb cache
* dcache: handle dirty userbit
* bump huancun
* chore: l1 cache code clean up
* Remove l1plus cache
* Remove HasBankedDataArrayParameters
* Add bus pmu between L3 and Mem
* bump huncun
* dcache: fix l1 probe index generate logic
* Now right probe index will be used according to the len of alias bits
* dcache: clean up amo pipeline
* DCacheParameter rowBits will be removed in the future, now we set it to 128
to make dcache work
* dcache: fix amo word index
* bump huancun
Co-authored-by: William Wang <[email protected]>
Co-authored-by: zhanglinjuan <[email protected]>
Co-authored-by: TangDan <[email protected]>
Co-authored-by: ZhangZifei <[email protected]>
Co-authored-by: wangkaifan <[email protected]>
show more ...
|
#
ebb8ebf8 |
| 18-Sep-2021 |
Yinan Xu <[email protected]> |
core: add timer counters for important stages (#1045)
This commit adds timer counters for some important pipeline stages,
including rename, dispatch, dispatch2, select, issue, execute, commit.
We
core: add timer counters for important stages (#1045)
This commit adds timer counters for some important pipeline stages,
including rename, dispatch, dispatch2, select, issue, execute, commit.
We add performance counters for different types of instructions to see
the latency in different pipeline stages.
show more ...
|
#
c88c3a2a |
| 13-Sep-2021 |
Yinan Xu <[email protected]> |
backend: clean up exception vector usages (#1026)
This commit cleans up exception vector usages in backend.
Previously the exception vector will go through the pipeline with the
uop. However, in
backend: clean up exception vector usages (#1026)
This commit cleans up exception vector usages in backend.
Previously the exception vector will go through the pipeline with the
uop. However, instructions with exceptions will enter ROB when they are
dispatched. Thus, actually we don't need the exception vector when an
instruction enters a function unit.
* exceptionVec, flushPipe, replayInst are reset when an instruction
enters function units.
* For execution units that don't have exceptions, we reset their output
exception vectors to avoid ROB to record them.
* Move replayInst to CtrlSignals.
show more ...
|
#
c9ebdf90 |
| 11-Sep-2021 |
Yinan Xu <[email protected]> |
rs,status: simplify logic to optimize timing (#1020)
This commit simplifies status logic in reservations stations. Module
StatusArray is mostly rewritten.
The following optimizations are applied
rs,status: simplify logic to optimize timing (#1020)
This commit simplifies status logic in reservations stations. Module
StatusArray is mostly rewritten.
The following optimizations are applied:
* Wakeup now has higher priority than enqueue. This reduces the length
of the critical path of ALU back-to-back wakeup.
* Don't compare fpWen/rfWen if the reservation station does not have
float/int operands.
* Ignore status.valid or redirect for srcState update. For data capture,
these are necessary and not changed.
* Remove blocked and scheduled conditions in issue logic when the
reservation station does not have loadWait bit and feedback.
show more ...
|
#
88825c5c |
| 09-Sep-2021 |
Yinan Xu <[email protected]> |
backend: support instruction fusion cases (#1011)
This commit adds some simple instruction fusion cases in decode stage.
Currently we only implement instruction pairs that can be fused into
RV64GC
backend: support instruction fusion cases (#1011)
This commit adds some simple instruction fusion cases in decode stage.
Currently we only implement instruction pairs that can be fused into
RV64GCB instructions.
Instruction fusions are detected in the decode stage by FusionDecoder.
The decoder checks every two instructions and marks the first
instruction fused if they can be fused into one instruction. The second
instruction is removed by setting the valid field to false.
Simple fusion cases include sh1add, sh2add, sh3add, sexth, zexth, etc.
Currently, ftq in frontend needs every instruction to commit. However,
the second instruction is removed from the pipeline and will not commit.
To solve this issue, we temporarily add more bits to isFused to indicate
the offset diff of the two fused instruction. There are four
possibilities now. This feature may be removed later.
This commit also adds more instruction fusion cases that need changes
in both the decode stage and the funtion units. In this commit, we add
some opcode to the function units and fuse the new instruction pairs
into these new internal uops.
The list of opcodes we add in this commit is shown below:
- szewl1: `slli r1, r0, 32` + `srli r1, r0, 31`
- szewl2: `slli r1, r0, 32` + `srli r1, r0, 30`
- byte2: `srli r1, r0, 8` + `andi r1, r1, 255`
- sh4add: `slli r1, r0, 4` + `add r1, r1, r2`
- sr30add: `srli r1, r0, 30` + `add r1, r1, r2`
- sr31add: `srli r1, r0, 31` + `add r1, r1, r2`
- sr32add: `srli r1, r0, 32` + `add r1, r1, r2`
- oddadd: `andi r1, r0, 1`` + `add r1, r1, r2`
- oddaddw: `andi r1, r0, 1`` + `addw r1, r1, r2`
- orh48: mask off the first 16 bits and or with another operand
(`andi r1, r0, -256`` + `or r1, r1, r2`)
Furthermore, this commit adds some complex instruction fusion cases to
the decode stage and function units. The complex instruction fusion cases
are detected after the instructions are decoded into uop and their
CtrlSignals are used for instruction fusion detection.
We add the following complex instruction fusion cases:
- addwbyte: addw and mask it with 0xff (extract the first byte)
- addwbit: addw and mask it with 0x1 (extract the first bit)
- logiclsb: logic operation and mask it with 0x1 (extract the first bit)
- mulw7: andi 127 and mulw instructions.
Input to mul is AND with 0x7f if mulw7 bit is set to true.
show more ...
|