Bundle.scala - OpenGrok history log for /XiangShan/src/main/scala/xiangshan/Bundle.scala

Revision	Date	Author	Comments
# 72951335	15-Nov-2021	Li Qianruo <[email protected]>	Trigger Implementation for Debug Mode (#1170) * Untested Trigger Implementation Co-authored-by: William Wang <[email protected]> Co-authored-by: Lingrui98 <[email protected]> Co-authored-by: rvcoresjw <[email protected]>
# 4d0a7d51	13-Nov-2021	Steve Gou <[email protected]>	Merge pull request #1223 from OpenXiangShan/tage-fh-merge implement folded global histories for tage-sc/ittage
# cbe9a847	12-Nov-2021	Yinan Xu <[email protected]>	difftest: add basic difftest features for releases (#1219) * difftest: add basic difftest features for releases This commit adds basic difftest features for every release, no matter it's for simulation or physical design. The macro SYNTHESIS is used to skip these logics when synthesizing the design. This commit aims at allowing designs for physical design to be verified. * bump ready-to-run * difftest: add int and fp writeback data
# b3d79b37	12-Nov-2021	Yinan Xu <[email protected]>	top: add seip and meip bits from plic (#1221)
# dd6c0695	12-Nov-2021	Lingrui98 <[email protected]>	bpu: bring folded history into use, and use previous ghr to do difftest; move tage and ittage config to top
# c2ad24eb	11-Nov-2021	Lingrui98 <[email protected]>	bpu: use circular buffer as global history register, and * use compressed info to do redirects * implement folded history class
# d200f594	27-Oct-2021	William Wang <[email protected]>	mem: simplify software prefetch logic (#1176) * mem: update lsu op encoding * decode: remove prefetch bits from CtrlSignals * mem: simplify software prefetch logic in loadpipe * mem: fix wrong dcacheShouldResp assertion
# af2f7849	27-Oct-2021	happy-lx <[email protected]>	Svinval (#1055) * Svinval: implement Svinval * add three new instructions(SINVAL_VMA SFENCE_W_INVAL SFENCE_INVAL_IR) * TODO : test * Prevent illegal software code by adding an assert * make sure the software runs as follow: begin instruction of svinval extension svinval xxxx svinval xxxx ... end instruction of svinval extension * Svinval: add an CSR to control it and some annotations * Roq: fix assert bug of Svinval * Svinval: fix svinval.vma's rs2 type * make it reg instead of imm * Svinval: change assert logic and fix bug * fix the condition judging Svinval.vma instruction * using doingSvinval in assert * ci: add rv64mi-p-svinval to ci * fix typo * fix bug that lost ',' * when svinval disable, raise illegal instr excep * CSR: mv svinval ctl to srnctl(1) * rob: when excep, do not set dosvinval * decode: when disable svinval, do not set flushpipe * bump ready-to-run Co-authored-by: ZhangZifei <[email protected]>
# c3abb8b6	22-Oct-2021	Yinan Xu <[email protected]>	rob: optimize bits width in storage (#1155) This PR optimizes out isFused and crossPageIPFFix usages in Rob's DispatchData. They will not be stored in ROB. Now DispatchData has only 38 bits. * isFused is merged with commitType (2 bits reduced) * crossPageIPFFix is used only in ExceptionGen (1 bit reduced) * rename: reduce ldest usages * decode: set isMove to false if ldest is zero
# 67682d05	22-Oct-2021	William Wang <[email protected]>	Add ld-ld violation check (#1140) * mem: support ld-ld violation check * mem: do not fast wakeup if ld vio check failed * mem: disable ld-ld vio check after core reset
# e19f7967	21-Oct-2021	William Wang <[email protected]>	mem: add CSR based l1 cache instructions (#1116)
# 45f497a4	21-Oct-2021	happy-lx <[email protected]>	asid: add asid, mainly work when hit check, not in sfence.vma (#1090) add mmu's asid support. 1. put asid inside sram (if the entry is sram), or it will take too many sources. 2. when sfence, just flush it all, don't care asid. 3. when hit check, check asid. 4. when asid changed, flush all the inflight ptw req for safety 5. simple asid unit test: asid 1 write, asid 2 read and check, asid 2 write, asid 1 read and check. same va, different pa * ASID: make satp's asid bits configurable to RW * use AsidLength to control it * ASID: implement asid refilling and hit checking * TODO: sfence flush with asid * ASID: implement sfence with asid * TODO: extract asid from SRAMTemplate * ASID: extract asid from SRAMTemplate * all is down * TODO: test * fix write to asid * Sfence: support rs2 of sfence and fix Fence Unit * rs2 of Sfence should be Reg and pass it to Fence Unit * judge the value of reg instead of the index in Fence Unit * mmu: re-write asid now, asid is stored inside sram, so sfence just flush it it's a complex job to handle the problem that asid is changed but no sfence.vma is executed. when asid is changed, all the inflight mmu reqs are flushed but entries in storage is not influenced. so the inflight reqs do not need to record asid, just use satp.asid * tlb: fix bug of refill mask * ci: add asid unit test Co-authored-by: ZhangZifei <[email protected]>
# d1fe0262	16-Oct-2021	William Wang <[email protected]>	Add strict mode to reduce mdp mispredict (#1113) * storeset: fix waitForSqIdx generate logic Now right waitForSqIdx will be generated for earlier store in the same dispatch bundle. * mdp: add strict wait mode When loadWaitStrict && loadWaitBit, load will wait in rs until all older store addr calculation are finished. * chore: add storeset_load_strict_wait counter
# c7160cd3	12-Oct-2021	William Wang <[email protected]>	mem: update block load logic (#1035) * mem: update block load logic Now load will be selected as soon as the store it depends on is ready, which is predicted by Store Sets * mem: opt block load logic Load blocked by std invalid will wait for that std to issue Load blocked by load violation wait for that sta to issue * csr: add 2 extra storeset config bits Following bits were added to slvpredctl: - storeset_wait_store - storeset_no_fast_wakeup * storeset: fix waitForSqIdx generate logic Now right waitForSqIdx will be generated for earlier store in the same dispatch bundle
# b6982e83	11-Oct-2021	Lemover <[email protected]>	pmp: add pmp support (#1092) * [WIP] PMP: add pmp to tlb & csr(ptw part is not added) * pmp: add pmp, unified * pmp: add pmp, distributed but same cycle * pmp: pmp resp next cycle * [WIP] PMP: add l2tlb missqueue pmp support * pmp: add pmp to ptw and regnext pmp for frontend * pmp: fix bug of napot-match * pmp: fix bug of method aligned * pmp: when write cfg, update mask * pmp: fix bug of store af getting in store unit * tlb: fix bug, add af check(access fault from ptw) * tlb: af may have higher priority than pf when ptw has af * ptw: fix bug of sending paddr to pmp and recv af * ci: add pmp unit test * pmp: change PMPPlatformGrain to 6 (512bits) * pmp: fix bug of read_addr * ci: re-add pmp unit test * l2tlb: lazymodule couldn't use @chiselName * l2tlb: fix bug of l2tlb missqueue duplicate req's logic filt the duplicate req: old: when enq, change enq state to different state new: enq + mem.req.fire, more robust * pmp: pmp checker now supports samecycle & regenable
# d87b76aa	11-Oct-2021	William Wang <[email protected]>	Speed up dcache bank conflict feedback (#1081) Make bank conflict feedback 1 cycle earlier
# 3f4ec46f	10-Oct-2021	CODE-JTZ <[email protected]>	add softprefetch (prefetch.r & prefetch.w). (#1099) * add soft prefetch Add the softprefetch. Actually, prefetch.r&w are an ORI which's ldest is x0, we distinguish it in decodeUnit and send it to ld func unit. Then, we modified some interaction signals in ordinary Load steps.
# 20edb3f7	09-Oct-2021	William Wang <[email protected]>	Add runahead debug signals (#1082) * runahead: add runahead support (WIP) * runahead: fix redirect event * difftest: bump difftest * runahead: bump version Note: current runahead does not support instruction fusion, disable that in XiangShan if runahead is needed * runahead: bump version * difftest: bump version to support runahead * chore: bump huancun to make ci happy * chore: fix wrong submodule url * difftest: bump version BREAKING CHANGE: nemu update_config api has changed
# 2b4e8253	01-Oct-2021	Yinan Xu <[email protected]>	core: update parameters and module organizations (#1080) This commit moves load/store reservation stations into the first ExuBlock (or calling it IntegerBlock). The unnecessary dispatch module is also removed from CtrlBlock. Now the module organization becomes: * ExuBlock: Int RS, Load/Store RS, Int RF, Int FUs * ExuBlock_1: Fp RS, Fp RF, Fp FUs * MemBlock: Load/Store FUs Besides, load queue has 80 entries and store queue has 64 entries now.
# 9aca92b9	28-Sep-2021	Yinan Xu <[email protected]>	misc: code clean up (#1073) * rename Roq to Rob * remove trailing whitespaces * remove unused parameters
# 1f0e2dc7	27-Sep-2021	Jiawei Lin <[email protected]>	128KB L1D + non-inclusive L2/L3 (#1051) * L1D: provide independent meta array for load pipe * misc: reorg files in cache dir * chore: reorg l1d related files * bump difftest: use clang to compile verialted files * dcache: add BankedDataArray * dcache: fix data read way_en * dcache: fix banked data wmask * dcache: replay conflict correctly When conflict is detected: * Report replay * Disable fast wakeup * dcache: fix bank addr match logic * dcache: add bank conflict perf counter * dcache: fix miss perf counters * chore: make lsq data print perttier * dcache: enable banked ecc array * dcache: set dcache size to 128KB * dcache: read mainpipe data from banked data array * dcache: add independent mainpipe data read port * dcache: revert size change * Size will be changed after main pipe refactor * Merge remote-tracking branch 'origin/master' into l1-size * dcache: reduce banked data load conflict * MainPipe: ReleaseData for all replacement even if it's clean * dcache: set dcache size to 128KB BREAKING CHANGE: l2 needed to provide right vaddr index to probe l1, and it has to help l1 to avoid addr alias problem * chore: fix merge conflict * Change L2 to non-inclusive / Add alias bits in L1D * debug: hard coded dup data array for debuging * dcache: fix ptag width * dcache: fix amo main pipe req * dcache: when probe, use vaddr for main pipe req * dcache: include vaddr in atomic unit req * dcache: fix get_tag() function * dcache: fix writeback paddr * huancun: bump version * dcache: erase block offset bits in release addr * dcache: do not require probe vaddr != 0 * dcache: opt banked data read timing * bump huancun * dcache: fix atom unit pipe req vaddr * dcache: simplify main pipe writeback_vaddr * bump huancun * dcache: remove debug data array * Turn on all usr bits in L1 * Bump huancun * Bump huancun * enable L2 prefetcher * bump huancun * set non-inclusive L2/L3 + 128KB L1 as default config * Use data in TLBundleB to hint ProbeAck beeds data * mmu.l2tlb: mem_resp now fills multi mq pte buffer mq entries can just deq without accessing l2tlb cache * dcache: handle dirty userbit * bump huancun * chore: l1 cache code clean up * Remove l1plus cache * Remove HasBankedDataArrayParameters * Add bus pmu between L3 and Mem * bump huncun * dcache: fix l1 probe index generate logic * Now right probe index will be used according to the len of alias bits * dcache: clean up amo pipeline * DCacheParameter rowBits will be removed in the future, now we set it to 128 to make dcache work * dcache: fix amo word index * bump huancun Co-authored-by: William Wang <[email protected]> Co-authored-by: zhanglinjuan <[email protected]> Co-authored-by: TangDan <[email protected]> Co-authored-by: ZhangZifei <[email protected]> Co-authored-by: wangkaifan <[email protected]>
# ebb8ebf8	18-Sep-2021	Yinan Xu <[email protected]>	core: add timer counters for important stages (#1045) This commit adds timer counters for some important pipeline stages, including rename, dispatch, dispatch2, select, issue, execute, commit. We add performance counters for different types of instructions to see the latency in different pipeline stages.
# c88c3a2a	13-Sep-2021	Yinan Xu <[email protected]>	backend: clean up exception vector usages (#1026) This commit cleans up exception vector usages in backend. Previously the exception vector will go through the pipeline with the uop. However, instructions with exceptions will enter ROB when they are dispatched. Thus, actually we don't need the exception vector when an instruction enters a function unit. * exceptionVec, flushPipe, replayInst are reset when an instruction enters function units. * For execution units that don't have exceptions, we reset their output exception vectors to avoid ROB to record them. * Move replayInst to CtrlSignals.
# c9ebdf90	11-Sep-2021	Yinan Xu <[email protected]>	rs,status: simplify logic to optimize timing (#1020) This commit simplifies status logic in reservations stations. Module StatusArray is mostly rewritten. The following optimizations are applied: * Wakeup now has higher priority than enqueue. This reduces the length of the critical path of ALU back-to-back wakeup. * Don't compare fpWen/rfWen if the reservation station does not have float/int operands. * Ignore status.valid or redirect for srcState update. For data capture, these are necessary and not changed. * Remove blocked and scheduled conditions in issue logic when the reservation station does not have loadWait bit and feedback.
# 88825c5c	09-Sep-2021	Yinan Xu <[email protected]>	backend: support instruction fusion cases (#1011) This commit adds some simple instruction fusion cases in decode stage. Currently we only implement instruction pairs that can be fused into RV64GCB instructions. Instruction fusions are detected in the decode stage by FusionDecoder. The decoder checks every two instructions and marks the first instruction fused if they can be fused into one instruction. The second instruction is removed by setting the valid field to false. Simple fusion cases include sh1add, sh2add, sh3add, sexth, zexth, etc. Currently, ftq in frontend needs every instruction to commit. However, the second instruction is removed from the pipeline and will not commit. To solve this issue, we temporarily add more bits to isFused to indicate the offset diff of the two fused instruction. There are four possibilities now. This feature may be removed later. This commit also adds more instruction fusion cases that need changes in both the decode stage and the funtion units. In this commit, we add some opcode to the function units and fuse the new instruction pairs into these new internal uops. The list of opcodes we add in this commit is shown below: - szewl1: `slli r1, r0, 32` + `srli r1, r0, 31` - szewl2: `slli r1, r0, 32` + `srli r1, r0, 30` - byte2: `srli r1, r0, 8` + `andi r1, r1, 255` - sh4add: `slli r1, r0, 4` + `add r1, r1, r2` - sr30add: `srli r1, r0, 30` + `add r1, r1, r2` - sr31add: `srli r1, r0, 31` + `add r1, r1, r2` - sr32add: `srli r1, r0, 32` + `add r1, r1, r2` - oddadd: `andi r1, r0, 1` + `add r1, r1, r2` - oddaddw: `andi r1, r0, 1` + `addw r1, r1, r2` - orh48: mask off the first 16 bits and or with another operand (`andi r1, r0, -256` + `or r1, r1, r2`) Furthermore, this commit adds some complex instruction fusion cases to the decode stage and function units. The complex instruction fusion cases are detected after the instructions are decoded into uop and their CtrlSignals are used for instruction fusion detection. We add the following complex instruction fusion cases: - addwbyte: addw and mask it with 0xff (extract the first byte) - addwbit: addw and mask it with 0x1 (extract the first bit) - logiclsb: logic operation and mask it with 0x1 (extract the first bit) - mulw7: andi 127 and mulw instructions. Input to mul is AND with 0x7f if mulw7 bit is set to true.
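The fusion pairs listed in revision 88825c5c can be illustrated with a small software model. The sketch below is not XiangShan code: it is a plain Python model, assuming 64-bit unsigned register values, that runs four of the listed two-instruction sequences (sh4add, sr30add, oddadd, byte2) exactly as written in the commit message and compares each with a single fused operation. The helper names (`slli`, `srli`, `andi`, `add`, `SEQUENCES`, `FUSED`, `fusion_matches`) are hypothetical and exist only for this illustration.

```python
MASK64 = (1 << 64) - 1  # assumption: RV64 registers hold 64-bit values


def slli(x, shamt):
    """Shift left logical, truncated to 64 bits."""
    return (x << shamt) & MASK64


def srli(x, shamt):
    """Shift right logical on a 64-bit value."""
    return (x & MASK64) >> shamt


def andi(x, imm):
    """Bitwise AND with an immediate (sign-extended to 64 bits via masking)."""
    return x & (imm & MASK64)


def add(a, b):
    """64-bit wrapping addition."""
    return (a + b) & MASK64


# Two-instruction sequences, following the pairs listed in the commit message.
# r1 is the temporary/destination register, r0 and r2 are the inputs.
SEQUENCES = {
    "sh4add": lambda r0, r2: add(slli(r0, 4), r2),    # slli r1,r0,4  + add r1,r1,r2
    "sr30add": lambda r0, r2: add(srli(r0, 30), r2),  # srli r1,r0,30 + add r1,r1,r2
    "oddadd": lambda r0, r2: add(andi(r0, 1), r2),    # andi r1,r0,1  + add r1,r1,r2
    "byte2": lambda r0, r2: andi(srli(r0, 8), 255),   # srli r1,r0,8  + andi r1,r1,255
}

# Hypothetical single-operation equivalents, i.e. what a fused uop would compute.
FUSED = {
    "sh4add": lambda r0, r2: ((r0 << 4) + r2) & MASK64,
    "sr30add": lambda r0, r2: ((r0 >> 30) + r2) & MASK64,
    "oddadd": lambda r0, r2: ((r0 & 1) + r2) & MASK64,
    "byte2": lambda r0, r2: (r0 >> 8) & 0xFF,
}


def fusion_matches(name, r0, r2=0):
    """Return True if the fused op and the original pair agree for these inputs."""
    return FUSED[name](r0, r2) == SEQUENCES[name](r0, r2)


if __name__ == "__main__":
    samples = [0, 1, 0x1234, 0xFFFF_FFFF, MASK64]
    for name in SEQUENCES:
        ok = all(fusion_matches(name, a, b) for a in samples for b in samples)
        print(name, "ok" if ok else "MISMATCH")
```

The model only covers the arithmetic result. How FusionDecoder detects the pairs and how isFused encodes the offset between the two instructions is described in the commit message but not modelled here.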