Probe.scala - OpenGrok history log for /XiangShan/src/main/scala/xiangshan/cache/dcache/mainpipe/Probe.scala

Revision	Date	Author	Comments
# 8b33cd30	13-Dec-2024	klin02 <[email protected]>	feat(XSLog): move all XSLog outside WhenContext for collection As data in WhenContext is not acessible in another module. To support XSLog collection, we move all XSLog and related signal outside Wh feat(XSLog): move all XSLog outside WhenContext for collection As data in WhenContext is not acessible in another module. To support XSLog collection, we move all XSLog and related signal outside WhenContext. For example, when(cond1){XSDebug(cond2, pable)} to XSDebug(cond1 && cond2, pable) show more ...
# 44f2941b	24-Sep-2024	Jiru Sun <[email protected]>	refactor(HPM): move HPMs from utils to utility repo (#3631) Because HPMs will be used in Coupled L2 as well, delete `PerfCounterUtils.scala` in Xiangshan and create `HardwarePerfMonitor.scala` in refactor(HPM): move HPMs from utils to utility repo (#3631) Because HPMs will be used in Coupled L2 as well, delete `PerfCounterUtils.scala` in Xiangshan and create `HardwarePerfMonitor.scala` in Utility. See also [Pull Request in CoupledL2](https://github.com/OpenXiangShan/CoupledL2/pull/251#discussion_r1770738535). show more ...
# bb2f3f51	12-Jul-2024	Tang Haojin <[email protected]>	perf: use perfUtils in `Utility` (#3190) Currently, log and perf utilities such as `XSPerfAccumulate` are implemented in many repositories like XiangShan, CoupledL2 and HuanCun. This PR unifies th perf: use perfUtils in `Utility` (#3190) Currently, log and perf utilities such as `XSPerfAccumulate` are implemented in many repositories like XiangShan, CoupledL2 and HuanCun. This PR unifies them and put them in Utility repository. show more ...
# 8891a219	08-Oct-2023	Yinan Xu <[email protected]>	Bump rocket-chip (#2353)
# 935edac4	21-Sep-2023	Tang Haojin <[email protected]>	chore: remove deprecated brackets, APIs, etc. (#2321)
# d2b20d1a	02-Jun-2023	Tang Haojin <[email protected]>	top-down: align top-down with Gem5 (#2085) * topdown: add defines of topdown counters enum * redirect: add redirect type for perf * top-down: add stallReason IOs frontend -> ctrlBlock -> de top-down: align top-down with Gem5 (#2085) * topdown: add defines of topdown counters enum * redirect: add redirect type for perf * top-down: add stallReason IOs frontend -> ctrlBlock -> decode -> rename -> dispatch * top-down: add dummy connections * top-down: update TopdownCounters * top-down: imp backend analysis and counter dump * top-down: add HartId in `addSource` * top-down: broadcast lqIdx of ROB head * top-down: frontend signal done * top-down: add memblock topdown interface * Bump HuanCun: add TopDownMonitor * top-down: receive and handle reasons in dispatch * top-down: remove previous top-down code * TopDown: add MemReqSource enum * TopDown: extend mshr_latency range * TopDown: add basic Req Source TODO: distinguish prefetch * dcache: distinguish L1DataPrefetch and CPUData * top-down: comment out debugging perf counters in ibuffer * TopDown: add path to pass MemReqSource to HuanCun * TopDown: use simpler logic to count reqSource and update Probe count * frontend: update topdown counters * Update HuanCun Topdown for MemReqSource * top-down: fix load stalls * top-down: Change the priority of different stall reasons * top-down: breakdown OtherCoreStall * sbuffer: fix eviction * when valid count reaches StoreBufferSize, do eviction * sbuffer: fix replaceIdx * If the way selected by the replacement algorithm cannot be written into dcache, its result is not used. * dcache, ldu: fix vaddr in missqueue This commit prevents the high bits of the virtual address from being truncated * fix-ldst_pri-230506 * mainpipe: fix loadsAreComing * top-down: disable dedup * top-down: remove old top-down config * top-down: split lq addr from ls_debug * top-down: purge previous top-down code * top-down: add debug_vaddr in LoadQueueReplay * add source rob_head_other_repay * remove load_l1_cache_stall_with/wihtou_bank_conflict * dcache: split CPUData & refill latency * split CPUData to CPUStoreData & CPULoadData & CPUAtomicData * monitor refill latency for all type of req * dcache: fix perfcounter in mq * io.req.bits.cancel should be applied when counting req.fire * TopDown: add TopDown for CPL2 in XiangShan * top-down: add hartid params to L2Cache * top-down: fix dispatch queue bound * top-down: no DqStall when robFull * topdown: buspmu support latency statistic (#2106) * perf: add buspmu between L2 and L3, support name argument * bump difftest * perf: busmonitor supports latency stat * config: fix cpl2 compatible problem * bump utility * bump coupledL2 * bump huancun * misc: adapt to utility key&field * config: fix key&field source, remove deprecated argument * buspmu: remove debug print * bump coupledl2&huancun * top-down: fix sq full condition * top-down: classify "lq full" load bound * top-down: bump submodules * bump coupledL2: fix reqSource in data path * bump coupledL2 --------- Co-authored-by: tastynoob <[email protected]> Co-authored-by: Guokai Chen <[email protected]> Co-authored-by: lixin <[email protected]> Co-authored-by: XiChen <[email protected]> Co-authored-by: Zhou Yaoyang <[email protected]> Co-authored-by: Lyn <[email protected]> Co-authored-by: wakafa <[email protected]> show more ...
# b6d53cef	05-Jul-2022	William Wang <[email protected]>	mem,hpm: optimize memblock hpm timing
# 0f59c834	01-Jan-2022	William Wang <[email protected]>	mem: split L1CacheErrorInfo and L1BusErrorUnitInfo, fix ecc error (#1409) * mem: fix error csr update * dcache: l2 error will now trigger atom error * chore: fix cache error debug decoder * mem: mem: split L1CacheErrorInfo and L1BusErrorUnitInfo, fix ecc error (#1409) * mem: fix error csr update * dcache: l2 error will now trigger atom error * chore: fix cache error debug decoder * mem: split L1CacheErrorInfo and L1BusErrorUnitInfo show more ...
# 6b6d88e6	22-Dec-2021	William Wang <[email protected]>	mem: optimize missq reject to lq timing (#1375) * mem: optimize missq reject to lq timing DCache replay request is quite slow to generate, as it need to compare load address with address in all mem: optimize missq reject to lq timing (#1375) * mem: optimize missq reject to lq timing DCache replay request is quite slow to generate, as it need to compare load address with address in all valid miss queue entries. Now we delay the usage of replay request from data cache. Now replay request will not influence normal execution flow until load_s3 (1 cycle after load_s2, load result writeback to RS). Note1: It is worth mentioning that "select refilling inst for load writeback" will be disabled if dcacheRequireReplay in the last cycle. Note2: ld-ld violation or forward failure will let an normal load inst replay from fetch. If TLB hit and ld-ld violation / forward failure happens, we write back that inst immediately. Meanwhile, such insts will not be replayed from rs. * dcache: compare probe block addr instead of full addr show more ...
# 8b538b51	10-Dec-2021	William Wang <[email protected]>	dcache: fix lrsc_locked_block check (#1334)
# 1ca0e4f3	10-Dec-2021	Yinan Xu <[email protected]>	core: refactor hardware performance counters (#1335) This commit optimizes the coding style and timing for hardware performance counters. By default, performance counters are RegNext(RegNext(_)).
# 53e88463	08-Dec-2021	William Wang <[email protected]>	Fix dcache probe (#1324) * dcache: give probe the highest priority * dcache: fix block probe logic * dcache: give replace_req higher priority
# f4d8d00e	02-Dec-2021	William Wang <[email protected]>	Optimize memblock timing (#1288) * mem: delay uncache op start for 1 cycle * dcache: decouple miss and replay signal Now resp.miss will not depend on s2_nack_no_mshr * lq,mem: give released Optimize memblock timing (#1288) * mem: delay uncache op start for 1 cycle * dcache: decouple miss and replay signal Now resp.miss will not depend on s2_nack_no_mshr * lq,mem: give released flag update 1 more cycle * chore: fix a name typo * dcache: delay probe req for 1 cycle show more ...
# 300ded30	04-Nov-2021	William Wang <[email protected]>	Optimize dcache timing (#1195) * dcache: do not check readline rmask This should opt bank_conflict check timing * dcache: block replace if store s1 valid It takes quite long to generate way Optimize dcache timing (#1195) * dcache: do not check readline rmask This should opt bank_conflict check timing * dcache: block replace if store s1 valid It takes quite long to generate way_en in mainpipe s1. As a result, use s1 way_en to judge if replace should be blocked will cause severe timing problem Now we simply block replace if mainpipe.s1.valid Refill timing to be optmized later * sbuffer: delay sbuffer enqueue for 1 cycle With store queue growing larger, read data from datamodule nearly costs a whole cycle. Hence we delay sbuffer enqueue for 1 cycle for better timing. * dcache: reduce probe queue size * dcache: replace probe pipe req RRArbiter with Arbiter * dcache: reduce writeback queue size for timing opt * dcache: delay wbqueue enqueue req for 1 cycle Addr enqueue req will compare its addr with addrs in all writeback entries to check if it should be blocked. Delay enqueue req will give that process more time. * dcache: set default replacer to setplru It does not change current design * dcache: fix wbqueue req_delayed deadlock We delayed writeback queue enq for 1 cycle, missQ req does not depend on wbQ enqueue. As a result, missQ req may be blocked in req_delayed. When grant comes, that req should also be updated * dcache: remove outdated require * dcache: replace missReqArb RRArbiter with Arbiter * perf: add detailed histogram for low dcache latency * dcache: fix wbqueue entry alloc logic * dcache: opt probe req timing In current design, resv_set is maintained in dcache. All probe req will be blocked if that addr is in resv_set. However, checking if that addr is in resv_set costs almost half a cycle, which causes severe timing problem. Now when we update update_resv_set, all probe reqs will be blocked in the next cycle. It should give Probe reservation set addr compare an independent cycle, which will lead to better timing show more ...
# 2f30d658	30-Oct-2021	Yinan Xu <[email protected]>	top: change physical address width to 36 (#1188)
# cd365d4c	23-Oct-2021	rvcoresjw <[email protected]>	add performance counters at core and hauncun (#1156) * Add perf counters * add reg from hpm counter source * add print perfcounter enable
# ad3ba452	20-Oct-2021	zhanglinjuan <[email protected]>	New DCache (#1111) * L1D: provide independent meta array for load pipe * misc: reorg files in cache dir * chore: reorg l1d related files * bump difftest: use clang to compile verialted file New DCache (#1111) * L1D: provide independent meta array for load pipe * misc: reorg files in cache dir * chore: reorg l1d related files * bump difftest: use clang to compile verialted files * dcache: add BankedDataArray * dcache: fix data read way_en * dcache: fix banked data wmask * dcache: replay conflict correctly When conflict is detected: * Report replay * Disable fast wakeup * dcache: fix bank addr match logic * dcache: add bank conflict perf counter * dcache: fix miss perf counters * chore: make lsq data print perttier * dcache: enable banked ecc array * dcache: set dcache size to 128KB * dcache: read mainpipe data from banked data array * dcache: add independent mainpipe data read port * dcache: revert size change * Size will be changed after main pipe refactor * Merge remote-tracking branch 'origin/master' into l1-size * dcache: reduce banked data load conflict * MainPipe: ReleaseData for all replacement even if it's clean * dcache: set dcache size to 128KB BREAKING CHANGE: l2 needed to provide right vaddr index to probe l1, and it has to help l1 to avoid addr alias problem * chore: fix merge conflict * Change L2 to non-inclusive / Add alias bits in L1D * debug: hard coded dup data array for debuging * dcache: fix ptag width * dcache: fix amo main pipe req * dcache: when probe, use vaddr for main pipe req * dcache: include vaddr in atomic unit req * dcache: fix get_tag() function * dcache: fix writeback paddr * huancun: bump version * dcache: erase block offset bits in release addr * dcache: do not require probe vaddr != 0 * dcache: opt banked data read timing * bump huancun * dcache: fix atom unit pipe req vaddr * dcache: simplify main pipe writeback_vaddr * bump huancun * dcache: remove debug data array * Turn on all usr bits in L1 * Bump huancun * Bump huancun * enable L2 prefetcher * bump huancun * set non-inclusive L2/L3 + 128KB L1 as default config * Use data in TLBundleB to hint ProbeAck beeds data * mmu.l2tlb: mem_resp now fills multi mq pte buffer mq entries can just deq without accessing l2tlb cache * dcache: handle dirty userbit * bump huancun * chore: l1 cache code clean up * Remove l1plus cache * Remove HasBankedDataArrayParameters * Add bus pmu between L3 and Mem * bump huncun * IFU: add performance counters and mmio af * icache replacement policy moniter * ifu miss situation moniter * icache miss rate * raise access fault when found mmio req * Add framework for seperated main pipe and reg meta array * Rewrite miss queue for seperated pipes * Add RefillPipe * chore: rename NewSbuffer.scala * cache: add CacheInstruction opcode and reg list * CSR: add cache control registers * Add Replace Pipe * CacheInstruction: add CSRs for cache instruction * mem: remove store replay unit * Perf counter to be added * Timing opt to be done * mem: update sbuffer to support new dcache * sbuffer: fix missqueue time out logic * Merge remote-tracking branch 'origin/master' into dcache-rm-sru * chore: fix merge conflict, remove nStoreReplayEntries * Temporarily disable TLMonitor * Bump huancun (L2/L3 MSHR bug fix) * Rewrite main pipe * ReplacePipe: read meta to decide whether data should be read * RefillPipe: add a store resp port * MissQueue: new req should be rejected according to set+way * Add replacement policy interface * sbuffer: give missq replay the highest priority Now we give missqReplayHasTimeOut the highest priority, as eviction has already happened Besides, it will fix the problem that fix dcache eviction generate logic gives the wrong sbuffer id * Finish DCache framework * Split meta & tag and use regs to build meta array * sbuffer: use new dcache io * dcache: update dcache resp in memblock and fake d$ * Add atomics processing flow * Refactor Top * Bump huancun * DCacheWrapper: disable ld fast wakeup only when bank conflict * sbuffer: update dcache_resp difftest io * MainPipe: fix combinational loop * Sbuffer: fix bug in assert * RefillPipe: fix bug of getting tag from addr * dcache: ~0.U should restrict bit-width * LoadPipe: fix bug in assert * ReplacePipe: addr to be replaced should be block-aligned * MainPipe: fix bug in required coh sending to miss queue * DCacheWrapper: tag write in refill pipe should always be ready * MainPipe: use replacement way_en when the req is from miss queue * MissQueue: refill data should be passed on to main pipe * MainPipe: do not use replacement way when tag match * CSR: clean up cache op regs * chore: remove outdated comments * ReplacePipe: fix stupid bug * dcache: replace checkOneHot with assert * alu: fix bug of rev8 & orc.b instruction * MissQueue: fix bug in the condition of mshr accepting a req * MissQueue: add perf counters * chore: delete out-dated code * chore: add license * WritebackQueue: distinguish id from miss queue * AsynchronousMetaArray: fix bug * Sbuffer: fix difftest io * DCacheWrapper: duplicate one more tag copy for main pipe * Add perf cnt to verify whether replacing is too early * dcache: Release needs to wait for refill pipe * WritebackQueue: fix accept condition * MissQueue: remove unnecessary assert * difftest: let refill check ingore illegal mem access * Parameters: enlarge WritebackQueue to break dead-lock * DCacheWrapper: store hit wirte should not be interrupted by refill * Config: set nReleaseEntries to twice of nMissEntries * DCacheWrapper: main pipe read should block refill pipe by set Co-authored-by: William Wang <[email protected]> Co-authored-by: LinJiawei <[email protected]> Co-authored-by: TangDan <[email protected]> Co-authored-by: LinJiawei <[email protected]> Co-authored-by: ZhangZifei <[email protected]> Co-authored-by: wangkaifan <[email protected]> Co-authored-by: JinYue <[email protected]> Co-authored-by: Zhangfw <[email protected]> show more ...
# 9aca92b9	28-Sep-2021	Yinan Xu <[email protected]>	misc: code clean up (#1073) * rename Roq to Rob * remove trailing whitespaces * remove unused parameters
# 1f0e2dc7	27-Sep-2021	Jiawei Lin <[email protected]>	128KB L1D + non-inclusive L2/L3 (#1051) * L1D: provide independent meta array for load pipe * misc: reorg files in cache dir * chore: reorg l1d related files * bump difftest: use clang to c 128KB L1D + non-inclusive L2/L3 (#1051) * L1D: provide independent meta array for load pipe * misc: reorg files in cache dir * chore: reorg l1d related files * bump difftest: use clang to compile verialted files * dcache: add BankedDataArray * dcache: fix data read way_en * dcache: fix banked data wmask * dcache: replay conflict correctly When conflict is detected: * Report replay * Disable fast wakeup * dcache: fix bank addr match logic * dcache: add bank conflict perf counter * dcache: fix miss perf counters * chore: make lsq data print perttier * dcache: enable banked ecc array * dcache: set dcache size to 128KB * dcache: read mainpipe data from banked data array * dcache: add independent mainpipe data read port * dcache: revert size change * Size will be changed after main pipe refactor * Merge remote-tracking branch 'origin/master' into l1-size * dcache: reduce banked data load conflict * MainPipe: ReleaseData for all replacement even if it's clean * dcache: set dcache size to 128KB BREAKING CHANGE: l2 needed to provide right vaddr index to probe l1, and it has to help l1 to avoid addr alias problem * chore: fix merge conflict * Change L2 to non-inclusive / Add alias bits in L1D * debug: hard coded dup data array for debuging * dcache: fix ptag width * dcache: fix amo main pipe req * dcache: when probe, use vaddr for main pipe req * dcache: include vaddr in atomic unit req * dcache: fix get_tag() function * dcache: fix writeback paddr * huancun: bump version * dcache: erase block offset bits in release addr * dcache: do not require probe vaddr != 0 * dcache: opt banked data read timing * bump huancun * dcache: fix atom unit pipe req vaddr * dcache: simplify main pipe writeback_vaddr * bump huancun * dcache: remove debug data array * Turn on all usr bits in L1 * Bump huancun * Bump huancun * enable L2 prefetcher * bump huancun * set non-inclusive L2/L3 + 128KB L1 as default config * Use data in TLBundleB to hint ProbeAck beeds data * mmu.l2tlb: mem_resp now fills multi mq pte buffer mq entries can just deq without accessing l2tlb cache * dcache: handle dirty userbit * bump huancun * chore: l1 cache code clean up * Remove l1plus cache * Remove HasBankedDataArrayParameters * Add bus pmu between L3 and Mem * bump huncun * dcache: fix l1 probe index generate logic * Now right probe index will be used according to the len of alias bits * dcache: clean up amo pipeline * DCacheParameter rowBits will be removed in the future, now we set it to 128 to make dcache work * dcache: fix amo word index * bump huancun Co-authored-by: William Wang <[email protected]> Co-authored-by: zhanglinjuan <[email protected]> Co-authored-by: TangDan <[email protected]> Co-authored-by: ZhangZifei <[email protected]> Co-authored-by: wangkaifan <[email protected]> show more ...