#
8b33cd30 |
| 13-Dec-2024 |
klin02 <[email protected]> |
feat(XSLog): move all XSLog outside WhenContext for collection
As data in WhenContext is not acessible in another module. To support XSLog collection, we move all XSLog and related signal outside Wh
feat(XSLog): move all XSLog outside WhenContext for collection
As data in WhenContext is not acessible in another module. To support XSLog collection, we move all XSLog and related signal outside WhenContext. For example, when(cond1){XSDebug(cond2, pable)} to XSDebug(cond1 && cond2, pable)
show more ...
|
#
44f2941b |
| 24-Sep-2024 |
Jiru Sun <[email protected]> |
refactor(HPM): move HPMs from utils to utility repo (#3631)
Because HPMs will be used in Coupled L2 as well, delete
`PerfCounterUtils.scala` in Xiangshan and create
`HardwarePerfMonitor.scala` in
refactor(HPM): move HPMs from utils to utility repo (#3631)
Because HPMs will be used in Coupled L2 as well, delete
`PerfCounterUtils.scala` in Xiangshan and create
`HardwarePerfMonitor.scala` in Utility.
See also [Pull Request in
CoupledL2](https://github.com/OpenXiangShan/CoupledL2/pull/251#discussion_r1770738535).
show more ...
|
#
bb2f3f51 |
| 12-Jul-2024 |
Tang Haojin <[email protected]> |
perf: use perfUtils in `Utility` (#3190)
Currently, log and perf utilities such as `XSPerfAccumulate` are
implemented in many repositories like XiangShan, CoupledL2 and HuanCun.
This PR unifies th
perf: use perfUtils in `Utility` (#3190)
Currently, log and perf utilities such as `XSPerfAccumulate` are
implemented in many repositories like XiangShan, CoupledL2 and HuanCun.
This PR unifies them and put them in Utility repository.
show more ...
|
#
8891a219 |
| 08-Oct-2023 |
Yinan Xu <[email protected]> |
Bump rocket-chip (#2353)
|
#
935edac4 |
| 21-Sep-2023 |
Tang Haojin <[email protected]> |
chore: remove deprecated brackets, APIs, etc. (#2321)
|
#
d2b20d1a |
| 02-Jun-2023 |
Tang Haojin <[email protected]> |
top-down: align top-down with Gem5 (#2085)
* topdown: add defines of topdown counters enum
* redirect: add redirect type for perf
* top-down: add stallReason IOs
frontend -> ctrlBlock -> de
top-down: align top-down with Gem5 (#2085)
* topdown: add defines of topdown counters enum
* redirect: add redirect type for perf
* top-down: add stallReason IOs
frontend -> ctrlBlock -> decode -> rename -> dispatch
* top-down: add dummy connections
* top-down: update TopdownCounters
* top-down: imp backend analysis and counter dump
* top-down: add HartId in `addSource`
* top-down: broadcast lqIdx of ROB head
* top-down: frontend signal done
* top-down: add memblock topdown interface
* Bump HuanCun: add TopDownMonitor
* top-down: receive and handle reasons in dispatch
* top-down: remove previous top-down code
* TopDown: add MemReqSource enum
* TopDown: extend mshr_latency range
* TopDown: add basic Req Source
TODO: distinguish prefetch
* dcache: distinguish L1DataPrefetch and CPUData
* top-down: comment out debugging perf counters in ibuffer
* TopDown: add path to pass MemReqSource to HuanCun
* TopDown: use simpler logic to count reqSource and update Probe count
* frontend: update topdown counters
* Update HuanCun Topdown for MemReqSource
* top-down: fix load stalls
* top-down: Change the priority of different stall reasons
* top-down: breakdown OtherCoreStall
* sbuffer: fix eviction
* when valid count reaches StoreBufferSize, do eviction
* sbuffer: fix replaceIdx
* If the way selected by the replacement algorithm cannot be written into dcache, its result is not used.
* dcache, ldu: fix vaddr in missqueue
This commit prevents the high bits of the virtual address from being truncated
* fix-ldst_pri-230506
* mainpipe: fix loadsAreComing
* top-down: disable dedup
* top-down: remove old top-down config
* top-down: split lq addr from ls_debug
* top-down: purge previous top-down code
* top-down: add debug_vaddr in LoadQueueReplay
* add source rob_head_other_repay
* remove load_l1_cache_stall_with/wihtou_bank_conflict
* dcache: split CPUData & refill latency
* split CPUData to CPUStoreData & CPULoadData & CPUAtomicData
* monitor refill latency for all type of req
* dcache: fix perfcounter in mq
* io.req.bits.cancel should be applied when counting req.fire
* TopDown: add TopDown for CPL2 in XiangShan
* top-down: add hartid params to L2Cache
* top-down: fix dispatch queue bound
* top-down: no DqStall when robFull
* topdown: buspmu support latency statistic (#2106)
* perf: add buspmu between L2 and L3, support name argument
* bump difftest
* perf: busmonitor supports latency stat
* config: fix cpl2 compatible problem
* bump utility
* bump coupledL2
* bump huancun
* misc: adapt to utility key&field
* config: fix key&field source, remove deprecated argument
* buspmu: remove debug print
* bump coupledl2&huancun
* top-down: fix sq full condition
* top-down: classify "lq full" load bound
* top-down: bump submodules
* bump coupledL2: fix reqSource in data path
* bump coupledL2
---------
Co-authored-by: tastynoob <[email protected]>
Co-authored-by: Guokai Chen <[email protected]>
Co-authored-by: lixin <[email protected]>
Co-authored-by: XiChen <[email protected]>
Co-authored-by: Zhou Yaoyang <[email protected]>
Co-authored-by: Lyn <[email protected]>
Co-authored-by: wakafa <[email protected]>
show more ...
|
#
b6d53cef |
| 05-Jul-2022 |
William Wang <[email protected]> |
mem,hpm: optimize memblock hpm timing
|
#
0f59c834 |
| 01-Jan-2022 |
William Wang <[email protected]> |
mem: split L1CacheErrorInfo and L1BusErrorUnitInfo, fix ecc error (#1409)
* mem: fix error csr update
* dcache: l2 error will now trigger atom error
* chore: fix cache error debug decoder
* mem:
mem: split L1CacheErrorInfo and L1BusErrorUnitInfo, fix ecc error (#1409)
* mem: fix error csr update
* dcache: l2 error will now trigger atom error
* chore: fix cache error debug decoder
* mem: split L1CacheErrorInfo and L1BusErrorUnitInfo
show more ...
|
#
6b6d88e6 |
| 22-Dec-2021 |
William Wang <[email protected]> |
mem: optimize missq reject to lq timing (#1375)
* mem: optimize missq reject to lq timing
DCache replay request is quite slow to generate, as it need to compare
load address with address in all
mem: optimize missq reject to lq timing (#1375)
* mem: optimize missq reject to lq timing
DCache replay request is quite slow to generate, as it need to compare
load address with address in all valid miss queue entries.
Now we delay the usage of replay request from data cache.
Now replay request will not influence normal execution flow until
load_s3 (1 cycle after load_s2, load result writeback to RS).
Note1: It is worth mentioning that "select refilling inst for load
writeback" will be disabled if dcacheRequireReplay in the
last cycle.
Note2: ld-ld violation or forward failure will let an normal load inst replay
from fetch. If TLB hit and ld-ld violation / forward failure happens,
we write back that inst immediately. Meanwhile, such insts will not be
replayed from rs.
* dcache: compare probe block addr instead of full addr
show more ...
|
#
8b538b51 |
| 10-Dec-2021 |
William Wang <[email protected]> |
dcache: fix lrsc_locked_block check (#1334)
|
#
1ca0e4f3 |
| 10-Dec-2021 |
Yinan Xu <[email protected]> |
core: refactor hardware performance counters (#1335)
This commit optimizes the coding style and timing for hardware
performance counters.
By default, performance counters are RegNext(RegNext(_)).
|
#
53e88463 |
| 08-Dec-2021 |
William Wang <[email protected]> |
Fix dcache probe (#1324)
* dcache: give probe the highest priority
* dcache: fix block probe logic
* dcache: give replace_req higher priority
|
#
f4d8d00e |
| 02-Dec-2021 |
William Wang <[email protected]> |
Optimize memblock timing (#1288)
* mem: delay uncache op start for 1 cycle
* dcache: decouple miss and replay signal
Now resp.miss will not depend on s2_nack_no_mshr
* lq,mem: give released
Optimize memblock timing (#1288)
* mem: delay uncache op start for 1 cycle
* dcache: decouple miss and replay signal
Now resp.miss will not depend on s2_nack_no_mshr
* lq,mem: give released flag update 1 more cycle
* chore: fix a name typo
* dcache: delay probe req for 1 cycle
show more ...
|
#
300ded30 |
| 04-Nov-2021 |
William Wang <[email protected]> |
Optimize dcache timing (#1195)
* dcache: do not check readline rmask
This should opt bank_conflict check timing
* dcache: block replace if store s1 valid
It takes quite long to generate way
Optimize dcache timing (#1195)
* dcache: do not check readline rmask
This should opt bank_conflict check timing
* dcache: block replace if store s1 valid
It takes quite long to generate way_en in mainpipe s1. As a result,
use s1 way_en to judge if replace should be blocked will cause severe
timing problem
Now we simply block replace if mainpipe.s1.valid
Refill timing to be optmized later
* sbuffer: delay sbuffer enqueue for 1 cycle
With store queue growing larger, read data from datamodule nearly
costs a whole cycle. Hence we delay sbuffer enqueue for 1 cycle
for better timing.
* dcache: reduce probe queue size
* dcache: replace probe pipe req RRArbiter with Arbiter
* dcache: reduce writeback queue size for timing opt
* dcache: delay wbqueue enqueue req for 1 cycle
Addr enqueue req will compare its addr with addrs in all writeback
entries to check if it should be blocked. Delay enqueue req will
give that process more time.
* dcache: set default replacer to setplru
It does not change current design
* dcache: fix wbqueue req_delayed deadlock
We delayed writeback queue enq for 1 cycle, missQ req does not
depend on wbQ enqueue. As a result, missQ req may be blocked
in req_delayed. When grant comes, that req should also be updated
* dcache: remove outdated require
* dcache: replace missReqArb RRArbiter with Arbiter
* perf: add detailed histogram for low dcache latency
* dcache: fix wbqueue entry alloc logic
* dcache: opt probe req timing
In current design, resv_set is maintained in dcache. All probe req
will be blocked if that addr is in resv_set.
However, checking if that addr is in resv_set costs almost half a cycle,
which causes severe timing problem.
Now when we update update_resv_set, all probe reqs will be blocked
in the next cycle. It should give Probe reservation set addr compare an
independent cycle, which will lead to better timing
show more ...
|
#
2f30d658 |
| 30-Oct-2021 |
Yinan Xu <[email protected]> |
top: change physical address width to 36 (#1188)
|
#
cd365d4c |
| 23-Oct-2021 |
rvcoresjw <[email protected]> |
add performance counters at core and hauncun (#1156)
* Add perf counters
* add reg from hpm counter source
* add print perfcounter enable
|
#
ad3ba452 |
| 20-Oct-2021 |
zhanglinjuan <[email protected]> |
New DCache (#1111)
* L1D: provide independent meta array for load pipe
* misc: reorg files in cache dir
* chore: reorg l1d related files
* bump difftest: use clang to compile verialted file
New DCache (#1111)
* L1D: provide independent meta array for load pipe
* misc: reorg files in cache dir
* chore: reorg l1d related files
* bump difftest: use clang to compile verialted files
* dcache: add BankedDataArray
* dcache: fix data read way_en
* dcache: fix banked data wmask
* dcache: replay conflict correctly
When conflict is detected:
* Report replay
* Disable fast wakeup
* dcache: fix bank addr match logic
* dcache: add bank conflict perf counter
* dcache: fix miss perf counters
* chore: make lsq data print perttier
* dcache: enable banked ecc array
* dcache: set dcache size to 128KB
* dcache: read mainpipe data from banked data array
* dcache: add independent mainpipe data read port
* dcache: revert size change
* Size will be changed after main pipe refactor
* Merge remote-tracking branch 'origin/master' into l1-size
* dcache: reduce banked data load conflict
* MainPipe: ReleaseData for all replacement even if it's clean
* dcache: set dcache size to 128KB
BREAKING CHANGE: l2 needed to provide right vaddr index to probe l1,
and it has to help l1 to avoid addr alias problem
* chore: fix merge conflict
* Change L2 to non-inclusive / Add alias bits in L1D
* debug: hard coded dup data array for debuging
* dcache: fix ptag width
* dcache: fix amo main pipe req
* dcache: when probe, use vaddr for main pipe req
* dcache: include vaddr in atomic unit req
* dcache: fix get_tag() function
* dcache: fix writeback paddr
* huancun: bump version
* dcache: erase block offset bits in release addr
* dcache: do not require probe vaddr != 0
* dcache: opt banked data read timing
* bump huancun
* dcache: fix atom unit pipe req vaddr
* dcache: simplify main pipe writeback_vaddr
* bump huancun
* dcache: remove debug data array
* Turn on all usr bits in L1
* Bump huancun
* Bump huancun
* enable L2 prefetcher
* bump huancun
* set non-inclusive L2/L3 + 128KB L1 as default config
* Use data in TLBundleB to hint ProbeAck beeds data
* mmu.l2tlb: mem_resp now fills multi mq pte buffer
mq entries can just deq without accessing l2tlb cache
* dcache: handle dirty userbit
* bump huancun
* chore: l1 cache code clean up
* Remove l1plus cache
* Remove HasBankedDataArrayParameters
* Add bus pmu between L3 and Mem
* bump huncun
* IFU: add performance counters and mmio af
* icache replacement policy moniter
* ifu miss situation moniter
* icache miss rate
* raise access fault when found mmio req
* Add framework for seperated main pipe and reg meta array
* Rewrite miss queue for seperated pipes
* Add RefillPipe
* chore: rename NewSbuffer.scala
* cache: add CacheInstruction opcode and reg list
* CSR: add cache control registers
* Add Replace Pipe
* CacheInstruction: add CSRs for cache instruction
* mem: remove store replay unit
* Perf counter to be added
* Timing opt to be done
* mem: update sbuffer to support new dcache
* sbuffer: fix missqueue time out logic
* Merge remote-tracking branch 'origin/master' into dcache-rm-sru
* chore: fix merge conflict, remove nStoreReplayEntries
* Temporarily disable TLMonitor
* Bump huancun (L2/L3 MSHR bug fix)
* Rewrite main pipe
* ReplacePipe: read meta to decide whether data should be read
* RefillPipe: add a store resp port
* MissQueue: new req should be rejected according to set+way
* Add replacement policy interface
* sbuffer: give missq replay the highest priority
Now we give missqReplayHasTimeOut the highest priority, as eviction
has already happened
Besides, it will fix the problem that fix dcache eviction generate logic
gives the wrong sbuffer id
* Finish DCache framework
* Split meta & tag and use regs to build meta array
* sbuffer: use new dcache io
* dcache: update dcache resp in memblock and fake d$
* Add atomics processing flow
* Refactor Top
* Bump huancun
* DCacheWrapper: disable ld fast wakeup only when bank conflict
* sbuffer: update dcache_resp difftest io
* MainPipe: fix combinational loop
* Sbuffer: fix bug in assert
* RefillPipe: fix bug of getting tag from addr
* dcache: ~0.U should restrict bit-width
* LoadPipe: fix bug in assert
* ReplacePipe: addr to be replaced should be block-aligned
* MainPipe: fix bug in required coh sending to miss queue
* DCacheWrapper: tag write in refill pipe should always be ready
* MainPipe: use replacement way_en when the req is from miss queue
* MissQueue: refill data should be passed on to main pipe
* MainPipe: do not use replacement way when tag match
* CSR: clean up cache op regs
* chore: remove outdated comments
* ReplacePipe: fix stupid bug
* dcache: replace checkOneHot with assert
* alu: fix bug of rev8 & orc.b instruction
* MissQueue: fix bug in the condition of mshr accepting a req
* MissQueue: add perf counters
* chore: delete out-dated code
* chore: add license
* WritebackQueue: distinguish id from miss queue
* AsynchronousMetaArray: fix bug
* Sbuffer: fix difftest io
* DCacheWrapper: duplicate one more tag copy for main pipe
* Add perf cnt to verify whether replacing is too early
* dcache: Release needs to wait for refill pipe
* WritebackQueue: fix accept condition
* MissQueue: remove unnecessary assert
* difftest: let refill check ingore illegal mem access
* Parameters: enlarge WritebackQueue to break dead-lock
* DCacheWrapper: store hit wirte should not be interrupted by refill
* Config: set nReleaseEntries to twice of nMissEntries
* DCacheWrapper: main pipe read should block refill pipe by set
Co-authored-by: William Wang <[email protected]>
Co-authored-by: LinJiawei <[email protected]>
Co-authored-by: TangDan <[email protected]>
Co-authored-by: LinJiawei <[email protected]>
Co-authored-by: ZhangZifei <[email protected]>
Co-authored-by: wangkaifan <[email protected]>
Co-authored-by: JinYue <[email protected]>
Co-authored-by: Zhangfw <[email protected]>
show more ...
|
#
9aca92b9 |
| 28-Sep-2021 |
Yinan Xu <[email protected]> |
misc: code clean up (#1073)
* rename Roq to Rob
* remove trailing whitespaces
* remove unused parameters
|
#
1f0e2dc7 |
| 27-Sep-2021 |
Jiawei Lin <[email protected]> |
128KB L1D + non-inclusive L2/L3 (#1051)
* L1D: provide independent meta array for load pipe
* misc: reorg files in cache dir
* chore: reorg l1d related files
* bump difftest: use clang to c
128KB L1D + non-inclusive L2/L3 (#1051)
* L1D: provide independent meta array for load pipe
* misc: reorg files in cache dir
* chore: reorg l1d related files
* bump difftest: use clang to compile verialted files
* dcache: add BankedDataArray
* dcache: fix data read way_en
* dcache: fix banked data wmask
* dcache: replay conflict correctly
When conflict is detected:
* Report replay
* Disable fast wakeup
* dcache: fix bank addr match logic
* dcache: add bank conflict perf counter
* dcache: fix miss perf counters
* chore: make lsq data print perttier
* dcache: enable banked ecc array
* dcache: set dcache size to 128KB
* dcache: read mainpipe data from banked data array
* dcache: add independent mainpipe data read port
* dcache: revert size change
* Size will be changed after main pipe refactor
* Merge remote-tracking branch 'origin/master' into l1-size
* dcache: reduce banked data load conflict
* MainPipe: ReleaseData for all replacement even if it's clean
* dcache: set dcache size to 128KB
BREAKING CHANGE: l2 needed to provide right vaddr index to probe l1,
and it has to help l1 to avoid addr alias problem
* chore: fix merge conflict
* Change L2 to non-inclusive / Add alias bits in L1D
* debug: hard coded dup data array for debuging
* dcache: fix ptag width
* dcache: fix amo main pipe req
* dcache: when probe, use vaddr for main pipe req
* dcache: include vaddr in atomic unit req
* dcache: fix get_tag() function
* dcache: fix writeback paddr
* huancun: bump version
* dcache: erase block offset bits in release addr
* dcache: do not require probe vaddr != 0
* dcache: opt banked data read timing
* bump huancun
* dcache: fix atom unit pipe req vaddr
* dcache: simplify main pipe writeback_vaddr
* bump huancun
* dcache: remove debug data array
* Turn on all usr bits in L1
* Bump huancun
* Bump huancun
* enable L2 prefetcher
* bump huancun
* set non-inclusive L2/L3 + 128KB L1 as default config
* Use data in TLBundleB to hint ProbeAck beeds data
* mmu.l2tlb: mem_resp now fills multi mq pte buffer
mq entries can just deq without accessing l2tlb cache
* dcache: handle dirty userbit
* bump huancun
* chore: l1 cache code clean up
* Remove l1plus cache
* Remove HasBankedDataArrayParameters
* Add bus pmu between L3 and Mem
* bump huncun
* dcache: fix l1 probe index generate logic
* Now right probe index will be used according to the len of alias bits
* dcache: clean up amo pipeline
* DCacheParameter rowBits will be removed in the future, now we set it to 128
to make dcache work
* dcache: fix amo word index
* bump huancun
Co-authored-by: William Wang <[email protected]>
Co-authored-by: zhanglinjuan <[email protected]>
Co-authored-by: TangDan <[email protected]>
Co-authored-by: ZhangZifei <[email protected]>
Co-authored-by: wangkaifan <[email protected]>
show more ...
|