#
c61abc0c |
| 06-Aug-2023 |
Xuan Hu <[email protected]> |
merge master into new-backend
Todo: fix error
|
#
04665835 |
| 28-Jul-2023 |
Maxpicca-Li <[email protected]> |
DCacheWPU: update the latest version (#2095)
Co-authored-by: bugGenerator <[email protected]>
Co-authored-by: William Wang <[email protected]>
Co-authored-by: Haoyuan Feng <fenghaoyuan19@mails
DCacheWPU: update the latest version (#2095)
Co-authored-by: bugGenerator <[email protected]>
Co-authored-by: William Wang <[email protected]>
Co-authored-by: Haoyuan Feng <[email protected]>
show more ...
|
#
cdbff57c |
| 24-Jul-2023 |
Haoyuan Feng <[email protected]> |
Memblock: Add load/store 128 bits datapath (#2180)
* Memblock: Add load/store 128 bits datapath
---------
Co-authored-by: lulu0521 <[email protected]>
* Memblock: fix bug of raw addr ma
Memblock: Add load/store 128 bits datapath (#2180)
* Memblock: Add load/store 128 bits datapath
---------
Co-authored-by: lulu0521 <[email protected]>
* Memblock: fix bug of raw addr match
* Memblock, LoadUnit: Fix Vector RAW paddr match
---------
Co-authored-by: lulu0521 <[email protected]>
show more ...
|
#
14a67055 |
| 12-Jul-2023 |
sfencevma <[email protected]> |
ldu, stu: Refactoring the code for ldu/stu (#2171)
* add new ldu and stu
* add fast replay kill at s1
* fix pointer chasing cancel
* pick flushpipe_rvc
* merge flushpipe_rvc
* fix s3_
ldu, stu: Refactoring the code for ldu/stu (#2171)
* add new ldu and stu
* add fast replay kill at s1
* fix pointer chasing cancel
* pick flushpipe_rvc
* merge flushpipe_rvc
* fix s3_cache_rep and s3_feedbacked
* fix fast replay condition
---------
Co-authored-by: Lyn <[email protected]>
show more ...
|
#
d2b20d1a |
| 02-Jun-2023 |
Tang Haojin <[email protected]> |
top-down: align top-down with Gem5 (#2085)
* topdown: add defines of topdown counters enum
* redirect: add redirect type for perf
* top-down: add stallReason IOs
frontend -> ctrlBlock -> de
top-down: align top-down with Gem5 (#2085)
* topdown: add defines of topdown counters enum
* redirect: add redirect type for perf
* top-down: add stallReason IOs
frontend -> ctrlBlock -> decode -> rename -> dispatch
* top-down: add dummy connections
* top-down: update TopdownCounters
* top-down: imp backend analysis and counter dump
* top-down: add HartId in `addSource`
* top-down: broadcast lqIdx of ROB head
* top-down: frontend signal done
* top-down: add memblock topdown interface
* Bump HuanCun: add TopDownMonitor
* top-down: receive and handle reasons in dispatch
* top-down: remove previous top-down code
* TopDown: add MemReqSource enum
* TopDown: extend mshr_latency range
* TopDown: add basic Req Source
TODO: distinguish prefetch
* dcache: distinguish L1DataPrefetch and CPUData
* top-down: comment out debugging perf counters in ibuffer
* TopDown: add path to pass MemReqSource to HuanCun
* TopDown: use simpler logic to count reqSource and update Probe count
* frontend: update topdown counters
* Update HuanCun Topdown for MemReqSource
* top-down: fix load stalls
* top-down: Change the priority of different stall reasons
* top-down: breakdown OtherCoreStall
* sbuffer: fix eviction
* when valid count reaches StoreBufferSize, do eviction
* sbuffer: fix replaceIdx
* If the way selected by the replacement algorithm cannot be written into dcache, its result is not used.
* dcache, ldu: fix vaddr in missqueue
This commit prevents the high bits of the virtual address from being truncated
* fix-ldst_pri-230506
* mainpipe: fix loadsAreComing
* top-down: disable dedup
* top-down: remove old top-down config
* top-down: split lq addr from ls_debug
* top-down: purge previous top-down code
* top-down: add debug_vaddr in LoadQueueReplay
* add source rob_head_other_repay
* remove load_l1_cache_stall_with/wihtou_bank_conflict
* dcache: split CPUData & refill latency
* split CPUData to CPUStoreData & CPULoadData & CPUAtomicData
* monitor refill latency for all type of req
* dcache: fix perfcounter in mq
* io.req.bits.cancel should be applied when counting req.fire
* TopDown: add TopDown for CPL2 in XiangShan
* top-down: add hartid params to L2Cache
* top-down: fix dispatch queue bound
* top-down: no DqStall when robFull
* topdown: buspmu support latency statistic (#2106)
* perf: add buspmu between L2 and L3, support name argument
* bump difftest
* perf: busmonitor supports latency stat
* config: fix cpl2 compatible problem
* bump utility
* bump coupledL2
* bump huancun
* misc: adapt to utility key&field
* config: fix key&field source, remove deprecated argument
* buspmu: remove debug print
* bump coupledl2&huancun
* top-down: fix sq full condition
* top-down: classify "lq full" load bound
* top-down: bump submodules
* bump coupledL2: fix reqSource in data path
* bump coupledL2
---------
Co-authored-by: tastynoob <[email protected]>
Co-authored-by: Guokai Chen <[email protected]>
Co-authored-by: lixin <[email protected]>
Co-authored-by: XiChen <[email protected]>
Co-authored-by: Zhou Yaoyang <[email protected]>
Co-authored-by: Lyn <[email protected]>
Co-authored-by: wakafa <[email protected]>
show more ...
|
#
b9e121df |
| 02-Jun-2023 |
happy-lx <[email protected]> |
hint: add CustomHint interface (#2111)
* hint: add CustomHint interface
* dcache: fix replacement & mshrId update
* access replacement only once per load
* update mshrId in replayqueue only w
hint: add CustomHint interface (#2111)
* hint: add CustomHint interface
* dcache: fix replacement & mshrId update
* access replacement only once per load
* update mshrId in replayqueue only when this load enters mshr
* replay: block cache miss load
* block cache miss load until hint or dcache refill appears
* buffer: fix hint buffer depth to 1
* ldu: add dcache miss l2hint fast replay path
* bump coupledL2
* bump utility
---------
Co-authored-by: Lyn <[email protected]>
Co-authored-by: wangkaifan <[email protected]>
show more ...
|
#
68d13085 |
| 25-May-2023 |
Xuan Hu <[email protected]> |
Merge remote-tracking branch 'upstream/master' into tmp-new-backend-merge-vlsu
# Conflicts: # .gitmodules # build.sc # src/main/scala/top/Configs.scala # src/main/scala/xiangshan/Bundle.scala # src/
Merge remote-tracking branch 'upstream/master' into tmp-new-backend-merge-vlsu
# Conflicts: # .gitmodules # build.sc # src/main/scala/top/Configs.scala # src/main/scala/xiangshan/Bundle.scala # src/main/scala/xiangshan/Parameters.scala # src/main/scala/xiangshan/XSCore.scala # src/main/scala/xiangshan/backend/CtrlBlock.scala # src/main/scala/xiangshan/backend/MemBlock.scala # src/main/scala/xiangshan/backend/Scheduler.scala # src/main/scala/xiangshan/backend/issue/ReservationStation.scala # src/main/scala/xiangshan/backend/issue/StatusArray.scala # src/main/scala/xiangshan/backend/rob/Rob.scala # src/main/scala/xiangshan/mem/MemCommon.scala # src/main/scala/xiangshan/mem/lsqueue/LSQWrapper.scala # src/main/scala/xiangshan/mem/lsqueue/LoadQueue.scala # src/main/scala/xiangshan/mem/lsqueue/StoreQueue.scala # src/main/scala/xiangshan/mem/pipeline/LoadUnit.scala # src/main/scala/xiangshan/mem/pipeline/StoreUnit.scala
show more ...
|
#
e4f69d78 |
| 21-May-2023 |
sfencevma <[email protected]> |
lsu: split lq for larger ooo load window (#2077)
BREAKING CHANGE: new LSU/LQ architecture introduced in this PR
In this commit, we replace unified LQ with:
* virtual load queue
* load replay qu
lsu: split lq for larger ooo load window (#2077)
BREAKING CHANGE: new LSU/LQ architecture introduced in this PR
In this commit, we replace unified LQ with:
* virtual load queue
* load replay queue
* load rar queue
* load raw queue
* uncache buffer
It will provide larger ooo load window.
NOTE: IPC loss in this commit is caused by MDP problems, for previous MDP
does not fit new LSU architecture.
MDP update is not included in this commit, IPC loss will be fixed by MDP update later.
---------
Co-authored-by: Lyn <[email protected]>
show more ...
|
#
67fcf090 |
| 18-Apr-2023 |
Xuan Hu <[email protected]> |
Merge remote-tracking branch 'upstream/master' into new-backend
|
#
730cfbc0 |
| 16-Apr-2023 |
Xuan Hu <[email protected]> |
backend: merge v2backend into backend
|
#
141a6449 |
| 27-Mar-2023 |
Xuan Hu <[email protected]> |
backend: add load inst support
|
#
3b739f49 |
| 06-Mar-2023 |
Xuan Hu <[email protected]> |
v2backend: huge tmp commit
|
#
1350347a |
| 01-Feb-2023 |
William Wang <[email protected]> |
ldu: software prefetch issue will always succeed
|
#
7f111a00 |
| 30-Jan-2023 |
William Wang <[email protected]> |
chore: update prefetch interface
|
#
3af6aa6e |
| 22-Oct-2022 |
William Wang <[email protected]> |
dcache: add optional meta prefetch and access bit
Added meta_prefetch and meta_access related sim perf counter
For now, optional dcache meta prefetch and access can be removed safely
|
#
b52348ae |
| 13-Oct-2022 |
William Wang <[email protected]> |
dcache: add hardware prefetch interface
|
#
144422dc |
| 04-Jan-2023 |
Maxpicca-Li <[email protected]> |
dcache: setup way predictor framework (#1857)
This commit sets up a basic dcache way predictor framework and a dummy predictor.
A Way Predictor Unit (WPU) module has been added to dcache. Dcache da
dcache: setup way predictor framework (#1857)
This commit sets up a basic dcache way predictor framework and a dummy predictor.
A Way Predictor Unit (WPU) module has been added to dcache. Dcache data SRAMs
have been reorganized for that.
The dummy predictor is disabled by default.
Besides, dcache bank conflict check has been optimized. It may cause timing problems,
to be fixed in the future.
* ideal wpu
* BankedDataArray: change architecture to reduce bank_conflict
* BankedDataArray: add db analysis
* Merge: the rest
* BankedDataArray: change the logic of rrl_bank_conflict, but let the number of rw_bank_conflict up
* Load Logic: changed to be as expected
reading data will be delayed by one cycle to make selection
writing data will be also delayed by one cycle to do write operation
* fix: ecc check error
* update the gitignore
* WPU: add regular wpu and change the replay mechanism
* WPU: fix refill fail bug, but a new addiw fail bug appears
* WPU: temporarily turn off to PR
* WPU: tfix all bug
* loadqueue: fix the initialization of replayCarry
* bankeddataarray: fix the bug
* DCacheWrapper: fix bug
* ready-to-run: correct the version
* WayPredictor: comments clean
* BankedDataArray: fix ecc_bank bug
* Parameter: set the enable signal of wpu
show more ...
|
#
683c1411 |
| 28-Dec-2022 |
happy-lx <[email protected]> |
lq: Remove LQ data (#1862)
This PR remove data in lq.
All cache miss load instructions will be replayed by lq, and the forward path to the D channel
and mshr is added to the pipeline.
Special t
lq: Remove LQ data (#1862)
This PR remove data in lq.
All cache miss load instructions will be replayed by lq, and the forward path to the D channel
and mshr is added to the pipeline.
Special treatment is made for uncache load. The data is no longer stored in the datamodule
but stored in a separate register. ldout is only used as uncache writeback, and only ldout0
will be used. Adjust the priority so that the replayed instruction has the highest priority in S0.
Future work:
1. fix `milc` perf loss
2. remove data from MSHRs
* difftest: monitor cache miss latency
* lq, ldu, dcache: remove lq's data
* lq's data is no longer used
* replay cache miss load from lq (use counter to delay)
* if dcache's mshr gets refill data, wake up lq's missed load
* uncache load will writeback to ldu using ldout_0
* ldout_1 is no longer used
* lq, ldu: add forward port
* forward D and mshr in load S1, get result in S2
* remove useless code logic in loadQueueData
* misc: revert monitor
show more ...
|
#
3c02ee8f |
| 25-Dec-2022 |
wakafa <[email protected]> |
Separate Utility submodule from XiangShan (#1861)
* misc: add utility submodule
* misc: adjust to new utility framework
* bump utility: revert resetgen
* bump huancun
|
#
16c3b0b7 |
| 08-Dec-2022 |
sfencevma <[email protected]> |
ldu: add st-ld violation re-execute (#1849)
* lsu: add st-ld violation re-execute
* misc: update vio check comments in LQ
Co-authored-by: Lyn <[email protected]>
Co-authored-by: Will
ldu: add st-ld violation re-execute (#1849)
* lsu: add st-ld violation re-execute
* misc: update vio check comments in LQ
Co-authored-by: Lyn <[email protected]>
Co-authored-by: William Wang <[email protected]>
show more ...
|
#
37225120 |
| 07-Dec-2022 |
sfencevma <[email protected]> |
Uncache: optimize write operation (#1844)
This commit adds an uncache write buffer to accelerate uncache write
For uncacheable address range, now we use atomic bit in PMA to indicate
uncache wri
Uncache: optimize write operation (#1844)
This commit adds an uncache write buffer to accelerate uncache write
For uncacheable address range, now we use atomic bit in PMA to indicate
uncache write in this range should not use uncache write buffer.
Note that XiangShan does not support atomic insts in uncacheable address range.
* uncache: optimize write operation
* pma: add atomic config
* uncache: assign hartId
* remove some pma atomic
* extend peripheral id width
Co-authored-by: Lyn <[email protected]>
show more ...
|
#
a760aeb0 |
| 02-Dec-2022 |
happy-lx <[email protected]> |
Replay all load instructions from LQ (#1838)
This intermediate architecture replays all load instructions from LQ.
An independent load replay queue will be added later.
Performance loss caused b
Replay all load instructions from LQ (#1838)
This intermediate architecture replays all load instructions from LQ.
An independent load replay queue will be added later.
Performance loss caused by changing of load replay sequences will be
analyzed in the future.
* memblock: load queue based replay
* replay load from load queue rather than RS
* use counters to delay replay logic
* memblock: refactor priority
* lsq-replay has higher priority than try pointchasing
* RS: remove load store rs's feedback port
* ld-replay: a new path for fast replay
* when fast replay needed, wire it to loadqueue and it will be selected
this cycle and replay to load pipline s0 in next cycle
* memblock: refactor load S0
* move all the select logic from lsq to load S0
* split a tlbReplayDelayCycleCtrl out of loadqueue to speed up
generating emu
* loadqueue: parameterize replay
show more ...
|
#
a19ae480 |
| 22-Sep-2022 |
William Wang <[email protected]> |
dcache: optimize data sram read fanout (#1784)
|
#
cb9c18dc |
| 24-Aug-2022 |
William Wang <[email protected]> |
ldu: select data in load_s3 (#1743)
rdataVec (i.e. sram read result merge forward result) is still generated in load_s2. It will be write to load queue in load_s2
|
#
0a992150 |
| 06-Aug-2022 |
William Wang <[email protected]> |
std: add an extra pipe stage for std (#1704)
|