#
3c808de0 |
| 17-Feb-2025 |
Anzo <[email protected]> |
fix(LSU): fix cbo instr exceptions and implementation (#4262)
1. typo.
2. `cbo` instr does not produce misaligned exception.
3. `cbo zero` instr needs to flush `sbuffer`.
4. `cbo zero` sets mask correctly.
5. Adding RAW checks to `cbo zero`.
6. Adding trigger(Debug Mode) checks to `cbo zero`.
7. Fixed several issues with the CBO instruction in NEMU.
----
To avoid ambiguity with `io.mmioStout`, a new `StoreQueue` port is
introduced to write back `cbo zero` after the sbuffer is flushed.
Arbitration is performed in `MemBlock`; currently, `cbo zero` has
higher priority by default.
`cbo zero` should not be written back at the same time as `mmio`.
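The arbitration described above can be pictured as a fixed-priority select between two writeback sources. The following is a minimal behavioural sketch in Python, not the Chisel code from the PR; the function name and the string labels are hypothetical and chosen only for illustration.

```python
from typing import Optional

def select_writeback(cbo_zero_valid: bool, mmio_valid: bool) -> Optional[str]:
    """Pick which source writes back this cycle (illustrative model only).

    `cbo zero` wins when both are valid, matching the default priority
    stated in the commit message. The message also says the two should
    not be written back at the same time, so the both-valid case is a
    fallback rather than an expected situation.
    """
    if cbo_zero_valid:
        return "cbo_zero"
    if mmio_valid:
        return "mmio"
    return None

# Example: both sources request writeback in the same cycle.
winner = select_writeback(cbo_zero_valid=True, mmio_valid=True)
print(winner)
```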
---
A check on `CacheLine` has been added to `RAWQueue` to ensure memory
consistency when executing `cbo zero`.
See https://github.com/OpenXiangShan/XiangShan/issues/4240 for details.
---
The `cbo` instruction requires a trigger check.
---------
Co-authored-by: zhanglinjuan <[email protected]>
|
#
549073a0 |
| 10-Dec-2024 |
cz4e <[email protected]> |
area(Lsq): compress rar/raw paddr and remove sq useless regs (#3976)
* LoadQueueRAR PAddr hash function, total 16bits:
* LoadQueueRAW uses PAddr[29:6], 24 bits in total
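To make the PAddr[29:6] slice concrete, here is a small Python sketch of the bit extraction. The helper name and the sample addresses are hypothetical and do not come from the XiangShan source. Dropping bits [5:0] removes a 64-byte offset, and addresses that differ only outside bits 29:6 produce the same 24-bit value.

```python
# Hypothetical helper; only the bit range [29:6] comes from the commit message.
RAW_LO, RAW_HI = 6, 29

def raw_paddr_tag(paddr: int) -> int:
    """Return PAddr[29:6] as a 24-bit integer."""
    width = RAW_HI - RAW_LO + 1          # 24 bits
    return (paddr >> RAW_LO) & ((1 << width) - 1)

a = 0x8000_1040
b = 0x8000_107F   # same 64-byte block as a
c = 0xC000_1040   # differs from a only in bit 30, outside the slice
d = 0x8000_1080   # next 64-byte block

for name, addr in (("a", a), ("b", b), ("c", c), ("d", d)):
    print(name, hex(raw_paddr_tag(addr)))
```

Here `a`, `b` and `c` all map to the same compressed value, while `d` maps to a different one.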
|
#
5003e6f8 |
| 23-Jul-2024 |
Huijin Li <[email protected]> |
LSQ: optimize static clock gating coverage and fix x_value in vcs (#3176)
optimize LSQ static clock gating coverage, fix x_value in vcs
|
#
a7828dc1 |
| 12-Jun-2024 |
Tang Haojin <[email protected]> |
Revert "LSQ: optimize static clock gating coverage (#3023)" (#3055)
|
#
082b30d1 |
| 31-May-2024 |
Huijin Li <[email protected]> |
LSQ: optimize static clock gating coverage (#3023)
|
#
692e2faf |
| 07-Apr-2024 |
Huijin Li <[email protected]> |
MemBlock: optimize area for DCache refill logic (#2844)
* AtomicsUnit: delete signals 'trigger.backendHit' vector
* MemBlock & DCacheWrapper & FakeDCache & LSQWrapper & LoadQueue & LoadQueueReplay & LoadUnit : delete refill_to_ldq (unused signals)
* LoadQueueData: add restriction that LoadQueueReplaySize must be divisible by numWBank
|
#
8891a219 |
| 08-Oct-2023 |
Yinan Xu <[email protected]> |
Bump rocket-chip (#2353)
|
#
cdbff57c |
| 24-Jul-2023 |
Haoyuan Feng <[email protected]> |
Memblock: Add load/store 128 bits datapath (#2180)
* Memblock: Add load/store 128 bits datapath
---------
Co-authored-by: lulu0521 <[email protected]>
* Memblock: fix bug of raw addr match
* Memblock, LoadUnit: Fix Vector RAW paddr match
---------
Co-authored-by: lulu0521 <[email protected]>
|
#
8a610956 |
| 13-Jun-2023 |
sfencevma <[email protected]> |
LQ: Optimizing LoadQueueReplay replay timing (#2127)
* Replay cycles increased from 2 to 3 cycles
* Simplified replay selection logic
|
#
e4f69d78 |
| 21-May-2023 |
sfencevma <[email protected]> |
lsu: split lq for larger ooo load window (#2077)
BREAKING CHANGE: new LSU/LQ architecture introduced in this PR
In this commit, we replace unified LQ with:
* virtual load queue
* load replay queue
* load rar queue
* load raw queue
* uncache buffer
This provides a larger ooo load window.
NOTE: the IPC loss in this commit is caused by MDP problems, because the previous MDP
does not fit the new LSU architecture.
The MDP update is not included in this commit; the IPC loss will be fixed by a later MDP update.
---------
Co-authored-by: Lyn <[email protected]>
|
#
683c1411 |
| 28-Dec-2022 |
happy-lx <[email protected]> |
lq: Remove LQ data (#1862)
This PR removes data from the lq.
All cache miss load instructions will be replayed by lq, and the forward path to the D channel
and mshr is added to the pipeline.
Special treatment is made for uncache load. The data is no longer stored in the datamodule
but stored in a separate register. ldout is only used as uncache writeback, and only ldout0
will be used. Adjust the priority so that the replayed instruction has the highest priority in S0.
Future work:
1. fix `milc` perf loss
2. remove data from MSHRs
* difftest: monitor cache miss latency
* lq, ldu, dcache: remove lq's data
* lq's data is no longer used
* replay cache miss load from lq (use counter to delay)
* if dcache's mshr gets refill data, wake up lq's missed load
* uncache load will writeback to ldu using ldout_0
* ldout_1 is no longer used
* lq, ldu: add forward port
* forward D and mshr in load S1, get result in S2
* remove useless code logic in loadQueueData
* misc: revert monitor
|
#
3c02ee8f |
| 25-Dec-2022 |
wakafa <[email protected]> |
Separate Utility submodule from XiangShan (#1861)
* misc: add utility submodule
* misc: adjust to new utility framework
* bump utility: revert resetgen
* bump huancun
|
#
0a47e4a1 |
| 09-Aug-2022 |
William Wang <[email protected]> |
lq: update paddr in lq in load_s1 and load_s2 (#1707)
Now we use 2 cycles to update paddr in lq. In this way, paddr in lq is still valid in load_s3
|
#
39f2ec76 |
| 09-Aug-2022 |
William Wang <[email protected]> |
lq: add 1 extra stage for lq data write (#1705)
Lq data is now divided into 8 banks by default. A write to lq
data takes 2 cycles to finish.
Lq data is not read for at least 2 cycles after a write, so it is safe
to add this delay. For example:
T0: update lq meta, lq data write req start
T1: lq data write finish, new wbidx selected
T2: read lq data according to new wbidx selected
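The T0/T1/T2 sequence above can be mimicked with a toy cycle model in which a write issued in one cycle only becomes readable two cycles later. This is a minimal Python sketch under simplifying assumptions: the class and method names are hypothetical, banking is ignored, and it is not the Chisel implementation from the PR.

```python
class DelayedWriteMem:
    """Toy model: a write issued in cycle T becomes readable in cycle T+2."""

    def __init__(self, size: int) -> None:
        self.data = [0] * size
        self.stage0 = None   # write request accepted in the current cycle
        self.stage1 = None   # write in its second (finishing) cycle

    def write(self, idx: int, value: int) -> None:
        self.stage0 = (idx, value)

    def read(self, idx: int) -> int:
        return self.data[idx]

    def tick(self) -> None:
        """Advance one clock cycle."""
        if self.stage1 is not None:
            idx, value = self.stage1
            self.data[idx] = value       # write completes, data now visible
        self.stage1, self.stage0 = self.stage0, None

mem = DelayedWriteMem(8)
mem.write(3, 0xAB)        # T0: write request starts
t0 = mem.read(3)          # old value
mem.tick()                # T1: write is finishing
t1 = mem.read(3)          # still the old value
mem.tick()                # T2: data can be read
t2 = mem.read(3)
print(t0, t1, t2)
```

A read in T0 or T1 still sees the old contents; only the read in T2 observes the new value, which is why a reader that never looks at an entry within 2 cycles of its write is unaffected by the extra stage.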
|
#
ef3b5b96 |
| 13-Feb-2022 |
William Wang <[email protected]> |
mem: fix ldld vio check implementation (#1456)
* mem: fix ldld vio mask gen logic
* mem: fix lq released flag update logic
Make sure that every load before a probe has correct released flag
See the PR of this commit for illustration
* mem: fix ld-ld violation check logic
* ci: clean up workspace before do real test
* mem: reduce lq released flag update delay for 1 cycle
* chore: bump difftest to run no-smp diff
* ci: add mc test
* mem: fix lq released flag update logic
* chore: set difftest firstCommit_limit to 10000
* ci: use dual-nemu-so for mc test
|
#
96b1e495 |
| 15-Nov-2021 |
William Wang <[email protected]> |
Optimize memblock timing (#1218)
DCache timing problem has not been solved yet. DCache structure will be further changed.
* sbuffer: add extra perf counters
* sbuffer: optimize timeout replay check timing
* sbuffer: optimize do_uarch_drain check timing
Now we only compare merge entry's vtag, check will not start until
mergeIdx is generated by PriorityEncoder
* mem, lq: optimize writeback select logic timing
* dcache: replace missqueue refill req arbiter
* dcache: refactor missqueue entry select logic
* mem: add comments for lsq data
* dcache: give amo alu an extra cycle
* sbuffer: optimize sbuffer forward data read timing
|
#
beabc72d |
| 29-Oct-2021 |
William Wang <[email protected]> |
mem: fix ld-ld violation check, enable it by default (#1184)
|
#
67682d05 |
| 22-Oct-2021 |
William Wang <[email protected]> |
Add ld-ld violation check (#1140)
* mem: support ld-ld violation check
* mem: do not fast wakeup if ld vio check failed
* mem: disable ld-ld vio check after core reset
|
#
9aca92b9 |
| 28-Sep-2021 |
Yinan Xu <[email protected]> |
misc: code clean up (#1073)
* rename Roq to Rob
* remove trailing whitespaces
* remove unused parameters
|
#
594ba8ac |
| 24-Aug-2021 |
William Wang <[email protected]> |
mem: let lq refill width be equal to l1d bus width
|
#
f320e0f0 |
| 24-Jul-2021 |
Yinan Xu <[email protected]> |
misc: update PCL information (#899)
XiangShan is jointly released by ICT and PCL.
|
#
6d5ddbce |
| 19-Jul-2021 |
Lemover <[email protected]> |
cache,mmu: split PTW and TLB into several files (#890)
|
#
c6d43980 |
| 04-Jun-2021 |
Lemover <[email protected]> |
Add MulanPSL-2.0 License (#824)
In this commit, we add License for XiangShan project.
|
#
2225d46e |
| 19-Apr-2021 |
Jiawei Lin <[email protected]> |
Refactor parameters, SimTop and difftest (#753)
* difftest: use DPI-C to refactor difftest
In this commit, difftest is refactored with DPI-C calls.
There're a few reasons:
(1) From Verilator's manual, DPI-C calls should be more efficient than accessing from dut_ptr.
(2) DPI-C is cross-platform (Verilator, VCS, ...)
(3) difftest APIs are split from emu.cpp to possibly support more backend platforms
(NEMU, Spike, ...)
The performance at this commit is considerably slower than the original emu.
Performance issues will be fixed later.
* [WIP] SimTop: try to use 'XSTop' as soc
* CircularQueuePtr: use F-bounded polymorphism instead of implicit helper
* Refactor parameters & Clean up code
* difftest: support basic difftest
* Support difftest in new sim top
* Difftest: convert recode fmt to ieee754 when comparing fp regs
* Difftest: pass sign-ext pc to dpic functions && fix exception pc
* Debug: add int/exc inst wb to debug queue
* Difftest: pass sign-ext pc to dpic functions && fix exception pc
* Difftest: fix naive commit num limit
Co-authored-by: Yinan Xu <[email protected]>
Co-authored-by: William Wang <[email protected]>
|
#
0f22ee7c |
| 02-Feb-2021 |
William Wang <[email protected]> |
MemBlock: add MaskedSyncDataModuleTemplate
|