History log of /XiangShan/src/main/scala/xiangshan/mem/lsqueue/LoadQueueData.scala (Results 1 – 25 of 37)
Revision Date Author Comments
# 3c808de0 17-Feb-2025 Anzo

fix(LSU): fix cbo instr exceptions and implementation (#4262)

1. Fix typos.
2. `cbo` instructions no longer raise misaligned exceptions.
3. `cbo zero` must flush the `sbuffer`.
4. `cbo zero` now sets the mask correctly.
5. Add RAW checks to `cbo zero`.
6. Add trigger (Debug Mode) checks to `cbo zero`.
7. Fix several issues with the CBO instructions in NEMU.
----

To avoid ambiguity with `io.mmioStout`, a new `StoreQueue` port is
introduced to write back `cbo zero` after the `sbuffer` is flushed.
Arbitration is performed in `MemBlock`, and currently `cbo zero` has
higher priority by default.
`cbo zero` should not be written back at the same time as `mmio`.

---
A check on `CacheLine` has been added to `RAWQueue` to ensure memory
consistency when executing `cbo zero`.
See https://github.com/OpenXiangShan/XiangShan/issues/4240
for details.

---
The `cbo` instruction requires a trigger check.

---------

Co-authored-by: zhanglinjuan


# 549073a0 10-Dec-2024 cz4e

area(Lsq): compress rar/raw paddr and remove sq useless regs (#3976)

* LoadQueueRAR: PAddr hash function, 16 bits in total:
![vaddr_compress](https://github.com/user-attachments/assets/6b87fb4d-7080-4b59-bf20-0e0f991ab141)
* LoadQueueRAW: uses PAddr[29:6], 24 bits in total


# 5003e6f8 23-Jul-2024 Huijin Li

LSQ: optimize static clock gating coverage and fix x_value in vcs (#3176)

optimize LSQ static clock gating coverage, fix x_value in vcs


# a7828dc1 12-Jun-2024 Tang Haojin

Revert "LSQ: optimize static clock gating coverage (#3023)" (#3055)


# 082b30d1 31-May-2024 Huijin Li

LSQ: optimize static clock gating coverage (#3023)


# 692e2faf 07-Apr-2024 Huijin Li

MemBlock: optimize area for DCache refill logic (#2844)

* AtomicsUnit: delete the 'trigger.backendHit' signal vector

* MemBlock & DCacheWrapper & FakeDCache & LSQWrapper & LoadQueue & LoadQueueReplay & LoadUnit : delete refill_to_ldq (unused signals)

* LoadQueueData: add a restriction that LoadQueueReplaySize must be divisible by numWBank


# 8891a219 08-Oct-2023 Yinan Xu

Bump rocket-chip (#2353)


# cdbff57c 24-Jul-2023 Haoyuan Feng

Memblock: Add load/store 128 bits datapath (#2180)

* Memblock: Add load/store 128 bits datapath

---------

Co-authored-by: lulu0521

* Memblock: fix bug of raw addr match

* Memblock, LoadUnit: Fix Vector RAW paddr match

---------

Co-authored-by: lulu0521


# 8a610956 13-Jun-2023 sfencevma

LQ: Optimizing LoadQueueReplay replay timing (#2127)

* Replay cycles increased from 2 to 3 cycles
* Simplified replay selection logic


# e4f69d78 21-May-2023 sfencevma

lsu: split lq for larger ooo load window (#2077)

BREAKING CHANGE: new LSU/LQ architecture introduced in this PR

In this commit, we replace unified LQ with:
* virtual load queue
* load replay queue
* load rar queue
* load raw queue
* uncache buffer

This provides a larger out-of-order load window.

NOTE: The IPC loss in this commit is caused by MDP problems, because the previous MDP
does not fit the new LSU architecture.
The MDP update is not included in this commit; the IPC loss will be fixed by a later MDP update.

---------

Co-authored-by: Lyn


# 683c1411 28-Dec-2022 happy-lx

lq: Remove LQ data (#1862)

This PR removes data from the LQ.

All cache-miss load instructions are replayed by the LQ, and a forwarding path from the D channel
and the MSHRs is added to the pipeline.
Uncache loads receive special treatment: their data is no longer stored in the data module
but in a separate register. ldout is used only for uncache writeback, and only ldout0
is used. The priority is adjusted so that replayed instructions have the highest priority in S0.

Future work:
1. fix `milc` perf loss
2. remove data from MSHRs

* difftest: monitor cache miss latency

* lq, ldu, dcache: remove lq's data

* lq's data is no longer used
* replay cache miss load from lq (use counter to delay)
* if dcache's mshr gets refill data, wake up lq's missed load
* uncache loads write back to ldu using ldout_0
* ldout_1 is no longer used

* lq, ldu: add forward port

* forward D and mshr in load S1, get result in S2
* remove unused logic in loadQueueData

* misc: revert monitor


# 3c02ee8f 25-Dec-2022 wakafa

Separate Utility submodule from XiangShan (#1861)

* misc: add utility submodule

* misc: adjust to new utility framework

* bump utility: revert resetgen

* bump huancun


# 0a47e4a1 09-Aug-2022 William Wang

lq: update paddr in lq in load_s1 and load_s2 (#1707)

paddr in the LQ is now updated over 2 cycles. This way,
paddr in the LQ is still valid in load_s3.


# 39f2ec76 09-Aug-2022 William Wang

lq: add 1 extra stage for lq data write (#1705)

LQ data is now divided into 8 banks by default, and a write to LQ
data takes 2 cycles to finish.

LQ data is not read until at least 2 cycles after a write, so it is safe
to add this delay. For example:
T0: update lq meta, lq data write req start
T1: lq data write finish, new wbidx selected
T2: read lq data according to new wbidx selected


# ef3b5b96 13-Feb-2022 William Wang

mem: fix ldld vio check implementation (#1456)

* mem: fix ldld vio mask gen logic

* mem: fix lq released flag update logic

Make sure that every load before a probe has the correct released flag

See the PR of this commit for illustration

* mem: fix ld-ld violation check logic

* ci: clean up workspace before do real test

* mem: reduce lq released flag update delay by 1 cycle

* chore: bump difftest to run no-smp diff

* ci: add mc test

* mem: fix lq released flag update logic

* chore: set difftest firstCommit_limit to 10000

* ci: use dual-nemu-so for mc test


# 96b1e495 15-Nov-2021 William Wang

Optimize memblock timing (#1218)

The DCache timing problem has not been solved yet; the DCache structure will be changed further.

* sbuffer: add extra perf counters

* sbuffer: optimize timeout replay check timing

* sbuffer: optimize do_uarch_drain check timing

Now only the merge entry's vtag is compared; the check does not start until
mergeIdx is generated by the PriorityEncoder.

* mem, lq: optimize writeback select logic timing

* dcache: replace missqueue refill req arbiter

* dcache: refactor missqueue entry select logic

* mem: add comments for lsq data

* dcache: give amo alu an extra cycle

* sbuffer: optimize sbuffer forward data read timing


# beabc72d 29-Oct-2021 William Wang

mem: fix ld-ld violation check, enable it by default (#1184)


# 67682d05 22-Oct-2021 William Wang

Add ld-ld violation check (#1140)

* mem: support ld-ld violation check
* mem: do not fast wakeup if ld vio check failed
* mem: disable ld-ld vio check after core reset


# 9aca92b9 28-Sep-2021 Yinan Xu

misc: code clean up (#1073)

* rename Roq to Rob

* remove trailing whitespaces

* remove unused parameters


# 594ba8ac 24-Aug-2021 William Wang

mem: let lq refill width be equal to l1d bus width


# f320e0f0 24-Jul-2021 Yinan Xu

misc: update PCL information (#899)

XiangShan is jointly released by ICT and PCL.


# 6d5ddbce 19-Jul-2021 Lemover

cache,mmu: split PTW and TLB into several files (#890)


# c6d43980 04-Jun-2021 Lemover

Add MulanPSL-2.0 License (#824)

In this commit, we add a license to the XiangShan project.


# 2225d46e 19-Apr-2021 Jiawei Lin

Refactor parameters, SimTop and difftest (#753)

* difftest: use DPI-C to refactor difftest

In this commit, difftest is refactored with DPI-C calls.
There are a few reasons:
(1) According to Verilator's manual, DPI-C calls should be more efficient than accessing signals through dut_ptr.
(2) DPI-C is cross-platform (Verilator, VCS, ...)
(3) difftest APIs are split out of emu.cpp to support more backend platforms
(NEMU, Spike, ...)

Performance at this commit is considerably slower than the original emu.
Performance issues will be fixed later.

* [WIP] SimTop: try to use 'XSTop' as soc

* CircularQueuePtr: use F-bounded polymorphism instead of an implicit helper

* Refactor parameters & Clean up code

* difftest: support basic difftest

* Support difftest in new sim top

* Difftest: convert recoded fmt to IEEE 754 when comparing fp regs

* Difftest: pass sign-ext pc to dpic functions && fix exception pc

* Debug: add int/exc inst wb to debug queue

* Difftest: pass sign-ext pc to dpic functions && fix exception pc

* Difftest: fix naive commit num limit

Co-authored-by: Yinan Xu
Co-authored-by: William Wang


# 0f22ee7c 02-Feb-2021 William Wang

MemBlock: add MaskedSyncDataModuleTemplate

