#
51f9a957 |
| 21-Feb-2025 |
cz4e <[email protected]> |
style(LoadPipe): use `miss_req.bits.cancel` instead of `mq_enq_cancel` (#4296)
|
#
fa5e530d |
| 21-Jan-2025 |
cz4e <[email protected]> |
timing(VSegmentUnit): duplicate latchVAddr (#4209)
* `latchVAddr` needs to index all dcache data SRAMs from top to bottom, which causes a large fanout, so duplicate `latchVAddr`
|
#
fb49ae6b |
| 25-Dec-2024 |
cz4e <[email protected]> |
fix(LoadPipe): enable s1 permission check for prefetch_w (#4056)
|
#
8b33cd30 |
| 13-Dec-2024 |
klin02 <[email protected]> |
feat(XSLog): move all XSLog outside WhenContext for collection
Data inside a WhenContext is not accessible from another module. To support XSLog collection, we move all XSLog calls and related signals outside WhenContext. For example, `when(cond1){XSDebug(cond2, pable)}` becomes `XSDebug(cond1 && cond2, pable)`.
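The rewrite described in this commit relies on a nested guard being equivalent to a single conjoined guard. A minimal sketch of that equivalence, written as a plain Python model rather than Chisel; the function names and the `log` list are hypothetical and only stand in for the `when`/`XSDebug` behaviour:

```python
from itertools import product

def nested(cond1, cond2, log):
    # Models: when(cond1) { XSDebug(cond2, msg) }
    if cond1:
        if cond2:
            log.append("msg")

def flattened(cond1, cond2, log):
    # Models: XSDebug(cond1 && cond2, msg), with no enclosing when
    if cond1 and cond2:
        log.append("msg")

def equivalent():
    # Compare both forms over every combination of the two conditions.
    for c1, c2 in product([False, True], repeat=2):
        a, b = [], []
        nested(c1, c2, a)
        flattened(c1, c2, b)
        if a != b:
            return False
    return True
```

Both forms emit the message only when both conditions hold, so hoisting the log call out of the `when` block does not change what is logged.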
|
#
72dab974 |
| 16-Dec-2024 |
cz4e <[email protected]> |
feat(CtrlUnit, DCache): support L1 DCache RAS (#4009)
# L1 DCache RAS extension support
The L1 DCache supports part of the Reliability, Availability, and Serviceability (RAS) Extension:
* L1 DCache protection with Single Error Correct Double Error Detect (SECDED) ECC on the RAMs. This includes the L1 DCache tag and data RAMs. Erroneous tag or data is not recovered.
* Fault Handling Interrupt (Bus Error Unit Interrupt, BEU, 65)
* Error inject
## ECC Error Detect
An error might be triggered when the L1 DCache is accessed.
* **Error Report**:
  * Tag ECC Error: If an ECC error occurs on any one path, an ECC error is considered to have occurred.
  * Data ECC Error: If an ECC error occurs in the hit line, an ECC error is considered to have occurred. If the access does not hit, the error is not processed.
  * If an instruction access triggers an ECC error, it is treated as a Hardware Error and an exception is reported.
  * Whenever an error is triggered, an error message needs to be sent to the BEU.
  * When the hardware detects an error, it reports it to the BEU and triggers the NMI external interrupt (65).
* **Load instruction**:
  * Only ECC errors of tags or data will be triggered during execution; the errors are reported to the BEU and a `Hardware Error` is reported.
* **Probe/Snoop**:
  * If a tag ECC error occurs, there is no need to change the cache status, and a `ProbeAck` with `corrupt=1` needs to be returned to L2.
  * If a data ECC error occurs, change the cache status according to the rules. If data needs to be returned, `ProbeAckData` with `corrupt=1` needs to be returned to L2.
* **Replace/Evict**:
  * `ReleaseData` with `corrupt=1` needs to be returned to L2.
* **Store to L1 DCache**:
  * If a tag ECC error occurs, the cacheline is released according to the `Replace/Evict` process and the data is written to the L1 DCache without reporting errors to L2.
  * If a data ECC error occurs, the data is written directly without reporting the error to L2.
* **Atomics**:
  * Report `Hardware Error`; do not report errors to L2.
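The per-access rules above can be summarized as a lookup table. The following is a minimal sketch in Python, not the RTL: the `EccResponse` type, its field names, and the `respond` helper are hypothetical, and the table only records what the list states (reporting to the BEU applies to every detected error and is therefore left out).

```python
from typing import NamedTuple, Optional

class EccResponse(NamedTuple):
    hardware_error: bool        # the text says a `Hardware Error` is reported
    l2_message: Optional[str]   # message returned to L2 with corrupt=1, if any
    note: str                   # remaining behaviour, quoted loosely from the list

# Keys are (access kind, error location); all names here are illustrative.
RESPONSES = {
    ("load", "tag"):     EccResponse(True, None, "report to BEU"),
    ("load", "data"):    EccResponse(True, None, "report to BEU"),
    ("probe", "tag"):    EccResponse(False, "ProbeAck", "cache status unchanged"),
    ("probe", "data"):   EccResponse(False, "ProbeAckData",
                                     "status changes per the rules; sent if data must be returned"),
    ("replace", "tag"):  EccResponse(False, "ReleaseData", ""),
    ("replace", "data"): EccResponse(False, "ReleaseData", ""),
    ("store", "tag"):    EccResponse(False, None,
                                     "line released via Replace/Evict flow, then data written"),
    ("store", "data"):   EccResponse(False, None, "data written directly"),
    ("atomic", "tag"):   EccResponse(True, None, "no error reported to L2"),
    ("atomic", "data"):  EccResponse(True, None, "no error reported to L2"),
}

def respond(access: str, location: str) -> EccResponse:
    """Look up the documented response for an ECC error on the given access."""
    return RESPONSES[(access, location)]
```

For example, `respond("probe", "tag").l2_message` yields `"ProbeAck"`, matching the Probe/Snoop rule for tag errors.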
## Error Inject
Each core's L1 DCache is configured with a controller that is driven by memory-mapped registers, and each hardware unit that supports ECC is configured with a control bank. After the bank registers are configured, the L1 DCache triggers an ECC error on the first access to the L1 DCache.
<div style="text-align: center;"> <img src="https://github.com/user-attachments/assets/8c4d23c5-0324-4e52-bcf4-29b47a282d72" alt="err_inject" width="200" /> </div>
### Address Space
The address space is `0x38022000`-`0x3802207F`, 128 bytes in total. This space is local to each hart.
<div style="text-align: center;"> <img width="292" alt="ctl_bank" src="https://github.com/user-attachments/assets/89f88b24-37a4-4786-a192-401759eb95cf"> </div>
### L1 DCache Control Bank
Each control bank contains the registers `ECCCTL`, `ECCEID`, and `ECCMASK`; each register is 8 bytes.
<img width="414" alt="eccctl" src="https://github.com/user-attachments/assets/b22ff437-d05d-4b3c-a353-dbea1afdc156">
* ECCCTL(ECC Control): ECC injection control register.
  * `ese(error signaling enable)`: Indicates that the injection is valid; initialized to 0. When the injection succeeds and `pst==0`, `ese` is cleared.
  * `pst(persist)`: Inject continuously. When `pst==1`, once the `ECCEID` counter has decreased to 0 and the injection has succeeded, the injection timer is restored to the last value set in `ECCEID` and the error is injected again; when `pst==0`, the error is injected only once.
  * `ede(error delay enable)`: Indicates that the counter is valid; initialized to 0. If
    * `ese==1` and `ede==0`, error injection takes effect immediately.
    * `ese==1` and `ede==1`, the injection takes effect only after `ECCEID` decrements to 0.
  * `cmp(component)`: Injection target, initialized to 0.
    * 1'b0: The injection target is the tag.
    * 1'b1: The injection target is the data.
  * `bank`: The bank valid signal, initialized to 0. When a bit in `bank` is set, the corresponding mask is valid.
<img width="414" alt="ecceid" src="https://github.com/user-attachments/assets/8cea0d8d-2540-44b1-b1f9-c1ed6ec5341e">
* ECCEID(ECC Error Inject Delay): ECC injection delay controller.
  * When `ese==1` and `ede==1`, it starts to decrease until it reaches 0. Currently it uses the same clock as the core, which can also be divided. Since ECC injection relies on L1 DCache access, the time at which `EID` expires and the time at which the ECC error is triggered may not coincide.
<img width="414" alt="eccmask" src="https://github.com/user-attachments/assets/b1be83fd-17a6-4324-8aa6-45858249c476">
* ECCMASK(ECC Mask): ECC injection mask register.
  * 0 means no inversion, 1 means flip. Tag injection only uses the bits in `ECCMASK0` corresponding to the tag length.
### Error Inject Example
```
# set control bank base address
li x3, $(BASEADDR)

# set eid
li x5, 500       # delay 500 cycles
sd x5, 8(x3)     # mmio store

# set mask
li x5, 0x1       # flip bit 0
sd x5, 16(x3)    # mmio store

# set ctl
li x5, 0x7       # comp = 0, ede = 1, pst = 1, ese = 1
sd x5, 0(x3)     # mmio store
```
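The control value `0x7` in the example can be rebuilt from its fields. The sketch below is a hedged Python model: the bit positions (`ese` = bit 0, `pst` = bit 1, `ede` = bit 2, `cmp` = bit 3) are an assumption inferred from the example value and its comment, the `bank` field is omitted because its position is not given in the text, and the register offsets 0, 8, and 16 are taken from the `sd` offsets in the example. The helper names are hypothetical.

```python
# Assumed bit positions, inferred from "0x7 # comp = 0, ede = 1, pst = 1, ese = 1".
ESE_BIT, PST_BIT, EDE_BIT, CMP_BIT = 0, 1, 2, 3

# Register offsets inside one control bank, as used by the stores in the example.
ECCCTL_OFFSET, ECCEID_OFFSET, ECCMASK_OFFSET = 0, 8, 16

def encode_eccctl(ese: bool, pst: bool, ede: bool, cmp_data: bool) -> int:
    """Pack the ECCCTL fields into an integer under the assumed layout."""
    return ((int(ese) << ESE_BIT) | (int(pst) << PST_BIT)
            | (int(ede) << EDE_BIT) | (int(cmp_data) << CMP_BIT))

def injection_writes(base: int, delay: int, mask: int, ctl: int):
    """Return the (address, value) MMIO stores in the order used by the example:
    delay first, then mask, and the control register last."""
    return [
        (base + ECCEID_OFFSET, delay),
        (base + ECCMASK_OFFSET, mask),
        (base + ECCCTL_OFFSET, ctl),
    ]
```

With these assumptions, `encode_eccctl(True, True, True, False)` reproduces the `0x7` written in the example. Writing `ECCCTL` last follows the example's order, so the delay and mask are already in place when `ese` is set.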
|
#
8ffb12e4 |
| 13-Dec-2024 |
Anzo <[email protected]> |
fix(bank_conflict): Selecting the oldest Load causes a conflict (#4036)
This modification changes `load bank conflict` handling from [default
priority 0 1 2] to [the oldest Load never has a `bank conflict`].
In the following, `Load 0` refers to `LoadUnit 0`.
For example, before:
Load 0 lqidx 5
Load 1 lqidx 3
Load 2 lqidx 8
Assuming that all three Loads have a `bank conflict`, by default we mark
Load 1 and Load 2 as having a `bank conflict` so that they are
replayed.
---
However, this may lead to deadlocks in some cases.
For example:
Load 0 robidx 7
Store 0 robidx 6
Load 1 robidx 5
`Store 0` is dependent on `Load 1` for data, while `Load 0` is dependent
on `Store 0` for data, and `Load 0` and `Load 1` will have a `bank
conflict`.
In this case then, `Load 1` will `Replay` because of `bank conflict` and
`Load 0` will `Replay` because of `forward fault`(because of misalign).
---
With the modification, the oldest Load never gets a `bank conflict`,
which avoids the deadlock described above.
**Note !!! This may introduce performance fluctuations (up or down)**
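The two arbitration policies can be compared with a small model. This is a sketch in Python, not the Chisel implementation: it assumes all listed loads target the same bank, that a smaller `lqidx` means an older load (the wrap-around of a circular queue pointer is ignored), and that exactly one load wins. The function names are hypothetical.

```python
def losers_fixed_priority(loads):
    """loads: list of (unit_id, lqidx) pairs that all conflict on one bank.
    Old policy: the lowest LoadUnit id wins, everyone else is replayed."""
    winner = min(unit for unit, _ in loads)
    return sorted(unit for unit, _ in loads if unit != winner)

def losers_oldest_first(loads):
    """New policy: the oldest load (smallest lqidx here) wins.
    Ties fall back to the lowest LoadUnit id."""
    winner = min(loads, key=lambda load: (load[1], load[0]))[0]
    return sorted(unit for unit, _ in loads if unit != winner)

# The "before" example from the commit message: Load 0/1/2 with lqidx 5/3/8.
example = [(0, 5), (1, 3), (2, 8)]
```

On `example`, the fixed-priority policy replays Load 1 and Load 2, while the oldest-first policy lets Load 1 (lqidx 3) proceed and replays Load 0 and Load 2.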
|
#
6a539f6d |
| 29-Nov-2024 |
Anzooooo <[email protected]> |
feat(LoadPipe): let 128bitReq be accessed at 128-bit aligned granularity
|
#
e718f875 |
| 27-Nov-2024 |
Anzo <[email protected]> |
refactor(LoadPipe): remove the redundant logic of the `mq_nack` (#3936)
Remove the redundant logic of the miss queue nack, based on the problem
raised in
issue(https://github.com/OpenXiangShan/XiangShan/issues/3916).
When a tlb miss and a dcache miss occur at the same time and the miss
queue nacks, the `LoadUnit` generates both replay signals
`C_TM` and `C_DR`. We give priority to `C_TM`, which is why we need
to send a kill signal to dcache when a tlb miss occurs.
Although there was no problem before, as the
issue(https://github.com/OpenXiangShan/XiangShan/issues/3916) says, this
will cause ambiguity, and the miss queue nack message is already
included in `s2_nack_no_mshr`, so the choice is to remove the
`s2_miss_req_fire` signal from the generation logic of the `s2_mq_nack`
signal.
|
#
92bcee1c |
| 22-Nov-2024 |
cz4e <[email protected]> |
timing(DCache): delay tag error to s3 instead of s2 (#3908)
* break hitVec -> ldu tag_error -> loadunit path
|
#
a5f58fbc |
| 29-Sep-2024 |
lixin <[email protected]> |
timing(dataArray): separate bankedDataRead kill
Do not let banked_read_valid include kill to improve the timing of reading sram. Later, use kill to determine bankConflict in load s2.
fix(BankedDataArray): remove kill logic when generate rr_bank_conflict
data_bank selects the read address based on the priority of the valid signal. When there are multiple read requests, a bank conflict occurs, and the high-priority request needs to be killed, the data read by the low-priority load unit would be overwritten.
|
#
4cc68b84 |
| 26-Sep-2024 |
sfencevma <[email protected]> |
timing(LoadPipe): remove permission and coh check when generate hit logic
|
#
44f2941b |
| 24-Sep-2024 |
Jiru Sun <[email protected]> |
refactor(HPM): move HPMs from utils to utility repo (#3631)
Because HPMs will be used in CoupledL2 as well, delete
`PerfCounterUtils.scala` in XiangShan and create
`HardwarePerfMonitor.scala` in Utility.
See also [Pull Request in
CoupledL2](https://github.com/OpenXiangShan/CoupledL2/pull/251#discussion_r1770738535).
|
#
08b0bc30 |
| 03-Sep-2024 |
happy-lx <[email protected]> |
timing(MemBlock): optimize MemBlock timing (#3467)
This PR optimizes the timing of MemBlock. Specific optimizations include
but are not limited to:
+ TLB use the redirect for the next cycle
+ Optimize VLSU feedback and redirect
+ Optimise ldCancel and writeback signal generation
+ Optimise TLB Query Vaddr/hlv/hlvx/valid etc
+ Delay MMIO Store writeback for 1 Cycle
+ Fix tlbNoQuery and pmp logic
+ Remove clock gating for s3_fast_rep
+ Remove wbq conflict check to LoadPipe/MainPipe
+ Remove Mux in dcache resp data
+ Optimise data generation logic of LoadUnit
+ Duplicate Register in LoadUnit for data writeback
+ Duplicate Register in loadPipe for missQueue enq
+ Add skid buffer in VLSU
+ Select data from metaArray at S1
+ Simplify the enqueuing logic of missQueue
+ Separately generate the ready logic of miss Queue
+ Relax the conditions valid for bankdataArray reads
+ Add Reg between Dcache Mainpipe with sms prefetcher
+ Optimise store exceptionBuffer pipeline
---------
Co-authored-by: weiding liu <[email protected]>
Co-authored-by: Charlie Liu <[email protected]>
Co-authored-by: good-circle <[email protected]>
|
#
6070f1e9 |
| 03-Sep-2024 |
happy-lx <[email protected]> |
fix(L1PF): fix good_prefetch Counting logic (#3474)
Previous design:
When a demand load hits a cache block fetched by the prefetcher, the
`PrefetchSource` of this block is cleared. When the block is later
replaced from the cache, it is therefore mistakenly treated as not
having been fetched by the prefetcher, and the `good_prefetch` counter
is not increased.
Fix:
Add a new cache block status (L1_HW_PREFETCH_CLEAR), indicating that
this block was originally fetched by the prefetcher.
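The effect of the new status can be illustrated with a small state model. This is a sketch in Python under stated assumptions: `good_prefetch` is taken to be incremented at replacement time for blocks that were prefetched and later hit by a demand load, and the status names `NONE` and `HW_PREFETCH` are hypothetical (only `L1_HW_PREFETCH_CLEAR` comes from the commit message).

```python
from enum import Enum

class PfSource(Enum):
    NONE = 0                  # hypothetical: block was not brought in by the prefetcher
    HW_PREFETCH = 1           # hypothetical: prefetched block, not yet hit by a demand load
    L1_HW_PREFETCH_CLEAR = 2  # from the commit: prefetched block that has since been hit

class Line:
    def __init__(self, prefetched: bool):
        self.source = PfSource.HW_PREFETCH if prefetched else PfSource.NONE

def demand_hit(line: Line, fixed: bool = True) -> None:
    """A demand load hits the line. The old design erased the prefetch origin;
    the fixed design moves to a status that still remembers it."""
    if line.source is PfSource.HW_PREFETCH:
        line.source = PfSource.L1_HW_PREFETCH_CLEAR if fixed else PfSource.NONE

def counts_as_good_prefetch(line: Line) -> bool:
    """Evaluated when the line is replaced: was it prefetched and then used?"""
    return line.source is PfSource.L1_HW_PREFETCH_CLEAR
```

In this model, a prefetched line that is hit and then replaced is counted only when `fixed=True`; with the old behaviour the origin is lost at the hit and the replacement goes uncounted.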
|
#
d4564868 |
| 17-Jul-2024 |
weiding liu <[email protected]> |
Dcache: refactor dcache's read data delay for better port timing
|
#
bb2f3f51 |
| 12-Jul-2024 |
Tang Haojin <[email protected]> |
perf: use perfUtils in `Utility` (#3190)
Currently, log and perf utilities such as `XSPerfAccumulate` are
implemented in many repositories like XiangShan, CoupledL2 and HuanCun.
This PR unifies them and puts them in the Utility repository.
|
#
344cf5d5 |
| 27-Jun-2024 |
CharlieLiu <[email protected]> |
DCache: Remove redundant nack_data from mq_nack (#3110)
Remove redundant s2_nack_data from s2_mq_nack
|
#
10deab87 |
| 28-May-2024 |
good-circle <[email protected]> |
Dcache: data read valid should not rely on tag hit
|
#
31d5a9c4 |
| 09-Jan-2024 |
sfencevma <[email protected]> |
ECC: add enable option for ecc
|
#
5adc4829 |
| 16-Jun-2024 |
Yanqin Li <[email protected]> |
memblock: add rest clockgate of reg (#3017)
Co-authored-by: cai luoshan <[email protected]>
Co-authored-by: Cai Luoshan <[email protected]>
Co-authored-by: good-circle <[email protected]>
Co-authored-by: Ma-YX <[email protected]>
Co-authored-by: Ma-YX <[email protected]>
Co-authored-by: CharlieLiu <[email protected]>
|
#
0184a80e |
| 15-Jun-2024 |
Yanqin Li <[email protected]> |
L1CacheErrorInfo: code refactor for correct and convenient clockgate (#3044)
|
#
c2bbba9f |
| 21-May-2024 |
CharlieLiu <[email protected]> |
DCache: Remove useless data_read when miss in LoadPipe (#2990)
- Remove useless data_read when DCache miss in LoadPipe
- Fix req priority in DCache MainPipe
|
#
20e09ab1 |
| 09-May-2024 |
happy-lx <[email protected]> |
fix bug of stream (#2756)
Bug Description:
(1) Increase the number of DCache ways to 8 to reduce the problem seen when running the bwaves test, which is caused by too many addresses mapping to the same set.
(2) Set ldu0 as a high-confidence prefetch request channel to increase the probability that the prefetch request is accepted by the DCache's MSHR.
(3) Fix the issue that ldu sends an erroneous ready back to the prefetcher, to prevent the prefetch request from being dropped.
(4) Don't let the prefetch request access the DCache's DataArray.
(5) Add an extra port in the Multi-level prefetch Queue to accept more pf req from stream&stride
(6) Enlarge the Stream bit vector Array from 16 to 32 to cover the multi-stream access pattern in Bwaves and GemsFDTD.
In addition, the decline in libquantum is a bit strange.
|
#
ffd3154d |
| 25-Apr-2024 |
CharlieLiu <[email protected]> |
DCache: New feature evict on refill (#2919)
- Remove module RefillPipe, move DCache replacer access/update to
MainPipe.
- Using l2_hint as an early wake-up signal for MSHR.
---------
Co-authored-by: YukunXue <[email protected]>
Co-authored-by: Tang Haojin <[email protected]>
Co-authored-by: ssszwic <[email protected]>
Co-authored-by: Kunlin You <[email protected]>
|
#
ec86549e |
| 02-Jan-2024 |
sfencevma <[email protected]> |
MemBlock: enable 3ld3st (#2524)
* enable 3ld3st
* assign enqLsq
* fix IssQueSize
* remove performance regression
* MMU: Fix ptwrepeater when 3ld + 3st
* fix minimal config params
* fix minimal config LoadQueueReplaySize
* add 3ld3st switch
* fix bank conflict valid logic
* fix strict memory ambiguous logic
* fix wakeup logic
* disable 3ld3st by default
* modify minimal config params
---------
Co-authored-by: Lyn <[email protected]>
Co-authored-by: good-circle <[email protected]>
|