#
602aa9f1 |
| 02-Apr-2025 |
cz4e <[email protected]> |
feat(Sram): add `SRAM_CTL` interface (#4474)
* add `SRAM_CTL` interface for SRAMTemplate
* use `SRAM_WITH_CTL` to enable, e.g. `make sim-verilog CONFIG=KunminghuV2Config RELEASE=1 SRAM_WITH_CTL=1`
|
#
ebe07d61 |
| 20-Mar-2025 |
梁森 Liang Sen <[email protected]> |
feat(dfx): reuse dcache data sram read data register as mbist pipeline (#4371)
Co-authored-by: sfencevma <[email protected]>
|
#
11269ca7 |
| 09-Mar-2025 |
Tang Haojin <[email protected]> |
chore: fix several deprecation warning (#4352)
|
#
4b2c87ba |
| 27-Feb-2025 |
梁森 Liang Sen <[email protected]> |
feat(dfx): integrate dfx components (#4312)
|
#
fa5e530d |
| 21-Jan-2025 |
cz4e <[email protected]> |
timing(VSegmentUnit): duplicate latchVAddr (#4209)
* `latchVAddr` needs to index all dcache data SRAMs from top to bottom, which causes a large fanout, so duplicate `latchVAddr`
|
#
0b9f4b2d |
| 25-Dec-2024 |
cz4e <[email protected]> |
area(CacheOpDecoder): remove CacheOpDecoder (#4050)
* CacheOpDecoder is no longer used
|
#
8b33cd30 |
| 13-Dec-2024 |
klin02 <[email protected]> |
feat(XSLog): move all XSLog outside WhenContext for collection
Data inside a WhenContext is not accessible from another module. To support XSLog collection, we move all XSLog calls and their related signals outside WhenContext. For example, `when(cond1){XSDebug(cond2, pable)}` becomes `XSDebug(cond1 && cond2, pable)`.
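The rewrite of a guarded log into a flat one can be checked with a small behavioural model. This is a Python sketch rather than Chisel; the function names are hypothetical, and the model only confirms that a log nested under `cond1` fires in exactly the same cases as a log guarded by `cond1 && cond2`.

```python
from itertools import product


def nested_log_fires(cond1: bool, cond2: bool) -> bool:
    """Models when(cond1) { XSDebug(cond2, ...) }: the log sits inside the when block."""
    if cond1:
        return cond2
    return False


def flat_log_fires(cond1: bool, cond2: bool) -> bool:
    """Models XSDebug(cond1 && cond2, ...): the log sits outside any when block."""
    return cond1 and cond2


def equivalent() -> bool:
    """Exhaustively compare both forms over all four input combinations."""
    return all(
        nested_log_fires(c1, c2) == flat_log_fires(c1, c2)
        for c1, c2 in product([False, True], repeat=2)
    )
```

Because the two forms agree on every input, moving the log outside the block changes where the condition is evaluated but not when the message is emitted.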
|
#
452b5843 |
| 19-Dec-2024 |
Huijin Li <[email protected]> |
power(MemBlock): power optimization in MemBlock (#4059)
power optimization: (1) use “withClockGate” instead of ClockGate in DCache (2) reduce LSQ entries
|
#
72dab974 |
| 16-Dec-2024 |
cz4e <[email protected]> |
feat(CtrlUnit, DCache): support L1 DCache RAS (#4009)
# L1 DCache RAS extension support
The L1 DCache supports part of the Reliability, Availability, and Serviceability (RAS) extension:
* L1 DCache protection with Single Error Correct Double Error Detect (SECDED) ECC on the RAMs. This includes the L1 DCache tag and data RAMs. Erroneous tags or data are not recovered.
* Fault Handling Interrupt (Bus Error Unit interrupt, BEU, 65)
* Error injection
## ECC Error Detect
An error may be triggered when the L1 DCache is accessed.
* **Error Report**:
  * Tag ECC error: an ECC error on any way is treated as an ECC error.
  * Data ECC error: an ECC error in the hit line is treated as an ECC error. On a miss, it is not processed.
  * If an instruction's access triggers an ECC error, it is treated as a `Hardware Error` and an exception is reported.
  * Whenever an error is triggered, an error message must be sent to the BEU.
  * When the hardware detects an error, it reports it to the BEU and triggers the NMI external interrupt (65).
* **Load instruction**:
  * Only tag or data ECC errors can be triggered during execution. The error is reported to the BEU and a `Hardware Error` is raised.
* **Probe/Snoop**:
  * If a tag ECC error occurs, the cache status does not need to change, and a `ProbeAck` with `corrupt=1` must be returned to L2.
  * If a data ECC error occurs, change the cache status according to the rules. If data needs to be returned, a `ProbeAckData` with `corrupt=1` must be returned to L2.
* **Replace/Evict**:
  * A `ReleaseData` with `corrupt=1` must be returned to L2.
* **Store to L1 DCache**:
  * If a tag ECC error occurs, the cacheline is released according to the `Replace/Evict` process and the data is written to the L1 DCache without reporting errors to L2.
  * If a data ECC error occurs, the data is written directly without reporting the error to L2.
* **Atomics**:
  * Report a `Hardware Error`; do not report errors to L2.
## Error Inject
Each core's L1 DCache has a controller that is driven by memory-mapped registers, and each hardware unit that supports ECC has its own control bank. Once the bank registers are configured, the L1 DCache triggers an ECC error on the first subsequent L1 DCache access.
<div style="text-align: center;"> <img src="https://github.com/user-attachments/assets/8c4d23c5-0324-4e52-bcf4-29b47a282d72" alt="err_inject" width="200" /> </div>
### Address Space
The address space is `0x38022000`-`0x3802207F`, 128 bytes in total. This space is local to each hart.
<div style="text-align: center;"> <img width="292" alt="ctl_bank" src="https://github.com/user-attachments/assets/89f88b24-37a4-4786-a192-401759eb95cf"> </div>
### L1 DCache Control Bank
Each control bank contains the registers `ECCCTL`, `ECCEID`, and `ECCMASK`; each register is 8 bytes.
<img width="414" alt="eccctl" src="https://github.com/user-attachments/assets/b22ff437-d05d-4b3c-a353-dbea1afdc156">
* ECCCTL (ECC Control): ECC injection control register.
  * `ese` (error signaling enable): indicates that injection is valid; initialized to 0. When the injection succeeds and `pst==0`, `ese` is cleared.
  * `pst` (persist): inject continuously. When `pst==1`, once the `ECCEID` counter has decremented to 0 and the injection has succeeded, the injection timer is restored to the last value written to `ECCEID` and the error is injected again; when `pst==0`, the error is injected only once.
  * `ede` (error delay enable): indicates that the counter is valid; initialized to 0. If
    * `ese==1` and `ede==0`, error injection takes effect immediately.
    * `ese==1` and `ede==1`, injection takes effect only after `ECCEID` has decremented to 0.
  * `cmp` (component): injection target, initialized to 0.
    * 1'b0: the injection target is the tag.
    * 1'b1: the injection target is the data.
  * `bank`: bank valid signal, initialized to 0. When a bit in `bank` is set, the corresponding mask is valid.
<img width="414" alt="ecceid" src="https://github.com/user-attachments/assets/8cea0d8d-2540-44b1-b1f9-c1ed6ec5341e">
* ECCEID (ECC Error Inject Delay): ECC injection delay controller.
  * When `ese==1` and `ede==1`, it decrements until it reaches 0. The counter currently runs on the core clock, which may also be divided. Because ECC injection relies on an L1 DCache access, the moment the `EID` counter expires and the moment the ECC error is actually triggered may differ.
<img width="414" alt="eccmask" src="https://github.com/user-attachments/assets/b1be83fd-17a6-4324-8aa6-45858249c476">
* ECCMASK (ECC Mask): ECC injection mask register.
  * A 0 bit leaves the corresponding bit unchanged; a 1 bit flips it. Tag injection only uses the bits in `ECCMASK0` corresponding to the tag length.
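The interaction of `ese`, `pst`, `ede` and the `ECCEID` counter can be sketched as a cycle-level model. This is a hedged Python sketch of one reading of the description, not the RTL: the class `EccInjector`, the shadow field `eid_set`, and the choice that injection happens on the first access after the counter reaches 0 are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class EccInjector:
    """Behavioural model of one control bank (field names follow the text)."""
    ese: int = 0      # error signaling enable
    pst: int = 0      # persist
    ede: int = 0      # error delay enable
    eid: int = 0      # current value of the delay counter
    eid_set: int = 0  # last value written to ECCEID (assumed shadow copy)

    def write_eid(self, value: int) -> None:
        """Model an MMIO store to ECCEID."""
        self.eid = value
        self.eid_set = value

    def tick(self, dcache_access: bool) -> bool:
        """Advance one cycle; return True if an error is injected this cycle."""
        if not self.ese:
            return False
        if self.ede and self.eid > 0:
            self.eid -= 1            # still counting down, nothing injected yet
            return False
        if not dcache_access:
            return False             # armed, but injection needs a DCache access
        if self.pst:
            self.eid = self.eid_set  # persist: re-arm with the last programmed delay
        else:
            self.ese = 0             # one-shot: clear the enable after success
        return True
```

With `pst=0`, `ede=1` and a delay of 2, the model injects once on the third cycle with an access and then stays idle; with `pst=1` it keeps re-arming after every successful injection.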
### Error Inject Example
```
# set control bank base address
li x3, $(BASEADDR)

# set eid
li x5, 500      # delay 500 cycles
sd x5, 8(x3)    # mmio store

# set mask
li x5, 0x1      # flip bit 0
sd x5, 16(x3)   # mmio store

# set ctl
li x5, 0x7      # comp = 0, ede = 1, pst = 1, ese = 1
sd x5, 0(x3)    # mmio store
```
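The register values used in the example can also be derived programmatically. The Python sketch below is illustrative only: the register offsets are taken from the example's stores, but the bit positions of `ese`, `pst`, `ede` and `cmp` are assumptions (the source shows the layout only in a figure), and using `0x38022000` as the base of the first control bank is likewise assumed. The `bank` field is omitted because its position is not stated.

```python
from typing import List, Tuple

ECCCTL_OFFSET = 0x00   # from the example: sd x5, 0(x3)
ECCEID_OFFSET = 0x08   # from the example: sd x5, 8(x3)
ECCMASK_OFFSET = 0x10  # from the example: sd x5, 16(x3)

# Assumed bit positions; only the combined value 0x7 is confirmed by the example.
ESE_BIT, PST_BIT, EDE_BIT, CMP_BIT = 0, 1, 2, 3


def encode_eccctl(ese: int, pst: int, ede: int, cmp: int) -> int:
    """Pack the four single-bit control fields into an ECCCTL value."""
    return (
        ((ese & 1) << ESE_BIT)
        | ((pst & 1) << PST_BIT)
        | ((ede & 1) << EDE_BIT)
        | ((cmp & 1) << CMP_BIT)
    )


def inject_sequence(base: int, delay: int, mask: int, ctl: int) -> List[Tuple[int, int]]:
    """Return the (address, value) MMIO stores in the order used by the example."""
    return [
        (base + ECCEID_OFFSET, delay),   # program the delay first
        (base + ECCMASK_OFFSET, mask),   # then the flip mask
        (base + ECCCTL_OFFSET, ctl),     # write ECCCTL last, since it arms injection
    ]
```

For instance, `encode_eccctl(ese=1, pst=1, ede=1, cmp=0)` yields `0x7`, matching the value stored to `ECCCTL` in the example.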
|
#
c5a867ff |
| 16-Dec-2024 |
Anzo <[email protected]> |
fix(BankedDataArray): fix `oldest` selection logic (#4039)
The changes in commit 8ffb12e45361b854daf46d200530e9b2b01e4a9c cause unnecessary replays. For example, suppose ldu0, ldu1 and ldu2 have lqptr = [5, 7, 6] and the bank conflict is only between ldu1 and ldu2. Ideally only ldu1 is replayed, but with that commit both ldu1 and ldu2 are replayed.
This change fixes the issue, so performance should in theory improve again.
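The intended outcome for the lqptr = [5, 7, 6] case can be sketched as follows. This Python model is an assumption-laden illustration, not the Chisel logic: `replay_mask` is a hypothetical helper, the bank numbers are invented to express "only ldu1 and ldu2 conflict", and the circular wrap-around of real load queue pointers is ignored.

```python
from typing import List


def replay_mask(lqptr: List[int], bank: List[int]) -> List[bool]:
    """Within each group of loads hitting the same bank, keep the oldest and replay the rest."""
    replay = []
    for i in range(len(lqptr)):
        # Load i loses if some other load targets the same bank and is older
        # (smaller lqptr; the load unit index breaks ties).
        loses = any(
            j != i and bank[j] == bank[i] and (lqptr[j], j) < (lqptr[i], i)
            for j in range(len(lqptr))
        )
        replay.append(loses)
    return replay
```

Calling `replay_mask([5, 7, 6], [0, 1, 1])` marks only ldu1 for replay: ldu0 has no conflict, and ldu2 is older than ldu1.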
|
#
8ffb12e4 |
| 13-Dec-2024 |
Anzo <[email protected]> |
fix(bank_conflict): Selecting the oldest Load causes a conflict (#4036)
This modification changes `load bank conflict` arbitration from the default fixed
priority (0, 1, 2) to a policy in which the oldest Load never has a `bank conflict`.
In the following, `Load 0` refers to `LoadUnit 0`.
For example, before:
Load 0 lqidx 5
Load 1 lqidx 3
Load 2 lqidx 8
Assuming that all three Loads have a `bank conflict`, the default is to
mark Load 1 and Load 2 with `bank conflict` so that they are
replayed.
---
However, this may lead to deadlocks in some cases.
For example:
Load 0 robidx 7
Store 0 robidx 6
Load 1 robidx 5
`Store 0` is dependent on `Load 1` for data, while `Load 0` is dependent
on `Store 0` for data, and `Load 0` and `Load 1` will have a `bank
conflict`.
In this case, `Load 1` will `Replay` because of the `bank conflict`, and
`Load 0` will `Replay` because of a `forward fault` (because of misalignment).
---
With the modification, the oldest Load never
generates a `bank conflict`, which avoids this deadlock.
**Note !!! This may introduce performance fluctuations (up or down)**
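The difference between the two policies can be shown on the lqidx example above. This is a minimal Python sketch under stated assumptions: all three loads are taken to conflict on the same bank, the function names are hypothetical, and the wrap-around of real queue indices is ignored.

```python
from typing import List


def fixed_priority_winner(lqidx: List[int]) -> int:
    """Default policy: the lowest-numbered LoadUnit wins, regardless of age."""
    return 0


def oldest_first_winner(lqidx: List[int]) -> int:
    """Modified policy: the load with the smallest lqidx (the oldest) wins."""
    return min(range(len(lqidx)), key=lambda i: lqidx[i])


def replayed(lqidx: List[int], winner: int) -> List[int]:
    """Every load except the winner sees a bank conflict and is replayed."""
    return [i for i in range(len(lqidx)) if i != winner]
```

With `lqidx = [5, 3, 8]`, the fixed-priority policy replays Load 1 and Load 2, while the oldest-first policy lets Load 1 proceed and replays Load 0 and Load 2.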
|
#
98d2aaa1 |
| 12-Dec-2024 |
cz4e <[email protected]> |
fix(BankedDataArray): fix readline error_delayed selection (#4018)
Bug description:
use **s2** index to select **s3** readline **error_delayed**
Fix:
use **s3** index to select **s3** readline **error_delayed**
|
#
b34797bc |
| 25-Nov-2024 |
cz4e <[email protected]> |
area(DCache ECC): combine ecc with tag/data (#3902)
|
#
c49ebec8 |
| 18-Nov-2024 |
Haoyuan Feng <[email protected]> |
docs: add acknowledgements (#3861)
|
#
a5f58fbc |
| 29-Sep-2024 |
lixin <[email protected]> |
timing(dataArray): separate bankedDataRead kill
Do not include kill in banked_read_valid, to improve the timing of the SRAM read. Instead, use kill later, in load s2, to determine bankConflict.
fix(BankedDataArray): remove kill logic when generating rr_bank_conflict
data_bank selects the read address based on the priority of the valid signals. When there are multiple read requests, a bank conflict occurs, and the high-priority request needs to be killed, the data read by the low-priority LoadUnit is overwritten.
|
#
b32e9518 |
| 08-Nov-2024 |
Huijin Li <[email protected]> |
power(MemBlock): add ClockGate for DCache SRAM (#3824)
Using ClockGate for the DCache SRAM reduces memory power by 64% and
total MemBlock power by 23.38%.
|
#
7bd3dbdd |
| 06-Sep-2024 |
happy-lx <[email protected]> |
fix(dcache): fix perf bug of BankedDataArray (#3509)
If the addresses (for example, 0x88000000 and 0x90000000) of two read
requests fall in the same dcache set (0), the same bank (0), and different
ways, a bank conflict occurs in the previous design.
In fact, in the design of BankedDataArray, each read request reads
all ways of an entire bank, so this situation need not
produce a bank conflict.
code Example:
li x31,10
a:
li x30,1024
li x21,0x88000000
li x22,0x90000000
b:
ld x3,0(x21)
ld x4,0(x22)
addi x21,x21,8
addi x22,x22,8
addi x30,x30,-1
bnez x30,b
addi x31,x31,-1
bnez x31,a
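The address mapping behind this case can be sketched in Python. The geometry constants below (64-byte lines, 8-byte banks, 256 sets) are assumptions chosen for illustration, as are the function names; the source only confirms that both example addresses land in set 0, bank 0. The conflict rule shown is the relaxed one the message argues for: same bank and different set.

```python
LINE_BYTES = 64   # assumed cache line size
BANK_BYTES = 8    # assumed width of one data bank
N_SETS = 256      # assumed number of sets


def set_index(addr: int) -> int:
    """Set selected by the address bits just above the line offset."""
    return (addr // LINE_BYTES) % N_SETS


def bank_index(addr: int) -> int:
    """Bank selected by the offset of the access within its cache line."""
    return (addr % LINE_BYTES) // BANK_BYTES


def bank_conflict(addr_a: int, addr_b: int) -> bool:
    """Conflict only if both reads target the same bank but different sets."""
    return bank_index(addr_a) == bank_index(addr_b) and set_index(addr_a) != set_index(addr_b)
```

Under these assumptions, `0x88000000` and `0x90000000` share set 0 and bank 0, so one bank read serves both requests and no conflict is reported, whereas `0x88000000` and `0x88000040` share bank 0 but differ in set and therefore conflict.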
|
#
08b0bc30 |
| 03-Sep-2024 |
happy-lx <[email protected]> |
timing(MemBlock): optimize MemBlock timing (#3467)
This PR optimizes the timing of MemBlock. Specific optimizations include
but are not limited to:
+ TLB use the redirect for the next cycle
+ Optimize VLSU feedback and redirect
+ Optimise ldCancel and writeback signal generation
+ Optimise TLB Query Vaddr/hlv/hlvx/valid etc
+ Delay MMIO Store writeback for 1 Cycle
+ Fix tlbNoQuery and pmp logic
+ Remove clock gating for s3_fast_rep
+ Remove wbq conflict check to LoadPipe/MainPipe
+ Remove Mux in dcache resp data
+ Optimise data generation logic of LoadUnit
+ Duplicate Register in LoadUnit for data writeback
+ Duplicate Register in loadPipe for missQueue enq
+ Add skid buffer in VLSU
+ Select data from metaArray at S1
+ Simplify the enqueuing logic of missQueue
+ Separately generate the ready logic of miss Queue
+ Relax the conditions valid for bankdataArray reads
+ Add Reg between Dcache Mainpipe with sms prefetcher
+ Optimise store exceptionBuffer pipeline
---------
Co-authored-by: weiding liu <[email protected]>
Co-authored-by: Charlie Liu <[email protected]>
Co-authored-by: good-circle <[email protected]>
|
#
d4564868 |
| 17-Jul-2024 |
weiding liu <[email protected]> |
Dcache: refactor dcache's read data delay for better port timing
|
#
4a0e27ec |
| 31-Jul-2024 |
Yanqin Li <[email protected]> |
wpu: fix the issue of abnormal power (#2976)
fix points:
1. parameter bug in DCacheWrapper
2. add clock gate to avoid frequent flip in BankedDataArray
3. remove redundant designs in WPU
power comparison:

|
#
e3da8bad |
| 22-Jul-2024 |
Tang Haojin <[email protected]> |
build: purge chisel 3 and add deprecation check (#3250)
|
#
31d5a9c4 |
| 09-Jan-2024 |
sfencevma <[email protected]> |
ECC: add enable option for ecc
|
#
5adc4829 |
| 16-Jun-2024 |
Yanqin Li <[email protected]> |
memblock: add rest clockgate of reg (#3017)
Co-authored-by: cai luoshan <[email protected]>
Co-authored-by: Cai Luoshan <[email protected]>
Co-authored-by: good-circle <[email protected]>
Co-authored-by: Ma-YX <[email protected]>
Co-authored-by: Ma-YX <[email protected]>
Co-authored-by: CharlieLiu <[email protected]>
|
#
0184a80e |
| 15-Jun-2024 |
Yanqin Li <[email protected]> |
L1CacheErrorInfo: code refactor for correct and convenient clockgate (#3044)
|
#
c686adcd |
| 10-May-2024 |
Yinan Xu <[email protected]> |
Bump utility and disable ConstantIn by default (#2955)
* use BigInt for initValue of Constantin.createRecord
* use WITH_CONSTANTIN=1 to enable the ConstantIn plugin
|