History log of /XiangShan/src/main/scala/xiangshan/cache/dcache/CtrlUnit.scala (Results 1 – 1 of 1)
Revision Date Author Comments
# 72dab974 16-Dec-2024 cz4e <[email protected]>

feat(CtrlUnit, DCache): support L1 DCache RAS (#4009)

# L1 DCache RAS extension support

The L1 DCache supports the part of Reliability, Availability, and
Serviceability (RAS) Extension.
* L1 DCache

feat(CtrlUnit, DCache): support L1 DCache RAS (#4009)

# L1 DCache RAS extension support

The L1 DCache supports the part of Reliability, Availability, and
Serviceability (RAS) Extension.
* L1 DCache protection with Single Error Correct Double Error Detect
(SECDED) ECC on the RAMs. This includes the L1 DChace tag and data RAMs.
Not recovery error tag or data.
* Fault Handling Interrupt (Bus Error Unit Interrupt,BEU, 65)
* Error inject

## ECC Error Detect
An error might be triggered, when access L1 DCache.
* **Error Report**:
* Tag ECC Error: As long as an ECC error occurs on a certain path, it
is judged that an ECC error has occurred.
* Data ECC Error: If an ECC error occurs in the hit line, it is
considered
that an ECC error has occurred. If it does not hit, it will not be
processed.
* If an instruction access triggers an ECC error, a Hardware error is
considered and an exception is reported.
* Whenever there is an error in starting, an error message needs to
be sent to BEU.
* When the hardware detects an error, it reports it to the BEU and
triggers the NMI external interrupt(65).

* **Load instruction**:
* Only ECC errors of tags or data will be triggered during execution,
and the errors will be reported to the BEU and a `Hardware Error`
will be reported.

* **Probe/Snoop**:
* If a tag ecc error occurs, there is no need to change the cache
status,
and a `ProbeAck` with `corrupt=1` needs to be returned to l2.
* If a data ecc error occurs, change the cache status according to
the rules. If data needs to be returned, `ProbeAckData` with `corrupt=1`
needs to be returned to l2.

* **Replace/Evict**:
* `ReleaseData` with `corrupt=1` needs to be returned to l2.

* **Store to L1 DCache**:
* If a tag ecc error occurs, the cacheline is released according to the
`Repalce/Evict` process and the data is written to L1 DCache without
reporting errors to l2.
* If a data ecc error occurs, the data is written directly without
reporting
the error to l2.

* **Atomics**:
* report `Hardware Error`, do not report errors to l2.

## Error Inject
Each core's L1 DCache is configured with a memory map
register-controlled
controller, and each hardware unit that supports ECC is configured with
a
control bank. After the Bank register configuration is completed, L1
DCache
will trigger an ecc error for the first access L1 DCache.
<div style="text-align: center;">
<img
src="https://github.com/user-attachments/assets/8c4d23c5-0324-4e52-bcf4-29b47a282d72"
alt="err_inject" width="200" />
</div>

### Address Space
Address space `0x38022000`-`0x3802207F`, a total of 128 bytes of space,
this space is the local space of each hart.
<div style="text-align: center;">
<img width="292" alt="ctl_bank"
src="https://github.com/user-attachments/assets/89f88b24-37a4-4786-a192-401759eb95cf">
</div>

### L1 DCache Control Bank
Each Control Bank contains registers: `ECCCTL`, `ECCEID`, `ECCMASK`,
each register is 8 bytes.
<img width="414" alt="eccctl"
src="https://github.com/user-attachments/assets/b22ff437-d05d-4b3c-a353-dbea1afdc156">
* ECCCTL(ECC Control): ECC injection control register.
* `ese(error signaling enable)`: Indicates that the injection is valid
and is initialized to 0. When the injection is successful and `pst==0`,
ese will be clean.
* `pst(persist)`: Continuously inject signals. When `pst==1`,
the `ECCEID`
counter decreases to 0 and after successful injection, the
injection timer will be restored to the last set `ECCEID` and
re-injected;
when `pst==0`, it will be injected only once.
* `ede(error delay enable)`: Indicates that counter is valid and
initialized to 0. If
* `ese==1` and `ede==0`, error injection is effective immediately.
* `ese==1` and `ede==1`, you need to wait until `ECCEID`
decrements to 0 before the injection is effective.
* `cmp(component)`: Injection target, initialized to 0.
* 1'b0: The injection object is tag.
* 1'b1: The injection object is data.
* `bank`: The bank valid signal is initialized to 0. When the bit in
the `bank` is set, the corresponding mask is valid.

<img width="414" alt="ecceid"
src="https://github.com/user-attachments/assets/8cea0d8d-2540-44b1-b1f9-c1ed6ec5341e">

* ECCEID(ECC Error Inject Delay): ECC injection delay controller.
* When `ese==1` and `ede==1`, it
starts to decrease until it reaches 0. Currently, the same clock as
the core frequency is used, which can also be divided. Since ECC
injection relies on L1 DCache access, the time of the `EID` and the
time when the ECC error is triggered may not be consistent.

<img width="414" alt="eccmask"
src="https://github.com/user-attachments/assets/b1be83fd-17a6-4324-8aa6-45858249c476">

* ECCMASK(ECC Mask): ECC injection mask register.
* 0 means no inversion, 1 means flip.
Tag injection only uses the bits in `ECCMASK0` corresponding to
the tag length.

### Error Inject Example
```
1 # set control bank base address
2 mv x3, $(BASEADDR)
3
4 # set eid
5 mv x5, 500 # delay 500 cycles
6 sd x5, 8(x3) # mmio store
7
8 # set mask
9 mv x5, 0x1 # flip bit 0
10 sd x5, 16(x3) # mmio store
11
12 # set ctl
13 mv x5, 0x7 # comp = 0, ede = 1, pst = 1, ese = 1
14 sd x5, 0(x3) # mmio store
```

show more ...