8966a895 | 02-Sep-2024 |
xu_zh <[email protected]> |
ICache: fix metaArray ECC check (#3419)
Currently, metaArray ECC check is valid 2 cycles after request:
https://github.com/OpenXiangShan/XiangShan/blob/49162c9ab67070931573c1d4a372e2c858a72716/src/main/scala/xiangshan/frontend/icache/ICache.scala#L262
However, prefetchPipe s1 handshakes with both WayLookup and prefetchPipe
s2 assuming that all signals of the metaArray.io.readResp are valid 1
cycle after the request, resulting in the error.
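
The latency mismatch described above can be illustrated with a small cycle-level model. This is only a behavioural sketch in Python, not the Chisel code from the repository: the function name `s1_observed_flags`, the back-to-back request pattern, and the reset value of `False` are all assumptions made for illustration.

```python
def s1_observed_flags(corrupt_by_request, ecc_latency):
    """Return the ECC flag that stage s1 samples for each request.

    Hypothetical model: request i is issued at cycle i and s1 samples
    the ECC flag at cycle i + 1. `ecc_latency` is the number of cycles
    after a request at which that request's ECC flag becomes visible
    (1 = together with the SRAM data, 2 = one extra register stage).
    """
    observed = []
    for i in range(len(corrupt_by_request)):
        sample_cycle = i + 1
        # Index of the request whose flag is visible at sample_cycle.
        src = sample_cycle - ecc_latency
        # Before any flag has propagated, assume the register resets to False.
        observed.append(corrupt_by_request[src] if src >= 0 else False)
    return observed


corrupt = [False, True, False, False]  # only request 1 reads corrupted meta
aligned = s1_observed_flags(corrupt, ecc_latency=1)
skewed = s1_observed_flags(corrupt, ecc_latency=2)
```

With a 2-cycle flag latency, `skewed` attributes the corruption of request 1 to request 2, which is the kind of misattribution the commit message describes.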
Simply removing this RegEnable may create a long timing path
(metaArray (sram) -> ECC check (xor reduction) -> prefetchPipe s1
(wire) -> wayLookup (bypass, wire) -> mainPipe s0 (wire) -> mainPipe s1
(reg)), so that option is rejected.
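
For readers unfamiliar with the "xor reduction" step in the path above, here is a minimal Python sketch of a parity check. It assumes a single even-parity bit per stored word; the actual encoding used by the ICache may differ, and the helper names `parity` and `is_corrupt` are hypothetical.

```python
def parity(word: int) -> int:
    """XOR-reduce all bits of a non-negative integer to one parity bit."""
    p = 0
    while word:
        p ^= word & 1
        word >>= 1
    return p


def is_corrupt(word: int, stored_parity: int) -> bool:
    """Report a mismatch between the stored and the recomputed parity."""
    return parity(word) != stored_parity


stored_word = 0b1011
stored_parity = parity(stored_word)       # computed when the word is written
single_flip = stored_word ^ 0b0001        # one bit flipped in the array
double_flip = stored_word ^ 0b0011        # two bits flipped in the array
```

A single-bit flip changes the parity and is detected, while a double-bit flip restores it and passes unnoticed. In hardware the reduction is a tree of XOR gates over every stored bit, which is why it contributes noticeably to the combinational path listed above.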
This PR may leave errors undetected in some specific cases, which
in turn results in additional fetch requests being sent to the L2 cache,
but does not cause corrupted data to be sent to the backend. See the
discussion in the notes:
https://github.com/OpenXiangShan/XiangShan/blob/8b87b8dcbfd5945c5bd7815eb5e569fec252ddc6/src/main/scala/xiangshan/frontend/icache/IPrefetch.scala#L279-L293
There are 2 more potential solutions described in an internal yuque
document. However, given the implementation complexity, area
overhead and other considerations, the current solution is considered
optimal.
|
f80535c3 | 14-Aug-2024 |
xu_zh <[email protected]> |
ICache: raise af if meta/data array ECC fail
In the current design, meta/data array corruption does not raise any
exception (whether or not `io.csr_parity_enable === true.B`), which may
pose two problems:
1. When meta is corrupted, the `ptag` comparison result may be invalid,
so a cache hit may be treated as a cache miss, causing a (pre)fetch
request to be sent to the L2 cache incorrectly;
2. When meta/data/l2 is corrupted, instruction data sent to the backend
may be invalid. Although the errors are reported to beu, which raises an
interrupt via plic, the timing of the interrupt is not as controllable
as that of an exception. It is therefore reasonable to mark invalid data
as access fault to keep it from being executed.
This PR:
1. Raises af if the meta/data array ECC check fails (when
`io.csr_parity_enable === true.B`); the priority of this af is lower
than that of iTLB & PMP exceptions
2. Cancels (pre)fetching if the meta array ECC check fails (by merging
`meta_corrupt` exceptions into `s2_exception`)
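
The priority rule and the cancellation in the two items above can be sketched as follows. This is a behavioural Python model under stated assumptions, not the Chisel implementation: exceptions are modelled as optional strings, and the names `merge_exceptions` and `should_send_miss_request` are hypothetical.

```python
from typing import Optional


def merge_exceptions(itlb: Optional[str],
                     pmp: Optional[str],
                     meta_corrupt: bool,
                     parity_enable: bool) -> Optional[str]:
    """Pick the highest-priority exception: iTLB, then PMP, then ECC af."""
    if itlb is not None:
        return itlb
    if pmp is not None:
        return pmp
    # The ECC failure only raises an access fault when parity checking is on.
    if parity_enable and meta_corrupt:
        return "af"
    return None


def should_send_miss_request(hit: bool, exception: Optional[str]) -> bool:
    """A miss request goes to L2 only if no exception was merged in."""
    return (not hit) and exception is None


exc = merge_exceptions(itlb=None, pmp=None, meta_corrupt=True, parity_enable=True)
send = should_send_miss_request(hit=False, exception=exc)
```

Here `exc` is `"af"` and `send` is `False`: the corrupted meta read is reported as an access fault and no (pre)fetch is issued for it.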
Note:
RISC-V Machine ISA v1.13 (draft) introduced a "hardware error"
exception, described as:
> A Hardware Error exception is a synchronous exception triggered when
corrupted or uncorrectable data is accessed explicitly or implicitly by
an instruction. In this context, "data" encompasses all types of
information used within a RISC-V hart. Upon a hardware error exception,
the xepc register is set to the address of the instruction that
attempted to access corrupted data, while the xtval register is set
either to 0 or to the virtual address of an instruction fetch, load, or
store that attempted to access corrupted data. The priority of Hardware
Error exception is implementation-defined, but any given occurrence is
generally expected to be recognized at the point in the overall priority
order at which the hardware error is discovered.
It may be better to raise a hardware error instead of an access fault
when the ECC check fails. However, the spec is still a draft and the
XiangShan backend does not implement this exception code yet, so we
still raise af here. This may need to be modified in the future.
|
b4f1e5b2 | 01-Jul-2024 |
xu_zh <[email protected]> |
IPrefetch: MSHR should update IPrefetch s1 waymask (#3122)
Fixes the MC-Linux CI failure:
https://github.com/OpenXiangShan/XiangShan/actions/runs/9709320741/job/26802800197.
In IPrefetch:
1. s0 sends a read request to MetaArray
2. s1:
- receives the response from MetaArray (therefore `s1_SRAM_valid === true.B`)
- and receives an update request from MSHR (`fromMSHR.valid &&
!fromMSHR.bits.corrupt === true.B`)
- and `s1_fire === true.B`
3. waymasks read directly from SRAM (which might be outdated) enter the
s2 stage, and the update request from MSHR is effectively discarded.
If it is a miss (`waymask === 0.U`), IPrefetch will send a miss request to
MSHR. In this case, multiple refills of the same cache block may occur,
which in turn causes a bug with multiple hits in the MetaArray.
As a fix, we should use information from MSHR to update
`s1_SRAM_waymasks` too.
Local MC-Linux test passed with seed=1244.
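
The fix described in this commit can be sketched as a pure function that folds an MSHR response into the waymask read from SRAM. This is a Python behavioural model, not the Chisel code: the `MshrResp` fields, the single-port layout, and the rule for clearing an overwritten way are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MshrResp:
    # Hypothetical view of a refill response from MSHR.
    valid: bool
    corrupt: bool
    set_idx: int
    ptag: int
    waymask: int  # one-hot mask of the way being refilled


def updated_waymask(sram_waymask: int, set_idx: int, ptag: int,
                    resp: MshrResp) -> int:
    """Combine the (possibly stale) SRAM waymask with an MSHR refill."""
    usable = resp.valid and not resp.corrupt
    if not usable or resp.set_idx != set_idx:
        return sram_waymask            # refill is unrelated to this request
    if resp.ptag == ptag:
        return resp.waymask            # the requested line was just refilled
    # Another line replaced this way, so a stale hit on it must be dropped.
    return sram_waymask & ~resp.waymask


refill = MshrResp(valid=True, corrupt=False, set_idx=3, ptag=0x42, waymask=0b0100)
stale_miss = 0b0000                    # SRAM was read before the refill landed
fixed = updated_waymask(stale_miss, set_idx=3, ptag=0x42, resp=refill)
```

Without the update, the stale `0b0000` would enter s2 as a miss and trigger a second refill of the same block; with it, `fixed` is `0b0100` and the request is treated as a hit.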
|