39f2ec76 | 09-Aug-2022 |
William Wang <[email protected]> |
lq: add 1 extra stage for lq data write (#1705)
Now lq data is divided into 8 banks by default. A write to lq
data takes 2 cycles to finish.
Lq data will not be read for at least 2 cycles after a write, so it is ok
to add this delay. For example:
T0: update lq meta, lq data write req starts
T1: lq data write finishes, new wbidx selected
T2: read lq data according to the newly selected wbidx
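The staging described above can be illustrated with a minimal Chisel sketch; the module name, entry count, bank layout and port names below are assumptions for illustration, not the actual load queue data module.

```scala
import chisel3._
import chisel3.util._

// Minimal sketch, not the real XiangShan load queue data module: the data
// array is split into banks and the write request is registered for one
// extra cycle, so a write takes 2 cycles (T0 request, T1 array update).
class BankedLqDataSketch(numEntries: Int = 64, numBanks: Int = 8, dataBits: Int = 64) extends Module {
  require(numEntries % numBanks == 0)
  val io = IO(new Bundle {
    val wen   = Input(Bool())
    val waddr = Input(UInt(log2Ceil(numEntries).W))
    val wdata = Input(UInt(dataBits.W))
    val raddr = Input(UInt(log2Ceil(numEntries).W))
    val rdata = Output(UInt(dataBits.W))
  })

  val bankBits = log2Ceil(numBanks)
  val banks = Seq.fill(numBanks)(Reg(Vec(numEntries / numBanks, UInt(dataBits.W))))

  // T0: the write request is only latched here.
  val wenReg   = RegNext(io.wen, false.B)
  val waddrReg = RegNext(io.waddr)
  val wdataReg = RegNext(io.wdata)

  // T1: the selected bank is actually updated one cycle later.
  val wbank = waddrReg(bankBits - 1, 0)
  val wrow  = waddrReg(log2Ceil(numEntries) - 1, bankBits)
  for (i <- 0 until numBanks) {
    when(wenReg && wbank === i.U) {
      banks(i)(wrow) := wdataReg
    }
  }

  // Reads stay combinational; this is only safe because, as noted above,
  // an entry is never read within 2 cycles of its write.
  val rbank = io.raddr(bankBits - 1, 0)
  val rrow  = io.raddr(log2Ceil(numEntries) - 1, bankBits)
  io.rdata := Mux1H(UIntToOH(rbank, numBanks), banks.map(_(rrow)))
}
```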
|
0a992150 | 06-Aug-2022 |
William Wang <[email protected]> |
std: add an extra pipe stage for std (#1704) |
29b5bc3c | 03-Aug-2022 |
William Wang <[email protected]> |
sq: always update data/addrModule when st s1_valid (#1703) |
e5cb7504 | 30-Jul-2022 |
William Wang <[email protected]> |
lq: fix X introduced by violation check (#1695) |
c1af2986 | 27-Jul-2022 |
William Wang <[email protected]> |
lq: opt lq data wen (load_s2_valid) fanout (#1687) |
67cddb05 | 18-Nov-2022 |
William Wang <[email protected]> |
ldu: report ldld vio and fwd error in s3 (#1685)
It should fix the timing problem caused by the ldld violation check and the forward error check |
353424a7 | 26-Jul-2022 |
William Wang <[email protected]> |
lq: update data field iff load_s2 valid (#1680)
Now we update the data field (fwd data, uop) in the load queue when load_s2 is valid. It will help with the lq wen fanout problem.
State flags will be treated differently. They are still updated accurately according to loadIn.valid
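As a hedged illustration of the gating described above (the signal and register names are assumptions, not the real load queue code):

```scala
import chisel3._
import chisel3.util._

// Sketch only: data fields (fwd data and similar payload) are written under
// the early load_s2 valid to cut the wen fanout, while state flags keep using
// the precise loadIn.valid. Names and widths are assumptions for illustration.
class LqWriteGateSketch(numEntries: Int = 16, dataBits: Int = 64) extends Module {
  val io = IO(new Bundle {
    val load_s2_valid = Input(Bool())                     // early, low-fanout write enable
    val loadIn_valid  = Input(Bool())                     // precise enable, arrives later
    val widx          = Input(UInt(log2Ceil(numEntries).W))
    val fwdData       = Input(UInt(dataBits.W))
    val ridx          = Input(UInt(log2Ceil(numEntries).W))
    val rdata         = Valid(UInt(dataBits.W))
  })

  val dataField = Reg(Vec(numEntries, UInt(dataBits.W)))
  val datavalid = RegInit(VecInit(Seq.fill(numEntries)(false.B)))

  // Data field: written whenever load_s2 is valid, even if the load is later
  // squashed; a stale value is harmless because the state flag decides
  // whether the entry is ever used.
  when(io.load_s2_valid) {
    dataField(io.widx) := io.fwdData
  }

  // State flag: still updated accurately according to loadIn.valid.
  when(io.loadIn_valid) {
    datavalid(io.widx) := true.B
  }

  io.rdata.valid := datavalid(io.ridx)
  io.rdata.bits  := dataField(io.ridx)
}
```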
|
3d3419b9 | 26-Jul-2022 |
William Wang <[email protected]> |
sbuffer: add an extra cycle for sbuffer write
In the previous design, sbuffer valid entry select and sbuffer data write were in the same cycle, which caused huge fanout. An extra write stage is added to solve this problem.
Now sbuffer enq logic is divided into 3 stages:
sbuffer_in_s0:
* read data and meta from store queue
* store them in a 2-entry fifo queue
sbuffer_in_s1:
* read data and meta from the fifo queue
* update sbuffer meta (vtag, ptag, flag)
* prevent that line from being sent to dcache (add a block condition)
* prepare cacheline-level write enable signal, RegNext() data and mask
sbuffer_in_s2:
* use cacheline-level buffer to update sbuffer data and mask
* remove dcache write block (if there is one)
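A rough Chisel sketch of this 3-stage flow, under the assumption of a simplified request bundle and with the dcache write-block condition omitted; names and sizes are illustrative, not the real sbuffer interface.

```scala
import chisel3._
import chisel3.util._

// Illustrative request bundle from the store queue (an assumption).
class SbufferEnqReq(nLines: Int, lineBytes: Int) extends Bundle {
  val line = UInt(log2Ceil(nLines).W)
  val data = UInt((lineBytes * 8).W)
  val mask = UInt(lineBytes.W)
}

class SbufferEnqSketch(nLines: Int = 16, lineBytes: Int = 64) extends Module {
  val io = IO(new Bundle {
    val in      = Flipped(Decoupled(new SbufferEnqReq(nLines, lineBytes)))
    val maskOut = Output(Vec(nLines, UInt(lineBytes.W)))  // just to observe the result
  })

  // sbuffer_in_s0: requests are parked in a 2-entry fifo queue.
  val s1 = Queue(io.in, 2)
  s1.ready := true.B                                      // always drains in this sketch
  val s1_fire = s1.valid && s1.ready

  // sbuffer_in_s1: meta (vtag, ptag, flag) would be updated here; data, mask
  // and a cacheline-level write enable are RegNext-ed into a line buffer.
  val s2_wen  = RegNext(s1_fire, false.B)
  val s2_line = RegEnable(s1.bits.line, s1_fire)
  val s2_data = RegEnable(s1.bits.data, s1_fire)
  val s2_mask = RegEnable(s1.bits.mask, s1_fire)

  // sbuffer_in_s2: the buffered request finally updates sbuffer data and mask.
  val data = Reg(Vec(nLines, UInt((lineBytes * 8).W)))
  val mask = RegInit(VecInit(Seq.fill(nLines)(0.U(lineBytes.W))))
  when(s2_wen) {
    data(s2_line) := s2_data                              // the real code merges per byte
    mask(s2_line) := mask(s2_line) | s2_mask
  }
  io.maskOut := mask
}
```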
|
eb163ef0 | 17-Nov-2022 |
Haojin Tang <[email protected]> |
top-down: introduce top-down counters and scripts (#1803)
* top-down: add initial top-down features
* rob600: enlarge queue/buffer size
* :art: After git pull
* :sparkles: Add BranchResteers->CtrlBlock
* :sparkles: Cg BranchResteers after pending
* :sparkles: Add robflush_bubble & ldReplay_bubble
* :ambulance: Fix loadReplay->loadReplay.valid
* :art: Dlt printf
* :sparkles: Add stage2_redirect_cycles->CtrlBlock
* :sparkles: CtrlBlock: Add s2Redirect_when_pending
* :sparkles: ID:Add ifu2id_allNO_cycle
* :sparkles: Add ifu2ibuffer_validCnt
* :sparkles: Add ibuffer_IDWidth_hvButNotFull
* :sparkles: Fix ifu2ibuffer_validCnt
* :ambulance: Fix ibuffer_IDWidth_hvButNotFull
* :sparkles: Fix ifu2ibuffer_validCnt->stop
* feat(buggy): parameterize load/store pipeline, etc.
* fix: use LoadPipelineWidth rather than LoadQueueSize
* fix: parameterize `rdataPtrExtNext`
* fix(SBuffer): fix idx update logic
* fix(Sbuffer): use `&&` to generate flushMask instead of `||`
* fix(atomic): parameterize atomic logic in `MemBlock`
* fix(StoreQueue): update allow enqueue requirement
* chore: update comments, requirements and assertions
* chore: refactor some Mux to meet original logic
* feat: reduce `LsMaxRsDeq` to 2 and delete it
* feat: support one load/store pipeline
* feat: parameterize `EnsbufferWidth`
* chore: reshape codes for better generated name
* top-down: add initial top-down features
* rob600: enlarge queue/buffer size
* top-down: add l1, l2, l3 and ddr loads bound perf counters
* top-down: dig into l1d loads bound
* top-down: move memory related counters to `Scheduler`
* top-down: add 2 Ldus and 2 Stus
* top-down: v1.0
* huancun: bump HuanCun to a version with top-down
* chore: restore parameters and update `build.sc`
* top-down: use ExcitingUtils instead of BoringUtils
* top-down: add switch of top-down counters
* top-down: add top-down scripts
* difftest: enlarge stuck limit cycles again
Co-authored-by: gaozeyu <[email protected]>
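As a hedged illustration of what one of these cycle counters amounts to (the condition, width and names are assumptions; the real counters go through the perf-counter / ExcitingUtils plumbing mentioned above rather than a bare register):

```scala
import chisel3._
import chisel3.util._

// Illustrative only: a cycle counter in the spirit of ifu2id_allNO_cycle,
// counting cycles in which the IFU hands nothing over to decode. The port
// names and the decodeWidth value are assumptions.
class IfuToIdBubbleCounterSketch(decodeWidth: Int = 6) extends Module {
  val io = IO(new Bundle {
    val ifu2id_valid = Input(Vec(decodeWidth, Bool()))  // per-slot valid from IFU/Ibuffer
    val bubbleCycles = Output(UInt(64.W))
  })

  val allNo = !io.ifu2id_valid.asUInt.orR               // no slot carries an instruction
  val cnt   = RegInit(0.U(64.W))
  when(allNo) { cnt := cnt + 1.U }
  io.bubbleCycles := cnt
}
```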
|
e323d51e | 13-Oct-2022 |
happy-lx <[email protected]> |
lq: update data field iff load_s2 valid (#1795)
Now we update the data field (fwd data, uop) in the load queue when load_s2
is valid. It will help with the lq wen fanout problem.
State flags will be treated differently. They are still updated
accurately according to loadIn.valid
Co-authored-by: William Wang <[email protected]>
|
03efd994 | 30-Sep-2022 |
happy-lx <[email protected]> |
Sync timing modification of #1681 and #1793 (#1793)
* ldu: optimize dcache hitvec wiring
In the previous design, hitvec is generated in load s1 and then sent to dcache
and the lsu (rs) side separately. As dcache and lsu (rs side) are far apart in the real
chip, it caused a severe wiring problem.
Now we generate 2 hitvec in parallel:
* hitvec 1 is generated near dcache.
To generate that signal, paddr from dtlb is sent to dcache in load_s1
to generate hitvec. The hitvec is then sent to dcache to generate
data array read_way_en.
* hitvec 2 is generated near lsu and rs in load_s2: the tag read result
from dcache, as well as coh_state, is sent to lsu in load_s1,
then it is used to calculate hitvec in load_s2. hitvec 2 is used
to generate hit/miss signal used by lsu.
It should fix the wiring problem caused by hitvec
* ldu: opt loadViolationQuery.resp.ready timing
An extra release addr register is added near lsu to speed up the
generation of loadViolationQuery.resp.ready
* l1tlb: replace NormalPage data module and add duplicate resp result
data module:
add BankedSyncDataMoudleWithDup data module:
divide the data array into banks and read them asynchronously, bypassing write data.
RegNext the data result for each of the #banks, then choose from them.
duplicate:
duplicate the chosen data and return it to the outside (tlb).
tlb returns (ppn+perm) * #DUP to the outside (for load unit only)
TODO: let the load unit use different tlb resp results for different modules,
one for lsq, one for dcache.
* l1tlb: Fix wrong vidx_bypass logic after using duplicate data module
We use BankedSyncDataMoudleWithDup instead of SyncDataModuleTemplate,
whose write ports are not Vec.
Co-authored-by: William Wang <[email protected]>
Co-authored-by: ZhangZifei <[email protected]>
Co-authored-by: good-circle <[email protected]>
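A minimal sketch of the duplicated hit-vector idea, assuming a simplified tag/coh interface; way count, tag width and port names are illustrative, not the real DCache/LoadUnit code.

```scala
import chisel3._
import chisel3.util._

// Sketch of the duplicated hit-vector scheme described above.
class DupHitVecSketch(nWays: Int = 8, tagBits: Int = 20) extends Module {
  val io = IO(new Bundle {
    val s1_ptag      = Input(UInt(tagBits.W))              // paddr tag from dtlb in load_s1
    val s1_tagRead   = Input(Vec(nWays, UInt(tagBits.W)))  // dcache tag read result
    val s1_cohValid  = Input(Vec(nWays, Bool()))           // derived from coh_state
    val s1_readWayEn = Output(UInt(nWays.W))               // hitvec 1, consumed near dcache
    val s2_hit       = Output(Bool())                      // from hitvec 2, consumed near lsu/rs
  })

  // hitvec 1: generated right next to dcache in load_s1 and used only for the
  // data array read_way_en, so it never has to travel to the lsu side.
  val s1_hitVec = VecInit((io.s1_tagRead zip io.s1_cohValid).map { case (t, v) => v && t === io.s1_ptag })
  io.s1_readWayEn := s1_hitVec.asUInt

  // hitvec 2: the tag read result and coh-derived valids are shipped towards
  // lsu/rs in load_s1 and the comparison is redone there in load_s2.
  val s2_ptag     = RegNext(io.s1_ptag)
  val s2_tagRead  = RegNext(io.s1_tagRead)
  val s2_cohValid = RegNext(io.s1_cohValid)
  val s2_hitVec   = VecInit((s2_tagRead zip s2_cohValid).map { case (t, v) => v && t === s2_ptag })
  io.s2_hit := s2_hitVec.asUInt.orR
}
```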
|
9bb2ac0f | 17-Sep-2022 |
happy-lx <[email protected]> |
lq: fix load load violation check logic (#1764)
* lq: fix load to load check logic
* when a load instruction missed in dcache, was then refilled by dcache and is waiting to be written back, if the block is released by dcache, the entry also needs to be marked as released
* lq: refix load-load violation check logic
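A minimal sketch of the released-flag bookkeeping this fix concerns, assuming a simplified entry layout; names, widths and the allocation port are illustrative.

```scala
import chisel3._
import chisel3.util._

// Illustrative allocation request (an assumption, not the real lq enqueue).
class LqAllocReq(numEntries: Int, blockAddrBits: Int) extends Bundle {
  val idx  = UInt(log2Ceil(numEntries).W)
  val addr = UInt(blockAddrBits.W)
}

// Sketch only: each load queue entry carries a `released` flag used by the
// load-load violation check. The point of the fix above is that an entry
// which missed in dcache, was refilled and is still waiting for writeback
// must get this flag too when dcache releases its block.
class LqReleasedFlagSketch(numEntries: Int = 16, blockAddrBits: Int = 30) extends Module {
  val io = IO(new Bundle {
    val enq         = Flipped(Valid(new LqAllocReq(numEntries, blockAddrBits)))
    val release     = Flipped(Valid(UInt(blockAddrBits.W)))  // block released by dcache
    val releasedVec = Output(Vec(numEntries, Bool()))         // consumed by the ld-ld check
  })

  val allocated = RegInit(VecInit(Seq.fill(numEntries)(false.B)))
  val blockAddr = Reg(Vec(numEntries, UInt(blockAddrBits.W)))
  val released  = RegInit(VecInit(Seq.fill(numEntries)(false.B)))

  when(io.enq.valid) {
    allocated(io.enq.bits.idx) := true.B
    blockAddr(io.enq.bits.idx) := io.enq.bits.addr
    released(io.enq.bits.idx)  := false.B
  }

  for (i <- 0 until numEntries) {
    // Mark the entry as released on an address match. Crucially this applies
    // to every allocated entry, including one that was refilled but has not
    // been written back yet, which is what the fix is about.
    when(io.release.valid && allocated(i) && blockAddr(i) === io.release.bits) {
      released(i) := true.B
    }
  }
  io.releasedVec := released
}
```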
|
d46eedc2 | 24-Jul-2022 |
William Wang <[email protected]> |
lq: fix X caused by mem violation check (#1658)
Note that it is intended to prevent X propagation in simulation and may cause
timing problems. These checks can be removed safely for better timing |
867a84a8 | 07-Jul-2022 |
William Wang <[email protected]> |
chore: fix merge conflict |
b6d53cef | 05-Jul-2022 |
William Wang <[email protected]> |
mem,hpm: optimize memblock hpm timing |
51c35d40 | 01-Jul-2022 |
William Wang <[email protected]> |
sq: move dataInvalidSqIdx PriorityEncoder to load_s2 |
ee5099c9 | 01-Jul-2022 |
William Wang <[email protected]> |
lq: do not use refill mask to select wb entry
It will increase l1 dcache miss latency by 1 cycle |
6786cfb7 | 28-Jun-2022 |
William Wang <[email protected]> |
dcache: repipeline ecc check logic for timing (#1582)
This commit re-pipelines the ECC check logic in the data cache and the exception generation logic for better timing.
Now the ecc error is checked 1 cycle after reading the result from the data sram. An extra cycle is added for load
writeback to ROB.
Future work: move the pipeline to https://github.com/OpenXiangShan/XiangShan/blob/master/src/main/scala/xiangshan/backend/CtrlBlock.scala#L266-L277, which adds a RegNext.
* dcache: repipeline ecc check logic for timing
* chore: fix normal loadAccessFault logic
* wbu: delay load unit wb for 1 cycle
* dcache: add 1 extra cycle for beu error report
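A schematic sketch of moving the ECC check one cycle after the data sram read; the stand-in check function, widths and ports are assumptions, not the real dcache code.

```scala
import chisel3._
import chisel3.util._

// Sketch only: instead of checking ECC in the same cycle as the data sram
// read, the raw read result is registered and the check (here a stand-in
// parity reduction) happens one cycle later, which in turn delays the load
// writeback / error report by one cycle.
class EccRepipeSketch(dataBits: Int = 64, eccBits: Int = 8) extends Module {
  val io = IO(new Bundle {
    val sramRead = Flipped(Valid(UInt((dataBits + eccBits).W)))  // data + check bits from sram
    val dataOut  = Valid(UInt(dataBits.W))                       // now 1 cycle later
    val eccError = Output(Bool())                                // reported with dataOut
  })

  // Stage 1: only register the raw read result.
  val s1_valid = RegNext(io.sramRead.valid, false.B)
  val s1_raw   = RegEnable(io.sramRead.bits, io.sramRead.valid)

  // Stage 2: do the (placeholder) ECC check on the registered value.
  def eccCheck(raw: UInt): Bool = raw.xorR                       // stand-in, not a real ECC decode
  io.dataOut.valid := s1_valid
  io.dataOut.bits  := s1_raw(dataBits - 1, 0)
  io.eccError      := s1_valid && eccCheck(s1_raw)
}
```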
|
46f74b57 | 06-May-2022 |
Haojin Tang <[email protected]> |
feat: parameterize load store (#1527)
* feat: parameterize load/store pipeline, etc.
* fix: use LoadPipelineWidth rather than LoadQueueSize
* fix: parameterize `rdataPtrExtNext`
* SBuffer: fix idx update logic
* atomic: parameterize atomic logic in `MemBlock`
* StoreQueue: update allow enqueue requirement
* feat: support one load/store pipeline
* feat: parameterize `EnsbufferWidth`
* chore: reshape codes for better generated name
|
09203307 | 02-Apr-2022 |
William Wang <[email protected]> |
mem: reduce refill to use latency (#1401)
* mem: optimize missq reject to lq timing
DCache replay request is quite slow to generate, as it needs to compare the
load address with the addresses in all valid miss queue entries.
Now we delay the usage of the replay request from the data cache.
Now the replay request will not influence the normal execution flow until
load_s3 (1 cycle after load_s2, load result writeback to RS).
It is worth mentioning that "select refilling inst for load
writeback" will be disabled if dcacheRequireReplay in the
last cycle.
* dcache: compare probe block addr instead of full addr
* mem: do not replay from RS when ldld vio or fwd failed
An ld-ld violation or forward failure will cause a normal load inst to be replayed
from fetch. If the TLB hits and an ld-ld violation / forward failure happens,
we write back that inst immediately. Meanwhile, such insts will not be
replayed from rs.
It should fix "mem: optimize missq reject to lq timing"
* mem: fix replay from rs condition
* mem: reduce refill to use latency
This commit updates the lq entry flag carefully in load_s3 to avoid extra
refill delay. It will remove the extra refill delay introduced by #1375
without harming memblock timing.
In #1375, we delayed the load refill when a dcache miss queue entry fails
to accept a miss. #1375 trades performance for better timing.
* mem: fix rs feedback priority
When dataInvalid && mshrFull, a succeed refill should not cancel
rs replay.
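A hedged sketch of the "consume the dcache replay signal one cycle later" idea; the port names (including the exact shape of dcacheRequireReplay and the RS feedback) are assumptions.

```scala
import chisel3._
import chisel3.util._

// Sketch only: the slow-to-generate dcache replay request is not used in
// load_s2; it is registered and only influences the RS feedback / writeback
// decision in load_s3.
class ReplayDelaySketch extends Module {
  val io = IO(new Bundle {
    val s2_valid               = Input(Bool())
    val s2_dcacheRequireReplay = Input(Bool())   // late signal from the miss queue
    val s2_otherHit            = Input(Bool())   // everything else says the load can writeback
    val s3_feedbackHit         = Output(Bool())  // to RS: no replay needed
    val s3_selectForRefillWb   = Output(Bool())  // "select refilling inst for load writeback"
  })

  val s3_valid    = RegNext(io.s2_valid, false.B)
  val s3_otherHit = RegNext(io.s2_otherHit, false.B)
  // The late signal is consumed only here, one cycle after load_s2.
  val s3_dcacheRequireReplay = RegNext(io.s2_dcacheRequireReplay, false.B)

  io.s3_feedbackHit := s3_valid && s3_otherHit && !s3_dcacheRequireReplay
  // Refill selection for writeback is disabled if dcacheRequireReplay was set
  // in the previous cycle, as the commit message describes.
  io.s3_selectForRefillWb := s3_valid && !s3_dcacheRequireReplay
}
```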
|
9658ce50 | 25-Mar-2022 |
LinJiawei <[email protected]> |
Bump chisel to 3.5.0 |
e41db104 | 27-Mar-2022 |
happy-lx <[email protected]> |
sq: fix use of OHToUInt (#1505) |
ef3b5b96 | 13-Feb-2022 |
William Wang <[email protected]> |
mem: fix ldld vio check implementation (#1456)
* mem: fix ldld vio mask gen logic
* mem: fix lq released flag update logic
Make sure that every load before a probe has correct released flag
See the PR of this commit for illustration
* mem: fix ld-ld violation check logic
* ci: clean up workspace before do real test
* mem: reduce lq released flag update delay by 1 cycle
* chore: bump difftest to run no-smp diff
* ci: add mc test
* mem: fix lq released flag update logic
* chore: set difftest firstCommit_limit to 10000
* ci: use dual-nemu-so for mc test
|
bbd4b852 | 06-Jan-2022 |
William Wang <[email protected]> |
trigger: add addr trigger for atom insts |
bde9b502 | 07-Jan-2022 |
Yinan Xu <[email protected]> |
difftest: delay commit and regfile for two cycles (#1417)
CSRs are updated later after instructions commit from ROB. Thus, we
need to delay difftest commit for several cycles. |
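The two-cycle delay amounts to something like the following sketch; the commit bundle here is a stand-in, not the real difftest interface.

```scala
import chisel3._
import chisel3.util._

// Sketch only: commit info (and likewise the architectural regfile snapshot)
// is passed through two register stages before it reaches difftest, so that
// CSR updates that happen after ROB commit are visible when the comparison
// is done. Widths and port names are assumptions.
class DelayedCommitSketch(commitWidth: Int = 6, xlen: Int = 64) extends Module {
  val io = IO(new Bundle {
    val commitValid = Input(Vec(commitWidth, Bool()))
    val commitPc    = Input(Vec(commitWidth, UInt(xlen.W)))
    val outValid    = Output(Vec(commitWidth, Bool()))
    val outPc       = Output(Vec(commitWidth, UInt(xlen.W)))
  })

  // Two RegNext stages give the two-cycle delay mentioned in the commit.
  io.outValid := RegNext(RegNext(io.commitValid))
  io.outPc    := RegNext(RegNext(io.commitPc))
}
```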