6b6d88e6 | 22-Dec-2021 |
William Wang <[email protected]> |
mem: optimize missq reject to lq timing (#1375)
* mem: optimize missq reject to lq timing
DCache replay request is quite slow to generate, as it need to compare
load address with address in all
mem: optimize missq reject to lq timing (#1375)
* mem: optimize missq reject to lq timing
DCache replay request is quite slow to generate, as it need to compare
load address with address in all valid miss queue entries.
Now we delay the usage of replay request from data cache.
Now replay request will not influence normal execution flow until
load_s3 (1 cycle after load_s2, load result writeback to RS).
Note1: It is worth mentioning that "select refilling inst for load
writeback" will be disabled if dcacheRequireReplay in the
last cycle.
Note2: ld-ld violation or forward failure will let an normal load inst replay
from fetch. If TLB hit and ld-ld violation / forward failure happens,
we write back that inst immediately. Meanwhile, such insts will not be
replayed from rs.
* dcache: compare probe block addr instead of full addr
show more ...
|
10551d4e | 21-Dec-2021 |
Yinan Xu <[email protected]> |
lsq: add LsqEnqCtrl to optimize enqueue timing (#1380)
This commit adds an LsqEnqCtrl module to add one more clock cycle
between dispatch and load/store queue.
LsqEnqCtrl maintains the lqEnqPtr/
lsq: add LsqEnqCtrl to optimize enqueue timing (#1380)
This commit adds an LsqEnqCtrl module to add one more clock cycle
between dispatch and load/store queue.
LsqEnqCtrl maintains the lqEnqPtr/sqEnqPtr and lqCounter/sqCounter.
They are used to determine whether load/store queue can accept new
instructions. After that, instructions are sent to load/store queue.
This module decouples queue allocation and real enqueue.
Besides, uop storage in load/store queue are optimized. In dispatch,
only robIdx is required. Other information is naturally conveyed in
the pipeline and can be stored later in load/store queue if needed.
For example, exception vector, trigger, ftqIdx, pdest, etc are
unnecessary before the instruction leaves the load/store pipeline.
show more ...
|
a4e57ea3 | 20-Dec-2021 |
Li Qianruo <[email protected]> |
Merge branch 'master' into trigger |
08596256 | 13-Dec-2021 |
William Wang <[email protected]> |
trigger: fix lq hitvec raddr |
cdd255d8 | 10-Dec-2021 |
Li Qianruo <[email protected]> |
Merge branch 'master' into trigger |
1ca0e4f3 | 10-Dec-2021 |
Yinan Xu <[email protected]> |
core: refactor hardware performance counters (#1335)
This commit optimizes the coding style and timing for hardware
performance counters.
By default, performance counters are RegNext(RegNext(_)). |
6ab6918f | 09-Dec-2021 |
Yinan Xu <[email protected]> |
core: refactor writeback parameters (#1327)
This commit adds WritebackSink and WritebackSource parameters for
multiple modules. These traits hide implementation details from
other modules by defin
core: refactor writeback parameters (#1327)
This commit adds WritebackSink and WritebackSource parameters for
multiple modules. These traits hide implementation details from
other modules by defining IO-related functions in modules.
By using WritebackSink, ROB is able to choose the writeback sources.
Now fflags and exceptions are connected from exe units to reduce write
ports and optimize timing.
Further optimizations on write-back to RS and better coding style to
be added later.
show more ...
|
a4047ed0 | 05-Dec-2021 |
William Wang <[email protected]> |
trigger: fix lq trigger hit vec source |
f4d8d00e | 02-Dec-2021 |
William Wang <[email protected]> |
Optimize memblock timing (#1288)
* mem: delay uncache op start for 1 cycle
* dcache: decouple miss and replay signal
Now resp.miss will not depend on s2_nack_no_mshr
* lq,mem: give released
Optimize memblock timing (#1288)
* mem: delay uncache op start for 1 cycle
* dcache: decouple miss and replay signal
Now resp.miss will not depend on s2_nack_no_mshr
* lq,mem: give released flag update 1 more cycle
* chore: fix a name typo
* dcache: delay probe req for 1 cycle
show more ...
|
b978565c | 01-Dec-2021 |
William Wang <[email protected]> |
trigger: optimize memblock trigger timing
* For timing reasons, accurate load data trigger will not be used. Now load data trigger will report a hit on the following load * Only compare vaddr in loa
trigger: optimize memblock trigger timing
* For timing reasons, accurate load data trigger will not be used. Now load data trigger will report a hit on the following load * Only compare vaddr in load_s2, compare result will be stored in lq
show more ...
|
8a33de1f | 01-Dec-2021 |
Yinan Xu <[email protected]> |
rob,lsq: delay one more cycle for commits (#1286) |
4f83157c | 24-Nov-2021 |
William Wang <[email protected]> |
sq: check addrValid in vpmaskNotEqual to avoid X (#1258) |
5668a921 | 16-Nov-2021 |
Jiawei Lin <[email protected]> |
Fix multi-core dedup bug (#1235)
* FDivSqrt: use hierarchy API to avoid dedup bug
* Dedup: use hartId from io port instead of core parameters
* Bump fudian |
96b1e495 | 15-Nov-2021 |
William Wang <[email protected]> |
Optmize memblock timing (#1218)
DCache timing problem has not been solved yet. DCache structure will be further changed.
* sbuffer: add extra perf counters
* sbuffer: optmize timeout replay ch
Optmize memblock timing (#1218)
DCache timing problem has not been solved yet. DCache structure will be further changed.
* sbuffer: add extra perf counters
* sbuffer: optmize timeout replay check timing
* sbuffer: optmize do_uarch_drain check timing
Now we only compare merge entry's vtag, check will not start until
mergeIdx is generated by PriorityEncoder
* mem, lq: optmize writeback select logic timing
* dcache: replace missqueue reill req arbiter
* dcache: refactor missqueue entry select logic
* mem: add comments for lsq data
* dcache: give amo alu an extra cycle
* sbuffer: optmize sbuffer forward data read timing
show more ...
|
72951335 | 15-Nov-2021 |
Li Qianruo <[email protected]> |
Trigger Implementation for Debug Mode (#1170)
* Untested Trigger Implementation
Co-authored-by: William Wang <[email protected]>
Co-authored-by: Lingrui98 <[email protected]>
Co-autho
Trigger Implementation for Debug Mode (#1170)
* Untested Trigger Implementation
Co-authored-by: William Wang <[email protected]>
Co-authored-by: Lingrui98 <[email protected]>
Co-authored-by: rvcoresjw <[email protected]>
show more ...
|
1545277a | 11-Nov-2021 |
Yinan Xu <[email protected]> |
top: enable fpga option for simulation emu (#1213)
* disable log as default
* code clean up |
300ded30 | 04-Nov-2021 |
William Wang <[email protected]> |
Optimize dcache timing (#1195)
* dcache: do not check readline rmask
This should opt bank_conflict check timing
* dcache: block replace if store s1 valid
It takes quite long to generate way
Optimize dcache timing (#1195)
* dcache: do not check readline rmask
This should opt bank_conflict check timing
* dcache: block replace if store s1 valid
It takes quite long to generate way_en in mainpipe s1. As a result,
use s1 way_en to judge if replace should be blocked will cause severe
timing problem
Now we simply block replace if mainpipe.s1.valid
Refill timing to be optmized later
* sbuffer: delay sbuffer enqueue for 1 cycle
With store queue growing larger, read data from datamodule nearly
costs a whole cycle. Hence we delay sbuffer enqueue for 1 cycle
for better timing.
* dcache: reduce probe queue size
* dcache: replace probe pipe req RRArbiter with Arbiter
* dcache: reduce writeback queue size for timing opt
* dcache: delay wbqueue enqueue req for 1 cycle
Addr enqueue req will compare its addr with addrs in all writeback
entries to check if it should be blocked. Delay enqueue req will
give that process more time.
* dcache: set default replacer to setplru
It does not change current design
* dcache: fix wbqueue req_delayed deadlock
We delayed writeback queue enq for 1 cycle, missQ req does not
depend on wbQ enqueue. As a result, missQ req may be blocked
in req_delayed. When grant comes, that req should also be updated
* dcache: remove outdated require
* dcache: replace missReqArb RRArbiter with Arbiter
* perf: add detailed histogram for low dcache latency
* dcache: fix wbqueue entry alloc logic
* dcache: opt probe req timing
In current design, resv_set is maintained in dcache. All probe req
will be blocked if that addr is in resv_set.
However, checking if that addr is in resv_set costs almost half a cycle,
which causes severe timing problem.
Now when we update update_resv_set, all probe reqs will be blocked
in the next cycle. It should give Probe reservation set addr compare an
independent cycle, which will lead to better timing
show more ...
|
e9092fe2 | 01-Nov-2021 |
Lemover <[email protected]> |
tlb: timing optimizatin in hit check, fault check, atomic unit and store unit (#1189)
* tlb: timing optimization, fault doesn't care hit now
* mem.atomic: 'paddr write to reg' dont care hit
* mem
tlb: timing optimizatin in hit check, fault check, atomic unit and store unit (#1189)
* tlb: timing optimization, fault doesn't care hit now
* mem.atomic: 'paddr write to reg' dont care hit
* mem.atomic: regnext exception and check them next cycle
* tlb.hit: dont care set-bits when hit check
* storequeue: divide tlb.miss with paddr write for opt timing
* mem.atomic: fix bug that wrong usage addrAligned
show more ...
|
beabc72d | 29-Oct-2021 |
William Wang <[email protected]> |
mem: fix ld-ld violation check, enable it by default (#1184) |
ca2f90a6 | 25-Oct-2021 |
Lemover <[email protected]> |
pma: add pmp-like pma, software can read and write (#1169)
remove the old hard-wired pma and turn to pmp-like csr registers. the pma config is writen in pma register.
1. pma are m-priv csr, so only
pma: add pmp-like pma, software can read and write (#1169)
remove the old hard-wired pma and turn to pmp-like csr registers. the pma config is writen in pma register.
1. pma are m-priv csr, so only m-mode csrrw can change pma
2. even in m-mode, pma should be always checked, no matter lock or not
3. so carefully write pma, make sure not to "suicide"
* pma: add pmp-like pma, just module/bundle added, not to circuit
use reserved 2 bits as atomic and cached
* pma: add pmp-like pma into pmp module
pma have two more attribute than pmp
1. atmoic;
2. c/cache, if false, go to mmio.
pma uses 16+4 machine-level custom ready write csr.
pma will always be checked even in m-mode.
* pma: remove the old MemMap in tlb, mmio arrives next cycle
* pma: ptw raise af when mmio
* pma: fix bug of match's zip with last entry
* pma: fix bug of pass reset signal through method's parameter
strange bug, want to reset, pass reset signal to a method, does not
work.
import chisel3.Module.reset, the method can access reset it's self.
* pma: move some method to trait and fix bug of pma_init value
* pma: fix bug of pma init value assign way
* tlb: fix stupid bug that pf.ld not & fault_valid
* loadunit: fix bug that uop is flushed, pmp's dcache kill failed also
* ifu: mmio access needs f2_valid now
* loadunit: if mmio and have sent fastUop, flush pipe when commit
* storeunit: stu->lsq at stage1 and re-in lsq at stage2 to update mmio
show more ...
|
7057cff8 | 24-Oct-2021 |
Yinan Xu <[email protected]> |
lsq: enqueue at dispatch2 stage (#1167)
This commit changes when instructions enter load/store queue.
Now, at dispatch2, load/store instructions enter load/store queue. |
cd365d4c | 23-Oct-2021 |
rvcoresjw <[email protected]> |
add performance counters at core and hauncun (#1156)
* Add perf counters
* add reg from hpm counter source
* add print perfcounter enable |
71b114f8 | 22-Oct-2021 |
William Wang <[email protected]> |
mem: remove outdated uncache state assertion (#1159)
Now uncache store may commit together with cached store. For example:
0: sd to uncache_addr
4: sd to cache_addr
8: sd to cache_addr
May com
mem: remove outdated uncache state assertion (#1159)
Now uncache store may commit together with cached store. For example:
0: sd to uncache_addr
4: sd to cache_addr
8: sd to cache_addr
May commit in the same cycle.
It should eliminate wrong assertion in xalancbmk.
show more ...
|
67682d05 | 22-Oct-2021 |
William Wang <[email protected]> |
Add ld-ld violation check (#1140)
* mem: support ld-ld violation check
* mem: do not fast wakeup if ld vio check failed
* mem: disable ld-ld vio check after core reset |
ca18a0b4 | 20-Oct-2021 |
William Wang <[email protected]> |
mem: add Zicbom and Zicboz support (#1145)
Now we merge them for timing opt, unit test to be added later |