## Batch Size 32 Compile true

| Experiment | Warmup_latency (s) | Average_latency (s) | Throughput (samples/sec) | GPU Utilization (%) |
| ---------- | ------------------ | ------------------- | ------------------------ | ------------------- |
| original | 13.559 +/- 0.183 | 4.756 +/- 0.960 | 401.554 +/- 58.539 | 43.026 +/- 1.221 |
| h2d_d2h_threads | 12.471 +/- 0.819 | 5.596 +/- 1.180 | 340.906 +/- 69.513 | 32.313 +/- 8.138 |