## Batch Size 128, Compile false

| Experiment | Warmup Latency (s) | Average Latency (s) | Throughput (samples/sec) | GPU Utilization (%) |
| ---------- | ------------------ | ------------------- | ------------------------ | ------------------- |
| original | 5.355 +/- 0.293 | 14.172 +/- 0.267 | 518.884 +/- 7.854 | 56.108 +/- 0.862 |
| h2d_d2h_threads | 3.810 +/- 0.319 | 14.146 +/- 1.145 | 551.909 +/- 34.079 | 52.057 +/- 4.121 |
| 2_predict_workers | 3.639 +/- 0.037 | 11.161 +/- 0.160 | 636.701 +/- 14.753 | 53.279 +/- 2.659 |
| 3_predict_workers | 4.930 +/- 0.060 | 10.532 +/- 0.801 | 677.736 +/- 25.115 | 53.806 +/- 1.324 |
| 4_predict_workers | 3.819 +/- 0.253 | 11.451 +/- 0.439 | 638.146 +/- 22.611 | 50.129 +/- 1.764 |
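To compare the experiments against the `original` baseline, you can express each mean throughput as a ratio of the baseline's. The sketch below does only that arithmetic on the mean values copied from the table. It ignores the +/- spreads, and the names `THROUGHPUT` and `relative_speedup` are hypothetical helpers introduced here for illustration, not part of the benchmark code.

```python
from __future__ import annotations

# Mean throughput (samples/sec) copied from the table; the +/- spreads are ignored.
THROUGHPUT = {
    "original": 518.884,
    "h2d_d2h_threads": 551.909,
    "2_predict_workers": 636.701,
    "3_predict_workers": 677.736,
    "4_predict_workers": 638.146,
}


def relative_speedup(results: dict[str, float], baseline: str = "original") -> dict[str, float]:
    """Hypothetical helper: divide each experiment's mean throughput by the baseline's."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


speedup = relative_speedup(THROUGHPUT)
best = max(speedup, key=speedup.get)  # experiment with the highest mean ratio

for name, ratio in speedup.items():
    # Show each ratio both as a multiplier and as a percentage change from the baseline.
    print(f"{name:<20} {ratio:.3f}x ({(ratio - 1) * 100:+.1f}%)")
print("highest mean throughput:", best)
```

On these means, `3_predict_workers` comes out about 1.31x the baseline. The 2, 3, and 4 worker runs differ from each other by amounts comparable to their reported +/- spreads, so read the ranking among those three cautiously.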