## Batch Size 256 Compile false

| Experiment | Warmup_latency (s) | Average_latency (s) | Throughput (samples/sec) | GPU Utilization (%) |
| ---------- | ------------------ | ------------------- | ------------------------ | ------------------- |
| original | 6.424 +/- 1.141 | 26.361 +/- 0.557 | 573.027 +/- 9.661 | 64.000 +/- 3.405 |
| h2d_d2h_threads | 4.600 +/- 0.724 | 21.314 +/- 0.403 | 704.344 +/- 9.843 | 71.963 +/- 1.558 |
| 2_predict_workers | 4.199 +/- 0.363 | 16.772 +/- 1.435 | 864.678 +/- 32.353 | 70.026 +/- 1.403 |
| 3_predict_workers | 4.496 +/- 0.755 | 15.983 +/- 0.455 | 912.386 +/- 18.299 | 68.283 +/- 2.226 |
| 4_predict_workers | 4.252 +/- 0.515 | 14.702 +/- 0.259 | 951.261 +/- 7.986 | 70.716 +/- 2.774 |
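
To compare the rows at a glance, the mean throughput of each experiment can be divided by that of the `original` run. The sketch below does only that arithmetic on the means reported in the table; the names `throughput` and `relative_speedup` are illustrative and not part of the benchmark code, and the +/- spread is ignored.

```python
# Mean throughput (samples/sec) copied from the table above.
# The +/- spread is deliberately left out of this rough comparison.
throughput = {
    "original": 573.027,
    "h2d_d2h_threads": 704.344,
    "2_predict_workers": 864.678,
    "3_predict_workers": 912.386,
    "4_predict_workers": 951.261,
}


def relative_speedup(results, baseline="original"):
    """Return each experiment's throughput divided by the baseline's."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


speedup = relative_speedup(throughput)
for name, ratio in speedup.items():
    print(f"{name:>18}: {ratio:.2f}x")
```

Because the ratio uses means only, small differences between neighbouring rows (for example 3 versus 4 predict workers) should be read alongside the reported spread.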