inference/results/output_256_false.md

## Batch Size 256 Compile false

| Experiment | Warmup latency (s) | Average latency (s) | Throughput (samples/sec) | GPU utilization (%) |
| ---------- | ------------------ | ------------------- | ------------------------ | ------------------- |
| original | 6.424 +/- 1.141 | 26.361 +/- 0.557 | 573.027 +/- 9.661 | 64.000 +/- 3.405 |
| h2d_d2h_threads | 4.600 +/- 0.724 | 21.314 +/- 0.403 | 704.344 +/- 9.843 | 71.963 +/- 1.558 |
| 2_predict_workers | 4.199 +/- 0.363 | 16.772 +/- 1.435 | 864.678 +/- 32.353 | 70.026 +/- 1.403 |
| 3_predict_workers | 4.496 +/- 0.755 | 15.983 +/- 0.455 | 912.386 +/- 18.299 | 68.283 +/- 2.226 |
| 4_predict_workers | 4.252 +/- 0.515 | 14.702 +/- 0.259 | 951.261 +/- 7.986 | 70.716 +/- 2.774 |
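
To make the rows easier to compare, the sketch below expresses each experiment relative to the `original` row. It is only an illustration: the means are copied by hand from the table above, the `+/-` spreads are ignored, and the names `results` and `relative_to_baseline` are hypothetical helpers, not part of the benchmark code.

```python
# Mean values copied from the table above; the +/- spreads are omitted.
# The dictionary layout and key names are assumptions made for this sketch.
results = {
    "original":          {"avg_latency_s": 26.361, "throughput": 573.027},
    "h2d_d2h_threads":   {"avg_latency_s": 21.314, "throughput": 704.344},
    "2_predict_workers": {"avg_latency_s": 16.772, "throughput": 864.678},
    "3_predict_workers": {"avg_latency_s": 15.983, "throughput": 912.386},
    "4_predict_workers": {"avg_latency_s": 14.702, "throughput": 951.261},
}


def relative_to_baseline(results, baseline="original"):
    """Return throughput speedup and latency reduction versus the baseline row."""
    base = results[baseline]
    summary = {}
    for name, row in results.items():
        summary[name] = {
            # Ratio of mean throughputs (1.0 for the baseline itself).
            "throughput_speedup": row["throughput"] / base["throughput"],
            # Percentage drop in mean average latency (0.0 for the baseline).
            "latency_reduction_pct": 100.0
            * (1.0 - row["avg_latency_s"] / base["avg_latency_s"]),
        }
    return summary


summary = relative_to_baseline(results)
for name, stats in summary.items():
    print(
        f"{name:20s} "
        f"{stats['throughput_speedup']:.2f}x throughput, "
        f"{stats['latency_reduction_pct']:.1f}% lower average latency"
    )
```

On these means, `4_predict_workers` comes out at roughly 1.66x the throughput of `original`. Because the spreads are dropped, small differences between neighboring rows (for example 3 versus 4 workers) should be read with the reported `+/-` values in mind.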