Abstract
Modern GPUs have long been capable of processing queries at high throughput. However, until recently, GPUs faced slow data transfers from CPU main memory and thus did not reach high processing rates for large, out-of-core data. To cope, database management systems (DBMSs) restrict their data access path to bulk data transfers orchestrated by the CPU, i.e., table scans. When queries expose selectivity, a full table scan wastes bandwidth, leaving performance on the table. With the arrival of fast interconnects, this design choice must be reconsidered. GPUs can now directly access data at up to 7× higher bandwidth, loading bytes on demand. We investigate four classic and recent index structures (binary search, B+tree, Harmonia, and RadixSpline), which we access via a fast interconnect. We show that indexing data can reduce the transfer volume. However, when embedded into an index-nested loop join, all indexes fail to outperform a hash join in the most interesting case: a highly selective query on large data (over 100 GiB). Therefore, we propose windowed partitioning, an index-lookup optimization that generalizes to any index. As a result, index-nested loop joins run up to 3–10× faster than a hash join. Overall, we show that out-of-core indexes are a feasible design choice to exploit selectivity when using a fast interconnect.