TY - JOUR
T1 - Efficient bundle sorting
AU - Matias, Yossi
AU - Segal, Eran
AU - Vitter, Jeffrey Scott
PY - 2006
Y1 - 2006
N2 - Many data sets to be sorted consist of a limited number of distinct keys. Sorting such data sets can be thought of as bundling together identical keys and having the bundles placed in order; we therefore denote this as bundle sorting. We describe an efficient algorithm for bundle sorting in external memory, which requires at most c(N/B) log_{M/B} k disk accesses, where N is the number of keys, M is the size of internal memory, k is the number of distinct keys, B is the transfer block size, and 2 < c < 4. For moderately sized k, this bound circumvents the Θ((N/B) log_{M/B}(N/B)) I/O lower bound known for general sorting. We show that our algorithm is optimal by proving a matching lower bound for bundle sorting. The improved running time of bundle sorting over general sorting can be significant in practice, as demonstrated by experimentation. An important feature of the new algorithm is that it is executed "in-place," requiring no additional disk space.
AB - Many data sets to be sorted consist of a limited number of distinct keys. Sorting such data sets can be thought of as bundling together identical keys and having the bundles placed in order; we therefore denote this as bundle sorting. We describe an efficient algorithm for bundle sorting in external memory, which requires at most c(N/B) log_{M/B} k disk accesses, where N is the number of keys, M is the size of internal memory, k is the number of distinct keys, B is the transfer block size, and 2 < c < 4. For moderately sized k, this bound circumvents the Θ((N/B) log_{M/B}(N/B)) I/O lower bound known for general sorting. We show that our algorithm is optimal by proving a matching lower bound for bundle sorting. The improved running time of bundle sorting over general sorting can be significant in practice, as demonstrated by experimentation. An important feature of the new algorithm is that it is executed "in-place," requiring no additional disk space.
KW - Algorithms
KW - Bundle sorting
KW - External memory
KW - Sorting
UR - http://www.scopus.com/inward/record.url?scp=34247242213&partnerID=8YFLogxK
U2 - 10.1137/S0097539704446554
DO - 10.1137/S0097539704446554
M3 - Article
AN - SCOPUS:34247242213
SN - 0097-5397
VL - 36
SP - 394
EP - 410
JO - SIAM Journal on Computing
JF - SIAM Journal on Computing
IS - 2
ER -