Radix sort function in R











up vote
1
down vote

favorite












I know that sort in R uses radix sort, and I played around in implementing it (mainly to be sure I understood the algorithm properly). However, my implementation is an order of magnitude slower than the native implementation in R.



I'm curious: is there a way to implement radix sort in R so that it wouldn't be so slow?



My code:



get_digit <- function(x, d) {
# digits from the right
# i.e.: first digit is the ones, second is the tens, etc.
(x %% 10^d) %/% (10^(d-1))
}

radix_sort <- function(x) {
# k is maximum number of digits
k <- max(nchar(x))
for(i in 1:k) {
x_digit_i <- get_digit(x, i)
# split numbers based on their i digit
x_by_bucket <- split(x, x_digit_i)
# recombine the vectors, now sorted to the i'th digit
x <- unlist(x_by_bucket, use.names = FALSE)
}
# since each iteration is a stable sort, the final result
# is a sorted array, yay!
x
}


Checking the running time:



> library(microbenchmark)
> x <- sample(100)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max neval cld
radix_sort(x) 459.378 465.895 485.58322 480.4835 496.779 649.956 100 b
sort(x) 27.314 29.487 33.05064 31.9710 33.212 73.563 100 a
> x <- sample(10000)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max
radix_sort(x) 44317.123 44777.8965 46062.3446 45204.3715 45714.807 63838.148
sort(x) 158.609 165.7485 198.3083 186.6995 206.099 750.832









share|improve this question




















  • 4




    Welcome, currently your title makes your question sound off-topic and so is likely the cause of your downvote. This is as asking us to write algorithms for you is off-topic. I'd recommend that you reword it like the body of your question.
    – Peilonrayz
    Nov 7 at 14:38












  • I don't know what kind of answer you expect. I already told you why I believe it impossible to come close to R's implementation which is in compiled code. If you really want something fast, you can't implement it in pure R. That's why we have Rcpp.
    – Roland
    Nov 7 at 15:31






  • 1




    I've told you before and I'll say it again. Profile your code! Hadley covers how to do that as well in his book. I bet you'll see that most of the time is spent in the split calls, probably followed by unlist. If you want to speed this up, you need to avoid these.
    – Roland
    Nov 8 at 7:45






  • 1




    Then here is another idea to avoid split: start with z <- outer(x_digit_i, 0:9, "=="). Then you you can do x <- x[row(z)[z]]. Or slightly faster, x <- x[1L + (which(z) - 1L) %% length(x)]
    – flodel
    Nov 9 at 0:24








  • 1




    And another small improvement: %%, %/%, and == are faster with integers, so use (x %% as.integer(10^d)) %/% as.integer(10^(d-1)) inside get_digit. It might also make the code more robust to floating point error issues when using == like I suggested.
    – flodel
    Nov 9 at 1:42















up vote
1
down vote

favorite












I know that sort in R uses radix sort, and I played around in implementing it (mainly to be sure I understood the algorithm properly). However, my implementation is an order of magnitude slower than the native implementation in R.



I'm curious: is there a way to implement radix sort in R so that it wouldn't be so slow?



My code:



get_digit <- function(x, d) {
# digits from the right
# i.e.: first digit is the ones, second is the tens, etc.
(x %% 10^d) %/% (10^(d-1))
}

radix_sort <- function(x) {
# k is maximum number of digits
k <- max(nchar(x))
for(i in 1:k) {
x_digit_i <- get_digit(x, i)
# split numbers based on their i digit
x_by_bucket <- split(x, x_digit_i)
# recombine the vectors, now sorted to the i'th digit
x <- unlist(x_by_bucket, use.names = FALSE)
}
# since each iteration is a stable sort, the final result
# is a sorted array, yay!
x
}


Checking the running time:



> library(microbenchmark)
> x <- sample(100)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max neval cld
radix_sort(x) 459.378 465.895 485.58322 480.4835 496.779 649.956 100 b
sort(x) 27.314 29.487 33.05064 31.9710 33.212 73.563 100 a
> x <- sample(10000)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max
radix_sort(x) 44317.123 44777.8965 46062.3446 45204.3715 45714.807 63838.148
sort(x) 158.609 165.7485 198.3083 186.6995 206.099 750.832









share|improve this question




















  • 4




    Welcome, currently your title makes your question sound off-topic and so is likely the cause of your downvote. This is as asking us to write algorithms for you is off-topic. I'd recommend that you reword it like the body of your question.
    – Peilonrayz
    Nov 7 at 14:38












  • I don't know what kind of answer you expect. I already told you why I believe it impossible to come close to R's implementation which is in compiled code. If you really want something fast, you can't implement it in pure R. That's why we have Rcpp.
    – Roland
    Nov 7 at 15:31






  • 1




    I've told you before and I'll say it again. Profile your code! Hadley covers how to do that as well in his book. I bet you'll see that most of the time is spent in the split calls, probably followed by unlist. If you want to speed this up, you need to avoid these.
    – Roland
    Nov 8 at 7:45






  • 1




    Then here is another idea to avoid split: start with z <- outer(x_digit_i, 0:9, "=="). Then you you can do x <- x[row(z)[z]]. Or slightly faster, x <- x[1L + (which(z) - 1L) %% length(x)]
    – flodel
    Nov 9 at 0:24








  • 1




    And another small improvement: %%, %/%, and == are faster with integers, so use (x %% as.integer(10^d)) %/% as.integer(10^(d-1)) inside get_digit. It might also make the code more robust to floating point error issues when using == like I suggested.
    – flodel
    Nov 9 at 1:42













up vote
1
down vote

favorite









up vote
1
down vote

favorite











I know that sort in R uses radix sort, and I played around in implementing it (mainly to be sure I understood the algorithm properly). However, my implementation is an order of magnitude slower than the native implementation in R.



I'm curious: is there a way to implement radix sort in R so that it wouldn't be so slow?



My code:



get_digit <- function(x, d) {
# digits from the right
# i.e.: first digit is the ones, second is the tens, etc.
(x %% 10^d) %/% (10^(d-1))
}

radix_sort <- function(x) {
# k is maximum number of digits
k <- max(nchar(x))
for(i in 1:k) {
x_digit_i <- get_digit(x, i)
# split numbers based on their i digit
x_by_bucket <- split(x, x_digit_i)
# recombine the vectors, now sorted to the i'th digit
x <- unlist(x_by_bucket, use.names = FALSE)
}
# since each iteration is a stable sort, the final result
# is a sorted array, yay!
x
}


Checking the running time:



> library(microbenchmark)
> x <- sample(100)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max neval cld
radix_sort(x) 459.378 465.895 485.58322 480.4835 496.779 649.956 100 b
sort(x) 27.314 29.487 33.05064 31.9710 33.212 73.563 100 a
> x <- sample(10000)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max
radix_sort(x) 44317.123 44777.8965 46062.3446 45204.3715 45714.807 63838.148
sort(x) 158.609 165.7485 198.3083 186.6995 206.099 750.832









share|improve this question















I know that sort in R uses radix sort, and I played around in implementing it (mainly to be sure I understood the algorithm properly). However, my implementation is an order of magnitude slower than the native implementation in R.



I'm curious: is there a way to implement radix sort in R so that it wouldn't be so slow?



My code:



get_digit <- function(x, d) {
# digits from the right
# i.e.: first digit is the ones, second is the tens, etc.
(x %% 10^d) %/% (10^(d-1))
}

radix_sort <- function(x) {
# k is maximum number of digits
k <- max(nchar(x))
for(i in 1:k) {
x_digit_i <- get_digit(x, i)
# split numbers based on their i digit
x_by_bucket <- split(x, x_digit_i)
# recombine the vectors, now sorted to the i'th digit
x <- unlist(x_by_bucket, use.names = FALSE)
}
# since each iteration is a stable sort, the final result
# is a sorted array, yay!
x
}


Checking the running time:



> library(microbenchmark)
> x <- sample(100)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max neval cld
radix_sort(x) 459.378 465.895 485.58322 480.4835 496.779 649.956 100 b
sort(x) 27.314 29.487 33.05064 31.9710 33.212 73.563 100 a
> x <- sample(10000)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max
radix_sort(x) 44317.123 44777.8965 46062.3446 45204.3715 45714.807 63838.148
sort(x) 158.609 165.7485 198.3083 186.6995 206.099 750.832






performance r radix-sort






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 7 at 14:58









200_success

127k15148410




127k15148410










asked Nov 7 at 14:19









Tal Galili

1114




1114








  • 4




    Welcome, currently your title makes your question sound off-topic and so is likely the cause of your downvote. This is as asking us to write algorithms for you is off-topic. I'd recommend that you reword it like the body of your question.
    – Peilonrayz
    Nov 7 at 14:38












  • I don't know what kind of answer you expect. I already told you why I believe it impossible to come close to R's implementation which is in compiled code. If you really want something fast, you can't implement it in pure R. That's why we have Rcpp.
    – Roland
    Nov 7 at 15:31






  • 1




    I've told you before and I'll say it again. Profile your code! Hadley covers how to do that as well in his book. I bet you'll see that most of the time is spent in the split calls, probably followed by unlist. If you want to speed this up, you need to avoid these.
    – Roland
    Nov 8 at 7:45






  • 1




    Then here is another idea to avoid split: start with z <- outer(x_digit_i, 0:9, "=="). Then you you can do x <- x[row(z)[z]]. Or slightly faster, x <- x[1L + (which(z) - 1L) %% length(x)]
    – flodel
    Nov 9 at 0:24








  • 1




    And another small improvement: %%, %/%, and == are faster with integers, so use (x %% as.integer(10^d)) %/% as.integer(10^(d-1)) inside get_digit. It might also make the code more robust to floating point error issues when using == like I suggested.
    – flodel
    Nov 9 at 1:42














  • 4




    Welcome, currently your title makes your question sound off-topic and so is likely the cause of your downvote. This is as asking us to write algorithms for you is off-topic. I'd recommend that you reword it like the body of your question.
    – Peilonrayz
    Nov 7 at 14:38












  • I don't know what kind of answer you expect. I already told you why I believe it impossible to come close to R's implementation which is in compiled code. If you really want something fast, you can't implement it in pure R. That's why we have Rcpp.
    – Roland
    Nov 7 at 15:31






  • 1




    I've told you before and I'll say it again. Profile your code! Hadley covers how to do that as well in his book. I bet you'll see that most of the time is spent in the split calls, probably followed by unlist. If you want to speed this up, you need to avoid these.
    – Roland
    Nov 8 at 7:45






  • 1




    Then here is another idea to avoid split: start with z <- outer(x_digit_i, 0:9, "=="). Then you you can do x <- x[row(z)[z]]. Or slightly faster, x <- x[1L + (which(z) - 1L) %% length(x)]
    – flodel
    Nov 9 at 0:24








  • 1




    And another small improvement: %%, %/%, and == are faster with integers, so use (x %% as.integer(10^d)) %/% as.integer(10^(d-1)) inside get_digit. It might also make the code more robust to floating point error issues when using == like I suggested.
    – flodel
    Nov 9 at 1:42








4




4




Welcome, currently your title makes your question sound off-topic and so is likely the cause of your downvote. This is as asking us to write algorithms for you is off-topic. I'd recommend that you reword it like the body of your question.
– Peilonrayz
Nov 7 at 14:38






Welcome, currently your title makes your question sound off-topic and so is likely the cause of your downvote. This is as asking us to write algorithms for you is off-topic. I'd recommend that you reword it like the body of your question.
– Peilonrayz
Nov 7 at 14:38














I don't know what kind of answer you expect. I already told you why I believe it impossible to come close to R's implementation which is in compiled code. If you really want something fast, you can't implement it in pure R. That's why we have Rcpp.
– Roland
Nov 7 at 15:31




I don't know what kind of answer you expect. I already told you why I believe it impossible to come close to R's implementation which is in compiled code. If you really want something fast, you can't implement it in pure R. That's why we have Rcpp.
– Roland
Nov 7 at 15:31




1




1




I've told you before and I'll say it again. Profile your code! Hadley covers how to do that as well in his book. I bet you'll see that most of the time is spent in the split calls, probably followed by unlist. If you want to speed this up, you need to avoid these.
– Roland
Nov 8 at 7:45




I've told you before and I'll say it again. Profile your code! Hadley covers how to do that as well in his book. I bet you'll see that most of the time is spent in the split calls, probably followed by unlist. If you want to speed this up, you need to avoid these.
– Roland
Nov 8 at 7:45




1




1




Then here is another idea to avoid split: start with z <- outer(x_digit_i, 0:9, "=="). Then you you can do x <- x[row(z)[z]]. Or slightly faster, x <- x[1L + (which(z) - 1L) %% length(x)]
– flodel
Nov 9 at 0:24






Then here is another idea to avoid split: start with z <- outer(x_digit_i, 0:9, "=="). Then you you can do x <- x[row(z)[z]]. Or slightly faster, x <- x[1L + (which(z) - 1L) %% length(x)]
– flodel
Nov 9 at 0:24






1




1




And another small improvement: %%, %/%, and == are faster with integers, so use (x %% as.integer(10^d)) %/% as.integer(10^(d-1)) inside get_digit. It might also make the code more robust to floating point error issues when using == like I suggested.
– flodel
Nov 9 at 1:42




And another small improvement: %%, %/%, and == are faster with integers, so use (x %% as.integer(10^d)) %/% as.integer(10^(d-1)) inside get_digit. It might also make the code more robust to floating point error issues when using == like I suggested.
– flodel
Nov 9 at 1:42










1 Answer
1






active

oldest

votes

















up vote
1
down vote



accepted










Putting these comments into an answer...



As pointed out, it will be a tall order to beat or even approach the computation times of the native sort function, as it is compiled from C. Often with R, the trick to make your code faster consists in composing with some of these fast compiled building blocks that are native functions. In particular, we would look to substitute for loops like the one you have with vectorized functions. Unfortunately, your use of a for loop here is particular in the sense that each iteration has a side-effect to the x vector. This means that the order of the operations is important and we cannot run the loop iterations independently in parallel or via vectorized functions.



If we cannot get rid of the for loop, we are left trying to optimize the code within the body of the loop. Let's start with a profile of your code:



x <- sample(10000)
Rprof(tmp <- tempfile())
for (i in 1:10) z <- radix_sort(x)
Rprof()
summaryRprof(tmp)$by.total
# total.time total.pct self.time self.pct
# "radix_sort" 8.26 99.76 0.72 8.70
# "split" 7.34 88.65 0.06 0.72
# "split.default" 7.28 87.92 0.54 6.52
# "as.factor" 6.74 81.40 0.08 0.97
# "factor" 6.64 80.19 1.72 20.77
# "as.character" 4.34 52.42 4.34 52.42
# "unique" 0.42 5.07 0.04 0.48
# "unique.default" 0.38 4.59 0.38 4.59
# "%%" 0.14 1.69 0.14 1.69
# "get_digit" 0.14 1.69 0.00 0.00
# "sort.list" 0.12 1.45 0.02 0.24
# "order" 0.08 0.97 0.06 0.72
# "unlist" 0.06 0.72 0.06 0.72
# [...]


We see that the main culprit here is the use of split. Another surprise here is the use of order (it seems to be applied when figuring out the levels of x_digit_i) which you could consider like cheating given your objective.



So what alternative do we have to your use of split/unsplit? Essentially, you have a vector x that you want to reorder based on a vector of digits x_digit_i. One way is to use outer to create a matrix of TRUE/FALSE where each column locates a different digit (for a total of ten columns):



z <- outer(x_digit_i, 0:9, "==")


Then, you want to turn this matrix into a vector of indices, such that the first few indices will locate the zeroes, then the ones, etc (the equivalent of idx <- order(x_digit_i). You can do so by doing:



idx <- row(z)[z]


or (harder to understand but a bit faster)



idx <- 1L + (which(z) - 1L) %% length(x)  


Finally, you just have to do:



x <- x[idx]


Also note that since you are dealing with integers, your code might be a little faster (and more robust by avoiding floating point errors) if you make sure to use integers everywhere possible. In particular, your get_digit function could be rewritten as follows:



get_digit <- function(x, d) (x %% as.integer(10^d)) %/% as.integer(10^(d-1))


The benchmarks below show decent progress (where my suggestions are implemented under the name sort_radix2) bridging the gap with the native sort. I hope it helps!



x <- sample(100)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 964.692 972.3675 1025.35180 984.3775 1012.178 2233.397 100
# radix_sort2(x) 250.642 256.5720 282.58952 261.2910 282.449 1266.061 100
# sort(x) 82.270 86.1605 92.22669 88.0230 90.943 223.249 100

x <- sample(10000)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 71939.706 76147.1715 80028.7541 78389.8140 81512.4140 144632.484 100
# radix_sort2(x) 24218.810 27613.3190 34841.8724 29477.7115 31772.9415 143283.337 100
# sort(x) 411.691 454.4015 563.4825 492.6165 558.0925 3412.719 100





share|improve this answer























  • Thanks Flodel, that was instructive :)
    – Tal Galili
    2 days ago











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














 

draft saved


draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207144%2fradix-sort-function-in-r%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
1
down vote



accepted










Putting these comments into an answer...



As pointed out, it will be a tall order to beat or even approach the computation times of the native sort function, as it is compiled from C. Often with R, the trick to make your code faster consists in composing with some of these fast compiled building blocks that are native functions. In particular, we would look to substitute for loops like the one you have with vectorized functions. Unfortunately, your use of a for loop here is particular in the sense that each iteration has a side-effect to the x vector. This means that the order of the operations is important and we cannot run the loop iterations independently in parallel or via vectorized functions.



If we cannot get rid of the for loop, we are left trying to optimize the code within the body of the loop. Let's start with a profile of your code:



x <- sample(10000)
Rprof(tmp <- tempfile())
for (i in 1:10) z <- radix_sort(x)
Rprof()
summaryRprof(tmp)$by.total
# total.time total.pct self.time self.pct
# "radix_sort" 8.26 99.76 0.72 8.70
# "split" 7.34 88.65 0.06 0.72
# "split.default" 7.28 87.92 0.54 6.52
# "as.factor" 6.74 81.40 0.08 0.97
# "factor" 6.64 80.19 1.72 20.77
# "as.character" 4.34 52.42 4.34 52.42
# "unique" 0.42 5.07 0.04 0.48
# "unique.default" 0.38 4.59 0.38 4.59
# "%%" 0.14 1.69 0.14 1.69
# "get_digit" 0.14 1.69 0.00 0.00
# "sort.list" 0.12 1.45 0.02 0.24
# "order" 0.08 0.97 0.06 0.72
# "unlist" 0.06 0.72 0.06 0.72
# [...]


We see that the main culprit here is the use of split. Another surprise here is the use of order (it seems to be applied when figuring out the levels of x_digit_i) which you could consider like cheating given your objective.



So what alternative do we have to your use of split/unsplit? Essentially, you have a vector x that you want to reorder based on a vector of digits x_digit_i. One way is to use outer to create a matrix of TRUE/FALSE where each column locates a different digit (for a total of ten columns):



z <- outer(x_digit_i, 0:9, "==")


Then, you want to turn this matrix into a vector of indices, such that the first few indices will locate the zeroes, then the ones, etc (the equivalent of idx <- order(x_digit_i). You can do so by doing:



idx <- row(z)[z]


or (harder to understand but a bit faster)



idx <- 1L + (which(z) - 1L) %% length(x)  


Finally, you just have to do:



x <- x[idx]


Also note that since you are dealing with integers, your code might be a little faster (and more robust by avoiding floating point errors) if you make sure to use integers everywhere possible. In particular, your get_digit function could be rewritten as follows:



get_digit <- function(x, d) (x %% as.integer(10^d)) %/% as.integer(10^(d-1))


The benchmarks below show decent progress (where my suggestions are implemented under the name sort_radix2) bridging the gap with the native sort. I hope it helps!



x <- sample(100)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 964.692 972.3675 1025.35180 984.3775 1012.178 2233.397 100
# radix_sort2(x) 250.642 256.5720 282.58952 261.2910 282.449 1266.061 100
# sort(x) 82.270 86.1605 92.22669 88.0230 90.943 223.249 100

x <- sample(10000)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 71939.706 76147.1715 80028.7541 78389.8140 81512.4140 144632.484 100
# radix_sort2(x) 24218.810 27613.3190 34841.8724 29477.7115 31772.9415 143283.337 100
# sort(x) 411.691 454.4015 563.4825 492.6165 558.0925 3412.719 100





share|improve this answer























  • Thanks Flodel, that was instructive :)
    – Tal Galili
    2 days ago















up vote
1
down vote



accepted










Putting these comments into an answer...



As pointed out, it will be a tall order to beat or even approach the computation times of the native sort function, as it is compiled from C. Often with R, the trick to make your code faster consists in composing with some of these fast compiled building blocks that are native functions. In particular, we would look to substitute for loops like the one you have with vectorized functions. Unfortunately, your use of a for loop here is particular in the sense that each iteration has a side-effect to the x vector. This means that the order of the operations is important and we cannot run the loop iterations independently in parallel or via vectorized functions.



If we cannot get rid of the for loop, we are left trying to optimize the code within the body of the loop. Let's start with a profile of your code:



x <- sample(10000)
Rprof(tmp <- tempfile())
for (i in 1:10) z <- radix_sort(x)
Rprof()
summaryRprof(tmp)$by.total
# total.time total.pct self.time self.pct
# "radix_sort" 8.26 99.76 0.72 8.70
# "split" 7.34 88.65 0.06 0.72
# "split.default" 7.28 87.92 0.54 6.52
# "as.factor" 6.74 81.40 0.08 0.97
# "factor" 6.64 80.19 1.72 20.77
# "as.character" 4.34 52.42 4.34 52.42
# "unique" 0.42 5.07 0.04 0.48
# "unique.default" 0.38 4.59 0.38 4.59
# "%%" 0.14 1.69 0.14 1.69
# "get_digit" 0.14 1.69 0.00 0.00
# "sort.list" 0.12 1.45 0.02 0.24
# "order" 0.08 0.97 0.06 0.72
# "unlist" 0.06 0.72 0.06 0.72
# [...]


We see that the main culprit here is the use of split. Another surprise here is the use of order (it seems to be applied when figuring out the levels of x_digit_i) which you could consider like cheating given your objective.



So what alternative do we have to your use of split/unsplit? Essentially, you have a vector x that you want to reorder based on a vector of digits x_digit_i. One way is to use outer to create a matrix of TRUE/FALSE where each column locates a different digit (for a total of ten columns):



z <- outer(x_digit_i, 0:9, "==")


Then, you want to turn this matrix into a vector of indices, such that the first few indices will locate the zeroes, then the ones, etc (the equivalent of idx <- order(x_digit_i). You can do so by doing:



idx <- row(z)[z]


or (harder to understand but a bit faster)



idx <- 1L + (which(z) - 1L) %% length(x)  


Finally, you just have to do:



x <- x[idx]


Also note that since you are dealing with integers, your code might be a little faster (and more robust by avoiding floating point errors) if you make sure to use integers everywhere possible. In particular, your get_digit function could be rewritten as follows:



get_digit <- function(x, d) (x %% as.integer(10^d)) %/% as.integer(10^(d-1))


The benchmarks below show decent progress (where my suggestions are implemented under the name sort_radix2) bridging the gap with the native sort. I hope it helps!



x <- sample(100)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 964.692 972.3675 1025.35180 984.3775 1012.178 2233.397 100
# radix_sort2(x) 250.642 256.5720 282.58952 261.2910 282.449 1266.061 100
# sort(x) 82.270 86.1605 92.22669 88.0230 90.943 223.249 100

x <- sample(10000)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 71939.706 76147.1715 80028.7541 78389.8140 81512.4140 144632.484 100
# radix_sort2(x) 24218.810 27613.3190 34841.8724 29477.7115 31772.9415 143283.337 100
# sort(x) 411.691 454.4015 563.4825 492.6165 558.0925 3412.719 100





share|improve this answer























  • Thanks Flodel, that was instructive :)
    – Tal Galili
    2 days ago













up vote
1
down vote



accepted







up vote
1
down vote



accepted






Putting these comments into an answer...



As pointed out, it will be a tall order to beat or even approach the computation times of the native sort function, as it is compiled from C. Often with R, the trick to make your code faster consists in composing with some of these fast compiled building blocks that are native functions. In particular, we would look to substitute for loops like the one you have with vectorized functions. Unfortunately, your use of a for loop here is particular in the sense that each iteration has a side-effect to the x vector. This means that the order of the operations is important and we cannot run the loop iterations independently in parallel or via vectorized functions.



If we cannot get rid of the for loop, we are left trying to optimize the code within the body of the loop. Let's start with a profile of your code:



x <- sample(10000)
Rprof(tmp <- tempfile())
for (i in 1:10) z <- radix_sort(x)
Rprof()
summaryRprof(tmp)$by.total
# total.time total.pct self.time self.pct
# "radix_sort" 8.26 99.76 0.72 8.70
# "split" 7.34 88.65 0.06 0.72
# "split.default" 7.28 87.92 0.54 6.52
# "as.factor" 6.74 81.40 0.08 0.97
# "factor" 6.64 80.19 1.72 20.77
# "as.character" 4.34 52.42 4.34 52.42
# "unique" 0.42 5.07 0.04 0.48
# "unique.default" 0.38 4.59 0.38 4.59
# "%%" 0.14 1.69 0.14 1.69
# "get_digit" 0.14 1.69 0.00 0.00
# "sort.list" 0.12 1.45 0.02 0.24
# "order" 0.08 0.97 0.06 0.72
# "unlist" 0.06 0.72 0.06 0.72
# [...]


We see that the main culprit here is the use of split. Another surprise here is the use of order (it seems to be applied when figuring out the levels of x_digit_i) which you could consider like cheating given your objective.



So what alternative do we have to your use of split/unsplit? Essentially, you have a vector x that you want to reorder based on a vector of digits x_digit_i. One way is to use outer to create a matrix of TRUE/FALSE where each column locates a different digit (for a total of ten columns):



z <- outer(x_digit_i, 0:9, "==")


Then, you want to turn this matrix into a vector of indices, such that the first few indices will locate the zeroes, then the ones, etc (the equivalent of idx <- order(x_digit_i). You can do so by doing:



idx <- row(z)[z]


or (harder to understand but a bit faster)



idx <- 1L + (which(z) - 1L) %% length(x)  


Finally, you just have to do:



x <- x[idx]


Also note that since you are dealing with integers, your code might be a little faster (and more robust by avoiding floating point errors) if you make sure to use integers everywhere possible. In particular, your get_digit function could be rewritten as follows:



get_digit <- function(x, d) (x %% as.integer(10^d)) %/% as.integer(10^(d-1))


The benchmarks below show decent progress (where my suggestions are implemented under the name sort_radix2) bridging the gap with the native sort. I hope it helps!



x <- sample(100)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 964.692 972.3675 1025.35180 984.3775 1012.178 2233.397 100
# radix_sort2(x) 250.642 256.5720 282.58952 261.2910 282.449 1266.061 100
# sort(x) 82.270 86.1605 92.22669 88.0230 90.943 223.249 100

x <- sample(10000)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 71939.706 76147.1715 80028.7541 78389.8140 81512.4140 144632.484 100
# radix_sort2(x) 24218.810 27613.3190 34841.8724 29477.7115 31772.9415 143283.337 100
# sort(x) 411.691 454.4015 563.4825 492.6165 558.0925 3412.719 100





share|improve this answer














Putting these comments into an answer...



As pointed out, it will be a tall order to beat or even approach the computation times of the native sort function, as it is compiled from C. Often with R, the trick to make your code faster consists in composing with some of these fast compiled building blocks that are native functions. In particular, we would look to substitute for loops like the one you have with vectorized functions. Unfortunately, your use of a for loop here is particular in the sense that each iteration has a side-effect to the x vector. This means that the order of the operations is important and we cannot run the loop iterations independently in parallel or via vectorized functions.



If we cannot get rid of the for loop, we are left trying to optimize the code within the body of the loop. Let's start with a profile of your code:



x <- sample(10000)
Rprof(tmp <- tempfile())
for (i in 1:10) z <- radix_sort(x)
Rprof()
summaryRprof(tmp)$by.total
# total.time total.pct self.time self.pct
# "radix_sort" 8.26 99.76 0.72 8.70
# "split" 7.34 88.65 0.06 0.72
# "split.default" 7.28 87.92 0.54 6.52
# "as.factor" 6.74 81.40 0.08 0.97
# "factor" 6.64 80.19 1.72 20.77
# "as.character" 4.34 52.42 4.34 52.42
# "unique" 0.42 5.07 0.04 0.48
# "unique.default" 0.38 4.59 0.38 4.59
# "%%" 0.14 1.69 0.14 1.69
# "get_digit" 0.14 1.69 0.00 0.00
# "sort.list" 0.12 1.45 0.02 0.24
# "order" 0.08 0.97 0.06 0.72
# "unlist" 0.06 0.72 0.06 0.72
# [...]


We see that the main culprit here is the use of split. Another surprise here is the use of order (it seems to be applied when figuring out the levels of x_digit_i) which you could consider like cheating given your objective.



So what alternative do we have to your use of split/unsplit? Essentially, you have a vector x that you want to reorder based on a vector of digits x_digit_i. One way is to use outer to create a matrix of TRUE/FALSE where each column locates a different digit (for a total of ten columns):



z <- outer(x_digit_i, 0:9, "==")


Then, you want to turn this matrix into a vector of indices, such that the first few indices will locate the zeroes, then the ones, etc (the equivalent of idx <- order(x_digit_i). You can do so by doing:



idx <- row(z)[z]


or (harder to understand but a bit faster)



idx <- 1L + (which(z) - 1L) %% length(x)  


Finally, you just have to do:



x <- x[idx]


Also note that since you are dealing with integers, your code might be a little faster (and more robust by avoiding floating point errors) if you make sure to use integers everywhere possible. In particular, your get_digit function could be rewritten as follows:



get_digit <- function(x, d) (x %% as.integer(10^d)) %/% as.integer(10^(d-1))


The benchmarks below show decent progress (where my suggestions are implemented under the name sort_radix2) bridging the gap with the native sort. I hope it helps!



x <- sample(100)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 964.692 972.3675 1025.35180 984.3775 1012.178 2233.397 100
# radix_sort2(x) 250.642 256.5720 282.58952 261.2910 282.449 1266.061 100
# sort(x) 82.270 86.1605 92.22669 88.0230 90.943 223.249 100

x <- sample(10000)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 71939.706 76147.1715 80028.7541 78389.8140 81512.4140 144632.484 100
# radix_sort2(x) 24218.810 27613.3190 34841.8724 29477.7115 31772.9415 143283.337 100
# sort(x) 411.691 454.4015 563.4825 492.6165 558.0925 3412.719 100






share|improve this answer














share|improve this answer



share|improve this answer








edited 2 days ago

























answered Nov 18 at 2:02









flodel

3,2021915




3,2021915












  • Thanks Flodel, that was instructive :)
    – Tal Galili
    2 days ago


















  • Thanks Flodel, that was instructive :)
    – Tal Galili
    2 days ago
















Thanks Flodel, that was instructive :)
– Tal Galili
2 days ago




Thanks Flodel, that was instructive :)
– Tal Galili
2 days ago


















 

draft saved


draft discarded



















































 


draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207144%2fradix-sort-function-in-r%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Quarter-circle Tiles

build a pushdown automaton that recognizes the reverse language of a given pushdown automaton?

Mont Emei