Radix sort function in R
up vote
1
down vote
favorite
I know that sort
in R uses radix sort, and I played around in implementing it (mainly to be sure I understood the algorithm properly). However, my implementation is an order of magnitude slower than the native implementation in R.
I'm curious: is there a way to implement radix sort in R so that it wouldn't be so slow?
My code:
get_digit <- function(x, d) {
# digits from the right
# i.e.: first digit is the ones, second is the tens, etc.
(x %% 10^d) %/% (10^(d-1))
}
radix_sort <- function(x) {
# k is maximum number of digits
k <- max(nchar(x))
for(i in 1:k) {
x_digit_i <- get_digit(x, i)
# split numbers based on their i digit
x_by_bucket <- split(x, x_digit_i)
# recombine the vectors, now sorted to the i'th digit
x <- unlist(x_by_bucket, use.names = FALSE)
}
# since each iteration is a stable sort, the final result
# is a sorted array, yay!
x
}
Checking the running time:
> library(microbenchmark)
> x <- sample(100)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max neval cld
radix_sort(x) 459.378 465.895 485.58322 480.4835 496.779 649.956 100 b
sort(x) 27.314 29.487 33.05064 31.9710 33.212 73.563 100 a
> x <- sample(10000)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max
radix_sort(x) 44317.123 44777.8965 46062.3446 45204.3715 45714.807 63838.148
sort(x) 158.609 165.7485 198.3083 186.6995 206.099 750.832
performance r radix-sort
|
show 5 more comments
up vote
1
down vote
favorite
I know that sort
in R uses radix sort, and I played around in implementing it (mainly to be sure I understood the algorithm properly). However, my implementation is an order of magnitude slower than the native implementation in R.
I'm curious: is there a way to implement radix sort in R so that it wouldn't be so slow?
My code:
get_digit <- function(x, d) {
# digits from the right
# i.e.: first digit is the ones, second is the tens, etc.
(x %% 10^d) %/% (10^(d-1))
}
radix_sort <- function(x) {
# k is maximum number of digits
k <- max(nchar(x))
for(i in 1:k) {
x_digit_i <- get_digit(x, i)
# split numbers based on their i digit
x_by_bucket <- split(x, x_digit_i)
# recombine the vectors, now sorted to the i'th digit
x <- unlist(x_by_bucket, use.names = FALSE)
}
# since each iteration is a stable sort, the final result
# is a sorted array, yay!
x
}
Checking the running time:
> library(microbenchmark)
> x <- sample(100)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max neval cld
radix_sort(x) 459.378 465.895 485.58322 480.4835 496.779 649.956 100 b
sort(x) 27.314 29.487 33.05064 31.9710 33.212 73.563 100 a
> x <- sample(10000)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max
radix_sort(x) 44317.123 44777.8965 46062.3446 45204.3715 45714.807 63838.148
sort(x) 158.609 165.7485 198.3083 186.6995 206.099 750.832
performance r radix-sort
4
Welcome, currently your title makes your question sound off-topic and so is likely the cause of your downvote. This is as asking us to write algorithms for you is off-topic. I'd recommend that you reword it like the body of your question.
– Peilonrayz
Nov 7 at 14:38
I don't know what kind of answer you expect. I already told you why I believe it impossible to come close to R's implementation which is in compiled code. If you really want something fast, you can't implement it in pure R. That's why we have Rcpp.
– Roland
Nov 7 at 15:31
1
I've told you before and I'll say it again. Profile your code! Hadley covers how to do that as well in his book. I bet you'll see that most of the time is spent in thesplit
calls, probably followed byunlist
. If you want to speed this up, you need to avoid these.
– Roland
Nov 8 at 7:45
1
Then here is another idea to avoidsplit
: start withz <- outer(x_digit_i, 0:9, "==")
. Then you you can dox <- x[row(z)[z]]
. Or slightly faster,x <- x[1L + (which(z) - 1L) %% length(x)]
– flodel
Nov 9 at 0:24
1
And another small improvement:%%
,%/%
, and==
are faster with integers, so use(x %% as.integer(10^d)) %/% as.integer(10^(d-1))
insideget_digit
. It might also make the code more robust to floating point error issues when using==
like I suggested.
– flodel
Nov 9 at 1:42
|
show 5 more comments
up vote
1
down vote
favorite
up vote
1
down vote
favorite
I know that sort
in R uses radix sort, and I played around in implementing it (mainly to be sure I understood the algorithm properly). However, my implementation is an order of magnitude slower than the native implementation in R.
I'm curious: is there a way to implement radix sort in R so that it wouldn't be so slow?
My code:
get_digit <- function(x, d) {
# digits from the right
# i.e.: first digit is the ones, second is the tens, etc.
(x %% 10^d) %/% (10^(d-1))
}
radix_sort <- function(x) {
# k is maximum number of digits
k <- max(nchar(x))
for(i in 1:k) {
x_digit_i <- get_digit(x, i)
# split numbers based on their i digit
x_by_bucket <- split(x, x_digit_i)
# recombine the vectors, now sorted to the i'th digit
x <- unlist(x_by_bucket, use.names = FALSE)
}
# since each iteration is a stable sort, the final result
# is a sorted array, yay!
x
}
Checking the running time:
> library(microbenchmark)
> x <- sample(100)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max neval cld
radix_sort(x) 459.378 465.895 485.58322 480.4835 496.779 649.956 100 b
sort(x) 27.314 29.487 33.05064 31.9710 33.212 73.563 100 a
> x <- sample(10000)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max
radix_sort(x) 44317.123 44777.8965 46062.3446 45204.3715 45714.807 63838.148
sort(x) 158.609 165.7485 198.3083 186.6995 206.099 750.832
performance r radix-sort
I know that sort
in R uses radix sort, and I played around in implementing it (mainly to be sure I understood the algorithm properly). However, my implementation is an order of magnitude slower than the native implementation in R.
I'm curious: is there a way to implement radix sort in R so that it wouldn't be so slow?
My code:
get_digit <- function(x, d) {
# digits from the right
# i.e.: first digit is the ones, second is the tens, etc.
(x %% 10^d) %/% (10^(d-1))
}
radix_sort <- function(x) {
# k is maximum number of digits
k <- max(nchar(x))
for(i in 1:k) {
x_digit_i <- get_digit(x, i)
# split numbers based on their i digit
x_by_bucket <- split(x, x_digit_i)
# recombine the vectors, now sorted to the i'th digit
x <- unlist(x_by_bucket, use.names = FALSE)
}
# since each iteration is a stable sort, the final result
# is a sorted array, yay!
x
}
Checking the running time:
> library(microbenchmark)
> x <- sample(100)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max neval cld
radix_sort(x) 459.378 465.895 485.58322 480.4835 496.779 649.956 100 b
sort(x) 27.314 29.487 33.05064 31.9710 33.212 73.563 100 a
> x <- sample(10000)
> microbenchmark(radix_sort(x), sort(x))
Unit: microseconds
expr min lq mean median uq max
radix_sort(x) 44317.123 44777.8965 46062.3446 45204.3715 45714.807 63838.148
sort(x) 158.609 165.7485 198.3083 186.6995 206.099 750.832
performance r radix-sort
performance r radix-sort
edited Nov 7 at 14:58
200_success
127k15148410
127k15148410
asked Nov 7 at 14:19
Tal Galili
1114
1114
4
Welcome, currently your title makes your question sound off-topic and so is likely the cause of your downvote. This is as asking us to write algorithms for you is off-topic. I'd recommend that you reword it like the body of your question.
– Peilonrayz
Nov 7 at 14:38
I don't know what kind of answer you expect. I already told you why I believe it impossible to come close to R's implementation which is in compiled code. If you really want something fast, you can't implement it in pure R. That's why we have Rcpp.
– Roland
Nov 7 at 15:31
1
I've told you before and I'll say it again. Profile your code! Hadley covers how to do that as well in his book. I bet you'll see that most of the time is spent in thesplit
calls, probably followed byunlist
. If you want to speed this up, you need to avoid these.
– Roland
Nov 8 at 7:45
1
Then here is another idea to avoidsplit
: start withz <- outer(x_digit_i, 0:9, "==")
. Then you you can dox <- x[row(z)[z]]
. Or slightly faster,x <- x[1L + (which(z) - 1L) %% length(x)]
– flodel
Nov 9 at 0:24
1
And another small improvement:%%
,%/%
, and==
are faster with integers, so use(x %% as.integer(10^d)) %/% as.integer(10^(d-1))
insideget_digit
. It might also make the code more robust to floating point error issues when using==
like I suggested.
– flodel
Nov 9 at 1:42
|
show 5 more comments
4
Welcome, currently your title makes your question sound off-topic and so is likely the cause of your downvote. This is as asking us to write algorithms for you is off-topic. I'd recommend that you reword it like the body of your question.
– Peilonrayz
Nov 7 at 14:38
I don't know what kind of answer you expect. I already told you why I believe it impossible to come close to R's implementation which is in compiled code. If you really want something fast, you can't implement it in pure R. That's why we have Rcpp.
– Roland
Nov 7 at 15:31
1
I've told you before and I'll say it again. Profile your code! Hadley covers how to do that as well in his book. I bet you'll see that most of the time is spent in thesplit
calls, probably followed byunlist
. If you want to speed this up, you need to avoid these.
– Roland
Nov 8 at 7:45
1
Then here is another idea to avoidsplit
: start withz <- outer(x_digit_i, 0:9, "==")
. Then you you can dox <- x[row(z)[z]]
. Or slightly faster,x <- x[1L + (which(z) - 1L) %% length(x)]
– flodel
Nov 9 at 0:24
1
And another small improvement:%%
,%/%
, and==
are faster with integers, so use(x %% as.integer(10^d)) %/% as.integer(10^(d-1))
insideget_digit
. It might also make the code more robust to floating point error issues when using==
like I suggested.
– flodel
Nov 9 at 1:42
4
4
Welcome, currently your title makes your question sound off-topic and so is likely the cause of your downvote. This is as asking us to write algorithms for you is off-topic. I'd recommend that you reword it like the body of your question.
– Peilonrayz
Nov 7 at 14:38
Welcome, currently your title makes your question sound off-topic and so is likely the cause of your downvote. This is as asking us to write algorithms for you is off-topic. I'd recommend that you reword it like the body of your question.
– Peilonrayz
Nov 7 at 14:38
I don't know what kind of answer you expect. I already told you why I believe it impossible to come close to R's implementation which is in compiled code. If you really want something fast, you can't implement it in pure R. That's why we have Rcpp.
– Roland
Nov 7 at 15:31
I don't know what kind of answer you expect. I already told you why I believe it impossible to come close to R's implementation which is in compiled code. If you really want something fast, you can't implement it in pure R. That's why we have Rcpp.
– Roland
Nov 7 at 15:31
1
1
I've told you before and I'll say it again. Profile your code! Hadley covers how to do that as well in his book. I bet you'll see that most of the time is spent in the
split
calls, probably followed by unlist
. If you want to speed this up, you need to avoid these.– Roland
Nov 8 at 7:45
I've told you before and I'll say it again. Profile your code! Hadley covers how to do that as well in his book. I bet you'll see that most of the time is spent in the
split
calls, probably followed by unlist
. If you want to speed this up, you need to avoid these.– Roland
Nov 8 at 7:45
1
1
Then here is another idea to avoid
split
: start with z <- outer(x_digit_i, 0:9, "==")
. Then you you can do x <- x[row(z)[z]]
. Or slightly faster, x <- x[1L + (which(z) - 1L) %% length(x)]
– flodel
Nov 9 at 0:24
Then here is another idea to avoid
split
: start with z <- outer(x_digit_i, 0:9, "==")
. Then you you can do x <- x[row(z)[z]]
. Or slightly faster, x <- x[1L + (which(z) - 1L) %% length(x)]
– flodel
Nov 9 at 0:24
1
1
And another small improvement:
%%
, %/%
, and ==
are faster with integers, so use (x %% as.integer(10^d)) %/% as.integer(10^(d-1))
inside get_digit
. It might also make the code more robust to floating point error issues when using ==
like I suggested.– flodel
Nov 9 at 1:42
And another small improvement:
%%
, %/%
, and ==
are faster with integers, so use (x %% as.integer(10^d)) %/% as.integer(10^(d-1))
inside get_digit
. It might also make the code more robust to floating point error issues when using ==
like I suggested.– flodel
Nov 9 at 1:42
|
show 5 more comments
1 Answer
1
active
oldest
votes
up vote
1
down vote
accepted
Putting these comments into an answer...
As pointed out, it will be a tall order to beat or even approach the computation times of the native sort
function, as it is compiled from C. Often with R, the trick to make your code faster consists in composing with some of these fast compiled building blocks that are native functions. In particular, we would look to substitute for
loops like the one you have with vectorized functions. Unfortunately, your use of a for
loop here is particular in the sense that each iteration has a side-effect to the x
vector. This means that the order of the operations is important and we cannot run the loop iterations independently in parallel or via vectorized functions.
If we cannot get rid of the for
loop, we are left trying to optimize the code within the body of the loop. Let's start with a profile of your code:
x <- sample(10000)
Rprof(tmp <- tempfile())
for (i in 1:10) z <- radix_sort(x)
Rprof()
summaryRprof(tmp)$by.total
# total.time total.pct self.time self.pct
# "radix_sort" 8.26 99.76 0.72 8.70
# "split" 7.34 88.65 0.06 0.72
# "split.default" 7.28 87.92 0.54 6.52
# "as.factor" 6.74 81.40 0.08 0.97
# "factor" 6.64 80.19 1.72 20.77
# "as.character" 4.34 52.42 4.34 52.42
# "unique" 0.42 5.07 0.04 0.48
# "unique.default" 0.38 4.59 0.38 4.59
# "%%" 0.14 1.69 0.14 1.69
# "get_digit" 0.14 1.69 0.00 0.00
# "sort.list" 0.12 1.45 0.02 0.24
# "order" 0.08 0.97 0.06 0.72
# "unlist" 0.06 0.72 0.06 0.72
# [...]
We see that the main culprit here is the use of split
. Another surprise here is the use of order
(it seems to be applied when figuring out the levels of x_digit_i
) which you could consider like cheating given your objective.
So what alternative do we have to your use of split
/unsplit
? Essentially, you have a vector x
that you want to reorder based on a vector of digits x_digit_i
. One way is to use outer
to create a matrix of TRUE
/FALSE
where each column locates a different digit (for a total of ten columns):
z <- outer(x_digit_i, 0:9, "==")
Then, you want to turn this matrix into a vector of indices, such that the first few indices will locate the zeroes, then the ones, etc (the equivalent of idx <- order(x_digit_i)
. You can do so by doing:
idx <- row(z)[z]
or (harder to understand but a bit faster)
idx <- 1L + (which(z) - 1L) %% length(x)
Finally, you just have to do:
x <- x[idx]
Also note that since you are dealing with integers, your code might be a little faster (and more robust by avoiding floating point errors) if you make sure to use integers everywhere possible. In particular, your get_digit
function could be rewritten as follows:
get_digit <- function(x, d) (x %% as.integer(10^d)) %/% as.integer(10^(d-1))
The benchmarks below show decent progress (where my suggestions are implemented under the name sort_radix2
) bridging the gap with the native sort
. I hope it helps!
x <- sample(100)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 964.692 972.3675 1025.35180 984.3775 1012.178 2233.397 100
# radix_sort2(x) 250.642 256.5720 282.58952 261.2910 282.449 1266.061 100
# sort(x) 82.270 86.1605 92.22669 88.0230 90.943 223.249 100
x <- sample(10000)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 71939.706 76147.1715 80028.7541 78389.8140 81512.4140 144632.484 100
# radix_sort2(x) 24218.810 27613.3190 34841.8724 29477.7115 31772.9415 143283.337 100
# sort(x) 411.691 454.4015 563.4825 492.6165 558.0925 3412.719 100
Thanks Flodel, that was instructive :)
– Tal Galili
2 days ago
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
Putting these comments into an answer...
As pointed out, it will be a tall order to beat or even approach the computation times of the native sort
function, as it is compiled from C. Often with R, the trick to make your code faster consists in composing with some of these fast compiled building blocks that are native functions. In particular, we would look to substitute for
loops like the one you have with vectorized functions. Unfortunately, your use of a for
loop here is particular in the sense that each iteration has a side-effect to the x
vector. This means that the order of the operations is important and we cannot run the loop iterations independently in parallel or via vectorized functions.
If we cannot get rid of the for
loop, we are left trying to optimize the code within the body of the loop. Let's start with a profile of your code:
x <- sample(10000)
Rprof(tmp <- tempfile())
for (i in 1:10) z <- radix_sort(x)
Rprof()
summaryRprof(tmp)$by.total
# total.time total.pct self.time self.pct
# "radix_sort" 8.26 99.76 0.72 8.70
# "split" 7.34 88.65 0.06 0.72
# "split.default" 7.28 87.92 0.54 6.52
# "as.factor" 6.74 81.40 0.08 0.97
# "factor" 6.64 80.19 1.72 20.77
# "as.character" 4.34 52.42 4.34 52.42
# "unique" 0.42 5.07 0.04 0.48
# "unique.default" 0.38 4.59 0.38 4.59
# "%%" 0.14 1.69 0.14 1.69
# "get_digit" 0.14 1.69 0.00 0.00
# "sort.list" 0.12 1.45 0.02 0.24
# "order" 0.08 0.97 0.06 0.72
# "unlist" 0.06 0.72 0.06 0.72
# [...]
We see that the main culprit here is the use of split
. Another surprise here is the use of order
(it seems to be applied when figuring out the levels of x_digit_i
) which you could consider like cheating given your objective.
So what alternative do we have to your use of split
/unsplit
? Essentially, you have a vector x
that you want to reorder based on a vector of digits x_digit_i
. One way is to use outer
to create a matrix of TRUE
/FALSE
where each column locates a different digit (for a total of ten columns):
z <- outer(x_digit_i, 0:9, "==")
Then, you want to turn this matrix into a vector of indices, such that the first few indices will locate the zeroes, then the ones, etc (the equivalent of idx <- order(x_digit_i)
. You can do so by doing:
idx <- row(z)[z]
or (harder to understand but a bit faster)
idx <- 1L + (which(z) - 1L) %% length(x)
Finally, you just have to do:
x <- x[idx]
Also note that since you are dealing with integers, your code might be a little faster (and more robust by avoiding floating point errors) if you make sure to use integers everywhere possible. In particular, your get_digit
function could be rewritten as follows:
get_digit <- function(x, d) (x %% as.integer(10^d)) %/% as.integer(10^(d-1))
The benchmarks below show decent progress (where my suggestions are implemented under the name sort_radix2
) bridging the gap with the native sort
. I hope it helps!
x <- sample(100)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 964.692 972.3675 1025.35180 984.3775 1012.178 2233.397 100
# radix_sort2(x) 250.642 256.5720 282.58952 261.2910 282.449 1266.061 100
# sort(x) 82.270 86.1605 92.22669 88.0230 90.943 223.249 100
x <- sample(10000)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 71939.706 76147.1715 80028.7541 78389.8140 81512.4140 144632.484 100
# radix_sort2(x) 24218.810 27613.3190 34841.8724 29477.7115 31772.9415 143283.337 100
# sort(x) 411.691 454.4015 563.4825 492.6165 558.0925 3412.719 100
Thanks Flodel, that was instructive :)
– Tal Galili
2 days ago
add a comment |
up vote
1
down vote
accepted
Putting these comments into an answer...
As pointed out, it will be a tall order to beat or even approach the computation times of the native sort
function, as it is compiled from C. Often with R, the trick to make your code faster consists in composing with some of these fast compiled building blocks that are native functions. In particular, we would look to substitute for
loops like the one you have with vectorized functions. Unfortunately, your use of a for
loop here is particular in the sense that each iteration has a side-effect to the x
vector. This means that the order of the operations is important and we cannot run the loop iterations independently in parallel or via vectorized functions.
If we cannot get rid of the for
loop, we are left trying to optimize the code within the body of the loop. Let's start with a profile of your code:
x <- sample(10000)
Rprof(tmp <- tempfile())
for (i in 1:10) z <- radix_sort(x)
Rprof()
summaryRprof(tmp)$by.total
# total.time total.pct self.time self.pct
# "radix_sort" 8.26 99.76 0.72 8.70
# "split" 7.34 88.65 0.06 0.72
# "split.default" 7.28 87.92 0.54 6.52
# "as.factor" 6.74 81.40 0.08 0.97
# "factor" 6.64 80.19 1.72 20.77
# "as.character" 4.34 52.42 4.34 52.42
# "unique" 0.42 5.07 0.04 0.48
# "unique.default" 0.38 4.59 0.38 4.59
# "%%" 0.14 1.69 0.14 1.69
# "get_digit" 0.14 1.69 0.00 0.00
# "sort.list" 0.12 1.45 0.02 0.24
# "order" 0.08 0.97 0.06 0.72
# "unlist" 0.06 0.72 0.06 0.72
# [...]
We see that the main culprit here is the use of split
. Another surprise here is the use of order
(it seems to be applied when figuring out the levels of x_digit_i
) which you could consider like cheating given your objective.
So what alternative do we have to your use of split
/unsplit
? Essentially, you have a vector x
that you want to reorder based on a vector of digits x_digit_i
. One way is to use outer
to create a matrix of TRUE
/FALSE
where each column locates a different digit (for a total of ten columns):
z <- outer(x_digit_i, 0:9, "==")
Then, you want to turn this matrix into a vector of indices, such that the first few indices will locate the zeroes, then the ones, etc (the equivalent of idx <- order(x_digit_i)
. You can do so by doing:
idx <- row(z)[z]
or (harder to understand but a bit faster)
idx <- 1L + (which(z) - 1L) %% length(x)
Finally, you just have to do:
x <- x[idx]
Also note that since you are dealing with integers, your code might be a little faster (and more robust by avoiding floating point errors) if you make sure to use integers everywhere possible. In particular, your get_digit
function could be rewritten as follows:
get_digit <- function(x, d) (x %% as.integer(10^d)) %/% as.integer(10^(d-1))
The benchmarks below show decent progress (where my suggestions are implemented under the name sort_radix2
) bridging the gap with the native sort
. I hope it helps!
x <- sample(100)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 964.692 972.3675 1025.35180 984.3775 1012.178 2233.397 100
# radix_sort2(x) 250.642 256.5720 282.58952 261.2910 282.449 1266.061 100
# sort(x) 82.270 86.1605 92.22669 88.0230 90.943 223.249 100
x <- sample(10000)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 71939.706 76147.1715 80028.7541 78389.8140 81512.4140 144632.484 100
# radix_sort2(x) 24218.810 27613.3190 34841.8724 29477.7115 31772.9415 143283.337 100
# sort(x) 411.691 454.4015 563.4825 492.6165 558.0925 3412.719 100
Thanks Flodel, that was instructive :)
– Tal Galili
2 days ago
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
Putting these comments into an answer...
As pointed out, it will be a tall order to beat or even approach the computation times of the native sort
function, as it is compiled from C. Often with R, the trick to make your code faster consists in composing with some of these fast compiled building blocks that are native functions. In particular, we would look to substitute for
loops like the one you have with vectorized functions. Unfortunately, your use of a for
loop here is particular in the sense that each iteration has a side-effect to the x
vector. This means that the order of the operations is important and we cannot run the loop iterations independently in parallel or via vectorized functions.
If we cannot get rid of the for
loop, we are left trying to optimize the code within the body of the loop. Let's start with a profile of your code:
x <- sample(10000)
Rprof(tmp <- tempfile())
for (i in 1:10) z <- radix_sort(x)
Rprof()
summaryRprof(tmp)$by.total
# total.time total.pct self.time self.pct
# "radix_sort" 8.26 99.76 0.72 8.70
# "split" 7.34 88.65 0.06 0.72
# "split.default" 7.28 87.92 0.54 6.52
# "as.factor" 6.74 81.40 0.08 0.97
# "factor" 6.64 80.19 1.72 20.77
# "as.character" 4.34 52.42 4.34 52.42
# "unique" 0.42 5.07 0.04 0.48
# "unique.default" 0.38 4.59 0.38 4.59
# "%%" 0.14 1.69 0.14 1.69
# "get_digit" 0.14 1.69 0.00 0.00
# "sort.list" 0.12 1.45 0.02 0.24
# "order" 0.08 0.97 0.06 0.72
# "unlist" 0.06 0.72 0.06 0.72
# [...]
We see that the main culprit here is the use of split
. Another surprise here is the use of order
(it seems to be applied when figuring out the levels of x_digit_i
) which you could consider like cheating given your objective.
So what alternative do we have to your use of split
/unsplit
? Essentially, you have a vector x
that you want to reorder based on a vector of digits x_digit_i
. One way is to use outer
to create a matrix of TRUE
/FALSE
where each column locates a different digit (for a total of ten columns):
z <- outer(x_digit_i, 0:9, "==")
Then, you want to turn this matrix into a vector of indices, such that the first few indices will locate the zeroes, then the ones, etc (the equivalent of idx <- order(x_digit_i)
. You can do so by doing:
idx <- row(z)[z]
or (harder to understand but a bit faster)
idx <- 1L + (which(z) - 1L) %% length(x)
Finally, you just have to do:
x <- x[idx]
Also note that since you are dealing with integers, your code might be a little faster (and more robust by avoiding floating point errors) if you make sure to use integers everywhere possible. In particular, your get_digit
function could be rewritten as follows:
get_digit <- function(x, d) (x %% as.integer(10^d)) %/% as.integer(10^(d-1))
The benchmarks below show decent progress (where my suggestions are implemented under the name sort_radix2
) bridging the gap with the native sort
. I hope it helps!
x <- sample(100)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 964.692 972.3675 1025.35180 984.3775 1012.178 2233.397 100
# radix_sort2(x) 250.642 256.5720 282.58952 261.2910 282.449 1266.061 100
# sort(x) 82.270 86.1605 92.22669 88.0230 90.943 223.249 100
x <- sample(10000)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 71939.706 76147.1715 80028.7541 78389.8140 81512.4140 144632.484 100
# radix_sort2(x) 24218.810 27613.3190 34841.8724 29477.7115 31772.9415 143283.337 100
# sort(x) 411.691 454.4015 563.4825 492.6165 558.0925 3412.719 100
Putting these comments into an answer...
As pointed out, it will be a tall order to beat or even approach the computation times of the native sort
function, as it is compiled from C. Often with R, the trick to make your code faster consists in composing with some of these fast compiled building blocks that are native functions. In particular, we would look to substitute for
loops like the one you have with vectorized functions. Unfortunately, your use of a for
loop here is particular in the sense that each iteration has a side-effect to the x
vector. This means that the order of the operations is important and we cannot run the loop iterations independently in parallel or via vectorized functions.
If we cannot get rid of the for
loop, we are left trying to optimize the code within the body of the loop. Let's start with a profile of your code:
x <- sample(10000)
Rprof(tmp <- tempfile())
for (i in 1:10) z <- radix_sort(x)
Rprof()
summaryRprof(tmp)$by.total
# total.time total.pct self.time self.pct
# "radix_sort" 8.26 99.76 0.72 8.70
# "split" 7.34 88.65 0.06 0.72
# "split.default" 7.28 87.92 0.54 6.52
# "as.factor" 6.74 81.40 0.08 0.97
# "factor" 6.64 80.19 1.72 20.77
# "as.character" 4.34 52.42 4.34 52.42
# "unique" 0.42 5.07 0.04 0.48
# "unique.default" 0.38 4.59 0.38 4.59
# "%%" 0.14 1.69 0.14 1.69
# "get_digit" 0.14 1.69 0.00 0.00
# "sort.list" 0.12 1.45 0.02 0.24
# "order" 0.08 0.97 0.06 0.72
# "unlist" 0.06 0.72 0.06 0.72
# [...]
We see that the main culprit here is the use of split
. Another surprise here is the use of order
(it seems to be applied when figuring out the levels of x_digit_i
) which you could consider like cheating given your objective.
So what alternative do we have to your use of split
/unsplit
? Essentially, you have a vector x
that you want to reorder based on a vector of digits x_digit_i
. One way is to use outer
to create a matrix of TRUE
/FALSE
where each column locates a different digit (for a total of ten columns):
z <- outer(x_digit_i, 0:9, "==")
Then, you want to turn this matrix into a vector of indices, such that the first few indices will locate the zeroes, then the ones, etc (the equivalent of idx <- order(x_digit_i)
. You can do so by doing:
idx <- row(z)[z]
or (harder to understand but a bit faster)
idx <- 1L + (which(z) - 1L) %% length(x)
Finally, you just have to do:
x <- x[idx]
Also note that since you are dealing with integers, your code might be a little faster (and more robust by avoiding floating point errors) if you make sure to use integers everywhere possible. In particular, your get_digit
function could be rewritten as follows:
get_digit <- function(x, d) (x %% as.integer(10^d)) %/% as.integer(10^(d-1))
The benchmarks below show decent progress (where my suggestions are implemented under the name sort_radix2
) bridging the gap with the native sort
. I hope it helps!
x <- sample(100)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 964.692 972.3675 1025.35180 984.3775 1012.178 2233.397 100
# radix_sort2(x) 250.642 256.5720 282.58952 261.2910 282.449 1266.061 100
# sort(x) 82.270 86.1605 92.22669 88.0230 90.943 223.249 100
x <- sample(10000)
microbenchmark(radix_sort(x), radix_sort2(x), sort(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix_sort(x) 71939.706 76147.1715 80028.7541 78389.8140 81512.4140 144632.484 100
# radix_sort2(x) 24218.810 27613.3190 34841.8724 29477.7115 31772.9415 143283.337 100
# sort(x) 411.691 454.4015 563.4825 492.6165 558.0925 3412.719 100
edited 2 days ago
answered Nov 18 at 2:02
flodel
3,2021915
3,2021915
Thanks Flodel, that was instructive :)
– Tal Galili
2 days ago
add a comment |
Thanks Flodel, that was instructive :)
– Tal Galili
2 days ago
Thanks Flodel, that was instructive :)
– Tal Galili
2 days ago
Thanks Flodel, that was instructive :)
– Tal Galili
2 days ago
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207144%2fradix-sort-function-in-r%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
4
Welcome, currently your title makes your question sound off-topic and so is likely the cause of your downvote. This is as asking us to write algorithms for you is off-topic. I'd recommend that you reword it like the body of your question.
– Peilonrayz
Nov 7 at 14:38
I don't know what kind of answer you expect. I already told you why I believe it impossible to come close to R's implementation which is in compiled code. If you really want something fast, you can't implement it in pure R. That's why we have Rcpp.
– Roland
Nov 7 at 15:31
1
I've told you before and I'll say it again. Profile your code! Hadley covers how to do that as well in his book. I bet you'll see that most of the time is spent in the
split
calls, probably followed byunlist
. If you want to speed this up, you need to avoid these.– Roland
Nov 8 at 7:45
1
Then here is another idea to avoid
split
: start withz <- outer(x_digit_i, 0:9, "==")
. Then you you can dox <- x[row(z)[z]]
. Or slightly faster,x <- x[1L + (which(z) - 1L) %% length(x)]
– flodel
Nov 9 at 0:24
1
And another small improvement:
%%
,%/%
, and==
are faster with integers, so use(x %% as.integer(10^d)) %/% as.integer(10^(d-1))
insideget_digit
. It might also make the code more robust to floating point error issues when using==
like I suggested.– flodel
Nov 9 at 1:42