Average neighbours inside a vector
My data :
data <- c(1,5,11,15,24,31,32,65)
There are 2 neighbours: 31 and 32. I wish to remove them and keep only their mean value (i.e. 31.5), so that data becomes:
data <- c(1,5,11,15,24,31.5,65)
It seems simple, but I wish to do it automatically, and sometimes with vectors containing longer runs of neighbours. For instance:
data_2 <- c(1,5,11,15,24,31,32,65,99,100,101,140)
r vector difference neighbours
asked 13 hours ago by Loulou (edited 13 hours ago)
Is this only about pairs of consecutive numbers or also about longer runs, e.g. 31, 32, 33, 34?
– Klaus Gütter
13 hours ago
It could be also longer runs (like 99, 100, 101 in data_2)
– Loulou
13 hours ago
Maybe use the cumsum(...diff(... idiom to create groups, like tapply(data, cumsum(c(1L, diff(data) > 1)), mean)
– Henrik
13 hours ago
Is your data sorted?
– Konrad Rudolph
11 hours ago
Yes, always growing order
– Loulou
10 hours ago
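Henrik's suggestion can be tried directly on data_2. A minimal sketch, assuming the input is sorted in ascending order (as confirmed above); the names grp and res are only illustrative:

```r
data_2 <- c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140)

# A new group starts wherever the step from the previous element exceeds 1.
grp <- cumsum(c(1L, diff(data_2) > 1))
print(grp)
# group ids: 1 2 3 4 5 6 6 7 8 8 8 9

# One mean per group; as.vector() drops the names and dim that tapply() adds.
res <- as.vector(tapply(data_2, grp, mean))
print(res)
# values: 1, 5, 11, 15, 24, 31.5, 65, 100, 140
```

Singleton groups pass through unchanged because the mean of one value is the value itself.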
4 Answers
Accepted answer
Here is another idea that creates a group id via cumsum(c(TRUE, diff(a) > 1)), where 1 is the gap threshold, i.e.
# our group variable
grp <- cumsum(c(TRUE, diff(a) > 1))
# keep only the values in groups of length 1 (i.e. with no neighbour)
i1 <- a[!ave(a, grp, FUN = function(i) length(i) > 1)]
# mean of each group with more than one element (NaN for singleton groups)
i2 <- unname(tapply(a, grp, function(i) mean(i[length(i) > 1])))
# concatenate the two (dropping the NaN entries from i2) to get the final result;
# the averaged values come last, so the result is not in ascending order
c(i1, i2[!is.na(i2)])
#[1] 1.0 5.0 11.0 15.0 24.0 65.0 31.5
You can also wrap it in a function. I left the gap as a parameter so you can adjust it:
get_vec <- function(x, gap) {
  grp <- cumsum(c(TRUE, diff(x) > gap))
  i1 <- x[!ave(x, grp, FUN = function(i) length(i) > 1)]
  i2 <- unname(tapply(x, grp, function(i) mean(i[length(i) > 1])))
  return(c(i1, i2[!is.na(i2)]))
}
get_vec(a, 1)
#[1] 1.0 5.0 11.0 15.0 24.0 65.0 31.5
get_vec(a_2, 1)
#[1] 1.0 5.0 11.0 15.0 24.0 65.0 140.0 31.5 100.0
DATA:
a <- c(1,5,11,15,24,31,32,65)
a_2 <- c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140)
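The result above lists the averaged runs after the untouched values, so it is no longer in ascending order. If order matters, one possible variant is sketched below. get_vec_ordered is a hypothetical name, not part of the answer, and the sketch assumes x is sorted and non-empty:

```r
# Hypothetical variant: average every group, singletons included
# (the mean of a single value is the value itself), so input order is kept.
get_vec_ordered <- function(x, gap = 1) {
  grp <- cumsum(c(TRUE, diff(x) > gap))
  as.vector(tapply(x, grp, mean))
}

a   <- c(1, 5, 11, 15, 24, 31, 32, 65)
a_2 <- c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140)

get_vec_ordered(a)
# values: 1, 5, 11, 15, 24, 31.5, 65
get_vec_ordered(a_2)
# values: 1, 5, 11, 15, 24, 31.5, 65, 100, 140
get_vec_ordered(a, gap = 7)  # a wider threshold merges 1..15 and 24..32
# values: 8, 29, 65
```

Averaging singletons as well avoids the separate i1/i2 bookkeeping, at the cost of calling mean() once per group.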
answered 12 hours ago by Sotos (edited 11 hours ago)
Here is my solution, which uses run-length encoding to identify groups:
foo <- function(x) {
  y <- x - seq_along(x)  # normalize to zero differences in groups
  ind <- rle(y)  # run-length encoding
  ind$values <- ind$lengths != 1  # to find groups
  ind$values[ind$values] <- cumsum(ind$values[ind$values])  # group ids
  ind <- inverse.rle(ind)
  xnew <- x
  xnew[ind != 0] <- ave(x, ind, FUN = mean)[ind != 0]  # calculate means
  xnew[!(duplicated(ind) & ind != 0)]  # remove duplicates from groups
}
foo(data)
#[1] 1.0 5.0 11.0 15.0 24.0 31.5 65.0
foo(data_2)
#[1] 1.0 5.0 11.0 15.0 24.0 31.5 65.0 100.0 140.0
data_3 <- c(1, 2, 4, 1, 2)
foo(data_3)
#[1] 1.5 4.0 1.5
I assume that you don't need an extremely efficient solution. If you do, I'd recommend a simple C++ for loop in Rcpp.
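To see why subtracting seq_along(x) works: within a run of consecutive integers, the value and its position grow in lockstep, so their difference stays constant and rle() reports it as a run. A small sketch of the intermediate values for data_2 (the names y and r are just for illustration):

```r
data_2 <- c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140)

# Value minus position: constant inside a run of consecutive integers.
y <- data_2 - seq_along(data_2)
print(y)
# values: 0 3 8 11 19 25 25 57 90 90 90 128

# Run lengths greater than 1 correspond to the neighbour groups.
r <- rle(y)
print(r$lengths)
# lengths: 1 1 1 1 1 2 1 3 1
```

Note that this only detects steps of exactly 1; it has no adjustable gap threshold.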
answered 13 hours ago by Roland (edited 13 hours ago)
I have a data.table-based solution; the same could probably be translated into dplyr:
library(data.table)
df <- data.table(data2 = c(1,5,11,15,24,31,32,65,99,100,101,140))
df[,neighbours := ifelse(c(0,diff(data2)) == 1,1,0)]
df[,neighbours := c(neighbours[1:(.N-1)],1),by = rleid(neighbours)]
df[,neigh_seq := rleid(neighbours)]
unique(df[,ifelse(neighbours == 1,mean(data2),data2),by = neigh_seq])
neigh_seq V1
1: 1 1.0
2: 1 5.0
3: 1 11.0
4: 1 15.0
5: 1 24.0
6: 2 31.5
7: 3 65.0
8: 4 100.0
9: 5 140.0
What it does:
The first line sets neighbours to 1 if the difference from the preceding number is 1:
data2 neighbours
1: 1 0
2: 5 0
3: 11 0
4: 15 0
5: 24 0
6: 31 0
7: 32 1
8: 65 0
9: 99 0
10: 100 1
11: 101 1
12: 140 0
I want the neighbours variable to be 1 for all neighbours, so I need to put a 1 at the end of each group:
df[,neighbours := c(neighbours[1:(.N-1)],1),by = rleid(neighbours)]
data2 neighbours
1: 1 0
2: 5 0
3: 11 0
4: 15 0
5: 24 0
6: 31 1
7: 32 1
8: 65 0
9: 99 1
10: 100 1
11: 101 1
12: 140 0
After that I just group on changes in the neighbours value and set the value to the group mean where they are neighbours:
df[,ifelse(neighbours == 1,mean(data2),data2),by = rleid(neighbours)]
rleid V1
1: 1 1.0
2: 1 5.0
3: 1 11.0
4: 1 15.0
5: 1 24.0
6: 2 31.5
7: 2 31.5
8: 3 65.0
9: 4 100.0
10: 4 100.0
11: 4 100.0
12: 5 140.0
and take the unique values. And voila.
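The two flagging steps can also be illustrated in base R. This is only a sketch of the idea, not the data.table code above; tail_flag and flag are illustrative names:

```r
x <- c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140)

# Step 1: TRUE where the value is exactly 1 more than the preceding value.
# This marks every member of a run except its first element.
tail_flag <- c(FALSE, diff(x) == 1)
print(as.integer(tail_flag))
# flags: 0 0 0 0 0 0 1 0 0 1 1 0

# Step 2: also mark an element when the element after it is flagged,
# which picks up the first element of each run.
flag <- tail_flag | c(tail_flag[-1], FALSE)
print(as.integer(flag))
# flags: 0 0 0 0 0 1 1 0 1 1 1 0
```

The second vector matches the neighbours column shown in the second table.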
answered 13 hours ago by denis (edited 12 hours ago)
This is a dplyr version that also uses cumsum(c(1, diff(x) != 1)) as the grouping variable:
library(dplyr)
data_2 %>% data.frame(x = .) %>%
  group_by(id = cumsum(c(1, diff(x) != 1))) %>%
  summarise(res = mean(x)) %>%
  select(res)
# A tibble: 9 x 1
res
<dbl>
1 1.0
2 5.0
3 11.0
4 15.0
5 24.0
6 31.5
7 65.0
8 100.0
9 140.0
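Note that this answer splits groups on diff(x) != 1, while the accepted answer uses diff(x) > 1. On strictly increasing integers the two agree; they differ when a value is repeated. A small sketch in base R with a made-up vector x (not from the question):

```r
x <- c(1, 2, 2, 3)  # made-up example with a repeated value

# diff(x) is 1 0 1, so "!= 1" starts a new group at the repeated 2 ...
grp_ne <- cumsum(c(1, diff(x) != 1))
# ... while "> 1" never starts a new group here.
grp_gt <- cumsum(c(1, diff(x) > 1))

res_ne <- as.vector(tapply(x, grp_ne, mean))
res_gt <- as.vector(tapply(x, grp_gt, mean))
print(res_ne)  # values: 1.5 2.5
print(res_gt)  # value: 2
```

Which behaviour is preferable depends on whether repeated values should count as neighbours.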
answered 6 hours ago by Lamia