Hello I have a very large data frame and it is a partial part:
v1 <- c('i1', 'i10', 'i11')
v2 <- c(0.11, 0.07, 0.114)
v3 <- c(0.07, 0.08, 0.03)
df <- data.frame(cbind(v1, v2, v3))
How can I write some codes to convert each row into a combined vector, x <- c()?
that is, my expected output should be and the variable names need to be from column V1 :
i1 <- c(0.11014318, 0.07302843, 0.01360761, 0.10619829, 0.14513045)
i10 <- c(0.07360007, 0.08013833, 0.13104657, 0.13174247, 0.14256615)
i11 <- c(0.11418245, 0.03300573, 0.11425297, 0.13686428, 0.03367279)
After converting each row into a vector, I need to compute the cosine similarity among these vectors so that's why I need to split each row and save them as vectors with names from the first column V1.
library(lsa)
cosine(i1, i10)
cosine(i1, i11)
cosine(i10, i11)
The following question
Hello SamR. Thanks for your kind help but I do not know why it does not work when adding more columns V4 and V5 and one more row with the ID i12? Thanks so much for your patience and help.
data_matrix <- function(df){
data_matrix <- tail(t(df), -1) |>
sapply(as.numeric) |>
matrix(
nrow = ncol(df)-1,
ncol = nrow(df),
dimnames = list(
seq_len(nrow(df)-1), # rows
df[,1] # columns
)
)
}
v1 <- c('i1', 'i10', 'i11', 'i12')
v2 <- c(0.11, 0.07, 0.114, 0.67)
v3 <- c(0.07, 0.08, 0.03, 087)
v4 <- c(0.12, 0.13, 0.14, 0.18)
v5 <- c(0.19, 0.21, 0.22, 0.22)
df <- data.frame(cbind(v1, v2, v3, v4, v5))
df
data_matrix(df)
It just returns the error:
Error in matrix(sapply(tail(t(df), -1), as.numeric), nrow = ncol(df) - :
length of 'dimnames' [1] not equal to array extent
