I have two df
df_1
| X | X1 | X2 | X3 |
|---|---|---|---|
| A | B | C | D |
| E | E | F | G |
| H | I | J | L |
and another, df_2
| X | X4 | X5 |
|---|---|---|
| Z | Y | W |
| P | O | S |
I would like to merge both by column name and integrate the elements
| X | X1 | X2 | X3 | X4 | X5 |
|---|---|---|---|---|---|
| A | B | C | D | Y | W |
| E | E | F | G | O | S |
| H | I | J | L | NA | NA |
| Z | NA | NA | NA | NA | NA |
| P | NA | NA | NA | NA | NA |
In dplyr, you can use bind_rows and then order NAs and non-NAs:
library(dplyr)
bind_rows(df_1, df_2) |>
mutate(across(everything(), ~ .x[order(is.na(.x))]))
# X X1 X2 X3 X4 X5
#1 A B C D Y W
#2 E E F G O S
#3 H I J L <NA> <NA>
#4 Z <NA> <NA> <NA> <NA> <NA>
#5 P <NA> <NA> <NA> <NA> <NA>
With base R you can first bind rows with different columns and then move NA values to the end
mismatch_rbind <- function(a, b) {
a[setdiff(names(b), names(a))] <- NA
b[setdiff(names(a), names(b))] <- NA
rbind(a, b)
}
na_last <- function(x) {
r <- x[!is.na(x)]
length(r) <- length(x)
r
}
out <- mismatch_rbind(df_1, df_2)
out[] <- lapply(out, na_last)
out
# X X1 X2 X3 X4 X5
# 1 A B C D Y W
# 2 E E F G O S
# 3 H I J L <NA> <NA>
# 4 Z <NA> <NA> <NA> <NA> <NA>
# 5 P <NA> <NA> <NA> <NA> <NA>
Another base R solution: iterate over all column names to make a list of combined columns; pad with NAs to the same length; and coerce back to a dataframe.
new_cols <- union(names(df_1), names(df_2)) |>
setNames(nm = _) |>
lapply(\(x) c(df_1[[x]], df_2[[x]]))
max_len <- max(sapply(new_cols, length))
new_cols |>
lapply(\(x) {
length(x) <- max_len
x
}) |>
as.data.frame()
X X1 X2 X3 X4 X5
1 A B C D Y W
2 E E F G O S
3 H I J L <NA> <NA>
4 Z <NA> <NA> <NA> <NA> <NA>
5 P <NA> <NA> <NA> <NA> <NA>
solution in data.table
library(data.table)
setDT(df_1)
setDT(df_2)
rbindlist(list(df_1, df_2), fill = TRUE)[, lapply(.SD, \(x) na.omit(x)[1:.N])]
results
X X1 X2 X3 X4 X5
1: A B C D Y W
2: E E F G O S
3: H I J L <NA> <NA>
4: Z <NA> <NA> <NA> <NA> <NA>
5: P <NA> <NA> <NA> <NA> <NA>