One approach to this (which is sure to be a duplicate answer somewhere?) is:
- reshape the data to long format
- nest the data to create a column of tibbles containing the independent and dependent values
- run each regression and store the results in a new column
- tidy the regression output and unnest the data
Your example data is not usable, so let's use mtcars. Assume we want to predict mpg using each of disp, hp, drat, wt and qsec. First we select, gather and nest:
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
mtcars %>%
select(mpg, disp, hp, drat, wt, qsec) %>%
gather(Var, Val, -mpg) %>%
nest(data = c(mpg, Val))
Result:
# A tibble: 5 x 2
Var data
<chr> <list>
1 disp <tibble [32 x 2]>
2 hp <tibble [32 x 2]>
3 drat <tibble [32 x 2]>
4 wt <tibble [32 x 2]>
5 qsec <tibble [32 x 2]>
Now we can map each row to a regression, and create the column of tidied output:
mtcars %>%
select(mpg, disp, hp, drat, wt, qsec) %>%
gather(Var, Val, -mpg) %>%
nest(data = c(mpg, Val)) %>%
mutate(model = map(data, ~lm(mpg ~ Val, data = .)),
tidied = map(model, tidy))
# A tibble: 5 x 4
Var data model tidied
<chr> <list> <list> <list>
1 disp <tibble [32 x 2]> <lm> <tibble [2 x 5]>
2 hp <tibble [32 x 2]> <lm> <tibble [2 x 5]>
3 drat <tibble [32 x 2]> <lm> <tibble [2 x 5]>
4 wt <tibble [32 x 2]> <lm> <tibble [2 x 5]>
5 qsec <tibble [32 x 2]> <lm> <tibble [2 x 5]>
And finally, select the columns we want and unnest:
mtcars %>%
select(mpg, disp, hp, drat, wt, qsec) %>%
gather(Var, Val, -mpg) %>%
nest(data = c(mpg, Val)) %>%
mutate(model = map(data, ~lm(mpg ~ Val, data = .)),
tidied = map(model, tidy)) %>%
select(-model, -data) %>%
unnest(cols = c(tidied))
Result:
# A tibble: 10 x 6
Var term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 disp (Intercept) 29.6 1.23 24.1 3.58e-21
2 disp Val -0.0412 0.00471 -8.75 9.38e-10
3 hp (Intercept) 30.1 1.63 18.4 6.64e-18
4 hp Val -0.0682 0.0101 -6.74 1.79e- 7
5 drat (Intercept) -7.52 5.48 -1.37 1.80e- 1
6 drat Val 7.68 1.51 5.10 1.78e- 5
7 wt (Intercept) 37.3 1.88 19.9 8.24e-19
8 wt Val -5.34 0.559 -9.56 1.29e-10
9 qsec (Intercept) -5.11 10.0 -0.510 6.14e- 1
10 qsec Val 1.41 0.559 2.53 1.71e- 2
You could add filters from dplyr::filter(), for example to remove the Intercept rows, or select a p-value threshold.
mtcars %>%
select(mpg, disp, hp, drat, wt, qsec) %>%
gather(Var, Val, -mpg) %>%
nest(data = c(mpg, Val)) %>%
mutate(model = map(data, ~lm(mpg ~ Val, data = .)),
tidied = map(model, tidy)) %>%
select(-model, -data) %>%
unnest(cols = c(tidied)) %>%
filter(p.value < 0.01,
term != "(Intercept)")
# A tibble: 5 x 6
Var term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 disp Val -0.0412 0.00471 -8.75 9.38e-10
2 hp Val -0.0682 0.0101 -6.74 1.79e- 7
3 drat Val 7.68 1.51 5.10 1.78e- 5
4 wt Val -5.34 0.559 -9.56 1.29e-10