I have a dataset containing several variables, and I would like to test each variable separately for differences between groups (Kruskal-Wallis test).
My data (df) look like this: carbon and nitrogen content for different agricultural management types (see the name column). I have 16 groups; to keep it simple, only 8 are shown here.
Extract of the data:
name N_cont C_cont agriculture
C_ero 1,064 8,380 1
C_ero 0,961 8,086 1
C_ero 0,977 8,331 1
Ds_ero 1,767 17,443 2
Ds_ero 1,802 18,264 2
Ds_ero 2,083 20,112 2
Ms_ero 1,547 14,380 3
Ms_ero 1,566 15,313 3
Ms_ero 1,505 14,760 3
Md_ero 1,512 14,303 4
Md_ero 1,656 15,331 4
Md_ero 1,500 13,788 4
C_upsl 1,121 10,581 5
C_upsl 1,159 10,460 5
C_upsl 1,223 10,171 5
Ds_upsl 1,962 20,656 6
Ds_upsl 1,784 16,780 6
Ds_upsl 1,720 17,482 6
Ms_upsl 1,578 16,228 7
Ms_upsl 1,634 15,331 7
Ms_upsl 1,394 13,419 7
Md_upsl 1,286 11,824 8
Md_upsl 1,241 11,452 8
Md_upsl 1,317 11,932 8
I have already converted agriculture to a factor:
df$agriculture<-factor(df$agriculture)
I can run statistical tests comparing all of the 16 groups, e.g.:
kruskal.test(df$C_cont, df$agriculture)
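To run the same test for each variable separately, one possible approach is to loop over the column names. This is only a minimal sketch: it rebuilds the extract above with decimal points instead of decimal commas, and the names vars, results and p_values are placeholders chosen for illustration.

```r
# Rebuild the extract shown above (decimal commas written as decimal points).
df <- data.frame(
  name = rep(c("C_ero", "Ds_ero", "Ms_ero", "Md_ero",
               "C_upsl", "Ds_upsl", "Ms_upsl", "Md_upsl"), each = 3),
  N_cont = c(1.064, 0.961, 0.977, 1.767, 1.802, 2.083,
             1.547, 1.566, 1.505, 1.512, 1.656, 1.500,
             1.121, 1.159, 1.223, 1.962, 1.784, 1.720,
             1.578, 1.634, 1.394, 1.286, 1.241, 1.317),
  C_cont = c(8.380, 8.086, 8.331, 17.443, 18.264, 20.112,
             14.380, 15.313, 14.760, 14.303, 15.331, 13.788,
             10.581, 10.460, 10.171, 20.656, 16.780, 17.482,
             16.228, 15.331, 13.419, 11.824, 11.452, 11.932),
  agriculture = rep(1:8, each = 3)
)
df$agriculture <- factor(df$agriculture)

# Hypothetical helper names: the response columns to test one at a time.
vars <- c("N_cont", "C_cont")

# One Kruskal-Wallis test per variable, all against the same grouping factor.
results <- lapply(vars, function(v) kruskal.test(df[[v]], df$agriculture))
names(results) <- vars

# Collect the p-values in a named vector.
p_values <- sapply(results, function(r) r$p.value)
print(p_values)
```

Using df[[v]] rather than df$... lets the column name come from a character vector, so further variables can be added to vars without changing the loop.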
But now I would like to run statistical tests on only specific groups out of the 8, e.g. those whose name contains C (conventional) or Ds (direct seeding),
or e.g. ero (eroding site) or upsl (upper slope).
I tried grep and split, but neither worked, because x and y must have the same dimensions.
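A length mismatch of this kind usually appears when only one of the two vectors is filtered. A sketch of one way around it, assuming the group labels always follow the management_site pattern seen in the name column: subset whole rows with grepl so the response and the grouping stay aligned, or derive a new factor from the name. The names sub_df and site are placeholders, and the data are the extract above with decimal points.

```r
# Extract from above, C_cont only (decimal commas written as decimal points).
df <- data.frame(
  name = rep(c("C_ero", "Ds_ero", "Ms_ero", "Md_ero",
               "C_upsl", "Ds_upsl", "Ms_upsl", "Md_upsl"), each = 3),
  C_cont = c(8.380, 8.086, 8.331, 17.443, 18.264, 20.112,
             14.380, 15.313, 14.760, 14.303, 15.331, 13.788,
             10.581, 10.460, 10.171, 20.656, 16.780, 17.482,
             16.228, 15.331, 13.419, 11.824, 11.452, 11.932),
  agriculture = factor(rep(1:8, each = 3))
)

# Keep entire rows whose name starts with "C_" or "Ds_", so that
# C_cont and agriculture are filtered together and keep equal length.
sub_df <- df[grepl("^(C|Ds)_", df$name), ]
sub_df$agriculture <- droplevels(sub_df$agriculture)  # drop unused levels

# Test only the selected groups.
res_management <- kruskal.test(C_cont ~ agriculture, data = sub_df)
print(res_management)

# Alternative: derive the site (text after the underscore) as its own
# factor and compare eroding site against upper slope over all rows.
df$site <- factor(sub("^.*_", "", df$name))
res_site <- kruskal.test(C_cont ~ site, data = df)
print(res_site)
```

Whether to compare the remaining groups individually (first test) or to pool them by a derived factor such as site (second test) depends on the question being asked of the data.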
Do you have any clue?