I'm working on a dataframe where it's important to keep the order. I would like to split it into chunks that I process afterwards.
The splitting is based on the third column, type: all contiguous records with the same value in type (or in any given categorical column) should end up in one chunk. If possible I would like to do this in a pythonic way,
but I can only think of solutions where I iterate through the df. This will have to work on dataframes with roughly tens of thousands of entries, and I have no idea what the fastest strategy is. Here is a small example of what I have:
value_1 value_2 type
0 -0.005842 -0.494596 a
1 0.697689 0.354717 a
2 -0.354206 -1.776550 a
3 2.154078 0.344629 a
4 1.072475 1.004945 a
5 -1.338075 0.175607 b
6 -1.913883 -0.123627 b
7 -0.021376 -0.170775 b
8 -0.274882 -0.043913 b
9 0.676371 -0.691243 b
10 0.440201 -0.577944 c
11 -0.689345 -0.445433 b
12 1.540386 -1.084499 c
13 0.236204 -0.072807 b
14 -0.257084 0.848501 c
15 0.681666 -0.265254 b
16 -1.168614 -0.359998 c
17 0.355938 1.529444 b
18 0.292976 -0.301847 c
19 0.670068 0.735191 b
20 0.551594 -0.074768 a
21 -1.251568 -0.022201 a
22 0.376663 -1.556191 a
23 -0.266714 0.860436 d
24 -0.871324 1.014529 d
25 1.504529 -0.657725 d
And here is how I would like to split it (the numbers are only illustrative; what matters is which rows end up together):
value_1 value_2 type
0 1.411723 -0.836490 a
1 0.482826 1.625925 a
2 -0.054475 2.046166 a
3 0.020816 0.155194 a
4 0.840539 0.287658 a
value_1 value_2 type
5 0.257208 -2.311165 b
6 -1.545194 -0.193307 b
7 0.197849 -1.276644 b
8 0.074072 -0.172764 b
9 -2.562816 0.393645 b
value_1 value_2 type
10 0.258265 -0.978293 c
value_1 value_2 type
11 -0.804841 -0.78802 b
value_1 value_2 type
12 -0.509034 1.116428 c
value_1 value_2 type
13 -0.264252 1.025199 b
value_1 value_2 type
14 -0.268105 -0.795613 c
value_1 value_2 type
15 0.481051 0.184827 b
value_1 value_2 type
16 1.242139 0.401806 c
value_1 value_2 type
17 1.301684 0.281108 b
value_1 value_2 type
18 0.189178 0.894425 c
value_1 value_2 type
19 -0.093207 0.894564 b
value_1 value_2 type
20 -2.231735 0.250696 a
21 -0.276050 -0.712792 a
22 0.298974 -0.529791 a
value_1 value_2 type
23 0.115159 2.769695 d
24 0.636069 -1.066387 d
25 1.048230 1.500125 d
Something like a groupby that just gives back a list of slices according to the value of the chosen column would be perfect, but I haven't found any existing function like that.
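For reference, one common way to get this behaviour without an explicit loop over rows is to label each contiguous run and group on that label. The sketch below is only an illustration under some assumptions: the helper name split_contiguous is made up, the sample frame uses placeholder numbers instead of the random values above, and the column is assumed to contain no NaN (each NaN would start its own run, since NaN never compares equal to itself).

```python
import pandas as pd


def split_contiguous(df, column):
    """Return a list of sub-frames, one per run of equal consecutive values.

    Hypothetical helper for illustration; not an existing pandas function.
    """
    # True wherever a row's value differs from the previous row's value
    # (the first row always counts as the start of a run).
    starts_new_run = df[column] != df[column].shift()
    # The cumulative sum turns those flags into an increasing run label.
    run_id = starts_new_run.cumsum()
    # Grouping on the run label keeps the original row order, because the
    # labels increase from top to bottom.
    return [chunk for _, chunk in df.groupby(run_id, sort=False)]


# Placeholder data with the same "type" pattern as the example above.
df = pd.DataFrame({
    "value_1": range(26),
    "value_2": range(26),
    "type": list("aaaaabbbbbcbcbcbcbcbaaaddd"),
})

chunks = split_contiguous(df, "type")
for chunk in chunks:
    print(chunk, end="\n\n")
```

Each element of chunks is an ordinary DataFrame that keeps its original index, so concatenating the list gives back the original frame. The shift and comparison are vectorised, so this should scale comfortably to tens of thousands of rows, though that has not been benchmarked here.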