You have an array uint32_t states[] where each state[i] represents 16 states?
To count the number of 0b11 states in the variable uint32_t s you can use the following approach:
First, pre-process s such that every state 0b11 leads to exactly one 1 bit and all other states lead to 0 bits. Then count the numbers of 1 bits.
Pre-Processing
Split s into the left bits l and right bits r of each state.
s AB CD EF GH IJ LM NO PQ RS TU VW XY ZΓ ΔΠ ΦΨ ДЖ
l = s & 0xAAAAAAAA = A0 C0 E0 G0 I0 L0 N0 P0 R0 T0 V0 X0 Z0 Δ0 Φ0 Д0
r = s & 0x55555555 = 0B 0D 0F 0H 0J 0M 0O 0Q 0S 0U 0W 0Y 0Γ 0Π 0Ψ 0Ж
Then align the bits of l and r.
(l >>> 1) = 0A 0C 0E 0G 0I 0L 0N 0P 0R 0T 0V 0X 0Z 0Δ 0Φ 0Д
r = 0B 0D 0F 0H 0J 0M 0O 0Q 0S 0U 0W 0Y 0Γ 0Π 0Ψ 0Ж
Finally, use & to get a 1-bit if and only if the state was 0b11.
(l >>> 1) & r = 0? 0? 0? 0? 0? 0? 0? 0? 0? 0? 0? 0? 0? 0? 0? 0?
Here ? is 1 if the corresponding state was 0b11 and 0 otherwise.
The same result can be achived by the simplified formula (s >>> 1) & s & 0x55555555.
Bit-Counting
To count the 0b11 states in s we only have to count the 1-bits in
(s >>> 1) & s & 0x55555555.
Bit-counting can be done without a loop as explained in the book Hacker's Delight, chapter 5 or in this Stackoverflow answer.
The methods shown here only apply to a single array element. To count the states in your whole array loop over its elements.
Optimization
As pointed out by Lundin in the comments, the operation (s >>> 1) might me expensive if your processors cannot fit uint32_t into its registers. In this case it would be sensible to declare your array states[] not as uint32_t but whatever works best on your processor – the procedure stays the same, you only have to use more or less 555…. If, for some reason you cannot change the type of your array, you can still access it as if it had another type, see how to cast an int array to a byte array in C.