I have two very large arrays X_pos and X_neg with shapes (900000, 100) and (1200000, 100). I'd like to find out whether any row of X_pos is also present in X_neg. If there is a match, I need a list of index pairs for the matching rows; afterwards I remove those rows so the two arrays share no common row. For now, I'm using for loops:
    import numpy as np

    list_conflict = []
    for i in range(len(X_pos)):
        for j in range(len(X_neg)):
            if (X_pos[i] == X_neg[j]).all():  # whole-row equality
                list_conflict.append([i, j])

    fault_index_pos = np.unique([x[0] for x in list_conflict])
    fault_index_neg = np.unique([x[1] for x in list_conflict])

    X_pos = np.delete(X_pos, fault_index_pos, axis=0)
    X_neg = np.delete(X_neg, fault_index_neg, axis=0)
The outer loop takes a row of X_pos and compares it exhaustively against every row of X_neg. If it finds a match, it appends the index pair to list_conflict, the first element being the X_pos position and the second the X_neg position. Then fault_index_pos and fault_index_neg are reduced to unique values, since a row of X_pos could match in multiple places of X_neg and the lists can contain repeated positions. Finally, the matching rows are removed with np.delete, using the fault_index lists as the indices to delete.
I'm looking for a faster approach to the conflict comparison, whether via sets, vectorization, or anything else.
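For context on why the pairwise comparison itself can be vectorized (though not scaled): the double loop can be collapsed into one broadcasted comparison. A minimal sketch with small illustrative arrays (not the real data); note that at the real sizes the intermediate (900000, 1200000, 100) comparison array would be far too large for memory, so this form only suits modest row counts:

```python
import numpy as np

# Small stand-in arrays, same idea as X_pos/X_neg but few rows
X_pos = np.array([[0, 1, 2, 3, 4],
                  [0, 1, 2, 3, 5],
                  [2, 4, 5, 6, 7]])
X_neg = np.array([[2, 4, 5, 6, 7],
                  [0, 1, 2, 3, 4],
                  [1, 9, 3, 2, 5]])

# (len(X_pos), len(X_neg)) boolean matrix: entry (i, j) is True
# iff row X_pos[i] equals row X_neg[j] element-wise.
matches = (X_pos[:, None, :] == X_neg[None, :, :]).all(axis=-1)

# Index pairs of matching rows, equivalent to the loop's list_conflict
fault_index_pos, fault_index_neg = np.nonzero(matches)
```

Here fault_index_pos comes out as [0, 2] and fault_index_neg as [1, 0], i.e. the pairs (0, 1) and (2, 0).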
P.S. Example arrays:

    X_pos
    array([[0, 1, 2, 3, 4],
           [0, 1, 2, 3, 5],
           [0, 1, 2, 3, 6],
           [2, 4, 5, 6, 7]])
    X_neg
    array([[2, 4, 5, 6, 7],
           [0, 1, 2, 3, 7],
           [0, 1, 2, 3, 4],
           [0, 1, 2, 3, 9],
           [1, 9, 3, 2, 5]])
It should return the indices of the rows that are duplicated in the other array, which for this example is list_conflict = [[0, 2], [3, 0]] (X_pos[0] equals X_neg[2], and X_pos[3] equals X_neg[0]). With these indices I should then be able to delete the matching rows from both arrays. A possible duplicate is the question I asked yesterday, but it oversimplifies the problem and fails to state it precisely; those answers do not solve this problem and had to be altered in some way to work.
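One scalable direction, sketched under the assumption that only exact row equality matters and both arrays share the same dtype: view each row as a single opaque void scalar so NumPy's sort-based set routines (np.isin) can operate on whole rows, avoiding the all-pairs comparison entirely. Using the example arrays above:

```python
import numpy as np

X_pos = np.array([[0, 1, 2, 3, 4], [0, 1, 2, 3, 5],
                  [0, 1, 2, 3, 6], [2, 4, 5, 6, 7]])
X_neg = np.array([[2, 4, 5, 6, 7], [0, 1, 2, 3, 7],
                  [0, 1, 2, 3, 4], [0, 1, 2, 3, 9],
                  [1, 9, 3, 2, 5]])

def row_view(a):
    """View each row as one void scalar (byte-wise), so whole rows
    can be compared and sorted like single elements."""
    a = np.ascontiguousarray(a)
    return a.view(np.dtype((np.void, a.dtype.itemsize * a.shape[1]))).ravel()

pos_v, neg_v = row_view(X_pos), row_view(X_neg)

# Boolean masks: which rows of each array also occur in the other
mask_pos = np.isin(pos_v, neg_v)
mask_neg = np.isin(neg_v, pos_v)

# Drop the conflicting rows from both arrays
X_pos_clean = X_pos[~mask_pos]
X_neg_clean = X_neg[~mask_neg]
```

On the example this flags rows 0 and 3 of X_pos and rows 0 and 2 of X_neg, matching list_conflict = [[0, 2], [3, 0]]. Since np.isin sorts internally, the cost is roughly O((n + m) log(n + m)) rather than the loop's O(n * m), which should be tractable at the 900k/1.2M scale.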