You could use the DateFrame.replace method to replace the values in df_p with values from df_f:
In [124]: df_pf = df_p.replace(df_f['c']); df_pf
Out[124]:
p1 p2 p3 p4 p5
2614 0.143359 0.143359 0.192049 0.192049 0.192049
3735 0.183946 0.725704 0.290311 0.183946 0.143359
1450 0.183946 0.192049 0.183946 0.192049 0.192049
Since Pandas aligns indices before multiplying two DataFrames, if we strip the
ps and cs off the column labels, then we could obtain the desired products
using df_pf.mul(df_c):
df_pf.columns = df_pf.columns.str.extract(r'(\d+)', expand=False)
df_c.columns = df_c.columns.str.extract(r'(\d+)', expand=False)
df_c = df_pf.mul(df_c)
The correct order for the columns for each row can be obtained using np.argsort with axis=1 specified. The order array returned by np.argsort can then be used to reorder df_c and df_p:
order = np.argsort(-df_c.values, axis=1)
nrows, ncols = df_c.shape
np_c = df_c.values[np.arange(nrows)[:,None], order]
np_p = df_p.values[np.arange(nrows)[:,None], order]
The above uses NumPy's advanced integer indexing to reorder the values in each row separately.
import numpy as np
import pandas as pd
df_p = pd.DataFrame({'p1': [104, 100, 100],
'p2': [104, 103, 102],
'p3': [102, 101, 100],
'p4': [102, 100, 102],
'p5': [102, 104, 102]}, index=[2614,3735,1450])
df_c = pd.DataFrame({'c1': [0.33829499999999996, 0.097799999999999998, 0.16092200000000001],
'c2': [0.190882, 0.124296, 0.40370300000000003],
'c3': [0.15723099999999998, 0.26847500000000002, 0.12239000000000001],
'c4': [0.13577600000000001, 0.26511099999999999, 0.13061199999999998],
'c5': [0.177816, 0.24431900000000001, 0.18237300000000001]}, index=[2614,3735,1450])
df_f = pd.DataFrame({'c': [0.183946,
0.29031099999999999,
0.192049,
0.72570400000000002,
0.14335899999999999]}, index=list(range(100,105)))
def using_pandas(df_p, df_c, df_f):
# this works no matter the order of the columns and rows of `df_p` and `df_c`.
# aligns `df_p` and `df_c` based on the numeric part of their column names
df_pf = df_p.replace(df_f['c'])
# change the column names to match since Pandas will align the indices before multiplying
df_pf.columns = df_pf.columns.str.extract(r'(\d+)', expand=False)
df_c.columns = df_c.columns.str.extract(r'(\d+)', expand=False)
df_c = df_pf.mul(df_c)
order = np.argsort(-df_c.values, axis=1)
nrows, ncols = df_c.shape
np_c = df_c.values[np.arange(nrows)[:,None], order]
np_p = df_p.values[np.arange(nrows)[:,None], order]
return np_c, np_p
np_c, np_p = using_pandas(df_p, df_c, df_f)
print(np_c)
print(np_p)
yields
[[ 0.04849763 0.03414938 0.03019606 0.02736465 0.02607565]
[ 0.0902021 0.07794125 0.04876611 0.03502533 0.01798992]
[ 0.07753076 0.03502455 0.02960096 0.0250839 0.02251315]]
[[104 102 102 104 102]
[103 101 100 104 100]
[102 102 100 102 100]]
Alternatively, if the columns and rows of df_p and df_c are already aligned,
then you can gain a bit more speed by doing the multiplication in NumPy instead of Pandas:
def using_numpy(df_p, df_c, df_f):
# faster than using_pandas, but assumes `df_p` and `df_c` are already aligned
df_pf = df_p.replace(df_f['c'])
df_pf = df_pf.values
df_c = df_c.values
df_p = df_p.values
df_c = df_pf * df_c
order = np.argsort(-df_c, axis=1)
nrows, ncols = df_c.shape
np_c = df_c[np.arange(nrows)[:,None], order]
np_p = df_p[np.arange(nrows)[:,None], order]
return np_c, np_p
For these small DataFrames, using_numpy is slightly faster than using_pandas.
The difference in speed would be more pronounced if the DataFrames were larger.
But again, note that using_numpy relies on the indices already being aligned.
In [138]: %timeit using_numpy(df_p, df_c, df_f)
1000 loops, best of 3: 1.15 ms per loop
In [139]: %timeit using_pandas(df_p, df_c, df_f)
1000 loops, best of 3: 1.62 ms per loop