Consider this DataFrame as a minimal reproducible example (MRE):
>>> df
                           image_id          class_name
0  47ed17dcb2cbeec15182ed335a8b5a9e         Nodule/Mass  # <- dup 1
1  47ed17dcb2cbeec15182ed335a8b5a9e  Aortic enlargement  # <- dup 1
2  47ed17dcb2cbeec15182ed335a8b5a9e  Pulmonary fibrosis  # <- dup 1
3  7c1add6833d5f0102b0d3619a1682a64        Lung Opacity  # <- dup 2
4  7c1add6833d5f0102b0d3619a1682a64  Pulmonary fibrosis  # <- dup 2
5  5550a493b1c4554da469a072fdfab974          No finding  # <- dup 3
6  5550a493b1c4554da469a072fdfab974          No finding  # <- dup 3
To get the expected outcome, group the rows by image_id and join the class_name values of each group, separated by ' | ':
>>> df.groupby('image_id')['class_name'].apply(lambda x: ' | '.join(sorted(set(x))))
image_id
47ed17dcb2cbeec15182ed335a8b5a9e    Aortic enlargement | Nodule/Mass | Pulmonary f...
5550a493b1c4554da469a072fdfab974                                           No finding
7c1add6833d5f0102b0d3619a1682a64                    Lung Opacity | Pulmonary fibrosis
set removes duplicate class_name values within the same image_id, and sorted puts the remaining values in lexicographical order.
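If you want to check this end to end, here is a self-contained sketch that rebuilds the MRE above and runs the same groupby. The variable name out is only illustrative, and to_dict() is used just to show the full strings without the truncation of the Series repr.

```python
import pandas as pd

# Rebuild the MRE: three image_ids, each appearing on several rows.
df = pd.DataFrame({
    "image_id": [
        "47ed17dcb2cbeec15182ed335a8b5a9e",
        "47ed17dcb2cbeec15182ed335a8b5a9e",
        "47ed17dcb2cbeec15182ed335a8b5a9e",
        "7c1add6833d5f0102b0d3619a1682a64",
        "7c1add6833d5f0102b0d3619a1682a64",
        "5550a493b1c4554da469a072fdfab974",
        "5550a493b1c4554da469a072fdfab974",
    ],
    "class_name": [
        "Nodule/Mass",
        "Aortic enlargement",
        "Pulmonary fibrosis",
        "Lung Opacity",
        "Pulmonary fibrosis",
        "No finding",
        "No finding",
    ],
})

# One row per image_id: unique class names, sorted, joined with ' | '.
out = df.groupby("image_id")["class_name"].apply(
    lambda x: " | ".join(sorted(set(x)))
)

# Print as a dict so long strings are not truncated.
print(out.to_dict())
```

Note that the two 'No finding' rows collapse into a single value because of set.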
Update
You can use a MultiIndex to display your duplicated rows correctly. Try:
>>> df.set_index(['image_id', 'class_name']).sort_index()
class_id rad_id x_min y_min x_max y_max width height
image_id class_name
000434271f63a053c4128a0ba6352c7f No finding 14 R6 NaN NaN NaN NaN 2336 2836
No finding 14 R2 NaN NaN NaN NaN 2336 2836
No finding 14 R3 NaN NaN NaN NaN 2336 2836
00053190460d56c53cc3e57321387478 No finding 14 R11 NaN NaN NaN NaN 1994 2430
No finding 14 R2 NaN NaN NaN NaN 1994 2430
... ... ... ... ... ... ... ... ...
fff0f82159f9083f3dd1f8967fc54f6a No finding 14 R9 NaN NaN NaN NaN 2048 2500
No finding 14 R14 NaN NaN NaN NaN 2048 2500
fff2025e3c1d6970a8a6ee0404ac6940 No finding 14 R1 NaN NaN NaN NaN 1994 2150
No finding 14 R5 NaN NaN NaN NaN 1994 2150
No finding 14 R2 NaN NaN NaN NaN 1994 2150
[67914 rows x 8 columns]
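To try the MultiIndex approach without the full dataset, here is a minimal sketch on a toy frame. It assumes only a few of the columns shown above (class_id and rad_id), uses five rows copied from that output, and the name indexed is just illustrative.

```python
import pandas as pd

# Toy subset of the rows shown above (only some columns kept).
df = pd.DataFrame({
    "image_id": [
        "00053190460d56c53cc3e57321387478",
        "000434271f63a053c4128a0ba6352c7f",
        "000434271f63a053c4128a0ba6352c7f",
        "00053190460d56c53cc3e57321387478",
        "000434271f63a053c4128a0ba6352c7f",
    ],
    "class_name": ["No finding"] * 5,
    "class_id": [14] * 5,
    "rad_id": ["R11", "R6", "R2", "R2", "R3"],
})

# Move both key columns into the index and sort it, so that
# rows sharing the same (image_id, class_name) are displayed together.
indexed = df.set_index(["image_id", "class_name"]).sort_index()
print(indexed)

# All rows for one (image_id, class_name) pair.
pair = ("000434271f63a053c4128a0ba6352c7f", "No finding")
print(indexed.loc[[pair]])
```

Sorting the index is what groups the duplicates visually; set_index alone keeps the original row order.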