I'm trying to figure out how to split the name from the surname into two new dataframe columns, without losing information.
The name is always in UPPERCASE, whilst the surname is in title case.
There are a number of Stack Overflow questions on this, but I'm not sure how to apply them to a pandas dataframe column:
- Regex to match only uppercase “words” with some exceptions
- How to extract all UPPER from a string? Python
For example:
import pandas as pd

data = {'Naam aanvrager': ['DREGGHE Joannes', 'MAHIEU Leo', 'NIEUWENHUIJSE', 'COPPENS', 'VERBURGHT Cornelis', 'NUYTTENS Adriaen', 'DE LARUELLE Pieter', 'VAN VIJVER', 'SILBO Martinus', 'STEEMAERE Anthone']}
df = pd.DataFrame(data)
Naam aanvrager
0 DREGGHE Joannes
1 MAHIEU Leo
2 NIEUWENHUIJSE
3 COPPENS
4 VERBURGHT Cornelis
5 NUYTTENS Adriaen
6 DE LARUELLE Pieter
7 VAN VIJVER
8 SILBO Martinus
9 STEEMAERE Anthone
The wanted output (two extra columns, "name" and "surname"):
| name | surname |
|---|---|
| DREGGHE | Joannes |
| MAHIEU | Leo |
| NIEUWENHUIJSE | |
| COPPENS | |
| VERBURGHT | Cornelis |
| NUYTTENS | Adriaen |
| DE LARUELLE | Pieter |
| VAN VIJVER | |
| SILBO | Martinus |
| STEEMAERE | Anthone |
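
For reference, here is a rough sketch of the kind of thing I'm after, using `Series.str.extract` with named groups. The pattern is my own guess, not taken from the linked questions, and it assumes the uppercase part only contains the ASCII letters A-Z separated by single spaces (so accented capitals or hyphenated names would need a wider character class):

```python
import pandas as pd

data = {'Naam aanvrager': ['DREGGHE Joannes', 'MAHIEU Leo', 'NIEUWENHUIJSE', 'COPPENS', 'VERBURGHT Cornelis', 'NUYTTENS Adriaen', 'DE LARUELLE Pieter', 'VAN VIJVER', 'SILBO Martinus', 'STEEMAERE Anthone']}
df = pd.DataFrame(data)

# name:    one or more all-uppercase words (assumption: ASCII A-Z only)
# surname: whatever follows the uppercase words, if anything
pattern = r'^(?P<name>[A-Z]+(?: [A-Z]+)*)(?: (?P<surname>.+))?$'

# str.extract returns one column per named group; unmatched groups are NaN
parts = df['Naam aanvrager'].str.extract(pattern)

# Replace NaN with empty strings and attach the two new columns
df = df.join(parts.fillna(''))
print(df)
```

The original column is left untouched, so nothing is lost if a row does not fit the pattern; such a row would simply get empty strings in both new columns.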