Context:
I have a df like this:
| title | text |
|---|---|
| Donald Trump Sends Out $15B | Donald Trump just couldn't wish all Americans |
| Drunk Bragging Trump Staffer Started | House Intelligence Committee Chairman Devin |
| ... | ... |
Both title and text are of object datatype.
I am trying to run the following code:

    for i in range(0, len(msg)):
        review = re.sub('[^a-zA-Z]', ' ', df['title'][i])
        review = review.lower()
        review = review.split()
        review = [ps.stem(word) for word in review if not word in stopwords.words('english')]
        review = ' '.join(review)
        corpus.append(review)

Error:
However, I am getting the following error on the re.sub line:
TypeError: expected string or bytes-like object
I referred to this question, but made no progress. I am still getting the same error.
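
One way to narrow this down, assuming the error comes from a non-string entry such as a missing value in df['title'] (an object column can hold any Python object, not only strings). The sample frame and its NaN row below are hypothetical stand-ins for the real data:

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample standing in for the real df; the NaN row is an assumption.
df = pd.DataFrame(
    {
        "title": [
            "Donald Trump Sends Out $15B",
            np.nan,
            "Drunk Bragging Trump Staffer Started",
        ]
    }
)

# Count the Python types actually stored in the object column.
type_counts = df["title"].map(type).value_counts()
print(type_counts)

# Rows whose title is not a str are the ones re.sub would reject.
is_string = df["title"].map(lambda value: isinstance(value, str))
non_string_rows = df[~is_string]
print(non_string_rows)

# Passing the NaN entry (a float) to re.sub raises the TypeError.
try:
    re.sub("[^a-zA-Z]", " ", df["title"][1])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

If non_string_rows is not empty on the real data, that would also be consistent with astype('string') not helping, since missing values stay missing (as pd.NA) after that conversion rather than becoming strings.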
Desired output:
>code: corpus[0:2]
>Result: ['donald trump send b', 'drunk brag trump staffer start']
What I tried?
I tried all the possibilities from the above SO link. I also tried changing the datatype of the column with df['title'] = df['title'].astype('string'). I am still getting the same error :(
Additional info:
- If I use different code to replace the non-alphabetic characters and run it, I get AttributeError: 'Series' object has no attribute 'lower' on the lower() line.
- I have a different df in a different notebook. There this code works perfectly (the columns are also of object datatype).
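
For reference, the AttributeError above is what plain string methods give when called on a whole Series; the .str accessor is the Series-level equivalent. A minimal sketch of the cleaning steps done that way, assuming missing titles can be treated as empty strings (stemming and stopword removal are left out, and the sample data is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the NaN row stands in for a possible missing title.
df = pd.DataFrame({"title": ["Donald Trump Sends Out $15B", np.nan]})

cleaned = (
    df["title"]
    .fillna("")                                  # assumption: missing -> empty string
    .str.replace("[^a-zA-Z]", " ", regex=True)   # same pattern as the re.sub call
    .str.lower()
    .str.split()
)
print(cleaned.tolist())
```

Each element of cleaned is a list of lowercase tokens, so the stemming step from the loop could then be applied per row.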
Any help would be appreciated!