If s=url['title'] makes s equal to this:
In [48]: s=u'Oscar Winners Best Pictures Box Set \xc2\xa36.49'
Then the problem is either
1. in the code that defines url, or
2. in the content from the web, which is malformed.
If Case 1, we'd need to see the code that defines url.
If Case 2, a quick-and-dirty workaround would be to encode the unicode object s with the raw-unicode-escape codec:
In [49]: print(s)
Oscar Winners Best Pictures Box Set Â£6.49
In [50]: print(s.encode('raw-unicode-escape'))
Oscar Winners Best Pictures Box Set £6.49
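Note that the encode step returns a byte string. If a proper unicode object is wanted instead, one possible follow-up is to decode those bytes as UTF-8. Here is a minimal sketch, assuming every character in s is below U+0100. The helper name fix_mojibake is made up for illustration, and latin-1 is used in place of raw-unicode-escape: the two produce the same bytes for such characters, but latin-1 raises an error on anything higher instead of emitting \uXXXX escapes.

```python
def fix_mojibake(text):
    """Undo the 'UTF-8 bytes stored as unicode code points' mistake.

    Hypothetical helper: returns the text unchanged if it does not
    look like mis-decoded UTF-8.
    """
    try:
        # Map each code point back to a single byte, then decode properly.
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a character above U+00FF, or the bytes are not valid UTF-8.
        return text


s = u'Oscar Winners Best Pictures Box Set \xc2\xa36.49'
fixed = fix_mojibake(s)
print(repr(fixed))
```

The try/except is there so that titles which were decoded correctly in the first place pass through untouched.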
See also this SO question.
Regarding titles like s=u'Star Trek XI &#xA3;3.99': Again, it would be nice to fix the problem before it gets to this stage, perhaps by looking at how url is defined. But assuming the content from the web is malformed, a workaround would be:
In [86]: import re
In [87]: print(re.sub(r'&#x([a-fA-F\d]+);',lambda m: unichr(int(m.group(1),base=16)),s))
Star Trek XI £3.99
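The regex above only handles hexadecimal character references. For comparison, here is a sketch of the same substitution on Python 3, where unichr is spelled chr, next to the standard library's html.unescape, which also covers decimal and named references. This assumes Python 3.4 or later and that the title contains the literal text &#xA3;.

```python
import re
from html import unescape  # Python 3.4+

s = u'Star Trek XI &#xA3;3.99'

# Same approach as above: replace each &#x...; with the character
# whose code point is the captured hex number.
by_regex = re.sub(r'&#x([a-fA-F\d]+);',
                  lambda m: chr(int(m.group(1), 16)),
                  s)

# Standard-library alternative that also understands &#163; and &pound;.
by_stdlib = unescape(s)

print(by_regex == by_stdlib)
```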
A little bit of explanation:
Note that
In [51]: x=u'£'
In [53]: x.encode('utf-8')
Out[53]: '\xc2\xa3'
So the unicode object u'£', encoded with the utf-8 codec, becomes the string object '\xc2\xa3'.
Somehow, url['title'] is getting defined to be the unicode object u'\xc2\xa3'. (The u makes a big difference!)
Thus we have u'\xc2\xa3' when we desire '\xc2\xa3'.
Encoding the unicode object u'\xc2\xa3' with the raw-unicode-escape codec transforms it to '\xc2\xa3'.
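To make the mechanism concrete, here is a small sketch of one plausible way such a value can arise: UTF-8 bytes decoded with a single-byte codec such as latin-1. That this is what happens upstream of url['title'] is an assumption, not something the question confirms. The last lines show raw-unicode-escape reversing the damage.

```python
pound = u'\xa3'                       # the pound sign, U+00A3

utf8_bytes = pound.encode('utf-8')    # two bytes: 0xC2 0xA3

# Assumed failure mode: the bytes are decoded one byte per character,
# giving two code points (U+00C2, U+00A3) instead of one.
mojibake = utf8_bytes.decode('latin-1')

# raw-unicode-escape maps each code point below U+0100 back to one byte,
# recovering the original UTF-8 byte string.
recovered = mojibake.encode('raw-unicode-escape')

print(repr(mojibake), recovered == utf8_bytes)
```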