Is there a way to check whether a Chinese character is Simplified Chinese or Traditional Chinese in Python 3?
5
- http://cjklib.org/0.3/library/cjklib.characterlookup.html seems to hold some promise but I'm not competent to write a useful answer from that. – tripleee Sep 12 '15 at 18:16
- related: [What's the complete range for Chinese characters in Unicode?](http://stackoverflow.com/q/1366068/4279) – jfs Sep 13 '15 at 16:14
2 Answers
6
cjklib does not support Python 3. In Python 3, you can use hanzidentifier.

    import hanzidentifier

    print(hanzidentifier.has_chinese('Hello my name is John.'))
    # False

    print(hanzidentifier.has_chinese('Country in Simplified: 国家. Country in Traditional: 國家.'))
    # True

    print(hanzidentifier.is_simplified('John说:你好!'))
    # True

    print(hanzidentifier.is_traditional('John說:你好!'))
    # True

Blckknght
Hong Zher Tan
1
You can use getCharacterVariants() in cjklib to query the character's simplified (S) and traditional (T) variants. As described in the Unihan database documentation, you can use this data to determine the classification for a character.
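To make the classification rule concrete, here is a minimal sketch that does not call cjklib at all. It assumes the variant data has already been extracted into two plain dictionaries; the table contents and the names `TRADITIONAL_VARIANTS`, `SIMPLIFIED_VARIANTS`, and `classify` are hypothetical and only stand in for what a real lookup (cjklib or the Unihan variant fields) would return. It also assumes the input is already known to be a Chinese character.

```python
# HYPOTHETICAL hand-written excerpt of variant data, not loaded from Unihan.
# simplified character -> its traditional form(s)
TRADITIONAL_VARIANTS = {"国": {"國"}, "说": {"說"}}
# traditional character -> its simplified form(s)
SIMPLIFIED_VARIANTS = {"國": {"国"}, "說": {"说"}}


def classify(char):
    """Return 'simplified', 'traditional', or 'both' for one Chinese character.

    Assumption: a character that has a distinct traditional variant is a
    simplified form, and a character that has a distinct simplified variant
    is a traditional form. Anything else is treated as valid in both scripts.
    """
    if len(char) != 1:
        raise ValueError("expected exactly one character")

    # Ignore a character that is listed as its own variant.
    has_traditional = bool(TRADITIONAL_VARIANTS.get(char, set()) - {char})
    has_simplified = bool(SIMPLIFIED_VARIANTS.get(char, set()) - {char})

    if has_traditional and not has_simplified:
        return "simplified"
    if has_simplified and not has_traditional:
        return "traditional"
    return "both"
```

With this toy table, `classify("国")` gives `'simplified'`, `classify("國")` gives `'traditional'`, and a character with no entry such as `"好"` gives `'both'`. A real implementation would fill the two dictionaries from the full variant data rather than from a hand-written excerpt.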
一二三