Counting syllables

Question

I'm looking to assign some different readability scores to text in R, such as the Flesch-Kincaid.

Does anyone know of a way to segment words into syllables using R? I don't necessarily need the syllable segments themselves but a count.

So, for instance:

x <- c('dog', 'cat', 'pony', 'cracker', 'shoe', 'Popsicle')

would yield: 1, 1, 2, 2, 1, 3

Each number corresponds to the number of syllables in the word.

Tyler Rinker · Answer 1 · 2014-02-26T23:28:17.657

12

qdap version 1.1.0 does this task:

library(qdap)
x <- c('dog', 'cat', 'pony', 'cracker', 'shoe', 'Popsicle')
syllable_sum(x)

## [1] 1 1 2 2 1 3

edited Feb 26 '14 at 23:28

answered Jan 11 '13 at 06:13

Tyler Rinker

kfmfe04 · Answer 2 · 2011-12-18T17:52:36.643

8

gsk3 is correct: if you want a correct solution, it is non-trivial.

For example, you have to watch out for strange things like a silent e at the end of a word (e.g. pane), or know when it's not silent, as in finale.

However, if you just want a quick-and-dirty approximation, this will do it:

> nchar( gsub( "[^X]", "", gsub( "[aeiouy]+", "X", tolower( x ))))
[1] 1 1 2 2 1 3

To understand how the parts work, strip away the function calls from the outside in, starting with nchar and then gsub, and so on, until the expression makes sense to you.
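The same one-liner can also be read from the inside out by naming each intermediate result. This is just the expression above unrolled; the names step1 to step3 and counts are only for illustration.

```r
x <- c('dog', 'cat', 'pony', 'cracker', 'shoe', 'Popsicle')

# Step 1: lower-case, so the vowel class only needs lower-case letters
step1 <- tolower(x)

# Step 2: collapse every run of consecutive vowels (y included) into one "X"
step2 <- gsub("[aeiouy]+", "X", step1)
# step2 is now: "dXg" "cXt" "pXnX" "crXckXr" "shX" "pXpsXclX"

# Step 3: delete every character that is not an "X"
step3 <- gsub("[^X]", "", step2)

# Step 4: the number of X's left is the approximate syllable count
counts <- nchar(step3)
counts
## [1] 1 1 2 2 1 3
```

Note that "shoe" comes out right only because "oe" is one vowel run, and "Popsicle" only because its final e happens to be voiced.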

But my guess is that, pitting R's power against the profusion of exceptions in the English language, you could get a decent answer (maybe 99% right?) parsing through normal text without a lot of work. Heck, the simple parser above may get 90%+ right. With a little more work, you could deal with silent e's if you like.

It all depends on your application - whether this is good enough or you need something more accurate.
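As a rough sketch of what dealing with silent e's might look like: the rule below (a final e after a consonant is silent, unless the word ends in consonant + "le") is my own assumption for illustration, not a complete treatment, and count_syllables is a made-up name.

```r
count_syllables <- function(words) {
  w <- tolower(words)

  # Base approximation: one syllable per run of consecutive vowels
  n <- nchar(gsub("[^X]", "", gsub("[aeiouy]+", "X", w)))

  # Assumed heuristic: a final "e" after a consonant is silent (pane),
  # except in consonant + "le" endings, where it is voiced (Popsicle)
  silent_e <- grepl("[^aeiouy]e$", w) & !grepl("[^aeiouy]le$", w)
  n <- n - silent_e

  # Never report fewer than one syllable (e.g. "the")
  pmax(n, 1L)
}

x <- c('dog', 'cat', 'pony', 'cracker', 'shoe', 'Popsicle')
count_syllables(x)
## [1] 1 1 2 2 1 3

count_syllables(c("pane", "finale"))
## [1] 1 2
```

"pane" is now right, but "finale" drops from 3 to 2, so the heuristic still gets it wrong. That is exactly the kind of exception that needs a dictionary.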

edited Dec 18 '11 at 17:52

answered Dec 18 '11 at 17:39

kfmfe04

ty - gotta love regular-expressions 8^) – kfmfe04 Dec 19 '11 at 02:27
A more efficient & simpler version of the same approximation would be something like `sapply(gregexpr("[aeiouy]+", x, ignore.case=TRUE), length)`. – Ken Williams Dec 19 '11 at 06:02
@kfmfe04 I have actually used your base and added some mods and am at about a 95% accuracy rate. I'm searching now for a dictionary to run before the algorithm (as was the suggestion in the link provided by gsk3). If I could mark both answers correct I would, but alas I cannot. Thank you for your thoughtful response. – Tyler Rinker Dec 20 '11 at 02:42

score 5 · Accepted Answer · edited May 23 '17 at 11:46

Some tools for NLP are available here:

http://cran.r-project.org/web/views/NaturalLanguageProcessing.html

The task is non-trivial though. More hints (including an algorithm you could implement) here:

Detecting syllables in a word

edited May 23 '17 at 11:46

Community

answered Dec 18 '11 at 12:33

Ari B. Friedman

score 4 · Answer 4 · answered May 02 '12 at 18:38

The koRpus package will help you out immensely, but it's a little difficult to work with.

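stopifnot(require(koRpus))
# `text` is assumed to be a character vector holding the text to score
# (newer koRpus releases may also need the koRpus.lang.en package for lang='en')
tokens <- tokenize(text, format="obj", lang='en')
flesch.kincaid(tokens)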
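For reference, the Flesch-Kincaid grade level is 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. Here is a minimal base R sketch of that formula. It assumes naive sentence and word splitting and the quick vowel-run syllable approximation, so it is not what koRpus does internally and its numbers will differ from flesch.kincaid(). The name fk_grade is made up for this example.

```r
fk_grade <- function(txt) {
  # Naive sentence split on ., ! or ? (abbreviations will be mis-split)
  sentences <- unlist(strsplit(txt, "[.!?]+"))
  sentences <- sentences[grepl("[[:alpha:]]", sentences)]

  # Naive word extraction: runs of letters and apostrophes
  words <- unlist(regmatches(txt, gregexpr("[[:alpha:]']+", txt)))

  # Approximate syllables as runs of vowels, with at least one per word
  syl <- nchar(gsub("[^X]", "", gsub("[aeiouy]+", "X", tolower(words))))
  syl <- pmax(syl, 1L)

  # Flesch-Kincaid grade level
  0.39 * length(words) / length(sentences) +
    11.8 * sum(syl) / length(words) - 15.59
}

fk_grade("The cat sat on the mat. The dog ran.")
```

Very simple text like this scores below grade 0, which is expected from the formula.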

answered May 02 '12 at 18:38

Zach

I now have a function to count syllables very accurately and to do flesch.kincaid. I plan on releasing it some time this summer. – Tyler Rinker May 02 '12 at 18:40
@Tyler Rinker That's awesome! Post a comment back here when it's out. How fast is your function? – Zach May 02 '12 at 18:41
I did benchmarking at the time (got a lot of help using hash tables from talkstats.com people) but can't remember offhand. Let's just say that it's as fast as online syllable counters and more accurate. I use a combined dictionary/algorithm approach. The hash table makes it fly. – Tyler Rinker May 02 '12 at 19:19
On github: `# install.packages("devtools"); library(devtools); install_github("qdap", "trinker")` – Tyler Rinker Jul 11 '12 at 05:20
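
To illustrate the combined dictionary/algorithm idea mentioned in the comments, here is a minimal sketch. It is not qdap's actual implementation. It assumes a tiny hypothetical exception dictionary stored in a hashed environment, with the vowel-run regex as the fallback for words not in it. syl_dict, approx_syllables and lookup_syllables are made-up names, and words are assumed to be non-empty strings.

```r
# Hypothetical exception dictionary; a real one would hold thousands of words
syl_dict <- new.env(hash = TRUE)
assign("pane", 1L, envir = syl_dict)
assign("finale", 3L, envir = syl_dict)

# Fallback: the quick vowel-run approximation
approx_syllables <- function(w) {
  nchar(gsub("[^X]", "", gsub("[aeiouy]+", "X", w)))
}

lookup_syllables <- function(words) {
  w <- tolower(words)
  vapply(w, function(word) {
    if (exists(word, envir = syl_dict, inherits = FALSE)) {
      # Dictionary hit: trust the stored count
      get(word, envir = syl_dict, inherits = FALSE)
    } else {
      # Dictionary miss: fall back to the regex estimate
      approx_syllables(word)
    }
  }, integer(1), USE.NAMES = FALSE)
}

lookup_syllables(c("dog", "pane", "finale", "Popsicle"))
## [1] 1 1 3 3
```

The environment lookup is a hash lookup, so it should stay fast as the dictionary grows.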
