Channel: n-grams in python, four, five, six grams? - Stack Overflow

Answer by ΔημητρηςΠαππάς for n-grams in python, four, five, six grams?

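Using only NLTK tools:

from nltk.tokenize import word_tokenize
from nltk.util import ngrams

def get_ngrams(text, n):
    # Tokenize the text, then slide a window of n tokens across it.
    n_grams = ngrams(word_tokenize(text), n)
    # Join each tuple of tokens with spaces to get one string per n-gram.
    return [' '.join(grams) for grams in n_grams]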

Example output

get_ngrams('This is the simplest text i could think of', 3)
['This is the', 'is the simplest', 'the simplest text', 'simplest text i', 'text i could', 'i could think', 'could think of']

To keep each n-gram as a tuple of tokens instead of a single string, remove the join call and return list(n_grams).
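To show what the unjoined format looks like for a larger n, here is a minimal dependency-free sketch. It is a stand-in for the NLTK calls, not NLTK itself: ngram_tuples is a hypothetical helper name, and it splits on whitespace with str.split instead of word_tokenize, so punctuation stays attached to words. The result has the same shape that nltk.util.ngrams produces, one tuple of tokens per n-gram.

```python
def ngram_tuples(tokens, n):
    """Return every run of n consecutive tokens as a tuple.

    Hypothetical helper for illustration; it mirrors the shape of the
    output of nltk.util.ngrams without importing NLTK.
    """
    # Zip n staggered slices of the token list. zip stops at the shortest
    # slice, so incomplete windows at the end are dropped.
    return list(zip(*(tokens[i:] for i in range(n))))


# Assumption: plain whitespace splitting instead of word_tokenize.
tokens = 'This is the simplest text i could think of'.split()

four_grams = ngram_tuples(tokens, 4)
print(four_grams[0])    # ('This', 'is', 'the', 'simplest')
print(len(four_grams))  # 6
```

Joining each tuple with ' '.join(...) gives back the string form shown in the example output, and passing 5 or 6 as n works the same way.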
