Great native Python answers given by other users. But here's the nltk
approach (just in case the OP gets penalized for reinventing what already exists in the nltk
library).
There is an ngrams function in nltk that people seldom use.
It's not because ngrams are hard to read, but because training a model based on ngrams where n > 3
will result in a lot of data sparsity.
from nltk import ngrams

sentence = 'this is a foo bar sentences and I want to ngramize it'
n = 6
sixgrams = ngrams(sentence.split(), n)

for grams in sixgrams:
    print(grams)
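For comparison with the native-Python answers mentioned above, the same sliding window can be sketched in plain Python with zip and no nltk dependency (ngrams_py is a hypothetical helper name, not part of nltk):

```python
def ngrams_py(tokens, n):
    # Zip the token list against n-1 progressively shifted copies of itself;
    # zip stops at the shortest copy, which ends the window at the last full ngram.
    return zip(*(tokens[i:] for i in range(n)))

sentence = 'this is a foo bar sentences and I want to ngramize it'
for grams in ngrams_py(sentence.split(), 6):
    print(grams)
```

Like nltk's ngrams, this returns a lazy iterator of tuples, so you can loop over it once or wrap it in list() to materialize all the ngrams.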