Using only NLTK tools:
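    from nltk.tokenize import word_tokenize
    from nltk.util import ngrams

    def get_ngrams(text, n):
        # word_tokenize needs NLTK's Punkt data: nltk.download('punkt'), or 'punkt_tab' on recent NLTK
        n_grams = ngrams(word_tokenize(text), n)
        # Join each n-gram tuple back into a space-separated string
        return [' '.join(grams) for grams in n_grams]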
Example output:

    >>> get_ngrams('This is the simplest text i could think of', 3)
    ['This is the', 'is the simplest', 'the simplest text', 'simplest text i', 'text i could', 'i could think', 'could think of']
To keep each n-gram as a tuple of tokens (the form `nltk.util.ngrams` yields) instead of a joined string, drop the join call and just return `list(n_grams)`.
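To show what the n-gram step does, here is a minimal stdlib-only sketch of the same sliding window. It is an illustration, not NLTK's implementation: `sliding_ngrams` is a hypothetical helper name, and `str.split` stands in for `word_tokenize`, so punctuation is not split off the way NLTK would split it.

```python
def sliding_ngrams(tokens, n):
    # Hypothetical helper, not part of NLTK: make n copies of the token list,
    # the k-th shifted left by k, and zip them; zip stops at the shortest copy,
    # so each output tuple is one window of n consecutive tokens.
    return list(zip(*(tokens[k:] for k in range(n))))

# Assumption: whitespace splitting is enough here (the sentence has no punctuation).
tokens = 'This is the simplest text i could think of'.split()

trigram_tuples = sliding_ngrams(tokens, 3)                         # tuple form
trigram_strings = [' '.join(gram) for gram in trigram_tuples]      # joined form

print(trigram_tuples[:2])   # [('This', 'is', 'the'), ('is', 'the', 'simplest')]
print(trigram_strings[:2])  # ['This is the', 'is the simplest']
```

If `n` is larger than the number of tokens, the shortest copy is empty and the result is an empty list.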