Channel: n-grams in python, four, five, six grams? - Stack Overflow
Browsing all 18 articles

Answer by Mat for n-grams in python, four, five, six grams?

It's an old question, but if you want to actually get the n-grams as a list of substrings (not as list of lists or tuples) and don't want to import anything, the following code works just fine and is...
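
The code in this excerpt is cut off, so here is only a minimal sketch of the idea it describes (n-grams as plain strings, no imports). It is not Mat's original code: the name `ngram_strings` and the whitespace tokenization are assumptions made for illustration.

```python
def ngram_strings(text, n):
    """Return the word n-grams of `text` as a list of space-joined strings."""
    words = text.split()  # assumption: tokens are separated by whitespace
    # One slice per start position; the range is empty if there are fewer than n words.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(ngram_strings("I really like python", 2))
```

Because each n-gram is joined back into one string, the result can be used directly as dictionary keys or written out line by line.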



Answer by 柯鴻儀 for n-grams in python, four, five, six grams?

It's quite easy to do n-grams in Python, for example:

    def n_gram(list, n):
        return [list[i:i+n] for i in range(len(list)-n+1)]

and if you do:

    str = "I really like python, it's pretty...


Answer by James McGuigan for n-grams in python, four, five, six grams?

If you want a pure iterator solution for large strings with constant memory usage:

    from typing import Iterable
    import itertools

    def ngrams_iter(input: str, ngram_size: int, token_regex=r"[^\s]+") ->...
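
The body of that function is truncated in this excerpt, so the following is only a rough sketch of how a lazy, regex-based n-gram iterator could look. It is not James McGuigan's implementation: the name `iter_ngrams` and the use of `re.finditer` with a bounded `collections.deque` are assumptions. Only the default token pattern is taken from the excerpt.

```python
import re
from collections import deque
from typing import Iterator, Tuple


def iter_ngrams(text: str, n: int, token_regex: str = r"[^\s]+") -> Iterator[Tuple[str, ...]]:
    """Lazily yield word n-grams, keeping only the last n tokens in memory."""
    if n < 1:
        raise ValueError("n must be at least 1")
    window = deque(maxlen=n)  # the oldest token falls out automatically
    for match in re.finditer(token_regex, text):  # finditer produces matches one at a time
        window.append(match.group())
        if len(window) == n:
            yield tuple(window)


for gram in iter_ngrams("I really like python", 3):
    print(gram)
```

Apart from the input string itself, the extra memory is bounded by the n tokens held in the window, regardless of how long the text is.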


Answer by inspectorG4dget for n-grams in python, four, five, six grams?

After about seven years, here's a more elegant answer using collections.deque:

    def ngrams(words, n):
        d = collections.deque(maxlen=n)
        d.extend(words[:n])
        words = words[n:]
        for window, word in...
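
Since the loop is cut off above, here is a self-contained sketch of the same sliding-window idea with `collections.deque`. It is a variation written for illustration, not the answer's exact code: the name `deque_ngrams` and the choice to accept any iterable (rather than a sliceable list) are assumptions.

```python
from collections import deque
from itertools import islice


def deque_ngrams(words, n):
    """Yield n-grams from any iterable of tokens using a bounded deque."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(words)
    window = deque(islice(it, n), maxlen=n)  # prime the window with the first n tokens
    if len(window) == n:
        yield tuple(window)
    for word in it:
        window.append(word)  # maxlen=n drops the oldest token on append
        yield tuple(window)


print(list(deque_ngrams("I really like python".split(), 3)))
```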


Answer by Ritesh Bhat for n-grams in python, four, five, six grams?

People have already answered pretty nicely for the scenario where you need bigrams or trigrams, but if you need everygrams for the sentence, you can use nltk.util.everygrams:

    >>> from...



Answer by Joe Zhow for n-grams in python, four, five, six grams?

You can get all 4- to 6-grams without any other package using the code below:

    from itertools import chain

    def get_m_2_ngrams(input_list, min, max):
        for s in chain(*[get_ngrams(input_list, k) for k in range(min,...
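
The excerpt stops before the helper it calls is shown, so this is a hedged, self-contained sketch of collecting every n-gram in a size range with `itertools.chain`. The names `ngrams_of_size` and `ngrams_in_range` are hypothetical stand-ins, not the functions from the answer.

```python
from itertools import chain


def ngrams_of_size(tokens, n):
    """Return all n-grams of exactly size n as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngrams_in_range(tokens, min_n, max_n):
    """Yield every n-gram with min_n <= n <= max_n, smallest n first."""
    return chain.from_iterable(
        ngrams_of_size(tokens, n) for n in range(min_n, max_n + 1)
    )


tokens = "a b c d e f".split()
for gram in ngrams_in_range(tokens, 4, 6):
    print(gram)
```

`chain.from_iterable` is used here instead of `chain(*[...])` so the per-size lists are built one at a time rather than all up front.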


Answer by Yann Dubois for n-grams in python, four, five, six grams?

If efficiency is an issue and you have to build multiple different n-grams (up to a hundred, as you say), but you want to use pure Python, I would do:

    from itertools import chain

    def n_grams(seq,...


Answer by Serendipity for n-grams in python, four, five, six grams?

A more elegant approach is to build bigrams with Python’s built-in zip(). Simply convert the original string into a list with split(), then pass the list once normally and once offset by one element.

    string...
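
The description above can be sketched as follows. The code in the excerpt itself is truncated, so the function name `zip_bigrams` and the sample sentence are assumptions for illustration.

```python
def zip_bigrams(text):
    """Pair each word with its successor by zipping the list with an offset copy."""
    words = text.split()
    # zip stops at the shorter input, so the last word is never left unpaired.
    return list(zip(words, words[1:]))


print(zip_bigrams("I really like python"))
```

The same trick extends to larger n by zipping n offset slices, for example `zip(*(words[i:] for i in range(n)))`.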



Answer by ΔημητρηςΠαππάς for n-grams in python, four, five, six grams?

Using only nltk tools:

    from nltk.tokenize import word_tokenize
    from nltk.util import ngrams

    def get_ngrams(text, n):
        n_grams = ngrams(word_tokenize(text), n)
        return [''.join(grams) for grams in...



Answer by Daniel Pérez Rada for n-grams in python, four, five, six grams?

NLTK is great, but sometimes it is overhead for some projects:

    import re

    def tokenize(text, ngrams=1):
        text = re.sub(r'[\b\(\)\\\"\'\/\[\]\s+\,\.:\?;]', '', text)
        text = re.sub(r'\s+', '', text)
        tokens =...


Answer by Franck Dernoncourt for n-grams in python, four, five, six grams?

You can use sklearn.feature_extraction.text.CountVectorizer:

    import sklearn.feature_extraction.text  # FYI http://scikit-learn.org/stable/install.html
    ngram_size = 4
    string = ["I really like python, it's...


Answer by sel for n-grams in python, four, five, six grams?

For four-grams, this is already in NLTK; here is a piece of code that can help you toward this:

    from nltk.collocations import *
    import nltk
    # You should tokenize your text
    text = "I do not like green eggs...


Answer by M.A.Hassan for n-grams in python, four, five, six grams?

Here is another simple way to do n-grams:

    >>> from nltk.util import ngrams
    >>> text = "I am aware that nltk only offers bigrams and trigrams, but is there a way to split my text in...



Answer by alvas for n-grams in python, four, five, six grams?

Great native Python-based answers have been given by other users. But here's the nltk approach (just in case the OP gets penalized for reinventing what already exists in the nltk library). There is an ngram...


Answer by inspectorG4dget for n-grams in python, four, five, six grams?

I'm surprised that this hasn't shown up yet:

    In [34]: sentence = "I really like python, it's pretty awesome.".split()
    In [35]: N = 4
    In [36]: grams = [sentence[i: i + N] for i in range(len(sentence) - N +...



Answer by tzaman for n-grams in python, four, five, six grams?

You can easily whip up your own function to do this using itertools:

    from itertools import izip, islice, tee
    s = 'spam and eggs'
    N = 3
    trigrams = izip(*(islice(seq, index, None) for index, seq in...
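
Note that `itertools.izip` exists only in Python 2; in Python 3 the built-in `zip` is already lazy. The excerpt is truncated, so the following is a hedged Python 3 sketch of the same tee/islice idea rather than tzaman's exact code. The wrapper name `tee_ngrams` is an assumption.

```python
from itertools import islice, tee


def tee_ngrams(iterable, n):
    """Yield n-grams by zipping n staggered copies of the input iterator."""
    copies = tee(iterable, n)
    # Copy number k skips its first k items, so zip lines the copies up as windows.
    return zip(*(islice(copy, k, None) for k, copy in enumerate(copies)))


# A string is iterated character by character, so this yields character trigrams.
print(list(tee_ngrams("spam", 3)))
# Pass a list of words to get word n-grams instead.
print(list(tee_ngrams("spam and eggs".split(), 2)))
```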


Answer by Nik for n-grams in python, four, five, six grams?

I have never dealt with nltk but did N-grams as part of some small class project. If you want to find the frequency of all N-grams occurring in the string, here is a way to do that. D would give you...
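
The code for this answer is not included in the excerpt. As a rough illustration of counting n-gram frequencies, here is a sketch that uses `collections.Counter`. The function name `ngram_counts`, the whitespace tokenization, and counting a single size n at a time are all assumptions, not details taken from the answer.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count how often each word n-gram occurs in `text`."""
    words = text.split()  # assumption: tokens are separated by whitespace
    grams = (tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(grams)


counts = ngram_counts("to be or not to be", 2)
print(counts.most_common(1))
```

Tuples are used as keys because lists are not hashable; `Counter.most_common` then gives the n-grams sorted by frequency.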



n-grams in python, four, five, six grams?

I'm looking for a way to split a text into n-grams. Normally I would do something like:

    import nltk
    from nltk import bigrams
    string = "I really like python, it's pretty awesome."
    string_bigrams =...
