Date: September 22, 2020

nlp7. Stop word removal in Python NLTK

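The function nltk.corpus.stopwords.words returns the list of stop words for a given language: words that usually add little to the meaning of a sentence. The English list held 127 words when this was written; the count can differ in other NLTK versions. However, it is always possible to find exceptions.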

In the script below, the list is stored in S. If too many words are being filtered out, try shortening the stoplist.
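One way to shorten the stoplist is to subtract the words you want to keep. The sketch below is only an illustration: SAMPLE_STOPLIST is a small hand-written stand-in for stopwords.words("english"), and the helper names shorten_stoplist and remove_stop_words are made up for this example, not part of NLTK.

```python
# Sketch: shorten a stoplist by removing the words you want to keep.
# SAMPLE_STOPLIST is a hypothetical stand-in for stopwords.words("english").
SAMPLE_STOPLIST = ["i", "a", "the", "not", "no", "he", "what", "then", "now"]


def shorten_stoplist(stoplist, keep):
    """Return the stoplist as a set, without the words listed in keep."""
    return set(stoplist) - set(keep)


def remove_stop_words(tokens, stoplist):
    """Keep only the tokens that are not in the stoplist."""
    return [tok for tok in tokens if tok not in stoplist]


tokens = ["he", "did", "not", "praise", "the", "speech"]
full = set(SAMPLE_STOPLIST)
short = shorten_stoplist(SAMPLE_STOPLIST, keep=["not", "no"])

print(remove_stop_words(tokens, full))   # "not" is filtered out
print(remove_stop_words(tokens, short))  # "not" survives
```

With the full list, the negation "not" is dropped and the sentence seems to say the opposite of what was meant; with the shortened list it is kept. Passing the real NLTK list instead of SAMPLE_STOPLIST should work the same way, since it is a plain list of strings.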


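# nlp7.py
# Needs the NLTK data packages for the stop word lists and the
# Punkt tokenizer models (installed via nltk.download).
from __future__ import print_function, division
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

lines = """Dr. Brown gave a speech. I'd wish I knew then
what I know now. Finally, he praised Python! At 8 o'clock,
he went home."""
S = stopwords.words("english")    # English stop word list
t = '\t'                          # tab printed before each token
A = word_tokenize(lines.lower())  # lowercase first: the stoplist is lowercase
for a in A:
    if a not in S:                # keep only tokens that are not stop words
        print(t, a)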

# dr.
# brown
# gave
# speech
# .
# 'd
# wish
# knew
# know
# .
# finally
# ,
# praised
# python
# !
# 8
# o'clock
# ,
# went
# home
# .