Date: September 27, 2020

nlp3. Sentence tokenization in Python NLTK


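Many NLP steps work on one sentence at a time, so a text usually has to be split into sentences before further processing.

We could write our own set of splitting rules, but abbreviations such as "Dr." make that harder than it looks. Alternatively, we can use nltk.tokenize.sent_tokenize, which relies on NLTK's pretrained Punkt sentence tokenizer.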
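To see why hand-written rules are fragile, here is a minimal sketch using only the standard library. It assumes a single rule: split wherever ".", "!" or "?" is followed by whitespace. The function name naive_split and the rule itself are illustrative assumptions, not part of NLTK.

```python
import re

def naive_split(text):
    """Hypothetical one-rule splitter: break after '.', '!' or '?'
    whenever whitespace follows. Not part of NLTK."""
    return re.split(r'(?<=[.!?])\s+', text.strip())

text = """This is the first sentence. Dr. Brown gave a speech.
Finally, he praised Python! At 8 o'clock, he went home."""

pieces = naive_split(text)
for i, piece in enumerate(pieces):
    print(i, ':', piece)
```

This rule yields five pieces instead of four, because it cuts "Dr." off from "Brown gave a speech." Each such case would need one more rule, which is the motivation for using a trained tokenizer.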


# nlp3.py
from __future__ import print_function, division
from nltk.tokenize import sent_tokenize

# sent_tokenize needs the Punkt data, installed once with
# nltk.download('punkt') (recent NLTK releases may ask for 'punkt_tab').
lines = """This is the first sentence. Dr. Brown gave a speech.
Finally, he praised Python! At 8 o'clock, he went home."""

# A is a list with one string per sentence
A = sent_tokenize(lines)
print('type(A)=', type(A))
for i, j in enumerate(A):
    print(i, ':', j)

# type(A)= <class 'list'>   (Python 2 prints <type 'list'>)
# 0 : This is the first sentence.
# 1 : Dr. Brown gave a speech.
# 2 : Finally, he praised Python!
# 3 : At 8 o'clock, he went home.