Date: September 27, 2020

nlp12. Fileids in Python NLTK

We can access a specific text within a corpus by using a fileid.

The length of the whole inaugural corpus, that is, len(inaugural.words()), is 145735 (the exact figure depends on the version of the corpus installed). However, by passing a fileid in the call to the words method, we can select only a particular text.

The particular text we selected has a word length of 1538, that is, len(inaugural.words('1789-Washington.txt')) is equal to 1538. We can call the fileids method of inaugural, or of whatever the corpus happens to be, to get a list of the text names.

The program below prints the first five fileids, followed by the first 20 words of the first inaugural address.


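# nlp12.py
from __future__ import print_function, division
from nltk.corpus import inaugural

A = inaugural.fileids()        # names of all texts in the corpus
s = 2*' '                      # two spaces, used as indent and separator
for a in A[:5]:                # first five fileids
    print(s+a)
B = inaugural.words(A[0])      # words of the first text only
for b in B[:20]:               # first 20 tokens of that text
    print(b, end=s)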

# 1789-Washington.txt
# 1793-Washington.txt
# 1797-Adams.txt
# 1801-Jefferson.txt
# 1805-Jefferson.txt
# Fellow - Citizens of the Senate and of
# the House of Representatives : Among the
# vicissitudes incident to life no
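
To see how the corpus-wide word count relates to the per-text counts, here is a minimal sketch that does not need the inaugural data at all. It assumes a tiny two-file corpus that we write ourselves into a temporary directory; the file names and their contents are made up for illustration. It loads them with NLTK's PlaintextCorpusReader and checks that len(corpus.words()) equals the sum of len(corpus.words(fileid)) over all fileids.

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader

# Hypothetical two-file corpus (names and contents are made up for this sketch).
texts = {
    "a-first.txt": "Fellow-Citizens of the Senate.",
    "b-second.txt": "Among the vicissitudes incident to life.",
}

with tempfile.TemporaryDirectory() as root:
    # Write each text to its own file inside the temporary directory.
    for name, body in texts.items():
        with open(os.path.join(root, name), "w", encoding="utf-8") as f:
            f.write(body)

    # Every .txt file under root becomes one fileid of the corpus.
    corpus = PlaintextCorpusReader(root, r".*\.txt")
    names = corpus.fileids()

    # Word count per text, selected by passing a fileid to words().
    per_file = {n: len(corpus.words(n)) for n in names}

    # Word count of the whole corpus, with no fileid given.
    total = len(corpus.words())

    # Tokens of the first text, copied out before the directory is removed.
    first_tokens = list(corpus.words(names[0]))

for n in names:
    print(n, per_file[n])
print("total", total)
print(first_tokens)
```

The same pattern should carry over to inaugural: summing len(inaugural.words(f)) over inaugural.fileids() gives the length of inaugural.words().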