bpy6. Reading a multi-record FASTA file into Biopython

The following text file is saved as bpy6.fna in the working directory.


>seq1
ATTGGA
>seq2
ATG
>seq3
GCTA

To read a multi-record file, we use the function Bio.SeqIO.parse. It returns an iterator over SeqRecord objects (a generator object in the Biopython version used here).

Usually, we just loop over the generator. However, we can also call its next method (the built-in next(records) in Python 3) to get one SeqRecord object at a time. When there are no more SeqRecord objects to yield, it raises a StopIteration exception.
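The behaviour of next and StopIteration can be tried without Biopython installed. The sketch below is only an illustration: parse_fasta is a hypothetical stand-in written for this example, not Biopython's implementation, and it yields plain (id, sequence) tuples instead of SeqRecord objects. It assumes Python 3.

```python
import io

# Same content as bpy6.fna, kept in memory so the sketch is self-contained.
FASTA_TEXT = """>seq1
ATTGGA
>seq2
ATG
>seq3
GCTA
"""

def parse_fasta(handle):
    """Hypothetical stand-in for SeqIO.parse: yield (id, sequence) tuples."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    # Emit the last record once the input is finished.
    if seq_id is not None:
        yield seq_id, "".join(chunks)

records = parse_fasta(io.StringIO(FASTA_TEXT))
first = next(records)
second = next(records)
third = next(records)

# A fourth call has nothing left to yield, so StopIteration is raised.
try:
    next(records)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```

If an exception is not wanted, next also accepts a default value, as in next(records, None), which returns None once the iterator is exhausted.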

The for loop exhausts the records object, so we call SeqIO.parse again to get a fresh one before showing the next method example. The previous records object is garbage collected, because the name records no longer refers to it.


# bpy6.py

from __future__ import print_function
from Bio import SeqIO
records = SeqIO.parse('bpy6.fna','fasta')
print('type(records)=',type(records))
for record in records:
    print('id:',record.id)
    # Seq.alphabet exists only in Biopython releases before 1.78
    print('alpha:',record.seq.alphabet)
    print('seq:',record.seq)
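# the loop above exhausted the iterator, so parse the file again
records = SeqIO.parse('bpy6.fna','fasta')
# next(records) works on Python 2.6+ and 3; records.next() is Python 2 only
record1 = next(records)
record2 = next(records)
record3 = next(records)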
rec_ids = [r.id for r in [record1,record2,record3]]
print('rec_ids =',rec_ids)
#type(records)=
#id: seq1
#alpha: SingleLetterAlphabet()
#seq: ATTGGA
#id: seq2
#alpha: SingleLetterAlphabet()
#seq: ATG
#id: seq3
#alpha: SingleLetterAlphabet()
#seq: GCTA
#rec_ids = ['seq1', 'seq2', 'seq3']
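Why the listing calls SeqIO.parse a second time can be shown with any generator. The sketch below uses fake_parse, a hypothetical stand-in that only yields the three record ids, so it runs without Biopython or the bpy6.fna file. It is an illustration of iterator exhaustion, not of Biopython itself.

```python
def fake_parse():
    """Hypothetical stand-in for SeqIO.parse('bpy6.fna', 'fasta')."""
    for seq_id in ("seq1", "seq2", "seq3"):
        yield seq_id

records = fake_parse()
first_pass = [r for r in records]   # consumes every item
second_pass = [r for r in records]  # the iterator is already exhausted

records = fake_parse()              # a fresh iterator, as in bpy6.py
all_ids = list(records)             # a list can be traversed many times

print(first_pass, second_pass, all_ids)
```

Collecting the records in a list, as in the last step, is convenient for small files like this one. For large files, looping over the iterator once keeps only one record in memory at a time.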
