Presence-absence Matrix to Fasta format


Convert a binary presence-absence matrix to FASTA format with this simple Python script. It is recommended for large files.

Sample Input (tab-separated):
Sp1 1 1 1 1
Sp2 0 0 0 0
Sp3 1 1 0 0

Sample Output:
>Sp1
['1', '1', '1', '1']
>Sp2
['0', '0', '0', '0']
>Sp3
['1', '1', '0', '0']

Open the output file in any text editor and remove the characters [ ] , ' and the spaces to generate the final output:
>Sp1
1111
>Sp2
0000
>Sp3
1100
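
The manual clean-up can also be scripted. Below is a minimal sketch, assuming the intermediate file has exactly the layout shown above (header lines starting with ">" followed by list-formatted rows). The function name clean_record_line is hypothetical and not part of the original script.

```python
def clean_record_line(line):
    """Turn "['1', '1', '0']" into "110"; leave '>' header lines untouched."""
    line = line.rstrip("\n")
    if line.startswith(">"):
        return line
    # Drop the punctuation that str(list) adds: [ ] ' , and spaces
    return "".join(ch for ch in line if ch not in "[]', ")


# Hypothetical sample lines in the intermediate layout
sample = [">Sp1", "['1', '1', '1', '1']", ">Sp3", "['1', '1', '0', '0']"]
cleaned = [clean_record_line(l) for l in sample]
print("\n".join(cleaned))
```

Header lines are passed through unchanged, so species names that happen to contain commas or quotes are not altered.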

__author__ = 'Arun Prasanna'
'''
This is a simple Python script to convert a binary matrix into FASTA format.
Many public tools are available, but each one has a size limit (<=2MB).
This code processes a 61 x 51000 character matrix in less than 10 seconds in
Python 2.7!
'''

with open('infile_matrix-cp','r') as infile:
    entries = infile.read().strip()
each_line = entries.splitlines()
Header = []
for row in each_line:
    element = row.split("\t")
    Sym = '>' #Add > symbol
    Header_txt = Sym + element[0]  #Concat >species
    Header.append(Header_txt)
    Header.append(element[1:])

# Write the output to a file; the with-block closes it when done
with open('gloome_fastaInput.txt','w') as of:
    for each in Header:
        of.write(str(each) + '\n')  # same text that print would emit

print("Program complete")
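
As an alternative, the clean-up step can be avoided by joining the characters while writing. The following is a rough sketch for Python 3, not the original script. It assumes tab-separated input as in the code above and reads the matrix one row at a time instead of loading the whole file. The function name matrix_to_fasta and the in-memory demo data are illustrative.

```python
import io


def matrix_to_fasta(infile, outfile):
    """Write one FASTA record per tab-separated row of a presence-absence matrix."""
    for row in infile:
        row = row.strip()
        if not row:  # skip blank lines
            continue
        fields = row.split("\t")
        outfile.write(">" + fields[0] + "\n")       # header: >species
        outfile.write("".join(fields[1:]) + "\n")   # sequence: characters joined


# In-memory demo using the sample input; real use would pass open file objects
demo_in = io.StringIO("Sp1\t1\t1\t1\t1\nSp2\t0\t0\t0\t0\nSp3\t1\t1\t0\t0\n")
demo_out = io.StringIO()
matrix_to_fasta(demo_in, demo_out)
print(demo_out.getvalue())
```

With real files, the same function could be called with open('infile_matrix-cp') as the input and open('gloome_fastaInput.txt', 'w') as the output.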
