Presence-absence Matrix to Fasta format
Convert a binary presence-absence matrix to FASTA format with this simple Python script. Recommended for larger files!
Sample Input (tab-separated, one species per row):
Sp1 1 1 1 1
Sp2 0 0 0 0
Sp3 1 1 0 0
Sample Output:
>Sp1
['1', '1', '1', '1']
>Sp2
['0', '0', '0', '0']
>Sp3
['1', '1', '0', '0']
Open the output file in any text editor and remove the characters [ , ' ] and the spaces (for example with find-and-replace) to generate the final output:
>Sp1
1111
>Sp2
0000
>Sp3
1100
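The manual clean-up step above can also be scripted. The following is a minimal sketch in Python 3 syntax (the original script targets Python 2.7); the function names clean_sequence_line and clean_fasta_text are hypothetical helpers introduced here for illustration, not part of the original script. It assumes header lines start with ">" and that sequence lines are printed lists such as ['1', '1', '0', '0'].

```python
def clean_sequence_line(line):
    """Turn "['1', '1', '0', '0']" into "1100"; leave header lines unchanged."""
    line = line.strip()
    if line.startswith(">"):
        return line
    # Drop the list punctuation: brackets, quotes, commas and spaces.
    return "".join(ch for ch in line if ch not in "[]', ")


def clean_fasta_text(text):
    """Clean every non-empty line of the raw script output."""
    cleaned = [clean_sequence_line(l) for l in text.splitlines() if l.strip()]
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    raw = ">Sp1\n['1', '1', '1', '1']\n>Sp3\n['1', '1', '0', '0']\n"
    print(clean_fasta_text(raw), end="")
```

To clean a real file, read its contents, pass them to clean_fasta_text, and write the result to a new file.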
__author__ = 'Arun Prasanna'
'''
This is a simple python code to convert binary matrix into fasta format. There
are many public softwares available. But each one has size limits (<=2MB).
This code processes 61 x 51000 character matrix in less than 10 seconds in
python 2.7!
'''

with open('infile_matrix-cp','r') as infile:
    entries = infile.read().strip()
    each_line = entries.splitlines()

Header = []
for row in each_line:
    element = row.split("\t")
    Sym = '>' #Add > symbol
    Header_txt = Sym + element[0] #Concat >species
    Header.append(Header_txt)
    Header.append(element[1:])

# Write the output into file
of = open('gloome_fastaInput.txt','w')
for each in Header:
    print>>of, each
print "Program complete"
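For readers on Python 3, here is a hedged sketch of the same conversion that joins the characters of each row while writing, so the output needs no clean-up afterwards. It is an alternative illustration, not the author's script: the names matrix_to_fasta and convert_file and the sep parameter are assumptions made for this example. Like the original, it assumes tab-separated input with the species name in the first column.

```python
def matrix_to_fasta(lines, sep="\t"):
    """Yield FASTA lines for rows like 'Sp1<TAB>1<TAB>0<TAB>1'."""
    for row in lines:
        row = row.strip()
        if not row:
            continue  # skip blank lines
        fields = row.split(sep)
        yield ">" + fields[0]        # header line: >species
        yield "".join(fields[1:])    # sequence line: characters joined, e.g. 1100


def convert_file(in_path, out_path, sep="\t"):
    """Stream the matrix file row by row and write FASTA to out_path."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in matrix_to_fasta(infile, sep):
            outfile.write(line + "\n")


if __name__ == "__main__":
    demo = ["Sp1\t1\t1\t1\t1", "Sp2\t0\t0\t0\t0", "Sp3\t1\t1\t0\t0"]
    print("\n".join(matrix_to_fasta(demo)))
```

Calling convert_file('infile_matrix-cp', 'gloome_fastaInput.txt') would mirror the file names used in the script above. If the matrix is separated by spaces rather than tabs, passing sep=None splits on any whitespace.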