Fasta Header Replacer

April 05, 2016

Handling sequence files (such as FASTA) is one of the trickiest problems for novices in bioinformatics. BioPerl and Biopython are quite useful, but they look really scary :-(
MATLAB offers a cool solution with its Bioinformatics Toolbox (a separately licensed add-on)! Reading a FASTA file with 'fastaread' is as easy as reading a spreadsheet with 'xlsread', and writing one with 'fastawrite' is just like 'xlswrite' :-)

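With two outputs, fastaread simply extracts the sequence headers and the sequences into cell arrays (when the file holds more than one entry; a single-entry file comes back as plain character arrays). Voila! Once that is done, you can do all kinds of manipulation you want.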
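To picture what "headers and sequences in cell arrays" means, here is a rough toolbox-free sketch. It is an assumption for illustration, not how fastaread is implemented: the hypothetical function readFastaCells uses only base-MATLAB fileread and regexp, drops the leading '>' from each header, joins wrapped sequence lines, and always returns 1-by-N cell arrays (even for a single entry).

```matlab
function [headers, seqs] = readFastaCells(fname)
% Hypothetical toolbox-free sketch (NOT fastaread itself): return FASTA
% headers (without '>') and sequences as 1-by-N cell arrays.
txt = fileread(fname);
lines = regexp(txt, '\r?\n', 'split');   % handles Unix and Windows line ends
headers = {};
seqs = {};
for k = 1:numel(lines)
    ln = strtrim(lines{k});
    if isempty(ln)
        continue;                         % skip blank separator lines
    elseif ln(1) == '>'
        headers{end+1} = ln(2:end);       %#ok<AGROW> start a new record
        seqs{end+1} = '';                 %#ok<AGROW>
    elseif ~isempty(headers)
        seqs{end} = [seqs{end} ln];       % join wrapped sequence lines
    end
end
end
```

For the INPUT file shown in this post, headers{1} would be the full 'gi|154163|...' line without the '>', and seqs{1} would be 'ATGCGCCAGCTGCAAAATTTAAAT'.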

Here is a simple, self-explanatory script that processes one file at a time and replaces each header with a user-defined one. It also writes a translation table that pairs every old header with its new one. To process multiple files, you can readily wrap it in a loop over the files in a directory.
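A minimal sketch of that directory loop, assuming the script in this post has been turned into a function. The names processFastaFolder and processFn are hypothetical: processFn stands for any handle with the signature processFn(inFile, outFasta, outTable). Outputs go to a separate folder so that a second run does not pick up earlier outputs as new inputs.

```matlab
function outFiles = processFastaFolder(inFolder, outFolder, processFn)
% Hypothetical sketch: apply a one-file-at-a-time header replacer to
% every *.fasta file in inFolder. processFn is an assumed handle with
% signature processFn(inFile, outFasta, outTable).
if exist(outFolder, 'dir') ~= 7
    mkdir(outFolder);                     % keep outputs apart from inputs
end
files = dir(fullfile(inFolder, '*.fasta'));
outFiles = cell(numel(files), 1);
for k = 1:numel(files)
    inFile = fullfile(inFolder, files(k).name);
    [~, base] = fileparts(files(k).name);
    outFasta = fullfile(outFolder, [base '_new.fasta']);
    outTable = fullfile(outFolder, [base '_TransTab.txt']);
    processFn(inFile, outFasta, outTable); % per-file header replacement
    outFiles{k} = outFasta;
end
end
```

Usage would look like processFastaFolder('data', 'renamed', @myHeaderReplacer), where myHeaderReplacer is a hypothetical function version of the script.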

INPUT (sequence.fasta)

>gi|154163|gb|M83220.1|STYLEXA Salmonella typhimurium lexA (repressor of DNA damage inducible genes) gene, 5' end
ATGCGCCAGCTGCAAAATTTAAAT

>gi|154164|gb|M83220.1|STYLEXA Salmonella typhimurium lexA (repressor of DNA damage inducible genes) gene, 5'end
ATGCGCCAGCTGCAAAATTTAAAT

OUTPUT1 (Testout.fasta)
>1_Org
ATGCGCCAGCTGCAAAATTTAAAT

>2_Org
ATGCGCCAGCTGCAAAATTTAAAT

OUTPUT2 (TransTab.txt)
gi|154163|gb|M83220.1|STYLEXA Salmonella typhimurium lexA .....5' end 1_Org
gi|154164|gb|M83220.1|STYLEXA Salmonella typhimurium lexA .....5' end 2_Org
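One way to use the translation table afterwards (a sketch, not part of the original script) is to load it into a containers.Map keyed by the new ID, so the original header can be looked up from, say, '2_Org'. The name loadTransTab is hypothetical, and the sketch assumes one tab-separated "old header<TAB>new header" pair per line, which is what the script writes; it splits on the last tab in each line.

```matlab
function m = loadTransTab(fname)
% Hypothetical sketch: map new header -> original header from a
% tab-separated translation table with lines "old<TAB>new".
lines = regexp(fileread(fname), '\r?\n', 'split');
m = containers.Map('KeyType', 'char', 'ValueType', 'char');
tab = sprintf('\t');
for k = 1:numel(lines)
    ln = lines{k};
    idx = find(ln == tab, 1, 'last');     % split on the last tab
    if isempty(idx)
        continue;                         % skip empty or malformed lines
    end
    m(ln(idx+1:end)) = ln(1:idx-1);       % key: new ID, value: old header
end
end
```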

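%Author: Arun Prasanna
%M-CODE to replace old headers with new ones in a FASTA file.
%Output files: 1. FASTA file with the new headers & sequences
%              2. Translation table with old & new headers
% Algo: read a FASTA file -> store headers, sequences as cell arrays ->
% generate new header names -> write output with new header/sequence
% pairs -> write translation table
% Requires the Bioinformatics Toolbox (fastaread/fastawrite).

clear; clc;
[Header, Seq] = fastaread('sequence.fasta');
% fastaread returns char arrays (not cells) for a single-entry file;
% cellstr + (:) gives column cell arrays in both cases
Header = cellstr(Header); Header = Header(:);
Seq = cellstr(Seq); Seq = Seq(:);
nSeq = numel(Header);
newHeader = cell(nSeq, 1);
for i = 1:nSeq
    newHeader{i} = sprintf('%d_Org', i);   % '1_Org', '2_Org', ...
end
% fastawrite appends to an existing file, so remove any old output first
if exist('Testout.fasta', 'file')
    delete('Testout.fasta');
end
fastawrite('Testout.fasta', newHeader, Seq);
TransTab = [Header, newHeader];           % column 1: old, column 2: new
%===========Section-to-write-cell-array-2-Txt-file===========%
fileID = fopen('TransTab.txt', 'w');
for row = 1:size(TransTab, 1)
    fprintf(fileID, '%s\t%s\n', TransTab{row,:});
end
fclose(fileID);                           % flush and release the file
disp('Program Complete')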
