Posts

Showing posts from April, 2016

Fasta Header Replacer V2.0

An extension of the previous code, 'Fasta Header Replacer.m', that processes files in batch mode. Keep only the .fasta files inside the specified directory, because every entry in that directory is read as a sequence file.

%Author: Arun Prasanna
%Version 2.0 of Fasta_Header_replacer.m!.
%Efficient to process files in batch mode.
clear; clc;
FileList=dir('D:\BRC_POSTDOC-RESEARCH\ARMILLARIA_Project\PROTEIN_FASTA');
[rFL,cFL]=size(FileList);
for i=3:rFL %i of 1 & 2 are . & .. respectively
    Org_name{i-2,1}=FileList(i).name; %FileList is a structure
end
[rOn,cOn]=size(Org_name);
for OL=1:rOn
    FileName=char(Org_name{OL});
    [Header,Seq]=fastaread(FileName);
    Header=Header';
    Seq=Seq';
    [rH,cH]=size(Header);
    check(OL,1)=rH;
    for IL=1:rH
        id=num2str(IL);
        [tok,rem]=strtok(FileName,'.'); %Extract org name from FileName itself.
        new{IL,1}=strcat(id,'_',tok);
    end
    OutFileName=strcat(tok,'_mod',rem);
    fastawrite(OutFileName,new,Seq)
    TransTab=horzcat(new,Header);
    %===========Section-to-write-cell-array-2-Txt-file===========%
    TT=strcat(
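The listing above skips '.' and '..' by index, which assumes they are always the first two entries and that everything else in the folder is a FASTA file. As a hedged alternative (not the author's code), the directory listing can be filtered with a wildcard instead. The folder name and demo file names below are hypothetical, and fileparts is swapped in for strtok so that names with more than one dot still split at the extension.

```matlab
% Hypothetical demo folder; replace with the directory that holds your files.
inDir = fullfile(tempdir, 'fasta_batch_demo');
if ~exist(inDir, 'dir')
    mkdir(inDir);
end

% Create two empty demo files plus one non-FASTA file to show the filter.
demoNames = {'orgA.fasta', 'orgB.fasta', 'notes.txt'};
for k = 1:numel(demoNames)
    fid = fopen(fullfile(inDir, demoNames{k}), 'w');
    fclose(fid);
end

% The wildcard keeps only .fasta entries, so '.' and '..' never appear.
FileList = dir(fullfile(inDir, '*.fasta'));
Org_name = {FileList.name}';   % column cell array of file names

% Derive the output name for each file, e.g. orgA.fasta -> orgA_mod.fasta.
OutNames = cell(size(Org_name));
for OL = 1:numel(Org_name)
    % fileparts splits at the last dot; strtok(name,'.') splits at the first.
    [~, base, ext] = fileparts(Org_name{OL});
    OutNames{OL} = [base '_mod' ext];
end
disp(OutNames)
```

With this filter the folder no longer needs to contain only .fasta files, and the loop body of the original script can stay as it is.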

Fasta Header Replacer

Handling sequence files (like .fasta) is one of the trickiest problems for a novice in bioinformatics. BioPerl and Biopython are quite useful but look really scary :-(
MATLAB offers a cool solution with its Bioinformatics Toolbox!! Reading a fasta file with 'fastaread' is as easy as reading a spreadsheet with 'xlsread', and writing one with 'fastawrite' is as easy as 'xlswrite' :-)

fastaread simply extracts the sequence headers and the sequences into cell arrays. Voila!!! Once it does, you can do all kinds of manipulation on them.
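As a rough illustration of that step, the sketch below writes a tiny two-record file and reads it back with fastaread (this needs the Bioinformatics Toolbox). The file name and the second record are hypothetical; the first record reuses the sample sequence from this post with a shortened header.

```matlab
% Write a small two-record demo file into the temp folder.
% Record 1 reuses this post's sample sequence; record 2 is hypothetical.
demoFile = fullfile(tempdir, 'fastaread_demo.fasta');
fid = fopen(demoFile, 'w');
fprintf(fid, '>%s\n%s\n', 'gi|154163|gb|M83220.1|STYLEXA', 'ATGCGCCAGCTGCAAAATTTAAAT');
fprintf(fid, '>%s\n%s\n', 'demo_record_2', 'ATGAAAGCGTTAACG');
fclose(fid);

% With two outputs and more than one record, fastaread returns cell arrays.
[Header, Seq] = fastaread(demoFile);

% Any cell-array manipulation now applies, e.g. the length of each sequence.
SeqLen = cellfun(@length, Seq);
for k = 1:numel(Header)
    fprintf('%s\t%d\n', Header{k}, SeqLen(k));
end
```

Note that for a file with a single record, fastaread returns plain character arrays rather than cell arrays, so code that indexes with {} should allow for that case.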

Here is a simple, self-explanatory, one-file-at-a-time code to replace the headers with user-defined headers. It also creates a translation table that maps each new header to the original one. If you want to process multiple files, you can readily wrap it in a loop over a directory listing.
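The translation table idea can be sketched without any toolbox calls. This is a minimal illustration, not the author's code: the headers, the tag 'sequence' and the output file name are hypothetical, and the index_tag naming simply mirrors the scheme used in Version 2.0.

```matlab
% Hypothetical original headers; in practice they would come from fastaread.
Header = {'gi|154163|gb|M83220.1|STYLEXA'; 'demo_record_2'};
tag = 'sequence';   % hypothetical user-defined label

% Build new headers of the form <index>_<tag>.
NewHeader = cell(size(Header));
for IL = 1:numel(Header)
    NewHeader{IL} = sprintf('%d_%s', IL, tag);
end

% Two-column translation table: new header, original header.
TransTab = [NewHeader, Header];

% Write one tab-separated row per sequence.
outFile = fullfile(tempdir, 'sequence_translation_table.txt');
fid = fopen(outFile, 'w');
for IL = 1:size(TransTab, 1)
    fprintf(fid, '%s\t%s\n', TransTab{IL, 1}, TransTab{IL, 2});
end
fclose(fid);
```

Keeping the table as plain tab-separated text means the original headers can be restored later from any spreadsheet or script.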

INPUT (sequence.fasta)

>gi|154163|gb|M83220.1|STYLEXA Salmonella typhimurium lexA (repressor of DNA damage inducible genes) gene, 5' end
ATGCGCCAGCTGCAAAATTTAAAT

>gi|154164|gb|M83220.1|STYLEXA S…