Posts

Showing posts from April, 2016

Fasta Header Replacer V2.0

An extension of the previous code, 'Fasta Header Replacer.m', to process files in batch mode. Keep all, and only, the .fasta files inside the specified directory.

%Author: Arun Prasanna
%Version 2.0 of Fasta_Header_replacer.m
%Efficient to process files in batch mode.
clear; clc;
FileList = dir('D:\BRC_POSTDOC-RESEARCH\ARMILLARIA_Project\PROTEIN_FASTA');
[rFL, cFL] = size(FileList);
for i = 3:rFL %i of 1 & 2 are . & .. respectively
    Org_name{i-2, 1} = FileList(i).name; %FileList is a structure
end
[rOn, cOn] = size(Org_name);
for OL = 1:rOn
    FileName = char(Org_name{OL});
    [Header, Seq] = fastaread(FileName);
    Header = Header';
    Seq = Seq';
    [rH, cH] = size(Header);
    check(OL, 1) = rH;
    for IL = 1:rH
        id = num2str(IL);
        [tok, rem] = strtok(FileName, '.'); %Extract org name from FileName its
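The listing above skips the first two entries of dir on the assumption that they are '.' and '..'. A slightly more defensive way to build the same list, as a minimal sketch: ask dir for the .fasta files directly with a wildcard. The variable inDir is a placeholder for your own folder, and fileparts is swapped in for strtok here (fileparts strips only the final extension, whereas strtok splits at the first dot).

```matlab
% Sketch only: list the .fasta files in a folder without relying on the
% position of '.' and '..' in the dir output.
inDir = pwd;                                  % placeholder: folder holding the .fasta files
FileList = dir(fullfile(inDir, '*.fasta'));   % the wildcard returns only matching files

Org_name = cell(numel(FileList), 1);          % preallocate one cell per file
for i = 1:numel(FileList)
    % Second output of fileparts is the file name without its extension
    [~, Org_name{i}] = fileparts(FileList(i).name);
end
```

If the folder holds no .fasta files, FileList is empty and the loop simply does not run.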

Fasta Header Replacer

Handling sequence files (like .fasta) is one of the trickiest problems for novices in bioinformatics. BioPerl and Biopython are quite useful but look really scary :-( MATLAB offers a cool solution with its built-in Bioinformatics Toolbox! Reading a fasta file with 'fastaread' is as easy as 'xlsread', and writing one with 'fastawrite' is as easy as 'xlswrite' :-) fastaread simply extracts the sequence headers and sequences into cell arrays. Voila! Once it does, you can do all kinds of manipulation you want.

Here is a simple, self-explanatory, one-file-at-a-time code to replace each header with a user-defined header. It also creates a translation table. If you want to process multiple files, you can readily loop it over directory operations.

INPUT (sequence.fasta)
>gi|154163|gb|M83220.1|STYLEXA Salmonella typhimurium lexA (repressor of DNA damage inducible genes) gene, 5' end
ATGCGCCAGCTGCAAAATTTAAAT
>gi|154164|gb|M8322
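To make the idea of header replacement plus a translation table concrete, here is a minimal sketch in base MATLAB. It is not the post's own code: the prefix 'Styp', the output file name, and the second record are hypothetical placeholders, and the headers are typed in by hand so the snippet runs without the toolbox (in practice they would come from fastaread).

```matlab
% Sketch only: replace headers with user-defined ones and keep a translation table.
% First record is taken from the INPUT example; the second is a placeholder.
Header = {'gi|154163|gb|M83220.1|STYLEXA Salmonella typhimurium lexA (repressor of DNA damage inducible genes) gene, 5'' end'; ...
          'placeholder_header_2'};
Seq    = {'ATGCGCCAGCTGCAAAATTTAAAT'; 'ATGC'};

prefix = 'Styp';                      % assumed user-defined tag
nSeq   = numel(Header);
NewHeader   = cell(nSeq, 1);
Translation = cell(nSeq, 2);          % column 1: new header, column 2: original header

for k = 1:nSeq
    NewHeader{k}      = sprintf('%s_%d', prefix, k);   % e.g. Styp_1, Styp_2
    Translation(k, :) = {NewHeader{k}, Header{k}};
    % With the Bioinformatics Toolbox, each record could be appended to a file:
    % fastawrite('renamed.fasta', NewHeader{k}, Seq{k});
end
```

Keeping the original header next to the new one in Translation means the short names can always be mapped back later.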