Working algos for Biological Data: Simple to Complex problems


Showing posts from April, 2016

Fasta Header Replacer V2.0

April 05, 2016

Extension of the previous code, 'Fasta Header Replacer.m', to process files in batch mode. Keep all of your .fasta files, and only those, inside the specified directory, because the script treats every entry after '.' and '..' as a FASTA file.

    % Author: Arun Prasanna
    % Version 2.0 of Fasta_Header_replacer.m.
    % Processes files in batch mode.
    clear; clc;
    FastaDir = 'D:\BRC_POSTDOC-RESEARCH\ARMILLARIA_Project\PROTEIN_FASTA';
    FileList = dir(FastaDir);
    [rFL, cFL] = size(FileList);
    for i = 3:rFL                              % entries 1 and 2 are . and .. respectively
        Org_name{i - 2, 1} = FileList(i).name; % FileList is a structure array
    end
    [rOn, cOn] = size(Org_name);
    for OL = 1:rOn
        FileName = char(Org_name{OL});
        % Use the full path so the script works from any current folder
        [Header, Seq] = fastaread(fullfile(FastaDir, FileName));
        Header = Header';
        Seq = Seq';
        [rH, cH] = size(Header);
        check(OL, 1) = rH;                     % number of records in this file
        for IL = 1:rH
    ...
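A minimal sketch of the same batch idea, assuming the Bioinformatics Toolbox functions fastaread and fastawrite. Instead of skipping the first two dir entries, it asks dir for '*.fasta' files only, so stray files and listing order no longer matter. It uses the single-output form of fastaread, which returns a struct array with Header and Sequence fields whether a file holds one record or many. The output folder 'renamed', the <file stem>_0001 naming scheme and the variable RecordCount are hypothetical stand-ins (the original loop body is truncated here), so treat this as an illustration rather than the author's V2.0 code.

```matlab
% Sketch: batch-process every .fasta file in a folder.
% Assumes the Bioinformatics Toolbox (fastaread/fastawrite).
% FastaDir is the post's folder; OutDir and the header scheme are hypothetical.
FastaDir = 'D:\BRC_POSTDOC-RESEARCH\ARMILLARIA_Project\PROTEIN_FASTA';
OutDir   = fullfile(FastaDir, 'renamed');
if ~exist(OutDir, 'dir')
    mkdir(OutDir);
end

% Filtering on '*.fasta' skips '.', '..' and any non-FASTA file,
% so the loop does not depend on the order of the directory listing.
FileList    = dir(fullfile(FastaDir, '*.fasta'));
nFiles      = numel(FileList);
RecordCount = zeros(nFiles, 1);          % records per file (like 'check')

for f = 1:nFiles
    InFile = fullfile(FastaDir, FileList(f).name);
    Data   = fastaread(InFile);          % struct array: .Header, .Sequence
    RecordCount(f) = numel(Data);

    [~, Stem] = fileparts(FileList(f).name);
    for k = 1:numel(Data)
        % Hypothetical scheme: <file stem>_0001, <file stem>_0002, ...
        Data(k).Header = sprintf('%s_%04d', Stem, k);
    end

    OutFile = fullfile(OutDir, FileList(f).name);
    if exist(OutFile, 'file')
        delete(OutFile);                 % fastawrite appends to an existing file
    end
    fastawrite(OutFile, Data);
end
```

Because the '*.fasta' filter already excludes '.' and '..', the i - 2 offset from the original listing loop is no longer needed, and the renamed copies go to a subfolder so the source files stay untouched.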

Fasta Header Replacer

April 05, 2016

Handling sequence files (such as .fasta) is one of the trickiest problems for novices in bioinformatics. BioPerl and Biopython are quite useful but can look really scary :-( ! MATLAB offers a cool solution with its Bioinformatics Toolbox!! Reading a FASTA file with 'fastaread' is as easy as reading a spreadsheet with 'xlsread', and writing one with 'fastawrite' works just like 'xlswrite' :-) fastaread simply extracts the sequence headers and sequences into cell arrays (when the file holds more than one record). Voila!!! Once it does, one can do all kinds of manipulation. Here is a simple, self-explanatory, one-file-at-a-time code to replace the headers with user-defined headers. It also creates a translation table linking the new headers to the original ones. To process multiple files, one can readily wrap it in a loop over a directory listing.

INPUT (sequence.fasta)
>gi|154163|gb|M83220.1|STYLEXA Salmonella typhimurium lexA (repressor of DNA damage inducible genes) gene, 5' end
ATGCGCCAGCTGCAAAATTTAAAT
>gi|154164|gb|M8322...
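A minimal sketch of the one-file workflow described above, assuming the Bioinformatics Toolbox. The output names (sequence_renamed.fasta, translation_table.txt) and the 'Seq' prefix are hypothetical, since the author's own header scheme is not shown in this excerpt. Reading into a struct array keeps the code identical whether the file has one record or many, and writetable stores the old-to-new mapping as a tab-delimited text file.

```matlab
% Sketch: replace every header in one FASTA file and save a translation table.
% Assumes the Bioinformatics Toolbox; file names and Prefix are hypothetical.
InFile    = 'sequence.fasta';
OutFile   = 'sequence_renamed.fasta';
TableFile = 'translation_table.txt';
Prefix    = 'Seq';

Data = fastaread(InFile);              % struct array with fields Header and Sequence
n    = numel(Data);

OriginalHeader = {Data.Header}';       % n-by-1 cell array of the old headers
NewHeader      = cell(n, 1);
for k = 1:n
    NewHeader{k}   = sprintf('%s_%04d', Prefix, k);   % Seq_0001, Seq_0002, ...
    Data(k).Header = NewHeader{k};
end

if exist(OutFile, 'file')
    delete(OutFile);                   % fastawrite would otherwise append
end
fastawrite(OutFile, Data);

% Tab-delimited table: one row per record, new header next to the original.
T = table(NewHeader, OriginalHeader);
writetable(T, TableFile, 'Delimiter', '\t');
```

For the sample input, the lexA record would become Seq_0001, and the first row of the table would pair Seq_0001 with that record's original gi header. A tab delimiter is used because the original headers already contain '|', commas and spaces.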