Gene Copy Number Matrix

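Given a cluster file, you can build a gene copy number (GCN) matrix: each row is a cluster, each column is a name from a list file, and each cell counts how many times that name occurs in that cluster. The short MATLAB script below does this.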
Input format:
ClusterFile.xlsx (one cluster per row):
(1) A  B  C  D
(2) A  A  A
(3) B C
(4) D D A

Organism_list.xlsx (one name per row, in the first column):
A
B
C
D

Output: (the script writes only the numbers, to a sheet named MatOut in the cluster file; the row and column labels are shown here for readability)

     A  B  C  D
(1)  1  1  1  1
(2)  3  0  0  0
(3)  0  1  1  0
(4)  1  0  0  2
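
To check the example by hand without any Excel files, the counting can be reproduced on an in-memory cell array. This is only a sketch: the variable names `mat`, `names` and `out` are illustrative, and it assumes empty spreadsheet cells arrive as empty character vectors (''), which is how the text output of xlsread represents them.

```matlab
% Toy input from the example above; '' marks an empty spreadsheet cell
mat = {'A','B','C','D';
       'A','A','A','';
       'B','C','','';
       'D','D','A',''};
names = {'A','B','C','D'};   % contents of the list file

out = zeros(size(mat,1), numel(names));   % rows = clusters, columns = names
for i = 1:numel(names)
    % strcmp against a cell array returns a logical array of the same size;
    % summing along each row gives the copy number of names{i} per cluster
    out(:,i) = sum(strcmp(mat, names{i}), 2);
end
disp(out)
```

The matching is exact and case-sensitive, so 'a' or 'A ' (with a trailing space) would not be counted as 'A'.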


%Author = Arun Prasanna
%Create a gene copy number matrix from cluster information
clear; clc;
tic
[~, mat] = xlsread('ClusterFile.xlsx','Sheet1');    %text cells, one cluster per row
[~, head] = xlsread('Organism_list.xlsx','Sheet1'); %species/gene names, one per row
new_head = head(:,1)'; %transpose to use the names as the column header
[rmat,cmat] = size(mat);
chead = numel(new_head);
out = zeros(rmat,chead); %rows = clusters, columns = names
for i = 1:chead
    fprintf('Processing name %d of %d\n', i, chead); %track progress
    for j = 1:rmat
        counter = 0; %copies of name i found in cluster j
        for k = 1:cmat
            %count exact matches and skip empty fields
            if strcmp(new_head{i}, mat{j,k}) && ~isempty(mat{j,k})
                counter = counter + 1;
            end
        end
        out(j,i) = counter;
    end
end
xlswrite('ClusterFile.xlsx',out,'MatOut'); %write the counts to sheet MatOut
disp('Program ends...output written')
toc
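
The script compares every name with every cell, which can get slow for long name lists. One possible alternative, sketched below on the toy data rather than on the real files, looks up each cell once with ismember and then tallies the hits with accumarray. The variable names are illustrative, and the sketch assumes the names in the list are unique: with duplicated names ismember reports only the first position, so the result would differ from the loop version.

```matlab
% Same toy data as the example; '' marks an empty spreadsheet cell
mat = {'A','B','C','D';
       'A','A','A','';
       'B','C','','';
       'D','D','A',''};
names = {'A','B','C','D'};   % assumed to contain no duplicates

% idx(j,k) is the position of mat{j,k} in names, or 0 when there is no match
[tf, idx] = ismember(mat, names);
[rows, ~] = find(tf);        % cluster row of every matching cell
cols = idx(tf);              % name column of every matching cell (same order)

% Add 1 at (cluster row, name column) for each matching cell
out = accumarray([rows(:), cols(:)], 1, [size(mat,1), numel(names)]);
disp(out)
```

Empty cells never match a non-empty name, so they drop out without a separate check.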
