Perl script to create a geometric graph from a set of 2D points

Here is another quick script by me. It is a Perl script which can be used to construct a geometric graph from a set of points P = {p1, p2, …} in 2D, using some edge threshold t. The points could be cells you detected in an image, or cities on a map.

What the script does in detail is the following: it reads the points, and for each pair of points (p1, p2), computes their distance d(p1,p2). It uses the points as vertices of a graph, and adds an edge e = (v1, v2) between a pair of vertices if (and only if) their distance d is not larger than the edge threshold t.

Here is an image explaining the same thing:

building_geom_graph_from_points


Input

The script reads an input CSV file which should contain the points, one per line. The first of the input file is expected to be the header, and you can configure the field name where the x and y values for a point are stored in the script. Here is part of an example input file:

Id,Area,XM,YM,Skew
1,628,249.820,12.070,NaN
2,533,455.282,17.241,NaN
3,762,1233.724,15.572,NaN
4,502,1531.295,13.102,NaN
5,967,2143.861,20.797,NaN
6,800,2197.466,15.274,NaN
7,950,2455.040,22.723,NaN
8,672,1118.219,27.342,NaN
9,1010,1262.300,28.307,NaN
10,679,1142.450,35.454,NaN

Script

Here is the Perl script:

#!/usr/bin/perl
## csvtogml.pl -- parse a CSV file containing 2D points and create a geom graph in GML format from it
## written by Tim Schäfer. This code is released under the WTFPL -- http://www.wtfpl.net/. (In short: do whatever you want with it, but don't blame me.)
use warnings;
#use strict;
use Math::Complex;

print "csv2gml.pl -- Convert CSV list of points to GML graph\n";
print "-----------------------------------------------------\n";



########  Settings -- you may need to adapt these ###########
my $file = 'Results.csv';
my $outfile = "Results_converted.csv";
my $outfile_xml = "Results_converted.xml";
my $first_line_is_header = 1;

my @data;
my $field_name_center_of_mass_X = "XM";
my $field_name_center_of_mass_Y = "YM";

my $field_number_id = 0;
my $verbose = 0;
my $edge_threshold = 300.0;
##################### end of settings #######################




my $field_number_center_of_mass_X = -1;    # will be set later
my $field_number_center_of_mass_Y = -1;    # will be set later

open(my $fh, '<', $file) or die "Can't read input file '$file' [$!]\n";
my $num_read = 0;
while (my $line = <$fh>) {
    chomp $line;
    my @fields = split(/,/, $line);
    #my @fields = Text::ParseWords::parse_line(',', 0, $line);
    if($first_line_is_header && $num_read == 0) {
        for my $i (0 .. $#fields) {
           if($fields[$i] eq $field_name_center_of_mass_X) {
               $field_number_center_of_mass_X = $i;
           }
           if($fields[$i] eq $field_name_center_of_mass_Y) {
               $field_number_center_of_mass_Y = $i;
           }
        }
    }
    else {
        push @data, \@fields;
    }
    $num_read++;
}

if($field_number_center_of_mass_X < 0 || $field_number_center_of_mass_Y < 0) {
    die("File is missing the header, or no fields labeled '$field_name_center_of_mass_X' and '$field_name_center_of_mass_Y' found.");
} else {
    print "field_number_id=$field_number_id, field_number_center_of_mass_X=$field_number_center_of_mass_X, field_number_center_of_mass_Y=$field_number_center_of_mass_Y\n";
}

my $num_pairs = 0;
my $num_edges = 0;
print "The data consists of " . (scalar @data) . " cells.\n";

open FHCSV, ">$outfile" or die ("Can't open target CSV file $outfile for writing [$!]\n");
print FHCSV "cell_id_A,cell_id_B,cell_center_of_mass_x_A,cell_center_of_mass_y_A,cell_center_of_mass_x_B,cell_center_of_mass_y_B,distance\n";

open FHXML, ">$outfile_xml" or die ("Can't open target XML file $outfile_xml for writing [$!]\n");
print FHXML "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<graph label=\"AlgoSB Cell graph\"\n    xmlns=\"http://www.cs.rpi.edu/XGMML\"\n    directed=\"0\">\n";

for my $i (0 .. $#data) {
    my $cell_id = $data[$i][$field_number_id];
    my $cell_center_of_mass_x = $data[$i][$field_number_center_of_mass_X];
    my $cell_center_of_mass_y = $data[$i][$field_number_center_of_mass_Y];
    print FHXML "    <node label=\"$cell_id\" id=\"$cell_id\">\n        <att name=\"center_of_mass_X\" type=\"real\" value=\"$cell_center_of_mass_x\"></att>\n        <att name=\"center_of_mass_Y\" type=\"real\" value=\"$cell_center_of_mass_y\"></att>\n    </node>\n";
}


for my $i (0 .. $#data) {
    for my $j ($i+1 .. $#data) {
        $num_pairs++;        
        my $cell_id_A = $data[$i][$field_number_id];
        my $cell_id_B = $data[$j][$field_number_id];
        my $cell_center_of_mass_x_A = $data[$i][$field_number_center_of_mass_X];
        my $cell_center_of_mass_y_A = $data[$i][$field_number_center_of_mass_Y];
        my $cell_center_of_mass_x_B = $data[$j][$field_number_center_of_mass_X];
        my $cell_center_of_mass_y_B = $data[$j][$field_number_center_of_mass_Y]; 
        my $complex_A = Math::Complex->make($cell_center_of_mass_x_A, $cell_center_of_mass_y_A);
        my $complex_B = Math::Complex->make($cell_center_of_mass_x_B, $cell_center_of_mass_y_B);
        my $distance = abs($complex_A - $complex_B);
        if($distance <= $edge_threshold) {
            $num_edges++;
            print FHCSV "$cell_id_A,$cell_id_B,$cell_center_of_mass_x_A,$cell_center_of_mass_y_A,$cell_center_of_mass_x_B,$cell_center_of_mass_y_B,$distance\n";
            print FHXML "    <edge source=\"$cell_id_A\" target=\"$cell_id_B\" label=\"$cell_id_A-$cell_id_B\">\n        <att name=\"length\" type=\"real\" value=\"$distance\"></att>\n    </edge>\n";
        }
        
        if($verbose) {
	    print "A_id=$cell_id_A, B_id=$cell_id_B,, A=($cell_center_of_mass_x_A, $cell_center_of_mass_y_A), B=($cell_center_of_mass_x_B, $cell_center_of_mass_y_B), distance=$distance\n";
        }
    }
}

print FHXML "</graph>\n";

print "Checked $num_pairs cell pairs, wrote info on $num_edges edges.\n";




close FHCSV;
close FHXML;

print "Output written in CSV format to file '$outfile'.\n";
print "Output written in XML format to file '$outfile_xml'.\n";

Output

The script writes the resulting graph to a file in XGMML format. This is a standard graph file format, which can be opened in software like Cytoscape.

Here is part of the output XML file:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
<?xml version="1.0" encoding="UTF-8"?>
<graph label="My graph"
    xmlns="http://www.cs.rpi.edu/XGMML"
    directed="0">
    <node label="1" id="1">
        <att name="center_of_mass_X" type="real" value="249.820"></att>
        <att name="center_of_mass_Y" type="real" value="12.070"></att>
    </node>
    <node label="2" id="2">
        <att name="center_of_mass_X" type="real" value="455.282"></att>
        <att name="center_of_mass_Y" type="real" value="17.241"></att>
    </node>
    <node label="3" id="3">
        <att name="center_of_mass_X" type="real" value="1233.724"></att>
        <att name="center_of_mass_Y" type="real" value="15.572"></att>
    </node>
    <node label="4" id="4">
        <att name="center_of_mass_X" type="real" value="1531.295"></att>
        <att name="center_of_mass_Y" type="real" value="13.102"></att>
    </node>
	<!-- more nodes here ... -->
	<edge source="1" target="2" label="1-2">
        <att name="length" type="real" value="205.527060712209"></att>
    </edge>
	<!-- more edges here ... -->
</graph>	
Advertisements

About dfspspirit

PhD student in bioinformatics, interested in photography, level design, digital image manipulation, architecture and, of course, bioinformatics.
This entry was posted in Bioinformatics, coding, IT and computers, Science and tagged , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s