Perl & CGI/HTML table to text file using HTML::TreeBuilder module

Question
I would like to extract the data from an HTML table located
on the webpage
http://moneycentral.msn.com/investor/research/sreport.asp?Symbol=ibm&ISA=1&Type=Equity

and put the data into a tab-delimited text file.

The script I created uses the HTML::TreeBuilder module and
can be viewed below.

When I run this perl script (finstat.pl) the output of each cell
(in the array) is filled with
"HTML::Element=HASH(0x2859d8)".  

I would like the output to be arranged in the same order as
viewed in the webpage (noted above).

I appreciate your help on this because it is driving me a bit
nuts...

Best Regards,
Roman

(perl version 5.8.3 / Mac OS 10.2.8)

******* below is my script finstat.pl *****************

#!/usr/bin/perl -w
use strict;
use LWP::Simple;
use HTML::TreeBuilder;

my $url = 'http://moneycentral.msn.com/investor/research/sreport.asp?Symbol=xom&ISA=1&Type=Equity';
my $page = get($url) or die $!;
my $p = HTML::TreeBuilder->new_from_content( $page );

my @links = $p->look_down(
  _tag => 'td'
);

my @rows = map { $_->parent->parent } @links;

my @finstat;
for my $row (@rows) {
  my %acct;
  my @cells = $row->look_down( _tag => 'td' );
  $acct{title}   = $cells[0];
  $acct{first}   = $cells[1];
  $acct{second}   = $cells[2];
  $acct{third}    = $cells[3];
  $acct{fourth}    = $cells[4];
  $acct{fifth}    = $cells[5];
  
  print $acct{title}, "\t", $acct{first}, "\t", $acct{second},
"\t", $acct{third}, "\t", $acct{fourth}, "\t", $acct{fifth}, "\n";

}

$p = $p->delete; # don't need it anymore

Answer
Roman,

I am not familiar with the HTML::TreeBuilder package; when I need to do this kind of thing I write the code myself. That may not be much help to you, and using the package may well be the proper way to go about it, but I can't help with that package.


Basically, if I wanted to do this, my approach might start like so:

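my $page = get $url; # fetch the web page
my $in_table = 0;    # set once the line introducing the table is seen
for my $line (split /\n/, $page) # split takes the pattern first, then the string
{
 $in_table = 1 if $line =~ /Financial data in U\.S\. dollars/;
 next unless $in_table; # skip everything before the table
 for my $field (split /\s+/, $line)
 {
   # now start assigning data to a hash
 }
}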
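
To make the outline above a little more concrete, here is a minimal sketch of the same line-by-line idea, run against a small made-up page instead of the live site. The sample markup, the %acct layout, and the @order array are assumptions for illustration only; the real page will almost certainly need a different pattern to pick out its cells.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical page text (not the real MSN markup). The marker line is
# the one the loop looks for before it starts collecting rows.
my $page = <<'HTML';
<p>Some heading</p>
<p>Financial data in U.S. dollars</p>
<tr><td>Sales</td><td>100</td><td>90</td></tr>
<tr><td>Income</td><td>20</td><td>18</td></tr>
HTML

my %acct;        # row title => array ref of that row's values
my @order;       # row titles in page order (a hash alone loses the order)
my $seen = 0;    # true once the marker line has been passed

for my $line ( split /\n/, $page ) {    # pattern first, then the string
    if ( !$seen ) {
        $seen = 1 if $line =~ /Financial data in U\.S\. dollars/;
        next;
    }

    # Capture the contents of every <td>...</td> pair on this line.
    my @cells = $line =~ m{<td[^>]*>(.*?)</td>}gi;
    next unless @cells;

    my $title = shift @cells;
    push @order, $title;
    $acct{$title} = [@cells];
}

# One tab-delimited line per row, in the order the rows appeared.
print join( "\t", $_, @{ $acct{$_} } ), "\n" for @order;
```

A regular expression like this only copes with simple, one-row-per-line markup; cells that span lines or contain nested tags would need more work.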

Sorry if this isn't specific enough. There is a limit to what I can do as a volunteer; however, if you want to use an approach like this and write back with a more specific question, I'll have another look.
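
For reference, on the HTML::TreeBuilder route taken in the question: "HTML::Element=HASH(0x2859d8)" is what Perl prints when an object reference is used as a string, so the script is printing the cell objects themselves and not their contents. HTML::Element has an as_text method that returns the text inside an element. The following is only a sketch under assumptions: the sample table is made up, and it assumes the real page has one tr per row with no nested tables.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample table; the real page's markup is not shown here.
my $html = <<'HTML';
<table>
<tr><td>Sales</td><td>100</td><td>90</td></tr>
<tr><td>Income</td><td>20</td><td>18</td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @lines;

# Visit each <tr> once, in document order, so rows are neither
# repeated nor reordered.
for my $row ( $tree->look_down( _tag => 'tr' ) ) {

    # as_text gives the text inside each cell instead of the object.
    my @cells = map { $_->as_text } $row->look_down( _tag => 'td' );
    next unless @cells;    # skip rows with no <td> cells (e.g. header rows)
    push @lines, join( "\t", @cells );
}

$tree->delete;    # free the tree once the text has been copied out

print "$_\n" for @lines;
```

Looking up the tr elements directly, and not climbing from every td to its ancestors, also avoids visiting the same row once per cell.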


Marty Landman Face 2 Interface Inc.
Web Installed Formmailer: http://face2interface.com/Products/Formal.shtml
FormATable DB: http://face2interface.com/Products/FormATable.shtml
Make a Website: http://face2interface.com/Home/Demo.shtml

Marty Landman

Expertise

Perl programming using CGI, databases, HTML templating, and website automation.

Experience

Web developer since 1998, owner of Face 2 Interface.

©2012 About.com, a part of The New York Times Company. All rights reserved.