About Marty Landman
Expertise
Perl programming using CGI, databases, HTML templating, and website automation.

Experience
Web developer since 1998, owner of Face 2 Interface.

 
   

Perl & CGI - HTML table to text file using HTML::TreeBuilder module


Expert: Marty Landman - 4/19/2004

Question
I would like to extract the data from an HTML table located
on the webpage
http://moneycentral.msn.com/investor/research/sreport.asp?Symbol=ibm&ISA=1&Type=Equity

and put the data into a tab-delimited text file.

The script I created uses the HTML::TreeBuilder module and
can be viewed below.

When I run this perl script (finstat.pl) the output of each cell
(in the array) is filled with
"HTML::Element=HASH(0x2859d8)".  

I would like the output to be arranged in the same order as
viewed in the webpage (noted above).

I appreciate your help on this because it is driving me a bit
nuts...

Best Regards,
Roman

(perl version 5.8.3 / Mac OS 10.2.8)

******* below is my script finstat.pl *****************

#!/usr/bin/perl -w
use strict;
use LWP::Simple;
use HTML::TreeBuilder;

my $url = 'http://moneycentral.msn.com/investor/research/sreport.asp?Symbol=xom&ISA=1&Type=Equity';
my $page = get($url) or die $!;
my $p = HTML::TreeBuilder->new_from_content( $page );

my @links = $p->look_down(
  _tag => 'td'
);

my @rows = map { $_->parent->parent } @links;

my @finstat;
for my $row (@rows) {
  my %acct;
  my @cells = $row->look_down( _tag => 'td' );
  $acct{title}   = $cells[0];
  $acct{first}   = $cells[1];
  $acct{second}   = $cells[2];
  $acct{third}    = $cells[3];
  $acct{fourth}    = $cells[4];
  $acct{fifth}    = $cells[5];
  
  print $acct{title}, "\t", $acct{first}, "\t", $acct{second},
"\t", $acct{third}, "\t", $acct{fourth}, "\t", $acct{fifth}, "\n";

}

$p = $p->delete; # don't need it anymore

Answer
Roman,

I am not familiar with the HTML::TreeBuilder package; when I need to do this kind of thing I write the parsing code myself. That may not be much help to you, and using the package may well be the proper way to go about it, but I can't help with the package itself.


If I wanted to do this, my approach might start like so:

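my $page = get($url) or die "Could not fetch $url\n"; # fetch the web page
my $in_table = 0;
for my $line (split /\n/, $page)
{
 # skip everything before the line that introduces the data
 $in_table = 1 if $line =~ /Financial data in U\.S\. dollars/;
 next unless $in_table;
 for my $field (split /\s+/, $line)
 {
   # now start assigning data to a hash
 }
}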
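
The line-by-line approach above can be sketched a little further. This is only a minimal sketch: the sample markup is hypothetical (the real MSN page will differ), the table_rows helper is a name made up for illustration, and it swaps the whitespace split for a regular expression that captures each cell. It also assumes every table row sits on a single line of the page source.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample standing in for the fetched page; the real markup
# on the MSN page is not known here and will differ.
my $page = <<'HTML';
<html><body>
<p>Navigation and other text</p>
<p>Financial data in U.S. dollars</p>
<table>
<tr><td>Sales</td><td>100.5</td><td>90.2</td></tr>
<tr><td>Income</td><td>12.1</td><td>10.4</td></tr>
</table>
</body></html>
HTML

# Turn the page into a list of rows, each row a reference to a list of
# cell strings. (table_rows is an illustrative helper, not a library call.)
sub table_rows {
    my ($html) = @_;
    my $seen_marker = 0;
    my @rows;
    for my $line (split /\n/, $html) {
        # Ignore everything before the marker line.
        $seen_marker = 1 if $line =~ /Financial data in U\.S\. dollars/;
        next unless $seen_marker;
        # Capture the contents of each <td>...</td> on this line.
        my @cells = $line =~ m{<td[^>]*>(.*?)</td>}gi;
        push @rows, \@cells if @cells;
    }
    return @rows;
}

# One tab-delimited line per table row.
print join("\t", @$_), "\n" for table_rows($page);
```

Redirecting the output to a file (for example, perl finstat.pl > finstat.txt) gives the tab-delimited text file. If the real page spreads a row over several lines, this approach would need the lines joined first.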

Sorry if this isn't specific enough. There is a limit to what I can do as a volunteer. However, if you want to use an approach like this and write back with a more specific question, I'll have another look.
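
On the symptom in the question: "HTML::Element=HASH(0x2859d8)" is how Perl prints an object reference, which suggests the cells are being printed as HTML::Element objects rather than as their text. A minimal sketch of one possible fix follows, assuming HTML::Element's as_text method is used to pull the text out of each cell and that rows are found by looking for 'tr' elements directly. The sample markup is hypothetical and this has not been run against the MSN page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup; substitute the page fetched with LWP::Simple.
my $html = <<'HTML';
<table>
<tr><td>Sales</td><td>100.5</td><td>90.2</td></tr>
<tr><td>Income</td><td>12.1</td><td>10.4</td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @lines;
# Visit each row once, in document order, rather than once per cell.
for my $row ($tree->look_down(_tag => 'tr')) {
    # as_text returns the text inside the element, not the object itself.
    my @cells = map { $_->as_text } $row->look_down(_tag => 'td');
    push @lines, join("\t", @cells) if @cells;
}
print "$_\n" for @lines;

$tree->delete;   # free the tree when done
```

Looping over the 'tr' elements keeps the rows in page order and avoids processing the same row once for every cell it contains. A page with nested tables would need a narrower look_down to pick out only the table of interest.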


Marty Landman Face 2 Interface Inc.
Web Installed Formmailer: http://face2interface.com/Products/Formal.shtml
FormATable DB: http://face2interface.com/Products/FormATable.shtml
Make a Website: http://face2interface.com/Home/Demo.shtml

Copyright  © 2008 About, Inc. AllExperts, AllExperts.com, and About.com are registered trademarks of About, Inc. All rights reserved.