PHP5/PHP Scraper

Question
Hi Kevin. I haven't asked you any questions in a while, but you've always come through for me with flying colors before, so I thought you might be able to help me again.

I have a scraper issue. This particular code doesn't seem to be working, though I have used it before. Thinking it might be the site that file_get_contents is pointing at, I've tried different sites, with no luck. What I am trying to do is just get the date and the horoscope from that page to display on this scraper page, but all it's coming up with is 0. Can you tell me what I'm doing wrong?


<?php  
$html = file_get_contents("http://www.prokerala.com/astrology/horoscope/");
preg_match_all(
   "/\<h1>Today's Horoscopes - (.*?)\<\/h1>\<p>(.*?)\</p>/s",
   $html,
   $posts,
   PREG_SET_ORDER
);
foreach ($posts as $post) {
   $content1 = $post[1];
   $content2 = $post[2];

   echo("$content1");
   echo("$content2");
}
?>

Answer
Hey, John. I apologize for the slow reply. I was out of town and forgot to set my status to on vacation.

For your problem, I'm not sure at first glance where the issue is, but that's OK! Regular expressions are a terrible way to go about scraping data from an *ML data source. PHP5 actually has a built-in DOM extension, but its XML loader requires well-formed markup, and its loadHTML() method emits warnings on the invalid HTML that most websites serve. For scraping website data, the Simple HTML DOM library is the way to go. It provides a very quick and easy way to load a document and iterate through its elements, and it is much easier (and less error-prone) than using preg_match.
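For what it's worth, one likely cause in the posted pattern is the final closing tag: it is written as \</p>, which escapes the < but leaves the / unescaped. Because / is also the pattern delimiter, PCRE ends the pattern there and reads the remaining p>/s as modifiers, so preg_match_all() raises an "Unknown modifier" warning and returns false. The sketch below sidesteps this by using # as the delimiter. The sample markup is a hypothetical stand-in for the fetched page, and this only illustrates the delimiter issue; it does not change the advice about using a parser.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = "<h1>Today's Horoscopes - March 3</h1><p>Sample horoscope text.</p>";

// With '#' as the delimiter, the '/' in closing tags needs no escaping.
$pattern = "#<h1>Today's Horoscopes - (.*?)</h1><p>(.*?)</p>#s";

$count = preg_match_all($pattern, $html, $posts, PREG_SET_ORDER);

if ($count === false) {
    // preg_match_all() returns false when the pattern fails to compile or run.
    echo "Pattern error\n";
} else {
    foreach ($posts as $post) {
        echo $post[1], "\n"; // captured date
        echo $post[2], "\n"; // captured horoscope text
    }
}
```

Checking the return value before looping makes a broken pattern show up as an explicit message rather than as empty output.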

http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-

I encourage you to modify your code to use the Simple HTML DOM library and to get back in touch if you still have issues.
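As a rough illustration of the parser-based approach, here is a minimal sketch. Note that it uses PHP's built-in DOMDocument and DOMXPath classes rather than Simple HTML DOM, purely so that it runs without an extra download; with libxml warnings collected internally, loadHTML() copes with imperfect markup. The sample HTML and the assumption that the date sits in an h1 followed by a p are hypothetical, taken only from the shape of the regular expression in the question.

```php
<?php
// Hypothetical sample markup (note the missing </p>); a real run
// would fetch the page instead of using a literal string.
$html = "<html><body><h1>Today's Horoscopes - March 3</h1>"
      . "<p>Sample horoscope text.</body></html>";

// Collect parser warnings about sloppy markup instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The first <h1>, and the first <p> that follows it as a sibling.
$heading   = $xpath->query('//h1')->item(0);
$paragraph = $xpath->query('//h1/following-sibling::p[1]')->item(0);

$date = '';
if ($heading !== null) {
    // Strip the fixed prefix so only the date remains.
    $date = trim(str_replace("Today's Horoscopes -", '', $heading->textContent));
}
$text = ($paragraph !== null) ? trim($paragraph->textContent) : '';

echo $date, "\n";
echo $text, "\n";
```

The same two lookups map onto Simple HTML DOM's element-finding calls once that library is included; the XPath expressions here are just one way to express them.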

Note that the actual horoscopes on that page are loaded via an iframe and are not part of the page you're loading. The horoscopes are actually located at http://www.adze.com/webmaster/horoscopes.php
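A quick way to spot this kind of situation is to list the iframe sources in the outer page before writing any extraction code. The sketch below does that with the built-in DOMDocument class; the outer markup is a hypothetical stand-in, and only the iframe URL comes from the page discussed above.

```php
<?php
// Hypothetical outer page whose real content lives in an iframe.
$html = '<html><body><h1>Horoscopes</h1>'
      . '<iframe src="http://www.adze.com/webmaster/horoscopes.php"></iframe>'
      . '</body></html>';

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the src attribute of every iframe in the document.
$sources = array();
foreach ($doc->getElementsByTagName('iframe') as $iframe) {
    $src = $iframe->getAttribute('src');
    if ($src !== '') {
        $sources[] = $src;
    }
}

foreach ($sources as $src) {
    echo $src, "\n";
}
```

Each URL it prints is a candidate for a second file_get_contents() call, since that is where the content you want actually lives.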

Good luck and please let me know if you continue to have issues.

Kevin Cackler

Expertise

Any and everything related to PHP4 and PHP5. I specialize in functional, readable, scalable object oriented code, and can answer your troublesome class and object questions.

Experience

5 years developing in PHP using flat files and databases (MySQL, Oracle). Lead PHP developer for a very large Texas-based web hosting company. The coder behind some of the largest pet communities online.

Education/Credentials
BS - IT/CS

©2016 About.com. All rights reserved.