The Drupal Migrate module makes moving into Drupal from another database-backed system as easy as it can be. But what if the site you're coming from doesn't have a database, or doesn't give you access to it? In this session, we'll walk through building a static site crawler that scrapes structured data from an existing site and stores it in a database for processing with the Migrate module. The tools I'll be focusing on are:
Guzzle - A powerful HTTP client for PHP that lets us batch requests and handle the many kinds of unexpected responses a crawl can run into.
DomCrawler/CssSelector - Symfony components for parsing and navigating HTML responses using simple CSS selectors.
Silex - A microframework that's perfect for building your crawler and handling the supporting details (logging, CLI commands, database connections, etc.).