Updated Scraper Reference (markdown)

Thibaut Courouble 12 years ago
parent
commit
0caa366d75
1 changed file with 2 additions and 1 deletion

+ 2 - 1
Scraper-Reference.md

@@ -57,7 +57,7 @@ Configuration is done via class attributes and divided into three main categorie
 * `version` [String] **(required)**  
   The version of the software at the time the scraper was last run. This is only informational and doesn't affect the scraper's behavior.
 
-* `base_url` [String] **(required)**  
+* `base_url` [String] **(required in `UrlScraper`)**  
   The documents' location. Only URLs _inside_ the `base_url` will be scraped. "inside" more or less means "starting with" except that `/docs` is outside `/doc` (but `/doc/` is inside).  
   Unless `root_path` is set, the root/initial URL is equal to `base_url`.
 
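The "inside" rule described for `base_url` can be illustrated with a minimal Ruby sketch. The helper name `inside_base_url?` and the `example.com` URLs are hypothetical and are not taken from the DevDocs code base; the scraper's real check may differ in detail. The sketch only shows why a plain "starts with" comparison is not enough: the prefix has to end at a path-segment boundary.

```ruby
require 'uri'

# Hypothetical helper (an assumption for illustration, not DevDocs' own code).
# Returns true when `url` is "inside" `base_url`: same scheme, host and port,
# and a path that is either the base path itself or continues past it at a
# "/" boundary.
def inside_base_url?(url, base_url)
  url  = URI.parse(url)
  base = URI.parse(base_url)

  same_origin = url.scheme == base.scheme &&
                url.host == base.host &&
                url.port == base.port
  return false unless same_origin

  # Drop a trailing slash so "/doc" and "/doc/" describe the same base.
  base_path = base.path.chomp('/')

  url.path == base_path || url.path.start_with?("#{base_path}/")
end

base = 'http://example.com/doc'

inside_base_url?('http://example.com/doc/',           base) # inside
inside_base_url?('http://example.com/doc/guide.html', base) # inside
inside_base_url?('http://example.com/docs',           base) # outside
```

Comparing against `"#{base_path}/"` rather than `base_path` alone is what keeps `/docs` out while letting `/doc/` in.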
@@ -100,6 +100,7 @@ Default `html_filters`:
 * [`NormalizeUrlsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/normalize_urls.rb) — replaces all URLs with their fully qualified counterpart
 * [`InternalUrlsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/internal_urls.rb) — detects internal URLs (the ones to scrape) and replaces them with their unqualified, relative counterpart
 * [`NormalizePathsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/normalize_paths.rb) — makes the internal paths consistent (e.g. always end with `.html`)
+* [`CleanLocalUrlsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/clean_local_urls.rb) — removes links, iframes and images pointing to localhost (`FileScraper` only)
 
 Default `text_filters`: