Updated Scraper Reference (markdown)

Thibaut Courouble 12 years ago
commit 0caa366d75
1 changed file with 2 additions and 1 deletion

Scraper-Reference.md

@@ -57,7 +57,7 @@ Configuration is done via class attributes and divided into three main categorie
 * `version` [String] **(required)**  
   The version of the software at the time the scraper was last run. This is only informational and doesn't affect the scraper's behavior.
 
-* `base_url` [String] **(required)**  
+* `base_url` [String] **(required in `UrlScraper`)**  
   The documents' location. Only URLs _inside_ the `base_url` will be scraped. "inside" more or less means "starting with" except that `/docs` is outside `/doc` (but `/doc/` is inside).  
   Unless `root_path` is set, the root/initial URL is equal to `base_url`.
 
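The "inside" rule described for `base_url` can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: `inside_base_url?` is a hypothetical helper, not the scraper's actual implementation, and it compares scheme, host, port and path only.

```ruby
require 'uri'

# Hypothetical helper (not part of DevDocs): approximates the
# "inside base_url" rule from the reference text.
def inside_base_url?(url, base_url)
  base = URI.parse(base_url)
  target = URI.parse(url)

  # A URL on another scheme, host or port is never inside.
  same_origin = base.scheme == target.scheme &&
                base.host == target.host &&
                base.port == target.port
  return false unless same_origin

  base_path = base.path.chomp('/')
  # Inside means: the same path, or a path that continues after a "/"
  # boundary. A plain prefix match would wrongly accept "/docs" for "/doc".
  target.path == base_path || target.path.start_with?("#{base_path}/")
end
```

With a base of `http://example.com/doc`, this sketch treats `http://example.com/doc/` and `http://example.com/doc/page` as inside, and `http://example.com/docs` as outside.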
@@ -100,6 +100,7 @@ Default `html_filters`:
 * [`NormalizeUrlsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/normalize_urls.rb) — replaces all URLs with their fully qualified counterpart
 * [`InternalUrlsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/internal_urls.rb) — detects internal URLs (the ones to scrape) and replaces them with their unqualified, relative counterpart
 * [`NormalizePathsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/normalize_paths.rb) — makes the internal paths consistent (e.g. always end with `.html`)
+* [`CleanLocalUrlsFilter`](https://github.com/Thibaut/devdocs/blob/master/lib/docs/filters/core/clean_local_urls.rb) — remove links, iframes and images pointing to localhost (`FileScraper` only)
 
 Default `text_filters`: