Improve names and types in Rust stdlib scraper

Two main fixes:

1. Names: every page except modules' index pages ignored the module it
   came from, and its name was simply prefixed with `std::`. This meant
   there were 13 pages named `std::Iter`, each about a struct named
   `Iter` in a different module. It also meant that items outside any
   module, e.g. primitive types, were prefixed with `std::`, so the page
   on `bool` was named `std::bool`, even though `bool` can't be
   referenced that way in code. It also produced two pages named
   `std::char`: one for the module, and one for the primitive type `char`.

   This commit prefixes everything in a module with that module's full
   path, including any submodules, and leaves primitives unprefixed.

   For example:
       std::fn          → fn
       std::Iter        → std::option::Iter
       std::MetadataExt → std::os::linux::fs::MetadataExt

2. Types: almost everything was filed under `std`, except modules' index
   pages and primitive types. This meant there were over 30,000 pages in
   the `std` type, alongside many module types containing only one page.

   This commit creates a type for each module directly under `std`,
   covering all of its submodules, and files anything not in a module,
   e.g. primitive types, under `std`.

   For example (type / name, before → after):
       std::bool / std::bool  → std / bool
       std / std::Iter        → std::option / std::option::Iter
       std / std::MetadataExt → std::os / std::os::linux::fs::MetadataExt
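Point 1 can be made concrete by pulling the old and new name logic out of the filter into plain Ruby. This is a hedged sketch, not the filter itself: the method names `old_entry_name` and `new_entry_name` are hypothetical, slugs are assumed to look like `std/option/struct.Iter` (as the diff's `path.last == 'index'` check suggests), and `name` is assumed to be the bare item name already parsed from the page's `<h1>`.

```ruby
# Hypothetical standalone versions of the entry-name logic, before and after
# this commit. Assumes slugs like "std/option/struct.Iter" and a bare `name`.

# Before: only the first slug component (the crate, "std") was used as prefix.
def old_entry_name(slug, name)
  mod = slug.split('/').first
  name.start_with?(mod) ? name : "#{mod}::#{name}"
end

# After: use the full module path from the slug.
def new_entry_name(slug, name)
  path = slug.split('/')
  # Pages directly under std/ (e.g. primitive types) are globally available,
  # so they get no `std::` prefix.
  return name if path.length == 2

  # Dropping "index" makes the module itself the last component, so an index
  # page is prefixed with its parent module's path.
  path.pop if path.last == 'index'
  mod = path[0..-2].join('::')
  name.start_with?(mod) ? name : "#{mod}::#{name}"
end

slugs = ['std/option/struct.Iter', 'std/slice/struct.Iter']
p slugs.map { |s| old_entry_name(s, 'Iter') } # => ["std::Iter", "std::Iter"]
p slugs.map { |s| new_entry_name(s, 'Iter') } # => ["std::option::Iter", "std::slice::Iter"]
```

Run on two of the 13 `Iter` structs, the old logic collapses both to `std::Iter`, while the new logic keeps them apart. The sketch returns a new string rather than calling `prepend`, which mutates `name` in place in the filter; the resulting name is the same.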
Calum Smith, 2 months ago
commit fbb5e61720
1 changed file with 15 additions and 8 deletions
lib/docs/filters/rust/entries.rb

@@ -22,7 +22,15 @@ module Docs
         else
           at_css('main h1').at_css('button')&.remove
           name = at_css('main h1').content.remove(/\A.+\s/).remove('⎘')
-          mod = slug.split('/').first
+          path = slug.split('/')
+          if path.length == 2
+            # Anything in the standard library but not in a `std::*` module is
+            # globally available, not `use`d from the `std` crate, so we don't
+            # prepend `std::` to their name.
+            return name
+          end
+          path.pop if path.last == 'index'
+          mod = path[0..-2].join('::')
           name.prepend("#{mod}::") unless name.start_with?(mod)
           name
         end
@@ -38,13 +46,12 @@ module Docs
         elsif slug.start_with?('error_codes')
           'Compiler Errors'
         else
-          path = name.split('::')
-          heading = at_css('main h1').content.strip
-          if path.length > 2 || (path.length == 2 && (heading.start_with?('Module') || heading.start_with?('Primitive')))
-            path[0..1].join('::')
-          else
-            path[0]
-          end
+          path = slug.split('/')
+          # Discard the filename, and use the first two path components as the
+          # type, or one if there is only one. This means anything in a module
+          # `std::foo` or submodule `std::foo::bar` gets type `std::foo`, and
+          # things not in modules, e.g. primitive types, get type `std`.
+          path[0..-2][0..1].join('::')
         end
       end
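The type change in the second hunk can be checked in isolation. This is a sketch under assumptions rather than the filter itself: `old_entry_type` and `new_entry_type` are hypothetical names, slugs are assumed to look like `std/option/struct.Iter`, and `heading` stands in for the page's `<h1>` text, of which the old code only inspected the leading word.

```ruby
# Hypothetical standalone versions of the entry-type logic, before and after.

# Before: derived from the old entry name plus the page's <h1> text.
def old_entry_type(name, heading)
  path = name.split('::')
  if path.length > 2 || (path.length == 2 && heading.start_with?('Module', 'Primitive'))
    path[0..1].join('::')
  else
    path[0]
  end
end

# After: drop the file name from the slug, then keep at most the first two
# remaining components.
def new_entry_type(slug)
  slug.split('/')[0..-2][0..1].join('::')
end

# Examples from point 2 of the commit message: [old type, new type].
p [old_entry_type('std::Iter', 'Struct std::option::Iter'),
   new_entry_type('std/option/struct.Iter')]  # => ["std", "std::option"]
p [old_entry_type('std::bool', 'Primitive Type bool'),
   new_entry_type('std/primitive.bool')]      # => ["std::bool", "std"]
```

Because the new version derives the type from the slug rather than the displayed name, it no longer needs the `<h1>` heading, and primitives such as `bool` fall back to `std` simply because their slug has no module directory.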