Wikipedia:Miscellany for deletion/Second batch of mass-created portals based on a single navbox/Selection process

Selection process for this nomination

This requires a programmer's text editor and WP:AWB. Some modest proficiency is needed with AWB's list-making features, but no great wizardry.

This list aims to include pages which meet all of the following criteria:

  • single-navbox portal
  • created by @The Transhumanist between 13 August 2018 and 22:41, 12 September 2018. That is the period from the start of TTH's mass creation until the creation of the oldest portal included in the first nomination.
  • Not a redirect or duplicate of another page on the list
  • Not already tagged for MFD

I took 10 steps. Several of the steps could be merged, but I prefer to do tasks like this one step at a time.

  1. View contribs. Display the user contributions by @The Transhumanist, selecting portal namespace, Only show edits that are page creations, 500 pages at a time — start here
    Note that this will no longer show the same results as when I did it, because some pages have since been deleted. However, I think that if you use the same cutoff dates, the end results should be identical, because the pages which have since been deleted were excluded in later steps.
  2. Screenscrape. Copy several screenfuls to a text editor (I was using Notepad++).
  3. Strip irrelevancies. On each line strip off everything except the page name. This can be done accurately thanks to hidden characters which were included in the screengrabs. I can't display the hidden chars (value U+200E: LEFT-TO-RIGHT MARK [LRM]) so I have replaced them with X, but with that modification the two regexes are: s/^.+\+[\d,]+‎X N // and s/X .+$//
  4. Linkify. Use a regex to convert each line to a link: s/^(.+)$/# [[:\1]]/. Save list as a file (total: 2,053 pages)
  5. Remove duplicates. Load list into WP:AWB. Use List->Remove duplicates. Save list (total: 1,901 pages)
  6. Remove redirects and non-existent pages. Use the skip tab to skip "page is redirect" and "doesn't exist". Save list (total: 1,737 pages)
  7. Remove pages already tagged for MFD. Use AWB in list-making ("pre-parse") mode to skip any page which matches /{{mfd/i. Save list. (total: 1,182 pages)
  8. Remove non-automated. Use AWB's "List comparer" tool to keep only pages which transclude Template:Transclude list item excerpts as random slideshow or Template:Transclude linked excerpts as random slideshow. Save list. (total 1,175 pages)
  9. Keep only single-navbox selections. I used an AWB custom module (see my module) to identify pages where the only unnamed parameter for {{Transclude list item excerpts as random slideshow}} or {{Transclude linked excerpts as random slideshow}} is a single template; other pages were skipped. Save list. (total 1,144 pages)
  10. Error-checking. Use AWB's list comparer to find any of the 1,144 pages which are not in the tracking Category:Automated portals with article list built solely from one template. That category is populated by Module:Excerpt slideshow, so every page in this list should be in the category.
    Result: found 6 pages which were not in the category: Portal:Ezra Pound, Portal:Howard University, Portal:Ghosts, Portal:Belgrade, Portal:Boston Red Sox, Portal:Exoplanets. I manually checked each one and found that in every case my AWB run was correct; the problem was that overloaded Lua modules had caused a Lua failure.
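
For anyone who would rather script the text-editor part, steps 3 to 5 can be sketched in Python roughly as follows. This is only an illustration, not what was actually run: the sample lines are hypothetical, the layout of a contributions line is assumed from the two regexes in step 3, and the function names are made up for this sketch. The X in those regexes is written here as the real U+200E character.

```python
import re

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, shown as X in the step 3 regexes

# Step 3: the two strip regexes, with X replaced by the real LRM character.
PREFIX = re.compile(r"^.+\+[\d,]+" + LRM + r" N ")
SUFFIX = re.compile(LRM + r" .+$")


def page_name(line):
    """Strip everything except the page name from one scraped line."""
    return SUFFIX.sub("", PREFIX.sub("", line))


def linkify(name):
    """Step 4: turn a page name into a numbered-list wikilink."""
    return re.sub(r"^(.+)$", r"# [[:\1]]", name)


def dedupe(items):
    """Step 5: drop duplicates while keeping the original order."""
    return list(dict.fromkeys(items))


# Hypothetical scraped lines; the real contributions layout may differ.
sample = [
    f"22:41, 12 September 2018 diff hist +1,234{LRM} N Portal:Example A{LRM} (created page)",
    f"22:40, 12 September 2018 diff hist +987{LRM} N Portal:Example B{LRM} (created page)",
    f"22:39, 12 September 2018 diff hist +1,234{LRM} N Portal:Example A{LRM} (created page)",
]

links = dedupe(linkify(page_name(line)) for line in sample)
for link in links:
    print(link)
```

The duplicate removal here stands in for AWB's List->Remove duplicates; it does not check for redirects or missing pages, which still need step 6.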

So the final list is the 1,144 pages identified in step 9.
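
Anyone wanting to cross-check without AWB could approximate the filter in step 7 and the comparison in step 10 along these lines. This is a rough sketch only: it assumes the page wikitext and the category membership have already been fetched by some other means, and the function names and sample data are hypothetical.

```python
import re

# Step 7: the same pattern as /{{mfd/i, written as a Python regex.
MFD_TAG = re.compile(r"\{\{mfd", re.IGNORECASE)


def untagged(pages):
    """Keep titles whose wikitext has no {{mfd...}} tag.

    `pages` maps page title -> wikitext (assumed to be fetched elsewhere).
    """
    return [title for title, text in pages.items() if not MFD_TAG.search(text)]


def missing_from_category(candidates, category_members):
    """Step 10: list candidates that are absent from the tracking category."""
    members = set(category_members)
    return [title for title in candidates if title not in members]


# Hypothetical data for illustration only.
pages = {
    "Portal:Example A": "{{Portal maintenance status}} some portal text",
    "Portal:Example B": "{{Mfd|Second batch}} some portal text",
    "Portal:Example C": "plain portal text",
}
category = ["Portal:Example A"]

kept = untagged(pages)
print(kept)
print(missing_from_category(kept, category))
```

Any page reported by the second function still needs a manual look, since a page can be missing from the category for reasons unrelated to the selection, as the result in step 10 shows.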

I would welcome any checks by others on this process. --BrownHairedGirl (talk) • (contribs) 19:23, 14 April 2019 (UTC)