Wikipedia:Bots/Noticeboard

This is an old revision of this page, as edited by ערן (talk | contribs) at 16:08, 15 September 2018 (→‎New bot-like access group). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

    Bots noticeboard

    Here we coordinate and discuss Wikipedia issues related to bots and other programs interacting with the MediaWiki software. Bot operators are the main users of this noticeboard, but even if you are not one, your comments will be welcome. Just make sure you are aware of our bot policy and know where to post your issue.

    Discussion concerning WP:BOTFLAG/WP:RC

    In particular, there might be a need to update WP:CREATEBOT with additional guidance. I know very little here, so I'll let others update the guide if that's needed. Headbomb {t · c · p · b} 02:11, 6 August 2018 (UTC)[reply]

    @Headbomb: in a nutshell, some terminology is easy to confuse. The "bot group" gives an account access to the "bot permission". Having the "bot permission" lets an account use the "bot flag". The bot flag is what marks an edit as a "bot edit". When editing via the web interface, the "bot flag" is assumed unless suppressed on edit; when editing via the API, the "bot flag" must be asserted with each edit that should be marked as a bot edit. In general, most bot tasks will use a bot flag, except for ones where it is desirable for the edits to be seen on things like watchlists (for example, BAGBOT updating the bots list). When a brand new bot is being trialed, we can't check whether the flag is being used, since the account won't have access yet; for new tasks we can check the edits to see. — xaosflux Talk 02:19, 6 August 2018 (UTC)[reply]
    @Xaosflux: Yeah, but what I mean is don't tell me, let's tell bot ops in general, somewhere. Probably as a section of WP:CREATEBOT#How does a Wikipedia bot work?, but anywhere that makes sense would also work. Headbomb {t · c · p · b} 02:23, 6 August 2018 (UTC)[reply]
    @Headbomb: didn't mean to be needlessly "telling you"; mostly this was for general use and just happened to be in reply to you. It is something we can be sure BAG is talking about; perhaps add it to the BRFA list ("Will you be asserting the bot flag?") as a new question? — xaosflux Talk 02:25, 6 August 2018 (UTC)[reply]
    That's an idea. I'm assuming AWB/Pywikipedia takes care of that automatically already? Headbomb {t · c · p · b} 02:28, 6 August 2018 (UTC)[reply]
    By default pywikibot flags edits made by users with the bot right as bot edits. Users can also specify whether or not edits should be flagged as bot edits. — JJMC89(T·C) 03:00, 6 August 2018 (UTC)[reply]
    Just a note for future docs: if using Special:BotPasswords be sure to have "High-volume editing" selected to enable "bot" grants. — xaosflux Talk 04:53, 6 August 2018 (UTC)[reply]
    That also applies to OAuth. — JJMC89(T·C) 04:57, 6 August 2018 (UTC)[reply]
    I added a small section at Wikipedia:Creating_a_bot#Bot_Flag Ronhjones  (Talk) 18:13, 18 August 2018 (UTC)[reply]
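
    The point above about asserting the bot flag on each API edit can be sketched as follows. This is a minimal illustration, not any particular framework's implementation: the helper name `build_edit_params` is hypothetical, and it only builds the parameter dictionary for an `action=edit` request without sending anything. It relies on the Action API treating the `bot` parameter as a boolean that is true whenever it is present.

```python
from typing import Dict


def build_edit_params(title: str, text: str, summary: str,
                      csrf_token: str, mark_as_bot: bool) -> Dict[str, str]:
    """Build POST parameters for a MediaWiki action=edit request.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    params = {
        "action": "edit",
        "format": "json",
        "title": title,
        "text": text,
        "summary": summary,
        "token": csrf_token,
    }
    if mark_as_bot:
        # Boolean API parameters count as true when present, so the flag
        # must be included on every edit that should be marked as a bot edit.
        params["bot"] = "1"
    return params


# A flagged edit, and an unflagged one that stays visible on watchlists.
flagged = build_edit_params("Project:Sandbox", "text", "test", "TOKEN", True)
visible = build_edit_params("Project:Sandbox", "text", "test", "TOKEN", False)
```

    The account still needs the bot permission (and, with BotPasswords or OAuth, the "High-volume editing" grant) for the server to honour the flag.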

    Protection template bot?

    I do not think there is a bot that automatically adds a protection template to pages once an admin protects them. I'm used to adding the protection template myself, to the point that some admins left messages on my talk page concerning the matter. But from what it looks like, there at least used to be a bot that did it, but I'm not sure if it's working. This is why I'm bringing this up on this noticeboard and not the bot request page. Funplussmart (talk) 10:52, 8 August 2018 (UTC)[reply]

    Continued at Wikipedia:Bot_requests#Automatically_add_protection_templates_to_protected_pages Ronhjones  (Talk) 18:02, 18 August 2018 (UTC)[reply]

    SSTbot adding level 5 vital articles

    See [1] - is this really a good idea? I see one of the sons of Noah, Shem, was tagged today as a Level 5 vital article. Pinging User talk:Feminist. Doug Weller talk 15:49, 8 August 2018 (UTC)[reply]

    @Doug Weller: can you elaborate on your concern? Was the page not actually a Wikipedia:Vital articles/Level/5 article? Do you think these just shouldn't be tagged? — xaosflux Talk 17:27, 8 August 2018 (UTC)[reply]
    @Xaosflux: This may be the wrong board. Yes, I don't think one of the sons of Noah is really a vital article as it's not on the list. I can see how a bot can be tagging vital articles from a list, but that doesn't seem to be what's happening here. I will admit that if there are to be 50,000 level 5 articles which should be FA, this one doesn't seem to be top 50,000, but that's an issue for somewhere else, I'm not sure where. What I see here is a bot tagging articles not in the list. Doug Weller talk 17:42, 8 August 2018 (UTC)[reply]
    @Doug Weller: Shem appears on Wikipedia:Vital_articles/Level/5/Philosophy_and_religion#Abrahamic_and_Judaic_mythology_(15_articles) - so it doesn't appear to be malfunctioning - if you think it is malfunctioning this is certainly the right venue to bring it up! If you disagree with the contents of the V5 list, Wikipedia talk:Vital articles/Level/5 may be better. — xaosflux Talk 17:46, 8 August 2018 (UTC)[reply]
    @Xaosflux: you had me worried for a moment, I thought I might be going blind. I was looking at Wikipedia:Vital articles/Level/5 which doesn't include it. The title of the list you linked to seems misleading, as if you look closely it includes Level 4 articles, eg Noah and his 3 sons. Wikipedia:Vital articles/Level/5/People/Religious figures also includes a variety of levels. So if the bot is using these pages to tag all articles as Level 5, it shouldn't be doing that. Doug Weller talk 18:28, 8 August 2018 (UTC)[reply]
    @Doug Weller: see WP:BOTISSUE. Your best bet, contact the bot operator. Headbomb {t · c · p · b} 18:57, 8 August 2018 (UTC)[reply]
    I think those non-5's are on that list just for reference, did you find any level-4's that were getting tagged as level-5? If so, certainly start by contacting the operator, if they are unresponsive or you are still in disagreement we can certainly revisit the task here. Best regards, — xaosflux Talk 20:19, 8 August 2018 (UTC)[reply]
    I think the list is just silly. Considering Noah Level 4 but one of his sons Level 5 is certainly odd. ~ Rob13Talk 09:29, 17 August 2018 (UTC)[reply]
    @BU Rob13: that list is by nature subjective, and it appears lower "levels" are more important - I don't think it's odd that someone would think that improving Noah would be more beneficial to our readers than improving Ham (son of Noah). In any case, I'm not seeing any bot-specific problems here are you? — xaosflux Talk 10:56, 17 August 2018 (UTC)[reply]
    Oh, whoops! I thought the list went the other direction, with higher levels being more important. I'll shut up now. And no, I'm not seeing a bot-specific problem. I was just noting that, to the extent any problem exists, it is with the subjective list, not with the bot. ~ Rob13Talk 11:35, 17 August 2018 (UTC)[reply]

    Out of curiosity... (to all bot operators)

    How many of you use BotPasswords, and how many of you use OAuth, and why?—CYBERPOWER (Chat) 21:05, 8 August 2018 (UTC)[reply]

    • I use OAuth because of the security aspect.—CYBERPOWER (Chat) 21:05, 8 August 2018 (UTC)[reply]
    I just have sessions open for a long time on Toolforge. If I ever need to log in again, I suspect I would use BotPasswords due to convenience. However, if Pywikibot supported OAuth with minimal effort required (and I haven't checked), I would use that instead. Enterprisey (talk!) 21:07, 8 August 2018 (UTC)[reply]
    No idea what I use, whatever AWB/pywikipedia defaults are I guess, and because they're the default shit and seem to work. Headbomb {t · c · p · b} 21:41, 8 August 2018 (UTC)[reply]
    It does – mw:Manual:Pywikibot/OAuth. — JJMC89(T·C) 02:34, 9 August 2018 (UTC)[reply]
    • BotPasswords. I primarily use AWB, which doesn't support OAuth - but also because of the huge pile of phab:T103587. BotPasswords just easily work, and I like the control over using web authentication - OAuth is just too much trouble for single-use consumers. — xaosflux Talk 22:50, 8 August 2018 (UTC)[reply]
      I've never had issues specifically related to owner-only consumers. Granted I've discovered a few bugs with OAuth, they're usually critical ones that got fixed promptly. InternetArchiveBot runs on OAuth.—CYBERPOWER (Chat) 23:00, 8 August 2018 (UTC)[reply]
      @Cyberpower678: one of the biggest benefits of BP over OA I see is that it is self-service: if you want to use it, just go do it, with no fussing around with having to go over to meta and fill out applications, having to republish if you want to modify a grant, etc. — xaosflux Talk 14:07, 9 August 2018 (UTC)[reply]
      User:Xaosflux. I think phab:T103587 is for things like web tools where end-users are represented by the tool for making edits to Wikipedia (similar to the Internet Archive Bot web tool). For bots, there is only 1 user, owner-only consumer, which is significantly more simple. The credential registration is automatically approved (within seconds) there's no backlog or waiting for approval because if the account has a bot-flag it is considered good enough for auto approval. It's basically self-service, filling out the application takes a minute or less. -- GreenC 14:23, 9 August 2018 (UTC)[reply]
    • BotPasswords because AWB. I think both are valid and would not favor one over the other in the approvals process, if that's a relevant component of your question. ~ Rob13Talk 23:35, 8 August 2018 (UTC)[reply]
    • A shell command-line OAuth method for saving pages is wikiget, it's fast, not language-dependent and installs as a single file. Oauth setup instructions -- to save a page: ./wikiget -E <page name> -S <edit summary> -P <text file> -- GreenC 23:45, 8 August 2018 (UTC)[reply]
    • I use OAuth for pywikibot (most of my bot work) because it is more secure and BotPasswords for AWB since OAuth isn't supported. — JJMC89(T·C) 02:34, 9 August 2018 (UTC)[reply]
      I'm surprised AWB doesn't support it. That's a little sad actually, given the dedicated devs that develop it.—CYBERPOWER (Chat) 13:43, 9 August 2018 (UTC)[reply]
      • AWB hasn't had an update in well over a year (@Reedy: here for a reminder an AWB update would be a godsend). It's holding back a lot of things. No idea if OAuth support would add anything to AWB though. Headbomb {t · c · p · b} 14:43, 9 August 2018 (UTC)[reply]
    Passwords are insecure, quite simply. There's a reason the world is moving to OAuth. If AWB doesn't move to OAuth, and there is a serious security incident, I wonder how the WMF sysadmins would respond given their job to protect the Wikimedia servers and infrastructure. -- GreenC 15:36, 9 August 2018 (UTC)[reply]
    Russia getting into User:MinusBot just fills me with dread. Headbomb {t · c · p · b} 15:39, 9 August 2018 (UTC)[reply]
    BOFH too -- GreenC 15:45, 9 August 2018 (UTC)[reply]
    I have no idea what my bot's real passwords are. :p—CYBERPOWER (Chat) 18:25, 9 August 2018 (UTC)[reply]
    I make my bot solve a captcha for each edit! — xaosflux Talk 00:25, 13 August 2018 (UTC)[reply]
    I've put a snapshot release of latest AWB SVN here https://sourceforge.net/projects/autowikibrowser/files/autowikibrowser/Snapshots/ which should help some users (@Headbomb: etc.) until the next formal release. Rjwilmsi 07:54, 11 August 2018 (UTC)[reply]
    • Botpasswords on custom php classes. SQLQuery me! 02:38, 9 August 2018 (UTC)[reply]
    • BotPasswords because I'm too lazy to integrate OAuth and do I really need it? —  HELLKNOWZ   ▎TALK 11:06, 11 August 2018 (UTC)[reply]
      @Hellknowz: Yes! Yes, you do! Why? Because I said so, and I am the bot law. ;-)—CYBERPOWER (Chat) 23:27, 12 August 2018 (UTC)[reply]
      May be later... —  HELLKNOWZ   ▎TALK 00:31, 13 August 2018 (UTC)[reply]

    So, for AWB to be able to use OAuth... We'd really need OAuth2 in MediaWiki. Which we don't have. See phab:T125337. This is due to distribution of secret keys. Or else, it's having everyone register their own OAuth app... I'm also intrigued why you think AWB would be the cause of a "serious security incident". Reedy (talk) 22:29, 16 August 2018 (UTC)[reply]

    PyWikiBot doesn't require users to register their own OAuth app, but they use OAuth fairly invisibly, how do they do it? -- GreenC 00:22, 17 August 2018 (UTC)[reply]
    Yes it does. Legoktm (talk) 01:25, 17 August 2018 (UTC)[reply]
    Ahh, then it's defaulting to bot password; I thought it used OAuth -- GreenC 02:02, 17 August 2018 (UTC)[reply]
    I think PWB's (pywikibot.Site().login()) tries whatever you put in the config file. If you did not generate an OAuth token, surely it cannot try that? TigraanClick here to contact me 07:54, 20 August 2018 (UTC)[reply]
    Ever since I created an OAuth function for awk I have not needed PWB or AWB, I don't know why I thought PWB used OAuth other than a vague memory of seeing OAuth in the source and assuming it was being used in the same way other tools can make edits on behalf of editors using OAuth. -- GreenC 13:19, 20 August 2018 (UTC)[reply]
    • 2FA on the main bot account (since it's an admin bot), and then use bot password for normal use. Ronhjones  (Talk) 18:06, 18 August 2018 (UTC)[reply]
    • OAuth via PWB for me, because the manual said bot passwords are a Bad Thing and as a sheepish rookie botop I followed it. TigraanClick here to contact me 07:54, 20 August 2018 (UTC)[reply]
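
    The idea raised above, that a client can only try the credentials actually present in its configuration, can be sketched like this. It is an illustration under stated assumptions, not Pywikibot's real login logic: the function `pick_auth_method` and both configuration dictionaries are hypothetical names.

```python
from typing import Dict, Tuple

# Owner-only OAuth credentials: consumer key, consumer secret,
# access token, access secret (all four are needed to sign requests).
OAuthTokens = Tuple[str, str, str, str]


def pick_auth_method(site: str,
                     oauth_tokens: Dict[str, OAuthTokens],
                     bot_passwords: Dict[str, Tuple[str, str]]) -> str:
    """Return which login method the configuration allows for a site.

    Hypothetical helper: prefers OAuth when a complete token set exists,
    falls back to a bot password, and otherwise reports "none".
    """
    tokens = oauth_tokens.get(site)
    if tokens is not None and len(tokens) == 4 and all(tokens):
        return "oauth"
    if site in bot_passwords:
        return "botpassword"
    return "none"


oauth_cfg = {"en.wikipedia.org": ("ckey", "csecret", "atoken", "asecret")}
bp_cfg = {"test.wikipedia.org": ("ExampleBot@task1", "generated-password")}

method_en = pick_auth_method("en.wikipedia.org", oauth_cfg, bp_cfg)
method_test = pick_auth_method("test.wikipedia.org", oauth_cfg, bp_cfg)
```

    With this configuration the first site resolves to OAuth and the second to a bot password; a site with no stored credentials resolves to "none".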

    WP:AWB: New version is out

    Just a heads up that a new version of AWB is out. The most important change (for me) is T159958. The "in template" lookup logic in the advanced find/replace has been tweaked to find something like {{cite journal}} in something like

    {{reflist|
    *{{cite journal |... }}
    }}

    whereas before you couldn't reach it because it was nested in a {{reflist}}. Many thanks to @Reedy: for the new version. A (partial?) changelog is available here. Headbomb {t · c · p · b} 01:41, 18 August 2018 (UTC)[reply]
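
    To illustrate why nesting matters for an "in template" lookup, here is a minimal sketch of a brace-matching scan that reports inner templates as well as outer ones. It is not AWB's code (AWB is written in C#, and its actual logic is not shown in this thread); `find_templates` and `template_name` are hypothetical helpers, and the scan deliberately ignores complications such as triple-brace parameters or nowiki sections.

```python
from typing import List


def find_templates(wikitext: str) -> List[str]:
    """Return every {{...}} span, including templates nested in others."""
    found: List[str] = []
    open_positions: List[int] = []  # stack of indexes of unmatched "{{"
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            open_positions.append(i)
            i += 2
        elif pair == "}}" and open_positions:
            start = open_positions.pop()
            found.append(wikitext[start:i + 2])
            i += 2
        else:
            i += 1
    return found


def template_name(template: str) -> str:
    """Extract the name part of a single {{name|...}} span."""
    return template[2:-2].split("|", 1)[0].strip()


sample = "{{reflist|\n*{{cite journal |title=X }}\n}}"
names = [template_name(t) for t in find_templates(sample)]
```

    Because inner templates close first, `names` lists the nested {{cite journal}} before the enclosing {{reflist}}, so a find/replace restricted to "cite journal" can still reach it.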

    Bots Newsletter, August 2018

    Greetings!

    Here is the 6th issue of the Bots Newsletter. You can subscribe/unsubscribe from future newsletters by adding/removing your name from this list.

    Highlights for this newsletter include:

    ARBCOM
    • Nothing particularly important happened. Those who care already know, those who don't know wouldn't care. The curious can dig through the ARBCOM archives themselves.
    BAG
    • There were no changes in BAG membership since the last Bots Newsletter. Headbomb went from semi-active to active.
    • In the last 3 months, only 3 BAG members have closed requests - help is needed with the backlog.
    BOTREQs and BRFAs

    As of writing, we have...

    Also

    Discussions

    These are some of the discussions that happened / are still happening since the last Bots Newsletter. Many are stale, but some are still active.

    New things

    Thank you! edited by: Headbomb 15:04, 18 August 2018 (UTC)[reply]

    Please comment at the above link if this is something that you could make use of. Going to ping @Smith609, Ocaasi (WMF), Ocaasi, Jonesey95, AManWithNoPlan, CristianCantoro, Nemo bis, Pintoch, Tom29739, and Magioladitis: on this. Headbomb {t · c · p · b} 16:46, 21 August 2018 (UTC)[reply]

    New bot-like access group

    A new "bot-like" user group has appeared, "Copyright violation bots", possibly related to phab:T199359. Thank you to Dolotta for calling this out. @MMiller (WMF): do you have information about what is going on with this initiative and how it will impact editors here? — xaosflux Talk 18:45, 1 September 2018 (UTC)[reply]

    Ping also to ערן as these tasks have your bot account's name all over them. — xaosflux Talk 18:50, 1 September 2018 (UTC)[reply]
    Per Special:ListGroupRights, they have the ability to "Tag pages in the Special:NewPagesFeed as likely copyright violations, through the pagetriage-tagcopyvio API (pagetriage-copyvio)". Looks like the group was created in phab:T202041; for use in User:EranBot. Per phab:T201073 "After the attached patch is merged and deployed, there will be a new user group on English Wikipedia called "Copyright violation bots". A bureaucrat on enwiki can then put EranBot in that group." It seems like the purpose is to allow copyright violations detected by EranBot that fill up CopyPatrol to also mark pages as copyright violations on the NPP feed? Galobtter (pingó mió) 18:55, 1 September 2018 (UTC)[reply]
    Anything like this will certainly need a new BRFA, I seem to recall there was a notable false positive "copyright violation detection" problem with this bot (for example when copying public domain text). — xaosflux Talk 19:01, 1 September 2018 (UTC)[reply]
    Xaosflux: Thank you for opening this discussion. I haven't yet coded the reporting/using the API from the bot side, and haven't yet asked for this right, but I shall do it soon.
    The bot is already running on all changes in enwiki, and suspected edits are reported to a database; users go over them using CopyPatrol, developed by the Community Tech team. The Growth team is working on improving Special:NewPagesFeed and is looking for a way to integrate the copyvio system into it; the above tasks can provide some more details.
    Would you like me to open a new BRFA under Wikipedia:Bots/Requests for approval/EranBot/2? Eran (talk) 20:32, 1 September 2018 (UTC)[reply]
    If you are still in beta testing, etc - you don't have to do anything here (yet) - not until such time as you want to start testing edits or actions on (as opposed to against) the English Wikipedia. — xaosflux Talk 20:36, 1 September 2018 (UTC)[reply]
    Ping to @Roan Kattouw (WMF): for any input. Added phab:T193782 tracking to above. — xaosflux Talk 19:55, 4 September 2018 (UTC)[reply]
    Hi everyone, my apologies for not having announced this ahead of time. I thought adding an obscure new user group that isn't being used yet to the already quite long list of user groups would be unlikely to be noticed by anyone, but it took less than 48 hours. That'll teach me to never underestimate Wikipedians :)
    As others already inferred, the idea behind this group is to put EranBot in it, so that it can tell our software which new pages and drafts it thinks are possible copyvios. I wanted to clarify that this "possible copyvio" flag will only appear on Special:NewPagesFeed and nowhere else. It will say Possible issues: Copyvio and link to CopyPatrol for a more detailed report on why it thinks it might be copyvio; see also this screenshot. CopyPatrol and EranBot are existing tools that already score and list potential copyvios, and we're trying to make it easier to use them for new page patrolling and draft review.
    The copyvio feature will be available for testing on test.wikipedia.org soon (stay tuned for an announcement from MMiller (WMF)), and is not enabled on English Wikipedia yet (meaning that even if EranBot is put in this group and starts flagging things, those flags won't be displayed). The reason we need a group is that we need a way to trust only EranBot and not other users to pass us this copyvio information, and we figured the groups+rights system was the least bad way to do that. --Roan Kattouw (WMF) (talk) 22:22, 4 September 2018 (UTC)[reply]
    @Roan Kattouw (WMF): who is the "we" on this? Will the communities control who we trust to make these inputs? Also please see my note on phab:T199359 or above regarding "unoriginal" vs "copyright infringing" and how this is being determined. — xaosflux Talk 01:59, 5 September 2018 (UTC)[reply]
    @MMiller (WMF): perhaps you can address? Primary concern: Is this going to label editors contributions as "copyright violations" in situations where they are not actually in violation of a copyright? Where is the actual checking and validation of the terms of use of the copyright of the source material occurring? If this is actually only checking for "text this third party has seen elsewhere" without actually validating the copyright status we shouldn't be casting aspersions of a legal violation. If it is only checking for non-originality suspects, we should only call them out as such. — xaosflux Talk 14:40, 5 September 2018 (UTC)[reply]
    @Xaosflux: thanks for the question. The work we're doing here is essentially to post the results from the CopyPatrol tool alongside pages in the New Pages Feed. This project has been a collaboration with the NPP and AfC reviewing communities, and our product team has posted as much information as we could about how and why we're building the way we are. CopyPatrol scans all substantial edits (over 500 bytes) by using iThenticate's API (via EranBot), which is a third-party API used by institutions to check original writing for plagiarism. It checks text against its database of websites, academic journals, and books, and says what percent of the text is found in another source. This is not a definitive declaration of copyright violation, because of many potential exceptions. For instance, the text that it finds in multiple locations may in fact be public domain, or it may be a Wikipedia mirror site, or simply a long block quote. Therefore, users of CopyPatrol know that when something is flagged in that interface, it only means there is a potential violation, and that the human editor should investigate further before determining whether there is a violation. Similarly, what we'll do with the New Pages Feed is say "Potential issues: Copyvio", because it is only a potential issue, brought up by proxy through a plagiarism detection service. In conversations with the reviewing communities, it looks like the standard practice there is that any machine-generated flag about copyvio means that a human should investigate to make the decision, which is a practice they use with another popular tool, Earwig's Copyvio Detector. Does this help? -- MMiller (WMF) (talk) 19:01, 5 September 2018 (UTC)[reply]
    @MMiller (WMF): Can the "label/tag" be changed to something like "Potential issue: Copied text", "Potential issue: Reused Content", etc? We obviously take copyright very seriously as it is a core project tenet and I don't think we should blindly throw around the phrase "copyright violation". Note, I'm only referring to the label, not the process; to focus on the substance of the content, not the motive of the editor. — xaosflux Talk 20:46, 5 September 2018 (UTC)[reply]
    BTW: The bot does more than just "non original" - it detects Wikipedia mirrors (the source indicates it is a mirror of Wikipedia), Creative Commons content (the source indicates it is under a CC license) as well as citations (the added content contains a link to the source). Eran (talk) 21:16, 5 September 2018 (UTC)[reply]
    @ערן: that is sort of my point, if someone adds such text their addition may not actually be a "Copyright Violation". I don't disagree that it may warrant extra recent changes patrol attention, and tagging sounds useful. — xaosflux Talk 21:25, 5 September 2018 (UTC)[reply]
    Just wanted to chime in to agree with Xaosflux here. I'm far from the most prolific user of Copypatrol, but I spend some time there. There are a number of things that aren't violations, and the false positives are usually either a vandal repeating "hi hi hi hi" or some other text, or something public domain or appropriately licensed. The latter content is frequently one of roughly three specific editors who won't show up in new pages since they're all autopatrolled, but certainly not always. I'd likewise be more comfortable with either of Xaosflux' suggested options. ~ Amory (utc) 01:07, 6 September 2018 (UTC)[reply]

    Xaosflux, I would like to start testing it on enwiki and get EranBot into the copyviobot group - any report that goes to CopyPatrol will also go to PageTriage via the enwiki API. AFAIK (Roan Kattouw (WMF), MMiller (WMF), correct me if I'm wrong) it will currently not be displayed to users, and later (once enabled in the PageTriage extension) there will be a small hint with a link to CopyPatrol for further information (example[2]). As for the text of the hint, I think there are good arguments here for why it should be carefully considered (we shouldn't be casting aspersions of a legal violation) - how to name it (copycheck? copypatrol?) can be discussed later. Thanks, Eran (talk) 10:32, 15 September 2018 (UTC)[reply]

    @ערן: please file a WP:BRFA to have your new task reviewed. It looks like some testing may have been done on testwiki, if so please include information from those tests in the BRFA. I'm glad you are open to the labeling update (note the name of the "access group" doesn't matter, we can rename that locally) and both of your above suggestions sound fine to me. I certainly expect this will catch actual "copyright violations" in addition to other copied text - but we can leave the blaming up to humans :D — xaosflux Talk 14:47, 15 September 2018 (UTC)[reply]
    Wikipedia:Bots/Requests for approval/EranBot/2. Eran (talk) 16:08, 15 September 2018 (UTC)[reply]
    Roan Kattouw (WMF), MMiller (WMF) - it looks like local community access wasn't included when the new group was created (e.g. "Allow bureaucrats to add/remove users from this group"); this should be done for enwiki, testwiki, test2wiki - do you need us to create the phab configuration request for this? — xaosflux Talk 14:51, 15 September 2018 (UTC)[reply]
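
    The labeling concern in this thread can be made concrete with a small sketch of how a machine-generated flag might be worded and gated. Everything here is hypothetical: `MatchReport`, `triage_label`, and the 50 percent threshold are illustrative assumptions, not how EranBot, CopyPatrol, or PageTriage actually decide. Only the 500-byte minimum and the neutral label text come from the discussion above, and skipping mirrors, freely licensed sources, and cited sources is an assumption about how the detected cases might be handled.

```python
from dataclasses import dataclass
from typing import Optional

MIN_EDIT_BYTES = 500  # from the thread: only edits over 500 bytes are scanned


@dataclass
class MatchReport:
    """Hypothetical summary of a plagiarism-service result for one edit."""
    added_bytes: int
    percent_matched: float
    source_is_wikipedia_mirror: bool = False
    source_is_freely_licensed: bool = False
    edit_cites_source: bool = False


def triage_label(report: MatchReport,
                 percent_threshold: float = 50.0) -> Optional[str]:
    """Return a neutral hint for human reviewers, or None for no hint.

    The threshold is an assumed value chosen only for illustration.
    """
    if report.added_bytes <= MIN_EDIT_BYTES:
        return None  # too small to have been scanned
    if (report.source_is_wikipedia_mirror
            or report.source_is_freely_licensed
            or report.edit_cites_source):
        return None  # likely benign reuse; assumed not to be flagged
    if report.percent_matched >= percent_threshold:
        # Describes the content, not the editor's motive or legal status.
        return "Potential issue: Copied text"
    return None


hint = triage_label(MatchReport(added_bytes=2400, percent_matched=82.0))
```

    The label only says that text was found elsewhere; whether that amounts to a copyright violation is left to the human who follows the hint.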

    Server test reminder

    Hey, a reminder: there will be a server test over the next few weeks, see m:Tech/Server switch 2018 for more info. On September 12, it will not be possible to edit the wikis for a short period of time (same goes for October 10). /Johan (WMF) (talk) 12:11, 10 September 2018 (UTC)[reply]