FlightGear wiki talk:Instant-Refs

From FlightGear wiki

more sources

at some point, we may also want to add support for other sources, such as:

That should cover all important sources. It would also allow us to use the same script to help populate the FlightGear Newsletter and changelogs, as well as the Release plan/Lessons learned. So this could be a real time-saver. --Hooray (talk) 14:40, 1 June 2014 (UTC)

Automatic update of script and old quotes

Thanks for the heads-up. Now, that makes me wonder if I can adapt the script to parse existing wiki articles and update the cquotes there automatically by re-opening the URL and re-running the extraction logic :) BTW: That reminds me: I was going to investigate adopting the downloadURL attribute[1] so that scripts can auto-update --Hooray (talk) 22:51, 11 June 2014 (UTC)

If the script continues to be widely used, having a good way to re-download and re-run it on pages with existing cquotes would let us update quotes automatically. For that, we would want to encode certain meta info in each template, e.g.:
  • script version
  • quoting date/time
  • quote URL
  • selection offsets
  • quoting settings (format)
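A minimal sketch of one way to carry this meta info, assuming it is stored as JSON inside an HTML comment within the quote template (MediaWiki does not render comments, so the quote looks unchanged). The marker name instant-cquotes-meta, the field names and the sample values are hypothetical, not the script's actual format.

```javascript
// Hypothetical marker; not part of the actual FGCquote template.
const META_MARKER = 'instant-cquotes-meta';

// Serialize quote metadata into an HTML comment that can sit inside the
// template body without affecting the rendered quote.
function encodeQuoteMeta(meta) {
  // "--" must not appear inside an HTML comment. In JSON output it can only
  // occur inside strings, where the escape \u002d is a valid "-".
  const json = JSON.stringify(meta).replace(/--/g, '-\\u002d');
  return '<!-- ' + META_MARKER + ' ' + json + ' -->';
}

// Find the first metadata comment in a chunk of wikitext and parse it back.
function decodeQuoteMeta(wikitext) {
  const m = new RegExp('<!-- ' + META_MARKER + ' (.*?) -->').exec(wikitext);
  return m ? JSON.parse(m[1]) : null;
}

// Example covering the fields listed above (all values are illustrative).
const exampleMeta = {
  scriptVersion: '0.30',                    // script version
  quotedAt: '2015-11-15T12:19:00Z',         // quoting date/time
  url: 'https://sourceforge.net/p/flightgear/mailman/message/35059454/',
  selection: { start: 120, end: 480 },      // selection offsets
  format: 'cquote'                          // quoting settings (format)
};

const wikitext = '{{FGCquote\n|quoted text\n' + encodeQuoteMeta(exampleMeta) + '\n}}';
console.log(decodeQuoteMeta(wikitext).url);
```

Re-running the extraction would then mean: decode the comment, re-open meta.url, and re-apply the stored selection and format.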

We can hook into form submission to update arbitrary quotes/contents using this: http://commons.oreilly.com/wiki/index.php/Greasemonkey_Hacks/Developer_Tools#Intercept_and_Modify_Form_Submissions

--Hooray (talk) 07:19, 15 November 2015 (EST)
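A minimal sketch of the form-submission idea, assuming a standard MediaWiki edit page, where the edit form has the id editform and the wikitext lives in the textarea named wpTextbox1. The transform callback is a placeholder for whatever quote-refreshing logic we end up with; nothing here re-fetches quotes yet.

```javascript
// Rewrite the wikitext right before the edit form is submitted.
// `transform` is a hypothetical callback: (oldWikitext) => newWikitext.
function installSubmitHook(form, transform) {
  form.addEventListener('submit', function () {
    const box = form.querySelector('textarea[name="wpTextbox1"]');
    if (box) {
      box.value = transform(box.value);
    }
  });
}

// Browser-only wiring; skipped when this runs outside a page (e.g. under Node).
if (typeof document !== 'undefined') {
  const editForm = document.getElementById('editform');
  if (editForm) {
    // Identity transform for now: plug the real quote-refresh logic in here.
    installSubmitHook(editForm, (text) => text);
  }
}
```

Because the handler runs synchronously, any re-downloading of quotes would have to happen before submission, or the handler would need to call event.preventDefault() and resubmit once the updated text is ready.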

Hosting

The script seems to be about to become sufficiently complex to deserve actual hosting:

(that would allow updating the script automatically)

Hi, is there any progress being made on this? I think the best way to host it would be to put it in a SourceForge repository under the FlightGear project (I don't know if it allows direct downloads, though, and we would need to relicense the script because public domain licenses are not OSI-approved); otherwise, I can submit it to GreasyFork (OpenUserJS is unsuitable as public domain licenses are not allowed). -- ElGaton (talk to me) 03:58, 1 May 2016 (EDT)
I don't think anybody has looked into this recently - however, relicensing should be a no-brainer, given that all the code is in the public domain anyway. I am not sure if it's worth the hassle to put it up under sourceforge/FlightGear; we also need to keep in mind that this is not exactly the most popular "tool" around here. So, before this becomes even more accessible, it might be a good idea to address some of the requests made, e.g. by adding support for a different output mode (without using the FGCquote template, just the copied text plus the ref part), so that quote-heavy pages don't look that obnoxious.
Regarding hosting in general, I still think it may be a good idea, but we would probably need to make sure that some requirements are met, e.g. having a revision history and giving other contributors access. Alternatively, we could maintain a copy of the script in the wiki and add a note saying that changes to it will be reviewed/integrated with the hosted script by one of us (if you'd volunteer to help with that, I'd suggest just going ahead and setting up a public repository).
Admittedly, this is going to remain a fairly FG-specific script; it would need quite a bit of work to become useful for other purposes (even though other OSS projects may be easy to support). --Hooray (talk) 06:33, 1 May 2016 (EDT)
OK, I think we can leave the script here and host it on GreasyFork (it allows public domain scripts) - I'll put the page on my watchlist to make sure I update the script there every time there's a version bump. Does that look good to you and the other contributors? -- ElGaton (talk to me) 09:26, 1 May 2016 (EDT)
Sounds good to me. Given how the script has evolved over time, we've probably hit a natural limit already, so being able to use multiple files easily and to update scripts automatically sounds useful to me. I don't know if GreasyFork supports "teams" of contributors for better collaboration? It might also be a good idea to generalize the script so that it supports arbitrary phpBB installations and mailing list archives other than just sourceforge (think gmane), and maybe output formats other than just wikimedia markup; that way, the script may become more useful over time.--Hooray (talk) 13:46, 1 May 2016 (EDT)
It doesn't, as it only offers a user script downloading facility (source code repositories like GitHub and SourceForge are better suited for the kind of work you mention). I'll wait a bit in case Red Leader, Philosopher or bigstones have any objections/remarks, then I'll upload the script there. -- ElGaton (talk to me) 17:14, 1 May 2016 (EDT)
Do they support extensions spread across multiple files? That is something that would help us quite a bit in cleaning up the current code, which has become a bit convoluted over time. Apart from that, nothing wrong with github et al - I am just not sure if they're suitable for automatically propagating updated changes.--Hooray (talk) 10:12, 2 May 2016 (EDT)
No, not that I know of (generally speaking, all user scripts are made of a single .user.js file). We could "hack" it by splitting the current script in multiple files and requiring them from a main one as libraries via the @include directive, but that would rule out most hosting solutions (many providers only allow GreaseMonkey scripts which include additional files only from a handful of trusted domains/CDNs, for safety reasons).
As for propagating changes on GitHub and other code repositories, the update URL must remain constant throughout the lifetime of the script. I'm sure this can be done on GitHub by publishing updates not as releases (where the addresses change), but on GitHub Pages (a user script I use does exactly this); on SourceForge, the usual download hosting could do the trick, but I'd need to perform a test first. (I'd prefer SF to keep the script "inside" the FG project hierarchy, if possible). -- ElGaton (talk to me) 10:50, 2 May 2016 (EDT)
Actually, user scripts can be composed of multiple files, e.g. via the require directive [2]. Personally, I would like to make use of that - we are already using it to include jQuery stuff, and it would help us organize the script a little better. I am not opposed to seeing the script hosted as part of FG/SF, but I just don't think it's going to be a very popular idea - so far, everything has worked out without major hindrance, so I would rather use an independent hosting solution, or continue to use the wiki to ensure that the barrier to entry/contributing isn't raised, e.g. by having to go through merge requests etc. Major contributions to the script like those from bigstones or Red Leader were simply "dropped" here directly, so this seems to have worked out really well.--Hooray (talk) 11:26, 2 May 2016 (EDT)
I've checked the GreasyFork external script policy: as long as all the files are hosted on/uploaded to GreasyFork, it's not a problem. (Also, sorry about the typo, I wrote @include instead of @require by mistake). I guess I can proceed with the upload now? -- ElGaton (talk to me) 12:22, 2 May 2016 (EDT)
sure, why not - it's in the public domain after all - feel free to make any changes to the script to have it auto-update. We can review everything once we have gathered a little more experience with this kind of scheme. --Hooray (talk) 12:27, 2 May 2016 (EDT)
Done - I'm updating the installation instructions on the main page. -- ElGaton (talk to me) 13:52, 2 May 2016 (EDT)
Script at: https://greasyfork.org/en/scripts/19331-instant-cquotes

--Hooray (talk) 14:18, 2 May 2016 (EDT)

Regarding updates, bumping the version number is fine - I'll take the script at the wiki revision where the version is bumped and upload it. (Note that I will not perform any prior testing, so be extra careful - maybe I should add this in the note above the script?). -- ElGaton (talk to me) 18:08, 2 May 2016 (EDT)
Sounds good, I will be sure not to bump the version number without prior testing. We could also add some tests to the code to spot the more obvious mistakes - in fact, there is already an OpenLink helper that could be used to download a few postings and try to extract the corresponding fields, so that we would only bump the version number if that succeeds. BTW: Thanks for handling the hosting part, much appreciated! --Hooray (talk) 18:25, 2 May 2016 (EDT)
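For reference, a sketch of what the relevant part of a user script metadata block could look like. In Greasemonkey/Tampermonkey-style headers, @require loads extra JavaScript files (such as jQuery) before the script runs, @match/@include select the pages the script runs on, and @downloadURL/@updateURL tell the script manager where to look for new versions, which is why that URL has to stay constant. The example.org URLs, the version number and the second @require are placeholders; the real values depend on where the script and its helper files end up (GreasyFork only accepts @require URLs from its approved list).

```javascript
// ==UserScript==
// @name        Instant-Cquotes
// @namespace   https://wiki.flightgear.org/
// @version     0.30
// @description Quote forum postings and mailing list messages on the FlightGear wiki
// @match       https://forum.flightgear.org/*
// @match       https://sourceforge.net/p/flightgear/mailman/*
// @require     https://code.jquery.com/jquery-1.12.4.min.js
// @require     https://example.org/instant-cquotes/helpers.js
// @downloadURL https://example.org/instant-cquotes/instant-cquotes.user.js
// @updateURL   https://example.org/instant-cquotes/instant-cquotes.meta.js
// ==/UserScript==
```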

Issues

too greedy non-greedy regexes

The problem described in the previous section regarding regexes that eat up half of a message seems to be related to my misunderstanding of non-greediness. So, I managed to fix it for this one case, but this means that using .*? to match everything until you meet the following character (as I currently do pretty much everywhere) is dangerous and prone to failure. Any occurrence of that should be changed as I did in this edit, and that's clearly cumbersome, not to mention that it can still be incorrect: say I have

<a someprop="href" href="http://www.link.org" ... >

If I want to match anything between a and href, I use [^(?:href)]*, but that would match only up to the text inside someprop, so I'd have to check that it's not inside double quotes... Well, I guess this is getting too complicated for handling it the Cthulhu way.

So my approach would be: fix this whenever the problem comes up, but don't overdo it, because we're already moving on dangerous ground. Or rewrite it all using only xpaths *sobs*. --Bigstones (talk) 16:49, 17 June 2014 (UTC)
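A small illustration of the overrun and of one common workaround. As a side note, inside a character class (?:href) is just a set of single characters, so [^(?:href)]* stops at the first h, r, e, f, parenthesis, ? or : (in the example, the e of someprop) rather than at the word href. Bounding the lazy part with [^>]* keeps a match inside a single tag, and requiring whitespace before href= avoids matching the word href inside another attribute's value. The sample markup is made up; for anything more involved, XPath/DOM access remains the more robust option.

```javascript
// Two anchors: the first has no href at all, the second has a decoy "href"
// inside another attribute's value, followed by the real href attribute.
const sample = '<a name="top">Top</a> <a someprop="href" href="http://www.link.org">Link</a>';

// Problematic: .*? is lazy, but it may still cross '>' and run into the next tag.
const lazyRe = /<a.*?href="(.*?)"/;

// Bounded: [^>]*? cannot leave the current tag, and \s before href= means the
// attribute name is matched, not the text "href" inside someprop's value.
const boundedRe = /<a\b[^>]*?\shref="([^"]*)"/;

function firstHref(re, html) {
  const m = re.exec(html);
  return m ? { whole: m[0], href: m[1] } : null;
}

const lazy = firstHref(lazyRe, sample);
const bounded = firstHref(boundedRe, sample);
console.log(lazy.whole);    // starts at the first <a> and spans across "</a>"
console.log(bounded.whole); // confined to the second <a ...> tag
```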

regex vectors

When testing things I realized that you are right: there are some scenarios where the regex may fail depending on how "complete" the selection is, because we obviously have hard-coded assumptions here. I'll see if it's feasible to also support vectors for regexes to extract the corresponding fields and try each regex in order to get a certain field, or if that doesn't make any sense... But quoting with/without author (anonymous quote) would be a valid test case here.--Hooray (talk) 16:46, 16 June 2014 (UTC)

Probably going to look into this sooner or later because this could be a simple solution to also support PM quoting - without having to parse the actual URL, we'd just try different regexes in order and use the one that succeeds. --Hooray (talk) 16:46, 16 June 2014 (UTC)
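A minimal sketch of the "vector of regexes" idea: each field gets an ordered list of candidate patterns, the first one that matches wins, and null means that no pattern applied (e.g. an anonymous quote). Both patterns are illustrative; in particular, the "by <author> » " byline is an assumed phpBB-style format, not taken from the script's CONFIG.

```javascript
// Candidate patterns for the author field, tried in order (illustrative only).
const AUTHOR_PATTERNS = [
  /^From: (.+?) </m,   // mailing list archive: "From: John Doe <John@do...> - ..."
  /^by (.+?) » /m      // assumed phpBB-style byline: "by Hooray » Sun Oct 07, ..."
];

// Return the first capture group of the first pattern that matches, or null.
function extractFirst(patterns, text) {
  for (const re of patterns) {
    const m = re.exec(text);
    if (m) {
      return m[1];
    }
  }
  return null;
}

console.log(extractFirst(AUTHOR_PATTERNS, 'From: John Doe <John@do...> - 2020-07-02 17:36:03'));
```

The same helper would work for PM quoting: add a PM-specific pattern to the vector instead of parsing the URL to decide which one to use.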

Syntaxhighlighting

Need to investigate what needs to be updated to support quoting code sections, as per [3] --Hooray (talk) 22:33, 14 June 2014 (UTC)

Postings that break our script for some reason

Misc notes

Detecting failed XPaths

You've got a point: we should probably check whether the xpath/regexes succeed or fail, and show a warning so that we know that the script needs to be updated because some xpath/regex may have changed.
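A minimal sketch of such a check, assuming the extraction step returns a plain object with one entry per field (author, date, title, ...): empty or missing fields are reported through a warning callback, which defaults to console.warn.

```javascript
// Report every field that the xpath/regex step failed to fill in.
function checkExtraction(fields, warn = console.warn) {
  const missing = Object.keys(fields).filter(
    (name) => fields[name] === null || fields[name] === undefined || fields[name] === ''
  );
  if (missing.length > 0) {
    warn('Instant-Cquotes: could not extract ' + missing.join(', ') +
         '; the xpath/regex for this site may need updating.');
  }
  return missing;
}
```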

Paragraphs / br (trailing slash)

There are some minor issues now, i.e. newline2br no longer contains the trailing forward slash, so there's probably some JavaScript/regex oddity involved here; maybe slashes just need to be escaped. Will be testing the code with a few different forum postings and checking the resulting cquote.

Done. That's because newline2br wasn't used at all in html mode. I added addNewlines, which puts newlines after br's and list-related tags; if more newlines need to be added, that should be the place.
--Bigstones (talk) 14:03, 17 June 2014 (UTC)
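A rough sketch of the idea behind addNewlines (a hypothetical re-implementation, not the script's actual code). Inside a JavaScript regex literal, a forward slash has to be written as \/, otherwise it ends the literal, which is the kind of escaping the trailing slash of <br/> needs.

```javascript
// Put a newline after <br>, <br/> and <br />, and after list-related tags,
// so that the generated wikitext stays readable.
function addNewlines(html) {
  // \/ escapes the slash inside the regex literal; \/? makes it optional.
  return html.replace(/(<br\s*\/?>|<\/?(?:ul|ol)>|<\/li>)/gi, '$1\n');
}

console.log(addNewlines('line one<br />line two'));
```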

support/ignore highlighted keywords/smilies

see Understanding Forward Compatibility for examples

Done --Bigstones (talk) 15:13, 17 June 2014 (UTC)
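A minimal sketch of this kind of clean-up, assuming phpBB-style markup where smilies are <img class="smilies" ... alt=":)"> tags and highlighted search keywords are wrapped in <span class="posthilit">. Both class names are assumptions about the forum theme, and this is not necessarily how the script handles it.

```javascript
// Replace smiley images with their alt text (e.g. ":)") and unwrap
// highlighted keywords, keeping only the plain text.
function normalizePostHtml(html) {
  return html
    .replace(/<img\b[^>]*>/gi, (tag) => {
      if (!/\bclass="[^"]*\bsmilies\b[^"]*"/i.test(tag)) {
        return tag; // not a smiley: leave other images alone
      }
      const alt = /\balt="([^"]*)"/i.exec(tag);
      return alt ? alt[1] : '';
    })
    .replace(/<span class="posthilit">([^<]*)<\/span>/gi, '$1');
}
```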

Beyond just cquotes (newsletter/changelog)

Gijs has recently revamped the newsletter template rather significantly, see: FlightGear_Newsletter_June_2014, and User_talk:Gijs#06.2F2014_newsletter:_too_much_of_a_ - basically, we could extend the script to support another output FORMAT to directly create markup for the newsletter/changelog. --Hooray (talk) 20:40, 30 June 2014 (UTC)

more styles/output formats

Looking at some of the cleanup done by Red Leader recently, we could also directly support other styles/output formats that look good enough without requiring tons of manual editing. --Hooray (talk) 11:11, 13 February 2015 (EST)

token matching for keywords/variables

Seems like it might make sense to match common keywords/acronyms and variables, such as:

File names with a known prefix like $FG_ROOT could even be pattern-matched and turned into a git link, so that $FG_ROOT/Nasal/canvas/MapStructure.nas would become a link to that file in the repository.

Equally, we could match common property paths to use code tags, e.g. /sim/rendering would become <code>/sim/rendering</code>. Ultimately, it might be a good idea to get in touch with Johan_G, Gijs and Red Leader to see what kind of format they'd prefer, especially because this could then be directly used for creating newsletter contents. --Hooray (talk) 11:19, 13 February 2015 (EST)
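A minimal sketch of both conversions. The template name fgdata file is a placeholder (the wiki's actual git link template and its parameters would need to be checked), and the list of property roots is an illustrative subset of FlightGear's top-level property nodes.

```javascript
// Placeholder template name; replace with the wiki's actual git link template.
const FGDATA_TEMPLATE = 'fgdata file';

// Illustrative subset of top-level property nodes.
const PROPERTY_ROOTS = ['sim', 'instrumentation', 'controls', 'position',
  'orientation', 'environment', 'autopilot', 'canvas'];

// $FG_ROOT/<path>; a trailing sentence period is not treated as part of the path.
const FG_ROOT_RE = /\$FG_ROOT\/([\w\/-]+(?:\.[\w\/-]+)*)/g;

// A property path starting with a known root, preceded by whitespace or the
// start of the text, and followed by whitespace, punctuation or the end.
const PROPERTY_RE = new RegExp(
  '(^|\\s)(\\/(?:' + PROPERTY_ROOTS.join('|') + ')(?:\\/[\\w-]+(?:\\[\\d+\\])?)*)(?=[\\s.,;:)]|$)',
  'g'
);

function linkifyTokens(text) {
  return text
    .replace(FG_ROOT_RE, (match, path) => '{{' + FGDATA_TEMPLATE + '|' + path + '}}')
    .replace(PROPERTY_RE, (match, lead, path) => lead + '<code>' + path + '</code>');
}

console.log(linkifyTokens('See $FG_ROOT/Nasal/canvas/MapStructure.nas and /sim/rendering.'));
```

Restricting property paths to known roots keeps ordinary file system paths (such as /path/to/some/addon) from being wrapped in code tags.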

matching repository references using git link template

Most of us commonly refer to repository files using either the $FG_* references or relative paths in the form of src/Main/fg_init.cxx:line number. We could automatically convert such references using the git/repo link template, e.g.:

Further investigation found that the launcher *is* trying to add this directory to fg-aircraft (src/GUI/QtLauncher.cxx:772), but that this doesn't work because this option is processed before the launcher is run (intentionally, to allow the launcher to find aircraft in fg-aircraft: src/Main/main.cxx:448).
— Rebecca N. Palmer (2015-04-01). Re: [Flightgear-devel] Launcher issues on Linux.
(powered by Instant-Cquotes)

--Hooray (talk) 12:48, 1 April 2015 (EDT)
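A similar sketch for path:line references such as src/Main/main.cxx:448. The template name repo link and its path/line parameters are placeholders rather than the wiki's actual template signature, and only src/ and simgear/ prefixes with common C/C++ extensions are recognized.

```javascript
// Placeholder; substitute the wiki's real repository link template and parameters.
const REPO_TEMPLATE = 'repo link';

// Matches e.g. "src/GUI/QtLauncher.cxx:772": path in group 1, line in group 2.
const SOURCE_REF_RE = /\b((?:src|simgear)\/[\w\/.-]+?\.(?:cxx|hxx|cpp|hpp|h|c)):(\d+)/g;

function linkSourceRefs(text) {
  return text.replace(SOURCE_REF_RE, (match, path, line) =>
    '{{' + REPO_TEMPLATE + '|path=' + path + '|line=' + line + '}}');
}

console.log(linkSourceRefs('fg-aircraft (src/GUI/QtLauncher.cxx:772)'));
```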

Development resources

Future focus/development and priorities (De-quoting)

Given that most people involved in maintaining the wiki don't seem to particularly appreciate articles consisting mainly of quotes, and that we don't seem to have any other good way to populate new articles quickly, I was thinking of changing the focus of the script to dynamically create articles using quotes (with proper refs) that would merely need copy-editing, i.e. using the process we are currently using to "de-quote" such articles: 1) categorizing related quotes, 2) coming up with headers/sub-headers, 3) rewriting/merging certain quotes, 4) attributing them by linking back to the quote archives. Should we decide to pursue this, the script would be turned into a "wizard" where we can add topics and sub-topics (wiki headings) and then add arbitrary text using the existing approach, but without using cquotes, i.e. the focus would be on extracting relevant contents, distilling them down and adding plenty of refs, including a references section at the bottom. Some tokens/words could be changed dynamically, but some kind of proof-reading/copy-editing would still need to be done afterwards, because people tend to use first-person speech in their announcements/postings, which we would need to convert to third person semi-automagically. Thoughts/ideas?--Hooray (talk) 07:11, 15 November 2015 (EST)

changing and automating regex/xpath handling

We currently have hard-coded handling of different websites, regexes and xpath expressions, so things are a bit fragile at the moment: once a website changes its style/template, our script breaks immediately. However, we could change the way the script works by using a very simple NN (neural network) to come up with matching regex/xpath expressions automatically. This could work via "supervised" training, where we would replace/adapt our existing CONFIG hash with a vector of URLs and contents to be extracted (date, time, title, posting). This kind of data would suffice for a neural network to train itself and "learn" how to come up with regex/xpath expressions to extract the relevant contents, not only for hard-coded websites like sourceforge, but even for completely different websites (think gmane), because the "learning" routine would self-adapt by looking at how to get its contents, and would never contain any hard-coded regex/xpath expressions anymore. The CONFIG hash would end up being a vector with "training data" for the different websites to be supported. And whenever an expression fails, it could re-train itself accordingly.--Hooray (talk) 13:33, 15 November 2015 (EST)

nested quotes and AJAX for thread titles

http://sourceforge.net/p/flightgear/mailman/message/33451055/
As we move forward with FlightGear development and future versions, we will be expanding the "in app" aircraft center. This dialog inside flightgear lets you select, download, and switch to any of the aircraft in the library.
— Curt

— Hooray (Oct 7th, 2015). Re: New Canvas GUI.
(powered by Instant-Cquotes)

AJAX mode/testing

We should probably add a dialog to store credentials that we can use for accessing the wiki; it would need to be shown after installing the script, i.e. on first-time use.

--Hooray (talk) 12:59, 20 November 2015 (EST)

libraries

Referring to the #Hosting section, and my comment on wanting to use additional libs (like jQuery), I am primarily thinking about using a wizard-framework (e.g. jQuery steps) and a framework for creating wikimedia editor plugins (actions):

--Hooray (talk) 12:07, 2 May 2016 (EDT)

About dequoting the script's article

Thanks for doing this, but I am frankly not sure if it's worth the effort. There are much more popular/important articles using tons of quotes than this one, and given the reputation of the script, having quotes in this article may actually be useful to make the case for having quotes in the first place. Thus, I would frankly not spend much time going through this particular article - in fact, I'd be very surprised if the greasyfork download stats showed more than 3-5 people actually downloading, and using, the script. So this is really just a niche tool, and we'd probably better spend our time doing other things and reviewing other wiki articles than the script's article. At the very least, I would suggest retaining the references to the original discussions revolving around these quotes (just my 2c). --Hooray (talk) 19:00, 2 May 2016 (EDT)

Done: I've added back the quotes as references. -- ElGaton (talk to me) 02:53, 3 May 2016 (EDT)

Script configuration (persistence)

Work in progress

We now have several more or less related development efforts going on, and the code is also growing because of that, so it seems to make sense to summarize what's been going on recently. In general, all changes were made in response to the collection of feature requests and ideas we have accumulated over time; specifically, this means that the following roadmap is in the process of being implemented:

  • establish unit testing, and add a few self-tests, so that the script can be more easily tested, updated/reviewed in the future
  • rework the script to more easily support other sources (think gmane)
  • make it much easier to update modified xpath/regex expressions (i.e. provide a UI for that)
  • support persistence for script-specific settings
  • make it easier for people to port/maintain the script by encapsulating platform specifics
  • support asynchronous fetching of postings (AJAX), e.g. to fetch posting titles, attachments etc
  • prepare the groundwork for supporting template-based output formats (think newsletter, changelog, articles)
  • review what's necessary to allow the script to update fgcquote-based quotes automagically

--Hooray (talk) 16:47, 4 May 2016 (EDT)
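For the persistence item, a minimal sketch of loading/saving script settings on top of defaults, assuming a storage object with the localStorage-style getItem/setItem interface. In a user script, GM_getValue/GM_setValue (with the matching @grant lines) would be an alternative. The storage key and setting names are hypothetical.

```javascript
// Hypothetical storage key and defaults.
const SETTINGS_KEY = 'instant-cquotes-settings';
const DEFAULT_SETTINGS = { format: 'cquote', debug: false };

// Merge stored settings over the defaults; ignore missing or corrupt data.
function loadSettings(storage) {
  let saved = {};
  const raw = storage.getItem(SETTINGS_KEY);
  if (raw) {
    try {
      saved = JSON.parse(raw);
    } catch (e) {
      saved = {}; // corrupt entry: fall back to the defaults
    }
  }
  return Object.assign({}, DEFAULT_SETTINGS, saved);
}

function saveSettings(storage, settings) {
  storage.setItem(SETTINGS_KEY, JSON.stringify(settings));
}

// In the browser: const settings = loadSettings(window.localStorage);
```

Keeping the storage object as a parameter also makes the settings code easy to unit-test with an in-memory stand-in, which fits the self-testing item above.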

External genetic/neural network libraries

While uploading the latest version of the script, I noticed that the Genetic and Synaptic libraries are hosted on sites not on the approved GreasyFork list. Shall I ask them to host them? -- ElGaton (talk to me) 13:28, 11 May 2016 (EDT)

Thanks for pointing that out, I missed that completely because everything is working correctly here - but obviously, I rarely get to actually download/install the latest version via greasyfork. Those libs should be safe to add to the list, i.e. they're fairly established/popular, otherwise I wouldn't be using them. For now, this is just an experiment anyway. The idea is to let the xpath/regex code evolve if/when the underlying website (theme) changes. But if there is a problem, we could use other libs - the code is just used for testing ATM. --Hooray (talk) 13:41, 11 May 2016 (EDT)
Ongoing: I have submitted the libraries to the JSDelivr CDN; once they are up, I'll change the source URLs in the script and update it on GreasyFork as well. -- ElGaton (talk to me) 18:32, 17 May 2016 (EDT)
Hi Elgaton, thank you for taking care of this, it's very much appreciated! --Hooray (talk) 18:55, 17 May 2016 (EDT)
You're welcome! My merge requests for the CDN were approved (this took a bit longer than expected because I had made a small mistake - I forgot to include a file required by the Genetic.js library). The libraries should be up in a day or two, at which point I'll update the URLs in the script and submit the new version to GreasyFork. -- ElGaton (talk to me) 02:59, 20 May 2016 (EDT)
Done: Everything should be fine now. -- ElGaton (talk to me) 09:16, 20 May 2016 (EDT)
Thank you for all the help with this, and also for updating the greasyfork info/screenshots! --Hooray (talk) 07:16, 21 May 2016 (EDT)

Genetic Expression solver

Screenshot showing the instant cquotes script with integrated regex solving support using genetic algorithms.

The genetic-js framework has been integrated, it is intended to help solve xpath/regex expressions procedurally using genetic algorithms/programming.

The idea is to provide a set of desired outputs (needles), available input data (haystack), and use existing (possibly outdated) regex/xpath expressions to seed a pool with potential solutions for retrieving the desired output. For now, this is just proof-of-concept, i.e. just an experiment.

For example, let's consider the typical format of a From header for any sourceforge posting:

var regexTests = [
  {haystack: "From: John Doe <John@do...> - 2020-07-02 17:36:03", needle: "John Doe"}, 
  {haystack: "From: Marc Twain <Marc@ta...> - 2010-01-03 07:36:03", needle: "Marc Twain"},
  {haystack: "From: George W. Bush <GWB@wh...> - 2055-11-11 17:33:13", needle: "George W. Bush"}
];

The basic idea is to run a fitness function and score regex expressions based on not throwing an exception (which makes them valid), and by checking if the desired tokens are part of the output string, and evolve (mutate/cross-over) the "fittest" expressions - i.e. those that satisfy at least /some/ of the heuristics.

Some of the metrics that can be used by the fitness function to determine if an expression is "fit", are:

  • valid expression (i.e. not throwing an exception)
  • relative offset/excess bytes in the matched string (matching percentage of string found)
  • number of examples it can partially/successfully extract
  • length of the expression
  • runtime of the expression (favoring less complex expressions)
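A minimal sketch of a fitness function along these lines, scoring a candidate regex (given as a source string) against a vector shaped like regexTests above. It covers validity, exact/partial extraction with a penalty for excess characters, and expression length; runtime is left out, and the weights are arbitrary assumptions rather than tuned values. isSolution implements the stricter requirement that a regex has to work for every example.

```javascript
// Return the first capture group if there is one, else the whole match, else null.
function extracted(re, haystack) {
  const m = re.exec(haystack);
  if (!m) return null;
  return m.length > 1 && m[1] !== undefined ? m[1] : m[0];
}

// Higher is fitter; expressions that throw score 0.
function regexFitness(pattern, tests) {
  let re;
  try {
    re = new RegExp(pattern);
  } catch (e) {
    return 0; // throws: not a valid expression
  }
  let score = 1; // valid expression
  for (const { haystack, needle } of tests) {
    const got = extracted(re, haystack);
    if (got === null) continue;
    if (got === needle) {
      score += 10; // exact extraction
    } else if (got.includes(needle)) {
      score += (10 * needle.length) / got.length; // partial: penalize excess bytes
    }
  }
  // Mild preference for shorter expressions, capped so a valid regex stays above 0.
  return score - Math.min(pattern.length / 1000, 0.5);
}

// A candidate only counts as a solution if it extracts every needle exactly.
function isSolution(pattern, tests) {
  let re;
  try {
    re = new RegExp(pattern);
  } catch (e) {
    return false;
  }
  return tests.every((t) => extracted(re, t.haystack) === t.needle);
}
```

With the three From headers above, 'From: (.+?) <' is a solution, while 'From: (.+)' only earns partial credit because it also captures the e-mail address and date.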

The search space, and runtime, can be significantly reduced by looking at similarities between all examples and coming up with a subset string that contains all identical components (e.g. the From: part in the author regex) and use that for seeding the initial generations.
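Finding the shared part can be as simple as computing the longest common prefix of all haystacks and escaping it for use in a regex; the resulting anchor (here ^From: ) could then serve as a fixed start for seeded candidates. This sketches that single step only; a common suffix or other shared fragments would need similar treatment.

```javascript
// Longest common prefix of a list of strings.
function commonPrefix(strings) {
  if (strings.length === 0) return '';
  let prefix = strings[0];
  for (const s of strings.slice(1)) {
    let i = 0;
    while (i < prefix.length && i < s.length && prefix[i] === s[i]) i++;
    prefix = prefix.slice(0, i);
  }
  return prefix;
}

// Escape regex metacharacters so literal text can be embedded in a pattern.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Anchor shared by all examples, usable as the fixed start of seeded candidates.
function seedAnchor(haystacks) {
  return '^' + escapeRegExp(commonPrefix(haystacks));
}
```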

Ultimately, this would allow the script to self-update its regex/xpath expressions if/when the underlying website (themes) change, but it would also allow to add support for new websites, without ever manually adding the required xpath/regex expressions, i.e. all that is needed is a sufficiently large number of example datasets to obtain the author, date and title information, and a URL for the script to download the HTML markup of the posting in question:

Note: The date field won't work as is, because it's actually post-processed using a transformation function, so we'd need to undo the transformation or use the actual date string instead.
 // vector with tests to be executed for sanity checks (unit testing)
    tests: [
      {
        url: 'https://sourceforge.net/p/flightgear/mailman/message/35059454/',
        author: 'Erik Hofman',
        date: 'May 3rd, 2016', // NOTE: using the transformed date here 
        title: 'Re: [Flightgear-devel] Auto altimeter setting at startup (?)'
      },
      {
        url: 'https://sourceforge.net/p/flightgear/mailman/message/35059961/',
        author: 'Ludovic Brenta',
        date: 'May 3rd, 2016',
        title: 'Re: [Flightgear-devel] dual-control-tools and the limit on packet size'
      },
      {
        url: 'https://sourceforge.net/p/flightgear/mailman/message/20014126/',
        author: 'Tim Moore',
        date: 'Aug 4th, 2008',
        title: 'Re: [Flightgear-devel] Cockpit displays (rendering, modelling)'
      },
      {
        url: 'https://sourceforge.net/p/flightgear/mailman/message/23518343/',
        author: 'Tim Moore',
        date: 'Sep 10th, 2009',
        title: '[Flightgear-devel] Atmosphere patch from John Denker'
      } // add other tests below

    ], // end of vector with self-tests

Note how this no longer contains any hard-coded xpath/regex expressions - instead, the script can refer to the website specific defaults, and try those first, and if they fail, use those to seed new generations and evolve them procedurally until all tests succeed.

For a regex to be valid, it must work for all tests/examples.
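A minimal sketch of a self-test runner over a tests vector like the one above. extractFields is a hypothetical, synchronous stand-in for the script's download-and-extract step (in the browser this would be asynchronous, e.g. returning a Promise); the runner compares author, date and title and collects mismatches, so a version bump could be made conditional on an empty result.

```javascript
const CHECKED_FIELDS = ['author', 'date', 'title'];

// Run every test case through `extractFields(url)` and collect mismatches.
function runSelfTests(tests, extractFields) {
  const failures = [];
  for (const test of tests) {
    const got = extractFields(test.url) || {};
    for (const field of CHECKED_FIELDS) {
      if (got[field] !== test[field]) {
        failures.push({ url: test.url, field, expected: test[field], got: got[field] });
      }
    }
  }
  return failures;
}
```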

Once the regex solver is working correctly, the code can be further generalized to also evolve xpath expressions (NOTE: the DOMParser API is /not/ available to webworkers ...) and chain those two components together.


Hierarchical Clustering

Been tinkering with a JavaScript module to automatically cluster postings based on certain keywords in the topic/title or posting (e.g. Nasal, Canvas, 2D API, rendering): https://harthur.github.io/clusterfck/

Note that in conjunction with processing article sections and/or whole articles, this could help automatically come up with matching postings for articles and vice versa.

--Hooray (talk) 18:11, 19 May 2016 (EDT)
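Clustering needs a numeric representation of each posting first. A minimal sketch, assuming a simple keyword-presence vector over the example keywords above and a Jaccard distance between two vectors; how these would be handed to a library such as clusterfck (e.g. whether it accepts a custom distance function) should be checked against its documentation.

```javascript
// Example keywords from the discussion above.
const TOPIC_KEYWORDS = ['Nasal', 'Canvas', '2D API', 'rendering'];

// 1 if the keyword occurs in the text (case-insensitive), else 0.
function keywordVector(text, keywords = TOPIC_KEYWORDS) {
  const lower = text.toLowerCase();
  return keywords.map((k) => (lower.includes(k.toLowerCase()) ? 1 : 0));
}

// Jaccard distance between two 0/1 vectors: 0 = same keywords, 1 = nothing shared.
function jaccardDistance(a, b) {
  let both = 0;
  let either = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] && b[i]) both++;
    if (a[i] || b[i]) either++;
  }
  return either === 0 ? 0 : 1 - both / either;
}

console.log(keywordVector('Re: Canvas rendering performance'));
```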

Upcoming dependencies

The script is now using markitup to create the template editor. --Hooray (talk) 10:18, 21 May 2016 (EDT)

I've submitted a pull request for the library to be distributed by jsDelivr. -- ElGaton (talk to me) 16:08, 21 May 2016 (EDT)
Thank you, but please don't make this a priority: I only just realized that it is introducing a bunch of jQuery issues, and that the project hasn't been updated in years, so it probably makes more sense to use something else - i.e. the "integration" itself only takes 10 lines of code, but fixing all the jQuery compatibility issues, would be non-trivial. Thanks anyway. --Hooray (talk) 15:19, 22 May 2016 (EDT)
Done -- ElGaton (talk to me) 08:10, 27 May 2016 (EDT)

Code beautifier ?

I still had, and have, pending changes that would need to go through the same beautifier you're using, or they'd be lost - so for now, I have just overwritten your beautified code; I hope that you didn't do this manually (there really are tons of automated and configurable beautifiers out there). --Hooray (talk) 15:59, 21 May 2016 (EDT)

Of course not - let me know when you've incorporated all changes so that I can beautify the script again. -- ElGaton (talk to me) 16:31, 21 May 2016 (EDT)


I have been using some web-based JavaScript syntax checking tools, too (e.g. www.jshint.com), so if yours is a website too, let's just share/document these tools here.--Hooray (talk) 16:34, 21 May 2016 (EDT)
I use JSLint plus a bit of manual editing (if needed). -- ElGaton (talk to me) 17:34, 21 May 2016 (EDT)

automated test

we now have a simple API to add addons to FlightGear without the need to mess around with FGData/Nasal or FGHome/Nasal directories. FlightGear now accepts the command line switch --addon=/path/to/some/addon (note: command line switch is just that: a command line switch - not an option to be entered into the launcher). [1]