FlightGear wiki talk:Instant-Refs

== changing and automating regex/xpath handling ==
{{See also|#Genetic_Expression_solver}}


Handling of the different websites is currently hard-coded via regexes and XPath expressions, so things are fairly fragile at the moment: as soon as a website changes its style/template, the script breaks immediately. However, we could change the way the script works by using a very simple neural network (NN) to come up with matching regex/XPath expressions automatically. This would amount to "supervised" training: we would replace/adapt the existing CONFIG hash with a vector of URLs and the contents to be extracted from each (date, time, title, posting). That kind of data would be entirely sufficient for a neural network to train itself and "learn" how to come up with regex/XPath expressions for extracting the relevant contents, not just for the currently hard-coded websites like SourceForge, but even for completely different websites (think Gmane), because the "learning" routine would adapt itself by working out how to get at the contents, and would no longer contain any hard-coded regex/XPath expressions at all. The CONFIG hash would end up being a vector of "training data" for each website to be supported. And whenever an expression fails, the script could re-train itself accordingly.--[[User:Hooray|Hooray]] ([[User talk:Hooray|talk]]) 13:33, 15 November 2015 (EST)
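To make the idea concrete, here is a minimal sketch of what the proposed "training data" replacement for the CONFIG hash could look like. It is written in Python purely for illustration (the actual script's language isn't shown here), and all names (<code>TRAINING_DATA</code>, <code>infer_xpath</code>, the sample markup and field names) are hypothetical. Instead of a full NN, it uses a trivially simple deterministic baseline that "learns" an XPath per field by locating each labeled value in the page; a learned model could later replace <code>infer_xpath()</code> without changing the data format, and re-running <code>train()</code> whenever an expression fails would be the proposed self-retraining step.

<syntaxhighlight lang="python">
from lxml import html

# Replaces the hard-coded CONFIG hash: per-site training samples pairing a
# page (inline HTML here so the example is self-contained; a real sample
# would be fetched from a URL) with the fields expected to be extracted.
TRAINING_DATA = [
    {
        "site": "example-forum",  # hypothetical site key
        "html": """<html><body>
            <h1 class="subject">Segfault on startup</h1>
            <span class="when">2015-11-15 13:33</span>
            <div class="post-body">It crashes when ...</div>
        </body></html>""",
        "expected": {
            "title": "Segfault on startup",
            "date": "2015-11-15 13:33",
            "posting": "It crashes when ...",
        },
    },
]

def infer_xpath(tree, wanted):
    """Baseline 'learning' step: find the element whose text matches the
    labeled value and return its absolute XPath, so extraction no longer
    relies on hand-written expressions."""
    for el in tree.iter():
        if el.text and el.text.strip() == wanted:
            return tree.getroottree().getpath(el)
    return None  # signals that (re-)training failed for this field

def train(samples):
    """Derive one XPath per field and per site from the training samples."""
    learned = {}
    for sample in samples:
        tree = html.fromstring(sample["html"])
        learned[sample["site"]] = {
            field: infer_xpath(tree, value)
            for field, value in sample["expected"].items()
        }
    return learned

if __name__ == "__main__":
    # e.g. example-forum {'title': '/html/body/h1', 'date': '/html/body/span', ...}
    for site, expressions in train(TRAINING_DATA).items():
        print(site, expressions)
</syntaxhighlight>

The point of the sketch is that only the training samples are site-specific; supporting a new website (or recovering from a template change) would mean adding or refreshing one labeled sample rather than editing any extraction code.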
