Line 239:
The basic idea is to run a fitness function that scores each candidate regex: an expression earns credit for compiling without throwing an exception (which makes it valid), and further credit when the desired tokens appear in its output string.
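As a rough illustration of such a fitness function, here is a minimal Python sketch. The function name `fitness`, the scoring weights, and the sample data in `EXAMPLES` are all hypothetical and chosen only for this example; they are not taken from the actual script.

```python
import re

def fitness(pattern, examples):
    """Score a candidate regex against (text, expected_token) pairs.

    Hypothetical weights: 0 for an invalid expression, 1 for a valid one,
    plus 2 per exact extraction and 1 per extraction that merely contains
    the expected token.
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 0.0  # the expression throws, so it is not valid

    score = 1.0  # the expression compiled without an exception
    for text, expected in examples:
        match = compiled.search(text)
        if match is None:
            continue
        # Use the first capture group if the pattern defines one,
        # otherwise fall back to the whole match.
        extracted = match.group(1) if compiled.groups else match.group(0)
        if extracted is None:
            continue
        if extracted == expected:
            score += 2.0
        elif expected in extracted:
            score += 1.0
    return score

# Made-up example data: (input text, desired author token)
EXAMPLES = [
    ("From: alice - 2014-05-01", "alice"),
    ("From: bob - 2014-06-12", "bob"),
]

CANDIDATES = [r"From: (\w+", r"From: (.*)", r"From: (\w+)"]
best = max(CANDIDATES, key=lambda p: fitness(p, EXAMPLES))
```

With these assumed weights, the unbalanced expression scores lowest, the overly greedy one scores in the middle, and the expression that extracts exactly the desired token scores highest, so `best` ends up being the last candidate. A genetic search would use such scores to decide which candidates to keep and mutate.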
Ultimately, this would allow the script to update its regex/xpath expressions by itself if/when the underlying websites (or their themes) change. It would also make it possible to add support for new websites without ever manually adding the required xpath/regex expressions: all that is needed is a sufficiently large number of example datasets from which to obtain the author, date and title information.
* http://regex.inginf.units.it/how.html