* length of the expression
* runtime of the expression
The search space, and with it the runtime, can be significantly reduced by looking at the similarities between all examples and deriving a common substring that contains the components identical across them (e.g. the <code>From:</code> part in the author regex).
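The idea of extracting the shared part of all examples can be sketched roughly as follows. This is only a minimal illustration, not the script's actual code: the function names <code>longest_common_substring</code> and <code>seed_pattern</code> as well as the sample strings are made up for this example, and a brute-force search is used for clarity rather than speed.

```python
import re


def longest_common_substring(examples):
    """Return the longest substring found in every example (first hit on ties)."""
    if not examples:
        return ""
    shortest = min(examples, key=len)
    # Try candidate lengths from longest to shortest so the first hit is maximal.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in example for example in examples):
                return candidate
    return ""


def seed_pattern(examples):
    """Build a fixed, escaped regex prefix from the part shared by all examples."""
    return re.escape(longest_common_substring(examples))


# Hypothetical author lines, as they might be scraped from a mailing list archive.
samples = ["From: Alice Smith", "From: Bob", "From: Carol Jones"]
shared = longest_common_substring(samples)
pattern = re.compile(seed_pattern(samples) + r"(.+)")
```

The fixed part found this way no longer has to be evolved; only the variable remainder (here the capture group for the name) is left to the search, which is where the reduction in search space would come from.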
Ultimately, this would allow the script to update its regex/xpath expressions itself if and when the underlying website (or its theme) changes. It would also make it possible to add support for new websites without manually adding the required xpath/regex expressions: all that is needed is a sufficiently large number of example datasets from which to obtain the author, date and title information.