== Idea ==
If processing actual PDFs to "retrieve" such navigational data procedurally is ever supposed to "fly", I think it would have to be done using OpenCV running in a background thread (actually a bunch of threads in a separate process) together with machine learning: basically, feeding it a bunch of manually annotated PDFs, segmenting each PDF into sub-areas (horizontal/vertical profile, frequencies, identifier etc.) and running neural networks on them.
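A minimal sketch of the segmentation step, assuming the page has already been rasterized and thresholded into a grid of 0/1 pixels. It uses a plain projection-profile split (cutting wherever enough blank rows occur) as a stand-in for a trained model. The names <code>Raster</code> and <code>split_into_bands</code> are hypothetical; in practice the grid would come from OpenCV rather than a nested list.

```python
from typing import List, Tuple

# Hypothetical representation: 1 = ink, 0 = background.
Raster = List[List[int]]


def split_into_bands(page: Raster, min_gap: int = 1) -> List[Tuple[int, int]]:
    """Split a page into horizontal bands of inked rows.

    Returns (start_row, end_row_exclusive) pairs. A band ends once at
    least `min_gap` consecutive blank rows follow its last inked row.
    """
    bands: List[Tuple[int, int]] = []
    start = None      # first inked row of the band being built
    last_ink = None   # most recent inked row seen in that band
    blank_run = 0     # consecutive blank rows since last_ink

    for y, row in enumerate(page):
        if any(row):
            if start is None:
                start = y
            last_ink = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                bands.append((start, last_ink + 1))
                start = None

    # Close a band that runs to the bottom edge of the page.
    if start is not None:
        bands.append((start, last_ink + 1))
    return bands


# Toy 7-row page with two inked areas separated by two blank rows.
sample_page: Raster = [
    [0, 0],
    [1, 0],
    [1, 1],
    [0, 0],
    [0, 0],
    [0, 1],
    [0, 0],
]
bands = split_into_bands(sample_page)
```

Each band could then be cut the same way along the other axis to get rectangular sub-areas, which is where a classifier for "frequency box", "identifier" and so on would take over.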
Such a thing would need to be very modular to be feasible, i.e. parallel processing of the rasterized image on the GPU to split the chart into known components and retrieve the identifiers, frequencies, bearings etc. that way.
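To illustrate the modular, parallel part, here is a rough sketch that hands each already-cropped chart region to a worker. It is only an assumption about how this could be organised: it uses a CPU thread pool from the Python standard library rather than the GPU, and <code>ink_density</code> is a placeholder for whatever real recogniser (OCR, symbol classifier) would run per region. All names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical representation: 1 = ink, 0 = background.
Raster = List[List[int]]


def ink_density(region: Raster) -> float:
    """Placeholder recogniser: fraction of inked pixels in a region."""
    total = sum(len(row) for row in region)
    return sum(map(sum, region)) / total if total else 0.0


def process_regions(regions: Dict[str, Raster], workers: int = 4) -> Dict[str, float]:
    """Run the recogniser on every named region concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit one job per region, then collect results by region name.
        futures = {name: pool.submit(ink_density, crop)
                   for name, crop in regions.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Toy input: two named crops of a chart.
results = process_regions({
    "identifier": [[1, 0], [0, 0]],
    "frequency": [[1, 1]],
})
```

Keeping each recogniser a plain function of one region is what makes it easy to swap in a separate process or a GPU-backed implementation later without touching the dispatch code.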
It is an interesting problem, and it would address a bunch of legal issues too, just like downloading such data from the web works for a reason. But it would definitely be a rather complex piece of software, I believe, and we would want to get people involved who have experience with machine learning and computer vision (OpenCV). It is kind of a superset of doing OCR on approach charts, i.e. not just looking for a character set, but for actual document structure and "iconography" for airports, navaids, route markers and so on.
== Motivation ==
[[File:Chart-scraping.png|thumb|Screenshot showing scrapy scraping d-TPPs]]