Howto:Processing d-tpp using Python
This article is a stub. You can help the wiki by expanding it.
== Motivation ==
Come up with the Python machinery to automatically download aviation charts and classify them for further processing/parsing (data extraction): http://155.178.201.160/d-tpp/

We will be downloading two different AIRAC cycles, i.e. at the time of writing 1712 & 1713:

* http://155.178.201.160/d-tpp/1712/
* http://155.178.201.160/d-tpp/1713/

Each directory contains a set of charts that will be post-processed by converting them to raster images.
== Data sources ==

=== Chart Classification ===

* STARs - Standard Terminal Arrivals
* IAPs - Instrument Approach Procedures
* DPs - Departure Procedures
== Modules ==

=== XML Processing ===

Each cycle ships a metafile describing all charts in that cycle:
http://155.178.201.160/d-tpp/1712/xml_data/d-TPP_Metafile.xml
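The metafile can be processed with Python's built-in xml.etree.ElementTree module. A minimal sketch, assuming a local copy of the metafile and the element layout used by recent d-TPP cycles (state_code/city_name/airport_name elements wrapping record entries with chart_code, chart_name and pdf_name children) - verify these names against the actual file:

<syntaxhighlight lang="python">
import xml.etree.ElementTree as ET

tree = ET.parse('d-TPP_Metafile.xml')  # local copy of the metafile above
root = tree.getroot()

# walk all airports and list their charts; the element/attribute names
# used here are assumptions based on recent d-TPP cycles
for airport in root.iter('airport_name'):
    apt_ident = airport.get('apt_ident')
    for record in airport.iter('record'):
        chart_code = record.findtext('chart_code')  # e.g. STAR, IAP, DP
        chart_name = record.findtext('chart_name')
        pdf_name = record.findtext('pdf_name')
        print(apt_ident, chart_code, chart_name, pdf_name)
</syntaxhighlight>

The chart_code values map directly onto the chart classification listed above, so this is also a natural place to bucket the PDFs by procedure type.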
=== Scraping ===
<syntaxhighlight lang="python">
import os

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.http import Request


def create_folder(directory):
    try:
        if not os.path.exists(directory):
            os.makedirs(directory)
    except OSError:
        print('Error: creating directory ' + directory)


class DTPPSpider(scrapy.Spider):
    name = "d_tpp"
    allowed_domains = ["155.178.201.160"]
    start_urls = ["http://155.178.201.160/d-tpp/1712/"]

    def parse(self, response):
        for href in response.css('a::attr(href)').extract():
            # only follow links to the chart PDFs, not e.g. the
            # parent-directory link of the index page
            if not href.lower().endswith('.pdf'):
                continue
            yield Request(
                url=response.urljoin(href),
                callback=self.save_pdf
            )

    def save_pdf(self, response):
        path = response.url.split('/')[-1]
        self.logger.info('Saving PDF %s', path)
        with open('./PDF/' + path, 'wb') as f:
            f.write(response.body)


process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

create_folder('./PDF/')
process.crawl(DTPPSpider)
process.start()  # the script will block here until the crawling is finished
</syntaxhighlight>
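Note that start_urls pins the spider to cycle 1712; point it at http://155.178.201.160/d-tpp/1713/ (or run it once per cycle) to mirror the second cycle mentioned above.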
=== Downloading ===
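Individual charts can also be fetched directly, without a full crawl. A minimal sketch using the requests module listed under Prerequisites; download_chart is a hypothetical helper and the file name passed to it is a placeholder, real PDF names come from the metafile above:

<syntaxhighlight lang="python">
import os

import requests

BASE_URL = 'http://155.178.201.160/d-tpp/1712/'


def download_chart(pdf_name, dest_dir='./PDF'):
    # fetch a single chart PDF by its file name, as listed in the metafile
    os.makedirs(dest_dir, exist_ok=True)
    response = requests.get(BASE_URL + pdf_name, timeout=30)
    response.raise_for_status()
    with open(os.path.join(dest_dir, pdf_name), 'wb') as f:
        f.write(response.content)


download_chart('00026AD.PDF')  # hypothetical file name
</syntaxhighlight>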
=== Converting to images ===
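The downloaded PDFs can be rasterized with the pdf2image module listed under Prerequisites (a wrapper around poppler's pdftoppm, which must be installed separately). A minimal sketch; chart.pdf is a placeholder name for one of the downloaded files:

<syntaxhighlight lang="python">
import os

from pdf2image import convert_from_path

os.makedirs('./PNG', exist_ok=True)

# render each page of the PDF to a PIL image and save it as PNG
pages = convert_from_path('./PDF/chart.pdf', dpi=150)
for number, page in enumerate(pages, start=1):
    page.save('./PNG/chart-{}.png'.format(number), 'PNG')
</syntaxhighlight>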
=== Uploading to the GPU ===

=== Classification ===

=== OCR ===
== Prerequisites ==

Install the required Python modules via <code>pip install --user</code>:

* requests
* pdf2image
== Code ==
== See also ==

* https://github.com/euske/pdfminer
* https://dzone.com/articles/pdf-reading
* https://automatetheboringstuff.com/chapter13/
* https://www.binpress.com/tutorial/manipulating-pdfs-with-python/167
* https://github.com/pmaupin/pdfrw