Howto:Processing d-tpp using Python
Revision as of 14:39, 28 November 2017
This article is a stub. You can help the wiki by expanding it.
Motivation
Modules
XML Processing
http://155.178.201.160/d-tpp/1712/xml_data/d-TPP_Metafile.xml
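A minimal sketch of reading the metafile with the standard library, assuming each chart is described by a `record` element whose children hold its fields. The element names (`record`, `chart_code`, `chart_name`, `pdf_name`) and the inline sample are assumptions for illustration and should be checked against the actual file.

```python
import xml.etree.ElementTree as ET

# Made-up sample standing in for the downloaded metafile; the element
# names and values are assumptions, not copied from the real file.
SAMPLE = """<digital_tpp>
  <airport_name ID="EXAMPLE FIELD">
    <record>
      <chart_code>IAP</chart_code>
      <chart_name>EXAMPLE APPROACH ONE</chart_name>
      <pdf_name>EXAMPLE1.PDF</pdf_name>
    </record>
    <record>
      <chart_code>APD</chart_code>
      <chart_name>EXAMPLE AIRPORT DIAGRAM</chart_name>
      <pdf_name>EXAMPLE2.PDF</pdf_name>
    </record>
  </airport_name>
</digital_tpp>"""


def parse_records(xml_text):
    """Return one dict per <record>, mapping child tag names to their text."""
    root = ET.fromstring(xml_text)
    records = []
    # iter() walks the whole tree, so the nesting depth of <record> does not matter.
    for record in root.iter("record"):
        records.append({child.tag: (child.text or "").strip() for child in record})
    return records


records = parse_records(SAMPLE)
for rec in records:
    print(rec["chart_code"], rec["pdf_name"])
```

For the real file, the text passed to `parse_records` would be the body of the response fetched from the URL above.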
Scraping
Downloading
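A minimal sketch of downloading a single chart with requests. It assumes that the PDF files sit directly under the same cycle directory as the metafile (http://155.178.201.160/d-tpp/1712/); that layout, and the helper names `chart_url` and `download_chart`, are assumptions rather than anything documented here.

```python
import os
from urllib.parse import urljoin

# Cycle directory taken from the metafile URL; that the PDFs live
# directly under it is an assumption.
BASE_URL = "http://155.178.201.160/d-tpp/1712/"


def chart_url(pdf_name, base_url=BASE_URL):
    """Build the (assumed) download URL for one chart PDF."""
    return urljoin(base_url, pdf_name)


def download_chart(pdf_name, dest_dir=".", base_url=BASE_URL, timeout=30):
    """Stream one PDF to dest_dir and return the local path."""
    # Imported here so chart_url() can be used without requests installed.
    import requests

    response = requests.get(chart_url(pdf_name, base_url), stream=True, timeout=timeout)
    response.raise_for_status()  # fail loudly on 404 and similar errors
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, pdf_name)
    with open(path, "wb") as fh:
        # Write in chunks so large charts are not held in memory at once.
        for chunk in response.iter_content(chunk_size=65536):
            fh.write(chunk)
    return path
```

A call such as `download_chart("EXAMPLE1.PDF", dest_dir="charts")` would then save the file as `charts/EXAMPLE1.PDF`; the file name here is a placeholder.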
Converting to images
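A minimal sketch of rendering a downloaded PDF to PNG files with pdf2image, which relies on the poppler utilities being installed. The output naming scheme, the 150 dpi default, and the helper names are assumptions chosen for illustration.

```python
import os


def page_image_name(pdf_path, page_number):
    """Name the PNG for one page, e.g. EXAMPLE1.PDF page 1 -> EXAMPLE1_p1.png."""
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return f"{stem}_p{page_number}.png"


def pdf_to_pngs(pdf_path, out_dir=".", dpi=150):
    """Render every page of pdf_path to a PNG in out_dir and return the paths."""
    # Imported here so the naming helper works without pdf2image installed.
    from pdf2image import convert_from_path

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    # convert_from_path returns one PIL image per page.
    for number, page in enumerate(convert_from_path(pdf_path, dpi=dpi), start=1):
        out_path = os.path.join(out_dir, page_image_name(pdf_path, number))
        page.save(out_path, "PNG")
        paths.append(out_path)
    return paths
```

A higher `dpi` gives sharper images at the cost of larger files, which may matter for the later classification and OCR steps.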
Uploading to the GPU
Classification
OCR
Prerequisites
Install the following modules with pip install --user:
- requests
- pdf2image
Code
See also
- https://github.com/euske/pdfminer
- https://dzone.com/articles/pdf-reading
- https://automatetheboringstuff.com/chapter13/
- https://www.binpress.com/tutorial/manipulating-pdfs-with-python/167
- https://github.com/pmaupin/pdfrw