Using OCR to make digital cutups
First, install Tesseract:
> sudo apt install tesseract-ocr
Then download a high-quality scan of a newspaper (or other document). I like the papers from the Library of Congress.
Then convert the document into plain text and hOCR (with the output base name 1, the files are written as 1.txt and 1.hocr):
> tesseract <your-downloaded-image.jpg> 1 -c hocr_font_info=1 -c hocr_char_boxes=1 txt hocr
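Once the hOCR file exists, the cutup itself can be done in a few lines of Python. What follows is only a rough sketch, not a fixed recipe: it assumes the hOCR marks each word with a span of class ocrx_word whose title starts with a bbox, and the names HocrWords and cutup, the chunk size, and the seed are all made up for illustration. It pulls the words out (joining the per-character spans that hocr_char_boxes adds), groups them into small chunks, and shuffles the chunks.

```python
import random
import re
from html.parser import HTMLParser


class HocrWords(HTMLParser):
    """Collect (text, bbox) for every span whose class is 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []    # finished (text, (x0, y0, x1, y1)) tuples
        self._depth = 0    # span nesting depth inside the current word
        self._chars = []   # text pieces seen inside the current word
        self._bbox = None

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attrs = dict(attrs)
        if self._depth:
            # Nested span inside a word, e.g. a per-character box.
            self._depth += 1
        elif "ocrx_word" in (attrs.get("class") or "").split():
            m = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", attrs.get("title") or "")
            self._bbox = tuple(map(int, m.groups())) if m else None
            self._chars = []
            self._depth = 1

    def handle_data(self, data):
        if self._depth:
            self._chars.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(piece.strip() for piece in self._chars)
                if text:
                    self.words.append((text, self._bbox))


def cutup(hocr_text, chunk=3, seed=None):
    """Return the OCR'd words regrouped into shuffled chunks of `chunk` words."""
    parser = HocrWords()
    parser.feed(hocr_text)
    parser.close()
    texts = [text for text, _bbox in parser.words]
    chunks = [texts[i:i + chunk] for i in range(0, len(texts), chunk)]
    random.Random(seed).shuffle(chunks)
    return " ".join(" ".join(c) for c in chunks)
```

To try it on the file produced above, something like print(cutup(open("1.hocr", encoding="utf-8").read())) should do. The bounding boxes are kept alongside each word in case you would rather cut up the image itself than the text.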