Image to Text Conversion Services

 

bringing context and meaning to your images through OCR


Beyond Digitisation


The utility of digitised images can be substantially improved when they are combined with contextual information. What were once static images can now be searched, indexed and enriched with whatever relevant information you wish.

Relevant information can be produced from content within the images, by ascribing descriptive metadata, or by linking images with your analogue or digital finding aids or other databased information. If these images contain textual information, they can also be converted into computer-readable text.

This creation of searchable or editable text greatly increases the chances of discovery and assists with information sharing about your images and collections.

W. & F. Pascoe offers content conversion services at various levels. The most basic is, for example, a searchable PDF in which Optical Character Recognition (OCR) software identifies the text within an image without any correction. The most complex are transcribed and “marked up” projects, for example completely republished and re-purposed documents produced using a combination of human manipulation, OCR software, and marking up into a future-proofed format.

A brief explanation of the basics involved is provided below. However, as this is a complicated area and solutions are incredibly flexible and varied, please get in touch with us to discuss what is possible for you.


What's Involved

making sense of it all...

How it all works...

The process always begins with your assets, for example books or manuscripts (or any other objects with text content), undergoing the digitisation process, whereby they are scanned or photographed into a suitable image format. From there, software or human intervention is engaged to convert the images and produce output files in the formats you use: Microsoft Word, Microsoft Excel, Adobe PDF, plain text, CSV or a variety of eBook formats.

However, since predicting future use is impossible, we offer a fully customised Extensible Markup Language (XML) conversion service if flexibility and the ability to re-purpose content are required. If you are interested in presenting your digitised assets online, we can create complete websites or web pages (that fit seamlessly within your existing website), or prepare the data in a suitable format for importation into your existing online framework so that you can make your content available to others.
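As a rough illustration of why XML suits re-purposing, the sketch below wraps page text and a little metadata in a simple XML document using Python's standard library. The function name and the element names (document, metadata, body, page) are assumptions made for this example only and do not describe any particular W. & F. Pascoe schema.

```python
import xml.etree.ElementTree as ET


def pages_to_xml(title, author, pages):
    """Wrap converted page text in a simple XML document.

    The element and attribute names are hypothetical, chosen for illustration.
    """
    root = ET.Element("document")
    meta = ET.SubElement(root, "metadata")
    ET.SubElement(meta, "title").text = title
    ET.SubElement(meta, "author").text = author
    body = ET.SubElement(root, "body")
    for number, text in enumerate(pages, start=1):
        # One <page> element per scanned page, numbered from 1.
        page = ET.SubElement(body, "page", n=str(number))
        page.text = text
    # Special characters such as '&' are escaped automatically.
    return ET.tostring(root, encoding="unicode")
```

Because the text and its structure are stored separately from any presentation, the same file could later be transformed into web pages, eBooks or database records.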

For a more thorough explanation of what’s involved, as well as what’s possible, please select from the individual services below. As always, if you have any queries or wish to discuss your specific requirements, please get in touch with us to discuss what works best for you.

Optical Character Recognition (OCR)

Textual Conversion via OCR

Optical Character Recognition (OCR) software converts scanned images of printed or typewritten pages to searchable and editable text. We use a variety of software tools and have had great results with uncorrected OCR for clients.

Our standard OCR services are fully automated: powerful software analyses the digitised images and identifies the text within, avoiding the need for operator intervention. For more challenging material, W. & F. Pascoe also offers customised OCR services. These include manual zoning of newspaper and journal text (where articles and headlines are not uniformly placed on the page), isolating marginalia, and extracting abstract information from material.

We also offer the option of part-OCR, where only specific parts of the material are OCR’d, eliminating the extra cost of converting material that does not aid discovery. This is particularly useful for journals and magazines, where advertorial content set in artistic fonts can confuse the output.

That said, our OCR systems allow us to “Pattern Train”, whereby we “teach” the system how to recognise text in varying fonts in order to improve the accuracy of our conversion. Not all documents are suited to OCR, however. For example, the accuracy of OCR on handwritten text is very poor, and the output is often not usable.

For documents that are not suited to OCR, we offer transcription services in which the same information is typed independently, either by two typists with a third person acting as arbiter, or by a typist and a verifier. The results are then compared, and any discrepancies are highlighted and rectified.

As always, if you have any queries or wish to discuss your specific requirements, please get in touch with us to discuss what is possible for you.

Transcription

Textual Conversion via Transcription

W. & F. Pascoe’s transcription service essentially involves manual conversion of textual information from a digitised image into another form, usually a document, spreadsheet or database. Suitable digitised images that could be transcribed might include handwritten documents, index cards, lists, scripts, typed fonts that do not OCR well, or simple data like names and addresses. Two options are available: double data transcription or single-entry transcription.

Double data transcription is a data entry quality control process that involves two passes. In the first pass through a set of records, the data entry operator keys the data into each record. On the second pass through the batch, an operator at a separate machine enters the same data again.

The two sets of data are then either fed through a computer verification program or checked by a person, who compares the second operator’s keystrokes with the contents of the record. If no discrepancies are found, the verifier accepts the data. If discrepancies are found, a choice is made as to which version is correct.

This choice can be made by means of strict vocabulary dictionaries or customer-prescribed “rules”, or the correct value can be entered manually by a data operator.

The accuracy for double data transcription should exceed 99.9%.
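The comparison step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of the verification program actually used: the function name, the record layout (one dictionary per record) and the idea of passing a controlled vocabulary per field are all hypothetical.

```python
def compare_passes(first_pass, second_pass, allowed_values=None):
    """Compare two independently keyed versions of the same records.

    Each pass is a list of dicts, one per record. Returns (accepted, flagged):
    accepted holds the agreed or auto-resolved records, and flagged lists
    (record_index, field, first_value, second_value) for a person to resolve.
    """
    if len(first_pass) != len(second_pass):
        raise ValueError("both passes must cover the same records")
    allowed_values = allowed_values or {}
    accepted, flagged = [], []
    for index, (a, b) in enumerate(zip(first_pass, second_pass)):
        record = {}
        for field in a:
            va, vb = a[field], b.get(field)
            if va == vb:
                # Both operators agree, so the value is accepted.
                record[field] = va
                continue
            vocabulary = allowed_values.get(field, set())
            candidates = [v for v in (va, vb) if v in vocabulary]
            if len(candidates) == 1:
                # Exactly one keying matches the controlled vocabulary.
                record[field] = candidates[0]
            else:
                # No rule settles it: leave blank and flag for manual entry.
                record[field] = None
                flagged.append((index, field, va, vb))
        accepted.append(record)
    return accepted, flagged
```

In this sketch a disagreement is settled automatically only when exactly one of the two keyed values appears in the vocabulary for that field; everything else is left for a data operator.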

Where single-entry transcription is adopted, it is usually in the interest of simplicity as well as cost management. It is the more economical alternative simply because it does not require data to be entered twice, and then compared. However, expected accuracy will vary depending on the transcription method chosen and the quality of the originals and digitised images. We recommend a pilot be conducted on sample data to enable fine tuning of cost-estimates as well as to provide evidence of realistic quality and accuracy expectations.
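One way a pilot might quantify accuracy is to compare a sample transcription against a carefully checked reference version of the same text. The sketch below is an assumption about how such a measurement could be made, using Python's standard difflib module. The function name is hypothetical, and the figure it returns is an approximation rather than a formal edit-distance metric.

```python
from difflib import SequenceMatcher


def character_accuracy(reference, candidate):
    """Share of reference characters that the candidate reproduces in order.

    Based on difflib's matching blocks, so it approximates accuracy rather
    than computing a formal edit distance.
    """
    if not reference:
        return 1.0 if not candidate else 0.0
    matcher = SequenceMatcher(None, reference, candidate, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)
```

Run over a representative sample, a measure like this gives a realistic figure to set against cost when choosing between single-entry and double data transcription.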

The real value of W. & F. Pascoe in this transcription process lies in our significant troubleshooting experience in this area, as well as in providing a Quality Assurance procedure for your digitised assets. As always, if you have any queries or wish to discuss your specific requirements, please get in touch with us to discuss what is possible for you.

eBook Creation

ebook creation services

Give new life to old or damaged books by turning them digital. View them online, on your tablet, on your eReader.

W. & F. Pascoe offers an eBook creation service with a range of output levels, from the straightforward copy to the bespoke re-creation.

Pricing starts from $1 per page but can vary based on the number of pages and level of manual formatting and correction required.

A setup fee applies to each batch, so it is more economical to submit more than one title per batch.

We make all reasonable efforts to minimise inaccuracies; however, the eBooks created may contain some text and/or formatting errors.

Naturally, copyright permissions will need to be managed where they apply, and we can assist you with this.

This is a fantastic way to breathe new life into “out-of-print” books responsibly, and a real opportunity for local historians to re-engage their communities.

Metadata Creation Services

metadata injection services

We can capture the full text content (and more) of digitised items, including the contents of manuscripts, journals and books. We can also capture specific metadata such as author, dates, titles, descriptions and subjective information according to your rule set, as well as other references and codes.

W. & F. Pascoe will work with you to establish a template for information indexing ensuring that the information that’s important to you is the information you receive. This extends to non-text originals such as photos, art or ephemera where you may wish to associate information recorded on the reverse, as a caption, or in an associated database or index.

We can customise filenaming by incorporating metadata, for example: ‘/author/book_name/pge001’. All metadata can usually be transcribed directly from the digitised images which means that there is no extra handling of your precious heritage items.
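A naming pattern like the one above could be generated from captured metadata along the following lines. This is a sketch only: the function names, the “pge” prefix default and the three-digit padding are assumptions drawn from the single example given, and the actual rules would be agreed per project.

```python
import re


def slug(value):
    """Lower-case a value and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^a-z0-9]+", "_", value.lower()).strip("_")


def page_path(author, book_name, page_number, prefix="pge", width=3):
    """Build a path such as '/author/book_name/pge001' from metadata."""
    # The page number is zero-padded so that files sort in page order.
    return f"/{slug(author)}/{slug(book_name)}/{prefix}{page_number:0{width}d}"
```

For example, page_path("Jane Citizen", "A Local History", 12) would produce a path ending in pge012 (the author and title here are made-up sample values).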