September 17, 2018

Data Entry, Automation, and the Emergence of a Universal Machine Language

Peter Brodsky

HyperScience automates data entry. Why would we choose to work on one of the most unthinkably, unspeakably uninteresting back-office tasks? You might expect data entry to be a shrinking industry, but about $57 billion is spent on it every year, and its volume is growing year over year. To make sense of this trend, it's necessary to understand why data entry exists in the first place.

There are two kinds of documents that large organizations receive in significant volume:

  1. Documents that are filled out, either by hand or typed (such as mortgage applications or insurance claim forms).
  2. Documents that are generated on a computer and either printed on paper or emailed as fully digital PDFs (such as invoices or pay stubs).

Here’s the twist: while both kinds of documents can easily be read by people, neither kind of document is machine-readable. The information needed to comprehend the documents is not fully contained within the documents. Anyone who has ever tried reading handwriting has had the experience of hitting a word, being unable to read it, skipping ahead and then, with the benefit of further context, realizing what that hard-to-read word was. What makes that understanding possible isn’t the quality of the handwriting, but knowledge of the language, the context and the world in general.

The content need not be handwritten, though, to pose the same challenges. A speck of dust, a smudge, or just a low-quality image can close a “c” and make it look like an “o”. There is no way to know from the pixels alone which one it is. Anyone who understands the content, however, will know that the sentence says “The cat sat on the mat” rather than “The oat sat on the mat”.

Example of machine learning advantages in text recognition
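To make the “cat” versus “oat” example concrete, here is a toy sketch in Python. It assumes a tiny hypothetical vocabulary standing in for “knowledge of the language” and a hypothetical table of characters that a scanner might confuse. For each unknown word, it swaps one confusable character at a time and keeps the first reading that forms a known word. This illustrates the idea only and is not how HyperScience's system works; a practical system would more likely score candidate readings with a statistical language model than with a word list.

```python
# Hypothetical vocabulary standing in for "knowledge of the language".
VOCABULARY = {"the", "cat", "sat", "on", "mat"}

# Hypothetical confusion table: a smudge can close a "c" into an "o" and vice versa.
CONFUSABLE = {"c": "o", "o": "c"}


def disambiguate(word: str) -> str:
    """Return the word unchanged if it is known; otherwise try swapping one
    confusable character at a time and return the first known variant."""
    if word.lower() in VOCABULARY:
        return word
    for i, ch in enumerate(word):
        alt = CONFUSABLE.get(ch.lower())
        if alt is None:
            continue
        if ch.isupper():
            alt = alt.upper()  # preserve the original capitalization
        candidate = word[:i] + alt + word[i + 1:]
        if candidate.lower() in VOCABULARY:
            return candidate
    return word  # no better reading found, so keep the raw scanned text


def correct_sentence(sentence: str) -> str:
    """Apply word-level disambiguation to a whitespace-separated sentence."""
    return " ".join(disambiguate(w) for w in sentence.split())


print(correct_sentence("The oat sat on the mat"))  # prints: The cat sat on the mat
```

The pixels alone cannot separate “oat” from “cat”; only the outside knowledge encoded in the vocabulary can, which is the point the paragraph above makes.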

Even for purely digital documents, like invoices that are rendered to PDF and sent via email, the same requirement for comprehension exists. Just extracting a due date requires some level of understanding of the underlying content. An invoice can contain any number of dates: the date the items were purchased, the date the invoice was generated, or the due date. Knowing which is which requires comprehension.

Because machines have traditionally struggled with comprehension, they have relied on rigid data formats. For instance, a machine processing invoices might require that the due date be in the top left corner, written as MM/DD/YY. No other format would work, but in exchange for that rigidity, machines are incredibly fast and make very few errors. The problem is that there is no global standard for most types of documents. Invoices come in as many formats as there are vendors. The result is that even machine-generated documents aren’t machine-readable.

Example of machine learning advantages in form data digitization
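The rigid approach described above can be sketched as a template reader in Python. The "top_left" region name, the dictionary of page regions, and the read_due_date function are all hypothetical; the point is that the reader succeeds only when the date sits in exactly the expected place and is written in exactly the expected format, and it fails on everything else.

```python
import re
from datetime import date, datetime

# Hypothetical template: the due date must sit in the "top_left" region
# and be written exactly as MM/DD/YY (two digits each).
DUE_DATE_REGION = "top_left"
DUE_DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{2}")


def read_due_date(regions: dict[str, str]) -> date:
    """Read the due date from a page, given as a mapping of region -> text."""
    text = regions.get(DUE_DATE_REGION, "").strip()
    # Enforce the exact layout; strptime alone would also accept "9/17/18".
    if not DUE_DATE_PATTERN.fullmatch(text):
        raise ValueError(f"due date not in expected format: {text!r}")
    # strptime additionally rejects impossible dates such as 13/45/18.
    return datetime.strptime(text, "%m/%d/%y").date()
```

Calling read_due_date({"top_left": "09/17/18"}) returns date(2018, 9, 17), while the same date written as "Sept. 17, 2018", or printed in a different corner of the page, raises ValueError. That is the trade the paragraph describes: fast and precise on the one format it expects, useless on any other vendor's layout.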

On the other hand, while people may be slow and error-prone, they are vastly more flexible in understanding data in different formats, even formats they’ve never seen before. This critical distinction between human-readable data and machine-readable data is at the crux of why data entry is necessary. Data entry is, in essence, a form of translation in which a person converts a human-readable document into a specific machine-readable format. For as long as organizations need to consume data created by other organizations, there will be a need for data entry.

In automating data entry, HyperScience serves a market that currently spends $57 billion a year. We create value for our customers by delivering on the three perennial promises of automation: better, faster, cheaper. Our customers choose us because we can offer an 80% reduction in cost, a 5x increase in speed, and a 67% reduction in error rate. A large company will spend tens of millions of dollars a year on its data entry operation. Beyond the considerable cost reduction, we offer increased efficacy. In healthcare, for example, a lower clerical error rate can save lives, while faster turnaround times for insurance claims give customers faster access to much-needed funds.

Today we save our customers money and help them serve their customers better. But we have other ambitions, as well. Our greater goal is to target the very problem that data entry attempts to solve and unlock a distinctly fantastical future in which machines share a common universal language and communicate with each other directly… in a format that is backwards-compatible with humans.

Credits
Copy: Lindsay Bu & David Stess
Design: Denise Paik