By the time my co-founders and I started HyperScience, we had spent the better part of a decade working on machine learning projects for different companies. The thing about machine learning, though, is that in practice there’s not as much machine learning as there is data wrangling. For example, if one database stores customer names as first name / last name but another database stores customer names as last name / comma / first name, then to use both datasets you need a snippet of code that converts between the two formats.
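To make that concrete, here is a minimal sketch of what such a snippet might look like in Python. The names `Name`, `parse_last_comma_first`, and `format_last_comma_first` are hypothetical, made up for illustration, and the code assumes every record is a well-formed "Last, First" string.

```python
from typing import NamedTuple


class Name(NamedTuple):
    """A customer name stored as separate fields (hypothetical schema)."""
    first: str
    last: str


def parse_last_comma_first(raw: str) -> Name:
    """Convert a 'Last, First' string into separate first and last names."""
    # Split on the first comma only; everything after it is treated as the first name.
    last, sep, first = raw.partition(",")
    if not sep:
        raise ValueError(f"expected 'Last, First', got {raw!r}")
    return Name(first=first.strip(), last=last.strip())


def format_last_comma_first(name: Name) -> str:
    """Convert separate first and last names back into 'Last, First'."""
    return f"{name.last}, {name.first}"
```

Calling `parse_last_comma_first("Lovelace, Ada")` yields a `Name` with `first="Ada"` and `last="Lovelace"`, and `format_last_comma_first` reverses it. Multiply this by every field in every pair of systems and you have a sense of the job.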
It’s thankless work known as ETL (extract, transform, load). On a two-by-two matrix, if required domain expertise is on one axis and job satisfaction is on the other, then ETL is in the death quadrant. It requires high levels of domain expertise and delivers excruciatingly negative levels of job satisfaction. It’s mind-numbing, soul-crushing, awful, horrible, terrible work. However, the good thing about being a machine learning engineer is that it uniquely qualifies you to automate the job of being a machine learning engineer in a way that being a doctor, for example, doesn’t remotely qualify you to automate the job of being a doctor. And so, we did exactly what you’d think we did: we set out to automate our old jobs.
It’s also important to note that, just as we were starting HyperScience, the gap between the supply of and demand for machine learning engineers had turned into a veritable chasm. We wondered how large companies dealing with a global shortage of engineering talent were solving ETL at scale. Moreover, given that three companies control most of the world’s supply of qualified ML talent, what do the other Fortune 497 do? The answer, quite simply, is manual data entry.
It was a surprising answer for us. We had always thought of data entry as the artifact of antiquated business processes that would sublimate away with the advent of modern enterprise software. We were wrong. Data entry is actually the inevitable consequence of modernity. While many of our customers use HyperScience to automate the transcription of paper documents, an even larger and growing contingent of customers use HyperScience to process documents that are often machine-generated and purely digital, such as leases, invoices, bank statements, and various other financial documents.
Data entry, therefore, is the process by which the arbitrary output format of one computer system is made to fit the equally arbitrary input requirements of another computer system. Because there is an effectively infinite number of ways to represent any given piece of data, the odds of two random systems being compatible with one another are vanishingly small: 1/∞ to be precise, or in other words, never. The solution large organizations came up with is to fall back on the more flexible, graceful nature of language and to transmit documents in human-readable form. Manual data entry takes those human-readable documents and makes them machine-readable… for one and only one kind of machine.
There was something both familiar and bizarre about this arrangement. The internet, as we knew it, connected people using computers; yet enterprises were connecting computers using people. It looked like an upside-down internet. Turning that upside-down world right-side-up felt like it could be a colossal opportunity. That’s how, in early June of 2014, HyperScience was born. Making human-readable documents machine-readable turned out to be the first step of an even larger, more ambitious mission, but that’s a story for another time.
Copy: Lindsay Bu
Design: Kat Lim