Change in our world continues to accelerate, most recently driven by cognitive machines. 1 Machines are learning to do tasks once thought innately human. They beat us at our games, turn photos into paintings, and write passable poetry.
Today, unlike ever before, we assume a machine can do anything a human can do. Once we’ve imagined a new what, we go very quickly to thinking about how. That is, before we understand precisely what we’re trying to solve, we dive headfirst into solving it.
The human tendency to let how get ahead of what is well known to software developers. During a job interview, a key skill that’s measured is requirements-gathering. When the interviewer poses a problem, she intentionally leaves out information that’s necessary to answer the question. The interviewee needs to remind himself to check his assumptions and ask clarifying questions before he starts to solve the problem.
In traditional software development, algorithms are expected to be 100% accurate. 2 Consequently, the what is the focal point and drives the how.
Machine cognition is completely different. When we employ machine learning, we don’t actually understand how the problem is solved. We may have a guess – and we embed our prior assumptions into a model’s architecture – but ultimately the parameters of these models are trained on data. And because the problem is difficult and we don’t know how it’s actually solved, we expect the accuracy to never be exactly 100%. 3
So what is the right response when facing a problem that’s suited to artificial intelligence? Just like with software development, we shouldn’t jump straight into how. We need to focus on requirements-gathering. The difference, however, is that with machine cognition the what is less important; answering why is more important.
To illustrate this, let’s look at an example that recently crossed my Slack feed:
“Who has experience with algorithms to identify the chorus within a song?”
This is undeniably a cognitive problem. What’s more human than recognizing and responding to a song’s catchy refrain? 4
Hearing a question like this, I almost can’t help but jump directly into technical approaches. Digital signal processing, audio transcription, autocorrelation, sequence-to-sequence models, and on and on! I have to wipe away the saliva from my lips and remind myself that I don’t really understand why I’m trying to solve this, yet.
The first thing we need to understand is why we need a chorus detection system: what will this system contribute to our ultimate business goals? Some questions to ask:
- What is this being used for?
- Does this need to work for a known, finite list of songs? Or against any song that exists now or is written in the future?
- What is the data form of the input to our system? For example, is the data composed of lyrics in plain text format, audio from studio recordings, or audio captured in a noisy environment?
- What is the negative consequence of a false positive? What is the negative consequence of a false negative?
- Where should the computation take place? Should this occur on a smartphone, a desktop, or in “the cloud”?
The answers to these questions should help our solution take shape. In particular, we should be thinking about which answers allow us to simplify or completely short-circuit the problem.
If we need to detect the chorus in 10 songs, by far the easiest thing to do is label those songs ourselves. This removes all artificial intelligence from the equation and consequently all avenues of failure. Even if this needs to work for all 2,000 songs in a karaoke playlist, it’s likely to be less expensive and more accurate to label the music with Amazon’s Mechanical Turk than to pay an engineer to build a detector.
Let’s assume that the problem can’t be completely short-circuited. It must be a system that automatically works against new music that doesn’t exist today.
A next step is to think about the source of data and how that could simplify the problem greatly. My assumption is that chorus detection is done by evaluating an audio stream. However, that assumption needs to be checked. Can we, instead, ingest the song’s written lyrics? If so, that greatly reduces the data scale and removes most noise. In this case, we’d be looking for lyrics that repeat (“autocorrelation” if you want to impress a business stakeholder).
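If lyrics are available as plain text, the "look for lyrics that repeat" idea can be sketched in a few lines. This is only a minimal illustration, not a real chorus detector: the function names `normalize` and `chorus_candidates` and the `min_repeats` parameter are hypothetical, and it assumes one lyric line per text line and that a chorus repeats nearly word for word.

```python
from collections import Counter
from typing import List


def normalize(line: str) -> str:
    """Lowercase a line and drop punctuation so near-identical lines match."""
    kept = "".join(ch for ch in line.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def chorus_candidates(lyrics: str, min_repeats: int = 2) -> List[str]:
    """Return normalized lines that occur at least `min_repeats` times.

    Lines are returned in order of first appearance. This is an assumed
    heuristic: repetition alone does not prove a line belongs to the chorus.
    """
    lines = [normalize(raw) for raw in lyrics.splitlines()]
    lines = [line for line in lines if line]  # skip blank lines
    counts = Counter(lines)

    seen = set()
    candidates = []
    for line in lines:
        if counts[line] >= min_repeats and line not in seen:
            seen.add(line)
            candidates.append(line)
    return candidates
```

A simple count like this ignores melody entirely, which is exactly why the input format question matters: it only works if the answer to "can we ingest written lyrics?" is yes.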
Even if the input is audio, a studio recording is very different from capturing the music in a crowded bar. We do know that audio can be programmatically identified even if there are high levels of background noise. 5 However, hearing repeated audio patterns in a noisy background may be very challenging.
Planning to fail
We believe that our solution will make mistakes. Now, we need to understand the consequences of those mistakes. By understanding what chorus detection is used for, we can optimize our algorithm for our higher-level goals.
If this seems abstract to you, it may be useful to think about a high-stakes prediction, such as a medical test for high cholesterol. Obviously, we want the test to perfectly reflect reality. But if the test were to produce an incorrect result for you, which would you prefer? It could flag you as having dangerously high cholesterol when you actually have no underlying condition (false positive), or it could tell you that you have a healthy level of cholesterol when you actually have a high level (false negative). In the case of medical tests, you would likely prefer a false positive (which could be corrected on a follow-up test) to a false negative (which might go undetected, leaving your health at risk).
When building our classification models, we have the opportunity to tune for more false positives or false negatives. Economists use so-called indifference curves to understand the trade-off between the two outcomes. Data analysts will mention “ROC AUC” 6 when they’re angling for a promotion.
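To make the trade-off concrete, here is a minimal sketch of tuning a decision threshold against the cost of each kind of mistake. Everything here is an assumption for illustration: the model is taken to output a score between 0 and 1, labels are 1 for "chorus" and 0 otherwise, and the names `error_counts`, `cheapest_threshold`, `cost_fp`, and `cost_fn` are hypothetical. The costs themselves have to come from the business context, not from the data.

```python
from typing import Sequence, Tuple


def error_counts(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[int, int]:
    """Count (false positives, false negatives) when predicting 1 for score >= threshold."""
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return false_pos, false_neg


def cheapest_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    cost_fp: float,
    cost_fn: float,
    thresholds: Sequence[float],
) -> float:
    """Return the candidate threshold with the lowest total cost of mistakes."""

    def total_cost(threshold: float) -> float:
        false_pos, false_neg = error_counts(scores, labels, threshold)
        return false_pos * cost_fp + false_neg * cost_fn

    return min(thresholds, key=total_cost)
```

When false negatives are expensive, as in the cholesterol example, the cheapest threshold moves lower and the system flags more cases. When false positives are expensive, it moves higher.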
Machine learning is about stirring data into a pile of linear algebra. Machine cognition (artificial intelligence) has a higher goal: to give a machine the same cognitive abilities that we humans have. In order to deliver value to our customers, we must build machines that reason as we do, solve problems as we do, and fail gracefully as we try to do. To succeed, we must remember the core business problem and constantly reason how our pile of data and linear algebra is one step toward the solution.