Change in our world continues to accelerate, most recently driven by AI and cognitive computing. Machines are learning to do tasks once thought innately human: they beat us at our games, turn photos into paintings, and write good poetry.
Today, more than ever before, we assume a machine can do anything a human can. Once we’ve imagined a new what, we quickly jump to how. Before we understand precisely what we’re trying to solve, we dive headfirst into solving it.
The human tendency to let the how get ahead of the what is well known to software developers. In job interviews, one of the essential skills measured is requirements gathering. When the interviewer poses a problem, they intentionally leave out some of the information needed to answer it. The interviewee must remember to check their assumptions and ask clarifying questions before starting to solve the problem.
In traditional software development, we expect algorithms to be 100% accurate. Consequently, the what is the focal point and drives the how.
Machine cognition is completely different. When we employ machine learning, we don’t actually understand how the problem is solved. We may have a guess, and we embed our prior assumptions into a model’s architecture, but ultimately the model’s parameters are learned from data. And because the problem is complex and we don’t know how it’s actually solved, we never expect accuracy to reach exactly 100%.
So what is the correct response when facing a problem suited to artificial intelligence? Just like with software development, we shouldn’t jump straight into how. We need to focus on requirements gathering. However, the difference is that with machine cognition, answering what is less important than answering why.
To illustrate this, let’s look at an example that recently crossed my Slack feed:
“Who has experience with algorithms to identify the chorus within a song?”
This is undeniably a cognitive problem. What’s more human than recognizing and responding to a song’s catchy refrain?
Hearing a question like this, I almost can’t help but jump directly into technical approaches. Digital signal processing, audio transcription, autocorrelation, sequence-to-sequence models, and so on!
But first, we need to understand why we need a chorus detection system. What will this system contribute to our ultimate business goals?
- How will it be used?
- Does this need to work for a known, finite list of songs? Or against any song that exists now or in the future?
- What form does the input to our system take? For example, is it lyrics in plain text, audio from studio recordings, or audio captured in a noisy environment?
- What is the negative consequence of a false positive? What is the negative consequence of a false negative?
- Where should the computation take place: on a smartphone, a desktop, or in the cloud?
The answers to these questions should help our solution take shape. In particular, we should consider what answers allow us to simplify or completely short-circuit the problem.
If we need to detect the chorus in 10 songs, the easiest thing to do is label those songs ourselves. This removes artificial intelligence from the equation entirely, along with its avenues of failure. Even if this needs to work for all 2,000 songs in a karaoke playlist, it’s likely to be cheaper and more accurate to have the music labeled through Amazon’s Mechanical Turk than to pay an engineer to build a detector.
Let’s assume, though, that we can’t completely short-circuit the problem: the system must work automatically on new music that doesn’t exist today.
The next step is to consider the data source and how it could simplify the problem. My assumption is that chorus detection is done by analyzing an audio stream, but that assumption needs to be checked. Could we instead ingest the song’s written lyrics? If so, that greatly reduces the scale of the data and removes most of the noise. In that case, we’d be looking for lyrics that repeat (“autocorrelation” if you want to impress a business stakeholder).
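To show how much simpler the lyrics case is, here is a minimal sketch, assuming the lyrics arrive as plain text with one sung line per text line. The names `normalize` and `find_chorus` are hypothetical, and instead of a true numeric autocorrelation the sketch uses a simpler stand-in: it looks for the longest block of consecutive lines that appears at least twice.

```python
from collections import Counter


def normalize(line: str) -> str:
    """Lowercase a lyric line and drop punctuation so 'Sing it loud!' matches 'sing it loud'."""
    kept = "".join(ch for ch in line.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())  # collapse and trim whitespace


def find_chorus(lyrics: str, min_lines: int = 2) -> tuple[list[str], int]:
    """Return the longest block of consecutive lines that repeats, and how often it occurs.

    Heuristic assumption: the chorus is the longest repeated block. Counter tallies
    every starting position, so overlapping occurrences are counted too.
    """
    lines = [normalize(line) for line in lyrics.splitlines()]
    lines = [line for line in lines if line]  # ignore blank lines between stanzas
    best_block: tuple[str, ...] = ()
    best_count = 0
    # A block that repeats can be at most half the song long.
    for length in range(min_lines, len(lines) // 2 + 1):
        counts = Counter(
            tuple(lines[i:i + length]) for i in range(len(lines) - length + 1)
        )
        for block, count in counts.items():
            # Prefer longer blocks; among equal lengths, prefer more repetitions.
            if count >= 2 and (length, count) > (len(best_block), best_count):
                best_block, best_count = block, count
    return list(best_block), best_count


if __name__ == "__main__":
    song = """Verse one line
Another verse line
Sing it loud
Sing it proud
Second verse here
More verse words
Sing it loud!
Sing it proud"""
    print(find_chorus(song))
```

Treating the longest repeated block as the chorus is only a heuristic: a repeated verse, or a chorus sung with altered words, would fool it, which is exactly the kind of mistake we need to plan for.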
Even if the input is audio, a studio recording differs from music captured in a crowded bar. We know that a song can be identified programmatically (matched against a known recording) even with high levels of background noise. However, detecting repeated patterns within a song against a noisy background may be far more challenging.
Planning to fail
Our solution will make mistakes, so we need to understand the consequences of those mistakes. By understanding what chorus detection will be used for, we can tune our algorithm to our higher-level goals.
If this seems abstract, it may help to think about a high-stakes prediction, such as a test for a medical condition like high cholesterol. Obviously, we want the test to reflect reality perfectly. But if the test does produce an incorrect result, which error would you prefer? It could flag you as having dangerously high cholesterol when you have no underlying condition (a false positive), or it could tell you that your cholesterol level is healthy when it is actually high (a false negative). For medical tests, you would prefer a false positive (which a follow-up test could correct) to a false negative (which might go unverified, leaving your health at risk).
When building classification models, we can tune them to produce more false positives or more false negatives, typically by moving the decision threshold applied to the model’s score. Economists use so-called indifference curves to understand the trade-off between the two outcomes. Data analysts will mention “ROC AUC” when angling for a promotion.
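To make the trade-off concrete, here is a small sketch, assuming a hypothetical chorus detector that outputs a score between 0 and 1 for each candidate segment. The scores and labels are invented for illustration, and `confusion_counts` and `roc_auc` are illustrative helpers, not a library API. Moving the threshold shifts errors between false positives and false negatives, while ROC AUC summarizes ranking quality across all thresholds as the probability that a randomly chosen positive outscores a randomly chosen negative.

```python
# Hypothetical detector scores and ground-truth labels (1 = chorus), invented for illustration.
SCORES = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
LABELS = [1, 1, 0, 1, 0, 0, 1, 0]


def confusion_counts(scores: list[float], labels: list[int], threshold: float) -> dict[str, int]:
    """Count outcomes when every score >= threshold is predicted positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            counts["tp"] += 1
        elif predicted:  # predicted positive, actually negative
            counts["fp"] += 1
        elif label:  # predicted negative, actually positive
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Lower thresholds catch more choruses (fewer false negatives) at the cost of false alarms.
    for threshold in (0.25, 0.5, 0.75):
        print(threshold, confusion_counts(SCORES, LABELS, threshold))
    print("ROC AUC:", roc_auc(SCORES, LABELS))
```

With these invented numbers, a threshold of 0.25 produces no false negatives but three false positives, while 0.75 produces no false positives but two false negatives; the ROC AUC is 0.75 regardless of the threshold chosen. Which threshold is right depends on how the detector will be used, not on the model.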
Machine learning is about stirring data into a pile of linear algebra. Machine cognition (artificial intelligence) has a higher goal: to give a machine the same cognitive abilities as a human. To deliver value to our customers, we must build machines that reason as we do, solve problems as we do, and fail as gracefully as we try to. To succeed, we must keep the core business problem in mind and constantly ask how our pile of data and linear algebra moves us one step closer to the solution.