How to prevent AI errors

Invisible Technologies

Feb 13, 2024

Many enterprise leaders recognize generative AI's transformative potential. Yet, some hesitate to adopt it, fearing AI errors could harm their reputation or operations.

But Invisible’s CFO Joseph Chittenden-Veal says the risk of doing nothing is potentially far higher than the risk of getting something wrong, and that AI can be implemented in a business in a controlled, systematic way.

“By beginning with a focused application of AI in areas where your organization faces challenges, you can manage the integration effectively and provide immediate benefits, reducing the likelihood of errors,” he explains.

With that in mind, here’s our guide to avoiding AI errors when implementing large language models (LLMs) or other generative AI in your organization.

Why AI makes errors

“When you’re working with an LLM, two plus two doesn’t always equal four,” says Invisible’s CTO Scott Downes. “AI is not deterministic. Nor is it supposed to be.”

“To avoid AI errors, you first need to understand why it makes them in the first place, and the essence of this comes down to the fact that it operates in a fundamentally different way from traditional software.”

Downes notes that, unlike traditional software, generative AI operates on principles of statistical inference and pattern recognition, which means it makes decisions based on the likelihood of something being true rather than on absolute certainties. This approach allows AI to handle a vast array of complex, nuanced tasks that other software simply can’t.

“However, this strength also brings its challenges,” Downes says. “Our experience has shown that when AI systems are trained on narrow or skewed datasets, the margin for error can increase significantly.”

“It's not just about the volume of data but its quality and diversity.”
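To make the point about statistical inference concrete, here is a minimal Python sketch of how a language model might pick its next word by sampling from a probability distribution. It is an illustration only: the function name, the candidate answers, and their scores are all hypothetical, and real models work over far larger vocabularies.

```python
import math
import random


def sample_next_token(scores, temperature=1.0, rng=random):
    """Pick one token at random, weighted by softmax(scores / temperature)."""
    tokens = list(scores)
    scaled = [scores[t] / temperature for t in tokens]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores a model might assign to completions of "2 + 2 = "
scores = {"4": 5.0, "5": 1.0, "22": 0.5}

rng = random.Random(0)  # fixed seed so the demo is repeatable
counts = {token: 0 for token in scores}
for _ in range(1000):
    counts[sample_next_token(scores, temperature=1.0, rng=rng)] += 1

print(counts)  # "4" dominates, but the other answers can still appear
```

Lowering the temperature concentrates the probability on the top-scoring answer, which is one common way to make outputs more repeatable, at the cost of some of the flexibility Downes describes.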

The importance of diverse data

Much of AI’s propensity for error stems from its training, starting with the data it is trained on, Downes says.

“A model can only ever be as good as the data on which it is trained,” he explains. “If that data is limited, biased, or doesn’t represent real-world scenarios, AI’s decision-making will likely be flawed.”

For example, if a model for screening job applicants is trained only on resumes that were selected in the past, and those resumes tend to belong to a specific demographic group, the AI could begin to automatically favor applicants from that group.

Alternatively, AI used for healthcare diagnosis could be less accurate in analyzing and diagnosing illnesses across ethnic groups, ages, or genders if trained predominantly on one demographic.

“For this reason, data must be diverse enough to encapsulate the scenarios the model will encounter,” Downes says.
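One simple way to act on this is to measure how each group is represented in a training set before the model ever sees it. The Python sketch below is a hedged illustration, not a complete fairness audit: the function name, the "age_band" field, the group labels, and the 10 percent threshold are all assumptions chosen for the example.

```python
from collections import Counter


def representation_report(records, field, expected_groups=(), min_share=0.10):
    """Report each group's share of the data and flag under-represented ones.

    expected_groups lists groups that should be present, so a group that is
    missing from the data entirely is reported with a share of zero.
    """
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records to analyze")
    groups = set(counts) | set(expected_groups)
    return {
        group: {
            "share": counts[group] / total,
            "under_represented": counts[group] / total < min_share,
        }
        for group in groups
    }


# Hypothetical training records for a diagnostic model
records = (
    [{"age_band": "18-40"}] * 85
    + [{"age_band": "41-65"}] * 12
    + [{"age_band": "65+"}] * 3
)
report = representation_report(
    records, "age_band", expected_groups=["under-18"], min_share=0.10
)
for group, stats in sorted(report.items()):
    print(group, stats)
```

A report like this only shows where the data is thin. Deciding what counts as enough coverage for a given use case is still a judgment call for people who know the domain.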

Human insight is essential

Even so, training an LLM effectively and avoiding AI errors requires much more than just good data. It also requires human subject matter experts (SMEs) to coach the model into providing accurate and contextually appropriate answers.

“Like a human, AI can be wrong, and it can also be very convincing,” Downes explains. “Any organization implementing a large language model needs humans to analyze the model’s outputs and guide it towards generating more ‘correct’ responses.”

This could mean, for instance, that a financial services organization looking to introduce a new AI model might have a team that includes regulatory experts to ensure compliance with financial laws, communication professionals to maximize the effectiveness of customer interactions, and legal advisors overseeing ethical and legal considerations.

Meanwhile, data scientists and product managers would also play a key role in aligning the AI's functionality with business objectives and customer requirements.

Ongoing maintenance is also key

Because AI is fundamentally different from traditional software, the training doesn’t end once a model has been deployed. Instead, outputs need to be continually monitored and refined so that they continue to match an organization’s objectives.

“LLMs can end up with what’s known as ‘model drift’,” Chittenden-Veal explains. “This occurs when the model, initially well-tuned to current data, begins to falter as the underlying business dynamics or customer behaviors evolve.”

In fact, one study reported a 5 to 20 percent drop-off in accuracy due to model drift within the first six months of deployment. Chittenden-Veal says that organizations should always be mindful of this, as well as of rare or unusual scenarios, known as ‘edge cases’.

“AI is essentially a statistical machine, and because it operates via statistics, you’re going to get some edge cases the model can’t effectively handle, every now and then,” he explains.

“You need humans in the loop to catch and handle these exceptions.”
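As a rough illustration of both ideas, the Python sketch below sends low-confidence outputs to a human reviewer and tracks rolling accuracy against a baseline to flag possible drift. Everything here is an assumption made for the example: the class name, the thresholds, and the premise that the system supplies a confidence score and that reviewers mark outputs as correct or incorrect.

```python
from collections import deque


class ReviewLoop:
    """Route uncertain outputs to people and watch accuracy over time."""

    def __init__(self, baseline_accuracy, window=200, max_drop=0.05,
                 min_confidence=0.8):
        self.baseline = baseline_accuracy      # accuracy measured at launch
        self.max_drop = max_drop               # tolerated fall before alerting
        self.min_confidence = min_confidence   # below this, a human decides
        self.recent = deque(maxlen=window)     # latest reviewed outcomes only

    def route(self, confidence):
        """Decide whether an output ships automatically or goes to review."""
        return "auto" if confidence >= self.min_confidence else "human_review"

    def record(self, was_correct):
        """Store a reviewer's verdict on one output."""
        self.recent.append(bool(was_correct))

    def rolling_accuracy(self):
        if not self.recent:
            return None
        return sum(self.recent) / len(self.recent)

    def drifted(self):
        """True once a full window shows accuracy well below the baseline."""
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough evidence yet
        return self.baseline - self.rolling_accuracy() > self.max_drop


loop = ReviewLoop(baseline_accuracy=0.9, window=10)
print(loop.route(0.95))  # confident output is handled automatically
print(loop.route(0.40))  # likely edge case goes to a person

for verdict in [True] * 9 + [False]:
    loop.record(verdict)
print(loop.drifted())    # accuracy still matches the baseline

for verdict in [False] * 3:
    loop.record(verdict)
print(loop.drifted())    # recent accuracy has fallen well below the baseline
```

The window size and the tolerated drop are tuning choices. A smaller window reacts faster but raises more false alarms, so the right values depend on how many outputs an organization can realistically review.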

Bite-sized chunks and outsourcing

If this sounds daunting, Chittenden-Veal says it shouldn’t necessarily be. While training a model and catching errors across a whole organization may be time-consuming and expensive, there are alternatives.

For starters, he explains, implementing AI doesn’t have to be an all-or-nothing thing.

“You need to start implementing AI now, but that doesn’t mean you have to do it across your whole organization,” he says.

“My recommendation is to start with a small use case, where you know you've got pain points right now, perhaps with your most high-value people.”

“Integrating AI is complex. It’s not just about technology but aligning it with your business processes.” This means ensuring that the AI not only functions technically but also aligns seamlessly with the organization's operational goals and enhances existing workflows.

Meanwhile, Scott Downes says that the complexity involved in implementing and maintaining AI systems requires expertise beyond the scope of most organizations. He says that partnering with an experienced AI provider is usually the most effective way to avoid errors.

“The right provider will integrate human feedback with AI capabilities to achieve the most reliable and effective outcomes,” he concludes.