Deep Neural Networks: When, and When Not, to Use
Artificial Intelligence (AI) has been with us for well over half a century, but confusion still exists regarding what it is and where it is best applied. This is especially true for the latest AI incarnation: “deep learning.”
Overall, AI is a group of different technologies created to automate tasks that are typically accomplished by humans. The first types of AI were expert systems that had specific instruction sets or rules encoding types of activities and decisions into software. Now deep learning neural networks are all the rage. They provide a robust toolset to solve a wide range of problems. But they aren’t the answer to every complex task. Today, deep learning falls short of the “magical” complexity our human brains exhibit.
Even though we use words and phrases like “intelligence” or “modeled after the human brain,” deep learning neural networks are actually an abstraction of statistics. Deep learning neural networks focus on how to leverage existing neural network technologies and connect them to create new designs or “architectures.” Understanding these architectures is essential to applying the appropriate ones for specific tasks.
Deep Learning Advantages
With traditional neural networks, one of the most time-consuming tasks is developing the “features” that are fed into the system. Features are the primary characteristics of a given data set that are key to determining the correct results. To develop features, you need subject matter experts (SMEs) to spend time reviewing sample input data to identify and encode the features so that the neural network understands them.
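To make the idea of hand-built features concrete, here is a minimal sketch in Python. The tiny binary image, the 1 = ink encoding, and the three feature names (`ink_density`, `aspect_ratio`, `row_transitions`) are hypothetical and chosen only for illustration; a real project would rely on features selected by its own SMEs.

```python
from typing import Dict, List

Image = List[List[int]]  # assumed encoding: 1 = ink, 0 = background


def extract_features(image: Image) -> Dict[str, float]:
    """Turn a binary image into a few hand-chosen numeric features."""
    height = len(image)
    width = len(image[0])
    ink = sum(sum(row) for row in image)

    # Count 0->1 and 1->0 changes along each row,
    # a crude stand-in for stroke complexity.
    transitions = sum(
        1
        for row in image
        for left, right in zip(row, row[1:])
        if left != right
    )

    return {
        "ink_density": ink / (height * width),
        "aspect_ratio": width / height,
        "row_transitions": float(transitions),
    }


# A 4x4 image of a vertical bar, standing in for a scanned character.
vertical_bar = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

features = extract_features(vertical_bar)
print(features)
```

Every line of `extract_features` encodes a human decision about what matters in the input, which is the effort that the traditional approach demands up front.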
With deep learning neural networks, it’s unnecessary to define key features. Rather, the neural networks identify these features themselves and then make inferences about which ones are relevant to determining the proper output. These neural networks need samples along with ground truth data, but having the automated feature identification greatly reduces the amount of work required.
The best-performing applications involve visual and speech analysis. In these areas, identifying features by hand is often very difficult, both because it demands deep subject matter expertise and because of the sheer breadth of features available. Deep neural networks can identify candidate features on their own and then determine which ones are relevant.
Since identification of features is not required, technical users who are not steeped in the underlying technology can use DNNs “off-the-shelf” as black boxes. Take open source deep neural network (DNN) tools, such as TensorFlow from Google and CNTK from Microsoft. Software developers can download and train these systems for the applications they were designed for with little to no knowledge of the architecture involved.
Numerous open source DNN implementations allow developers to add learning capabilities to their projects, and advances are being made every day to address their limitations. So what are the drawbacks?
The biggest weakness of DNNs is the large amount of data required to train them. Unlike conventional neural networks, for which feature details are provided as part of the input, DNNs need enough data to identify features on their own. As a result, they often require in excess of 10 million samples to perform reliably.
The input data must also be varied enough to prevent “overfitting,” which happens when a neural network develops inferences that are not based on real relationships in the data, often as a result of training on too limited a set of real examples. The output then works well on the training set but not in a real-world environment. Unless you have access to a significant amount of labeled data, you might be better off with traditional machine learning techniques.
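The gap between training and real-world performance can be illustrated with a small sketch. It does not use a DNN: as a stand-in, it uses a 1-nearest-neighbor “memorizer” on synthetic data, because that model overfits in a way that is easy to see. The data generator, the 20 percent label noise, and the sample sizes are all assumptions made for the example.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, int]  # (input value, label)


def make_data(n: int, rng: random.Random, noise: float = 0.2) -> List[Sample]:
    """Label is 1 when x > 0.5, but a fraction of labels is flipped at random."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label  # simulated labeling noise
        data.append((x, label))
    return data


def predict_memorizer(train: List[Sample], x: float) -> int:
    """1-nearest-neighbor: copy the label of the closest training point."""
    return min(train, key=lambda s: abs(s[0] - x))[1]


def accuracy(predictions: List[int], data: List[Sample]) -> float:
    """Fraction of predictions that match the labels in data."""
    correct = sum(p == label for p, (_, label) in zip(predictions, data))
    return correct / len(data)


rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_data(30, rng)      # deliberately small training set
test = make_data(500, rng)      # held-out data the model never saw

train_acc = accuracy([predict_memorizer(train, x) for x, _ in train], train)
test_acc = accuracy([predict_memorizer(train, x) for x, _ in test], test)

print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

Training accuracy is perfect because each training point is its own nearest neighbor, noise included, while accuracy on the held-out data is lower. Comparing the two numbers on data the model has never seen is the usual way to catch this problem before deployment.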
Deep learning neural nets are also computationally intensive. In fact, Google developed a special chipset, the Tensor Processing Unit, to make them more efficient. Unless a project can access (and pay for) significant compute power, deep learning neural networks often fail to provide superior output over conventional methods.
The Big Black Box
“Interpretability,” or the ability of a layperson to understand why a model used by a DNN delivered certain results, also presents challenges for DNN adoption. For example, if a system identifies a suspicious area in a medical image, the radiologist may want to see the specific detection criteria. Many traditional machine learning models allow for interpretability, disclosing the factors that resulted in a particular answer. DNNs are limited by how they identify features and create inferences, yielding a complex model that most likely has hundreds, or thousands, of factors. If your project must provide a view into the nature of its output, DNNs make this more problematic.
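As a rough illustration of what an interpretable traditional model can disclose, the sketch below scores a sample with a simple linear model and reports how much each factor contributed. The feature names, weights, and bias are hypothetical values invented for the example; they are not taken from any real diagnostic system.

```python
from typing import Dict

# Hypothetical weights for a linear risk score (illustration only).
WEIGHTS: Dict[str, float] = {
    "lesion_diameter_mm": 0.25,
    "border_irregularity": 1.5,
    "patient_age_decades": 0.5,
}
BIAS = -4.0


def explain(sample: Dict[str, float]) -> Dict[str, float]:
    """Return each feature's additive contribution to the score."""
    return {name: WEIGHTS[name] * sample[name] for name in WEIGHTS}


sample = {
    "lesion_diameter_mm": 8.0,
    "border_irregularity": 2.0,
    "patient_age_decades": 6.0,
}

contributions = explain(sample)
score = BIAS + sum(contributions.values())

# List the factors from most to least influential.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>22}: {value:+.2f}")
print(f"{'score':>22}: {score:+.2f}")
```

With three named factors, a reviewer can read off exactly why the score came out as it did. A DNN offers no comparably short list, which is the difficulty described above.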
DNNs are also challenging to use. Many available tools are too generic and cannot solve problems the way customized tools can. Without knowledge of how these architectures work, or the expertise to modify them for your needs, it’s difficult to choose the right one to use.
The Right DNN -- or No DNN
Ultimately, use of AI should focus on meeting project objectives and their associated constraints. Using DNNs is no exception. Traditional expert systems can often solve a problem effectively, leaving more complex machine learning algorithms for problems where traditional techniques fall short.
Understanding the strengths and weaknesses of DNNs, even at a high level, is important before jumping into any specific tool set. If a project’s requirements are simple, a simpler solution may work best. If the project goes beyond image classification or object recognition and requires customization, you may want someone on hand who is knowledgeable about various DNN architectures.
Greg Council is vice president of at Parascript.