GPUs became the hardware of choice for deep learning largely by coincidence. The chips were initially designed to quickly render graphics in applications such as video games. Unlike CPUs, which have four to eight complex cores for carrying out a variety of computations, GPUs have hundreds of simple cores that can perform only specific operations. But those cores can tackle their operations at the same time instead of one after another, shrinking the time it takes to complete an intensive computation.

It didn’t take long for the AI research community to realize that this massive parallelization also makes GPUs well suited to deep learning. Like graphics rendering, deep learning involves simple mathematical calculations performed hundreds of thousands of times. In 2011, in a collaboration with chipmaker Nvidia, Google found that the computer vision model it had trained on 2,000 CPUs to distinguish cats from people could achieve the same performance when trained on only 12 GPUs. GPUs became the de facto chip for model training and inferencing, the computational process that takes place when a trained model is used for the tasks it was trained for.

But GPUs also aren’t ideal for deep learning. For one thing, they cannot function as a standalone chip. Because they are limited in the types of operations they can perform, they must be attached to CPUs that handle everything else. GPUs also have a limited amount of cache memory, the data storage area nearest a chip’s processors. This means the bulk of the data is stored off-chip and must be retrieved when it is time for processing. The back-and-forth data flow ends up being a bottleneck for computation, capping the speed at which GPUs can run deep-learning algorithms.
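To make the bottleneck concrete, here is a minimal sketch of a roofline-style estimate: a computation step can finish no faster than the slower of its arithmetic and its data movement. The function name `step_time` and every number used with it are hypothetical placeholders chosen for illustration, not measurements of any real GPU or CPU, and the model assumes compute and data transfer overlap perfectly.

```python
def step_time(flops, bytes_moved, peak_flops_per_s, bandwidth_bytes_per_s):
    """Estimate a lower bound on the time for one step, in seconds.

    Hypothetical toy model: assumes arithmetic and off-chip data
    movement overlap perfectly, so the slower of the two sets the pace.
    Returns (seconds, limiter), where limiter is "memory" or "compute".
    """
    compute_s = flops / peak_flops_per_s              # time spent on arithmetic
    transfer_s = bytes_moved / bandwidth_bytes_per_s  # time spent fetching data
    limiter = "memory" if transfer_s > compute_s else "compute"
    return max(compute_s, transfer_s), limiter


if __name__ == "__main__":
    # Placeholder values only: a chip with plenty of arithmetic throughput
    # that has to pull most of its data from off-chip memory.
    seconds, limiter = step_time(
        flops=1e9,
        bytes_moved=1e9,
        peak_flops_per_s=1e12,
        bandwidth_bytes_per_s=1e11,
    )
    print(f"{seconds:.4f} s per step, limited by {limiter}")
```

In this toy model, adding more parallel cores raises `peak_flops_per_s` but leaves a memory-limited step no faster, which is the cap the paragraph above describes.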

In recent years, dozens of companies have cropped up to design AI chips that circumvent these problems. The trouble is, the more specialized the hardware, the more expensive it becomes.

So Neural Magic intends to buck this trend. Instead of tinkering with the hardware, the company modified the software. It redesigned deep-learning algorithms to run more efficiently on a CPU by using the chips’ large available memory and complex cores. While the approach loses the speed achieved through a GPU’s parallelization, it reportedly gains back about the same amount of time by eliminating the need to ferry data on and off the chip. The algorithms can run on CPUs “at GPU speeds,” the company says, but at a fraction of the cost. “It sounds like what they have done is figured out a way to take advantage of the memory of the CPU in a way that people haven’t before,” Thompson says.

Neural Magic believes there may be a few reasons why no one took this approach previously. First, it’s counterintuitive. The idea that deep learning needs specialized hardware is so entrenched that other approaches can easily be overlooked. Second, applying AI in industry is still relatively new, and companies are just beginning to look for easier ways to deploy deep-learning algorithms. But whether the demand is deep enough for Neural Magic to take off is still unclear. The firm has been beta-testing its product with around 10 companies, only a sliver of the broader AI industry.

Neural Magic currently offers its technique for inferencing tasks in computer vision. Clients must still train their models on specialized hardware but can then use Neural Magic’s software to convert the trained model into a CPU-compatible format. One client, a large manufacturer of microscopy equipment, is now trialing this approach for adding on-device AI capabilities to its microscopes, says Shavit. Because the microscopes already feature a CPU, they won’t need any additional hardware. By contrast, using a GPU-based deep-learning model would require the equipment to be bulkier and more power hungry.

Another client wants to use Neural Magic to process security camera footage. That would enable it to monitor the traffic in and out of a building using computers already available on site; otherwise it may have to send the footage to the cloud, which may introduce privacy issues, or acquire special hardware for each building it monitors.

Shavit says inferencing is also only the beginning. Neural Magic plans to expand its offerings in the future to help companies train their AI models on CPUs as well. “We believe 10 to 20 years from now, CPUs will be the actual fabric for running machine-learning algorithms,” he says.

Thompson isn’t so sure. “The economics have really changed around chip production, and that is going to lead to a lot more specialization,” he says. Additionally, while Neural Magic’s technique gets more performance out of existing hardware, fundamental hardware advancements are still the only way to keep driving computing forward. “This sounds like a really good way to improve performance in neural networks,” he says. “But we want to improve not just neural networks but also computing overall.”