How Micron's Automata Promises to Improve Parallel Processing
Test and Monitor | Posted February 25, 2014

The most common application for Micron’s new chip might be Big Data, but it can be used anywhere a complex or unstructured data stream needs analysis.

Micron Technology is not the first company that comes to mind when you think CPUs. Perhaps it’s not even the third or fourth to come to mind, because Micron is a memory company. Still, you should not think of the Micron Automata Processor (AP) announced at Supercomputing '13 last November as a CPU, either, because it's not. Nor is it a memory device. Think of the AP more as a powerful processing “engine” that leverages the massive parallelism found within memory technology in order to provide significant scalability to parallel processing.

The AP is geared for analysis of large, unstructured data sets or real-time data analysis challenges. As a result, it is targeted at high-performance computing applications such as graph processing, big data, and bioinformatics.

Forget quad-core or even octo-core processors. The Automata Processor is a scalable, two-dimensional fabric made up of thousands or even millions of application-specific compute machines, called automata, which operate in parallel to perform a targeted task or operation. So instead of four or eight cores brought to bear on a processing task, you have thousands, and all of them can be programmed for a specific task.

The AP design, then, is not for a whole new CPU socket, although there is a prototype built on a PCI Express card. Micron put eight processors on a memory module that fits in a DIMM socket, so you have a module that can perform processing offload for the CPU. The Automata Processor sits on the memory module and does the processing; the CPU has very little involvement.

"Many people ask ‘Why Micron?’" said Paul Dlugosch, Director of Automata Processing development at Micron. "This is the first example of a processing device that at its core is based on memory technology or memory architecture. The way to explain that is not how cache memory can support a CPU, but how we are using memory in a fundamentally new way. With the Automata Processor, we don’t use memory as a traditional read/write storage device. Rather, memory is used as the basis of a processing engine that analyzes information as it streams across the chip."

"The sequential instruction processing nature of conventional CPU/GPU architectures is not well aligned to the class of problems addressed by the AP," said Dlugosch. "The fundamental problem is that of fine-grain parallelism. You have to understand the application requirements across a variety of domains. Any scalar conventional processor based on sequential instruction architecture is really where we saw the problem."

A traditional CPU has an execution pipeline that decodes instructions, executes them, then unloads registers after execution is completed. The CPU performs operations based on the instructions as they are processed. The AP doesn’t have a fixed execution pipeline. Instead, its 2D fabric of tiny processor elements answers thousands or even millions of different questions about data at the same time for massive parallelism.
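One way to picture the difference in software terms is the sketch below. It is only an analogy, not Micron's design or API: the function name `scan_lockstep` and the restriction to literal string patterns are assumptions made for illustration. For each incoming symbol, every active partial match is advanced in the same step. That per-symbol work is what the AP's fabric is described as doing in parallel, and what a CPU has to loop over.

```python
from typing import List, Set, Tuple


def scan_lockstep(patterns: List[str], stream: str) -> List[Tuple[int, str]]:
    """Return sorted (end_offset, pattern) pairs for every occurrence found."""
    if any(not p for p in patterns):
        raise ValueError("patterns must be non-empty")

    # Each active element is (pattern index, position of the next symbol to match).
    active: Set[Tuple[int, int]] = set()
    matches: List[Tuple[int, str]] = []

    for offset, symbol in enumerate(stream):
        # Any pattern may begin at any offset, so start elements are always candidates.
        candidates = active | {(p, 0) for p in range(len(patterns))}
        next_active: Set[Tuple[int, int]] = set()
        # Hardware could evaluate these candidates simultaneously; here we loop.
        for p, pos in candidates:
            if patterns[p][pos] != symbol:
                continue
            if pos + 1 == len(patterns[p]):
                matches.append((offset, patterns[p]))  # pattern fully matched
            else:
                next_active.add((p, pos + 1))          # keep the partial match alive
        active = next_active

    return sorted(matches)


if __name__ == "__main__":
    # Prints [(2, 'ab'), (3, 'abc'), (3, 'bc'), (5, 'ab')]
    print(scan_lockstep(["ab", "bc", "abc"], "xabcab"))
```

In this sketch the inner loop grows with the number of patterns and live partial matches, so the sequential cost per input symbol rises as rules are added.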

Parallelism is not the only issue; the traditional rule-set approach is limiting as well. Dlugosch said that if you are looking for one feature in a data set (say, a specific protein sequence or a cyber security threat), you have a relatively easy problem, and conventional CPUs are quite adept at addressing it. A single pattern is not a highly parallel problem. But if you have to process a data stream and evaluate it for tens, hundreds, or thousands of different features, that becomes a highly parallel problem for which conventional processing architectures are not well suited.

"The more features you look for, the more memory is consumed. You can quickly exceed practical memory limitations, or you get such a large data structure [that] you get memory access problems," said Dlugosch.

Examples include analyzing data coming across the wire for a multitude of malware attacks, or scanning tweets for certain features that help develop predictions about social trends or social unrest. These scenarios require multiple rules, and the rule count can get away from you fast.

Reconfigurable, reprogrammable

The Automata chip is more like an FPGA than a CPU in that it is reconfigurable. And because of this, it can take on the exact configuration that is best suited to solve the problem at hand. "With the Automata Processor, you don’t write a program of instructions. You configure it by compiling a program; and from that point forward it is an autonomous machine," said Dlugosch.

Plus, because the chip becomes what the user defines, it doesn't need to be told what to do next. The AP is a self-operating machine that is driven only by the data it receives, not by instructions. The data flowing through the machine drives the operation. So the programmer configures it to examine all of the data coming in, and as soon as data comes into the machine, the AP sets about doing what it was configured to do, such as pattern matching.
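The "configure once, then let the data drive it" idea can be sketched in software as a table-driven state machine. This is an illustrative assumption, not the AP's programming model: the class `ConfiguredMachine`, its `feed` method, and the digit-run detector are all hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Set, Tuple


class ConfiguredMachine:
    """A state machine that is configured once and then driven only by input."""

    def __init__(
        self,
        transitions: Dict[Tuple[int, str], int],
        classify: Callable[[str], str],
        start: int,
        accepting: Set[int],
    ) -> None:
        self._transitions = transitions
        self._classify = classify
        self._accepting = accepting
        self._state = start
        self._offset = 0  # absolute position in the stream, kept across chunks

    def feed(self, chunk: str) -> List[int]:
        """Consume a chunk and return the stream offsets that hit an accepting state."""
        reports: List[int] = []
        for symbol in chunk:
            # The next state depends only on the current state and the symbol.
            self._state = self._transitions[(self._state, self._classify(symbol))]
            if self._state in self._accepting:
                reports.append(self._offset)
            self._offset += 1
        return reports


def build_digit_run_detector() -> ConfiguredMachine:
    """Configure a machine that reports whenever two or more digits appear in a row."""
    transitions = {
        (0, "digit"): 1, (1, "digit"): 2, (2, "digit"): 2,
        (0, "other"): 0, (1, "other"): 0, (2, "other"): 0,
    }

    def classify(ch: str) -> str:
        return "digit" if ch.isdigit() else "other"

    return ConfiguredMachine(transitions, classify, start=0, accepting={2})


if __name__ == "__main__":
    machine = build_digit_run_detector()
    for chunk in ["a1", "2b34", "5"]:  # the stream "a12b345" arriving in pieces
        print(machine.feed(chunk))
```

After construction the caller supplies nothing but data. The machine keeps its state between chunks, so a run of digits split across two chunks is still reported.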

Micron has a full-function SDK to accompany the AP. The SDK is designed to take a user-defined pattern rule set or analytic definitions, compile them, and configure the chip to implement the exact machine needed to process or analyze the data.

The Automata Processor can be configured with either a list of regular expressions in PCRE syntax or a direct description of the automata in an XML-based high-level language the company created, called the Automata Network Markup Language (ANML). The compiler accepts PCRE natively and unmodified, and the chip can be configured that way. But ANML exploits all of the architecture’s features. It lets end users work graphically, in the manner of schematic capture, or design highly complex automata that perform highly detailed data set analysis.
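To make the "list of regular expressions" input concrete, here is a minimal sketch of a named rule set scanned on an ordinary CPU. It does not use Micron's SDK or ANML. Python's `re` module stands in for PCRE (the two dialects are similar but not identical), and the function names `compile_rules` and `scan_rules` and the two sample rules are assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple


def compile_rules(rules: Dict[str, str]) -> Dict[str, re.Pattern]:
    """Compile a mapping of rule name -> regular expression source."""
    return {name: re.compile(expr) for name, expr in rules.items()}


def scan_rules(compiled: Dict[str, re.Pattern], text: str) -> List[Tuple[str, int, int]]:
    """Return (rule_name, start, end) for every match, ordered by start offset."""
    hits: List[Tuple[str, int, int]] = []
    # A CPU visits the rules one after another; this loop is the sequential cost.
    for name, pattern in compiled.items():
        for match in pattern.finditer(text):
            hits.append((name, match.start(), match.end()))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    rules = {
        "ipv4": r"\b\d{1,3}(?:\.\d{1,3}){3}\b",  # loose dotted-quad shape, not validated
        "hex": r"0x[0-9a-fA-F]+",
    }
    text = "src 10.0.0.1 flag 0xdead"
    for name, start, end in scan_rules(compile_rules(rules), text):
        print(name, text[start:end])
```

Each added rule means another pass over the data in this sketch, which is the scaling behavior the article contrasts with evaluating all rules at once.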


The AP uses a DDR3-like memory interface, chosen to simplify the physical design-in process for system integrators. The AP will be made available as single components or as DIMM modules. A PCIe board populated with up to 48 APs will also be available to early-access application developers.

It’s coming soon, but you can’t get your hands on the AP quite yet. Micron is making silicon now, but a revision is planned, so don’t expect hardware samples until the second half of 2014.
