New Computer Chip Simplifies Parallel Programming

The new chip architecture removes explicit synchronization to make parallel programming easier. (Image courtesy of Christine Danilo/MIT.)
Currently, to parallelize a program, a programmer has to divide the work explicitly into tasks and then manually enforce synchronization between tasks that access shared data. A new chip named Swarm, developed at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), is designed to remove this need for explicit synchronization, making parallel programs both more efficient and easier to write.

In a recent issue of the IEEE journal Micro, the researchers reported just how efficient Swarm is. In simulation experiments, Swarm's versions of common algorithms ran 3 to 18 times as fast as existing parallel algorithms.

Swarm even achieved a 75-fold speedup on a program that, until then, computer scientists had failed to parallelize. Moreover, Swarm's versions required only about one-tenth as much code.

The published paper focused on parallelizing applications that have resisted multicore approaches, such as graph exploration. A graph is essentially a set of nodes connected by edges, which may be weighted or unweighted. For example, nodes might represent cities, and edge weights the distances between them.

CSAIL researchers studied exactly this kind of problem when analyzing a standard algorithm for finding the fastest driving route between two points. The difficulty with graph exploration is that not all regions of the graph turn out to be fruitful, and this often becomes apparent only after the analysis has been carried out.

Motivated by this, computer scientists have devised ways to prioritize graph exploration, such as visiting nodes with the fewest edges first.
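The shortest-route algorithm mentioned above is a classic example of priority-ordered exploration. As a rough sketch (the graph and city names here are invented for illustration), Dijkstra's algorithm keeps a priority queue and always expands the node with the smallest known distance first:

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm: always explore the node with the
    smallest known distance first (priority-ordered search)."""
    # graph: {node: [(neighbor, edge_weight), ...]}
    dist = {start: 0}
    frontier = [(0, start)]  # priority queue keyed by distance so far
    while frontier:
        d, node = heapq.heappop(frontier)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, weight in graph[node]:
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(frontier, (nd, neighbor))
    return None  # goal unreachable

# Cities connected by weighted roads (hypothetical data)
roads = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 6)],
    "D": [],
}
print(shortest_path(roads, "A", "D"))  # → 4, via A-C-B-D
```

The priority queue is what makes the exploration order matter: promising regions of the graph are visited first, and unpromising ones may never be visited at all.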


What Makes Swarm Different

Swarm distinguishes itself from other multicore chips by including extra circuitry for handling prioritization. It time-stamps tasks according to their priorities and works on the highest-priority tasks in parallel; lower-priority tasks created along the way are automatically queued.

When it comes to synchronization, Swarm automatically resolves conflicts that a programmer would otherwise have to manage. For example, if a lower-priority task writes to a memory location before a higher-priority task reads that location, Swarm backs out the lower-priority task's results.
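The conflict rule can be sketched in a few lines (a simplified software model with invented names, not how the hardware actually implements it): speculative tasks buffer their writes, and a read by an earlier-stamped task aborts any later-stamped task that has speculatively written the same address.

```python
class SpeculativeTask:
    def __init__(self, timestamp):
        self.timestamp = timestamp
        self.writes = {}       # buffered writes, not yet in main memory
        self.aborted = False

memory = {"x": 1}

def read(addr, reader, all_tasks):
    """Abort any later-stamped task that speculatively wrote this address,
    then return the committed value from main memory."""
    for t in all_tasks:
        if t is not reader and t.timestamp > reader.timestamp and addr in t.writes:
            t.aborted = True
            t.writes.clear()   # back out the lower-priority task's results
    return memory[addr]

low = SpeculativeTask(timestamp=10)   # lower priority: later time stamp
high = SpeculativeTask(timestamp=1)   # higher priority: earlier time stamp

low.writes["x"] = 99                  # speculative write, still buffered
value = read("x", high, [low, high])  # higher-priority read triggers the abort
print(value, low.aborted)             # → 1 True
```

Because the lower-priority task's write was still buffered, discarding it leaves memory exactly as the higher-priority task expects to find it.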

To elaborate, the chip has circuitry implementing a Bloom filter: a data structure that packs information into a fixed allotment of space and answers yes/no questions about its contents. The filter records the memory addresses of all the data that Swarm's cores are currently working on, helping Swarm identify memory-access conflicts.
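A Bloom filter can be sketched in a few lines of software (the sizes and addresses below are arbitrary; Swarm's hardware filter differs in its details). Adding an item sets a few bits chosen by hash functions; a lookup answers "definitely absent" or "possibly present", with occasional false positives but never false negatives:

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array answering set-membership queries."""
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive `hashes` bit positions from the item
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

# Track the memory addresses the cores are touching (hypothetical values)
touched = BloomFilter()
touched.add(0x7F3A10)
print(0x7F3A10 in touched)  # True: added items are always found
print(0x123456 in touched)  # very likely False; rare false positives possible
```

The appeal for hardware is that the filter's size is fixed no matter how many addresses are recorded, and both insertion and lookup are cheap bitwise operations.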

Each piece of data is labeled with the time stamp of the last task that updated it, allowing, for example, tasks with later time stamps to read data without worrying about earlier tasks still using it. Finally, each core occasionally reports the time stamps of its highest-priority tasks still under execution. If a core's earliest time stamps are earlier than those of any other core, it can write its results into memory without risking any conflict.
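That commit rule reduces to a simple comparison, sketched here with an invented helper name: a core may commit its speculative results once its earliest running time stamp precedes every other core's.

```python
def safe_to_commit(my_earliest, other_cores_earliest):
    """A core may write results to memory when its earliest running
    time stamp precedes every other core's earliest running task."""
    return all(my_earliest < ts for ts in other_cores_earliest)

print(safe_to_commit(2, [5, 7, 9]))  # → True: no earlier task exists anywhere
print(safe_to_commit(6, [5, 7, 9]))  # → False: the task stamped 5 might conflict
```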

Adapting existing algorithms to Swarm is also meant to be simple. After defining a function, a programmer needs only to specify a prioritizing metric and add a line of code that loads the function into Swarm’s queue of tasks.
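In spirit, the programming model looks something like the sketch below (the function `swarm_enqueue` and everything around it are invented for illustration; the article does not document Swarm's actual API): define an ordinary function, then add one line giving a priority metric and loading the function into the task queue.

```python
import heapq
import itertools

task_queue = []
_seq = itertools.count()  # tie-breaker for equal priorities

def swarm_enqueue(priority, fn, *args):
    """Load a function into the task queue, ordered by a priority metric."""
    heapq.heappush(task_queue, (priority, next(_seq), fn, args))

def visit(node):
    return f"visited {node}"

# The single extra line per task: a priority metric, then the function
swarm_enqueue(2, visit, "B")
swarm_enqueue(1, visit, "A")

results = []
while task_queue:
    _prio, _, fn, args = heapq.heappop(task_queue)
    results.append(fn(*args))

print(results)  # → ['visited A', 'visited B']: lower metric runs first
```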

Swarm’s architecture draws on ideas from both transactional memory and thread-level speculation. The former guarantees that updates to shared memory take effect in an orderly way; the latter is a related parallelization technique that operates at the level of threads rather than individual instructions. Combining the two is what makes the Swarm chip promising.

For more on the future of computer chip design, check out IBM’s brain-inspired neurosynaptic processor.