[lug] Parallella: low-power, cheap, flexible board with 16, 64, ... epiphany cores

Mon Jul 14 12:22:53 MDT 2014

I forget the name of next month's speaker on parallel processing approaches, but the chip I was talking about is the Epiphany, from Adapteva.  I got one of their Parallella boards through their Kickstarter campaign: a low-power, cheap, flexible board with 16 Epiphany cores.  Versions with 64 cores are getting underway, and higher counts are coming.  This is the kind of thing smartphones and other embedded devices might be sporting, along with other low-power uses.

See these articles:

Supercomputing on the cheap with Parallella - Programming - O'Reilly Media
 http://programming.oreilly.com/2013/12/supercomputing-on-the-cheap-with-parallella.html

$99 Raspberry Pi-sized “supercomputer” touted in Kickstarter project | Ars Technica
 http://arstechnica.com/information-technology/2012/09/99-raspberry-pi-sized-supercomputer-touted-in-kickstarter-project/

  “Once completed, the 64-core version of the Parallella computer would deliver over 90 GFLOPS of performance and would have the horse power comparable to a theoretical 45 GHz CPU [64 CPU cores x 700MHz] on a board the size of a credit card while consuming only 5 Watts under typical work loads.”

They are offering a variety of products now:

New Parallella Product Offerings | Parallella
 http://www.parallella.org/2014/07/14/new-parallella-product-offerings/

I don't know a lot about parallel programming, but was interested to read the Epiphany Architecture Reference
  http://adapteva.com/docs/epiphany_arch_ref.pdf

which makes it sound pretty flexible and suitable for learning about different approaches also.  Here are some quotes:

The Epiphany architecture defines a multicore, scalable, shared-memory, parallel computing fabric. It consists of a 2D array of compute nodes connected by a low-latency mesh network-on-chip.

 * A superscalar, floating-point RISC CPU in each mesh node that can execute two floating point operations and a 64-bit memory load operation on every clock cycle.

 * Local memory in each mesh node that provides 32 Bytes/cycle of sustained bandwidth and is part of a distributed, shared memory system.

 * Multicore communication infrastructure in each node that includes a network interface, a multi-channel DMA engine, multicore address decoder, and network-monitor.

 * A 2D mesh network that supports on-chip node-to-node communication latencies in nanoseconds, with zero startup overhead.
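As a rough sanity check on those figures, here is a minimal Python sketch that combines the per-node numbers quoted in the list (two floating point operations and 32 bytes of local memory bandwidth per cycle) with the 700 MHz clock from the Ars Technica quote. The function name and result keys are my own, and these are theoretical peaks, not measurements. For 64 cores it works out to 44.8 GHz aggregate clock and 89.6 GFLOPS, close to the quoted "45 GHz" and "90 GFLOPS".

```python
# Back-of-the-envelope peak figures for an Epiphany-style chip.
# Inputs are taken from the quotes in this message; the helper itself
# is a hypothetical illustration, not anything from Adapteva.

def peak_figures(cores, clock_hz, flops_per_cycle=2, local_bytes_per_cycle=32):
    """Return theoretical peak numbers for `cores` nodes at `clock_hz`."""
    return {
        # Sum of the core clocks, the "theoretical 45 GHz CPU" style of figure.
        "aggregate_ghz": cores * clock_hz / 1e9,
        # Every core retiring its maximum floating point operations each cycle.
        "peak_gflops": cores * clock_hz * flops_per_cycle / 1e9,
        # Local memory bandwidth of a single node.
        "local_mem_gb_per_s_per_core": clock_hz * local_bytes_per_cycle / 1e9,
    }

for cores in (16, 64):
    print(cores, peak_figures(cores, 700e6))
```

Real workloads will land well below these peaks, since they assume every core is busy on every cycle.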

 The Epiphany architecture is programming-model neutral and compatible with most popular parallel-programming methods, including Single Instruction Multiple Data (SIMD), Single Program Multiple Data (SPMD), Host-Slave programming, Multiple Instruction Multiple Data (MIMD), static and dynamic dataflow, systolic array, shared-memory multithreading, message-passing, and communicating sequential processes (CSP).

...

The eMesh supports efficient broadcasting of data to multiple cores through a special “multicast” routing mode.  ... In multicast mode, the normal eMesh routing algorithm described in 5.2 is overridden and the transaction is instead routed radially outwards from the transmitting node
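To get a feel for what "routed radially outwards" could mean on a 2D mesh, here is a toy Python model. It is only my assumption of the general idea, not the actual eMesh algorithm (that is in section 5.2 of the reference, which I have not reproduced): a message spreads from the transmitting node one mesh link per step, and the sketch records how many hops it takes to reach each node. The function name and the 4x4 grid are made up for illustration.

```python
from collections import deque

def broadcast_hops(rows, cols, source):
    """Hop count from `source` to every node of a rows x cols mesh.

    Toy model: the message moves to the four nearest neighbours each step.
    This is an illustrative assumption, not the real eMesh multicast logic.
    """
    hops = [[None] * cols for _ in range(rows)]
    r0, c0 = source
    hops[r0][c0] = 0
    frontier = deque([source])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            # Stay on the mesh and visit each node only once.
            if 0 <= nr < rows and 0 <= nc < cols and hops[nr][nc] is None:
                hops[nr][nc] = hops[r][c] + 1
                frontier.append((nr, nc))
    return hops

# A 16-core board arranged as a 4x4 grid, transmitting from node (1, 1).
for row in broadcast_hops(4, 4, (1, 1)):
    print(row)
```

In this model the hop count to a node is just its row distance plus its column distance from the sender, so the farthest corner of a 4x4 grid is at most 6 hops from any transmitter.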

So I'd love to get some informed perspectives on it vs the other low-power options out there.

Cheers,

Neal McBurnett                 http://neal.mcburnett.org/