[lug] Parallella: low-power, cheap, flexible board with 16, 64, ... epiphany cores

Davide Del Vento davide.del.vento at gmail.com
Mon Jul 14 22:08:02 MDT 2014


Hi Neal,
Thanks for sharing the link to the PDF (still TL;DR for me, though).
I've seen this board in the past and I'd be interested in exploring it.
However, I have yet to see a quick datasheet (say, one or two pages) with
the "hard specs" demonstrating these claims. When they say "local memory in
each mesh node that provides 32 Bytes/cycle of sustained bandwidth", that
is simply too good to be true. Plus, it's missing the important latency
figure, as well as the bandwidth and latency numbers for the rest of the
memory hierarchy.
I'm not saying they are lying, but their marketing is probably misleading.
At that speed it cannot be "memory". It must be registers, or at the very
least a fast cache. How much of that superfast "memory" is there per node?
What is the overhead of cache coherence (if there is such a thing)? If
there isn't cache coherence, what is the time to access memory on *some
other* node?
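To put a number on it, here is a back-of-envelope check of what "32 Bytes/cycle sustained" would mean, assuming the 700 MHz clock quoted in their marketing (the clock rate is their number, not something I've verified):

```python
# Back-of-envelope check of the "32 Bytes/cycle" local-memory claim,
# assuming the 700 MHz core clock from Adapteva's marketing material.

CLOCK_HZ = 700e6          # assumed core clock (700 MHz, per Adapteva)
BYTES_PER_CYCLE = 32      # claimed sustained local-memory bandwidth
CORES = 16                # cores on the Kickstarter Parallella board

per_node_gbps = BYTES_PER_CYCLE * CLOCK_HZ / 1e9   # GB/s per mesh node
aggregate_gbps = per_node_gbps * CORES             # GB/s across the chip

print(f"per node:  {per_node_gbps:.1f} GB/s")      # 22.4 GB/s
print(f"aggregate: {aggregate_gbps:.1f} GB/s")     # 358.4 GB/s
```

22.4 GB/s per node is SRAM-class bandwidth, which is exactly why I suspect this "memory" is a small on-chip scratchpad rather than anything resembling main memory.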

And I haven't even started on the multi-core litany. I'll let this guy do
it:
http://perilsofparallel.blogspot.com/2009/02/what-multicore-really-means-and.html
http://perilsofparallel.blogspot.com/2009/09/of-muffins-and-megahertz.html
http://perilsofparallel.blogspot.com/2010/02/parallel-powerpoint-why-power-law-is.html

Don't get me wrong: parallel computing is my bread and butter and I love
it. But I question (like this guy does; he's from the same industry as I
am) its usefulness for the client. It's great on the server (and especially
on the scientific computing server!), so the talk Alessandro will give will
be interesting to our group. But on the phone? Ask about "muffins per hour"
next time you buy a car!
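For what it's worth, the peak-FLOPS claim quoted below does follow from their own numbers, assuming the 2 floating-point operations per cycle per core stated in the architecture reference:

```python
# Sanity check on the quoted "over 90 GFLOPS" / "45 GHz equivalent" claim,
# assuming 2 FLOPs/cycle/core as stated in the architecture reference.

CORES = 64                # the 64-core Epiphany version
CLOCK_HZ = 700e6          # 700 MHz, per the quoted figure below
FLOPS_PER_CYCLE = 2       # "two floating point operations ... every clock cycle"

peak_gflops = CORES * CLOCK_HZ * FLOPS_PER_CYCLE / 1e9
equiv_ghz = CORES * CLOCK_HZ / 1e9

print(f"peak: {peak_gflops:.1f} GFLOPS")               # 89.6 GFLOPS
print(f"equivalent single core: {equiv_ghz:.1f} GHz")  # 44.8 GHz
```

Note that it comes to 89.6 GFLOPS, so "over 90" is already generous rounding, and of course that's theoretical peak, not what any real code will sustain.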

Cheers,
Davide



On Mon, Jul 14, 2014 at 12:22 PM, Neal McBurnett <neal at bcn.boulder.co.us>
wrote:

> I forget the name of next month's speaker on parallel processing
> approaches, but the chip I was talking about is the Epiphany, from
> Adapteva.  I got one of their Parallella boards in their kickstarter
> campaign: a low-power, cheap, flexible board with 16 Epiphany cores.
> Versions with 64 cores are getting underway, and higher counts are coming.
> This is the kind of thing smart phones and other embedded devices might be
> sporting, along with other low-power uses.
>
> See these articles:
>
> Supercomputing on the cheap with Parallella - Programming - O'Reilly Media
>
> http://programming.oreilly.com/2013/12/supercomputing-on-the-cheap-with-parallella.html
>
> $99 Raspberry Pi-sized “supercomputer” touted in Kickstarter project | Ars
> Technica
>
> http://arstechnica.com/information-technology/2012/09/99-raspberry-pi-sized-supercomputer-touted-in-kickstarter-project/
>
>   “Once completed, the 64-core version of the Parallella computer would
> deliver over 90 GFLOPS of performance and would have horsepower
> comparable to a theoretical 45 GHz CPU [64 CPU cores x 700MHz] on a board
> the size of a credit card while consuming only 5 Watts under typical
> workloads.”
>
> They are offering a variety of products now:
>
> New Parallella Product Offerings | Parallella
>  http://www.parallella.org/2014/07/14/new-parallella-product-offerings/
>
> I don't know a lot about parallel programming, but was interested to read
> the Epiphany Architecture Reference
>   http://adapteva.com/docs/epiphany_arch_ref.pdf
>
> which makes it sound pretty flexible and suitable for learning about
> different approaches also.  Here are some quotes:
>
> The Epiphany architecture defines a multicore, scalable, shared-memory,
> parallel computing fabric. It consists of a 2D array of compute nodes
> connected by a low-latency mesh network-on-chip.
>
>  * A superscalar, floating-point RISC CPU in each mesh node that can
> execute two floating point operations and a 64-bit memory load operation on
> every clock cycle.
>
>  * Local memory in each mesh node that provides 32 Bytes/cycle of
> sustained bandwidth and is part of a distributed, shared memory system.
>
>  * Multicore communication infrastructure in each node that includes a
> network interface, a multi-channel DMA engine, multicore address decoder,
> and network-monitor.
>
>  * A 2D mesh network that supports on-chip node-to-node communication
> latencies in nanoseconds, with zero startup overhead.
>
>  The Epiphany architecture is programming-model neutral and compatible
> with most popular parallel-programming methods, including Single
> Instruction Multiple Data (SIMD), Single Program Multiple Data (SPMD),
> Host-Slave programming, Multiple Instruction Multiple Data (MIMD), static
> and dynamic dataflow, systolic array, shared-memory multithreading,
> message-passing, and communicating sequential processes (CSP).
>
> ...
>
> The eMesh supports efficient broadcasting of data to multiple cores
> through a special “multicast” routing mode.  ... In multicast mode, the
> normal eMesh routing algorithm described in 5.2 is overridden and the
> transaction is instead routed radially outwards from the transmitting node
>
> So I'd love to get some informed perspectives on it vs the other low-power
> options out there.
>
> Cheers,
>
> Neal McBurnett                 http://neal.mcburnett.org/
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety