[lug] Parallella: low-power, cheap, flexible board with 16, 64, ... epiphany cores

Neal McBurnett neal at bcn.boulder.co.us
Wed Jul 16 20:02:45 MDT 2014


Thanks to Steve Rogers and Quentin Hartman for the follow-up links.

Thanks, Davide, for the links and insights.

Re: http://perilsofparallel.blogspot.com/2009/02/what-multicore-really-means-and.html

That guy does indeed do a "litany" on multi-core.  But I did enjoy his funny and useful comparisons on why a fleet of 16 cars that each go 60 mph is not the same as one car that goes 960 mph.

You can tell why he was motivated to write up the litany from his follow-up post, which you also linked to - he was mad about a misleading business presentation from folks who ignored his input.  A frequent motivator :)
  http://perilsofparallel.blogspot.com/2009/09/of-muffins-and-megahertz.html

I did indeed include some of their glowing marketing numbers in my post, and I thank you all for the informative responses.  But to clarify my motivations: from my perspective, parallel computing in one form or another is a big part of the future, and it is very cool to have cheap general-purpose hardware to play with and explore the possibilities.  I get the impression that the Parallella supports a number of parallel processing styles, and the cores are general-purpose enough that I can program them without re-learning a lot, so it looks fun and educational.

For me, the most interesting part of the 'what multicore really means' post was the equally opinionated and also informative rant in the comments by "anonymous"

  http://perilsofparallel.blogspot.com/2009/02/what-multicore-really-means-and.html?showComment=1238239200000#c7290536757386895922

who makes some interesting points (along with the venting) about the utility of multiple cores for gaming and exploring other new ways to use computing power:

 "....there is one application that seems core hungry beyond even 4 cores- namely the only type of application that uses the full power of most people's systems, computer games"

 "In the REAL world, computer games drive [[a great deal of PC hardware development]]"

 'If the argument is that multi-core is a problem, because non-game apps rarely benefit, people in the real-world say "what are these non-game apps that need more power anyway?". As I have said above, such apps either don't exist, or rapidly find themselves moved to hardware (for example, dual-core chips were initially mostly useful for decoding hi-definition, Bluray like content, but now a small segment of the GPU on even ATI's and Nvidia's cheapest product decodes hi-def video with no CPU hit, and using far less power than a CPU solution).'...

 "The problem with multi-core in the real world ... is that as soon as a software algorithm is developed that takes advantage of current PC CPU designs (like 3d-rasterising, or video-decoding), the algorithm is quickly baked into a standalone ASIC, providing a much cheaper and far more power efficient solution...."

So that all brings me back to wanting to know more about the many different styles of parallel programming, how to explore them with this cheap new prototyping platform, wondering how the Epiphany chip might fit in, and what sorts of novel things people might use it for in embedded devices, phones, etc.

Cheers,

Neal McBurnett                 http://neal.mcburnett.org/

From: Steve Rogers <shr066 at gmail.com>
To: "Boulder (Colorado) Linux Users Group -- General Mailing List" <lug at lug.boulder.co.us>
Subject: Re: [lug] Parallella: low-power, cheap, flexible board with 16, 64, ... epiphany cores

   IIRC, the Epiphany has 32 KB of on-chip memory per core, shared by
   code and data.  Effectively utilizing the Parallella's capabilities
   looks like an interesting challenge, as one may need to distribute
   the solution across the Epiphany, the ARM cores, and the FPGA,
   depending upon the problem being solved.
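That 32 KB ceiling is worth sitting with for a moment: any sizable dataset has to be streamed through local memory in tiles.  A back-of-envelope sketch (only the 32 KB figure is from the spec; the code budget and core count are my assumptions for illustration):

```python
# How many passes does a large float array need through per-core local
# memory?  LOCAL_MEM is the Epiphany-III figure; CODE_BUDGET is an
# assumed reservation for the program itself, since code and data
# share the same 32 KB.
LOCAL_MEM = 32 * 1024       # bytes of local memory per core
CODE_BUDGET = 8 * 1024      # assumed bytes reserved for code
FLOAT_SIZE = 4              # single-precision float

def tiles_needed(n_elements, cores=16):
    floats_per_core = (LOCAL_MEM - CODE_BUDGET) // FLOAT_SIZE  # 6144
    per_pass = floats_per_core * cores
    # Ceiling division: passes needed to stream the whole array through.
    return -(-n_elements // per_pass)

print(tiles_needed(1_000_000))  # 11 passes on a 16-core chip
```

So a million-element array already means double-digit tiling passes, which is exactly where the DMA engines and the off-chip bandwidth numbers Davide asked about start to dominate.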


   All the specs you could possibly want on the Parallella board are linked
   from here:
   [1]http://www.parallella.org/board/
   Epiphany chip information is detailed here:
   [2]http://www.adapteva.com/epiphanyiii/
   including a detailed data sheet that I would expect to answer all
   your questions.
   QH


On Mon, Jul 14, 2014 at 10:08:02PM -0600, Davide Del Vento wrote:
>    Hi Neal,
>    Thanks for sharing the link to the PDF (though it's still TL;DR for me).
>    I've seen this board in the past and I'd be interested in exploring it.
>    However, I have yet to see a quick datasheet (say, one or two pages) with
>    the "hard specs" demonstrating these claims. When they say "local
>    memory in each mesh node that provides 32 Bytes/cycle of sustained
>    bandwidth", that is simply too good to be true. Plus, it's missing the
>    important latency number, as well as the bandwidth and latency
>    numbers for the rest of the memory hierarchy.
>    I'm not saying they are lying, but their marketing is probably
>    misleading. At that speed it cannot be "memory". It must be "registers"
>    or at the very least fast cache. How much of that superfast "memory" is
>    there per node? What is the overhead to have the cache coherence (if
>    there is such a thing)? If there isn't cache coherence, what is the
>    time to access memory on *some other* node?
>    And I haven't even started on the multi-core litany. I'll let this guy
>    do it:
>    [1]http://perilsofparallel.blogspot.com/2009/02/what-multicore-really-means-and.html
>    [2]http://perilsofparallel.blogspot.com/2009/09/of-muffins-and-megahertz.html
>    [3]http://perilsofparallel.blogspot.com/2010/02/parallel-powerpoint-why-power-law-is.html
>    Don't get me wrong. Parallel Computing is my bread and butter and I
>    love it. But I question (like this guy does, he's from the same
>    industry as I am) its usefulness for the client. It's great on the
>    server (and especially on the scientific computing server!), so the
>    talk Alessandro will give will be interesting to our group. But on the
>    phone? Ask about "muffins per hour" next time you buy a car!
>    Cheers,
>    Davide
> 
>    On Mon, Jul 14, 2014 at 12:22 PM, Neal McBurnett
>    <[4]neal at bcn.boulder.co.us> wrote:
> 
>      I forget the name of next month's speaker on parallel processing
>      approaches, but the chip I was talking about is the Epiphany, from
>      Adapteva.  I got one of their Parallella boards in their
>      Kickstarter campaign: a low-power, cheap, flexible board with 16
>      Epiphany cores.  Versions with 64 cores are getting underway, and
>      higher counts are coming.  This is the kind of thing smart phones
>      and other embedded devices might be sporting, along with other
>      low-power uses.
>      See these articles:
>      Supercomputing on the cheap with Parallella - Programming - O'Reilly Media
>      [5]http://programming.oreilly.com/2013/12/supercomputing-on-the-cheap-with-parallella.html
>      $99 Raspberry Pi-sized "supercomputer" touted in Kickstarter project | Ars Technica
>      [6]http://arstechnica.com/information-technology/2012/09/99-raspberry-pi-sized-supercomputer-touted-in-kickstarter-project/
>      "Once completed, the 64-core version of the Parallella computer
>      would deliver over 90 GFLOPS of performance and would have
>      horsepower comparable to a theoretical 45 GHz CPU [64 CPU cores x
>      700MHz] on a board the size of a credit card while consuming only 5
>      Watts under typical work loads."
>      They are offering a variety of products now:
>      New Parallella Product Offerings | Parallella
>      [7]http://www.parallella.org/2014/07/14/new-parallella-product-offerings/
>      I don't know a lot about parallel programming, but was interested to
>      read the Epiphany Architecture Reference
>      [8]http://adapteva.com/docs/epiphany_arch_ref.pdf
>      which makes it sound pretty flexible and suitable for learning about
>      different approaches as well.  Here are some quotes:
>      The Epiphany architecture defines a multicore, scalable,
>      shared-memory, parallel computing fabric. It consists of a 2D array
>      of compute nodes connected by a low-latency mesh network-on-chip.
>      * A superscalar, floating-point RISC CPU in each mesh node that
>      can execute two floating point operations and a 64-bit memory load
>      operation on every clock cycle.
>      * Local memory in each mesh node that provides 32 Bytes/cycle of
>      sustained bandwidth and is part of a distributed, shared memory
>      system.
>      * Multicore communication infrastructure in each node that
>      includes a network interface, a multi-channel DMA engine, multicore
>      address decoder, and network monitor.
>      * A 2D mesh network that supports on-chip node-to-node
>      communication latencies in nanoseconds, with zero startup overhead.
>      The Epiphany architecture is programming-model neutral and
>      compatible with most popular parallel-programming methods, including
>      Single Instruction Multiple Data (SIMD), Single Program Multiple
>      Data (SPMD), Host-Slave programming, Multiple Instruction Multiple
>      Data (MIMD), static and dynamic dataflow, systolic array,
>      shared-memory multithreading, message-passing, and communicating
>      sequential processes (CSP).
>      ...
>      The eMesh supports efficient broadcasting of data to multiple cores
>      through a special "multicast" routing mode.  ... In multicast mode,
>      the normal eMesh routing algorithm described in 5.2 is overridden
>      and the transaction is instead routed radially outwards from the
>      transmitting node.
>      So I'd love to get some informed perspectives on it vs the other
>      low-power options out there.
>      Cheers,
>      Neal McBurnett                 [9]http://neal.mcburnett.org/
>      _______________________________________________
>      Web Page: [10]http://lug.boulder.co.us
>      Mailing List:
>      [11]http://lists.lug.boulder.co.us/mailman/listinfo/lug
>      Join us on IRC: [12]irc.hackingsociety.org port=6667
>      channel=#hackingsociety
> 
> References
> 
>    1. http://perilsofparallel.blogspot.com/2009/02/what-multicore-really-means-and.html
>    2. http://perilsofparallel.blogspot.com/2009/09/of-muffins-and-megahertz.html
>    3. http://perilsofparallel.blogspot.com/2010/02/parallel-powerpoint-why-power-law-is.html
>    4. mailto:neal at bcn.boulder.co.us
>    5. http://programming.oreilly.com/2013/12/supercomputing-on-the-cheap-with-parallella.html
>    6. http://arstechnica.com/information-technology/2012/09/99-raspberry-pi-sized-supercomputer-touted-in-kickstarter-project/
>    7. http://www.parallella.org/2014/07/14/new-parallella-product-offerings/
>    8. http://adapteva.com/docs/epiphany_arch_ref.pdf
>    9. http://neal.mcburnett.org/
>   10. http://lug.boulder.co.us/
>   11. http://lists.lug.boulder.co.us/mailman/listinfo/lug
>   12. http://irc.hackingsociety.org/



