[lug] CU colloquium at 3:30 on Parallel, adaptive, implicit computational fluid dynamics Re: Parallella: low-power, cheap, flexible board with 16, 64, ... epiphany cores

Neal McBurnett neal at bcn.boulder.co.us
Thu Oct 2 13:52:48 MDT 2014


Those interested in parallelism might enjoy this colloquium this afternoon.  Talks in this series are typically also made available afterwards in video form online.

Neal McBurnett                 http://neal.mcburnett.org/

----- Forwarded message from Emily Adams <EmilyAdams at colorado.edu> -----

Date: Wed, 1 Oct 2014 14:46:11 -0600
To: "cs-colloquia at lists.colorado.edu" <cs-colloquia at lists.Colorado.EDU>
Subject: CS Colloquium REMINDER: 3:30pm Thursday, Oct. 2, in ECCR 265 - Kenneth Jansen (CU-Boulder)

Computational Science Challenges in Extreme Scale CFD

WHO: Kenneth Jansen, Professor, CU-Boulder Department of Aerospace Engineering Sciences
WHEN: 3:30-4:30 p.m. Thursday, Oct. 2
WHERE: ECCR 265

Free and open to all! Light refreshments will be served. 

ABSTRACT: Parallel, adaptive, implicit computational fluid dynamics solvers have recently been shown to scale to the full machine at several of the largest computer facilities (e.g., 92 billion unstructured finite elements on over 3 million processes). To accomplish this, a number of computational science challenges had to be overcome, including load balancing, parallel I/O, and implicit equation solution. These challenges, and new ones such as extreme-scale data analytics, will be discussed within the context of real-world applications stemming from aerodynamic flow control.

BIO: Kenneth Jansen joined the faculty of Aerospace Engineering Sciences in January 2010 after 13.5 years at Rensselaer Polytechnic Institute, where he held appointments in Mechanical, Aerospace and Nuclear Engineering (home) and Computer Science (joint).  Prior to this, he was a post-doctoral fellow at the Center for Turbulence Research (NASA/Stanford University). He received his Ph.D. in Mechanical Engineering, Division of Applied Mechanics, from Stanford University in 1993. His research balances the computational science challenges with the physical modeling challenges of massively parallel/adaptive computational fluid dynamics and has been funded by NSF (CISE), DOE (SciDAC), DOD and several companies. Applications span diverse fields of aerodynamics, cardiovascular flow and multiphase flow.

Hosted by James Martin. 

______________________________________________________________________________________

Subscribe/unsubscribe to CS Colloquia announcements: www.colorado.edu/cs/colloquia/colloquia-mailing-list
Watch past CS Colloquia sessions: http://vimeo.com/channels/cucs
Follow the CS Department on Facebook: http://facebook.com/CUBoulderCompSci

2014-2015 Colloquium Communications: Emily Adams, emilyadams at colorado.edu
2014-2015 Colloquium Logistics: Emily Komendera, emily.komendera at colorado.edu
2014-2015 Colloquium Coordinator: Bor-Yuh Evan Chang, www.cs.colorado.edu/~bec/

----- End forwarded message -----


On Wed, Jul 16, 2014 at 08:02:45PM -0600, Neal McBurnett wrote:
> Thanks to Steve Rogers and Quentin Hartman for the follow-up links.
> 
> Thanks, Davide, for the links and insights.
> 
> Re: http://perilsofparallel.blogspot.com/2009/02/what-multicore-really-means-and.html
> 
> That guy does indeed do a "litany" on multi-core.  But I did enjoy his funny and useful comparisons, essentially about why a fleet of 16 cars going 60 mph each is not the same as one car going 960 mph.
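>
> To make that concrete, here is a toy illustration (my own, not from the
> post) of the difference between throughput, which parallelism buys you,
> and latency, which it does not:
>
>     #include <stdio.h>
>
>     int main(void) {
>         double car_mph = 60.0;  /* speed of each car (one core) */
>         int    fleet   = 16;    /* number of cars (cores) */
>         double trip    = 60.0;  /* miles in one trip (one task) */
>
>         /* Throughput scales with the fleet: 960 car-miles per hour. */
>         printf("fleet throughput: %.0f car-miles/hour\n", car_mph * fleet);
>
>         /* Latency does not: one trip still takes an hour, no matter
>            how many cars are in the fleet. */
>         printf("single-trip latency: %.1f hours\n", trip / car_mph);
>         return 0;
>     }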
> 
> You can tell why he was motivated to write up the litany from his follow-up post, which you also linked to - he was mad about a misleading business presentation from folks who ignored his input.  A frequent motivator :)
>   http://perilsofparallel.blogspot.com/2009/09/of-muffins-and-megahertz.html
> 
> I did indeed include some of their glowing marketing numbers in my post, and I thank you all for the informative responses.  But to clarify my motivation: from my perspective, parallel computing in one form or another is a big part of the future, and it is very cool to have cheap general-purpose hardware to play with and explore the possibilities.  I get the impression that the Parallella supports a number of parallel processing styles, and the cores are general-purpose enough that I can program them without re-learning a lot, so it looks fun and educational.
> 
> For me, the most interesting part of the 'what multicore really means' post was the equally opinionated and informative rant in the comments by "anonymous"
> 
>   http://perilsofparallel.blogspot.com/2009/02/what-multicore-really-means-and.html?showComment=1238239200000#c7290536757386895922
> 
> who makes some interesting points (along with the venting) about the utility of multiple cores for gaming and exploring other new ways to use computing power:
> 
>  "....there is one application that seems core hungry beyond even 4 cores- namely the only type of application that uses the full power of most people's systems, computer games"
> 
>  "In the REAL world, computer games drive [[a great deal of PC hardware development]]"
> 
>  'If the argument is that multi-core is a problem, because non-game apps rarely benefit, people in the real-world say "what are these non-game apps that need more power anyway?". As I have said above, such apps either don't exist, or rapidly find themselves moved to hardware (for example, dual-core chips were initially mostly useful for decoding hi-definition, Bluray like content, but now a small segment of the GPU on even ATI's and Nvidia's cheapest product decodes hi-def video with no CPU hit, and using far less power than a CPU solution).'...
> 
> So that all brings me back to wanting to know more about the many different styles of parallel programming, how to explore them with this new cheap prototyping platform, wondering how the Epiphany chip might fit in, and what sorts of novel things people might use it for on embedded devices, phones, etc.
> 
>  "The problem with multi-core in the real world ... is that as soon as a software algorithm is developed that takes advantage of current PC CPU designs (like 3d-rasterising, or video-decoding), the algorithm is quickly baked into a standalone ASIC, providing a much cheaper and far more power efficient solution...."
> 
> Cheers,
> 
> Neal McBurnett                 http://neal.mcburnett.org/
> 
> From: Steve Rogers <shr066 at gmail.com>
> To: "Boulder (Colorado) Linux Users Group -- General Mailing List" <lug at lug.boulder.co.us>
> Subject: Re: [lug] Parallella: low-power, cheap, flexible board with 16, 64, ... epiphany cores
> 
>    IIRC Epiphany has 32 KB of on-chip memory per processor, for both code
>    and data.  Effectively utilizing Parallella's capabilities looks like
>    an interesting challenge, as one may need to distribute the solution
>    across the Epiphany, the ARM cores, and the FPGA depending upon the
>    problem being solved.
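>
>    To make that division of labor concrete, here is a minimal host-side
>    sketch of the usual pattern (the ARM host loads a program onto the
>    Epiphany grid and reads results back). It assumes the eSDK's e-hal
>    API (e_init, e_open, e_load_group, e_read); the names come from the
>    published docs, but treat the exact signatures, and the file name
>    e_task.elf, as assumptions:
>
>        #include <stdio.h>
>        #include <e-hal.h>  /* Epiphany host-side library from the eSDK */
>
>        int main(void) {
>            e_platform_t platform;
>            e_epiphany_t dev;
>            int result = 0;
>
>            e_init(NULL);         /* bring up the e-hal */
>            e_reset_system();     /* reset the Epiphany chip */
>            e_get_platform_info(&platform);
>
>            /* Open a workgroup spanning the whole grid (4x4 on the
>               16-core board) and start the same ELF on every core. */
>            e_open(&dev, 0, 0, platform.rows, platform.cols);
>            e_load_group("e_task.elf", &dev, 0, 0,
>                         platform.rows, platform.cols, E_TRUE);
>
>            /* ... wait for completion, then read a result from core
>               (0,0)'s 32 KB local SRAM; 0x2000 is an arbitrary
>               example offset. */
>            e_read(&dev, 0, 0, 0x2000, &result, sizeof(result));
>            printf("core (0,0) returned %d\n", result);
>
>            e_close(&dev);
>            e_finalize();
>            return 0;
>        }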
> 
> 
>    All the specs you could possibly want on the Parallella board are linked
>    from here:
>    [1]http://www.parallella.org/board/
>    Epiphany chip information is detailed here:
>    [2]http://www.adapteva.com/epiphanyiii/
>    Including a detailed datasheet that I expect will answer all
>    your questions.
>    QH
> 
> 
> On Mon, Jul 14, 2014 at 10:08:02PM -0600, Davide Del Vento wrote:
> >    Hi Neal,
> >    Thanks for sharing the link to the PDF (though it's still TL;DR for me).
> >    I've seen this board in the past and I'd be interested in exploring it.
> >    However, I have yet to see a quick datasheet (say, one or two pages) with
> >    the "hard specs" backing these claims. When they say "local
> >    memory in each mesh node that provides 32 Bytes/cycle of sustained
> >    bandwidth", that is simply too good to be true. Plus, it's missing the
> >    important latency numbers. Plus, it's missing the bandwidth and latency
> >    numbers for the rest of the memory hierarchy.
> >    I'm not saying they are lying, but their marketing is probably
> >    misleading. At that speed it cannot be "memory". It must be "registers"
> >    or at the very least a fast cache. How much of that superfast "memory" is
> >    there per node? What is the overhead of cache coherence (if
> >    there is such a thing)? If there isn't cache coherence, what is the
> >    time to access memory on *some other* node?
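> >
> >    To put a number on the bandwidth claim (assuming the 700 MHz clock
> >    quoted later in this thread): 32 bytes/cycle is about 22.4 GB/s per
> >    core, far beyond any DRAM path, which is exactly why it smells like
> >    registers or tightly coupled SRAM rather than "memory":
> >
> >        #include <stdio.h>
> >
> >        int main(void) {
> >            double clock_hz        = 700e6; /* per-core clock */
> >            double bytes_per_cycle = 32.0;  /* claimed local bandwidth */
> >
> >            /* Sustained per-core bandwidth implied by the claim. */
> >            printf("implied bandwidth: %.1f GB/s per core\n",
> >                   clock_hz * bytes_per_cycle / 1e9);
> >            return 0;
> >        }
> >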
> >    And I haven't even started on the multi-core litany. I'll let this guy
> >    do it:
> >    [1]http://perilsofparallel.blogspot.com/2009/02/what-multicore-really-means-and.html
> >    [2]http://perilsofparallel.blogspot.com/2009/09/of-muffins-and-megahertz.html
> >    [3]http://perilsofparallel.blogspot.com/2010/02/parallel-powerpoint-why-power-law-is.html
> >    Don't get me wrong. Parallel Computing is my bread and butter and I
> >    love it. But I question (like this guy does, he's from the same
> >    industry as I am) its usefulness for the client. It's great on the
> >    server (and especially on the scientific computing server!), so the
> >    talk Alessandro will give will be interesting to our group. But on the
> >    phone? Ask about "muffins per hour" next time you buy a car!
> >    Cheers,
> >    Davide
> > 
> >    On Mon, Jul 14, 2014 at 12:22 PM, Neal McBurnett
> >    <[4]neal at bcn.boulder.co.us> wrote:
> > 
> >      I forget the name of next month's speaker on parallel processing
> >      approaches, but the chip I was talking about is the Epiphany, from
> >      Adapteva.  I got one of their Parallella boards in their
> >      Kickstarter campaign: a low-power, cheap, flexible board with 16
> >      Epiphany cores.  Versions with 64 cores are getting underway, and
> >      higher counts are coming.  This is the kind of thing smart phones
> >      and other embedded devices might be sporting, along with other
> >      low-power uses.
> >      See these articles:
> >      Supercomputing on the cheap with Parallella - Programming - O'Reilly Media
> >      [5]http://programming.oreilly.com/2013/12/supercomputing-on-the-cheap-with-parallella.html
> >      $99 Raspberry Pi-sized "supercomputer" touted in Kickstarter project | Ars Technica
> >      [6]http://arstechnica.com/information-technology/2012/09/99-raspberry-pi-sized-supercomputer-touted-in-kickstarter-project/
> >      "Once completed, the 64-core version of the Parallella computer
> >      would deliver over 90 GFLOPS of performance and would have the
> >      horsepower comparable to a theoretical 45 GHz CPU [64 CPU cores x
> >      700MHz] on a board the size of a credit card while consuming only 5
> >      Watts under typical workloads."
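> >
> >      Those headline numbers are at least internally consistent with the
> >      architecture reference quoted below (two floating point operations
> >      per core per clock); a quick check (my arithmetic, not the
> >      article's) gives 44.8 GHz aggregate and 89.6 peak GFLOPS, just shy
> >      of the quoted figures:
> >
> >          #include <stdio.h>
> >
> >          int main(void) {
> >              int    cores     = 64;
> >              double clock_ghz = 0.7; /* 700 MHz per core */
> >              int    flops     = 2;   /* FLOPs/cycle, per the arch ref */
> >
> >              /* "45 GHz CPU": clock summed across all cores. */
> >              printf("aggregate clock: %.1f GHz\n", cores * clock_ghz);
> >
> >              /* "over 90 GFLOPS": peak if every core sustains 2 FLOPs
> >                 per cycle. */
> >              printf("peak: %.1f GFLOPS\n", cores * clock_ghz * flops);
> >              return 0;
> >          }
> >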
> >      They are offering a variety of products now:
> >      New Parallella Product Offerings | Parallella
> >      [7]http://www.parallella.org/2014/07/14/new-parallella-product-offerings/
> >      I don't know a lot about parallel programming, but was interested to
> >      read the Epiphany Architecture Reference
> >      [8]http://adapteva.com/docs/epiphany_arch_ref.pdf
> >      which makes it sound pretty flexible and suitable for learning about
> >      different approaches also.  Here are some quotes:
> >      The Epiphany architecture defines a multicore, scalable,
> >      shared-memory, parallel computing fabric. It consists of a 2D array
> >      of compute nodes connected by a low-latency mesh network-on-chip.
> >      * A superscalar, floating-point RISC CPU in each mesh node that
> >        can execute two floating point operations and a 64-bit memory load
> >        operation on every clock cycle.
> >      * Local memory in each mesh node that provides 32 Bytes/cycle of
> >        sustained bandwidth and is part of a distributed, shared memory
> >        system.
> >      * Multicore communication infrastructure in each node that
> >        includes a network interface, a multi-channel DMA engine, multicore
> >        address decoder, and network-monitor.
> >      * A 2D mesh network that supports on-chip node-to-node
> >        communication latencies in nanoseconds, with zero startup overhead.
> >      The Epiphany architecture is programming-model neutral and
> >      compatible with most popular parallel-programming methods, including
> >      Single Instruction Multiple Data (SIMD), Single Program Multiple
> >      Data (SPMD), Host-Slave programming, Multiple Instruction Multiple
> >      Data (MIMD), static and dynamic dataflow, systolic array,
> >      shared-memory multithreading, message-passing, and communicating
> >      sequential processes (CSP)
> >      ...
> >      The eMesh supports efficient broadcasting of data to multiple cores
> >      through a special "multicast" routing mode.  ... In multicast mode,
> >      the normal eMesh routing algorithm described in 5.2 is overridden
> >      and the transaction is instead routed radially outwards from the
> >      transmitting node
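> >
> >      To give a flavor of the SPMD style mentioned above, here is a
> >      generic sketch (plain C, not the actual eSDK device API; the way a
> >      core learns its mesh coordinates is abstracted into the my_row and
> >      my_col parameters): every core runs the same program and picks its
> >      slice of the work from its ID.
> >
> >          #include <stdio.h>
> >
> >          #define N    1024  /* total elements across the whole grid */
> >          #define ROWS 4     /* 4x4 mesh on the 16-core Epiphany */
> >          #define COLS 4
> >
> >          /* Hypothetical entry point: on real hardware each core would
> >             run this once with its own coordinates; here we simulate
> >             all 16 cores in a loop. */
> >          static void spmd_kernel(int my_row, int my_col,
> >                                  const float *in, float *out) {
> >              int id    = my_row * COLS + my_col; /* core ID, 0..15 */
> >              int chunk = N / (ROWS * COLS);      /* this core's slice */
> >
> >              /* Same program everywhere; only the data differs. */
> >              for (int i = id * chunk; i < (id + 1) * chunk; i++)
> >                  out[i] = in[i] * 2.0f;
> >          }
> >
> >          int main(void) {
> >              static float in[N], out[N];
> >              for (int i = 0; i < N; i++) in[i] = (float)i;
> >
> >              for (int r = 0; r < ROWS; r++)      /* pretend to be */
> >                  for (int c = 0; c < COLS; c++)  /* each core in turn */
> >                      spmd_kernel(r, c, in, out);
> >
> >              printf("out[100] = %.1f\n", out[100]); /* 200.0 */
> >              return 0;
> >          }
> >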
> >      So I'd love to get some informed perspectives on it vs the other
> >      low-power options out there.
> >      Cheers,
> >      Neal McBurnett                 [9]http://neal.mcburnett.org/
> > 
> > References
> > 
> >    1. http://perilsofparallel.blogspot.com/2009/02/what-multicore-really-means-and.html
> >    2. http://perilsofparallel.blogspot.com/2009/09/of-muffins-and-megahertz.html
> >    3. http://perilsofparallel.blogspot.com/2010/02/parallel-powerpoint-why-power-law-is.html
> >    4. mailto:neal at bcn.boulder.co.us
> >    5. http://programming.oreilly.com/2013/12/supercomputing-on-the-cheap-with-parallella.html
> >    6. http://arstechnica.com/information-technology/2012/09/99-raspberry-pi-sized-supercomputer-touted-in-kickstarter-project/
> >    7. http://www.parallella.org/2014/07/14/new-parallella-product-offerings/
> >    8. http://adapteva.com/docs/epiphany_arch_ref.pdf
> >    9. http://neal.mcburnett.org/
> 
> 
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety

