[lug] Processor assignment

Rob Nagler nagler at bivio.biz
Thu Mar 31 16:10:04 MDT 2016


Hi Davide,

Solved the problem: mpiexec --bind-to none

The reason we saw the problem in the container and not outside on Debian
Jessie is that the native (Jessie) OpenMPI does no binding (--bind-to
none) by default. OpenMPI 1.8, the version we are using in the Docker
container, changed the default to --bind-to core.
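To make the behavior explicit regardless of which OpenMPI the image ships,
the fix is just to pass the binding flag when the job is launched. A minimal
sketch in Python of how we drive it (flag spellings assume OpenMPI >= 1.8;
"./your_app" and the rank count are placeholders):

    import subprocess

    # Disable core binding explicitly so the container's OpenMPI (1.8+,
    # which defaults to --bind-to core) behaves like Jessie's older build.
    cmd = ["mpiexec", "--bind-to", "none", "-np", "4", "./your_app"]

    # Adding "--report-bindings" to the mpiexec options prints where each
    # rank lands, which makes the difference between the defaults visible.
    subprocess.check_call(cmd)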

> are pretty easy to install and configure. If you plan to use a
> resource manager, I'd do it sooner rather than later, since it may
> solve your issue (see below).
>

I understand. It turns out this is complicated by the fact that we
initiate jobs on demand from a web application, and SLURM and the like
do not play nicely with that model. I looked at Kubernetes, but it is
very opinionated, and well, so is MPI. :)

We defaulted to distributing jobs with Celery because it's easy enough to
manage (although Celery has some significant weaknesses wrt queue
management).
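For what it's worth, the Celery side is just a task that shells out to
mpiexec. A rough sketch, with the broker URL, task name, and application
path all made up for illustration:

    import subprocess
    from celery import Celery

    # Hypothetical broker URL; ours differs.
    app = Celery("jobs", broker="redis://localhost:6379/0")

    @app.task
    def run_mpi_job(np, app_path):
        # Launch the MPI job on whichever worker picked up the task,
        # with binding disabled as discussed above.
        cmd = ["mpiexec", "--bind-to", "none", "-np", str(np), app_path]
        subprocess.check_call(cmd)

    # Enqueued from the web application with something like:
    #     run_mpi_job.delay(4, "./your_app")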

Would you be able to connect me with one of your sysadmins? I'm curious if
they manage clusters that service web applications.



> As you know, fork creates new processes by "splitting" the one you
> fork. I am not sure about what OpenMPI does on a single node, but it
> might do what it does on multiple nodes, which is connecting "remotely"
> (e.g. by ssh, or other means) and starting "fresh" processes split off
> from the sshd daemon (instead of their "normal" parent). In that case,
> without a resource manager, process placement is left to whatever the
> OS does for sshd (and root is involved, whereas with a plain fork it
> can all stay in user space). Hence my suggestion to jump straight to
> the process manager, which certainly has dials for tuning this, e.g.
> task geometry and the like.


mpiexec forks before invoking the application. It communicates with its
children via environment variables (afaik). The ssh thing can be worked
around. SLURM sets all this up, I think, so that MPI knows how to connect
to the nodes. It's all very primitive imho, but it's the standard in our
space.
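If it helps to see the environment-variable part concretely, here is a tiny
script (call it show_env.py, say) you can run under mpiexec, e.g.
"mpiexec -np 2 python show_env.py". The OMPI_* names are OpenMPI-specific;
other MPI implementations use different variables:

    import os

    # OpenMPI exports these into each forked rank's environment so the
    # processes can find their rank and the size of the job.
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "not set")
    size = os.environ.get("OMPI_COMM_WORLD_SIZE", "not set")
    print("rank %s of %s, pid %d" % (rank, size, os.getpid()))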

Rob