[lug] Software patents

Lee Woodworth blug-mail at duboulder.com
Tue Jun 2 12:33:22 MDT 2009


Zan Lynx wrote:

>> Nate Duehr wrote:
>>> I'm not even going to guess why Java took off... it's about the most  
>>> bloated crap I've ever seen.  Marketing money I guess.  It doesn't do  
>>> anything that a good developer library and some discipline in code  
>>> writing couldn't accomplish ten years earlier.  Its appeal still  
>>> baffles me.
.....
> 
> He's got a point.  There's nothing Java does that cannot be done in 
> native code or in a virtual machine language like LLVM.

Except the implementation time and total error count won't be the same.
The more lines of code, the more opportunities for error, and assembly
will take more lines of code than higher-level languages for any
decent-sized application.

Anyone want to claim they could write a high-performance, full-featured
DBMS in assembly in even the same amount of time it would take with,
say, C++ (you have to include all your macro setup and tool-building
time)?

The increasing complexity of processors is diminishing the cases
where hand-coded assembly can have much higher performance than
good optimizer-produced code. Consider accounting for the increasing
number of functional units to keep fed, three-level caches, and
multiple cores. It won't take much of this before you spend more time
thinking about issues unrelated to the function being implemented
than thinking about the function's behavior. Embedded systems using
less complicated processors have less mental overhead for now, but
as they get more complicated it will become harder to beat an optimizer.
Go look up branch prediction and speculative execution and consider how
they affect the optimal ordering of machine instructions. Also consider
run-time profiling and dynamic code generation: there are systems that
take profiling information from prior runs and adjust the generated
code. One piece of information used is which way a branch usually
goes, which helps optimize how branches are laid out.
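
To make that concrete, here is a minimal sketch using GCC's
__builtin_expect (a real GCC builtin; everything else in the snippet
is my own illustration). It hand-feeds the compiler the same fact a
profile-guided system would gather automatically, namely which way a
branch usually goes:

    #include <cstdio>

    // GCC's __builtin_expect tells the optimizer which branch outcome
    // is likely, so it can lay out the hot path as the fall-through.
    #define LIKELY(x)   __builtin_expect(!!(x), 1)
    #define UNLIKELY(x) __builtin_expect(!!(x), 0)

    int process(int value)
    {
        if (UNLIKELY(value < 0)) {
            // Rare error path: the compiler can move this off the
            // hot path entirely.
            return -1;
        }
        // The common case falls straight through.
        return value * 2;
    }

    int main()
    {
        printf("%d\n", process(21));   // prints 42
        return 0;
    }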

As far as LLVM being a magic bullet goes, if the particular LLVM
machine model isn't very close to the actual target hardware, then its
VM implementation has similar issues to address as the others, such as
Java (except perhaps GC).

Memory management is more complicated and error prone than people
seem to think. I spent two days tracking down a bug due to a double
free in a medium-sized C application. A consultant had freed the
same memory in two different places, and the problem showed up
in my code: the contents of a structure magically changed on
me because the memory allocator reused the memory. It would be even
harder to track down in a multi-threaded app. If this weren't a
significant problem, people wouldn't have paid thousands
of dollars for the Purify memory analysis tool.
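
Here is a minimal sketch of that failure mode (the Record type and
values are made up; the behavior is undefined, and exactly what you
see depends on the allocator):

    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    struct Record { char name[16]; int balance; };

    int main()
    {
        Record* mine = (Record*)malloc(sizeof(Record));
        strcpy(mine->name, "alice");
        mine->balance = 100;

        // Some other module believes it owns this memory and frees it.
        free(mine);

        // The allocator may now hand that block to an unrelated caller...
        char* other = (char*)malloc(sizeof(Record));
        memset(other, 0xFF, sizeof(Record));

        // ...so the structure's contents "magically" change underneath
        // us. Undefined behavior: the output is allocator-dependent.
        printf("balance = %d\n", mine->balance);
        return 0;
    }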

There is mental overhead, run-time overhead, and memory-efficiency
overhead in manual memory management. In sophisticated data structures
with significant linking, one either has to manage ownership and
lifetime to decide when to free things, or one starts to implement
a manual GC, say with reference counting. As long as there aren't any
circular references, reference counting won't orphan memory; otherwise
it's back to careful tracking of ownership (which I found was often not
documented as part of a function's interface).
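
A minimal reference-counting sketch, with names of my own invention,
showing how a cycle orphans memory:

    #include <cstdlib>

    // Hand-rolled reference counting: release() frees a node only
    // when its count drops to zero.
    struct Node {
        int   refs;
        Node* next;
    };

    Node* acquire(Node* n) { if (n) n->refs++; return n; }

    void release(Node* n)
    {
        if (n && --n->refs == 0) {
            release(n->next);   // drop our reference to the neighbor
            free(n);
        }
    }

    Node* make_node()
    {
        Node* n = (Node*)malloc(sizeof(Node));
        n->refs = 1;            // the caller's reference
        n->next = 0;
        return n;
    }

    int main()
    {
        Node* a = make_node();
        Node* b = make_node();
        a->next = acquire(b);   // a -> b
        b->next = acquire(a);   // b -> a: a circular reference

        release(a);             // a's count: 2 -> 1, never reaches 0
        release(b);             // b's count: 2 -> 1, never reaches 0
        // Both nodes are now unreachable but never freed: orphaned.
        return 0;
    }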

The main problems with C for memory management are stack allocation,
where pointers to stack objects can outlive the function that passed
them along; not freeing memory; freeing memory more than once; and the
pinned nature of memory from malloc and friends, which prevents the
memory manager from moving things to make contiguous space for a
really big allocation.
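
A deliberately broken sketch of that stack-lifetime problem (names
are mine):

    #include <cstdio>

    int* saved = 0;             // some module squirrels the pointer away

    void callee(int* p) { saved = p; }

    void caller()
    {
        int local = 42;         // stack allocation
        callee(&local);         // the pointer escapes the frame
    }                           // 'local' dies here; 'saved' now dangles

    int main()
    {
        caller();
        // Undefined behavior: 'saved' points into a dead stack frame
        // that any later call may overwrite.
        printf("%d\n", *saved);
        return 0;
    }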

It's even more fun if you pass pointers to the middle of a structure
to functions that squirrel them away. All of a sudden, reference
counting gets nasty. The called function has no idea the pointer
it was given is part of a larger structure, and that's where the
reference count should be incremented. In C++ one could use genericized
smart pointers that actually contain the reference count's memory address
and the data structure's address. That's more code to maintain, pointer
creation becomes more verbose, and it assumes that circular references
aren't a problem.
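
A minimal sketch of such a genericized smart pointer (CountedPtr is a
hypothetical name of mine, not a standard class; a production version
would also need assignment, arrays, and thread safety):

    #include <cstdio>

    // Separates "where the reference count lives" from "what this
    // pointer actually points at", so a pointer into the middle of a
    // structure still bumps the enclosing object's count.
    template <class T>
    struct CountedPtr {
        long* count;    // refcount of the enclosing allocation
        T*    ptr;      // may point into the middle of it

        CountedPtr(long* c, T* p) : count(c), ptr(p) { ++*count; }
        CountedPtr(const CountedPtr& o) : count(o.count), ptr(o.ptr)
            { ++*count; }
        ~CountedPtr()
        {
            if (--*count == 0)
                printf("last reference gone; owner can be freed\n");
        }
        T& operator*() const { return *ptr; }
    private:
        CountedPtr& operator=(const CountedPtr&); // omitted in sketch
    };

    struct Big {
        long count;
        int  header;
        int  payload;   // the callee only cares about this field
        Big() : count(0), header(1), payload(42) {}
    };

    void squirrel_away(CountedPtr<int> p)   // copy bumps the count
    {
        printf("payload = %d\n", *p);
    }                                       // copy's destructor drops it

    int main()
    {
        Big b;
        // Interior pointer to b.payload; the count tracks all of b.
        CountedPtr<int> p(&b.count, &b.payload);
        squirrel_away(p);
        return 0;
    }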

Issues that arise with C-style memory managers are fragmentation
and cache locality. External fragmentation occurs as the size of
the free memory areas decreases over many malloc/free cycles.
Internal fragmentation is the left-over space not used in a structure
that is mapped onto the memory returned by malloc. General-purpose
allocators want to work with a small number of allocation-unit sizes;
otherwise the external fragmentation gets really bad. So when you ask
for 18 bytes for a structure, you may well have actually consumed 32
bytes. Do a literature search on memory allocators for C libraries;
you'll find many kinds.
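
As a sketch, here is the kind of size-class rounding many allocators
perform (these particular power-of-two classes are illustrative, not
any specific allocator's):

    #include <cstdio>
    #include <cstddef>

    // Round each request up to a small set of size classes: fewer
    // distinct sizes means less external fragmentation, at the cost
    // of internal fragmentation inside each block.
    size_t size_class(size_t request)
    {
        size_t cls = 16;                 // smallest class
        while (cls < request) cls *= 2;  // 16, 32, 64, 128, ...
        return cls;
    }

    int main()
    {
        // Ask for 18 bytes, actually consume 32.
        printf("18 -> %lu\n", (unsigned long)size_class(18));
        // Ask for 40 bytes, actually consume 64.
        printf("40 -> %lu\n", (unsigned long)size_class(40));
        return 0;
    }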

So you say: I know my memory use, I'll just write my own application-
level memory manager. If you have simple data structures, this can
be efficient, but as soon as you start having complicated structures
with pointers it gets messy. Suppose you create a set of heaps sized
exactly for the structures in the application (e.g. a linked list of
1000x18-byte chunks plus linking overhead). How big should those heaps
be? Do you free parts of them after a temporary spike in usage? If so,
how are you going to know that a region of memory isn't being used?
What about the fragmentation that freeing might create (e.g. free one
1000x18 chunk, but then grab a 1000x12 chunk from that freed area)?
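
A minimal fixed-size pool of the kind described above (my own sketch;
note how even here alignment padding turns the 18-byte slot into 24
bytes on a 64-bit machine):

    #include <cstdio>
    #include <cstdlib>
    #include <cstddef>

    // One malloc'd slab carved into equal slots threaded onto a free
    // list. Alloc and free are O(1) for this one size, but the slab
    // cannot shrink after a usage spike unless every slot in it
    // happens to be free: exactly the problem described above.
    struct Pool {
        char*  slab;
        void*  free_list;
        size_t slot_size;

        Pool(size_t slot, size_t count)
        {
            // Pad each slot for pointer alignment: an 18-byte request
            // becomes 24 bytes on a 64-bit machine (more internal
            // fragmentation).
            slot_size = (slot + sizeof(void*) - 1)
                        / sizeof(void*) * sizeof(void*);
            slab = (char*)malloc(slot_size * count);
            free_list = 0;
            for (size_t i = 0; i < count; ++i) {  // thread the free list
                void* p = slab + i * slot_size;
                *(void**)p = free_list;
                free_list = p;
            }
        }
        ~Pool() { free(slab); }

        void* alloc()
        {
            if (!free_list) return 0;   // pool exhausted: grow or fail
            void* p = free_list;
            free_list = *(void**)p;
            return p;
        }
        void release(void* p)
        {
            *(void**)p = free_list;     // O(1), no size lookup needed
            free_list = p;
        }
    };

    int main()
    {
        Pool pool(18, 1000);            // the 1000x18-byte heap from above
        void* a = pool.alloc();
        void* b = pool.alloc();
        pool.release(a);
        pool.release(b);
        printf("allocated and released two 18-byte records\n");
        return 0;
    }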

GC does have some efficiency advantages compared to C-style allocation.
Fragmentation isn't an issue with copying collectors because the memory
gets compacted periodically, which can also help cache locality. It
used to be that C's stack allocate-and-free was a major performance win
compared to GC, but modern GCs have been tuned to handle lots of
alloc-quickly-free memory usage.
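
Part of why compaction pays off: with all free space in one contiguous
run, allocation degenerates to a pointer bump. A sketch of the idea
(not any particular collector; a real one would also track roots and
actually copy live objects):

    #include <cstdio>
    #include <cstdlib>
    #include <cstddef>

    // In a compacted heap, allocating is just advancing a pointer:
    // no free lists, no size classes. This is why alloc-quickly-free
    // patterns are cheap under modern GCs.
    struct BumpHeap {
        char* base;
        char* top;
        char* end;

        BumpHeap(size_t bytes)
        {
            base = top = (char*)malloc(bytes);
            end = base + bytes;
        }
        ~BumpHeap() { free(base); }

        void* alloc(size_t n)
        {
            n = (n + 7) & ~(size_t)7;      // 8-byte alignment
            if (top + n > end) return 0;   // would trigger a collection
            void* p = top;
            top += n;
            return p;
        }
    };

    int main()
    {
        BumpHeap heap(1 << 20);
        void* a = heap.alloc(18);          // one compare and one add each
        void* b = heap.alloc(100);
        printf("a=%p b=%p\n", a, b);
        return 0;
    }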

The perceived run-time difference between C memory management and
GC-based management is a pause when the GC needs to do a large
collection run. With the C style, you pay the cost of memory management
with every malloc/free call, amortizing the overhead in smaller chunks.
GC has gotten better, but the pause is still an occasional occurrence.
For example, Eclipse infrequently pauses on me, maybe once a day.

On a side note, an interesting experiment is being done with Java in the
embedded space. A person I talked to at BJUG is looking at a CPU+FPGA
combination for running the Java environment. The idea is for the JVM's
just-in-time compiler to compile some ops directly to the FPGA and run
them concurrently with the rest of the natively compiled code.


