[lug] shmem/mmap/HUGETLB question

Doug Pintar ratnip3 at gmail.com
Sat Oct 15 15:36:13 MDT 2011


Hi, gang,
Thanks again for being so welcoming Thursday.

I'm working on a program and having a bit of a problem. Copying long files (movies and the like, at least several hundred MB) from one drive to another takes longer than I think it should. I'm trying for a workaround, but the paging system is horking me.

Simple case:

1. Attach a segment of shared memory. (I'm using the HUGETLB kind so I get 2MB "pages" and the system can optimize I/O, but the same thing holds for regular pages.)
2. Memory-map the file to be copied, or some portion of it if it's too big. In that case you set up a multi-process or multi-thread arrangement with a ring of buffers, one process/thread filling them and the other emptying them.
3. "Look at" the first byte of each page. This causes the page fault that reads in the file data.
4. Unmap the shared memory from the input file and map it to the output file. This is where I think I get screwed: the system marks all the newly mapped pages as new, so any access to them gives you fresh pages of 0s.
5. Flush the mapped segment so it gets paged out to the output file.

My question: is there any way to keep the system from marking the pages mapped to the output file as new, so that the data already loaded into them from the input file stays around? It seems this would be a great way to copy stuff with no memory-to-memory data copying at all, and should be slicker than, as they say, owl shit.

Theoretically there's a way to turn on something called CONFIG_MMAP_ALLOW_UNINITIALIZED during kernel building that will then let you mmap with the MAP_UNINITIALIZED option set, hopefully defeating the system's attempt to give me the clean, zeroed pages I don't want. I haven't been able to find this magic configuration value, however. Any help or pointers would be appreciated.
I understand this could be a potential security hole, but if releasing the shared memory segment causes all the pages to be mapped anew the next time it's attached, it doesn't seem to be a big one, as only the original process tree will be able to see the "dirty" pages.
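
For concreteness, here is a minimal sketch of the simple case described above, using regular pages rather than HUGETLB segments so it runs without any special system setup. The function name remap_and_peek and the file paths are made up for illustration; this is not my actual program. It maps the input file, touches the first byte of each page, remaps the same address range onto the output file with MAP_FIXED, and returns the first byte visible through the window afterwards. With a freshly created output file that byte comes back as 0, which is the behavior I'm complaining about.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): map in_path, fault its pages in,
 * then remap the same address range onto out_path.
 * Returns the first byte seen after the remap, or -1 on any error. */
int remap_and_peek(const char *in_path, const char *out_path)
{
    int in_fd = open(in_path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0 || st.st_size == 0) {
        close(in_fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;

    /* Map the input file. */
    unsigned char *win = mmap(NULL, len, PROT_READ, MAP_SHARED, in_fd, 0);
    if (win == MAP_FAILED) {
        close(in_fd);
        return -1;
    }

    /* "Look at" the first byte of each page to force the page faults. */
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    volatile unsigned char sink = 0;
    for (size_t off = 0; off < len; off += (size_t)page)
        sink ^= win[off];
    (void)sink;

    /* Create the output file and size it to match the input. */
    int out_fd = open(out_path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (out_fd < 0 || ftruncate(out_fd, (off_t)len) < 0) {
        if (out_fd >= 0)
            close(out_fd);
        munmap(win, len);
        close(in_fd);
        return -1;
    }

    /* Remap the same addresses onto the output file. MAP_FIXED replaces the
     * old mapping, so the window now shows the output file's (zero) pages,
     * not the input data that was just faulted in. */
    unsigned char *again = mmap(win, len, PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_FIXED, out_fd, 0);
    if (again == MAP_FAILED) {
        munmap(win, len);
        close(out_fd);
        close(in_fd);
        return -1;
    }

    int first = again[0];

    /* Flush the mapped segment to the output file. */
    msync(again, len, MS_SYNC);
    munmap(again, len);
    close(out_fd);
    close(in_fd);
    return first;
}
```

Calling remap_and_peek("in.bin", "out.bin") on a non-empty input whose first byte is nonzero still returns 0, and out.bin ends up the right length but full of zeros.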
Thanks,
Doug Pintar (ratnip3 at gmail.com) Westminster