[lug] [SOLVED...and a GREAT solution!] Re: sed, tr...escaping non-printable characters for display

stimits at comcast.net stimits at comcast.net
Sun Feb 22 10:41:52 MST 2015


Hi,
 
Thank you for this. This is so simple and so effective with so little effort it astonishes me how much time I spent researching this...and someone here immediately knows this little arcane magic! This has helped so much, thank you again!
 
----- Original Message -----
From: Simos <blug at chinesetearoom.com>
To: Boulder (Colorado) Linux Users Group -- General Mailing List <lug at lug.boulder.co.us>
Sent: Sun, 22 Feb 2015 00:15:32 -0000 (UTC)
Subject: Re: [lug] sed, tr...escaping non-printable characters for display
 
Hi,
 
man cat
 
-v, --show-nonprinting use ^ and M- notation, except for LFD and TAB
 
For example:
 
$ ls f*
f?i?l?e

$ ls f* | cat -v
f^Fi^Bl^Ee
 
HTH,
 
Simos
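
For anyone who wants to reproduce the example above, here is a minimal sketch. The scratch directory and the variable names (demo_dir, shown) are made up for illustration; the file name is the one from the example, built with octal escapes for Ctrl-F, Ctrl-B and Ctrl-E.

```shell
# Hypothetical scratch directory holding one file whose name contains
# the control characters Ctrl-F (006), Ctrl-B (002) and Ctrl-E (005).
demo_dir=$(mktemp -d)
touch "$demo_dir/$(printf 'f\006i\002l\005e')"

# When its output goes to a pipe, ls writes the raw bytes of the name;
# cat -v then renders each control character in hat notation.
shown=$(cd "$demo_dir" && ls f* | cat -v)
printf '%s\n' "$shown"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

This should print f^Fi^Bl^Ee, matching the output quoted above.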

> Hi,
>
> The more I need to deal with comparing two file systems using only bash and core utilities, the more I miss languages like C/C++. In particular, it is very rare that a non-printable control character is included in a file name or directory name, yet Linux itself is quite good at working with almost any embedded control character sequence possible. A newline, linefeed, tab...all of these can be embedded in file names or directories. I've encoded these names so working with them isn't too bad, but if I can't display the results when done the purpose of the script is lost. So to get around this, I'm trying to substitute non-printing (mostly control) characters with either "hat" notation (e.g., '^M') or hex or octal notation (e.g., '\0x1D' or '\012').
>
> tr almost does this in a trivial way, e.g.:
> echo "${filename}" | tr '[:cntrl:]' '[A.._]'
> ...the result of the above would be to replace control characters (decimal 1 through 31) with ASCII characters between capital 'A' and underscore '_'. But this leaves out the "hat", e.g., changing a carriage return to 'M' would actually need to show as '^M' to distinguish it from printable characters. How can I use tr to convert one character into a constant hat '^' plus a printable character?
>
> Using sed almost does the job too, e.g.:
> echo "${filename}" | sed 's/\([^[:print:]]\)/?/g'
> ...this will substitute a question mark for every non-printable character. So far this seems to be the best method, but it still doesn't give meaning to the non-printable character the way hat notation or hex notation would. Having sed capable of using the matched character and transforming it into a sequence of hat+transformed printable characters would be great, but I'm at a loss as to how to do that.
>
> Yet this must be a common problem, I feel like I must be reinventing the wheel while trying to solve this. Does anyone have a suggestion on how to print these non-printable file and directory names in a meaningful way, without using a non-bash script and without using non-core utilities? sed and tr are core, gawk and perl are not. It's hard to imagine how inefficient it would be to use bash to traverse every character of every file or directory name one at a time looking for non-printable characters in some enormous loop.
>
> Thanks!
_______________________________________________
Web Page: http://lug.boulder.co.us
Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety
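
Since cat -v leaves LFD and TAB alone, names with an embedded newline or tab still need some extra handling. One possible sketch, not something proposed in the thread: it assumes a find that supports -print0 and a tr and sed that pass NUL bytes through (true of the GNU tools). Swapping NUL and newline puts each name on one line, and the embedded newlines, now NUL, come out of cat -v as ^@. The directory and variable names (demo_dir, tab, listing) are hypothetical.

```shell
# Hypothetical scratch directory with one awkward name: a<newline>b<tab>c
demo_dir=$(mktemp -d)
touch "$demo_dir/$(printf 'a\nb\tc')"

# A literal tab character, for use in the sed pattern below.
tab=$(printf '\t')

# find -print0 terminates each name with NUL. Swapping NUL and newline
# gives one name per line; embedded newlines are now NUL bytes, which
# cat -v prints as ^@. sed rewrites tabs as ^I, since cat -v skips them.
listing=$(cd "$demo_dir" && find . -type f -print0 |
    tr '\0\n' '\n\0' |
    sed "s/$tab/^I/g" |
    cat -v)
printf '%s\n' "$listing"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

Here ^@ stands in for an embedded newline (a file name can never contain a real NUL), so the line should read ./a^@b^Ic. As with any hat notation, a name that literally contains the characters ^@ or ^I would look the same.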