[lug] 14 Characters

Davide Del Vento davide.del.vento at gmail.com
Thu Jun 17 17:56:00 MDT 2021


Wow! And thanks!

On Thu, Jun 17, 2021 at 4:05 PM Jeffrey S. Haemer <jeffrey.haemer at gmail.com>
wrote:

> Folks,
>
> In case it's helpful, here's the POSIX Portable Filename Character Set
> (PPFCS).
> <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_281>
> POSIX was written to conform to existing practice, and I think everyone
> supports it, so it's a low and safe bar.
>
> There are two kinds of standards: standards that make up new stuff and
> standards that document consensus on what exists. The first says, "Here's
> an idea for a new feature, described in enough detail that you could
> implement it." The second says, "Here's what everyone's doing now, so if
> you write a new implementation or an extension, at least don't break this."
>
> They're both useful, just different. IETF RFCs and Python PEPs are the
> former. POSIX was intended to be the latter, though there are spots where
> folks couldn't agree and someone was able to design a compromise which
> still permitted all the different, existing solutions.
>
> An instructive example was the fight over an archive program and format:
> Berkeley's *tar(1)* vs. AT&T's *cpio(1)*. Each distro/vendor used one or
> the other and no one wanted to budge. It was "TAR wars -- A fight to the
> death between 3cpio and tar2D2."
>
> One morning, at one of the meetings, someone showed up with a utility
> called *pax(1)* which handled both formats. ("Pax." Get it?) POSIX added
> a requirement for it, every vendor agreed to supply *pax(1)* in addition
> to their previous favorite, and just moved on.
>
> In theory, a new POSIX distro could even supply only *pax(1)*, but I
> don't think anyone ever did that. I do think that, for a while, some *tar* and
> *cpio* implementations were just links to *pax* and chose how to behave
> based on *argv[0]*.
>
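
A rough sketch of the argv[0] trick, in Python for brevity. This is only an illustration, not how any real pax, tar, or cpio is written: the function name pick_personality and the PERSONALITIES table are made up for the example.

```python
import os

# Hypothetical table: the name the program was invoked under -> behavior.
PERSONALITIES = {
    "pax": "pax",
    "tar": "tar",
    "cpio": "cpio",
}


def pick_personality(argv0, default="pax"):
    """Choose a behavior from the name the program was invoked as."""
    # Only the last path component matters: /usr/bin/tar -> tar
    name = os.path.basename(argv0)
    return PERSONALITIES.get(name, default)
```

With tar and cpio installed as links to one binary, calling pick_personality(sys.argv[0]) at startup would be enough to decide which command-line syntax and archive format to use.
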
> Anyhoo, you can be POSIX-conforming and permit Unicode or blanks or
> whatever in your filenames, but you have to support the existing standard
> as a subset, so if you're porting software or programmers to a new system,
> and you stick to using the PPFCS in your filenames, you'll be safe.
>
> Behold the beauty of backwards compatibility.
>
> I think that, somewhere in the standard, POSIX also prohibits filenames
> that start with "-" (hyphen), since that could be confused with flags or an
> option, but someone else will have to hunt that one down.
>
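
For anyone who wants to check filenames mechanically, here is a minimal sketch. The character set (A-Z, a-z, 0-9, period, underscore, hyphen) is the PPFCS from the link above. The leading-hyphen rule is taken from the recollection in the paragraph above rather than from a citation, so treat it as an assumption. The name is_portable_filename is mine, not anything POSIX defines.

```python
import re

# POSIX Portable Filename Character Set:
# A-Z, a-z, 0-9, period, underscore, and hyphen-minus.
_PORTABLE = re.compile(r"[A-Za-z0-9._-]+")


def is_portable_filename(name):
    """True if name uses only PPFCS characters and has no leading hyphen."""
    # Assumption from the thread: a leading "-" is rejected because it
    # could be mistaken for a command-line option.
    if name.startswith("-"):
        return False
    # fullmatch with "+" also rejects the empty string.
    return _PORTABLE.fullmatch(name) is not None
```

So "notes_2021-06.txt" passes, while "my file.txt", "año.txt", and "-rf" do not.
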
> There's a parallel to the "standards that define vs. standards that
> document" dichotomy in natural language. *Webster's* and the *OED* document
> English. In contrast, the *Dictionnaire de l'Académie française* defines
> French.
>
> At one point, the POSIX group standardizing *sort(1)*, regexes, and globs was
> struggling with how to handle languages with odd alphabets.
>
> Everyone who'd been trying to do internationalization (I18N) -- Sun, IBM,
> Hewlett-Packard, Unisys, ... -- had a different implementation, designed, *ab
> initio*, by some genius in their marketing department at a happy hour, or
> something.
>
> One humorously painful language was Spanish, whose alphabet began, "A, B,
> C, Ch, D, ..."
>
> When I learned Spanish, "ch" was a letter of the alphabet, written with
> two characters, which sorted between "c" and "d". "Chupa" needed to sort
> *after* "coño" and "culo."
>
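
To see why this was painful for sort(1), here is a small sketch of a sort key for the old alphabet. It is an assumption-laden toy, not what any vendor or POSIX locale actually did: it handles only the "ch" and "ll" digraphs, ignores accented vowels (they just sort last), and the names OLD_ALPHABET and old_spanish_key are invented for the example.

```python
# Old Spanish alphabet, with "ch" and "ll" counted as single letters.
OLD_ALPHABET = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: i for i, letter in enumerate(OLD_ALPHABET)}


def old_spanish_key(word):
    """Turn a word into a list of letter ranks, treating ch and ll as one letter."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("ch", "ll"):
            key.append(RANK[pair])
            i += 2
        else:
            # Characters outside the table (accents, digits) sort last.
            key.append(RANK.get(word[i], len(OLD_ALPHABET)))
            i += 1
    return key


words = ["chupa", "culo", "coño"]
print(sorted(words))                       # plain code-point order
print(sorted(words, key=old_spanish_key))  # old dictionary order
```

The first line prints ['chupa', 'coño', 'culo'] and the second prints ['coño', 'culo', 'chupa'], which is the order the older dictionaries use.
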
> (Don't believe me? Go find a Spanish dictionary older than about 1990.)
>
> The sets of magnetic refrigerator letters and wooden alphabet blocks that
> Juan and María got their children were different from the ones that Santa
> brought John and Mary's kids for Christmas. The alphabet song I'd have
> learned in Bogotá was longer than the one I did learn
> <https://www.youtube.com/watch?v=75p-N9YKqNo> in Norwalk.
>
> Right when the POSIX group had almost reached consensus, following months
> and months of discussion and negotiation, the *Real Academia Española*,
> which defines Spanish in their *Diccionario de la Lengua Española*, changed
> the alphabet. In the middle of a POSIX meeting, actually. Henceforth, "ch,"
> "ll," and "rr" would each be two letters, not one.
>
> *Ñ* got to stay a letter.
>
> Changing the alphabet made it, the *Real Academia* explained, easier to
> use computers.
> Hahahahaha ...
>
> So: change the alphabet, issue a new dictionary, and declare everything
> that used the old alphabet officially wrong. Puts "let's migrate all my
> Python 2 scripts to Python 3" in perspective, I reckon.
>
>
> On Thu, Jun 17, 2021 at 11:18 AM Rob Nagler <nagler at bivio.biz> wrote:
>
>> Bucky Carr writes:
>> > I have no idea how someone would use 65535 rows much less 1 million.
>>
>> You are correct, 2^20 is the limit. I still wonder why there is a limit,
>> especially for modern computers.
>>
>> I spend a lot of my time helping people automate what they do in Excel so
>> I see some pretty interesting and large spreadsheets. Remember that CSV is
>> an export format so they can get pretty large. I have seen that many
>> stock/option trades in a year in a single account, for example.
>>
>> Rob
>>
>> _______________________________________________
>> Web Page:  http://lug.boulder.co.us
>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
>> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety
>
>
>
> --
> Jeffrey Haemer <jeffrey.haemer at gmail.com>   720-837-8908 [cell]
> *פרייהייט? דאס איז יאַנג דינען וואָרט!*