[lug] 14 Characters

Thu Jun 17 16:04:45 MDT 2021

Folks,

In case it's helpful, here's the POSIX Portable Filename Character Set
(PPFCS).
<https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_281>
POSIX was written to conform to existing practice, and I think everyone
supports it, so it's a low and safe bar.
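If you want to check names mechanically, here's a minimal Python sketch. The function and constant names are mine, not the standard's. The character set is the one POSIX lists (upper- and lowercase letters, digits, period, underscore, hyphen). The optional strict check assumes you also want to honor _POSIX_NAME_MAX, which is 14, the smallest filename length limit a conforming system is allowed to impose.

```python
import string

# POSIX Portable Filename Character Set: A-Z, a-z, 0-9, ".", "_", "-".
PORTABLE_FILENAME_CHARS = frozenset(string.ascii_letters + string.digits + "._-")

# _POSIX_NAME_MAX: the smallest filename length limit (in bytes) that a
# conforming system may impose. PPFCS characters are ASCII, so one
# character is one byte here.
POSIX_NAME_MAX = 14


def is_portable_filename(name: str, strict: bool = False) -> bool:
    """Return True if name uses only PPFCS characters.

    With strict=True, also require len(name) <= POSIX_NAME_MAX, i.e. a
    name that should fit on even the most cramped conforming system.
    "/" is deliberately absent: it separates pathname components and
    can never appear inside a single filename.
    """
    if not name or not all(ch in PORTABLE_FILENAME_CHARS for ch in name):
        return False
    return not strict or len(name) <= POSIX_NAME_MAX


for candidate in ["notes.txt", "my notes.txt", "résumé.pdf", "report_2021.csv"]:
    print(candidate, is_portable_filename(candidate),
          is_portable_filename(candidate, strict=True))
```

Note that "." and ".." pass the character test but are reserved names, so a real checker would want to reject those too.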

There are two kinds of standards: standards that make up new stuff and
standards that document consensus on what exists. The first says, "Here's
an idea for a new feature, described in enough detail that you could
implement it." The second says, "Here's what everyone's doing now, so if
you write a new implementation or an extension, at least don't break this."

They're both useful, just different. IETF RFCs and Python PEPs are the
former. POSIX was intended to be the latter, though there are spots where
folks couldn't agree and someone was able to design a compromise that
still permitted all the different, existing solutions.

An instructive example was the fight over an archive program and format:
Berkeley's *tar(1)* vs. AT&T's *cpio(1)*. Each distro/vendor used one or
the other and no one wanted to budge. It was "TAR wars -- A fight to the
death between 3cpio and tar2D2."

One morning, at one of the meetings, someone showed up with a utility
called *pax(1)* which handled both formats. ("Pax." Get it?) POSIX added a
requirement for it, every vendor agreed to supply *pax(1)* in addition to
their previous favorite, and just moved on.

In theory, a new POSIX distro could even supply only *pax(1)*, but I don't
think anyone ever did that. I do think that, for a while, some *tar* and
*cpio* implementations were just links to *pax* and chose how to behave
based on *argv[0]*.
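Here's roughly what that *argv[0]* trick looks like, as a hypothetical Python sketch (not how any real *pax* was written): one program installed under several names via links, deciding how to behave from the name it was invoked as. BusyBox still works this way today. The names KNOWN_PERSONALITIES, personality, and main are made up for illustration.

```python
import os
import sys

# Hypothetical multi-call archiver: the same program is linked as
# tar, cpio, and pax, and picks a personality from argv[0].
KNOWN_PERSONALITIES = {"tar", "cpio", "pax"}


def personality(argv0: str) -> str:
    """Return the personality for the name the program was invoked as."""
    name = os.path.basename(argv0)
    # Unrecognized names (e.g. a renamed copy) fall back to pax behavior.
    return name if name in KNOWN_PERSONALITIES else "pax"


def main(argv: list) -> int:
    mode = personality(argv[0])
    # A real archiver would now parse argv[1:] using tar-, cpio-, or
    # pax-style option syntax; this sketch only reports its choice.
    print(f"invoked as {argv[0]!r}; behaving like {mode}")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Shipping the extra commands is then just a matter of making links named *tar* and *cpio* that point at the one program.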

Anyhoo, you can be POSIX-conforming and permit Unicode or blanks or
whatever in your filenames, but you have to support the existing standard
as a subset. So if you're porting software or programmers to a new system
and you stick to the PPFCS in your filenames, you'll be safe.

Behold the beauty of backwards compatibility.

I think that, somewhere in the standard, POSIX also prohibits filenames
that start with "-" (hyphen), since those could be confused with options,
but someone else will have to hunt that one down.
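Whatever the standard turns out to say, the usual defensive habit in scripts is to make sure a filename can't be read as an option: either pass `--`, the end-of-options marker from the POSIX utility argument syntax guidelines, or spell the name so it no longer starts with a hyphen. Here's a minimal Python sketch of the second approach; `as_operand` is a made-up helper name.

```python
def as_operand(filename: str) -> str:
    """Spell filename so a command can't mistake it for an option.

    "-rf" handed to rm(1) as-is would be parsed as flags; "./-rf" names
    the same file in the current directory but no longer starts with "-".
    Names that don't start with "-" (including absolute paths) are
    returned unchanged.
    """
    if filename.startswith("-"):
        return "./" + filename
    return filename


names = ["-rf", "notes.txt", "/tmp/-x"]
print(["rm", *(as_operand(n) for n in names)])
# prints ['rm', './-rf', 'notes.txt', '/tmp/-x']
```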

There's a parallel to the "standards that define vs. standards that
document" dichotomy in natural language. *Webster's* and the *OED* document
English. In contrast, the *Dictionnaire de l'Académie française* defines
French.

At one point, the POSIX group standardizing *sort(1)*, regexes, and globs was
struggling with how to handle languages with odd alphabets.

Everyone who'd been trying to do internationalization (I18N) -- Sun, IBM,
Hewlett-Packard, Unisys, ... -- had a different implementation, designed,
*ab initio*, by some genius in their marketing department at a happy hour,
or something.

One humorously painful language was Spanish, whose alphabet began, "A, B,
C, Ch, D, ..."

When I learned Spanish, "ch" was a letter of the alphabet, written with two
characters, which sorted between "c" and "d". "Chupa" needed to sort *after*
"coño" and "culo."

(Don't believe me? Go find a Spanish dictionary older than about 1990.)
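To make the sorting problem concrete, here's a minimal Python sketch of the old rule. It assumes a simplified traditional alphabet in which "ch" and "ll" are single letters (after "c" and "l") and "ñ" follows "n"; it ignores "rr" and accent tie-breaks, and the names are mine. Real systems use locale-aware collation tables rather than anything this crude.

```python
# Simplified traditional Spanish alphabet: "ch" and "ll" are single
# letters sorting after "c" and "l", and "ñ" follows "n".
# Assumption: "rr" and accent tie-breaking are ignored.
TRADITIONAL_ALPHABET = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
    "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w",
    "x", "y", "z",
]
RANK = {letter: i for i, letter in enumerate(TRADITIONAL_ALPHABET)}
DIGRAPHS = {"ch", "ll"}
# In this sketch, accented vowels collate exactly like their base letters.
STRIP_ACCENTS = str.maketrans("áéíóúü", "aeiouu")


def traditional_spanish_key(word: str) -> list:
    """Split word into traditional letters and return their alphabet ranks."""
    s = word.lower().translate(STRIP_ACCENTS)
    key = []
    i = 0
    while i < len(s):
        if s[i:i + 2] in DIGRAPHS:
            key.append(RANK[s[i:i + 2]])
            i += 2
        else:
            # Anything outside the alphabet (digits, spaces) sorts after "z".
            key.append(RANK.get(s[i], len(TRADITIONAL_ALPHABET)))
            i += 1
    return key


print(sorted(["chupa", "culo", "coño"], key=traditional_spanish_key))
# prints ['coño', 'culo', 'chupa']
```

Plain code-point order, `sorted(["chupa", "culo", "coño"])`, gives `['chupa', 'coño', 'culo']` instead, which happens to match the new alphabet for these three words but not the old dictionaries.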

The sets of magnetic refrigerator letters and wooden alphabet blocks that
Juan and María got their children were different from the ones that Santa
brought John and Mary's kids for Christmas. The alphabet song I'd have
learned in Bogotá was longer than the one I did learn
<https://www.youtube.com/watch?v=75p-N9YKqNo> in Norwalk.

Right when the POSIX group had almost reached consensus, following months
and months of discussion and negotiation, the *Real Academia Española*,
which defines Spanish in its *Diccionario de la Lengua Española*, changed
the alphabet. In the middle of a POSIX meeting, actually. Henceforth, "ch,"
"ll," and "rr" would each be two letters, not one.

*Ñ* got to stay a letter.

Changing the alphabet made it, the *Real Academia* explained, easier to use
computers.
Hahahahaha ...

So: change the alphabet, issue a new dictionary, and declare everything
that used the old alphabet officially wrong. Puts "let's migrate all my
Python 2 scripts to Python 3" in perspective, I reckon.

On Thu, Jun 17, 2021 at 11:18 AM Rob Nagler <nagler at bivio.biz> wrote:

> Bucky Carr writes:
> > I have no idea how someone would use 65535 rows much less 1 million.
>
> You are correct, 2^20 is the limit. I still wonder why there is a limit,
> especially for modern computers.
>
> I spend a lot of my time helping people automate what they do in Excel so
> I see some pretty interesting and large spreadsheets. Remember that CSV is
> an export format so they can get pretty large. I have seen that many
> stock/option trades in a year in a single account, for example.
>
> Rob
>
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety

-- 
Jeffrey Haemer <jeffrey.haemer at gmail.com>   720-837-8908 [cell]
*פרייהייט? דאס איז יאַנג דינען וואָרט!*