
syncany-team team mailing list archive

Re: Illegal file names

 

Hi,

On 14/12/2013 22:36, Philipp Heckel wrote:
>> You have at least to record the encoding used by the creator of the
>> repository. 
> 
> Can you elaborate on why recording the encoding (locale, LC_CTYPE, I
> assume) of the creator of a database version (not repository!) would
> help? 

As far as I know, Linux file systems handle (almost) arbitrary binary
names. However, the way a binary name is decoded depends on the
locale. So the UTF-8 names in the Syncany database should be
translated/slugified locally if needed. I don't know to what extent this
is done automatically and properly in Java. I don't know (and honestly
don't care) how things work on other OSes.
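
To make the "translate locally if needed" step concrete, here is a minimal
sketch of how one could test whether a name from the database can be
represented in a given local encoding at all. The class and method names are
made up for illustration, and using Charset.defaultCharset() as a stand-in for
the encoding the JVM applies to file names is an assumption (the JVM may use a
different charset for that).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of Syncany.
final class LocalNameCheck {
    /** True if every character of the name can be encoded in the target charset. */
    static boolean representable(String name, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        return encoder.canEncode(name);
    }

    public static void main(String[] args) {
        // Assumption: the default charset approximates the local file name encoding.
        Charset local = Charset.defaultCharset();
        String name = args.length > 0 ? args[0] : "r\u00e9sum\u00e9 \u2603.txt";
        System.out.println(name + " representable in " + local + ": "
                + representable(name, local));
    }
}
```

A name that fails this check is a candidate for slugifying before the file is
created locally.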

> Because: Java internally handles file names as UTF8 (correct?) and
> the database files are encoded as UTF8 ("-Dfile.encoding=UTF8" is set).
> And when a file on another user's PC has to be created, the only issue
> is how to encode the filename in the local encoding, e.g. to maybe
> translate it to ISO-8859-1 or something else, right?!

This is my understanding, but again, Java sometimes does mysterious
things with file encoding (more precisely, I haven't spent enough time
understanding all the subtleties of I/O in Java).

> I am not an expert on encoding stuff, so excuse my stupid questions :-)
> 
> I tried to understand how filenames are encoded, here's what I believe
> to have found out:
> - The standard file systems of all major OSs support Unicode filenames

Not exactly. Mac OS accepts Unicode names but stores them only in one
canonical (decomposed) form, which is quite annoying.
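
A small sketch of why this matters: the same visible name can come back from
the file system as a different sequence of code points, so names should be
normalized before they are compared. The class below is hypothetical, and
plain NFC from java.text.Normalizer is used as an approximation, since the
form Mac OS stores is a variant of NFD rather than standard NFD.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Syncany.
final class NormalizedNames {
    /** Returns the name in composed form (NFC). */
    static String toNfc(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    /** Compares two names after bringing both to the same normal form. */
    static boolean sameName(String a, String b) {
        return toNfc(a).equals(toNfc(b));
    }

    public static void main(String[] args) {
        String composed = "caf\u00e9.txt";    // e-acute as a single code point
        String decomposed = "cafe\u0301.txt"; // 'e' followed by a combining acute accent
        System.out.println("raw equals:        " + composed.equals(decomposed));
        System.out.println("normalized equals: " + sameName(composed, decomposed));
    }
}
```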

> - There are different illegal characters (e.g. "/", "\", ":" on Windows)
> and filenames (e.g. "COM" or"" on Windows)
> - Some file systems are case-sensitive, others aren't

Most are case-preserving, but Windows and Mac OS are not case-sensitive
by default.
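
As an illustration, here is a minimal sketch of how names that would collide
on a case-insensitive file system could be detected before writing them. The
class is hypothetical, and lower-casing with Locale.ROOT only approximates the
case folding rules a real file system applies.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Syncany.
final class CaseCollisions {
    /** Groups names by their lower-cased form and keeps only groups with more than one name. */
    static Map<String, List<String>> find(Collection<String> names) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : names) {
            String key = name.toLowerCase(Locale.ROOT);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        }
        // Drop names that do not clash with any other name.
        groups.values().removeIf(group -> group.size() < 2);
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(find(List.of("README.txt", "readme.TXT", "notes.md")));
    }
}
```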

> - The OS's locale influences which encoding file names are stored
> with (for all file systems?)
>   e.g. LC_CTYPE="en_US.iso88591" java org.syncany.Syncany will encode
> file names differently

That's my understanding.

>> [..]
>> Or slugify.
> That's generally the best solution. Although it's harder than it sounds,
> because it has to be target-platform-specific AND target encoding-specific.

Indeed. The recode program handles this kind of thing and it's enormous
(with very complicated documentation, which in my opinion shows the
difficulty of the task).
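
For what it's worth, a very reduced sketch of a slugifier that is both
platform-specific and encoding-specific could look like the following. It is
nowhere near what recode does. The class name, the rule set and the '_'
replacement character are assumptions made for illustration, and reserved
Windows names are deliberately not handled.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical helper, not part of Syncany.
final class Slugifier {
    // Characters assumed to be rejected in Windows file names.
    private static final String WINDOWS_ILLEGAL = "\\/:*?\"<>|";

    static String slugify(String name, Charset target, boolean windowsRules) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder();
        String composed = Normalizer.normalize(name, Normalizer.Form.NFC);
        composed.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (cp == '/' || cp < 0x20 || (windowsRules && WINDOWS_ILLEGAL.indexOf(cp) >= 0)) {
                out.append('_'); // path separator, control character or platform-illegal character
            } else if (encoder.canEncode(ch)) {
                out.append(ch);  // representable in the target encoding, keep as-is
            } else {
                // Try the base letter without its combining marks, else give up on this character.
                String base = Normalizer.normalize(ch, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
                out.append(!base.isEmpty() && encoder.canEncode(base) ? base : "_");
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(slugify("caf\u00e9:a/b.txt", StandardCharsets.US_ASCII, true));
    }
}
```

Even this toy version needs both the target platform and the target charset
as parameters, and two different names can still end up with the same slug.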

Cheers,

Fabrice



