
ecryptfs-devel team mailing list archive

Re: [Patch 0/1] Add support for file names that are too long after encryption

 

On Sun, Feb 6, 2011 at 12:55 AM, John Johansen
<john.johansen@xxxxxxxxxxxxx> wrote:
> On 02/05/2011 08:13 PM, Dustin Kirkland wrote:
>> On Tue, Feb 1, 2011 at 9:57 AM, John Johansen
>> <john.johansen@xxxxxxxxxxxxx> wrote:
>>> The following patch is a first pass at addressing the bug
>>>  "file name too long when creating new file"
>>>  https://bugs.launchpad.net/ecryptfs/+bug/344878
>>>
>>> Which occurs when a file is created with a file name that would be valid
>>> before encrypting and encoding but after being encrypted and encoded is
>>> too long for the underlying filesystem.
>>
>> First and foremost, thank you, John, for tackling this ~2 year old
>> problem.  It's something that we knew would be an issue when we
>> embarked on encrypted (or, as I prefer, obfuscated) filenames.  We
>> didn't really have any idea how big of a problem it might be.  I
>> remember doing a deep find on all of my ~10 Ubuntu systems, and did
>> not have any particular file that was >200 characters, nor path that
>> was >2000 characters.  However, different strokes for different folks,
>> and eventually people did start having issues.
>>
>>> Overview:
>>> To support file names that are too long when encrypted and encoded the patch
>>> stores the long file name (longname) in an xattr on the file and creates a
>>> "unique" short file name (shortname) which is stored in the underlying
>>> filesystem.  The shortname is never seen when accessing files from the
>>> ecryptfs view, but it is what will be found when accessing the lower
>>> filesystem directly.
>>>
>>> While the patch currently uses xattrs it is possible to convert to storing
>>> the longname in the ecryptfs file header (see below for some notes about
>>> advantages and disadvantages), or even allow for both options.
>>
>> So we've discussed this 1:1, but I'll just re-state here...
>>
>> Ideally, we would use the header of the file to store this long
>> filename, which would provide the great advantage of providing
>> increased portability across filesystems.  Without requiring xattrs,
>> it's trivial, for instance, to backup the lower encrypted files to a
>> VFAT filesystem on a USB stick.
>>
> Right, and I am still looking at this.  It can be added without too
> much effort, and the two methods can coexist, if so desired.

Oh, one other use case that comes to mind beyond USB sticks -- it's
also somewhat common to back up data to CDRs and DVDRs, for which VFAT
is pretty much required too, I think.

>> However, in such circumstances, VFAT does lose some metadata about the
>> file, such as permissions.  Furthermore, I have come to understand the
>> complexity of supporting elemental UNIX/Linux functionality such as
>> hardlinks using the file header alone.  For these reasons, I will
>> accept that your xattr approach is a reasonable solution to this
>> problem, with the one very hard requirement that the short name you
>> provide when the xattr is not available is both a) unique, and b) as
>> absolutely descriptive as possible.
>>
> Right so currently we are just failing if we can't store the longname
> in an xattr.  It's possible we could provide a very descriptive
> shortname for vfat type failures, something along the lines of
>  ECRYPTFS.SHORTENED.XXXX~md5
> where XXXX represents as much of the actual name as we can possibly fit
> in the space provided.

It would be nice if you could provide us a few examples from your
running code of what upper and lower filenames look like when they're
too long.
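
To make the proposed ECRYPTFS.SHORTENED.XXXX~md5 fallback concrete, here
is a minimal Python sketch.  It is not taken from the patch: the function
name, the 255-byte limit, and the use of a hex MD5 digest are assumptions
made purely for illustration, and it works on the plain name only.

```python
import hashlib

PREFIX = "ECRYPTFS.SHORTENED."   # literal taken from the proposal above
NAME_MAX = 255                   # assumed lower-filesystem limit, in bytes


def fallback_shortname(longname: str, limit: int = NAME_MAX) -> str:
    """Keep as much of the real name as fits, then append ~<md5 hex>."""
    raw = longname.encode("utf-8")
    digest = hashlib.md5(raw).hexdigest()          # 32 hex characters
    room = limit - len(PREFIX) - 1 - len(digest)   # 1 for the '~' separator
    if room < 0:
        raise ValueError("limit too small for prefix and digest")
    # Truncate on a byte budget, dropping any partial UTF-8 sequence.
    kept = raw[:room].decode("utf-8", errors="ignore")
    return f"{PREFIX}{kept}~{digest}"
```

The digest is computed over the full name, so two long names that share
the same leading characters still end up with different short names.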

> Alternately we can continue looking at storing in the ecryptfs header
> of a special ecryptfs "dentry" file.
>
>>> Current State:
>>> - Use xattrs to store longname on the file
>>> - Detects xattr support at mount time
>>> - Uses a mount flag for longname support
>>>  - currently the mount flag is inverted.  Longname support is enabled
>>>    by default and the flag is used to disable it.
>>>  - current method is somewhat hacky in that it was assumed this
>>>    would be inverted, back to requiring a flag but if not this can
>>>    be cleaned up.
>>
>> This is okay by me.  I can add some support code in ecryptfs-utils
>> which would allow for this to be configured on a per-user basis in
>> ~/.ecryptfs with some flag file, perhaps
>> ~/.ecryptfs/disable-long-names.
>>
> Right, either way works.  It's just a matter of choosing which is the
> best for ecryptfs and moving forward.  I think either way will need
> some support code.
>
>>> - Currently the code does not have a Kconfig option to disable it at compile
>>>  time.  Is this desired?
>>
>> Not desired by me, but I think others may have dissenting opinions and
>> valid reasons.
>>
>>> - the longname xattr is stored in the trusted namespace using the
>>>  trusted.ecryptfs. prefix
>>> - the longname is encrypted using the same tag70 packet encoding as any
>>>  other encrypted file name.  It is not encoded to reduce the size of the
>>>  xattr.
>>> - a file can have multiple longnames (hardlinks)
>>
>> Cool.
>>
>>> - each longname is stored as a single xattr name, value pair.
>>>  - the xattr name is based off of the encrypted and encoded shortname
>>>    without the ECRYPTFS_FNEK prefix
>>>    eg.
>>>       if the encrypted and encoded shortname is
>>>          ECRYPTFS_FNEK_ENCRYPTED.FZYwryMXdKVUQZfN26kvrVp30Yif
>>>       then the xattr name will be
>>>          trusted.ecryptfs.FZYwryMXdKVUQZfN26kvrVp30Yif
>>
>> Okay, that sounds fine to me.
>>
>> As an aside (and feel free to break this off into a separate thread,
>> if you like)...  Tyler and I have discussed several times shortening
>> the long-and-clunky preamble on encrypted filenames.  This does eat
>> ~23 out of 255, so roughly 10% of the total available file name
>> length.  It would be pretty rare to have encrypted and non-encrypted
>> files next to one another in the same directory, so this preamble
>> seems unnecessarily long, to me.  Any thoughts about trimming this
>> down some?  Tyler?  Mike?  John?
>>
> Biggest downside to trimming is losing some backwards compatibility, and only
> gaining a few bytes for it.

Okay, thanks.  Yeah, not worth it.
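
For reference, the xattr naming rule from the patch notes and the cost of
the preamble can be written down in a few lines of Python.  This is only a
sketch of the string handling as described in this thread; the function
names are made up here and nothing below touches a real filesystem.

```python
FNEK_PREFIX = "ECRYPTFS_FNEK_ENCRYPTED."   # preamble on encrypted lower names
XATTR_PREFIX = "trusted.ecryptfs."         # namespace prefix from the patch notes


def xattr_name_for(lower_shortname: str) -> str:
    """Drop the FNEK preamble and put the rest under trusted.ecryptfs."""
    if not lower_shortname.startswith(FNEK_PREFIX):
        raise ValueError("not an encrypted eCryptfs lower name")
    return XATTR_PREFIX + lower_shortname[len(FNEK_PREFIX):]


def preamble_share(name_max: int = 255) -> float:
    """Fraction of the name budget consumed by the preamble."""
    return len(FNEK_PREFIX) / name_max
```

With John's example, xattr_name_for() maps
ECRYPTFS_FNEK_ENCRYPTED.FZYwryMXdKVUQZfN26kvrVp30Yif to
trusted.ecryptfs.FZYwryMXdKVUQZfN26kvrVp30Yif, and the preamble works out
to 24 of 255 characters, a little under 10%.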

>>>    + it would be possible to reduce the size of the xattr name if it was
>>>      based on the unencrypted and unencoded shortname
>>>  - the value contains the encrypted long filename
>>> - if the expected longname is missing, the current code falls back to
>>>  using the shortname.
>>
>> Good.  I think we should really create some automated test cases
>> around here, generating thousands of files, testing and reporting on
>> the long names and shortnames, the mappings, etc.
>>
> yeah I have been messing with this a bit, and need to improve the tests
> I have, and kick them out.

Oh, good.  We should also collect a list of applications that are
known to write really long filenames, and test some of those, in
particular.  I *think* Evolution and Eclipse are guilty, though I have
no examples myself...  I can put a call out for some examples.
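
One behaviour such tests would need to cover is the fallback from the
patch notes: use the shortname when the expected longname xattr is
missing.  A minimal sketch of that decision follows, with the xattrs
modelled as a plain dict and the decryption step passed in as a
hypothetical callable; neither reflects how the kernel code is actually
structured.

```python
from typing import Callable, Mapping


def resolve_upper_name(plain_shortname: str,
                       xattr_key: str,
                       xattrs: Mapping[str, bytes],
                       decrypt: Callable[[bytes], str]) -> str:
    """Return the long name if its xattr exists, else fall back to the shortname."""
    value = xattrs.get(xattr_key)
    if value is None:
        # Longname xattr lost (e.g. copied by an xattr-unaware tool):
        # degrade gracefully to the shortname instead of failing.
        return plain_shortname
    return decrypt(value)
```

A test harness could generate a few thousand names around the length
limit, drop the xattr from a random subset, and check that every lookup
returns either the original long name or the shortname.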

>>>  + a mount option could be added to force failure instead of trying to
>>>    gracefully fallback
>>> + the patch extends the ecryptfs private dentry field with a longname flag
>>>  that is used to indicate that the underlying dentry has a longname
>>> - a unique shortname is used as a place holder for the long file name in
>>>  the lower filesystem.
>>>  + the current encoding of the shortname will most likely change at least some
>>
>> How so?  Can you elaborate on this?
>>
> Well I think it will include something of the directory either before or in the
> hash.  This fixes the name collision of two hardlinks having the same name but
> being in different directories.  While this isn't a problem per se for the
> obfuscated name it is a problem for the shortname as it prevents the second
> hardlink from being created.

Ah, okay.  Yeah, that could be troublesome in weird ways.
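
The patch notes describe the shortname as the ECRYPTFS_FNEK_ENCRYPTED.
prefix plus an encoded md5 hash of the long name.  The Python sketch below
shows why that collides across directories and how mixing in a
per-directory value would avoid it.  Two things here are assumptions, not
the patch: URL-safe base64 stands in for eCryptfs's own filename encoding,
and dir_id is a hypothetical identifier that stays stable when the
directory is renamed (what that identifier would be is left open above).

```python
import base64
import hashlib

FNEK_PREFIX = "ECRYPTFS_FNEK_ENCRYPTED."


def shortname(longname: str, dir_id: bytes = b"") -> str:
    """Prefix + encoded md5 of (length-prefixed dir_id, long name)."""
    h = hashlib.md5()
    # Length-prefix dir_id so (b"ab", "c") and (b"a", "bc") hash differently.
    h.update(len(dir_id).to_bytes(4, "big"))
    h.update(dir_id)
    h.update(longname.encode("utf-8"))
    # Stand-in encoding; the real code uses eCryptfs's filename-safe alphabet.
    encoded = base64.urlsafe_b64encode(h.digest()).decode("ascii").rstrip("=")
    return FNEK_PREFIX + encoded
```

With dir_id left empty, the same long name always yields the same
shortname, which is the hardlink collision John describes.  Passing a
different dir_id per directory separates them without depending on the
directory's name.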

> That said we have to be careful with it and not just use the directory name,
> as we don't want the names to break if the directory a file is in is renamed.
>
>>>  + the shortname generated is always the same for the same name, this
>>>    leaks more information than it should and can result in collisions
>>>    if the same name is used from different directories.
>>
>> That's no different than we have now for all encrypted filenames.
>> This is why I prefer to call this feature "file name obfuscation"
>> rather than encryption.  The scheme for encrypting file contents is
>> particularly strong in eCryptfs, with each individual file being
>> encrypted with a unique, random key.  This is clearly not the case for
>> filenames, and this is due entirely to the performance demands
>> necessary.
>>
>> In any case, this is in no way a blocker.  There's plenty of meta
>> information about an encrypted file which is already available --
>> permissions, ownerships, atimes, mtimes, ctimes.  The filename is
>> merely an extension of this.  Filename obfuscation is merely a subtle
>> layer of abstraction that makes the real filename simply non-obvious.
>> I maintain that the real value of eCryptfs is providing strong
>> security for the contents of each file at rest.
>>
> right, the only issue for me here is the hardlink name collision mentioned
> above.
>
>>>  + the current shortname generation doesn't deal with potential collision
>>>    between encrypted and encoded file names (this seems pretty unlikely),
>>>    nor with name collisions of filenames that hash to the same md5 (again
>>>    unlikely)
>>
>> Yeah, no worries here, by me.  Someone would really have to try and
>> cause collisions for this to be a problem.  This isn't a matter of
>> accidentally touching the stove.  This is more like sticking your hand
>> in a blender.  We don't recommend it.  No, really; don't.
>>
>>>  - currently the shortname is created by combining the
>>>    ECRYPTFS_FNEK_ENCRYPTED. prefix with the encoded md5 hash of the long
>>>    file name.
>>>    eg.
>>>      ECRYPTFS_FNEK_ENCRYPTED.sdfjyo34n2lkh2lknlkafa--
>>>  - the shortname is encrypted and encoded just like any other filename
>>>  - both the shortname and the encrypted and encoded shortname must have
>>>    the ECRYPTFS_FNEK_ENCRYPTED. prefix for a file name to be considered
>>>    a valid shortname
>>>  - This design allows for the shortname to "work" to some degree, with
>>>    older versions of ecryptfs.  Name lookups based off of the long file
>>>    name won't work but the shortname can be used so that files can
>>>    be copied/moved without losing data.
>>
>> Hmm, okay.  If I understand you correctly, I think I agree with this
>> approach.  I will want to play with it a bit and see how it actually
>> behaves in practice.  And we will want to establish some solid
>> documentation around this.
>>
> Yeah we will want to document the hell out of it because, while it's nice
> that it can work on an older version of ecryptfs, you risk "losing" the
> longname information if you rename files.
>
>>> - only the symlink name can be given a long name currently.  The
>>>  symlink target encryption hasn't changed.
>>>  - this means symlinks don't use the shortname when being accessed
>>>    by older versions of ecryptfs.  So even if the long name file
>>>    they reference exists they won't resolve to a long name file.
>>>    - it is possible to have the target to use shortnames
>>>  - it is possible to add support for long name targets, that after
>>>    encrypting and encoding are too long.  By using short names and
>>>    an extra xattr for the long target name on the symlink.
>>
>> Yeah, this does sound desirable.  I'd think symlinks should be able to
>> function by pointing to either a long name, or a short name, and that
>> eCryptfs would correctly handle both.
>>
> Hrmmm, yeah that would be easy enough to do.  I'll update for that.

Thanks.

>>> = Supporting long file names =
>>>
>>> Since encrypting and encoding expand the length of the dentry, we need to
>>> either cancel out the expansion or store the extra information for the
>>> long name elsewhere.  This also necessitates putting a shorter place
>>> holder name as the name in the file system.
>>>
>>> Each method of dealing with long names has its own advantages and
>>> disadvantages.
>>>
>>> == compression ==
>>> Little gain, certainly not enough for all possible long file names.  Several
>>> applications make random large file names, etc.  Would also have to cope
>>> with language encoding etc.
>>
>> Yeah, agreed.  This was the very first thing I thought about when this
>> problem surfaced.  As I started looking at error reports, it became
>> clear that the (annoying?) programs that systematically create
>> 200+ character filenames often generate them randomly.  For this
>> reason, compression would never really guarantee us a working
>> solution.
>>
>>> == reducing ECRYPTFS_FNEK_ENCRYPTED prefix ==
>>> Some gain in size, but loses any potential backwards compatibility.  Also
>>> doesn't deal with expansion caused by encoding, nor the tag70 packet header
>>> expanding the encrypted value.
>>
>> Okay, strike the paragraph I wrote above asking about this ;-)  (I
>> won't bother deleting it myself from this mail, as this response is
>> quite stream-of-consciousness at this point).
>>
>>> == long file names with xattrs ==
>>>
>>> Disadvantages
>>> - requires lower file system to support xattrs
>>
>> This is a bummer.  We need to handle this as gracefully as possible.
>> I'm thinking something like the old W95 approach of filena~1.txt for
>> fat32 -> fat16.  Obviously, we'd have somewhere around 200 characters
>> to play with so hopefully that should suffice.  Beyond that, we just
>> need to document the heck out of this.
>>
> Hrmm if you mean to encrypt and encode its less than that, closer to ~150
> characters if I remember correctly.

Right, sorry.
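
The arithmetic behind that lower figure can be sketched as follows.  The
model is an assumption, not the real code path: it takes a 24-character
preamble, an encoding that turns every 3 bytes into 4 characters, a
16-byte cipher block, and a packet_overhead parameter that is only a
placeholder, since the exact tag 70 packet layout is not spelled out in
this thread.

```python
FNEK_PREFIX = "ECRYPTFS_FNEK_ENCRYPTED."


def max_plaintext_len(name_max: int = 255,
                      packet_overhead: int = 0,
                      block_size: int = 16) -> int:
    """Rough upper bound on a plain name that still fits after encoding."""
    encoded_budget = name_max - len(FNEK_PREFIX)   # characters left after the preamble
    packet_budget = (encoded_budget // 4) * 3      # undo the assumed 3 -> 4 expansion
    cipher_budget = packet_budget - packet_overhead
    # Ciphertext comes in whole blocks, so round down to a block multiple.
    return max(0, (cipher_budget // block_size) * block_size)
```

Even with zero packet overhead this model tops out at 160 bytes, and any
overhead pushes it lower, which is consistent with John's "closer to
~150" rather than my ~200.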

>>> - long file name information can be lost by copies, tarring, backups, etc
>>>  made on the lower file system that are unaware of xattrs
>>
>> Again, documentation will be required.
>>
>> I just checked the manpage of tar in Ubuntu and was surprised that we
>> don't have the xattr support that RHEL patches in.  We might need to
>> consider pulling that into Ubuntu?   I also couldn't find a star
>> package.  This is something I'll need to chase down.  We will want to
>> make sure that Ubuntu (and other distros) have some method for
>> archiving files which supports xattrs, and we'll want to make it
>> perfectly clear that that's what eCryptfs recommends for backups.
>>
> yes, /me is puzzled that this hasn't been done yet.

Okay, I'll chase that one down.  This is an Ubuntu-specific item that
doesn't specifically belong in the upstream discussion, except that
any/all distros that use this feature will also probably want to make
sure they have an archive utility that supports xattrs.

>>> - xattrs can be manipulated directly through the lower file system
>>
>> Hmm...  So can filenames, permissions, ownerships, and timestamps.  I
>> guess I'm not clear on the disadvantage here...
>>
> Well, it's just a little easier to mess with the filename than when it's
> stored in the file, but you're right, going underneath for either lets
> you screw with everything, so not much of a disadvantage

And we strongly discourage doing anything other than read-only backups
of the underlying files.

>>> Advantages
>>> - supports multiple names with space only limited by xattrs limits
>>>
>>> - no extra code to manage name value pairs, if multiple long names are
>>>  to be supported.
>>>
>>> - provides for partial backwards compatibility
>>>
>>>  The ecryptfs header doesn't need to be modified, so previous versions
>>>  can still read/write the file data.  However versions that don't support
>>>  long names via xattrs, will see the short name, and will not update
>>>  the long name xattrs.
>>
>> This is very important to me.  Thanks.
>>
>>> - allows for long directory and symlink names
>>
>> Oh good.
>>
> This and the above point were actually the reason for me to choose xattrs
> (at least for the first pass).
>
>>> - can allow for long symlink targets
>>>  If the encoded symlink target is too long an extra xattr containing the
>>>  target can be stored, and a short name style encoding can be performed
>>>  on the symlink target data.
>>
>> Very nice.
>>
> This isn't currently done but shouldn't be hard to add.  With symlink targets
> generally having more space than for just a name (think using ../) I am not
> sure it's worth adding.  But maybe that is just my bias
>
>>> == long file names in the ecryptfs header ==
>>>
>>> Disadvantages
>>> - the space to store long file names is more limited than with xattrs.
>>>
>>>  In practice this shouldn't be a problem as just supporting a single long
>>>  file name would cover the majority of use cases, if multiple shorter
>>>  name links are allowed.
>>>
>>>  Even when storing multiple long names, being able to store 2 or 3
>>>  should cover almost all use cases.
>>>
>>> - requires extra code to manage name value pairs, if multiple long names
>>>  are to be supported.
>>>
>>>  This is just a matter of code. Xattrs provide support for name value pairs,
>>>  and supporting multiple long file names in the ecryptfs header would
>>>  require creating some additional code.
>>>
>>>  If however only a single long file name is supported then there is
>>>  no extra code required.  Though storing the long name as a name value
>>>  pair is still advisable as it will allow catching rename operations
>>>  that are done on the lower filesystem and leave the stored long name
>>>  out of date.
>>
>> Okay, well, all things being equal, I would prefer seeing all of this
>> solved in the header itself, but I can see that it's non trivial.
>>
> honestly if this were the only issue with using the header I'd do it in a
> heartbeat

Right...

>> I understand that you've sent this design information to the
>> ecryptfs-devel@ list first, for initial feedback.  I think when you
>> send this to the Linux Filesystem list, you'll probably get much more
>> expert feedback on these issues.
>>
>>> - is not backwards compatible.
>>>
>>>  Storing long file names in xattrs allows for some degraded backwards
>>>  compatibility with older versions of ecryptfs.  But storing long names in
>>>  the ecryptfs header will prevent older versions from being able to
>>>  access the stored data.
>>>
>>>  How important is this?  Not very, while being able to access the data
>>>  with an older version may be nice for data recovery, it also risks
>>>  losing the specially stored longer names.
>>
>> Agreed.
>>
>>> - requires header to be updated, for renames or hardlinks with long names
>>>
>>>  This is mostly a non issue.  It may even be faster than storing an xattr.
>>>
>>> - can not be used for directories, symlinks
>>>
>>>  Storing the long file name in the ecryptfs header will only work for
>>>  encrypted files, it won't work for directories or symlinks as they
>>>  don't have a header.
>>
>> Dang.  Okay, yeah, that's a dealbreaker, and big +1 for xattrs, IMO.
>>
>>> - can not work around the symlink target being too long.
>>>
>>>  This is fs dependent but if the name for the symlink target is too long
>>>  after encrypting and encoding, creation of the symlink may fail, and
>>>  since symlinks have no header there is no place to store the extra
>>>  information.
>>>
>>> Advantages
>>> - the lower filesystem does not require xattr support
>>>
>>> - long name information will not be lost by copies, tarring, or backups
>>>  made on the lower file system that don't store xattrs.
>>>
>>> == special dentry file ==
>>>
>>> XAttrs Notes
>>> - requires fs have xattr support
>>> - 4 namespaces (security, system, trusted, user)
>>>  - security: used by smack/selinux not appropriate to use
>>>  - system: is tied to acls for some filesystems, so affected by mount flags
>>>  - user: can't be trusted, can't be set on symlinks, device files
>>>  - trusted: need cap_sys_admin to see/set
>>> - not the same space restrictions as ecryptfs header, can use multiple xattrs
>>> - xattr can be encrypted separately from the file, so error in name encryption leaks
>>>  name instead of data.  Does this matter if relying on current encryption?
>>> - having longname xattr leaks that the file has a longname
>>>  - is this any worse than what directory walking would leak?
>>
>> Non-issue, as I stated above.  Filenames, like other meta data, are
>> merely obfuscated, and not encrypted.
>>
>>> - use trusted.ecryptfs.<name>
>>
>> Hmm, I guess I'm confused about the statement above, requiring
>> cap_sys_admin ...  Will every user have to have cap_sys_admin to use
>> these long filenames?
>>
> No, ecryptfs gets around the regular permission checks by calling
> the underlying filesystem xattr routine directly.  This could be an issue
> except it only happens after the file checks so the user has already been
> validated to the file, and we are storing "system" information in the
> trusted xattr.
>
> For a user to be able to see or manipulate the trusted xattr directly they
> will require cap_sys_admin.
>
> There is some potential issue with this (that just came to me): this may
> prevent users from backing up the xattrs, as they can't see or read them :(
> This is problematic as the other xattrs aren't really suited.
>  security - is not the right place to stick this information
>  system - has been too tightly tied to acls, in most fs implementations,
>           and may not even be available without a mount flag
>  user - would work except its not available without a mount flag, which
>         hasn't been on by default in Ubuntu

Hmm, okay.  I'll want to explore this a little more to make sure I
have a complete understanding of it.  We can do that 1:1 in IRC, or
another more synchronous medium ;-)
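
In the meantime, the namespace behaviour can be poked at from userspace
with Python's os.setxattr (Linux only).  This is just a probe for
experimenting, not anything from the patch; the function name and the
probe attribute names are made up here.

```python
import errno
import os
import tempfile


def probe_xattr(directory: str, name: str) -> str:
    """Return 'ok', 'unsupported', 'denied', or 'error' for setting xattr `name`."""
    if not hasattr(os, "setxattr"):          # os.setxattr only exists on Linux
        return "unsupported"
    with tempfile.NamedTemporaryFile(dir=directory) as tmp:
        try:
            os.setxattr(tmp.name, name, b"probe")
            os.removexattr(tmp.name, name)
            return "ok"
        except OSError as exc:
            if exc.errno in (errno.ENOTSUP, errno.EOPNOTSUPP):
                return "unsupported"         # fs or mount options lack this namespace
            if exc.errno in (errno.EPERM, errno.EACCES):
                return "denied"              # e.g. trusted.* without CAP_SYS_ADMIN
            return "error"


if __name__ == "__main__":
    for attr in ("user.ecryptfs_probe", "trusted.ecryptfs_probe"):
        print(attr, "->", probe_xattr(".", attr))
```

Run as an ordinary user, I would expect the trusted.* probe to come back
"denied", which is exactly the backup concern John raises; the user.*
result depends on the filesystem and its mount options.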

> I'll work on getting the next revision up and try to get a post to fs devel
> out early next week.

Great, thanks!

-- 
:-Dustin

Dustin Kirkland
Ubuntu Core Developer


