ecryptfs-devel team mailing list archive
Message #00136
Re: [Patch 0/1] Add support for file names that are too long after encryption
On Sun, Feb 6, 2011 at 12:55 AM, John Johansen
<john.johansen@xxxxxxxxxxxxx> wrote:
> On 02/05/2011 08:13 PM, Dustin Kirkland wrote:
>> On Tue, Feb 1, 2011 at 9:57 AM, John Johansen
>> <john.johansen@xxxxxxxxxxxxx> wrote:
>>> The following patch is a first pass at addressing the bug
>>> "file name too long when creating new file"
>>> https://bugs.launchpad.net/ecryptfs/+bug/344878
>>>
>>> Which occurs when a file is created with a file name that would be valid
>>> before encrypting and encoding but after being encrypted and encoded is
>>> too long for the underlying filesystem.
>>
>> First and foremost, thank you, John, for tackling this ~2 year old
>> problem. It's something that we knew would be an issue when we
>> embarked on encrypted (or, as I prefer, obfuscated) filenames. We
>> didn't really have any idea how big of a problem it might be. I
>> remember doing a deep find on all of my ~10 Ubuntu systems, and did
>> not have any particular file that was >200 characters, nor path that
>> was >2000 characters. However, different strokes for different folks,
>> and eventually people did start having issues.
>>
>>> Overview:
>>> To support file names that are too long when encrypted and encoded the patch
>>> stores the long file name (longname) in an xattr on the file and creates a
>>> "unique" short file name (shortname) which is stored in the underlying
>>> filesystem. The shortname is never seen when accessing files from the
>>> ecryptfs view, but it is what will be found when accessing the lower
>>> filesystem directly.
>>>
>>> While the patch currently uses xattrs it is possible to convert to storing
>>> the longname in the ecryptfs file header (see below for some notes about
>>> advantages and disadvantages), or even allow for both options.
>>
>> So we've discussed this 1:1, but I'll just re-state here...
>>
>> Ideally, we would use the header of the file to store this long
>> filename, which would provide the great advantage of providing
>> increased portability across filesystems. Without requiring xattrs,
>> it's trivial, for instance, to backup the lower encrypted files to a
>> VFAT filesystem on a USB stick.
>>
> Right, and I am still looking at this. It can be added without too
> much effort, and the two methods can coexist, if so desired.
Oh, one other use case that comes to mind beyond USB sticks -- it's
also somewhat common to back up data to CDRs and DVDRs, for which VFAT
is pretty much required too, I think.
>> However, in such circumstances, VFAT does lose some metadata about the
>> file, such as permissions. Furthermore, I have come to understand the
>> complexity of supporting elemental UNIX/Linux functionality such as
>> hardlinks using the file header alone. For these reasons, I will
>> accept that your xattr approach is a reasonable solution to this
>> problem, with the one very hard requirement that the short name you
>> provide when the xattr is not available is both a) unique, and b) as
>> absolutely descriptive as possible.
>>
> Right so currently we are just failing if we can't store the longname
> in an xattr. It's possible we could provide a very descriptive
> shortname for vfat type failures, something along the lines of
> ECRYPTFS.SHORTENED.XXXX~md5
> where XXXX represents as much of the actual name as we can possibly fit
> in the space provided.
It would be nice if you could provide us a few examples from your
running code of what upper and lower filenames look like when they're
too long.
> Alternately we can continue looking at storing in the ecryptfs header
> of a special ecryptfs "dentry" file.
>
>>> Current State:
>>> - Use xattrs to store longname on the file
>>> - Detects xattr support at mount time
>>> - Uses a mount flag for longname support
>>> - currently the mount flag is inverted. Longname support is enabled
>>> by default and the flag is used to disable it.
>>> - current method is somewhat hacky in that it was assumed this
>>> would be inverted, back to requiring a flag but if not this can
>>> be cleaned up.
>>
>> This is okay by me. I can add some support code in ecryptfs-utils
>> which would allow for this to be configured on a per-user basis in
>> ~/.ecryptfs with some flag file, perhaps
>> ~/.ecryptfs/disable-long-names.
>>
> Right, either way works. It's just a matter of choosing which is the
> best for ecryptfs and moving forward. I think either way will need
> some support code.
>
>>> - Currently the code does not have a Kconfig to disable at compile
>>> time. Is this desired?
>>
>> Not desired by me, but I think others may have dissenting opinions and
>> valid reasons.
>>
>>> - the longname xattr is stored in the trusted namespace using the
>>> trusted.ecryptfs. prefix
>>> - the longname is encrypted using the same tag70 packet encoding as any
>>> other encrypted file name. It is not encoded to reduce the size of the
>>> xattr.
>>> - a file can have multiple longnames (hardlinks)
>>
>> Cool.
>>
>>> - each longname is stored as a single xattr name, value pair.
>>> - the xattr name is based off of the encrypted and encoded shortname
>>> without the ECRYPTFS_FNEK prefix
>>> eg.
>>> if the encrypted and encoded shortname is
>>> ECRYPTFS_FNEK_ENCRYPTED.FZYwryMXdKVUQZfN26kvrVp30Yif
>>> then the xattr name will be
>>> trusted.ecryptfs.FZYwryMXdKVUQZfN26kvrVp30Yif
>>
>> Okay, that sounds fine to me.
>>
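The xattr naming rule quoted above can be sketched in a few lines of Python. This is only an illustration of the string mapping described in the thread, not the kernel code; the function name `longname_xattr_name` is made up for the example.

```python
FNEK_PREFIX = "ECRYPTFS_FNEK_ENCRYPTED."  # prefix on encrypted lower names
XATTR_PREFIX = "trusted.ecryptfs."        # xattr prefix named in the thread


def longname_xattr_name(lower_shortname: str) -> str:
    """Map an encrypted+encoded shortname to its longname xattr name.

    Hypothetical helper: drops the ECRYPTFS_FNEK_ENCRYPTED. prefix and
    puts the remainder under trusted.ecryptfs.
    """
    if not lower_shortname.startswith(FNEK_PREFIX):
        raise ValueError("not an encrypted eCryptfs file name")
    return XATTR_PREFIX + lower_shortname[len(FNEK_PREFIX):]


# The example pair from the thread
print(longname_xattr_name("ECRYPTFS_FNEK_ENCRYPTED.FZYwryMXdKVUQZfN26kvrVp30Yif"))
```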
>> As an aside (and feel free to break this off into a separate thread,
>> if you like)... Tyler and I have discussed several times shortening
>> the long-and-clunky preamble on encrypted filenames. This does eat
>> ~23 out of 255, so roughly 10% of the total available file name
>> length. It would be pretty rare to have encrypted and non-encrypted
>> files next to one another in the same directory, so this preamble
>> seems unnecessarily long, to me. Any thoughts about trimming this
>> down some? Tyler? Mike? John?
>>
> Biggest downside to trimming is losing some backwards compatibility, and only
> gaining a few bytes for it.
Okay, thanks. Yeah, not worth it.
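For reference, the cost of the preamble is easy to check. The snippet below just measures the prefix string against the 255-character limit mentioned above; nothing here comes from the eCryptfs sources.

```python
FNEK_PREFIX = "ECRYPTFS_FNEK_ENCRYPTED."
NAME_MAX = 255  # the per-name limit used in this thread

prefix_len = len(FNEK_PREFIX)
share = prefix_len / NAME_MAX

# Length of the preamble and its share of the available name length
print(prefix_len, f"{share:.1%}")
```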
>>> + it would be possible to reduce the size of the xattr name if it was
>>> based on the unencrypted and unencoded shortname
>>> - the value contains the encrypted long filename
>>> - if the expected longname is missing, the current code falls back to
>>> using the shortname.
>>
>> Good. I think we should really create some automated test cases
>> around here, generating thousands of files, testing and reporting on
>> the long names and shortnames, the mappings, etc.
>>
> yeah I have been messing with this a bit, and need to improve the tests
> I have, and kick them out.
Oh, good. We should also collect a list of applications that are
known to write really long filenames, and test some of those, in
particular. I *think* Evolution and Eclipse are guilty, though I have
no examples myself... I can put a call out for some examples.
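The fallback described above (use the shortname when the expected longname xattr is missing) might look roughly like this. It is a sketch under stated assumptions: a plain dict stands in for the file's xattrs, and `decrypt` is a hypothetical stand-in for the tag70 decoding, which is not reproduced here.

```python
from typing import Callable, Mapping


def visible_name(shortname: str,
                 xattr_name: str,
                 xattrs: Mapping[str, bytes],
                 decrypt: Callable[[bytes], str]) -> str:
    """Return the longname if its xattr exists, else fall back to the shortname."""
    value = xattrs.get(xattr_name)
    if value is None:
        # Expected longname is missing: degrade gracefully to the shortname
        return shortname
    return decrypt(value)


# Toy stand-in for decryption; real values would be tag70 packets
toy_decrypt = lambda raw: raw.decode("utf-8")

xattrs = {"trusted.ecryptfs.AAAA": b"a-very-long-file-name.txt"}
print(visible_name("short-AAAA", "trusted.ecryptfs.AAAA", xattrs, toy_decrypt))
print(visible_name("short-BBBB", "trusted.ecryptfs.BBBB", xattrs, toy_decrypt))
```

A test suite along the lines suggested above could drive a function like this with thousands of generated names and report every case that ends up on the fallback path.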
>>> + a mount option could be added to force failure instead of trying to
>>> gracefully fallback
>>> + the patch extends the ecryptfs private dentry field with a longname flag
>>> that is used to indicate that the underlying dentry has a longname
>>> - a unique shortname is used as a place holder for the long file name in
>>> the lower filesystem.
>>> + the current encoding of the shortname will most likely change at least some
>>
>> How so? Can you elaborate on this?
>>
> Well I think it will include something of the directory either before or in the
> hash. This fixes the name collision of two hardlinks having the same name but
> being in different directories. While this isn't a problem per se for the
> obfuscated name it is a problem for the shortname as it prevents the second
> hardlink from being created.
Ah, okay. Yeah, that could be troublesome in weird ways.
> That said we have to be careful with it and not just use the directory name,
> as we don't want the names to break if the directory a file is in is renamed.
>
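To make the collision concrete, here is a small sketch. It assumes the shortname is derived from an md5 of the long name, as described elsewhere in this thread, and it invents a `dir_id` parameter: a hypothetical per-directory identifier that stays stable across renames. What that identifier would actually be is not settled in this discussion.

```python
import hashlib


def shortname_digest(longname: str, dir_id: bytes = b"") -> str:
    """Hash a long name, optionally mixed with a per-directory identifier.

    dir_id is a hypothetical value that must not change when the
    directory is renamed, so the directory name itself is not suitable.
    """
    material = dir_id + b"\0" + longname.encode("utf-8")
    return hashlib.md5(material).hexdigest()


name = "x" * 200

# Without directory information the same name always hashes the same way
same = shortname_digest(name) == shortname_digest(name)

# Mixing in a per-directory value separates the two hardlink names
differs = shortname_digest(name, b"dir-A") != shortname_digest(name, b"dir-B")

print(same, differs)
```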
>>> + the shortname generated is always the same for the same name, this
>>> leaks more information than it should and can result in collisions
>>> if the same name is used from different directories.
>>
>> That's no different than we have now for all encrypted filenames.
>> This is why I prefer to call this feature "file name obfuscation"
>> rather than encryption. The scheme for encrypting file contents is
>> particularly strong in eCryptfs, with each individual file being
>> encrypted with a unique, random key. This is clearly not the case for
>> filenames, and this is due entirely to the performance demands
>> necessary.
>>
>> In any case, this is in no way a blocker. There's plenty of meta
>> information about an encrypted file which is already available --
>> permissions, ownerships, atimes, mtimes, ctimes. The filename is
>> merely an extension of this. Filename obfuscation is merely a subtle
>> layer of abstraction that makes the real filename simply non-obvious.
>> I maintain that the real value of eCryptfs is providing strong
>> security for the contents of each file at rest.
>>
> right, the only issue for me here is the hardlink name collision mentioned
> above.
>
>>> + the current shortname generation doesn't deal with potential collision
>>> between encrypted and encoded file names (this seems pretty unlikely),
>>> nor with name collisions of filenames that hash to the same md5 (again
>>> unlikely)
>>
>> Yeah, no worries here, by me. Someone would really have to try and
>> cause collisions for this to be a problem. This isn't a matter of
>> accidentally touching the stove. This is more like sticking your hand
>> in a blender. We don't recommend it. No, really; don't.
>>
>>> - currently the shortname is created from combining the
>>> ECRYPTFS_FNEK_ENCRYPTED. prefix with the encoded md5 hash of the long
>>> file name.
>>> eg.
>>> ECRYPTFS_FNEK_ENCRYPTED.sdfjyo34n2lkh2lknlkafa--
>>> - the shortname is encrypted and encoded just like any other filename
>>> - both the shortname and the encrypted and encoded shortname must have
>>> the ECRYPTFS_FNEK_ENCRYPTED. for a file name to be considered a valid
>>> shortname
>>> - This design allows for the shortname to "work" to some degree, with
>>> older versions of ecryptfs. Name lookups based off of the long file
>>> name won't work but the shortname can be used so that files can
>>> be copied/moved without losing data.
>>
>> Hmm, okay. If I understand you correctly, I think I agree with this
>> approach. I will want to play with it a bit and see how it actually
>> behaves in practice. And we will want to establish some solid
>> documentation around this.
>>
> Yeah we will want to document the hell out of it because, while it's nice
> that it can work on an older version of ecryptfs you risk "losing" the
> longname information if you rename files.
>
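For readers following along, the shortname construction described above (prefix plus encoded md5 of the long name) can be approximated as follows. This is a sketch only: URL-safe base64 is used as a stand-in for eCryptfs's own filename encoding, so the characters will not match real output, and the later step of encrypting and encoding the shortname itself is not shown. Both function names are invented for the example.

```python
import base64
import hashlib

FNEK_PREFIX = "ECRYPTFS_FNEK_ENCRYPTED."


def make_shortname(longname: str) -> str:
    """Build a placeholder name: prefix + encoded md5 of the long name."""
    digest = hashlib.md5(longname.encode("utf-8")).digest()
    # Stand-in encoding; eCryptfs uses its own filename-safe alphabet
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return FNEK_PREFIX + encoded


def looks_like_shortname(name: str) -> bool:
    """Cheap validity check: the name must carry the FNEK prefix."""
    return name.startswith(FNEK_PREFIX) and len(name) > len(FNEK_PREFIX)


long_name = "report-" + "x" * 300 + ".txt"
short = make_shortname(long_name)
print(short, len(short))
```

Because the result depends only on the long name, the same long name always produces the same shortname, which is the determinism discussed earlier in the thread.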
>>> - only the symlink name can be given a long name currently. The
>>> symlink target encryption hasn't changed.
>>> - this means symlinks don't use the shortname when being accessed
>>> by older versions of ecryptfs. So even if the long name file
>>> they reference exists they won't resolve to a long name file.
>>> - it is possible to have the target use shortnames
>>> - it is possible to add support for long name targets, that after
>>> encrypting and encoding are too long. By using short names and
>>> an extra xattr for the long target name on the symlink.
>>
>> Yeah, this does sound desirable. I'd think symlinks should be able to
>> function by pointing to either a long name, or a short name, and that
>> eCryptfs would correctly handle both.
>>
> Hrmmm, yeah that would be easy enough to do. I'll update for that.
Thanks.
>>> = Supporting long file names =
>>>
>>> Since encrypting and encoding expand the length of the dentry, we need to
>>> either cancel out the expansion or store the extra information for the
>>> long name elsewhere. This also necessitates putting a shorter
>>> placeholder name as the name in the file system.
>>>
>>> Each method of dealing with long names has its own advantages and
>>> disadvantages.
>>>
>>> == compression ==
>>> Little gain, certainly not enough for all possible long file names. Several
>>> applications make random large file names, etc. Would also have to cope
>>> with language encoding etc.
>>
>> Yeah, agreed. This was the very first thing I thought about when this
>> problem surfaced. As I started looking at error reports, it became
>> clear that the (annoying?) programs that would systematically create
>> 200+ character filenames were often randomly generated. For this
>> reason, compression would never really guarantee us a working
>> solution.
>>
>>> == reducing ECRYPTFS_FNEK_ENCRYPTED prefix ==
>>> Some gain in size, but loses any potential backwards compatibility. Also
>>> doesn't deal with expansion caused by encoding, nor the tag70 packet header
>>> expanding the encrypted value.
>>
>> Okay, strike the paragraph I wrote above asking about this ;-) (I
>> won't bother deleting it myself from this mail, as this response is
>> quite stream-of-consciousness at this point).
>>
>>> == long file names with xattrs ==
>>>
>>> Disadvantages
>>> - requires lower file system to support xattrs
>>
>> This is a bummer. We need to handle this as gracefully as possible.
>> I'm thinking something like the old W95 approach of filena~1.txt for
>> fat32 -> fat16. Obviously, we'd have somewhere around 200 characters
>> to play with so hopefully that should suffice. Beyond that, we just
>> need to document the heck out of this.
>>
> Hrmm if you mean to encrypt and encode it's less than that, closer to ~150
> characters if I remember correctly.
Right, sorry.
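A rough way to see why the budget shrinks: if the filename encoding turns every 3 raw bytes into 4 characters (an assumption here, matching a 64-character alphabet), the space left after the prefix holds noticeably fewer raw bytes than characters. The tag70 packet overhead is left as a parameter because its size is not given in this thread; the default of 0 is a placeholder, not a real figure.

```python
def max_raw_bytes(encoded_chars: int) -> int:
    """Raw bytes that fit in encoded_chars, assuming 4 chars per 3 bytes."""
    return (encoded_chars // 4) * 3


def plaintext_budget(name_max: int = 255,
                     prefix_len: int = 24,
                     packet_overhead: int = 0) -> int:
    """Upper bound on raw bytes available for the encrypted name.

    packet_overhead is a placeholder for the tag70 header and padding,
    whose exact size is not stated in this discussion.
    """
    return max_raw_bytes(name_max - prefix_len) - packet_overhead


# Upper bound before any packet overhead is subtracted
print(plaintext_budget())
```

Any nonzero packet overhead pushes the result down from this bound toward the ~150 characters mentioned above.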
>>> - long file name information can be lost by copies, tarring, backups, etc.
>>> made on the lower file system that are unaware of xattrs
>>
>> Again, documentation will be required.
>>
>> I just checked the manpage of tar in Ubuntu and was surprised that we
>> don't have the xattr support that RHEL patches in. We might need to
>> consider pulling that into Ubuntu? I also couldn't find a star
>> package. This is something I'll need to chase down. We will want to
>> make sure that Ubuntu (and other distros) have some method for
>> archiving files which supports xattrs, and we'll want to make it
>> perfectly clear that that's what eCryptfs recommends for backups.
>>
> yes, /me is puzzled that this hasn't been done yet.
Okay, I'll chase that one down. This is an Ubuntu-specific item that
doesn't specifically belong in the upstream discussion, except that
any/all distros that use this feature will also probably want to make
sure they have an archive utility that supports xattrs.
>>> - xattrs can be manipulated directly through the lower file system
>>
>> Hmm... So can filenames, permissions, ownerships, and timestamps. I
>> guess I'm not clear on the disadvantage here...
>>
> Well, it's just a little easier to mess with the filename than when it's
> stored in the file, but you're right, going underneath for either lets
> you screw with everything, so not much of a disadvantage
And we strongly discourage doing anything other than read-only backups
of the underlying files.
>>> Advantages
>>> - supports multiple names with space only limited by xattrs limits
>>>
>>> - no extra code to manage name value pairs, if multiple long names are
>>> to be supported.
>>>
>>> - provides for partial backwards compatibility
>>>
>>> The ecryptfs header doesn't need to be modified, so previous versions
>>> can still read/write the file data. However, versions that don't support
>>> long names via xattrs, will see the short name, and will not update
>>> the long name xattrs.
>>
>> This is very important to me. Thanks.
>>
>>> - allows for long directory and symlink names
>>
>> Oh good.
>>
> This and the above point were actually the reason for me to choose xattrs
> (at least for the first pass).
>
>>> - can allow for long symlink targets
>>> If the encoded symlink target is too long an extra xattr containing the
>>> target can be stored, and a short name style encoding can be performed
>>> on the symlink target data.
>>
>> Very nice.
>>
> This isn't currently done but shouldn't be hard to add. With symlink targets
> generally having more space than for just a name (think using ../) I am not
> sure its worth adding. But maybe that is just my bias
>
>>> == long file names in the ecryptfs header ==
>>>
>>> Disadvantages
>>> - the space to store long file names is more limited than with xattrs.
>>>
>>> In practice this shouldn't be a problem as just supporting a single long
>>> file name would cover the majority of use cases, if multiple shorter
>>> name links are allowed.
>>>
>>> Even when storing multiple long names, being able to store 2 or 3
>>> should cover almost all use cases.
>>>
>>> - requires extra code to manage name value pairs, if multiple long names
>>> are to be supported.
>>>
>>> This is just a matter of code. Xattrs provide support for name value pairs,
>>> and supporting multiple long file names in the ecryptfs header would
>>> require creating some additional code.
>>>
>>> If however only a single long file name is supported then there is
>>> no extra code required. Though storing the long name as a name value
>>> pair is still advisable as it will allow catching rename operations
>>> that are done on the lower filesystem so that the stored long name is
>>> not properly updated.
>>
>> Okay, well, all things being equal, I would prefer seeing all of this
>> solved in the header itself, but I can see that it's non trivial.
>>
> honestly if this were the only issue with using the header I'd do it in a
> heartbeat
Right...
>> I understand that you've sent this design information to the
>> ecryptfs-devel@ list first, for initial feedback. I think when you
>> send this to the Linux Filesystem list, you'll probably get much more
>> expert feedback on these issues.
>>
>>> - is not backwards compatible.
>>>
>>> Storing long file names in xattrs allows for some degraded backwards
>>> compatibility with older versions of ecryptfs. But storing long names in
>>> the ecryptfs header will prevent older versions from being able to
>>> access the stored data.
>>>
>>> How important is this? Not very. While being able to access the data
>>> with an older version may be nice for data recovery, it also risks
>>> losing the specially stored longer names.
>>
>> Agreed.
>>
>>> - requires header to be updated, for renames or hardlinks with long names
>>>
>>> This is mostly a non issue. It may even be faster than storing an xattr.
>>>
>>> - can not be used for directories, symlinks
>>>
>>> Storing the long file name in the ecryptfs header will only work for
>>> encrypted files, it won't work for directories or symlinks as they
>>> don't have a header.
>>
>> Dang. Okay, yeah, that's a dealbreaker, and big +1 for xattrs, IMO.
>>
>>> - can not work around the symlink target being too long.
>>>
>>> This is fs dependent but if the name for the symlink target is too long
>>> after encrypting and encoding, creation of the symlink may fail, and
>>> since symlinks have no header there is no place to store the extra
>>> information.
>>>
>>> Advantages
>>> - the lower filesystem does not require xattr support
>>>
>>> - long name information will not be lost by copies, tarring, or backups
>>> made on the lower file system that don't store xattrs.
>>>
>>> == special dentry file ==
>>>
>>> XAttrs Notes
>>> - requires fs have xattr support
>>> - 4 namespaces (security, system, trusted, user)
>>> - security: used by smack/selinux not appropriate to use
>>> - system: is tied to acls for some filesystems, so affected by mount flags
>>> - user: can't be trusted, can't be set on symlinks, device files
>>> - trusted: need cap_sys_admin to see/set
>>> - not the same space restrictions as ecryptfs header, can use multiple xattrs
>>> - xattr can be encrypted separate from file, so error in name encryption leaks
>>> name instead of data. Does this matter if relying on current encryption?
>>> - having longname xattr leaks that the file has a longname
>>> - is this any worse than directory walking would leak
>>
>> Non-issue, as I stated above. Filenames, like other meta data, are
>> merely obfuscated, and not encrypted.
>>
>>> - use trusted.ecryptfs.<name>
>>
>> Hmm, I guess I'm confused about the statement above, requiring
>> cap_sys_admin ... Will every user have to have cap_sys_admin to use
>> these long filenames?
>>
> No, ecryptfs gets around the regular permission checks by calling
> the underlying filesystem xattr routine directly. This could be an issue
> except it only happens after the file checks so the user has already been
> validated to the file, and we are storing "system" information in the
> trusted xattr.
>
> For a user to be able to see or manipulate the trusted xattr directly they
> will require cap_sys_admin.
>
> There is some potential issue with this (that just came to me), this may
> prevent users from backing up the xattrs, as they can't see or read them :(
> This is problematic as the other xattrs aren't really suited.
> security - is not the right place to stick this information
> system - has been too tightly tied to acls, in most fs implementations,
> and may not even be available without a mount flag
> user - would work except it's not available without a mount flag, which
> hasn't been on by default in Ubuntu
Hmm, okay. I'll want to explore this a little more to make sure I
have a complete understanding of it. We can do that 1:1 in IRC, or
another more synchronous medium ;-)
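The backup concern can be illustrated with a tiny model. It only encodes the rule stated in this thread, that trusted.* xattrs need cap_sys_admin to be seen; the function and the sample xattr names are hypothetical, and no real xattr calls are made.

```python
from typing import Iterable, List


def visible_to_backup(xattr_names: Iterable[str],
                      has_cap_sys_admin: bool) -> List[str]:
    """Return the xattr names a backup process could see on a lower file."""
    return [
        name for name in xattr_names
        if has_cap_sys_admin or not name.startswith("trusted.")
    ]


# Hypothetical xattrs on one lower file
names = ["user.comment", "trusted.ecryptfs.FZYwryMXdKVUQZfN26kvrVp30Yif"]

print(visible_to_backup(names, has_cap_sys_admin=False))
print(visible_to_backup(names, has_cap_sys_admin=True))
```

Under this model, a backup run by an unprivileged user would silently miss the longname xattr, which is the case that needs documenting.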
> I'll work on getting the next revision up and try to get a post to fs devel
> out early next week.
Great, thanks!
--
:-Dustin
Dustin Kirkland
Ubuntu Core Developer