← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~jtv/maas/overwrite-file into lp:maas

 

Jeroen T. Vermeulen has proposed merging lp:~jtv/maas/overwrite-file into lp:maas.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~jtv/maas/overwrite-file/+merge/95597

Our FileStorage class needs to support overwriting of files.  That's not as easy as it sounds, of course: we can't serve up a half-baked file to one client while another is still writing their data.  We can't leave an incomplete file in place if overwriting an existing file breaks half-way through.  And we certainly can't destroy the existing data if the attempt to overwrite it rolls back.

Django has a solution for this: the Storage policy family.  We can use the FileSystemStorage implementation, which stores files in the filesystem in the way we want, but change the way it picks a suitable filename for a file.  The FileSystemStorage implementation just uses the requested name and raises an error if it's already in use; its base class Storage has a scheme that suffixes a number to the file name, incrementing until it arrives at a name that's still available.

As a client you won't notice this.  Your files are looked up by the FileStorage.filename column, which holds the filename that you or some other client stored it under.  The FileStorage.data column holds (well-encapsulated by Django) the actual name that the file is stored under on the filesystem.  There is no reason for those to be identical; any similarity is useful purely for debugging purposes.

So what happens when you overwrite a file?  You write a new file to the filesystem, and in the database, update the existing FileStorage.data to point to this new file.  Only when you commit that database change will anyone be able to access the old file; until then, or if you roll back, the old reference stays in place.  If anyone else was reading the old file, they can continue doing so because it too stays in place.

And by now you're thinking you're pretty clever because you've spotted the fatal flaw.  Nobody's deleting overwritten files!  And you're right: we need a garbage collector for this.  Any file in the storage directory that has no FileStorage pointing to it, and that hasn't been touched for so long that the request that wrote it can't reasonably still be in progress, is dead meat.  I'll file a ticket.

Oh, I'm limiting filenames to 100 characters.  That's what Django seems to like.  It allows some room for the rest of the storage path, as well as the version suffix.  We could get smarter about this (only store up to n characters of filename since the version suffix resolves clashes anyway) but let's cross that bridge if we ever get to it.

I dropped the test that peeks in the storage directory and reads the file.  That's really getting under the Django Storage classes' skin, and relies on details of that naming scheme.  The test passed for me, but I trust the Django classes to work as advertised.
-- 
https://code.launchpad.net/~jtv/maas/overwrite-file/+merge/95597
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jtv/maas/overwrite-file into lp:maas.
=== modified file 'src/maasserver/api.py'
--- src/maasserver/api.py	2012-03-01 14:34:18 +0000
+++ src/maasserver/api.py	2012-03-02 15:48:21 +0000
@@ -443,12 +443,7 @@
         # As per the comment in FileStorage, this ought to deal in
         # chunks instead of reading the file into memory, but large
         # files are not expected.
-        try:
-            storage = FileStorage.objects.get(filename=filename)
-        except FileStorage.DoesNotExist:
-            storage = FileStorage()
-
-        storage.save_file(filename, uploaded_file)
+        FileStorage.objects.save_file(filename, uploaded_file)
         return HttpResponse('', status=httplib.CREATED)
 
     @classmethod

=== modified file 'src/maasserver/models.py'
--- src/maasserver/models.py	2012-03-01 15:40:36 +0000
+++ src/maasserver/models.py	2012-03-02 15:48:21 +0000
@@ -31,6 +31,10 @@
 from django.contrib.auth.backends import ModelBackend
 from django.contrib.auth.models import User
 from django.core.files.base import ContentFile
+from django.core.files.storage import (
+    FileSystemStorage,
+    Storage,
+    )
 from django.db import models
 from django.db.models.signals import post_save
 from django.shortcuts import get_object_or_404
@@ -561,25 +565,58 @@
 post_save.connect(create_user, sender=User)
 
 
-class FileStorage(models.Model):
-    """A simple file storage keyed on file name.
-
-    :ivar filename: A unique file name to use for the data being stored.
-    :ivar data: The file's actual data.
-    """
-
-    filename = models.CharField(max_length=255, unique=True, editable=False)
-    data = models.FileField(upload_to="storage")
-
-    def __unicode__(self):
-        return self.filename
+class VersionedFileSystemStorage(FileSystemStorage):
+    """File-system storage engine, but with versioned file names.
+
+    This uses the file naming scheme implemented in Django's `Storage` base
+    class, but is otherwise identical to Django's `FileSystemStorage`.
+    """
+
+    # FileSystemStorage.get_available_name won't store files with names
+    # that are already in use.  The parent class has an implementation
+    # that adds a unique suffix.
+    get_available_name = Storage.get_available_name
+
+
+versioned_file_storage = VersionedFileSystemStorage()
+
+
+class FileStorageManager(models.Manager):
+    """Manager for `FileStorage` objects.
+
+    Store files by calling `save_file`.  No two `FileStorage` objects can
+    have the same filename at the same time.  Writing new data to a file
+    whose name is already in use, replaces its `FileStorage` with one
+    pointing to the new data.
+
+    Underneath, however, the storage layer will keep the old version of the
+    file around indefinitely.  Thus, if the overwriting transaction rolls
+    back, it may leave the new file as garbage on the filesystem; but the
+    original file will not be affected.  Also, any ongoing reads from the
+    old file will continue without iterruption.
+    """
+    # TODO: Garbage-collect obsolete stored files.
+
+    def get_existing_storage(self, filename):
+        """Get an existing `FileStorage` of this name, or None."""
+        existing_storage = self.filter(filename=filename)
+        if len(existing_storage) == 0:
+            return None
+        elif len(existing_storage) == 1:
+            return existing_storage[0]
+        else:
+            raise AssertionError(
+                "There are %d files called '%s'."
+                % (len(existing_storage), filename))
 
     def save_file(self, filename, file_object):
         """Save the file to the filesystem and persist to the database.
 
         The file will end up in MEDIA_ROOT/storage/
+
+        If a file of that name already existed, it will be replaced by the
+        new contents.
         """
-        self.filename = filename
         # This probably ought to read in chunks but large files are
         # not expected.  Also note that uploading a file with the same
         # name as an existing one will cause that file to be written
@@ -590,7 +627,30 @@
         # it's safest left how it is for now (reads can overlap with
         # writes from Juju).
         content = ContentFile(file_object.read())
-        self.data.save(filename, content)
+
+        storage = self.get_existing_storage(filename)
+        if storage is None:
+            storage = FileStorage(filename=filename)
+        storage.data.save(filename, content)
+        return storage
+
+
+class FileStorage(models.Model):
+    """A simple file storage keyed on file name.
+
+    :ivar filename: A unique file name to use for the data being stored.
+    :ivar data: The file's actual data.
+    """
+
+    # Unix filenames can be longer than this (e.g. 255 bytes), but leave
+    # some extra room for the full path, as well as a versioning suffix.
+    filename = models.CharField(max_length=100, unique=True, editable=False)
+    data = models.FileField(upload_to="storage", max_length=255)
+
+    objects = FileStorageManager()
+
+    def __unicode__(self):
+        return self.filename
 
 
 # Default values for config options.

=== modified file 'src/maasserver/testing/factory.py'
--- src/maasserver/testing/factory.py	2012-02-29 13:48:37 +0000
+++ src/maasserver/testing/factory.py	2012-03-02 15:48:21 +0000
@@ -86,14 +86,11 @@
 
     def make_file_storage(self, filename=None, data=None):
         if filename is None:
-            filename = self.getRandomString(255)
+            filename = self.getRandomString(100)
         if data is None:
-            data = self.getRandomString(1024)
+            data = self.getRandomString(1024).encode('ascii')
 
-        storage = FileStorage()
-        storage.save_file(filename, BytesIO(data))
-        storage.save()
-        return storage
+        return FileStorage.objects.save_file(filename, BytesIO(data))
 
 
 # Create factory singleton.

=== modified file 'src/maasserver/tests/test_models.py'
--- src/maasserver/tests/test_models.py	2012-03-01 15:40:36 +0000
+++ src/maasserver/tests/test_models.py	2012-03-02 15:48:21 +0000
@@ -12,6 +12,7 @@
 __all__ = []
 
 import codecs
+from io import BytesIO
 import os
 import shutil
 
@@ -26,6 +27,7 @@
     Config,
     create_auth_token,
     DEFAULT_CONFIG,
+    FileStorage,
     GENERIC_CONSUMER,
     get_auth_tokens,
     MACAddress,
@@ -437,39 +439,87 @@
 
     def setUp(self):
         super(FileStorageTest, self).setUp()
+        # The development settings say to write storage files in
+        # /var/tmp/maas.
         os.mkdir(self.FILEPATH)
         self.addCleanup(shutil.rmtree, self.FILEPATH)
 
-    def test_creation(self):
-        storage = factory.make_file_storage(filename="myfile", data=b"mydata")
-        expected = ["myfile", "mydata"]
-        actual = [storage.filename, storage.data.read()]
-        self.assertEqual(expected, actual)
-
-    def test_creation_writes_a_file(self):
-        # The development settings say to write a file starting at
-        # /var/tmp/maas, so check one is actually written there.  The field
-        # itself is hard-coded to make a directory called "storage".
-        factory.make_file_storage(filename="myfile", data=b"mydata")
-
-        expected_filename = os.path.join(
-            self.FILEPATH, "storage", "myfile")
-
-        with open(expected_filename) as f:
-            self.assertEqual("mydata", f.read())
+    def get_storage_path(self, filename):
+        """Get the full path to the storage file of the given name."""
+        # The storage field is hard-coded to write files to a
+        # subdirectory of FILEPATH called "storage".
+        return os.path.join(self.FILEPATH, "storage", filename)
+
+    def make_data(self, including_text='data'):
+        """Return arbitrary data.
+
+        :param including_text: Text to include in the data.  Leave something
+            here to make failure messages more recognizable.
+        :type including_text: basestring
+        :return: A string of bytes, including `including_text`.
+        :rtype: bytes
+        """
+        # Note that this won't automatically insert any non-ASCII bytes.
+        # Proper handling of real binary data is tested separately.
+        text = "%s %s" % (including_text, factory.getRandomString())
+        return text.encode('ascii')
+
+    def test_get_existing_storage_returns_None_if_none_found(self):
+        nonexistent_file = factory.getRandomString()
+        self.assertIsNone(
+            FileStorage.objects.get_existing_storage(nonexistent_file))
+
+    def test_get_existing_storage_finds_FileStorage(self):
+        storage = factory.make_file_storage()
+        self.assertEqual(
+            storage,
+            FileStorage.objects.get_existing_storage(storage.filename))
+
+    def test_save_file_creates_storage(self):
+        filename = factory.getRandomString()
+        data = self.make_data()
+        storage = FileStorage.objects.save_file(filename, BytesIO(data))
+        self.assertEqual(
+            (filename, data),
+            (storage.filename, storage.data.read()))
+
+    def test_storage_can_be_retrieved(self):
+        filename = factory.getRandomString()
+        data = self.make_data()
+        factory.make_file_storage(filename=filename, data=data)
+        storage = FileStorage.objects.get(filename=filename)
+        self.assertEqual(
+            (filename, data),
+            (storage.filename, storage.data.read()))
 
     def test_stores_binary_data(self):
         # This horrible binary data could never, ever, under any
-        # encoding known to man be intepreted as text.  Switch the bytes
-        # of the byte-order mark around and by design you get an invalid
-        # codepoint; put a byte with the high bit set between bytes that
-        # have it cleared, and you have a guaranteed non-UTF-8 sequence.
+        # encoding known to man be interpreted as text(1).  Switch the
+        # bytes of the byte-order mark around and by design you get an
+        # invalid codepoint; put a byte with the high bit set between bytes
+        # that have it cleared, and you have a guaranteed non-UTF-8
+        # sequence.
+        #
+        # (1) Provided, of course, that man know only about ASCII,
+        # UTF-8, or UTF-16.
         binary_data = codecs.BOM64_LE + codecs.BOM64_BE + b'\x00\xff\x00'
+
         # And yet, because FileStorage supports binary data, it comes
         # out intact.
         storage = factory.make_file_storage(filename="x", data=binary_data)
         self.assertEqual(binary_data, storage.data.read())
 
+    def test_overwrites_file(self):
+        # If a file of the same name has already been stored, the old
+        # data gets overwritten.
+        filename = 'filename-%s' % factory.getRandomString()
+        factory.make_file_storage(
+            filename=filename, data=self.make_data('old data'))
+        new_data = self.make_data('new-data')
+        factory.make_file_storage(filename=filename, data=new_data)
+        self.assertEqual(
+            new_data, FileStorage.objects.get(filename=filename).data.read())
+
 
 class ConfigTest(TestCase):
     """Testing of the :class:`Config` model."""