Summary:

The data center machine is a hyperthreaded 4 core machine (8 effective cores) with 12 GB RAM. The tests seem to reinforce that having enough memory is essential, which is a lesson we also learned on EC2. Moreover, 2.4 GB per core seems necessary for these tests (remember that our test runs also store all disk "writes" to RAM in the ephemeral instances). That said, once you get up to 32 cores on EC2, that rate of RAM was less necessary: on EC2, 60 GB for 32 cores did not use any swap at all. Perhaps the RAM requirement has a constant or logarithmic element in addition to a linear one.

When running tests with up to 5 cores in the data center (which was as high as the RAM headroom would let us go quickly), we had a roughly constant overhead of under 10 minutes and a divisible test time of about 280 minutes. That put five cores just above the hour mark. If we had enough RAM, the equation would suggest that 8 cores might take us as low as 40 minutes; we observed 50 minutes on the (differently specced) EC2 machine. As it was, with apparently insufficient RAM, the eight and six core runs also came in at about the hour mark. Recall that a previous email I sent to the RT contained EC2 timings that also seemed to support these conclusions.

As preparation for this email, I also ran two passing test runs on a hyperthreaded 16 core EC2 machine, with an effective 32 cores and 60 GB RAM. As mentioned above, it did not go into swap (it didn't have swap). The two runs took about 36.5 and 35 minutes. During the last five minutes of each run, most of the processes had already finished; if testr could spread the work around better (understanding layers better might be necessary), perhaps we could get that closer to the half hour mark.

I'll leave it to Francis, Robert, and IS now to determine what kind of machines to get for our two slaves. If you'd like lots more details as to what I did, read on. Otherwise, I think that's a decent summary.

Details:

The format is HOURS:MINUTES:SECONDS.MILLISECONDS. These are records of individual runs, not averages. If we want a bigger sample size, just ask. I haven't seen a lot of variance on the EC2 machines, and the successful runs I've had in the data center have been within about 4 minutes of one another.

8 cores: 1:02:57.579 and 59:27.242
6 cores: 1:02:45.707 and 1:05:35.201
5 cores: 1:04:53.690
4 cores: 1:16:44.012 and 1:13:45.777
2 cores: 2:26:31.700
1 core: 4:44:10.200

I was a bit confused about the similarity between the 8, 6, and 5 core times, since we saw a definite difference between these concurrency levels in the 8 core EC2 machine concurrency tests. I investigated a bit, and I suspect that the primary problem is that the machine only has 12 GB. We determined in our tests on EC2 that the machine needs at least 2 GB per core/concurrent LP process, and perhaps more. If that's correct, then we would expect the test times to keep scaling down linearly if this machine had at least 16 GB. For reference, the 8 core EC2 machines we are using (m2.4xlarge) have 68.4 GB RAM, and run these tests somewhere between 49 and 52 minutes, depending on the run.

Our hypothesis from EC2 was that the time spent on a test run is a combination of a constant, representing layer setup time that must be duplicated on every LXC container's test run, plus linearly divided effort.
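To make that hypothesis concrete, here is a small Python sketch of the model. The constants are just the rough data-center figures from the summary above (about 280 minutes of divisible work, a bit under 10 minutes of per-process setup), not carefully fitted values, so treat the output as illustrative only.

    # Hypothesized model: run_time(cores) = divisible_work / cores + per_process_setup.
    # The constants are the rough figures quoted in the summary above.
    DIVISIBLE_WORK_MIN = 280.0   # total minutes of actual test work
    SETUP_MIN = 10.0             # layer setup duplicated in every worker process

    def predicted_run_minutes(cores):
        """Predicted wall-clock minutes for a run at the given concurrency."""
        return DIVISIBLE_WORK_MIN / cores + SETUP_MIN

    for cores in (1, 2, 4, 5, 8):
        print("%d cores: about %.0f minutes" % (cores, predicted_run_minutes(cores)))

With these constants the sketch predicts roughly 290, 150, 80, 66, and 45 minutes respectively, which is in the same ballpark as the data center timings above (where RAM allowed).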
Stated as an equation:

[time spent on a test run] = [total time performing actual tests] / [number of cores] + [layer setup time, duplicated for each process]

If we ignore the 8 core timings above because of the memory issue, and do the math comparing the 6 core and 1 core times, rounding the run times to minutes, I get these results:

total time performing tests (not setting up layers): about 265 minutes (4:25)
layer setup time duplicated on each core: about 19 minutes

If this is true, then...

4 cores should be 85.25 minutes (1:25)
2 cores should be 151.5 minutes (2:32)

As you can see, that's within about 10 minutes of the observed times--not a very good match, to be honest. If I use the timings for the 4 core runs instead of the 6 core runs, I get about 279 minutes of work and 5 minutes of setup; this predicts the 2 core number within a minute or two (2:25 predicted, 2:26.5 actual). 5 cores would be predicted at 61 minutes, and we actually got about 65--pretty close. 6 and 8 cores start futzing out, again, in my estimation because of the low RAM. (The short sketch below redoes this arithmetic in code.)

Because of all this, I'm inclined to say that we need a bit *more* than 2 GB per process for these tests--maybe about 2.4 GB each. Since each process also stores all of its disk writes in memory, that seems somewhat reasonable. I should add that I tracked free -m during one of the test runs, and swap was only used a bit, but perhaps only a bit is all it takes to noticeably affect the speed.

Both Francis and Robert have asked about using one of the cc2.8xlarge instances on EC2 to see how that affects the tests. Since we are now at the stage of trying to decide what machine to buy, I decided to pursue that now as well. For reference, these are 16 cores, hyperthreaded to 32 cores, with 60.5 GB RAM (not quite enough according to our calculations, but close: too small by 3.5 GB at 2 GB/process, or by 16.3 GB at 2.4 GB/process). This was a finicky setup, requiring some tweaks to wait long enough for all the LXC containers to spin up. Once we had that, the first run took 36 minutes 27 seconds, and the second 35 minutes 12 seconds.

I noticed that the last five minutes of both runs were spent with very few of the LXC containers still running. In fact, the first LXC container to finish in the second run was done more than 10 minutes before the last LXC container finished. I roughly guess that this 32 core instance could finish about 5 minutes faster if testr could spread the tests around better. I suspect that the issue is that testr does not know that layers are not tests, and the timings for them throw its calculations off.
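Returning to the constant-plus-linear arithmetic above, here is the promised minimal Python sketch. It uses only the run times and RAM figures already quoted in this email; nothing here is new data.

    # Solve run_minutes(cores) = work / cores + setup for the two unknowns,
    # using the 1 core run and the average of the two 4 core runs on the
    # data center machine, then check the predictions for other core counts.

    def minutes(h=0, m=0, s=0):
        return h * 60 + m + s / 60.0

    one_core = minutes(4, 44, 10)                              # 4:44:10.200
    four_core = (minutes(1, 16, 44) + minutes(1, 13, 46)) / 2  # 1:16:44 and 1:13:46

    work = (one_core - four_core) / (1.0 - 1.0 / 4)
    setup = one_core - work
    print("divisible work: about %.0f minutes; per-process setup: about %.0f minutes"
          % (work, setup))

    for cores in (2, 5, 6, 8):
        print("%d cores predicted: about %.0f minutes" % (cores, work / cores + setup))

    # Rule-of-thumb RAM check at 2.4 GB per concurrent process:
    for procs, machine_gb in ((8, 12.0), (32, 60.5)):
        print("%d processes want about %.1f GB; machine has %.1f GB"
              % (procs, procs * 2.4, machine_gb))

This comes out at about 279 minutes of work and 6 minutes of setup (rounding noise puts it a minute off the hand calculation above), with 2 cores predicted at about 145 minutes and 5 cores at about 61.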
This concludes the report on the timings for the data center test machine and the 32 core EC2 machine. For easy reference, I include below a copy of the email I sent on March 22 to RT 50242 about the timing information we gathered on EC2 eight core machines.

Thanks

Gary

---------------------------------------------------------
[Email from myself to RT 50242 on March 22]

I am reporting test results so far. This is the summary: on an EC2 m2.4xlarge, cores affect test run time roughly linearly up to eight cores, approximately following [time] = 4 hours/[number of cores] + 20 minutes. This may be enough information for Robert and Francis to advise IS on the kind of processor we want. I expect we will want some additional tests on the machine in the data center when it is ready, to discover its slope. Details follow.

We have run tests on an EC2 machine. The following are all on m2.4xlarge ("High-Memory Quadruple Extra Large Instance") instances. These are the specs, taken from http://aws.amazon.com/ec2/instance-types/ :

68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High

We used buildbot to run and time the tests, with a setup that can be duplicated by following the steps described in the Juju buildbot master README file for initializing a Launchpad test environment. We hacked /usr/lib/python2.7/dist-packages/testrepository/testcommand.py to report different local concurrency levels and otherwise ran all tests identically, by forcing a build. Example:

    def local_concurrency(self):
        return 3

Tests were assigned to each process by testr using round-robin. We ran the eight core version 5 times, but all the others were only run once. Times were obtained by looking at the buildbot master for the time buildbot recorded for running "testr run --parallel", as found on pages such as /builders/lucid_lp/builds/0/steps/shell_8. Each test run had fewer than five failures, although the failures vary across runs. Values here are rounded to the nearest minute.

1 core: 4:17
2 cores: 2:23
3 cores: 1:40
4 cores: 1:21
5 cores: (not run)
6 cores: 0:59
7 cores: (not run)
8 cores: 0:51

These times roughly correspond to the following equation:

[time] = 4 hours/[number of cores] + about 20 minutes

I do not plan to run 5 core and 7 core tests unless requested. For interest, if you do not use the /dev/random hack I mentioned previously, you get these sorts of results:

1 core without /dev/random hack: 4:50
8 cores without /dev/random hack: 3:47

To get a comparable idea of performance on the machine in the data center, we probably should run tests with [max] cores, 1 core, and [max/2] cores. We can extrapolate a line from that and roughly verify it, assuming that it is linear. That said, I think we already have reasonable evidence that the parallel tests scale roughly linearly up to eight cores. I believe this should inform Robert and/or Francis on what kinds of processors would bring us the desired balance of improvement versus cost.

Thanks

Gary
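As a postscript on the "extrapolate a line" suggestion in the forwarded email: here is a minimal least-squares sketch in Python. The data points are just the rounded m2.4xlarge timings quoted above; the same few lines could be pointed at the data center machine's 1 core, [max/2] core, and [max] core timings once we have them. Treat it as an illustration, not part of the measured results.

    # Fit run_minutes(cores) = work * (1/cores) + setup by ordinary least squares.
    # The observations are the rounded m2.4xlarge timings from the email above.
    observations = [(1, 257), (2, 143), (3, 100), (4, 81), (6, 59), (8, 51)]

    xs = [1.0 / cores for cores, _ in observations]
    ys = [float(mins) for _, mins in observations]
    n = len(observations)

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    work = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) /
            sum((x - mean_x) ** 2 for x in xs))   # slope: divisible test work
    setup = mean_y - work * mean_x                # intercept: per-process setup

    print("fit: time ~= %.0f min / cores + %.0f min" % (work, setup))

On these numbers the fit comes out at roughly 237 minutes divided by the core count plus about 21 minutes of setup, which matches the "4 hours/[number of cores] + about 20 minutes" rule of thumb above.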