If we can get the breaking point at k=2 or 3, then it might make sense to try optimizing the binary search tree code. We will have to fight the battle that "assembly time doesn't matter, it's solve time that counts". If we're only getting a 10-15% speedup on assembly time and assembly time is 20% of solve time, that isn't really anything to brag about (0.1 * 0.2 = 0.02, an overall speedup of 2%; I wouldn't try to publish that). On the other hand, we might try to find the best solver in hypre to use (Rob Falgout claims there are things that destroy his AMG for Poisson).
On the other hand, maybe we should think big. The asymptotics favor binary trees as the number of nonzeros per row increases. One could ask whether forming the Jacobian for MHD using primitive variables, with Taylor-Hood for the fluid and piecewise linear Lagrange for the electromagnetics, gives us enough dofs per row without going to high polynomial degree.
Finally, we must prioritize -- I think that getting some results for various formulations of Stokes, measuring error in pressure & velocity versus dofs and nonzeros in the matrix, is a more interesting result and hence a more valuable use of time.
Any thoughts?

Rob

On Sep 26, 2005, at 7:09 PM, Andy Ray Terrel wrote:
Yeah, I will run some long scripts; I hadn't tried to blow the cache. It seemed anything that didn't blow the cache had the same ratio between the different assemblies.

Andy

Anders Logg wrote:

Good point. Andy did Stokes originally though, with similar results. We also tried a number of different problem sizes, but not higher degree. Maybe Andy could make some more extensive testing now that we both have looked at the code and know it's doing what it's supposed to do.

/Anders

On Mon, Sep 26, 2005 at 05:47:01PM -0500, Robert C. Kirby wrote:

Here is another thing to do before jumping into optimization: you have some knobs, turn them:

1.) *Much* larger problem. Get out of cache, just in case.
2.) Try higher polynomial degree. The number of nonzeros on a given row just isn't that large for linears. Try k=2,3,4,5.
3.) Try a mixed operator (say Stokes) -- there will be even more nonzeros per row.

PETSc internally has an O(n^2) algorithm, you are doing O(n log n). Don't give up on linears -- most n log n algorithms lose for low n. Once you find a turning point (if ever), then worry about optimizing.

Rob

On Sep 26, 2005, at 3:46 PM, Anders Logg wrote:

Hey everyone. Andy (Terrel) has been trying out some new ways to improve the speed of assembly in DOLFIN. We just sat down to make some timings and here are the results/conclusions. Comments and suggestions are welcome.
The basic idea is to store the sparse matrix as a std::vector of std::maps (each map implemented as a binary tree):

    std::vector<std::map<int, real> > rows(M);

With this data structure, we replace the line

    A.add(block, test_dofs, m, trial_dofs, n);

which just calls PETSc MatSetValues, with the following code for insertion into the sparse matrix:

    for (uint i = 0; i < m; i++)
    {
      std::map<int, real>& row = rows[test_dofs[i]];
      for (uint j = 0; j < n; j++)
      {
        const int J = trial_dofs[j];
        std::map<int, real>::iterator iter = row.find(J);
        const real val = block[i*n + j];
        if ( iter == row.end() )
          row.insert(std::map<int, real>::value_type(J, val));
        else
          (*iter).second += val;
      }
    }

This is done once for each element. At the end of assembly, we copy the values into a PETSc matrix.

The result: PETSc MatSetValues is about twice as fast as the STL. Here are the numbers, broken into timings for the different stages of assembly (times in seconds for assembly of Poisson on a 128x128 triangular mesh):

    Iteration over mesh + misc stuff      0.04
    Mapping dofs, updating affine map     0.23  (this can be faster)
    Computing element matrix              0.01  :-)
    Inserting into STL binary tree        0.58
    PETSc assemble begin/end              0.05
    Copying data to PETSc matrix          0.18  (this can be faster)

As a comparison, just calling PETSc MatSetValues takes 0.24 s, which should be compared to the sum 0.58 s + 0.18 s.

Perhaps inserting into the STL binary tree could be improved if one could reserve the size of each map before assembly. (Anyone know if this is possible? Something like std::vector::reserve()?)

Again, PETSc wins. The results from previous benchmarks against PETSc seem to hold: PETSc is about twice as fast as a simple sparse data structure.
This seems to hold both when the sparse data structure is

    double** values;
    int** columns;

or when it is

    std::vector<std::map<int, real> > rows(M);

/Anders

_______________________________________________
DOLFIN-dev mailing list
DOLFIN-dev@xxxxxxxxxx
http://www.fenics.org/cgi-bin/mailman/listinfo/dolfin-dev

--
====================
Andy Terrel
Computer Science Dept
University of Chicago
aterrel@xxxxxxxxxxxx
---------------------