Sorry, I forgot to put an attachment: ...

>> > But we also need to remember that
>> >
>> > 1. Solve time may dominate assembly anyway, so that's where we should optimize.
>> >
>> > 2. Assembling the action instead of the operator removes the A.add() bottleneck.
>> >
>> > As mentioned before, we are experimenting with iterating locally over cells sharing common dofs and combining batches of element tensors before inserting into the global sparse matrix row by row. Let's see how it goes.
>> >
>> > Some other comments:
>> >
>> > Murtazo: It's interesting that femLego is that much faster. It would be interesting to see exactly why. Can you take a simple example (maybe even Poisson) and profile both femLego and DOLFIN on the same mesh to get detailed results for tabulate_tensor, tabulate_dofs, and A.add()? If femLego is faster on A.add(), then what linear algebra backend is it using?
>>
>> Yes, the test we did is a simple 2D Poisson problem on a unit square mesh, and assembly in femLego is 3 times faster, because A.add() is done in the way I described in previous mails. The linear algebra package is AZTEC. In fact, DOLFIN should be much faster than femLego if A.add() were the same, since FFC is much faster than quadrature-based assembly.
>
> I thought AZTEC was just solvers. I mean what sparse matrix format is used, and what does the interface look like for communicating the exact position for insertion?

The sparse matrix is just a double* atw; attached I send you the subroutine which does this A.add(): idxatw(el,li,lj) is the index into the global matrix for cell = el, row = li, col = lj.

>> > Murtazo: It seems you suggest we should basically store some kind of index/pointer for where to write directly to memory when adding the entries. This has two problems: (i) we need to store quite a few of these indices (n^2*num_cells where n is the local space dimension), and (ii) it requires cooperation from the linear algebra backend.
>> > We would need to ask PETSc to tell us where in its private data structures it inserted the entries.
>>
>> Yes, maybe there is a better way to do it. If we store the global indices of A, it will be in total A.nz()*numVertices*num_components, but we would still get a speedup, which is more important in some cases.
>
> That doesn't look correct to me. I think we would need n^2*num_cells. We iterate over the cells, and for each cell we have n^2 entries, and we need to know where to insert each one of those.

I have contributed to and have experience with femLego (I did my degree project, "exjob", with it). This approach works well in parallel too. The storage may not be a problem for a very large mesh, since one needs to use parallel processors anyway.

murtazo
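To make the trade-off concrete, here is a small sketch of the idea under discussion, assuming a plain CSR layout; all names here are illustrative, not actual femLego or DOLFIN/PETSc code, and the real addmel.f may differ. The idea: precompute, per cell, the positions of its element-tensor entries in the sparse value array (an idxatw-style table, which indeed costs n^2*num_cells integers, as noted above), and then do A.add() as direct writes with no per-entry search.

```python
import bisect

def build_csr_pattern(dofmap, num_dofs):
    """Build the CSR sparsity pattern (indptr, indices) from a
    cell -> global-dofs map."""
    cols = [set() for _ in range(num_dofs)]
    for cell_dofs in dofmap:
        for i in cell_dofs:
            cols[i].update(cell_dofs)
    indptr, indices = [0], []
    for i in range(num_dofs):
        indices.extend(sorted(cols[i]))
        indptr.append(len(indices))
    return indptr, indices

def precompute_insertion_indices(dofmap, indptr, indices):
    """For each cell, record where each of its n*n element-matrix
    entries lives in the CSR value array: the idxatw-style table,
    n^2 * num_cells integers in total."""
    table = []
    for cell_dofs in dofmap:
        cell_positions = []
        for i in cell_dofs:
            lo, hi = indptr[i], indptr[i + 1]
            for j in cell_dofs:
                # binary search for column j within row i
                cell_positions.append(bisect.bisect_left(indices, j, lo, hi))
        table.append(cell_positions)
    return table

def assemble(element_tensors, table, nnz):
    """A.add() as direct writes into the value array."""
    vals = [0.0] * nnz
    for positions, Ae in zip(table, element_tensors):
        for pos, a in zip(positions, Ae):
            vals[pos] += a
    return vals

# Two triangles sharing an edge (dofs 1 and 2); element tensors are
# flattened row-major, here simply all ones so overlaps sum to 2.
dofmap = [(0, 1, 2), (1, 2, 3)]
indptr, indices = build_csr_pattern(dofmap, 4)
table = precompute_insertion_indices(dofmap, indptr, indices)
vals = assemble([[1.0] * 9, [1.0] * 9], table, indptr[-1])
```

Note that this only works because the sketch owns its value array, which is exactly problem (ii): femLego can do this since its matrix is just a double*, whereas with PETSc the insertion positions live in the backend's private data structures.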
Attachment:
addmel.f
Description: Binary data