launchpad-dev team mailing list archive
  
Message #03574
  
 Help me understand bug data
  
Hello to everyone who would like to solve a puzzle,
In response to a diagram that mpt drew for me on the back of a napkin
in a pub in Dublin, I made this:
 http://people.canonical.com/~jml/convergence/
You can get the code from:
  lp:~jml/+junk/convergence
The graph is pretty, but the numbers are wrong. Although some of this
is due to my faulty processing, I suspect that some of it is due to
corrupt data.
I don't really know what's going on, so I'll just give you the data in
hopes that you can tell me.
The scope of the data is "bugtasks on launchpad-project" as returned
by IPillar.searchTasks.
Counting by the timeline algorithm that generates the graph, the
numbers at the far right are:
  "New" 774
  "Confirmed" 1057
  "Triaged" 4119
  "In progress" 81
  "Fix committed" 72
  "Closed" 11174
(That's 17277 in total, 6103 open)
Counting by status in the database:
  "New" 105
  "Incomplete" 162
  "Confirmed" 246
  "Triaged" 5165
  "In Progress" 74
  "Fix Committed" 49
  "Fix Released" 8794
  "Invalid" 2161
  "Won't Fix" 521
(That's 105 + 162 + 246 + 5165 + 74 + 49 = 5801 open bugtasks and 8794
+ 2161 + 521 = 11476 closed bugtasks)
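The two sets of tallies above can be re-checked mechanically. A small sketch that uses only the counts quoted in this message; the variable names are illustrative, and the open/closed grouping follows the sums above:

```python
# Counts at the far right of the timeline graph, copied from this message.
timeline_counts = {
    "New": 774,
    "Confirmed": 1057,
    "Triaged": 4119,
    "In progress": 81,
    "Fix committed": 72,
    "Closed": 11174,
}

# Counts by current status in the database, copied from this message.
status_counts = {
    "New": 105,
    "Incomplete": 162,
    "Confirmed": 246,
    "Triaged": 5165,
    "In Progress": 74,
    "Fix Committed": 49,
    "Fix Released": 8794,
    "Invalid": 2161,
    "Won't Fix": 521,
}

# Statuses treated as closed in the sums above.
CLOSED = {"Fix Released", "Invalid", "Won't Fix"}

timeline_total = sum(timeline_counts.values())
timeline_open = timeline_total - timeline_counts["Closed"]
status_open = sum(n for s, n in status_counts.items() if s not in CLOSED)
status_closed = sum(n for s, n in status_counts.items() if s in CLOSED)

print(timeline_total, timeline_open)              # 17277 6103
print(status_open + status_closed, status_open)   # 17277 5801
```

Both methods agree on the total of 17277 and differ only in how it is split between open and closed.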
Counting by BugTask.is_complete:
  false 5801
  true 11476
(yay consistency!)
Counting by "BugTask.date_closed == null":
  false 11208
  true 6069
(boo, inconsistency, although it's still the same total)
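One way to chase this inconsistency is to list the bugtasks on which the two tests disagree. A minimal sketch, assuming each task exposes is_complete and date_closed as named above; the Task class is a stand-in for illustration, not the Launchpad API object:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class Task:
    """Stand-in for a bugtask; holds only the fields compared here."""
    status: str
    is_complete: bool
    date_closed: Optional[datetime]


def disagreements(tasks: Iterable[Task]) -> List[Task]:
    """Return tasks whose is_complete flag and date_closed disagree."""
    return [t for t in tasks if t.is_complete != (t.date_closed is not None)]


sample = [
    Task("Fix Released", True, datetime(2010, 6, 1)),
    Task("Invalid", True, None),  # complete, but no close date recorded
    Task("Triaged", False, None),
]
for task in disagreements(sample):
    print(task.status)  # prints "Invalid"
```

The gap between the two counts is 11476 - 11208 = 268, which happens to equal the number of dateless "Invalid" bugtasks listed further down, so grouping the disagreements by status would be a natural first step.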
Bugtasks that have no dates, other than date_created:
  "Confirmed" 1
  "Incomplete" 162
  "Invalid" 268
  "New" 105
  "Triaged" 234
To summarize, about 500 bug tasks (1 + 268 + 234 = 503) are being
counted as "New" that should not be, because none of their date_*
fields is set. Another 162 are being counted as "New" because they are
"Incomplete" and there's no exposed date_incomplete field (see
https://bugs.launchpad.net/bugs/591975).
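To illustrate why dateless bugtasks end up as "New", here is a hypothetical date-driven classifier. It is not the code in lp:~jml/+junk/convergence: it assumes the bucket at a given moment is inferred from the most recent date field at or before that moment, with "New" as the fallback. The field names and the field-to-bucket mapping are assumptions made for this sketch.

```python
from collections import Counter
from datetime import datetime

# Hypothetical mapping from a date field to the timeline bucket it implies.
FIELD_TO_BUCKET = {
    "date_confirmed": "Confirmed",
    "date_triaged": "Triaged",
    "date_in_progress": "In progress",
    "date_fix_committed": "Fix committed",
    "date_closed": "Closed",
}


def bucket_at(task, when):
    """Infer a task's bucket at `when` from its recorded dates."""
    seen = [
        (task[field], bucket)
        for field, bucket in FIELD_TO_BUCKET.items()
        if task.get(field) is not None and task[field] <= when
    ]
    if not seen:
        # No usable date: the task looks as if it never left "New".
        return "New"
    return max(seen, key=lambda pair: pair[0])[1]


def miscounted_as_new(tasks, when):
    """Count, by real status, tasks bucketed as "New" that are not New."""
    return Counter(
        t["status"] for t in tasks
        if bucket_at(t, when) == "New" and t["status"] != "New"
    )


now = datetime(2010, 6, 10)
sample = [
    {"status": "Triaged", "date_triaged": datetime(2009, 3, 1)},
    {"status": "Triaged"},     # stray: no dates besides date_created
    {"status": "Incomplete"},  # no date_incomplete field to consult
    {"status": "New"},
]
print([bucket_at(t, now) for t in sample])
# ['Triaged', 'New', 'New', 'New']
```

Under these assumptions, miscounted_as_new(sample, now) reports one "Triaged" and one "Incomplete" task, mirroring the two kinds of miscount described above.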
Depending on how you count, there are either 6103, 6069 or 5801 open bugtasks.
Note that I gathered my data anonymously through the API. If you are
doing queries against the production database you might find more
bugtasks.
You'll also notice a discrepancy in the ("Confirmed", "Triaged")
counts: (1057, 4119) in the timeline versus (246, 5165) by current
status. If you add the (1, 234) strays to the timeline counts, the
combined total matches the status count (5411 either way), so at
least the sums check out.
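The reconciliation can be spelled out as arithmetic. A short check using only the numbers quoted in this message (the dictionary names are illustrative):

```python
timeline = {"Confirmed": 1057, "Triaged": 4119}
strays = {"Confirmed": 1, "Triaged": 234}
by_status = {"Confirmed": 246, "Triaged": 5165}

timeline_plus_strays = sum(timeline.values()) + sum(strays.values())
status_total = sum(by_status.values())
print(timeline_plus_strays, status_total)  # 5411 5411

# Per-status gap between (timeline + strays) and the current status.
gaps = {
    name: timeline[name] + strays[name] - by_status[name]
    for name in by_status
}
print(gaps)  # {'Confirmed': 812, 'Triaged': -812}
```

The totals agree, but the split does not: 812 bugtasks show up as "Confirmed" in the timeline while their current status is "Triaged".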
If you know what's going on, please tell me. I suspect some kind of
botched data migration when introducing a new status or deleting an
old one, but I'd love to know for sure.
jml