[gentoo] Bugzilla statistics followup
Dec. 9th, 2006 06:25 pm
I meant to get back to doing more statistics on Bugzilla, but it fell by the wayside. The following is mainly for completeness, and for those interested in why Bugzilla has been so bog slow for Gentoo in the past.
First of all, I had some questions as to why I focused on specific actions in Bugzilla. The truth is that we can break down Bugzilla's usage of the database into three categories:
- Changes to bugs (INSERT, UPDATE)
- Loads of specific bugs and attachments (SELECT with a primary key)
- Searches for bugs (Complex SELECT)
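To make the three categories concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `bugs` table and its columns are a simplified, hypothetical stand-in and not Bugzilla's real schema, and SQLite stands in for MySQL purely so the example runs anywhere.

```python
import sqlite3


def demo():
    """Run one query of each category against a toy, in-memory table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, heavily simplified schema (not Bugzilla's real one).
    conn.execute(
        "CREATE TABLE bugs ("
        "bug_id INTEGER PRIMARY KEY, summary TEXT, status TEXT, product TEXT)"
    )

    # 1. Changes to bugs: INSERT and UPDATE.
    rows = [
        ("emerge fails on amd64", "NEW", "Portage"),
        ("kernel panic on boot", "NEW", "Kernel"),
    ]
    conn.executemany(
        "INSERT INTO bugs (summary, status, product) VALUES (?, ?, ?)", rows
    )
    conn.execute("UPDATE bugs SET status = ? WHERE bug_id = ?", ("ASSIGNED", 1))

    # 2. Load of a specific bug: SELECT on the primary key.
    one = conn.execute(
        "SELECT summary, status FROM bugs WHERE bug_id = ?", (1,)
    ).fetchone()

    # 3. Search: a string match plus a variety of other conditions.
    found = conn.execute(
        "SELECT bug_id FROM bugs "
        "WHERE summary LIKE ? AND status IN ('NEW', 'ASSIGNED') AND product = ? "
        "ORDER BY bug_id",
        ("%emerge%", "Portage"),
    ).fetchall()

    conn.close()
    return one, found
```

The third query is the expensive kind: a substring match with a leading wildcard generally cannot be answered from an ordinary B-tree index, so it ends up scanning rows no matter how many indexes exist on the other columns.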
Unfortunately, the usage patterns are heavily against Bugzilla here. Searches for bugs using some string plus a variety of conditions are the most common action. Benchmarking slow queries? That's pretty much any of the complex SELECTs. "Add more indexes," I hear some people shouting. The indexes are already nearly the same size as the actual dataset they index (400 MB of index for 500 MB of data)! There is an index on every field that is used for searching! One of the problems is that MySQL trashes its caches on UPDATEs and INSERTs in many cases, so it spends a lot of time reloading them.
Bugzilla could massively benefit from an external text indexing system like Apache's Lucene, which can handle live modifications to the index without wasting anything. Changes are fed to the index in real time, and searches for text are performed against the dedicated index (which can also be parallelized easily).
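The idea of a dedicated text index that takes changes as they happen can be sketched with a toy inverted index. This is only an illustration of the concept: `TinyTextIndex` and its methods are made-up names, and neither the API nor the internals resemble what Lucene actually does.

```python
import re
from collections import defaultdict


class TinyTextIndex:
    """Toy inverted index: term -> set of bug ids (hypothetical, not Lucene)."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of bugs containing it
        self._terms_by_bug = {}            # bug id -> terms currently indexed

    @staticmethod
    def _tokenize(text):
        # Lowercase and split into alphanumeric terms.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def update(self, bug_id, text):
        """Index new or changed text, touching only the terms that differ."""
        new_terms = self._tokenize(text)
        old_terms = self._terms_by_bug.get(bug_id, set())
        for term in old_terms - new_terms:
            self._postings[term].discard(bug_id)
        for term in new_terms - old_terms:
            self._postings[term].add(bug_id)
        self._terms_by_bug[bug_id] = new_terms

    def search(self, query):
        """Return the ids of bugs that contain every term in the query."""
        terms = self._tokenize(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = self._postings.get(term, set())
            result = set(ids) if result is None else result & ids
        return result
```

An edit to one bug only touches the postings for the terms that changed, so the rest of the index stays valid, which is the property that matters here.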
More numbers
Stuart asked for some more actual numbers, so I've put them together.
Breakdown by Request Type

Type | Mean | Max | Min |
---|---|---|---|
Total GET | 53809 | 60409 | 45094 |
—Static GET | 35401 | 39774 | 28797 |
—Dynamic GET | 18407 | 20635 | 16223 |
Total POST | 1394 | 1569 | 1106 |
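For a quick sense of proportion, the mean column can be turned into shares. The sketch below uses only the mean values from the table; the function name and dictionary keys are made up for illustration. Note that the static and dynamic means add up to 53808, one short of the Total GET mean, presumably due to rounding.

```python
def request_shares(total_get, static_get, dynamic_get, post):
    """Express the table's mean counts as fractions."""
    all_requests = total_get + post
    return {
        "static_share_of_get": static_get / total_get,
        "dynamic_share_of_get": dynamic_get / total_get,
        "post_share_of_all": post / all_requests,
    }


# Mean values taken from the table above.
shares = request_shares(
    total_get=53809, static_get=35401, dynamic_get=18407, post=1394
)
for name, value in shares.items():
    print(f"{name}: {value:.1%}")
```

Roughly two thirds of the GET requests are static and one third dynamic, while POSTs make up only about 2.5% of all requests.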
Next, some graphs showing a breakdown by UTC hour of the day.
More graphs, showing histograms of GET and POST requests per minute.
I've trimmed off a very long flat tail from the static GET histogram, with its final data point being one recorded case of 770 static requests in a single minute.
Edit: I've also constructed normalized histograms now, giving the probability instead of the raw value.
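Normalizing a histogram just means dividing each bin's count by the total number of observations, so the values sum to 1 and read as probabilities. Here is a minimal sketch, assuming the input is a list of requests-per-minute samples; the function name and the sample values in the usage note are made up and are not the data behind the graphs.

```python
from collections import Counter


def normalized_histogram(requests_per_minute):
    """Map each observed per-minute request count to its relative frequency."""
    counts = Counter(requests_per_minute)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Dividing each raw count by the number of samples gives a probability.
    return {value: n / total for value, n in sorted(counts.items())}
```

For example, `normalized_histogram([10, 12, 10, 11, 10, 12, 13, 10])` gives a probability of 0.5 for a minute with 10 requests, since four of the eight made-up samples have that value.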
(no subject)
Date: 2006-12-10 04:04 am (UTC)
(no subject)
Date: 2006-12-10 04:35 am (UTC)
It's for the folks that read specific portions of my blog elsewhere, like http://planet.gentoo.org/.