[personal profile] robbat2

I meant to get back to doing more statistics on Bugzilla, but it fell by the wayside. The following is mainly for completeness, and for the interest of those wondering why Bugzilla has been so bog slow for Gentoo in the past.

First of all, I had some questions as to why I focused on specific actions in Bugzilla. The truth is that we can break down Bugzilla's usage of the database into three categories:

  1. Changes to bugs (INSERT, UPDATE)
  2. Loads of specific bugs and attachments (SELECT with a primary key)
  3. Searches for bugs (Complex SELECT)
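
As a rough illustration of those three categories, here is a minimal sketch. It assumes a made-up, heavily simplified `bugs` table (Bugzilla's real schema is much larger) and uses Python's built-in `sqlite3` rather than MySQL, purely so that it runs on its own.

```python
import sqlite3

# Hypothetical, heavily simplified schema; Bugzilla's real tables differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bugs ("
    " bug_id INTEGER PRIMARY KEY,"
    " status TEXT NOT NULL,"
    " summary TEXT NOT NULL)"
)

# 1. Changes to bugs: INSERT and UPDATE.
conn.execute("INSERT INTO bugs VALUES (1, 'NEW', 'emerge fails on amd64')")
conn.execute("INSERT INTO bugs VALUES (2, 'NEW', 'bugzilla search is slow')")
conn.execute("UPDATE bugs SET status = 'RESOLVED' WHERE bug_id = 1")

# 2. Load of one specific bug: SELECT with a primary key.
by_key = conn.execute(
    "SELECT summary FROM bugs WHERE bug_id = ?", (2,)
).fetchone()

# 3. Search: a substring match combined with other conditions.
search = conn.execute(
    "SELECT bug_id FROM bugs"
    " WHERE summary LIKE ? AND status = ?"
    " ORDER BY bug_id",
    ("%slow%", "NEW"),
).fetchall()

print(by_key, search)
```

The third query is the expensive kind: a `LIKE` pattern with a leading wildcard generally cannot use an ordinary B-tree index, so adding more indexes does little for it.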

Unfortunately, the usage patterns are heavily against Bugzilla here. Searches for bugs using some string plus a variety of conditions are the most common action. Benchmarking slow queries? That's pretty much any of the complex SELECTs. "Add more indexes," I hear some people shouting. The indexes are already nearly the same size as the actual dataset they index (400 MB of index for 500 MB of data)! There is an index on every field that is used for searching! One of the problems is that MySQL trashes its caches on UPDATEs and INSERTs in many cases, so it spends a lot of time reloading them.

Bugzilla could massively benefit from an external text indexing system like Apache Lucene, which can handle live modifications to the index without wasting anything. Changes are fed in real time to the index, and searches for text are performed against the dedicated index (which can also be parallelized easily).
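
To make the idea concrete, here is a toy in-memory inverted index. It is not Lucene and does not use Lucene's API; the class and method names are invented for illustration. It only shows the property that matters here: re-indexing one bug touches just that bug's terms, and text searches never go near the SQL tables.

```python
from collections import defaultdict


class TinyTextIndex:
    """Toy in-memory inverted index; illustrative only, not Lucene."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of bug ids
        self._terms_by_bug = {}            # bug id -> terms currently indexed

    def update(self, bug_id, text):
        """Index or re-index one bug without rebuilding anything else."""
        new_terms = set(text.lower().split())
        old_terms = self._terms_by_bug.get(bug_id, set())
        for term in old_terms - new_terms:
            self._postings[term].discard(bug_id)
        for term in new_terms - old_terms:
            self._postings[term].add(bug_id)
        self._terms_by_bug[bug_id] = new_terms

    def search(self, query):
        """Return the sorted ids of bugs containing every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return sorted(result)


index = TinyTextIndex()
index.update(1, "emerge fails on amd64")
index.update(2, "search is slow")
index.update(1, "emerge is slow on amd64")  # live edit of bug 1
hits_slow = index.search("is slow")
hits_fails = index.search("fails")
print(hits_slow, hits_fails)
```

A real text index adds tokenizing, stemming, ranking and on-disk segments on top of this, but the update path has the same shape: a change to one bug is fed straight to the index instead of invalidating a shared cache.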

More numbers

Stuart asked for some more actual numbers, so I've put them together.

Breakdown by Request Type

Type              Mean    Max    Min
Total GET        53809  60409  45094
  Static GET     35401  39774  28797
  Dynamic GET    18407  20635  16223
Total POST        1394   1569   1106
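
For a quick sense of proportion, the mean column can be turned into shares. This is only arithmetic on the figures above; the variable names are mine. The static and dynamic means sum to 53808, one short of the listed total, which is presumably just rounding.

```python
# Mean request counts copied from the table above.
total_get = 53809
static_get = 35401
dynamic_get = 18407
total_post = 1394

static_share = static_get / total_get               # static part of all GETs
dynamic_share = dynamic_get / total_get             # dynamic part of all GETs
post_share = total_post / (total_get + total_post)  # POSTs among all requests

print(f"static GET:  {static_share:.1%} of GETs")
print(f"dynamic GET: {dynamic_share:.1%} of GETs")
print(f"POST:        {post_share:.1%} of all requests")
```

So roughly two thirds of GETs are static, and POSTs are only a few percent of all requests.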

Next, some graphs, showing a breakdown vs. UTC hour of the day


More graphs, showing histograms of GET and POST requests per minute.


I've trimmed off a very long flat tail from the static GET histogram, with its final data point being one recorded case of 770 static requests in a single minute.

Edit: I've also constructed normalized histograms now, giving the probability instead of the raw value.
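
The normalization itself is simple: divide each bin's count by the total number of observations. Here is a minimal sketch, assuming the histogram is held as a mapping from requests-per-minute to the number of minutes observed. The sample counts are invented for the example and are not the real data.

```python
def normalize_histogram(counts):
    """Turn raw bin counts into probabilities that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("histogram is empty")
    return {bin_: n / total for bin_, n in counts.items()}


# Invented sample: requests per minute -> number of minutes observed.
raw = {0: 10, 1: 30, 2: 60}
probs = normalize_histogram(raw)
print(probs)
```

Normalized this way, histograms built from different numbers of samples can be compared directly.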


(no subject)

Date: 2006-12-10 04:04 am (UTC)
From: [identity profile] euglena.livejournal.com
is it wrong that i don't understand any of this nor have i tried to understand it?

(no subject)

Date: 2006-12-10 04:35 am (UTC)
From: [identity profile] robbat2.livejournal.com
nah.
It's for the folks that read specific portions of my blog elsewhere, like http://planet.gentoo.org/.
