Dec. 16th, 2008

robbat2: (Default)

I was compiling some statistics on the Gentoo mirrors to help with future planning, and thought that the indirect crowd that reads my blog via the various aggregators might be interested in the numbers.

This is the traffic for boobie.gentoo.org, a newer box in the official rsync.gentoo.org rotation that is directly maintained by the Infrastructure team. Hardware specs are 2x Xeon 3050 @ 2.13GHz, 4GB RAM. Disk is mostly irrelevant - the rsync workload is served purely from RAM (tail-packing reiserfs, backed via a loop device pointing to a file on tmpfs).

Inbound traffic is spiky, but does not exceed 10Mbit by more than a little bit - we cap the inbound rsyncs from the rsync1 master at 10Mbit. Outbound traffic varies between 4Mbit and 9Mbit, with an average around 6-7Mbit.
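For mirror admins who think in transfer quotas rather than link rates, the outbound figures above can be converted roughly as follows. This is only a back-of-the-envelope sketch: it assumes the rate is sustained around the clock and uses decimal units (1Mbit = 10^6 bits, 1GB = 10^9 bytes), and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def mbit_per_s_to_gb_per_day(rate_mbit: float) -> float:
    """Convert a sustained rate in Mbit/s to GB transferred per day (decimal units)."""
    bits_per_day = rate_mbit * 1_000_000 * SECONDS_PER_DAY
    return bits_per_day / 8 / 1_000_000_000

# Low, average (6-7) and high outbound rates quoted above.
for rate in (4, 6, 7, 9):
    print(f"{rate} Mbit/s ~ {mbit_per_s_to_gb_per_day(rate):.1f} GB/day")
```

Under those assumptions, an average of 6-7Mbit works out to roughly 65-76GB of outbound traffic per day.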


Now for the second set of statistics. These aren't directly useful to mirrors in estimating their traffic, but instead give a good overview of how our mirroring setup works internally, and how much traffic is involved in the fan-out stage. Distfiles are the main content moved around by this system, but it is also used for the other directories for releases, experimental and snapshots.

A very quick overview of the existing setup:

  1. Developer uploads new distfile directly to dev.gentoo.org.
  2. The master-distfiles box pulls from dev.gentoo.org hourly.
  3. The master-distfiles box checks every ebuild, and downloads any missing distfiles from their primary URI. The daily distfile report is also created at this point.
  4. Every hour, the cluster master of ftp.osuosl.org pulls the latest content from master-distfiles. (Averages 240MB/day of traffic).
  5. The OSL FTP cluster master (in Oregon) pushes to its slave locations in Atlanta and Chicago.
  6. All distfiles mirrors pick up their content from one of the FTP nodes - Internet2-connected hosts are directed via DNS to an Internet2-connected slave for performance.
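
Step 3 is essentially a set difference between what the ebuilds reference and what is already on disk. The sketch below is not the actual master-distfiles code, just an illustration of the idea; the function name, the dict-of-URIs input, and the example filenames and URLs are all made up.

```python
import tempfile
from pathlib import Path

def missing_distfiles(wanted: dict[str, str], distdir: Path) -> dict[str, str]:
    """Return {filename: primary_uri} for every wanted distfile not yet in distdir.

    `wanted` maps each distfile name referenced by an ebuild to its primary URI
    (hypothetical input; the real system derives this from the ebuilds).
    """
    present = {p.name for p in distdir.iterdir() if p.is_file()}
    return {name: uri for name, uri in wanted.items() if name not in present}

# Demo with made-up names: one file already mirrored, one still to fetch.
with tempfile.TemporaryDirectory() as tmp:
    distdir = Path(tmp)
    (distdir / "foo-1.0.tar.gz").touch()
    wanted = {
        "foo-1.0.tar.gz": "http://example.org/foo-1.0.tar.gz",
        "bar-2.1.tar.bz2": "http://example.org/bar-2.1.tar.bz2",
    }
    to_fetch = missing_distfiles(wanted, distdir)
    print(to_fetch)
```

Only the entries returned here would need a download from their primary URI; everything else is already present.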

Each of the distfiles mirrors has about 140-160MB of upstream traffic every day (including both the new files and the rsync overhead for scanning). If there are no files changed, the rsync traffic for a directory scan is 1-2MB. While this isn't a lot of traffic, it's very spiky, as mirrors tend to be on fast links.
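
To put "spiky" into numbers, here is a hedged sketch comparing the average rate implied by about 150MB/day with how quickly the same data moves in a single burst. The 100Mbit link speed is an assumption for illustration, not a measured value, and a real rsync run will not saturate the link.

```python
def average_rate_kbit(mb_per_day: float) -> float:
    """Average rate in kbit/s if the daily volume were spread evenly over 24h."""
    return mb_per_day * 8 * 1000 / 86400

def burst_seconds(mb: float, link_mbit: float) -> float:
    """Seconds needed to move `mb` megabytes at the full link rate."""
    return mb * 8 / link_mbit

daily_mb = 150    # midpoint of the 140-160MB range
link_mbit = 100   # assumed mirror uplink, purely illustrative
print(f"average: {average_rate_kbit(daily_mb):.1f} kbit/s")
print(f"burst:   {burst_seconds(daily_mb, link_mbit):.0f} s at {link_mbit} Mbit/s")
```

In other words, a day's traffic that averages about 14kbit/s could in principle go out in about 12 seconds of line-rate transfer.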

The new weekly builds from the Release Engineering team will probably be adding another 1.3GB per week, staggered as one arch per day.
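
For scale, here is that addition next to the current per-mirror figure. A quick sketch, assuming every mirror carries the release builds, that the 1.3GB is spread evenly across seven days (arch sizes will differ in practice), and that 1GB = 1000MB.

```python
weekly_builds_gb = 1.3
current_mb_per_day = (140, 160)   # per-mirror range quoted above

extra_mb_per_day = weekly_builds_gb * 1000 / 7
low, high = (mb + extra_mb_per_day for mb in current_mb_per_day)
print(f"extra:     ~{extra_mb_per_day:.0f} MB/day")
print(f"new total: ~{low:.0f}-{high:.0f} MB/day")
```

Under those assumptions the weekly builds would add about 186MB/day, more than doubling each mirror's daily upstream traffic.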

I got a small subset of the logs from the OSU FTP cluster for processing some of these statistics. They cover the 24-hour period of 2008/08/07 UTC. The logs do not record which traffic went via Internet2, and I've grouped the sources by country code (using IP::Country::Fast from CPAN).
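
The grouping itself is simple once an IP-to-country lookup is available. The real processing used IP::Country::Fast in Perl; the Python sketch below just takes the lookup as a function argument. The (ip, bytes) record shape, the function name, and the toy lookup table are assumptions for illustration, not the real log format.

```python
from collections import defaultdict
from typing import Callable, Iterable

def bytes_by_country(
    records: Iterable[tuple[str, int]],
    lookup: Callable[[str], str],
) -> dict[str, int]:
    """Sum transferred bytes per country code for (client_ip, bytes) records."""
    totals: defaultdict[str, int] = defaultdict(int)
    for ip, nbytes in records:
        totals[lookup(ip)] += nbytes
    return dict(totals)

# Toy lookup over documentation-only addresses; a real one would query a GeoIP database.
toy_table = {"192.0.2.10": "US", "198.51.100.7": "DE", "203.0.113.5": "DE"}
records = [("192.0.2.10", 1000), ("198.51.100.7", 2500), ("203.0.113.5", 500)]
totals = bytes_by_country(records, lambda ip: toy_table.get(ip, "??"))
print(totals)
```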


As a bit of analysis, I think that more than half of our mirrors (Europe, Middle East, RU) would benefit from having a box to sync against in Europe.
