robbat2: (Default)

Now for the second set of statistics. These aren't directly useful to mirrors in estimating their traffic, but instead give a good overview of how our mirroring setup works internally, and how much traffic is involved in the fan-out stage. Distfiles are the main content moved around by this system, but it is also used for the other directories: releases, experimental and snapshots.

A very quick overview of the existing setup:

  1. Developer uploads new distfile directly to dev.gentoo.org.
  2. The master-distfiles box pulls from dev.gentoo.org hourly.
  3. The master-distfiles box checks every ebuild, and downloads any missing distfiles from their primary URI. The daily distfile report is also created at this point.
  4. Every hour, the cluster master of ftp.osuosl.org pulls the latest content from master-distfiles (averaging 240MB/day of traffic).
  5. The OSL FTP cluster master (in Oregon) pushes to its slave locations in Atlanta and Chicago.
  6. All distfiles mirrors pick up their content from one of the FTP nodes - Internet2-connected hosts are directed via DNS to an Internet2-connected slave for performance.
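
Step 3 is essentially a set difference between the distfiles the ebuilds want and the files already on disk. Here is a minimal sketch of that check, not the actual master-distfiles tooling (which isn't shown here); the function name `missing_distfiles` and its arguments are hypothetical, and fetching from the primary URI is left out.

```python
from pathlib import Path


def missing_distfiles(required, distdir):
    """Return the names in `required` that have no file in `distdir`.

    `required` is any iterable of distfile names (assumed to have been
    collected from the ebuilds elsewhere); `distdir` is the local
    distfiles directory. Both names are illustrative only.
    """
    present = {p.name for p in Path(distdir).iterdir() if p.is_file()}
    return sorted(set(required) - present)
```

Anything this returns would then be downloaded from its primary URI and listed in the daily report.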

Each of the distfiles mirrors has about 140-160MB of upstream traffic every day (including both the new files and the rsync overhead for scanning). If there are no files changed, the rsync traffic for a directory scan is 1-2MB. While this isn't a lot of traffic, it's very spiky, as mirrors tend to be on fast links.

The new weekly builds from the Release Engineering team will probably be adding another 1.3GB per week, staggered as one arch per day.
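
To put that in perspective against the per-mirror figure above, here is a rough back-of-the-envelope calculation. It assumes 1GB = 1000MB and that the weekly 1.3GB is spread evenly over seven days; in reality the per-arch sizes will differ, so treat the result as an average only.

```python
# Per-mirror daily upstream traffic quoted above (new files + rsync overhead).
BASELINE_MB_PER_DAY = (140, 160)
# Expected weekly builds: 1.3GB, taking 1GB = 1000MB (assumption).
WEEKLY_BUILDS_MB = 1300


def extra_mb_per_day(weekly_mb, days=7):
    """Average daily traffic if the weekly total is spread evenly."""
    return weekly_mb / days


extra = extra_mb_per_day(WEEKLY_BUILDS_MB)
low, high = (b + extra for b in BASELINE_MB_PER_DAY)
print(f"extra: {extra:.0f} MB/day, new total: {low:.0f}-{high:.0f} MB/day")
# prints "extra: 186 MB/day, new total: 326-346 MB/day"
```

In other words, the weekly builds would more than double the daily upstream traffic per mirror.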

I got a small subset of the logs from the OSU FTP cluster for processing some of these statistics. They cover the 24 hour period of 2008/08/07 UTC. They do not record which traffic went via Internet2, and I've grouped the sources by country code (using IP::Country::Fast from CPAN).
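
The grouping itself is simple. The sketch below shows the idea in Python rather than the Perl I actually used; the two-field "IP bytes" log format and the tiny prefix table are made-up stand-ins (the real logs and the IP::Country::Fast database look nothing like this), so only the aggregation logic carries over.

```python
from collections import Counter
import ipaddress

# Hypothetical prefix-to-country table standing in for a real GeoIP
# database. These are documentation-only ranges with invented countries.
PREFIX_TO_COUNTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "US",
    ipaddress.ip_network("198.51.100.0/24"): "DE",
}


def country_of(ip):
    """Return the country code for `ip`, or "??" if no prefix matches."""
    addr = ipaddress.ip_address(ip)
    for net, cc in PREFIX_TO_COUNTRY.items():
        if addr in net:
            return cc
    return "??"


def bytes_by_country(lines):
    """Sum transferred bytes per country from 'IP BYTES' lines (assumed format)."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        ip, nbytes = fields
        totals[country_of(ip)] += int(nbytes)
    return totals
```

Calling `bytes_by_country(open(path))` on such a file gives a `Counter`, and `.most_common()` lists the countries by traffic.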


As a bit of analysis, I think that more than half of our mirrors (Europe, Middle East, RU) would benefit from having a box to sync against in Europe.

robbat2: (Default)

Not sure who out there can help, but I'm looking for a number of old Gentoo distfiles that were hosted directly on the Gentoo mirrors, and not copied from some other location. I do have every other version of mysql-extras, so I am only looking for those listed here.

mysql-extras-20050904.tar.bz2
mysql-extras-20050919.tar.bz2
mysql-extras-20051205.tar.bz2
mysql-extras-20060114.tar.bz2
mysql-extras-20070104.tar.bz2

Those are the only versions of that distfile I'm missing, and I'm after making a nice Git repo to trace the history. The SVN tree that was used for a short while doesn't contain some of the details from these either, hence the need for the tarballs.

Beyond those tarballs, it would be interesting to try to build an archive of every distfile ever used in Gentoo. I've got the disk space (and tape backup) to do it. I already have an LTO3 tape that is getting every bit of release media/stages from Gentoo, so distfiles would be the next logical step.

Edit: Thanks to Lisa for 20050904.
