robbat2: (Default)

Those who have followed me for a while might have seen me complain about journalism that's misleading, wrong, or outright fictitious. Now I've got another case.
This article by Ed Bott at ZDNet:
Linux infection proves Windows malware monopoly is over; Gentoo ships backdoor? [updated]

The article was first published 2010/06/12 20:37 UTC.
It was updated at 2010/06/14 19:30 UTC, with the update claiming the situation was "worse".

Gentoo had a revision bump to a known good copy of the tarball at 2010/06/12 16:34 UTC (using a different filename, and verified against the GPG signature provided by upstream), so it was ALREADY fixed when the article was published. The old revision was explicitly removed at 2010/06/12 21:18 UTC.
Commit data for fixes:
Changes for unrealircd-3.2.8.1-r1.ebuild
Changes for unrealircd-3.2.8.1.ebuild

The trojaned tarball was then removed from the Gentoo master mirror at 2010/06/13 08:00 UTC, about 11 hours after the article was published. It would have been sooner, but it was a matter of bad timing.
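The gaps between these timestamps can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are mine, and the timestamps are the ones quoted above.

```python
from datetime import datetime, timezone


def utc(stamp: str) -> datetime:
    """Parse a 'YYYY/MM/DD HH:MM' timestamp as UTC."""
    return datetime.strptime(stamp, "%Y/%m/%d %H:%M").replace(tzinfo=timezone.utc)


published = utc("2010/06/12 20:37")      # article first published
fixed = utc("2010/06/12 16:34")          # -r1 revision bump to the known good tarball
old_removed = utc("2010/06/12 21:18")    # old ebuild revision removed
mirror_purged = utc("2010/06/13 08:00")  # trojaned tarball removed from the master mirror

print("fix landed before publication by:", published - fixed)
print("old revision removed after publication by:", old_removed - published)
print("tarball purged after publication by:", mirror_purged - published)
```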

Gentoo bug 323691.

The article also claims: "There’s a great deal of comment in the Talkback section of this post about how official repositories can be trusted. It appears that system broke down thoroughly in this case."
This claim is bogus. The developer who updated the package perhaps made a mistake in trusting that the upstream had not been tampered with. However, lacking anything to verify against (the upstream apparently did not sign releases at that point), he couldn't have detected the backdoor except by manually inspecting all of the code. He downloaded the package AFTER it had been tampered with (2009/11/11 I believe), so he never saw the tamper-free version either.

The entire point of the Gentoo Manifests is to ensure that OUR mirrors are not the point where a compromise is introduced. We can detect upstream changes by this same mechanism, but those mostly tend to be upstream deciding to 'fix' something without bumping the version number. In this regard, the Manifests functioned perfectly.
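To illustrate what that check covers, here is a rough sketch of verifying a downloaded distfile against a Manifest `DIST` entry of the form `DIST <name> <size> <ALGO> <hex> ...`. It is not Portage's actual code: the function names and the algorithm-name mapping are my own assumptions. Note what it can and cannot do: it catches a file that changed after the Manifest was written, but not a tarball that was already trojaned when the developer first fetched it.

```python
import hashlib

# Assumed mapping from Manifest hash names to hashlib names; names not
# listed here are simply skipped by this sketch.
HASHLIB_NAMES = {"SHA1": "sha1", "SHA256": "sha256", "SHA512": "sha512", "BLAKE2B": "blake2b"}


def parse_dist_line(line: str):
    """Split 'DIST <name> <size> <ALGO> <hex> [...]' into its parts."""
    fields = line.split()
    # DIST + name + size + at least one (ALGO, hex) pair, so an odd count >= 5.
    if len(fields) < 5 or fields[0] != "DIST" or len(fields) % 2 == 0:
        raise ValueError("not a DIST entry: %r" % line)
    digests = dict(zip(fields[3::2], fields[4::2]))
    return fields[1], int(fields[2]), digests


def verify_distfile(data: bytes, dist_line: str) -> bool:
    """Return True only if the size and every digest we can compute match."""
    _name, size, digests = parse_dist_line(dist_line)
    if len(data) != size:
        return False
    checked = 0
    for algo, expected in digests.items():
        hashlib_name = HASHLIB_NAMES.get(algo)
        if hashlib_name is None:
            continue  # unknown algorithm: skipped in this sketch
        if hashlib.new(hashlib_name, data).hexdigest() != expected.lower():
            return False
        checked += 1
    return checked > 0  # refuse to pass a file we could not check at all
```

A caller would read the distfile into `data` and pass the matching line from the package's Manifest; any mismatch means the file on the mirror is not the one the developer recorded.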

P.S. I'm not saying the existing Gentoo mirroring is perfect either; see my prior writings on tree-signing, and the "Attacks on Package Manager" papers by Cappos et al., which describe attacks that are blocked only with the full tree-signing system.


Sitting in the MirrorBrain talk at FOSDEM, taking notes.

Actively used since ~2007.
Split between the redirector and the tester, explicitly made separate.
SourceForge helped with the ASN/Closest-Network side.
Metalinks and P2P support.
Scans mirrors for filelist to see what's present.
Load limiting by making the redirector support mirrors that are limited to a local network / AS / country etc.
Metalinks don't have Magnet links presently, but I noted that it should be possible to include them.

Using GeoDNS directly for lookups can cause trouble with partial mirrors. Ideally you need to put a MirrorBrain server on each continent/region, and have GeoDNS point to that. Also, from some countries, bandwidth to adjacent countries that might have a mirror is MUCH worse than bandwidth to a well-connected country elsewhere. Past user experience was noted with a user in Mozambique, for whom the fastest mirror was via satellite to Canada. Routing data IS needed to make that best choice.
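A toy sketch of that selection problem follows. This is not MirrorBrain's code: the `Mirror` class, the `pick_mirror` function, the AS numbers (taken from the documentation range) and the round-trip figures are all made up for illustration. It first drops mirrors that do not carry the requested file (the partial-mirror problem), then prefers the client's own AS, then the best measured path, and only uses country as a tie-breaker.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Mirror:
    name: str
    country: str
    asn: int
    files: set = field(default_factory=set)  # what the mirror scan found
    rtt_ms: Optional[float] = None           # measured path cost, if known


def pick_mirror(mirrors, path, client_country, client_asn):
    """Pick a mirror that has the file, ranked by network data before geography."""
    candidates = [m for m in mirrors if path in m.files]
    if not candidates:
        return None

    def cost(m):
        return (
            m.asn != client_asn,                                 # same AS first
            m.rtt_ms if m.rtt_ms is not None else float("inf"),  # then measured path
            m.country != client_country,                         # geography last
        )

    return min(candidates, key=cost)


# Hypothetical data loosely modelled on the Mozambique anecdote.
mirrors = [
    Mirror("mz-partial", "MZ", 64500, {"snapshots/portage-latest.tar.bz2"}, 40.0),
    Mirror("za-full", "ZA", 64501, {"releases/livedvd.iso"}, 900.0),
    Mirror("ca-full", "CA", 64502, {"releases/livedvd.iso"}, 600.0),
]
choice = pick_mirror(mirrors, "releases/livedvd.iso", "MZ", 64510)
print(choice.name)
```

A purely geographic lookup would have sent this client to the in-country mirror, which does not have the file, or to the adjacent country with the worse path.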

The MirrorBrain mailing lists also include a generic "networkers" list for talk between content providers and mirror admins, not specific to any project or app.


solar was asking about release statistics, so I grabbed the current data from Bouncer. The nearly 34k hits for 10.0 are from just the 5 days that it's been out. I included the various architectures that were part of each released 'product', to make some degree of comparison possible.

What                 Hits     Arches

2005.1
installcd-minimum    228561   alpha,amd64,hppa,ia64,ppc,ppc64,sparc64,x86
installcd-universal  374388   alpha,amd64,hppa,ppc,sparc64,x86
packagecd            162537   alpha,amd64,ppc,ppc64,sparc64,x86

2006.0
livecd               242422   x86
minimal              287496   alpha,amd64,hppa,ia64,ppc,ppc64,sparc64,x86
packagecd            42572    amd64,ppc-g4,ppc-ppc,sparc64
packagecd-32ul       10909    ppc64
packagecd-64ul       2981     ppc64
universal            111359   alpha,amd64,hppa,ppc,ppc64,sparc64

2006.1
livecd               307481   amd64,x86
minimal              330505   alpha,amd64,hppa,ia64,ppc,ppc64,sparc64,x86
packagecd            39118    ppc,ppc-g3,ppc-g4,ppc64,ppc64-g5
universal            122280   alpha,hppa,ppc,ppc64,sparc64

2007.0
bt-http-seed         72980    ALL
livecd               411958   amd64,x86
minimal              496943   alpha,amd64,hppa,ia64,ppc,ppc64,sparc64,x86
packagecd            27593    ppc-g4,sparc64
universal            137554   hppa,ppc,ppc64,sparc64

2008.0_beta1
livecd               19426    amd64,ppc64,x86
livedvd              4        amd64,x86
minimal              14069    alpha,amd64,hppa,ia64,ppc64,sparc64,x86
universal            1745     ppc64,sparc64

2008.0_beta2
livecd               37771    amd64,x86
livedvd              17842    amd64,x86
minimal              55745    alpha,amd64,hppa,ia64,ppc,sparc64,x86
universal            3142     ppc,sparc64

2008.0
livecd               477934   amd64,x86
minimal              406531   alpha,amd64,hppa,ia64,ppc,sparc64,x86
packagecd            12308    sparc64
universal            83600    hppa,ppc,sparc64

10.0_pre20090926-1952
livedvd              4870     amd64,x86

10.0
livedvd              33703    amd64,x86

10.1
livedvd              0        amd64,x86
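Since the products cover different numbers of architectures, one crude way to compare them is hits per architecture, plus a daily rate for 10.0. The sketch below uses only the 2008.0 and 10.0 rows from the table; the architecture counts are read off the Arches column, and the helper names are mine. Treat the per-arch figure as a rough normaliser only, since Bouncer's totals here are not broken down by arch.

```python
# (hits, number of arches) per product, copied from the table above.
releases = {
    "2008.0": {
        "livecd": (477934, 2),
        "minimal": (406531, 7),
        "packagecd": (12308, 1),
        "universal": (83600, 3),
    },
    "10.0": {"livedvd": (33703, 2)},
}


def total_hits(release: str) -> int:
    """Sum the hits of every product in a release."""
    return sum(hits for hits, _arches in releases[release].values())


def hits_per_arch(release: str, product: str) -> float:
    """Average hits per architecture for one product."""
    hits, arches = releases[release][product]
    return hits / arches


# 10.0 had been out for 5 days when the data was pulled.
daily_10_0 = total_hits("10.0") / 5

print("2008.0 total:", total_hits("2008.0"))
print("2008.0 livecd per arch:", hits_per_arch("2008.0", "livecd"))
print("10.0 hits per day so far:", daily_10_0)
```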

Notes
  • 2008.* had the LiveDVDs pulled from mirrors due to size complaints.
  • bt-http-seed was a (failed) experiment with a set of mirror URLs for trying to load-balance BitTorrent's HTTP seeding.
  • Bouncer really needs replacing, but there's nothing really good to replace it with that I'm aware of. mod_sentry isn't nice. Other suggestions welcome. A replacement should support products, architectures within products, separate check/serve URLs, and detailed hit recording for analysis.

Now for the second set of statistics. These aren't directly useful to mirrors in estimating their traffic, but instead give a good overview of how our mirroring setup works internally, and how much traffic is involved in the fan-out stage. Distfiles are the main content moved around by this system, but it is also used for the other directories: releases, experimental and snapshots.

A very quick overview of the existing setup:

  1. Developer uploads new distfile directly to dev.gentoo.org.
  2. The master-distfiles box pulls from dev.gentoo.org hourly.
  3. The master-distfiles box checks every ebuild, and downloads missing distfiles from their primary URI if they do not exist. The daily distfile report is also created at this point.
  4. Every hour, the cluster master of ftp.osuosl.org pulls the latest content from master-distfiles. (Averages 240MB/day of traffic).
  5. The OSL FTP cluster master (in Oregon) pushes to its slave locations in Atlanta and Chicago.
  6. All distfiles mirrors pick up their content from one of the FTP nodes - Internet2-connected hosts are directed via DNS to an Internet2-connected slave for performance.

Each of the distfiles mirrors has about 140-160MB of upstream traffic every day (including both the new files and the rsync overhead for scanning). If there are no files changed, the rsync traffic for a directory scan is 1-2MB. While this isn't a lot of traffic, it's very spiky, as mirrors tend to be on fast links.

The new weekly builds from the Release Engineering team will probably be adding another 1.3GB per week, staggered as one arch per day.
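For a sense of scale, here is the arithmetic as a small sketch. It assumes 1GB = 1000MB and spreads the weekly builds evenly over seven days, which is an averaging assumption of mine; the real daily load depends on which arch lands on which day.

```python
# Figures quoted above; variable names are illustrative.
current_mb_per_day = (140, 160)   # existing per-mirror upstream traffic
weekly_builds_gb = 1.3            # expected addition from weekly builds

# Average extra traffic per day if the builds are spread over the week.
extra_mb_per_day = weekly_builds_gb * 1000 / 7

low, high = (base + extra_mb_per_day for base in current_mb_per_day)
print("extra per day: %.0f MB" % extra_mb_per_day)
print("new per-mirror total: %.0f-%.0f MB/day" % (low, high))
```

On these assumptions the weekly builds would roughly double each mirror's daily upstream traffic.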

I got a small subset of the logs from the OSU FTP cluster for processing some of these statistics. They cover the 24-hour period of 2008/08/07 UTC. They do not record which traffic went via Internet2, and I've grouped the sources by country code (using IP::Country::Fast from CPAN).


As a bit of analysis, I think that more than half of our mirrors (Europe, Middle East, RU) would benefit from having a box to sync against in Europe.


I was doing some statistics about Gentoo mirrors to see about future plans, and thought that the indirect crowd that reads my blog via the various aggregators might be interested in the numbers.

This is the traffic for boobie.gentoo.org, which is a newer box in the official rsync.gentoo.org rotation, directly maintained by the Infrastructure team. Hardware specs are 2x Xeon 3050 @ 2.13GHz, 4GB RAM. Disk is mostly irrelevant: the rsync workload is served purely from RAM (tail-packing reiserfs, backed via a loop device pointing to a file on tmpfs).

Inbound traffic is spiky, but does not exceed 10Mbit by more than a little bit, as we cap the inbound rsyncs from the rsync1 master at 10Mbit. Outbound traffic varies between 4Mbit and 9Mbit, with an average around 6-7Mbit.
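To put those rates into daily volume, here is a quick conversion sketch. It assumes "Mbit" means megabits per second and uses decimal units (1GB = 1000MB); the function name is mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def mbit_to_gb_per_day(rate_mbit_s: float) -> float:
    """Convert a sustained rate in Mbit/s to decimal gigabytes per day."""
    megabits = rate_mbit_s * SECONDS_PER_DAY
    megabytes = megabits / 8
    return megabytes / 1000


# Outbound average is quoted as 6-7 Mbit; the inbound cap is 10 Mbit.
for rate in (6.0, 6.5, 7.0, 10.0):
    print("%.1f Mbit/s sustained is about %.1f GB/day" % (rate, mbit_to_gb_per_day(rate)))
```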

