Archive for the 'Programming' Category
When Ducktyping is Dangerous
Python’s ducktyping is very useful. It’s good to be able to generally treat something as a file, or a string even if it’s not exactly a file or a string. But when there are methods with the same signature(*) whose parameters change meaning, then you’ve got a problem.
When you have a string, you can use the string.translate(mapping) call to translate characters — to do things like force all whitespace to be a real space, or drop everything in the 8 bit zone. Unicode strings have a similar method, unicode.translate(mapping).
Except that the mapping is totally different for strings and unicode strings. The string translate takes a 256 character string, and the unicode one takes a hash (a dict keyed by character ordinal). Hand the 256 character table to the unicode method and you get this error:
TypeError: character mapping must return integer, None or unicode
which is especially fun when you don’t know if you have a string or a unicode string, and stuff just worked before.
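This is a working solution for the unicode case. Instead of mapping = string.maketrans('\r\n\t', '   '), you need mapping = dict(zip(map(ord, u'\r\n\t'), u'   ')). Both replacement strings are three spaces, one per character being replaced: maketrans requires equal lengths, and zip stops at the shorter argument.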
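As a sketch of the dict-style mapping, here is the same cleanup written for Python 3, where every str is a unicode string and translate() only accepts this form. The function name whitespace_to_space is made up for illustration.

```python
def whitespace_to_space(text):
    """Replace carriage returns, newlines and tabs with plain spaces."""
    # Keys must be code points (ints); values are the replacement text.
    mapping = dict.fromkeys(map(ord, '\r\n\t'), ' ')
    return text.translate(mapping)


print(whitespace_to_space('one\ttwo\r\nthree'))
```

dict.fromkeys sidesteps the length-matching problem entirely, since every key gets the same single-space value.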
(*) well, technically the type would change, but you don’t see that in a non-statically typed language.
When IPMI Cards Attack
We’ve occasionally had issues with a database machine hanging at boot waiting for a PepperC USB mass storage device. Turns out that it’s part of an IPMI card for Supermicro motherboards, and for some annoying reason, it was asserting itself as the root device.
This is a big server, with a raid card, a bunch of drives and a couple of logical arrays off of that card. It’s meant to boot off of one of the arrays, on /dev/sda.
What is happening is that the SCSI and the USB buses are probed in parallel, and there’s a race condition as to which one responds first and therefore gets the sda designation. The next gets sdb, then sdc, and so on. Drives and partitions are normally referred to by that designation, the order is normally static, and everything is good. But if something takes longer than normal, perhaps due to an extra drive in the case or something, then the other one has a chance to take over.
And then you’re trying to boot off of something that you don’t expect, like the newly inserted big drive for backups, or the PepperC device. (Good thing that the IPMI card has KVM over IP. It’s nice when a piece of equipment solves as many problems as it causes.)
So, the other way to refer to drives is by UUID, a universally unique id. Partitions that have been formatted carry a unique 128 bit id, which makes it a bit harder for some interloper of a device to register a second or two early and mess everything up. This stable root device (AKA UUID) walkthrough worked for me with Debian Stable (etch), with the caveat that I had to mkswap again, as my swap partition didn’t have a UUID associated with it originally.
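To make the idea concrete, here is a small sketch of rewriting /etc/fstab lines so the first field is UUID=... instead of a /dev/sdXN name. The helper name and the UUIDs are hypothetical; on a real machine the ids would come from blkid or from the symlinks under /dev/disk/by-uuid.

```python
def fstab_line_by_uuid(line, uuid_by_device):
    """Swap a /dev/sdXN first field for UUID=... when the UUID is known."""
    fields = line.split()
    # Leave comments, blank lines and unknown devices untouched.
    if not fields or fields[0].startswith('#'):
        return line
    uuid = uuid_by_device.get(fields[0])
    if uuid is None:
        return line
    return ' '.join(['UUID=' + uuid] + fields[1:])


# Made-up UUIDs, for illustration only.
uuids = {
    '/dev/sda1': '0a1b2c3d-1111-2222-3333-444455556666',
    '/dev/sda2': '0a1b2c3d-7777-8888-9999-aaaabbbbcccc',
}
print(fstab_line_by_uuid('/dev/sda1 / ext3 defaults 0 1', uuids))
print(fstab_line_by_uuid('/dev/sda2 none swap sw 0 0', uuids))
```

This only covers fstab. The root device named on the kernel command line in the bootloader config needs the same treatment, which is what the walkthrough is for.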
Incidentally, this IPMI card also will prevent booting if its event log is full by displaying a ‘Press F1 to continue’ prompt on the console. I’m at the point where I’m not sure if the IPMI card helps or hinders reliability, and I’m likely not to put any in machines in the future.
Amazon S3 and fast connections
On a fast connection to S3, I consistently get socket errors (socket.error: (104, ‘Connection reset by peer’)) when I’m pushing in big files (big being ~250 MB or so). It’s totally reproducible with the same file uploaded. Turns out that people have been having this issue for a while, and the solution is to cap the TCP buffer sizes so that the sender doesn’t overrun a TCP window somewhere.
This has the solution. In /etc/sysctl.conf:
# Workaround for TCP Window Scaling bugs in other ppl's equipment:
net.ipv4.tcp_wmem = 4096 16384 512000
net.ipv4.tcp_rmem = 4096 87380 512000
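The sysctl change is machine wide. A per-socket analogue, which is not what I actually did and is only a sketch, is to request a fixed send buffer with SO_SNDBUF on the uploading socket. The 512000 figure is borrowed from the tcp_wmem line above, and capped_socket is a made-up helper name.

```python
import socket

SEND_BUFFER_CAP = 512000  # same ceiling as the tcp_wmem line above


def capped_socket(cap=SEND_BUFFER_CAP):
    """Create a TCP socket whose send buffer is requested at `cap` bytes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Setting SO_SNDBUF explicitly fixes the send buffer for this socket;
    # the kernel may round or clamp the value it actually uses.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, cap)
    return sock


sock = capped_socket()
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
sock.close()
```

Whether an upload library lets you hand it a pre-configured socket varies, so the sysctl route is still the one that works everywhere.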
Lighttpd and POSTs
Recently we shifted the core of the webapp at work from the standard apache1.3/mod_php4 to lighttpd/php5 connected by fcgi. There have been a couple of cases where people using the POST api methods have consistently gotten a connection closed error with the lighttpd version, where everything worked properly on apache.
This is the post:
POST /api.php HTTP/1.1
Content-Type: text/xml
Host: www.example.com:443
Content-Length: 19742
Expect: 100-continue
Connection: Keep-Alive
X-Forwarded-For: 10.0.0.0
<TransactionRequest>
...
and this is the response from lighttpd:
HTTP/1.1 417 Expectation Failed
Connection: close
Content-Length: 0
Date: Thu, 30 Oct 2008 19:35:29 GMT
Server: lighttpd/1.4.13
and the response from Apache:
HTTP/1.1 100 Continue
HTTP/1.1 200 OK
Date: Thu, 30 Oct 2008 19:37:26 GMT
Server: Apache/1.3.34
X-Powered-By: PHP/4.4.4-8+etch6
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/xml
<?xml version="1.0" encoding="iso-8859-1"?>
...
To make matters worse, it appears that pound, our front end load balancer, doesn’t log the error, but it does show up in the lighttpd logs as expected.
So, what’s the problem? It’s the Expect header. Lighttpd doesn’t support it, and sends back a 417 Expectation Failed error if that header is included. It’s a known WontFix bug for the 1.4 series, and the 1.5 series has been expected for a long while now.
The thing is, there are a lot of command line http clients or libraries out there that send this header, and it’s distressing that lighttpd simply can’t work with them.
And then, what’s the fix?
I’m afraid that it’s going to be to route around lighttpd for now, either by replacing it with apache2 or nginx. There’s an outside possibility that I’ll be able to filter the header on the load balancer, but that’s not looking likely right now.
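For what it’s worth, the filtering idea is simple to sketch: drop the Expect line from the request head before it reaches lighttpd. This is only an illustration of the transformation, not a pound configuration, and strip_expect is a made-up name.

```python
def strip_expect(raw_request_head):
    """Return the request head with any Expect header line removed."""
    lines = raw_request_head.split('\r\n')
    kept = [
        line for line in lines
        # Header names are case-insensitive, so compare in lower case.
        if not line.lower().startswith('expect:')
    ]
    return '\r\n'.join(kept)


head = (
    'POST /api.php HTTP/1.1\r\n'
    'Host: www.example.com:443\r\n'
    'Content-Length: 19742\r\n'
    'Expect: 100-continue\r\n'
    'Connection: Keep-Alive\r\n'
)
print(strip_expect(head))
```

The catch is that a client that sent Expect: 100-continue may hold back the body while it waits for the 100 Continue, so a real filter would probably have to answer with the interim response itself rather than just delete the header.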
IP Address Failover on Debian Etch
I’ve been using spread and wackamole for IP address failover on Debian Etch, but I’ve noticed that they’re not 100% reliable in my setup, in some cases hanging when I’m trying to query who owns a node, and in others, just not picking up the IP address that’s shared between two machines.
What I’m doing involves an active machine (or vmware instance) and a hot spare running the same code and ready to take over instantly. There’s generally very little state, as these are inbound load balancers or outbound gateways. Where I have lots of state, like a database, the failover mechanisms are a little different.
I’m testing out ucarp as a replacement. It has its good points: it’s one piece that’s reasonably easy to set up. Its bad point is that it’s not exactly high profile. The current version (1.5) is a released version of the snapshot (1.5_0802222) in lenny/testing and has some significant bug fixes, so I’ve pulled the source package in from lenny/testing and the distribution from the main distribution site and compiled it on etch for a quick backport. There’s one patch line that failed, the options accepted on the command line, and that appears to have been integrated in the upstream package.
The package from lenny worked reasonably well in testing, but showed some issues in production related to a “Can’t do IP_ADD_MEMBERSHIP” error, which meant it couldn’t join the multicast group it needs in order to receive the multicast packets. This was fixed in 1.5 final.
Ucarp is a little different than wackamole. Wackamole just grabs an IP address when it detects that it needs to, while ucarp runs a script when it determines that the IP address needs to be acquired or dropped. This was something that I had wanted with wackamole for poking at daemons and such. The Debian port has integrated the control of ucarp into the /etc/network/interfaces file, so straightforward configurations can be done completely in that file.
In /etc/network/interfaces
auto eth0
iface eth0 inet static
address 192.168.10.91
netmask 255.255.255.0
gateway 192.168.10.1
ucarp-vid 90
ucarp-vip 192.168.10.90
ucarp-password foobar
ucarp-advskew 1
ucarp-master yes
# ucarp controlled.
iface eth0:ucarp inet static
address 192.168.10.90
netmask 255.255.255.0
The options on ucarp bear explaining:
- vid: an id (90 here). ucarp will communicate with any other instance using this id, so it needs to be consistent between the master and the spare
- vip: the address to be managed
- password: A shared password between the instances
- advskew: Lower numbers will win the election of which instance should be the master
- master: if this instance should startup as the master
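A simplified model of how advskew decides the election, as I understand CARP: each node advertises every advbase + advskew/256 seconds, and the node that advertises most often ends up as master. The node names and the advbase of 1 second below are assumptions for illustration, and the real protocol has more to it than this.

```python
from collections import namedtuple

Node = namedtuple('Node', 'name advbase advskew')


def advert_interval(node):
    """Seconds between advertisements: advbase plus advskew/256."""
    return node.advbase + node.advskew / 256.0


def elect_master(nodes):
    """Pick the node that advertises most often (smallest interval)."""
    return min(nodes, key=advert_interval)


# Hypothetical pair; an advbase of 1 second is assumed for both.
primary = Node('lb1', advbase=1, advskew=1)
spare = Node('lb2', advbase=1, advskew=100)
print(elect_master([primary, spare]).name)
```

So giving the preferred machine the lower advskew, as in the config above, is what makes it win when both are up.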
This approach seems to work well for some tests: the failover time for pinging is 3-4 seconds total. But for apachebench and funkload, it seems that once they have an ARP entry for the machine that’s getting taken over, that entry stays resident until the next invocation of the app.
Google Gears as Your Own Private CDN
One of the things with webapps now is that they load a ton of extra stuff, javascript, css, images, static pages. Lots of extra stuff. The work site has something like 15 includes of one variety or another on each page, and a bunch more that are loaded as needed on one or two pages. That’s a lot of latency.
One way to fix it is caching and expires headers, another is to minify and compress the javascript and css, and another way is to just get the files near the user so that they can load all the items without round trips to the server. Big sites do this with a CDN, a content distribution network. But two drawbacks of that approach are that the files aren’t served from the same site for origin purposes, and that you generally have to pay for a CDN, which puts it out of the reach of smaller sites. On the other hand, a CDN does its work without the user noticing, while Gears requires confirmation and trust.
Google Gears has a local server and cache space that are intended for use in taking applications offline for later syncing with the cloud. But it’s just as useful for serving bits in conjunction with the online version. The important thing is that every time there’s a request for one of the files in the cache, Gears will serve it locally. That’s a lot less latency.
Best of all, it’s pretty easy to set up. You include the gears_init.js file, create a local server and assign it a manifest. The manifest controls everything that Gears will download once and cache, until the version number in the manifest changes.
The javascript changes, cribbed from wordpress’s version of this and the google docs.
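<script type='text/javascript' src='gears_init.js'></script>
<script type='text/javascript'>
// Only touch Gears if gears_init.js found it installed.
if ( 'undefined' != typeof google && google.gears ) {
  var localServer = google.gears.factory.create('beta.localserver');
  // A managed store caches every URL listed in its manifest.
  var store = localServer.createManagedStore('cdn-store');
  store.manifestUrl = 'gears-manifest.txt';
  // Fetch the manifest and refresh the cache if its version changed.
  store.checkForUpdate();
}
</script>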
And the manifest file:
// gears-manifest.txt
{
  "betaManifestVersion": 1,
  "version": "1",
  "entries": [
    { "url": "gears_init.js" },
    { "url": "common/prototype.js" },
    { "url": "common/scriptaculous.js?load=effects" },
    { "url": "common/effects.js" },
    { "url": "and so on and so on...." }
  ]
}
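Hand-editing the manifest gets old once there are a dozen entries, so here is a sketch of generating it. The field names come from the manifest above; deriving the version string from a hash of the URL list is my own assumption, not something Gears requires, and build_manifest is a made-up name.

```python
import hashlib
import json


def build_manifest(urls):
    """Build a Gears-style manifest dict for the given list of URLs."""
    # Assumption: derive the version from the URL list, so any change
    # to the list produces a new version string and triggers a refresh.
    digest = hashlib.sha1('\n'.join(urls).encode('utf-8')).hexdigest()
    return {
        'betaManifestVersion': 1,
        'version': digest[:12],
        'entries': [{'url': url} for url in urls],
    }


manifest = build_manifest([
    'gears_init.js',
    'common/prototype.js',
    'common/effects.js',
])
print(json.dumps(manifest, indent=2))
```

Note that this version only changes when the list of URLs changes. If a file’s contents change under the same URL, the hash would need to cover the file contents too, or the cached copy will stick around.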
I’m not sure that this is a total game changer for the general web, but for sites that are apps where the users both trust the site and spend a significant amount of time there, it’s a win. This is already being used for the admin interface of WordPress 2.6. I can see it being used in a lot of places with rich interfaces where the users are highly invested in the site and latency is a concern.
Bash/unix file renaming
Of course, it’s possible but it’s not quite as easy as saying mv *.txt *.html.
So that I remember this:
for i in *.txt ; do mv -- "$i" "${i%.txt}.html"; done
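The same rename as a Python sketch, for when the shell quoting gets tiresome. The function name is made up, and the demo runs in a throwaway temp directory so it doesn’t touch real files.

```python
import tempfile
from pathlib import Path


def rename_extension(directory, old='.txt', new='.html'):
    """Rename every file ending in `old` inside `directory` to end in `new`."""
    renamed = []
    for path in sorted(Path(directory).glob('*' + old)):
        if not path.is_file():
            continue
        # with_suffix only swaps the final extension, so a file called
        # txt_list.txt becomes txt_list.html, not html_list.txt.
        target = path.with_suffix(new)
        path.rename(target)
        renamed.append(target.name)
    return renamed


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'notes.txt').write_text('hello')
    Path(tmp, 'txt_list.txt').write_text('world')
    print(rename_extension(tmp))
```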
Phantom Signups
I’ve seen a dramatic uptick in signups to this blog, from what look like junk emails @ gmail, .ru, and other places. I’m a little confused, since there aren’t even attempts to spam me. The only thing I can think is that someone is building up a stash of WordPress logins for the next time that there’s a SQL injection attack that can be performed by a logged in user.
What I’d really like to do is add a field to the signup page, simply asking: Why?
But then, some robot would probably try to convince me that it’s human.
Crypto
Somehow, on top of all the other things that happened on vacation, something close to my worst sysadmin nightmare came up. A break in OpenSSL/SSH. It’s complicated, mission critical, and it can’t be kept away from the users, at least in the SSL case. I’d rate this a 4/5 in panic level. (a 5 would be a remote root hole in one of these services.)
Oh, wait, I haven’t talked about the vacation. Flying with a sick kid is not fun. Nor with 2. Staying in a hotel with 2 sick kids and 2 sick parents is even less fun. But it did get better after a few days.
Then, debian stable’s random number generator was found to be a little weak, so that the keys generated were extremely predictable. Trivially even. Which means that any key generated using openssl on those systems is suspect, any dsa key used on one of the systems is suspect, and everything needs to be updated quickly without locking myself out.
I did have a couple of things in my favor — while I was using dsa keys, they were generated on OSX, so they weren’t instantly bad. And I use ip address filtering on ssh where I can and fail2ban where I can’t, so attackers either get 0 or 5 chances to get in before their packets are dropped. Out of all of this, I think that there were 3 keys that didn’t need to be replaced, because they were putty generated rsa keys.
Issue #1: I’ve got enough different machines and images that I wanted to use something a little faster than one at a time to do the updating. It turns out that Capistrano is a good way to do that, but it doesn’t work on the stock OSX Tiger install, nor on my Ubuntu 6.06 machine. Eventually I figured out that it does work from MacPorts, but you have to compile a bunch of stuff, which is a little slow on a g4/1.2ghz.
Issue #2: For some reason, there’s one essential package, libssl0.9.8 that doesn’t update well on debian without a terminal. It has a prompt for which services to restart, and will hang there if run from Capistrano. So, I had to log into all the servers and images to do the actual update.
At least Capistrano sped up the new key deployment, and will probably speed up things in the future, but for this operation, I don’t think that I netted any time savings.
Obscure gpg options
I can never remember this when I’m looking for it.
If you have a signing key (not the primary) and an encryption key (the default key) for gpg, then when you need to sign a new key, the command is:
gpg --default-key ###### --sign-key ######
