Wednesday, October 7, 2009

The pain of "X is cheap"

There's a common assumption in software development: "RAM is cheap," or "disk is cheap," and so on.

Certainly, it's cheaper than it was ten years ago. It's sure as heck cheaper than it was twenty years ago. But that doesn't make it cheap. For the idea I'm trying to plant in your head, there's no such thing as "cheap."

For something to be cheap, it must be affordable. For something to be affordable, there must be resources (i.e. money) to meet its requirements*. Entrepreneurs have resources ranging anywhere from millions of dollars down to zero, so regardless of how cheap something is to one entrepreneur, it will be on the edge of affordability to others, and out of reach to still others.

So I suggest that making things more affordable (by reducing their inherent cost, not by subsidizing them) is not a goal that should be shrugged off just because some component of the inherent cost "is cheap" to get.

The other side of the argument might be that "if the entrepreneur can't afford something, then they need to fix their business model." Well, yeah, if their goal is to make money, then they should be continually improving their process of making money. That doesn't mean that the things they depend on should be allowed to lapse and become inefficient; if a tool can be more efficient, then it poses a lower inherent cost. Ideally, an entrepreneur ought to be able to shop around and find a more efficient tool, but that's rarely an option for all of their tools. One tool may already be at the peak of potential efficiency, while another might have so many other perceived advantages that its inefficiency gets overlooked: an operating system with a massive base of available software, say, or a programming language with a massive base of available libraries, or a software publisher that targets a demographic that buys on a whim.

As an anecdotal example, I'll bring up Rosetta Code, which is my own site. Recently, it became subject to a relatively massive sustained increase in traffic (several times its normal level), due to one of its pages becoming a hot item on StumbleUpon. The tiny 256MB Slicehost slice I was using simply didn't have the resources, despite my already having set up all practical caching mechanisms. Many of the users were getting HTTP 500 errors due to timeouts between Apache and fcgid. HTTP 500 errors aren't very descriptive, so I changed the configuration over to mod_php. The default configuration of mpm_prefork allowed up to 250 clients to connect, and a process would be spawned for each of those clients. Each one of those processes tends to eat about 20MB of RAM, so with 250 clients being actively serviced at 20MB each on a virtual machine with 256MB of RAM, well, we were about four and a half gigabytes short. And then there's the database, which didn't fit in memory as it was.
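To make that arithmetic concrete, here is a minimal Python sketch of the memory budget. It only uses the rough figures quoted above (250 clients, about 20MB per process, 256MB of RAM); the function names and the 64MB reserve for the OS and database are hypothetical, picked purely for illustration, and real per-process usage will vary.

```python
# Back-of-the-envelope memory budget for a process-per-client server.
# Inputs are the approximate figures from the post, not measurements.

MB_PER_GB = 1024


def prefork_demand_mb(max_clients: int, mb_per_process: int) -> int:
    """Worst-case RAM needed if every client slot gets its own process."""
    return max_clients * mb_per_process


def shortfall_mb(max_clients: int, mb_per_process: int, ram_mb: int) -> int:
    """How far worst-case demand overshoots the available RAM (0 if it fits)."""
    return max(0, prefork_demand_mb(max_clients, mb_per_process) - ram_mb)


def max_clients_that_fit(ram_mb: int, mb_per_process: int, reserved_mb: int = 0) -> int:
    """Largest client limit that stays inside RAM after reserving some for
    everything else on the box (reserved_mb is a hypothetical knob)."""
    return max(0, (ram_mb - reserved_mb) // mb_per_process)


if __name__ == "__main__":
    demand = prefork_demand_mb(250, 20)
    short = shortfall_mb(250, 20, 256)
    print(f"worst-case demand: {demand} MB")
    print(f"shortfall: {short} MB (~{short / MB_PER_GB:.1f} GB)")
    print(f"clients that fit in 256 MB: {max_clients_that_fit(256, 20)}")
    print(f"with 64 MB reserved: {max_clients_that_fit(256, 20, reserved_mb=64)}")
```

Under these assumptions, a 256MB machine can hold roughly a dozen such processes at once, which is a long way from the default limit of 250.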

So I moved over to Linode, where I could get a better price per MB of RAM, took pains to configure and tune MySQL and Apache, and now the site runs fast enough that I can't overload it from my home internet connection. Yes, my MySQL configuration needed improving, and fixing it improved performance.

But I want you to think about something... Why did Apache have to spawn a separate process for each client? Well, that's easy; I was using mpm_prefork, where that's the behavior by definition. But why was I using mpm_prefork? Because the PHP packages wouldn't allow me to use mpm_worker**. And why was that? Because the PHP core (or some of its possible extensions) isn't thread safe.*** Granted, coding in a thread-safe fashion is non-trivial for most coders of today's skill and/or experience, but I might go so far as to argue that there is very little about developing programming languages and their engines that is trivial to begin with.

I'm not writing this to criticize PHP specifically; I'm writing this to criticize a common theme behind problems it shares with many other programming languages and other software: they assume X is cheap. X might be CPU. X might be RAM. X might be disk (though for disk not to be cheap in the context of web service languages, you're working in some fairly niche environments).

In short, PHP is expensive in ways I hadn't sufficiently planned for, and its expense caused major issues for my site.

I believe what I'm describing is known as part of the "barrier to entry." What tool developers often forget is that while their tool may have all these cool doohickeys, whiz-bangs and context menus, those features will almost invariably come at a cost that makes it more difficult for their potential customers to afford their product, either in the sense of purchase or in the sense of execution. Even if they remember, I seriously doubt they think it's a major problem. "They need to fix their business model" is the resounding response I hear when one company or another industry complains that costs are too high, and that's why prices are as high as they are.

I already agreed that business models should be continually tuned and improved. What about the people who can't get into an industry because of the high barrier to entry? What about the industries that haven't been invented because the requisite tools are too expensive? The cheaper the tools you make, the more your tool can be abused to do something new or invent something new by someone who you weren't planning on trying to sell it to.

* And, no, I don't think there's any practical way of escaping that short of a shift in perspective so radical I haven't heard it yet.

** mpm_worker hands each client to a separate thread. Granted, the per-thread heap allocation (which would have been most of it) doesn't go down, but code and data common to all of the threads needn't be loaded into multiple processes. (Granted, Linux may map shared libraries into one place in physical memory and put that mapping in each process's address space... but I don't know.)

*** Of course, the obvious followup is "Why were you using PHP?" ... That's because I'm using MediaWiki. And if you want to critique me for using MediaWiki, I'll likely agree with your critiques, and be very happy to have a long discussion about why I continue to use it. (Please, feel free to do so; my primary reason is a lack of suitable alternatives, and any discussion I have with you may spark an alternative into existence.)
