Monday, August 31, 2009

A quote on unfairness

"You know, I used to think it was awful that life was so unfair. Then I thought, wouldn't it be much worse if life were fair, and all the terrible things that happen to us come because we actually deserve them? So, now I take great comfort in the general hostility and unfairness of the universe."

   Marcus Cole, Ranger to Dr. Stephen Franklin, 'A Late Delivery from Avalon', Babylon 5

Sunday, August 23, 2009

A Linux Success Story (What Ubuntu is good for)

First, some background.

A few months ago, we got my grandmother an old laptop so that she could get online and check her email without bringing her old mid-tower PC up from the basement. My parents set her up with Windows XP and a few basic apps, and I set up Thunderbird to tie in to her GMail account via IMAP.

That worked fine for a month or two, until she tried to go to a website she'd seen in a flier she'd gotten in the mail. She always types URLs into the Google Search Engine, and, in this case, the top result for the URL wasn't the site itself, but a page that popped up a fancy in-page Javascript widget that looked very much like Windows XP, with a warning that a virus had been detected on her computer, and asked her if she wanted it removed.

In her panicked state, she clicked "Yes" or "Run", whichever looked to her like it might resolve the problem presented to her by the dialog she was seeing.

Symantec didn't catch it. In fact, near as I could tell, Symantec was no longer on the system; The updater program was there, but there weren't any antivirus binaries to be found. Yet the system was getting unusually flaky. I spent two days trying to clean it up with antimalware like Avast, Spybot and AVG, and none of them detected anything when I scanned.

It was time to wipe and reinstall, but I didn't have the software keys, my parents did. I booted into an Ubuntu 9.04 live CD, installed Thunderbird and configured it to connect to my grandmother's email via IMAP. As she tried the liveCD, I tweaked this or that to make the system more usable and convenient for her. I increased the system font sizes, increased the display DPI, made the mouse cursor bigger (which was tricky; Turns out one has to switch away from the default mouse theme to get control over the cursor size), made the panels 48px tall instead of their defaults, put her two most used apps in launchers (firefox and thunderbird, now), hid the pager, disabled the touch pad click and scroll functionality, and moved the GNOME menu applet to the bottom-left corner of the screen.

By this point, the system was working very well for her. I consulted with my parents, and explained to them that even with the keys for the Windows software, Ubuntu was going to be easier for me to configure and maintain for her, and she'd been using and enjoying the liveCD configuration I'd set up for her thus far. After the phone call, I imaged the laptop's hard disk (for backup purposes; It's entirely plausible that the disk image will need to be restored to the trojan-laden state at some point), and installed Ubuntu to the hard disk itself, blowing away the Windows install that was currently on it.

So far, so good. I've even installed Pidgin, which gives her quick and easy access to family and friends, as well as a DVD player, which makes it easier for her to see movies and such.

Ubuntu isn't for a guy like me who wants to choose his window manager or frequently compile development versions of media codecs, but it's great for someone like my grandmother who gets good usage out of the usability work that's gone into GNOME. She doesn't configure it, I do. But she uses it once I've configured it, and it works well for her that way.

What I've watched to completion over the last couple weeks

Moby Dick, Last Exile, Paprika, Armitage: Dual Matrix, Taken, Dark City, Maetel Legend, Herlock Saga, and maybe one or two more I'm not remembering off the top of my head.

Hakugei: Legend of the Moby Dick

Not the one you're thinking of, most likely. Probably not that one, either. Nope, not that one.

This version of Moby Dick has some very clear references back to Herman Melville's original work, but you probably won't recognize them if you didn't read the original. Sure, Ahab isn't insane, and Patrick Stewart couldn't have portrayed Ahab with this script. Sure, it's a science fiction anime set so far into the future that there aren't any recognizable astronomical features. Sure, the science in it sucks. The villains are different. Only a few original cast members are necessarily recognizable. It's got a lot of comic relief, which I don't remember the original having. It's probably a good thing that the Ishmael character didn't discover the Queequeg character in the same bed early on; We look at these things with suspicion rather than laughter, these days. It's also kinda campy, using 70s-freeze-frame-type scenes in emotional situations at eyecatch and in place of episode fade, but I don't remember if Melville's work did anything analogous.

Yeah, there are a lot of differences. But there are some rather striking similarities.

First, the Ishmael character takes on a very prominent narrator role, as in the original book. Second, just as the original book would dedicate whole chapters to what would be an establishing shot in a movie today, the anime devotes a great deal of screen time describing the particulars of the worlds and places the characters find themselves in.

All in all, I enjoyed it. Campy plot and occasional weird antics aside, it was fun watching and noting references to the original book.

Last Exile

Watched this one all the way through. The series is incredibly beautiful, making amazing use of CGI clouds, smoke and steam and distant-horizon scenes, showing an incredible amount of detail. The series would almost be worth buying if that were the only thing it had going for it.

However, the series' plot and characters really carry it. In one way or another, pretty much every character with a speaking role has important backstory or is revisited later; sometimes even the actions of seemingly minor characters are revealed to have significant impact long after they've left the stage. There are stories of betrayal, heartwarming, comedy, tragedy, suspense leading to relief and sadness, mystery, rage, devotion, twists, legend, epic tales, sacrifice, duty and honor. And all of this within 24 episodes.

And did I mention it's beautiful to watch? All in all, one of the absolute best stories I've had the pleasure of watching. I strongly recommend it for anyone who likes a little depth.

Paprika

Not the first time I've seen it. I just mention it because I watched it again, and I like it. Each time I watch it, I change my mind on what exactly is going on. Is it a dream? Is it reality? What is reality? A fun bit.

Armitage: Dual Matrix

The second in an action thriller series that revolves around figuring out the man and sentient machine problem, throwing in analogous questions of civil rights, a variation on interstellar democracy, and what constitutes parenting and instinct.

I enjoyed it, but I like dark, moody things. Will need to pick up the next movies.

Taken

Liam Neeson kicks ass in order to find and save his daughter. Almost no unusual gadgets. His methods are pretty much a combination of contacts, social engineering, brawling capability and determination.

And I only knew as much as the first sentence when I bought the BD. Very worth watching.

Dark City

It took me a while to find out where the song in this AMV came from, and when I had the opportunity to watch it this week, I took it.

Coming out a year before The Matrix, Dark City also addresses the themes of Who Are We and What's Wrong With this Picture. Add a character capable of modifying the world around him.

Dark City differs greatly in that rather than being on Earth, no reference to Earth is made at all. In fact, nobody In The Know really knows where they are at all. Additionally, rather than being energy sources, everyone's a rat in the maze.

The movie follows a few of those in the maze as they slowly notice something's wrong, and start trying to figure out what exactly it is.

The movie has a bittersweet ending. And then the fridge logic sets in.

I think only two things keep this movie from being well-known. A Michael Bay film came out the same year, and The Matrix had more T&A, guns and robots. Having seen it, I happen to think Dark City's special effects were better done and more contributory to the story than those in The Matrix were.

By the way, can you tell me how to get to Shell Beach?

Maetel Legend

After watching a couple Captain Herlock series, we've sorta begun picking up on watching the rest of the Leiji Matsumoto universe. There are a lot of interconnects, either artistic, plot-related or even common universe elements. It's the sandbox game of anime watching; There are a ton of different places you can go, and they may or may not be directly related, but most of them take place in the same universe (the sandbox). You could probably create a Bingo card set for watching for common components.

Herlock Saga

It's hard to watch this one if you saw Space Pirate Captain Herlock first; While he's by no means hyper, he's a go-getter by comparison. First Officer degrades from a likable, laid-back genius to what feels like a terrified child, by comparison. One of the crew members that seems mysterious and weird in SPCH appears in Herlock Saga in a much more central role, but not part of the crew.

Daiba appears, and it looks like the same character (though that character's appearance was also in Maetel Legend, its appearance there was completely unrelated), and even has the same name, but he has a different personality and ethic.

The best way I can relate Herlock Saga to SPCH is that Herlock Saga provides background for some of the characters in Space Pirate Captain Herlock, but there's no direct relationship within the universe's continuity. In fact, I suspect Leiji Matsumoto doesn't try to preserve continuity, but rather expects the viewer to take the pieces of the same characters from the various works and form a mental image of the character that works for them.

I'm hoping to see more of Matsumoto's works; There are so many unusual relationships between them that it's interesting to watch and piece things together.

Friday, August 21, 2009

From the Trenches (Unpolished)

I whipped this up as an outline months ago, but it doesn't look like I'll ever get around to polishing it. I was going to try to do it like a series of "letters to home"; There's plenty of material to stretch, fill and elaborate on.

Shells falling all around. Sound of thrashing, dying disks, the ratatat of whining fans. Magic smoke everywhere; We were having a hard time signaling Major Domo. Good programs dumping their cores all around us, leaving backtraces as they crawled into memory holes. There were rumors the Colonel himself was wounded. Hayes bats screeching through the night. Infestation of racing bugs. The debugger had been called, but wasn't able to attach to the unit. The best we could do was take things step-by-step and stick with the program.

Monday, August 17, 2009

The kind of coverage that makes you want to throw up.

I heard a segment on NPR's On the Media Sunday that made me physically nauseated. They were talking about the dangers of the open, anonymous forum that is the Internet.

They talked about the Star Wars Kid, as well as a young girl who committed suicide after being manipulated by a fictitious persona, and a couple female law school students who were slandered on a website. The discussion of that last piece was their segue for bringing a representative of ReputationDefender in on the discussion. From that point forward, they talked about how these two women were stymied by the 1st Amendment (in the case of the law school website), and then by immunity clauses of the CDA (when they tried to get the pages delisted from Google).

It was then advocated that the average individual be given more power to control what was said about them online, and Youtube, IP holders and the DMCA were brought in as an example where such rights were already given to IP holders. I took it as implied that they were advocating that similar rights be given to individuals.

They also had a representative from the EFF on the piece, but it didn't sound like they gave him even half the audio time they gave the ReputationDefender representative, much less the time they spent advocating the ReputationDefender rep's perspective.

If the folks at On The Media had done enough research, they might have discovered that the DMCA has already been abused by people falsely claiming to be copyright holders, in efforts to censor the material taken down. That doesn't even begin to address screwups like MediaSentry's overzealously identifying content by pattern matching titles. I've often wondered what would happen if I posted a video to Youtube with a title "TUBTHUMPING", but where the video was just of someone hitting a tub with a mallet with no discernable rhythm. I imagine it'd get a takedown notice, and I wouldn't have the resources to back a counterclaim.

I don't want power like that in the hands of every teenage griefer on the Internet.

Saturday, August 15, 2009

Time to get meta

Got my vanity plate in the mail today. It's an abbreviation of the word "Gargoyle", chosen for its meaning in the context of the book Snow Crash.

The Wikipedia article on Snow Crash has this bit of text that may be enlightening: a "gargoyle": someone constantly wired into the Metaverse.

Sounds like it fits. :)

Friday, August 14, 2009

What can happen when you get a little too enthusiastic

So, last week, we used my car as a sound booth. Rather than lean forward to get closer to the mic, I chose to use my seat's adjustments so that I could get closer without having to strain by continually leaning forward, etc.

DSCF6415.JPG

And then I decided to see how close I could get...

DSCF6417.JPG

DSCF6418.JPG

The Rain

There's little more depressing than the rain.
To depress is to calm.
To calm is to relax.
There's little more relaxing than the rain.

Wednesday, August 12, 2009

Space Pirate Captain Herlock

Avast, there be spoilers ahead, though they be small!

I just finished watching Space Pirate Captain Herlock: The Endless Odyssey again. It's a great series, with a great story. It tends to use handwaving technobabble on occasion, and the art is different enough to be initially jarring, but the intricate clarity of the story really makes up for all of it. It's a romantic swashbuckling story that makes a strong showing tackling a few basic, important issues.

Setting aside, for the moment, the concepts of good and evil, there are three major concepts woven through the story: Order vs freedom, responsibility and integrity and science vs humanity.

For order vs freedom, there are two key players. In one corner, Chief Illita, the head of the galactic military/police organization. His organization has one recurring motto, "Bring Order to Space." The motto is plastered everywhere, reminiscent of "Big Brother is Watching You" posters. They even take advantage of two-way video screens throughout the civilized portions of the galaxy; When the crew of the ship is about to be executed, hundreds of screens within the execution environment (called the "Panopticon" at one point in that episode) show the faces of people throughout civilization whose screens have been coopted for a no-opt-out display of public execution.

In the other corner, you have Captain Herlock. He's a pirate, though you see very little actual piracy in the series. The Captain runs a ship called the Arcadia which is crewed by volunteers. He has two standing rules, stated explicitly: "Nobody who comes to the Arcadia will be turned away," and "Every person may do as they see fit. From time to time, I will give orders. You may choose not to follow them." Despite this, the crew of the Arcadia functions well, partially out of camaraderie, and very much out of respect, love and appreciation for the Captain.

Both of these men are extremely strong-willed. Chief Illita harbors a very, very deep resentment of Captain Herlock. Herlock, on the other hand, holds no resentment for any individual, including those who might be his enemies; As one character put it, "The Captain is a very forgiving man, and for those men he can't forgive, he kills." A scary-sounding statement, but it betrays his bias towards forgiveness, and his active awareness and consideration of his actions. Contrast, again, against Chief Illita, who lays traps using the lives of prisoners as bait, and who is willing to ruin the careers of hundreds of subordinates rather than admit his own error and lose face. (Rather than accept that his plan may have been incomplete, or even that his opponents might have outsmarted him, he declares the failure as the result of a leak, and has all subordinates dismissed under suspicion of being part of a leak.)

The second major theme is integrity and responsibility and its scope. As I just mentioned, Chief Illita ruined the careers of his subordinates rather than admit that anything could have been wrong with his plans or instructions. Captain Herlock, on the other hand, is the series' greatest example of integrity and responsibility. Faced with the prospect that the entirety of the universe would be enslaved to what amounts to a demon, Herlock stated, "It's not my problem." However, if he makes an oath, keeping it takes priority over all else, including his own life, if necessary, though he'll take on risk himself rather than involving those around him, who he doesn't believe bear his own responsibilities. Prior to the beginning of the anime, he made an oath with a character, an oath that would paradoxically require him to save the universe in order to fulfill it. Near the beginning of the anime, he makes an even more important oath. One which represents the actual cliffhanger of the series.

To recap, for the moment, the actual character of these two primary characters, look at it this way. Chief Illita advocates, pursues and enforces authoritarian lawfulness, while not holding himself to the same senses of ethics and behavior that the civilization he enforces is required to adhere to. Captain Herlock, on the other hand, recognizes no law but his own, that of integrity, and he's perfectly willing to enforce it upon those around him, but he enforces it even more so on himself. These are both characters of law, but they differ on whether they hold themselves to the same standards as they demand from others.

The third major theme of the series is a warning that science can and will touch what it's not supposed to, that there are things that are simply too dangerous and too destructive to know. In the series, this is manifest as several scientists' inability to resist learning and researching an area of the Hourglass Nebula and things found in it. The result of their research, and the demons they unleash, set in motion the entire series.

Now, I have to digress, again, for a moment. While this is certainly a major theme of the series, it's also the least-well-executed. The scientists focused on in the series are almost all shallow obsessive archetypes depicted as people who entirely lack common sense. Ok, sure, geeks and scientists have had that reputation for ages, but anyone with half a brain would know better than to do something a cosmic-sized obelisk tells them to do in a deep, rumbling voice, or to agree to decipher the secrets behind the imprisonment of undead who'd killed your colleagues for no other reason than to have access to the secrets. This series falls deeply into the Science is Bad trope.

There's still that piece we set aside...good vs evil. I'll just say this much...there is redemption, in several characters, on large and small scales. Unfortunately, only one of those characters on the large scale really pulls it off convincingly. While several others also have a "tell everyone I was wrong" moment, only one of them really gets enough screen time to allow us to watch the character grow into the moment.

Throughout the anime, there are several character-character relationships and implied histories which look like they could be explored a great deal further, though it wouldn't be possible within this anime without detracting from the overall plot. Many of the characters surrounding Herlock are very richly detailed, and many characters whose roles in the overall plot are minor still get just enough screen time and story to really tell you something about them. This is a 13-episode series that sees more character development than many multi-year television shows.

Easily one of my favorites.

Tuesday, August 11, 2009

Complete site backups in under five minutes

I've changed how site backups for Rosetta Code work.

Previously, a site backup was a rather manual affair of mysqldump, tar and scp. I've got a fair number of large tarballs that only contain the contents of the httpd root as well as database SQL dumps. Consumed space and bandwidth grow fairly quickly; This is part of why incremental backup strategies get devised.

Now, I have a set of nested makefiles on an offsite system with a few different targets. The root level makefile has 'backup', 'recurse' and 'git' targets. The 'backup' target depends on the other two. The recurse target drops into a few different subdirectories, one each for databases, webroot and logs. Each subdirectory has its own make file with a 'backup' target.

The databases subdirectory connects to the server, has the server do a mysqldump of the databases to a server-local file, and then uses rsync to copy the SQL dump file to the local system.

The webroot subdirectory uses rsync to copy the webroot to the local system.

The logs subdirectory uses rsync to copy system log files to the local system.

After running the recurse target, the root makefile runs the git target, which updates a local git repository with the modifications since the last time a backup was done. This is relatively cheap; Since the data has already been copied to the local system, the server isn't loaded down with the subsequent processing.
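
For anyone who'd rather see the flow in one place, here's a rough Python sketch of the same sequence of steps. It is not the actual makefiles; the host name, remote paths, directory names and the use of --all-databases are all placeholder assumptions made up for illustration.

```python
import subprocess

# Hypothetical placeholders; substitute your own host and paths.
HOST = "backup@example.org"
REMOTE_DUMP = "/tmp/site.sql"


def backup_commands(host=HOST, remote_dump=REMOTE_DUMP):
    """Return the commands in the order the targets described above run."""
    return [
        # databases: have the server dump to a server-local file, then pull it
        ["ssh", host, f"mysqldump --all-databases > {remote_dump}"],
        ["rsync", "-az", f"{host}:{remote_dump}", "databases/"],
        # webroot and logs: plain rsync pulls (remote paths are assumptions)
        ["rsync", "-az", f"{host}:/var/www/", "webroot/"],
        ["rsync", "-az", f"{host}:/var/log/", "logs/"],
        # git: record whatever changed since the last backup
        ["git", "add", "-A"],
        ["git", "commit", "-m", "site backup"],
    ]


def run_backup(dry_run=True):
    """Print the commands, or run them in order when dry_run is False."""
    for cmd in backup_commands():
        if dry_run:
            print(" ".join(cmd))
            continue
        # "git commit" exits non-zero when nothing changed; that is not an error here.
        is_commit = cmd[:2] == ["git", "commit"]
        subprocess.run(cmd, check=not is_commit)
```

Calling run_backup() with the default dry_run=True only prints the commands, which is a cheap way to check the ordering before pointing it at a real server.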

Once the whole thing has been primed (the first backup takes quite a while, as all of the data has to be copied), a full backup run takes less than five minutes to save off the changes from an hour's subsequent site traffic.

The biggest problem with the system is that serverside CPU usage is fairly heavy with the mysqldump and the rsync work. Serverside work currently takes four of the five minutes, while the git processing takes the rest. Hopefully, I'll be able to offset this by moving some batch processing typically done on the server to offsite, on better, faster hardware.

In the near future, I plan to add chunks of /etc to the backup process.

Monday, August 10, 2009

Math for the modern generation

1. Billy-Sue is downloading the latest blockbuster via BitTorrent. The total download size will be 500MiB. After 20 minutes, she's downloaded 100MiB of the data.

1.a How fast has her torrent been downloading?

1.b The torrent cloud picks up speed, and she starts pulling data at 3Mb/s. How much data will she have fifteen minutes later?

1.c Billy-Sue's ISP starts throttling her internet connection to 50kB/s. How much data will she have ten hours after she started the download?

(All units and values intentional.)
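
For those who want to check their work, here's one way to run the numbers in Python. It assumes the usual meanings of the units (MiB = 1,048,576 bytes, Mb = 1,000,000 bits, kB = 1,000 bytes), that the download can't exceed the 500MiB total, and that the throttling in 1.c kicks in right after the fifteen minutes in 1.b; that last one is my reading, not something the problem states.

```python
MIB = 1024 * 1024          # bytes in a mebibyte
TOTAL = 500 * MIB          # full download size, in bytes

# 1.a: 100 MiB in 20 minutes
got_a = 100 * MIB
rate_a = got_a / (20 * 60)                    # bytes per second

# 1.b: 3 Mb/s is 3,000,000 bits per second, held for 15 minutes
rate_b = 3_000_000 / 8                        # bytes per second
got_b = min(TOTAL, got_a + rate_b * 15 * 60)

# 1.c: 50 kB/s is 50,000 bytes per second, assumed to start at minute 35
rate_c = 50_000
seconds_left = 10 * 3600 - 35 * 60
got_c = min(TOTAL, got_b + rate_c * seconds_left)

print(f"1.a {rate_a / 1024:.1f} KiB/s ({rate_a * 8 / 1e6:.2f} Mb/s)")
print(f"1.b {got_b / MIB:.1f} MiB")
print(f"1.c {got_c / MIB:.1f} MiB")
```

Under those assumptions, the throttled connection still finishes the remaining data in under half an hour, so the answer to 1.c is simply the whole file.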

Friday, August 7, 2009

Another day, another $500

Car was making an incredible squeaking noise yesterday. Tracked it down via a microphone (you've got to see the video) to the power steering pump. Getting that, my muffler and another exhaust leak fixed today.

Incidentally, one good way to be sure you have an accurate estimate from your mechanic seems to be saying something like this: "Hm. Ok, let me come down there real quick and see if my credit card will take it." Suddenly, he remembered there were a couple things he hadn't included in the number he gave me.

Thursday, August 6, 2009

The General Trend

The general trend is to guarantee stability in eschewance of privileges that individuals see as disposable. The net effect is a feedback loop that narrows down the list of privileges to those that absolutely nobody sees as disposable, a categorical intersection of the privileges each individual wants—such as actual life—and then those disappear at the individual level for the statistical gain that they might be retained for the majority.

Wednesday, August 5, 2009

Heh. I think I've shifted.

My Political Views
I am a centrist social libertarian
Right: 0.56, Libertarian: 6.7

Political Spectrum Quiz

vi vi vi

(Hey, Hey)
vi, vi, vi
vi, vi...
vi, vi...
Oh, Oh..

I'm codin' this tonight,
I'm gonna have to do it right.
This code is really tight.
Hey baby come on,
I've loved you endlessly,
When emacs wasn't there for me.
So now it's time to think;
I'm gettin' things done.
I know that I can code some more
It ain't no lie
I wanna see it out that door
Baby, vi, vi, vi...

vi! vi!

(with apologies to N*Sync, and to those folks whom I've reminded the band exists.)

Tuesday, August 4, 2009

Watching for the products you want on Newegg

Originally mentioned in passing by someone on Multiply*.

Anyway, if you know how to do a product search on Newegg, getting automated responses to those searches is ridiculously trivial. Go ahead and do your search until you have a results page that has the kind of products you're looking for (This won't work if there's only one result, as Newegg will redirect you to the product page for that result). Then take the URL of the page. For example, a search for high-end SSDs might look something like this:

http://www.newegg.com/Product/ProductList.aspx?Submit=Property&Subcategory=636&Description=&Type=&N=2010150636&srchInDesc=&MinPrice=&MaxPrice=&PropertyCodeValue=4213%3A30853&PropertyCodeValue=4213%3A36521&PropertyCodeValue=4213%3A30854&PropertyCodeValue=4213%3A48677&PropertyCodeValue=4214%3A46300&PropertyCodeValue=4214%3A46499&PropertyCodeValue=4214%3A44038&PropertyCodeValue=4214%3A46019&PropertyCodeValue=4214%3A47171&PropertyCodeValue=4215%3A41071

All you have to do is change that ProductList part to RSS:

http://www.newegg.com/Product/RSS.aspx?Submit=Property&Subcategory=636&Description=&Type=&N=2010150636&srchInDesc=&MinPrice=&MaxPrice=&PropertyCodeValue=4213%3A30853&PropertyCodeValue=4213%3A36521&PropertyCodeValue=4213%3A30854&PropertyCodeValue=4213%3A48677&PropertyCodeValue=4214%3A46300&PropertyCodeValue=4214%3A46499&PropertyCodeValue=4214%3A44038&PropertyCodeValue=4214%3A46019&PropertyCodeValue=4214%3A47171&PropertyCodeValue=4215%3A41071

Now you have an RSS feed with the latest results for your search.
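
If you want to script the swap, a minimal Python sketch follows. The function name is made up for illustration, and it only knows about the ProductList.aspx URL layout shown above; whether Newegg keeps serving feeds at that path is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def newegg_search_to_rss(url):
    """Swap ProductList.aspx for RSS.aspx in the path; the query is untouched."""
    parts = urlsplit(url)
    if not parts.path.endswith("/ProductList.aspx"):
        raise ValueError("not a product-list search URL")
    new_path = parts.path[: -len("ProductList.aspx")] + "RSS.aspx"
    return urlunsplit(parts._replace(path=new_path))
```

Only the path is rewritten, so a search term that happens to contain the text "ProductList" in the query string won't get mangled the way a blind find-and-replace might.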

* I forget who, sorry!

The Chatter of the Matter in the Hanger at Bangor

The chatter of the matter in the hanger at Bangor

Produced the Slammer of the Hammer of Grammar.

The Slammer of the Hammer of Grammar

Beat down the chatter of the matter at Bangor.

The chatter of the matter at Bangor

Consumed the Slammer of the Hammer of Grammar

The matter of the chatter in the hangar at Bangor,

Born of the chatter of the matter in the hanger,

Became the matter of the master of the hanger at Bangor.

Could the master and commander of the hanger at Bangor

Trounce the chatter and the matter of the hanger?

Monday, August 3, 2009

More wireless communications musings

So while I was working on my car Saturday, I had my laptop in the garage, connected to my wifi, playing music, etc. (DLNA/UPnP is just the awesome.)



Or, rather, I tried to; It took me a while to get working. Despite the seemingly high power output from my router, my laptop simply couldn't reciprocate at the same level; communications between it and my laptop got dodgy. I got through a couple songs, and then I lost signal, and wasn't even able to get enough signal quality to get a DHCP response. Not terribly surprising, as the signal had to get through two formerly-exterior-facing walls. I didn't have one of my 330gEs handy, so I couldn't set it up as a repeater or adapter, and had to get the laptop's internal antenna (wherever it is) oriented and positioned.



And, honestly, 802.11g hasn't been very kind for streaming video; When I watch the bitrate on my PS3, it'll typically hover between 300kbps and 700kbps in low-activity scenes, bounce up to 1.5-3Mbps for moderate and high activity scenes or "grainy" film effects, and shoot up to 15Mbps for extremely tough scenes (such as sustained fullscreen fuzz).



I noticed that there's an ISM band at 24GHz. 802.11b and 802.11g operate in an ISM band at 2.4GHz, and 802.11a operates in an ISM band at 5GHz. 802.11n operates in both the 2.4GHz and 5GHz ISM bands. But nothing uses 24GHz, as far as I know.



And that makes sense; 802.11b has a hard enough time penetrating using the 2.4GHz band, 802.11a has an even rougher time up there at the 5GHz band. In the 24GHz band, you'd be lucky if you got from one side of a wall in my house to the other.



But what if you didn't *need* penetration, but were only interested in throughput? If, for example, your access point was in a highly visible point in the room, and any device it would need to connect to would have line-of-sight? 24GHz sounds mighty nice at that point. Except that it's extremely hard to create circuits that deal with those frequencies; Capacitance between wires in your circuits will be enough to give you headaches; There's a reason PCs don't have internal signal paths that operate at more than a few GHz.



What would be nice would be to be able to operate at a multi-GHz band and consume a huge amount of bandwidth to increase throughput. But that's going to screw with anything else using that area of EM spectrum; That's why we have parts 15 and 18 of the FCC regulations.



But wait...There is an area of the spectrum that isn't really regulated that much by the FCC. There's a nice little spot around 344,000 GHz (344THz, but that's not a number often bandied about.) Yeah, it's going to have severe penetration issues; A piece of paper will stop it, for the most part. But that's because we're talking about infrared radiation now.



Rather than using frequency modulation, use amplitude modulation. That lets you use a source like, I don't know, a standard 870nm LED.
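
The 344THz figure and the 870nm LED are the same thing seen from two directions, since frequency is just the speed of light divided by wavelength. A quick Python sketch of the conversion (the function names are only for illustration):

```python
C = 299_792_458  # speed of light in a vacuum, in m/s


def wavelength_nm_to_thz(nm):
    """Frequency in THz of light with the given wavelength in nanometers."""
    return C / (nm * 1e-9) / 1e12


def ghz_to_wavelength_mm(ghz):
    """Wavelength in millimeters of a signal at the given frequency in GHz."""
    return C / (ghz * 1e9) * 1e3


print(f"870nm LED: {wavelength_nm_to_thz(870):.1f} THz")
for band in (2.4, 5, 24):
    print(f"{band}GHz band: {ghz_to_wavelength_mm(band):.1f} mm")
```

The same arithmetic shows how far apart the radio bands are from infrared: the 2.4GHz band has a wavelength of roughly 125mm, while the LED's is under a micrometer.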



The trick is finding out what the signal frequency response would be of the IR LED and an IR-sensitive photodiode or phototransistor. (Photodiodes will have a faster response time, but then you have to do your own amplification of the signal.)

Sunday, August 2, 2009

A comparison of compressors on SQL

I was cleaning and reorganizing data on my computers, and taking the opportunity to compress anything large that I wouldn't need to see inside as part of, e.g., indexing.



I looked at one of my snapshots of Rosetta Code's database; Uncompressed, it occupied over 600MB. Compressed with bzip2, it occupied about one tenth of that. I decompressed and recompressed it with rzip, and was sufficiently surprised at the results that I tried to do a fairly thorough comparison of bzip2, rzip and gzip. Based on my use case, I collected data on compression ratio and speed. I did not collect data on RAM usage. (Though I do know that rzip at max compression exceeds the amount of RAM available on my basic Slice.)



It took (bzip2, rzip, gzip) (5m29s, 3m1s, 1m33s) to achieve compression ratios of (11.2, 60.5, 7.48).



Here's the raw data:

shortcircuit@dodo~/comprcompa
04:54:56 $ ./comprcompa.sh rcode_20090704_2029.sql
The primary purpose here is to compare compression ratios for database SQL dump backups.
Running environment is a Gentoo system running an AMD Phenom 9650.
As most things in Gentoo are typically compiled from source, these are the CFLAGS used:
CFLAGS="-march=amdfam10 -O2 -pipe"
our source database dump
-rw------- 1 shortcircuit shortcircuit 633986350 2009-08-02 02:58 rcode_20090704_2029.sql
Starting memory conditions; If there's a great deal of room for cache, we won't hit disk as frequently
total used free shared buffers cached
Mem: 7936588 7730932 205656 0 207452 6504216
-/+ buffers/cache: 1019264 6917324
Swap: 0 0 0
streaming original file into /dev/null via dd, to pull it into cache
1238254+1 records in
1238254+1 records out
633986350 bytes (634 MB) copied, 1.07369 s, 590 MB/s
starting uptime: 04:55:01 up 13 days, 17:13, 3 users, load average: 0.67, 0.85, 0.71
Starting bzip2 -fk9

real 5m28.838s
user 5m28.279s
sys 0m0.537s
Post-bzip2 uptime: 05:00:29 up 13 days, 17:18, 3 users, load average: 1.05, 1.03, 0.83
Pull original back into cache, for fair comparison
1238254+1 records in
1238254+1 records out
633986350 bytes (634 MB) copied, 1.097 s, 578 MB/s
Starting rzip -k9

real 3m0.765s
user 3m0.445s
sys 0m0.307s
Post-rzip uptime: 05:03:31 up 13 days, 17:21, 3 users, load average: 1.21, 1.08, 0.87
Pull original back into cache, for fair comparison
1238254+1 records in
1238254+1 records out
633986350 bytes (634 MB) copied, 1.0498 s, 604 MB/s
Starting gzip -9

real 1m33.356s
user 1m32.011s
sys 0m0.890s
Post-gzip uptime: 05:05:06 up 13 days, 17:23, 3 users, load average: 1.11, 1.07, 0.89
Final file sizes:
-rw------- 1 shortcircuit shortcircuit 56789389 2009-08-02 02:58 rcode_20090704_2029.sql.bz2
-rw------- 1 shortcircuit shortcircuit 84747906 2009-08-02 02:58 rcode_20090704_2029.sql.gz
-rw------- 1 shortcircuit shortcircuit 10472407 2009-08-02 05:03 rcode_20090704_2029.sql.rz
shortcircuit@dodo~/comprcompa
05:05:06 $
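
The comprcompa.sh script itself isn't reproduced here, but the log follows a simple pattern: get the input into memory, time each compressor at its highest setting, then compare output sizes. A rough sketch of the same idea in Python might look like the following. It assumes the standard library's bz2 and gzip modules as stand-ins for the command-line tools (there's no standard-library rzip binding, so rzip is left out), and the `compare` function and sample data are made up for illustration. Don't expect the numbers to match the ones above.

```python
import bz2
import gzip
import time


def compare(data: bytes) -> dict:
    """Compress data with each codec at level 9; return ratio and seconds."""
    codecs = {
        "bzip2": lambda d: bz2.compress(d, compresslevel=9),
        "gzip": lambda d: gzip.compress(d, compresslevel=9),
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = {"ratio": len(data) / len(packed), "seconds": elapsed}
    return results


if __name__ == "__main__":
    # Made-up, highly repetitive SQL-like text; a real dump behaves differently.
    sample = b"INSERT INTO page VALUES (1, 'Hello World', 'text');\n" * 20000
    for name, r in compare(sample).items():
        print(f"{name}: ratio {r['ratio']:.1f}, {r['seconds']:.3f}s")
```

Since the data is already in memory, there's no need for the dd trick to warm the disk cache between runs.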

Saturday, August 1, 2009

Harvesting of SourceForge projects and spamming SF users

I got an email from "Apparition" telling me my ActivityRank value was 0, and that I should add photo and blog entries to increase it. The email included a link to "http://grou.ps/apparition"



"Apparition" is the name of a project I created years ago when I was in college and was convinced that I could do a better job writing computer lab imaging software than the "Ghost" software that was being used in the lab I worked in at the time. (Heck, I suspect that's even more true, now. Imagine imaging a lab, but having the imaging software use BitTorrent on the local switch to distribute the drive images. Certainly would have worked better than imaging one machine in the lab to get past the building-to-building bottleneck, then having that machine serve up to the other 63 PCs in the lab...)



I never went anywhere with it. Haven't even really thought about it in the last few years. The website that looks like it sent me the email appears to have found my old project on SourceForge and sent an email to my SourceForge account, which was forwarded to my personal email. My first impression was that this was a targeted malware campaign. I've grabbed grou.ps and grou.ps/apparition with wget and examined them with less and links, and neither *appears* to contain malicious code to my untrained eye; it's mostly jQuery code to control the interface to the social networking site. That's not to say it's safe; I wouldn't open it in a full browser outside of a clean VM, for the sake of being paranoid.



I'm fairly confident it's a programmatic attack, as Apparition is probably the least interesting of the SF projects I started and never went anywhere with. Even if it's not an attempt at spreading malware or collecting personal info of technical users and people with access to source code repos, it bothers me a bit that someone appears to be using programmatic means to harvest SF accounts and create places for them on a social networking site, and it bothers me that the email somehow got through SourceForge's filters.



Here are the headers:


Delivered-To: mikemol@gmail.com
Received: by 10.150.123.8 with SMTP id v8cs281312ybc;
Fri, 31 Jul 2009 23:44:12 -0700 (PDT)
Received: by 10.100.216.7 with SMTP id o7mr4434688ang.120.1249109052582;
Fri, 31 Jul 2009 23:44:12 -0700 (PDT)
Return-Path:
Received: from mx.sourceforge.net (mx.sourceforge.net [216.34.181.68])
by mx.google.com with ESMTP id 13si10290449yxe.76.2009.07.31.23.44.10;
Fri, 31 Jul 2009 23:44:11 -0700 (PDT)
Received-SPF: fail (google.com: domain of bounce@grou.ps does not designate 216.34.181.68 as permitted sender) client-ip=216.34.181.68;
Authentication-Results: mx.google.com; spf=hardfail (google.com: domain of bounce@grou.ps does not designate 216.34.181.68 as permitted sender) smtp.mail=bounce@grou.ps; dkim=pass (test mode) header.i=@grou.ps
Received-SPF: pass (3b2kzd1.ch3.sourceforge.com: domain of grou.ps designates 67.228.206.32 as permitted sender) client-ip=67.228.206.32; envelope-from=bounce@grou.ps; helo=mail01.grou.ps;
Received: from mail01.grou.ps ([67.228.206.32])
by 3b2kzd1.ch3.sourceforge.com with esmtp
(Exim 4.69)
id 1MX8K6-0001oT-B0
for shortcircuit@users.sourceforge.net; Sat, 01 Aug 2009 06:44:10 +0000
Received: from mail01.grou.ps (localhost [127.0.0.1])
by mail01.grou.ps (Postfix) with ESMTP id 354613106EB
for ; Thu, 30 Jul 2009 19:41:38 -0500 (CDT)
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=grou.ps; h=date:to:from
:subject:message-id:list-unsubscribe:mime-version:content-type;
s=s1; bh=1s8xCFXOMjsEryNQjM/Jgn/L6VI=; b=D1qzcHRVWOs5w7fXza79KX
QN5oOAE19VQ2tLJZsXbuSYJ22ZUBqdp5RoA4cXBbxta4f+9VOc8QSaPmytOFcURt
0gQ2k9LeWahR63fxVLPDqkLpBmtRl59VKZN7TF4f9IfJ19/RdfhYqvnV/GbCcoE1
XNNHwnMdiKiDfxSmZTxAI=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=grou.ps; h=date:to:from:subject
:message-id:list-unsubscribe:mime-version:content-type; q=dns;
s=s1; b=HLOKH9ucB8bSXrWREFZ47U5qEHfgyCo2LN/5MvP+rrc6A1ZrDmF8zdL
cpZzq1P4n43XFjssW18HRk/076lQHYxvi8XlcMuOk9hleImk22W366VZo+mnID+V
5JYJlMNR1nMrB0x76i9RJ9fiCcSGTivRoDi6vrOOVmyj/FIIhqM0=
Received: from localhost.localdomain (unknown [67.228.115.98])
by mail01.grou.ps (Postfix) with ESMTP id 2A2103106EA
for ; Thu, 30 Jul 2009 19:41:38 -0500 (CDT)
Date: Thu, 30 Jul 2009 19:41:38 -0500
To: shortcircuit
From: Apparition
Subject: Apparition: Weekly Newsletter
Message-ID: <8507431ef5f59bdc6fecbb3f67dfa0e1@localhost.localdomain>
X-Priority: 3
X-Mailer: GROU.PS Mailer
List-Unsubscribe: http://grou.ps/noemail.php?x1=%25qCGbT5-%3B%5C9Wr%2BK8Asq4%27%3FWmJIX6%24%272%23xR&x2=%251H3qV%7B%23O%5Bd%5Bb%27%7E1%27%27%3A1%7CV%5C6vC%2FA%7ByN%21lU
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="b1_8507431ef5f59bdc6fecbb3f67dfa0e1"
X-Spam-Score: -0.5 (/)
X-Spam-Report: Spam Filtering performed by mx.sourceforge.net.
See http://spamassassin.org/tag/ for more details.
-1.5 SPF_CHECK_PASS SPF reports sender host as permitted sender for
sender-domain
-0.0 SPF_PASS SPF: sender matches SPF record
-0.0 DKIM_VERIFIED Domain Keys Identified Mail: signature passes
verification
0.0 DKIM_SIGNED Domain Keys Identified Mail: message has a signature
1.0 HTML_MESSAGE BODY: HTML included in message
0.0 AWL AWL: From: address is in the auto white-list
X-Headers-End: 1MX8K6-0001oT-B0
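
As for how it got through SourceForge's filters, the X-Spam-Report lines in the headers give a hint. Here's a minimal sketch that parses those rule lines and adds up the scores; the report text is copied from the headers above, while the regex and the `parse_rules` name are my own. What threshold SourceForge actually used isn't shown in the headers; SpamAssassin's stock default is 5.0.

```python
import re

# Rule lines copied from the X-Spam-Report header above.
REPORT = """\
-1.5 SPF_CHECK_PASS SPF reports sender host as permitted sender for
sender-domain
-0.0 SPF_PASS SPF: sender matches SPF record
-0.0 DKIM_VERIFIED Domain Keys Identified Mail: signature passes
verification
0.0 DKIM_SIGNED Domain Keys Identified Mail: message has a signature
1.0 HTML_MESSAGE BODY: HTML included in message
0.0 AWL AWL: From: address is in the auto white-list
"""

# A rule line starts with a signed score followed by an upper-case rule name;
# wrapped continuation lines ("sender-domain", "verification") don't match.
RULE_LINE = re.compile(r"^(-?\d+\.\d+)\s+([A-Z0-9_]+)\b")


def parse_rules(report):
    """Return a list of (rule name, score) pairs found in the report."""
    rules = []
    for line in report.splitlines():
        match = RULE_LINE.match(line)
        if match:
            rules.append((match.group(2), float(match.group(1))))
    return rules


rules = parse_rules(REPORT)
total = sum(score for _, score in rules)

if __name__ == "__main__":
    for name, score in rules:
        print(f"{score:+.1f}  {name}")
    print(f"total: {total:+.1f}")  # prints "total: -0.5"
```

The total matches the X-Spam-Score of -0.5: the SPF pass was worth -1.5, and the only thing counted against the message was +1.0 for containing HTML.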