Programming


I have been developing a couple of projects with some friends for the Propeller chip, which is an extraordinarily powerful microcontroller. The latest project uses the Ping))) sensor and the Hydra Sound System (HSS) to generate sounds based on your hand position over the Ping))) sensor.

To make sense of the HSS sfx_play interface, I wrote a little program that lets me adjust the parameters in real time using a keyboard attached to the Propeller. I use it to find a sound, then write down those parameters for future use.
tp03.zip This requires the HSS.

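Given a directory of images, you can build a quick index page with one line (the grep keeps index.html, which the shell creates before ls runs, out of its own listing):

ls | grep -v '^index\.html$' | awk '{printf("<img src=\"%s\"><br/>\n", $0)}' >index.html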

I don’t think it gets any easier than that.

As part of the purging process of becoming a nomad we are trying to make an archive of important data. This is not an easy task. We have about 30 Gigs of photos. I know there are duplicates in the data but I haven't done anything about it until now.

In integrity part 1 I showed how to check the md5deep database (just a text file with an md5sum and a filename per line) for duplicates. Example:

sort -n md5.test_data.txt | uniq -D -w 32

After sorting, this compares only the first 32 characters of each line, which is the md5sum itself, and prints every line whose sum shows up more than once. This works great for detecting duplicates.
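If you run this check often, the pipeline can be wrapped in a couple of small shell functions. This is only a sketch: the function names are made up for illustration, it assumes GNU sort and uniq (for -D and -w), and it uses a plain sort because a byte-wise sort is enough to put identical sums next to each other.

```shell
# list_duplicates DBFILE
# Prints every line of DBFILE whose first 32 characters (the MD5 hex digest)
# also start at least one other line. Hypothetical helper, GNU uniq assumed.
list_duplicates() {
    sort "$1" | uniq -D -w 32
}

# count_duplicate_groups DBFILE
# Prints how many distinct digests occur more than once.
count_duplicate_groups() {
    sort "$1" | uniq -d -w 32 | wc -l | tr -d ' '
}

# Example:
#   list_duplicates md5.test_data.txt
#   count_duplicate_groups md5.test_data.txt
```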

But what can I do about them? Sometimes I have a duplicate on purpose. For example, if I have a directory tree with 500 photos from a single shoot, I want to make a directory with the "best of." I could do several things: make symbolic links to the files, copy them to a new folder, or store the best-of stuff outside the main backup tree. I opt for just making a copy of the file in a "bob" folder. It wastes space, but this is the method I chose.

So now that I have copied them I have files with duplicate md5sums. After pondering on this for a while I came up with the idea of changing the jpeg comment field to say something like: "I know this is a copy" or "Best Of Photos."

To accomplish this I am using my good friend jhead – the jpeg header manipulator. Example:

find ./best_of_best/ -type f -print0 | xargs -0 -n1 jhead -cl "Best of Photos"

This does the trick. Each copy, even though it is really the same photo with the same image data and the same file name, now has a different md5sum from the original.

Some duplicates in the archive are caused by sloppy photo management. Sometimes I do not delete the files off of a card before taking more and end up having more than one copy of a photo in different directories. With the jpeg header trick I can now either delete them or just change the comment field.

I suppose in the future when I start using the comment field more this method could overwrite valuable comment info. I guess when that becomes a problem I will add checks to make sure the comment field is empty before overwriting.
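A check like that could look roughly like the sketch below. It assumes that jhead's normal listing prints a line beginning with "Comment" when the field is set, which is worth confirming against your own jhead version. The function name tag_if_uncommented is hypothetical.

```shell
# tag_if_uncommented FILE COMMENT
# Writes COMMENT into FILE's JPEG comment field only when jhead reports no
# existing comment. Assumption: jhead's default listing contains a line that
# starts with "Comment" when the field is already set.
tag_if_uncommented() {
    if jhead "$1" | grep -q '^Comment'; then
        echo "skipped (already has a comment): $1"
    else
        jhead -cl "$2" "$1" >/dev/null && echo "tagged: $1"
    fi
}

# Example (one directory level only):
#   for f in ./best_of_best/*.jpg; do
#       tag_if_uncommented "$f" "Best of Photos"
#   done
```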

Have you ever just wanted to see a bunch of people doing random stuff? I know I sure have.

That's why I made the "People Doing Stuff – Random Image Finder." 

It's basically a random GIS query tool. The twist is that it searches for images based on Proper Noun + Verb. It's written in JavaScript. I used to use the diddly.com Random Personal Picture Finder for hours on end. I still do. I just wanted something a little different. Thus my People Doing Stuff finder was born.

I know it uses frames. I couldn't think of any other way to accomplish the goal. With AJAX I kept getting "permission denied" when trying to pull data from another site. I guess it has to do with preventing cross-site scripting: browsers apply a same-origin policy that stops a script from reading responses from other domains.

Enjoy. If you have any suggestions let me know.

I have been working hard and furious on the next generation of POP Rage. Originally I had designed the site to be a one-a-day article about pop culture. The idea was that if you didn't have time to scour all the popular sites on the Internet, you could come to POP Rage and see the one story that mattered for the day: the one thing that everyone would be talking about tomorrow. It didn't work out that way. The first few articles were about what was hot that day. Of course, if it was hot there were already a billion articles about it and it was super well covered. So the question became, 'what makes this site different?'

To fix this perceived problem we started covering things that nobody else was covering or things that hadn't been covered in a while. We did some great articles in this format, starting with one about mashups, then moving on to subjects like image generators, celebrities with blogs, and why celebrities seem to move frequently. But finding subject matter became difficult. The problem was that if it was something nobody else was covering, how could it be a pop subject?

The articles were great but didn't generate much interest from the cold cruel world. So I decided it was time for a change. Of the topics that we covered the two most popular were 'blogging with the stars' and 'your locks are unsafe and useless.' The one about celebrities who blog caught my eye.

I decided that a really neat thing would be to have a site that cataloged celebrity blogs. Not bloggers who became celebrities but rather celebrities who became bloggers. I thought this would be a good resource because while I was researching the 'blogging with the stars' article I discovered how hard it is to track down quality celebrity bloggers. I ran into unofficial blogs that claimed to be official. I found sites that used to be actively maintained but had been abandoned. And I kept running into sites where it seemed like a publicist had told someone, 'you need to blog to reach these kids today,' and the celeb did, but only for the duration of that promotion.

I remembered how much I was impressed by Wil Wheaton's blog and thought 'how about not just listing the blog, but telling people how good it is?' This is what gave me the full idea for the celebrity blog rank. 

So I set out not only to catalog the blogs but to rank them. How do you rank a blog for quality? Turns out it is pretty difficult. One of the easiest things to measure is frequency of posting. (That is, if they have an RSS feed.) I decided this was the most important factor in determining rank. Next I thought about the problem of sites that are not blogged by the celeb themselves but rather by an agent. I didn't want to discriminate against these sites, but I wanted sites that were made by the celeb (or at least seem to be) to rank higher. So I added some qualitative measurements to the formula. Things like: "How often does the celeb respond to fans?", "Is the blog just 'I will be here on this date and here is what I am working on,' or is it 'here is how I feel about what I'm working on'?" and "Does the celeb allow fans to comment or discuss on their site?" Moby, for example, not only has a board where people can chat but even tries to let it be self-moderating.

Now the tech. The site is basically an RSS aggregator. The feeds are collected and analyzed in a 6 hour cycle that updates something every few minutes. A cron script runs lynx. Lynx pulls up a page that sets off the update. Why do it this roundabout way rather than call the PHP script directly? Logs. The PHP script downloads the feed, parses it with Magpie, and stores the results in a MySQL database. Then another script is called that runs an analysis of all the data, not only from that RSS feed but from previous ones. I wanted to judge frequency not only on the few items in the RSS feed but over longer periods of about a month. Once these scripts run and determine a new rank for the blog, the master table is updated to reflect the new rank. No groundbreaking tech. Actually the tech side is fairly boring on this project.

So there it is. The why and how of the celebrity blog ranking. Will it do better than the article-a-day POP Rage? We will see. 

I am launching a new site today called POP Rage! The idea is that every day we will cover one and only one topic that is related to Internet pop culture and news. One day the site may cover what's happening in the news, the next it may cover the latest viral video, or the hottest whatever on the net. The philosophy is that a user can come to the site and within 30 seconds know what their friends are going to be talking about today and hopefully be better informed on the topic.

I am working with a team of really talented writers who take care of the articles, which gives me time to focus on the backend. I did the backend software in PHP5 and MySQL. It's completely custom software that is designed to handle the philosophy of 'one and only one topic' each day. There is a commenting system that allows users to post their feedback on the article and hopefully contribute intelligent discussion on the topic. I have also implemented a comment ranking system in JavaScript. This allows people to tag comments as good/bad as they are reading without having to leave the thread.

Here are some of the JavaScript resources I have been using to write the accounting software.

I am still working on the bookkeeping software. It's coming along nicely. I still don't know how it will fare after an accountant looks at it. But we will see. I should have an alpha RSN.

Currently I am working on a JavaScript package for doing double-entry bookkeeping. Inspired by TiddlyWiki, I figured it should be pretty easy to write one. Well, it's quite tricky. Some problems I've run into so far are things like this: in JavaScript, when you say c=a+b with a and b both holding 1, you may end up with c being equal to "11" instead of 2. That happens when the values are really the strings "1" and "1", for example because they were read from a form field, since + concatenates strings. There are ways around it of course, such as converting with Number() or parseFloat() before adding, but imagine trying to hunt down a bug like that.

Then of course it would be helpful if I really understood accounting. Credit – Debit? It is hard to keep the two straight. As far as I can tell a credit is money being received and a debit is money being spent. Of course I've read a couple books and can fake my way through using a spreadsheet.

I will probably release a sneak peek at the alpha version soon. Just so that other people can see what I have going on. I plan on having a CPA look it over before I release the beta. 

One of the things that I plan to have in the app is CSV support. Right now all the data is stored as CSV. I have it writing to separate files right now. Soon I will try to integrate the TiddlyWiki idea into it. That is, have it save the data within the application file itself. But I will still stick with the CSV idea. I think it's important that the data be easily exportable to a spreadsheet.

I have tons of archives. Mainly graphics files that I have created in the past and the roughly 15,000 photos that I have taken with my D70 since buying it. Not to mention the thousands of photos that I took before the D70. Then I also have my scans. I scan almost every piece of paper that comes into my house. Bills, receipts, and the like. (I will post an article about that some day.)

I like to keep everything accessible on a shared server so that I can easily get to them. But all of this is in the 100+ Gig range. It is easy enough to store all that somewhere. But the problem is backup.

Sure, I can split the archive into 4.5G chunks and write it to DVD, but how do I know it worked? MD5SUM, you say? Yes.

In the past I have used programs like tripwire or aide to do server integrity sweeps. Why not use one of them to make sure that when I back up my data it really is saved? Well, the problem is that those are pretty rigid and have config files stored in /etc/blah. They are meant more for intrusion detection. Not really what I need.

I could write a script that recurses directories and stores the md5sum along with each file name. Then after making a backup I could run those again recursively against the stored data. Not good again. That would require me to have to write software. That is a bad idea. Anything that creates real work for me is a bad idea.
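For the curious, the hand-rolled approach can be sketched in a few lines with GNU find and md5sum. Treat it as an illustration of the idea only: the function names are hypothetical, the GNU versions of sort, xargs and md5sum are assumed, and the database file should live outside the directory being hashed.

```shell
# build_db DIR DBFILE
# Records an MD5 sum for every regular file under DIR, one line per file,
# with paths stored relative to DIR.
build_db() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum ) > "$2"
}

# verify_db DIR DBFILE
# Re-hashes the files listed in DBFILE and prints only the failures.
# The exit status is non-zero if any file is changed or missing.
verify_db() {
    ( cd "$1" && md5sum -c --quiet ) < "$2"
}

# Example:
#   build_db  /photos /tmp/photos.md5
#   verify_db /mnt/dvd /tmp/photos.md5
```

Note that this only checks files that are listed in the database. Files added later are not reported.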

Enter md5deep. After searching for a few hours and looking at all the integrity programs I stumbled upon md5deep. Wow. It is just what I am looking for. It allows me to do something like this:

md5deep -r /etc >/tmp/file.database.txt <- this recurses the directory and generates a flat text file with all the md5sums.

md5deep -r /etc -X /tmp/file.database.txt <- this reads the database and recurses the directory telling me if any files don't match the database.

 Couldn't be simpler.

To test it I will run it on my photo archive, then make a backup to DVD-R, run the checker against the discs, and see whether what I think is on them really is on them.
