PeteSearch

Tips and tools for better searching

Getting Tokyo Tyrant to work with files larger than 2GB

Godzillavskitten
Photo by Gen Kanai

I use Tokyo Tyrant/Cabinet as the key-value database for Mailana, and after some initial hiccups I've been very happy with its performance. Last night though it stopped working in the middle of preparing several hundred nightly emails, and I wanted to document the problem and the fix to help anyone else who hits this.

After a bit of investigation, I noticed that the Tyrant server kept dying with "File size limit exceeded". My casket.tch hash database file had grown to 2GB, and running on a 32-bit EC2 server Tokyo couldn't cope with anything larger. There's a standard called Large File Support on Linux that allows you to access >2GB files, but it requires a few things to work:

- A modern version of Linux. I'm on 2.6, so it has support for LFS built in.

- A modern file system that supports large files. I'm on XFS, so that was also ok.

- You need to recompile your program to use the 64-bit versions of file operations. Happily Tokyo was using the correct off_t type for file offsets, rather than int, so I was able to add the -D_FILE_OFFSET_BITS=64 compile flag to the configure script in both Cabinet and Tyrant and rebuild them both. They then ran with 64-bit file offsets on a 32-bit system.
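
To put a number on that limit: a signed 32-bit off_t tops out at 2^31 - 1 bytes, which is why the file stalled at 2GB. Here's a minimal PHP sketch of a check that could warn before a database file gets there. The function names, the 80% threshold and the idea of polling the file size are all assumptions for illustration, not part of Tokyo Cabinet or Tyrant.

```php
<?php
// Largest byte offset a signed 32-bit off_t can represent: 2^31 - 1
define('MAX_32BIT_FILE_OFFSET', 2147483647);

// Returns the size as a fraction of the 32-bit limit (1.0 means the limit is reached)
function fraction_of_32bit_limit($sizeinbytes)
{
    return $sizeinbytes / MAX_32BIT_FILE_OFFSET;
}

// Hypothetical check: true once the file passes the given fraction of the limit.
// The 0.8 default is an illustrative assumption, not a recommended value.
function is_near_32bit_limit($path, $threshold = 0.8)
{
    $size = filesize($path);
    if ($size === false)
        return false;
    return fraction_of_32bit_limit($size) >= $threshold;
}

// A 1.5GB file is already three quarters of the way to the limit
$onepointfivegb = 1.5 * 1024 * 1024 * 1024;
printf("%.0f%% of the 32-bit limit\n", 100 * fraction_of_32bit_limit($onepointfivegb));
```

Bear in mind that a 32-bit PHP build has the same problem as a 32-bit Tokyo build, so filesize() itself can't be trusted once a file is past 2GB. A check like this is only useful as an early warning.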

There was one other quirk I discovered. By default Tokyo only uses a 32-bit index for the hash database, so you also need to pass in the l option at runtime to cope with the larger files, eg:

/usr/local/bin/ttserver -host /sqlvol/tokyo.sock -port 0 -le /sqlvol/casket.tch#opts=l

After making those changes, I was able to restart my server and run the daily email updates again. The meta-data for my database seemed to have been corrupted by the issue, but all my data integrity checks passed, so I patched around the problem. Specifically in tchdb.c:tchdbopenimpl() the file size returned from fstat() didn't match the one stored in the meta-data header, so I skipped the check that compares them:

sbuf.st_size < hdb->fsiz

October 31, 2009

Plug and Play Tech Center spam

I don't usually post spam, but for anyone out there who gets an email like this and googles it, no, I don't think it's that dream investor you've been waiting for. The fact they can't even figure out my first name is a strong sign, and I'm not the only one getting these.

From: Nickolas Turner <nturner@plugandplaytechcenter.com>

Subject: Funding Opportunity through Plug and Play Tech Center

Dear Mailana,

Are you looking for funding? Please contact Alireza@plugandplaytechcenter.com to get in touch with our seed and early stage venture arm, as well as our partners.

Best of luck in your ventures.

Regards,

Nick Turner

Business Relationship Associate

Plug and Play Tech Center

(650) 207-7001

October 29, 2009

Hate your bank? Use a credit union

Bankteller
Photo by Ronn Ashore

I spent several years suffering with grotty customer service at Citibank, and then I was hit by a check fraud that spiraled into a Kafkaesque nightmare. A housemate snuck into my room, stole a check, forged my signature (poorly) and then cashed it for $1000. Einstein that he was, he'd had to write his driver's license and social security number on the back, which showed up when I got the photocopy back. Not wanting to tip him off, the other housemates and I contacted the police, who were very helpful and interested. Now all we needed was the location where the check was cashed, which didn't show up on the statement.

Despite 3 months of both me and the police constantly calling and visiting Citibank, they refused to provide us with any details. I was constantly fobbed off with bogus excuses, since the case was allegedly in the hands of their fraud department, who must live on an island somewhere in the south Atlantic with no means of communicating with the outside world, since I was never able to get a phone number or address to contact them. I finally received a refund after blowing my top at the local branch, and then promptly closed my account, threw the housemate's possessions out on the front lawn and sent a copy of the forged check to his parents.

I was reminded of that when I saw this article on someone being hit with a $888,888.88 bank charge, with no explanation or help from the bank staff. It sounds like exactly the same sort of organizational failure that stymied my efforts to get help. From what I can see, the big banks have spent the last decade trying to build automated systems and procedures so they can get rid of expensive staff. That mostly works for routine operations, but as soon as something unusual happens you need somebody with judgement and authority to make decisions.

So what's the answer? I moved my account to a credit union eight years ago and I've been incredibly happy with them ever since:

- The customer service has been fantastic. They have trained, motivated bank staff able and willing to sort out problems for me, both in the branch and on the phone.

- I pay zero ATM fees, even when I'm traveling, since I can use any other credit union's machine for free.

- They don't gouge me with any other fees either. The big banks make nearly 40% of their revenue from 'non-interest income', and the bigger they are, the more they rely on it. Even worse, 80% of overdraft fees are paid by just 20% of households (ie the poorest), averaging around $1300 each annually.

- I also get a warm glow inside because my deposits are funding straight-forward loans to local people and businesses, not financial speculation or empire building by bank CEOs. I'd rather be helping George Bailey than Gordon Gekko.

My personal account is with Keypoint Credit Union, and my business one with Lockheed, and they've both been stellar. If you're sold on the idea, there's almost certainly one that you can join, either because of where you live or the industry you work in. If you're currently with a large bank, you won't regret switching.

October 28, 2009

Super-simple A/B testing in PHP

Alphabeta
Photo by Roadside Pictures

To really learn about what your users want you need to see how they respond to the different alternatives. Running A/B tests is a great way to do this, but even though the concept is simple, I always felt like it would require some complex coding and database setup to implement. I was wrong: inspired by Eric Ries's tips from a recent workshop I've been getting a lot of valuable feedback using just a 32-line PHP module and plain old logging to a file.

To use it yourself, all you need to do is think up a name for your test, and surround your alternatives with an if (should_ab('yourtestname', $userid)). That's it. I've deliberately made it so there's zero configuration, you can just pick an arbitrary test name, to encourage myself to test early and often. It's best if you have a proper user id to supply to the test function, but if you omit it, the client IP address will be used instead.

Now when your users load up a page they should see one version or another based on who they are, but how do you gather the information about which one worked? I'm logging all my user events to a file on the server using my custom_log() function, so whenever a user views a page I want to store what options they viewed it with. To do that, the only other function in the module returns an array containing what A/B choices were made for the current page. With that appended as a JSON string to each log entry, I can run analytics on the user's subsequent behavior, to tell which version of a front page led to the most conversions for example. The only tricky part of this approach is that you need to make sure you're logging the event at the end of the page, after all the choices have been made.
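
Here's a rough sketch of how those two steps could fit together on a page. To keep it runnable by itself it repeats condensed copies of the module's two functions; on a real page you'd include abtesting.php instead. The test name, the user id, the button markup and the 'userevents' category are all made-up values for illustration.

```php
<?php
// Condensed copies of the module's functions so this sketch runs standalone
$g_ab_choices = array();

function should_ab($testname, $userid = null)
{
    global $g_ab_choices;
    if (empty($userid))
        $userid = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : 'unknown';
    if (!isset($g_ab_choices[$testname]))
        $g_ab_choices[$testname] = ((crc32($testname.$userid) & 1) == 1);
    return $g_ab_choices[$testname];
}

function get_ab_choices()
{
    global $g_ab_choices;
    return $g_ab_choices;
}

// Hypothetical page: pick one of two alternatives for this user
$userid = 42;
if (should_ab('signup_button_color', $userid))
    $button = '<button class="green">Sign up</button>';
else
    $button = '<button class="blue">Sign up</button>';
echo $button."\n";

// At the END of the page, after every should_ab() call, build the log entry
// with the choices appended as JSON
$logmessage = 'pageview /home '.json_encode(get_ab_choices());

// On a real page this would be something like: custom_log('userevents', $logmessage);
echo $logmessage."\n";
```

Because the choice is derived from the test name and user id rather than stored anywhere, the same user gets the same alternative on every page load without any database.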

If you want to dive deeper, there are lots of strong frameworks out there for split-testing (I particularly like kissmetrics' approach), but even using something as brain-dead as my 32-line module will be a massive leap forward if you're a non-split-tester like I was.

[Update - Doh! I got the random generator wrong, it only returned true about 30% of the time using the md5 test. I've switched it over to crc32 below and in the file]

Download abtesting.php

<?php
// A module to let you do simple A/B split testing.
// By Pete Warden ( http://petewarden.typepad.com ) - freely reusable with no restrictions

// An array to keep track of the choices that have been made, so we can log them
$g_ab_choices = array();

function should_ab($testname, $userid=null) {
    // If no user identifier is supplied, fall back to the client IP address
    if (empty($userid))
        $userid = $_SERVER['REMOTE_ADDR'];
   
    global $g_ab_choices;
    if (isset($g_ab_choices[$testname]))
        return $g_ab_choices[$testname];
       
    $key = $testname.$userid;
    $keycrc = crc32($key);
   
    $result = (($keycrc&1)==1);
   
    $g_ab_choices[$testname] = $result;
   
    return $result;
}

function get_ab_choices()
{
    global $g_ab_choices;
    return $g_ab_choices;
}
?>
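
Given the update above about the uneven split, it's worth sanity-checking any bucketing rule before trusting the results. This is a minimal sketch of one way to do that, assuming sequential integer user ids and a made-up test name. It reuses the same crc32 low-bit rule as should_ab(), but the helper functions are hypothetical and not part of the module.

```php
<?php
// Same bucketing rule as should_ab(): low bit of the crc32 of test name + user id
function ab_bucket($testname, $userid)
{
    return ((crc32($testname.$userid) & 1) == 1);
}

// Count how many of the ids from 1 to $count land in the 'true' bucket
function count_true_buckets($testname, $count)
{
    $truecount = 0;
    for ($userid = 1; $userid <= $count; $userid += 1)
    {
        if (ab_bucket($testname, $userid))
            $truecount += 1;
    }
    return $truecount;
}

$total = 10000;
$truecount = count_true_buckets('frontpage_headline', $total);
printf("%d of %d ids (%.1f%%) got the 'true' variant\n",
    $truecount, $total, 100 * $truecount / $total);
```

If the printed percentage is a long way from 50%, the rule is skewing the test for that kind of id and should be fixed before reading anything into the conversion numbers.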

October 27, 2009

How to log to custom files from PHP

Logcabin
Photo by Old Shoe Woman

I needed a function in PHP that worked like error_log(), but appended to a set of custom files rather than to the standard error_log. I wanted to have an easier way to organize the different types of information, so that important messages weren't buried in an avalanche of less-crucial warnings, but this sort of thing is also great fodder for analytics if you write user events to their own file.

The result is custom_log(). It takes two arguments, a category name that determines which file to write to, and the message you want to log. The message gets written to that file, prefixed with the time and client IP. You can download the code as customlog.zip or it's included below:

<?php

// A module to write out events to a set of log files. Similar to error_log(),
// but with multiple output files.
//
// You'll need to set up a directory that the process running PHP (eg Apache) has
// permission to write to. You'll also need to keep an eye on the size of the log
// files, rotate out old ones once they get too large, etc.
//
// By Pete Warden ( http://petewarden.typepad.com ) - freely reusable with no restrictions

// Edit this to set it to the folder on your server where you want the logs to live
//define('CUSTOM_LOG_ROOT_DIRECTORY', '/private/var/log/apache2/'); // OS X default Apache log directory

define('CUSTOM_LOG_ROOT_DIRECTORY', '/var/log/httpd/'); // Red Hat Linux default Apache log directory

$g_custom_log_categories = array();
$g_custom_log_shutdown_registered = false;

// This function works like error_log(), but takes an extra category argument that
// determines which file the message is appended to.
function custom_log($category, $message)
{
    global $g_custom_log_categories;
   
    // If the file hasn't been opened for appending yet, create a new file handle
    if (!isset($g_custom_log_categories[$category]))
    {
        // Make sure there's no shenanigans with special characters like ../ that
        // could be abused to write outside of the specified directory
        $sanitizedcategory = preg_replace('/[^a-zA-Z0-9]/', '_', $category);
        $filename = CUSTOM_LOG_ROOT_DIRECTORY.$sanitizedcategory;
        $filehandle = fopen($filename, 'a');
        if (empty($filehandle))
        {
            error_log("Failed to open file '$filename' for appending");
            return;
        }

        // To close any open files once the script is done, and so ensure that
        // all the messages are written to disk, register a global shutdown
        // function that fclose()'s any open handles
        global $g_custom_log_shutdown_registered;
        if (!$g_custom_log_shutdown_registered)
        {
            register_shutdown_function('custom_log_on_shutdown');
            $g_custom_log_shutdown_registered = true;
        }
       
        // Urghh, this is required to prevent a spew of warnings when more recent
        // PHP versions are set to strict errors
        if (!ini_get('date.timezone'))
            date_default_timezone_set('UTC');
       
        $g_custom_log_categories[$category] = array('filehandle' => $filehandle);
    }

    // Create the full message and append it to the file
    $categoryinfo = $g_custom_log_categories[$category];   
    $filehandle = $categoryinfo['filehandle'];
   
    $timestring = date('D M j H:i:s Y');
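    // REMOTE_ADDR isn't set when the script is run from the command line
    $ipaddress = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '-';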
    $fullmessage = "[$timestring] [$category] [client $ipaddress] $message\n";
   
    fwrite($filehandle, $fullmessage);
}

// A clean-up function called to make sure all open file handles are closed
function custom_log_on_shutdown()
{
    global $g_custom_log_categories;
    foreach ($g_custom_log_categories as $category => $categoryinfo)
        fclose($categoryinfo['filehandle']);
}

?>
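
Since these files are meant as fodder for analytics, here's a minimal sketch of reading an entry back in. It assumes the line format that custom_log() writes above, and that the category name contains no ']' character. The parse_custom_log_line() helper and the sample line are hypothetical, not part of customlog.zip.

```php
<?php
// Parse one line in the format custom_log() writes:
// "[D M j H:i:s Y] [category] [client ip] message"
// Returns an array of fields, or null if the line doesn't match.
function parse_custom_log_line($line)
{
    $pattern = '/^\[([^\]]*)\] \[([^\]]*)\] \[client ([^\]]*)\] (.*)$/';
    if (!preg_match($pattern, rtrim($line, "\n"), $matches))
        return null;

    return array(
        'time' => $matches[1],
        'category' => $matches[2],
        'ip' => $matches[3],
        'message' => $matches[4],
    );
}

// A made-up entry for illustration
$sample = "[Sun Oct 25 14:03:07 2009] [userevents] [client 192.0.2.10] viewed front page\n";
$entry = parse_custom_log_line($sample);
echo $entry['category'].' from '.$entry['ip'].': '.$entry['message']."\n";
```

From there it's a short step to looping over a file with fgets() and counting events per category or per client.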

October 25, 2009

Balsamiq: So simple, even a programmer can use it

Mock me mercilessly, I deserve it, but I've really been struggling to prototype on paper before I code. Back at Apple there was always a white-board handy and a bunch of colleagues and customer-surrogates I had to collaborate with on any feature, so I did plenty of documentation before doing any serious engineering. As a lone founder, it's seriously tempting to think I have a good enough picture in my head to just go ahead and try it out.

Wrong, wrong, wrong! For one thing I end up involving users way too late in the process, since it takes a whole bunch of coding effort before I can show them something. Even ignoring that, I've never thought things through as completely as I think I have. Just a few minutes trying to sketch out the result I'm trying to achieve will always show me something I'd missed, and that's a lot cheaper than spending hours of programming to get to the same conclusion.

One of my mental blocks to prototyping is that I couldn't find a method I felt comfortable with. I'd tried the Pencil Sketch Firefox plugin, but it just didn't work the way I wanted. OmniGraffle is fantastic for creating beautiful diagrams, but it's painful to build something that looks like a UI sketch out of its primitives. I've fallen back to using pen and paper, but it's really hard to alter and evolve hard copy, and you have to scan it in to share it remotely. Finally I tried out Balsamiq last week, and I'm in love.

I could rhapsodize about its ease of use, but the single best feature is that it looks like a sketch. This visual metaphor is really important, it clearly marks the results out as conceptual designs, not detailed blue-prints. This stops both other people and myself from focusing on nit-picking the look-and-feel, and forces a focus on the big questions about content and placement. I don't spend hours obsessing about aligning elements, because they naturally look a bit wonky, so I'm freed to think about what the overall content should be.

You can give it a try for yourself with the online version, and the full desktop product is $79, though I got it for $40 with a Techstars discount. If you're at all involved in product development, I think you'll end up buying it too.

October 21, 2009

Blogs I'm reading now

Booklist
Photo by MargoLove

Paul Jozefak just posted a list of the startup-related blogs he's reading, and that reminded me that I'd been intending to highlight some of my favorites too. I'm skipping the obvious ones (Brad, Fred, Eric Ries) to focus on lesser-known gems I'd love to see more widely read.

Bill Flagg

Bill's a Boulder entrepreneur with several great companies under his belt, but what really makes him stand out is that he's a boot-strapper. During TechStars he was a great counter-point to the focus on raising money, and he posts some awesome advice on building a company that actually generates cash. How about a billing department that encourages customers to mark down their invoices if they didn't feel like they got their money's worth? It's working for RegOnline.

Rick Segal

I love Rick's blog because of his willingness to risk offending people. I actually got fairly irate at a post he did last year, but I wouldn't have him any other way. What's even more interesting is that he's recently started on a journey from VC to startup founder, so there's been lots of great "Eat your own dogfood" posts, including a mea culpa on ever uttering the words 'lifestyle business' as a VC.

Highway 12 Ventures

Mark and George were very active in TechStars, but I never realized they blogged until Mark's stellar "Don't let the bastards grind you down". Since then I've been working through their archive, and they're chock full of other great posts, even tips from a hostage negotiator!

Jay Parkhill

Talking of negotiations, Jay's latest post on telling who wants to actually do a deal and who's just there to argue is a must-read. He's a lawyer specializing in startups, so there's loads of other great advice like how to cope with the loss of co-founders without sinking the business.

October 19, 2009

Accidental Haiku


I've always been fascinated by haiku, and the launch of Drunken Haiku by a good friend gave me a brain wave. There's a massive number of updates on Twitter, some of them must be unintentional haiku!

A couple of hours later, Accidental Haiku was born. It sits on Twitter looking for messages with the right syllable patterns. It's not always perfect at counting the sounds, and being Twitter there's lots of fluff, but if you watch it update I guarantee some gems.
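
The post doesn't say how the syllables get counted, so here's a minimal sketch of one way to spot a 5-7-5 pattern. It assumes a crude heuristic of counting runs of vowels and dropping a trailing silent 'e'. Both functions are hypothetical and aren't taken from the modified phirehose code, and like the real thing they won't always count the sounds correctly.

```php
<?php
// Rough syllable estimate: count runs of vowels, then drop a trailing silent 'e'.
// This is a heuristic and will miscount plenty of English words.
function count_syllables($word)
{
    $word = preg_replace('/[^a-z]/', '', strtolower($word));
    if ($word === '')
        return 0;

    $count = preg_match_all('/[aeiouy]+/', $word, $matches);
    $endsinsilente = (substr($word, -1) === 'e') && (substr($word, -2) !== 'le');
    if ($endsinsilente && $count > 1)
        $count -= 1;

    return max(1, $count);
}

// True if the words can be split, in order, into lines of 5, 7 and 5 syllables
function is_accidental_haiku($text)
{
    $targets = array(5, 7, 5);
    $line = 0;
    $syllablesinline = 0;
    foreach (preg_split('/\s+/', trim($text)) as $word)
    {
        $syllables = count_syllables($word);
        if ($syllables == 0)
            continue; // skip tokens with no letters, like "/" or "-"
        if ($line >= count($targets))
            return false; // words left over after the third line
        $syllablesinline += $syllables;
        if ($syllablesinline > $targets[$line])
            return false; // a word straddles a line break
        if ($syllablesinline == $targets[$line])
        {
            $line += 1;
            $syllablesinline = 0;
        }
    }
    return ($line == count($targets));
}

var_dump(is_accidental_haiku('An old silent pond / A frog jumps into the pond / Splash! Silence again'));
var_dump(is_accidental_haiku('hello world'));
```

Requiring word boundaries to fall exactly on the 5, 7 and 5 marks is what throws away most messages. Only a small fraction of text with 17 syllables also breaks cleanly into three lines.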

It's all pulled together from open source components and you can download the modified phirehose code here. Now if I could just learn to write good haiku myself...

October 15, 2009

Information wants to be free, even at WalMart

 Walmart
Photo by El Neato

I was reading The Wal-Mart Effect when I came across a passage that summed up exactly how I want to change the world. Sara Lee had a business relationship with Wal-Mart, and as one of the negotiators recalls:

Senior officials were always coming down there [to Bentonville] for meetings, and they always had their sheets of paper bent up so the Wal-Mart person couldn't see them. The idea was, why didn't we just put the sheets of paper on the table?

So they opened up traditionally closed information, and immediately discovered ways of saving money that benefited both companies. Wal-Mart had empty trucks returning from Florida that could transport Sara Lee's stock after it was shipped from South America. Underwear cartons were too large, Wal-Mart wasted time and money splitting them to send the contents to different stores, so Sara Lee shrank the carton size. As the book puts it, all of these efforts eliminated pure waste, the equivalent of turning off a light in an empty room.

I spent years in a corporate environment where I saw hundreds of opportunities to save money and make the world better for everyone, if only people would talk and share information. I was surprised to see I had that in common with Wal-Mart, but it makes sense given their fanatical approach to efficiency. If you're really trying to be productive, it just doesn't pay to be secretive.

Are there downsides to this? One of the biggest hurdles is trust. Knowledge is power, so you're handing over power to people whose interests may not align with yours. Wal-Mart is the 800lb gorilla with a history of using its market power ruthlessly, and one of the strengths of the book is its detail on the negative side of their dominance. I'd argue that this trust argument is usually a cop-out, hiding worries about turf and control. In most cases it's clear that it's not in the other party's best interest to screw you over, and if it is, why are you dealing with them at all? The worst cases I saw were between departments within the same company; often we shared more information with competitors than with the guys down the hall.

Once you're in a business relationship, there's a lot to be gained by putting all the sheets of paper on the table.

October 08, 2009

A lovely little online icon editor

My design skills are non-existent, but I often need functional little buttons or badges. Using Photoshop for that sort of thing is like taking a sledgehammer to a nut, so I was extremely happy when I found iconfu.

It appears to be pure Javascript, which is impressive just as a technical feat, but it's also an extremely usable and surprisingly full-featured tool for building tiny icons. It's got undo, nicely anti-aliased primitives and some handy filters. Even better, it's completely free for up to 16x16 images. It's no Photoshop so don't expect to see layers or freehand, but that's part of what I love about it. It takes me back to the paint programs I'd use in the late 80's, and the hours I spent clicking individual pixels to create a massive 320x240 demo background.

The only drawback is that Internet Explorer is not supported, but if you're doing any web design work I'm sure you'll have an alternative browser installed anyway.

October 02, 2009

Get visual bug reports with SnapABug

Ladybug
Photo by Hamed Saber

One of the most frustrating parts of trying to fix a customer's problem is trying to understand what on earth the problem is. I've spent enormous amounts of time bouncing emails back and forth, or talking on the phone, just to get enough information to start debugging. I've long been a fan of tools like CrossLoop that let you share screens with a remote user, but I'm really excited to see what my fellow Techstars alumni Timzon have come up with.

SnapABug is a small widget you can include in any web page, and it gives users a button they can press to take a screenshot and email it off to your support team along with some notes. Dead simple but incredibly useful! Jerome, Jerome and Tony have done an awesome job of identifying a great market for their technology, I can definitely see this appearing on a lot of sites and becoming a valuable product.

September 29, 2009

A coming privacy freakout?

Freakout
Photo by Thatha

People don't know how much information about them is freely available on the internet. I was reminded of that by this thread on the WebFinger list about a prototype YQL implementation that lets you look up information about any Yahoo user from their email address. I'll quote Kaliya:

I went and tried the page out to see what it exposed about me.

Both for my "public" use around the web a lot yahoo handle and another one
that I have explicitly kept my "real name" not attached to in any public
forums.

My name listed in both accounts was Kaliya however when you expose people's
"profile names" in web finger you might be exposing information people don't
think is public on the web.  Needless to say I went in and immediately
changed my profile name in my more private account.

I just shared this with guy friend who has several yahoo accounts - one of
them for dating.  I said do you have your "regular name" listed in the
"profile name" - he thought he might. It sort of made him cringe that this
was now exposed.

I think you might have a real uproar from users by exposing their profile
names publicly on the web without letting them know you are doing this. It
would be good to send people a note asking telling them this information
will be exposed to ANYONE WHO ASKS before you make it available via
webfinger.

I was thinking about the difference between twitter and almost everyone
else.  Twitter starts at Radically Open and explicitly so - so as a user I
know what bargain I am striking in using the tool.

Everyone else is trying to go from "closed" as the default and move towards
more open and pulling users along is a challenge - it is changing the rules
of the space and it needs to be well thought out or it will back fire badly.

The response from the developers has been 'Silly user with your expectations of privacy! Didn't you know there's been a Yahoo profile page with that information up for years?' That's my instinct too, there's so many wonderful new services we can build with more open profile information, and we'll never get anything done if we spend all our time sitting around worrying about potential problems.

From talking to people outside the bubble though, I do wonder if there will be trouble ahead. They don't know that Facebook puts up a public profile for them by default, with pictures, their location and some of their friends all open to the world. Services like Flickr, Amazon, Google, AIM, Vimeo all provide APIs to look up information about someone based on their email address. Rapleaf claims to have social network information on 375 million people. If you're interested in the information that's available on you, try this example:
http://web.mailana.com/labs/findbyemail/

The worst case is that we plow ahead with what's technologically possible, trigger a moral panic and we end up with restrictive legislation, but even a mild backlash would cause providers to neuter their APIs and remove access to all that lovely data. So what's the answer? I'd love to see something like the apparently defunct attentiontrust.org pledge to help us self-police our use of the data, before someone else comes in to police us!

September 28, 2009

The case against the Founder's Visa

Immigrantbowling
Photo by Vaguely Artistic

In my last post, I talked about why I'm in favor of tweaking the immigration system to provide a Founder's Visa. It's reasonable to ask what the downsides might be, so I thought it might be useful to lay out some possible objections and how they can be managed.

Gaming the system. Will people be able to pretend to be founders to get into the country? As Pat put it, can I just open a restaurant to claim the visa? The safeguard here is money. The proposal is that you need to have over 10% of a company with $250,000 raised in a recent investment. Once the founder's in the US, that investment will flow into the rest of the economy. It's already possible to get a visa if you self-fund a company with a large investment, the main change here is to open the door to raising money from external investors.

So could you figure out a shady deal to fake the investment? Certainly people have been caught using undeclared loans to raise the money needed for the standard E2 visa, but this proposal doesn't make that kind of fraud any easier.

More spaghetti code. I grew to loathe the forest of obscure rules that make up the immigration system. It's truly mind-blowing how hard it is to get a simple answer to any question, there's so many special cases built-in. It's like a piece of code where years of bugs have been patched by adding yet another if statement to a monster function. The system fundamentally doesn't work, whether your goals are to reduce immigration (it has all sorts of obscure loopholes) or provide a fair way to encourage productive immigrants (it's so complex there's no guarantees you'll be able to stay).

I'm a big fan of the Canadian points system, where they give points for attributes they want in immigrants, like college degree, or work experience in an in-demand industry, and accept those with a score above a threshold. The Founder's Visa is yet another patch to the system, adding a little more complexity to the process, rather than the true overhaul it needs.

That said, I don't see any practical way to do a proper fix of the system, so patching it for this special case seems like the least-worst option.

Giving too much power to investors. There's a delicate balance of power between investors and founders, and knowing that the founder's very presence in the country depends on that term-sheet will give investors a lot of leverage during negotiations. A lot of founders are pretty nervous about this aspect, but I'm more sanguine. If you're a first-time founder getting investment, you're likely to be in a weak position anyway, and are to a large extent relying on the investors being sensible in their demands so that they don't weaken your motivation to make the company a success. In theory this tips the scales against you even more, but in practice the investors have their thumb on the tray anyway.

I'd like to finish by highlighting a comment from someone who ended up in prison for two days after he reduced his class schedule while here on an educational visa! My story's actually pretty tame, I know plenty of people who ended up having to leave the country, so I'm grateful mine has a happy ending.

September 26, 2009

My immigration story

Starsandstripes
Photo by Kalwa

I'm a big supporter of the Founder's Visa idea, but to anyone who hasn't been through the US immigration process it may be unclear why it's so important. I loved the way Manu Kumar talked about his experiences, so to contribute to the debate I'm going to give a brief run-down of my 7 year journey through the bureaucracy. A lot of friends and colleagues have been through the process too, so I know my story isn't particularly remarkable, but it may shed some light on why so many talented people give up on moving to the US. It's actually pretty long and boring, but that's kind of the point!

When I was 18, I maxed out my credit cards and took a three-month vacation in Juneau, Alaska, teaching archery at a boy scout camp in return for lodging. I fell in love with the US, there was a freedom here I'd never even imagined at home.

Back in the UK I completed a BS in Computer Science at Manchester University, which I'd chosen because of their awesome history in computers, with teachers from Turing to Steve Furber, designer of the ARM chip. After that I spent five years diving into game programming, and became a specialist in programming console GPUs, starting with the original Playstation, and moving onto the PS2, XBox and Gamecube.

The lure of the US was always in the back of my mind, and once I'd finally paid off my college loans and credit cards, I decided to take up one of the recruiters who kept contacting me, and spend some time working for an American company.

I ended up getting a job at Left Field, based just outside of LA. I accepted their offer in March 2001, but I had to wait until August for the H1B visa application to make it through the INS, not for any errors in the paperwork, that was just how long it took them to look at it.

By the time I got there, only 3 months were left on the project, but I had enough experience I was able to dive in and make a contribution. When the project was over, my manager took me aside and told me he and several other folks were leaving to start their own company. I eagerly followed, and for once the visa transfer process was trivial.

I stuck it out at the new company Kush for a year until the first project was done, but at the point where I was working 90 hour weeks and then got reprimanded for being late on a Sunday, I decided I needed a change. Over the last couple of years I'd developed some open-source image processing plugins to help out with my hobby of providing visuals for clubs and concerts, and I'd recently ported my 40 filters over to After Effects. I was astounded by the response, I'd stumbled into a market where people were charging several thousand dollars for a handful of effects, and I could produce popular ones very easily!

So, I left Kush and set out to build my own company with commercial versions of my effects. Unfortunately I quickly discovered that even with my modest savings and a proven market there was no way for me to set up a company in the US. As luck would have it, Apple approached me on the basis of my open-source technology, offered to buy what I'd created so far and give me a job. The most important part was they not only offered an H1B, they would also sponsor me for a green card.

Predictably the H1B process took longer than expected, and I was stuck back in the UK for 6 months again before I could start at Apple. Once I was there, I had to wait a year before the company would start the sponsoring process. Then, I began the first step, 'labor certification', basically proving my job was so specialized I wasn't shoving an American out. Once all the paperwork was in, I waited and waited, and it took over two years for them to finally look at my application. It was approved, and I was overjoyed, until I dug deeper through the results and realized my lawyer had filed me for an EB3 not an EB2. She'd mistakenly put me down as an unskilled worker, which would mean a decades-long wait! I had to refile under the right designation.

This time it went a lot more quickly, and I got the right certification back, and moved on to the next stage in my green card application. After lots more paperwork, including a medical examination to ensure I wasn't a psychopath or sexual deviant, I was back in another queue. This one took around 18 months, mostly because I got flagged for an FBI background check. I understand the need for checks, but it seemed slightly crazy that they had such a long delay: everyone was already resident in the US, so anybody dangerous had a long time to get up to mischief.

Finally my green card came through in May 2008, five years after I started the process with Apple. The biggest sensation was relief, I could finally make plans knowing I'd be here permanently. I tidied up my work with Apple, and was free to start Mailana two months later in July.

So what's the point of this story? I love Apple dearly, I can't imagine a better large company to work for, but I'm a startup guy at heart and I ended up taking the corporate path for a long time purely because of visa issues. The Founder's Visa would have offered me a chance to create my own company much earlier, and hopefully employ a lot of Americans too! Beyond that, there was a constant sword of Damocles hanging over me all the way until I got my green card. At any point a paperwork SNAFU or employer decision could have kicked me out of the country.

September 25, 2009

How to follow your Apache error logs in a browser

Photo by Hunting Glee

As I work to make the data import process more reliable, one of the patterns that was recommended to me was having your import processes append their results to a log file, and then have a database load process that watches the file and updates the database as it spots new content.
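
A minimal sketch of the producer half of that pattern, assuming each import result can be written as one line. The function name `append_import_result` and the JSON-per-line format are hypothetical choices for illustration, not part of the recommendation itself. Passing `FILE_APPEND | LOCK_EX` asks PHP to hold an exclusive lock while appending, which should keep concurrent importers from interleaving partial lines.

```php
<?php
// Hypothetical helper: append one import result to a shared log file as a
// single JSON-encoded line. The name and the line format are assumptions.
function append_import_result($logfile, $record)
{
    // json_encode() escapes embedded newlines, so each record stays on one line
    $line = json_encode($record)."\n";

    // FILE_APPEND adds to the end of the file; LOCK_EX takes an exclusive
    // lock for the duration of the write
    $written = file_put_contents($logfile, $line, FILE_APPEND | LOCK_EX);

    // file_put_contents() returns the byte count on success, false on failure
    return ($written === strlen($line));
}
```

The loader process can then watch the same file and treat every complete line as one record.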

This potentially solves a lot of problems for me, but it's not at all obvious how to implement the file tailing functionality in PHP, so I implemented a standalone example to test some approaches. Since I've always wished there was a better way than ssh-ing into my servers to view a live version of the Apache error logs, I made the example a long-running PHP process that outputs updates from the logs to your browser. You can see it running here:

http://web.mailana.com/labs/logviewer/

The code is downloadable as logviewer.zip, or it's included below. Be warned, before you use this on a production server make sure it's password-protected, since sensitive information like passwords or credit-card numbers might leak into your error messages inadvertently!

If you're looking for a Perl version, you might want to look at the File::Tail module.

<?php

//! This function never exits!
//! It sits on the specified file, calling the process function for each new line of
//! input as it's appended by some other process. The cookie value lets you pass in
//! an opaque object to be used by the process function callback.
function tail_and_process_file($filename, $processfunction, $cookie=null)
{   
    $retrycount = 0;
    while ($retrycount<5)
    {
        if (file_exists($filename))
            $retrycount = 0;
   
        $filehandle = fopen($filename, 'r');
        if (!$filehandle && file_exists($filename))
            die("tail_and_process_file($filename, $processfunction, $cookie): The input file exists but I couldn't access it");
       
        $lastaccesstime = 0;
        $lastfilesize = 0;
        $lastfileposition = 0;
       
        while (file_exists($filename))
        {
            $currentaccesstime = fileatime($filename);
            if ($currentaccesstime!=$lastaccesstime)
            {
                $currentfilesize = filesize($filename);
                if ($currentfilesize<$lastfilesize)
                {
                    // The file shrank, so it was probably rotated or truncated: reopen it
                    fclose($filehandle);
                    $filehandle = fopen($filename, 'r');
                    $lastfileposition = 0;
                }
                       
                fseek($filehandle, $lastfileposition);
                while (!feof($filehandle))
                {
                    $currentline = fgets($filehandle);
                    if ($currentline!='')
                        $processfunction($currentline, $cookie);
                }
               
                $lastaccesstime = $currentaccesstime;
                $lastfilesize = $currentfilesize;
                $lastfileposition = ftell($filehandle);
            }
           
            // Without this, the results of the fileatime() may be cached
            clearstatcache();
            sleep(1);
        }
       
        // The file no longer exists, so wait a progressively longer interval
        // and try it again. After too many retries, die with an error
        $retrycount += 1;
        sleep($retrycount*15);
    }
   
    die("tail_and_process_file($filename, $processfunction, $cookie): The input file was not present after multiple retries");
}

?>
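
The listing above only covers the tailing function, so here is a rough sketch of how a callback for the browser view might look. The names `logviewer_format_line` and `logviewer_print_line` and the log path are assumptions for illustration; the downloadable logviewer.zip may do this differently.

```php
<?php
// Hypothetical helper: turn one raw log line into HTML that is safe to show.
function logviewer_format_line($line)
{
    // Strip the trailing newline and escape any markup in the error message
    return htmlspecialchars(rtrim($line, "\r\n"))."<br/>\n";
}

// Callback with the ($line, $cookie) signature that tail_and_process_file()
// passes to its process function. It prints the line and pushes it out to the
// browser immediately rather than waiting for the script to finish.
function logviewer_print_line($line, $cookie)
{
    print logviewer_format_line($line);
    if (ob_get_level() > 0)
        ob_flush();
    flush();
}

// Usage (this call never returns). The path is an assumption and varies by server:
// set_time_limit(0);
// tail_and_process_file('/var/log/apache2/error.log', 'logviewer_print_line');
```

Escaping each line matters here because error messages can contain arbitrary text, including markup.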

September 24, 2009

When Sears was a startup

The rain falls alike on the Just and Unjust. All should be supplied with mackintoshes

I'm here in Seattle for a quick family visit (though I couldn't resist visiting a couple of the local startups) and I found myself flicking through a copy of the 1897 Sears catalog left at our B&B. I was astonished at the energy that leapt off every page, these guys were building their own startup! Here's what seemed so familiar:

On a mission with a new business model. They can't stop talking about how they're cutting out the middle men who've been gouging their customers, with pages devoted to messianic rants against the monopolies trying to put them out of business. They contrast their order fulfillment process (dozens of clerks dealing with tens of thousands of orders a day) with the inefficient country stores full of assistants being paid to idly wait for customers, explaining how they can offer such low prices despite the shipping.

The customers are their evangelists. Want to save on shipping? Here's some examples of how you can get $10 of goods for $6 by persuading your neighbors to order along with you.

Information wants to be free. Want to know more than you ever believed possible about all the differences in pocket-watch mechanisms? Here's several pages of detail that your local jeweler will never tell you, but we want you to understand what you're buying so you'll feel comfortable buying sight-unseen.

Trust in technology. The very notion of sending money to some company a thousand miles away and hoping they'll send you decent goods in return was a leap of faith even bigger than typing your credit card into a web site. Instead of SSL certificates, they have an engraving of their building and letters from their bankers.

Selling a dream. They knew people weren't just looking to buy something when they picked up the catalog, so they offer a hope of a better life in their descriptions and illustrations. People didn't just want barbed wire, they wanted that perfect farm, and Sears used that to sell.

They were completely nuts:

[Image: dog-powered butter churn from the catalog]

Like most innovators, they weren't afraid to screw up. Happily I'm pretty certain this dog-powered butter-churner never hit the mainstream, but they had thousands of new product ideas they were constantly trying out, along with really zany ads.

[Image: catalog ad for grass hunting suits]

Sure, online ad placement can be weird, but a real human being decided people interested in grass suits for hunting wild geese will also like a nice pram!

September 22, 2009

The BOSS, Bing and Google search APIs from a legal perspective

Photo by Steve Punter

One of the most promising features of the cloud is the ability to leverage other companies' APIs to power your business. More than just saving money, it lets you do things that would be impossible for a startup to build in-house, like a search index for the whole web.

Of course there's always a downside. As Todd Vernon points out in his latest blog post, you're trusting a third-party with your company's future. This is an especial problem with Google since they tend to automate everything, so it can be near impossible to reach a real human being to fix any problems if they do decide to cut off your access. Jud Valeski spotted a classic example of this when some API providers 86-ed IP addresses on App Engine and EC2.

With those fears in mind, it was great to read Jay Parkhill's analysis of the BOSS, Bing and Google Search APIs. There's still plenty of ambiguity left in the agreements, and I've no doubt that the providers could arbitrarily cut me off despite anything in the terms, but it's a great insight into what the providers care about. It should make it a lot easier to skirt any uses that would hit hot-button issues for them, so I'm very grateful to Jay for taking the time to research this.

September 21, 2009

Why I switched my search API from Bing to Google

Image by Mark Knol

Going through the Techstars program, a few of my mentors worried about how much I was revealing through my blog. Fundamentally it isn't a calculation, I love what I'm doing and I love talking about it, but I just ran into yet another situation where being open paid off.

[Image: tweet from Joe Heitzeberg]

Joe Heitzeberg dropped me that note in reply to my last blog post on switching to Bing from BOSS, and it was gold-dust. I was aware of the Ajax API from a couple of years ago, but when I last looked into Google's offerings they were extremely restrictive about what you could do with the interface. Checking out their documentation I saw they talk about more than client-side apps, they offer a REST interface and even have some PHP examples! The terms-of-service don't prohibit non-client use, though they do specify that your application must be freely available to users.

After a bit of experimentation I was able to get it up and running, and it made me extremely happy. In the test case I'm running, Google finds 44 Facebook profile pages for Susan Fogg, Bing finds 6 and BOSS only finds 1. That makes a massive difference to the usefulness of the friend suggestion part of Mailana.

There are a few wrinkles to the API. By default it only returns 4 results per call, and I had to add the &rsz=large to get 8. Since I'm getting 50 at a time from the other providers, I then had to loop through adding &start=0, &start=8 , etc to pull in multiple pages. Google also don't include possible duplicate results by default, but adding &filter=0 fixed that.

Updated code included inline below, or you can download the complete source here

<?php

// You'll need to get your own API keys for these services. See
// http://developer.yahoo.com/wsregapp/
// http://www.bing.com/developers/createapp.aspx
// http://code.google.com/apis/ajaxsearch/signup.html
define('BING_API_KEY', '');
define('YAHOO_API_KEY', '');
define('GOOGLE_API_KEY', '');

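function pete_curl_get($url, $params)
{
    // Build the query string, joining any array values with commas
    $post_params = array();
    foreach ($params as $key => $val) {
        if (is_array($val)) $val = implode(',', $val);
        $post_params[] = $key.'='.urlencode($val);
    }
    $post_string = implode('&', $post_params);

    // Only append a query string when there are parameters, so callers that
    // pass in a complete URL don't get a stray '?' stuck onto the last value
    $fullurl = $url;
    if ($post_string!='')
        $fullurl .= '?'.$post_string;

    $ch = curl_init();
    // Certificate checks are switched off here, which is only OK for testing
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
    curl_setopt($ch, CURLOPT_URL, $fullurl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mailana (curl)');
    $result = curl_exec($ch);
    curl_close($ch);

    return $result;
}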

function perform_boss_web_search($termstring)
{
    $searchurl = 'http://boss.yahooapis.com/ysearch/web/v1/';
    $searchurl .= urlencode($termstring);
    $searchparams = array(
        'appid' => YAHOO_API_KEY,
        'format' => 'json',
        'count' => '50',
    );

    $response = pete_curl_get($searchurl, $searchparams);

    $responseobject = json_decode($response, true);
    error_log(print_r($responseobject, true));

    if ($responseobject['ysearchresponse']['totalhits']==0)
        return array();

    $allresponseresults = $responseobject['ysearchresponse']['resultset_web'];

    $result = array();
    foreach ($allresponseresults as $responseresult)
    {
        $result[] = array(
            'url' => $responseresult['url'],
            'title' => $responseresult['title'],
            'abstract' => $responseresult['abstract'],
        );
    }

    return $result;
}

function perform_bing_web_search($termstring)
{
    $searchurl = 'http://api.bing.net/json.aspx?';
    $searchurl .= 'AppId='.BING_API_KEY;
    $searchurl .= '&Query='.urlencode($termstring);
    $searchurl .= '&Sources=Web';
    $searchurl .= '&Web.Count=50';
    $searchurl .= '&Web.Offset=0';
    $searchurl .= '&Web.Options=DisableHostCollapsing+DisableQueryAlterations';
    $searchurl .= '&JsonType=raw';

    $response = pete_curl_get($searchurl, array());

    $responseobject = json_decode($response, true);
    if ($responseobject['SearchResponse']['Web']['Total']==0)
        return array();

    $allresponseresults = $responseobject['SearchResponse']['Web']['Results'];

    $result = array();
    foreach ($allresponseresults as $responseresult)
    {
        $result[] = array(
            'url' => $responseresult['Url'],
            'title' => $responseresult['Title'],
            'abstract' => $responseresult['Description'],
        );
    }

    return $result;
}

function perform_google_web_search($termstring)
{
    $start = 0;
    $result = array();
    // Each call returns at most 8 results, so page through to get roughly 50
    while ($start<50)
    {
        $searchurl = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0';
        $searchurl .= '&key='.GOOGLE_API_KEY;
        $searchurl .= '&start='.$start;
        $searchurl .= '&rsz=large';
        $searchurl .= '&filter=0';
        $searchurl .= '&q='.urlencode($termstring);

        $response = pete_curl_get($searchurl, array());

        $responseobject = json_decode($response, true);
        // Stop when a page comes back empty or the response couldn't be parsed
        if (empty($responseobject['responseData']['results']))
            break;

        $allresponseresults = $responseobject['responseData']['results'];
        foreach ($allresponseresults as $responseresult)
        {
            $result[] = array(
                'url' => $responseresult['url'],
                'title' => $responseresult['title'],
                'abstract' => $responseresult['content'],
            );
        }

        $start += 8;
    }

    return $result;
}

// PHP has already URL-decoded the request parameters, so use the value as-is
if (isset($_REQUEST['q'])) {
    $termstring = $_REQUEST['q'];
} else {
    $termstring = '';
}

?>

<html>
<head>
<title>Test page for Google, BOSS and Bing search apis</title>
</head>
<body>
<div style="padding:20px;">
<center>
<form method="GET" action="searchexample.php">
Search terms: <input type="text" size="40" name="q" value='<?php echo htmlspecialchars($termstring, ENT_QUOTES); ?>'/>
</form>
</center>
</div>

<?php

if ($termstring!='') {

    $googleresults = perform_google_web_search($termstring);
    $bingresults = perform_bing_web_search($termstring);
    $bossresults = perform_boss_web_search($termstring);

    print '<br/><br/><h2>Google search results ('.count($googleresults).')</h2><br/>';
    foreach ($googleresults as $result) {
        print '<a href="'.$result['url'].'">'.$result['title'].'</a><br/>';
        print '<span style="font-size:80%">'.$result['abstract'].'</span><br/><hr/>';
    }

    print '<br/><br/><h2>Bing search results ('.count($bingresults).')</h2><br/>';
    foreach ($bingresults as $result) {
        print '<a href="'.$result['url'].'">'.$result['title'].'</a><br/>';
        print '<span style="font-size:80%">'.$result['abstract'].'</span><br/><hr/>';
    }

    print '<br/><br/><h2>BOSS search results ('.count($bossresults).')</h2><br/>';
    foreach ($bossresults as $result) {
        print '<a href="'.$result['url'].'">'.$result['title'].'</a><br/>';
        print '<span style="font-size:80%">'.$result['abstract'].'</span><br/><hr/>';
    }

}

?>

September 09, 2009

Why I switched my search API from BOSS to Bing

I'm a massive fan of Yahoo's developer tools. I think they're badly underrated by geekdom, and I'm still heavily reliant on their geo-coding services like Placemaker. It makes me pretty sad to admit I've recently switched from Yahoo BOSS to Bing's search API, so I thought I'd share my reasons, together with some PHP sample code.

In a nutshell, BOSS wasn't finding enough results for the sort of work I'm doing. Here's an example search, looking for people called Susan Fogg with public Facebook profiles:

http://www.bing.com/search?q=site%3Awww.facebook.com%2Fpeople+intitle%3A%22Susan+Fogg%22
6 results

http://search.yahoo.com/search?p=site%3Awww.facebook.com%2Fpeople+intitle%3A%22Susan+Fogg%22
1 result

http://www.google.com/search?q=site%3Awww.facebook.com%2Fpeople+intitle%3A%22Susan+Fogg%22&filter=0
15 results

This is not a scientific survey by any means, but Bing seems to index a lot more of the obscure pages on social networks than Yahoo. If only Google offered an API, they would be even better, but switching to Bing still offers a big improvement for my application.
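
As a small sketch of how the comparison queries above are put together: the helper name `build_profile_query` is hypothetical, but the `site:` and `intitle:` operators and the encoded form match the example URLs. Running the query through `urlencode()` gives the string that goes after `q=` (or `p=` for Yahoo).

```php
<?php
// Hypothetical helper: build a search-operator query that looks for public
// profile pages whose title contains the given name.
function build_profile_query($fullname, $site = 'www.facebook.com/people')
{
    return 'site:'.$site.' intitle:"'.$fullname.'"';
}

$query = build_profile_query('Susan Fogg');

// urlencode() turns ':' into %3A, '/' into %2F, '"' into %22 and spaces into '+'
$bingurl = 'http://www.bing.com/search?q='.urlencode($query);
```

The same `$query` string can be passed unchanged to the search functions in the code below, since they do their own URL encoding.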

I was nervous that Bing would be crippled by usage terms, but luckily they are effectively unrestricted and can be used for non-user-facing applications like mine.

Here's the code I'm using, as a download or included inline below:

<?php

// You'll need to get your own API keys for these services. See
// http://developer.yahoo.com/wsregapp/
// http://www.bing.com/developers/createapp.aspx
define('BING_API_KEY', '');
define('YAHOO_API_KEY', '');

function pete_curl_get($url, $params)
{
    $post_params = array();
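    foreach ($params as $key => $val) {
        if (is_array($val)) $val = implode(',', $val);
        $post_params[] = $key.'='.urlencode($val);
    }
    $post_string = implode('&', $post_params);

    // Only append a query string when there are parameters, so callers that
    // pass in a complete URL don't get a stray '?' stuck onto the last value
    $fullurl = $url;
    if ($post_string!='')
        $fullurl .= '?'.$post_string;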

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
    curl_setopt($ch, CURLOPT_URL, $fullurl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mailana (curl)');
    $result = curl_exec($ch);
    curl_close($ch);

    return $result;
}

function perform_boss_web_search($terms)
{
    $searchurl = 'http://boss.yahooapis.com/ysearch/web/v1/';
    $searchurl .= urlencode(implode(' ', $terms));
    $searchparams = array(
        'appid' => YAHOO_API_KEY,
        'format' => 'json',
        'count' => '50',
    );

    $response = pete_curl_get($searchurl, $searchparams);
   
    $responseobject = json_decode($response, true);
   
    if ($responseobject['ysearchresponse']['totalhits']==0)
        return array();
   
    $allresponseresults = $responseobject['ysearchresponse']['resultset_web'];

    $result = array();
    foreach ($allresponseresults as $responseresult)
    {
        $result[] = array(
            'url' => $responseresult['url'],
            'title' => $responseresult['title'],
            'abstract' => $responseresult['abstract'],
        );
    }

    return $result;
}

function perform_bing_web_search($terms)
{
    $searchurl = 'http://api.bing.net/json.aspx?';
    $searchurl .= 'AppId='.BING_API_KEY;
    $searchurl .= '&Query='.urlencode(implode(' ', $terms));
    $searchurl .= '&Sources=Web';
    $searchurl .= '&Web.Count=50';
    $searchurl .= '&Web.Offset=0';
    $searchurl .= '&Web.Options=DisableHostCollapsing+DisableQueryAlterations';
    $searchurl .= '&JsonType=raw';

    $response = pete_curl_get($searchurl, array());
   
    $responseobject = json_decode($response, true);
    if ($responseobject['SearchResponse']['Web']['Total']==0)
        return array();
   
    $allresponseresults = $responseobject['SearchResponse']['Web']['Results'];

    $result = array();
    foreach ($allresponseresults as $responseresult)
    {
        $result[] = array(
            'url' => $responseresult['Url'],
            'title' => $responseresult['Title'],
            'abstract' => $responseresult['Description'],
        );
    }

    return $result;
}

if (isset($_REQUEST['q'])) {
    // PHP has already URL-decoded the request parameters, so use the value as-is
    $terms = explode(' ', $_REQUEST['q']);
} else {
    $terms = array();
}

$termstring = implode(' ', $terms);
?>
<html>
<head>
<title>Test page for BOSS and Bing search apis</title>
</head>
<body>
<div style="padding:20px;">
<center>
<form method="GET" action="index.php">
Search terms: <input type="text" size="40" name="q" value="<?php echo htmlspecialchars($termstring, ENT_QUOTES); ?>"/>
</form>
</center>
</div>
<?php
if (count($terms)>0) {

    $bingresults = perform_bing_web_search($terms);
    $bossresults = perform_boss_web_search($terms);

    print '<br/><br/><h2>Bing search results ('.count($bingresults).')</h2><br/>';
    foreach ($bingresults as $result) {
        print '<a href="'.$result['url'].'">'.$result['title'].'</a><br/>';
        print '<span style="font-size:80%">'.$result['abstract'].'</span><br/><hr/>';
    }

    print '<br/><br/><h2>BOSS search results ('.count($bossresults).')</h2><br/>';
    foreach ($bossresults as $result) {
        print '<a href="'.$result['url'].'">'.$result['title'].'</a><br/>';
        print '<span style="font-size:80%">'.$result['abstract'].'</span><br/><hr/>';
    }

}

?>

September 08, 2009

Net Promoter Score (NPS) Example Code

Photo by Jjjohn

I first ran across the Net Promoter Score through a post by Eric Ries and it seems like a simple but effective measure of how happy your customers are. Its beauty is how basic it is, which both makes it easy to interpret and straightforward to gather without annoying your customers.

Unfortunately there don't seem to be any off-the-shelf solutions to help gather the information. In the past I've created surveys through companies like SurveyMonkey, but that's both a pretty intrusive experience and they don't give you any way to calculate an NPS without downloading the raw data into Excel and doing it yourself!

What I wanted was a way to survey my customers from within the site, without sending them to an external page or another window, store the results on my own server and then have a simple way of viewing reports on the data over time. Since there was nothing else out there, I wrote a simple Javascript/PHP/MySQL module to handle these requirements, and since I'm sure there are other people who could use something similar and I hate seeing wheel-reinvention, I've released it under a BSD license.

It works by randomly bringing up an in-page popup, asking the user whether they'd recommend the service to a friend, and then requesting any other comments. The data gets passed back to the server and stored in the database, where you can then get a very basic HTML report, or pull the data as JSON to pass into your metrics pipeline, for things like daily email reports.
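
For anyone unfamiliar with the arithmetic, here is a minimal sketch of the standard NPS calculation, assuming the answers are stored as 0 to 10 ratings: the percentage of promoters (9 or 10) minus the percentage of detractors (0 through 6). The function name `calculate_nps` is hypothetical, and the npspopup module itself may compute its report differently.

```php
<?php
// Sketch: compute a Net Promoter Score from an array of 0-10 ratings.
// Promoters score 9 or 10 and detractors score 0 through 6; ratings of
// 7 or 8 are "passives" that only count toward the total.
function calculate_nps($scores)
{
    $total = count($scores);
    if ($total == 0)
        return null; // No responses yet, so there is no score to report

    $promoters = 0;
    $detractors = 0;
    foreach ($scores as $score) {
        if ($score >= 9)
            $promoters += 1;
        else if ($score <= 6)
            $detractors += 1;
    }

    // The result ranges from -100 (all detractors) to +100 (all promoters)
    return 100.0 * ($promoters - $detractors) / $total;
}
```

For example, three promoters and one detractor out of four responses give (3 - 1) / 4, or a score of 50.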

You can download the code here:
http://code.google.com/p/npspopup/

If you want to try it in action, here's the demo page:
http://web.mailana.com/labs/npspopup/index.php

I'm testing it on my own site but the code is still pretty pre-alpha, so don't be shy with bug-fixes and other modifications.

August 23, 2009
