
No security is perfect, but there are a number of important restrictions built into MashProxy and SearchMash.
The security in MashProxy is based around the Java applet being digitally signed. This signature confirms that the code has not been tampered with and that it comes from me. I'm using a certificate from Verisign, and to obtain it I had to give solid proof of my identity.
This proof of identity means that anyone who did write malicious code would be easy to track down. It reduces the decision 'Do I want to run this code on my machine?' to 'Do I trust the author of the code?'.
I'm an established and reputable open source programmer, as a search on my name will verify, and I've been distributing widely used executable code (such as FreeFrame plugins) for many years with no security problems. This should give you some assurance that my intentions are good.
The other question is whether I've done a competent job of securing my implementation. I believe I have, and to back that up I'll cover the details of what I've done below. The source code is also freely available through the SourceForge project for anyone who wants to examine it for themselves.
The basic foundation of MashProxy is that signed Java applets running within a web browser are able to fetch pages from anywhere on the web, and aren't restricted by the same-origin policy that limits most of the ways a page can load content from other sites.
My initial attempt at a search mash ran entirely within a signed applet, but I discovered that support for parsing and rendering HTML was too limited in Java, and I wasn't happy with what I could achieve.
Taking what I'd learnt, I looked into other possible ways of implementing the functionality I wanted. JavaScript fitted the bill, since it has great support for page display and parsing, is well supported and documented, and works across browsers. The one thing it lacked was the ability to load pages from other domains.
However, it was able to call into a signed applet, and the applet could then load the pages from any domain. So, I took my original code for the full applet, and reduced it to just a single public function to load a web page. I was then able to call it from JavaScript and build SearchMash.
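As an illustration of that single-function design, here's a minimal sketch in Java. The class name PageFetcher and its loadPage method are hypothetical, not the names in the MashProxy source, and it's a plain class rather than a JApplet so it can run outside a browser. In the applet, a public method along these lines is what the page's JavaScript would call; returning null here stands in for a page being reported as missing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of a page loader that exposes a single public method.
 * Not the actual MashProxy code.
 */
public class PageFetcher {

    /** Returns the page body, or null if the page could not be loaded ("missing"). */
    public String loadPage(String url) {
        try {
            URLConnection conn = URI.create(url).toURL().openConnection();
            if (!(conn instanceof HttpURLConnection)) {
                return null; // only HTTP connections are handled
            }
            HttpURLConnection http = (HttpURLConnection) conn;
            http.setConnectTimeout(10_000);
            http.setReadTimeout(10_000);
            try {
                if (http.getResponseCode() != HttpURLConnection.HTTP_OK) {
                    return null; // treat any non-200 response as missing
                }
                try (InputStream in = http.getInputStream()) {
                    // Assumes UTF-8; a fuller version would read the charset from Content-Type.
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            } finally {
                http.disconnect();
            }
        } catch (IOException | IllegalArgumentException e) {
            return null; // malformed URL or network failure
        }
    }
}
```

Keeping the surface to one method means there is only one entry point to audit, which is what makes the restrictions described next practical to enforce.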
Now, since the applet doesn't know which JavaScript is calling it, this opens up a lot of possibilities for abuse. I set out to engineer some restrictions into the applet to prevent malicious usage:
Applet Safeguards
The major safeguard is that I only allow the applet to run from pages on petewarden.com or mashproxy.com, sites I control. This ensures that the applet can't be used with my signature on other people's sites. I also allow the applet to run from pages on the local machine (checking for file: as the protocol), on the assumption that local files are trusted. This means third parties can still develop with my signed applet using local files, but to deploy they'll need to get their own certificate and rebuild the applet with their site added to the whitelist of trusted domains.
To check the page I'm running on, I call JApplet's getDocumentBase() function, which returns the location of the page that invokes the applet. Checking the location of the applet itself (what getCodeBase() returns) wouldn't be enough, since an untrusted third-party page could still embed the applet from my site and call it.
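Here's a rough sketch of what that whitelist check could look like. The class and method names are hypothetical; the two host names and the file: exception come from the description above, while accepting subdomains and both http and https are my assumptions. Inside the applet the argument would come from getDocumentBase().toURI().

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of the document-base whitelist check; not the MashProxy source. */
public final class DocumentBaseCheck {

    private static final List<String> TRUSTED_HOSTS = List.of("petewarden.com", "mashproxy.com");

    private DocumentBaseCheck() {}

    /** True if the page embedding the applet is a local file or is served from a trusted host. */
    public static boolean isTrusted(URI documentBase) {
        String scheme = documentBase.getScheme();
        if (scheme == null) {
            return false;
        }
        scheme = scheme.toLowerCase(Locale.ROOT);
        if (scheme.equals("file")) {
            return true; // local files are assumed to be trusted
        }
        if (!scheme.equals("http") && !scheme.equals("https")) {
            return false; // assumption: only web pages over http(s) are considered
        }
        String host = documentBase.getHost();
        if (host == null) {
            return false;
        }
        host = host.toLowerCase(Locale.ROOT);
        for (String trusted : TRUSTED_HOSTS) {
            // Exact match or a real subdomain. A bare endsWith("mashproxy.com")
            // would also accept a look-alike such as "evilmashproxy.com".
            if (host.equals(trusted) || host.endsWith("." + trusted)) {
                return true;
            }
        }
        return false;
    }
}
```

The leading dot in the subdomain test is the detail that matters: suffix matching on the raw host string is a common way for whitelists like this to go wrong.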
I also make sure that all requests begin with "http://" (even though the HttpURLConnection code that I call in Java shouldn't allow anything else), so that no other protocols can be invoked.
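A minimal sketch of that request check follows. The class name is hypothetical. The literal "http://" prefix test is the check described above; the extra step of parsing the URL and requiring a host is my own addition, included to show one way of tightening it.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch of the "http:// only" request check. */
public final class RequestUrlCheck {

    private RequestUrlCheck() {}

    public static boolean isAllowedRequest(String url) {
        // The check from the text: a literal, case-sensitive prefix test.
        // This alone rejects file:, ftp:, javascript: and any other scheme.
        if (url == null || !url.startsWith("http://")) {
            return false;
        }
        // Extra step (an assumption, not from the text): the URL must parse
        // and name a host, so inputs like "http:///path" are rejected too.
        try {
            return new URI(url).getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

Checking the string before it ever reaches the networking code means the safeguard doesn't depend on how HttpURLConnection happens to behave.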
The one thing I wasn't able to do was stop cookies from being passed automatically. I would prefer to always work without cookies, to avoid the possibility of accessing personal data, but the browser seems to add them automatically to all requests made through HttpURLConnection.
SearchMash JavaScript
These are the measures I've taken to ensure the applet can't be used maliciously by someone else. I'll now cover what the JavaScript that implements SearchMash actually does, step by step, to provide evidence it won't leave you vulnerable.
When the main index.html is loaded, it contains three frames: one for the results, one for the preview, and a small one to hold the applet. The small frame is necessary because some browsers won't run invisible applets, which seems sensible.
The onload function calls code that Internet Explorer needs in order to run the applet automatically, because of patent issues. Then the applet is loaded and calls back to SB_NotifyAppletLoaded(), which triggers the first fetch of the Google search page.
When a search page is loaded, I add JavaScript hooks to the page links so the user can navigate through pages without leaving the mashup. I also add mouse-over hooks to activate the preview frame. I then parse all the links in the page and, for every one that appears to be external to Google, ask the applet to fetch it.
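SearchMash does this link scan in JavaScript. To keep the examples in one language, here's the same filtering idea sketched in Java: pull out the href values and keep the absolute http links whose host isn't google.com or one of its subdomains. The class name, the regex, and the exact "external" rule are all assumptions for illustration; the real heuristic may differ (for example, it would need to handle country domains like google.co.uk).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: pick out links on a results page that point away from Google. */
public final class ExternalLinks {
    // Naive matcher for double-quoted href values; single-quoted or unquoted
    // values are missed, and HTML entities such as &amp; are not decoded.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    private ExternalLinks() {}

    /** Returns each external http link once, in page order. */
    public static List<String> find(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String link = m.group(1);
            if (isExternal(link) && !result.contains(link)) {
                result.add(link);
            }
        }
        return result;
    }

    /** True for absolute http links whose host is not google.com or a subdomain of it. */
    static boolean isExternal(String link) {
        try {
            URI uri = new URI(link);
            String host = uri.getHost();
            // Relative links (Google's own navigation), javascript:, mailto: etc. are skipped.
            if (host == null || !"http".equalsIgnoreCase(uri.getScheme())) {
                return false;
            }
            host = host.toLowerCase(Locale.ROOT);
            return !(host.equals("google.com") || host.endsWith(".google.com"));
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

Only keeping http links here lines up with the applet's own "http://" restriction, so nothing is requested that the applet would refuse anyway.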
The applet calls back to JavaScript when a page is loaded or determined to be missing. If it's an external page, the current search page is searched for the link, the status text is updated, and the contents are stored for later use in the preview frame. If it's a search results page, then the link hooks are added, and the search frame is updated.
At no point in the code is any information from the fetched pages passed back to my site, or any other external site. Those pages never leave your local machine, which ensures that your information is kept private.
Remaining Vulnerabilities
Being able to read from arbitrary domains opens up some malicious possibilities:
- Reading from web pages using the browser's cookies to access personal information
- Reading from web pages that are within a private intranet
- Passing any information obtained back to the attacker's website by encoding it in a URL
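One possible mitigation for the intranet case is to refuse hosts that resolve to loopback, link-local or private (RFC 1918) addresses. This isn't something MashProxy is described as doing; it's a hypothetical sketch using the standard InetAddress checks. Note that it doesn't cover IPv6 unique-local addresses, intranet servers on public IPs, or a name that resolves differently at check time and connect time.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical sketch: refuse requests to hosts on private or local addresses. */
public final class IntranetGuard {

    private IntranetGuard() {}

    /** True for loopback, link-local (169.254/16), site-local (10/8, 172.16/12, 192.168/16) or wildcard addresses. */
    public static boolean isPrivate(InetAddress addr) {
        return addr.isLoopbackAddress()
                || addr.isLinkLocalAddress()
                || addr.isSiteLocalAddress()
                || addr.isAnyLocalAddress();
    }

    /** Resolves the host and returns false if any of its addresses is private or it can't be resolved. */
    public static boolean isPublicHost(String host) {
        try {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                if (isPrivate(addr)) {
                    return false;
                }
            }
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }
}
```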
Because the HTML from the remote sites that SearchMash fetches is loaded into frames that have access to the applet, any scripts inside the HTML loaded from external sites could use the applet's functions.
I've added some checks to try to remove scripts from loaded pages, but it's notoriously tough to strip out every script (see the Samy worm that exploited MySpace).
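To show why this is hard, here's a deliberately naive script stripper in Java. It's a hypothetical sketch, not the checks SearchMash actually uses: it removes script blocks (repeating until nothing changes, so a split tag can't reassemble), event-handler attributes, and javascript: URLs. Plenty of tricks still get past patterns like these, which is exactly the point made above.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical, deliberately naive script stripper. It removes only the
 * obvious cases and is not a safe HTML sanitizer.
 */
public final class ScriptStripper {
    private static final Pattern SCRIPT_BLOCK = Pattern.compile(
            "<script\\b[^>]*>.*?</script\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Event-handler attributes such as onclick="..." or onload=...
    private static final Pattern EVENT_ATTR = Pattern.compile(
            "\\s+on[a-z]+\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s>]+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern JS_URL = Pattern.compile("javascript\\s*:", Pattern.CASE_INSENSITIVE);

    private ScriptStripper() {}

    public static String strip(String html) {
        String out = html;
        String previous;
        // Repeat until nothing changes, so "<scr<script></script>ipt>" can't
        // reassemble into a working tag after a single pass.
        do {
            previous = out;
            out = SCRIPT_BLOCK.matcher(out).replaceAll("");
        } while (!out.equals(previous));
        out = EVENT_ATTR.matcher(out).replaceAll("");
        // Neutralise javascript: URLs (crudely, anywhere in the text).
        return JS_URL.matcher(out).replaceAll("blocked:");
    }
}
```

An unclosed script tag, an obfuscated scheme like "java&#115;cript:", or markup a browser parses differently from a regex would all slip through, so a filter like this can only reduce the risk.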
Something that would limit the threat would be removing access to cookies. I'm looking into using lower-level functions in Java, or a third-party library that would let me disable them.
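As a sketch of the lower-level route, here's a Java class that fetches a page over a raw Socket and writes its own minimal HTTP/1.0 request, so no Cookie header is ever added. The class is hypothetical, and whether a raw socket opened by a signed applet really bypasses the browser's cookie handling is an assumption I haven't verified. It also skips proxies, redirects and HTTPS.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: fetch a page over a raw Socket so that no Cookie
 * header is sent. Plain http only; no proxies, redirects or HTTPS.
 */
public final class CookielessFetch {

    private CookielessFetch() {}

    /** Builds a minimal HTTP/1.0 GET request that carries no Cookie header. */
    public static String buildRequest(URI uri) {
        String target = uri.getRawPath();
        if (target == null || target.isEmpty()) {
            target = "/";
        }
        if (uri.getRawQuery() != null) {
            target += "?" + uri.getRawQuery();
        }
        // Put the port in the Host header only when the URL gives one explicitly.
        String host = uri.getPort() == -1 ? uri.getHost() : uri.getHost() + ":" + uri.getPort();
        return "GET " + target + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    /** Sends the request and returns the raw response: status line, headers and body. */
    public static String fetch(URI uri) throws IOException {
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        try (Socket socket = new Socket(uri.getHost(), port)) {
            socket.setSoTimeout(10_000);
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(uri).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            // ISO-8859-1 maps every byte to one char, so nothing is lost before parsing.
            return new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
        }
    }
}
```

The trade-off is that everything HttpURLConnection normally handles (redirects, chunked responses, proxies) would have to be reimplemented, which is one reason a third-party HTTP library might be the more practical option.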