One key principle of Agile development is Don't Repeat Yourself. If you've got one piece of data, make sure it's only defined in one place in your code. That way it's easy to change without either having to remember everywhere else you need to modify, or introducing bugs because your software's inconsistent.
This gets really hard when you're dealing with data flowing back and forth between XML and SQL. There's a fundamental mismatch between a relational database that wants to store its information in columns and rows, and the tree structure of an XML document. Stylus do a great job describing the technical details of why XML is from Venus and SQL is from Mars, but the upshot is that it's hard to find a common language that you can use to describe the data in both. A simple example is a list of the recipients of a particular email. The natural XML idiom would be something like this:
<message>
<snipped the other data>
<recipients>
<email>bob@bob.com</email>
<email>sue@sue.com</email>
</recipients>
</message>
But in mysql, you're completely listless. To accommodate a variable length collection of items you need to set up a separate table that connects back to the owner of the data. In this case you might have a seperate 'recipients' table with rows that contained each address, together with some identifier that linked it with the original message held in another table. It's issues like this that make a native XML database like MarkLogic very appealing if you're mostly dealing with XML documents.
What I'd like to do is define my email message data model once, and then derive both the XML parsing and mysql interaction code from that. That would let me rapidly change the details of what's stored without having to trawl through pages of boiler-plate code. I'm getting close, sticking to a simple subset of XML that's very close to JSON, but defining a general way to translate lists of items back and forth is really tough.
I'm trying to avoid being an architecture astronaut, but it's one of those problems that feels worth spending a little bit of upfront time on. It passes the "will this save me more time than it takes in the next four weeks?" code ROI test. I'd welcome any suggestions too, this feels like something that must have been solved many times before.
Comments