
Best feature of Outlook 2010

Office 2010 is almost ready to ship! I'm an Outlook user by day, and a Gmail user by night. But I find that Gmail doesn't scale well when you are being flooded with e-mail -- for example, basic UI metaphors like shift-click don't work, and labels just don't cut it compared to Outlook rules. So, here's my favorite new feature from Outlook 2010 for dealing with floods of e-mail: conversation Clean Up.

Basically, it deletes any e-mails that are entirely contained within replies later in the conversation. This is great for high traffic discussion aliases and long-winded threads. There's just something really gratifying about pressing a button and seeing half my Inbox disappear..

Uh-oh for Windows?

For most people, the two biggest advantages of a PC over a Mac are that Macs cost more, and you can't play (most) games on a Mac. Most Mac owners I know either have a separate gaming rig or dual boot to Windows just for video games.

Today marks an inflection point in the Mac vs PC war: Steam has been ported to Mac! The only games I play on a PC anymore are those from Valve (Left 4 Dead, Counter-Strike, Half-Life, etc.) and from Blizzard (StarCraft, Warcraft, etc.). Most other games are better experienced on a console. Well, both of those sets of games are now going to be released for the Mac on the same day as the PC!

As someone who owns Microsoft stock, this is a big problem. You do not want an OS where your main differentiator is that it's cheaper, or to rely on mass-market inertia. My computer use is split amongst internet use, coding, creativity software, office software, and video games. If I were to buy a computer today, for the first time, I would actually consider a Mac. For the first time, Mac has achieved parity with PC across my usage scenarios.

This is a dangerous time for Microsoft.. tread carefully.

Color calibration, or lack thereof

Every monitor displays color differently. If you've ever used dual monitors, you know what I'm talking about. The picture below is my Lenovo T500 on the left, a Dell 2005WFP on the right:

I suppose how much of a color difference you see in the two monitors above depends on your monitor's color profile, but for me, the standalone monitor comes across as having greener greens and redder reds. In fact, my laptop portrays this blog as a nice cool blue, whereas on my monitor it is a hideous shade of green. My intention is most certainly the blue variant, but I have no idea what other people are seeing.

Anyways, this is really important for web design and photography. So, I am using this as an excuse to go buy a Dell U2410 IPS monitor and a Spyder3 color calibrator. That will ensure I am seeing what I am "supposed" to see, but presumably it remains a crapshoot for the remaining 99% of the world with uncalibrated monitors. They, no doubt, will take a look at this blog and see some unflattering and garish hue. Yuck.

Microsoft Azure Services

Microsoft is getting ready to release their cloud computing platform, Azure, and there's a pretty good overview written by David Chappell. One snippet which I found amusing was:

Windows Azure platform AppFabric provides cloud-based infrastructure services. Microsoft is also creating an analogous technology known as Windows Server AppFabric. [...] Don’t be confused; throughout this paper, the name “AppFabric” is used to refer to the cloud-based services. Also, don’t confuse the Windows Azure platform AppFabric with the fabric component of Windows Azure itself. Even though both contain the term “fabric”, they’re wholly separate technologies addressing quite distinct problems.

Don't be confused? Really? Then don't call everything "fabric"! I thought Microsoft had learned from the "Windows Live" naming debacle. Somebody needs to buy Microsoft a thesaurus..

Algorithms for storing and querying hierarchical trees

I've often found myself needing to represent hierarchical data in my database -- navigation trees, forum threads, organizational charts, taxonomies, etc. I've been trying different approaches to maintaining a hierarchy, and thought others might be interested in my findings. For purposes of illustration, our sample tree is the following: node 1 is the root with children 2 and 4, and node 2 has children 3 and 5.

Approach #1: Adjacency list
The idea here is simple: each row stores a node along with its parent.

table: nodes
id	parent_id
1	null
2	1
3	2
4	1
5	2

This is trivial to implement, but hierarchical queries become hard. In order to query for all nodes under a given branch, you have to recurse through its children. If you don't have too many nodes, you can just read the entire table into memory and cache it -- which is sufficient for most web site navigation structures, for example.
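The in-memory approach can be sketched in a few lines of Python. This is only an illustration under assumptions: the rows are hard-coded from the sample table rather than read from a database, and the helper names `build_children` and `subtree` are made up for this example.

```python
from collections import defaultdict

# (id, parent_id) rows copied from the sample "nodes" table
ROWS = [(1, None), (2, 1), (3, 2), (4, 1), (5, 2)]


def build_children(rows):
    """Index the adjacency list by parent so lookups don't rescan every row."""
    children = defaultdict(list)
    for node_id, parent_id in rows:
        if parent_id is not None:
            children[parent_id].append(node_id)
    return children


def subtree(children, root):
    """Return root plus all of its descendants, depth-first."""
    found = [root]
    for child in children.get(root, []):
        found.extend(subtree(children, child))
    return found


children = build_children(ROWS)
print(subtree(children, 2))  # [2, 3, 5]
```

The recursion is cheap once the table is cached in memory; the cost shows up only if every level turns into its own database round trip.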

Approach #2: Store the Path as a string
Here, the idea is that each node stores its path as a string. For example, a node might have a path of "1_8_13". Thus, you could find all the descendants of node "8" by querying for all nodes with a path of "1_8_%".

table: nodes
id	path
1	"1"
2	"1_2"
3	"1_2_3"
4	"1_4"
5	"1_2_5"

This gives you the benefit of hierarchical queries, but only if you add an index on the "path" column, forcing SQL to do the heavy lifting. And, since it's a string column, your performance will not be as fast as if it were integer-based.
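Here is a rough sketch of the path query using Python's built-in `sqlite3` module with an in-memory database. The index name `idx_nodes_path` and the helper `descendants` are invented for the example. One thing to watch: `_` is itself a single-character wildcard in SQL `LIKE`, so the sketch escapes it; otherwise a pattern like "1_%" would also match a path such as "12_7".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, path TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_nodes_path ON nodes (path)")
conn.executemany(
    "INSERT INTO nodes (id, path) VALUES (?, ?)",
    [(1, "1"), (2, "1_2"), (3, "1_2_3"), (4, "1_4"), (5, "1_2_5")],
)


def descendants(conn, path):
    """Ids of every node strictly below the node stored at `path`."""
    # Escape "_" so it only matches a literal underscore, then add "_%".
    pattern = path.replace("_", "\\_") + "\\_%"
    rows = conn.execute(
        "SELECT id FROM nodes WHERE path LIKE ? ESCAPE '\\' ORDER BY id",
        (pattern,),
    )
    return [row[0] for row in rows]


print(descendants(conn, "1_2"))  # [3, 5]
```

Whether the database can actually use the index for a prefix `LIKE` depends on the engine and its collation settings, so that part is worth checking against your own setup.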

Approach #3: Nested subsets
The idea here is that each subtree is kept within a contiguous range of IDs, and that range is stored in the subtree's root node. In the example, the subtree of 1 is (obviously) within the range 1..5. However, you'll notice the subtree of 2 (nodes 2, 3, and 5) does NOT form a contiguous range, because node 4 falls between 3 and 5 but is not under 2. As a result, we need a separate mutable ID in order to maintain the subsets.

table: nodes
id	mutable_id	min_mutable_id	max_mutable_id
1	1	1	5
2	2	2	4
3	3	3	3
4	5	5	5
5	4	4	4

Note how we had to swap the mutable IDs of 4 and 5, so that node 2 could have a valid nested subset range of 2..4 (each range includes the node itself). This can easily happen on insertions as well and force us to recompute large parts of the table if shifting is required. However, hierarchical reads are fairly inexpensive, as they just become numerical range queries.
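To make the numbering concrete, here is a small Python sketch that assigns the mutable IDs with a depth-first walk and then answers a subtree query as a range check. It assumes each range includes the node itself (so node 2 spans 2..4), and the names `CHILDREN`, `assign_ranges` and `subtree` are hypothetical.

```python
# parent -> children, in the order they should be numbered
CHILDREN = {1: [2, 4], 2: [3, 5], 3: [], 4: [], 5: []}


def assign_ranges(children, root):
    """Number nodes depth-first; return {id: (mutable_id, min_id, max_id)}."""
    ranges = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        own = counter
        for child in children[node]:
            visit(child)
        # Every number handed out since `own` belongs to this node's subtree.
        ranges[node] = (own, own, counter)

    visit(root)
    return ranges


def subtree(ranges, node):
    """Ids whose mutable_id falls inside the node's stored range."""
    _, low, high = ranges[node]
    return sorted(n for n, (mid, _, _) in ranges.items() if low <= mid <= high)


ranges = assign_ranges(CHILDREN, 1)
print(ranges[2])           # (2, 2, 4)
print(subtree(ranges, 2))  # [2, 3, 5]
```

Note that `assign_ranges` renumbers the whole tree, which is exactly the expensive step an insertion can trigger.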

Approach #4: Expanded tree
The idea here is that you store the normal adjacency list, but maintain another table of the tree already recursively expanded-out:

table: nodes
id	parent_id
1	null
2	1
3	2
4	1
5	2

table: nodes_expanded
id	expanded_parent_id
1	1
2	1
2	2
3	1
3	2
3	3
4	1
4	4
5	1
5	2
5	5

Essentially, the expanded table acts as a hierarchy cache. For example, to get all nodes under the "2" subtree, just find all nodes with (expanded_parent_id == 2), which will return matches on 2, 3, and 5 as expected. The main benefit of this approach is that all your SQL queries are based on exact match, whereas the last two approaches use range queries. Likewise, while an insertion will require you to futz with the "nodes_expanded" table, the data in the "nodes" table stays intact. With the nested subsets approach, you may find your main "nodes" table locked on reads while all the IDs get shuffled around.
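As a sketch of how the expanded table could be derived and maintained, the Python below builds the rows from the adjacency list by walking each node up to the root, then shows which rows a new node would add. The function names and the new node id 6 are assumptions made up for the illustration.

```python
# id -> parent_id, copied from the sample "nodes" table
PARENT = {1: None, 2: 1, 3: 2, 4: 1, 5: 2}


def expand(parent):
    """Pair each node with itself and every ancestor, as sorted rows."""
    rows = []
    for node in parent:
        ancestor = node
        while ancestor is not None:
            rows.append((node, ancestor))
            ancestor = parent[ancestor]
    return sorted(rows)


def subtree(rows, root):
    """Exact-match lookup: every node that lists `root` as an expanded parent."""
    return [node for node, ancestor in rows if ancestor == root]


def rows_for_insert(rows, new_id, parent_id):
    """Rows to add to nodes_expanded when new_id is inserted under parent_id."""
    inherited = [(new_id, anc) for node, anc in rows if node == parent_id]
    return sorted(inherited + [(new_id, new_id)])


rows = expand(PARENT)
print(subtree(rows, 2))             # [2, 3, 5]
print(rows_for_insert(rows, 6, 5))  # [(6, 1), (6, 2), (6, 5), (6, 6)]
```

An insert only appends rows for the new node, so existing rows are untouched; moving a subtree would be the costlier operation.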

So, to summarize:

Approach	Pros	Cons
Adjacency list	Easy to implement; minimum storage	Slow calculation of subtrees (can mitigate with in-memory caching)
Path substrings	Easy to implement; handles hierarchical queries	Relies on SQL index on a string column; inefficient storage (only using 0-9 and "_" in the char range)
Nested subsets	Handles hierarchical queries	Insertions can be expensive; insertions can result in lock contention
Expanded tree	Handles hierarchical queries; hierarchy is pre-cached as a simple "equality" join	Requires maintaining separate "nodes_expanded" table; insertions can be expensive, but not against the main "nodes" table

Later, I hope to implement each approach and benchmark them against each other. Any other algorithms worth investigating?

Windows 7 Shortcuts

Just thought I'd share some shortcut keys I use all the time:

Windows + D: Show Desktop
Windows + Tab: Flip 3D
Windows + #: Runs the #'th program pinned to your taskbar

And in Explorer:

Shift+Right-Click on a folder/file: Additional options like "Open command window here"
Alt+Up: Goes up a folder level in Windows Explorer (plus Alt+Left/Right for Back/Forward)

My love for dependencies ...

Once upon a time, we had a developer whose full-time job was debugging random issues in some particular feature. That feature had a dependency on an external team who had no vested interest in this feature, and therefore using their library was a bit like using chopsticks (their library) to eat steak (of course our feature is the delicious steak). Sure, you can use the chopsticks, but every time you do you question whether you'd be better off without them and just eating the steak with your hands.

Couldn't get any worse, right?

So, when a different team approached us with a product that was a perfect fork and knife that they used to eat steak every day for the last three years, we chomped at the bit to get a hold of it. Long story short, their utensils were made of plastic and were constantly breaking, and now we have two developers whose full-time jobs are debugging random issues in this feature.

We long for the days of having chopsticks to eat our steak. Do not take dependencies lightly.

Hard Drive Backup with Live Mesh

I hope everyone out there is backing up their data. Up until now, I've used the tried-and-true method of copying my files periodically to another drive. Of course, in the event of data catastrophe, I would lose all my changes since the last xcopy .. which was .. about 9 months ago. A file backup gestation period, if you will.

In any case, I'm now using Live Mesh. It's cross-platform and you get 5 GB of online storage for free (you can sync unlimited data between machines). I've synchronized my music, videos, and documents between all my machines, which is pretty fantastic. In case you want to try it, here's what I would have liked to know beforehand:

You cannot synchronize your Desktop folder.
Your first 5 GB of synchronized files end up in the cloud. Choose wisely.
You have to log in with a LiveID, but it doesn't share cookies with the browser. So, if you ever want to sync with a friend, create a new LiveID to share.
When you add a folder to be sync'd, it will show up on every other machine as a virtual folder. This can be very confusing when you've named them all "Documents" -- prefix folder names with the computer name.

My next step is to set up a sync with a friend in another state, in case my home with all my computers burns down. Overall, it was pretty easy to set up, although I now have a paranoia that one node will decide to delete something, and spontaneously trigger all my files to be deleted on every machine simultaneously.

Concurrency bug..

OK, spot the bug in the code:

    object m_lockObject = new object();
    object[] m_collection = null;

    public object[] GetCollection() {
        lock (m_lockObject) {
            if (m_collection != null) {
                // already initialized
                return m_collection;
            }
            else {
                // needs to be initialized
                m_collection = new object[5];
                initialize(m_collection);
                return m_collection;
            }
        }
    }

.. the bug is that m_collection is assigned before it's initialized. The lock keeps other threads out, but C# locks are re-entrant, so a second call on the same thread (say, from somewhere inside initialize) can come after m_collection is new'd up, but before it's initialized, resulting in an uninitialized collection being returned. The first call works, the second call sometimes fails, and the third call onwards likely succeeds. Bugs like this can be a pain to track down as, depending on what these objects do, the symptoms will appear really strange...
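One way to avoid handing out a half-built collection is to fill in a local variable first and only publish it to the shared field once initialization is done. Below is a minimal Python sketch of that pattern, not the fix from the original codebase: the class name `Registry`, the `fill` routine, and the squares it writes are hypothetical stand-ins for the C# fields and `initialize` above. `threading.RLock` is used because, like C#'s `lock`, it is re-entrant.

```python
import threading


class Registry:
    """Hypothetical holder for a lazily built collection."""

    def __init__(self, initialize):
        self._lock = threading.RLock()  # re-entrant, like C#'s lock statement
        self._collection = None
        self._initialize = initialize   # caller-supplied fill routine

    def get_collection(self):
        with self._lock:
            if self._collection is None:
                fresh = [None] * 5
                self._initialize(fresh)
                # Publish only after the collection is fully initialized.
                self._collection = fresh
            return self._collection


seen_during_init = []


def fill(items):
    # Record what the shared field looks like while initialization runs.
    seen_during_init.append(registry._collection)
    for i in range(len(items)):
        items[i] = i * i


registry = Registry(fill)
result = registry.get_collection()
print(result)            # [0, 1, 4, 9, 16]
print(seen_during_init)  # [None]
```

While `fill` runs, the shared field is still empty, so nothing can observe a partially filled collection. A re-entrant call made during initialization would start a second initialization rather than return garbage, so that case still needs its own guard.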

Managing your time wisely

I'm one of those people who strive to be “efficient”. I learned this playing games like StarCraft. To win, you have to click like a madman to control everything at once. The best players were above 200 clicks per minute. And, you'd better type at light speed, otherwise you will get clobbered while writing messages to your teammates. At work, this means I don't sit around waiting for a process recycle or a rebuild; I always go quick-check something else while I wait. I've read that your brain thinks at around 400 wpm (words per minute), so even if you type at a zippy 150 wpm then you are wasting braincycles. When I watch people type at a very reasonable 60 wpm, it takes every ounce of resistance in my body not to rip the keyboard away and type for them.

So yes, patience is not one of my virtues. As a result, I cannot believe a 3.0 GHz quad-core computer makes me wait. Ever. Every time Outlook hangs while I'm in the middle of typing my e-mail, I can't help but flip it the bird. What on earth is it doing? If not for NetBIOS name restrictions, my computers' names would be !$(^$@( and %!%^(*.

Anyways, some tips for dealing with e-mail:

Reply to the e-mail the first time you read it. It takes a few minutes to context switch into a problem, so make sure to only do it once. Don't “save this mail for later”, because you'll either forget, waste time reading it again, or make the other guy wait. Even a brief initial response is often enough for the sender to figure the problem out.
Delete the e-mail as soon as you reply. Don't worry, it'll be in your trash for a while, and you have your "sent items" to fall back on too. But the net result will be a clean Inbox which reads like a to-do list so you won't lose track of things.
If your response is going to be more than a paragraph or two, go talk in person. I could not believe how long it takes to craft a well thought-out e-mail -- try timing yourself sometime. And even then, the recipient usually just asks you to schedule a meeting and it quickly becomes clear they didn't even bother reading the mail.

Some tips for software development:

Invest in your development environment. Spend time learning all the shortcut keys and discovering ways to customize the tools you use every day, and get the hardware that will make you most productive. Start with a new monitor. A big one.
Given that your typing speed is a constant, reduce the amount you have to type. I create batch files for everything -- "n" for notepad, "d" for diff, "b" to rebuild, shortcuts to take me directly to common directory paths, etc. I ditched my hardware KVM switch because of the two-second switching lag -- using software to swap desktops is instantaneous. It may not sound like much, but instantaneous is an order of magnitude better and can change the way you work.
Automate repetitive tasks. If you find yourself doing the same thing over and over, you can save tons of time by automating it. I've written tons of tools that do repetitive, labor-intensive tasks automatically, and your peers will appreciate them too when you share them.

Strategies that haven’t worked out for me:

Closing the door doesn’t prevent people from stopping by, and it shouldn’t. The fact that they invested the time to pay you a visit means that it must be important to them. Ignoring them may only save you five minutes, but cost them an hour.
Having separate dedicated boxes for coding and e-mailing doesn’t allow me to focus single-mindedly on programming. It just makes me switch between machines all the time.
When I come in early in the morning, I don't get any additional work done. If I don't get sleep, I will spend the morning sipping tea and reading the news. Even more than usual.

What strategies work for you?