Thursday, July 30, 2009

Moving to WordPress

WordPress has built-in code formatting (SyntaxHighlighter), so no more manual formatting of code. My new blog is here.


Saturday, July 18, 2009

Random thoughts


Reading a book called Slack by Tom DeMarco.

Processing very large text files

If you are dealing with very large text files (say, more than 500 MB), here are a few tips that might help:
  1. On Windows, use TextPad for viewing and editing files; it handles large files very well. Alternatively, you can use Unix utilities, for example through Cygwin.
  2. In my experience, Java doesn't handle large files very well. Consider using Perl or a Unix shell script instead; you will be amazed at the performance gains.
  3. If you need to load the data into a database, consider a direct bulk load using your database's load utility, e.g. sqlldr (SQL*Loader) for Oracle or bcp for Sybase and MS SQL Server.
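To illustrate tips 2 and 3, here is a minimal shell sketch. It assumes a hypothetical comma-separated file whose first field is a key and whose second field is a number; the function names sum_by_key and split_for_load are made up for this example. awk processes the file one line at a time, so memory use does not grow with the file size, and split cuts the file into chunks that a bulk-load utility could then load one at a time.

```shell
#!/bin/sh
# Sketch: process a large delimited text file as a stream instead of loading
# it into memory. Assumes a hypothetical comma-separated layout where the
# 1st field is a key and the 2nd field is a numeric amount.

# sum_by_key FILE: prints one "key,total" line per distinct key, sorted by key.
# awk reads FILE one line at a time, so memory use grows with the number of
# distinct keys, not with the size of the file.
sum_by_key() {
    awk -F',' '{ total[$1] += $2 } END { for (k in total) print k "," total[k] }' "$1" |
        LC_ALL=C sort
}

# split_for_load FILE [LINES]: splits FILE into chunks of LINES lines
# (default 1000000) named chunk_aa, chunk_ab, ..., e.g. so that each chunk
# can be handed to a bulk-load utility separately.
split_for_load() {
    split -l "${2:-1000000}" "$1" chunk_
}
```

For example, `sum_by_key big.csv > totals.csv` or `split_for_load big.csv 500000`. The final sort is there only because awk's `for (k in total)` visits keys in no particular order.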

Procrastinator's logic: Cleaning your apartment has O(1) complexity

Let N be the number of days since you last cleaned your apartment. For small N, the time t taken to clean your apartment does not vary much with N.

This makes apartment cleaning an O(1) algorithm.


Tuesday, June 30, 2009

NetBeans 6.7 is released. I am still not happy with "Go To File".

NetBeans 6.7 has been released.

Although "Improved search" is listed as one of the features of the new release, the "Go To File" feature (Alt+Shift+O) is still as slow as in the previous version. This is a bummer, as it is the feature I use most often. In NetBeans, file search is either very slow or reports <No Files Found> even when the file exists.

Compare this with the "Open Resource" feature (Ctrl+Shift+R) in Eclipse. It works like a charm and gives you a filtered list of all matching resources even before you have finished typing.


Are you testing your units?

I read a brilliant and very apt article on functional testing by Tim Sutherland. The article makes the case that functional testing is more important than unit testing in some applications, especially those whose code has no complex algorithms or APIs.

The application I work on at my workplace is a case in point. It's a highly data-centric, legacy ETL application written in Java. Most of the code has no complex business logic that requires testing at the unit level. In fact, it is the integration of the tiny Java components, and how they collaborate at run time, that makes the application complex. In the two years that I have worked on this code, I have seen very few bugs that could have been caught by unit testing. Most defects are caused by unexpected or bad data.

In such cases, I strongly agree with the author of the article that a small, carefully written set of functional tests is more useful than unit tests. These tests can be run nightly as part of continuous integration, and also as smoke tests for every release.
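One simple way to build such functional tests is a golden-file check: run the whole job on a small, fixed input and diff its output against a reviewed expected output. The sketch below assumes a hypothetical run_job command standing in for the real application (for example, a script that launches the Java ETL job); here it only drops blank lines and upper-cases text so that the example runs. The names run_job and functional_test are made up for illustration.

```shell
#!/bin/sh
# Sketch of a golden-file functional test: run the whole job on a small,
# fixed input and compare its output with a reviewed "expected" file.

# run_job INPUT: hypothetical stand-in for the real ETL entry point.
# Here it just drops blank lines and upper-cases the data so that the
# sketch is runnable.
run_job() {
    grep -v '^$' "$1" | tr '[:lower:]' '[:upper:]'
}

# functional_test INPUT EXPECTED: prints PASS/FAIL (plus a diff on failure)
# and returns 0 only if the job's output is identical to EXPECTED.
functional_test() {
    actual=$(mktemp) || return 1
    run_job "$1" > "$actual"
    if diff -u "$2" "$actual"; then
        echo "PASS: $1"
        status=0
    else
        echo "FAIL: $1"
        status=1
    fi
    rm -f "$actual"
    return "$status"
}
```

A nightly CI job could call functional_test for each input/expected pair and fail the build if any call returns non-zero.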

I do think, however, that at the unit level a test-driven approach might still be useful. So when I am writing, say, a DAO, I can first write a few integration tests for it. Even in such cases, hard-core unit testing (with mocking etc.) does not yield much benefit. These tests could be reused later for low-level integration testing of individual components, but, to save time, they need not be run as part of the regular continuous integration process.


Monday, June 22, 2009

Why isn't my unix sort working?

Gaah. Today I ran into a strange problem while running the 'sort' command on Unix. On running this command with the following input,


I was getting


as the output. I was expecting the output to be


It was as if the '@' character in my input data was being completely ignored. This caused a long-running data load process to fail because of wrong data, since I was using sort-and-merge logic to eliminate duplicates and merge data from multiple files.

On searching the internet, I found that the 'sort' command uses the current locale to decide the ordering of characters. You can check your current locale settings with the 'locale' command.

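The fix is to set LC_ALL to "C" before calling sort. "C" is the name of the standard C (POSIX) locale, in which sort compares characters by their byte values, so '@' is no longer ignored. LC_ALL overrides all the other locale variables, including LC_COLLATE, which controls the sort order.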

> export LC_ALL=C
> cat inputdata | sort -s -T .
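A minimal sketch of the difference, using three made-up sample lines. In the command above, -s asks GNU sort for a stable sort and -T . makes it write its temporary files to the current directory, which helps when /tmp is too small for a large input. The helper name sort_bytewise is hypothetical; it sets LC_ALL for that one command only, instead of exporting it for the rest of the session.

```shell
#!/bin/sh
# sort_bytewise [FILE...]: sort by raw byte values, whatever the user's
# locale. Setting LC_ALL on the command line affects only this one sort,
# unlike "export LC_ALL=C", which changes the rest of the session too.
sort_bytewise() {
    LC_ALL=C sort -s "$@"
}

# Made-up sample lines for illustration.
sample=$(mktemp) || exit 1
printf 'abc\n@abd\nabe\n' > "$sample"

echo "C locale:"
# Bytes are compared directly: '@' (0x40) comes before the lowercase
# letters (0x61-0x7A), so "@abd" sorts first.
sort_bytewise "$sample"

echo "Current locale:"
# In many UTF-8 locales, such as en_US.UTF-8, '@' is ignored at the first
# comparison level, so "@abd" may land between "abc" and "abe" instead.
sort -s "$sample"

rm -f "$sample"
```

The second result depends on which locale your shell uses, which is exactly the problem described above.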

It turns out that several other commands also depend on the locale. Read more on this subject here.


Friday, May 15, 2009

Ruby on Rails

I attended a two-day beginners' course on Ruby on Rails (RoR) over the Good Friday weekend.

RoR has a lot of "magic" moments, when you just click a few buttons and, voilà, it spews out a shiny new web application for you. Nothing hard-core, though: the "scaffolding", as it is called, is only good for very basic CRUD web apps.

On the other hand, I found very little information on the net that explains what is happening under the hood. It is possible, though, that I did not look in the right places.

I am now practicing the concepts by creating a simple effort-tracking web application in my free time.


Wednesday, April 22, 2009

Beware of ls --color on Unix

I ran into a very interesting problem today.

I was trying to redirect the output of the 'ls' command to a file.

$ ls
1.dat 2.dat 3.dat
$ ls > log

To my surprise, the output file contained lots of 'special' characters.

$ vi log





These characters caused several errors in another process that was reading this file.

After spending an hour on this problem, I figured out that the culprit was an alias that mapped 'ls' to 'ls --color'. With GNU ls, '--color' without an argument means '--color=always', so the output contained the escape sequences for colors even when it was redirected to a file.

The problem was resolved by removing the alias with 'unalias ls'.
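The sketch below, which assumes GNU ls, reproduces the problem: --color=always writes escape codes into a redirected file, while --color=auto only colors output that goes to a terminal. If you like colored listings, alias ls='ls --color=auto' is a safer alias than ls --color; in bash you can also bypass an alias for a single command with \ls or command ls. The helper has_escape_codes is a made-up name for this example.

```shell
#!/bin/sh
# Sketch, assuming GNU ls: "ls --color" is short for "ls --color=always",
# which writes ANSI color escape sequences even when output goes to a file.
# "--color=auto" colors only when stdout is a terminal.

# has_escape_codes FILE: succeeds if FILE contains an ESC (0x1B) byte, the
# first byte of every ANSI color sequence.
has_escape_codes() {
    grep -q "$(printf '\033')" "$1"
}

work=$(mktemp -d) || exit 1
mkdir "$work/files" "$work/files/subdir"   # directories are colored by default
touch "$work/files/1.dat"

ls --color=always "$work/files" > "$work/always.log"
ls --color=auto "$work/files" > "$work/auto.log"

has_escape_codes "$work/always.log" && echo "always.log: contains escape codes"
has_escape_codes "$work/auto.log" || echo "auto.log: plain text"

rm -rf "$work"
```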