Sunday, April 15, 2007

Understanding Cable Broadband Technology

Below is an introduction to the terms and concepts of cable broadband, along with a summary of the features of the DOCSIS standard versions, which I have compiled from several sources in an effort to learn about the technology.

Cable Modem: a device designed to bridge a customer's home computing network to an external network, usually the Internet. This is accomplished using the preexisting coaxial cable network, originally designed for the cable TV infrastructure, known as Community Antenna Television (CATV).

Coaxial Cable (RG-6 type): Many video channels, each carried at a specific frequency, are superimposed by the cable provider onto a single carrier medium - a standard coaxial cable. This process modulates each channel so that it is exactly 6 MHz (8 MHz in Europe) away from the previous channel, and the frequency range available for a CATV provider to use typically runs from 42 to 850 MHz. When a user is watching a channel, the TV is tuned to the frequency that represents the channel and so displays only the part of the cable signal that corresponds to that channel. The legacy CATV infrastructure was designed as a one-way communication network.

ADSL: As demand for faster home internet service increased, cable companies began using their existing coax cable networks to offer digital internet connectivity. At the same time, telcos (phone companies) started using their existing copper two-wire phone lines to offer a similar service known as ADSL (Asymmetric Digital Subscriber Line), where the downstream connection is faster than the upstream connection. Unlike dialup, DSL uses a sophisticated frequency-modulation method to transmit data through the copper wires without disrupting the regular phone service over the line.


Quick comparison between cable modem and ADSL broadband technologies:

  • Bandwidth: Cable service operates over coax cable, which has a higher information density and is physically thicker than phone wire; this provides a cleaner signal and allows more data to be modulated at higher frequencies with fewer errors. DSL is decent for browsing the web, sending email, exchanging pictures and downloading music, but it usually lacks the bandwidth for anything having to do with video.
  • Distance: Cable is almost insensitive to the distance between the CMTS and the CM, as fiber optic cables can support digital data transmission over longer distances. DSL is distance sensitive: the signal weakens as the distance between the modem and the network service provider increases, which costs data throughput, so a DSL modem may achieve only a fraction of the advertised speeds.
  • Sharing: Coax cable is a shared medium, meaning every house in the area around a local coax hub is physically connected to the same coax cable. A DSL home line is a dedicated connection between the home user and the service provider (the phone company).
  • Speed: Cable modems can upload faster than DSL modems (the maximums being about 38 Mbps down and 30 Mbps up), but the upstream bandwidth is usually limited by the ISP to a much slower rate.

In short, I think one can decide between the two technologies based on which works out cheaper and/or more reliable in their area, as both are capable and can coexist as means of broadband internet connectivity for home users. I used ADSL from Airtel in India (Bangalore) and am using a cable modem from Comcast in the USA (Maynard, MA), and have noticed no significant difference in service quality during video chats on the internet (viz. a scenario that makes heavy use of both upstream and downstream).

A Cable network:

A cable coax network is a bus topology, i.e. all service nodes (cable modems) are connected to a common medium, the coax bus. Each modem connected to the bus shares this line with every other modem when sending and receiving data. In practice, cable modem networks generally use a technology called hybrid fiber coax (HFC), which incorporates optical fiber along with coaxial cable to create a broadband network.
A fiber optic node has a broadband optical transmitter and receiver capable of converting the downstream optically modulated signal coming from the headend to an electrical signal going to the homes, as well as converting electrical signals from the homes into optical signals on the reverse path. Today, this downstream electrical output is a radio frequency modulated signal ranging from 50 MHz to 1000 MHz. Fiber optic cables connect the optical node to a distant headend or hub in a point-to-point or star topology, or in some cases a protected ring topology. The fiber optic node also contains a reverse path transmitter that sends communication from the homes back to the headend. In the United States, this reverse signal is a modulated radio frequency ranging from 5 to 42 MHz, while in other parts of the world the range is 5 to 65 MHz.

The coaxial portion of the network connects 25 to 2000 homes (500 is typical) in a tree-and-branch configuration. Radio frequency amplifiers are used at intervals to overcome cable attenuation and passive losses caused by splitting or "tapping" the cable. Trunk coaxial cables are connected to the optical node and form a coaxial backbone to which smaller distribution cables connect. Trunk cables also carry AC power which is added to the cable line at usually either 60V or 90V by a power supply and a power inserter. The power is added to the cable line so that trunk and distribution amplifiers do not need an individual, external power source.

From the trunk cables, smaller distribution cables are connected to a port of the trunk amplifier to carry the RF signal and the AC power down individual streets. If needed, line extenders, which are smaller distribution amplifiers, boost the signals to keep the power of the television signal at a level that the TV can accept. The distribution line is then "tapped" into and used to connect the individual drops to customer homes. These taps pass the RF signal and block the AC power unless there are telephony devices that need the back-up power reliability provided by the coax power system. The tap terminates into a small coaxial drop using a standard screw type connector known as an “F” connector. The drop is then connected to the house where a ground block protects the system from stray voltages. Depending on the design of the network, the signal can then be passed through a splitter to multiple TVs and a cable modem.

A single downstream 6 MHz television channel may support up to 27 Mbps of downstream data throughput from the cable headend using 64 QAM (quadrature amplitude modulation) transmission technology. Speeds can be boosted to 36 Mbps using 256 QAM. Upstream channels may deliver 500 Kbps to 10 Mbps from homes using 16 QAM or QPSK (quadrature phase shift keying) modulation, depending on the amount of spectrum allocated for the service. This upstream and downstream bandwidth is shared by the active data subscribers connected to a given cable network segment, typically 500 to 2,000 homes on a modern HFC network.

Most cable modem systems rely on a shared access platform, much like an office LAN. Because cable modem subscribers share available bandwidth during their sessions, there are concerns that cable modem users will see poor performance as the number of subscribers on the network increases. “Common sense dictates that 200 cable data subscribers sharing a 27-Mbps connection would each get only about 135 Kbps of throughput -- virtually the same speed as a 128-Kbps ISDN connection -- right? Not necessarily (Crockett, 99)”.
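To see where the per-channel figures above come from, here is a rough sketch of the arithmetic in Java. The symbol rates and the overhead factor are my assumptions, based on commonly published DOCSIS values, not figures from the original sources:

public class QamThroughput {

    // Raw bit rate (Mbps) = symbol rate (Msym/s) * bits per symbol (log2 of constellation size).
    static double rawMbps(double msymPerSec, int constellationSize) {
        return msymPerSec * (Math.log(constellationSize) / Math.log(2));
    }

    public static void main(String[] args) {
        double sr64 = 5.057;    // assumed Msym/s for 64 QAM in a 6 MHz channel
        double sr256 = 5.361;   // assumed Msym/s for 256 QAM
        double overhead = 0.12; // assumed MPEG framing + FEC overhead

        // 64 QAM: ~30.3 Mbps raw, ~27 Mbps usable
        System.out.printf("64 QAM:  %.1f Mbps raw, ~%.0f Mbps usable%n",
                rawMbps(sr64, 64), rawMbps(sr64, 64) * (1 - overhead));
        // 256 QAM: ~42.9 Mbps raw, roughly the 36+ Mbps quoted above
        System.out.printf("256 QAM: %.1f Mbps raw, ~%.0f Mbps usable%n",
                rawMbps(sr256, 256), rawMbps(sr256, 256) * (1 - overhead));
    }
}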

Unlike circuit-switched telephone networks where a caller is allocated a dedicated connection, cable modem users do not occupy a fixed amount of bandwidth during their online session. Instead, they share the network with other active users and use the network's resources only when they actually send or receive data in quick bursts. “So instead of 200 cable online users each being allocated 135 Kbps, they are able to grab all the bandwidth available during the millisecond they need to download their data packets -- up to many megabits per second (Fitzgerald, 99)”.

If congestion does begin to occur due to high usage, cable operators have the flexibility to add more bandwidth for data services. A cable operator can simply allocate an additional 6 MHz video channel for high-speed data, doubling the downstream bandwidth available to users. Another option for adding bandwidth is to subdivide the physical cable network by running fiber-optic lines deeper into neighborhoods. This reduces the number of homes served by each network segment, and thus, increases the amount of bandwidth available to end-users.

The cable modem access network operates at Layer 1 (physical) and Layer 2 (media access control/logical link control) of the Open Systems Interconnection (OSI) reference model. Thus, Layer 3 (network) protocols, such as IP traffic, can be seamlessly delivered over the cable modem platform to end users. A cable modem has at least two MAC addresses: one for the coax interface (the HFC MAC) and one for the Ethernet interface (the CMCI MAC, for cable modem to CPE interface).

The DOCSIS Standard:

Almost all cable modems available in retail stores are DOCSIS-certified, which means they can work on the network of any Internet service provider that supports DOCSIS (Data Over Cable Service Interface Specification). DOCSIS is a widely agreed-upon standard developed by a group of cable providers (MSOs, or multiple system operators). The company CableLabs runs a certification program for hardware vendors who manufacture DOCSIS-compatible equipment.

The physical hardware of a cable modem includes a CPU, chipset, RAM and flash memory. There are only a few DOCSIS-compatible microcontrollers on the market; the major manufacturers are Broadcom and Texas Instruments.

The DOCSIS standard covers every aspect of cable modem infrastructure, from the CM (cable modem at the customer premises) to the operator's headend equipment (CMTS). The specification details many of the basic functions of the customer's cable modem, including:

  • how frequencies are modulated on the coax cable,
  • how the SNMP protocol applies to the cable modem,
  • how data is sent and received,
  • how the modem should communicate with the CMTS, and
  • how privacy is initiated.

Due to this standardization, consumers can purchase off-the-shelf retail modems for use with many different service providers, and cable operators can deploy newer and more innovative services to consumers.

As the QAM level increases, the points that represent symbols must be placed closer together and become more difficult to distinguish from one another because of line noise, which creates a higher error rate. Cable modems use an entire TV channel's worth of bandwidth (6 MHz for NTSC) for their downstream data. Because of the combined upstream noise from ingress (the distortion created when stray frequencies enter a medium), the upstream symbol rate is lower than the downstream, which has no combined ingress noise issues.

To detect and troubleshoot network problems, cable engineers examine packet error statistics. Each time a cable modem detects a packet error, it will record it. By comparing the total number of received packets with the erroneous ones, the cable modem will produce what's known as the codeword error rate (CER).

NonErr = docsIfCmtsCmStatusUnerroreds
CorrErr = docsIfCmtsCmStatusCorrecteds
UnCorr = docsIfCmtsCmStatusUncorrectables

CER (%) = 100 * (UnCorr / (NonErr + CorrErr + UnCorr))

Error ratios higher than 1% should trigger CM maintenance. The formula is as mentioned here.
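As a quick illustration, here is the same computation in Java; the counter values are made up, and in practice would be polled from the CMTS via SNMP:

public class CodewordErrorRate {

    // CER (%) = 100 * UnCorr / (NonErr + CorrErr + UnCorr)
    static double cerPercent(long nonErr, long corrErr, long unCorr) {
        long total = nonErr + corrErr + unCorr;
        return total == 0 ? 0.0 : 100.0 * unCorr / total;
    }

    public static void main(String[] args) {
        long nonErr = 9800000L;  // docsIfCmtsCmStatusUnerroreds (hypothetical reading)
        long corrErr = 150000L;  // docsIfCmtsCmStatusCorrecteds (hypothetical reading)
        long unCorr = 120000L;   // docsIfCmtsCmStatusUncorrectables (hypothetical reading)
        // Prints CER = 1.19%, which is above the 1% maintenance threshold.
        System.out.printf("CER = %.2f%%%n", cerPercent(nonErr, corrErr, unCorr));
    }
}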

How do modems register online?

The DOCSIS specification details the procedure a modem should follow in order to register on the cable network, called the provisioning process. The registration process is the same across DOCSIS versions.

1. Tune: When a modem is powered on for the first time, it has no prior knowledge of the cable system it may be connected to. It carries a large frequency scan list for the region for which it was designated, also known as a frequency plan. (There are four major regions: North America, Europe, China and Japan. Each uses different channel frequencies, so the modem only needs the list of frequencies for its intended region of use.) With this list, the modem begins searching for a downstream frequency to connect to.

A modem scans frequencies until it locks on to one. Since a single coax cable can carry multiple digital services, it is up to the CMTS to determine whether the new device (the modem performing the frequency scan) is supposed to access that particular frequency; it does so by checking the modem's MAC address. Once a modem has locked on to the downstream channel, it proceeds to obtain the upstream parameters by listening for special packets known as UCDs (upstream channel descriptors), which contain the transmission parameters for the upstream channel.

2. Range: Once both downstream and upstream channels are synched, the modem makes minor ranging adjustments. Ranging is the process of determining the network latency (the time it takes for data to travel) between the cable modem and the CMTS. A ranging request (RNG-REQ) must be transmitted from the cable modem to the CMTS upon registering and periodically thereafter. Once the CMTS receives a ranging request, it sends the cable modem a ranging response (RNG-RSP) that contains timing, power, and frequency adjustment information for the cable modem to use. The ranging offset is the delay correction applied by the modem to help synchronize its upstream transmissions.

3. Connect: Next, the cable modem must establish IP connectivity. To do this, it sends a DHCP discover packet and listens for a DHCP offer packet. A DHCP server must be set up at the headend to offer this service, such as Cisco Network Registrar (CNR) or similar. The DHCP offer contains the IP setup parameters for the cable modem: the HFC IP address, the TFTP server's IP address, the boot file name (aka the TFTP config) and the time server's IP address.

4. Configure: After this is done, the modem can (optionally) use the IP protocol to establish the current time of day (TOD) from a Unix type time server running at the headend.

Now the modem must connect to the TFTP server and request the boot file. The boot file contains many important parameters, such as downstream and upstream speed settings (DOCSIS 1.0 only), SNMP settings, and various other network settings. The TFTP server is usually a service that runs in the CMTS.

5. Register: Once the modem downloads the config file, it processes it. It then sends an exact copy of the config file back to the CMTS, a process known as transferring the operational parameters; this part of the registration process is also used to authenticate the modem. If the modem is listed in the CMTS database as valid, it receives a message from the CMTS that it has passed registration.

At this stage, the modem has been authenticated and is allowed to initialize its baseline privacy, an optional step that permits the modem to enable privacy features, encrypting and decrypting its own network traffic to and from the CMTS. The encryption is based on a private digital certificate (X.509 standard) installed on the modem prior to registration.

Finally, the modem connects to the cable operator's internet backbone and is allowed to access the web. The cable modem is now operational.
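The whole flow can be condensed into an ordered list of states; here is an illustrative sketch in Java (the state names are mine, not from the DOCSIS spec):

/** Illustrative summary of the provisioning flow described above. */
enum ProvisioningState {
    SCAN_DOWNSTREAM,  // 1. Tune: lock onto a downstream frequency from the frequency plan
    READ_UCD,         //    learn upstream parameters from UCD packets
    RANGE,            // 2. Range: RNG-REQ/RNG-RSP timing, power and frequency adjustments
    DHCP,             // 3. Connect: obtain the HFC IP, TFTP server, boot file name, time server
    TOD,              // 4. Configure: (optional) time of day from the headend time server
    TFTP_CONFIG,      //    download and process the boot/config file
    REGISTER,         // 5. Register: send the operational parameters back to the CMTS
    BASELINE_PRIVACY, //    optional BPI initialization (X.509 certificate based)
    OPERATIONAL       //    online; traffic is now allowed to flow
}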


Versions of DOCSIS:

DOCSIS 1.0 key features:

  1. 10 Mbps upstream capability
  2. 40 Mbps downstream capability
  3. Bandwidth efficiency through use of variable-length packets
  4. Class of service (CoS) support
  5. CMTS upstream and downstream rate limiting
  6. Extensions for security (BPI)
  7. QPSK and QAM modulation formats
  8. SNMP v2c

DOCSIS 1.1 key features: Focused more on security. No hardware requirements changed, so many DOCSIS 1.0 certified modems were able to run version 1.1 with just a simple firmware upgrade.

  1. Baseline privacy interface plus (BPI+)
  2. MAC collision detection to prevent cable modem cloning
  3. Service flows that allow for tiered services
  4. SNMP v3
  5. VoIP support

DOCSIS 2.0 key features: focuses more on data-over-coax technology. Using Advanced TDMA (A-TDMA), this spec allows cable modems to be capable of up to 30 Mbps upstream, while previous releases allowed only up to 10 Mbps. The higher bandwidth allows providers to offer consumers two-way video services, such as video phone service. However, the new standard requires a consumer modem upgrade because earlier modem hardware is not capable of the faster upload speed.

DOCSIS 3.0 focuses on data speed improvements to both upstream and downstream channels, as well as many innovations for services other than Internet access. These enhancements are accomplished by bonding multiple channels together at the same time, also known as channel bonding. Thus bandwidths of 200 Mbps downstream and 100 Mbps upstream become possible. Additional features include support for IPv6.

So that's, in short, the story of cable broadband technology. The DOCSIS 3.0 standard is slated to capture only 60% of the cable market by 2011; the first few products which comply with 3.0 will be released sometime this year (2007). I have collected the above information from various sources, notable among them being Wikipedia and the book "Hacking the Cable Modem: What Cable Companies Don't Want You to Know" by DerEngel, a very readable work on cable modem internals.

Wednesday, April 11, 2007

Understanding RRDTool


RRDTool is a product that grew out of MRTG. It creates a very compact database structure for the storage of periodic data, such as that gathered by OpenNMS. RRD data is stored in files that are created at initialization to hold data for a certain amount of time. This means that with the first data collection these files are as large as they will ever get, but it also means that you will see a large initial decrease in free disk space as collection first starts. Once the RRD file is full, the oldest data is discarded.


Each RRD is made up of Round-Robin Archives (RRAs). An RRA consists of a certain number of steps. All of the data collected during those steps is consolidated into a single value that is then stored in the RRD. For instance, if I poll a certain SNMP variable once a minute, I could have an RRA that collects all samples over a step of five minutes, averages the (five) values together, and stores the average in the RRD.


Step: The first line, the rrd step size, determines the granularity of the data. By default this is set to 300 seconds, or five minutes, which means that one value is saved every five-minute step.


The RRA statements take the form:

RRA:CF:xff:steps:rows

Where,

  • CF: the consolidation function. It can take one of four values: AVERAGE, MAX, MIN, or LAST.
    • AVERAGE Average all the values over the number of steps in the RRA.
    • MAX Store the maximum value collected over the number of steps in the RRA.
    • MIN Store the minimum value collected over the number of steps in the RRA.
    • LAST Store the last value collected over the number of steps in the RRA.
  • xff: This is the "x-files factor". If we are trying to consolidate a number of samples into one, there is a chance that there could be gaps where a value wasn't collected (the device was down, etc.). In that case, the value would be UNKNOWN. This factor determines how many of the samples can be UNKNOWN before the consolidated sample is considered UNKNOWN. By default this is set to 0.5, or 50%.
  • steps: This states the number of "steps" that are consolidated into each stored value. For example, if the step size is 300 seconds (5 minutes) and the number of steps is 12, then each consolidated value covers 12 x 5 minutes = 60 minutes = 1 hour, and the RRA will store the consolidated value for that hour.
  • rows: This states how many consolidated values the RRA holds, and hence how far back in time it reaches. For example, with one-hour consolidated values and 8784 rows, the RRA holds 8784 hours, or 366 days, of data.

So, we may have the following rrd element in poll-config.xml:

RRA:AVERAGE:0.5:1:8928

RRA:AVERAGE:0.5:12:8784

RRA:MIN:0.5:12:8784

RRA:MAX:0.5:12:8784

For example, consider the following line:

RRA:AVERAGE:0.5:1:8928

This says to create an archive consisting of the AVERAGE value collected over 1 step and store up to 8928 of them. If, for any step, more than 50% of the values are UNKNOWN, then the average value will be UNKNOWN. Since the default step size is 300 seconds, or five minutes, and the default polling cycle (in the collectd configuration) is five minutes, we would expect one value per step, and so the AVERAGE should be the same as the MIN, MAX or LAST. 8928 five-minute samples, at 12 samples per hour and 24 hours per day, is 31 days. Thus this RRA will hold five-minute samples for 31 days before discarding data.

The last 3 lines of our RRD config are:

RRA:AVERAGE:0.5:12:8784

RRA:MIN:0.5:12:8784
RRA:MAX:0.5:12:8784

The only difference between these lines is the consolidation function. We are going to "roll up" the one-step samples (5 minutes) into 12-step samples (1 hour). We are also going to store three values: the average of all samples during the hour, the minimum of those samples and the maximum. This data is useful for various reports (the AVERAGE shows throughput, whereas MAX and MIN show peaks and valleys). These will be stored as one-hour samples 8784 times, or 366 days.
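The retention arithmetic generalizes to: retention = step * steps * rows. A small Java sketch that reproduces the two figures above:

public class RraRetention {

    // Total days an RRA covers = base step (seconds) * steps per consolidated value * rows / 86400.
    static long retentionDays(int stepSeconds, int steps, int rows) {
        return (long) stepSeconds * steps * rows / 86400;
    }

    public static void main(String[] args) {
        int step = 300; // the default rrd step: five minutes
        System.out.println(retentionDays(step, 1, 8928));  // 31  (five-minute samples)
        System.out.println(retentionDays(step, 12, 8784)); // 366 (hourly roll-ups)
    }
}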


So, to summarize: by default an SNMP collector will poll once every five minutes, and those values are stored as collected for 31 days. Hourly samples, comprising the MIN, MAX and AVERAGE, are also stored, for 366 days.

Saturday, March 31, 2007

Using Quartz Scheduler in a Java EE Web Application

At times, you may have wanted to perform some action periodically in your web application. Quartz is an enterprise-grade scheduler which can be used for such a task. Read here for the complete list of Quartz features. To use the Quartz scheduler library in a Java EE web application, the following needs to be done:
  • Include the Quartz jars (quartz-all.jar and others) in the lib path. In my case, some of the commons-xxx.jar files were already included in the project because another library (displaytag) depends on them, so in my Quartz setup I had to exclude those. From the lib/build path I included only jta.jar, plus everything from the lib/optional path that was not already in the project (they are not many anyway).
  • We then had to create the tables required by Quartz for storing job details and triggers across application restarts. This is an optional feature but an important one (it is what made us choose Quartz over the JDK Timer in the first place). We used the docs/dbTables/tables_mysql.sql script to create the tables.
  • Then we copied example_quartz.properties, modified it, and saved it as the quartz.properties file in the project, changing the packaging settings in the IDE to include the properties file on the WEB-INF/classes path. In the properties file, we changed the configuration to point Quartz at the data store we created in step 2.
# Configuring Main Scheduler Properties
org.quartz.scheduler.instanceName = MyScheduler
org.quartz.scheduler.instanceId = 1
org.quartz.scheduler.rmi.export = false
org.quartz.scheduler.rmi.proxy = false

# Configuring ThreadPool
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 30
org.quartz.threadPool.threadPriority = 5

# Configuring JobStore
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.useProperties = false
org.quartz.jobStore.dataSource = quartzDS
org.quartz.jobStore.tablePrefix = QRTZ_
org.quartz.jobStore.isClustered = false

# Configuring datasource
org.quartz.dataSource.quartzDS.driver = com.mysql.jdbc.Driver
org.quartz.dataSource.quartzDS.URL = jdbc:mysql://localhost:3306/mydb
org.quartz.dataSource.quartzDS.user = me
org.quartz.dataSource.quartzDS.password = secret
org.quartz.dataSource.quartzDS.maxConnections = 31

# Rest of config was retained from example_quartz.properties.
  • We added the following lines to our web.xml:
<servlet>
    <description>Quartz Initializer Servlet</description>
    <servlet-name>QuartzInitializer</servlet-name>
    <servlet-class>org.quartz.ee.servlet.QuartzInitializerServlet</servlet-class>
    <init-param>
        <param-name>shutdown-on-unload</param-name>
        <param-value>true</param-value>
    </init-param>
    <init-param>
        <param-name>start-scheduler-on-load</param-name>
        <param-value>true</param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
</servlet>


This sets up the initializer servlet which can initialize the default scheduler and start the scheduler at application bootstrap time.
  • Now, in a web service/JSP/servlet of our web application, we do the following:
try {
    // A. Get the default scheduler.
    Scheduler sched = StdSchedulerFactory.getDefaultScheduler();

    // B. Generate a unique name identifier for the jobs and
    // triggers of your application as required.
    // One way is to use hashCode() if it's a string param.
    String dataToPass = "someParamToPassToJob";
    int id = dataToPass.hashCode();

    // C. Create/replace a poll job and add it to the scheduler.
    JobDetail job =
        new JobDetail("job_" + id, "SomeJobGroup", com.mycompany.MyJob.class);
    job.setRequestsRecovery(true);
    // Pass data to the poll job.
    job.getJobDataMap().put("param", dataToPass);
    sched.addJob(job, true);

    // D. Create a trigger with a unique name.
    SimpleTrigger trigger = new SimpleTrigger("trig_" + id, "SomeTriggerGroup");

    // E. Check if a trigger is already associated with this job.
    // This step is optional and depends on your application's requirements.
    Trigger[] jobTriggers = sched.getTriggersOfJob("job_" + id, "SomeJobGroup");

    boolean isTriggerAlreadyAssociated = false;
    for (Trigger trig : jobTriggers) {
        if (trig.getName().equals("trig_" + id)
                && trig.getGroup().equals("SomeTriggerGroup")) {
            // The job already has this trigger associated with it.
            isTriggerAlreadyAssociated = true;
        }
    }

    // F. Associate this trigger with the job.
    trigger.setJobName(job.getName());
    trigger.setJobGroup(job.getGroup());

    // G. Initialize the trigger with the duration and resolution to fire.
    // (startTime, endTime and repeatInterval are application-supplied values.)
    trigger.setStartTime(startTime.getTime());
    trigger.setEndTime(endTime.getTime());
    trigger.setRepeatCount(SimpleTrigger.REPEAT_INDEFINITELY);
    trigger.setRepeatInterval(repeatInterval); // in milliseconds

    if (isTriggerAlreadyAssociated) {
        // Reschedule the job with the existing trigger.
        sched.rescheduleJob("trig_" + id, "SomeTriggerGroup", trigger);
    } else {
        // Schedule the job with the new trigger.
        sched.scheduleJob(trigger);
    }
} catch (SchedulerException se) {
    // Handle or log the scheduling failure; don't swallow it silently.
}
  • Of course, the last thing is to write the Job class which does the actual work. The following is code from the Quartz examples.
import org.quartz.Job;
import org.quartz.JobDataMap;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

public class PrintPropsJob implements Job {

    public PrintPropsJob() {
    }

    public void execute(JobExecutionContext context)
            throws JobExecutionException {
        JobDataMap data = context.getJobDetail().getJobDataMap();
        System.out.println("someProp = " + data.getString("someProp"));
        System.out.println("someObjectProp = " + data.getObject("someObjectProp"));
    }
}
  • Some important points to note about jobs and triggers:
  1. Jobs have a name and group associated with them, which should uniquely identify them within a single Scheduler.
  2. A Job can be associated with multiple triggers.
  3. Triggers are the 'mechanism' by which Jobs are scheduled.
  4. Many Triggers can point to the same Job, but a single Trigger can only point to one Job.
  5. JobDataMap holds state information for Job instances. JobDataMap instances are stored once when the Job is added to a scheduler. They are also re-persisted after every execution of StatefulJob instances.
  6. JobDataMap instances can also be stored with a Trigger. This can be useful where you have a Job stored in the scheduler for regular/repeated use by multiple Triggers, yet with each independent triggering you want to supply the Job with different data inputs (see the sketch after this list).
  7. The JobExecutionContext passed to a Job at execution time also contains a convenience JobDataMap that is the result of merging the contents of the trigger's JobDataMap (if any) over the Job's JobDataMap (if any).
  8. We can have different job types:
  • Stateful jobs - where state passed to the job (in the JobDataMap) is remembered (like static values) across executions of the job. Also, stateful jobs are not allowed to execute concurrently, which means new triggers that fire before the completion of the execute() method will be delayed.
  • Interruptible jobs - provide a mechanism for having the job execution interrupted, by implementing a callback method interrupt() which is called when the scheduler's interrupt method is invoked on the job.
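For point 6, here is a minimal sketch of supplying per-trigger data to a shared job (Quartz 1.x API, written against the same classes as the code above; the job and trigger names are made up):

// One job definition, two triggers feeding it different inputs.
JobDetail job = new JobDetail("reportJob", "reports", com.mycompany.MyJob.class);

SimpleTrigger daily = new SimpleTrigger("dailyTrig", "reports");
daily.setJobName("reportJob");
daily.setJobGroup("reports");
daily.getJobDataMap().put("scope", "daily");   // per-trigger input

SimpleTrigger weekly = new SimpleTrigger("weeklyTrig", "reports");
weekly.setJobName("reportJob");
weekly.setJobGroup("reports");
weekly.getJobDataMap().put("scope", "weekly"); // same job, different input

// Inside MyJob.execute(), context.getMergedJobDataMap().getString("scope")
// then returns "daily" or "weekly" depending on which trigger fired.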
That's all, in short, about how to integrate the Quartz scheduler library into a web application.

Saturday, March 24, 2007

Is it okay to put business logic in stored procedure?

In my experience, I have come across maintaining software where the business logic was written in stored procedures (or functions/triggers). Recently I collected some points on why this is not such a good idea. The main reason for using stored procedures in one product was to let multiple client types invoke the same business logic. But we can achieve the same effect by applying the MVC (model view controller) pattern and keeping the model/business logic in application code rather than in stored procedures. Here are some other reasons why it's not a good idea to write BL in stored procedures:

NOTE: Most of the ideas presented below are excerpted from this article.

1. If there is more than one interface and the BL lives partly in stored procedures and partly in the presentation tier, it becomes a maintenance headache to keep the different presentation tiers in sync.

2. Stored procedures form an API by themselves. Adding new functionality or new procedures is the "best" way to extend an existing API. This means that when a table changes, or the behaviour of a stored procedure changes and requires a new parameter, a new stored procedure has to be added. When a stored proc is changed, the DAL/BL code needs to change too, to call the changed/new stored proc; whereas if the SQL is generated on the fly from the DAL/BL code and there is no stored proc, then only the DAL code changes.
Microsoft also believes stored procedures are over: its next generation business framework, MBF, is based on ObjectSpaces, which generates SQL on the fly.
In the Java world, ORM (object-relational mapping) frameworks like Hibernate and TopLink (and now the Java Persistence API, JPA) are meant to generate SQL on the fly too.

3. Business logic in stored procedures is more work to test than the corresponding logic in the application. Referential integrity will often force you to set up a lot of other data just to be able to insert the data you need for a test. Stored procedures are inherently procedural in nature, and hence harder to write isolated tests for and prone to code duplication. Another consideration, and this matters a great deal in a sizable application, is that any automated test that hits the database is slower than a test that runs inside the application. Slow tests lead to longer feedback cycles.

4. While stored procedures may run faster, they take longer to build, test, debug and maintain, therefore this extra speed comes at a price.

5. BL in stored procs does not scale: if you have a system with hundreds of distributed databases, it is far more difficult to keep all those stored procedures and triggers synchronized than it is to keep the application code synchronized.

6. You are locked in to the DB for which the stored procs are written.

7. Porting the data will be one exercise, but porting the stored procedures and triggers will be something else entirely. Now, if all that logic were held inside the application, how much simpler would it be?

All changes made to the database can be logged without using a single database trigger. How? By adding extra code into the DAO to write all relevant details out to the AUDIT database. This functionality is totally transparent to the objects in the business layer, which need no extra code to make it work.
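To illustrate that last point, here is a minimal DAO sketch (hypothetical table and column names, plain JDBC for brevity) where the audit record is written by the DAO in the same transaction as the update, with no trigger involved:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class CustomerDao {

    private final DataSource ds;

    public CustomerDao(DataSource ds) { this.ds = ds; }

    public void updateEmail(long customerId, String newEmail, String changedBy) throws SQLException {
        try (Connection con = ds.getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement upd = con.prepareStatement(
                     "UPDATE customer SET email = ? WHERE id = ?");
                 PreparedStatement aud = con.prepareStatement(
                     "INSERT INTO audit_log (entity, entity_id, action, changed_by) "
                     + "VALUES ('customer', ?, 'update-email', ?)")) {
                upd.setString(1, newEmail);
                upd.setLong(2, customerId);
                upd.executeUpdate();
                // The audit write is transparent to the business layer:
                // callers of updateEmail() never see this code.
                aud.setLong(1, customerId);
                aud.setString(2, changedBy);
                aud.executeUpdate();
                con.commit();
            } catch (SQLException e) {
                con.rollback();
                throw e;
            }
        }
    }
}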

Friday, March 23, 2007

An Events Browser - Using Displaytag and DWR



Problem:

To view application-specific events in the web browser.

Requirements:
The implementation should satisfy the following requirements:
  1. support viewing all the logged events (in a DB) with provision for specifying the number of rows per page to show in a data grid of event records.
  2. support pagination.
  3. support sorting on columns like event time, id, type etc.
  4. support exporting events data to CSV, Excel and other formats.
  5. change row color to highlight rows based on severity of events.
  6. enable viewing live events instantaneously as they occur.

Solution:
The first 5 requirements are easily met by Displaytag, an excellent JSP tag library for displaying data grids. We can also enable Ajax support for Displaytag data grid pagination and sorting using the ajaxtags library.

We did not like the live-grid style views (which may be better suited to the search examples) for our requirement, as we believe pagination with Ajax support is a more appealing and familiar experience for users.

The last requirement is met by DWR 2.0 (which, as of this writing, is still in development; the current release is 2.0 RC2). DWR 2.0 has a new feature called Reverse Ajax, which enables pushing data from the server to the browser client asynchronously.

So, we have two data grids in our UI:
  1. The first data grid displays the existing event records from the DB.
  2. The second data grid displays the live events using DWR's Reverse Ajax feature. One can use the recently added proxy for the script.aculo.us JavaScript effects library to highlight new rows as they get added to the live events data grid. Adding new rows to the HTML table is facilitated by the org.directwebremoting.proxy.dwr.Util class' addRows(String elementId, String[][] data) method; a server-side sketch follows below. The implementation can choose to route the events to all browsers or to selected browser sessions only (an event routing behavior). An example of the reverse Ajax feature is found here (see the stock ticker).
The user can navigate the existing data, paginate, sort and export, and at the same time continue to view the live events being added to the second data grid.
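On the server side, the push into the live grid can look roughly like this. This is a sketch against the DWR 2.0 API; the page path, table id and event fields are assumptions of mine, not from the actual implementation:

import java.util.Collection;
import org.directwebremoting.ScriptSession;
import org.directwebremoting.ServerContext;
import org.directwebremoting.ServerContextFactory;
import org.directwebremoting.proxy.dwr.Util;

public class LiveEventPublisher {

    /** Pushes one event row to every browser currently on the live events page. */
    public void publish(String id, String severity, String message) {
        ServerContext ctx = ServerContextFactory.get();
        // All script sessions attached to the (assumed) live events page.
        Collection<ScriptSession> sessions = ctx.getScriptSessionsByPage("/events/live.jsp");
        Util util = new Util(sessions);
        // Appends a row to the <tbody id="liveEventsBody"> table in each browser.
        util.addRows("liveEventsBody", new String[][] {{ id, severity, message }});
    }
}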


I did further work on improving the way events are displayed in the live events section above. The live events section, which uses reverse Ajax to asynchronously display events occurring in near real time, is best kept separate from the archived events browser above.

The live events page can be made to show only a specified number of the latest events, in a queue fashion: new incoming events replace existing ones, oldest removed first, once the defined window limit is reached. So if the window limit is, say, 20 events and the latest 19 events are displayed, then when 2 new events arrive we remove the oldest 1 of the existing 19 and show 20 events, with the latest 2 at the top of the table. The window limit is a client-side parameter and can be (re)set by the user. The events need to be cached locally on the client to display the latest on top (as DWR's addRows() API only appends to the table tbody); local caching also makes it easy to control the window size.

A limit on the number of live events is required because we cannot let the events table grow forever: the user may want to see just the latest 20, 50, 100 or 1000 (some limited set of real-time events), since he also has the archived events table for studying a past event for analysis. The purpose of the live events table is to alert the operator when an event occurs, so that he/she can do further analysis by studying the archived events or doing some data collection on the supposedly affected elements.
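The window behaviour described above is essentially a bounded queue. In the browser this would live in client-side JavaScript, but the logic is easy to sketch in Java (the class and method names are mine):

import java.util.ArrayDeque;
import java.util.Deque;

/** Keeps only the newest 'limit' events; the oldest entries are removed first. */
public class LiveEventWindow<E> {

    private final Deque<E> events = new ArrayDeque<E>();
    private int limit;

    public LiveEventWindow(int limit) { this.limit = limit; }

    public synchronized void add(E event) {
        events.addFirst(event);          // newest event shows at the top
        while (events.size() > limit) {
            events.removeLast();         // oldest removed first
        }
    }

    /** The user can (re)set the window limit at any time. */
    public synchronized void setLimit(int limit) {
        this.limit = limit;
        while (events.size() > limit) {
            events.removeLast();
        }
    }
}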

Saturday, March 10, 2007

Started learning Kenpo Karate

I started attending Kenpo Karate (one of the forms of the Karate martial arts) classes today at a local training institute. Today I attended the first class and learned some basic stances and some ways to block an attack. The best part is the initial warm-up exercise that we get to do as a group; it was very tiring for the first day, and we did it for 30 minutes. There will be 3 classes per week, 1.5 hours per class.

It was in 1993 that I attended Taekwondo martial art classes for 3 months, but I had to leave because of study pressure. I joined a gym when I got into engineering college in 1994, and for the first year I could go regularly, but slowly, with mounting study pressure, I became irregular with the gym too; though I continued on an irregular basis till I got my first job in 1999, I could not keep up with the gym once I was in Bangalore in the winter of 2000. We had a gym in my Bangalore apartment too, but I never could kindle the fire to exercise there. It has always been my desire to keep fit (as noticed in this blog's name: FitProgrammer@Work), and when it was time to come to the USA I decided I would get trained in boxing here. But my wife was not very happy with that idea :) considering the first impression one has of boxing is that it's associated with gore. I could not find a local boxing institute in the Maynard region (the nearest was 15 mi from my residence), but I did find a martial arts training center very close by (0.2 mi from both my residence and workplace; literally on the way between the two). Martial arts is not going to be as dull as going to a gym, where you train alone and so have to be very self-motivated to persist in the regimen. My experience has been that I tend to lose the fizz with time at a gym; I persisted better with martial arts. Probably the idea of exercising in a group environment appeals more to my psyche.

So let's see how it goes... I will keep posting on my learnings in Kenpo (aka Kempo) Karate.

Updates:
6 May, 2007: I have learnt all that is required of a non-ranked student of Griffin Kenpo, which includes:
  1. Basic stances and Half-mooning
  2. Kicks - Iron broom, dropping knees, sliding knees, front, back, side, hook, crescent, reverse crescent and their combinations.
  3. Punches and 8 Blocks
  4. Rolling and Falling
  5. 3 techniques to free my hand when caught
I should have completed this learning much earlier, but for one reason or another I was not regular in my classes. The good thing is that I am continuing to learn Karate and I am enjoying it a lot.

Thursday, March 08, 2007

Integrating Java and PHP: the Web Services way

Recently on a project, we were faced with the question:
  1. do we put the business logic in the web tier (which was written in PHP 4.7.3) or
  2. do we write the business logic in Java (better tools for development, reliable libraries for some of the tasks we wanted to perform as a part of our business logic, better OO language features than PHP, easy to debug in an IDE, code hidden in class files).
The first option was what we usually call the model 1 architecture, where business logic is written in the same script which serves the presentation code.

In the second approach, we would introduce the MVC pattern to our PHP-based web tier by introducing a Java layer to play the model, letting the PHP scripts be the controller and view. This had the added advantage that we could leverage the model, written as Java web services, in other clients (our product did have clients other than the web interface, providing an alternate way for a user/program to interact with it, and hence we required that the business logic we wrote for the web interface could be leveraged by other client types too, possibly written in other languages, which in our case was C/C++). Hence, we opted for the second choice.

We had two options for integrating Java and PHP:
  1. Do it using PHP/Java bridge
  2. Expose the business logic written in Java as web services.
The bridge option is not well supported as of this writing in the PHP world. The Java extension for PHP language is labeled as experimental. So we decided to go with the other option.

Our business logic did not necessitate the use of EJBs (it did not require container-managed transactions or persistence) for the present, so going the Java path to write our business logic, we wanted to use Tomcat as the web container (which could easily be integrated with the Apache web server, running the existing PHP code, using mod_jk(2)).

We developed our web services in Java using the latest JAX-WS 2.1 RI library, primarily because the JAX-WS spec makes writing web services in Java pretty easy compared to the predecessor JAX-RPC 1.1 release (aided by annotations; no more XML hell). We also made a design choice to use the document/literal style of messaging for maximum flexibility. Here's how typical JAX-WS code looks:

MyWebServiceImpl.java:

import javax.jws.WebMethod;
import javax.jws.WebParam;
import javax.jws.WebService;

@WebService
public class MyWebServiceImpl {

    @WebMethod
    public MyClassA myWebMethod(@WebParam(name = "aParam") int aParam)
            throws MyAppExceptionA, MyAppExceptionB {
        // .. implementation code
    }
}


We chose nusoap 0.7.2 as the PHP web service stack, and here's how a typical WS client was written in PHP using nusoap:
client.php:

<?php
require_once('../lib/nusoap.php');

// create a soapclient from wsdl url
$wsdl = 'http://192.168.1.101:8080/MyWebservices/MyWebServiceImpl?wsdl';
$client = new soapclient($wsdl, true);
$err = $client->getError();
if ($err) {
// Display error
}

// specify the soap body content for the doc/literal request
$msg = '<web:myWebMethod xmlns:web="http://webservice.mycompany.com/">
            <aParam>3334</aParam>
        </web:myWebMethod>';

$namespace = 'http://webservice.mycompany.com/';
$method = 'myWebMethod'; // web method to invoke

$result = $client->call($method, $msg, $namespace, '', false, null, 'document', 'literal');

// Check for a fault
if ($client->fault) {
// Here is how we can do exception handling.
$myAppExceptionA = $result['detail']['MyAppExceptionA'];
if ($myAppExceptionA) {
print "Exception: ". $myAppExceptionA['message'];
}

$myAppExceptionB = $result['detail']['MyAppExceptionB'];
if ($myAppExceptionB) {
print "Exception: ". $myAppExceptionB['message'];
}
} else {
// Check for errors
$err = $client->getError();
if ($err) {
// Display the error - You will not come here in working condition :)
} else {
// Display the result
print_r($result);
// Grab the values in the $result array
}
}
?>

The nusoap stack on the PHP end converts the result into an associative array (as shown in the fault handling part of the code above). To learn the structure of the response message, and so gain clarity on how to extract the data out of the result, you can add the following 3 magic debug statements towards the end of the above PHP client code:

echo "<h2>Request</h2><pre>" . htmlspecialchars($client->request, ENT_QUOTES) . "</pre>";
echo "<h2>Response</h2><pre>" . htmlspecialchars($client->response, ENT_QUOTES) . "</pre>";
echo "<h2>Debug</h2><pre>" . htmlspecialchars($client->debug_str, ENT_QUOTES) . "</pre>";

I also used the SoapUI tool (version 1.6) for unit testing my web services and found it very useful, especially when we needed to know the exact soap request body content to embed on the PHP side while invoking the web method. You can do several things with this tool, like generate request templates which you can fill with values to test individual web methods, save such requests and do regression testing later, reuse the requests for load testing, view test reports, etc.


That's, in short, how we can get a PHP web tier to invoke methods on a Java web service based business logic tier. I initially had some trouble figuring out the right arguments to pass to the nusoap soapclient's call() method, but after some web searching we could identify how to get it working.

As an aside, this was my first experience working with the PHP language, and in self-learning mode I used the WAMP distribution (version WAMP5_1.6.5, which includes PHP 5.1.6, Apache 2.0.59, MySQL 5.0.24a and phpmyadmin 2.8.2.4) to quickly set up an environment on my Windows workstation for experiments. I also found EasyEclipse for PHP (version 1.2.1.1) a very useful IDE for PHP development. And the book I use as a reference (only when I need to know how a certain programming thing is done in PHP) is Programming PHP by Kevin Tatroe, Rasmus Lerdorf, and Peter MacIntyre. It's an excellent book and has stood by me to date.

The opinion I have formed about PHP so far is that it is ideal for a model 1 architecture web application. It is better in such cases than, say, the Servlets/JSP duo because of the simplicity of its programming constructs; it simply does not have advanced features like JSP tag libraries. Developers who are not fussy about mixing HTML with PHP scripts, and who have a previous background in Perl or other such dynamic languages, will be more at ease with PHP than with Java development.

Java is an object-oriented language (a hybrid one, though, like C++), while typical PHP code is more object based: like Perl, PHP has constructs for defining classes, but its object model is far more limited than Java's. Most PHP code is more or less procedural, at least that is what is most commonly done (as if you were programming in C and creating libraries of functions). PHP also eases the learning curve: being a dynamic language, it has no standard data types other than arrays (indexed and associative) and scalars. Of course you can define your own ADTs, but most of the time you will be fine working with arrays alone. Arrays in PHP are like a mix of Java's ArrayList and HashMap, i.e. you can use an array as an indexed or an associative one.

So, in general, I feel that if there is a simple web application where all that has to be done is create a catalog out of a database and then allow CRUD using web forms, then PHP (and hence model 1 architecture) is suitable, since it gets the work done quicker and the resulting code is maintainable. But if you want to support multiple client types (so model 1 architecture is not what you want) and probably need some advanced middleware services (like persistence, security, transactions), then you are better off with Java EE and other such frameworks.

Share your thoughts by leaving your comments.

Friday, January 26, 2007

My new abode

I am moving to a beautiful city called Maynard. It's been a while since I could post to my blog, as I have been busy setting myself up in the new place. I moved to the US on 17th January to work at a startup in the cable (HFC) broadband access technology domain. The clock tower seen prominently in the picture above is my new place of work (and surprisingly it's also called the Clock Tower Place) :).

Sunday, December 31, 2006

Year 2007: My Self-Study Goals

In the past year of 2006, I took 4 Sun certifications (Java SE 5.0 programmer, J2EE 1.4 web services developer, J2EE 1.4 web component developer and J2EE 1.3 business component developer) and 2 trainings (the Oracle workforce development program's Introduction to Oracle 9i SQL and Program with Oracle 9i PL/SQL). In short, I invested a lot of effort and time in learning Java EE based enterprise application development, which became my major focus last year. I wanted to cover every aspect of web application development in my learning, right from CSS, JavaScript, Ajax and XHTML on the client tier, to Servlets/JSP/Struts 1.2 on the web tier, to EJB 2.x for the business tier, and lastly relational database SQL and PL/SQL on the EIS tier. I also covered web services (in great detail), which have become a very important tool for A2A (i.e. EAI) and B2B integrations. Learning web services, and my score of 88% in the SCDJWS 1.4 exam, really makes me feel good, as I have felt that web services will be a good substitute for SNMP in the network management domain (the domain in which I continue to specialize).

This year (2007) I intend to continue with learning EJB 3.0 and JSF 1.2 to begin with. I have also started a project, BaseApp (a kick-starter application for development using Java EE 5.0). I also intend to learn a bit about the JBoss Seam framework (since it's on the standards track and is a framework built around EJB 3 and JSF). I think that EJB 3 and JSF 1.2, together with the upcoming Web Beans (or the current JBoss Seam), will be a strong (and standard) contender to the Spring framework. So I will be focusing a lot on Java EE this year too.

Another goal will be to complete BaseApp. For now, BaseApp happens to be an open-ended enterprise, but we will soon be defining the requirements for the first release formally.

A few things I missed completing last year:
  1. H F Design patterns
  2. JUnit in Action
  3. J2EE Design and Development
  4. H F HTML with CSS & XHTML
  5. Get SCEA certification
  6. BPEL and WSDM/WS-Man
  7. Ruby/Perl
I think with work and the above identified tasks, I will have enough on my plate for this year.

Friday, December 29, 2006

EJB3 development using JBoss 4.0.5.GA

I encountered an issue today while redeploying an EJB 3 jar (hot deployment) to JBoss 4.0.5.GA, which I had installed using the JEMS installer 1.2.0 CR1. The issue and its resolution are mentioned here; you will need to install the JBoss EJB 3 RC9 Patch 1 to fix it. For EJB3 development with JBoss 4.0.5.GA, you must use the JEMS installer only, not the archive releases.

I continue to see the issue of a "Wrong target" message being thrown; the workaround for now is to restart JBoss AS, which fixes it. EJB 3 RC9 Patch 1 does nothing to solve this issue (and I found that one of the issues marked as dependent on it was recently reopened), so expect a fix only by early Jan 07. For now a reboot is the best bet.

I also found that you need JSE 5.0 specifically to run your application. Even though the docs say JDK 1.5+ is supported, you get a strange "Could not dereference object" issue on deployment of stateful session beans (see this). So I had to change back from JSE 6.0 to 5.0 in Eclipse.

At one point I was seriously planning to switch from JBoss to Glassfish :). But for now it seems I am back on track, with some workarounds, to continue my learning of EJB 3 using JBoss AS.

BTW, Mikalai Zaikin's notes for the SCBCD 5.0 exam have a very nice appendix/tutorial (replete with screen caps) on developing with JBoss 4.0.5.GA + Eclipse 3.2.1 + WTP 1.5 + JBoss IDE 2.0.0.Beta2. What's missing is how to use the Hibernate EntityManager JPA implementation for RESOURCE_LOCAL transactions in a standalone JSE application; you will find this documentation useful for that.

If you are planning to use JBoss AS for a green-field (from scratch) EJB 3.0 project and Eclipse is your IDE of choice, you will find the instructions at the following URLs useful:
Eclipse + JBoss AS + EJB 3.0 setup instructions by Eric Garrido
Starter Skeleton Eclipse Project on JBoss Wiki site

Wednesday, December 27, 2006

Web Beans: JBoss Seam being standardized for Java EE 6 release

The Web Beans JSR 299 proposal by JBoss (RedHat Middleware LLC) has been approved unanimously by the JCP executive committee, even though IBM and BEA Systems expressed concerns about it being (too?) ambitious. Following is what they had to say:

------------------------------------------------------------------------------
On 2006-06-05 BEA Systems voted Yes with the following comment:
This appears to be a sufficient challenge to achieve, but, in light of the overwhelming support at this stage of the process, we are prepared to see it go ahead.
------------------------------------------------------------------------------
On 2006-06-05 IBM voted Yes with the following comment:
Delivering on the deep level of integration that is proposed appears to be an ambitious task.

In short, the goal of this initiative is to enable EJB 3.0 components to be used as JSF managed beans, unifying the two component models and enabling a considerable simplification of the programming model for web-based applications in Java.

The benefit: this work will provide a programming model suitable for rapid development of simple data-driven applications without sacrificing the full power of the Java EE 5 platform. This is a domain where Java EE has been perceived as overly complex.

Gavin King (Hibernate founder) is the spec lead for this effort which is targeted for release of a RI with Java EE 6 (fall 2007). This is also the first time that JBoss is leading a standard spec.

Some links to learn more about what Web Beans will offer are:
Gavin's interview by InfoQ on JBoss Seam
Proposed spec description for JSR 299
What Matt Raible (author of Spring Live and AppFuse founder) said about Seam
The JBoss Seam Demo
The JBoss Seam Documentation
JBoss Seam FAQ

What excites me?
Seam is an ambitious full-stack framework (unifying and integrating technologies such as Asynchronous JavaScript and XML (Ajax), JavaServer Faces (JSF), Enterprise JavaBeans (EJB3), Java portlets and business process management (BPM)) which enables EJB 3.0 components to be JSF managed beans. So your EJB 3.0 beans can become your action classes, and the glue code previously required for actions to invoke methods on business-tier beans goes away, simplifying the development effort.
Some specifics from the proposal as quoted by Floyd Marinescu at InfoQ are:

  • Changes to EJB 3 that will be needed for EJB's to act as JSF managed beans.
  • Annotations for manipulating contextual variables in a stateful, contextual, component-based architecture.
  • An enhanced context model including conversational and business process contexts.
  • Extension points to allow the integration of business process management engines.
  • Integration of Java Persistence API extended persistence contexts. (The other type of persistence context is transaction-scoped persistence context.)
Now that's what makes Web Beans a full-stack framework! :) Here's one Seam book by JBoss insiders.

Some notable aspects of Seam:
  1. State management: Most other web frameworks store all application state in the HTTP session, which is inflexible, difficult to manage and a major source of memory leaks. Seam can manage business and persistence components in several stateful scopes: components that only need to live across several pages are placed in the conversation scope; components that need to live with the current user session are placed in the session scope; components that require interactions from multiple users and last an extended period of time (i.e., survive server reboots) are placed in the business process scope. These advanced state management facilities allow us to develop web application features that were previously not possible, or very difficult, to implement.
  2. Multiple browser windows/tabs: Seam supports fine-grained user state management beyond the simple HTTP session. It isolates and manages user state associated with individual browser window or tab (in contrast, HTTP session is shared across all windows of the same browser). So, in a Seam application, each browser window / tab can become a separate workspace that has independent history and context. Multiple user "conversations" can be supported for the same browser. This behavior is similar to that in rich client applications.
  3. Handling back-button navigation: Seam's nested conversation model makes it really easy to build complex, stateful applications that tolerate use of the back button.
  4. Support for REST style bookmarkable URL: it is very easy to expose REST style bookmarkable URLs in a Seam application. In addition, the Seam application can initialize all the necessary backend business logic and persistence components when the user loads that URL. This way, the RESTful URL is not only an information endpoint but also an application interaction start point.
  5. Support for JSR 168 portlets.
  6. Support for internationalization.
  7. JBoss Eclipse IDE provides a sophisticated, template-driven database reverse engineering tool which can generate an entire Seam application in minutes and a graphical jPDL editor for editing Seam pageflows and workflow definitions.

Sunday, December 10, 2006

Using Subversion for your open source project



Subversion is an open source contribution by the CollabNet folks that improves on CVS. The Subversion project started in earnest in February 2000, when CollabNet offered Karl Fogel a full-time job developing a replacement for CVS. Karl Fogel and Jim Blandy had previously founded Cyclic Software, which provided commercial support for CVS. Subversion was designed from the ground up as a modern, high-performance version control system. In contrast to CVS, which had grown organically from shell scripts and RCS, Subversion carries no historical baggage. Subversion takes advantage of a proper database backend (Berkeley DB), unlike CVS, which is file based. The Subversion team has tried to make the new system similar in feel to CVS, so users are immediately at home with it. Most of the features of CVS, including tagging, branching and merging, are implemented in Subversion, along with a host of new features:

  • versioning support for directories, files and meta-data
  • history tracking across moves, copies and renames
  • truly atomic commits
  • cheap branching and merging operations
  • efficient network usage
  • offline diff and revert
  • efficient handling of binary files

Salient points about Subversion:
  • Subversion is a centralized client-server version control system, like CVS, VSS or Perforce.
  • Advanced network layer: the Subversion network server is Apache, and client and server speak WebDAV/DeltaV to each other. This gives Subversion an advantage over CVS in interoperability, and provides various key features for free: authentication, wire compression, and basic repository browsing.
  • It's free: Subversion is released under an Apache/BSD-style, open-source license.
  • Like CVS, Subversion has the concept of a single, central repository (often residing on a dedicated server) that stores all data about the projects you are working on. You never work in the repository directly, though. Instead, you pull subsets of it into working copies that typically reside on other systems, such as your desktop computer. In these working copies you make your changes, and when you are pleased with them, you commit those changes to the central repository, where they become once and forever part of history.
  • Each commit (also called a check-in) to the repository is called a revision, and, in Subversion, revisions are numbered. A commit can be a change to one file or a dozen, to directories, or to metadata.
  • We speak of HEAD when we mean the latest version of the repository; so, when you check in revision 17, then HEAD is revision 17.
  • Atomic Commits: In Subversion, a commit is an atomic operation, meaning it either succeeds entirely or fails entirely; unlike CVS, you can't end up with half of your files saved to the repository but the other half unchanged.
  • Not only does Subversion offer version control of files and directories, it also offers version control of metadata. In the world of Subversion, such metadata is called a property, and every file and directory can have as many properties as you wish. You can invent and store any arbitrary key/value pairs you wish: owner, perms, icons, app-owner, MIME type, personal notes, etc. This is a general-purpose feature for users. Properties are versioned, just like file contents. And some properties are auto-detected, like the MIME type of a file (no more remembering to use the -kb switch).
To read a quick intro to setting up a local Subversion repository (i.e., svnadmin create) and how to use the basic svn client commands (checkout, status, commit, update, log, diff, mv, rm, merge, stat, up, etc.), read this article by Chip Turner. The complete manuals for the different Subversion releases are available too.
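
As a quick illustrative transcript of that setup (repository path, file names and commit messages are made up; assuming a Unix-like shell):

    svnadmin create /home/me/svnrepo              # create an empty local repository
    svn checkout file:///home/me/svnrepo myproj   # pull a working copy
    cd myproj
    echo "hello" > README.txt
    svn add README.txt                            # schedule the file for addition
    svn commit -m "Add README"                    # atomic commit -> revision 1
    svn propset owner "me" README.txt             # attach versioned metadata (a property)
    svn commit -m "Set owner property"            # -> revision 2
    svn log README.txt                            # history of the file
    svn update                                    # sync the working copy to HEAD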

If you have already worked with CVS and you have the option to choose an SCM tool for your project, then it's better to start with Subversion, as it improves upon CVS in several ways. Also, the client commands are similar to CVS, enabling easier migration. Several tools/scripts/IDE plugins/GUI clients for Subversion are now available and mentioned here. Google Code, the free project hosting and collaborative development site, uses Subversion as its version control server.

I was introduced to Subversion in one of my projects in 2004 at Hewlett-Packard. HP used CollabNet on most of its projects for collaborative development across HP sites, and we used the Subversion client TortoiseSVN, which integrates with Windows Explorer. Back then TortoiseSVN was still not stable enough and made the working directory (the copy of the repo) very slow to navigate in Windows Explorer, so we mostly used the command-line client. For BaseApp, I chose the Google Code project hosting service as it seems more intuitive and easier to get started with than sourceforge.net. Google Code project hosting provides an issue tracking service and an SCM service. Google also provides free discussion forums for your projects through Google Groups. So, with BaseApp, I am back to using Subversion (and this time around I hope the GUI clients will be helpful and not make me go back to the command-line tools).

Friday, December 08, 2006

Starting BaseApp

I have decided to start an open source project - BaseApp. The project is hosted on the Google Code site. A new project blog has been started where project activity will be logged.

The objective is to come up with a foundation application which can be leveraged to build web applications quickly in J2EE 1.4 (in the likeness of AppFuse). Most of the commonly used features will be made available (like form-based authentication support, a NetBeans-enabled project, a custom Ant build file, etc.). A sample application source will be provided which will use the following frameworks/toolkits as a reference implementation:
  • Oracle 10g XE (the database I will use for testing)
  • JBoss 4.0.4GA (the application server)
  • Struts 1.2.9 (the web tier framework)
  • Hibernate 2.x (the persistence/ORM framework)
  • XDoclet 1.2.3 (the attribute-oriented programming support library)
  • EJB 2.1
    • Stateless Session Beans (J2EE 1.4) (synchronous request/response)
    • Message Driven Beans (J2EE 1.4) (asynchronous)
  • JAX-RPC 1.1 (webservice support)

Wednesday, December 06, 2006

Choosing your Linux distro

You can find a good comparison between any two Linux distros at http://polishlinux.org/choose/comparison/. For example, this URL compares Ubuntu with Fedora Core.

EasyUbuntu is an easy-to-use (duh!) script that gives the Ubuntu user the most commonly requested apps, codecs, and tweaks that are not found in the base distribution - all with a few clicks of your mouse.

Sunday, December 03, 2006

Java vs .NET

I recently attended a presentation by a .NET developer on my team at my workplace, where he talked about why he thought M$ .NET is better than Java EE. Following are the points he raised, together with my views on them:
  • Java is slow in performance (both on the server side and in rich clients) as compared to .NET.
I found this blog post, which presents some stats that clearly show Java (JRE 1.4.2) was more performant than the CLR (.NET 1.1). But C# IL, when compiled to native code using Ngen.exe, did significantly improve performance. Here's another interesting article I found which compares Java 5.0 with .NET 2.0 (the two latest releases of the competing platforms as of this writing). The summary from the article is as follows:
    1. Selection Sort algorithm implementation: .NET performs faster by 2:1 as compared to Java (a sketch of this kind of microbenchmark follows this list).
    2. Memory comparison: .NET uses more than twice the amount of memory Java uses to store the same amount of data: 19.9 MB versus a whopping 47.8 MB. Since .NET treats native types (such as double) as objects, this incurs additional overhead of some form. Java, on the other hand, does not treat native types as objects and therefore saves on this overhead. However, when it comes to real-world object allocation, it appears that .NET is more efficient.
    3. Conclusion: Thus, as we have seen, .NET 2.0 won two out of the three major tests, clearly besting Java 1.5 in both execution speed and real-world memory efficiency. Java did, however, manage to hold its own in the native-types memory comparison by a pretty wide margin. This indicates that on the whole .NET is the more efficient platform, with at least one area for improvement: native-type memory efficiency.
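For reference, the selection sort microbenchmark in such comparisons typically looks something like this in Java (my own sketch, not the article's code; the array size and timing method are arbitrary):

    import java.util.Random;

    public class SelectionSortBench {

        // O(n^2) selection sort: repeatedly move the smallest remaining element to the front
        static void selectionSort(int[] a) {
            for (int i = 0; i < a.length - 1; i++) {
                int min = i;
                for (int j = i + 1; j < a.length; j++) {
                    if (a[j] < a[min]) min = j;
                }
                int tmp = a[i]; a[i] = a[min]; a[min] = tmp;
            }
        }

        public static void main(String[] args) {
            int[] a = new int[20000];
            Random r = new Random(42); // fixed seed so runs are repeatable
            for (int i = 0; i < a.length; i++) a[i] = r.nextInt();

            long start = System.currentTimeMillis();
            selectionSort(a);
            System.out.println("Sorted " + a.length + " ints in "
                    + (System.currentTimeMillis() - start) + " ms");
        }
    }
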
  • The Java Swing UI looks and feels ugly.
Good examples of better-looking Swing UIs do exist, like the NetBeans IDE, and of SWT, like Eclipse. Still, Java loses on the UI front hands down to MS .NET WinForms, as the latter looks, feels and responds better and is easier to program. The only reason you may want to use Swing/SWT at all is portability across platforms.
  • There are too many options in Java to choose from (like Swing/SWT for UI development, Struts/Tapestry/Velocity etc for Web framework, Spring/EJB/xWorks for business tier, IDEA/Netbeans/Eclipse for IDE) and this requires developers to learn more than one way of doing the same thing.
This is a debatable topic, as having options to choose from is exactly what standards are meant for. So if you have a J2EE application that complies with the standards, you can easily port it across multiple J2EE application servers, removing the risk of vendor lock-in. Also, it fosters innovation in multiple quarters, and there is no one organization dictating terms in its own best interests.

The reason there are many more Java frameworks out there is simply that Java has been around longer, long enough for several good frameworks to mature and spread. Many of those are being ported to .NET, and others will rise up.
  • You can only program in Java whereas .NET supports multiple languages to run on its CLR.
Multiple languages can be compiled to produce bytecode that runs on the JVM. See this list.

In short, .NET has managed to surpass the Java VM in performance, and the cost involved in procuring the tools for development is not a factor against MS .NET (as investing in development tools is only about 20% of the total cost of development). But ...
  1. Java is portable and you can run it on Unix servers (Unix remains the server of choice for production deployments in most cases).
  2. Java EE has matured over time and has more developer mind-share (about 4 million Java developers as of 2004).
  3. There is a lot of investment in Java today, and many big companies like Sun, IBM, BEA and Oracle have vested interests (tools, application servers, support & services) in not letting the technology go to dust.
  4. Sun's open-sourcing of Java SE (under GPLv2) will enable the JDK to be bundled with major Linux distros, which in the past could not do so due to licensing issues.
  5. There are more than 200 models of mobile phones that run Java, and several wireless providers across the world provide Java content to such Java-enabled phones. In contrast, there are only 10 or so Windows Mobile phone models today (I read this on some site I cannot find now).
Quoting from a blog post:
.NET had many missing parts that the Java world has filled in its years of existence, but these are being filled and completed even as we speak. Moreover, these are usually fixed in ways compatible with the original Java implementations - moving from JUnit to NUnit or from Spring to Spring.NET (or the other way around) is probably easier than from totally disparate implementations.

In some areas, .NET is still lagging behind. COM+ is the only application server I know of that can run .NET components, and it's often overkill. A more flexible solution for that in the .NET world would be great. The .NET solution biosphere is still not as mature as Java, but what it does, it does very well - often better than the original Java product.

How does Sun make money off of Java?
  1. J2ME royalty earned (by Sun) per Java mobile phone sold (less than $1) remains a cash cow for Sun to date.
  2. Sun also sells certifications to Java EE application server vendors for their compliance with the Java EE specs.
  3. Sun also makes money by selling books, certifications and trainings for Java technology.
  4. Sun provides consulting services for Java enterprise application development.
  5. Sun also sells its hardware (server boxes) together with Solaris OS for deployment of Java enterprise applications in production.
By giving away its JDK implementation and developer tools for free, Sun wants to increase mind share for the Java platform (now more so, as it has .NET to compete against) among the developer community. Sun gives developers reference implementations for free to try out the technology, and when developers have a production-ready application, they (or their organization) will want to deploy it on a supported server host, which cuts down the organization's TCO in the long run. That's the model on which open source or free software businesses survive in today's software industry.

An interesting article with more statistics concludes that Java is losing ground to LAMP and .NET for web application development. More and more corporations have adopted .NET since it is easier and faster to program in than Java. .NET is gaining developer mind share too.

I don't know how credible the stats presented in these articles are (the ones I found on the web and linked to above), or whether someone is funding malicious propaganda against Java, as I do not see how anything can beat free (free IDE, even a free application server; you pay for short-term support and you go live with Java). Of course, PHP has gained ground for building small websites and applications, but when the applications are big, Java stands in good stead. In my opinion, Java and .NET are both competent platforms for developing enterprise applications, and both have their own pros and cons to consider when deciding which platform to develop on. BTW, here are a few comments I liked (by one Mr. Abdul Habra) refuting the claims made in the BusinessWeek article:
I do not know where to start, look at these statements:

1. LAMP is used more than Java: a more accurate comparison is to compare LAMP against LAMJ or P against J. Java is used widely with LAM components.
2. PHP is used more than Java: Well, HTML is used more than both. Counting all the sites that use Java or PHP is meaningless. It is given that there are more basic, simple, or home pages than there are professional complex site. Simple sites are more likely to be written in PHP. This is similar to saying there is more printed material in the USA Today than for Shakespeare, hence USA Today must be better.
3. Sales Of AJAX books grew more than Java: The term AJAX did not exist two years ago, so selling 10 books compared with 0 books two years ago means that AJAX sales have grown infinitely. The other flaw in this argument is that many of the AJAX frameworks are Java based. This is like saying the sale of JDBC, JMX, … grew more than Java. Look at http://adaptivepath.com/publications/essays/archives/000385.php for the first public article of AJAX, it is dated Feb 18 2005.
4. Google and Yahoo do not use Java: Both are C++ shops. Their usage of AJAX is with C++/Java at the backend. Remember, Google hired Josh Bloch not for his PHP skills.
5. Merrill Lynch & Co ... using just Linux and the Apache server: So what language do they use? The author does not say. Clearly you cannot write programs with Linux and Apache, you need a programming language. This is a meaningless statement.
6. Jupiter Research report showed that 62% of ... .NET, vs. 36% for IBM's WebSphere: This only shows that .Net is used more than WS. It does not count WL, Tomcat, JBoss, ...

In conclusion, this is not new. I have been seeing this since Java became popular. Most of these claims are made by ignorant people or people with hidden agendas.

Thursday, November 16, 2006

Convergence of WS-Management and WSDM

JSR-262, which addresses support for a webservices connector for JMX agents, is targeted for Java SE 7 (spring 2008). There are currently two competing webservices-based management standards: WS-Management and WSDM. An effort is in progress to reconcile the two standards into one, but this reconciliation is going to take some time to complete (the guesstimate is that it can be ready by 2008). The bottom layers of the convergence proposal are the existing standards WS-Transfer, WS-Enumeration and WS-Eventing. JSR-262 chose to implement the webservices connector using the existing WS-Management standard, primarily because it seems to be the most backwards-compatible approach possible. For the reasons, read here. The idea behind this JSR-262 effort of providing a webservices connector to JMX agents is to enable management in heterogeneous environments where the management host does not support JMX technology (say, a management client written in C#). So once your JSR-77 MBeans are exposed via a JSR-262 webservices connector, you could write a Perl, Ruby or C# client to talk to the MBean.
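
The JMX side of this is plain JMX code; the connector protocol is the only part JSR-262 swaps out. Here is a minimal sketch that registers an MBean and exposes it through the standard JSR-160 RMI connector (HelloAgent, the object name and the port are my own illustrative choices; as I understand the proposal, a JSR-262 connector would be wired up through the same JMXConnectorServer mechanism with a WS-Management transport):

    import java.lang.management.ManagementFactory;
    import java.rmi.registry.LocateRegistry;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnectorServer;
    import javax.management.remote.JMXConnectorServerFactory;
    import javax.management.remote.JMXServiceURL;

    public class HelloAgent {

        public interface HelloMBean {
            String sayHello();
        }

        // Standard MBean: class "Hello" implements the matching "HelloMBean" interface
        public static class Hello implements HelloMBean {
            public String sayHello() { return "hello from JMX"; }
        }

        public static void main(String[] args) throws Exception {
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            mbs.registerMBean(new Hello(), new ObjectName("com.example:type=Hello"));

            // Expose the MBean server through the standard RMI connector; remote
            // clients (jconsole, a custom JMX client) can now attach and invoke sayHello.
            LocateRegistry.createRegistry(9999);
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9999/server");
            JMXConnectorServer cs =
                    JMXConnectorServerFactory.newJMXConnectorServer(url, null, mbs);
            cs.start();
            System.out.println("JMX connector running at " + url);
        }
    }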

Recently, after passing the Sun Certified Developer for Java Web Services exam (CX-310-220), I wanted to use my learning of webservices building blocks to explore how webservices could be used for network management. I first learned about this possibility while working at HP, when they decided to embed a webservice called IXA (XDM Access interface) [XDM means XML Data Model] in the firmware of the network module for printers. This approach had a strong group of supporters within HP. By the time I left HP, they had already implemented a read-only interface to part of the management objects. And I found that even though the benefits of this approach to managing printers (over the traditional SNMP way) were easy to understand, implementation was challenging, primarily because the firmware developers had little exposure to technologies like XML namespaces, schemas, etc. But the message was clear to me: sooner rather than later, webservices will surely be the way to manage network elements and services. I will blog separately on the advantages of webservices over traditional approaches like SNMP. For the moment, I just wanted to highlight that the webservice-based management specs are still far from final, and by 2008 we should see some devices and servers implementing support for webservices-based management. At least for Java app servers, it is clear that JSR-262 will evolve to support the then-latest spec of the converged WS-Management and WSDM standards. So folks (like me) who want to specialize in the systems management domain will certainly want to track the progress of this convergence. IBM maintains the convergence progress drafts at its site.

Wednesday, November 15, 2006

Sun Announces SCBCD 5.0 Beta

It seems Sun announced the SCBCD 5.0 Beta exam on October 19th. You do not need to purchase a voucher to take a beta exam. Registration starts on 24th Nov, 2006. You can take the exam from 8th Dec, 2006 to 2nd Jan, 2007. The exam objectives are here. The only prerequisite is that you need to be an SCJP (any edition).

The recommended books/tutorials to cover the objectives for this exam are:
1. Mastering EJB 3.0 by Rima Patel Sriganesh, Gerald Brose, Micah Silverman.
2. Java EE 5.0 tutorial
And of course the specs; the list of them is mentioned here.

Register early as there is a limit to the number of people who can take the free beta exam.
