Thursday, August 22, 2013

ForgeRock Product release and EOSL dates

Below is a comprehensive list of product release and EOSL (End of Service Life) dates for the ForgeRock Open Identity Stack products.


Thursday, August 15, 2013

Replication Gateway

This week I was with a consultant from South Korea, who kept me updated on the technical developments in Korea.

I was told that the latest hype is Oracle Unified Directory (OUD), which used to be Sun OpenDS.

Surprisingly, he told me that OUD is able to replicate data both ways with Oracle Directory Server Enterprise Edition (ODSEE), which used to be Sun Directory Server Enterprise Edition (Sun DSEE).

As I am quite familiar with ForgeRock OpenDJ (which also came from Sun OpenDS), I know that two-way replication with Sun DSEE is not possible, because the Sun DSEE replication protocol is proprietary.

Well, OUD and ODSEE do know each other, since they are from the same family. So what if the replication protocol is proprietary? Let's make it even more proprietary! :)

So, here we go... let's introduce the Replication Gateway! It bridges the replication gap between ODSEE and OUD.


Wednesday, August 14, 2013

Policy Agent error >> find_active_login_server(): Library not initialized

Sometimes it can be painful watching how IT staff at customer sites attempt to debug Policy Agent issues.

What I usually observe is that they keep restarting the web containers where the Policy Agents are deployed. Then they keep refreshing their browsers and start to complain, "It still doesn't work! Why? Why?"

Well, if you are too lazy to look at the Policy Agent debug log, you'll never know why. Some do not even know where the debug log is located. *sigh*

2013-08-13 11:40:15.199   Error 11762:9a41700 PolicyEngine: am_policy_is_notification_enabled: InternalException in PolicyEngine::isNotificationEnabled with error message:Invalid policy handle. and code:invalid argument
2013-08-13 11:40:15.200   Error 11762:9a41700 all: find_active_login_server(): Library not initialized.
2013-08-13 11:40:19.376   Error 11762:9a417e0 PolicyEngine: am_policy_is_notification_enabled: InternalException in PolicyEngine::isNotificationEnabled with error message:Invalid policy handle. and code:invalid argument

The issue today was that the Policy Agent could not contact the OpenSSO server when the web container started. And the IT staff never even attempted to restart the web container.

Making sure the OpenSSO server is started, and then restarting the web container where the Policy Agent is deployed, will do the trick!
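Rather than restarting blindly, a quick scan of the agent debug log tells you whether the agent hit these start-up errors. The sketch below is a minimal, hypothetical helper (not part of the Policy Agent) that parses lines in the format shown above and picks out the two symptoms. The regular expression is an assumption inferred only from those three sample lines, and since the debug log location varies by agent and container, it simply accepts any iterable of lines, such as an open file.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed line layout, inferred from the sample log above:
# <date> <time>   <Level> <pid>:<thread> <module>: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>\w+)\s+\d+:\w+\s+"
    r"(?P<module>[^:]+):\s*(?P<message>.*)$"
)

# Messages seen in the log above when the agent could not reach the OpenSSO server.
SYMPTOMS = ("Library not initialized", "Invalid policy handle")


class AgentError(NamedTuple):
    timestamp: str
    module: str
    message: str


def find_startup_errors(lines: Iterable[str]) -> List[AgentError]:
    """Return Error-level entries whose message contains one of SYMPTOMS."""
    hits: List[AgentError] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group("level") != "Error":
            continue
        if any(symptom in m.group("message") for symptom in SYMPTOMS):
            hits.append(AgentError(m.group("ts"), m.group("module"), m.group("message")))
    return hits


if __name__ == "__main__":
    sample = [
        "2013-08-13 11:40:15.199   Error 11762:9a41700 PolicyEngine: am_policy_is_notification_enabled: "
        "InternalException in PolicyEngine::isNotificationEnabled with error message:Invalid policy handle. "
        "and code:invalid argument",
        "2013-08-13 11:40:15.200   Error 11762:9a41700 all: find_active_login_server(): Library not initialized.",
    ]
    for err in find_startup_errors(sample):
        print(err.timestamp, err.module)
    # Any hits: make sure the OpenSSO server is up, then restart the web container.
```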


Tuesday, August 13, 2013

Performance Tuning

This week, I am involved in a Performance Tuning exercise for an OpenSSO infrastructure that I set up a few years ago.

It is quite a learning experience for me. In fact, every tuning exercise is always a new experience.

This week, the same expert flew in from Korea to help us again, which makes the tuning exercise more fruitful. Over time, I have learnt a few tips from him.

From the graph below:

1. Spikes are not good
2. Spikes are especially bad if all transactions spike at the same time

However, this issue is the easiest to solve, since it points to 2 things that might have gone wrong:

  • OS kernel and/or NDD settings
  • Garbage Collection in the JVM
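The "do all transactions spike at the same time?" check can be sketched in a few lines. This is a minimal illustration, not the tooling we used: it assumes per-transaction response times sampled over the same intervals, and it flags a spike whenever a sample exceeds 3x that transaction's median, an arbitrary threshold chosen for the example. Intervals where every transaction spikes together point to a system-wide stall (kernel/NDD settings or GC) rather than one slow transaction.

```python
from statistics import median
from typing import Dict, List, Set


def spike_intervals(samples: List[float], factor: float = 3.0) -> Set[int]:
    """Indices of samples that exceed `factor` times the series median."""
    baseline = median(samples)
    return {i for i, value in enumerate(samples) if value > factor * baseline}


def simultaneous_spikes(series: Dict[str, List[float]], factor: float = 3.0) -> List[int]:
    """Intervals in which every transaction spikes at the same time."""
    per_txn = [spike_intervals(samples, factor) for samples in series.values()]
    if not per_txn:
        return []
    return sorted(set.intersection(*per_txn))


if __name__ == "__main__":
    # Made-up response times (ms) for three transactions over six intervals.
    series = {
        "login": [120, 118, 125, 900, 121, 119],
        "policy": [40, 42, 39, 350, 41, 300],
        "logout": [30, 31, 29, 260, 30, 32],
    }
    print(simultaneous_spikes(series))  # [3]: all three spike together
```

Interval 5 is a spike for "policy" alone; that is the kind of occasional, isolated spike that can sometimes be ignored.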

If the OS kernel and NDD settings are properly tuned, you'll most likely get the graph below:

Generally, most transactions should be stable (see the blue arrow). There will be occasional spikes for some transactions, which we can sometimes choose to ignore if they do not occur frequently.

After the OS kernel and NDD settings are tuned, the next step is to slowly tweak the JVM options to lessen (or eliminate) the impact of garbage collection.

Lesson learnt: have a lot of patience. :)


Saturday, August 10, 2013

How to build OpenAM from Source

There have been numerous postings on the OpenAM mailing list about how to build OpenAM from source, and repeated ones as well... I'm curious to know why it is so difficult to build OpenAM from source.

The first thing I do is determine which version of OpenAM I want to build. The available versions are listed on the OpenAM Branches and Tags page in the wiki.

Then I click on one of the hyperlinks.

From there, I get the link to the SVN source. Once the source is checked out, the final step is to run the Maven command:

$ mvn -DskipTests=true -Prelease clean install

The compiled WAR can be found in:

$ pwd
/Users/cheechong/Documents/svn/forgerock/10.1.0-Xpress (tag)/openam/openam-server/target

$ ls

That's it.
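If you rebuild often, the two steps (run Maven, then look for the WAR) can be wrapped in a small script. This is a minimal sketch with hypothetical function names; it assumes the script is run from the directory holding the top-level pom.xml of the checked-out tag, and that the WAR lands in openam-server/target as in the path above.

```python
import subprocess
from pathlib import Path
from typing import List

# Same command as above: skip tests, use the "release" profile.
MVN_BUILD = ["mvn", "-DskipTests=true", "-Prelease", "clean", "install"]


def build_openam(source_root: Path) -> None:
    """Run the Maven build in `source_root`; raises CalledProcessError on failure."""
    subprocess.run(MVN_BUILD, cwd=source_root, check=True)


def find_wars(target_dir: Path) -> List[Path]:
    """Return the .war files in `target_dir`, sorted by name."""
    return sorted(Path(target_dir).glob("*.war"))


if __name__ == "__main__":
    root = Path.cwd()  # assumed: the directory containing the top-level pom.xml
    build_openam(root)
    wars = find_wars(root / "openam-server" / "target")
    if not wars:
        raise SystemExit("Build finished but no WAR was found")
    for war in wars:
        print(war)
```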

FYI, I am using Maven 3.1.0.

$ mvn -v
Apache Maven 3.1.0 (893ca28a1da9d5f51ac03827af98bb730128f9f2; 2013-06-28 10:15:32+0800)
Maven home: /usr/share/maven
Java version: 1.6.0_51, vendor: Apple Inc.
Java home: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Default locale: en_US, platform encoding: MacRoman
OS name: "mac os x", version: "10.7.5", arch: "x86_64", family: "mac"

Download the latest Maven from

To upgrade Maven to the latest version, first check which version is currently installed:

$ mvn -v
Apache Maven 3.0.3 (r1075438; 2011-03-01 01:31:09+0800)
Maven home: /usr/share/maven
Java version: 1.6.0_51, vendor: Apple Inc.
Java home: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Default locale: en_US, platform encoding: MacRoman
OS name: "mac os x", version: "10.7.5", arch: "x86_64", family: "mac"

$ cd /usr/share/
$ ls -alt maven
lrwxr-xr-x  1 root  wheel  16 Jul 22  2011 maven -> java/maven-3.0.3

$ sudo unlink maven
$ sudo ln -s /Users/cheechong/Documents/work/apache-maven-3.1.0 /usr/share/maven

$ mvn -v
Apache Maven 3.1.0 (893ca28a1da9d5f51ac03827af98bb730128f9f2; 2013-06-28 10:15:32+0800)
Maven home: /usr/share/maven
Java version: 1.6.0_51, vendor: Apple Inc.
Java home: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Default locale: en_US, platform encoding: MacRoman
OS name: "mac os x", version: "10.7.5", arch: "x86_64", family: "mac"
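Before kicking off a long build, you can also check the installed Maven version programmatically. This is a minimal sketch that parses the first line of the `mvn -v` output shown above; the 3.1.0 threshold simply mirrors the version used in this post and is an assumption, not a documented minimum for building OpenAM.

```python
import re
import subprocess
from typing import Tuple

# Assumption: the version used in this post, not an official requirement.
REQUIRED = (3, 1, 0)


def parse_maven_version(mvn_v_output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the 'Apache Maven x.y.z' line of `mvn -v`."""
    m = re.search(r"Apache Maven (\d+)\.(\d+)\.(\d+)", mvn_v_output)
    if m is None:
        raise ValueError("no 'Apache Maven x.y.z' line found")
    major, minor, patch = (int(g) for g in m.groups())
    return major, minor, patch


def installed_maven_version() -> Tuple[int, int, int]:
    """Run `mvn -v` (Maven must be on PATH) and parse its output."""
    result = subprocess.run(["mvn", "-v"], capture_output=True, text=True, check=True)
    return parse_maven_version(result.stdout)


if __name__ == "__main__":
    version = installed_maven_version()
    if version < REQUIRED:
        print("Maven %d.%d.%d is older than %d.%d.%d; upgrade it first." % (version + REQUIRED))
    else:
        print("Maven %d.%d.%d is fine." % version)
```

Tuples compare element by element, so (3, 0, 3) < (3, 1, 0) holds without any string juggling.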


Friday, August 9, 2013

AMSFO Issue on Windows Platform

If you have installed AM Session Failover on a Windows platform, you will notice that the installation and configuration differ from what you would have experienced on Linux. AMSFO also behaves differently.

I just encountered this issue at a customer's site: the script on Windows does not log its output to a file.

No matter how hard I tried, the logs were output to the command prompt!

It looks like there'll be no fix for this issue.


Thursday, August 8, 2013

OpenAM Session Stickiness Concept - Part II

If you followed my previous post on the OpenAM Session Stickiness Concept, you may ask: when does the MQ Cluster come into the picture?

The following is what I have lifted from the OpenAM documentation. It explains it all.

OpenAM gets balanced through a load balancer, and OpenAM generates a specific load balancer cookie that the load balancer can use to implement sticky balancing. If a user gets balanced to a different instance, OpenAM will first use cross-talk to that original OpenAM instance to see if the latest session information is available. If that instance doesn’t reply (because it may be down), then the OpenAM instance will fall back to its session store and see if the session information got replicated. If the information is still not there, then OpenAM will simply consider this a new request and initiate an authentication step.
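The lookup order in that paragraph can be modelled as a small function. This is a toy model for illustration only, not OpenAM code: `local_sessions`, the `cross_talk` callable and `session_store` are hypothetical stand-ins for the instance's in-memory sessions, the call to the original instance, and the replicated session store (the MQ Cluster in pre-10.1 deployments).

```python
from typing import Callable, Dict, Optional

Session = Dict[str, str]


def resolve_session(
    session_id: str,
    local_sessions: Dict[str, Session],
    cross_talk: Callable[[str], Optional[Session]],
    session_store: Dict[str, Session],
) -> Optional[Session]:
    """Return the session, or None if the user must authenticate again."""
    # 1. The request landed on the instance that already holds the session.
    if session_id in local_sessions:
        return local_sessions[session_id]
    # 2. Cross-talk to the original instance; treat no reply as "down".
    try:
        session = cross_talk(session_id)
    except ConnectionError:
        session = None
    if session is not None:
        return session
    # 3. Fall back to the replicated session store; None means a new login.
    return session_store.get(session_id)


if __name__ == "__main__":
    alice = {"uid": "alice"}

    def original_down(sid: str) -> Optional[Session]:
        raise ConnectionError("original instance is not replying")

    # Original instance is down, but the session was replicated: no re-login.
    print(resolve_session("S1", {}, original_down, {"S1": alice}))
    # Not replicated either: None, so a new authentication would start.
    print(resolve_session("S1", {}, original_down, {}))
```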

The following illustrates the flow:

(1) : User tries to access OpenAM. The load-balancer redirects him to OpenAM 1.
(2) : The user is prompted to authenticate. OpenAM 1 creates a session and stores it in CTS.
(3) : The same session is persisted into the MQ Cluster via the AM Session Failover (AMSFO) component. The MQ Cluster consists of a pair of Java Message Queue servers. OpenAM only writes to Java Message Queue 1, since it is configured as the Primary node.
(4) : Upon receiving the session information, Java Message Queue 1 broadcasts it to Java Message Queue 2, which is configured as the Secondary node. This ensures the same session information can be retrieved by any OpenAM instance if Java Message Queue 1 suddenly goes down.
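Steps (3) and (4) can be pictured with a toy model of the two brokers. It is purely illustrative and makes no claim about how Java Message Queue implements clustering: the `Broker` and `MQCluster` classes are hypothetical. OpenAM writes only to the primary, the primary forwards each session to the secondary, and a read still succeeds from the secondary after the primary goes down. Write failover is not described in this post, so the model simply refuses writes while the primary is down.

```python
from typing import Dict, Optional


class Broker:
    """A message queue broker holding replicated session data (toy model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.sessions: Dict[str, str] = {}


class MQCluster:
    def __init__(self, primary: Broker, secondary: Broker) -> None:
        self.primary = primary
        self.secondary = secondary

    def write(self, session_id: str, data: str) -> None:
        """Step (3): OpenAM writes only to the primary broker."""
        if not self.primary.up:
            raise ConnectionError("primary broker down (write failover not modelled)")
        self.primary.sessions[session_id] = data
        # Step (4): the primary forwards the session to the secondary.
        if self.secondary.up:
            self.secondary.sessions[session_id] = data

    def read(self, session_id: str) -> Optional[str]:
        """Any OpenAM instance can read from whichever broker is still up."""
        for broker in (self.primary, self.secondary):
            if broker.up and session_id in broker.sessions:
                return broker.sessions[session_id]
        return None


if __name__ == "__main__":
    mq1, mq2 = Broker("Java Message Queue 1"), Broker("Java Message Queue 2")
    cluster = MQCluster(primary=mq1, secondary=mq2)
    cluster.write("S1", "session for alice")
    mq1.up = False  # the primary suddenly goes down
    print(cluster.read("S1"))  # still served by Java Message Queue 2
```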

The above architecture ensures the high availability of user sessions in the OpenAM SSO infrastructure. Nice, isn't it?


Wednesday, August 7, 2013

OpenAM Session Stickiness Concept

When we configure a Single Sign-On infrastructure in a Production environment, we will always request that the hardware load-balancer be configured with session stickiness. Always.

Of course, the network guys will start to bombard you with "Why?". I have answered the same question many times, so I think it is good to blog it here and re-use it for our next project.

Assume we have the following setup.

A very typical setup for Production: 2 OpenAM servers fronted by a hardware load-balancer; in the backend, a pair of Java Message Queue servers configured to form an MQ Cluster that OpenAM communicates with.

Let's zoom into OpenAM first.

OpenAM currently leverages the Core Token Service (CTS) to store session information for its authenticated users. These sessions can remain in memory or be persisted in the session store. In OpenAM versions prior to 10.1, the session store refers to the MQ Cluster. (Let's ignore OpenAM 10.1 and above for this discussion.)

Now, technically, because of the above architecture, OpenAM continues to function even if the load balancer is not configured with session stickiness. How?

I'll use the following diagram to illustrate how.

(1) : User tries to access OpenAM. The load-balancer redirects him to OpenAM 1.
(2) : The user is prompted to authenticate. OpenAM 1 creates a session and stores it in CTS.
(3) : User tries to access OpenAM again within the same browser session. The load-balancer redirects him to OpenAM 2 instead, since session stickiness is not configured.
(4) : OpenAM 2 cannot find the user session. It will perform a "cross-talk" to the original OpenAM instance to see if the latest session information is available.
(5) : User session found in OpenAM 1!
(6) : The user will not be prompted to authenticate again.

So, it works! However, there is a penalty in steps (4) and (5), where the "cross-talk" takes place. The reason for configuring the load balancer with session stickiness is to avoid this extra traffic.

Now imagine millions of requests per hour: this extra traffic can be very taxing on the OpenAM instances. Instead of serving users with authentication and authorization, they spend extra effort on "cross-talk". This is not ideal!
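A back-of-the-envelope estimate shows why this matters. Assuming that, without stickiness, the load balancer spreads each request uniformly at random over N OpenAM instances, a request lands on an instance other than the one holding the session with probability (N-1)/N, so with 2 instances roughly half of all requests trigger cross-talk. The sketch computes that expectation and cross-checks it with a seeded simulation; the uniform-random balancing and the request counts are assumptions for illustration.

```python
import random


def expected_cross_talk(requests: int, instances: int) -> float:
    """Expected cross-talk calls when each request picks one of `instances`
    uniformly at random and the session lives on exactly one of them."""
    return requests * (instances - 1) / instances


def simulate_cross_talk(requests: int, instances: int, seed: int = 42) -> int:
    """Count requests that land away from the session's owning instance."""
    rng = random.Random(seed)
    owner = 0  # the instance that created the session, e.g. OpenAM 1
    return sum(1 for _ in range(requests) if rng.randrange(instances) != owner)


if __name__ == "__main__":
    per_hour = 1_000_000  # illustrative "millions of requests per hour"
    print(expected_cross_talk(per_hour, 2))  # 500000.0
    print(simulate_cross_talk(100_000, 2))  # close to 50000
    # With session stickiness, every request returns to its owner: no cross-talk.
```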


Tuesday, August 6, 2013

Attempting to initiate broker

I have blogged quite extensively on the Session Failover mechanism in OpenAM. This component is definitely a must for most production deployments, and it is also the component that causes quite a headache at times.

Usually the problem is installing Java Message Queue in a hardened environment, where the firewall is extremely tight. I previously blogged To Enable Connections Through a Firewall, and I thought it was comprehensive enough to handle all situations.


Opening the Message Queue PortMapper port and JMS port only applies in a situation like the one illustrated below: a firewall sits between a pair of OpenAM servers and the MQ Cluster where a pair of Java Message Queue servers reside.


Now, what if iptables is enabled on the Java Message Queue servers themselves, as illustrated below?

Well, the first thought that comes to mind is to add the following rules to iptables:

1. Port 7777 for Message Queue PortMapper port
2. Port 17777 for JMS port 

No, that will not work. You'll observe the following warning message: Attempting to initiate a cluster connection to mq ... failed: No route to host

I worked extensively with our network engineer and found out that a random port is used for the connection established between the 2 Java Message Queue servers when they initiate a cluster.

Looking at the logs, you'll see the following segment:

[05/Aug/2013:14:08:12 ICT] [B1179]: Activated broker
        Address = mq://
        StartTime = 1375678839077
        ProtocolVersion = 410
[05/Aug/2013:14:08:12 ICT] [B1071]: Established cluster connection to broker mq://[/]

So what's the solution?

I tried setting various -D options in BROKER_OPTIONS, but none of them made this random port static. So, for the time being, we added an iptables rule on each server to allow all ports between the 2 servers.
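For reference, here is a sketch that generates both the rules we first tried and the workaround. It only prints commands; run them as root and persist them with your distribution's mechanism (e.g. `service iptables save` on RHEL/CentOS). The peer address 10.0.0.12 is a placeholder. Using `-I` instead of `-A` is a choice made for this sketch: it inserts each rule at the top of the INPUT chain so it is evaluated before any catch-all REJECT rule. A default RHEL-style ruleset ends with such a rule, which typically shows up on the connecting side as "No route to host".

```python
from typing import List


def port_rules(ports: List[int], chain: str = "INPUT") -> List[str]:
    """First attempt: open only the PortMapper (7777) and JMS (17777) ports.
    Not sufficient here, because the broker-to-broker cluster connection
    uses a port that is not fixed."""
    return [f"iptables -I {chain} -p tcp --dport {port} -j ACCEPT" for port in ports]


def peer_allow_rules(peer_ips: List[str], chain: str = "INPUT") -> List[str]:
    """Workaround: accept TCP traffic on any port from the other broker(s)."""
    return [f"iptables -I {chain} -p tcp -s {ip} -j ACCEPT" for ip in peer_ips]


if __name__ == "__main__":
    # Run on Java Message Queue 1; on Java Message Queue 2, pass MQ1's address instead.
    for rule in port_rules([7777, 17777]) + peer_allow_rules(["10.0.0.12"]):
        print(rule)
```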

* Sigh *


Monday, August 5, 2013

Porting embedded OpenDJ in OpenAM to external OpenDJ - Part III

I almost went to hell last week during a migration exercise where we moved the embedded OpenDJ servers to a pair of external OpenDJ servers in a production site for one of our customers.

I had previously verified in our test labs that the migration was risk-free. I was too confident. I forgot that there are multiple OpenAM servers deployed in the customer's production environment, and thus multiple embedded OpenDJ servers as well.

I had only tested on a standalone OpenAM with an embedded OpenDJ. Serves me right!

So, what went wrong?

As per what I previously blogged here (Part I) and here (Part II), everything worked as planned. I restarted the JBoss application servers twice and the bootstrap files were updated accordingly. (The bootstrap files should be updated to point to the external OpenDJ servers instead of the embedded ones.)

Upon the JBoss AS restart, I was able to see a bind message in the external OpenDJ servers. But that was it! When I logged in to the OpenAM console and navigated from tab to tab, the OpenDJ access logs did not move. How could that be? I knew the configuration data was stored in the OpenDJ servers, so when the tabs were accessed, OpenAM should have been retrieving the appropriate information from the external OpenDJ.

And to my horror, when I debugged further, I saw that the OpenAM servers were still accessing information from the embedded OpenDJ servers. The replication between the multiple embedded OpenDJ servers was still functioning. 

I tried to add/modify some policies via the OpenAM console, and I saw that the data was written to the embedded OpenDJ servers! The external OpenDJ servers were a pair of white elephants!

The whole debugging process took a long while. In fact, we were working 1-2 hours beyond the planned schedule. 

In the end, we found the culprit to be a parameter in the Advanced tab -

By default, the value comes pre-set in OpenAM release.

This value has to be changed to in order for the multiple OpenAM servers to start communicating with the external OpenDJ servers. Not only bind access, but all types of LDAP operations.

A really stressful night. Stupid me for not testing in a multi-server environment prior to the actual migration.


PS: Now, the weird thing is that I still have the standalone OpenAM server running in my labs, and it is able to communicate with the external OpenDJ server even though the value has not been modified. I also confirmed that both OpenAM deployments (in our test labs and at our customer's site) are running OpenAM 10.0.0.

Why? I do not know. Strange!


Thursday, August 1, 2013

Cannot start Control Panel for embedded OpenDJ on Windows platform

This week was my first time installing a production OpenAM on a Windows platform.
(Yes, I had never installed Sun AM, Sun OpenSSO or ForgeRock OpenAM on Windows before.) :>

As in all production environments, there are at least 2 OpenAM Servers (console-less) and 1 OpenAM Admin Server (console).

So, I happily installed the 3 OpenAM servers. After that, I wanted to make sure the replication in the embedded OpenDJ was behaving well. Of course, being on a Windows platform, I went looking for control-panel.bat.

Bomb! I hit bug OPENAM-1251. No luck! In the end, I downloaded a compatible OpenDJ zip instead.

I just updated the OpenAM SVN source. Good news! The bug has just been resolved. However, the fix will only be released in OpenAM 10.2.0.

Happy waiting!