Cloud Foundry Blog

About Raja Rao DV

Developer Advocate for Cloudfoundry (Node.js)

Scaling Real-time Apps on Cloud Foundry Using Node.js and RabbitMQ

In the previous blog, Scaling Real-time Apps on Cloud Foundry Using Node.js and Redis, we used Redis as a ‘session store’ and also as a ‘pub-sub’ service for chat messages. But in many enterprise-grade real-time apps, you may want to use RabbitMQ instead of Redis for pub-sub because of the reliability and features that come out-of-the-box with RabbitMQ. This is especially true for financial applications like stock quote apps, where it is critical to protect and deliver each and every message AND to do it as quickly as possible.

So, in this blog, we will start from Scaling Real-time Apps on Cloud Foundry Using Node.js and Redis and simply replace the Redis pub-sub piece with RabbitMQ.

The app architecture (before):

The app architecture w/ RabbitMQ (after):


Introduction to RabbitMQ

The Node.js community may not be familiar with RabbitMQ, so here is a high-level introduction.

RabbitMQ is a message broker. At its simplest, it accepts messages from one or more endpoints (“Producers”) and sends them to one or more endpoints (“Consumers”).

RabbitMQ is more sophisticated and flexible than just that. Depending on the configuration, it can also figure out what needs to be done when a consumer crashes (store and re-deliver messages), when a consumer is slow (queue messages), when there are multiple consumers (distribute the workload), or even when RabbitMQ itself crashes (durable queues). For more, please see the RabbitMQ tutorials.

RabbitMQ is also very fast and efficient. It implements the Advanced Message Queuing Protocol (“AMQP”), which was built by and for Wall Street firms like J.P. Morgan Chase and Goldman Sachs for trading stocks and related activities. RabbitMQ is an Erlang implementation of that protocol (Erlang itself is well known for concurrency and speed).

For more please go through RabbitMQ’s website.
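The store-and-redeliver behavior described above can be sketched with a toy in-memory queue in plain JavaScript. This is only an illustration of the idea, not RabbitMQ’s implementation; `ToyQueue` and its methods are made-up names:

```javascript
// Toy illustration of store-and-redeliver (NOT RabbitMQ itself):
// a message stays in the queue until the consumer acknowledges it,
// so a crashed consumer never loses a message.
function ToyQueue() {
    this.pending = [];   // messages waiting for delivery
    this.unacked = {};   // delivered but not yet acknowledged
    this.nextTag = 1;
}

ToyQueue.prototype.publish = function (msg) {
    this.pending.push(msg);
};

// Deliver one message; returns a delivery tag the consumer must ack.
ToyQueue.prototype.deliver = function () {
    var msg = this.pending.shift();
    if (msg === undefined) return null;
    var tag = this.nextTag++;
    this.unacked[tag] = msg;
    return {tag: tag, msg: msg};
};

ToyQueue.prototype.ack = function (tag) {
    delete this.unacked[tag];
};

// If a consumer dies without acking, requeue its messages.
ToyQueue.prototype.requeueUnacked = function () {
    for (var tag in this.unacked) {
        this.pending.push(this.unacked[tag]);
        delete this.unacked[tag];
    }
};

var q = new ToyQueue();
q.publish('stock quote #1');
var delivery = q.deliver();   // consumer gets the message...
q.requeueUnacked();           // ...but crashes before acking it
console.log(q.deliver().msg); // redelivered: 'stock quote #1'
```

RabbitMQ adds much more on top of this basic idea (persistence to disk, fair dispatch among consumers, and so on), but the ack/redeliver loop is the heart of its reliability story.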


Fundamental pieces of RabbitMQ

RabbitMQ has four fundamental pieces:

  1. Producer (“P”) – Sends messages to an Exchange along with a “Routing Key” indicating how to route the message.
  2. Exchange (“X”) – Receives messages and Routing Keys from Producers and figures out what to do with each message.
  3. Queue (“Q”) – A temporary place where messages are stored, based on the Queue’s “Binding Key”, until a consumer is ready to receive them. Note: While a Queue physically resides inside RabbitMQ, a Consumer (“C”) is the one that actually creates it by providing a “Binding Key”.
  4. Consumer (“C”) – Subscribes to a Queue to receive messages.

Routing Key, Binding Key and types of Exchanges

To allow various work-flows like pub-sub, work queues, topics, RPC etc., RabbitMQ allows us to independently configure the type of the Exchange, Routing Key and Binding Key.

Routing Key:

A string/constraint from the Producer instructing the Exchange how to route the message. A Routing Key looks like: “logs”, “errors.logs”, “warnings.logs”, “tweets”, etc.

Binding Key:

Another string/constraint added by a Consumer to the queue it is binding/listening to. A Binding Key looks like: “logs”, “*.logs”, “#.logs”, etc.

Note: In RabbitMQ, Binding keys can have “patterns” (but not Routing keys).
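To make the pattern rules concrete, here is a small plain-JavaScript sketch of the matching semantics: “*” matches exactly one dot-separated word, and “#” matches zero or more words. The `bindingMatches` function is a hypothetical helper for illustration only, not part of RabbitMQ or node-amqp:

```javascript
// Illustrates AMQP topic-matching semantics in plain JavaScript
// ('*' matches exactly one dot-separated word, '#' matches zero or more).
// This is a sketch of the matching rules, not RabbitMQ code.
function bindingMatches(bindingKey, routingKey) {
    function match(b, r) {
        if (b.length === 0) return r.length === 0;
        if (b[0] === '#') {
            // '#' swallows zero or more words
            for (var i = 0; i <= r.length; i++) {
                if (match(b.slice(1), r.slice(i))) return true;
            }
            return false;
        }
        if (r.length === 0) return false;
        if (b[0] === '*' || b[0] === r[0]) return match(b.slice(1), r.slice(1));
        return false;
    }
    return match(bindingKey.split('.'), routingKey.split('.'));
}

console.log(bindingMatches('*.logs', 'errors.logs')); // true
console.log(bindingMatches('*.logs', 'a.b.logs'));    // false ('*' is exactly one word)
console.log(bindingMatches('#.logs', 'a.b.logs'));    // true ('#' is any number of words)
console.log(bindingMatches('logs', 'logs'));          // true (exact match)
```

These are the same rules a Topic Exchange applies when deciding which queues receive a copy of a message.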

Types of Exchange:

Exchanges can be of 4 types:

  1. Direct – Sends messages from a producer to a consumer if the Routing Key and Binding Key match exactly.
  2. Fanout – Sends any message from a producer to ALL consumers (i.e., it ignores both the Routing Key and Binding Key).
  3. Topic – Sends a message from a producer to consumers based on pattern-matching.
  4. Headers – If routing more complicated than a simple Routing Key string is required, you can use a Headers exchange.

In RabbitMQ, the combination of the Exchange type, Routing Key and Binding Key determines how messages flow. For example: a Fanout Exchange ignores the Routing Key and Binding Key and sends messages to all queues, while a Topic Exchange sends a copy of a message to zero, one or more consumers based on RabbitMQ patterns (#, *).
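The difference between the exchange types can be sketched with a toy in-memory dispatcher (illustration only; `dispatch` is a made-up function, and the topic and headers types are omitted for brevity):

```javascript
// Toy sketch of how the exchange type changes routing (illustration only).
// Each binding is {queue, key}; dispatch returns which queues get the message.
function dispatch(exchangeType, bindings, routingKey) {
    return bindings
        .filter(function (b) {
            if (exchangeType === 'fanout') return true;                  // keys ignored
            if (exchangeType === 'direct') return b.key === routingKey;  // exact match
            return false; // topic/headers omitted for brevity
        })
        .map(function (b) { return b.queue; });
}

var bindings = [
    {queue: 'q1', key: 'errors.logs'},
    {queue: 'q2', key: 'warnings.logs'}
];

console.log(dispatch('direct', bindings, 'errors.logs')); // [ 'q1' ]
console.log(dispatch('fanout', bindings, 'errors.logs')); // [ 'q1', 'q2' ]
```

A fanout exchange is exactly what our chat app needs: every connected user should get every message, regardless of keys.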

Going into more details is beyond the scope of this blog, but here is another good blog that goes into more details: AMQP 0-9-1 Model Explained


Using RabbitMQ to do pub-sub in our Node.js chat app

Now that we know the basics of RabbitMQ and all four pieces, let’s see how to actually use it in our chat app.

Chat App:

Connecting to RabbitMQ and creating an Exchange

For our chat application, we will create a fanout exchange called chatExchange, and we will use the node-amqp module to talk to the RabbitMQ service.

//Connect to RabbitMQ (via the node-amqp module) and get a reference to the connection.
var amqp = require('amqp');
var rabbitConn = amqp.createConnection({});

//Create an exchange with a name 'chatExchange' and of type 'fanout'
var chatExchange;
rabbitConn.on('ready', function () {
    chatExchange = rabbitConn.exchange('chatExchange', {'type': 'fanout'});
});

Creating Producers (So Users can send chat messages)

In our chat app, users are both producers (i.e., they send chat messages to others) and consumers (i.e., they receive messages from others). Let’s focus on users as ‘producers’ first.

When a user sends a chat message, we publish it to chatExchange without a Routing Key (the Routing Key doesn’t matter because chatExchange is a ‘fanout’ exchange).

/**
 * When a user sends a chat message, publish it to chatExchange w/o a Routing Key (the Routing Key doesn't
 * matter because chatExchange is a 'fanout').
 *
 * Notice that we are getting the user's name from the session.
 */
socket.on('chat', function (data) {
    var msg = JSON.parse(data);
    var reply = {action: 'message', user: session.user, msg: msg.msg};
    chatExchange.publish('', reply);
});

Similarly, when a user joins, we publish that event to chatExchange without a Routing Key.

/**
 * When a user joins, publish it to chatExchange w/o a Routing Key (the Routing Key doesn't
 * matter because chatExchange is a 'fanout').
 *
 * Note that we are getting the user's name from the session.
 */
socket.on('join', function () {
    var reply = {action: 'control', user: session.user, msg: ' joined the channel'};
    chatExchange.publish('', reply);
});

Creating Consumers (So Users can receive chat messages)

Creating consumers involves 3 steps:

  1. Create a queue with some options.
  2. Bind queue to exchange using some “Binding Key”
  3. Create a subscriber (usually a callback function) to actually obtain messages sent to the queue.

For our chat app:

  1. Create a queue without a name. This forces RabbitMQ to create a new queue, with a new random name, for every socket.io connection. Also set the exclusive flag to ensure that only this consumer can access the messages from this queue.

rabbitConn.queue('', {exclusive: true}, function (q) {
    ...
});

  2. Bind the queue to chatExchange with an empty ‘Binding Key’ to listen to ALL messages.

q.bind('chatExchange', "");

  3. Lastly, create a consumer (via q.subscribe) that waits for messages from RabbitMQ. When a message comes, send it to the browser.

q.subscribe(function (message) {
    //When a message comes, send it back to the browser
    socket.emit('chat', JSON.stringify(message));
});

Putting it all together.

sessionSockets.on('connection', function (err, socket, session) {
    /**
     * When a user sends a chat message, publish it to chatExchange w/o a Routing Key (Routing Key doesn't matter
     * because chatExchange is a 'fanout').
     *
     * Notice that we are getting user's name from session.
     */
    socket.on('chat', function (data) {
        var msg = JSON.parse(data);
        var reply = {action: 'message', user: session.user, msg: msg.msg };
        chatExchange.publish('', reply);
    });

    /**
     * When a user joins, publish it to chatExchange w/o a Routing Key (the Routing Key doesn't
     * matter because chatExchange is a 'fanout').
     *
     * Note that we are getting the user's name from the session.
     */
    socket.on('join', function () {
        var reply = {action: 'control', user: session.user, msg: ' joined the channel' };
        chatExchange.publish('', reply);
    });

    /**
     * Initialize the subscriber queue.
     * 1. First create a queue w/o any name. This forces RabbitMQ to create a new queue
     *    for every socket.io connection, with a new random queue name.
     * 2. Then bind the queue to chatExchange with an empty ("") 'Binding Key' and listen to ALL messages.
     * 3. Lastly, create a consumer (via .subscribe) that waits for messages from RabbitMQ.
     *    When a message comes, send it to the browser.
     *
     * Note: we are creating this w/in sessionSockets.on('connection'..) to create a NEW queue for every connection.
     */
    rabbitConn.queue('', {exclusive: true}, function (q) {
        //Bind to chatExchange with an empty ("") binding key to listen to all messages.
        q.bind('chatExchange', "");

        //Subscribe: when a message comes, send it back to the browser.
        q.subscribe(function (message) {
            socket.emit('chat', JSON.stringify(message));
        });
    });
});

Running / Testing it on Cloud Foundry

  • Clone this app into a rabbitpubsub folder
  • cd rabbitpubsub
  • Run npm install and follow the instructions below to push the app to Cloud Foundry
[~/success/git/rabbitpubsub]
> vmc push rabbitpubsub
Instances> 4       <----- Run 4 instances of the server

1: node
2: other
Framework> node

1: node
2: node06
3: node08
4: other
Runtime> 3  <---- Choose Node.js v0.8

1: 64M
2: 128M
3: 256M
4: 512M
Memory Limit> 64M

Creating rabbitpubsub... OK

1: rabbitpubsub.cloudfoundry.com
2: none
URL> rabbitpubsub.cloudfoundry.com  <--- URL of the app (choose something unique)

Updating rabbitpubsub... OK

Create services for application?> y

1: blob 0.51
2: mongodb 2.0
3: mysql 5.1
4: postgresql 9.0
5: rabbitmq 2.4
6: redis 2.6
7: redis 2.4
8: redis 2.2
What kind?> 5 <----- Select & Add the RabbitMQ v2.4 service (for pub-sub)

Name?> rabbit-e1223 <-- This is just a random name for RabbitMQ service

Creating service rabbit-e1223... OK
Binding rabbit-e1223 to rabbitpubsub... OK

Create another service?> y

1: blob 0.51
2: mongodb 2.0
3: mysql 5.1
4: postgresql 9.0
5: rabbitmq 2.4
6: redis 2.6
7: redis 2.4
8: redis 2.2
What kind?> 6 <----- Select & Add the Redis v2.6 service (for session store)

Name?> redis-e9771 <-- This is just a random name for Redis service

Creating service redis-e9771... OK
Binding redis-e9771 to rabbitpubsub... OK

Bind other services to application?> n

Save configuration?> n

Uploading rabbitpubsub... OK
Starting rabbitpubsub... OK
Checking rabbitpubsub... OK

  • Once the server is up, open up multiple browsers and go to <appname>.cloudfoundry.com
  • Start chatting.

Tests

Test 1

  • While chatting, refresh the browser.
  • You should automatically be logged in.

Test 2

  • Open up JS debugger (On Chrome, do cmd + alt +j )
  • Restart the server by doing vmc restart <appname>
  • Once the server restarts, Socket.io should automatically reconnect
  • You should be able to chat after the reconnection.

That’s it for now. Hopefully this blog helps you get started with RabbitMQ. Look forward to more Node.js and RabbitMQ related blogs. The content of this blog has also been covered in a video. Feel free to get in touch with us with questions on the material.


General Notes

  • Get the code right away – Github location: https://github.com/rajaraodv/rabbitpubsub.
  • Deploy right away – if you don’t already have a Cloud Foundry account, sign up for it here.
  • Check out Cloud Foundry getting started here and install the vmc Ruby command line tool to push apps.
  • To install the latest alpha or beta vmc tool run: sudo gem install vmc --pre.

Scaling Real-time Apps on Cloud Foundry Using Node.js and Redis

Common applications being built on Node.js like social networking and chat require real-time scaling capabilities across multiple instances. Developers need to deal with sticky sessions, scale-up, scale-down, instance crash/restart, and more. Cloud Foundry PaaS provides a ready platform to achieve this quickly.

The following blog post will walk you through deploying and managing real-time applications on Cloud Foundry using common Node.js examples and Redis key-value store capabilities.


Chat App

The main objective here is to build a simple chat app while tackling the scale requirements. Specifically, we will be building a simple Express, Socket.io and Redis-based Chat app that meets the following objectives:

  1. Chat server should run with multiple instances.
  2. The user login should be saved in a session.
    • User should be logged back in upon browser refresh
    • Socket.io should get user information from the session before sending chat messages.
    • Socket.io should only connect if user is already logged in.
  3. If the server instance that the user is connected to gets restarted, goes down or is scaled down while the user is chatting, the user should be reconnected to an available instance and recover the session.

Chat app’s Login page:

Chat app’s Chat page:

We will also cover:

  1. How to use Socket.io and Sticky Sessions
  2. How to use Redis as a session store
  3. How to use Redis as a pubsub service
  4. How to use sessions.sockets.io to get session info (like user info) from Express sessions
  5. How to configure Socket.io client and server to properly reconnect after one or more server instances goes down (i.e. has been restarted, scaled down or crashed)

Socket.io & Sticky Sessions

Socket.io is one of the earliest and most popular Node.js modules to help build real-time apps like chat, social networking etc. (Note: SockJS is another popular library similar to Socket.io).

When you run such a server in a cloud that has a load-balancer, reverse proxy, routers etc., it has to be configured to work properly, especially when you scale the server to use multiple instances.

One of the constraints Socket.io, SockJS and similar libraries have is that they need to continuously talk to the same instance of the server. They work perfectly well when there is only 1 instance of the server.

When you scale your app in a cloud environment, the load balancer (Nginx in the case of Cloud Foundry) will take over, and the requests will be sent to different instances causing Socket.io to break.

For such situations, load balancers have a feature called ‘sticky sessions’, also known as ‘session affinity’. The idea is that if this property is set, all the requests following the first load-balanced request will go to the same server instance.

In Cloud Foundry, cookie-based sticky sessions are enabled for apps that set the cookie jsessionid. Note that jsessionid is the cookie name commonly used to track sessions in Java/Spring applications. Cloud Foundry is simply adopting it as the sticky session cookie for all frameworks.

To make socket.io work, the apps just need to set a cookie with the name jsessionid.

/**
* Use cookieParser and session middleware together.
* By default, an Express/Connect app creates a cookie named 'connect.sid'. But to scale a Socket.io app,
* make sure to use the cookie name 'jsessionid' (instead of 'connect.sid') to use Cloud Foundry's 'Sticky Session' feature.
* W/o this, Socket.io won't work when you have more than 1 instance.
* If you are NOT running on Cloud Foundry, having the cookie name 'jsessionid' doesn't hurt - it's just a cookie name.
*/
app.use(cookieParser);
app.use(express.session({store: sessionStore, key: 'jsessionid', secret: 'your secret here'}));

In the above diagram, when you open the app,

  1. Express sets a session cookie with name jsessionid.
  2. When socket.io connects, it uses that same cookie and hits the load balancer
  3. The load balancer always routes it to the same server that the cookie was set in.
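To get a feel for step 3, here is a rough sketch of what a sticky-session router might do with the cookie. This is illustration only: Cloud Foundry’s router keeps an actual session-to-instance mapping rather than hashing, and `pickInstance` and the instance addresses are made up:

```javascript
// Sketch of sticky-session routing (illustration, NOT Cloud Foundry's code):
// read the jsessionid cookie and always map it to the same backend instance.
function pickInstance(cookieHeader, instances) {
    var m = /(?:^|;\s*)jsessionid=([^;]+)/.exec(cookieHeader || '');
    if (!m) {
        // No cookie yet: pick any instance (here, at random).
        return instances[Math.floor(Math.random() * instances.length)];
    }
    // Simple stable hash of the session id -> same instance every time.
    var hash = 0;
    for (var i = 0; i < m[1].length; i++) {
        hash = (hash * 31 + m[1].charCodeAt(i)) >>> 0;
    }
    return instances[hash % instances.length];
}

var instances = ['10.0.0.1:3000', '10.0.0.2:3000', '10.0.0.3:3000'];
var cookie = 'jsessionid=abc123';
console.log(pickInstance(cookie, instances) === pickInstance(cookie, instances)); // true
```

The key property is determinism: as long as the browser (and socket.io) replays the same jsessionid cookie, every request lands on the same instance.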

Sending session info to Socket.io

Let’s imagine that the user is logging in via Twitter or Facebook, or a regular login screen. We are storing this information in a session after the user has logged in.

app.post('/login', function (req, res) {
    //store user info in session after login.
    req.session.user = req.body.user;
    ...
    ...
});

Once the user has logged in, we connect via socket.io to allow chatting. However, socket.io doesn’t know who the user is, or whether he or she is actually logged in, before sending chat messages to others.

That’s where the sessions.sockets.io library comes in. It’s a very simple library that’s a wrapper around socket.io. All it does is grab session information during the handshake and then pass it to socket.io’s connection function.

//With just Socket.io..
io.sockets.on('connection', function (socket) {
    //do pubsub here
    ...
})

//But with sessions.sockets.io, you'll get session info

/*
 Use SessionSockets so that we can exchange (set/get) user data b/w sockets and http sessions
 Pass 'jsessionid' (custom) cookie name that we are using to make use of Sticky sessions.
 */
var SessionSockets = require('session.socket.io');
var sessionSockets = new SessionSockets(io, sessionStore, cookieParser, 'jsessionid');

sessionSockets.on('connection', function (err, socket, session) {

    //get info from session
    var user = session.user;

    //Close socket if user is not logged in
    if (!user)
        socket.close();

    //do pubsub
    socket.emit('chat', {user: user, msg: 'logged in'});
    ...
});


Redis as a session store

So far so good, but Express stores these sessions in MemoryStore (by default). MemoryStore is simply a JavaScript object – it exists in memory only as long as the server is up. If the server goes down, all the session information of all users will be lost!

We need a place to store this information outside of our server, and it should also be very fast to retrieve. That’s where Redis as a session store comes in.

Let’s configure our app to use Redis as a session store as below.

/*
 Use Redis for Session Store. Redis will keep all Express sessions in it.
 */
var redis = require('redis');
var RedisStore = require('connect-redis')(express);
var rClient = redis.createClient();
var sessionStore = new RedisStore({client:rClient});


  //And pass sessionStore to Express's 'session' middleware's 'store' value.
     ...
     ...
app.use(express.session({store: sessionStore, key: 'jsessionid', secret: 'your secret here'}));
     ...

With the above configuration, sessions will now be stored in Redis. Also, if one of the server instances goes down, the session will still be available for other instances to pick up.

Socket.io as pub-sub server

So far our sessions are taken care of with the above setup, but if we use socket.io’s default pub-sub mechanism, it will work for only one server instance. That is, if user1 and user2 are both on server instance #1, they can chat with each other. If they are on different server instances, they cannot.

sessionSockets.on('connection', function (err, socket, session) {
    socket.on('chat', function (data) {
        socket.emit('chat', data); //send back to browser
        socket.broadcast.emit('chat', data); //send to others on this instance
    });

    socket.on('join', function (data) {
        socket.emit('chat', {msg: 'user joined'});
        socket.broadcast.emit('chat', {msg: 'user joined'});
    });
});

Redis as a PubSub service

In order to send chat messages to users across servers we will update our server to use Redis as a PubSub service (along with session store). Redis natively supports pub-sub operations. All we need to do is to create a publisher, a subscriber, and a channel.

//We will use Redis to do pub-sub

/*
 Create two Redis connections. A 'pub' for publishing and a 'sub' for subscribing.
 Subscribe 'sub' connection to 'chat' channel.
 */
var sub = redis.createClient();
var pub = redis.createClient();
sub.subscribe('chat');


sessionSockets.on('connection', function (err, socket, session) {
    socket.on('chat', function (data) {
        pub.publish('chat', data);
    });

    socket.on('join', function (data) {
        //Redis pub-sub messages are strings, so serialize the object before publishing.
        pub.publish('chat', JSON.stringify({msg: 'user joined'}));
    });

    /*
     Use Redis' 'sub' (subscriber) client to listen to any message from Redis to server.
     When a message arrives, send it back to the browser using socket.io.
     */
    sub.on('message', function (channel, message) {
        socket.emit(channel, message);
    });
});

The app architecture will now look like this:


Handling server scale-down / crashes / restarts

The app will work fine as long as all the server instances are running. But what happens if the server is restarted or scaled down, or if one of the instances crashes? How do we handle that?

Let’s first understand what happens in that situation.

The code below simply connects a browser to server and listens to various socket.io events.

/*
  Connect to socket.io on the server (***BEFORE FIX***).
  */
 var host = window.location.host.split(':')[0];
 var socket = io.connect('http://' + host);

 socket.on('connect', function () {
     console.log('connected');
 });
 socket.on('connecting', function () {
     console.log('connecting');
 });
 socket.on('disconnect', function () {
     console.log('disconnect');
 });
 socket.on('connect_failed', function () {
     console.log('connect_failed');
 });
 socket.on('error', function (err) {
     console.log('error: ' + err);
 });
 socket.on('reconnect_failed', function () {
     console.log('reconnect_failed');
 });
 socket.on('reconnect', function () {
     console.log('reconnected ');
 });
 socket.on('reconnecting', function () {
     console.log('reconnecting');
 });

While the user is chatting, if we restart the app on localhost or on a single host, socket.io attempts to reconnect multiple times (based on its configuration) to see if it can connect. If the server comes up within that time, it will reconnect. So we see the logs below:

If we restart the server (say, using vmc restart redispubsub) while the user is chatting on the same app running on Cloud Foundry with multiple instances, we will see the following log:

You can see in the above logs that, after the server comes back up, the socket.io client (running in the browser) isn’t able to connect to the socket.io server (running on Node.js on the server).

This is because, once the server is restarted on Cloud Foundry, instances are brought up as if they were brand-new server instances, with different IP addresses and different ports, so the jsessionid is no longer valid. That in turn causes the load balancer to load-balance socket.io’s reconnection requests (i.e., send them to different server instances), which prevents the socket.io server from completing the handshake and consequently makes it throw ‘client not handshaken’ errors!

OK, let’s fix that reconnection issue

First, we will disable socket.io’s default “reconnect” feature, and then implement our own reconnection feature.

In our custom reconnection function, when the server goes down, we’ll make a dummy HTTP-get call to index.html every 4-5 seconds. If the call succeeds, we know that the (Express) server has already set jsessionid in the response. So, then we’ll call socket.io’s reconnect function. This time because jsessionid is set, socket.io’s handshake will succeed and the user can continue to chat without interruption.

/*
 Connect to socket.io on the server (*** FIX ***).
 */
var host = window.location.host.split(':')[0];

//Disable Socket.io's default "reconnect" feature
var socket = io.connect('http://' + host, {reconnect: false, 'try multiple transports': false});
var intervalID;
var reconnectCount = 0;
...
...
socket.on('disconnect', function () {
    console.log('disconnect');

    //Retry reconnecting every 4 seconds
    intervalID = setInterval(tryReconnect, 4000);
});
...
...


/*
 Implement our own reconnection feature.
 When the server goes down we make a dummy HTTP-get call to index.html every 4-5 seconds.
 If the call succeeds, we know that (Express) server sets ***jsessionid*** , so only then we try socket.io reconnect.
 */
var tryReconnect = function () {
    ++reconnectCount;
    if (reconnectCount == 5) {
        clearInterval(intervalID);
    }
    console.log('Making a dummy http call to set jsessionid (before we do socket.io reconnect)');
    $.ajax('/')
        .success(function () {
            console.log("http request succeeded");
            //reconnect the socket AFTER we got jsessionid set
            socket.socket.reconnect();
            clearInterval(intervalID);
        }).error(function (err) {
            console.log("http request failed (probably server not up yet)");
        });
};

In addition, since the jsessionid is invalidated by the load balancer, we can’t create a session with the same jsessionid or else the sticky session will be ignored by the load balancer. So on the server, when the dummy HTTP request comes in, we will regenerate the session to remove the old session and sessionid and ensure everything is afresh before we serve the response.

//Instead of..
exports.index = function (req, res) {
    res.render('index', { title: 'RedisPubSubApp', user: req.session.user});
};

//Use this..
exports.index = function (req, res) {
    //Save user from previous session (if it exists)
    var user = req.session.user;

    //Regenerate new session & store user from previous session (if it exists)
    req.session.regenerate(function (err) {
        req.session.user = user;
        res.render('index', { title: 'RedisPubSubApp', user: req.session.user});
    });
};


Running / Testing it on Cloud Foundry

  • Clone the app into a redispubsub folder
  • cd redispubsub
  • Run npm install and follow the instructions below to push the app to Cloud Foundry
[~/success/git/redispubsub]
> vmc push redispubsub
Instances> 4       <----- Run 4 instances of the server

1: node
2: other
Framework> node

1: node
2: node06
3: node08
4: other
Runtime> 3  <---- Choose Node.js v0.8

1: 64M
2: 128M
3: 256M
4: 512M
Memory Limit> 64M

Creating redispubsub... OK

1: redispubsub.cloudfoundry.com
2: none
URL> redispubsub.cloudfoundry.com  <--- URL of the app (choose something unique)

Updating redispubsub... OK

Create services for application?> y

1: blob 0.51
2: mongodb 2.0
3: mysql 5.1
4: postgresql 9.0
5: rabbitmq 2.4
6: redis 2.6
7: redis 2.4
8: redis 2.2
What kind?> 6 <----- Select & Add Redis v2.6 service

Name?> redis-e9771 <-- This is just a random name for Redis service

Creating service redis-e9771... OK
Binding redis-e9771 to redispubsub... OK
Create another service?> n

Bind other services to application?> n

Save configuration?> n

Uploading redispubsub... OK
Starting redispubsub... OK
Checking redispubsub... OK

  • Once the server is up, open up multiple browsers and go to <appname>.cloudfoundry.com
  • Start chatting

Test 1

  • Refresh the browser
  • You should automatically be logged in

Test 2

  • Open up JS debugger (in Chrome, do cmd + alt +j)
  • Restart the server by running vmc restart <appname>
  • Once the server restarts, Socket.io should automatically reconnect
  • You should be able to chat after the reconnection

That’s it for this time. Stay tuned for my next blog where I will discuss how RabbitMQ can be leveraged in scalable apps built on Cloud Foundry. The content of this blog has also been covered in a video. Feel free to get in touch with us for questions on the material.

General Notes

  • Get the code right away – Github location: https://github.com/rajaraodv/redispubsub.
  • Deploy right away – if you don’t already have a Cloud Foundry account, sign up for it here.
  • Check out Cloud Foundry getting started here and install the vmc Ruby command line tool to push apps.

  • To install the latest alpha or beta vmc tool run: sudo gem install vmc --pre.

Credits

Front end UI: https://github.com/steffenwt/nodejs-pub-sub-chat-demo


Future-proofing Your Apps: Cloud Foundry and Node.js

Most real-world applications we ship to consumers or enterprises are multi-year projects. In the cloud era, newer technologies (programming languages, runtimes, frameworks) are created faster than ever. While most of them fail to get any traction, once in a while a technology becomes popular because it solves a problem or set of problems extremely well.

Now in such an era, if you make a large investment for a multi-year project on a PaaS that only supports one technology and some other technology comes along that happens to solve your problem better, then you are stuck. You have unintentionally become a victim of vendor lock-in. The heart of your problem is that your PaaS, and hence your app, was not future-proofed to begin with.

PaaS With 1 set of Technologies

To future-proof your long term project, you should:

  1. Use a polyglot PaaS like Cloud Foundry that supports a great mix of both mature technologies and upcoming technologies.

Polyglot PaaS

  2. Learn about newer and popular technologies like Node.js to:
    • See if they can replace part of your current app (i.e., convert it to a polyglot app).
    • Write future apps for your company in newer technologies using the same PaaS that you are already familiar with.

The remainder of this blog is about the latter option–learning newer and popular technologies, in this case Node.js, to help future-proof your app.

Things to note before you read:

  • While this blog refers to JavaScript frequently, it’s all happening on the server (not in the browser). Think of yourself as a server-side engineer throughout this blog.
  • We will also discuss when not to use Node.js and other similar languages towards the end of the blog.

What is Node.js?

Official definition: “Node.js is a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.” - Nodejs.org

Simple (my) definition: A platform that makes writing powerful C/C++ server-side apps easy by essentially wrapping them in JavaScript.

Let’s understand the definition by looking under the hood:

At first glance, perhaps because of the name, Node.js might feel like it is built using JavaScript, but it is not. It simply runs JavaScript on the server. It is about 80 percent C/C++ code and about 20 percent JavaScript code. The C/C++ libraries are responsible for running JavaScript (via the Google Chrome V8 JS engine) and for providing support for HTTP, DNS, TCP, etc. – important server-side functionalities. The proportionally smaller JavaScript portion mostly consists of libraries or modules that make server-side developers’ lives a lot simpler.

Some useful definitions:

  • Chrome V8 Engine: Chrome V8 is Google’s open source, C++-based JavaScript engine that actually runs JavaScript. The Node.js team took this C++ code and added other important libraries like TCP, HTTP and DNS to create Node.js. This is the same engine that is embedded in the Google Chrome browser, where it runs JavaScript as well. It is not yet-another-JavaScript-engine, but one that uses innovative techniques like “Hidden Class Transitions,” “JS to Machine Code compilation” and “Automatic GC” to make it one of the fastest JavaScript engines.

    We encourage you to go through Google’s Chrome comic book to learn more about this. Three relevant pages from that book are shown below.

  • Asynchronous I/O and Evented Support (C/C++):  In order to write a fast and scalable server application, we typically end up writing it in a multi-threaded fashion. While you can build great multi-threaded apps in many languages, it usually requires a lot of expertise to build them correctly. On the other hand, these libraries (along with Chrome’s V8 engine) provide a different architecture that hides the complexities of multi-threaded apps while getting the same or better benefits.
    Let’s compare a classic multi-threaded server with an evented, non-blocking I/O server.

    An example multi-threaded HTTP server using blocking I/O:

    The above diagram depicts a simplified multi-threaded server. Four users are logged into the server, and a couple of them are hitting the refresh button, causing it to use a lot of threads. When a request comes in, one of the threads in the thread pool performs that operation, say, a blocking I/O operation. This triggers the OS to perform a context switch and run other threads in the thread pool. After some time, when the I/O is finished, the OS context switches back to the earlier thread to return the result.

    Architecture Summary: Multi-threaded servers that support a synchronous, blocking I/O model provide a simpler way of performing I/O. But to handle a heavy load, they end up using more threads because of the direct association between connections and threads. Supporting more threads causes higher memory usage and higher CPU usage due to more context switching among threads.

    For more details, we recommend going through Benjamin Erb’s thesis: http://berb.github.com/diploma-thesis/.

    Event-driven, non-blocking I/O (Node.js server):

    The above diagram depicts how a Node.js server works. At a high level, a Node.js server has two parts: at the front, you have the Chrome V8 engine (single-threaded), the event loop and other C/C++ libraries that run your JS code and listen for HTTP/TCP requests; and at the back, you have libuv (which includes libeio) and other C/C++ libraries that provide asynchronous I/O.

    Whenever a request is made from a browser, a mobile device, etc., the main thread running in the V8 engine checks whether it involves I/O. If it does, the request is immediately delegated to the back side (kernel level) of the server, where one of the threads in the POSIX thread pool actually performs the async I/O. Because the main thread is now free, it starts accepting new requests/events. At some point, when the response comes back from the database or file system, the back end generates an event indicating that a result is available from I/O. When V8 becomes free from what it is currently doing (remember, it is single-threaded), it takes the result and returns it to the client.

    Architecture Summary: This architecture uses an event loop (the main thread) at the front and performs asynchronous I/O at the kernel level. By not directly associating connections with threads, this model needs only the main event-loop thread and many fewer (kernel) threads to perform I/O. Because there are fewer threads, and consequently less context switching, it uses less memory and less CPU.
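
    The delegation described above is visible in just a few lines of Node.js. The following sketch (which reads this script’s own file, purely for illustration) logs its messages in the order 1, 2, 3, showing the main thread staying free while the I/O happens in the background:

    ```javascript
    // Sketch: non-blocking I/O keeps the main thread free.
    const fs = require('fs');

    console.log('1: request received');

    // The read is delegated to the async I/O back end; the callback fires later.
    fs.readFile(__filename, 'utf8', (err, data) => {
      if (err) throw err;
      console.log('3: I/O done, result returned to the client');
    });

    // The main thread did not wait for the read.
    console.log('2: main thread free to accept new requests');
    ```

    Run it with `node sketch.js`: the synchronous log on the last line always prints before the I/O callback fires, because the callback only runs once the event loop is free again.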

What are the benefits of Node.js?

  1. Savings in I/O cost (i.e., high performance): Because of this architecture, Node.js provides high performance much like the Nginx server, as shown below. (As a side note: Nginx uses an evented, non-blocking architecture, whereas Apache uses a multi-threaded architecture. Nginx does not use Node.js; this is just an architecture comparison.)
  2. Savings in memory: Again, because of the architecture, Node.js uses relatively little memory, much like the Nginx server, as shown below.
  3. JavaScript: Node.js uses a familiar and very popular language, JavaScript, and allows engineers to use a single language for both client and server. (You can also use CoffeeScript, tenth on the list at https://github.com/languages, which compiles to JavaScript.)
  4. Thousands of libraries: High performance and a familiar language are great, but you really need libraries to get started. Although Node.js is relatively new, it already has nearly 11,000 libraries.
  5. Second most popular watched project on Github: A large ecosystem of developers means better libraries and frameworks.

When to use Node.js:

Use Node.js to:

  1. Build a (soft) real-time social app like Twitter or a chat app.
  2. Build high-performance, high I/O, TCP apps like proxy servers, PaaS, databases, etc.
  3. Build backend logging and processing apps.
  4. Build great CLI apps similar to the vmc tool, and build tools such as Ant or Make.
  5. Add a RESTful API-based web server in front of an application server.

When NOT to use Node.js:

Node.js is not suitable for every application:

  1. Mission-critical (hard) real-time apps like heart monitoring apps or those that are CPU-intensive.
  2. For simple CRUD apps that don’t have any real-time or high-performance needs, Node.js does not provide much of an advantage over other languages.
  3. Enterprise apps that might need specific libraries for which there may not be a Node.js equivalent yet. (However, you could build a polyglot app that uses Java in conjunction with Node.js to provide those libraries.)

What are the drawbacks of Node.js:

Most of the drawbacks are because Node.js itself is relatively new:

  1. Node.js libraries are actively developed with a high rate of change; new versions of libraries appear literally every month. This can cause version issues and instability. npm shrinkwrap and package.json were introduced a while back to set standards, but the issue still exists.
  2. Many libraries, such as a SAML auth library required by enterprise apps, are still not available.
  3. The callback-based, event-driven, functional programming aspects of Node.js can add a learning-curve burden for server-side programmers coming from object-oriented languages. (Note that there are several libraries to help overcome this; one example is async. Developers can also use CoffeeScript, which compiles to JavaScript, to ease the learning curve.)
  4. Asynchronous and event-driven code inherently adds more complexity than synchronous code.
  5. JavaScript has more than its share of “bad parts” and might throw off engineers and newcomers. (Side note: if you are a newcomer, read a good JavaScript book such as JavaScript: The Good Parts.)

What other similar and newer languages should I be aware of:

  1. vertx.io: Write your application components in JavaScript, Ruby, Groovy or Java. Or mix and match several programming languages in a single application.
  2. Erlang: Erlang is a programming language used to build massively scalable soft real-time systems with requirements on high availability. Some of its uses are in telecoms, banking, e-commerce, computer telephony and instant messaging. Erlang’s runtime system has built-in support for concurrency, distribution and fault tolerance.
  3. Twisted: Twisted is an open source, event-driven networking engine written in Python.
  4. EventMachine: EventMachine is an event-driven I/O and lightweight concurrency library for Ruby. It provides event-driven I/O using the Reactor pattern.
  5. Scala: Scala is a general purpose programming language designed to express common programming patterns in a concise, elegant and type-safe way.
  6. Dart: With the Dart platform, you can write code that runs on servers and in modern web browsers. Dart compiles to JavaScript, so your Dart web apps will work in multiple browsers.
  7. Go: Go is an open source programming environment that makes it easy to build simple, reliable and efficient software.
That’s it! Hopefully this blog gave you a good overview of polyglot PaaS and Node.js. We want to get you on track to future-proof your next multi-year project.
Also, please be sure to join me for a live webinar: “Node.js Basics: An Introductory Training” on July 18, 10:00 a.m. PDT.
Want to try Node.js on Cloud Foundry?
Cloud Foundry provides a runtime environment for Node.js applications and the Cloud Foundry deployment tools automatically recognize Node.js applications. Simply follow the step-by-step instructions as described here: http://docs.cloudfoundry.com/frameworks/nodejs/nodejs.html and you will be on your way to running Node.js apps soon.
  • Raja Rao DV (@rajaraodv – Developer Advocate, Cloud Foundry, (Node.js))