Cloud Foundry Blog

Scaling Real-time Apps on Cloud Foundry Using Node.js and RabbitMQ

In the previous blog, Scaling Real-time Apps on Cloud Foundry Using Node.js and Redis, we used Redis as a ‘session store’ and also as a ‘pub-sub’ service for chat messages. But in many enterprise-grade real-time apps, you may want to use RabbitMQ instead of Redis for pub-sub because of the reliability and features that come out of the box with RabbitMQ. This is especially true for financial or banking apps such as stock quote apps, where it is critical to protect and deliver every message and to do so as quickly as possible.

So, in this blog, we will start from Scaling Real-time Apps on Cloud Foundry Using Node.js and Redis and simply replace the Redis pub-sub with RabbitMQ. Redis stays in place as the session store.

The app architecture (before):

The app architecture w/ RabbitMQ (after):


Introduction to RabbitMQ

The Node.js community may not be familiar with RabbitMQ, so here is a high-level introduction.

RabbitMQ is a message broker. It accepts messages from one or more endpoints (“Producers”) and sends them to one or more endpoints (“Consumers”).

RabbitMQ is more sophisticated and flexible than just that. Depending on the configuration, it can also figure out what needs to be done when a consumer crashes (store and re-deliver the message), when a consumer is slow (queue messages), when there are multiple consumers (distribute the work load), or even when RabbitMQ itself crashes (durable queues and messages). For more, please see: RabbitMQ tutorials.

RabbitMQ is also very fast and efficient. It implements the Advanced Message Queuing Protocol (“AMQP”), which was built by and for Wall Street firms like J.P. Morgan Chase, Goldman Sachs, etc. for trading stocks and related activities. RabbitMQ is an Erlang implementation of that protocol (Erlang is itself well known for concurrency and speed).

For more please go through RabbitMQ’s website.


Fundamental pieces of RabbitMQ

RabbitMQ has 4 pieces.

  1. Producer (“P”) – Sends messages to an exchange along with a “Routing Key” indicating how to route the message.
  2. Exchange (“X”) – Receives the message and Routing Key from Producers and figures out what to do with the message.
  3. Queue (“Q”) – A temporary place where messages are stored until a consumer is ready to receive them. Which messages reach a queue depends on the “Binding Key” used to bind it to an exchange. Note: While a Queue physically resides inside RabbitMQ, it is typically the consumer (“C”) that creates it and binds it to an exchange with a “Binding Key”.
  4. Consumer (“C”) – Subscribes to a Queue to receive messages.
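
To make the relationship between the four pieces concrete, here is a minimal in-memory sketch. It is not RabbitMQ and not the node-amqp API: every name in it (ToyExchange, ToyQueue, bindQueue, consume) is hypothetical, and it only models exact-match routing.

```javascript
// Toy in-memory model of the four pieces. This is NOT RabbitMQ or node-amqp;
// all names below are made up for illustration.
function ToyExchange(name) {
    this.name = name;
    this.bindings = []; // each entry: {bindingKey, queue}
}

// A consumer declares a queue and binds it to the exchange with a binding key.
ToyExchange.prototype.bindQueue = function (queue, bindingKey) {
    this.bindings.push({bindingKey: bindingKey, queue: queue});
};

// A producer publishes a message with a routing key; the exchange copies it
// into every queue whose binding key matches exactly.
ToyExchange.prototype.publish = function (routingKey, message) {
    this.bindings.forEach(function (binding) {
        if (binding.bindingKey === routingKey) {
            binding.queue.messages.push(message);
        }
    });
};

function ToyQueue() {
    this.messages = []; // messages wait here until a consumer takes them
}

// A consumer drains whatever is waiting in its queue.
ToyQueue.prototype.consume = function (handler) {
    while (this.messages.length > 0) {
        handler(this.messages.shift());
    }
};

var exchange = new ToyExchange('logsExchange');
var errorQueue = new ToyQueue();
var tweetQueue = new ToyQueue();
exchange.bindQueue(errorQueue, 'errors.logs');
exchange.bindQueue(tweetQueue, 'tweets');

exchange.publish('errors.logs', 'disk full');   // lands in errorQueue only
exchange.publish('warnings.logs', 'disk 90%');  // no matching binding, so it is dropped

var received = [];
errorQueue.consume(function (message) {
    received.push(message);
});
console.log(received);
```

The point of the sketch is the separation of roles: the producer only knows the exchange and a routing key, and the consumer only knows its queue.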

Routing Key, Binding Key and types of Exchanges

To allow various work-flows like pub-sub, work queues, topics, RPC etc., RabbitMQ allows us to independently configure the type of the Exchange, Routing Key and Binding Key.

Routing Key:

A string/constraint from the Producer instructing the Exchange how to route the message. A Routing key looks like: “logs”, “errors.logs”, “warnings.logs”, “tweets” etc.

Binding Key:

Another string/constraint, supplied by a Consumer when it binds its queue to an exchange. A Binding key looks like: “logs”, “*.logs”, “#.logs” etc.

Note: In RabbitMQ, Binding keys can have “patterns” (but not Routing keys). The patterns are only interpreted by Topic exchanges.

Types of Exchange:

Exchanges can be of 4 types:

  1. Direct – Sends a message from producer to consumer if the Routing Key and Binding Key match exactly.
  2. Fanout – Sends any message from a producer to ALL bound queues, and so to all consumers (i.e., ignores both Routing Key and Binding Key).
  3. Topic – Sends a message from producer to consumer by pattern-matching the Routing Key against the Binding Key.
  4. Headers – If more complicated routing is required beyond a simple Routing Key string, you can use a headers exchange.

In RabbitMQ the combination of the type of Exchange, Routing Key and Binding Key make it behave completely differently. For example: A fanout Exchange ignores Routing Key and Binding Key and sends messages to all queues. A Topic Exchange sends a copy of a message to zero, one or more consumers based on RabbitMQ patterns (#, *).
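
The routing rules above can be sketched as a small decision function. This is an illustrative model only, not RabbitMQ's implementation: the function names topicMatches and routes are hypothetical, and headers exchanges are left out. It assumes the usual topic rules, where “*” stands for exactly one dot-separated word and “#” for zero or more words.

```javascript
// Returns true when a topic binding key (which may contain * and #) matches
// a routing key. Both keys are treated as dot-separated lists of words.
function topicMatches(bindingKey, routingKey) {
    var pattern = bindingKey === '' ? [] : bindingKey.split('.');
    var words = routingKey === '' ? [] : routingKey.split('.');

    function match(p, w) {
        if (p === pattern.length) {
            return w === words.length; // pattern used up: all words must be too
        }
        if (pattern[p] === '#') {
            // '#' may swallow zero or more words; try every possibility.
            for (var skip = w; skip <= words.length; skip++) {
                if (match(p + 1, skip)) {
                    return true;
                }
            }
            return false;
        }
        if (w === words.length) {
            return false; // pattern still expects a word, but none are left
        }
        // '*' stands for exactly one word; anything else must match literally.
        return (pattern[p] === '*' || pattern[p] === words[w]) && match(p + 1, w + 1);
    }

    return match(0, 0);
}

// Decides whether a message published with routingKey reaches a queue that
// was bound with bindingKey, for three of the four exchange types.
function routes(exchangeType, bindingKey, routingKey) {
    switch (exchangeType) {
    case 'fanout':
        return true; // both keys are ignored
    case 'direct':
        return bindingKey === routingKey;
    case 'topic':
        return topicMatches(bindingKey, routingKey);
    default:
        throw new Error('unsupported exchange type: ' + exchangeType);
    }
}

console.log(routes('direct', 'logs', 'errors.logs'));
console.log(routes('topic', '*.logs', 'errors.logs'));
console.log(routes('fanout', 'anything', ''));
```

Our chat app only needs the fanout case, which is why the keys are left empty in the code that follows.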

Going into more details is beyond the scope of this blog, but here is another good blog that goes into more details: AMQP 0-9-1 Model Explained


Using RabbitMQ to do pub-sub in our Node.js chat app.

Now that we know some of the basics of RabbitMQ and its four pieces, let’s see how to actually use them in our chat app.

Chat App:

Connecting to RabbitMQ and creating an Exchange

For our chat application, we will create a fanout exchange called chatExchange. We will use the node-amqp module to talk to the RabbitMQ service.

//Load the node-amqp module (published on npm as 'amqp').
var amqp = require('amqp');

//Connect to RabbitMQ and get reference to the connection.
var rabbitConn = amqp.createConnection({});

//Create an exchange with a name 'chatExchange' and of type 'fanout'
var chatExchange;
rabbitConn.on('ready', function () {
    chatExchange = rabbitConn.exchange('chatExchange', {'type': 'fanout'});
});
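
The snippet above passes an empty options object to createConnection. If you need to hand the broker URL to node-amqp explicitly, one possible approach is sketched below. It rests on assumptions that are not part of the original code: that the bound service shows up in the VCAP_SERVICES environment variable under a label starting with “rabbitmq”, and that its credentials carry a url field. The helper name rabbitUrlFrom and the localhost fallback are hypothetical.

```javascript
// Hypothetical helper: looks for a bound RabbitMQ service in an environment
// object and returns its AMQP URL, or a local default when none is found.
// ASSUMPTION: VCAP_SERVICES is JSON shaped like
//   {"rabbitmq-<version>": [{"credentials": {"url": "amqp://..."}}]}
function rabbitUrlFrom(env) {
    var fallback = 'amqp://localhost';
    if (!env.VCAP_SERVICES) {
        return fallback;
    }
    var services = JSON.parse(env.VCAP_SERVICES);
    var labels = Object.keys(services);
    for (var i = 0; i < labels.length; i++) {
        var instances = services[labels[i]];
        if (labels[i].indexOf('rabbitmq') === 0 && instances.length > 0) {
            var credentials = instances[0].credentials;
            if (credentials && credentials.url) {
                return credentials.url;
            }
        }
    }
    return fallback;
}

// Usage with node-amqp would then look like:
//   var rabbitConn = amqp.createConnection({url: rabbitUrlFrom(process.env)});
console.log(rabbitUrlFrom(process.env));
```

Keeping the lookup in a function that takes the environment as an argument makes it easy to try out locally with a made-up VCAP_SERVICES value.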

Creating Producers (So Users can send chat messages)

In our chat app, users are both producers (i.e., they send chat messages to others) and consumers (i.e., they receive messages from others). Let’s focus on users being ‘producers’.

When a user sends a chat message, publish it to chatExchange w/o a Routing Key (Routing Key doesn’t matter because chatExchange is a ‘fanout’).

/**
 * When a user sends a chat message, publish it to chatExchange w/o a Routing Key (Routing Key doesn't matter
 * because chatExchange is a 'fanout').
 *
 * Notice that we are getting user's name from session.
 */
socket.on('chat', function (data) {
    var msg = JSON.parse(data);
    var reply = {action: 'message', user: session.user, msg: msg.msg };
    chatExchange.publish('', reply);
});

Similarly, when a user joins, publish it to chatExchange w/o Routing key.

/**
 * When a user joins, publish it to chatExchange w/o Routing key (Routing doesn't matter
 * because chatExchange is a 'fanout').
 *
 * Note that we are getting user's name from session.
 */
socket.on('join', function () {
    var reply = {action: 'control', user: session.user, msg: ' joined the channel' };
    chatExchange.publish('', reply);
});
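
One thing to keep in mind with the 'chat' handler: JSON.parse throws on malformed input, and an uncaught exception inside an event handler can bring down the Node.js process. A defensive variant might build the reply in a small helper like the one below. The helper name buildChatReply and the choice to return null for bad input are assumptions for illustration, not part of the original app.

```javascript
// Hypothetical helper: turns the raw string sent by the browser into the
// reply object published to chatExchange, or returns null when the payload
// is not valid JSON or has no string 'msg' field.
function buildChatReply(data, user) {
    var parsed;
    try {
        parsed = JSON.parse(data);
    } catch (e) {
        return null; // malformed JSON from the client
    }
    if (!parsed || typeof parsed.msg !== 'string') {
        return null; // valid JSON, but not the shape we expect
    }
    return {action: 'message', user: user, msg: parsed.msg};
}

// Inside the 'chat' handler this could be used as:
//   var reply = buildChatReply(data, session.user);
//   if (reply) { chatExchange.publish('', reply); }
console.log(buildChatReply('{"msg": "hello"}', 'raja'));
console.log(buildChatReply('not json', 'raja'));
```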

Creating Consumers (So Users can receive chat messages)

Creating consumers involves 3 steps:

  1. Create a queue with some options.
  2. Bind queue to exchange using some “Binding Key”
  3. Create a subscriber (usually a callback function) to actually obtain messages sent to the queue.

For our chat app,

  1. Let’s create a queue w/o any name. This forces RabbitMQ to create a new queue with a random name for every socket.io connection. Let’s also set the exclusive flag to ensure only this consumer can access the messages from this queue.
rabbitConn.queue('', {exclusive: true}, function (q) {
    ..
});
  2. Then bind the queue to chatExchange with an empty ‘Binding key’ and listen to ALL messages.
q.bind('chatExchange', "");
  3. Lastly, create a consumer (via q.subscribe) that waits for messages from RabbitMQ. When a message comes, send it to the browser.
q.subscribe(function (message) {
    //When a message comes, send it back to browser
    socket.emit('chat', JSON.stringify(message));
});

Putting it all together.

sessionSockets.on('connection', function (err, socket, session) {
    /**
     * When a user sends a chat message, publish it to chatExchange w/o a Routing Key (Routing Key doesn't matter
     * because chatExchange is a 'fanout').
     *
     * Notice that we are getting user's name from session.
     */
    socket.on('chat', function (data) {
        var msg = JSON.parse(data);
        var reply = {action: 'message', user: session.user, msg: msg.msg };
        chatExchange.publish('', reply);
    });

    /**
     * When a user joins, publish it to chatExchange w/o Routing key (Routing doesn't matter
     * because chatExchange is a 'fanout').
     *
     * Note: that we are getting user's name from session.
     */
    socket.on('join', function () {
        var reply = {action: 'control', user: session.user, msg: ' joined the channel' };
        chatExchange.publish('', reply);
    });


    /**
     * Initialize subscriber queue.
     * 1. First create a queue w/o any name. This forces RabbitMQ to create new queue for every socket.io connection w/ a new random queue name.
     * 2. Then bind the queue to chatExchange  w/ "#" or "" 'Binding key' and listen to ALL messages
     * 3. Lastly, create a consumer (via .subscribe) that waits for messages from RabbitMQ. And when
     * a message comes, send it to the browser.
     *
     * Note: we are creating this w/in sessionSockets.on('connection'..) to create NEW queue for every connection
     */
    rabbitConn.queue('', {exclusive: true}, function (q) {
        //Bind to chatExchange w/ "#" or "" binding key to listen to all messages.
        q.bind('chatExchange', "");

        //Subscribe: when a message comes, send it back to browser
        q.subscribe(function (message) {
            socket.emit('chat', JSON.stringify(message));
        });
    });
 });

Running / Testing it on Cloud Foundry

  • Clone this app to rabbitpubsub folder
  • cd rabbitpubsub
  • Run npm install, then follow the instructions below to push the app to Cloud Foundry:
[~/success/git/rabbitpubsub]
> vmc push rabbitpubsub
Instances> 4       <----- Run 4 instances of the server

1: node
2: other
Framework> node

1: node
2: node06
3: node08
4: other
Runtime> 3  <---- Choose Node.js 0.8v

1: 64M
2: 128M
3: 256M
4: 512M
Memory Limit> 64M

Creating rabbitpubsub... OK

1: rabbitpubsub.cloudfoundry.com
2: none
URL> rabbitpubsub.cloudfoundry.com  <--- URL of the app (choose something unique)

Updating rabbitpubsub... OK

Create services for application?> y

1: blob 0.51
2: mongodb 2.0
3: mysql 5.1
4: postgresql 9.0
5: rabbitmq 2.4
6: redis 2.6
7: redis 2.4
8: redis 2.2
What kind?> 5 <----- Select & Add RabbitMQ 2.4v service (for pub-sub)

Name?> rabbit-e1223 <-- This is just a random name for RabbitMQ service

Creating service rabbit-e1223... OK
Binding rabbit-e1223 to rabbitpubsub... OK

Create another service?> y

1: blob 0.51
2: mongodb 2.0
3: mysql 5.1
4: postgresql 9.0
5: rabbitmq 2.4
6: redis 2.6
7: redis 2.4
8: redis 2.2
What kind?> 6 <----- Select & Add Redis 2.6v service (for session store)

Name?> redis-e9771 <-- This is just a random name for Redis service

Creating service redis-e9771... OK
Binding redis-e9771 to rabbitpubsub... OK

Bind other services to application?> n

Save configuration?> n

Uploading rabbitpubsub... OK
Starting rabbitpubsub... OK
Checking rabbitpubsub... OK

  • Once the server is up, open up multiple browsers and go to <servername>.cloudfoundry.com
  • Start chatting.

Tests

Test 1

  • While chatting, refresh the browser.
  • You should automatically be logged in.

Test 2

  • Open up the JS debugger (on Chrome, press Cmd + Alt + J)
  • Restart the server by doing vmc restart <appname>
  • Once the server restarts, Socket.io should automatically reconnect
  • You should be able to chat after the reconnection.

That’s it for now. Hopefully this blog helps you get started with RabbitMQ. Look out for more Node.js and RabbitMQ related blogs. The content of this blog has also been covered in a video. Feel free to get in touch with us for questions on the material.


General Notes

  • Get the code right away – Github location: https://github.com/rajaraodv/rabbitpubsub.
  • Deploy right away – if you don’t already have a Cloud Foundry account, sign up for it here.
  • Check out Cloud Foundry getting started here and install the vmc Ruby command line tool to push apps.
  • To install the latest alpha or beta vmc tool run: sudo gem install vmc --pre.

Future-proofing Your Apps: Cloud Foundry and Node.js

Most real-world applications we ship to consumers or enterprises are multi-year projects. In the cloud era, newer technologies (programming languages, runtimes, frameworks) are created faster than ever. While most of them fail to get any traction, once in a while a technology becomes popular because it solves a problem or set of problems extremely well.

Now in such an era, if you make a large investment for a multi-year project on a PaaS that only supports one technology and some other technology comes along that happens to solve your problem better, then you are stuck. You have unintentionally become a victim of vendor lock-in. The heart of your problem is that your PaaS, and hence your app, was not future-proofed to begin with.

PaaS With 1 set of Technologies

To future-proof your long term project, you should:

  1. Use a polyglot PaaS like Cloud Foundry that supports a great mix of both mature technologies and upcoming technologies.

     Polyglot PaaS

  2. Learn about newer and popular technologies like Node.js to:
     • See if they can replace part of your current app (i.e., convert it to a polyglot app).
     • Write future apps for your company in newer technologies using the same PaaS that you are already familiar with.

The remainder of this blog is about the latter option: learning newer and popular technologies, in this case Node.js, to help future-proof your app.

Things to note before you read:

  • While this blog refers to JavaScript frequently, it’s all happening on the server (not in the browser). Think of yourself as a server-side engineer throughout this blog.
  • We will also discuss when not to use Node.js and other similar languages towards the end of the blog.

What is Node.js?

Official definition: “Node.js is a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.” - Nodejs.org

Simple (my) definition: A platform that makes writing powerful C/C++ server-side apps easy by essentially wrapping them in JavaScript.

Let’s understand the definition by looking under the hood:

At first glance, perhaps because of the name, Node.js might seem to be built in JavaScript, but it is not. It simply runs JavaScript on the server. It is about 80 percent C/C++ code and about 20 percent JavaScript code. The C/C++ libraries are responsible for running JavaScript (via the Google Chrome V8 JS engine) and for providing important server-side functionality such as HTTP, DNS and TCP support. The proportionally smaller JavaScript code mostly consists of libraries or modules that make server-side developers’ lives a lot simpler.

Some useful definitions:

  • Chrome V8 Engine: Chrome V8 Engine is Google’s open source, C++ based JavaScript engine that actually runs JavaScript. The Node.js team took this C++ code and added other important libraries like TCP, HTTP and DNS to create Node.js. This is the same engine that is also embedded in the Google Chrome browser that runs JavaScript in the browser as well. This is not yet-another-JavaScript-engine but one that uses innovative techniques like “Hidden Class Transitions,” “JS to Machine Code compilation” and “Automatic GC” to make it one of the fastest JavaScript engines.

    We encourage you to go through Google’s Chrome comic book to learn more about this. Three relevant pages from that book are shown below.

  • Asynchronous I/O and Evented Support (C/C++):  In order to write a fast and scalable server application, we typically end up writing it in a multi-threaded fashion. While you can build great multi-threaded apps in many languages, it usually requires a lot of expertise to build them correctly. On the other hand, these libraries (along with Chrome’s V8 engine) provide a different architecture that hides the complexities of multi-threaded apps while getting the same or better benefits.

    Let’s compare a classic multi-threaded server with an evented, non-blocking I/O server.

    An example multi-threaded HTTP server using blocking I/O

    The above diagram depicts a simplified multi-threaded server. There are four users logged into the multi-threaded server. A couple of the users are hitting the refresh button, causing the server to use a lot of threads. When a request comes in, one of the threads in the thread pool performs that operation, say, a blocking I/O operation. This triggers the OS to perform context switching and run other threads in the thread pool. After some time, when the I/O is finished, the OS context switches back to the earlier thread to return the result.

    Architecture Summary: Multi-threaded servers supporting a synchronous, blocking I/O model provide a simpler way of performing I/O. But to handle a heavy load, multi-threaded servers end up using more threads because each connection is tied directly to a thread. Supporting more threads consumes more memory and increases CPU usage due to more context switching among threads.

    For more details, we recommend going through Benjamin Erb’s thesis paper here: http://berb.github.com/diploma-thesis/.

    Event-driven, non-blocking I/O (Node.js server)

    The above diagram depicts how a Node.js server works. At a high level, a Node.js server has two parts:

      • At the front, you have the Chrome V8 engine (single-threaded), the event loop and other C/C++ libraries that run your JS code and listen to HTTP/TCP requests.
      • At the back of the server, you have libuv (includes libeio) and other C/C++ libraries that provide asynchronous I/O.

    Whenever a request is made from a browser, mobile device, etc., the main thread running in the V8 engine checks whether it involves I/O. If it does, the main thread immediately delegates it to the back of the server (kernel level), where one of the threads in the POSIX thread pool actually performs the I/O asynchronously. Because the main thread is now free, it starts accepting new requests/events.

    At some point, when the response comes back from a database or file system, the backend piece generates an event indicating that we have a result from I/O. When V8 becomes free from what it is currently doing (remember, it is single-threaded), it takes the result and returns it to the client.

    Architecture Summary: This architecture utilizes an event loop (main thread) at the front and performs asynchronous I/O at the kernel level. By not directly associating connections and threads, this model needs only a main event loop thread and far fewer (kernel) threads to perform I/O. Because there are fewer threads and consequently less context switching, it uses less memory and less CPU.
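
The non-blocking behavior described above can be observed in a few lines of Node.js. This is a minimal sketch: fakeReadFromDatabase is a made-up stand-in for a real I/O call, and setTimeout is used only to simulate the delay of a database or file-system read.

```javascript
var order = [];

// Stand-in for an I/O call: the result arrives later, via a callback,
// instead of the caller waiting for it. The 10 ms timer simulates the
// latency of a database or file-system read.
function fakeReadFromDatabase(query, callback) {
    setTimeout(function () {
        callback(null, 'rows for ' + query);
    }, 10);
}

order.push('request received');

fakeReadFromDatabase('recent chats', function (err, rows) {
    // Runs later, once the simulated I/O has finished and the main thread
    // has nothing else to do.
    order.push('I/O finished: ' + rows);
    console.log(order);
});

// The main thread reaches this line immediately; it did not wait 10 ms.
order.push('main thread free for the next request');
```

When the array is finally printed, 'main thread free for the next request' appears before the I/O result, which is the behavior that lets a single thread keep accepting requests while I/O is in flight.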

What are the benefits of Node.js?

  1. Savings in I/O cost (i.e., high performance): Because of its architecture, Node.js provides high performance, much like the Nginx server, as shown below. (As a side note: Nginx uses an evented, non-blocking architecture, whereas Apache uses a multi-threaded architecture. Nginx doesn’t use Node.js; this is just an architecture comparison.)
  2. Savings in Memory: Again, because of its architecture, Node.js uses relatively little memory, much like the Nginx server, as shown below.
  3. JavaScript: Node.js uses a familiar and very popular language, JavaScript, and allows engineers to use a single language for both client and server. (You can also use CoffeeScript, which compiles to JavaScript and is tenth on this list: https://github.com/languages.)
  4. Thousands of libraries: High performance and a familiar language are great, but you really need libraries to get started. Although Node.js is relatively new, it already has nearly 11,000 libraries.
  5. Second most-watched project on GitHub: A large ecosystem of developers means better libraries and frameworks.

    2nd most popular watched on Github

When to use Node.js:

Use Node.js to:

  1. Build a (soft) real-time social app like Twitter or a chat app.
  2. Build high-performance, high I/O, TCP apps like proxy servers, PaaS, databases, etc.
  3. Build backend logging and processing apps.
  4. Build great CLI apps similar to vmc-tool, and build tools such as ant or Make.
  5. Add a RESTful API-based web server in front of an application server.

When NOT to use Node.js:

Node.js is not suitable for every application:

  1. Mission-critical (hard) real-time apps like heart monitoring apps or those that are CPU-intensive.
  2. For simple CRUD apps that don’t have any real-time or high-performance needs, Node.js does not provide much of an advantage over other languages.
  3. Enterprise apps that might need some specific libraries for which there may not be a Node.js library yet. (However, you could build a polyglot app that uses Java in conjunction with Node.js to help with libraries.)

What are the drawbacks of Node.js:

Most of the drawbacks are because Node.js itself is relatively new:

  1. Node.js libraries are developed actively with a high rate of change. There are newer versions of libraries literally every month. This can cause version issues and instabilities. Npm shrinkwrap and package.json were introduced a while back to set up standards, but the issue still exists.
  2. Many libraries, such as a SAML auth library of the kind enterprise apps often require, are not available yet.
  3. The whole callback, event-driven, functional programming aspect of Node.js can add a learning curve for server-side programmers coming from object-oriented languages. (Note: there are several libraries to help overcome this. One example is async. In addition, developers can use CoffeeScript, which compiles to JavaScript, to help with the learning curve.)
  4. Asynchronous and event-driven code is inherently more complex than synchronous code.
  5. JavaScript has more than its share of “bad parts” and might throw off engineers and newcomers. (Side note: Read some good JavaScript books like: JavaScript: The Good Parts if you are a newcomer.)
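
To illustrate the callback learning curve mentioned above, here is a sketch of the kind of helper that flow-control libraries provide: it runs asynchronous steps one after another without nesting callbacks inside callbacks. This is a hand-rolled example, not the async library's API; the name runInSeries and the task names are hypothetical.

```javascript
// Runs an array of tasks one at a time. Each task receives a Node-style
// callback (err, result). Stops at the first error; otherwise collects the
// results in order and hands them to 'done'.
function runInSeries(tasks, done) {
    var results = [];

    function next(index) {
        if (index === tasks.length) {
            return done(null, results);
        }
        tasks[index](function (err, result) {
            if (err) {
                return done(err, results);
            }
            results.push(result);
            next(index + 1);
        });
    }

    next(0);
}

// Usage: two simulated asynchronous steps, written as a flat list instead
// of one callback nested inside the other.
runInSeries([
    function loadUser(callback) {
        setTimeout(function () { callback(null, 'user loaded'); }, 5);
    },
    function loadMessages(callback) {
        setTimeout(function () { callback(null, 'messages loaded'); }, 5);
    }
], function (err, results) {
    console.log(err, results);
});
```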

What are other similar and newer languages and frameworks I should be aware of:

  1. vertx.io: Write your application components in JavaScript, Ruby, Groovy or Java. Or mix and match several programming languages in a single application.
  2. Erlang: Erlang is a programming language used to build massively scalable soft real-time systems with requirements on high availability. Some of its uses are in telecoms, banking, e-commerce, computer telephony and instant messaging. Erlang’s runtime system has built-in support for concurrency, distribution and fault tolerance.
  3. Twisted: Twisted is an event-driven networking engine written in Python and licensed under the open source MIT license.
  4. EventMachine: EventMachine is an event-driven I/O and lightweight concurrency library for Ruby. It provides event-driven I/O using the Reactor pattern.
  5. Scala: Scala is a general purpose programming language designed to express common programming patterns in a concise, elegant and type-safe way.
  6. Dart: With the Dart platform, you can write code that runs on servers and in modern web browsers. Dart compiles to JavaScript, so your Dart web apps will work in multiple browsers.
  7. Go: Go is an open source programming environment that makes it easy to build simple, reliable and efficient software.
That’s it! Hopefully this blog gave you a good overview of polyglot PaaS and Node.js. We want to get you on track to future-proof your next multi-year project.
Also, please be sure to join me for a live webinar: “Node.js Basics: An Introductory Training” on July 18, 10:00 a.m. PDT.
Want to try Node.js on Cloud Foundry?
Cloud Foundry provides a runtime environment for Node.js applications and the Cloud Foundry deployment tools automatically recognize Node.js applications. Simply follow the step-by-step instructions as described here: http://docs.cloudfoundry.com/frameworks/nodejs/nodejs.html and you will be on your way to running Node.js apps soon.

- Raja Rao DV (@rajaraodv – Developer Advocate, Cloud Foundry, (Node.js))
