Part 1: Set up an Apache Storm environment with Vagrant and Docker

Introduction

Anyone who hears the term Big Data automatically thinks of Hadoop and Spark, but for real-time analysis of Big Data, Apache Storm is hard to beat. Before we look more closely at Apache Storm, this article describes how to install Storm with Vagrant and Docker.

Why use Vagrant?

Vagrant is a high-level wrapper around virtualization software. It makes it easy to start virtual machines from scratch because it needs only a single file that describes the type of machine, the software to be installed, and how to access the machine.

Why use Docker?

Docker provides software containers in which applications run isolated from their surroundings. Docker uses the resource isolation features of the Linux kernel to run multiple containers within a single Linux instance, avoiding the overhead of starting and maintaining virtual machines.
Storm needs at least three nodes (ZooKeeper, Nimbus, and a Supervisor), so Docker is a good choice for running Storm on a single machine.

Preparations

Create the following directory structure:

/home/vagrant/storm
/home/vagrant/storm/docker
/home/vagrant/storm/docker/storm
/home/vagrant/storm/docker/zookeeper

Inside the Vagrant VM, the path /home/vagrant/storm is mapped to the path /vagrant. Within the VM, the above structure therefore appears as follows.

/vagrant
/vagrant/docker
/vagrant/docker/storm
/vagrant/docker/zookeeper

/home/vagrant/storm/docker/storm and /home/vagrant/storm/docker/zookeeper are the directories that hold the Docker build instructions (the Dockerfiles) for the two images.
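
The directory layout above can also be created in one step. The following is a minimal sketch, assuming a Ruby interpreter is available on the host; the function name create_storm_layout is made up for this example and is not part of Vagrant or Docker.

```ruby
require "fileutils"

# Hypothetical helper: create the project layout described above
# under a base directory. The default is the path used in this article.
def create_storm_layout(base = "/home/vagrant/storm")
  dirs = %w[docker/storm docker/zookeeper].map { |sub| File.join(base, sub) }
  # mkdir_p also creates the missing parents (base and base/docker)
  # and does nothing for directories that already exist.
  FileUtils.mkdir_p(dirs)
  dirs
end

# create_storm_layout   # uncomment to create the tree under /home/vagrant/storm
```

Passing a different base path, for example create_storm_layout("/tmp/storm"), builds the same tree elsewhere, which is handy for trying the setup without touching the home directory.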

Set up Vagrant

Download and install Vagrant on your system. Packages for your system are available on the Vagrant download page.
Change into the directory /home/vagrant/storm. After executing the command

$ vagrant init ubuntu/trusty64

a file named Vagrantfile is created. A few adjustments to this Vagrantfile are now needed. A ready-to-use Vagrantfile can be downloaded via the following link. An existing proxy is determined by the environment variable http_proxy.

The name that Vagrant generates for the VM is not very meaningful. Therefore, we name the VM "storm_vm" so that it is easier to identify. In addition, the main memory of the VM needs to be increased, because the default of 512 MB used by Vagrant is not sufficient. This is done with the following instruction:

config.vm.provider "virtualbox" do |v|
   v.name = "storm_vm"
   v.memory = 2048
end

To reach the Storm UI from outside the VM, port forwarding is required. We map port 8080 of the VM to port 8888 on the host.

config.vm.network "forwarded_port", guest: 8080, host: 8888

Set up Docker

Vagrant includes built-in support for Docker, but it does not bundle a specific Docker version. Instead, Docker is installed when the Vagrant box is first started, so you always get the latest version. Make sure that you are connected to the Internet.

config.vm.provision "docker"

This line installs Docker in the virtual machine.

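config.vm.provision "shell", inline: <<-SHELL
    # Create the network only if it does not exist yet, so re-provisioning does not fail
    sudo docker network inspect cluster >/dev/null 2>&1 || sudo docker network create cluster
 SHELL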

This creates a new network named cluster, to which all our containers are connected. With a user-defined network, the "--link" option is no longer needed when starting containers.
Now let’s build the images.

config.vm.provision "docker" do |d|
    d.build_image "/vagrant/docker/storm",
        args: "-t 'storm' --build-arg HTTP_PROXY=#{ENV['http_proxy']} --build-arg http_proxy=#{ENV['http_proxy']} --build-arg HTTPS_PROXY=#{ENV['http_proxy']} --build-arg https_proxy=#{ENV['http_proxy']}"
        
    d.build_image "/vagrant/docker/zookeeper",
        args: "-t 'zookeeper' --build-arg HTTP_PROXY=#{ENV['http_proxy']} --build-arg http_proxy=#{ENV['http_proxy']} --build-arg HTTPS_PROXY=#{ENV['http_proxy']} --build-arg https_proxy=#{ENV['http_proxy']}"

These statements build two Docker images, "storm" and "zookeeper". To assemble each image, Docker needs the correct instructions in a Dockerfile.
After a successful build, the containers should be started automatically. All containers must be on the network "cluster" and must use their readable names as hostnames. To make life easier, we map each log directory to a directory that is accessible from outside the VM.

    d.run "zookeeper",
      args: "--net=cluster --hostname=zookeeper"
      
    d.run "nimbus",
      image: "storm",
      cmd: "nimbus",
      args: "--net=cluster --hostname=nimbus -t -v /vagrant/docker/storm/nimbus:/opt/apache-storm-1.0.1/logs"
      
    d.run "supervisor",
      image: "storm",
      cmd: "supervisor",
      args: "--net=cluster --hostname=supervisor -t -v /vagrant/docker/storm/supervisor:/opt/apache-storm-1.0.1/logs"
      
    d.run "storm-ui",
      image: "storm",
      cmd: "ui",
      args: "--net=cluster -p 8080:8080 --hostname=storm-ui -t -v /vagrant/docker/storm/ui:/opt/apache-storm-1.0.1/logs"
  end
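
The two build_image calls above repeat the same four proxy arguments and pass them with empty values when http_proxy is not set. As a sketch of one way to tidy this up, the argument string could be built by a small helper defined at the top of the Vagrantfile. Both function names below are hypothetical and not part of Vagrant.

```ruby
# Hypothetical helper (not part of Vagrant): build the --build-arg options
# for "docker build" from a proxy URL. Returns "" when no proxy is set,
# so that no empty HTTP_PROXY= values are handed to Docker.
def proxy_build_args(proxy)
  return "" if proxy.nil? || proxy.strip.empty?

  %w[HTTP_PROXY http_proxy HTTPS_PROXY https_proxy]
    .map { |name| "--build-arg #{name}=#{proxy}" }
    .join(" ")
end

# Hypothetical helper: the complete argument string for one image tag.
# As in the article, the proxy is taken from the http_proxy variable.
def image_build_args(tag, proxy = ENV["http_proxy"])
  ["-t '#{tag}'", proxy_build_args(proxy)].reject(&:empty?).join(" ")
end
```

With these helpers in place, a build step would read d.build_image "/vagrant/docker/storm", args: image_build_args("storm"). With a proxy set, the resulting string is the same as the hand-written one above.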

Build instructions for image "storm"

Place the following contents into a file named Dockerfile in /home/vagrant/storm/docker/storm.

FROM ubuntu:14.04
MAINTAINER Carsten Zaddach
# Install curl, a Java 7 runtime and Python in one layer so the package index is never stale
RUN apt-get update && apt-get install -y \
    curl \
    java-common \
    ca-certificates-java \
    openjdk-7-jre \
    python
# Download Storm 1.0.1 and unpack it into /opt
RUN cd /opt \
    && curl -fsSL http://mirror.softaculous.com/apache/storm/apache-storm-1.0.1/apache-storm-1.0.1.tar.gz \
    | tar -xz
# Use the cluster configuration from the build directory
COPY storm.yaml /opt/apache-storm-1.0.1/conf/storm.yaml
# The container command (nimbus, supervisor, ui) is passed as an argument to bin/storm
ENTRYPOINT ["/opt/apache-storm-1.0.1/bin/storm"]

Create a file named storm.yaml in /home/vagrant/storm/docker/storm with the following contents.

########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
     - "zookeeper"
# 
nimbus.seeds : ["nimbus"]
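
The host names in storm.yaml only resolve if they match the --hostname values given to the containers in the Vagrantfile. The following sketch checks this, assuming a Ruby interpreter on the host. The function unknown_hosts and both constants are made up for this example.

```ruby
require "yaml"

# The settings from storm.yaml as shown in this article.
STORM_CONFIG = <<~CONFIG
  storm.zookeeper.servers:
       - "zookeeper"
  nimbus.seeds : ["nimbus"]
CONFIG

# Host names assigned via --hostname in the Vagrantfile.
CONTAINER_HOSTNAMES = %w[zookeeper nimbus supervisor storm-ui].freeze

# Hypothetical helper: return every host named in the configuration
# that is not provided by one of the containers.
def unknown_hosts(yaml_text, known_hosts)
  config = YAML.safe_load(yaml_text) || {}
  referenced = Array(config["storm.zookeeper.servers"]) + Array(config["nimbus.seeds"])
  referenced - known_hosts
end

# An empty list means every referenced host has a matching container.
p unknown_hosts(STORM_CONFIG, CONTAINER_HOSTNAMES)
```

If a container is renamed in the Vagrantfile without updating storm.yaml, the old name shows up in the returned list.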

Build instructions for image "zookeeper"

Place the following contents into a file named Dockerfile in /home/vagrant/storm/docker/zookeeper.

FROM ubuntu:14.04
MAINTAINER Carsten Zaddach
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y zookeeper
CMD /usr/share/zookeeper/bin/zkServer.sh start-foreground

Ready to run

Now start the process of creating the VM and its Docker containers.

$ vagrant up

After successful provisioning, open your browser and go to http://localhost:8888 to see the Storm UI.
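
The Storm UI may need a short while to come up after provisioning has finished. As a rough sketch, the forwarded port can be probed from the host with Ruby's standard Net::HTTP library. The function name ui_reachable? is made up for this example, and the URL is the forwarded port configured above.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: true if an HTTP server answers at the URL with a
# 2xx or 3xx status. Any connection problem (refused, timeout, DNS error)
# is treated as "not reachable".
def ui_reachable?(url, timeout: 5)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             open_timeout: timeout, read_timeout: timeout) do |http|
    http.get(uri.path.empty? ? "/" : uri.path)
  end
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
rescue StandardError
  false
end

status = ui_reachable?("http://localhost:8888") ? "reachable" : "not reachable yet"
puts "Storm UI is #{status}"
```

If the check keeps failing, the log directories mapped under /home/vagrant/storm/docker/storm are a good first place to look.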

If you have any questions or suggestions, do not hesitate to contact us.

 

Part 2: Install Web Based Integrated Development Environment (IDE)
