Thursday, November 17, 2011

Ten Commands Every Linux Developer Should Know

A few simple utilities can make it easier to figure out and maintain other people's code.
This article presents a list of commands you should be able to find on any Linux installation. These are tools to help you improve your code and be more productive. The list comes from my own experience as a programmer and includes tools I've come to rely on repeatedly. Some tools help create code, some help debug code and some help reverse engineer code that's been dumped in your lap.

1. ctags
Those of you addicted to integrated development environments (IDEs) have probably never heard of this tool, or if you have, you probably think it's obsolete. But a tags-aware editor is a productive programming tool.

Tagging your code allows editors like vi and Emacs to treat your code like hypertext (Figure 1). Each object in your code becomes hyperlinked to its definition. For example, if you are browsing code in vi and want to know where the variable foo was defined, type :ta foo. If your cursor is already on the variable, simply press Ctrl-].


Figure 1. gvim at Work with Tags

The good news for the vi-impaired is ctags is not only for C and vi anymore. The GNU version of ctags produces tags that can be used with Emacs and many other editors that recognize tag files. In addition, ctags recognizes many languages other than C and C++, including Perl and Python, and even hardware design languages, such as Verilog. It even can produce a human-readable cross-reference that can be useful for understanding code and performing metrics. Even if you're not interested in using ctags in your editor, you might want to check out the human-readable cross-reference by typing ctags -x *.c*.

What I like about this tool is that you get useful information whether you input one file or one hundred files, unlike many IDEs that aren't useful unless they can see your entire application. It's not a program checker, so garbage in, garbage out (GIGO) rules apply.
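As a sketch of a typical workflow, assuming Exuberant or Universal ctags (the older BSD ctags lacks -R and most of the language support):

```shell
# Index every supported source file below the current directory;
# vi and Emacs pick definitions out of the resulting tags file
ctags -R .

# Or skip the editor and print the human-readable cross-reference
ctags -x *.c
```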

2. strace
strace lets you decipher what's going on when you have neither a debugger nor the source code. One of my pet peeves is a program that doesn't start and doesn't tell you why. Perhaps a required file is missing or has the wrong permissions. strace can tell you what the program is doing right up to the point where it exits. It can tell you what system calls the program is using and whether they pass or fail. It even can follow forks.

strace often gives me answers much more quickly than a debugger, especially if the code is unfamiliar. On occasion, I have to debug code on a live system with no debugger. A quick run with strace sometimes can avoid patching the system or littering my code with printfs. Here is a trivial example of me as an unprivileged user trying to delete a protected file:

strace -o strace.out rm -f /etc/yp.conf
The output shows where things went wrong:

lstat64("/etc/yp.conf", {st_mode=S_IFREG|0644, st_size=361, ...}) = 0
access("/etc/yp.conf", W_OK) = -1 EACCES (Permission denied)
unlink("/etc/yp.conf") = -1 EACCES (Permission denied)
strace also lets you attach to processes for just-in-time debugging. Suppose a process seems to be spending a lot of time doing nothing. A quick way to find out what is going on is to type strace -c -p mypid. After a second or two, press Ctrl-C and you might see a dump something like this:

% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
91.31 0.480456 3457 139 poll
6.66 0.035025 361 97 write
0.91 0.004794 16 304 futex
0.52 0.002741 14 203 read
0.31 0.001652 3 533 gettimeofday
0.26 0.001361 4 374 ioctl
0.01 0.000075 8 10 brk
0.01 0.000064 64 1 clone
0.00 0.000026 26 1 stat64
0.00 0.000007 7 1 uname
0.00 0.000005 5 1 sched_get_priority_max
0.00 0.000002 2 1 sched_get_priority_min
------ ----------- ----------- --------- --------- ----------------
100.00 0.526208 1665 total
In this case, it's spending most of its time in the poll system call—probably waiting on a socket.

3. fuser
The name is a mnemonic for file user, and the command tells you which processes have a given file open. It also can send a signal to all those processes for you. Suppose you want to delete a file but can't because a program has it open and won't close it. Instead of rebooting, type fuser -k myfile. This sends a SIGTERM to every process that has myfile open.

Perhaps you need to kill a process that forked itself all over the place, intentionally or otherwise. An unenlightened programmer might type something like ps | grep myprogram. This inevitably would be followed by several cut-and-paste operations with the mouse. An easier way is to type fuser -k ./myprogram, where myprogram is the pathname of the executable. fuser typically is located in /sbin, which generally is reserved for system administrative tools. You can add /usr/sbin and /sbin to the end of your $PATH.
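A self-contained sketch of that workflow; the scratch file /tmp/busy.txt and the background tail are invented here to stand in for a stubborn process:

```shell
# Hold a scratch file open in the background, then ask fuser who has it
: > /tmp/busy.txt
tail -f /tmp/busy.txt &

sleep 1
fuser -v /tmp/busy.txt   # lists the tail process holding the file open
fuser -k /tmp/busy.txt   # sends SIGTERM to it, freeing the file
```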

4. ps
ps is used to find process status, but many people don't realize it also can be a powerful debugging tool. To get at these features, use the -o option, which lets you access many details of your processes, including CPU usage, virtual memory usage, current state and much more. Many of these options are defined in the POSIX standard, so they work across platforms.

To look at your running commands by pid and process state, type ps -e -o pid,state,cmd. The output looks like this:

4576 S /opt/OpenOffice.org1.1.0/program/soffice.bin -writer
4618 D dd if=/dev/cdrom of=/dev/null
4619 S bash
4645 R ps -e -o pid,state,cmd
Here you can see my dd command is in an uninterruptible sleep (state D). Basically, it is blocking while waiting for /dev/cdrom. My OpenOffice.org writer is sleeping (state S) while I type my example, and my ps command is running (state R).

For an idea of how a running program is performing, type:

ps -o start,time,etime -p mypid
This shows the basic output from the time command, discussed later, except you don't have to wait until your program is finished.

Most of the information that ps produces is available from the /proc filesystem, but if you are writing a script, using ps is more portable. You never know when a minor kernel rev will break all of your scripts that are mining the /proc filesystem. Use ps instead.
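As a sketch of that portability point, the field names below (etime, time) are POSIX-specified, and a trailing = suppresses each column header so a script can parse the output:

```shell
# Elapsed wall-clock time and accumulated CPU time of a single process;
# $$ (the current shell) stands in for any pid of interest
ps -o etime=,time= -p $$
```

This is the same information the time command reports, but for a process that is still running.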

5. time
The time command is useful for understanding your code's performance. The most basic output consists of real, user and system time. Intuitively, real time is the amount of time between when the code started and when it exited. User time and system time are the amount of time spent executing application code versus kernel code, respectively.

Two flavors of the time command are available. The shell has a built-in version that tells you only scheduler information. A version in /usr/bin includes more information and allows you to format the output. You easily can override the built-in time command by preceding it with a backslash, as in the examples that follow.

A basic knowledge of the Linux scheduler is helpful in interpreting the output, but this tool also is helpful for learning how the scheduler works. For example, the real time of a process typically is larger than the sum of the user and system time. Time spent blocking in a system call does not count against the process, because the scheduler is free to schedule other processes during this time. The following sleep command takes one second to execute but takes no measurable system or user time:

\time -p sleep 1
real 1.03
user 0.00
sys 0.00
The next example shows how a task can spend all of its time in user space. Here, Perl calls the log() function in a loop, which requires nothing from the kernel:

\time perl -e 'log(2.0) foreach(0..0x100000)'
real 0.40
user 0.20
sys 0.00
This example shows a process using a lot of memory:

\time perl -e '$x = "a" x 0x1000000'

0.06user 0.12system 0:00.22elapsed 81%CPU
(0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (309major+8235minor)pagefaults
0swaps
The useful information here is listed as pagefaults. Although the GNU time command advertises a lot of information, the 2.4 series of the Linux kernel stores only major and minor page-fault information. A major page fault is one that requires I/O; a minor page fault does not.

6. nm
This command allows you to retrieve information on symbol names inside an object file or executable file. By default, the output gives you a symbol name and its virtual address. What good is that? Suppose you are compiling code and the compiler complains that you have an unresolved symbol _foo. You search all of your source code and cannot find where you use this symbol. Perhaps it got pulled in from some template or a macro buried in one of the dozens of include files compiled along with your code. The command:

nm -guA *.o | grep foo
shows all the modules that refer to foo. If you want to find out what library defines foo, simply use:

nm -gA /usr/lib/* | grep foo
The nm command also understands how to demangle C++ names, which can be handy when mixing C and C++. For example, forgetting to declare a C function with extern "C" produces a link-time error something like this:

undefined reference to `cfunc(char*)'
In a large project with poorly defined headers, you might have a hard time tracking down the offending module. In this case, you can look for all the unresolved symbols in each object file with demangling turned on as follows:

nm -guC *.o
extern-c.o:cfunc
no-extern-c.o:cfunc(char*)
The first module is correct; the second is not.

7. strings
This command looks for ASCII strings embedded in binary files. It can be used for good or for evil. The good uses include trying to figure out what library is producing that cryptic string on stdout every once in a while, for example:

strings -f /usr/lib/lib* | grep "cryptic message"
On the evil side, the same tool can be used to probe your program's embedded text, including format strings, looking for clues and vulnerabilities. This is why you should never put passwords and logins in your programs. It might be wise to examine your own programs with this tool and see what a clever attacker can see. The version of strings that comes with the GNU binutils has many useful options.
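A minimal, benign sketch using a throwaway file instead of a real library; the file name and contents are invented for the demo:

```shell
# Mix readable text with raw bytes, then recover the text with strings
printf 'MAGIC\000\001\002a cryptic message\000more bytes' > blob.bin
strings blob.bin   # prints each run of four or more printable characters
```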

8. od, xxd
These two commands do basically the same thing, but each offers slightly different features. od is used to convert a binary file to whatever format you like. When dealing with programs that generate raw binary files, od can be indispensable. Although the name stands for octal dump, it can dump data in decimal and hexadecimal as well. od dumps integers, IEEE floats or plain bytes. When looking at multibyte integers or floats, the host byte order affects the output.

xxd also dumps binary files but does not try to interpret them as integers or floats, so the host byte order does not affect the output, which can be confusing or helpful depending on the file. Let's create a four-byte file on an Intel machine:

$ echo -n abcd > foo.bin
$ od -tx4 foo.bin
0000000 64636261
0000004

$ xxd -g4 foo.bin
0000000: 61626364 abcd
The output of od is a byte-swapped 32-bit integer, and the output of xxd is a group of four bytes in the same byte order as they appear in the file. If you're looking for the string abcd, xxd is the command for you. But, if you're looking for the 32-bit number 0x64636261, od is the right command.
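The byte-order distinction is easy to reproduce with od alone; only the single-byte view is independent of the host:

```shell
echo -n abcd > foo.bin

od -An -tx4 foo.bin   # one 32-bit word; prints 64636261 on little-endian hosts
od -An -tx1 foo.bin   # bytes in file order; prints 61 62 63 64 everywhere
```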

xxd also knows a few cool tricks that od doesn't, including the ability to format the output in binary and to translate a binary file into a C array. Suppose you have a binary file that you want to encode inside an array in your C program. One way to do this is by creating a text file as follows:

$ xxd -i foo.bin

unsigned char foo_bin[] = {
0x61, 0x62, 0x63, 0x64
};

unsigned int foo_bin_len = 4;

9. file
UNIX and Linux have never enforced any policy of filename extensions. Naming conventions have evolved, but they are guidelines, not policies. If you want to name your digital picture image00.exe, go ahead. Your Linux photo application gladly accepts the file no matter what it is named, although the name may be hard to remember.

The file command can help when you have to retrieve a file from a brain-dead Web browser, which mangles the name—say a file that should have been named foo.bar.hello.world.tar.gz comes out as foo.bar. The file command can help like this:

$ file foo.bar

foo.bar: gzip compressed data,
was "foo.bar.hello.world.tar", from Unix
Perhaps you received a distribution with a bin directory full of dozens of files, some of which are executables and some are scripts. Suppose you want to pick out all the shell scripts. Try this:

$ file /usr/sbin/* | grep script

/usr/sbin/makewhatis: a /bin/bash script text
executable

/usr/sbin/xconv.pl: a /usr/bin/perl script
text executable
The file command identifies all the files in the bin directory, and the grep command filters out everything not a script. Here are some more examples:

file core.4867

core.4867: ELF 32-bit LSB core file Intel 80386,
version 1 (SYSV), SVR4-style, from 'abort'

file /boot/initrd-2.4.20-6.img

/boot/initrd-2.4.20-6.img: gzip compressed data,
from Unix, max compression

file -z /boot/initrd-2.4.20-6.img

/boot/initrd-2.4.20-6.img: Linux rev 1.0 ext2
filesystem data (gzip compressed data, from Unix,
max compression)
Just as you shouldn't judge a book by its cover, you shouldn't assume the contents of a file based on its name.

10. objdump
This is a more advanced tool and is not for the faint of heart. It's sort of a data-mining tool for object files. A treasure trove of information is encoded inside your object code, and this tool lets you see it. One useful thing this tool can do is dump assembly code mixed with source lines, something gcc -S doesn't do for some reason. Your object code must be compiled with debug (-g) for this to work:

objdump --demangle --source myobject.o
objdump also can help extract binary data from a core file for postmortem debugging when you don't have access to a debugger. A complete example is too long for this article, but you need the virtual address from nm or objdump -t. Then, you can dump the file offsets for each virtual address with objdump -x. Finally, objdump is able to read from non-ELF file formats that gdb and other tools can't touch.
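A minimal end-to-end sketch of the mixed source/assembly dump, assuming gcc and binutils are installed; the source file and function name are invented for the demo:

```shell
# Build an object file with debug info, then interleave source and assembly
cat > square.c <<'EOF'
int square(int x) { return x * x; }
EOF
gcc -g -c square.c
objdump --demangle --source square.o
```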
Author: John Fusco

Saturday, October 8, 2011

Performance Analysis of OpenVPN on a Consumer Grade Router

Author: Michael Hall

Abstract:

Virtual Private Networks (VPNs) offer an alternative solution using Internet Protocol (IP) tunnels to create secure, encrypted communication between geographically distant networks using a common shared medium such as the Internet. They use tunneling to establish end-to-end connectivity. OpenVPN is a cross-platform, secure, highly configurable VPN solution. Security in OpenVPN is handled by the OpenSSL cryptographic library which provides strong security over a Secure Socket Layer (SSL) using standard algorithms such as Advanced Encryption Standard (AES), Blowfish, or Triple DES (3DES). The Linksys WRT54GL router is a consumer-grade router made by Linksys, a division of Cisco Systems, capable of running under Linux. The Linux-based DD-WRT open-source router firmware can run OpenVPN on the Linksys WRT54GL router. For this case study, the performance of OpenVPN is measured and analyzed using a 2^(k-p) fractional factorial design with k = 5 factors and p = 1. The results show that the throughput is mainly limited by the encryption cipher used, and that the round-trip time (RTT) is mostly dependent on the transport protocol selected.
Keywords: Virtual Private Network, Performance Analysis, WRT54GL, DD-WRT, OpenVPN, OpenSSL, Experimental Design, Fractional Factorial Design

1 Introduction

In the past, enterprises have used leased lines over long distances for secure communication between two networks. Typically this is done in order to communicate data, voice, or other traffic between two geographically-separated sites of a company or with a valued business partner. Leased lines provide dedicated bandwidth and a private link between the two locations. Running leased lines is not always possible or practical for all enterprises and everyday users due to cost, space, and time of installation [Joha08]. Thus, an alternative solution is needed.

Virtual Private Networks (VPNs) were created to address this problem by using the Internet to facilitate communications. Internet access is cheap; however, it is insecure and often bandwidth limited. VPNs are designed to create secure, encrypted Internet Protocol (IP) tunnels to communicate between geographically-distant networks across the Internet. This solution is cost-effective, available to companies and individuals alike, and provides secure access to resources on the remote network.

For this case study, the performance of OpenVPN running under Linux on the Linksys WRT54GL router is analyzed. The sections that follow give background information on VPNs, and describe the VPN solution and router used in this case study.

1.1 Background

Tunneling is a method by which data is transferred across a network between two endpoints. VPNs use tunnels to establish end-to-end connectivity. A packet or frame destined to a remote network is first encapsulated by adding additional header information and is then sent across the network to the remote endpoint. At the endpoint, the header information is removed and the packet is sent out onto the remote network [Joha08]. This process is shown in Figure 1.

Figure 1: VPN tunnel between two endpoints across a network [Joha08]

There are tradeoffs to using a VPN solution compared to dedicated lines. A VPN offers benefits such as flexibility, transparency, security, and cost. However, it has some drawbacks such as availability and bandwidth [Kolesnikov02]. A VPN connection is very flexible because a user can connect to the remote network from anyplace with an Internet connection. Transparency is achieved through tunneling which allows arbitrary traffic to traverse the VPN. For VPNs, security is provided using authentication and encryption. Authentication restricts access to the network by allowing only authorized users to connect. Encryption provides privacy by scrambling the data in the tunnel. The cost of a VPN is much less than the cost of running dedicated lines, particularly if a freely available open source VPN solution is used. VPN solutions are typically deployed to provide access over the Internet which sometimes varies in the availability and bandwidth of the connection. In this case, dedicated lines provide a clear advantage. They are both highly available and provide guaranteed bandwidth.

In general, a VPN solution should take into consideration security, key distribution, scalability, transport protocol, interoperability, and cross-platform availability [Kolesnikov02]. Security is perhaps the biggest concern because there are so many ways to implement security incorrectly. Key distribution is related to security and has to do with the procedure by which keys are distributed to clients. If keys are distributed in an insecure manner, they can be intercepted, allowing an intruder to gain access to the private network. Scalability refers to how well a VPN solution scales in terms of the number of connections and sites. The transport protocol has an effect on the overhead and performance of the VPN tunnel which will be described in a later section. Interoperability refers to devices running the same VPN solution being able to work with each other. Simple, well thought out designs tend to be the more interoperable. Last, cross-platform availability allows the VPN solution to work with multiple operating system platforms.

In the next section, the VPN solution used in this case study which takes these points into consideration is described.

1.2 The VPN

OpenVPN is a cross-platform, secure, highly configurable VPN solution [OpenVPN]. It uses virtual interfaces provided by the universal Network TUNnel/TAP (TUN/TAP) driver and is implemented entirely in user-mode in the least privileged protection ring of the system. This decision was made to provide better security. If a vulnerability is found by an intruder, their access will be limited. However, this does affect performance due to multiple memory copies between kernel and user space. OpenVPN supports peer-to-peer and multi-client server configurations, which make many VPN topologies possible: host-host, host-network, and network-network. It supports creating a Layer 3 or Layer 2 VPN using TUN or TAP devices, respectively [Feilner06].

Security in OpenVPN is handled by the OpenSSL cryptographic library [OpenSSL] which provides strong security over Secure Socket Layer (SSL) using standard algorithms such as Advanced Encryption Standard (AES), Blowfish, or Triple DES (3DES). Certificates are used for authentication, and symmetric and asymmetric ciphers for encryption. A cipher has several characteristic parameters: key length, block size, and mode. Key length dictates the strength of the cipher. The block size dictates how much data is encrypted in a block. The mode dictates how the encryption cipher is actually used. Other important factors are key distribution and the cryptographic strength of the cipher. OpenSSL uses symmetric and asymmetric ciphers as part of the overall security. However, the security is only as strong as the weakest link. Kolesnikov and Hatch give an example: if a 40-bit symmetric key and a 4096-bit asymmetric key are used for the ciphers, the 40-bit key will likely be the weakest link, making the 4096-bit asymmetric key unnecessarily large [Kolesnikov02].

In OpenSSL, block ciphers are used for symmetric encryption and can be used in different modes. OpenVPN uses a mode called Cipher Block Chaining (CBC) which makes the cipher text of the current block dependent on the cipher text of the previous block. This prevents an attacker from seeing patterns between blocks with identical plaintext messages and manipulating one or more of these blocks [Kolesnikov02].
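As an illustration of CBC mode only, and not of how OpenVPN itself manages keys, here is a round trip through the openssl command-line tool; the hard-coded hex key and IV are for demonstration and must never be used in a real deployment:

```shell
echo 'attack at dawn' > msg.txt

# Encrypt with AES-128 in CBC mode (-K and -iv take hex values)
openssl enc -aes-128-cbc \
    -K 000102030405060708090a0b0c0d0e0f \
    -iv 0102030405060708090a0b0c0d0e0f10 \
    -in msg.txt -out msg.enc

# Decrypt with the same key and IV, then confirm the round trip
openssl enc -d -aes-128-cbc \
    -K 000102030405060708090a0b0c0d0e0f \
    -iv 0102030405060708090a0b0c0d0e0f10 \
    -in msg.enc -out msg.dec
cmp msg.txt msg.dec && echo 'round trip OK'
```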

The philosophy for judging the security of an encryption cipher is based on the test of time. A cipher that has stood scrutiny of the security community for many years with its details published is generally considered strong. If the cipher had any major flaws, they likely would have been found. Some of the criteria for selecting a cipher are security, performance, and availability. The cipher selected should meet security needs [Kolesnikov02].

The cross-platform support in OpenVPN allows it to be deployed to other systems including embedded routers. The router used in this case study is described next.

1.3 The Router

The Linksys WRT54GL router is a consumer-grade router made by Linksys [Linksys], a division of Cisco Systems, capable of running under Linux. One actively developed Linux firmware is DD-WRT [DD-WRT], which is based on the OpenWrt kernel [OpenWrt]. DD-WRT is released under the GNU General Public License (GPL) and provides an alternative to the stock firmware in the Linksys router. The firmware allows the router to take on many roles: Internet gateway, VPN gateway, firewall, wireless access point, dynamic Domain Name Service (DNS) client, etc. It has a friendly web interface and supports many features beyond the router's original capabilities. It supports OpenVPN through special firmware and can be extended from the console using packages. Console access is given by both Secure Shell (SSH) and Telnet. For basic and advanced configurations, tutorials are available [DD-WRT Tutorials].

OpenVPN, which is supported in the DD-WRT firmware [OpenVPN/DD-WRT wiki], can be used on the router in a variety of different ways. First, key management can be maintained on the router. This allows certificates to be generated for users from the console; however, due to a limited amount of flash memory available, this requires a modification to the Linksys router to add a SecureDigital/MultiMediaCard (SD/MMC) memory card. A tutorial on the OpenWrt wiki describes how to do this [MMC Tutorial]. Second, the VPN package can be configured in several different topologies such as host-network or network-network depending on whether users will connect to the VPN or will access the remote network using a site-to-site connection. Last, the VPN virtual interface can be bridged to the physical network interface, allowing Ethernet frames to traverse between clients and the private network.

VPNs provide a cost-effective alternative solution to leased lines and are able to create secure connections between two end-points. OpenVPN is a VPN solution which can run on an embedded router running Linux. The Linksys WRT54GL router can run Linux through a firmware upgrade to the DD-WRT firmware. In the next section, the characteristics of a VPN will be discussed as they relate to performance analysis.

2 VPN Characteristics

There are many characteristics of a VPN that affect the performance of the system. In the sections that follow, the transport protocol is discussed, a set of performance metrics is defined, and the system parameters are identified.

2.1 Transport Protocol

The transport protocol used for the VPN tunnel will have an impact on the performance of the VPN. If the Transmission Control Protocol (TCP) is used, it has an undesirable effect when TCP is also used inside the tunnel. This is called TCP stacking. TCP is a connection-oriented protocol that was not designed to be stacked. It assumes an unreliable medium and retransmits packets when a timeout occurs. TCP uses an adaptive timeout which increases exponentially to avoid an effect known as meltdown. The problem occurs when both TCP layers time out, which happens when the base connection loses packets. The lower layer queues a retransmission and increases its timeout, trying not to break the connection, while the upper-layer protocol queues retransmissions faster because its timeout value is smaller. This causes the meltdown effect that TCP was originally trying to prevent. The User Datagram Protocol (UDP), a datagram carrier having the same characteristics as IP, should be used as the lower-layer protocol [Titz01].

2.2 Performance Metrics

Network performance is measured using a set of performance criteria or metrics. For OpenVPN, the service provided is access to the private network. The response time, throughput, and utilization are used to characterize the performance of the VPN. In the case that errors occur, the probability of errors and time between errors should be measured [Jain91]. A list of selected performance metrics is below:

  1. Overhead
  2. Round-trip time
  3. Jitter
  4. TCP throughput
  5. Router CPU utilization
  6. Client CPU utilization
  7. Probability of error
  8. Time between errors
  9. Link utilization

Every VPN packet incurs overhead from the encapsulation process. When a payload is sent through the VPN tunnel, headers/trailers of various protocols are added to the payload to form a routable packet. Additional overhead comes from encryption ciphers used to secure the tunnel. The effects of overhead can be alleviated by using compression to reduce the amount of data transmitted [Khanvilkar04].

The overhead in OpenVPN is a function of the interface, transport protocol, cryptographic algorithm, and compression. The fixed overhead added to each packet is 14 bytes from the frame header and 20 bytes from the IP header. The transport protocol, used to form the VPN tunnel, contributes 8 (32) bytes from the UDP (TCP) header. The cryptographic algorithm used to secure the tunnel contributes overhead that depends on the algorithm. Part of this overhead is the hash from the message authentication code (MAC) algorithm, such as MD5 (128 bits) or SHA-1 (160 bits), plus zero padding for block encryption ciphers. Compression of incompressible data adds at most one byte of overhead. Other minor contributions come from sequence numbers and timestamps that are included to defeat replay attacks [Khanvilkar04].
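Using the numbers above, the fixed per-packet overhead of a UDP tunnel authenticated with SHA-1 can be tallied in shell arithmetic; block-cipher padding, sequence numbers, and timestamps are left out of this sketch:

```shell
frame=14   # Ethernet frame header
ip=20      # IP header
udp=8      # UDP header (a TCP tunnel would add 32 instead)
sha1=20    # 160-bit SHA-1 MAC
echo "$((frame + ip + udp + sha1)) bytes of fixed overhead per packet"   # 62
```

Swapping in the TCP header (32 bytes) raises the tally to 86 bytes per packet.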

The round-trip time (RTT) is the time it takes for a packet to reach a remote host and return, and it is related to the latency of the connection. Latency through a VPN tunnel is dependent on the machine hardware, the link speed, and the encapsulation time. Higher latencies in OpenVPN are caused by multiple copies between kernel and user space, and the compute-intensive operations of encryption and compression. Latency can be improved, generally, by using faster hardware and better algorithms [Khanvilkar04]. Jitter is the variation in the latency of packets received by a remote host. For applications with streaming connections, jitter can be alleviated by buffering the stream. However, this adds delay to the connection, which is intolerable for some applications such as Voice over Internet Protocol (VoIP). Low latency and low jitter are better for these metrics.

Throughput is a measure of the amount of payload data that can be transmitted end-to-end through the VPN tunnel. It does not include the overhead incurred by protocol headers/trailers and the VPN tunnel. As with latency, throughput is limited by the machine hardware and encapsulation time, and it can be improved by using faster hardware and better algorithms. Throughput is a critical performance metric that limits the number of users the VPN can support. Thus, higher throughput is better.

The performance of a VPN solution is often limited by the CPU on one or both of the endpoints which must encapsulate, encode, transmit, receive, and decode packets. Monitoring the CPU utilization of each device allows us to identify the bottleneck in the network communication. For this metric, utilization in the middle of the range is better.

The link utilization is the ratio of the physical network interface throughput to the link speed. In this case, the throughput is the total throughput of all packets transmitted, including overhead. This metric is not directly useful by itself; however, it can be used in a calculation to gauge the efficiency of the packets transmitted through the VPN tunnel. Higher link utilization is better only when the throughput is also higher.

Errors can sometimes occur in network communication, causing packets to be lost, corrupted, duplicated, or reordered. When an error occurs, it is important to know the probability of it happening again and the time between errors. A related metric is packet loss, which gives the percentage of packets that were lost or corrupted. No errors is ideal, but a low error rate is acceptable.

2.3 System Parameters

The performance in OpenVPN is affected by many parameters ranging from the hardware to the configuration. A list of these parameters is below.

  1. Network topology
  2. Memory
  3. Speed of the router CPU
  4. Speed of the network
  5. VPN topology
  6. Interface
  7. Transport Protocol
  8. Encryption cipher
  9. Encryption key size
  10. Compression algorithm

The network topology will affect system performance. For example, a topology in which the client is located far from the OpenVPN server will have to contend with network traffic unlike a client that is directly connected. Hardware factors such as memory, speed of the router CPU, and speed of the network can all affect system performance. At least one of these three hardware factors will be a bottleneck in the system. The VPN topology, such as host-host, host-network, and network-network, will affect system performance. The choice of the interface, transport protocol, encryption cipher, key size, and compression algorithm will all affect system performance. This is due to additional overhead and compute-intensive operations.

If TCP is used as the transport protocol for the VPN tunnel, it will have an impact on network performance. UDP, which has the same characteristics as IP, does not suffer from the meltdown caused by stacked retransmissions. There are many metrics that are used to measure network performance. While not all of them are significant, each one should nonetheless be included in the performance analysis. There are many system parameters that affect the performance of the VPN, but only a subset of these parameters can be changed. For the performance analysis done in the next section, a set of factors was chosen and varied in a 2^(k-p) fractional factorial design.

3 Performance Analysis

There are three evaluation techniques that can be used for performance analysis: analytical modeling, simulation, and measurement. For assessing the performance of OpenVPN on the Linksys WRT54GL router, measurement was chosen. The performance can be measured and analyzed through a series of experiments using the experimental design method described by Jain, called 2^(k-p) fractional factorial design [Jain91]. This method determines the effect of each factor and their interactions, along with the percent of variation explained by each effect. Confidence intervals can also be calculated to determine the significance of each effect. In the sections that follow, the measurement tools used and the experimental setup are described. The results of the study are then presented.
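To make the method concrete, here is a minimal Python sketch of Jain's sign-table calculation for a 2^(5-1) design. The response values `ys` are placeholders, not measurements from this study; the column labels match the design used later in this article.

```python
from itertools import combinations, product

# Sign table for a 2^(5-1) design: four base factors A-D, with factor E
# aliased to the ABCD interaction (defining relation I = ABCDE).
runs = [dict(zip("ABCD", lv)) for lv in product([-1, 1], repeat=4)]
for r in runs:
    r["E"] = r["A"] * r["B"] * r["C"] * r["D"]

# Placeholder responses, one per run (NOT the measured bandwidths).
ys = [8.9, 8.0, 3.6, 3.7, 6.2, 6.3, 3.3, 3.8,
      7.4, 7.3, 3.4, 3.7, 5.7, 6.1, 3.1, 3.0]

def column(run, names):
    """Sign of an effect column: the product of its factor levels."""
    sign = 1
    for n in names:
        sign *= run[n]
    return sign

# Effect q_x = (1/16) * sum(sign_i * y_i); the ABCD column is labeled E.
effects = {}
for k in range(1, 5):
    for names in combinations("ABCD", k):
        label = "E" if names == ("A", "B", "C", "D") else "".join(names)
        effects[label] = sum(column(r, names) * y
                             for r, y in zip(runs, ys)) / len(runs)

# Percent variation explained by each effect: SS_x / SST with SS_x = 16 * q_x^2.
ss = {label: 16 * q * q for label, q in effects.items()}
pct = {label: 100 * s / sum(ss.values()) for label, s in ss.items()}
```

With real data, the confidence-interval procedure from [Jain91] would then decide which of these effects are statistically significant.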

3.1 Measurement Tools

Assessing the performance of OpenVPN requires the use of several measurement tools for generating, measuring, and monitoring network traffic. The tools used in this case study are wireshark, iperf, ping, and sar. Of these four tools, wireshark, iperf, and ping are available for both Windows and Linux. Although Windows has a ping tool, it is limited and not sufficient for the latency and packet loss tests in this case study. The last tool, sar, is available only on Linux. A description of each tool is given below.

  • Wireshark, formerly Ethereal, is a network protocol analyzer with a rich feature set for capturing and analyzing network traffic. It offers deep inspection and filtering capabilities for hundreds of protocols, making it a valuable tool for monitoring network traffic [Wireshark]. In this case study, it was used to monitor VPN-encapsulated packets and normal packets.

  • Iperf is a network testing tool for creating and measuring TCP and UDP streams. It has options for controlling several network parameters including maximum segment size (MSS), buffer length, TCP window size, and TCP no delay (for disabling Nagle's Algorithm). The TCP test will generate traffic at full speed and measure the bandwidth between two endpoints. The UDP test will generate traffic at a given bandwidth and measure the jitter (variation in the latency) and packet loss between two endpoints [Iperf].

  • Ping is a network testing tool for measuring latency and packet loss between two endpoints using the Internet Control Message Protocol (ICMP). A large number of packets can be transmitted using a flood ping. A flood ping works by transmitting one packet at a time and waiting for a reply or timeout. If a timeout occurs, the packet is counted as lost.

  • Sar is a system activity collection and reporting tool found in the sysstat utilities package [SYSSTAT]. It is able to collect and report information on CPU and network interface activity over a period of time. This information can be collected in parallel with the TCP bandwidth, UDP jitter, and latency tests.

3.2 Experimental Setup

The goal of this case study is to evaluate the performance of OpenVPN on a consumer grade router running the DD-WRT firmware. The router is a Linksys WRT54GL v1.1 with 16 MB RAM, 4 MB flash memory, and a 200 MHz processor.

Figure 2: System definition for the study of OpenVPN on a consumer grade router


The system definition, shown in Figure 2, consists of two systems connected to a router in the middle. The first system is the OpenVPN client which needs to establish a VPN tunnel to access the internal private network. It is the first test endpoint for the performance tests. The router is the OpenVPN server which is the system under test (SUT) [Jain91]. The second system is a computer on the private network which is the second test endpoint. The specifications of these two test systems are shown in Table 1.

System            Description
Test endpoint 1   CentOS 5.2 Linux, AMD Athlon XP-M 2600+, 1.83 GHz,
(test client)     2.0 GB of RAM, ASUS A7V880 motherboard, VIA KT880 chipset
Test endpoint 2   Windows XP Professional, Service Pack 3, AMD Athlon XP 2000+
(test server)     (Thoroughbred), 1.67 GHz, 768 MB of RAM, ASUS A7V8X-X motherboard,
                  VIA KT400 chipset
Table 1: System specifications for test endpoints

To facilitate the testing process, a Python script on test endpoint 1 is used to automatically run each test and collect the results, which are saved to a log file. Three tests are run: a TCP bandwidth test, a UDP jitter test, and a latency test. The TCP bandwidth test uses iperf to generate traffic from a workload and to measure the bandwidth over multiple time intervals. Simultaneously, CPU and network activity are measured using sar over the same time intervals. The UDP jitter test also uses iperf to generate traffic from a workload, and measures the jitter and packet loss across the VPN tunnel. As with the TCP test, sar is used to measure CPU and network activity. The latency test uses ping to flood small packets to the remote host and to measure the round-trip time (RTT) and packet loss across the VPN tunnel. The RTT is related to the latency of the connection, and the packet loss can be used to estimate the probability of error.
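The original script was not published with this article, but its structure can be sketched. The sketch below is hypothetical: it shells out to iperf and ping via subprocess and scrapes the numbers out of their text output (the regexes assume the iperf 2.x report line and the Linux ping summary line).

```python
import re
import subprocess

def parse_bandwidth(output):
    """Pull 'X Mbits/sec' out of an iperf report line (iperf 2.x format)."""
    m = re.search(r"([\d.]+) Mbits/sec", output)
    return float(m.group(1)) if m else None

def parse_avg_rtt(output):
    """Pull the avg field from ping's 'rtt min/avg/max/mdev = ...' summary."""
    m = re.search(r"= [\d.]+/([\d.]+)/", output)
    return float(m.group(1)) if m else None

def run(cmd):
    # Run a measurement command and return its stdout (no error handling here).
    return subprocess.run(cmd, capture_output=True, text=True).stdout

def tcp_bandwidth_test(server, seconds=30):
    return parse_bandwidth(run(["iperf", "-c", server, "-t", str(seconds)]))

def latency_test(server, count=1000):
    # Flood ping of small packets; requires root privileges.
    return parse_avg_rtt(run(["ping", "-f", "-c", str(count), server]))
```

Each test would be repeated five times and the parsed values appended to a log file, mirroring the five replications reported in Tables 6 and 7.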

Many of the system parameters were fixed to a single value to reduce the total number of experiments needed. Some parameters are determined by the hardware and cannot be changed, such as memory, router CPU speed, and network speed. The other parameter values were chosen based on recommendations given in the OpenVPN documentation. These parameters are shown in Table 2.

Fixed Parameter       Value
Memory                16 MB RAM
Speed of router CPU   200 MHz
Speed of network      100 Mbps
Encryption key size   256-bit
Digest                SHA1
TLS cipher            DHE-RSA-AES256-SHA
Table 2: Parameters fixed in the experimental setup

In the fractional factorial design, five factors were chosen and are listed in Table 3.

Factor           Level (-1)      Level (+1)
A  Interface     TAP (bridged)   TUN
B  Protocol      UDP             TCP
C  Cipher        None            AES-256
D  Compression   None            LZO
E  Workload      Text            Video
Table 3: Factors in fractional factorial design

OpenVPN supports creating tunnels using two devices: TUN and TAP. The primary distinction between these two is the layer at which they operate. TUN, which stands for Network TUNnel, operates at Layer 3 of the OSI model and will not transmit any Layer 2 protocols through the VPN. TAP, which stands for Network TAP, operates at Layer 2 of the OSI model. It is capable of sending Layer 2 protocols through the VPN, but needs a bridge between the virtual network interface controller (NIC) and the physical NIC. If bridging mode is not used, then additional routing table entries are needed to route packets between the client and remote network [OpenVPN HOWTO].

Two transport protocols supported by OpenVPN are UDP and TCP. The performance of the VPN depends on the protocol used. UDP is a datagram protocol, which has less overhead and shares the same characteristics as IP. TCP, however, is a connection-based protocol which assumes an unreliable medium. Consequently, it has more overhead and will encounter adverse effects from packet loss, as described in Section 2.1.

Many ciphers are available in OpenVPN, which uses the OpenSSL cryptographic library. For this factor, a comparison is made between no encryption and encryption using the Advanced Encryption Standard (AES) algorithm, which is recommended by the National Institute of Standards and Technology (NIST) for secure communications [NIST].

In low-throughput links, compression is a way to increase the overall throughput. The throughput achieved, however, depends on the workload. For this study, the performance is tested both with and without compression and for different workloads. Two workloads were chosen: text and video. The text workload consists of highly compressible RFC documents, and the video workload consists of incompressible MPEG video.

The 2^(5-1) fractional factorial design for measuring the effects of each factor is shown in Table 4. The design is equivalent to a four-factor design, except that the ABCD interaction is replaced with factor E, the workload. The confounding effects for this design are shown in Table 5.

  I    A    B    C    D   AB   AC   AD   BC   BD   CD  ABC  ABD  ACD  BCD    E
  1   -1   -1   -1   -1    1    1    1    1    1    1   -1   -1   -1   -1    1
  1   -1   -1   -1    1    1    1   -1    1   -1   -1   -1    1    1    1   -1
  1   -1   -1    1   -1    1   -1    1   -1    1   -1    1   -1    1    1   -1
  1   -1   -1    1    1    1   -1   -1   -1   -1    1    1    1   -1   -1    1
  1   -1    1   -1   -1   -1    1    1   -1   -1    1    1    1   -1    1   -1
  1   -1    1   -1    1   -1    1   -1   -1    1   -1    1   -1    1   -1    1
  1   -1    1    1   -1   -1   -1    1    1   -1   -1   -1    1    1   -1    1
  1   -1    1    1    1   -1   -1   -1    1    1    1   -1   -1   -1    1   -1
  1    1   -1   -1   -1   -1   -1   -1    1    1    1    1    1    1   -1   -1
  1    1   -1   -1    1   -1   -1    1    1   -1   -1    1   -1   -1    1    1
  1    1   -1    1   -1   -1    1   -1   -1    1   -1   -1    1   -1    1    1
  1    1   -1    1    1   -1    1    1   -1   -1    1   -1   -1    1   -1   -1
  1    1    1   -1   -1    1   -1   -1   -1   -1    1   -1   -1    1    1    1
  1    1    1   -1    1    1   -1    1   -1    1   -1   -1    1   -1   -1   -1
  1    1    1    1   -1    1    1   -1    1   -1   -1    1   -1   -1   -1   -1
  1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
Table 4: 2^(5-1) fractional factorial design; A = Interface, B = Protocol, C = Cipher, D = Compression, E = Workload

Effect  Alias     Effect  Alias
I       ABCDE     BC      ADE
A       BCDE      BD      ACE
B       ACDE      CD      ABE
C       ABDE      ABC     DE
D       ABCE      ABD     CE
AB      CDE       ACD     BE
AC      BDE       BCD     AE
AD      BCE       E       ABCD
Table 5: Confounding effects of the 2^(5-1) fractional factorial design
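The confounding pattern in Table 5 follows mechanically from the defining relation I = ABCDE: each effect is aliased with its symmetric difference against the defining word. A small Python check:

```python
DEFINING_WORD = set("ABCDE")  # I = ABCDE for this 2^(5-1) design

def alias(effect):
    """Return the interaction confounded with `effect` under I = ABCDE."""
    return "".join(sorted(set(effect) ^ DEFINING_WORD))

# Reproduces the entries of Table 5, for example:
alias("A")    # -> 'BCDE'
alias("BC")   # -> 'ADE'
alias("ABC")  # -> 'DE'
alias("E")    # -> 'ABCD'
```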

3.3 Experimental Results

The results of the 2^(5-1) fractional factorial design are presented in this section. There are a few general observations I made while running these tests. First, the CPU utilization of the router was always at 100% during the tests, indicating that the router was consistently the bottleneck. The router's CPU utilization is not reported because there was no mechanism for obtaining this information automatically from the router. Second, the CPU utilization of the client machine was always low, indicating that it was never the bottleneck.

The overhead of the VPN tunnel can be estimated empirically using the network activity information gathered by sar during the TCP test. The overhead is simply the difference in the throughputs of the physical and virtual network interfaces divided by the number of packets per second transmitted:

Overhead (bytes/packet) = (Throughput_physical - Throughput_virtual) / Packets per second

Equation 1: Empirical estimate of overhead

For each experiment in the TCP test, traffic was generated from the client to the server and multiple metrics were measured: bandwidth, link utilization, and CPU utilization. The results shown in Table 6 are the mean value of 5 replications. These results show that the link was highly underutilized and that the client CPU was not a bottleneck in these experiments.

The overhead was estimated only for experiments that did not involve compression since compression reduces the packet size in the physical interface. The results show the smallest overhead of 51 bytes for the UDP protocol without encryption. This number is an estimate of the overhead due to the encapsulation of packets in the VPN tunnel. It does not show the overhead due to headers in the payload itself. For Layer 3 VPNs, which use the TUN interface, the payload does not contain any Layer 2 header information which reduces the overhead by 14 bytes.
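As a quick sanity check of the arithmetic, the overhead estimate described above can be evaluated with made-up sar readings (these numbers are illustrative, not from the study's logs):

```python
def vpn_overhead_bytes(phys_bytes_per_sec, virt_bytes_per_sec, packets_per_sec):
    """Per-packet encapsulation overhead from sar interface counters."""
    return (phys_bytes_per_sec - virt_bytes_per_sec) / packets_per_sec

# If the physical NIC moves 51,000 more bytes/s than the tun/tap device
# while 1,000 packets/s are in flight, the estimate is 51 bytes/packet,
# matching the UDP/no-encryption figure reported above.
overhead = vpn_overhead_bytes(1_159_000, 1_108_000, 1_000)  # -> 51.0
```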

The largest bandwidth of 8.87 Mbps was measured for the TAP interface with bridging, using the UDP transport protocol for the tunnel and no encryption. The bandwidth drops to less than half of that, 3.70 Mbps, when AES 256-bit encryption is enabled.

TCP Tests
Interface  Protocol  Cipher  Comp  Workload  Overhead (Bytes)  BW (Mbps)  Link %  Client CPU %
TAP (br)   UDP       None    None  Video     51.0              8.87       1.20%   4.12%
TAP (br)   UDP       None    LZO   Text      -                 8.02       0.70%   5.92%
TAP (br)   UDP       AES256  None  Text      100.0             3.64       0.51%   3.48%
TAP (br)   UDP       AES256  LZO   Video     -                 3.70       0.52%   3.58%
TAP (br)   TCP       None    None  Text      76.5              6.17       0.85%   3.76%
TAP (br)   TCP       None    LZO   Video     -                 6.27       0.87%   3.90%
TAP (br)   TCP       AES256  None  Video     127.5             3.26       0.47%   3.54%
TAP (br)   TCP       AES256  LZO   Text      -                 3.83       0.36%   4.42%
TUN        UDP       None    None  Text      51.0              7.38       0.99%   3.30%
TUN        UDP       None    LZO   Video     -                 7.32       0.98%   3.56%
TUN        UDP       AES256  None  Video     98.0              3.35       0.46%   3.16%
TUN        UDP       AES256  LZO   Text      -                 3.66       0.33%   4.16%
TUN        TCP       None    None  Video     76.5              5.67       0.77%   3.28%
TUN        TCP       None    LZO   Text      -                 6.11       0.53%   4.70%
TUN        TCP       AES256  None  Text      125.0             3.05       0.43%   3.30%
TUN        TCP       AES256  LZO   Video     -                 3.00       0.42%   3.30%
Table 6: Client-to-server TCP test results for 2^(5-1) fractional factorial design; the values shown are the mean of 5 replications

The UDP and latency test results are shown in Table 7. For both tests, the packet loss percentage is 0% indicating that no errors occurred in the VPN tunnel. The jitter measurement was done using large packets equal to the MTU size with a fixed UDP bandwidth of 1 Mbps. Although not shown, the jitter measurements were found to be sensitive to the UDP bandwidth and payload length, but not to the factors under test.

                                             UDP Tests            Latency Tests
Interface  Protocol  Cipher  Comp  Workload  Jitter (ms)  Loss %  RTT (ms)  Loss %  CPU %
TAP (br)   UDP       None    None  Video     6.2          0.00%   1.3       0.00%   7.02%
TAP (br)   UDP       None    LZO   Text      6.2          0.00%   1.3       0.00%   6.62%
TAP (br)   UDP       AES256  None  Text      6.3          0.00%   2.3       0.00%   5.22%
TAP (br)   UDP       AES256  LZO   Video     6.3          0.00%   2.2       0.00%   5.34%
TAP (br)   TCP       None    None  Text      6.5          0.00%   15.1      0.00%   0.40%
TAP (br)   TCP       None    LZO   Video     6.1          0.00%   12.7      0.00%   0.76%
TAP (br)   TCP       AES256  None  Video     6.3          0.00%   7.9       0.00%   2.34%
TAP (br)   TCP       AES256  LZO   Text      6.6          0.00%   14.9      0.00%   0.52%
TUN        UDP       None    None  Text      6.2          0.00%   1.9       0.00%   5.14%
TUN        UDP       None    LZO   Video     6.4          0.00%   1.9       0.00%   5.04%
TUN        UDP       AES256  None  Video     6.3          0.00%   2.8       0.00%   4.10%
TUN        UDP       AES256  LZO   Text      6.2          0.00%   2.8       0.00%   4.20%
TUN        TCP       None    None  Video     6.2          0.00%   8.4       0.00%   1.50%
TUN        TCP       None    LZO   Text      7.0          0.00%   15.1      0.00%   0.50%
TUN        TCP       AES256  None  Text      6.3          0.00%   11.2      0.00%   1.84%
TUN        TCP       AES256  LZO   Video     6.3          0.00%   10.5      0.00%   0.82%
Table 7: UDP and latency test results for 2^(5-1) fractional factorial design; the values shown are the mean of 5 replications

Using the analysis technique described for a 2^(k-p) fractional factorial design, the effects and the percent variation were calculated [Jain91]. The percent variation of the effects is shown in Figure 3. 84% of the variation in bandwidth is explained by the encryption cipher (C), and another 8% is explained by the transport protocol (B). A small 4% of the variation in bandwidth comes from the interaction (BC) between the encryption cipher and the transport protocol.

The jitter is not explained well by the model and is relatively independent of changes in the factors. For the round-trip time (RTT), 66% of the variation is explained by the transport protocol. Another 3% is explained by the interaction (BE) between the transport protocol and the workload, which is confounded with the ACD interaction.

Figure 3: Variation of effects for 2^(5-1) fractional factorial design

3.4 Future Work

The tests performed in the 2^(5-1) fractional factorial design should be extended to include server-to-client tests and simultaneous bidirectional traffic tests. Although it is possible to get this information, it is not easy to do automatically and will require extending the test script. The design should also be extended to include other factors that may affect the performance of the VPN, such as payload length, encryption key size, encryption digest, and TLS digest. Using the results from the fractional factorial design, a one- or two-factor design should be done to analyze the effects of the most sensitive factors, such as the transport protocol and encryption. The test bed can also be expanded to test a site-to-site VPN topology involving two routers. Last, the effects of communication with multiple clients should also be tested.

In addition to these tests, the results should be verified using another evaluation technique such as analytical modeling. Some of the contributions of the overhead were modeled in this paper, but this work needs to be expanded to explain changes in performance as well. This will allow us to understand how each of these factors affects the performance and why.

4 Summary

The Linksys WRT54GL router is an inexpensive router that can easily be upgraded for extended functionality using an open-source Linux firmware called DD-WRT. This allows a VPN package such as OpenVPN to be set up to allow remote access to the internal network. OpenVPN is a flexible, cross-platform solution that is highly configurable and fairly easy to set up using available tutorials.

The performance of OpenVPN depends on the router hardware and the configuration parameters. The throughput was found to be limited by the router CPU and is not sufficient for fast connections such as 10/100 Mbps LANs, but it is sufficient for slower connections such as most Internet connections. Measurements were presented for traffic generated from client to server. The encryption cipher was found to significantly reduce total throughput. For a configuration using the TAP interface with bridging, the UDP transport protocol, the AES-256 cipher, and no compression, the throughput was 3.64 Mbps. 96% of the variation in the throughput was explained by the transport protocol, the encryption cipher, and the interaction between the two; the encryption cipher explained the majority (84%) of the variation. The jitter was found to be relatively insensitive to the factors tested, at around 6.3 ms. The round-trip time (RTT) was significantly larger for the TCP transport protocol, which explained 66% of the variation. The next significant effect (3%) was the interaction between the transport protocol and the workload; this interaction is confounded with the interaction between the interface, cipher, and compression factors. For the same configuration above, the average RTT was 2.3 ms.

Although the encryption cipher accounted for the majority of the variation in the throughput, encryption is an essential feature of VPNs. Future work includes investigating the throughput effects of different encryption algorithms, with varying key sizes, that are still considered strong. One criterion for choosing an encryption algorithm is whether or not it is acceptable for use in e-commerce.

In conclusion, the Linksys WRT54GL router provides a cost-effective solution for setting up an OpenVPN server for remote access over the Internet. This solution is throughput-limited, but should be sufficient for most Internet connections. It is an appropriate solution for most home users and small businesses depending on their needs.


References

  1. [DD-WRT] "DD-WRT :: News"; http://www.dd-wrt.com.
  2. [DD-WRT Tutorials] "Tutorials - DD-WRT Wiki"; http://www.dd-wrt.com/wiki/index.php/Tutorials.
  3. [Feilner06] M. Feilner, OpenVPN: Building and Integrating Virtual Private Networks, Packt Publishing, 2006.
  4. [Jain91] R.K. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, Wiley, 1991.
  5. [Joha08] A. Joha, F. Ben Shatwan, and M. Ashibani, "Remote Access VPNs Performance Comparison between Windows Server 2003 and Fedora Core 6," Home Networking, 2008, pp. 329-343; http://dx.doi.org/10.1007/978-0-387-77216-5_24.
  6. [Iperf] "Iperf - Wikipedia, the free encyclopedia"; http://en.wikipedia.org/wiki/Iperf.
  7. [Khanvilkar04] S. Khanvilkar and A. Khokhar, "Virtual private networks: an overview with performance evaluation," Communications Magazine, IEEE, vol. 42, 2004, pp. 146-154.
  8. [Kolesnikov02] O. Kolesnikov and B. Hatch, Building Linux Virtual Private Networks (VPNs), Sams, 2002.
  9. [Linksys] "Linksys - A Division of Cisco"; http://www.linksys.com.
  10. [McGregor00] J. McGregor and R. Lee, "Performance impact of data compression on virtual private network transactions," Local Computer Networks, 2000. LCN 2000. Proceedings. 25th Annual IEEE Conference on, 2000, pp. 500-510.
  11. [MMC Tutorial] "OpenWrtDocs/Customizing/Hardware/MMC - OpenWrt"; http://wiki.openwrt.org/OpenWrtDocs/Customizing/Hardware/MMC.
  12. [NIST] "NIST.gov - Computer Security Division - Computer Security Resource Center"; http://csrc.nist.gov/groups/ST/toolkit/index.html.
  13. [OpenSSL] "OpenSSL: The Open Source toolkit for SSL/TLS"; http://www.openssl.org.
  14. [OpenWrt] "OpenWrt"; http://openwrt.org.
  15. [OpenVPN] "Welcome to OpenVPN"; http://openvpn.net.
  16. [OpenVPN/DD-WRT wiki] "OpenVPN - DD-WRT Wiki"; http://www.dd-wrt.com/wiki/index.php/OpenVPN.
  17. [OpenVPN HOWTO] "OpenVPN 2.0 HOWTO," OpenVPN; http://openvpn.net/index.php/documentation/howto.html.
  18. [SYSSTAT] "SYSSTAT"; http://pagesperso-orange.fr/sebastien.godard.
  19. [Titz01] O. Titz, Why TCP Over TCP Is A Bad Idea, April, 2001; http://sites.inka.de/sites/bigred/devel/tcp-tcp.html.
  20. [Tcpdump] "TCPDUMP Public Repository"; http://www.tcpdump.org.
  21. [Wireshark] "Wireshark: The World's Most Popular Network Protocol Analyzer"; http://www.wireshark.org.

List of Acronyms

Acronym  Definition
3DES     Triple DES
AES      Advanced Encryption Standard
CBC      Cipher Block Chaining
CPU      Central Processing Unit
DES      Data Encryption Standard
DNS      Domain Name System
GPL      General Public License
ICMP     Internet Control Message Protocol
IP       Internet Protocol
LZO      Lempel-Ziv-Oberhumer
MAC      Medium Access Control
MMC      MultiMediaCard
MPEG     Moving Picture Experts Group
NIC      Network Interface Controller
NIST     National Institute of Standards and Technology
OSI      Open Systems Interconnection
RAM      Random Access Memory
RFC      Request for Comments
RTT      Round-Trip Time
SD       SecureDigital
SSL      Secure Sockets Layer
SUT      System Under Test
TAP      Network TAP
TCP      Transmission Control Protocol
TLS      Transport Layer Security
TUN      Network TUNnel
UDP      User Datagram Protocol
VoIP     Voice over Internet Protocol
VPN      Virtual Private Network

Monday, September 12, 2011

Memory Management in vSphere

VMware vSphere is one of the most efficient hypervisors at managing host hardware resources such as CPU, memory, storage, and network. Understanding how these resources are managed will help virtualization administrators plan and manage their environments. This post gives a basic idea of the components involved and the memory management techniques employed by VMware vSphere 4.1. It is based on the Memory Resource Management in vSphere 4.1 white paper by VMware. Please note that each virtual machine's target memory allocation is dynamic and depends on the current load on the host and the memory allocated to the virtual machine, along with its associated parameters: shares, reservation, and limits.

Below are certain terms commonly used in VMware ESX whose understanding will help administrators follow these techniques better.

  • Host physical memory - Amount of memory available on the host that is visible to the hypervisor.
  • Guest physical memory - Amount of memory that is visible to the guest OS running inside a VM.
  • Consumed memory - Amount of host memory that has been allocated to a virtual machine.
  • Active memory - Amount of guest memory that is actively used by the applications and the guest OS.
  • Guest-level paging - Transfer of memory between guest physical memory and guest swap, driven by the guest OS.
  • Hypervisor swapping - Transfer of memory between guest physical memory and host swap, driven by the hypervisor.
  • Guest virtual memory - Continuous virtual address space provided by the guest OS to its applications.
  • Shares - Entitle a VM to a fraction of available host physical memory, based on a proportional-share allocation policy.
  • Reservation - Guaranteed minimum amount of host physical memory the host reserves for a virtual machine.
  • Limit - Maximum amount of host physical memory allocated for a virtual machine. The default is unlimited.

Memory Management and Reclamation Techniques:

Ever wondered how ESX is able to overcommit memory to virtual machines? By overcommit, I mean that the total memory allocated to the virtual machines running on a host is greater than the physical memory available on the host. ESX employs several memory reclamation techniques to achieve this:

  • TPS - Transparent page sharing reclaims memory by removing redundant pages with identical content.
  • Ballooning - Memory is reclaimed by artificially increasing the memory pressure inside a VM.
  • Swapping - Hypervisor swapping reclaims memory by swapping out guest memory to a swap file created for every VM by the hypervisor.
  • Memory compression - Reclaims memory by compressing pages that need to be swapped out into a compression cache created for every VM in host physical memory.

Transparent Page Sharing:

When multiple virtual machines on a host run the same guest OS, and at times the same applications, a lot of identical content ends up in memory. Instead of storing redundant copies, it is enough to store a single copy in host physical memory and let multiple VMs access it. This is the basic mechanism behind transparent page sharing (TPS). Note that this operation is invisible to the virtual machine and is performed at the hypervisor level, so there is no chance of sensitive information leaking between virtual machines.

Redundant pages are identified by their content: a hashing algorithm is used to find pages in guest physical memory with identical content. ESX scans guest physical memory for sharing opportunities at a specified rate. TPS is always on and has the least overhead of the reclamation techniques.
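The mechanism can be illustrated with a toy sketch. This is a simplification: real ESX scans pages periodically, confirms hash matches with a full byte comparison, and uses copy-on-write to break sharing when a shared page is modified.

```python
import hashlib

def share_pages(pages):
    """Deduplicate identical pages by content hash (sketch of the TPS idea).

    Returns (store, mapping): store holds one copy per unique content,
    mapping gives each guest page the key of its shared copy.
    """
    store, mapping = {}, []
    for page in pages:
        key = hashlib.sha1(page).hexdigest()
        # Real ESX also does a full byte-compare on a hash match before sharing.
        if key not in store:
            store[key] = page
        mapping.append(key)
    return store, mapping

# Two identical zero pages and one distinct page:
pages = [b"\x00" * 4096, b"\x00" * 4096, b"kernel" + b"\x00" * 4090]
store, mapping = share_pages(pages)
# The zero pages collapse to a single stored copy, so len(store) == 2.
```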

Ballooning:

As mentioned earlier in this post, a VM's target memory allocation is dynamic: ESX allocates memory to the VM on demand, based on parameters like shares, reservation, limits, and host memory. Suppose, for example, a VM consumed 2 GB of RAM during its peak operations but is currently running with only 1 GB of active memory. If there is a memory crunch on the host running this VM, the VM will not release the additional 1 GB it has freed, because the VM is isolated and is not aware of the memory shortage on the hypervisor. Ballooning makes the guest OS aware of the low-memory status of the host and helps reclaim memory.

Ballooning is achieved using a balloon driver, which is loaded into the guest OS as part of VMware Tools. The driver loads as a pseudo-device driver and communicates with the hypervisor over a private channel. When the host needs physical memory, it sets a target balloon size, and the balloon driver inflates inside the guest OS by allocating and pinning guest physical pages within the VM. If the guest is itself short on memory, the guest OS decides which pages to page out to its own swap in order to satisfy the balloon allocation; the hypervisor has no say in this. Once the allocation completes, the balloon driver notifies the hypervisor, which can then safely reclaim the corresponding host physical memory. Note that if the virtual machine has plenty of free guest physical memory (as in the example above), inflating the balloon induces no paging and does not impact guest performance.

Ballooning is employed when there is memory pressure on the host. It offloads host memory pressure onto the VM, which will now use less physical memory on the host and more of its own guest physical memory. Ballooning is the first technique employed to reclaim memory whenever the host is under pressure; TPS, as noted, is always enabled.
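The sizing intuition can be sketched as a toy calculation. This is not VMware's actual algorithm; the real balloon target also depends on shares, reservations, limits, and sampled working-set estimates.

```python
def balloon_target_mb(consumed_mb, active_mb, host_shortage_mb):
    """Illustrative balloon target: reclaim idle (consumed but inactive)
    guest memory first, capped at what the host actually needs back."""
    idle_mb = max(consumed_mb - active_mb, 0)
    return min(idle_mb, host_shortage_mb)

# The example above: 2 GB consumed, 1 GB active, host short by 512 MB.
# The balloon can reclaim 512 MB without forcing the guest to page.
balloon_target_mb(2048, 1024, 512)  # -> 512
```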

Swapping:

A swap file is created for every VM at startup and deleted when the VM is powered off. The size of the swap file is proportional to the amount of RAM allocated to the VM. Swapping is invoked when the memory reclaimed by TPS and ballooning is not sufficient to reduce host memory pressure. When swapping is invoked, the hypervisor swaps out guest physical memory to the swap file created at VM startup, freeing host physical memory and reducing the pressure on the host. Of the techniques, this is the one guaranteed to reclaim a specific amount of physical memory in the quickest time, but given the overhead involved, it is the last resort ESX uses to reclaim memory. Note that the per-VM swap file lives on disk, so hypervisor swapping, and subsequent access to those pages by the VM, will make the VM less responsive.

Memory Compression:

Recall that when hypervisor swapping is invoked, pages are swapped out from guest physical memory to a swap file on disk, which adds latency. Memory compression instead tries to compress the pages that need to be swapped into a compression cache allocated per VM in host physical memory itself. When the VM needs to access these pages, they are decompressed and presented, which is faster than a disk access. Note that ESX will not proactively compress guest pages when host swapping is not required.

The per-VM compression cache is accounted for in the VM's guest memory usage, which means ESX will not allocate additional host physical memory to store the compressed pages. If the compression cache is full, one compressed page must be replaced to make room for a new one. An age-based replacement policy is used to choose the target page, which is decompressed and swapped out; ESX will not swap out compressed pages. In ESX 4.1, the default maximum compression cache size is conservatively set to 10% of the configured VM memory size.
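The cache behavior described above can be sketched as a toy model. Here zlib stands in for ESX's page compressor, and "age-based" eviction is simplified to oldest-entry-first.

```python
import zlib
from collections import OrderedDict

class CompressionCache:
    """Toy per-VM compression cache: compress pages instead of swapping them.

    When the cache is full, the oldest entry is evicted (decompressed and
    swapped out), mimicking the age-based replacement policy described above.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()   # page_id -> compressed bytes, oldest first
        self.swapped_out = []        # pages that had to go to disk instead

    def compress_out(self, page_id, page):
        if len(self.cache) >= self.capacity:
            old_id, old_data = self.cache.popitem(last=False)  # evict oldest
            self.swapped_out.append((old_id, zlib.decompress(old_data)))
        self.cache[page_id] = zlib.compress(page)

    def access(self, page_id):
        # Decompressing from RAM is much faster than a disk swap-in.
        return zlib.decompress(self.cache[page_id])

# A cache of two slots: inserting a third page evicts the first.
cache = CompressionCache(capacity=2)
for i in range(3):
    cache.compress_out(i, bytes([i]) * 4096)
```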

Memory States in ESX Server

There are four memory states maintained by the ESX server:

  • High – 6% Threshold of Total Host Memory
  • Soft – 4% Threshold of Total Host Memory
  • Hard – 2% Threshold of Total Host Memory
  • Low – 1% Threshold of Total Host Memory

Why are these states important? What do they denote?

As mentioned earlier, TPS is enabled by default in ESX. The host free-memory state determines when to invoke ballooning and swapping (which also activates memory compression).

When the memory state is high, the total amount of active memory used by all VMs on the host is less than the total host memory. This means the host will not invoke ballooning or swapping as long as the state is high. This holds only if limits are not set on the VMs.

If host free memory drops towards the soft threshold, the hypervisor starts to reclaim memory using ballooning. Ballooning happens before free memory actually reaches the soft threshold because it takes time for the balloon driver to allocate and pin guest physical memory. Usually, the balloon driver is able to reclaim memory in a timely fashion so that the host free memory stays above the soft threshold.

If ballooning is not sufficient to reclaim memory or the host free memory drops towards the hard threshold, the hypervisor starts to use swapping in addition to using ballooning. During swapping, memory compression is activated as well. With host swapping and memory compression, the hypervisor should be able to quickly reclaim memory and bring the host memory state back to the soft state.

Note : In certain scenarios, host memory reclamation happens regardless of the current host free memory state. For example, even if host free memory is in the high state, memory reclamation is still mandatory when a virtual machine’s memory usage exceeds its specified memory limit. If this happens, the hypervisor will employ ballooning and, if necessary, swapping and memory compression to reclaim memory from the virtual machine until the virtual machine’s host memory usage falls back to its specified limit.
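Putting the states and techniques together, here is a simplified decision sketch. The thresholds come from the list above; the real scheduler also anticipates transitions (ballooning begins before the soft threshold is actually crossed) and honors per-VM limits, as the note explains.

```python
# Free-memory thresholds as fractions of total host memory.
THRESHOLDS = [("high", 0.06), ("soft", 0.04), ("hard", 0.02), ("low", 0.01)]

def memory_state(free_mb, total_mb):
    ratio = free_mb / total_mb
    for state, threshold in THRESHOLDS:
        if ratio >= threshold:
            return state
    return "low"  # below even the 1% threshold

def active_techniques(state):
    """TPS always runs; ballooning joins in the soft state,
    swapping and compression join in the hard and low states."""
    return {
        "high": ["TPS"],
        "soft": ["TPS", "ballooning"],
        "hard": ["TPS", "ballooning", "swapping", "compression"],
        "low":  ["TPS", "ballooning", "swapping", "compression"],
    }[state]

memory_state(500, 10000)   # 5% free -> 'soft'
active_techniques("soft")  # -> ['TPS', 'ballooning']
```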

What are the benefits of these memory management techniques?

  • Better Utilization of available Memory
  • Higher Consolidation Ratio
Written by Sudharsan

Monday, May 16, 2011

Bacula - Open Source network backup and restore solution


1. INTRODUCTION
Bacula is a set of open-source, enterprise-ready computer programs that permit you (or the system administrator) to manage the backup, recovery, and verification of computer data across a network of computers of different kinds. Bacula is relatively easy to use and efficient, while offering many advanced storage management features that make it easy to find and recover lost or damaged files. In technical terms, it is an open-source, enterprise-ready, network-based backup program.
2. BACULA ARCHITECTURE

BACULA DIRECTOR
The Bacula Director service is the program that supervises all the backup, restore, verify and archive operations. The system administrator uses the Bacula Director to schedule backups and to recover files. For more details see the Director Services Daemon Design Document in the Bacula Developer's Guide. The Director runs as a daemon (or service) in the background.
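A minimal Director resource, as it might appear in bacula-dir.conf, gives a feel for how the Director is configured. The names, paths, and password below are illustrative placeholders, not required values:

```
# bacula-dir.conf -- minimal Director resource (illustrative values)
Director {
  Name = backup-dir                      # name other daemons use for this Director
  DIRport = 9101                         # default Director listening port
  QueryFile = "/etc/bacula/query.sql"
  WorkingDirectory = "/var/lib/bacula"
  PidDirectory = "/var/run"
  Maximum Concurrent Jobs = 1
  Password = "dir-password"              # shared with Consoles that connect
  Messages = Daemon
}
```

A complete configuration also defines Job, Client, Storage, Catalog, and Schedule resources in the same file.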

BACULA CONSOLE
The Bacula Console service is the program that allows the administrator or user to communicate with the Bacula Director. Currently, the Bacula Console is available in three versions: a text-based (shell) interface, a GNOME GUI, and a wxWidgets GUI. The first and simplest is to run the Console program in a shell window (i.e., a TTY interface). Most system administrators will find this completely adequate. The second version is a GNOME GUI interface that is far from complete, but quite functional, as it has most of the capabilities of the shell Console. The third version is a wxWidgets GUI with an interactive file restore. It also has most of the capabilities of the shell Console, allows command completion with tabulation, and gives you instant help about the command you are typing. For more details see the Bacula Console Design Document.
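The text console learns where to find the Director from its own small configuration file, bconsole.conf. A sketch, with illustrative names and password:

```
# bconsole.conf -- tells the text console which Director to contact
Director {
  Name = backup-dir                # must match the Director resource's Name
  DIRport = 9101
  address = backup.example.com     # host running the Director daemon
  Password = "dir-password"        # must match the Director's Password
}
```

With this in place, running bconsole opens an interactive session with the Director.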

BACULA FILE
The Bacula File service (also known as the Client program) is the software program that is installed on the machine to be backed up. It is specific to the operating system on which it runs and is responsible for providing the file attributes and data when requested by the Director. The File services are also responsible for the file-system-dependent part of restoring the file attributes and data during a recovery operation. For more details see the File Services Daemon Design Document in the Bacula Developer's Guide. This program runs as a daemon on the machine to be backed up. In addition to Unix/Linux File daemons, there is a Windows File daemon (normally distributed in binary format). The Windows File daemon runs on current Windows versions (NT, 2000, XP, 2003, and possibly Me and 98).
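On each client, the File daemon is configured in bacula-fd.conf; it names itself and lists which Director is allowed to contact it. Values below are illustrative:

```
# bacula-fd.conf -- runs on the machine being backed up
FileDaemon {
  Name = client1-fd
  FDport = 9102                        # default File daemon port
  WorkingDirectory = /var/lib/bacula
  Pid Directory = /var/run
}

# Director(s) permitted to contact this File daemon
Director {
  Name = backup-dir
  Password = "fd-password"             # must match the Client resource on the Director
}
```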

BACULA STORAGE
The Bacula Storage services consist of the software programs that perform the storage and recovery of the file attributes and data to the physical backup media or volumes. In other words, the Storage daemon is responsible for reading and writing your tapes (or other storage media, e.g. files). For more details see the Storage Services Daemon Design Document in the Bacula Developer's Guide. The Storage services run as a daemon on the machine that has the backup device (usually a tape drive).
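The Storage daemon describes each backup medium with a Device resource in bacula-sd.conf. A sketch for disk-based (file) volumes, with an illustrative path:

```
# bacula-sd.conf -- Device resource for file-based backup volumes
Device {
  Name = FileStorage
  Media Type = File
  Archive Device = /backup        # directory where volume files are written
  LabelMedia = yes                # let Bacula label new volumes automatically
  Random Access = yes
  AutomaticMount = yes
  RemovableMedia = no
}
```

A tape drive would instead use its device node (e.g. a /dev tape device) as the Archive Device and a tape Media Type.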

CATALOG
The Catalog services are comprised of the software programs responsible for maintaining the file indexes and volume databases for all files backed up. The Catalog services permit the system administrator or user to quickly locate and restore any desired file. The Catalog services set Bacula apart from simple backup programs like tar and bru, because the catalog maintains a record of all Volumes used, all Jobs run, and all Files saved, permitting efficient restoration and Volume management. Bacula currently supports three different databases, MySQL, PostgreSQL, and SQLite, one of which must be chosen when building Bacula.
The three SQL databases currently supported (MySQL, PostgreSQL, or SQLite) provide quite a number of features, including rapid indexing, arbitrary queries, and security.
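The Director is pointed at the catalog database through a Catalog resource in bacula-dir.conf. A minimal sketch, with illustrative credentials:

```
# bacula-dir.conf -- Catalog resource naming the backend database
Catalog {
  Name = MyCatalog
  dbname = bacula
  user = bacula
  password = "db-password"
}
```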

Although the Bacula project plans to support other major SQL databases, the current Bacula implementation interfaces only to MySQL, PostgreSQL, and SQLite. For the technical and porting details see the Catalog Services Design Document in the Bacula Developer's Guide.

The packages for MySQL and PostgreSQL are available for several operating systems. Alternatively, installing from source is quite easy; see the Installing and Configuring MySQL chapter of this document for the details. For more information on MySQL, please see http://www.mysql.com. Or see the Installing and Configuring PostgreSQL chapter of this document for the details. For more information on PostgreSQL, please see http://www.postgresql.org.

Configuring and building SQLite is even easier. For the details of configuring SQLite, please see the Installing and Configuring SQLite chapter of this document.

BACULA MONITOR
The Bacula Monitor service is the program that allows the administrator or user to watch the current status of Bacula Directors, Bacula File daemons, and Bacula Storage daemons. Currently, only a GTK+ version is available, which works with GNOME, KDE, or any window manager that supports the FreeDesktop.org system tray standard.
To perform a successful save or restore, the following four daemons must be configured and running: the Director daemon, the File daemon, the Storage daemon, and the Catalog service (MySQL, PostgreSQL or SQLite).