Sunday, May 24, 2009

Creating Filesystems Using TimeStorm LDS

Introduction

One of the most significant aspects of the power and flexibility of Linux is the fact that it is a true multi-tasking operating system, not merely a complex program loader linked with a single application. This enables embedded Linux applications to display greater flexibility and responsiveness to a variety of conditions than the simpler execution environments provided by traditional, proprietary embedded operating systems.

This white paper begins with overviews of the filesystems used during the boot process on most embedded Linux systems, the Linux system startup process, and the relationships between the two. Subsequent sections discuss the requirements for successful system startup and program execution. They focus on how software such as TimeSys Linux Development Suite (LDS) delivers Linux expertise in an easy-to-use, graphical environment that can reduce the time and effort required to create and manage a deployable, right-sized embedded Linux system containing customized applications.

Filesystems and the Linux Startup Process

The core of the Linux operating system is known as the kernel. When an embedded Linux system boots, the kernel is loaded into memory from a device that the embedded system's boot monitor can access, and is then executed. The kernel automatically probes, identifies, and initializes as much of your system's hardware as possible, and then looks for an initial filesystem from which it can load and run applications in order to continue the boot process. The first filesystem mounted by Linux systems during the boot process is known as a root filesystem because it is automatically mounted at the Linux directory '/', which is the base of the hierarchical Linux filesystem. Once mounted, the root filesystem provides the Linux system with a basic directory structure that it can use to map devices to Linux device nodes, access those devices, and locate, load, and execute subsequent code such as system code or your custom applications.

Linux supports a wider range of filesystems than any other operating system. In the context of the Linux boot process, these are easily separated into three general classes:
  1. Filesystems that are located on local devices which are directly connected to your embedded hardware
  2. Network filesystems using standard protocols such as NFS
  3. A special in-memory filesystem known as an initial RAM disk

The types of root filesystems that your embedded Linux system supports during the boot process depend on the types of filesystems that are supported by the kernel that you are booting.

Initial RAM Disks and Linux

Initial RAM disks are actually compressed images of other types of Linux filesystems. An initial RAM disk provides a filesystem from which the kernel can execute programs and load kernel modules as part of the boot process. The former enables you to perform administrative tasks such as device initialization, checking the consistency of any local storage devices, and so on before actually using or mounting those devices. The latter enables you to minimize the number of device drivers that you have to build into your kernel, because device drivers can be loaded as modules based on the type of hardware that is detected by the kernel's device probing and initialization routines. Initial RAM disks are therefore commonly used on desktop Linux systems which may be deployed on a wide range of different hardware. This aspect of an initial RAM disk is less important in embedded Linux systems, because the types of attached hardware rarely change.

Many Linux distributions, such as the Linux 2.6 Reference Distributions available from TimeSys, include pre-assembled initial RAM disks for supported platforms and architectures. Desktop Linux systems typically use the EXT2 filesystem in their initial RAM disks, but many embedded Linux systems use smaller, simpler types of filesystems such as CRAMFS, ROMFS, or even the Minix filesystem. Regardless of the type of filesystem that it contains, an initial RAM disk is typically compressed using gzip to save even more space.

Initial RAM disks are extremely popular in embedded Linux deployments for two reasons. First, they are effectively mandatory in embedded systems that do not have attached storage devices or which cannot depend on network access during the boot process. Second, they provide an easy way to load proprietary kernel modules that would otherwise have to be built into the kernel and would therefore be subject to the GNU General Public License (GPL).

Using an Initial RAM Disk

In order to boot from an initial RAM disk, you must configure your kernel to support the type of filesystem in which your initial RAM disk was originally created, and you must also activate kernel configuration variables such as CONFIG_BLK_DEV_RAM and CONFIG_BLK_DEV_INITRD. The kernel knows how to uncompress the initial RAM disk image into memory, which it can then mount and access like any physical filesystem.
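
As a quick check before building, the configuration variables named above can be looked up in the kernel's configuration file. The following is a minimal sketch, not part of TimeStorm LDS: the function name check_initrd_config is hypothetical, and it assumes a standard kernel .config file in which built-in options appear as OPTION=y. Any extra option names that you pass (for example, the one for your initial RAM disk's filesystem type) are checked in the same way.

```shell
#!/bin/sh
# Report whether a kernel configuration file enables initial RAM disk support.
# check_initrd_config is a hypothetical helper name used for illustration.
# Usage: check_initrd_config path/to/.config [EXTRA_OPTION...]
check_initrd_config() {
    config_file=$1
    shift
    status=0
    for option in CONFIG_BLK_DEV_RAM CONFIG_BLK_DEV_INITRD "$@"; do
        # Built-in options appear as OPTION=y in a kernel .config file.
        if grep -q "^${option}=y\$" "$config_file"; then
            echo "$option: enabled"
        else
            echo "$option: missing"
            status=1
        fi
    done
    # A non-zero return value means at least one option is not built in.
    return "$status"
}
```

For example, check_initrd_config .config CONFIG_EXT2_FS would also verify that EXT2 support is built into the kernel.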

In embedded Linux scenarios, an initial RAM disk is often bundled into the Linux kernel during the kernel compilation process. This provides a single bootable entity that you can install into Flash or any other boot media that your embedded system's boot monitor can access. However, this also means that you must either prepare the initial RAM disk before you begin the kernel compilation process or that you must build it during the kernel compilation process.

Using an initial RAM disk has two potential problems. First, embedded systems with extremely limited amounts of RAM may not have enough to spare to hold the filesystem image. Second, an initial RAM disk provides no long-term storage: each time that you boot your embedded Linux system, the initial RAM disk is reloaded from the compressed image, which cannot be updated without recompilation when it is bundled into the kernel. For these reasons, embedded systems with local storage typically install a root filesystem there. This takes full advantage of your available hardware and, in the case of writable storage media such as Flash, Compact Flash, Disk-On-Chip, or devices such as hard drives, provides a location where your system can maintain state and log information across reboots.

Using Other Types of Root Filesystems with Embedded Linux

Root filesystems in formats such as the Journalling Flash File System, version 2 (JFFS2) are typically used on systems with Flash memory that can be partitioned into multiple sections, usually containing the boot monitor, the loadable kernel image, and a JFFS2 filesystem. Systems with attached devices such as Compact Flash or hard drives typically use a journaling filesystem such as EXT3, XFS, or JFS. Journaling filesystems reduce system restart time by minimizing the chances that a filesystem will be left in an inconsistent state by a system crash or unplanned restart.
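
A JFFS2 image is usually generated from an already-populated directory tree rather than created on a block device. The sketch below only prints the mkfs.jffs2 command (from the mtd-utils package) that would do so. The function name jffs2_image_command is hypothetical, and the default erase block size is a placeholder assumption that must be replaced with the value for your Flash part.

```shell
#!/bin/sh
# Print the mkfs.jffs2 command that would build a JFFS2 image from a
# populated staging directory. Nothing is executed.
# jffs2_image_command is a hypothetical helper name used for illustration.
# Usage: jffs2_image_command STAGING_DIR IMAGE_FILE [ERASEBLOCK]
jffs2_image_command() {
    staging=$1
    image=$2
    # The erase block size must match the target Flash part; 128KiB is
    # only a placeholder assumption.
    eraseblock=${3:-128KiB}
    echo "mkfs.jffs2 -r $staging -o $image -e $eraseblock"
}
```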

Finally, network filesystems are often used as root filesystems during the embedded Linux development process, because they provide more storage than is typically available on an embedded system, and also because they can easily preserve debugging and other program and system state information across restarts of your embedded system, since the storage is not physically located on your embedded system.
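
To use an NFS export as the root filesystem, the kernel is normally told where to find it through boot arguments. The following sketch assembles such an argument string. The function name nfsroot_args and the server address and export path in the usage line are hypothetical, and the sketch assumes a kernel built with NFS root support (CONFIG_ROOT_NFS) and IP autoconfiguration (CONFIG_IP_PNP).

```shell
#!/bin/sh
# Build kernel boot arguments for an NFS-mounted root filesystem.
# nfsroot_args is a hypothetical helper name used for illustration.
# Usage: nfsroot_args SERVER_IP EXPORT_PATH
nfsroot_args() {
    server=$1
    export_path=$2
    # root=/dev/nfs selects an NFS root, nfsroot= names the export, and
    # ip=dhcp asks the kernel to configure its network interface via DHCP.
    echo "root=/dev/nfs nfsroot=${server}:${export_path} ip=dhcp rw"
}
```

Calling nfsroot_args 192.168.1.10 /export/target yields a string that you could pass to the kernel from your boot monitor.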

Regardless of the type of root filesystem that you want to use in your embedded Linux system, manually creating a root filesystem using traditional methods requires some specialized Linux knowledge. You have to be familiar with the commands used to create filesystems of each type. If you are creating an initial RAM disk, you have to create an empty file, associate that file with a Linux device, create the filesystem, and then mount that file as a special type of virtual device known as a loopback device in order to populate it. Depending on how your embedded system accesses devices, you may also have to know the specialized Linux commands required to create device nodes and their naming conventions. In order to create a root filesystem of any type, you have to understand the organization of a Linux root filesystem.
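
The manual steps for an initial RAM disk described above can be sketched as follows. This is one possible sequence under stated assumptions, not the procedure that TimeStorm LDS uses: it assumes an EXT2 image, a staging directory that already holds the files to install, and that mount -o loop takes care of associating the file with a loopback device. The names run and make_initrd are hypothetical. Because the real commands need root privileges, the sketch defaults to a dry run that only prints them.

```shell
#!/bin/sh
# Build a gzip-compressed EXT2 initial RAM disk image from a staging directory.
# With DRY_RUN=1 (the default here) the commands are printed, not executed;
# the real steps need root privileges for the loopback mount.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: make_initrd STAGING_DIR IMAGE_FILE SIZE_KB MOUNT_POINT
make_initrd() {
    staging=$1
    image=$2
    size_kb=$3
    mnt=$4
    # 1. Create an empty file of the requested size.
    run dd if=/dev/zero of="$image" bs=1k count="$size_kb" || return 1
    # 2. Create an EXT2 filesystem inside the file (-F: not a block device).
    run mke2fs -F -q -m 0 "$image" || return 1
    # 3. Mount the file through a loopback device.
    run mkdir -p "$mnt" || return 1
    run mount -o loop "$image" "$mnt" || return 1
    # 4. Populate the filesystem from the staging directory.
    run cp -a "$staging/." "$mnt/" || return 1
    # 5. Unmount and compress; gzip replaces IMAGE_FILE with IMAGE_FILE.gz.
    run umount "$mnt" || return 1
    run gzip -9 "$image" || return 1
}
```

Running make_initrd rootfs initrd.img 4096 /mnt/initrd in dry-run mode lists the seven commands in order, which is a convenient way to review them before executing them as root with DRY_RUN=0.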

As discussed in the remainder of this white paper, graphical embedded development tools, such as TimeStorm Integrated Development Environment (IDE) and TimeStorm Linux Development Suite (LDS), can expedite and simplify development by encapsulating much of the specialized Linux knowledge that you would otherwise have to master and memorize. Figure 1 shows the TimeStorm LDS RFS Image Options screen, on which you can specify the type of filesystem that you want to create.


Figure 1: Specifying Filesystem Type Information in TimeStorm LDS

Using TimeStorm LDS to create a root filesystem can be as simple as specifying its type, selecting the packages that you want to include, defining any special behavior you're interested in, and clicking a button.

Overview of the Linux Startup Process

Though TimeStorm LDS makes it easy to assemble a root filesystem that is suitable for running your Linux system and application(s), it's important to understand the applications that your system may execute during the boot process, in order to make sure that they are present in your root filesystem. As we'll see later, TimeStorm LDS makes it easy to identify missing components of a root filesystem. This section discusses the files used during the Linux boot process and the infrastructure required for running many Linux applications, in order to provide some background for later sections.

All Linux systems start in essentially the same way, with one minor but significant difference if you are using an initial RAM disk as part of the boot process. After loading the kernel into memory and executing it, Linux systems execute a system application known as the init (initialization) process, which is typically found in /sbin/init on Linux systems. The init process is process number 1 on the system, as shown in a process status listing produced using the "ps" command, and is therefore the ancestor of all other processes on your system. The init process reads the file /etc/inittab, which identifies the way in which the system should boot and lists all other processes and programs that init should start.
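
To see which programs a given /etc/inittab asks init to start, you can list its entries. The sketch below assumes the traditional System V layout of id:runlevels:action:process, with one entry per line and '#' introducing comments. The function name inittab_programs is hypothetical. Entries with an empty process field, such as the one that sets the default runlevel, are skipped.

```shell
#!/bin/sh
# List the action and command of every inittab entry that starts a program.
# Assumes the System V layout: id:runlevels:action:process
# inittab_programs is a hypothetical helper name used for illustration.
# Usage: inittab_programs /etc/inittab
inittab_programs() {
    awk -F: '
        /^[ \t]*(#|$)/ { next }     # skip comments and blank lines
        NF >= 4 {
            process = $4
            # The process field may itself contain colons; rejoin the rest.
            for (i = 5; i <= NF; i++) process = process ":" $i
            if (process != "") print $3 " " process
        }' "$1"
}
```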

When you boot a Linux system that uses an initial RAM disk, the boot sequence includes one extra step. Before executing the init process, the system uncompresses and mounts the initial RAM disk, and then executes the file /linuxrc (Linux Run Commands). This file must therefore be executable, but can be a command file that lists other commands to execute, can be a multi-call binary such as BusyBox, or can simply be a symbolic link to a multi-call binary or to the /sbin/init process itself.

The /linuxrc File in an Initial RAM Disk

The kernel executes the file /linuxrc as a step in the initial RAM disk's mount process, as specified in the kernel source file init/do_mounts_initrd.c. The following sample /linuxrc file, taken from a generic Red Hat 9 system, is a command script:
#!/bin/nash

echo Mounting /proc filesystem
mount -t proc /proc /proc
echo Creating block devices
mkdevices /dev
echo Creating root device
mkrootdev /dev/root
echo 0x0100 > /proc/sys/kernel/real-root-dev
echo Mounting root filesystem
mount -o defaults --ro -t ext3 /dev/root /sysroot
pivot_root /sysroot /sysroot/initrd
umount /initrd/proc

As this example shows, the sample /linuxrc file executes a number of commands that help initialize the system. The last commands in the file mount the root filesystem on a local device and use the pivot_root command to change the system's idea of the root ('/') directory. Systems that use an initial RAM disk, but want their root filesystem to reside on local storage, use the pivot_root command, included in the util-linux package, to change the system's root directory from the initial RAM disk to the device that actually provides long-term storage. This device is usually identified through a kernel boot argument.

On embedded systems that use an initial RAM disk and where you do not need to load any additional modules, perform any additional commands during the boot process, and so on, the /linuxrc file is often a symbolic link to the /sbin/init program discussed earlier in this section. This is an optimization: if the /linuxrc file is an actual command file, your Linux system typically executes the /sbin/init process anyway when it finishes executing that command file.
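
When you populate an initial RAM disk by hand, creating that symbolic link is a one-line step. The sketch below assumes a staging directory that will become the root of the image. The function name link_linuxrc is hypothetical, and the default target of sbin/init can be replaced by another program, such as a multi-call binary.

```shell
#!/bin/sh
# Make /linuxrc in a staging root a symbolic link to the init program.
# link_linuxrc is a hypothetical helper name used for illustration.
# Usage: link_linuxrc STAGING_ROOT [TARGET]
link_linuxrc() {
    root=$1
    target=${2:-sbin/init}
    # A relative target keeps the link valid once the image is mounted at "/".
    ln -sf "$target" "$root/linuxrc"
}
```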

Components of a Root Filesystem

Regardless of whether your root filesystem is an initial RAM disk, local storage, or a network root filesystem, that filesystem must provide any kernel modules that you want to load during the boot process, all of the Linux commands the system needs to execute, any custom applications that you want to run on your Linux system, and entries for all of the devices that you want to be able to access from your system. It also needs to provide any Linux infrastructure that the system needs in order to execute those applications.
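
A hand-built root filesystem usually starts as an empty directory skeleton in a staging area, which is then filled with modules, commands, applications, and device entries. The following sketch creates such a skeleton. The function name make_rootfs_skeleton is hypothetical and the directory list is an illustrative assumption rather than a required set. Device nodes are shown only as comments because creating them requires root privileges.

```shell
#!/bin/sh
# Create a minimal directory skeleton for a root filesystem staging area.
# make_rootfs_skeleton is a hypothetical helper name, and the directory
# list is an illustrative assumption, not a required set.
# Usage: make_rootfs_skeleton STAGING_ROOT
make_rootfs_skeleton() {
    root=$1
    for dir in bin sbin etc dev lib proc tmp usr/bin usr/sbin var/log; do
        mkdir -p "$root/$dir" || return 1
    done
    # /tmp is conventionally world-writable with the sticky bit set.
    chmod 1777 "$root/tmp" || return 1
    # Device nodes need root privileges, for example:
    #   mknod -m 600 "$root/dev/console" c 5 1
    #   mknod -m 666 "$root/dev/null"    c 1 3
}
```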

Inter-Package Dependencies

Linux applications are generally provided as part of packages that contain a group of related applications. For example, the module-init-tools package on a 2.6 Linux system (known as modutils on 2.4 and earlier Linux systems) contains the applications that Linux systems use to insert, remove, and query loadable kernel modules, such as /sbin/depmod, /sbin/rmmod, /sbin/insmod, and so on. When you construct a root filesystem, these applications must be present in your root filesystem in order to perform any of those functions. Similarly, to log in on a Linux system over the network, that system must be running a server process such as the telnet or SSH daemon in order to receive your remote request. These servers are typically found in the telnet-server and openssh-server packages, respectively. The system must also be able to initiate a login process of some sort (such as getty, mingetty, or agetty), and then execute your login shell (usually bash) and other commands that the login process requires.

To make mandatory packages available, most Linux distributions, such as those available from TimeSys, include sets of packages that are pre-compiled for your target hardware and from which you can pick and choose when creating a root filesystem. Figure 2 shows the TimeStorm LDS Package Selection screen, which enables you to select from the packages available for installation in your root filesystem. This list includes any RPM Package Manager (RPM) packages or TimeStorm projects that you have created in any of your TimeStorm workspaces. The list shown in the figure is for a customized RFS based on a subset of available packages.


Figure 2: TimeStorm LDS Package Selection Screen

Unfortunately, the relationships between the applications contained in different packages are not always easy to identify unless you are a Linux wizard, or have access to a tool that automates this sort of relationship analysis, such as TimeStorm LDS.

Library Dependencies

Even after mastering packages and their relationships, an additional source of complexity in many embedded Linux systems is that Linux applications are often compiled using shared libraries. One of the advantages of Linux is that it provides a rich program compilation and execution environment. Thousands of libraries of pre-compiled functions are available, which your applications can use to minimize the number of times that you have to reinvent the wheel. Linux supports two different types of libraries, known as static and shared libraries.

Static libraries are libraries of precompiled functions that your application links to when it is compiled, resolving any function references at that time. Your application's binary therefore literally contains a copy of each library that it is statically linked to, but can be executed without those libraries actually being installed on your system, because the libraries are actually compiled into your applications. This leads to larger, but more self-sufficient, binaries.

Shared libraries are libraries of commonly-used functions that applications can link to when you execute those applications (i.e., at run-time), rather than when they are compiled. Binaries that use shared libraries are therefore much smaller than statically-linked binaries, but the libraries that they require must be present on your system in order to execute them. A run-time loader and the library that it requires must also be present on your system in order to run those binaries.
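
On a running Linux system, the ldd command reports the shared libraries and run-time loader that a binary needs. The sketch below reduces that report to a sorted list of file paths that would have to be present in a root filesystem. The function names resolved_libs and needed_libs are hypothetical, and the sketch assumes that ldd can inspect the binary, which is generally not the case for binaries cross-compiled for a different architecture.

```shell
#!/bin/sh
# Print the resolved path of every shared object named in `ldd` output
# read from standard input. Entries without a path (such as the kernel's
# virtual DSO) are ignored.
# resolved_libs and needed_libs are hypothetical helper names.
resolved_libs() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3; next }
         $1 ~ /^\//               { print $1 }' | LC_ALL=C sort -u
}

# List the shared libraries (and run-time loader) that a binary needs.
# Usage: needed_libs /path/to/binary
needed_libs() {
    ldd "$1" | resolved_libs
}
```

For binaries built with a cross-compiler, the toolchain's own readelf -d can be used instead to list the library names recorded in the binary.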

Understanding the hierarchy of commands that standard Linux services requires, the relationships between the commands in different Linux packages, and the shared libraries that these different commands can require is known as "dependency analysis." Dependency analysis is important on any Linux system in order to ensure that you will be able to run the services and commands that you need to execute. It is even more important in embedded Linux systems, where resources are limited and you may therefore need to make your root filesystem as small as possible without sacrificing functionality.

Linux provides a number of specialized command-line tools for understanding package relationships and performing dependency analysis, each of which uses different syntax and requires special expertise. An alternative to becoming a Linux wizard at this level is to use a graphical tool such as TimeStorm LDS, which encapsulates all of the Linux expertise you'll need to perform this sort of analysis, and does it for you when you simply select a menu command.

Figure 3 shows the TimeStorm LDS Library Dependencies screen, displaying information about the packages and files provided and required by the e2fsprogs package, a package of utilities for creating and maintaining EXT2 and EXT3 filesystems.


Figure 3: TimeStorm LDS Library Dependencies Screen

Standards for Filesystem Content and Organization

Many embedded Linux systems are designed in-house, or for specialized purposes such as supporting a single, dedicated application or set of related applications. These custom, single-purpose systems tend to use root filesystems that include only the applications, utilities, and infrastructure that they require in order to boot and run.

As Linux systems become more widely used, especially within industry segments such as telecommunications, many software vendors are beginning to develop applications that are designed to run on a wide range of Linux-based systems. These systems may or may not even use the same Linux distribution, which makes it far more difficult to prepare for and address differences in the packages and infrastructure that are available on these systems.

The Linux community has proposed various standards that are designed to address and eliminate these sorts of problems. The best-known of these are the FHS (Filesystem Hierarchy Standard), LSB (Linux Standard Base), and CGL (Carrier Grade Linux) standards, which do the following:
  • FHS defines a standard set of utilities and associated infrastructure and where those utilities and associated files should be located in a compliant Linux filesystem. Being able to ensure that FHS-compliant filesystems will contain specific applications in a known location makes it easy for Linux command scripts and applications to leverage and invoke other Linux applications, which is the essence of the Unix application model in the first place.
  • LSB mandates FHS, adding the notion of a compliant run-time environment that will enable pre-compiled binaries to run on any other LSB-compliant system.
  • CGL mandates LSB, adding a number of performance and functionality requirements at both the kernel and filesystem level. The CGL specification is designed to make sure that compliant systems satisfy the needs of the telecommunications industry, but much of it is applicable to any high-availability environment.

Standards such as CGL are actively under development themselves. For example, two versions of the CGL specification are currently available. The older CGL 1.1 standard is supported by several embedded Linux distributions, such as those from MontaVista. More modern Linux distributions, such as the 2.6-based CGL Reference Distribution available from TimeSys, target the more extensive and up-to-date CGL 2.0 specification.

Summary

The Linux boot process, its interactions with physical and in-memory filesystems, the sequence of command scripts executed as a Linux system boots, and the dependencies between commands, files, and libraries can be quite complex. Graphical tools such as the TimeStorm Target Configurator from TimeSys, a component of TimeStorm LDS, make it easy to create initial RAM disks and other types of filesystems that contain the system software that you need at boot time or run time. TimeStorm LDS also simplifies adding your applications to the root filesystem that you want to deploy, whether this will be an initial RAM disk, located on flash storage, or stored on physical media such as a disk drive. The TimeStorm tools are powerful, easy-to-use, graphical tools that build in much of the specialized knowledge traditionally required when working with Linux systems, letting you focus on your application and implementation rather than having to become a Linux wizard along the way.

About the author

William von Hagen is a Senior Product Manager at TimeSys Corp., has been a Unix devotee for over twenty years, and has been a Linux fanatic since the early 1990s. He has worked as a system administrator, writer, developer, systems programmer, drummer, and product and content manager. Bill is the author of Linux Filesystems, Hacking the TiVo, SGML for Dummies, Installing Red Hat Linux 7, and is the coauthor of The Definitive Guide to GCC (with Kurt Wall) and The Mac OS X Power Users Guide (with Brian Proffitt). Linux Filesystems is available in English, Spanish, Polish, and traditional Chinese. Bill has also written for publications including Linux Magazine, Mac Tech, Linux Format, and online sites such as Linux Planet and Linux Today. An avid computer collector specializing in workstations, he owns more than 200 computer systems.
