Wednesday, October 15, 2008

Unix File System

UNIX Unleashed, System Administrator's Edition

- 18 -

File System and Disk Administration

By Steve Shah

This chapter discusses the trials and tribulations of creating, maintaining, and repairing file systems. While these tasks may appear simple from a user's standpoint, they are, in fact, intricate and contain more than a handful of nuances. In the course of this chapter, we'll step through many of these nuances and, hopefully, come to a strong understanding of the hows and whys of file systems.

Before we really jump into the topic, you should have a good understanding of UNIX directories, files, permissions, and paths. These are the key building blocks in understanding how to administer your file systems, and I assume you already have a mastery of them. If the statement, "Be sure to have /usr/bin before /usr/local/bin in your $PATH" confuses you in any way, you should be reading something more fundamental first. Refer to Part I, "Introduction to UNIX," for some basic instructions in UNIX.

This chapter goes about the explanation of file systems a bit differently than other books. We first discuss the maintenance and repair of file systems, then discuss their creation. This was done because it is more likely that you already have existing file systems you need to maintain and fix. Understanding how to maintain them also helps you better understand why file systems are created the way they are.

The techniques we cover here are applicable to most UNIX systems currently in use. The only exceptions are when we actually create the file systems. This is where the most deviation from any standard (if there ever was one) occurs. We cover the creation of file systems under the SunOS, Solaris, Linux, and IRIX implementations of UNIX. If you are not using one of these operating systems, you should check the documentation that came with your operating system for details on the creation of file systems.


CAUTION: Working with file systems is inherently dangerous. You may be surprised at how quickly and easily you can damage a file system beyond repair. In some instances, it is even possible to damage the disk drive as well. BE CAREFUL. When performing the actions explained in this chapter, be sure you have typed the commands in correctly and you understand the resulting function fully before executing it. When in doubt, consult the documentation that came from the manufacturer. Most importantly, the documentation that comes from the manufacturer is always more authoritative than any book.



NOTE: You should read the entire chapter before actually performing any of the tasks below. This will give you a better understanding of how all the components work together, thereby giving you more solid ground when performing potentially dangerous activities.


What Is a File System?

The file system is the primary means of file storage in UNIX. Each file system houses directories, which, as a group, can be placed almost anywhere in the UNIX directory tree. The topmost level of the directory tree, the root directory, begins at /. Subdirectories nested below the root directory may traverse as deep as you like so long as the longest absolute path is less than 1,024 characters.

With the proliferation of vendor-enhanced versions of UNIX, you will find a number of "enhanced" file systems. From the standpoint of the administrator, you shouldn't have to worry about the differences too much. The two instances where you will need to worry about vendor-specific details are in the creation of file systems and when performing backups. We will cover the specifics of:

SunOS 4.1.x, which uses 4.2

Solaris, which uses ufs

Linux, which uses ext2

IRIX, which uses efs and xfs

Note that the ufs and 4.2 file systems are actually the same.

A file system, however, is only a part of the grand scheme of how UNIX keeps its data on disk. At the top level, you'll find the disks themselves. These disks are then broken into partitions, each varying in size depending on the needs of the administrator. It is on each partition that the actual file system is laid out. Within the file system, you'll find directories, subdirectories, and, finally, the individual files.

Although you will rarely have to deal with the file system at a level lower than the individual files stored on it, it is critical that you understand two key concepts: inodes and the superblock. Once you understand these, you will find that the behavior and characteristics of files make more sense.


inodes

An inode maintains information about each file. Depending on the type of file system, the inode can contain more than 40 pieces of information. Most of it, however, is only useful to the kernel and doesn't concern us. The fields that do concern us are

mode The permission mask and type of file.

link count The number of directories that contain an entry with this inode number.

user ID The ID of the file's owner.

group ID The ID of the file's group.

size Number of bytes in this file.

access time The time at which the file was last accessed.

mod time The time at which the file was last modified.

inode time The time at which this inode structure was last modified.

block list A list of disk block numbers which contain the first segment of the file.

indirect list A list of other block lists.

The mode, link count, user ID, group ID, size, and access time are used when generating file listings. Note that the inode does not contain the file's name. That information is held in the directory file (see below for details).
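
To make the inode fields a little more concrete, here is a minimal sketch that reads a few of them back with ls. The scratch directory and the file name hello are assumptions made for illustration, and the column positions assume the usual ls -l layout shown later in this chapter.

```shell
# Create a scratch file and look at what its inode records.
# (The scratch directory and the name "hello" are illustrative.)
dir=$(mktemp -d)
printf 'hello\n' > "$dir/hello"

# ls -i prints the inode number in front of the file name.
inode=$(ls -i "$dir/hello" | awk '{print $1}')

# ls -l prints mode, link count, owner, group, size, and mod time.
# Splitting that line into words gives us the individual fields.
set -- $(ls -l "$dir/hello")
mode=$1
links=$2
size=$5

echo "inode=$inode mode=$mode links=$links size=$size"
```

Notice that the file name is the one thing ls has to take from the directory rather than from the inode.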


Superblocks

The superblock holds the most vital information stored on the disk. It contains information on the disk's geometry (number of heads, cylinders, and so on), the head of the inode list, and the free block list. Because of its importance, the system automatically keeps mirrors of this data scattered around the disk for redundancy. You only have to deal with superblocks if your file system becomes heavily corrupted.

Types of Files

Files come in 8 flavors:

Normal Files

Directories

Hard Links

Symbolic Links

Sockets

Named Pipes

Character Devices

Block Devices

Normal Files These are the files you use the most. They can be either text or binary files; however, their internal structure is irrelevant from a System Administrator standpoint. A file's characteristics are specified by the inode in the file system that describes it. An ls -l on a normal file will look something like this:

-rw------- 1 sshah admin 42 May 12 13:09 hello

Directories These are a special kind of file that contains a list of other files. Although there is a one-to-one mapping of inode to disk blocks, there can be a many-to-one mapping from directory entry to inode. When viewing a directory listing using the ls -l command, you can identify directories by their permissions starting with the d character. An ls -l on a directory looks something like this:

drwx------ 2 sshah admin 512 May 12 13:08 public_html

Hard Links

A hard link is actually a normal directory entry except instead of pointing to a unique file, it points to an already existing file. This gives the illusion that there are two identical files when you do a directory listing. Because the system sees this as just another file, it treats it as such. This is most apparent during backups because hard-linked files get backed up as many times as there are hard links to them. Because a hard link shares an inode, it cannot exist across file systems. Hard links are created with the ln command. For example, given this directory listing using ls -l, we see:

-rw------- 1 sshah admin 42 May 12 13:04 hello

When you type ln hello goodbye and then perform another directory listing using ls -l, you see:

-rw------- 2 sshah admin 42 May 12 13:04 goodbye

-rw------- 2 sshah admin 42 May 12 13:04 hello

Notice how this appears to be two separate files that just happen to have the same file lengths. Also note that the link count (second column) has increased from one to two. How can you tell they actually are the same file? Use ls -il. Observe:

13180 -rw------- 2 sshah admin 42 May 12 13:04 goodbye

13180 -rw------- 2 sshah admin 42 May 12 13:04 hello

You can see that both point to the same inode, 13180.
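
A short sketch can confirm what the shared inode implies: removing one name only lowers the link count, and the data stays reachable through the other name. The scratch directory and the file contents here are assumptions for illustration.

```shell
dir=$(mktemp -d)
printf 'some data\n' > "$dir/hello"
ln "$dir/hello" "$dir/goodbye"

# Both names report the same inode number.
ino_hello=$(ls -i "$dir/hello" | awk '{print $1}')
ino_goodbye=$(ls -i "$dir/goodbye" | awk '{print $1}')

# Removing one name only drops the link count; the data stays
# reachable through the remaining name.
rm "$dir/hello"
links_after=$(ls -l "$dir/goodbye" | awk '{print $2}')
contents=$(cat "$dir/goodbye")

echo "inode $ino_goodbye, links now $links_after, contents: $contents"
```

The file's blocks are only released once the last directory entry pointing at the inode is gone.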


WARNING: Be careful when creating hard links, especially when hard linking to a directory. It is possible to corrupt a file system by doing so, because the hard link does not record the fact that the inode being pointed to needs to be treated as a directory.


Symbolic Links

A symbolic link (sometimes referred to as a symlink) differs from a hard link because it doesn't point to another inode but to another filename. This allows symbolic links to exist across file systems as well as be recognized as a special file to the operating system. You will find symbolic links to be crucial to the administration of your file systems, especially when trying to give the appearance of a seamless system when there isn't one. Symbolic links are created using the ln -s command. A common thing people do is create a symbolic link to a directory that has moved. For example, if you are accustomed to accessing the directory for your home page in the subdirectory www but at the new site you work at, home pages are kept in the public_html directory, you can create a symbolic link from www to public_html using the command ln -s public_html www. Performing an ls -l on the result shows the link.

drwx------ 2 sshah admin 512 May 12 13:08 public_html

lrwx------ 1 sshah admin 11 May 12 13:08 www -> public_html
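
Because a symbolic link stores a name rather than an inode number, it can be left dangling when its target is renamed. The following sketch shows this in a scratch directory; the name htdocs is hypothetical and only stands in for wherever the directory moved to.

```shell
dir=$(mktemp -d)
mkdir "$dir/public_html"
ln -s public_html "$dir/www"      # the link stores the text "public_html"

# While the target exists, the link resolves to a directory.
if [ -d "$dir/www" ]; then before=resolves; else before=dangling; fi

# Rename the target: the link still holds the old name and now dangles.
mv "$dir/public_html" "$dir/htdocs"
if [ -L "$dir/www" ] && [ ! -e "$dir/www" ]; then
    after=dangling
else
    after=resolves
fi

echo "before rename: $before, after rename: $after"
```

A hard link would have survived the rename, which is one of the few practical arguments in its favor.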


Sockets

Sockets are the means for UNIX to network with other machines. Typically, this is done using network ports; however, the file system has a provision to allow for interprocess communication through socket files. (A popular program that uses this technique is the X Window System.) You rarely have to deal with this kind of file and should never have to create it yourself (unless you're writing the program). If you need to remove a socket file, use the rm command. Socket files are identified by their permission settings beginning with an s character. An ls -l on a socket file looks something like this:

srwxrwxrwx 1 root admin 0 May 10 14:38 X0

Named Pipes

Similar to sockets, named pipes enable programs to communicate with one another through the file system. You can use the mknod command to create a named pipe. Named pipes are recognizable by their permissions settings beginning with the p character. An ls -l on a named pipe looks something like this:

prw------- 1 sshah admin 0 May 12 22:02 mypipe
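
As a rough sketch of how two processes talk through a named pipe, the example below creates one and passes a single line across it. It uses mkfifo, the utility POSIX specifies for this job, in place of mknod; the pipe name and the message are assumptions for illustration.

```shell
dir=$(mktemp -d)
mkfifo "$dir/mypipe"

# The writer blocks until a reader opens the pipe, so run it in the
# background.
echo "hello through the pipe" > "$dir/mypipe" &

# The reader receives whatever the writer sent.
read -r received < "$dir/mypipe"
wait

echo "received: $received"
```

Nothing is stored on disk in between; the pipe is only a rendezvous point in the file system.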

Character Devices

These special files are typically found in the /dev directory and provide a mechanism for communicating with system device drivers through the file system one character at a time. They are easily noticed by their permission bits starting with the c character. Each character file contains two special numbers, the major and minor. These two numbers identify which device driver that file communicates with. An ls -l on a character device looks something like this:

crw-rw-rw- 1 root wheel 21, 4 May 12 13:40 ptyp4

Block Devices

Block devices also share many characteristics with character devices in that they exist in the /dev directory, are used to communicate with device drivers, and have major and minor numbers. The key difference is that block devices typically transfer large blocks of data at a time versus one character at a time. (A hard disk is a block device, whereas a terminal is a character device.) Block devices are identified by their permission bits starting with the b character. An ls -l on a block device looks something like this:

brw------- 2 root staff 16, 2 Jul 29 1992 fd0c
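
The file types described above can also be told apart from a script using the shell's test operators instead of reading the first character of an ls -l listing. The function name ftype is hypothetical. Note that a hard link cannot be detected this way, since it is just another directory entry for a normal file.

```shell
# Report the type of a file using the shell's test operators.
# -L is checked first because the other tests follow symbolic links.
ftype() {
    if   [ -L "$1" ]; then echo "symbolic link"
    elif [ -f "$1" ]; then echo "normal file"
    elif [ -d "$1" ]; then echo "directory"
    elif [ -p "$1" ]; then echo "named pipe"
    elif [ -S "$1" ]; then echo "socket"
    elif [ -c "$1" ]; then echo "character device"
    elif [ -b "$1" ]; then echo "block device"
    else echo "unknown or missing"
    fi
}

dir=$(mktemp -d)
touch "$dir/hello"
ln -s hello "$dir/link"

ftype "$dir"          # directory
ftype "$dir/hello"    # normal file
ftype "$dir/link"     # symbolic link
ftype /dev/null       # a character device on most systems
```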

Managing File Systems

Managing file systems is relatively easy. That is, once you can commit to memory the location of all the key files in the directory tree on each major variation of UNIX as well as your own layout of file systems across the network...

In other words, it can be a royal pain.

From a technical standpoint there isn't much to deal with. Once the file systems have been put in their correct places and the boot time configuration files have been edited so that your file systems automatically come online at every start up, there isn't much to do besides watch your disk space.

From a management standpoint, it's much more involved. Often you'll need to deal with existing configurations, which may not have been done "the right way," or you're dealing with site politics such as, "I won't let that department share my disks." Then you'll need to deal with users who don't understand why they need to periodically clean up their home directories. Don't forget the ever exciting vendor-specific nuisances and their idea of how the system "should be" organized.

This section covers the tools you need to manage the technical issues. Unfortunately, managerial issues are something that can't be covered in a book. Each site has different needs as well as different resources, resulting in different policies. If your site lacks any written policy, take the initiative to write one yourself.

Mounting and Unmounting File Systems

As I mentioned earlier in this chapter, part of the power in UNIX stems from its flexibility in placing file systems anywhere in the directory tree. This feat is accomplished by mounting file systems.

Before you can mount a file system, you need to select a mount point. A mount point is the directory entry in the file system where the root directory of a different file system will overlay it. UNIX keeps track of mount points, and accesses the correct file system, depending on which directory the user is currently in. A mount point may exist anywhere in the directory tree.


NOTE: While it is technically true that you can mount a file system anywhere in the directory tree, there is one place you will NOT want to mount it: the root directory. Remember that once a file system is mounted at a directory, that directory is overshadowed by the contents of the mounted file system. Hence, by mounting on the root directory, the system will no longer be able to see its own kernel or local configuration files. How long your system goes on before crashing depends on your vendor.

There is an exception to the rule. Some installation packages will mount a network file system to the root directory. This is done to give the installation software access to many packages that may not be able to fit on your boot disk. Unless you fully understand how to do this yourself, don't.


Mounting and Unmounting File Systems Manually To mount a file system, use the mount command:

mount /dev/device /directory/to/mount

where /dev/device is the device name you want to mount and /directory/to/mount is the directory you want to overlay in your local file system. For example, if you wanted to mount /dev/hda4 to the /usr directory, you would type:

mount /dev/hda4 /usr

Remember that the directory must exist in your local file system before anything can be mounted there.
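
Since a mount overshadows whatever is already in the directory, it can be worth checking a mount point before using it. The helper below is a hypothetical sketch, not a standard command: it only inspects the directory and never calls mount itself.

```shell
# check_mount_point is a hypothetical helper, not a standard command.
# It succeeds only if DIR exists and is empty, so that mounting on it
# would not hide any files.
check_mount_point() {
    if [ ! -d "$1" ]; then
        echo "$1: not a directory; create it with mkdir first"
        return 1
    fi
    # ls -A lists everything except . and ..; any output means the
    # directory has contents that a mount would overshadow.
    if [ -n "$(ls -A "$1")" ]; then
        echo "$1: not empty; its contents would be hidden by a mount"
        return 2
    fi
    echo "$1: ok"
}

dir=$(mktemp -d)
mkdir "$dir/empty" "$dir/busy"
touch "$dir/busy/file"

for d in "$dir/empty" "$dir/busy" "$dir/missing"; do
    check_mount_point "$d" || echo "  (do not mount here yet)"
done
```

Mounting on a non-empty directory is legal, so treat the second message as a warning, not an error.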

There are options that can be passed to the mount command. The most important characteristics are specified in the -o option. These characteristics are:

rw read/write

ro read only

bg background mount (if the mount fails, place the process into the background and keep trying until success)

intr interruptible mount (if a process is pending I/O on a mounted partition, it will allow the process to be interrupted and the I/O call dropped)

An example of these parameters being used is:

mount -o rw,bg,intr /dev/hda4 /usr

See the man page on your system for vendor specific additions.

To unmount a file system, use the umount command. For example:

umount /usr

This unmounts the /usr file system from the current directory tree, unveiling the original directory underneath it.

There is, of course, a caveat. If users are using files on a mounted file system, you cannot unmount it. All files must be closed before this can happen, which on a large system can be tricky to say the least. There are three ways to handle this:

Use the lsof program (available at to list the users and their open files on a given file system. Then either wait until they are done, beg them to leave, or kill their processes off. Then unmount the file system. Often, this isn't very desirable.

Use the -f option with the umount command to force the unmount. This is often a bad idea because it leaves the programs (and users) accessing the partition confused. Files that are in memory and have not been committed to disk may be lost.

Bring the system to single user mode, then unmount the file system. While the largest inconvenience, it is the safest way because no one loses any work.

Mounting File Systems Automatically At boot time, the system automatically mounts the root file system with read-only privileges. This enables it to load the kernel and read critical startup files. However, once it has bootstrapped itself, it needs guidance. Although it is possible for you to mount all the file systems by hand, it isn't realistic because you would then have to finish bootstrapping the machine yourself, and worse, the system could not come back online by itself. (Unless, of course, you enjoy coming into work at 2 a.m. to bring a system back up.)

To get around this, UNIX uses a special file called /etc/fstab (/etc/vfstab under Solaris). This file lists all the partitions that need to be mounted at boot time and the directory where they need to be mounted. Along with that information you can pass parameters to the mount command.

Each file system to be mounted is listed in the fstab file in the following format:

/dev/device /dir/to/mount ftype parameters fs_freq fs_passno


/dev/device Is the device to be mounted, for instance, /dev/hda4.

/dir/to/mount Is the location at which the file system should be mounted on your directory tree.

ftype Is the file system type. This should be 4.2 under SunOS, ufs under Solaris, ext2 under Linux, efs or xfs in IRIX (depending on your version), nfs for NFS mounted file systems, swap for swap partitions, and proc for the /proc file system. Some operating systems, such as Linux, support additional filesystem types, although they are not as likely to be used.

parameters Are the parameters we passed to mount using the -o option. They follow the same comma-delimited format. An example entry would look like rw,intr,bg.

fs_freq Is used by dump to determine whether a file system needs to be dumped.

fs_passno Is used by the fsck program to determine the order to check disks at boot time.

Any lines in the fstab file that start with the pound symbol (#) are considered comments.

If you need to mount a new file system while the machine is live, you must perform the mount by hand. If you wish to have this mount automatically active the next time the system is rebooted, you should be sure to add the appropriate entry to your fstab file.

There are two notable partitions that don't follow the same set of rules as normal partitions. They are the swap partition and /proc. (Note that SunOS does not use the /proc file system.)

Mounting the swap partition is not done using the mount command. It is instead managed by the swap command under Solaris and IRIX, and by the swapon command under SunOS and Linux. In order for a swap partition to be mounted, it must be listed in the appropriate fstab file. Once it's there, use the appropriate command (swap or swapon) with the -a parameter followed by the partition on which you've allocated swap space.

The /proc file system is even stranger because it really isn't a file system. It is an interface to the kernel abstracted into a file system style format. This should be listed in your fstab file with file system type proc.


TIP: If you need to remount a file system that already has an entry in the fstab file, you don't need to type in the mount command with all the parameters. Instead, simply pass the directory to mount as a parameter like this:

mount /dir/to/mount

mount automatically looks to the fstab file for all the details, such as which partition to mount and which options to use.

If you need to remount a large number of file systems that are already listed in the fstab file (in other words, you need to remount directories from a system that has gone down), you can use the -a option in the mount command to try and remount all the entries in the fstab file like this:

mount -a

If mount finds that a file system is already mounted, no action is performed on that file system. If, on the other hand, mount finds that an entry is not mounted, it automatically mounts it with the appropriate parameters.


Here is a complete fstab file from a SunOS system:


# Sample /etc/fstab file for a SunOS machine


# Local mounts

/dev/sd0a / 4.2 rw 1 1

/dev/sd0g /usr 4.2 rw 1 2

/dev/sd0b swap swap rw 0 0

/dev/sd0d /var 4.2 rw 0 0

# Remote mounts

server1:/export/home /home nfs rw,bg,intr 0 0

server1:/export/usr/local /usr/local nfs rw,bg,intr 0 0

server2:/export/var/spool/mail /var/spool/mail nfs rw,bg,intr 0 0
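
Because every entry has the same six fields, fstab is easy to examine with awk. The sketch below lists the file systems that would be checked at boot, in fs_passno order. It works on a copy of a few of the sample entries rather than the real /etc/fstab, and it assumes the common convention that an fs_passno of 0 means the file system is not checked.

```shell
# A copy of a few entries in the format described above. A real script
# would read /etc/fstab instead of this scratch file.
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
# Local mounts
/dev/sd0a / 4.2 rw 1 1
/dev/sd0g /usr 4.2 rw 1 2
/dev/sd0b swap swap rw 0 0
/dev/sd0d /var 4.2 rw 0 0
# Remote mounts
server1:/export/home /home nfs rw,bg,intr 0 0
EOF

# Skip comments and blank lines, keep entries whose sixth field
# (fs_passno) is non-zero, and sort them into checking order.
fsck_order=$(awk '!/^#/ && NF >= 6 && $6 > 0 { print $6, $1, $2 }' "$fstab" | sort -n)
echo "$fsck_order"
```

With the sample entries this prints the root file system first and /usr second.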

Common Commands for File System Management

In taking care of your system, you'll quickly find that you can use these commands and many of their parameters without having to look them up. This is because you're going to be using them all the time. I highly suggest you learn to love them.


NOTE: In reading this book you may have noticed that the terms program and command are used interchangeably. This is because there are no "built in" commands to the system; each one is invoked as an individual program. However, you will quickly find that both the people who use UNIX and UNIX-related texts (such as this one) use both terms to mean the same thing. Confusing? A bit. But it's tough to change 25+ years of history.



NOTE: At the end of each command description, I mention the GNU equivalent. Linux users shouldn't worry about getting them, because Linux ships with all GNU tools. If you are using another platform and aren't sure whether you're using the GNU version, try running the command with the --version option. If it is GNU, it will display its title and version number. If it isn't GNU, it'll most likely reject the parameter and give an error.


df The df command summarizes the free disk space by file system. Running it without any parameters displays all the information about normally mounted and NFS mounted file systems. The output varies from vendor to vendor (under Solaris, use df -t) but should closely resemble this:

Filesystem 1024-blocks Used Available Capacity Mounted on

/dev/hda3 247871 212909 22161 91% /

/dev/hda6 50717 15507 32591 32% /var

/dev/hda7 481998 15 457087 0% /local


489702 222422 218310 50% /var/spool/mail

The columns reported show:

Filesystem Which file system is being shown. File systems mounted using NFS are shown as hostname:/dir/that/is/mounted

1024-blocks The number of 1 KB blocks the file system consists of. (Its total size.)

Used The number of blocks used.

Available The number of blocks available for use.

Capacity Percentage of partition currently used.

Mounted on The location in the directory tree this partition has been mounted on.

Common parameters to this command are:

directory Show information only for the partition on which the specified directory exists.

-a Show all partitions including swap and /proc.

-i Show inode usage instead of block usage.
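
Watching disk space usually means watching the Capacity column, which is easy to automate. The helper report_full below is hypothetical; it reads df-style output and prints the mount points at or above a threshold. The sketch assumes the six-column layout shown above and mount points without spaces in their names.

```shell
# report_full is a hypothetical helper: it reads df-style output on
# standard input and prints mount points at or above a usage threshold.
report_full() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)                 # "91%" -> "91"
        if (pct + 0 >= limit) print $6, $5
    }'
}

# Sample lines taken from the listing above.
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/hda3 247871 212909 22161 91% /
/dev/hda6 50717 15507 32591 32% /var
/dev/hda7 481998 15 457087 0% /local'

echo "$sample" | report_full 90       # prints: / 91%

# On a live system, -k asks for 1 KB blocks and -P for the portable
# format, which keeps each file system on a single line.
df -kP | report_full 90
```

A script like this could be run from cron so that a filling disk is noticed before the users notice it.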

The GNU df program, which is part of the fileutils distribution, has some additional print formatting features you may find useful. You can download the latest fileutils package at

du The du command summarizes disk usage by directory. It recurses through all subdirectories and shows disk usage by each subdirectory with a final total at the end. Running it without any parameters shows the usage like so:

409 ./doc

945 ./lib

68 ./man

60 ./m4

391 ./src

141 ./intl

873 ./po

3402 .

The first column shows the blocks of disk used by the subdirectory, and the second column shows the subdirectory being evaluated. To see how many kilobytes each subdirectory consumes, use the -k option. Some common parameters to this command are

directory Show usage for the specified directory. The default is the current directory.

-a Show usage for all files, not just directories.

-s Show only the total disk usage.
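
A common way to combine these options is to total each subdirectory and sort the result, so the biggest consumer ends up on the last line. The sketch below builds a small scratch tree to try it on; the directory names and file sizes are assumptions for illustration.

```shell
dir=$(mktemp -d)
mkdir "$dir/small" "$dir/large"
echo "tiny" > "$dir/small/note"

# Roughly 500 KB of text, so that "large" clearly dominates.
awk 'BEGIN { for (i = 0; i < 20000; i++) print "xxxxxxxxxxxxxxxxxxxxxxxx" }' \
    > "$dir/large/data"

# -s gives one total per argument and -k reports kilobytes;
# sort -n puts the biggest consumer on the last line.
usage=$(du -sk "$dir"/* | sort -n)
echo "$usage"

biggest=$(echo "$usage" | tail -n 1 | awk '{print $2}')
echo "largest: $biggest"
```

Run against /home, the same two commands quickly show whose directory needs attention.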

Like the df program, this program is available as part of the GNU fileutils distribution. The GNU version has expanded on many of the parameters which you may find useful. The fileutils package can be downloaded from

ln The ln program is used to generate links between files. This is very useful for creating the illusion of a perfect file system in which everything is in the "right" place when, in reality, it isn't. This is done by making a link from the desired location to the actual location.

The usage of this program is

ln file_being_linked_to link_name

where file_being_linked_to is the file that already exists, and you wish to have another file point to it called link_name. The command above generates a hard link, meaning that the file link_name will be indistinguishable from the original file. Both files must exist on the same file system.

A popular parameter to ln is the -s option, which generates symbolic links instead of hard links. The format of the command remains the same:

ln -s file_being_linked_to link_name

the difference being that the link_name file is marked as a symbolic link in the file system. Symbolic links may span file systems and are given a special tag in the directory entry.


TIP: Unless there is an explicit reason not to, you should always use symbolic links by specifying the -s option to ln. This makes your links stand out and makes it easy to move them from one file system to another.


tar The tar program is an immensely useful archiving utility. It can combine an entire directory tree into one large file suitable for transferring or compression.

The command line format of this program is:

tar parameters filelist

Common parameters are:

c Create an archive

x Extract the archive

v Be verbose

f Specify a tar file to work on

p Retain file permissions and ownerships

t View the contents of an archive.

Unlike most other UNIX commands, the parameters do not need to have a dash before them.

To create the tarfile myCode.tar, I could use tar in the following ways:

tar cf myCode.tar myCode

where myCode is a subdirectory relative to the current directory where the files I wish to archive are located.

tar cvf myCode.tar myCode

Same as the previous tar invocation, although this time it lists all the files added to the archive on the screen.

tar cf myCode.tar myCode/*.c

This archives all the files in the myCode directory that are suffixed by .c

tar cf myCode.tar myCode/*.c myCode/*.h

This archives all the files in the myCode directory that are suffixed by .c or .h

To view the contents of the myCode.tar file, use:

tar tf myCode.tar

To extract the files in the myCode.tar file, use:

tar xf myCode.tar

If the myCode directory doesn't exist, tar creates it. If the myCode directory does exist, any files in that directory are overwritten by the ones being untarred.

tar xvf myCode.tar

Same as the previous invocation of tar, but this lists the files as they are being extracted.

tar xpf myCode.tar

Same as the previous invocation of tar, but this attempts to set the permissions of the unarchived files to the values they had before archiving (very useful if you're untarring files as root).


TIP: The greatest use of tar for Systems Administrators is to move directory trees around. This can be done using the following set of commands:

(cd /src && tar cf - .) | (cd /dest && tar xpf -)

where /src is the source directory and /dest is the destination directory.

This is better than using a recursive copy because symbolic links and file permissions are kept. Use this and amaze your friends.
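
To see the claim about links and permissions in action, here is a small sketch that copies a scratch tree with the same kind of pipeline. The sketch archives . so that hidden files are copied as well, and it joins cd and tar with && so that tar never runs in the wrong directory if cd fails. All file names are assumptions for illustration.

```shell
src=$(mktemp -d)
dest=$(mktemp -d)

# Build a small tree with a hidden file, a private file, and a symlink.
echo "settings" > "$src/.profile"
echo "secret"   > "$src/private"
chmod 600 "$src/private"
ln -s private "$src/link"

# Archive the source tree to standard output and unpack it in the
# destination, asking tar (p) to restore the original permissions.
(cd "$src" && tar cf - .) | (cd "$dest" && tar xpf -)

ls -la "$dest"
```

In the listing, link should still be a symbolic link and private should still be readable only by its owner.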


While the stock tar that comes with your system works fine for most uses, you may find that the GNU version of tar has some nicer features. You can find the latest version of GNU tar at

find Of the commands that I've mentioned so far, you're likely to use find the most. Its purpose is to find files or patterns of files. The parameters for this tool are

find dir parameters

where dir is the directory where the search begins, and parameters define what is being searched for. The most common parameters you will use are:

-name Specify the filename or wildcards to look for. If you use any wildcards, be sure to place them inside of quotes so that the shell doesn't parse them before find does.

-print Typically turned on by default, it tells find to display the resulting file list.

-exec Executes the specified command on each file that matches the search criteria.

-atime n File was last accessed n days ago.

-mtime n File's data was last modified n days ago.

-size n[bckw] File uses n units of space where the units are specified by either b,c,k, or w. b is for 512 byte blocks, c is bytes, k is kilobytes, and w is two-byte words.

-xdev Do not descend into directories on other file systems.

-o Logical or the options.

-a Logical and the options.

Some examples of the find command are

find / -name core -mtime +7 -print -exec /bin/rm {} \;

This starts its search from the root directory, finds all files named core that have not been modified in more than seven days, prints each name, and removes the file.

find / -xdev -atime +60 -a -mtime +60 -print

This searches all files, from the root directory down, on the local file system, which have not been accessed for at least 60 days and have not been modified for at least 60 days, and prints the list. This is useful for finding those files that people claim they need but, in reality, never use.

find /home -size +500k -print

This searches all files from /home down and lists them if they are greater than 500 KB in size. A handy way of finding large files in the system.
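
Before attaching -exec /bin/rm to a search, it is prudent to run the same search with only -print and read the list. The sketch below does that in a scratch directory, backdating one file with touch so that -mtime has something to match. The file names are assumptions for illustration.

```shell
dir=$(mktemp -d)
mkdir "$dir/proj"
echo "old dump"   > "$dir/proj/core"
echo "fresh dump" > "$dir/core"

# Backdate one file to 1 January 2000 so it is well over seven days old.
touch -t 200001010000 "$dir/proj/core"

# Only the backdated file matches both -name and -mtime +7.
stale=$(find "$dir" -name core -mtime +7 -print)
echo "stale: $stale"

# Wildcards are quoted so that find, not the shell, expands them.
echo "int main(void) { return 0; }" > "$dir/proj/main.c"
sources=$(find "$dir" -name '*.c' -print)
echo "sources: $sources"
```

Once the printed list looks right, the -exec clause can be added with some confidence.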

The GNU version of find, which comes with the findutils package, offers many additional features you will find useful. You can download the latest version from

Repairing File Systems with fsck

Sooner or later, it happens: Someone turns off the power switch. The power outage lasts longer than your UPS's batteries and you didn't shut down the system. Someone presses the reset button. Someone overwrites part of your disk. A critical sector on the disk develops a flaw. If you run UNIX long enough, eventually a halt occurs where the system did not write (sync) the remaining cached information to the disks.

When this happens, you need to verify the integrity of each of the file systems. This is necessary because if the structure is not correct, using the file systems could quickly damage them beyond repair. Over the years, UNIX has developed a very sophisticated file system integrity check that can usually recover the problem. It's called fsck.

The fsck Utility

The fsck utility takes its understanding of the internals of the various UNIX file systems and attempts to verify that all the links and blocks are correctly tied together. It runs in five passes, each of which checks a different part of the linkage and each of which builds on the verifications and corrections of the prior passes.

fsck walks the file system, starting with the superblock. It then deals with the allocated disk blocks, pathnames, directory connectivity, link reference counts, and the free list of blocks and inodes.


NOTE: The xfs file system now shipped with all IRIX-based machines no longer needs the fsck command.


The Superblock Every change to the file system affects the superblock, which is why it is cached in RAM. Periodically, at the sync interval, it is written to disk. If it is corrupted, fsck checks and corrects it. If it is so badly corrupted that fsck cannot do its work, find the paper you saved when you built the file system and use the -b option with fsck to give it an alternate superblock to use. The superblock is the head of each of the lists that make up the file system, and it maintains counts of free blocks and inodes.

Inodes

fsck validates each of the inodes. It makes sure that no block in an inode's block allocation list also appears in the block allocation list of any other inode, that the size is correct, and that the link count is correct. If the inodes are correct, then the data is accessible. All that's left is to verify the pathnames.

What Is a Clean (Stable) File System?

Sometimes fsck responds

/opt: stable (ufs file systems)

This means that the superblock is marked clean and that no changes have been made to the file system since it was marked clean. First, the system marks the superblock as dirty; then it starts modifying the rest of the file system. When the buffer cache is empty and all pending writes are complete, it goes back and marks the superblock as clean. If the superblock is marked clean, there is normally no reason to run fsck, so unless fsck is told to ignore the clean flag, it just prints this notice and skips over this file system.

Where Is fsck?

When you run fsck, you are running an executable in either the /usr/sbin or /bin directory called fsck, but this is not the real fsck. It is just a dispatcher that invokes a file system type-specific fsck utility.

When Should I Run fsck?

Normally, you do not have to run fsck. The system runs it automatically when it tries to mount a dirty file system at boot time. However, problems can creep up on you. Software and hardware glitches do occur from time to time. It wouldn't hurt to run fsck just after performing the monthly backups.


CAUTION: It is better to run fsck after the backups rather than before. If fsck finds major problems, it could leave the file system in worse shape than it was prior to running. With a fresh backup in hand, you can just build an empty file system and reread your backup, which also cleans up the file system. If you did it in the other order, you would be left with no backup and no file system.


How Do I Run fsck?

Because the system normally runs it for you, running fsck is not something you will do every day. However, it is quite simple and mostly automatic.

First, to run fsck, the file system you intend to check must not be mounted. This is a bit hard to do if you are in multiuser mode most of the time, so to run a full system fsck you should bring the system down to single user mode.

In single user mode you need to invoke fsck, giving it the options to force a check of all file systems, even if they are already stable.

fsck -f (SunOS)

fsck -o f (Solaris)

fsck (Linux and IRIX)
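As a rough sketch, the three invocations above can be selected from the output of uname. The function name force_fsck_cmd is hypothetical, and the mapping assumes that uname -s reports SunOS for both SunOS 4.x and Solaris (whose release numbers begin with 5) and IRIX or IRIX64 on IRIX machines. The function only prints the command; it does not run fsck.

```sh
#!/bin/sh
# Print the forced-check fsck command for a given OS name and release.
# force_fsck_cmd is a hypothetical helper; the flags are the ones
# listed in the text above.
force_fsck_cmd() {
    os=$1
    release=$2
    case "$os" in
        SunOS)
            case "$release" in
                5.*) echo "fsck -o f" ;;   # Solaris reports release 5.x
                *)   echo "fsck -f" ;;     # SunOS 4.x
            esac
            ;;
        Linux|IRIX|IRIX64)
            echo "fsck"
            ;;
        *)
            echo "unknown system: $os" >&2
            return 1
            ;;
    esac
}

# Show what would be run on this machine (nothing is executed).
force_fsck_cmd "$(uname -s)" "$(uname -r)" || true
```

Printing the command rather than running it keeps the sketch safe to try in multiuser mode.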

If you wish to check a single specific file system, type its character device name. (If you aren't sure what the device name is, see the section on adding a disk to the system for details on how to determine this information.) For example:

fsck /dev/hda1

Stepping Through an Actual fsck

fsck occurs in five to seven steps, depending on your operating system and what errors are found, if any.

The fsck we are about to step through was done on a ufs file system. While there are some differences between the numbering of the phases for different file systems, the errors are mostly the same, requiring the same solutions. Apply common sense liberally to any invocation of fsck and you should be okay.

Checking ufs File Systems

For ufs file systems, fsck is a five-phase process. fsck can automatically correct most of these errors and does so if invoked at boot time to automatically check a dirty file system. However, when you run fsck manually, you are asked to answer the questions that the system would automatically answer.


CAUTION: Serious errors reported by the ufs fsck at the very beginning, especially before reporting the start of phase 1, indicate an invalid superblock. fsck should be terminated and restarted with the -b option specifying one of the alternate superblocks. Block 32 is always an alternate and can be tried first, but if the front of the file system was overwritten, it too may be damaged. Use the hard copy you saved from the mkfs to find an alternate later in the file system.


Phase 1: Check Blocks and Sizes

This phase checks the inode list, looking for invalid inode entries. Errors requiring answers include the following.

An inode has invalid file type bits. Your options are to leave the problem and attempt to recover the data by hand later, or to erase the entry and its data by clearing the inode.

An inode appears to point to less data than its file size indicates. This is safely salvaged, because it indicates a crash while truncating the file to shorten it.

block BAD I=inode number

block DUP I=inode number

The disk block pointed to by the inode is either out of range for this inode or already in use by another file. This is an informational message. If a duplicate block is found, phase 1b is run to report the inode number of the file that originally used this block.

Phase 2: Check Pathnames

This phase removes directory entries that refer to bad inodes found in phases 1 and 1b, and checks for directories with inode pointers that are out of range or pointing to bad inodes. You may have to handle the following errors.

Inode 2, the root directory, is no longer marked as a directory. You can convert it back into a directory, but this usually means there is major damage to the inode table.

I=OUT OF RANGE I=inode number NAME=file name (REMOVE?)




A bad inode number was found, an unallocated inode was used in a directory, or an inode that had a bad or duplicate block number in it is referenced. You are given the choice to remove the file, losing the data, or to leave the error. If you leave the error, the file system is still damaged, but you have the chance to try to dump the file first and salvage part of the data before rerunning fsck to remove the entry.

fsck may return one of a variety of errors indicating an invalid directory length. You will be given the chance to have fsck fix or remove the directory as appropriate. These errors are all correctable with little chance of subsequent damage.

Phase 3: Check Connectivity

This phase detects errors in unreferenced directories. It creates or expands the lost+found directory if needed and connects the misplaced directory entries into the lost+found directory. fsck prints status messages for all directories placed in lost+found.

Phase 4: Check Reference Counts

This phase uses the information from phases 2 and 3 to check for unreferenced files and incorrect link counts on files, directories, or special files.

For an unreferenced file, the filename is not known, so fsck reconnects it into the lost+found directory with the inode number as its name. If you clear the file, its contents are lost. Unreferenced files that are empty are cleared automatically.

For a file or a directory with a wrong link count, fsck found a different number of references than what was listed in the inode. You should let fsck adjust the count.

For a file or directory with a bad or duplicate block in it, clearing it now loses the data. You can leave the error and attempt to recover the data, and rerun fsck later to clear the file.

Phase 5: Check Cylinder Groups

This phase checks the free block and unused inode maps. It automatically corrects the free lists if necessary, although in manual mode it asks permission first.

What Do I Do After fsck Finishes?

First, relax, because fsck rarely finds anything seriously wrong, except in cases of hardware failure where the disk drive is failing or where you copied something on top of the file system. UNIX file systems are very robust.

However, if fsck finds major problems or makes a large number of corrections, rerun it to be sure the disk isn't undergoing hardware failure. It shouldn't find more errors in a second run. Then, recover any files that it may have deleted. If you keep a log of the inodes it clears, you can go to a backup tape and dump the list of inodes on the tape. Recover just those inodes to restore the files.

Back up the system again, because there is no reason to have to do this all over again.

Dealing with What Is in lost+found

If fsck reconnects unreferenced entries, it places them in the lost+found directory. They are safe there, and the system should be backed up in case you lose them while trying to move them back to where they belong. Items in lost+found can be of any type: files, directories, special files (devices), and so on. If it is a named pipe or socket, you may as well delete it. The process which opened it is long since gone and will open a new one when it is run again.

For files, use the owner name to contact the user and ask them to look at the contents and see if the file is worth keeping. Often, it is a file that was deleted and is no longer needed, but the system crashed before it could be fully removed.

For directories, the files in the directory should help you and the owner determine where they belong. You can look on the backup tape lists for a directory with those contents if necessary. Then just remake the directory and move the files back. Then remove the directory entry in lost+found. This re-creation and move has the added benefit of cleaning up the directory.

Creating File Systems

Now that you understand the nuances of maintaining a file system, it's time to understand how they are created. This section walks you through the three steps of:

Picking the right kind of disk for your system

Creating partitions

Creating the file system

Disk Types

Although there are many different kinds of disks, UNIX systems have come to standardize on SCSI for workstations. Many PCs also sport SCSI interfaces, but because IDE drives are cheaper and more abundant, you'll find a lot of them on UNIX PCs as well.

SCSI itself comes in a few different flavors now. There is regular SCSI, SCSI-2, SCSI-Wide, SCSI-Fast and Wide, and now SCSI-3. Although it is possible to mix and match these devices with converter cables, you may find it easier on both your sanity and your performance if you stick to one format. As of this writing, SCSI-2 is the most common interface.

When attaching your SCSI drive, there are many important points to remember.

Terminate your SCSI chain. Forgetting to do this causes all sorts of non-deterministic behavior (a pain to track down). SCSI-2 requires active termination, which is usually indicated by terminators with LEDs on them.

If a device claims to be self-terminating, you can take your chances, but you'll be less likely to encounter an error if you put a terminator on anyway.

There is a limit of eight devices on a SCSI chain with the SCSI card counting as a device. Some systems may have internal SCSI devices, so be sure to check for those.

Be sure all your devices have unique SCSI IDs. A common symptom of having two devices with the same ID is their tendency to frequently reset the SCSI chain. Of course, many devices simply won't work under those conditions.

When adding or removing a SCSI disk, be sure to power the system down first. There is power running through the SCSI cables, and failing to shut them down first may lead to problems in the future.

Although SCSI is king of the workstation, PCs have another choice: IDE. IDE devices tend to be cheaper and more widely available than SCSI devices, and many motherboards offer direct IDE support. They are also simpler and require less configuration on your part.

The downside to IDE is that its simplicity comes at the cost of configurability and expandability. The IDE chain can only hold two devices, and not all motherboards come with more than one IDE chain. If your CD-ROM is IDE, you only have space for one disk. This is probably okay for a single-person workstation, but as you can imagine, it's not going to fly well in a server environment. Another consideration is speed. SCSI was designed with the ability to perform I/O without the aid of the main CPU, which is one of the reasons it costs more. IDE, on the other hand, was designed with cost in mind. This resulted in a simplified controller; hence, the CPU takes on the burden of working the drive.

While IDE did manage to simplify the PC arena, it did come with the limitation of being unable to handle disks greater than 540M. Various tricks were devised to circumvent this; however, a clean solution is now widely available. Known as EIDE (Enhanced IDE), it is capable of supporting disks up to 8G and can support up to 4 devices on one chain.

In weighing the pros and cons of EIDE versus SCSI in the PC environment, don't forget to think about the cost-to-benefit ratio. Having a high speed SCSI controller in a single person's workstation may not be as necessary as the user is convinced it is. Plus, with disks being released in 2+ gigabyte configurations, there is ample room on the typical IDE disk.

Once you have decided on the disk subsystem to install, read the documentation that came with the machine for instructions on physically attaching the disk to the system.

What Are Partitions and Why Do I Need Them?

Partitions are UNIX's way of dividing the disk into usable pieces. UNIX requires that there be at least one partition; however, you'll find that creating multiple partitions, each with a specific function, is often necessary.

The most visible reason for creating separate partitions is to protect the system from the users. The one required partition mentioned earlier is called the root partition. It is here that critical system software and configuration files (the kernel and mount tables) must reside. This partition must be carefully watched so that it never fills up. If it fills up, your system may not be able to come back up in the event of a system crash. Because the root partition is not meant to hold the users' data, you must create separate partitions for the users' home directories, temporary files, and so forth. This enables their files to grow without the worry of crowding out the key system files.

Dual boot configurations are becoming another common reason to partition, especially with the ever-growing popularity of Linux. You may find your users wanting to be able to boot to either Windows or Linux; therefore, you need to keep at least two partitions to enable them to do this.

The last, but certainly not least, reason to partition your disks is the issue of backups. Backup software often works by dumping entire partitions onto tape. By keeping the different types of data on separate partitions, you can be explicit about what gets backed up and what doesn't. For example, daily backup of the system software isn't necessary, but backups of home directories are. With the two on separate partitions, you can schedule each one independently.

Another example relates more to company politics. It may be possible that one group does not want their data to be backed up to the same tape as another group's. (Note: common sense doesn't always apply to inter-group politics...) By keeping the two groups on separate partitions, you can exclude one from your normal backups and exclude the others during your special backups.

Which Partitions To Create

As I mentioned earlier, the purpose of creating partitions is to separate the users from the system areas. So how many different partitions need to be created? While there is no right answer for every installation, here are some guidelines to take into account.

You always need a root partition. In this partition, you'll have your /bin, /etc, and /sbin directories at the very least. Depending on your version of UNIX, this could require anywhere from 30 to 100 megabytes.

/tmp

The /tmp directory is where your users, as well as programs, store temporary files. The usage of this directory can quickly get out of hand, especially if you run a quota-based site. By keeping it a separate partition, you do not need to worry about its abuse interfering with the rest of the system. Many operating systems automatically clear the contents of /tmp on boot. Size /tmp to fit your site's needs. If you use quotas, you will want to make it a little larger, whereas sites without quotas may not need as much space.

Under Solaris, you have another option when setting up /tmp. Using the tmpfs filesystem, you can have your swap space and /tmp partition share the same physical location on disk. While it appears to be an interesting idea, you'll quickly find that it isn't a very good solution, especially on a busy system. This is because as more users do their work, more of /tmp will be used. Of course, if there are more users, there is a greater memory requirement to hold them all. The competition for free space can become very problematic.

/var

The /var directory is where the system places its spool files (print spool, incoming/outgoing mail queue, and so on) as well as system log files. Because of this, these files constantly grow and shrink with no warning, especially the mail spool. Another possibility to keep in mind is the creation of a separate partition just for mail. This enables you to export the mail spool to all of your machines without having to worry about your print spools being exported as well. If you use a backup package that requires its own spool space, you may wish to keep this a separate partition as well.

/home

The /home directory is where you place your users' account directories. You may need to use multiple partitions to keep your home directories (possibly broken up by department) and have each partition mount to /home/dept where dept is the name of the respective department.

/usr

The /usr directory holds noncritical system software, such as editors and lesser used utilities. Many sites hold locally compiled software in the /usr/local directory where they either export it to other machines, or mount other machines' /usr/local to their own. This makes it easy for a site to maintain one /usr/local directory and share it amongst all of its machines. Keeping this a separate partition is a good idea since local software inevitably grows.

swap

This isn't a partition you actually keep files on, but it is key to your system's performance. You should allocate a dedicated swap partition and swap to it instead of using swap files on your normal file system. This enables you to contain all of your swap space in one area that is out of your way. A good guideline for determining how much swap space to use is to double the amount of RAM installed on your system.
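The swap guideline is simple enough to express as a one-line calculation. This is only a sketch of the rule of thumb stated above; swap_size_mb is a hypothetical helper name, and the 32 MB figure is just a sample input.

```sh
#!/bin/sh
# Suggested swap size in MB: twice the installed RAM, per the
# guideline in the text. swap_size_mb is a hypothetical helper name.
swap_size_mb() {
    ram_mb=$1
    echo $(( ram_mb * 2 ))
}

swap_size_mb 32    # a machine with 32 MB of RAM; prints 64
```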


TIP: Several new versions of UNIX are now placing locally compiled software in the /opt directory. Like /usr/local, this should be made a separate partition as well. If your system does not use /opt by default, you should make a symbolic link from there to /usr/local. The reverse is true as well: if your system uses /opt, you should create a symbolic link from /usr/local to /opt.

To add to the confusion, the Red Hat distribution of Linux has introduced the practice of installing precompiled software (RPMs) in the /usr/bin directory. If you are using Red Hat, you may want to make your /usr directory larger since locally installed packages will consume that partition.


The Device Entry

Most implementations of UNIX automatically create the correct device entry when you boot it with the new drive attached. Once this entry has been created, you should check it for permissions. Only root should be given read/write access to it. If your backups run as a nonroot user, you may need to give group read access to the backup group. Be sure that no one else is in the backup group. Allowing world read/write access to the disk is the easiest way to have your system hacked, destroyed, or both.

Device entries under Linux

IDE disks under Linux are named with the prefix /dev/hd followed by a drive letter and, for a partition, a partition number.

Each IDE drive is lettered starting from a. So the primary disk on the first chain is a; the slave on the first chain is b; the primary on the second chain is c; and so on. Each disk's partitions are referenced by number. For example, the third partition of the slave drive on the first chain is /dev/hdb3.

SCSI disks use the same scheme except instead of using /dev/hd as the prefix, /dev/sd is used. So to refer to the second partition of the first disk on the SCSI chain, you would use /dev/sda2.

To refer to the entire disk, specify all the information except the partition. For example, to refer to the entire primary disk on the first IDE chain, you would use /dev/hda.
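The lettering rule can be sketched as a small lookup. The helper name ide_device and its argument order are assumptions made for illustration; it only builds the name and never opens the device.

```sh
#!/bin/sh
# Build a Linux IDE device name from the chain (1 or 2), the drive's
# position on that chain (primary or slave), and an optional partition
# number. ide_device is a hypothetical helper; it touches no disk.
ide_device() {
    chain=$1
    position=$2
    partition=$3
    case "$chain:$position" in
        1:primary) letter=a ;;
        1:slave)   letter=b ;;
        2:primary) letter=c ;;
        2:slave)   letter=d ;;
        *) echo "unsupported drive: $chain $position" >&2; return 1 ;;
    esac
    echo "/dev/hd${letter}${partition}"
}

ide_device 1 slave 3     # third partition of the slave on the first chain
ide_device 1 primary     # the entire primary disk on the first chain
```

The two sample calls print /dev/hdb3 and /dev/hda, matching the examples in the text.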

Device entries under IRIX

SCSI disks under IRIX are referenced in either the /dev/dsk or /dev/rdsk directories. The following is the format:


where C is the controller number, S is the SCSI address, and P is the partition, s0,s1,s2, and so on. The partition name can also be vh for the volume header or vol to refer to the entire disk.

Device entries under Solaris

The SCSI disks under Solaris are referenced in either the /dev/dsk or /dev/rdsk directories. The following is the format:


where C is the controller number, S is the SCSI address, and P is the partition number. Partition 2 always refers to the entire disk and label information. Partition 1 is typically used for swap.

Device entries under SunOS

Disks under SunOS are referenced in the /dev directory. The following is the format:


where T is the target number and P is the partition. Typically, the root partition is a, the swap partition is b, and the entire disk is referred to as partition c. You can have partitions from a through f.

An important aspect to note is an oddity with the SCSI target and unit numbering: Devices that are target three need to be called target zero, and devices that are target zero need to be called target three.

A Note About Formatting Disks

"Back in the old days," disks needed to be formatted and checked for bad blocks. The procedure of formatting entailed writing the head, track, and sector numbers in a sector preamble and a checksum in the postamble to every sector on the disk. At the same time, any sectors that were unusable due to flaws in the disk surface were marked and, depending on the type of disk, an alternate sector mapped into its place.

Thankfully, we have moved on.

Both SCSI and IDE disks now come pre-formatted from the factory. Even better, they transparently handle bad blocks on the disk and remap them without any assistance from the operating system.


CAUTION: You should NEVER attempt to low level format an IDE disk.

Doing so will make your day very bad as you watch the drive quietly kill itself. Be prepared to throw the disk away should you feel the need to low level format it.


Partitioning Disks and Creating File Systems

In this section, we cover the step-by-step procedure for partitioning disks under Linux, IRIX, SunOS, and Solaris. Since the principles are similar across all platforms, each platform's example also demonstrates a different way of deciding how a disk should be partitioned, depending on its intended usage.

Linux

To demonstrate how partitions are created under Linux, we will set up a disk with a single-user workstation in mind. It will need space not only for system software, but also for application software and the user's home directory.

Creating Partitions

For this example, we'll create the partitions on a 1.6 GB IDE disk located on /dev/hda. This disk will become the boot device for a single-user workstation. We will create the root, /usr, /var, /tmp, /home, and swap partitions.

During the actual partitioning, we don't name the partitions. Where the partitions are mounted is specified with the /etc/fstab file. Should we choose to mount them in different locations later on, we could very well do that. However, by keeping the function of each partition in mind, we have a better idea of how to size them.

A key thing to remember with the Linux fdisk command is that it does not commit any changes made to the partition table to disk until you explicitly do so with the w command.

With the drive installed, we begin by running the fdisk command:

# fdisk /dev/hda

This brings us to the fdisk command prompt. We start by using the p command to print what partitions are currently on the disk.

Command (m for help): p

Disk /dev/hda: 64 heads, 63 sectors, 786 cylinders

Units = cylinders of 4032 * 512 bytes

Device Boot Begin Start End Blocks Id System

Command (m for help):
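The geometry and units lines in this listing are enough to work out the disk's capacity. The following sketch simply multiplies the reported values; the variable names are illustrative only.

```sh
#!/bin/sh
# Geometry reported by fdisk for /dev/hda in the listing above.
heads=64
sectors_per_track=63
cylinders=786
bytes_per_sector=512

sectors_per_cylinder=$(( heads * sectors_per_track ))
bytes_per_cylinder=$(( sectors_per_cylinder * bytes_per_sector ))
total_bytes=$(( bytes_per_cylinder * cylinders ))

echo "sectors per cylinder: $sectors_per_cylinder"
echo "bytes per cylinder:   $bytes_per_cylinder"
echo "total bytes:          $total_bytes"
```

The first figure is the 4032 shown on the Units line, and the total comes to 1622605824 bytes, which is the 1.6 GB quoted for this disk.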

We see that there are no partitions on the disk. With 1.6 GB of space, we can be very liberal with allocating space to each partition. Keeping this policy in mind, we begin creating our partitions with the n command:

Command (m for help): n

Command action

e extended

p primary partition (1-4)

p

Partition number (1-4): 1

First cylinder (1-786): 1

Last cylinder or +size or +sizeM or +sizeK ([1]-786): +50M

Command (m for help):
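fdisk allocates whole cylinders, so a +sizeM request is turned into a cylinder count. The sketch below assumes that 1M means 1024 KB and that the request is rounded up to the next whole cylinder; both are inferences that happen to agree with the numbers in this session rather than statements taken from the fdisk documentation. cylinders_for_mb is a hypothetical helper name.

```sh
#!/bin/sh
# Cylinders needed to hold a +sizeM request, rounding up to a whole
# cylinder. Assumes 1M = 1024 KB and the 4032-sector (2016 KB)
# cylinders reported by fdisk above. cylinders_for_mb is hypothetical.
cylinders_for_mb() {
    size_kb=$(( $1 * 1024 ))
    cylinder_kb=$(( 4032 * 512 / 1024 ))
    echo $(( (size_kb + cylinder_kb - 1) / cylinder_kb ))
}

cylinders_for_mb 50    # the +50M request made above; prints 26
```

Under these assumptions a 50 MB request needs 26 cylinders, so a partition starting at cylinder 1 ends at cylinder 26.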

The 50 MB partition we just created becomes our root partition. Because it is the first partition, it is referred to as /dev/hda1. Using the p command, we see our new partition:

Command (m for help): p

Disk /dev/hda: 64 heads, 63 sectors, 786 cylinders

Units = cylinders of 4032 * 512 bytes

Device Boot Begin Start End Blocks Id System

/dev/hda1 1 1 26 52384+ 83 Linux native

Command (m for help):

With the root partition out of the way, we will create the swap partition. Our sample machine has 32 MB of RAM and will be running X-Windows along with a host of development tools. It is unlikely that the machine will get a memory upgrade for a while, so we'll allocate 64 MB to swap.

Command (m for help): n

Command action

e extended

p primary partition (1-4)

p

Partition number (1-4): 2

First cylinder (27-786): 27

Last cylinder or +size or +sizeM or +sizeK ([27]-786): +64M

Command (m for help):

Because this partition is going to be tagged as swap, we need to change its file system type to swap using the t command.

Command (m for help): t

Partition number (1-4): 2

Hex code (type L to list codes): 82

Changed system type of partition 2 to 82 (Linux swap)

Command (m for help):

Because of the nature of the user, we know that there will be a lot of local software installed on this machine. With that in mind, we'll create /usr with 500 MB of space.

Command (m for help): n

Command action

e extended

p primary partition (1-4)

p

Partition number (1-4): 3

First cylinder (60-786): 60

Last cylinder or +size or +sizeM or +sizeK ([60]-786): +500M

If you've been keeping your eyes open, you've noticed that we can only have one more primary partition to use, but we want to have /home, /var, and /tmp to be in separate partitions. How do we do this?

Extended partitions.

The remainder of the disk is created as an extended partition. Within this partition, we can create more partitions for use. Let's create this extended partition:

Command (m for help): n

Command action

e extended

p primary partition (1-4)

e

Partition number (1-4): 4

First cylinder (314-786): 314

Last cylinder or +size or +sizeM or +sizeK ([314]-786): 786

Command (m for help):

We can now create /home inside the extended partition. Our user is going to need a lot of space, so we'll create a 500 MB partition. Notice that we are no longer asked whether we want a primary or extended partition.

Command (m for help): n

First cylinder (314-786): 314

Last cylinder or +size or +sizeM or +sizeK ([314]-786): +500M

Command (m for help):

Using the same pattern, we create a 250 MB /tmp and a 180 MB /var partition.

Command (m for help): n

First cylinder (568-786): 568

Last cylinder or +size or +sizeM or +sizeK ([568]-786): +250M

Command (m for help): n

First cylinder (695-786): 695

Last cylinder or +size or +sizeM or +sizeK ([695]-786): 786

Command (m for help):

Notice that for the last partition, we did not specify a size, but instead specified the last cylinder. This ensures that all of the disk is used.

Using the p command, we look at our final work:

Command (m for help): p

Disk /dev/hda: 64 heads, 63 sectors, 786 cylinders

Units = cylinders of 4032 * 512 bytes

Device Boot Begin Start End Blocks Id System

/dev/hda1 1 1 26 52384+ 83 Linux native

/dev/hda2 27 27 59 66528 82 Linux swap

/dev/hda3 60 60 313 512064 83 Linux native

/dev/hda4 314 314 786 953568 5 Extended

/dev/hda5 314 314 567 512032+ 83 Linux native

/dev/hda6 568 568 694 256000+ 83 Linux native

/dev/hda7 695 695 786 185440+ 83 Linux native

Command (m for help):
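The Blocks column can be reproduced from the cylinder ranges, which is a handy sanity check on the table. The sketch assumes 1 KB blocks and 512-byte sectors. The 63-sector (one track) offset applied to some partitions is inferred from the numbers in this listing, not taken from fdisk documentation, and partition_blocks is a hypothetical helper name.

```sh
#!/bin/sh
# Reproduce the Blocks column of the fdisk listing from cylinder ranges.
# Assumes 4032 sectors per cylinder, 512-byte sectors, and 1 KB blocks.
# The optional third argument is the number of sectors skipped at the
# start of the partition; 63 (one track) is inferred from the listing.
partition_blocks() {
    start=$1
    end=$2
    skipped=${3:-0}
    sectors=$(( (end - start + 1) * 4032 - skipped ))
    blocks=$(( sectors / 2 ))
    if [ $(( sectors % 2 )) -eq 1 ]; then
        echo "${blocks}+"      # odd sector count: half a block left over
    else
        echo "$blocks"
    fi
}

partition_blocks 1 26 63      # /dev/hda1
partition_blocks 27 59        # /dev/hda2
partition_blocks 695 786 63   # /dev/hda7
```

The three calls print 52384+, 66528, and 185440+, the same values fdisk shows for those partitions.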

Everything looks good. To commit this configuration to disk, we use the w command:

Command (m for help): w

The partition table has been altered!

Calling ioctl() to re-read partition table.

(Reboot to ensure the partition table has been updated.)

Syncing disks.

Reboot the machine to ensure that the partition table has been updated and you're done creating the partitions.

Creating File Systems in Linux

Creating a partition alone isn't very useful. In order to make it useful, we need to make a file system on top of it. Under Linux, this is done using the mke2fs command and the mkswap command.

To create the file system on the root partition, we use the following command:

mke2fs /dev/hda1

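The program takes only a few seconds to run and generates output similar to the following. (The block count in this sample, 512032, matches the 500 MB /dev/hda5 partition rather than the 50 MB root partition.)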

mke2fs 0.5b, 14-Feb-95 for EXT2 FS 0.5a, 95/03/19

128016 inodes, 512032 blocks

25601 blocks (5.00%) reserved for the super user

First data block=1

Block size=1024 (log=0)

Fragment size=1024 (log=0)

63 block groups

8192 blocks per group, 8192 fragments per group

2032 inodes per group

Superblock backups stored on blocks:








Writing inode tables: done

Writing superblocks and file system accounting information: done

You should make a note of these superblock backups and keep them in a safe place. Should the day arise that you need to use fsck to fix a superblock gone bad, you will want to know where the backups are.
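If the printed list is ever lost, candidate backup locations can be estimated from the other figures in the mke2fs output. The sketch below assumes that this version of mke2fs stores one backup at the start of every block group after the first; that is an assumption, newer versions may keep fewer copies, and the list mke2fs actually printed is always the authority. superblock_backups is a hypothetical helper name.

```sh
#!/bin/sh
# List candidate backup superblock locations for an ext2 file system,
# assuming one backup at the start of every block group after the
# first (an assumption about this old mke2fs; newer versions may keep
# fewer). superblock_backups is a hypothetical helper name.
superblock_backups() {
    first_data_block=$1
    blocks_per_group=$2
    groups=$3
    group=1
    while [ "$group" -lt "$groups" ]; do
        echo $(( first_data_block + group * blocks_per_group ))
        group=$(( group + 1 ))
    done
}

# Values from the sample output: first data block 1, 8192 blocks per
# group, 63 block groups. Show only the first few candidates.
superblock_backups 1 8192 63 | head -n 3
```

With the sample values, the first three candidates are blocks 8193, 16385, and 24577.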

Simply repeat the mke2fs command for each of the other partitions, except for the swap partition.
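For the layout created earlier, the set of mke2fs runs can be listed with a short loop. This is a dry-run sketch that only prints the commands. The partition numbers come from the fdisk table above, and skipping /dev/hda4 reflects the fact that the extended partition is only a container for the logical partitions inside it. list_mke2fs_commands is a hypothetical helper name.

```sh
#!/bin/sh
# Print (not run) the mke2fs command for each data partition created
# earlier. /dev/hda2 is swap and /dev/hda4 is the extended partition
# container, so neither gets a file system of its own.
list_mke2fs_commands() {
    for n in 1 3 5 6 7; do
        echo "mke2fs /dev/hda$n"
    done
}

list_mke2fs_commands
```

Reviewing the printed commands before typing them is a cheap way to avoid running mke2fs on the wrong device.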

To create the swap file system, you need to use the mkswap command like this:

mkswap /dev/hda2

Replace /dev/hda2 with the partition you chose to make your swap space.

The result of the command will be similar to:

Setting up swapspace, size = 35090432 bytes

And the swap space is ready.

To make the root file system bootable, you need to install the lilo boot manager. This is part of all the standard Linux distributions, so you shouldn't need to hunt for it on the Internet.

Simply modify the /etc/lilo.conf file so that /dev/hda1 is set to be the boot disk, and then run the lilo command.


The resulting output should look something like:

Added linux *

where linux is the name of the kernel to boot, as specified by the label= field in /etc/lilo.conf.

SunOS

In this example, we will be preparing a Seagate ST32550N as an auxiliary disk to an existing system. The disk will be divided into three partitions: one for use as a mail spool, one for use as a /usr/local, and the third as an additional swap partition.

Creating the partitions


CAUTION: The procedure for formatting disks is not the same for SunOS and Solaris. Read each section to note the differences.


Once a disk has been attached to the machine, you should verify its connection and SCSI address by running the probe-scsi command from the PROM monitor if the disk is attached to the internal chain, or the probe-scsi-all command to see all the SCSI devices on the system. When you are sure the drive is properly attached and verified to be functioning, you're ready to start accessing the drive from the OS.

After the machine has booted, run the dmesg command to collect the system diagnostic messages. You may want to pipe the output to grep so that you can easily find the information on disks. For example:

dmesg | grep sd

On our system this generated the following output:


sd1 at esp0 target 1 lun 0

sd1: corrupt label - wrong magic number

sd1: Vendor 'SEAGATE', product 'ST32550N', 4194058 512 byte blocks

root on sd0a fstype 4.2

swap on sd0b fstype spec size 32724K

dump on sd0b fstype spec size 32712K

This result tells us that we have an installed disk on sd0 that the system is aware of and using. The information from the sd1 device is telling us that it found a disk, but it isn't usable because of a corrupt label. Don't worry about the error. Until we partition the disk and create file systems on it, the system doesn't know what to do with it, hence the error.

If you are using SCSI address 0 or 3, remember the oddity we mentioned earlier where device 0 needs to be referenced as 3 and device 3 needs to be referenced as 0.

Even though we do not have to actually format the disk, we do need to use the format program that comes with SunOS because it also creates the partitions and writes the label to the disk.

To invoke the format program, simply run:

format sd1

where sd1 is the name of the disk we are going to partition.

The format program displays the following menu:


disk - select a disk

type - select (define) a disk type

partition - select (define) a partition table

current - describe the current disk

format - format and analyze the disk

repair - repair a defective sector

show - translate a disk address

label - write label to the disk

analyze - surface analysis

defect - defect list management

backup - search for backup labels



We need to enter type at the format> prompt so that we can tell SunOS the kind of disk we have. The resulting menu looks something like:


0. Quantum ProDrive 80S

1. Quantum ProDrive 105S

2. CDC Wren IV 94171-344

3. SUN0104


13. other

Specify disk type (enter its number):

Because we are adding a disk this machine has not seen before, we need to select option 13, other. This begins a series of prompts requesting the disk's geometry. Be sure to have this information from the manufacturer before starting this procedure.

The first question, Enter number of data cylinders:, is actually a three-part question. After you enter the number of data cylinders, the program asks for the number of alternate cylinders and then the number of physical cylinders. The number of physical cylinders is the number your manufacturer provided. Subtract two from it to get the number of data cylinders, and use the default value of 2 for the number of alternate cylinders. For our Seagate disk, we answered the questions as follows:

Enter number of data cylinders: 3508

Enter number of alternate cylinders [2]: 2

Enter number of physical cylinders [3510]: 3510

Enter number of heads: 11

Enter number of data sectors/track: 108

Enter rpm of drive [3600]:

Enter disk type name (remember quotes): "SEAGATE ST32550N"

selecting sd1:

[disk formatted, no defect list found]

No defined partition tables.

Note that even though our sample drive actually rotates at 7200 rpm, we stick with the default of 3600 rpm because the software will not accept entering a higher speed. Thankfully, this doesn't matter because the operating system doesn't use the information.

Even though format reported that the disk was formatted, it really wasn't. It only acquired information needed to later write the label.

Now we are ready to begin preparations to partition the disk.

These preparations entail computing the amount each cylinder holds and then approximating the number of cylinders we want in each partition.

With our sample disk, we know that each cylinder is composed of 108 sectors on a track, with 11 tracks composing the cylinder.

From the information we saw in dmesg, we know that each block is 512 bytes long. Hence, if we want our mail partition to be 1 GB in size, we perform the following math to compute the necessary blocks:

1 gigabyte = 1048576 kilobytes

One cylinder = 108 sectors * 11 heads = 1188 blocks

1188 blocks = 594 kilobytes

1048576 / 594 = 1765 cylinders

1765 * 1188 = 2096820 blocks

Obviously, there are some rounding errors since the exact one GB mark occurs in the middle of a cylinder and we need to keep each partition on a cylinder boundary. 1,765 cylinders is more than close enough. The 1,765 cylinders translates to 2,096,820 blocks.

The new swap partition we want to make needs to be 64 MB in size. Using the same math as before, we find that our swap needs to be 130,680 blocks long. The last partition on the disk needs to fill the remainder of the disk. Knowing that we have a 2 GB disk, a 1 GB mail spool, and a 64 MB swap partition, this should leave us with about 960 MB for /usr/local.
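
The arithmetic for the mail spool and swap partitions can be reproduced with a few lines of Python. This is only a sketch: the function name and the decision to round down to a whole cylinder are our own choices, while the geometry (108 sectors per track, 11 heads, 512-byte blocks) comes from the example disk.

```python
SECTORS_PER_TRACK = 108   # from the geometry entered in format
HEADS = 11                # tracks per cylinder
BLOCK_BYTES = 512         # block size reported by dmesg

def blocks_for_size(kilobytes):
    """Return (cylinders, blocks) for the largest whole-cylinder
    partition that does not exceed the requested size."""
    blocks_per_cyl = SECTORS_PER_TRACK * HEADS           # 1188
    kb_per_cyl = blocks_per_cyl * BLOCK_BYTES // 1024    # 594
    cylinders = kilobytes // kb_per_cyl                  # round down
    return cylinders, cylinders * blocks_per_cyl

print(blocks_for_size(1024 * 1024))   # 1 GB mail spool
print(blocks_for_size(64 * 1024))     # 64 MB swap
```

The two results are the cylinder and block counts used for the mail spool and swap partitions in this example.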

Armed with this information, we are ready to tackle the partitioning. From the format> prompt, type partition to start the partitioning menu. The resulting screen looks something like this:

format> partition


a - change 'a' partition

b - change 'b' partition

c - change 'c' partition

d - change 'd' partition

e - change 'e' partition

f - change 'f' partition

g - change 'g' partition

h - change 'h' partition

select - select a predefined table

name - name the current table

print - display the current table

label - write partition map and label to the disk



To create our mail partition, we begin by changing partition a. At the partition> prompt, type a.

partition> a

This brings up a prompt for entering the starting cylinder and the number of blocks to allocate. Because this is going to be the first partition on the disk, we start at cylinder 0. Based on the math we did earlier, we know that we need 2,096,820 blocks.

partition a - starting cyl 0, # blocks 0 (0/0/0)

Enter new starting cyl [0]: 0

Enter new # blocks [0, 0/0/0]: 2096820


Now we want to create the b partition, which is traditionally used for swap space. We know how many blocks to use based on our calculations, but we don't know which cylinder to start from.

To solve this, we simply display the current partition information for the entire disk using the p command:

partition> p

Current partition table (unnamed):

partition a - starting cyl 0, # blocks 2096820 (1765/0/0)

partition b - starting cyl 0, # blocks 0 (0/0/0)

partition c - starting cyl 0, # blocks 0 (0/0/0)

partition d - starting cyl 0, # blocks 0 (0/0/0)

partition e - starting cyl 0, # blocks 0 (0/0/0)

partition f - starting cyl 0, # blocks 0 (0/0/0)

partition g - starting cyl 0, # blocks 0 (0/0/0)

partition h - starting cyl 0, # blocks 0 (0/0/0)


We can see that partition a is allocated with 2,096,820 blocks and is 1,765 cylinders long. Because we don't want to waste space on the disk, we start the swap partition on cylinder 1765.

(Remember to count from zero!)

partition> b

partition b - starting cyl 0, # blocks 0 (0/0/0)

Enter new starting cyl [0]: 1765

Enter new # blocks [0, 0/0/0]: 130680


Before we create our last partition, we need to take care of some tradition first, namely partition c. This is usually the partition that spans the entire disk. Before creating this partition, we need to do a little math.

108 sectors x 11 heads x 3508 data cylinders = 4167504 blocks

Notice that the number of blocks we compute here does not match the number actually on the disk. This number was computed based on the information we entered when giving the disk type information.

It is important that we remain consistent.
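
As a quick check, the size of partition c can be recomputed from the geometry we gave format and compared with the capacity dmesg reported. The snippet below is a sketch using only the numbers from this example; the variable names are ours.

```python
DATA_CYLINDERS = 3508
HEADS = 11
SECTORS_PER_TRACK = 108
REPORTED_BLOCKS = 4194058   # capacity shown in the dmesg output

# Size of the whole-disk partition according to the label geometry.
label_blocks = SECTORS_PER_TRACK * HEADS * DATA_CYLINDERS
# Blocks the drive reports that the label geometry does not cover.
uncovered = REPORTED_BLOCKS - label_blocks

print(label_blocks)
print(uncovered)
```

The first number is the one to enter for partition c; the second shows how far the label geometry falls short of the reported capacity.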

Since the c partition spans the entire disk, we specify the starting cylinder as 0. Creating this partition should look something like this:

partition> c

partition c - starting cyl 0, # blocks 0 (0/0/0)

Enter new starting cyl [0]: 0

Enter new # blocks [0, 0/0/0]: 4167504


We have only one partition left to create: /usr/local. Because we want to fill the remainder of the disk, we need to do one last bit of math to compute how many blocks are still free.

This is done by taking the size of partition c (the total disk) and subtracting the sizes of the existing partitions. For our example, this works out to be:

4167504 - 2096820 - 130680 = 1940004 remaining blocks

Now we need to find out which cylinder to start from.

To do so, we run the p command again:

partition> p

Current partition table (unnamed):

partition a - starting cyl 0, # blocks 2096820 (1765/0/0)

partition b - starting cyl 1765, # blocks 130680 (110/0/0)

partition c - starting cyl 0, # blocks 4167504 (3508/0/0)

partition d - starting cyl 0, # blocks 0 (0/0/0)

partition e - starting cyl 0, # blocks 0 (0/0/0)

partition f - starting cyl 0, # blocks 0 (0/0/0)

partition g - starting cyl 0, # blocks 0 (0/0/0)

partition h - starting cyl 0, # blocks 0 (0/0/0)


To figure out which cylinder to start from, we add the number of cylinders used so far. Remember not to add the cylinders from partition c since it encompasses the entire disk.

1765 + 110 = 1875

Now that we know which cylinder to start from and how many blocks to make it, we create our last partition.

partition> d

partition d - starting cyl 0, # blocks 0 (0/0/0)

Enter new starting cyl [0]: 1875

Enter new # blocks [0, 0/0/0]: 1940004
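
The bookkeeping for starting cylinders and the remaining blocks can be sketched in Python as follows. The layout function is hypothetical and assumes, as in this example, that partitions are placed back to back from cylinder 0 and that the last partition takes whatever is left.

```python
BLOCKS_PER_CYL = 108 * 11               # sectors/track * heads
TOTAL_BLOCKS = BLOCKS_PER_CYL * 3508    # partition c, the whole disk

def layout(sizes_in_blocks, total_blocks=TOTAL_BLOCKS):
    """Place fixed-size partitions back to back from cylinder 0 and
    give the rest of the disk to one final partition.
    Returns a list of (starting_cylinder, blocks) tuples."""
    table = []
    start_cyl = 0
    for blocks in sizes_in_blocks:
        if blocks % BLOCKS_PER_CYL:
            raise ValueError("partition must end on a cylinder boundary")
        table.append((start_cyl, blocks))
        start_cyl += blocks // BLOCKS_PER_CYL
    remaining = total_blocks - sum(sizes_in_blocks)
    table.append((start_cyl, remaining))
    return table

# Partitions a (mail spool) and b (swap); the last row is partition d.
for start, blocks in layout([2096820, 130680]):
    print(start, blocks)
```

The three rows printed correspond to the starting cylinder and block count entered for partitions a, b, and d.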


Congratulations! You've made it through the ugly part. Before we can truly claim victory, we need to commit these changes to disk using the label command. When given the prompt, Ready to label disk, continue? simply answer y.

partition> label

Ready to label disk, continue? y


To leave the format program, type quit at the partition> prompt, and then quit again at the format> prompt.

Creating File Systems

Now comes the easy part. Simply run the newfs command on all the partitions we created except for the swap partition and the entire disk partition. Your output should look similar to this:

# newfs sd1a

/dev/rsd1a: 2096820 sectors in 1765 cylinders of 11 tracks, 108 sectors

1073.6MB in 111 cyl groups (16 c/g, 9.73MB/g, 4480 i/g)

superblock backups (for fsck -b #) at:

32, 19152, 38272, 57392, 76512, 95632, 114752, 133872, 152992,

172112, 191232, 210352, 229472, 248592, 267712, 286832, 304160, 323280,

342400, 361520, 380640, 399760, 418880, 438000, 457120, 476240, 495360,

514480, 533600, 552720, 571840, 590960, 608288, 627408, 646528, 665648,

684768, 703888, 723008, 742128, 761248, 780368, 799488, 818608, 837728,

856848, 875968, 895088, 912416, 931536, 950656, 969776, 988896, 1008016,

1027136, 1046256, 1065376, 1084496, 1103616, 1122736, 1141856, 1160976, 1180096,

1199216, 1216544, 1235664, 1254784, 1273904, 1293024, 1312144, 1331264, 1350384,

1369504, 1388624, 1407744, 1426864, 1445984, 1465104, 1484224, 1503344, 1520672,

1539792, 1558912, 1578032, 1597152, 1616272, 1635392, 1654512, 1673632, 1692752,

1711872, 1730992, 1750112, 1769232, 1788352, 1807472, 1824800, 1843920, 1863040,

1882160, 1901280, 1920400, 1939520, 1958640, 1977760, 1996880, 2016000, 2035120,

2054240, 2073360, 2092480,

Be sure to note the superblock backups. This is critical information when fsck discovers heavy corruption in your file system. Remember to add your new entries into /etc/fstab if you want them to automatically mount on boot.
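
One way to keep the superblock backups on hand is to save the newfs output to a file and pull the numbers out of it later. The following Python sketch assumes the output has the layout shown above; the function name and the shortened sample text are ours.

```python
import re

def backup_superblocks(newfs_output):
    """Return the backup superblock numbers listed after the
    'superblock backups' line of saved newfs output."""
    _, found, tail = newfs_output.partition("superblock backups")
    if not found:
        return []
    # Skip the rest of the header line, e.g. "(for fsck -b #) at:".
    tail = tail.split("\n", 1)[1] if "\n" in tail else ""
    return [int(n) for n in re.findall(r"\d+", tail)]

# Shortened sample taken from the first line of the list above.
sample = """superblock backups (for fsck -b #) at:
 32, 19152, 38272, 57392, 76512,
"""
print(backup_superblocks(sample))
```

Any of the numbers returned can be handed to fsck with the -b option mentioned in the newfs output.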

If you created the first partition with the intention of making it bootable, you have a few more steps to go. First, mount the new file system to /mnt.

# mount /dev/sd1a /mnt

Once the file system is mounted, you need to clone your existing boot partition using the dump command like this:

# cd /mnt

# dump 0f - / | restore -rf -

With the root partition cloned, use the installboot command to make it bootable:

# /usr/kvm/mdec/installboot /mnt/boot /usr/kvm/mdec/bootsd /dev/rsd1a

Be sure to test your work by rebooting and making sure everything mounts correctly. If you created a bootable partition, be sure you can boot from it now. Don't wait for a disaster to find out whether or not you did it right.

Solaris

For this example, we are partitioning a disk that is destined to be a web server for an intranet. We need a minimal root partition, adequate swap, tmp, var, and usr space, and a really large partition, which we'll call /web. Because the web logs will remain on the /web partition, and there will be little or no user activity on the machine, /var and /tmp will be set to smaller values. /usr will be a little larger because it may be destined to house web development tools.

Creating partitions


TIP: In another wondrous effort on its part to be just a little different, Sun has decided to call partitions "slices." With the number of documents regarding the file system so vast, you'll find that not all of them have been updated to use this new term, so don't be confused by the mix of "slices" with "partitions"--they are both the same.


Once a disk has been attached to the machine, you should verify its connection and SCSI address by running the probe-scsi command from the PROM monitor if the disk is attached to the internal SCSI chain, or probe-scsi-all to list all the SCSI devices on the system. Once this shows that the drive is properly attached and verified to be functioning, you're ready to start accessing the drive from the OS. Boot the machine and log in as root.

To find the device name we are going to use, we again run the dmesg command.

# dmesg | grep sd


sd1 at esp0: target 1 lun 0

sd1 is /sbus@1,f8000000/esp@0,800000/sd@1,0

WARNING: /sbus@1,f8000000/esp@0,800000/sd@1,0 (sd1):

corrupt label - wrong magic number

Vendor 'SEAGATE', product 'ST32550N', 4194058 512 byte blocks


From this message, we see that our new disk is device /dev/[r]dsk/c0t1d0s2. The disk hasn't been set up for use on a Solaris machine before, which is why we received the corrupt label error.

If you recall the layout of Solaris device names, you'll remember that the last digit of the device name is the partition number. Noting that, we see that Solaris refers to the entire disk as partition 2, much the same way SunOS refers to the entire disk as partition c.
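
To make the naming scheme concrete, here is a small Python sketch that splits a device name of the form cXtYdZsN into controller, target, disk, and slice. The function is hypothetical and only handles names under /dev/dsk and /dev/rdsk.

```python
import re

def parse_solaris_device(path):
    """Split a name such as /dev/rdsk/c0t1d0s2 into its parts."""
    m = re.fullmatch(r"/dev/(r?dsk)/c(\d+)t(\d+)d(\d+)s(\d+)", path)
    if m is None:
        raise ValueError("not a cXtYdZsN device name: " + path)
    kind, ctrl, target, disk, slice_no = m.groups()
    return {
        "raw": kind == "rdsk",      # rdsk is the raw (character) device
        "controller": int(ctrl),
        "target": int(target),      # SCSI target, 1 for our new disk
        "disk": int(disk),
        "slice": int(slice_no),     # 2 is the entire disk
    }

info = parse_solaris_device("/dev/rdsk/c0t1d0s2")
print(info["target"], info["slice"])
```

For our new disk this reports target 1 and slice 2, matching the SCSI target shown by dmesg and the whole-disk partition.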

Before we can actually label and partition the disk, we need to create the device files. This is done with the drvconfig and disks commands. They should be invoked with no parameters:

# drvconfig ; disks

Now that the kernel is aware of the disk, we are ready to run the format command to partition the disk.

# format /dev/rdsk/c0t1d0s2

This brings up the format menu as follows:


disk - select a disk

type - select (define) a disk type

partition - select (define) a partition table

current - describe the current disk

format - format and analyze the disk

repair - repair a defective sector

label - write label to the disk

analyze - surface analysis

defect - defect list management

backup - search for backup labels

verify - read and display labels

save - save new disk/partition definitions

inquiry - show vendor, product and revision

volname - set 8-character volume name



To help the format command with partitioning, we need to tell it the disk's geometry by invoking the type command at the format> prompt. We will then be asked to select what kind of disk we have. Because this is the first time this system is seeing this disk, we need to select other. This should look something like this:

format> type


0. Auto configure

1. Quantum ProDrive 80S

2. Quantum ProDrive 105S

3. CDC Wren IV 94171-344

. . .

16. other

Specify disk type (enter its number): 16

The system now prompts for the number of data cylinders. This is two less than the number of cylinders the vendor specifies because Solaris needs two cylinders for bad block mapping.

Enter number of data cylinders: 3508

Enter number of alternate cylinders[2]: 2

Enter number of physical cylinders[3510]: 3510

The next question can be answered from the vendor specs as well.

Enter number of heads: 11

The followup question about drive heads can be left as default.

Enter physical number of heads[default]:

The last question you must answer can be pulled from the vendor specs as well.

Enter number of data sectors/track: 108

The remaining questions should be left as default.

Enter number of physical sectors/track[default]:

Enter rpm of drive[3600]:

Enter format time[default]:

Enter cylinder skew[default]:

Enter track skew[default]:

Enter tracks per zone[default]:

Enter alternate tracks[default]:

Enter alternate sectors[default]:

Enter cache control[default]:

Enter prefetch threshold[default]:

Enter minimum prefetch[default]:

Enter maximum prefetch[default]:

The last question you must answer about the disk is its label information. Enter the vendor name and model number in double quotes for this question. For our sample disk, this would be:

Enter disk type name (remember quotes): "SEAGATE ST32550N"

With this information, Solaris makes creating partitions easy. Dare I say, fun?

After the last question from the type command, you will be placed at the format> prompt. Enter partition to start the partition menu.

format> partition


0 - change '0' partition

1 - change '1' partition

2 - change '2' partition

3 - change '3' partition

4 - change '4' partition

5 - change '5' partition

6 - change '6' partition

7 - change '7' partition

select - select a predefined table

modify - modify a predefined partition table

name - name the current table

print - display the current table

label - write partition map and label to the disk



At the partition> prompt, enter modify to begin creating the new partitions. This brings up a question about what template to use for partitioning. We want the All Free Hog method.

partition> modify

Select partitioning base:

0. Current partition table (unnamed)

1. All Free Hog

Choose base (enter number)[0]? 1

The All Free Hog method enables you to select one partition to receive the remainder of the disk once you have allocated a specific amount of space for the other partitions. For our example, the disk hog would be the /web partition because you want it to be as large as possible.

As soon as you select option 1, you should see the following screen:

Part Tag Flag Cylinders Size Blocks

0 root wm 0 0 (0/0/0)

1 swap wu 0 0 (0/0/0)

2 backup wu 0 - 3507 1.99GB (3508/0/0)

3 unassigned wm 0 0 (0/0/0)

4 unassigned wm 0 0 (0/0/0)

5 unassigned wm 0 0 (0/0/0)

6 usr wm 0 0 (0/0/0)

7 unassigned wm 0 0 (0/0/0)

Do you wish to continue creating a new partition

table based on above table [yes]? yes

Because the partition table appears reasonable, agree to use it as a base for your scheme. You will now be asked which partition should be the Free Hog Partition, the one that receives whatever is left of the disk when everything else has been allocated.

For our scheme, we'll make that partition number 5.

Free Hog Partition[6]? 5

Answering this question starts the list of questions asking how large to make the other partitions. For our web server, we need a root partition to be about 200 MB for the system software, a swap partition to be 64 MB, a /tmp partition to be 200 MB, a /var partition to be 200 MB, and a /usr partition to be 400 MB. Keeping in mind that partition 2 has already been tagged as the "entire disk" and that partition 5 will receive the remainder of the disk, you will be prompted as follows:

Enter size of partition '0' [0b, 0c, 0.00mb]: 200mb

Enter size of partition '1' [0b, 0c, 0.00mb]: 64mb

Enter size of partition '3' [0b, 0c, 0.00mb]: 200mb

Enter size of partition '4' [0b, 0c, 0.00mb]: 200mb

Enter size of partition '6' [0b, 0c, 0.00mb]: 400mb

Enter size of partition '7' [0b, 0c, 0.00mb]: 0

As soon as you finish answering these questions, the final view of all the partitions appears looking something like:

Part Tag Flag Cylinders Size Blocks

0 root wm 0 - 344 200.13mb (345/0/0)

1 swap wu 345 - 455 64.39mb (111/0/0)

2 backup wu 0 - 3507 1.99GB (3508/0/0)

3 unassigned wm 456 - 800 200.13mb (345/0/0)

4 unassigned wm 801 - 1145 200.13mb (345/0/0)

5 unassigned wm 1146 - 2817 969.89mb (1672/0/0)

6 unassigned wm 2818 - 3507 400.25mb (690/0/0)

7 unassigned wm 0 0 (0/0/0)

This is followed by the question:

Okay to make this the correct partition table [yes]? yes

Answer yes since the table appears reasonable. This brings up the question:

Enter table name (remember quotes): "SEAGATE ST32550N"

Answer with a description of the disk you are using for this example. Remember to include the quote symbols when answering. Given all of this information, the system is ready to commit this to disk. As one last check, you will be asked:

Ready to label disk, continue? y

As you might imagine, we answer yes to the question and let it commit the changes to disk. You have now created partitions and can quit the program by entering quit at the partition> prompt and again at the format> prompt.
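
The sizes in the final partition table can be checked by hand. The Python sketch below assumes 1,188 blocks per cylinder, which is the figure implied by the sizes in the table, and assumes that format rounds each request up to a whole cylinder; both are our assumptions rather than documented behavior.

```python
BLOCKS_PER_CYL = 1188      # assumption: implied by the sizes in the table
DATA_CYLINDERS = 3508      # slice 2, the whole disk

def cylinders_for_mb(megabytes):
    """Smallest whole number of cylinders holding at least this many MB."""
    blocks = megabytes * 1024 * 1024 // 512
    return -(-blocks // BLOCKS_PER_CYL)          # ceiling division

def cylinders_to_mb(cylinders):
    return cylinders * BLOCKS_PER_CYL * 512 / (1024 * 1024)

requested = {0: 200, 1: 64, 3: 200, 4: 200, 6: 400}    # slice: MB
allocated = {s: cylinders_for_mb(mb) for s, mb in requested.items()}
hog = DATA_CYLINDERS - sum(allocated.values())          # slice 5

for s, cyls in sorted(allocated.items()):
    print(s, cyls, round(cylinders_to_mb(cyls), 2))
print(5, hog, round(cylinders_to_mb(hog), 2))
```

Each row gives the slice number, its length in cylinders, and its size in megabytes, which can be compared against the table format displayed.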

Creating file systems

To create a file system, simply run:

# newfs /dev/rdsk/c0t1d0s0

where /dev/rdsk/c0t1d0s0 is the partition on which to create the file system. Be sure to create a file system on all the partitions except for partitions 1 and 2, the swap and entire disk, respectively. Be sure to note the backup superblocks that were created. This information is very useful when fsck is attempting to repair a heavily damaged file system.

After you create the file systems, be sure to enter them into the /etc/vfstab file so that they are mounted the next time you reboot.
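
A vfstab entry has seven fields: device to mount, device to fsck, mount point, file system type, fsck pass, mount at boot, and mount options. The Python sketch below builds such a line for the /web partition on slice 5; the helper name, the fsck pass of 2, and the "-" for no mount options are illustrative choices, so check them against the entries already in your /etc/vfstab.

```python
def vfstab_line(disk, slice_no, mount_point, fsck_pass=2):
    """Build one /etc/vfstab entry for a UFS slice."""
    block_dev = "/dev/dsk/%ss%d" % (disk, slice_no)    # device to mount
    raw_dev = "/dev/rdsk/%ss%d" % (disk, slice_no)     # device to fsck
    fields = [block_dev, raw_dev, mount_point, "ufs",
              str(fsck_pass), "yes", "-"]
    return "\t".join(fields)

print(vfstab_line("c0t1d0", 5, "/web"))
```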

If you need to make the root partition bootable, you still have two more steps. The first is to clone the root partition from your existing system to the new root partition using:

# mount /dev/dsk/c0t1d0s0 /mnt

# cd /mnt

# ufsdump 0uf - / | ufsrestore -rf -

Once the root partition is cloned, you can run the installboot program like this:

# /usr/sbin/installboot /usr/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0

Be sure to test your new file systems before you need to rely on them in a disaster situation.

IRIX

For this example, we are creating a large scratch partition for a user who does modeling and simulations. Although IRIX has many GUI-based tools to perform these tasks, it is always a good idea to learn the command line versions just in case you need to do any kind of remote administration.

Creating partitions

Once the drive is attached, run a program called hinv to take a "hardware inventory." On our sample system, we saw the following output:


Integral SCSI controller 1: Version WD33C93B, revision D

Disk drive: unit 6 on SCSI controller 1

Integral SCSI controller 0: Version WD33C93B, revision D

Disk drive: unit 1 on SCSI controller 0


Our new disk is external to the system, so we know it is residing on controller 1. Unit 6 is the only disk on that chain, so we know that it is the disk we just added to the system.

To partition the disk, run the fx command without any parameters. It prompts us for the device name, controller, and drive number. Choose the default device name and enter the appropriate information for the other two questions.

On our sample system, this would look like:

# fx

fx version 6.2, Mar 9, 1996

fx: "device-name" = (dksc)

fx: ctlr# = (0) 1

fx: drive# = (1) 6

fx: lun# = (0)

...opening dksc(1,6,0)

...controller test...OK

Scsi drive type == SEAGATE ST32550N 0022

----- please choose one (? for help, .. to quit this menu)-----

[exi]t [d]ebug/ [l]abel/

[b]adblock/ [exe]rcise/ [r]epartition/


We see that fx found our Seagate and is ready to work with it. From the menu we select r to repartition the disk. fx displays what it knows about the disk and then presents another menu specifically for partitioning the disk.

fx> r

----- partitions-----

part type cyls blocks Megabytes (base+size)

7: xfs 3 + 3521 3570 + 4189990 2 + 2046

8: volhdr 0 + 3 0 + 3570 0 + 2

10: volume 0 + 3524 0 + 4193560 0 + 2048

capacity is 4194058 blocks

----- please choose one (? for help, .. to quit this menu)-----

[ro]otdrive [u]srrootdrive [o]ptiondrive [re]size


Looking at the result, we see that this disk has never been partitioned in IRIX before. Part 7 represents the amount of partitionable space, part 8 the volume header, and part 10 the entire disk.

Because this disk is going to be used as a large scratch partition, we want to select the optiondrive option from the menu. After you select that, you are asked what kind of file system you want to use. IRIX 6 and above defaults to xfs, while IRIX 5 defaults to efs. Use the one appropriate for your version of IRIX.

Our sample system is running IRIX 6.3, so we accept the default of xfs:

fx/repartition> o

fx/repartition/optiondrive: type of data partition = (xfs)

Next we are asked whether we want to create a /usr log partition. Because our primary system already has a /usr partition, we don't need one here. Type no.

fx/repartition/optiondrive: create usr log partition? = (yes) no

The system is ready to partition the drive. Before it does, it gives one last warning allowing you to stop the partitioning before it completes the job. Because you know you are partitioning the correct disk, you can give it "the go-ahead":

Warning: you must reinstall all software and restore user data from backups after changing the partition layout. Changing partitions causes all data on the drive to be lost. Be sure you have the drive backed up if it contains any user data. Continue? y

The system takes a few seconds to create the new partitions on the disk. Once it is done, it reports what the current partition list looks like.

----- partitions-----

part type cyls blocks Megabytes (base+size)

7: xfs 3 + 3521 3570 + 4189990 2 + 2046

8: volhdr 0 + 3 0 + 3570 0 + 2

10: volume 0 + 3524 0 + 4193560 0 + 2048

capacity is 4194058 blocks

----- please choose one (? for help, .. to quit this menu)-----

[ro]otdrive [u]srrootdrive [o]ptiondrive [re]size


Looks good. We can exit fx now by typing .. at the fx/repartition> prompt and exit at the fx> prompt.

Our one large scratch partition is now called /dev/dsk/dks1d6s7.
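
The device name is built from the controller, unit, and partition numbers we saw in hinv and fx. The following Python sketch shows the pattern; the function name is ours.

```python
def irix_disk_device(controller, unit, partition, raw=False):
    """Build a dks device name such as /dev/dsk/dks1d6s7."""
    directory = "/dev/rdsk" if raw else "/dev/dsk"
    return "%s/dks%dd%ds%d" % (directory, controller, unit, partition)

# Controller 1, unit 6, partition 7 from the fx session above.
print(irix_disk_device(1, 6, 7))
print(irix_disk_device(1, 6, 7, raw=True))
```

The raw variant under /dev/rdsk is the form that mkfs is given when the file system is created.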

Creating the file system

To create the file system, we use the mkfs command like this:

# mkfs /dev/rdsk/dks1d6s7

This generates the following output:

meta-data=/dev/dsk/dks1d6s7 isize=256 agcount=8, agsize=65469 blks

data = bsize=4096 blocks=523748, imaxpct=25

log =internal log bsize=4096 blocks=1000

realtime =none bsize=65536 blocks=0, rtextents=0

Remember to add an entry for this file system to the /etc/fstab file so that the system automatically mounts it the next time you reboot.
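
As a sanity check, the size mkfs reports can be compared with the size of partition 7 from the fx listing. This Python sketch uses only the numbers shown above; the variable names are ours.

```python
BSIZE = 4096                  # bsize from the data line of the mkfs output
DATA_BLOCKS = 523748          # blocks from the data line
PARTITION_SECTORS = 4189990   # size of partition 7 in 512-byte blocks

# Convert the file system size to 512-byte sectors.
fs_sectors = DATA_BLOCKS * BSIZE // 512
print(fs_sectors)
# Sectors at the end of the partition too few to form another 4 KB block.
print(PARTITION_SECTORS - fs_sectors)
```

The file system covers all but a handful of sectors of the partition, which is what we expect when the partition size is not an exact multiple of the 4,096-byte block size.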


As you've seen in this chapter, creating, maintaining, and repairing file systems is not a trivial task. It is, however, a task that should be well understood. An unmaintained file system can quickly lead to trouble, and without its stability, the remainder of the system is useless.

Let's make a quick rundown of the topics we covered:

Disks are broken into partitions (sometimes called slices).

Each partition has a file system.

A file system is the primary means of file storage in UNIX.

File systems are made of inodes and superblocks.

Some partitions are used for raw data such as swap.

The /proc file system really isn't a file system, but an abstraction to kernel data.

An inode maintains critical file information.

Superblocks track disk information as well as the location of the heads of various inode lists.

In order for you to use a file system, it must be mounted.

A file system can be unmounted only when no one is accessing it.

File systems can be mounted anywhere in the directory tree.

/etc/fstab (vfstab in Solaris) is used by the system to automatically mount file systems on boot.

The root file system should be kept away from users.

The root file system should never get filled.

Be sure to watch how much space is being used.

fsck is the tool to use to repair file systems.

Don't forget to terminate your SCSI chain!

In short, file systems administration is not a trivial task and should not be taken lightly. Good maintenance techniques not only help maintain your uptime, but your sanity as well.

©Copyright, Macmillan Computer Publishing. All rights reserved.
