The UNIX Programming Environment

Brian W. Kernighan
Rob Pike

Bell Laboratories
Murray Hill, New Jersey

PRENTICE-HALL, INC.
Englewood Cliffs, New Jersey 07632

* UNIX is a Trademark of Bell Laboratories

Library of Congress Catalog Card Number 83-62851

Prentice-Hall Software Series
Brian W. Kernighan, Advisor

Editorial/production supervision: Ros Herion
Cover design: Photo Plus Art, Celine Brandes
Manufacturing buyer: Gordon Osbourne

Copyright © 1984 by Bell Telephone Laboratories, Incorporated.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America. Published simultaneously in Canada.

This book was typeset in Times Roman and Courier by the authors, using a Mergenthaler Linotron 202 phototypesetter driven by a VAX-11/750 running the 8th Edition of the UNIX operating system.

UNIX is a trademark of Bell Laboratories. DEC, PDP and VAX are trademarks of Digital Equipment Corporation.

20 19 18 17 16 15 14

ISBN 0-13-937699-2
ISBN 0-13-937681-X (PBK)

PRENTICE-HALL INTERNATIONAL, INC., London
PRENTICE-HALL OF AUSTRALIA PTY. LIMITED, Sydney
EDITORA PRENTICE-HALL DO BRASIL, LTDA., Rio de Janeiro
PRENTICE-HALL CANADA INC., Toronto
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
PRENTICE-HALL OF SOUTHEAST ASIA PTE. LTD., Singapore
WHITEHALL BOOKS LIMITED, Wellington, New Zealand

CONTENTS

Preface
1. UNIX for Beginners
   1.1 Getting started
   1.2 Day-to-day use: files and common commands
   1.3 More about files: directories
   1.4 The shell
   1.5 The rest of the UNIX system
2. The File System
   2.1 The basics of files
   2.2 What's in a file?
   2.3 Directories and filenames
   2.4 Permissions
   2.5 Inodes
   2.6 The directory hierarchy
   2.7 Devices
3. Using the Shell
   3.1 Command line structure
   3.2 Metacharacters
   3.3 Creating new commands
   3.4 Command arguments and parameters
   3.5 Program output as arguments
   3.6 Shell variables
   3.7 More on I/O redirection
   3.8 Looping in shell programs
   3.9 bundle: putting it all together
   3.10 Why a programmable shell?
4. Filters
   4.1 The grep family
   4.2 Other filters
   [...]

[...]

    upon their backs to bite them,
    and so on ad infinitum.
    $

This says that line 2 in the first file (poem) has to be changed into line 2 of the second file (new_poem), and similarly for line 4.

Generally speaking, cmp is used when you want to be sure that two files really have the same contents. It's fast and it works on any kind of file, not just text. diff is used when the files are expected to be somewhat different, and you want to know exactly which lines differ. diff works only on files of text.

A summary of file system commands

Table 1.1 is a brief summary of the commands we've seen so far that deal with files.

1.3 More about files: directories

The system distinguishes your file called junk from anyone else's of the same name. The distinction is made by grouping files into directories, rather in the way that books are placed on shelves in a library, so files in different directories can have the same name without any conflict.

Generally each user has a personal or home directory, sometimes called login directory, that contains only the files that belong to him or her. When you log in, you are "in" your home directory. You may change the directory you are working in (often called your working or current directory), but your home directory is always the same. Unless you take special action, when you create a new file it is made in your current directory. Since this is initially your home directory, the file is unrelated to a file of the same name that might exist in someone else's directory.
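To make the point about names concrete, here is a small sketch, assuming a POSIX sh. It builds a scratch directory that stands in for /usr, gives two illustrative users, you and mary, a file called junk each, and then uses cmp -s (which reports only through its exit status) to confirm that these are two distinct files with different contents. The scratch directory comes from mktemp -d, which is widely available but is an assumption of this sketch, not something the text relies on; same_name_demo is a hypothetical helper.

```shell
#!/bin/sh
# Files in different directories may share a name without conflict.
# A scratch directory stands in for /usr; "you" and "mary" are
# illustrative home directories, not real accounts.
same_name_demo() {
    top=$(mktemp -d) || return 1
    mkdir "$top/you" "$top/mary"
    echo "your junk" >"$top/you/junk"
    echo "mary's junk" >"$top/mary/junk"
    # Two distinct files, both called junk.
    cat "$top/you/junk" "$top/mary/junk"
    # cmp -s prints nothing; its exit status says whether the files match.
    if cmp -s "$top/you/junk" "$top/mary/junk"; then
        echo "same contents"
    else
        echo "different contents"
    fi
    rm -r "$top"
}

same_name_demo
```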
A directory can contain other directories as well as ordinary files ("Great directories have lesser directories ..."). The natural way to picture this organization is as a tree of directories and files. It is possible to move around within this tree, and to find any file in the system by starting at the root of the tree and moving along the proper branches. Conversely, you can start where you are and move toward the root.

Table 1.1: Common File System Commands

    ls                       list names of all files in current directory
    ls filenames             list only the named files
    ls -t                    list in time order, most recent first
    ls -l                    list long: more information; also ls -lt
    ls -u                    list by time last used; also ls -lu, ls -lut
    ls -r                    list in reverse order; also -rt, -rlt, etc.
    ed filename              edit named file
    cp file1 file2           copy file1 to file2, overwrite old file2 if it exists
    mv file1 file2           move file1 to file2, overwrite old file2 if it exists
    rm filenames             remove named files, irrevocably
    cat filenames            print contents of named files
    pr filenames             print contents with header, 66 lines per page
    pr -n filenames          print in n columns
    pr -m filenames          print named files side by side (multiple columns)
    wc filenames             count lines, words and characters for each file
    wc -l filenames          count lines for each file
    grep pattern filenames   print lines matching pattern
    grep -v pattern files    print lines not matching pattern
    sort filenames           sort files alphabetically by line
    tail filename            print last 10 lines of file
    tail -n filename         print last n lines of file
    tail +n filename         start printing file at line n
    cmp file1 file2          print location of first difference
    diff file1 file2         print all differences between files

Let's try the latter first. Our basic tool is the command pwd ("print working directory"), which prints the name of the directory you are currently in:

    $ pwd
    /usr/you
    $

This says that you are currently in the directory you, in the directory usr, which in turn is in the root directory, which is conventionally called just "/". The / characters separate the components of the name; the limit of 14 characters mentioned above applies to each component of such a name. On many systems, /usr is a directory that contains the directories of all the normal users of the system. (Even if your home directory is not /usr/you, pwd will print something analogous, so you should be able to follow what happens below.)

If you now type

    $ ls /usr/you

you should get exactly the same list of file names as you get from a plain ls. When no arguments are provided, ls lists the contents of the current directory; given the name of a directory, it lists the contents of that directory.

Next, try

    $ ls /usr

This should print a long series of names, among which is your own login directory you.

The next step is to try listing the root itself. You should get a response similar to this:

    $ ls /
    bin
    boot
    dev
    etc
    lib
    tmp
    unix
    usr
    $

(Don't be confused by the two meanings of /: it's both the name of the root and a separator in filenames.) Most of these are directories, but unix is actually a file containing the executable form of the UNIX kernel. More on this in Chapter 2.

Now try

    $ cat /usr/you/junk

(if junk is still in your directory).
The name /usr/you/junk is called the pathname of the file. "Pathname" has an intuitive meaning: it represents the full name of the path from the root through the tree of directories to a particular file. It is a universal rule in the UNIX system that wherever you can use an ordinary filename, you can use a pathname.

The file system is structured like a genealogical tree; here is a picture that may make it clearer.

[Figure: the file system tree. The root / holds bin, dev, etc, usr, tmp, unix and boot; /usr holds login directories such as you, mike, paul and mary, several of which contain files named junk or temp.]

Your file named junk is unrelated to Paul's or to Mary's.

Pathnames aren't too exciting if all the files of interest are in your own directory, but if you work with someone else or on several projects concurrently, they become handy indeed. For example, your friends can print your junk by saying

    $ cat /usr/you/junk

Similarly, you can find out what files Mary has by saying

    $ ls /usr/mary
    data
    junk
    $

or make your own copy of one of her files by

    $ cp /usr/mary/data data

or edit her file:

    $ ed /usr/mary/data

If Mary doesn't want you poking around in her files, or vice versa, privacy can be arranged. Each file and directory has read-write-execute permissions for the owner, a group, and everyone else, which can be used to control access. (Recall ls -l.) In our local systems, most users most of the time find openness of more benefit than privacy, but policy may be different on your system, so we'll get back to this in Chapter 2.

As a final set of experiments with pathnames, try

    $ ls /bin /usr/bin

Do some of the names look familiar? When you run a command by typing its name after the prompt, the system looks for a file of that name. It normally looks first in your current directory (where it probably doesn't find it), then in /bin, and finally in /usr/bin.
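The search order just described can be modeled in a few lines of shell. This is only a sketch under stated assumptions: modern shells take the list of directories from the PATH variable (for example :/bin:/usr/bin, where an empty entry means the current directory) and do the lookup internally. The function lookup below is hypothetical; it simply walks such a colon-separated list and reports the first executable file with the requested name, and path_demo shows that a copy in an earlier directory wins.

```shell
#!/bin/sh
# Simplified model of command lookup (not the shell's real code):
# try each directory of a colon-separated list, in order.
lookup() {
    name=$1
    found=
    saved_ifs=$IFS
    IFS=:
    set -f                        # split on ':' without expanding patterns
    for dir in $2; do
        [ -n "$dir" ] || dir=.    # an empty entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            found=$dir/$name
            break                 # the first match wins
        fi
    done
    set +f
    IFS=$saved_ifs
    [ -n "$found" ] && echo "$found"
}

path_demo() {
    top=$(mktemp -d) || return 1
    mkdir "$top/bin" "$top/usrbin"
    # An executable called hello, at first only in the second directory.
    printf '#!/bin/sh\necho hello\n' >"$top/usrbin/hello"
    chmod +x "$top/usrbin/hello"
    first=$(lookup hello "$top/bin:$top/usrbin")
    # Put a copy in the first directory too; now that one is found.
    cp "$top/usrbin/hello" "$top/bin/hello"
    chmod +x "$top/bin/hello"
    second=$(lookup hello "$top/bin:$top/usrbin")
    echo "${first#"$top"/} ${second#"$top"/}"
    rm -r "$top"
}

path_demo    # prints: usrbin/hello bin/hello
```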
There is nothing special about commands like cat or ls, except that they have been collected into a couple of directories to be easy to find and administer. To verify this, try to execute some of these programs by using their full pathnames:

    $ /bin/date
    Mon Sep 26 23:29:32 EDT 1983
    $ /bin/who
    srm     tty1    Sep 26 22:20
    cvw     tty4    Sep 26 22:40
    you     tty5    Sep 26 23:04
    $

Exercise 1-3. Try

    $ ls /usr/games

and do whatever comes naturally. Things might be more fun outside of normal working hours.

Changing directory: cd

If you work regularly with Mary on information in her directory, you can say "I want to work on Mary's files instead of my own." This is done by changing your current directory with the cd command:

    $ cd /usr/mary

Now when you use a filename (without /'s) as an argument to cat or pr, it refers to the file in Mary's directory. Changing directories doesn't affect any permissions associated with a file: if you couldn't access a file from your own directory, changing to another directory won't alter that fact.

It is usually convenient to arrange your own files so that all the files related to one thing are in a directory separate from other projects. For example, if you want to write a book, you might want to keep all the text in a directory called book. The command mkdir makes a new directory.

    $ mkdir book             Make a directory
    $ cd book                Go to it
    $ pwd                    Make sure you're in the right place
    /usr/you/book
    ...                      Write the book (several minutes pass)
    $ cd ..                  Move up one level in file system
    $ pwd
    /usr/you
    $

'..' refers to the parent of whatever directory you are currently in, the directory one level closer to the root. '.' is a synonym for the current directory.

    $ cd                     Return to home directory

all by itself will take you back to your home directory, the directory where you log in.

Once your book is published, you can clean up the files.
To remove the directory book, remove all the files in it (we'll show a fast way shortly), then cd to the parent directory of book and type

    $ rmdir book

rmdir will only remove an empty directory.

1.4 The shell

When the system prints the prompt $ and you type commands that get executed, it's not the kernel that is talking to you, but a go-between called the command interpreter or shell. The shell is just an ordinary program like date or who, although it can do some remarkable things. The fact that the shell sits between you and the facilities of the kernel has real benefits, some of which we'll talk about here. There are three main ones:

Filename shorthands: you can pick up a whole set of filenames as arguments to a program by specifying a pattern for the names; the shell will find the filenames that match your pattern.

Input-output redirection: you can arrange for the output of any program to go into a file instead of onto the terminal, and for the input to come from a file instead of the terminal. Input and output can even be connected to other programs.

Personalizing the environment: you can define your own commands and shorthands.

Filename shorthand

Let's begin with filename patterns. Suppose you're typing a large document like a book. Logically this divides into many small pieces, like chapters and perhaps sections. Physically it should be divided too, because it is cumbersome to edit large files. Thus you should type the document as a number of files. You might have separate files for each chapter, called ch1, ch2, etc. Or, if each chapter were broken into sections, you might create files called

    ch1.1
    ch1.2
    ch1.3
    ch2.1
    ch2.2

which is the organization we used for this book. With a systematic naming convention, you can tell at a glance where a particular file fits into the whole.

What if you want to print the whole book? You could say

    $ pr ch1.1 ch1.2 ch1.3 ...
but you would soon get bored typing filenames and start to make mistakes. This is where filename shorthand comes in. If you say

    $ pr ch*

the shell takes the * to mean "any string of characters," so ch* is a pattern that matches all filenames in the current directory that begin with ch. The shell creates the list, in alphabetical† order, and passes the list to pr. The pr command never sees the *; the pattern match that the shell does in the current directory generates a list of strings that are passed to pr.

† Again, the order is not strictly alphabetical, in that upper case letters come before lower case letters. See ascii(7) for the ordering of the characters used in the sort.

The crucial point is that filename shorthand is not a property of the pr command, but a service of the shell. Thus you can use it to generate a sequence of filenames for any command. For example, to count the words in the first chapter:

    $ wc ch1.*
        113     562    3200 ch1.0
        935    4081   22435 ch1.1
        974    4191   22756 ch1.2
        378    1561    8481 ch1.3
       1293    5298   28841 ch1.4
         33     194    1190 ch1.5
         75     323    2030 ch1.6
       3801   16210   88933 total
    $

There is a program called echo that is especially valuable for experimenting with the meaning of the shorthand characters. As you might guess, echo does nothing more than echo its arguments:

    $ echo hello world
    hello world
    $

But the arguments can be generated by pattern-matching:

    $ echo ch1.*

lists the names of all the files in Chapter 1,

    $ echo *

lists all the filenames in the current directory in alphabetical order,

    $ pr *

prints all your files (in alphabetical order), and

    $ rm *

removes all files in your current directory. (You had better be very sure that's what you wanted to say!)

The * is not limited to the last position in a filename; *'s can be anywhere and can occur several times. Thus

    $ rm *.save

removes all files that end with .save.

Notice that the filenames are sorted alphabetically, which is not the same as numerically.
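One way to watch the shell do this expansion is a tiny function that reports how many arguments it actually received. In the sketch below (assuming a POSIX sh), count_args and glob_demo are hypothetical helpers, and the scratch files are created with touch in a directory from mktemp -d. The unquoted pattern reaches count_args as two filenames, while the quoted one arrives as the single string ch1.*, in the same way that pr never sees the * in the text.

```shell
#!/bin/sh
# Report the number of arguments received, then the arguments themselves.
count_args() {
    printf '%s: %s\n' "$#" "$*"
}

glob_demo() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        touch ch1.1 ch1.2 ch2.1 notes
        count_args ch1.*        # the shell expands the pattern first
        count_args 'ch1.*'      # quoted: the pattern is passed as is
    )
    rm -r "$dir"
}

glob_demo
# prints:
# 2: ch1.1 ch1.2
# 1: ch1.*
```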
If your book has ten chapters, the order might not be what you intended, since ch10 comes before ch2:

    $ echo *
    ch1.1 ch1.2 ... ch10.1 ch10.2 ... ch2.1 ch2.2 ...
    $

The * is not the only pattern-matching feature provided by the shell, although it's by far the most frequently used. The pattern [...] matches any of the characters inside the brackets. A range of consecutive letters or digits can be abbreviated:

    $ pr ch[12346789]*       Print chapters 1,2,3,4,6,7,8,9 but not 5
    $ pr ch[1-46-9]*         Same thing
    $ rm temp[a-z]           Remove any of tempa, ..., tempz that exist

The ? pattern matches any single character:

    $ ls ?                   List files with single-character names
    $ ls -l ch?.1            List ch1.1 ch2.1 ch3.1, etc. but not ch10.1
    $ rm temp?               Remove files temp1, ..., tempa, etc.

Note that the patterns match only existing filenames. In particular, you cannot make up new filenames by using patterns. For example, if you want to expand ch to chapter in each filename, you cannot do it this way:

    $ mv ch.* chapter.*      Doesn't work!

because chapter.* matches no existing filenames.

Pattern characters like * can be used in pathnames as well as simple filenames; the match is done for each component of the path that contains a special character. Thus /usr/mary/* performs the match in /usr/mary, and /usr/*/calendar generates a list of pathnames of all user calendar files.

If you should ever have to turn off the special meaning of *, ?, etc., enclose the entire argument in single quotes, as in

    $ ls '?'

You can also precede a special character with a backslash:

    $ ls \?

(Remember that because ? is not the erase or line kill character, this backslash is interpreted by the shell, not by the kernel.) Quoting is treated at length in Chapter 3.

Exercise 1-4. What are the differences among these commands?
    $ ls junk           $ echo junk
    $ ls /              $ echo /
    $ ls                $ echo
    $ ls *              $ echo *
    $ ls '*'            $ echo '*'

Input-output redirection

Most of the commands we have seen so far produce output on the terminal; some, like the editor, also take their input from the terminal. It is nearly universal that the terminal can be replaced by a file for either or both of input and output. As one example,

    $ ls

makes a list of filenames on your terminal. But if you say

    $ ls >filelist

that same list of filenames will be placed in the file filelist instead. The symbol > means "put the output in the following file, rather than on the terminal." The file will be created if it doesn't already exist, or the previous contents overwritten if it does. Nothing is produced on your terminal. As another example, you can combine several files into one by capturing the output of cat in a file:

    $ cat f1 f2 f3 >temp

The symbol >> operates much as > does, except that it means "add to the end of." That is,

    $ cat f1 f2 f3 >>temp

copies the contents of f1, f2 and f3 onto the end of whatever is already in temp, instead of overwriting the existing contents. As with >, if temp doesn't exist, it will be created initially empty for you.

In a similar way, the symbol [...] it becomes possible to combine commands to achieve effects not possible otherwise. For example, to print an alphabetical list of users,

    $ who >temp
    $ sort temp
    [...]
    $ wc -l temp
    [...]
    $ wc -l temp
    [...]
    $ pr -3 temp
    [...]
    $ grep mary [...]

[...] ls.out causes ls.out to be included in the list of names.

Exercise 1-6. Explain the output from

    $ wc temp >temp

If you misspell a command name, as in

    $ woh >temp

what happens?

Pipes

All of the examples at the end of the previous section rely on the same trick: putting the output of one program into the input of another via a temporary file. But the temporary file has no other purpose; indeed, it's clumsy to have to use such a file. This observation leads to one of the fundamental contributions of the UNIX system, the idea of a pipe.
A pipe is a way to connect the output of one program to the input of another program without any temporary file; a pipeline is a connection of two or more programs through pipes.

Let us revise some of the earlier examples to use pipes instead of temporaries. The vertical bar character | tells the shell to set up a pipeline:

    $ who | sort             Print sorted list of users
    $ who | wc -l            Count users
    $ ls | wc -l             Count files
    $ ls | pr -3             3-column list of filenames
    $ who | grep mary        Look for particular user

Any program that reads from the terminal can read from a pipe instead; any program that writes on the terminal can write to a pipe. This is where the convention of reading the standard input when no files are named pays off: any program that adheres to the convention can be used in pipelines. grep, pr, sort and wc are all used that way in the pipelines above.

You can have as many programs in a pipeline as you wish:

    $ ls | pr -3 | lpr

creates a 3-column list of filenames on the line printer, and

    $ who | grep mary | wc -l

counts how many times Mary is logged in.

The programs in a pipeline actually run at the same time, not one after another. This means that the programs in a pipeline can be interactive; the kernel looks after whatever scheduling and synchronization is needed to make it all work.

As you probably suspect by now, the shell arranges things when you ask for a pipe; the individual programs are oblivious to the redirection. Of course, programs have to operate sensibly if they are to be combined this way. Most commands follow a common design, so they will fit properly into pipelines at any position. Normally a command invocation looks like

    command optional-arguments optional-filenames

If no filenames are given, the command reads its standard input, which is by default the terminal (handy for experimenting) but which can be redirected to come from a file or a pipe.
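The difference between the temporary-file style of the previous section and a pipeline can be checked directly. In this sketch, assuming a POSIX sh, fake_who is a hypothetical stand-in for who that prints three fixed lines so the counts are predictable; count_with_temp routes the output through a file made by mktemp, while count_with_pipe and count_mary use pipes, the latter mirroring who | grep mary | wc -l. Some versions of wc pad the count with blanks, so the results are passed through shell arithmetic before printing.

```shell
#!/bin/sh
# fake_who: a hypothetical stand-in for who with predictable output.
fake_who() {
    printf '%s\n' 'you tty5' 'ken tty0' 'mary tty3'
}

# Two steps joined by a temporary file.
count_with_temp() {
    tmp=$(mktemp) || return 1
    fake_who >"$tmp"
    wc -l <"$tmp"
    rm -f "$tmp"
}

# One pipeline, no temporary file.
count_with_pipe() {
    fake_who | wc -l
}

# Three programs in one pipeline: how many times is mary logged in?
count_mary() {
    fake_who | grep mary | wc -l
}

# $(( )) discards any blanks wc puts in front of the number.
echo "temp file: $(( $(count_with_temp) ))"
echo "pipe:      $(( $(count_with_pipe) ))"
echo "mary:      $(( $(count_mary) ))"
```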
At the same time, on the output side, most commands write their output on the standard output, which is by default sent to the terminal. But it too can be redirected to a file or a pipe.

Error messages from commands have to be handled differently, however, or they might disappear into a file or down a pipe. So each command has a standard error output as well, which is normally directed to your terminal. Or, as a picture:

[Figure: standard input or files flow into a command, which also takes options; the command writes to standard output and to standard error.]

Almost all of the commands we have talked about so far fit this model; the only exceptions are commands like date and who that read no input, and a few like cmp and diff that have a fixed number of file inputs. (But look at the '-' option on these.)

Exercise 1-7. Explain the difference between

    $ who | sort

and

    $ who >sort

Processes

The shell does quite a few things besides setting up pipes. Let us turn briefly to the basics of running more than one program at a time, since we have already seen a bit of that with pipes. For example, you can run two programs with one command line by separating the commands with a semicolon; the shell recognizes the semicolon and breaks the line into two commands:

    $ date; who
    Tue Sep 27 01:03:17 EDT 1983
    ken     tty0    Sep 27 00:43
    dmr     tty1    Sep 26 23:45
    rob     tty2    Sep 26 23:59
    bwk     tty3    Sep 27 00:06
    ...     tty4    Sep 26 23:31
    you     tty5    Sep 26 ...
    ber     tty7    Sep 26 ...
    $

Both commands are executed (in sequence) before the shell returns with a prompt character.

You can also have more than one program running simultaneously if you wish. For example, suppose you want to do something time-consuming like counting the words in your book, but you don't want to wait for wc to finish before you start something else.
Then you can say

    $ wc ch* >wc.out &
    6944                     Process-id printed by the shell
    $

The ampersand & at the end of a command line says to the shell "start this command running, then take further commands from the terminal immediately," that is, don't wait for it to complete. Thus the command will begin, but you can do something else while it's running. Directing the output into the file wc.out keeps it from interfering with whatever you're doing at the same time.

An instance of a running program is called a process. The number printed by the shell for a command initiated with & is called the process-id; you can use it in other commands to refer to a specific running program.

It's important to distinguish between programs and processes. wc is a program; each time you run the program wc, that creates a new process. If several instances of the same program are running at the same time, each is a separate process with a different process-id.

If a pipeline is initiated with &, as in

    $ pr ch* | lpr &
    6951                     Process-id of lpr
    $

the processes in it are all started at once; the & applies to the whole pipeline. Only one process-id is printed, however, for the last process in the sequence.

The command

    $ wait

waits until all processes initiated with & have finished. If it doesn't return immediately, you have commands still running. You can interrupt wait with DELETE.

You can use the process-id printed by the shell to stop a process initiated with &:

    $ kill 6944

If you forget the process-id, you can use the command ps to tell you about everything you have running. If you are desperate, kill 0 will kill all your processes except your login shell. And if you're curious about what other users are doing, ps -ag will tell you about all processes that are currently running.
Here is some sample output:

    $ ps -ag
      PID TTY  TIME CMD
       36 co   6:29 /etc/cron
     6423 5    0:02 -sh
     6704 1    0:04 -sh
     6722 1    0:12 vi paper
     4430 2    0:03 -sh
     6612 7    0:03 -sh
     6628 7    1:13 rogue
     6843 2    0:02 write dmr
     6949 4    0:01 login bimmler
     6952 5    0:08 pr ch1.1 ch1.2 ch1.3 ch1.4
     6951 5    0:03 lpr
     6959 5    0:02 ps -ag
     6844 1    0:02 write rob
    $

PID is the process-id; TTY is the terminal associated with the process (as in who); TIME is the processor time used in minutes and seconds; and the rest is the command being run. ps is one of those commands that is different on different versions of the system, so your output may not be formatted like this. Even the arguments may be different; see the manual page ps(1).

Processes have the same sort of hierarchical structure that files do: each process has a parent, and may well have children. Your shell was created by a process associated with whatever terminal line connects you to the system. As you run commands, those processes are the direct children of your shell. If you run a program from within one of those, for example with the ! command to escape from ed, that creates its own child process which is thus a grandchild of the shell.

Sometimes a process takes so long that you would like to start it running, then turn off the terminal and go home without waiting for it to finish. But if you turn off your terminal or break your connection, the process will normally be killed even if you used &. The command nohup ("no hangup") was created to deal with this situation: if you say

    $ nohup command &

the command will continue to run if you log out. Any output from the command is saved in a file called nohup.out. There is no way to nohup a command retroactively.
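The life of a background process can be followed from a script as well. The sketch below assumes a POSIX shell, where the special parameter $! holds the process-id of the most recent command started with &. Here sleep 30 plays the part of a time-consuming job, and kill -0, which sends no signal and only tests whether the process exists, checks on it before and after kill and wait. bg_demo is a hypothetical helper; nohup and nice are not exercised.

```shell
#!/bin/sh
bg_demo() {
    # Start a slow command in the background, as with "&" in the text.
    # Its output goes to /dev/null so it cannot interfere.
    sleep 30 >/dev/null &
    pid=$!                       # process-id of the command just started
    # kill -0 sends no signal; it only checks that the process exists.
    if kill -0 "$pid" 2>/dev/null; then echo running; fi
    kill "$pid"                  # like "kill 6944" in the text
    wait "$pid" 2>/dev/null      # wait for it to finish and collect it
    if kill -0 "$pid" 2>/dev/null; then echo "still running"; else echo gone; fi
}

bg_demo
# prints:
# running
# gone
```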
If your process will take a lot of processor resources, it is kind to those who share your system to run your job with lower than normal priority; this is done by another program called nice:

    $ nice expensive-command &

nohup automatically calls nice, because if you're going to log out you can afford to have the command take a little longer.

Finally, you can simply tell the system to start your process at some wee hour of the morning when normal people are asleep, not computing. The command is called at(1):

    $ at time
    whatever commands
    you want ...
    ctl-d
    $

This is the typical usage, but of course the commands could come from a file:

    $ at 3am [...]

[...] temp

    $ ed ch2.1
    1534
    r temp
    168

od produces text on its standard output, which can then be used anywhere text can be used.

This uniformity is unusual; most systems have several file formats, even for text, and require negotiation by a program or a user to create a file of a particular type. In UNIX systems there is just one kind of file, and all that is required to access a file is its name.†

The lack of file formats is an advantage overall (programmers needn't worry about file types, and all the standard programs will work on any file), but there are a handful of drawbacks. Programs that sort and search and edit really expect text as input: grep can't examine binary files correctly, nor can sort sort them, nor can any standard editor manipulate them.

There are implementation limitations with most programs that expect text as input. We tested a number of programs on a 30,000 byte text file containing no newlines, and surprisingly few behaved properly, because most programs make unadvertised assumptions about the maximum length of a line of text (for an exception, see the BUGS section of sort(1)).

† There's a good test of file system uniformity, due originally to Doug McIlroy, that the UNIX file system passes handily. Can the output of a FORTRAN program be used as input to the FORTRAN compiler?
A remarkable number of systems have trouble with this test.

Non-text files definitely have their place. For example, very large databases usually need extra address information for rapid access; this has to be binary for efficiency. But every file format that is not text must have its own family of support programs to do things that the standard tools could perform if the format were text. Text files may be a little less efficient in machine cycles, but this must be balanced against the cost of extra software to maintain more specialized formats. If you design a file format, you should think carefully before choosing a non-textual representation. (You should also think about making your programs robust in the face of long input lines.)

2.3 Directories and filenames

All the files you own have unambiguous names, starting with /usr/you, but if the only file you have is junk, and you type ls, it doesn't print /usr/you/junk; the filename is printed without any prefix:

    $ ls
    junk
    $

That is because each running program, that is, each process, has a current directory, and all filenames are implicitly assumed to start with the name of that directory, unless they begin directly with a slash. Your login shell, and ls, therefore have a current directory. The command pwd (print working directory) identifies the current directory:

    $ pwd
    /usr/you
    $

The current directory is an attribute of a process, not a person or a program: people have login directories, processes have current directories. If a process creates a child process, the child inherits the current directory of its parent. But if the child then changes to a new directory, the parent is unaffected; its current directory remains the same no matter what the child does.

The notion of a current directory is certainly a notational convenience, because it can save a lot of typing, but its real purpose is organizational. Related files belong together in the same directory.
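That the current directory belongs to a process, and is inherited but not shared, is easy to demonstrate. In the sketch below, assuming a POSIX sh, the parenthesized commands run in a subshell, which in most shells is a separate child process; in any case POSIX guarantees that a cd made there is not seen by the parent. The helper cwd_demo is hypothetical and works in a scratch directory from mktemp -d, so its output does not depend on where it is run; it returns the caller to its starting directory afterward.

```shell
#!/bin/sh
cwd_demo() {
    start=$(pwd)                   # where the caller was
    tmp=$(mktemp -d) || return 1
    cd "$tmp" || return 1
    dir=$(pwd)                     # the name as this shell reports it
    (
        # The child starts out in its parent's current directory...
        [ "$(pwd)" = "$dir" ] && echo "child inherited cwd"
        # ...and is free to move somewhere else.
        cd / && echo "child now in $(pwd)"
    )
    # Back in the parent: the child's cd had no effect here.
    [ "$(pwd)" = "$dir" ] && echo "parent still in its own directory"
    cd "$start" && rmdir "$dir"
}

cwd_demo
# prints:
# child inherited cwd
# child now in /
# parent still in its own directory
```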
/usr is often the top directory of the user file system. (user is abbreviated to usr in the same spirit as cmp, ls, etc.) /usr/you is your login directory, your current directory when you first log in. /usr/src contains source for system programs, /usr/src/cmd contains source for UNIX commands, /usr/src/cmd/sh contains the source files for the shell, and so on. Whenever you embark on a new project, or whenever you have a set of related files, say a set of recipes, you could create a new directory with mkdir and put the files there.

    $ pwd
    /usr/you
    $ mkdir recipes
    $ cd recipes
    $ pwd
    /usr/you/recipes
    $ mkdir pie cookie
    $ ed pie/apple
    $ ed cookie/choc.chip
    $

Notice that it is simple to refer to subdirectories. pie/apple has an obvious meaning: the apple pie recipe, in directory /usr/you/recipes/pie. You could instead have put the recipe in, say, recipes/apple.pie, rather than in a subdirectory of recipes, but it seems better organized to put all the pies together, too. For example, the crust recipe could be kept in recipes/pie/crust rather than duplicating it in each pie recipe.

Although the file system is a powerful organizational tool, you can forget where you put a file, or even what files you've got. The obvious solution is a command or two to rummage around in directories. The ls command is certainly helpful for finding files, but it doesn't look in sub-directories.

    $ cd
    $ ls
    junk
    recipes
    $ file *
    junk:     ascii text
    recipes:  directory
    $ ls recipes
    cookie
    pie
    $ ls recipes/pie
    apple
    crust
    $

This piece of the file system can be shown pictorially as:

              /usr/you
              /      \
          junk      recipes
                    /     \
                 pie       cookie
                /   \         \
            apple   crust    choc.chip

The command du (disc usage) was written to tell how much disc space is consumed by the files in a directory, including all its subdirectories.
$ du
6       ./recipes/pie
4       ./recipes/cookie
11      ./recipes
13      .
$

The filenames are obvious; the numbers are the number of disc blocks (typically 512 or 1024 bytes each) of storage for each file. The value for a directory indicates how many blocks are consumed by all the files in that directory and its subdirectories, including the directory itself.

du has an option -a, for "all," that causes it to print out all the files in a directory. If one of those is a directory, du processes that as well:

$ du -a
2       ./recipes/pie/apple
3       ./recipes/pie/crust
6       ./recipes/pie
3       ./recipes/cookie/choc.chip
4       ./recipes/cookie
11      ./recipes
1       ./junk
13      .
$

The output of du -a can be piped through grep to look for specific files:

$ du -a | grep choc
3       ./recipes/cookie/choc.chip
$

Recall from Chapter 1 that the name '.' is a directory entry that refers to the directory itself; it permits access to a directory without having to know the full name. du looks in a directory for files; if you don't tell it which directory, it assumes '.', the directory you are in now. Therefore, junk and ./junk are names for the same file.

Despite their fundamental properties inside the kernel, directories sit in the file system as ordinary files. They can be read as ordinary files. But they can't be created or written as ordinary files: to preserve its sanity and the users' files, the kernel reserves to itself all control over the contents of directories.

The time has come to look at the bytes in a directory:

$ od -cb .
0000000   4   ;   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
        064 073 056 000 000 000 000 000 000 000 000 000 000 000 000 000
0000020 273   (   .   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
        273 050 056 056 000 000 000 000 000 000 000 000 000 000 000 000
0000040 252   ;   r   e   c   i   p   e   s  \0  \0  \0  \0  \0  \0  \0
        252 073 162 145 143 151 160 145 163 000 000 000 000 000 000 000
0000060 230   =   j   u   n   k  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
        230 075 152 165 156 153 000 000 000 000 000 000 000 000 000 000
0000100
$

See the filenames buried in there? The directory format is a combination of binary and textual data. A directory consists of 16-byte chunks, the last 14 bytes of which hold the filename, padded with ASCII NULs (which have value 0), and the first two of which tell the system where the administrative information for the file resides; we'll come back to that. Every directory begins with the two entries '.' ("dot") and '..' ("dot-dot").

$ cd                    Home
$ cd recipes
$ pwd
/usr/you/recipes
$ cd ..; pwd            Up one level
/usr/you
$ cd ..; pwd            Up another level
/usr
$ cd ..; pwd            Up another level
/
$ cd ..; pwd            Up another level
/                       Can't go any higher
$

The directory / is called the root of the file system. Every file in the system is in the root directory or one of its subdirectories, and the root is its own parent directory.

Exercise 2-2. Given the information in this section, you should be able to understand roughly how the ls command operates. Hint: cat . >foo; ls -f foo. □

Exercise 2-3. (Harder) How does the pwd command operate? □

Exercise 2-4. du was written to monitor disc usage. Using it to find files in a directory hierarchy is at best a strange idiom, and perhaps inappropriate. As an alternative, look at the manual page for find(1), and compare the two commands. In particular, compare the command du -a | grep ... with the corresponding invocation of find. Which runs faster? Is it better to build a new tool or use a side effect of an old one? □

2.4 Permissions

Every file has a set of permissions associated with it, which determine who can do what with the file.
If you're so organized that you keep your love letters on the system, perhaps hierarchically arranged in a directory, you probably don't want other people to be able to read them. You could therefore change the permissions on each letter to frustrate gossip (or only on some of the letters, to encourage it), or you might just change the permissions on the directory containing the letters, and thwart snoopers that way.

But we must warn you: there is a special user on every UNIX system, called the super-user, who can read or modify any file on the system. The special login name root carries super-user privileges; it is used by system administrators when they do system maintenance. There is also a command called su that grants super-user status if you know the root password. Thus anyone who knows the super-user password can read your love letters, so don't keep sensitive material in the file system.

If you need more privacy, you can change the data in a file so that even the super-user cannot read (or at least understand) it, using the crypt command (crypt(1)). Of course, even crypt isn't perfectly secure. A super-user can change the crypt command itself, and there are cryptographic attacks on the crypt algorithm. The former requires malfeasance and the latter takes hard work, however, so crypt is in practice fairly secure.

In real life, most security breaches are due to passwords that are given away or easily guessed. Occasionally, system administrative lapses make it possible for a malicious user to gain super-user permission. Security issues are discussed further in some of the papers cited in the bibliography at the end of this chapter.

When you log in, you type a name and then verify that you are that person by typing a password. The name is your login identification, or login-id. But the system actually recognizes you by a number, called your user-id, or uid.
In fact different login-id's may have the same uid, making them indistinguishable to the system, although that is relatively rare and perhaps undesirable for security reasons. Besides a uid, you are assigned a group identification, or group-id, which places you in a class of users. On many systems, all ordinary users (as opposed to those with login-id's like root) are placed in a single group called other, but your system may be different. The file system, and therefore the UNIX system in general, determines what you can do by the permissions granted to your uid and group-id.

The file /etc/passwd is the password file; it contains all the login information about each user. You can discover your uid and group-id, as does the system, by looking up your name in /etc/passwd:

$ grep you /etc/passwd
you:gkmbCTrJ04COM:604:1:Y.O.A.People:/usr/you:
$

The fields in the password file are separated by colons and are laid out like this (as seen in passwd(5)):

login-id:encrypted-password:uid:group-id:miscellany:login-directory:shell

The file is ordinary text, but the field definitions and separator are a convention agreed upon by the programs that use the information in the file.

The shell field is often empty, implying that you use the default shell, /bin/sh. The miscellany field may contain anything; often, it has your name and address or phone number.

Note that your password appears here in the second field, but only in an encrypted form. Anybody can read the password file (you just did), so if your password itself were there, anyone would be able to use it to masquerade as you. When you give your password to login, it encrypts it and compares the result against the encrypted password in /etc/passwd. If they agree, it lets you log in. The mechanism works because the encryption algorithm has the property that it's easy to go from the clear form to the encrypted form, but very hard to go backwards.
For example, if your password is ka-boom, it might be encrypted as gkmbCTrJ04COM, but given the latter, there's no easy way to get back to the original.

The kernel decided that you should be allowed to read /etc/passwd by looking at the permissions associated with the file. There are three kinds of permissions for each file: read (i.e., examine its contents), write (i.e., change its contents), and execute (i.e., run it as a program). Furthermore, different permissions can apply to different people. As file owner, you have one set of read, write and execute permissions. Your "group" has a separate set. Everyone else has a third set.

The -l option of ls prints the permissions information, among other things:

$ ls -l /etc/passwd
-rw-r--r--  1 root     5115 Aug 30 10:40 /etc/passwd
$ ls -lg /etc/passwd
-rw-r--r--  1 adm      5115 Aug 30 10:40 /etc/passwd
$

These two lines may be collectively interpreted as: /etc/passwd is owned by login-id root, group adm, is 5115 bytes long, was last modified on August 30 at 10:40 AM, and has one link (one name in the file system; we'll discuss links in the next section). Some versions of ls give both owner and group in one invocation.

The string -rw-r--r-- is how ls represents the permissions on the file. The first - indicates that it is an ordinary file. If it were a directory, there would be a d there. The next three characters encode the file owner's (based on uid) read, write and execute permissions. rw- means that root (the owner) may read or write, but not execute the file. An executable file would have an x instead of a dash.

The next three characters (r--) encode group permissions, in this case that people in group adm, presumably the system administrators, can read the file but not write or execute it. The next three (also r--) define the permissions for everyone else, the rest of the users on the system.
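The meaning of the string is purely positional: after the type character, positions 2-4 belong to the owner, 5-7 to the group and 8-10 to everyone else, each in the order read, write, execute. Below is a minimal sketch in POSIX sh that reads one permission out of such a string. The function name has_perm and its argument conventions are invented for this illustration; the class letters u, g and o follow chmod's symbolic notation. The handling of S and T reflects how modern versions of ls mark a special bit (such as set-uid, discussed below) that is set without execute permission, which is an assumption beyond this text.

```shell
# has_perm MODE CLASS PERM  (hypothetical helper)
# MODE is a 10-character string as printed by ls -l, e.g. -rw-r--r--.
# CLASS is u (owner), g (group) or o (others); PERM is r, w or x.
# Succeeds (exit status 0) if that class holds that permission.
has_perm() {
    case $2 in
        u) base=1 ;;        # owner triple starts after the type character
        g) base=4 ;;
        o) base=7 ;;
        *) return 2 ;;
    esac
    case $3 in
        r) off=1 ;;
        w) off=2 ;;
        x) off=3 ;;
        *) return 2 ;;
    esac
    # Pick out the single character at position base+off.
    ch=$(printf '%s\n' "$1" | cut -c "$((base + off))")
    case $ch in
        -|S|T|'') return 1 ;;   # dash, or a special bit without execute
        *) return 0 ;;          # r, w, x, or s/t (special bit plus execute)
    esac
}
```

For example:

$ has_perm -rw-r--r-- o r && echo others may read
others may read
$ has_perm -rw-r--r-- g w || echo group may not write
group may not write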
On this machine, then, only root can change the login information for a user, but anybody may read the file to discover the information. A plausible alternative would be for group adm to also have write permission on /etc/passwd.

The file /etc/group encodes group names and group-id's, and defines which users are in which groups. /etc/passwd identifies only your login group; the newgrp command changes your group permissions to another group.

Anybody can say

$ ed /etc/passwd

and edit the password file, but only root can write back the changes. You might therefore wonder how you can change your password, since that involves editing the password file. The program to change passwords is called passwd; you will probably find it in /bin:

$ ls -l /bin/passwd
-rwsr-xr-x  1 root     8454 Jan  4  1983 /bin/passwd
$

(Note that /etc/passwd is the text file containing the login information, while /bin/passwd, in a different directory, is a file containing an executable program that lets you change the password information.) The permissions here state that anyone may execute the command, but only root can change the passwd command. But the s instead of an x in the execute field for the file owner states that, when the command is run, it is to be given the permissions corresponding to the file owner, in this case root. Because /bin/passwd is "set-uid" to root, any user can run the passwd command to edit the password file.

The set-uid bit is a simple but elegant idea† that solves a number of security problems. For example, the author of a game program can make the program set-uid to the owner, so that it can update a score file that is otherwise protected from other users' access. But the set-uid concept is potentially dangerous. /bin/passwd has to be correct; if it were not, it could destroy system information under root's auspices.

† The set-uid bit is patented by Dennis Ritchie.
If it had the permissions -rwsrwxrwx, it could be overwritten by any user, who could therefore replace the file with a program that does anything. This is particularly serious for a set-uid program, because root has access permissions to every file on the system. (Some UNIX systems turn the set-uid bit off whenever a file is modified, to reduce the danger of a security hole.)

The set-uid bit is powerful, but used primarily for a few system programs such as passwd. Let's look at a more ordinary file.

$ ls -l /bin/who
-rwxrwxr-x  1 root     6348 Mar 29  1983 /bin/who
$

who is executable by everybody, and writable by root and the owner's group. What "executable" means is this: when you type

$ who

to the shell, it looks in a set of directories, one of which is /bin, for a file named "who." If it finds such a file, and if the file has execute permission, the shell calls the kernel to run it. The kernel checks the permissions, and, if they are valid, runs the program. Note that a program is just a file with execute permission. In the next chapter we will show you programs that are just text files, but that can be executed as commands because they have execute permission set.

Directory permissions operate a little differently, but the basic idea is the same.

$ ls -ld .
drwxrwxr-x  3 you        80 Sep 27 06:11 .
$

The -d option of ls asks it to tell you about the directory itself, rather than its contents, and the leading d in the output signifies that '.' is indeed a directory. An r field means that you can read the directory, so you can find out what files are in it with ls (or od, for that matter). A w means that you can create and delete files in this directory, because that requires modifying and therefore writing the directory file.

Actually, you cannot simply write in a directory; even root is forbidden to do so.

$ who >.                Try to overwrite '.'
.: cannot create        You can't
$

Instead there are system calls that create and remove files, and only through them is it possible to change the contents of a directory. The permissions idea, however, still applies: the w fields tell who can use the system routines to modify the directory.

Permission to remove a file is independent of the file itself. If you have write permission in a directory, you may remove files there, even files that are protected against writing. The rm command asks for confirmation before removing a protected file, however, to check that you really want to do so; this is one of the rare occasions that a UNIX program double-checks your intentions. (The -f flag to rm forces it to remove files without question.)

The x field in the permissions on a directory does not mean execution; it means "search." Execute permission on a directory determines whether the directory may be searched for a file. It is therefore possible to create a directory with mode --x for other users, implying that users may access any file that they know about in that directory, but may not run ls on it or read it to see what files are there. Similarly, with directory permissions r--, users can see (ls) but not use the contents of a directory. Some installations use this device to turn off /usr/games during busy hours.

The chmod (change mode) command changes permissions on files.

$ chmod permissions filenames ...

The syntax of the permissions is clumsy, however. They can be specified in two ways, either as octal numbers or by symbolic description. The octal numbers are easier to use, although the symbolic descriptions are sometimes convenient because they can specify relative changes in the permissions. It would be nice if you could say

$ chmod rw-rw-rw- junk          Doesn't work this way!

rather than

$ chmod 666 junk

but you cannot.
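The translation chmod will not do for you is mechanical enough to script. Below is a minimal sketch in POSIX sh; the names mode_to_octal and triple_to_digit are hypothetical, not standard commands. Each group of three characters becomes one octal digit, with read counting 4, write 2 and execute 1 (the weights explained in the next paragraph). It deliberately handles only r, w, x and -, and rejects strings containing set-uid or other special letters.

```shell
# triple_to_digit TRIPLE: turn one validated rwx group, e.g. r-x,
# into its octal digit (read 4 + write 2 + execute 1).
triple_to_digit() {
    d=0
    case $1 in r??) d=$((d + 4)) ;; esac
    case $1 in ?w?) d=$((d + 2)) ;; esac
    case $1 in ??x) d=$((d + 1)) ;; esac
    printf '%s\n' "$d"
}

# mode_to_octal MODE: translate rw-rw-rw- (or the 10-character form
# -rw-rw-rw- printed by ls -l) into the octal digits chmod accepts.
mode_to_octal() {
    perms=$1
    # Drop the leading type character (-, d, b, c) if present.
    if [ "${#perms}" -eq 10 ]; then
        perms=${perms#?}
    fi
    # Accept exactly nine characters, each r/w/x in its slot or a dash.
    case $perms in
        [r-][w-][x-][r-][w-][x-][r-][w-][x-]) ;;
        *) echo "mode_to_octal: unsupported mode: $1" >&2; return 1 ;;
    esac
    rest=${perms#???}                          # characters 4-9
    u=$(triple_to_digit "${perms%??????}")     # owner: characters 1-3
    g=$(triple_to_digit "${rest%???}")         # group: characters 4-6
    o=$(triple_to_digit "${rest#???}")         # others: characters 7-9
    printf '%s%s%s\n' "$u" "$g" "$o"
}
```

With it, the wished-for command can at least be approximated:

$ chmod "$(mode_to_octal rw-rw-rw-)" junk

which has the same effect as chmod 666 junk.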
The octal modes are specified by adding together a 4 for read, 2 for write and 1 for execute permission. The three digits specify, as in ls, permissions for the owner, group and everyone else. The symbolic codes are difficult to explain; you must look in chmod(1) for a proper description. For our purposes, it is sufficient to note that + turns a permission on and that - turns it off. For example

$ chmod +x command

allows everyone to execute command, and

$ chmod -w file

turns off write permission for everyone, including the file's owner. Except for the usual disclaimer about super-users, only the owner of a file may change the permissions on a file, regardless of the permissions themselves. Even if somebody else allows you to write a file, the system will not allow you to change its permission bits.

$ ls -ld /usr/mary
drwxrwxrwx  5 mary      704 Sep 25 10:18 /usr/mary
$ chmod 444 /usr/mary
chmod: can't change /usr/mary
$

If a directory is writable, however, people can remove files in it regardless of the permissions on the files themselves. If you want to make sure that you or your friends never delete files from a directory, remove write permission from it:

$ cd
$ date >temp
$ chmod -w .                    Make directory unwritable
$ ls -ld .
dr-xr-xr-x  3 you        80 Sep 27 11:48 .
$ rm temp
rm: temp not removed            Can't remove file
$ chmod 775 .                   Restore permission
$ ls -ld .
drwxrwxr-x  3 you        80 Sep 27 11:48 .
$ rm temp
$                               Now you can

temp is now gone. Notice that changing the permissions on the directory didn't change its modification date. The modification date reflects changes to the file's contents, not its modes. The permissions and dates are not stored in the file itself, but in a system structure called an index node, or i-node, the subject of the next section.

Exercise 2-5. Experiment with chmod. Try different simple modes, like 0 and 1. Be careful not to damage your login directory! □
2.5 Inodes

A file has several components: a name, contents, and administrative information such as permissions and modification times. The administrative information is stored in the inode (over the years, the hyphen fell out of "i-node"), along with essential system data such as how long it is, where on the disc the contents of the file are stored, and so on.

There are three times in the inode: the time that the contents of the file were last modified (written); the time that the file was last used (read or executed); and the time that the inode itself was last changed, for example to set the permissions.

$ date
Tue Sep 27 12:07:24 EDT 1983
$ date >junk
$ ls -l junk
-rw-rw-rw-  1 you        29 Sep 27 12:07 junk
$ ls -lu junk
-rw-rw-rw-  1 you        29 Sep 27 06:11 junk
$ ls -lc junk
-rw-rw-rw-  1 you        29 Sep 27 12:07 junk
$

Changing the contents of a file does not affect its usage time, as reported by ls -lu, and changing the permissions affects only the inode change time, as reported by ls -lc.

$ chmod 444 junk
$ ls -lu junk
-r--r--r--  1 you        29 Sep 27 06:11 junk
$ ls -lc junk
-r--r--r--  1 you        29 Sep 27 12:11 junk
$ chmod 666 junk
$

The -t option to ls, which sorts the files according to time, by default that of last modification, can be combined with -c or -u to report the order in which inodes were changed or files were read:

$ ls recipes
cookie
pie
$ ls -lut
total 2
drwxrwxrwx  4 you        64 Sep 27 12:11 recipes
-rw-rw-rw-  1 you        29 Sep 27 06:11 junk
$

recipes is most recently used, because we just looked at its contents.

It is important to understand inodes, not only to appreciate the options on ls, but because in a strong sense the inodes are the files. All the directory hierarchy does is provide convenient names for files. The system's internal name for a file is its i-number: the number of the inode holding the file's information.
ls -i reports the i-number in decimal:

$ date >x
$ ls -i
15768 junk
15274 recipes
15852 x
$

It is the i-number that is stored in the first two bytes of each directory entry, before the name. od -d will dump the data in decimal by byte pairs rather than octal by bytes and thus make the i-number visible.

$ od -c .
0000000   4   ;   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000020 273   (   .   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040 252   ;   r   e   c   i   p   e   s  \0  \0  \0  \0  \0  \0  \0
0000060 230   =   j   u   n   k  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000100 354   =   x  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000120
$ od -d .
0000000 15156 00046 00000 00000 00000 00000 00000 00000
0000020 10427 11822 00000 00000 00000 00000 00000 00000
0000040 15274 25970 26979 25968 00115 00000 00000 00000
0000060 15768 30058 27502 00000 00000 00000 00000 00000
0000100 15852 00120 00000 00000 00000 00000 00000 00000
0000120
$

The first two bytes in each directory entry are the only connection between the name of a file and its contents. A filename in a directory is therefore called a link, because it links a name in the directory hierarchy to the inode, and hence to the data. The same i-number can appear in more than one directory. The rm command does not actually remove inodes; it removes directory entries or links. Only when the last link to a file disappears does the system remove the inode, and hence the file itself.

If the i-number in a directory entry is zero, it means that the link has been removed, but not necessarily the contents of the file; there may still be a link somewhere else. You can verify that the i-number goes to zero by removing the file:

$ rm x
$ od -d .
0000000 15156 00046 00000 00000 00000 00000 00000 00000
0000020 10427 11822 00000 00000 00000 00000 00000 00000
0000040 15274 25970 26979 25968 00115 00000 00000 00000
0000060 15768 30058 27502 00000 00000 00000 00000 00000
0000100 00000 00120 00000 00000 00000 00000 00000 00000
0000120
$

The next file created in this directory will go into the unused slot, although it will probably have a different i-number.

The ln command makes a link to an existing file, with the syntax

$ ln old-file new-file

The purpose of a link is to give two names to the same file, often so it can appear in two different directories. On many systems there is a link to /bin/ed called /bin/e, so that people can call the editor e. Two links to a file point to the same inode, and hence have the same i-number:

$ ln junk linktojunk
$ ls -li
total 3
15768 -rw-rw-rw-  2 you        29 Sep 27 12:07 junk
15768 -rw-rw-rw-  2 you        29 Sep 27 12:07 linktojunk
15274 drwxrwxrwx  4 you        64 Sep 27 09:34 recipes
$

The integer printed between the permissions and the owner is the number of links to the file. Because each link just points to the inode, each link is equally important; there is no difference between the first link and subsequent ones. (Notice that the total disc space computed by ls is wrong because of double counting.)

When you change a file, access to the file by any of its names will reveal the changes, since all the links point to the same file.

$ echo x >junk
$ ls -l
total 3
-rw-rw-rw-  2 you         2 Sep 27 12:37 junk
-rw-rw-rw-  2 you         2 Sep 27 12:37 linktojunk
drwxrwxrwx  4 you        64 Sep 27 09:34 recipes
$ rm linktojunk
$ ls -l
total 2
-rw-rw-rw-  1 you         2 Sep 27 12:37 junk
drwxrwxrwx  4 you        64 Sep 27 09:34 recipes
$

After linktojunk is removed the link count goes back to one. As we said before, rm'ing a file just breaks a link; the file remains until the last link is removed.
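Both facts, a shared i-number and a link count, can be read back with ls exactly as in the sessions above. The sketch below wraps them in small POSIX sh functions; the names inum, links and same_file are made up for this example. Comparing i-numbers alone assumes both names are on the same file system, since an i-number is only unique within one file system.

```shell
# inum NAME: print the i-number of NAME (first field of ls -di;
# -d makes ls describe a directory itself rather than its contents).
inum() {
    ls -di "$1" | awk '{ print $1 }'
}

# links NAME: print the link count (second field of ls -ld).
links() {
    ls -ld "$1" | awk '{ print $2 }'
}

# same_file A B: succeed if A and B are links to the same inode.
# Assumes both names are on the same file system.
same_file() {
    [ "$(inum "$1")" = "$(inum "$2")" ]
}
```

After ln junk linktojunk, same_file junk linktojunk succeeds and links junk prints 2; a copy made with cp gets an inode of its own, so same_file fails for it.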
In practice, of course, most files only have one link, but again we see a simple idea providing great flexibility.

A word to the hasty: once the last link to a file is gone, the data is irretrievable. Deleted files go into the incinerator, rather than the waste basket, and there is no way to call them back from the ashes. (There is a faint hope of resurrection. Most large UNIX systems have a formal backup procedure that periodically copies changed files to some safe place like magnetic tape, from which they can be retrieved. For your own protection and peace of mind, you should know just how much backup is provided on your system. If there is none, watch out: some mishap to the discs could be a catastrophe.)

Links to files are handy when two people wish to share a file, but sometimes you really want a separate copy, a different file with the same information. You might copy a document before making extensive changes to it, for example, so you can restore the original if you decide you don't like the changes. Making a link wouldn't help, because when the data changed, both links would reflect the change. cp makes copies of files:

$ cp junk copyofjunk
$ ls -li
total 3
15850 -rw-rw-rw-  1 you         2 Sep 27 13:13 copyofjunk
15768 -rw-rw-rw-  1 you         2 Sep 27 12:37 junk
15274 drwxrwxrwx  4 you        64 Sep 27 09:34 recipes
$

The i-numbers of junk and copyofjunk are different, because they are different files, even though they currently have the same contents. It's often a good idea to change the permissions on a backup copy so it's harder to remove it accidentally.
$ chmod -w copyofjunk                   Turn off write permission
$ ls -li
total 3
15850 -r--r--r--  1 you         2 Sep 27 13:13 copyofjunk
15768 -rw-rw-rw-  1 you         2 Sep 27 12:37 junk
15274 drwxrwxrwx  4 you        64 Sep 27 09:34 recipes
$ rm copyofjunk
rm: copyofjunk 444 mode n               No! It's precious
$ date >junk
$ ls -li
total 3
15850 -r--r--r--  1 you         2 Sep 27 13:13 copyofjunk
15768 -rw-rw-rw-  1 you        29 Sep 27 13:16 junk
15274 drwxrwxrwx  4 you        64 Sep 27 09:34 recipes
$ rm copyofjunk
rm: copyofjunk 444 mode y               Well, maybe not so precious
$ ls -li
total 2
15768 -rw-rw-rw-  1 you        29 Sep 27 13:16 junk
15274 drwxrwxrwx  4 you        64 Sep 27 09:34 recipes
$

Changing the copy of a file doesn't change the original, and removing the copy has no effect on the original. Notice that because copyofjunk had write permission turned off, rm asked for confirmation before removing the file.

There is one more common command for manipulating files: mv moves or renames files, simply by rearranging the links. Its syntax is the same as cp and ln:

$ mv junk sameoldjunk
$ ls -li
total 2
15274 drwxrwxrwx  4 you        64 Sep 27 09:34 recipes
15768 -rw-rw-rw-  1 you        29 Sep 27 13:16 sameoldjunk
$

sameoldjunk is the same file as our old junk, right down to the i-number; only its name (the directory entry associated with inode 15768) has been changed.

We have been doing all this file shuffling in one directory, but it also works across directories. ln is often used to put links with the same name in several directories, such as when several people are working on one program or document. mv can move a file or directory from one directory to another. In fact, these are common enough idioms that mv and cp have special syntax for them:

$ mv (or cp) file1 file2 ... directory

moves (or copies) one or more files to the directory which is the last argument. The links or copies are made with the same filenames. For example, if you wanted to try your hand at beefing up the editor, you might begin by saying
$ cp /usr/src/cmd/ed.c .

to get your own copy of the source to play with. If you were going to work on the shell, which is in a number of different source files, you would say

$ mkdir sh
$ cp /usr/src/cmd/sh/* sh

and cp would duplicate all of the shell's source files in your subdirectory sh (assuming no subdirectory structure in /usr/src/cmd/sh; cp is not very clever). On some systems, ln also accepts multiple file arguments, again with a directory as the last argument. And on some systems, mv, cp and ln are themselves links to a single file that examines its name to see what service to perform.

Exercise 2-6. Why does ls -l report 4 links to recipes? Hint: try

$ ls -ld /usr/you

Why is this useful information? □

Exercise 2-7. What is the difference between

$ mv junk junk1

and

$ cp junk junk1
$ rm junk

Hint: make a link to junk, then try it. □

Exercise 2-8. cp doesn't copy subdirectories, it just copies files at the first level of a hierarchy. What does it do if one of the argument files is a directory? Is this kind or even sensible? Discuss the relative merits of three possibilities: an option to cp to descend directories, a separate command rcp (recursive copy) to do the job, or just having cp copy a directory recursively when it finds one. See Chapter 7 for help on providing this facility. What other programs would profit from the ability to traverse the directory tree? □

2.6 The directory hierarchy

In Chapter 1, we looked at the file system hierarchy rather informally, starting from /usr/you. We're now going to investigate it in a more orderly way, starting from the top of the tree, the root.

The top directory is /.

$ ls /
bin
boot
dev
etc
lib
tmp
unix
usr
$

/unix is the program for the UNIX kernel itself: when the system starts, /unix is read from disc into memory and started.
Actually, the process occurs in two steps: first the file /boot is read; it then reads in /unix. More information about this "bootstrap" process may be found in boot(8). The rest of the files in /, at least here, are directories, each a somewhat self-contained section of the total file system.

In the following brief tour of the hierarchy, play along with the text: explore a bit in the directories mentioned. The more familiar you are with the layout of the file system, the more effectively you will be able to use it. Table 2.1 suggests good places to look, although some of the names are system dependent.

Table 2.1: Interesting Directories (see also hier(7))

/                   root of the file system
/bin                essential programs in executable form ("binaries")
/dev                device files
/etc                system miscellany
/etc/motd           login message of the day
/etc/passwd         password file
/lib                essential libraries, etc.
/tmp                temporary files; cleaned when system is restarted
/unix               executable form of the operating system
/usr                user file system
/usr/adm            system administration: accounting info., etc.
/usr/bin            user binaries: troff, etc.
/usr/dict           dictionary (words) and support for spell(1)
/usr/games          game programs
/usr/include        header files for C programs, e.g. math.h
/usr/include/sys    system header files for C programs, e.g. inode.h
/usr/lib            libraries for C, FORTRAN, etc.
/usr/man            on-line manual
/usr/man/man1       manual pages for section 1 of manual
/usr/mdec           hardware diagnostics, bootstrap programs, etc.
/usr/news           community service messages
/usr/pub            public oddments: see ascii(7) and eqnchar(7)
/usr/src            source code for utilities and libraries
/usr/src/cmd        source for commands in /bin and /usr/bin
/usr/src/lib        source code for subroutine libraries
/usr/spool          working directories for communications programs
/usr/spool/lpd      line printer temporary directory
/usr/spool/mail     mail in-boxes
/usr/spool/uucp     working directory for the uucp programs
/usr/sys            source for the operating system kernel
/usr/tmp            alternate temporary directory (little used)
/usr/you            your login directory
/usr/you/bin        your personal programs

/bin (binaries) we have seen before: it is the directory where the basic programs such as who and ed reside.

/dev (devices) we will discuss in the next section.

/etc (et cetera) we have also seen before. It contains various administrative files such as the password file and some system programs such as /etc/getty, which initializes a terminal connection for /bin/login. /etc/rc is a file of shell commands that is executed after the system is bootstrapped. /etc/group lists the members of each group.

/lib (library) contains primarily parts of the C compiler, such as /lib/cpp, the C preprocessor, and /lib/libc.a, the C subroutine library.

/tmp (temporaries) is a repository for short-lived files created during the execution of a program. When you start up the editor ed, for example, it creates a file with a name like /tmp/e00512 to hold its copy of the file you are editing, rather than working with the original file. It could, of course, create the file in your current directory, but there are advantages to placing it in /tmp: although it is unlikely, you might already have a file called e00512 in your directory; /tmp is cleaned up automatically when the system starts, so your directory doesn't get an unwanted file if the system crashes; and often /tmp is arranged on the disc for fast access.

There is a problem, of course, when several programs create files in /tmp at once: they might interfere with each other's files. That is why ed's temporary file has a peculiar name: it is constructed in such a way as to guarantee that no other program will choose the same name for its temporary file. In Chapters 5 and 6 we will see ways to do this.

/usr is called the "user file system," although it may have little to do with the actual users of the system.
On our machine, our login directories are /usr/bwk and /usr/rob, but on your machine the /usr part might be different, as explained in Chapter 1. Whether or not your personal files are in a subdirectory of /usr, there are a number of things you are likely to find there (although local customs vary in this regard, too). Just as in /, there are directories called /usr/bin, /usr/lib and /usr/tmp. These directories have functions similar to their namesakes in /, but contain programs less critical to the system. For example, nroff is usually in /usr/bin rather than /bin, and the FORTRAN compiler libraries live in /usr/lib. Of course, just what is deemed "critical" varies from system to system. Some systems, such as the distributed 7th Edition, have all the programs in /bin and do away with /usr/bin altogether; others split /usr/bin into two directories according to frequency of use.

Other directories in /usr are /usr/adm, containing accounting information, and /usr/dict, which holds a modest dictionary (see spell(1)). The on-line manual is kept in /usr/man; see /usr/man/man1/spell.1, for example. If your system has source code on-line, you will probably find it in /usr/src.

It is worth spending a little time exploring the file system, especially /usr, to develop a feeling for how the file system is organized and where you might expect to find things.

2.7 Devices

We skipped over /dev in our tour, because the files there provide a nice review of files in general. As you might guess from the name, /dev contains device files.

One of the prettiest ideas in the UNIX system is the way it deals with peripherals: discs, tape drives, line printers, terminals, etc. Rather than having special system routines to, for example, read magnetic tape, there is a file called /dev/mt0 (again, local customs vary).
Inside the kernel, references to that file are converted into hardware commands to access the tape, so if a program reads /dev/mt0, the contents of a tape mounted on the drive are returned. For example,

    $ cp /dev/mt0 junk

copies the contents of the tape to a file called junk. cp has no idea there is anything special about /dev/mt0; it is just a file, a sequence of bytes.

The device files are something of a zoo, each creature a little different, but the basic ideas of the file system apply to each. Here is a significantly shortened list of our /dev:

    $ ls -l /dev
    crw--w--w-  1 root    0,  0 Sep 27 23:09 console
    crw-r-      1 root    3,  1 Sep 27 14:37 kmem
    crw-r-      1 root    3,  0 May  6  1981 mem
    brw-rw-rw-  1 root    1, 64 Aug 24 17:41 mt0
    crw-rw-rw-  1 root    3,  2 Sep 28 02:03 null
    crw-rw-rw-  1 root    4, 64 Sep  9 15:42 rmt0
    brw-r-      1 root    2,  0 Sep  8 08:07 rp00
                1 root    2,  1 Sep 27 23:09 rp01
                1 root   13,  0 Apr 12  1983 rrp00
                         13,  1 Jul 28 15:18 rrp01
    crw-rw-rw-  1 root    2,  0 Jul  5 08:04 tty
    crw-        1 you     1,  0 Sep 28 02:38 tty0
    crw-        1 root    1,  1 Sep 27 23:09 tty1
    crw-        1 root    1,  2 Sep 27 17:33 tty2
    crw-        1 root    1,  3 Sep 27 18:48 tty3
    $

The first things to notice are that instead of a byte count there is a pair of small integers, and that the first character of the mode is always a 'b' or a 'c'. This is how ls prints the information from an inode that specifies a device rather than a regular file. The inode of a regular file contains a list of disc blocks that store the file's contents. For a device file, the inode instead contains the internal name for the device, which consists of its type, character (c) or block (b), and a pair of numbers, called the major and minor device numbers. Discs and tapes are block devices; everything else (terminals, printers, phone lines, etc.) is a character device. The major number encodes the type of device, while the minor number distinguishes different instances of the device.
For example, /dev/tty0 and /dev/tty1 are two ports on the same terminal controller, so they have the same major device number but different minor numbers.

Disc files are usually named after the particular hardware variant they represent. /dev/rp00 and /dev/rp01 are named after the DEC RP06 disc drive attached to the system. There is just one drive, divided logically into two file systems. If there were a second drive, its associated files would be named /dev/rp10 and /dev/rp11. The first digit specifies the physical drive, and the second which portion of the drive.

You might wonder why there are several disc device files, instead of just one. For historical reasons and for ease of maintenance, the file system is divided into smaller subsystems. The files in a subsystem are accessible through a directory in the main system. The program /etc/mount reports the correspondence between device files and directories:

    $ /etc/mount
    rp01 on /usr
    $

In our case, the root system occupies /dev/rp00 (although this isn't reported by /etc/mount) while the user file system (the files in /usr and its subdirectories) resides on /dev/rp01.

The root file system has to be present for the system to execute. /bin, /dev and /etc are always kept on the root system, because when the system starts only files in the root system are accessible, and some files such as /bin/sh are needed to run at all. During the bootstrap operation, all the file systems are checked for self-consistency (see icheck(8) or fsck(8)), and attached to the root system. This attachment operation is called mounting, the software equivalent of mounting a new disc pack in a drive; it can normally be done only by the super-user. After /dev/rp01 has been mounted as /usr, the files in the user file system are accessible exactly as if they were part of the root system.
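Both ideas above, device files identified by their type and file systems attached at mount points, can be inspected from a present-day shell. The following is a minimal sketch, assuming a modern POSIX system rather than the 7th Edition: it relies on the test operators -c and -b and on the portable df -P output format, and the helper names devtype and mountpoint_of are hypothetical, not standard commands.

```shell
#!/bin/sh
# A sketch for a modern POSIX system (not 7th Edition).
# devtype and mountpoint_of are hypothetical helper names, not standard commands.

# Classify a path the way the first character of an ls -l mode does:
# 'c' for a character device, 'b' for a block device.
devtype() {
    if [ -c "$1" ]; then
        echo character
    elif [ -b "$1" ]; then
        echo block
    else
        echo other
    fi
}

# Print the directory on which the file system holding a path is mounted.
# Assumes the POSIX df -P format: a header line, then one line per file
# system with the mount point in the sixth field (a mount point containing
# blanks would be truncated).
mountpoint_of() {
    df -P "$1" | awk 'NR == 2 { print $6 }'
}

devtype /dev/null      # prints: character
mountpoint_of /usr     # prints the mount point of the file system holding /usr
```

On most current systems ls -l /dev/null still shows a leading c and a major, minor pair in place of the byte count, much as in the listing above. mountpoint_of /usr reports / when /usr lives on the root file system, and /usr when it is mounted separately as described here.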
For the average user, the details of which file subsystem is mounted where are of little interest, but there are a couple of relevant points. First, because the subsystems may be mounted and dismounted, it is illegal to make a link to a file in another subsystem. For example, it is impossible to link programs in /bin to convenient names in private bin directories, because /usr is in a different file subsystem from /bin:

    $ ln /bin/mail /usr/you/bin/m
    ln: Cross-device link
    $

There would also be a problem because inode numbers are not unique in different file systems.

Second, each subsystem has fixed upper limits on size (number of blocks available for files) and inodes. If a subsystem fills up, it will be impossible to enlarge files in that subsystem until some space is reclaimed. The df (disc free space) command reports the available space on the mounted file subsystems:

    $ df
    /dev/rp00 1989
    /dev/rp01 21257
    $

/usr has 21257 free blocks. Whether this is ample space or a crisis depends on how the system is used; some installations need more file space headroom than others. By the way, of all the commands, df probably has the widest variation in output format. Your df output may look quite different.

Let's turn now to some more generally useful things. When you log in, you get a terminal line and therefore a file in /dev through which the characters you type and receive are sent. The tty command tells you which terminal you are using:

    $ who am i
    you      tty0    Sep 28 01:02
    $ tty
    /dev/tty0
    $ ls -l /dev/tty0
    crw--w--w-  1 you     1, 12 Sep 28 02:40 /dev/tty0
    $ date >/dev/tty0
    Wed Sep 28 02:40:51 EDT 1983
    $

Notice that you own the device, and that only you are permitted to read it. In other words, no one else can directly read the characters you are typing. Anyone may write on your terminal, however. To prevent this, you could chmod the device, thereby preventing people from using write to contact you, or you could just use mesg.
    $ mesg n                      Turn off messages
    $ ls -l /dev/tty0
    crw-------  1 you     1, 12 Sep 28 02:41 /dev/tty0
    $ mesg y                      Restore
    $

It is often useful to be able to refer by name to the terminal you are using, but it's inconvenient to determine which one it is. The device /dev/tty is a synonym for your login terminal, whatever terminal you are actually using.

    $ date >/dev/tty
    Wed Sep 28 02:42:23 EDT 1983
    $

/dev/tty is particularly useful when a program needs to interact with a user even though its standard input and output are connected to files rather than the terminal. crypt is one program that uses /dev/tty. The "clear text" comes from the standard input, and the encrypted data goes to the standard output, so crypt reads the encryption key from /dev/tty:

    $ crypt cryptedtext
    Enter key:                    Type encryption key
    $

The use of /dev/tty isn't explicit in this example, but it is there. If crypt read the key from the standard input, it would read the first line of the clear text. So instead crypt opens /dev/tty, turns off automatic character echoing so your encryption key doesn't appear on the screen, and reads the key. In Chapters 5 and 6 we will come across several other uses of /dev/tty.

Occasionally you want to run a program but don't care what output is produced. For example, you may have already seen today's news, and don't want to read it again. Redirecting news to the file /dev/null causes its output to be thrown away:

    $ news >/dev/null
    $

Data written to /dev/null is discarded without comment, while programs that read from /dev/null get end-of-file immediately, because reads from /dev/null always return zero bytes.

One common use of /dev/null is to throw away regular output so that diagnostic messages are visible. For example, the time command (time(1)) reports the CPU usage of a program.
The information is printed on the standard error, so you can time commands that generate copious output by sending the standard output to /dev/null:

    $ ls -l /usr/dict/words
    r--  1 bin  196513 Jan 20  1979 /usr/dict/words
    $ time grep e /usr/dict/words >/dev/null
    real
    user
    sys
    $ time egrep e /usr/dict/words >/dev/null
    real        8.0
    user        3.9
    sys         2.8
    $

The numbers in the output of time are elapsed clock time, CPU time spent in the program and CPU time spent in the kernel while the program was running. egrep is a high-powered variant of grep that we will discuss in Chapter 4; it's about twice as fast as grep when searching through large files. If output from grep and egrep had not been sent to /dev/null or a real file, we would have had to wait for hundreds of thousands of characters to appear on the terminal before finding out the timing information we were after.

Exercise 2-9. Find out about the other files in /dev by reading Section 4 of the manual. What is the difference between /dev/mt0 and /dev/rmt0? Comment on the potential advantages of having subdirectories in /dev for discs, tapes, etc.

Exercise 2-10. Tapes written on non-UNIX systems often have different block sizes, such as 800 bytes (ten 80-character card images), but the tape device /dev/mt0 expects 512-byte blocks. Look up the dd command (dd(1)) to see how to read such a tape. □

Exercise 2-11. Why isn't /dev/tty just a link to your login terminal? What would happen if it were mode rw--w--w- like your login terminal? □

Exercise 2-12. How does write(1) work? Hint: see utmp(5). □

Exercise 2-13. How can you tell if a user has been active at the terminal recently? □

History and bibliographic notes

The file system forms one part of the discussion in "UNIX implementation," by Ken Thompson (BSTJ, July, 1978).
A paper by Dennis Ritchie, entitled "The evolution of the UNIX time-sharing system" (Symposium on Language Design and Programming Methodology, Sydney, Australia, Sept. 1979) is a fascinating description of how the file system was designed and implemented on the original PDP-7 UNIX system, and how it grew into its present form.

The UNIX file system adapts some ideas from the MULTICS file system. The MULTICS System: An Examination of its Structure, by E. I. Organick (MIT Press, 1972) provides a comprehensive treatment of MULTICS.

"Password security: a case history," by Bob Morris and Ken Thompson, is an entertaining comparison of password mechanisms on a variety of systems; it can be found in Volume 2B of the UNIX Programmer's Manual.

In the same volume, the paper "On the security of UNIX," by Dennis Ritchie, explains how the security of a system depends more on the care taken with its administration than with the details of programs like crypt.

CHAPTER 3: USING THE SHELL

The shell, the program that interprets your requests to run programs, is the most important program for most UNIX users; with the possible exception of your favorite text editor, you will spend more time working with the shell than any other program. In this chapter and in Chapter 5, we will spend a fair amount of time on the shell's capabilities. The main point we want to make is that you can accomplish a lot without much hard work, and certainly without resorting to programming in a conventional language like C, if you know how to use the shell.

We have divided our coverage of the shell into two chapters. This chapter goes one step beyond the necessities covered in Chapter 1 to some fancier but commonly used shell features, such as metacharacters, quoting, creating new commands, passing arguments to them, the use of shell variables, and some elementary control flow. These are topics you should know for your own use of the shell.
The material in Chapter 5 is heavier going: it is intended for writing serious shell programs, ones that are bullet-proofed for use by others. The division between the two chapters is somewhat arbitrary, of course, so both should be read eventually.

3.1 Command line structure

To proceed, we need a slightly better understanding of just what a command is, and how it is interpreted by the shell. This section is a more formal coverage, with some new information, of the shell basics introduced in the first chapter.

The simplest command is a single word, usually naming a file for execution (later we will see some other types of commands):

    $ who                         Execute the file /bin/who
    you      tty2    Sep 28 07:51
    jpl      tty4    Sep 28 08:32
    $

A command usually ends with a newline, but a semicolon ; is also a command terminator:

    $ date;
    Wed Sep 28 09:07:15 EDT 1983
    $ date; who
    Wed Sep 28 09:07:23 EDT 1983
    you      tty2    Sep 28 07:51
    jpl      tty4    Sep 28 08:32
    $

Although semicolons can be used to terminate commands, as usual nothing happens until you type RETURN. Notice that the shell only prints one prompt after multiple commands, but except for the prompt,

    $ date; who

is identical to typing the two commands on different lines. In particular, who doesn't run until date has finished.

Try sending the output of "date; who" through a pipe:

    $ date; who | wc
    Wed Sep 28 09:08:48 EDT 1983
          2      10      60
    $

This might not be what you expected, because only the output of who goes to wc. Connecting who and wc with a pipe forms a single command, called a pipeline, that runs after date. The precedence of | is higher than that of ';' as the shell parses your command line.

Parentheses can be used to group commands:

    $ (date; who)
    Wed Sep 28 09:11:09 EDT 1983
    you      tty2    Sep 28 07:51
    jpl      tty4    Sep 28 08:32
    $ (date; who) | wc
          3      16      89
    $

The outputs of date and who are concatenated into a single stream that can be sent down a pipe.
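The precedence of | over ; can be checked with commands whose output never changes. In this minimal sketch two echo commands stand in for date and who, purely so that the line counts are predictable:

```shell
#!/bin/sh
# ';' binds less tightly than '|', so this parses as:
#   echo one ;  ( echo two | wc -l )
# Output: the line "one", then a count of 1.
echo one; echo two | wc -l

# Parentheses run both commands in a sub-shell and send their combined
# output down the pipe, so wc counts 2 lines.
(echo one; echo two) | wc -l
```

Some versions of wc pad the count with leading blanks; only the numbers matter here.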
Data flowing through a pipe can be tapped and placed in a file (but not another pipe) with the tee command, which is not part of the shell, but is nonetheless handy for manipulating pipes. One use is to save intermediate output in a file:

    $ (date; who) | tee save | wc
          3      16      89       Output from wc
    $ cat save
    Wed Sep 28 09:13:22 EDT 1983
    you      tty2    Sep 28 07:51
    jpl      tty4    Sep 28 08:32
    $ wc

, |, ; and &, are not arguments to the programs the shell runs. They instead control how the shell runs them. For example,

    $ echo Hello >junk

tells the shell to run echo with the single argument Hello, and place the output in the file junk. The string >junk is not an argument to echo; it is interpreted by the shell and never seen by echo. In fact, it need not be the last string in the command:

    $ >junk echo Hello

is identical, but less obvious.

Exercise 3-1. What are the differences among the following three commands?

    $ cat file | pr
    $ pr file

                      direct standard output to file
    prog >>file       append standard output to file
    prog              run p1; if unsuccessful, run p2

In this last example, because the quotes are discarded after they've done their job, echo sees a single argument containing no quotes.

Quoted strings can contain newlines:

    $ echo 'hello
    > world'
    hello
    world
    $

The string '> ' is a secondary prompt printed by the shell when it expects you to type more input to complete a command. In this example the quote on the first line has to be balanced with another. The secondary prompt string is stored in the shell variable PS2, and can be modified to taste.

In all of these examples, the quoting of a metacharacter prevents the shell from trying to interpret it. The command

    $ echo x*y

echoes all the filenames beginning x and ending y. As always, echo knows nothing about files or shell metacharacters; the interpretation of *, if any, is supplied by the shell.

What happens if no files match the pattern?
The shell, rather than complaining (as it did in early versions), passes the string on as though it had been quoted. It's usually a bad idea to depend on this behavior, but it can be exploited to learn of the existence of files matching a pattern:

    $ ls x*y
    x*y not found                 Message from ls: no such files exist
    $ >xyzzy                      Create xyzzy
    $ ls x*y
    xyzzy                         File xyzzy matches x*y
    $ ls 'x*y'
    x*y not found                 ls doesn't interpret the *
    $

A backslash at the end of a line causes the line to be continued; this is the way to present a very long line to the shell.

    $ echo abc\
    > def\
    > ghi
    abcdefghi
    $

Notice that the newline is discarded when preceded by backslash, but is retained when it appears in quotes.

The metacharacter # is almost universally used for shell comments; if a shell word begins with #, the rest of the line is ignored:

    $ echo hello # there
    hello
    $ echo hello#there
    hello#there
    $

The # was not part of the original 7th Edition, but it has been adopted very widely, and we will use it in the rest of the book.

Exercise 3-2. Explain the output produced by

    $ ls *

A digression on echo

Even though it isn't explicitly asked for, a final newline is provided by echo. A sensible and perhaps cleaner design for echo would be to print only what is requested. This would make it easy to issue prompts from the shell:

    $ pure-echo Enter a command:
    Enter a command:$             No trailing newline

but has the disadvantage that the most common case (providing a newline) is not the default and takes extra typing:

    $ pure-echo 'Hello!
    > '
    Hello!
    $

Since a command should by default execute its most commonly used function, the real echo appends the final newline automatically.

But what if it isn't desired?
The 7th Edition echo has a single option, -n, to suppress the last newline:

    $ echo -n Enter a command:
    Enter a command:$             Prompt on same line
    $ echo -
    -                             Only -n is special
    $

The only tricky case is echoing -n followed by a newline:

    $ echo -n '-n
    > '
    -n
    $

It's ugly, but it works, and this is a rare situation anyway.

A different approach, taken in System V, is for echo to interpret C-like backslash sequences, such as \b for backspace and \c (which isn't actually in the C language) to suppress the last newline:

    $ echo 'Enter a command: \c'  System V version
    Enter a command: $

Although this mechanism avoids confusion about echoing a minus sign, it has other problems. echo is often used as a diagnostic aid, and backslashes are interpreted by so many programs that having echo look at them too just adds to the confusion.

Still, both designs of echo have good and bad points. We shall use the 7th Edition version (-n), so if your local echo obeys a different convention, a couple of our programs will need minor revision.

Another question of philosophy is what echo should do if given no arguments; specifically, should it print a blank line or nothing at all? All the current echo implementations we know print a blank line, but past versions didn't, and there were once great debates on the subject. Doug McIlroy imparted the right feelings of mysticism in his discussion of the topic:

The UNIX and the Echo

There dwelt in the land of New Jersey the UNIX, a fair maid whom savants traveled far to admire. Dazzled by her purity, all sought to espouse her, one for her virginal grace, another for her polished civility, yet another for her agility in performing exacting tasks seldom accomplished even in much richer lands. So large of heart and accommodating of nature was she that the UNIX adopted all but the most insufferably rich of her suitors. Soon many offspring grew and prospered and spread to the ends of the earth.
Nature herself smiled and answered to the UNIX more eagerly than to other mortal beings. Humbler folk, who knew little of more courtly manners, delighted in her echo, so precise and crystal clear they scarce believed she could be answered by the same rocks and woods that so garbled their own shouts into the wilderness. And the compliant UNIX obliged with perfect echoes of whatever she was asked.

When one impatient swain asked the UNIX, 'Echo nothing,' the UNIX obligingly opened her mouth, echoed nothing, and closed it again.

'Whatever do you mean,' the youth demanded, 'opening your mouth like that? Henceforth never open your mouth when you are supposed to echo nothing!' And the UNIX obliged.

'But I want a perfect performance, even when you echo nothing,' pleaded a sensitive youth, 'and no perfect echoes can come from a closed mouth.' Not wishing to offend either one, the UNIX agreed to say different nothings for the impatient youth and for the sensitive youth. She called the sensitive nothing '\n.'

Yet now when she said '\n,' she was really not saying nothing so she had to open her mouth twice, once to say '\n,' and once to say nothing, and so she did not please the sensitive youth, who said forthwith, 'The \n sounds like a perfect nothing to me, but the second one ruins it. I want you to take back one of them.' So the UNIX, who could not abide offending, agreed to undo some echoes, and called that '\c.' Now the sensitive youth could hear a perfect echo of nothing by asking for '\n' and '\c' together. But they say that he died of a surfeit of notation before he ever heard one.

Exercise 3-3. Predict what each of the following grep commands will do, then verify your understanding.

    grep \$          grep \\
    grep \\$         grep \\\\
    grep \\\$        grep "\$"
    grep '\$'        grep '"$'
    grep '\'$'       grep "$"

A file containing these commands themselves makes a good test case if you want to experiment. □

Exercise 3-4. How do you tell grep to search for a pattern beginning with a '-'?
Why doesn't quoting the argument help? Hint: investigate the -e option. □

Exercise 3-5. Consider

    $ echo */*

Does this produce all names in all directories? In what order do the names appear? □

Exercise 3-6. (Trick question) How do you get a / into a filename (i.e., a / that doesn't separate components of the path)? □

Exercise 3-7. What happens with

    $ cat x y >y

and with

    $ cat x >>x

Think before rushing off to try them. □

Exercise 3-8. If you type

    $ rm *

why can't rm warn you that you're about to delete all your files? □

3.3 Creating new commands

It's now time to move on to something that we promised in Chapter 1: how to create new commands out of old ones.

Given a sequence of commands that is to be repeated more than a few times, it would be convenient to make it into a "new" command with its own name, so you can use it like a regular command. To be specific, suppose you intend to count users frequently with the pipeline

    $ who | wc -l

that was mentioned in Chapter 1, and you want to make a new program nu to do that.

The first step is to create an ordinary file that contains 'who | wc -l'. You can use a favorite editor, or you can get creative:

    $ echo 'who | wc -l' >nu

(Without the quotes, what would appear in nu?)

As we said in Chapter 1, the shell is a program just like an editor or who or wc; its name is sh. And since it's a program, you can run it and redirect its input. So run the shell with its input coming from the file nu instead of the terminal:

    $ who
    you      tty2    Sep 28 07:51
    xhh      tty4    Sep 28 10:02
    moh      tty5    Sep 28 09:38
    ava      tty6    Sep 28 10:17
    $ cat nu
    who | wc -l
    $ sh cx                       Create cx originally
    $ sh cx cx                    Make cx itself executable
    $ echo echo Hi, there! >hello Make a test program
    $ hello                       Try it
    hello: cannot execute
    $ cx hello                    Make it executable
    $ hello                       Try again
    Hi, there!
                                  It works
    $ mv cx /usr/you/bin          Install cx
    $ rm hello                    Clean up
    $

Notice that we said

    $ sh cx cx

exactly as the shell would have automatically done if cx were already executable and we typed

    $ cx cx

What if you want to handle more than one argument, for example to make a program like cx handle several files at once? A crude first cut is to put nine arguments into the shell program, as in

    chmod +x $1 $2 $3 $4 $5 $6 $7 $8 $9

(it only works up to $9, because the string $10 is parsed as "first argument, $1, followed by a 0"!) If the user of this shell file provides fewer than nine arguments, the missing ones are null strings; the effect is that only the arguments that were actually provided are passed to chmod by the sub-shell. So this implementation works, but it's obviously unclean, and it fails if more than nine arguments are provided.

Anticipating this problem, the shell provides a shorthand $* that means "all the arguments." The proper way to define cx, then, is

    chmod +x $*

which works regardless of how many arguments are provided.

With $* added to your repertoire, you can make some convenient shell files, such as lc or m:

    $ cd /usr/you/bin
    $ cat lc
    # lc: count number of lines in files
    wc -l $*
    $ cat m
    # m: a concise way to type mail
    mail $*
    $

Both can sensibly be used without arguments. If there are no arguments, $* will be null, and no arguments at all will be passed to wc or mail. With or without arguments, the command is invoked properly:

    $ lc /usr/you/bin/*
          1 /usr/you/bin/cx
            /usr/you/bin/lc
            /usr/you/bin/m
            /usr/you/bin/nu
            /usr/you/bin/what
            /usr/you/bin/where
            total
    $ ls /usr/you/bin | lc
          6
    $

These commands and the others in this chapter are examples of personal programs, the sort of things you write for yourself and put in your bin, but are unlikely to make publicly available because they are too dependent on personal taste.
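The rules for $1 through $9, $10 and $* described above can be tried without creating shell files: set -- replaces the positional parameters of the current shell, and a shell function receives its arguments the same way a shell file does. This is a minimal sketch for a modern POSIX shell; count_args is a hypothetical helper, and the ${10} line shows a later POSIX form that older Bourne shells may not accept.

```shell
#!/bin/sh
# 'set --' stands in for the command-line arguments of a shell file.
set -- a b c d e f g h i j k

echo $#            # prints: 11
echo $10           # prints: a0  ($1 followed by a literal 0)
echo ${10}         # prints: j   (POSIX braces; older Bourne shells may reject this)

# count_args is a hypothetical helper: it reports how many arguments it got.
count_args() {
    echo $#
}

count_args $*      # prints: 11, since $* passes every argument along
set --             # now there are no arguments at all
count_args $*      # prints: 0, because $* is null and nothing is passed
```

Because an unquoted $* is split at blanks, a single argument containing a space arrives as two; the 411 example that follows runs into exactly this.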
In Chapter 5 we will address the issues of writing shell programs suitable for public use.

The arguments to a shell file need not be filenames. For example, consider searching a personal telephone directory. If you have a file named /usr/you/lib/phone-book that contains lines like

    dial-a-joke 212-976-3838
    dial-a-prayer 212-246-4200
    dial santa 212-976-3636
    dow jones report 212-976-4141

then the grep command can be used to search it. (Your own lib directory is a good place to store such personal data bases.) Since grep doesn't care about the format of information, you can search for names, addresses, zip codes or anything else that you like. Let's make a directory assistance program, which we'll call 411 in honor of the telephone directory assistance number where we live:

    $ echo 'grep $* /usr/you/lib/phone-book' >411
    $ cx 411
    $ 411 joke
    dial-a-joke 212-976-3838
    $ 411 dial
    dial-a-joke 212-976-3838
    dial-a-prayer 212-246-4200
    dial santa 212-976-3636
    $ 411 'dow jones'
    grep: can't open jones        Something is wrong
    $

The final example is included to show a potential problem: even though dow jones is presented to 411 as a single argument, it contains a space and is no longer in quotes, so the sub-shell interpreting the 411 command converts it into two arguments to grep: it's as if you had typed

    $ grep dow jones /usr/you/lib/phone-book

and that's obviously wrong.

One remedy relies on the way the shell treats double quotes. Although anything quoted with '...' is inviolate, the shell looks inside "..." for $'s, \'s, and `...`'s. So if you revise 411 to look like

    grep "$*" /usr/you/lib/phone-book

the $* will be replaced by the arguments, but it will be passed to grep as a single argument even if it contains spaces.

    $ 411 dow jones
    dow jones report 212-976-4141
    $

By the way, you can make grep (and thus 411) case-independent with the -y option:

    $ grep -y pattern ...

with -y, lower case letters in pattern will also match upper case letters in the input.
(This option is in 7th Edition grep, but is absent from some other systems.)

There are fine points about command arguments that we are skipping over until Chapter 5, but one is worth noting here. The argument $0 is the name of the program being executed; in cx, $0 is "cx." A novel use of $0 is in the implementation of the programs 2, 3, 4, ..., which print their output in that many columns:

    $ who | 2
    arh   tty0  Sep 28 21:23      cw    tty5  Sep 28 21:09
    amr   tty6  Sep 28 22:10      scj   tty7  Sep 28 22:11
    you   tty9  Sep 28 23:00      jib   ttyb  Sep 28 19:58
    $

The implementations of 2, 3, ... are identical; in fact they are links to the same file:

    $ ln 2 3; ln 2 4; ln 2 5; ln 2 6
    $ ls -li [1-9]
    16722 -rwxrwxrwx 5 you  51 Sep 28 23:21 2
    16722 -rwxrwxrwx 5 you  51 Sep 28 23:21 3
    16722 -rwxrwxrwx 5 you  51 Sep 28 23:21 4
    16722 -rwxrwxrwx 5 you  51 Sep 28 23:21 5
    16722 -rwxrwxrwx 5 you  51 Sep 28 23:21 6
    $ ls /usr/you/bin | 5
    2          3          4          411        5
    6          cx         lc         m          nu
    what       where
    $ cat 5
    # 2, 3, ...: print in n columns
    pr -$0 -t -l1 $*
    $

The -t option turns off the heading at the top of the page and the -ln option sets the page length to n lines. The name of the program becomes the number-of-columns argument to pr, so the output is printed a row at a time in the number of columns specified by $0.

3.5 Program output as arguments

Let us turn now from command arguments within a shell file to the generation of arguments. Certainly filename expansion from metacharacters like * is the most common way to generate arguments (other than by providing them explicitly), but another good way is by running a program. The output of any program can be placed in a command line by enclosing the invocation in backquotes `...`:

    $ echo At the tone the time will be `date`.
    At the tone the time will be Thu Sep 29 00:02:15 EDT 1983.
    $

A small change illustrates that `...` is interpreted inside double quotes "...":

    $ echo "At the tone
    > the time will be `date`."
    At the tone
    the time will be Thu Sep 29 00:03:07 EDT 1983.
    $

As another example, suppose you want to send mail to a list of people whose login names are in the file mailinglist. A clumsy way to handle this is to edit mailinglist into a suitable mail command and present it to the shell, but it's far easier to say

    $ mail `cat mailinglist`
    dir=/usr/you/bin
    $ echo $dir
    /usr/you/bin
    $

The value of a variable is associated with the shell that creates it, and is not automatically passed to the shell's children.

    $ x=Hello                     Create x
    $ sh                          New shell
    $ echo $x
                                  Newline only: x undefined in the sub-shell
    $ ctl-d                       Leave this shell
    $                             Back in original shell
    $ echo $x
    Hello                         x still defined
    $

This means that a shell file cannot change the value of a variable, because the shell file is run by a sub-shell:

    $ echo 'x="Good Bye"          Make a two-line shell file...
    > echo $x' >setx              ...to set and print x
    $ cat setx
    x="Good Bye"
    echo $x
    $ echo $x
    Hello                         x is Hello in original shell
    $ sh setx
    Good Bye                      x is Good Bye in sub-shell...
    $ echo $x
    Hello                         ...but still Hello in this shell
    $

There are times when using a shell file to change shell variables would be useful, however. An obvious example is a file to add a new directory to your PATH. The shell therefore provides a command '.' (dot) that executes the commands in a file in the current shell, rather than in a sub-shell. This was originally invented so people could conveniently re-execute their .profile files without having to log in again, but it has other uses:
