Copyright © Cay S. Horstmann 2012
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
When you submit your homework, some instructors will ask you to zip up all homework files. A beginner would use a program such as WinZip for this task. You know the drill. Click on the WinZip icon. Click, click, click, until you have added each file. Click, click, click, type the zip file name, click, click, and you are done. Did you need to make a last-minute change to a file? Do it all over again. And again.
You can do better than that. The command
jar -cf homework.zip *.java
makes a zip file called homework.zip
that contains all Java files in the current directory. (The c option creates a new archive, and the f option specifies the name of the archive file.)
Maybe you also need to include some other files (such as Javadoc documentation and UML diagrams). Then the command gets longer.
jar -cvf homework.zip *.java *.html *.png
(The v
option produces verbose output, listing the actions that the jar
command takes.)
Using cd, change to a directory that contains a couple of Java files. Type
jar -cvf homework.zip *.java
What output do you get? (If you get a complaint that the jar
command is not found, check your PATH
—see Lab 2 Section I 6. You need to fix this before going on with the lab.)
Using Emacs, create the file ~/zipup (that is, a file with name zipup in your home directory). Place the following inside:
#!/bin/bash
# Zip up homework source, HTML, images
jar -cvf homework.zip *.java *.html *.png
Save the file and exit Emacs.
The lines starting with #
are comments. The first line indicates that this is a bash shell script.
Back in the shell window, type
chmod +x ~/zipup
This makes the file executable.
Now type
~/zipup
What happens?
The command
javadoc *.java
generates HTML documentation for the Java files in the current directory.
Enhance the zipup file to automatically execute the javadoc program. What are the contents of the file now?
Suppose you sometimes want to zip up your homework to hw1.zip
, then to hw2.zip
, or even to hw1_1728.zip
. This can be easily done by modifying zipup
.
Change the filename homework.zip
to $1.zip
:
jar -cvf $1.zip *.java *.html *.png
When executing the shell script, you now need to supply the name of the zip file (without the .zip
extension which is appended in the shell script). For example,
~/zipup hw1
Then the $1
is replaced by hw1
. $1
is the first argument of the shell script, $2
is the second, and so on.
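To see the substitution at work without creating any zip files, here is a minimal sketch. The function name describe_args is made up for illustration; inside a function, $1 and $2 refer to the function's arguments in the same way that they refer to the script's arguments at the top level of a script.

```shell
#!/bin/bash
# Sketch: how bash fills in $1, $2, ... (describe_args is a hypothetical name).
describe_args() {
  # $# is the number of arguments; ${2:-none} falls back to "none"
  # when no second argument was supplied.
  echo "archive: $1.zip, second argument: ${2:-none}, count: $#"
}

first=$(describe_args hw1)
second=$(describe_args hw1_1728 BankAccountTest)
echo "$first"
echo "$second"
```

Note that $1.zip works because the period cannot be part of a parameter name, so bash knows where $1 ends.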
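How do you call zipup to produce hw1_1728.zip?
Now run zipup without an argument. What happens?
It is better to check for a missing argument. Add the following lines to the script, before the jar command.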
if [ -z "$1" ]
then
  echo "Usage: zipup nameOfFile"
  exit
fi
Now what happens when you run the script without an argument?
What do you think is the purpose of fi?
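The following sketch wraps the same test in a function so that you can try it without touching any files. The name check_name is hypothetical, and the function uses return where a script would use exit. Quoting "$1" is a defensive habit: it keeps the test well formed even when the argument is empty or contains spaces.

```shell
#!/bin/bash
# Sketch of the argument check (check_name is a hypothetical name).
check_name() {
  if [ -z "$1" ]      # true when the first argument is missing or empty
  then
    echo "Usage: zipup nameOfFile"
    return 1          # a script would use "exit 1" here
  fi
  echo "would create $1.zip"
}

missing=$(check_name); missing_status=$?
given=$(check_name hw1); given_status=$?
echo "$missing (status $missing_status)"
echo "$given (status $given_status)"
```

A nonzero status is the conventional way for a command to report failure to whoever called it.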
Next, you will write a shell script called grade. It takes two parameters: the name of the zip file (without the extension) and the class with the main method. For example,
grade hw1_1728 BankAccountTest
What commands do you need to unzip hw1_1728.zip, compile the Java source, run the BankAccountTest program, and capture the output in the file hw1_1728.txt? Hint: Use redirection: java ... > ...
Now create the grade file. Put it in your home directory again. Simply take the commands from the preceding exercise and replace hw1_1728 with $1 and BankAccountTest with $2. What are the contents of grade?
Then run
~/grade bank BankAccountTest
What output file was created? What are the contents of that file?
Another redirection operator is >>. It appends the output of a program to a file.
Modify your script so that it first writes the contents of $2.java
to $2.txt
, then appends the result of running the program. What is your script now?
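Separate the source listing from the program output with a line containing ===Program Run===. How can you do that? (Hint: echo, >>)
The > and >> operators only redirect the standard output stream, so error messages such as compiler errors are not captured in the file. The remedy is to use the 2> operator, which redirects the standard error stream. (For historical reasons, that stream is also known as stream #2.)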
javac *.java 2> homework.txt
Add this enhancement to the grade
file. What is your file now?
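Here is a small sketch of the three operators side by side. It works in a scratch directory, and report is a hypothetical stand-in for a program that writes to both streams; it is not part of the grade script.

```shell
#!/bin/bash
# Sketch of >, >> and 2> using a scratch directory.
# "report" is a hypothetical stand-in for a program that writes to both streams.
report() {
  echo "normal output"
  echo "error output" >&2
}

dir=$(mktemp -d)
report > "$dir/out.txt" 2> "$dir/err.txt"     # > starts each file afresh
echo "===Program Run===" >> "$dir/out.txt"    # >> appends to the file
report >> "$dir/out.txt" 2>> "$dir/err.txt"   # 2>> appends stderr

out=$(cat "$dir/out.txt")
err=$(cat "$dir/err.txt")
rm -r "$dir"
printf '%s\n' "$out"
```

The two streams end up in separate files, which is why a plain > never captures compiler errors.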
You can do simple programming in shell scripts. You have already seen the if ... then ... fi
construct. Now let's turn to loops. The syntax is
for var in ...
do
  command $var
done
You probably expected rof
, not done
at the end, but bash isn't plagued by a foolish consistency.
Here is an example:
for f in *.java
do
  javac $f
done
Or, all on one line:
for f in *.java ; do javac $f ; done
Note the semicolons before do
and done
.
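If you want to watch a loop run without compiling anything, the following sketch may help. It creates a few empty files in a scratch directory (the file names are made up) and collects the names that match the pattern.

```shell
#!/bin/bash
# Sketch: loop over the files that match a pattern, in a scratch directory.
dir=$(mktemp -d)
touch "$dir/Account.java" "$dir/Bank.java" "$dir/notes.txt"

names=""
for f in "$dir"/*.java
do
  # basename strips the directory and the .java suffix;
  # quoting "$f" keeps file names with spaces intact.
  names="$names $(basename "$f" .java)"
done
rm -r "$dir"
echo "classes:$names"
```

The file notes.txt is skipped because it does not match *.java.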
Now let us put this technique to work in our grading shell script. We want to feed all input files of the form input*.txt
to the program to be graded. That way, the grader can prepare an arbitrary number of files input1.txt
, input2.txt
, etc.
for f in input*.txt ; do ( java $2 < $f >> $1.txt ) ; done
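Note the () around the java command. The parentheses make the java program run in a subshell, and the redirections of input and output apply to that subshell. Strictly speaking, bash applies < and >> only to the single command that they follow, so this loop behaves the same without the parentheses. The parentheses become necessary when you want to redirect a group of commands as a unit.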
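The loop can be tried out without a Java program. In this sketch, tr a-z A-Z is a hypothetical stand-in for java $2, and the input files are made up. Each redirection applies to the one command that it follows, so every input file gets its own run.

```shell
#!/bin/bash
# Sketch of the input-file loop. "tr a-z A-Z" is a hypothetical stand-in
# for "java $2", so the example runs without a Java program.
dir=$(mktemp -d)
echo "deposit 100" > "$dir/input1.txt"
echo "withdraw 30" > "$dir/input2.txt"

for f in "$dir"/input*.txt
do
  echo "=== $(basename "$f") ===" >> "$dir/result.txt"   # label each run
  tr a-z A-Z < "$f" >> "$dir/result.txt"                 # one run per input file
done

result=$(cat "$dir/result.txt")
rm -r "$dir"
printf '%s\n' "$result"
```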
Create two files input1.txt and input2.txt with different inputs for the SavingsAccountTest program. (Run it to see what inputs are expected.) What are your input files?
What are the contents of interest.txt after you ran ~/grade interest SavingsAccountTest?
How can you change gradehelper so that the outputs that belong to different program runs are separated from each other? (Hint: echo)
Can you think of a word that contains the consecutive letters ea, followed by another letter, followed by the letters ou? There are actually several, such as zealous and whereabouts.
How about a word that contains the consecutive letters ea, followed by another letter, followed by the letters io?
(If you don't know one, just write “I don't know”. You'll learn in this lab how to find the answer.)
{eaaou, eabou, eacou, eadou, eaeou, eafou, eagou, eahou, ..., eazou}
This set is described by the regular expression ea[a-z]ou
.
Generally, a character matches itself (such as ea
and ou
in the example above). But the expression [a-z]
matches any lowercase letter from a
to z
. [a-zA-Z]
matches any upper- or lowercase letter, and [aeiou]
matches any vowel.
Using the same syntax, write an expression for “any letter or digit”.
The egrep command prints out all lines in a file that contain a match for a regular expression. Type
egrep 'ea[a-z]ou' /usr/share/dict/words
What happens?
When you browse the web, you may run into tutorials where the same command is written as grep ea[a-z]ou /usr/share/dict/words. It is a good idea to always use egrep (or its equivalent, grep -E) instead of plain grep, which uses an older regular expression syntax. Also, get into the habit of enclosing the regular expression inside single quotes ''. Then the command shell won't intercept characters such as $ and \ inside your regular expressions.
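If your system has no /usr/share/dict/words, you can experiment with a short list of your own. The sketch below uses a made-up word list and grep -E, which accepts the same patterns as egrep.

```shell
#!/bin/bash
# Sketch: try a character-class pattern on a small, made-up word list
# instead of /usr/share/dict/words. grep -E accepts the same patterns as egrep.
words='zealous
whereabouts
ocean
Queen
qat'

# Single quotes keep the shell from touching the pattern.
matches=$(printf '%s\n' "$words" | grep -E 'ea[a-z]ou')
echo "$matches"
```

The word ocean is not printed: it contains ea and one more letter, but no ou after that.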
To match any character that is not in a set, use the [^...] syntax. For example, [^aeiou] means “anything but a vowel”.
Find all words in /usr/share/dict/words
that have a q or Q followed by a letter other than u. What is your call to egrep
?
The ^ character matches the beginning of the line, and $ matches the end of the line. For example, ^oo matches all words that begin with oo, such as ooze or oodles.
The |
operator separates alternative matches. For example, a(vv|x)y
matches savvy or waxy.
How do you find all words that begin or end with oo, using a single call to egrep
?
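The following sketch tries an anchor, an alternative, and a negated set on a made-up word list. It uses grep -E, which accepts the same patterns as egrep, and it deliberately uses patterns other than the ones that the exercises ask for.

```shell
#!/bin/bash
# Sketch: anchors, alternatives, and a negated set on a made-up word list.
words='ooze
igloo
savvy
waxy
wavy
book'

starts=$(printf '%s\n' "$words" | grep -E '^oo')        # begins with oo
alt=$(printf '%s\n' "$words" | grep -E 'a(vv|x)y')      # contains avvy or axy
# -c prints the number of matching lines instead of the lines themselves
consonant_first=$(printf '%s\n' "$words" | grep -E -c '^[^aeiou]')

echo "starts with oo: $starts"
echo "avvy or axy:" $alt
echo "words starting with a non-vowel: $consonant_first"
```

Note the two meanings of ^: at the start of a pattern it anchors the match to the beginning of the line, and directly after [ it negates the set.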
The following operators specify how often the preceding item is repeated:
* | 0 or more |
+ | 1 or more |
? | 0 or 1 |
{n} | exactly n times |
{n,} | at least n times |
{,n} | at most n times |
{m,n} | between m and n times |
Find all words in /usr/share/dict/words
that have an a, b, or c at least five times in a row (such as cabbage). What is your call to egrep
? What matches did you find?
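To get a feel for the repetition operators, you can count how many strings of a made-up list each pattern accepts. This is only a sketch; the helper name count is hypothetical, and the patterns are anchored with ^ and $ so that they must match the whole line.

```shell
#!/bin/bash
# Sketch: repetition operators applied to the letter e (made-up test strings).
words='b
be
bee
beee
beeee'

# count is a hypothetical helper: how many lines match the pattern?
count() { printf '%s\n' "$words" | grep -E -c "$1"; }

optional=$(count '^be?$')       # b, be
one_or_more=$(count '^be+$')    # be, bee, beee, beeee
exactly_two=$(count '^be{2}$')  # bee
at_least_two=$(count '^be{2,}$')  # bee, beee, beeee
one_to_two=$(count '^be{1,2}$')   # be, bee

echo "$optional $one_or_more $exactly_two $at_least_two $one_to_two"
```

Each operator applies to the item just before it, here the single letter e. To repeat a longer item, enclose it in parentheses or use a set in brackets.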
The . character matches any single character. Therefore, .* means zero or more characters, and .+ means at least one character. For example, o.+o.+o matches tomorrow but not zoology.
How do you find all words that contain oo twice, such as foolproof or voodoo?
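You can check the tomorrow and zoology claim directly. The sketch below tests each word with grep -E -q, which prints nothing and only reports through its exit status whether a match was found.

```shell
#!/bin/bash
# Sketch: test single words against o.+o.+o.
result=""
for w in tomorrow zoology
do
  # -q suppresses output; the if looks only at the exit status of grep
  if printf '%s\n' "$w" | grep -E -q 'o.+o.+o'
  then
    result="$result $w:yes"
  else
    result="$result $w:no"
  fi
done
echo "$result"
```

The word zoology has three o's, but the first two are adjacent, so there is no room for the .+ between them.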
The file /usr/share/dict/words is a bit special since it has one word per line. By default, egrep lists all lines that contain a match. What happens when you run
egrep '[A-Za-z]+' BankAccount.java
To see only the matching parts of each line, use the -o option:
egrep -o '[A-Za-z]+' BankAccount.java
What happens when you try that?
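The difference is easy to see on a single line of text. In this sketch the line is made up (it merely looks like Java), and grep -E stands in for egrep.

```shell
#!/bin/bash
# Sketch: whole matching lines versus only the matching parts (-o).
line='public class BankAccount { private double balance; }'

whole=$(printf '%s\n' "$line" | grep -E '[A-Za-z]+')       # the entire line
pieces=$(printf '%s\n' "$line" | grep -E -o '[A-Za-z]+')   # one match per line

echo "without -o: $whole"
echo "with -o:"
printf '%s\n' "$pieces"
```

With -o, every run of letters appears on a line of its own, and the braces and the semicolon are dropped.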
How many words contain oo twice? You can count the output of the preceding call to egrep
, but that's tedious. Instead, you can use the wc
program that counts words.
wc < BankAccount.java
What output do you get? What do you think the numbers mean?
If you can't figure it out, try running with this file instead:
Hello World Goodbye
Run the egrep command from step E7 and save the output to a file temp.txt. Then use wc to count the words in temp.txt. What were your commands?
The | operator connects two commands with a pipe. The command
egrep 'your pattern' /usr/share/dict/words | wc
feeds the standard output of egrep
into the standard input of wc
, without the need to make a temporary file.
What command do you call to find out how many words contain oo twice?
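Here is a pipe at work on a made-up word list. The sketch uses grep -E in place of egrep and asks wc for the word count only (the -w option), which equals the number of matches because there is one word per line.

```shell
#!/bin/bash
# Sketch: count matches by piping grep into wc, with no temporary file.
words='foolproof
voodoo
book
moon'

# tr removes the padding that some versions of wc put around the number
count=$(printf '%s\n' "$words" | grep -E '^[a-m]' | wc -w | tr -d ' ')
echo "words starting with a through m: $count"
```

Pipes can be chained: here the output of printf feeds grep, whose output feeds wc, whose output feeds tr.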
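The output of wc contains several numbers. Pipe it into
egrep -o '^[ ]*[0-9]*'
to only grab the first number. What is your command?
Look at the regular expression in that egrep command. What does it match?
There is a simpler way to make wc print a single number. What is it? (Hint: wc --help)
Now run
java Median1 10000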
What output do you get?
time java Median1 10000
What output do you get?
for f in 10000 20000 30000 ; do ( time java Median1 $f ) ; done
What output do you get?
We are only interested in the user times. Use grep to only show those lines:
( for f in 10000 20000 30000 ; do ( time java Median1 $f ) ; done ) | grep user
What happens? Why?
time
sends its output to stderr
, not stdout
. The syntax to pipe stderr
is a bit bizarre. It is
command1 2>&1 | command2
Fix up the command of the preceding step. What is your command? What is your output?
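The effect of 2>&1 can be observed without a Java program. In this sketch, timed is a hypothetical stand-in that, like time, writes its report to stderr; the numbers it prints are made up.

```shell
#!/bin/bash
# Sketch: why a pipe misses stderr, and how 2>&1 fixes it.
# "timed" is a hypothetical stand-in that, like time, reports on stderr.
timed() {
  echo "result 42"            # goes to stdout
  echo "user 0m0.100s" >&2    # goes to stderr
}

# Only stdout enters the pipe, so grep finds no matching line.
# (The stray stderr line is discarded here to keep the demo quiet.)
without=$( { timed | grep -c user ; } 2> /dev/null )

# 2>&1 sends stderr to wherever stdout goes, that is, into the pipe.
with=$( timed 2>&1 | grep -c user )

echo "without 2>&1: $without, with 2>&1: $with"
```

Read 2>&1 as "make stream 2 go where stream 1 currently goes".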
seq 10000 5000 100000
What is the output?
What do the three arguments of seq mean? (Try seq --help)
Now use the seq output as the parameters of the for loop. In bash, you can splice the output of one command into another with the backticks `...`
for f in `seq 10000 5000 100000` ; do
Try this. What command did you use? What is your output?
Be sure to use the backtick (to the left of the 1 key) and not the single quote.
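The sketch below splices seq output into a loop with smaller numbers, once with backticks and once with $(...), which bash treats as an equivalent form that is easier to nest and harder to confuse with single quotes.

```shell
#!/bin/bash
# Sketch: splice the output of seq into a for loop.
old_style=""
for n in `seq 10 5 30` ; do      # backticks, as in the lab
  old_style="$old_style $n"
done

new_style=""
for n in $(seq 10 5 30) ; do     # $(...) does the same thing
  new_style="$new_style $n"
done

echo "backticks:$old_style"
echo "dollar-paren:$new_style"
```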
tar tvfz enron_with_categories.tar.gz
How many files are inside? (Hint: Run it again and pipe into wc
.)
tar xvfz enron_with_categories.tar.gz
Afterwards, run
find enron_with_categories | wc
What number of lines do you get? Why?
egrep -o -h -r '[^0-9][0-9]{3}[-) ]+[0-9]{3}[- ]+[0-9]{4}' enron_with_categories
What output do you get?
What is the purpose of the -o, -h, and -r options? (Hint: egrep --help)
The output contains phone numbers in a variety of formats, such as
-888-271-0949
800-283-1805
609 279 4094
(415) 777 -0220
415 781 0701
Imagine you are an intern working for a congressional committee. Your task is to clean these up for your neat-freak boss: Put them all into the format (888) 271-0949. There are several hundred of them, so you don't want to do it by hand.
The first step is easy: put them all in a single text file. How do you do that?
A programmer's editor can match regular expressions. (Notepad—not so much.) In Emacs, the command is Edit -> Search -> Incremental search -> Forward regexp or C-M-s (i.e. Control+Alt+s). Enter that command. Type [0-9]+
. Then type C-s repeatedly to go from one match to the next.
What happened when you typed in [0-9]+
?
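Now replace each match of [0-9]+ with the # symbol, using the Emacs command for regular expression replacement (C-M-%, also known as query-replace-regexp). In Emacs, you type a ! to indicate global replacement. That is,
C-M-% [0-9]+ Enter # Enter !
What happens?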
Let's try matching each of the numbers. A suitable regular expression is
.[0-9]+[^0-9]+[0-9]+[^0-9]+[0-9]+
Explain this expression in English.
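To rearrange the parts of each number, you need to refer to them in the replacement. In Emacs, you use \( and \) to mark the groups: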
.\([0-9]+\)[^0-9]+\([0-9]+\)[^0-9]+\([0-9]+\)
Then you use \1
\2
\3
to refer to the match for the first, second, and third group.
Try it out: Replace each phone number with
(\1) \2-\3
What output did you get?
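The same group-and-reorder idea also works outside of Emacs. The sketch below swaps in sed -E, which is not what this lab uses; in that syntax, groups are written with plain ( and ), while \1, \2, and \3 keep their meaning. The sample numbers are taken from the list above, each with the leading character that the egrep pattern captured.

```shell
#!/bin/bash
# Sketch: the same group-and-reorder idea with sed -E instead of Emacs.
# In sed -E, groups are written ( ) rather than \( \); \1 \2 \3 work the same way.
numbers='-888-271-0949
 609 279 4094
(415) 777 -0220'

cleaned=$(printf '%s\n' "$numbers" |
  sed -E 's/.([0-9]+)[^0-9]+([0-9]+)[^0-9]+([0-9]+)/(\1) \2-\3/')
printf '%s\n' "$cleaned"
```

Because the command reads from a pipe, it could be attached directly to the egrep command that extracts the numbers.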
Congratulations! If you were a congressional staffer, you could now take the afternoon off instead of laboriously formatting each phone number.
Hopefully, these exercises have given you a feel for the power of automation. While it is undeniably challenging to automate a task for the first time, the effort is repaid handsomely. It is fun to watch the computer do the same boring tasks over and over, particularly if you consider how much time it would have taken you to do it by hand.
In your programming and testing process, you carry out lots of repetitive steps. Automate them, and you will become more productive. You will also find that you would never attempt certain tasks without automation. For example, consider the task of testing your programs. Whenever you change a program, you should really test it again with a bunch of inputs. Do you do that? Probably not. What could be more tedious than typing in the same inputs over and over again? You now know that you can automate that task. Put a bunch of test inputs into files and write a shell script that automatically feeds them into your program. Test automation leads to higher quality programs.
For those reasons, all professional programmers are serious about automating their build and test processes. You have just learned how to use the command shell for basic automation tasks.