Homework 1: Working with GDB and building simple UNIX programs

This assignment will make you more familiar with how to build simple Unix programs, and debug them with GDB. You can do this assignment on any operating system that supports the Unix API (Linux Openlab machines, your laptop that runs Linux or Linux VM, and even MacOS, etc.). You don't need to set up xv6 for this assignment. Submit your programs and the shell through Gradescope (see instructions at the bottom of this page).

NOTE: YOU CANNOT PUBLICLY RELEASE SOLUTIONS TO THIS HOMEWORK. It's ok to show your work to your future employer as a private github/gitlab repo, however any public release is prohibited.

For macOS users: recent versions of macOS no longer support 32-bit applications. If you have already updated your system to macOS Catalina or have updated Xcode, we recommend that you do the homework on Openlab machines.

Part 1: Simple UNIX programs

Download main.c and look it over. This is a skeleton for a simple UNIX program.

To compile main.c, you need a C compiler, such as gcc. On Openlab machines, you can compile the skeleton with the following command:

 $ gcc main.c 
This will produce an a.out file, which you can run:
 $ ./a.out 

Alternatively you can pass an additional option to gcc to give a more meaningful name to the compiled binary, like

$ gcc main.c -o hello

Here gcc will compile your program as hello.

Debugging programs with GDB

On UNIX systems the main debugger is GDB (the GNU debugger). To debug your code comfortably, compile it with the -g option, which instructs the compiler to generate debug symbols (variable names, source lines, etc.) for the program. For example, change the command you ran to build hello to
$ gcc main.c -o hello -Wall -g -m32 -fno-pic 
This will compile your hello program with common compiler warnings enabled (-Wall flag), with debugging symbols (-g flag), as a 32-bit x86 executable (-m32 flag), and, for simplicity, without generating position-independent code (-fno-pic flag). Then you can start your program under control of gdb:
$ gdb hello
This starts gdb ready to execute your hello program. To get it running type the run command in the GDB command prompt (or just r for short) :
(gdb) run
Now the program runs and finishes after printing "Hello world".

GDB is a feature-rich debugger, and it will take you some time to learn all the features. Here are a few starting points: GDB tutorial, GDB intro and GDB Cheat Sheet.

Probably, the best resource for this homework is Operating Systems from 0 to 1. Chapter 6. Runtime Inspection and Debug (it is strongly recommended to read this chapter).

At a high level you need only two main things: 1) breakpoints and 2) ability to examine data. Breakpoints can be set with the "b" command inside gdb.

Breakpoints and single stepping

Just to make debugging a bit more realistic, let's add another function to our simple program. Let's change it to compute the sum of the numbers from 0 to n-1. You can do this by implementing the sum function, and calling it from main:
unsigned long sum(int n) {
    int i;
    unsigned long sum = 0;
    for (i = 0; i < n; i++) {
        sum = sum + i;
    }

    return sum;
}

int main(void) {
    
    printf("Hello world\n"); 

    unsigned long s;
    s = sum(100);
    printf("Sum:%lu\n", s);

    return 0; 
}

Running the program on its own is not that useful. Let's try setting a breakpoint on the "main" function to examine what the program is actually doing. Type break main in the GDB command prompt (or b for short) and then run the program with r.

(gdb) break main
Breakpoint 1 at 0x56b: file main.c, line 26.
(gdb) r
Starting program: ...  

Breakpoint 1, main () at main.c:26
26          s = sum(100);
(gdb) 

The debugger stopped at the beginning of the main function (line 26 of main.c). You can examine the source code of the program by typing list (or l for short).

(gdb) list
21
22      int main(void) {
23
24          unsigned long s;
25
26          s = sum(100);
27          printf("Hello world, the sum:%ld\n", s);
28          return 0;
29      }
30

Now you can execute the program line by line by typing next (or n for short), which executes the next line. By default typing next will skip over functions. Type step (or s for short) to step into a function. Try stepping into the sum function by running step.

(gdb) s
sum (n=100) at main.c:13
13          unsigned long sum = 0;

We are now inside the sum function. Type l to list the source code, and then type n repeatedly to execute the function line by line. Note that we can also type n once, and then simply hit Enter asking GDB to execute the last command for us.

(gdb) l
8       #include 
9       #include 
10
11      unsigned long sum(int n) {
12          int i;
13          unsigned long sum = 0;
14
15          for (i = 0; i < n; i++) {
16              sum = sum + i;
17          }
(gdb) n
15          for (i = 0; i < n; i++) {
(gdb)
16              sum = sum + i;
(gdb)
15          for (i = 0; i < n; i++) {
(gdb)
16              sum = sum + i;

TUI: Text User Interface

The second most useful feature is the TUI mode that turns GDB into a real modern debugger. Here is a useful discussion about TUI.

You can switch into TUI by pressing Ctrl-X and then "1", or start gdb in TUI mode right away

 $ gdb hello -tui
You can also type tui enable in the gdb command prompt (this command normally works, but not on Openlab, where you have to press Ctrl-X and then 1 instead).

Start the program from the beginning and single step it with n and s. The source code of the program will be scrolling in the TUI window in the top part of the screen.

Examining data

You can print values of variables with "print", e.g., print the values of i and sum

  (gdb) p i
  (gdb) p sum

Conditional breakpoints

While debugging programs it's often useful to see what the program is doing right before it crashes. One way to do this is to step through, one at a time, every statement of the program, until we get to the point of execution where we want to examine the state of the program. This works, but sometimes you may want to just run until you reach a particular section of code based on a condition, and stop execution at that point so you can examine data at that point of execution.

For instance, in the sum function, you might want to examine the state of the program when the index i is equal to 50. You can single step until i increments and reaches the value 50, but this would be very tedious.

GDB allows you to set conditional breakpoints. To set a conditional breakpoint to break inside the loop of the sum function when the index i is equal to 50, we do the following: first, list the source code to get the exact source lines; second, set a breakpoint inside the main.c file at line 16 with break main.c:16; third, to make the breakpoint trigger only when i is equal to 50 (and not trigger for every iteration of the loop) we type condition 2 i==50.

(gdb) l
11      unsigned long sum(int n) {
12          int i;
13          unsigned long sum = 0; 
14
15          for (i = 0; i < n; i++) {
16              sum = sum + i; 
17          }
18
19          return sum; 
20      }
(gdb) break main.c:16
Breakpoint 2 at 0x56555543: file main.c, line 16.
(gdb) condition 2 i==50
Note that the 2 in the condition refers to the breakpoint number we were notified about when we initially set the breakpoint. We can also achieve the above in one command statement with the following:
(gdb) break main.c:16 if i==50

We now continue execution of the program with the continue or c command.

(gdb) c
Continuing.

Breakpoint 2, sum (n=100) at main.c:16
16              sum = sum + i; 

When the breakpoint is hit we can check if the value of i is really 50:

(gdb) p i
$1 = 50
(gdb) 

Exploring crashes

Now, let's take a look at how you can use GDB to debug your crashing programs. First, let's write a program that crashes. Add a global variable a[32] to your program (it's an array of 32 integers), and then add a function that makes an out-of-bounds array access.
int a[32]; // the global array

unsigned long crash_array(int n) {
    int i;
    unsigned long sum = 0;

    for (i = 0; i < n; i++) {
        sum = sum + a[i];
    }

    return sum;
}
If you invoke this function with n larger than 32, it reads past the end of the array. Note that with a small overrun you might get lucky and it will not crash: not all out-of-bounds accesses cause a crash in C programs. To be sure, let's invoke it with n equal to 100,000:
s = crash_array(100000);
printf("crash array sum:%lu\n", s);
If you append the above lines to your main.c, compile, and run it, it will crash.
$ ./hello
Hello world
Sum:4950
Segmentation fault (core dumped)
$
Now, to understand the crash you can run it under gdb:
(gdb) r
Starting program: /home/aburtsev/doc/OS_Stuff/Flux/git/personal/classes/os-class/cs143a/hw/hello
Hello world
Sum:4950

Program received signal SIGSEGV, Segmentation fault.
0x56555566 in crash_array (n=100000) at main.c:18
18	        sum = sum + a[i];
You can use the backtrace (bt) command to look at the backtrace (a chain of function invocations leading to the crash):
(gdb) bt
#0  0x56555566 in crash_array (n=100000) at main.c:18
#1  0x565555ec in main () at main.c:45
Here, GDB tells you that crash_array got a segmentation fault at line 18 in main.c. You see that there are two stack frames available (0 for crash_array and 1 for main). You can use the frame (f) command to choose any of the frames and inspect it. For example, let's choose frame #0 and list the crashing code with the list command:
(gdb) f 0
#0  0x56555566 in crash_array (n=100000) at main.c:18
18	        sum = sum + a[i];
(gdb) l
13	unsigned long crash_array(int n) {
14	    int i;
15	    unsigned long sum = 0;
16
17	    for (i = 0; i < n; i++) {
18	        sum = sum + a[i];
19	    }
20
21	    return sum;
22	}
We know that line 18 is the crashing line. We can print the value of the local variable i:
(gdb) p i
$1 = 35824
It is equal to 35824, far beyond the last valid index of a (31). This should give you enough information to understand why the program crashed.

Now fix the crash_array function to prevent the program from crashing.

What to submit

The fixed main.c program.

Part 2: Simple UNIX programs

cat program (cat238p)

Use the main.c template as a starting point for a simple cat program that you should implement. First copy main.c into main-cat238p.c (you will need main.c for other parts of future homeworks, so let's keep it around).

Our cat program displays the contents of a single file on the standard output. It takes either one or no arguments. If one argument is provided (the name of the file), then the program simply displays the contents on standard output. If no argument is given the program simply shows the content of the standard input on the standard output.

Here is an example invocation which displays the contents of a file main.c, with the name of the file provided as an argument (assuming you call your executable cat238p):

$ cat238p main.c
Or it should also work like this, where standard input has been redirected to the file:
$ cat238p < main.c
You should use the read() and write() system calls to read the input and write the output. Since cat238p takes command line arguments, you should change the definition of the main() function to allow passing of command line arguments, like this:
int main(int argc, char *argv[])
If you have never worked with command line arguments in C here is a link that might be useful: Arguments to main. You can also take a look at a couple of user-level programs that take command line arguments from the xv6 source tree: rm.c, ls.c, wc.c.

Note: You might find it useful to look at the manual page for read(), write(), and other system calls. For example, type

$ man read 
and read about the read system call. Here the manual says that you should include
#include <unistd.h>
in your program to be able to use it, and the system call can be called as a function with the following signature: ssize_t read(int fd, void *buf, size_t count); .

The manual describes the meaning of the arguments for the system call, return value, and possible return codes. Finally, it lists several related system calls that might be helpful.

Note that when the manual lists a function like open(2), it means that it's described in the 2nd section of the manual and to get to that specific section you have to invoke man with an additional argument like this:

$ man 2 open
It's a good idea to read the man entry on man itself, i.e.,
$ man man
A useful option is -k, which searches the manual for entries matching a query:
$ man -k open
Note that there are multiple manual entries matching open, and the default invocation, man open, will return the entry for the openvt command rather than the one for the open() system call.

Note: If you use an exec function, be careful: the location of the cat executable on the grading machine may differ from its location on your machine. Gradescope uses Ubuntu 18.04, where cat is located at /bin/cat. Alternatively, read the exec manual page and use the execvp function, which searches for executables in PATH automatically.

What to submit

Submit main-cat238p.c, which is your implementation of cat.

Submit your work

Submit your solution through Gradescope (Gradescope CS238P Operating Systems). Place each part of the assignment into folders named part1 and part2, then pack them into a zip archive and submit it. Please name the C files main.c for part 1, and main-cat238p.c for part 2. You can resubmit as many times as you wish. If you have any problems with the structure, the autograder will tell you. The structure of the zip file should be the following:

/
  - /part1
    - main.c
  - /part2
    - main-cat238p.c
Updated: April, 2020