Linking Lab

Due: Wednesday, October 19, 11:59pm

This assignment is about understanding how the ELF format supports the run-time linking of shared libraries, including the way that references to global variables are implemented and represented. The assignment also requires you to implement a small amount of disassembly of machine code.

Please note that the output required from your program is specified in Output Format. We provide tests and a test harness as described in Test Cases and Test Harness for three levels of completion. Your work is not a solution if it doesn’t match the given output specification, whether that’s because it prints extra spaces, extra blank lines, extra names, or extra debugging output.

Functions and Variables in a Shared Library

A shared library can import and export both functions and global variables. The functions can read and write to global variables, whether or not the shared library defines the variables.

For example, this .c program could be compiled to a shared library:

  int a;
  extern int b;

  int do_something(int v) {
    b = a;
    a = v;
    return b;
  }

The shared library provides a do_something function and a variable named a, and it uses a variable b to be supplied by another shared library or the main executable.

If a programmer has the above .c source, then it’s obvious that do_something uses the global variables a and b. If a programmer is given only the code as a compiled shared library, then that information is not nearly as apparent. It can be extracted only by a careful reading of the machine code and an understanding of how shared libraries are linked for a running program. The objdump program decodes the relevant information, but mixed among much other information.

We’d like to have an inspect tool that takes a shared library and reports which global variables are used by each provided function. Your job is to create an initial prototype of that tool. In principle, determining the variables that a function actually uses could require solving the halting problem. Many approximations are useful, however, and your task will involve a particularly simple approximation.

Inspecting a Shared Library

Suppose that the C code with do_something above is in "demo.c" and compiled with

  $ gcc -O2 -fPIC -c demo.c
  $ gcc -shared -o demo.so demo.o

Then, running your inspect prototype as

  $ ./inspect demo.so

should print out

  do_something
    a
    b

This output shows that a function called do_something is provided by the library, and that it refers to global variables a and b.

If "demo.c" is instead

  int a;
  extern int b;

  int do_something(int v) {
    b = a;
    a = v;
    return b;
  }

  int do_something_else(int v) {
    return v;
  }

then, running your inspect prototype should print out

  do_something
    a
    b
  do_something_else

because do_something_else doesn’t use global variables. Finally, because a function might not itself use variables but might call another function that does, in the case of

  int a;
  extern int b;

  int do_something(int v) {
    b = a;
    a = v;
    return b;
  }

  int do_something_else(int v) {
    return v;
  }

  int do_the_third_thing(int v) {
    return do_something(v);
  }

your inspect program will print

  do_something
    a
    b
  do_something_else
  do_the_third_thing
    (do_something)

to expose the fact that do_the_third_thing calls do_something (which, in turn, uses a and b).

To determine function and variable information, your program will read an ELF shared-object file directly. As usual, your program should be written in ANSI standard C using only standard libraries and headers—but with one big exception: you are allowed to use the "elf.h" system header. You can also use enough Unix functions to map the file into memory with mmap.

In fact, you should start with inspect.c, which maps a given shared-object file into memory, so you can traverse ELF information by following pointers in memory. The starting code demonstrates using the in-memory image to check fields of an ELF file that identify the file type.
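As a rough illustration of checking file-type fields through the in-memory image, the sketch below tests whether a buffer begins with a 64-bit ELF header for a shared object. It is not the code in the provided inspect.c; the helper name looks_like_shared_object is hypothetical, and the exact set of fields that the starting code checks may differ.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer starts with a 64-bit ELF header whose type is
   ET_DYN (the type used for shared objects), and 0 otherwise.
   `looks_like_shared_object` is a hypothetical name for illustration. */
int looks_like_shared_object(const void *image, size_t size) {
  const Elf64_Ehdr *ehdr = image;

  assert(image != NULL);
  if (size < sizeof(Elf64_Ehdr))
    return 0;                     /* too small to hold an ELF header */
  if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
    return 0;                     /* missing the "\177ELF" magic bytes */
  if (ehdr->e_ident[EI_CLASS] != ELFCLASS64)
    return 0;                     /* not a 64-bit file */
  return ehdr->e_type == ET_DYN;
}
```

Here image would be the pointer returned by mmap and size the length of the mapped file.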

Levels of Completion

For a basic check grade (80%), complete the assignment to Level 1: simply print all functions that are exported from a shared library, with no information about variables that are used or functions that are called.

For a check+ grade (100%), complete the assignment to Level 2: print only information about variables that are used in an exported function before that function performs any jumps (i.e., treat jumps as returns).

For a check++ grade (110%), complete the assignment fully: print all information as described above, which involves following jumps to determine what kind of function is called and (if it’s not an exported function) the variables that function uses.

When grading, we will infer an intended level of completion based on your program’s output as compared to expected output for the three levels.

Prototype Assumptions

To simplify the problem for this prototype, you can make several assumptions:

For example, if do_something is implemented as

  int do_something(int v) {
    b = a + v;
    a = v;
    return b;
  }

then your program does not need to report a use of b or a, because the implementation uses an addition instruction.

Output Format

x86-64 Instruction Constraints

Detecting variable uses will require not only finding functions that are listed in the .dynsym section, but disassembling the function implementation. Disassembling x86-64 is no fun, so your prototype need only handle the following instruction patterns (expressed as byte sequences):

If your program encounters any other opcode sequence, it should probably abandon disassembling the function. We will apply your program only to functions that are compiled to fit the constraints above.

Similar to the above patterns, you’ll need to recognize exactly one machine-code pattern in the .plt section:

Tips

Use readelf to get a human-readable form of the content of an ELF file to get an idea of what your program should find. Use objdump -d to disassemble functions to get an idea of what your program should recognize. Note that objdump -d prints opcodes alongside the assembly code that it prints.

All of the information that you need from the ELF file can be found via section headers and section content, so you will not need to use program headers. In particular, you’ll need to consult the .dynsym, .dynstr, .rela.dyn, .plt, and .rela.plt sections.
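One way to locate a section such as .dynsym is to walk the section headers and compare each section's name, which is stored as an offset into the section-header string table. The following is a minimal sketch under the assumption that ehdr points at a complete, well-formed 64-bit ELF image in memory; find_section is a hypothetical helper name.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns the header of the section called `name`, or NULL if the image
   has no such section. Performs no bounds checking on the image. */
Elf64_Shdr *find_section(Elf64_Ehdr *ehdr, const char *name) {
  unsigned char *base = (unsigned char *)ehdr;
  Elf64_Shdr *shdrs;
  const char *names;
  int i;

  assert(ehdr != NULL && name != NULL);

  /* e_shoff is a file offset, so add it to the start of the mapping. */
  shdrs = (Elf64_Shdr *)(base + ehdr->e_shoff);
  /* e_shstrndx selects the section that holds all section names. */
  names = (const char *)(base + shdrs[ehdr->e_shstrndx].sh_offset);

  for (i = 0; i < ehdr->e_shnum; i++) {
    if (strcmp(names + shdrs[i].sh_name, name) == 0)
      return &shdrs[i];
  }
  return NULL;
}
```

With the returned header s, the section's content starts at the mapping's base plus s->sh_offset and spans s->sh_size bytes.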

When working with ELF content, you have to keep track of which things are in terms of file offsets and which are in terms of (tentative) memory addresses where the library will eventually run. When working with ELF content that is mmapped into memory (as in the starting inspect.c), you have one more way of referencing things: a pointer into the memory of the currently running inspect process. Be careful to keep in mind which kind of reference you have at any time.
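The three kinds of reference can be connected through a section header, which records both the section's tentative address (sh_addr) and its file offset (sh_offset). The sketch below converts a tentative run-time address into a pointer within the mapped file. It assumes a well-formed 64-bit image, and addr_to_ptr is a hypothetical helper name.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Translates a (tentative) run-time address into a pointer inside the
   mmapped file image, or returns NULL if no loaded section with file
   content covers that address. */
void *addr_to_ptr(Elf64_Ehdr *ehdr, Elf64_Addr addr) {
  unsigned char *base = (unsigned char *)ehdr;
  Elf64_Shdr *shdrs;
  int i;

  assert(ehdr != NULL);
  shdrs = (Elf64_Shdr *)(base + ehdr->e_shoff);

  for (i = 0; i < ehdr->e_shnum; i++) {
    Elf64_Shdr *s = &shdrs[i];

    /* Skip sections that are not loaded at run time, and sections such
       as .bss that occupy no bytes in the file. */
    if (!(s->sh_flags & SHF_ALLOC) || s->sh_type == SHT_NOBITS)
      continue;
    if (addr >= s->sh_addr && addr < s->sh_addr + s->sh_size) {
      /* file offset = section's offset + distance into the section */
      return base + s->sh_offset + (addr - s->sh_addr);
    }
  }
  return NULL;
}
```

A symbol's st_value, for example, is a tentative address, so it has to go through a translation like this before the function's bytes can be read.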

Don’t confuse “symbol” with “string,” and keep in mind that they are referenced in different ways. Symbols are referenced by an index that corresponds to the symbol’s position in an array of symbols. Strings are referenced by an offset in bytes within the relevant string section. Every symbol has a string for its name.
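Both kinds of reference show up when going from a relocation entry to a name: the relocation identifies its symbol by index, and the symbol identifies its name by byte offset. The sketch below assumes the caller has already located the arrays from sections such as .rela.dyn, .dynsym, and .dynstr; name_for_slot and its parameter names are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Returns the name of the symbol associated with the relocation that
   targets address `slot_addr`, or NULL if no entry in `relas` does. */
const char *name_for_slot(const Elf64_Rela *relas, size_t rela_count,
                          const Elf64_Sym *syms, const char *strs,
                          Elf64_Addr slot_addr) {
  size_t i;

  assert(relas != NULL && syms != NULL && strs != NULL);

  for (i = 0; i < rela_count; i++) {
    if (relas[i].r_offset == slot_addr) {
      /* A relocation names its symbol by INDEX into the symbol array. */
      size_t sym_index = ELF64_R_SYM(relas[i].r_info);
      /* A symbol names its string by byte OFFSET into the strings. */
      return strs + syms[sym_index].st_name;
    }
  }
  return NULL;
}
```

The entry count for each array can be derived from the section header, for example sh_size divided by sizeof(Elf64_Rela) for a relocation section.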

Take small steps. Start out by printing the symbol index for every function that is provided by the shared library. Then, print the name instead of the symbol index. Then, print the address where each function’s implementation is found, and so on. Make reporting for variables work before attempting to implement reporting for called functions.
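A possible first step is sketched below: deciding whether a symbol-table entry is a function that the library itself defines. It assumes that "provided" means an STT_FUNC symbol with a defined section index; the assignment's precise criteria may be narrower, and both helper names are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Returns nonzero if `sym` is a function defined by the library itself,
   as opposed to a function that it imports from elsewhere. */
int is_provided_function(const Elf64_Sym *sym) {
  assert(sym != NULL);
  return ELF64_ST_TYPE(sym->st_info) == STT_FUNC
      && sym->st_shndx != SHN_UNDEF;
}

/* Counts the provided functions among `count` symbol entries. A first
   version of the tool could print each matching index `i` here instead
   of counting, then switch to printing names. */
size_t count_provided_functions(const Elf64_Sym *syms, size_t count) {
  size_t i, n = 0;

  assert(syms != NULL);
  for (i = 0; i < count; i++) {
    if (is_provided_function(&syms[i]))
      n++;
  }
  return n;
}
```

Keeping the test in its own function makes it easy to tighten later without touching the loop.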

The information about ELF that you need to complete this assignment is mostly covered by Videos: ELF. You might use "/usr/include/elf.h" as a reference to find relevant structures, fields, and macros. You might also consult any number of other ELF references on the web.

As a lower bound, a complete solution can fit in about 200 lines of C code, including the 70 lines in the provided inspect.c. Depending on whitespace (fine), comments (good), error checking (commendable), and duplicated code instead of abstracting into a function (bad), many solutions will be in the range of 300-400 lines.

Test Cases and Test Harness

The archive linklab-handout.zip provides a "Makefile":

For example, the test file "f_uses_a.c" gets compiled to "f_uses_a.so", and the result of inspect f_uses_a.so is written to "f_uses_a.so.out". For a complete solution, that file is compared against the provided "f_uses_a.so.expect", and differences are reported as a failure. For Level 2 completion, your program’s output is checked against "f_uses_a.so.expect-2" instead. For Level 1 completion, your program’s output is checked against "f_uses_a.so.expect-1".

By default, comparison uses diff, which checks whether your output matches the provided output exactly. The output specification from Output Format, however, allows some flexibility in the output. The "diff.rkt" script supports all of the allowed flexibility, so, if necessary, adjust the DIFF definition in "Makefile" to use "diff.rkt" instead of plain diff.

You are not required to use the test files, but for grading purposes, we expect your program’s output to match the specification here—and checking against the test files is a good way to gain some assurance. Our testing for grading will run your program on additional shared-library files. We may also compile your program with different optimization or debugging options; as always, your program must build with gcc on CADE machines with no language-adjusting command-line flags.