My Adventures With LLVM

This is a log of the steps I took to create a new LLVM backend for the TRaX architecture. I used the LLVM tutorial on writing an LLVM backend, and the other documents linked from it, as a guide. I also wrote installation directions for the Trax backend and instructions for adding Trax intrinsics.

  1. Download LLVM source
  2. Make sure LLVM compiles on my machine (./configure && make)
  3. Copy the MBlaze directory to a new Trax directory in < llvm-directory >/lib/Target
  4. Find and replace (MBlaze -> Trax) and (mblaze -> trax) in all files in the Trax directory and its subdirectories.
    I used Emacs find and replace for the file contents.
    The files themselves were renamed with:
    for f in MBlaze*; do mv "$f" "Trax${f#MBlaze}"; done
  5. Edit "configure" script in main directory and add Trax target to all the target lists.
  6. Edit lib/Support/Triple.cpp and include/llvm/ADT/Triple.h and add trax to the architecture lists there.
  7. With LLVM 2.9, I had to change TRAX_INTR and TRAX_SVOL back to MBLAZE_INTR and MBLAZE_SVOL in TraxFrameLowering.cpp and TraxISelLowering.cpp
  8. I had to rename DisableDelaySlotFiller to DisableDelaySlotFillerTrax in TraxDelaySlotFiller.cpp
  9. I had to rename DisableStackAdjust in TraxFrameLowering.cpp to DisableStackAdjustTrax
  10. I removed TraxAsmBackend.cpp and the following code from TraxTargetMachine.cpp since we don't need the Asm Backend:
      // Register the asm backend
      TargetRegistry::RegisterAsmBackend(TheTraxTarget, 
                                         createTraxAsmBackend); 
  11. Check to be sure LLVM still compiles (./configure && make)
  12. Run ./Release/bin/llc --version and ensure that Trax shows up as a target in the list.
  13. Modify TraxSubtarget.cpp to turn on fpu and other features we have on by default.
  14. I won't mention it any more, but it's best to check to be sure it's working after pretty much every step.
  15. Change the strings in TraxInstrInfo.td to match the TRaX instruction set.
  16. In the trax simulator, add all the new instructions to Instruction.h and the corresponding functional units.
  17. In our case, the code we were compiling generated an fneg instruction in the LLVM bitcode, which caused LLVM to fail for our target. To fix this, the following two lines were added to TraxInstrFPU.td:
      def FNEG   : ArithF2<0x16, 0x300, "fneg   ", IIAlu>;
    and
      def : Pat<(fneg FGR32:$V), (FNEG FGR32:$V)>; 
  18. Add LLVM intrinsic for PRINT instruction.
  19. Similarly add intrinsics for the other special trax instructions: loadi, loadf, atomicinc, storei, storef, min, max, invsqrt
  20. I ran into a problem with libgcc calls being generated when we don't have a version of libgcc for trax. In tracking down the problem, I found this page that explains what the different library calls are.
  21. I wrote my own linker to link together various files so that I could link library functions. Primarily it seems that I need floating point functions which can be found at koders.com.
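
The libgcc calls in step 20 are worth a short illustration. One likely source is double-precision arithmetic; this is an assumption, since the log does not record which operations triggered the calls. The test program in the Notes section divides by the double literal 1.6, and its assembly listing calls __extendsfdf2, __divdf3, __adddf3 and __fixdfsi, which are libgcc's soft-float routines for float-to-double conversion, double division, double addition and double-to-int conversion. The C++ sketch below shows how a single literal changes the type of the whole expression. The function names are hypothetical.

```cpp
#include <type_traits>

// A float divided by a double literal is promoted to double, so the
// division, the following addition and the final conversion to int all
// happen in double precision.
static_assert(std::is_same<decltype(1.0f / 1.6), double>::value,
              "float / double literal yields double");

// With a float literal the expression stays in single precision.
static_assert(std::is_same<decltype(1.0f / 1.6f), float>::value,
              "float / float literal yields float");

// Hypothetical helper mirroring the return expression of the test program:
// evaluated in double because of the 1.6 literal.
inline int withDoubleLiteral(float c, float d) {
    return static_cast<int>(5 + 15 + (c + d) / 1.6);
}

// Same computation kept in single precision by writing 1.6f.
inline int withFloatLiteral(float c, float d) {
    return static_cast<int>(5 + 15 + (c + d) / 1.6f);
}
```

Writing float literals such as 1.6f keeps the arithmetic in single precision, which may avoid the double-precision helpers on a target with a single-precision FPU. Whether any conversion helpers remain depends on the instructions the backend provides.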

LLVM Intrinsics

To use LLVM intrinsics from C or C++ code, declare a function for each intrinsic (for example, in a header file) and bind it to the intrinsic's name. This makes programming with them much nicer. The declaration for the square root intrinsic is as follows:

extern "C" float sqrt(float) asm("llvm.sqrt.f32");
Then call sqrt() like any other function.

Addressing Mode

The Trax simulator is word-addressed, while LLVM can only emit byte-addressed code. We will need to come up with a solution for this problem.
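
To make the mismatch concrete, here is a minimal C++ sketch of the address translation involved. It assumes a 4-byte word, which this log does not state, and it only handles word-aligned accesses. Sub-word loads and stores are the hard part and are not covered. The names are hypothetical, and this is an illustration of the problem rather than the solution the backend adopted.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one simulator word is 4 bytes. The log does not give the size.
constexpr std::uint32_t kBytesPerWord = 4;

// Convert a byte address, as a byte-addressed code generator computes it,
// into a word address for a word-addressed memory.
// Only word-aligned byte addresses have a corresponding word address.
inline std::uint32_t byteToWordAddress(std::uint32_t byteAddr) {
    assert(byteAddr % kBytesPerWord == 0 && "unaligned byte address");
    return byteAddr / kBytesPerWord;
}

// Convert a word address back into the byte address of its first byte.
inline std::uint32_t wordToByteAddress(std::uint32_t wordAddr) {
    return wordAddr * kBytesPerWord;
}
```

Under these assumptions, byte address 12 corresponds to word address 3. The translation could happen in the backend, in an assembler or linker pass, or in the simulator itself; the log leaves that choice open.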

Notes

While working on this, I found that many of the backends could not compile our ray tracer because of problems with the calling convention. Here is the list of backends with whether they work or not. The error message is something along the lines of:

Call result #2 has unhandled type i32
UNREACHABLE executed at CallingConvLower.cpp:162!
Sometimes the type would be f32 on systems that support floating point.

I made a list of the MicroBlaze ISA to compare against the current TRaX ISA so that we could see which instructions were going to be changed. The following is that spreadsheet.

Some simple programs compile using the MicroBlaze backend. For example:

int main() {
  int a = 5;
  int b = 15;
  float c = 1.0;
  float d = 2.0;
  int i;
  for(i = 0; i < 1000; ++i) {
    c *= 3.0;
  }
  return a + b + (c + d) / 1.6;
}
This compiles to the following (no optimization, llc -march=trax):
        .file   "test_opt.bc"
        .text
        .globl  main
        .align  2
        .type   main,@function
        .ent    main                    # @main
main:
        .frame  r1,4,r15
        .mask   0x00008000,-4
# BB#0:                                 # %bb.nph
        addi      r1, r1, -28
        swi       r15, r1, 0
        addi      r3, r0, 1000
        ori       r4, r0, 0x3F800000;   # immediate = float 1.000000e+00
$BB0_1:                                 # %bb
                                        # =>This Inner Loop Header: Depth=1
        ori       r5, r0, 0x40400000;   # immediate = float 3.000000e+00
        fmul      r4, r4, r5
        addi      r3, r3, -1
        add       r5, r0, r0
        cmp       r5, r5, r3
        bneid     r5, $BB0_1
        nop    
# BB#2:                                 # %bb2
        ori       r3, r0, 0x40000000;   # immediate = float 2.000000e+00
        fadd      r5, r4, r3
        brlid     r15, __extendsfdf2
        nop    
        addi      r7, r0, 1073322393
        addi      r8, r0, -1717986918
        add       r5, r3, r0
        add       r6, r4, r0
        brlid     r15, __divdf3
        nop    
        addi      r7, r0, 1077149696
        add       r8, r0, r0
        add       r5, r3, r0
        add       r6, r4, r0
        brlid     r15, __adddf3
        nop    
        add       r5, r3, r0
        add       r6, r4, r0
        brlid     r15, __fixdfsi
        nop    
        lwi       r15, r1, 0
        addi      r1, r1, 28
        rtsd      r15, 8
        nop    
        .end    main
$tmp0:
        .size   main, ($tmp0)-main
It also compiles to the following (opt -O3, Traxc):
        REG r1
        REG r2
        REG r3
        REG r4
        REG r5
        LOADIMM r1 1.00000000
        LOADIMM r1 2.00000000
        LOADIMM r1 5
        LOADIMM r1 15
        LOADIMM r1 0
        LOADIMM r1 0
        LOADIMM r2 1.00000000
l2:     LOADIMM r3 3.00000000
        FPMUL r3 r2 r3
        LOADIMM r4 1
        ADD r4 r1 r4
        LOADIMM r5 1000
        EQ r5 r4 r5
        BNZ r5 0 l4
        NOP
        NOP
        NOP
        MOV r1 r4
        MOV r2 r3
        JMP 0 0 l2
l4:     LOADIMM r2 2.00000000
        FPADD r2 r3 r2
        LOADIMM r3 0
        FPINV r3 r3
        FPMUL r3 r2 r3
        LOADIMM r2 20.0000000
        FPADD r2 r3 r2
        PRINT r2
        HALT
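
The integer immediates in the first listing (the llc -march=trax output) are IEEE-754 bit patterns. The float constants appear as 32-bit hex values, and the double constants are split into two 32-bit halves printed as signed decimals; in the listing the high half appears to go in r7 and the low half in r8. The C++ sketch below reproduces those values. It assumes 32-bit float and 64-bit double in IEEE-754 format, and the helper names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Return the raw 32-bit pattern of a float.
inline std::uint32_t floatBits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// The two 32-bit halves of a double, as signed integers to match the
// decimal immediates printed in the assembly listing.
struct DoubleHalves {
    std::int32_t high;
    std::int32_t low;
};

inline DoubleHalves doubleHalves(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    // Narrow to 32 bits first; the conversion to signed wraps on
    // two's-complement targets.
    const std::uint32_t high = static_cast<std::uint32_t>(bits >> 32);
    const std::uint32_t low = static_cast<std::uint32_t>(bits);
    return {static_cast<std::int32_t>(high), static_cast<std::int32_t>(low)};
}
```

With these helpers, floatBits(1.0f), floatBits(3.0f) and floatBits(2.0f) give 0x3F800000, 0x40400000 and 0x40000000. doubleHalves(1.6) gives 1073322393 and -1717986918, the operands passed to __divdf3, and doubleHalves(20.0) gives 1077149696 and 0, the operands passed to __adddf3 (a + b folded to 20.0).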
The following issues have been resolved.