Bagel assembler kernel generator

The Bagel assembler generation library is written and maintained by Peter Boyle.

It is composed of two key parts: a library with which one can programme a generic RISC assembler kernel, and a set of programmes that use the library to produce key QCD and linear algebra operations.

The kernels it generates are commonly used in both the Chroma and Columbia Physics System QCD code bases when targeting QCDOC and BlueGene. It has comms hooks for FAKE (single processor), SCU calls (QCDOC) and QMP calls (for Chroma). The kernels are, however, designed to be callable from any C or C++ code base.

The generator is retargetable, but for now key targets are ppc440, bgl, bgq and powerIII.

If you use this software, please cite it as

Computer Physics Communications Volume 180, Issue 12, December 2009, Pages 2739–2748
"P.A. Boyle, 2005"

BAGEL download


Production Bagel for BlueGene/Q includes a multilevel HDCG solver
Bagel for BlueGene/Q prerelease versions (beta quality):

Remember the old joke about optimising your code until the lights flicker?
This time I really have: this is the Edinburgh BG/Q running the BFM inverter and the lights are DC LEDs on the bulk power modules.
They only flicker when running Bagel!

Bagel-3 will change the interface substantially. The data layout becomes opaque and architecture dependent with an import/export interface.
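
As a rough illustration of what an opaque layout behind an import/export interface can look like, here is a minimal Python sketch. It is not the Bagel-3 interface: the class OpaqueField, its methods import_field, export_field and scale, and the simd_width parameter are all hypothetical names invented for this example, and the blocked real/imaginary storage is just one assumed layout.

```python
# Hypothetical sketch only: none of these names come from Bagel.
class OpaqueField:
    """Complex site data held in a private, layout-dependent buffer."""

    def __init__(self, nsites, simd_width=2):
        if nsites % simd_width != 0:
            raise ValueError("nsites must be a multiple of simd_width")
        self.nsites = nsites
        self.simd_width = simd_width
        # Private storage: callers should never index this directly.
        self._data = [0.0] * (2 * nsites)

    def _offset(self, site):
        # Assumed layout: sites are grouped into blocks of simd_width;
        # each block stores all real parts first, then all imaginary parts.
        block, lane = divmod(site, self.simd_width)
        base = block * 2 * self.simd_width
        return base + lane, base + self.simd_width + lane

    def import_field(self, values):
        """Copy complex numbers in canonical site order into the private layout."""
        if len(values) != self.nsites:
            raise ValueError("wrong number of sites")
        for site, z in enumerate(values):
            re, im = self._offset(site)
            self._data[re] = z.real
            self._data[im] = z.imag

    def export_field(self):
        """Return the data as complex numbers in canonical site order."""
        out = []
        for site in range(self.nsites):
            re, im = self._offset(site)
            out.append(complex(self._data[re], self._data[im]))
        return out

    def scale(self, a):
        """Multiply by a real number without leaving the private layout."""
        self._data = [a * x for x in self._data]
```

The caller pays the layout conversion once in import_field and once in export_field; everything in between (here only scale) works on the private buffer.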

Entire algorithms are implemented in Bagel to amortize layout change overhead:


  • Preconditioned conjugate gradient
  • Unpreconditioned conjugate gradient
  • Mixed precision defect correction inversion
  • Multi-mass conjugate gradient
  • Implicitly restarted shifted Lanczos (Rudy Arthur)
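
To make the mixed precision defect correction entry above more concrete, here is a minimal Python sketch on a small dense symmetric positive definite system. It is not Bagel's implementation: every function name is made up for this example, single precision is only emulated by rounding through struct, and the dense matrix is a toy stand-in for a lattice operator.

```python
import math
import struct

def to_single(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cg_single(A, b, tol, max_iter=100):
    """Conjugate gradient with all vectors rounded to single precision."""
    x = [0.0] * len(b)
    r = [to_single(v) for v in b]
    p = list(r)
    rr = dot(r, r)
    target = tol * tol * rr
    for _ in range(max_iter):
        if rr <= target:
            break
        Ap = [to_single(v) for v in matvec(A, p)]
        alpha = rr / dot(p, Ap)
        x = [to_single(xi + alpha * pi) for xi, pi in zip(x, p)]
        r = [to_single(ri - alpha * api) for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [to_single(ri + beta * pi) for ri, pi in zip(r, p)]
        rr = rr_new
    return x

def defect_correction(A, b, tol=1e-12, inner_tol=1e-4, max_outer=50):
    """Outer loop in double precision, inner solves in (emulated) single.

    Returns the solution and the number of inner solves performed.
    """
    x = [0.0] * len(b)
    bnorm = math.sqrt(dot(b, b))
    for outer in range(max_outer):
        # True residual (the "defect"), computed in double precision.
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            return x, outer
        # Cheap low-precision solve for the correction, accumulated in double.
        e = cg_single(A, r, inner_tol)
        x = [xi + ei for xi, ei in zip(x, e)]
    return x, max_outer
```

The point of the design is that the expensive inner iterations run in reduced precision, while the residual recomputed in double precision at each outer step keeps the final answer accurate to the double precision tolerance.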

Matrix support:

  • Wilson
  • Wilson twisted mass
  • Domain Wall (5d even odd)
  • Domain Wall (4d even odd)
  • Overlap general Cayley form Mobius kernels (tanh or Zolotarev)
  • Overlap general continued fraction Wilson kernel (tanh or Zolotarev)
  • Overlap general partial fraction Wilson kernel (tanh or Zolotarev)

Compilation

    Configuring bagel
    env CXX=g++ CC=gcc ~/BGQ_sfw/src/revision_controlled/bagel/configure --enable-itype=uint64_t --enable-isize=8 --enable-ifmt=%lx --prefix=$INST --enable-istype=uint32_t --enable-issize=4 --enable-isfmt=%x

    Configuring bfm
    configure CXX=mpicxx LIBS="-lSPI -lSPI_l1p -L$BGQ/spi/lib/" CC=mpicc CXXFLAGS="-fpermissive -fopenmp -I$BGQ -I$BGQ/spi/include/kernel/cnk/" \
      --prefix=$INST \
      --with-bagel=$INST \
      --host=none \
      --build=powerpc-bgq-linux \
      --enable-qdp \
      --enable-chroma-regression \
      --enable-target-cpu=bgq \
      --enable-comms=spi \
      --enable-spidslash \
      --enable-thread-model=spi

Bagel-2

    Bagel version 2 & paper preprint released


    Stable QCDOC code:

    bagel-1.3.2.tar.gz and bagel_wilson_dslash-1.3.3.tar.gz

    BlueGene and QCDOC code:

    bagel-1.4.0.tar.gz and bagel_wilson_dslash-1.4.0.tar.gz and hacked bagel_qdp-1.4.0.tar.gz

    BlueGene and QCDOC code that hopefully works on ODD sublattices: bagel_wilson_dslash-1.4.6.tar.gz

    • Supports Double Hummer complex primitives
    • Mixed precision support
    • Simple runtime selection of reduced precision in two-spinors
    • New BlueGene kernels in addition to QCDOC kernels
    • Multi-core enabled (POSIX threads and BG/L co-routines; tested on MacOS-X/pthreads, with due care taken to ensure L1 coherency on BG/L)

    I'm hoping the mixed precision approach will yield a modest speed up. More on this when I've benchmarked it.

    See: Compiling for BlueGene and

    And finally,

    Please do cite the bagel paper.

    It would also be kind to mail me and let me know if you like it!