XML String Output Example

Take a look at XMLStringOutputExample.java. (The source for this is available in the full SnuggleTeX distribution if you want to compile it yourself.) This example generalises the minimal example somewhat to demonstrate the versatility of the XML String outputs you can generate from SnuggleTeX:

/* Create vanilla SnuggleEngine and new SnuggleSession */
SnuggleEngine engine = new SnuggleEngine();
SnuggleSession session = engine.createSession();

/* Parse some LaTeX input */
SnuggleInput input = new SnuggleInput("\\section*{The quadratic formula}"
        + "$$ \\frac{-b \\pm \\sqrt{b^2-4ac}}{2a} $$");
session.parseInput(input);

/* Specify how we want the resulting XML */
XMLStringOutputOptions options = new XMLStringOutputOptions();
options.setSerializationMethod(SerializationMethod.XHTML);
options.setIndenting(true);
options.setEncoding("UTF-8");
options.setAddingMathSourceAnnotations(true);
options.setUsingNamedEntities(true); /* (Only used if caller has an XSLT 2.0 processor) */

/* Convert the results to an XML String, which in this case will
 * be an XHTML <h2> heading followed by a MathML <math>...</math> element. */
System.out.println(session.buildXMLString(options));

This block of code converts the given LaTeX input to XHTML and MathML, printing the result to the console.

You can run this on the command line with:

java -classpath snuggletex-core-n.n.n.jar uk.ac.ed.ph.snuggletex.samples.XMLStringOutputExample

(This assumes that snuggletex-core-n.n.n.jar is in the current directory; you will of course need to provide a path if it is not. You should also replace n.n.n with the actual version number attached to your SnuggleTeX JAR.)

Under the hood, this uses the default XSLT 1.0 processor that comes bundled with your Java install, which does not support all of the available SnuggleTeX output options. In particular, the SerializationMethod.XHTML option set in the example code requires XSLT 2.0. If you only have XSLT 1.0, SnuggleTeX will actually give you SerializationMethod.XML output instead, which does not include some XHTML-specific niceties. The setUsingNamedEntities() option also requires XSLT 2.0 and is silently ignored here. If having either of these features in your output is important, you can use the full SnuggleTeX distribution, which includes the Saxon XSLT 2.0 processor, and run the example with:

java -classpath snuggletex-core-n.n.n.jar:saxon9-9.1.n.n.jar:saxon9-dom-9.1.n.n.jar \
  uk.ac.ed.ph.snuggletex.samples.XMLStringOutputExample

(I have used a \ character here to split the command over multiple lines for readability. Enter it as a single line if your shell or command prompt does not support this feature. On Windows, also use ; rather than : to separate the classpath entries.)

This adds the Saxon XSLT processor to the classpath, which is enough for SnuggleTeX to find it.
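If you are unsure which XSLT version is available on a particular classpath, a small check like the following may help. It is only a sketch using the standard javax.xml.transform (JAXP) API. The class name XsltVersionCheck and the method reportedXsltVersion are made up for illustration. It reports whatever processor JAXP's default lookup returns, which is assumed here, not confirmed, to match the processor SnuggleTeX ends up using.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/* Hypothetical helper (not part of SnuggleTeX): asks the default JAXP
 * XSLT processor which XSLT version it implements. */
class XsltVersionCheck {

    /* Tiny stylesheet that outputs the standard xsl:version system property */
    private static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select=\"system-property('xsl:version')\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String reportedXsltVersion() {
        try {
            /* Uses JAXP's default lookup; assumed to mirror what SnuggleTeX sees */
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer(
                    new StreamSource(new StringReader(STYLESHEET)));
            StringWriter result = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader("<dummy/>")),
                    new StreamResult(result));
            return result.toString().trim();
        }
        catch (TransformerException e) {
            throw new IllegalStateException("XSLT check failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("XSLT version: " + reportedXsltVersion());
    }
}
```

An XSLT 1.0 processor reports 1.0 for this property and an XSLT 2.0 processor reports 2.0, so running the check with and without the Saxon JARs on the classpath should show whether the XSLT 2.0 features are likely to be available.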

When run, you should get an output like:

<h2 xmlns="http://www.w3.org/1999/xhtml">The quadratic formula</h2>
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <semantics>
    <mfrac>
      <mrow>
        <mo>-</mo>
        <mi>b</mi>
        <mo>&pm;</mo>
        <msqrt>
          <mrow>
            <msup>
              <mi>b</mi>
              <mn>2</mn>
            </msup>
            <mo>-</mo>
            <mn>4</mn>
            <mi>a</mi>
            <mi>c</mi>
          </mrow>
        </msqrt>
      </mrow>
      <mrow>
        <mn>2</mn>
        <mi>a</mi>
      </mrow>
    </mfrac>
    <annotation encoding="SnuggleTeX">$$ \frac{-b \pm \sqrt{b^2-4ac}}{2a} $$</annotation>
  </semantics>
</math>

(This is actually the output when using Saxon, as you will see from the &pm; entity. If you had run using the default XSLT 1.0 processor, you would get an actual ± character in the output.)

In this example, we have used an XMLStringOutputOptions Object to control the type of output we obtain, asking for the resulting XML to be written out using XHTML conventions (if possible), encoded in UTF-8 and indented by the default indentation amount of two spaces. The setAddingMathSourceAnnotations(true) call causes SnuggleTeX to add an <annotation/> element to each MathML element generated, showing the original LaTeX source for each bit of mathematics, which can sometimes be useful. Additionally, the setUsingNamedEntities(true) call causes SnuggleTeX to write out mathematical symbols as named XML entities rather than numerical character references. This is useful here as it makes the output more readable, but is not something you would want all of the time, as it requires your XML to be accompanied by an appropriate DTD if you want it to be parseable. (Note again that the XHTML output and named entities features require an XSLT 2.0 processor, such as Saxon, bundled with the full SnuggleTeX distribution. They will be silently ignored otherwise.)
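To illustrate why named entities need an accompanying DTD, the following sketch feeds two small fragments to the JDK's standard XML parser, one using a numeric character reference and one using the &pm; entity with no DTD present. The class name EntityParseCheck and the method parsesWithoutDtd are made up for illustration, and nothing here uses the SnuggleTeX API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/* Hypothetical helper (not part of SnuggleTeX): checks whether a fragment
 * can be parsed as XML when no DTD is supplied. */
class EntityParseCheck {

    static boolean parsesWithoutDtd(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            /* Rethrow parse errors instead of printing them to stderr */
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) { /* ignored */ }
                @Override
                public void error(SAXParseException e) throws SAXException { throw e; }
                @Override
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        }
        catch (SAXException e) {
            /* Not well-formed, e.g. a reference to an undeclared entity */
            return false;
        }
        catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        /* Numeric character reference: needs no declarations */
        System.out.println(parsesWithoutDtd("<mo>&#177;</mo>")); // prints true
        /* Named entity with no DTD declaring it */
        System.out.println(parsesWithoutDtd("<mo>&pm;</mo>"));   // prints false
    }
}
```

The second fragment is rejected because &pm; is not one of XML's five predefined entities, so a parser can only resolve it if a DTD declares it.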