Transforming Paragraphs

The preceding section showed how you can generate simple <P> tags from Word paragraphs. In most cases, you’d want to perform at least two additional transformation steps:

  1. Generate heading tags (<H1>, <H2>, <H3>, etc.) based on Word outline level.
  2. For all other paragraph types, include the name of the Word paragraph style as the class attribute of the generated <P> tag.

The change needed to generate the class attribute is very simple. The Word style name is stored in the w:val attribute of the w:pPr/w:pStyle child. If the paragraph has no style, or the style name equals some predefined value (such as normal or BodyText), we ignore it; otherwise, the style name is added to the <P> tag as its class attribute (see Listing 4).

Listing 4 Word paragraph style converted into the class attribute of the <P> tag.

<xsl:template match="w:p[ancestor::w:body]">
 <p>
  <xsl:variable name="paraStyle" select="w:pPr/w:pStyle/@w:val" />
  <xsl:choose>
   <xsl:when test="not($paraStyle)" />
   <xsl:when test="$paraStyle = 'normal' or $paraStyle = 'BodyText'" />
   <xsl:otherwise>
    <xsl:attribute name="class"><xsl:value-of select="$paraStyle" /></xsl:attribute>
   </xsl:otherwise>
  </xsl:choose>
  <xsl:apply-templates />
 </p>
</xsl:template>
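
To make the effect of Listing 4 concrete, here is a small hypothetical input fragment. The namespace URI is assumed to be the Word 2003 WordprocessingML one, and the style ID Quote and the paragraph text are invented for illustration.

```xml
<w:body xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
 <!-- Styled paragraph: expected to gain class="Quote" -->
 <w:p>
  <w:pPr><w:pStyle w:val="Quote" /></w:pPr>
  <w:r><w:t>A quoted passage.</w:t></w:r>
 </w:p>
 <!-- No w:pStyle at all: expected to stay a plain paragraph -->
 <w:p>
  <w:r><w:t>Ordinary text.</w:t></w:r>
 </w:p>
</w:body>
```

Assuming the run and text templates from the preceding section simply pass the text through, the first paragraph should come out as a <P> tag with its class set to Quote, while the second gets no class attribute because the not($paraStyle) branch matches first.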

To transform paragraphs that use Word outline styles into HTML heading tags, we extract the Word paragraph style as just described and look up the w:pPr/w:outlineLvl child of the corresponding w:style definition. The transformation should generate an HTML heading tag if an outline level exists for the selected style, or a regular <P> tag otherwise. To avoid a long xsl:choose block with one branch per heading level, we’ll use the xsl:element instruction to create the output HTML tags dynamically (see Listing 5). You can also download the complete XSL template for this example.

Listing 5 Translate the w:p tags into headings and regular paragraphs.

<xsl:template match="w:p[ancestor::w:body]">
 <xsl:variable name="paraStyle" select="w:pPr/w:pStyle/@w:val" />
 <xsl:variable name="outLvl" select="//w:style[@w:type = 'paragraph' and @w:styleId=$paraStyle]/w:pPr/w:outlineLvl/@w:val" />
 <xsl:variable name="elName">
  <xsl:choose>
   <xsl:when test="$outLvl">h<xsl:value-of select="$outLvl + 1" /></xsl:when>
   <xsl:otherwise>p</xsl:otherwise>
  </xsl:choose>
 </xsl:variable>
 <xsl:element name="{$elName}">
  <xsl:choose>
   <xsl:when test="$elName != 'p'" />
   <xsl:when test="not($paraStyle)" />
   <xsl:when test="$paraStyle = 'normal' or $paraStyle = 'BodyText'" />
   <xsl:otherwise>
    <xsl:attribute name="class"><xsl:value-of select="$paraStyle" /></xsl:attribute>
   </xsl:otherwise>
  </xsl:choose>
  <xsl:apply-templates />
 </xsl:element>
</xsl:template>

As you can see, the paragraph transformation is becoming more and more complex:

  • The paragraph style is extracted into the paraStyle variable.
  • Outline level is extracted into the outLvl variable. The select attribute of the xsl:variable instruction finds the w:style node that defines a paragraph style (w:type equal to paragraph) and whose w:styleId equals our paragraph style, and extracts the w:val attribute of its w:pPr/w:outlineLvl child. If that child doesn’t exist, the outLvl variable remains empty.
  • The elName variable is set to hN if the outLvl variable is set, where N is the outline level plus one (Word outline levels start at 0, so level 0 becomes <H1>), or to p (regular paragraph) otherwise.
  • An output element is generated with the xsl:element instruction (replacing the <P> tag from the previous transformation) using elName as its name. Note that the name attribute is treated as an attribute value template, so $elName has to be placed in braces ({}) to be evaluated rather than taken literally.
  • The rest of the transformation has already been explained with Listing 4; the additional xsl:when instruction skips the generation of class attributes for HTML headings.
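
One possible refinement, offered as a sketch rather than as part of the original example: the //w:style lookup in Listing 5 rescans the whole document for every paragraph, and outline levels deeper than 5 would produce heading names beyond <H6>. The variant below indexes the style definitions with xsl:key and falls back to a regular paragraph for deeper levels. The key name paraStyleById is hypothetical, the namespace URI is assumed to be the Word 2003 WordprocessingML one, and the cutoff at level 5 assumes that only <H1> through <H6> are wanted. The other templates from the article are omitted.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    exclude-result-prefixes="w">

 <xsl:output method="html" />

 <!-- Index paragraph style definitions by style ID so that each lookup
      does not have to scan the document with //w:style. -->
 <xsl:key name="paraStyleById"
          match="w:style[@w:type = 'paragraph']"
          use="@w:styleId" />

 <xsl:template match="w:p[ancestor::w:body]">
  <xsl:variable name="paraStyle" select="w:pPr/w:pStyle/@w:val" />
  <xsl:variable name="outLvl"
      select="key('paraStyleById', $paraStyle)/w:pPr/w:outlineLvl/@w:val" />
  <xsl:variable name="elName">
   <xsl:choose>
    <!-- Outline levels 0 to 5 map to h1 to h6; deeper levels fall back to p. -->
    <xsl:when test="$outLvl and $outLvl &lt; 6">h<xsl:value-of select="$outLvl + 1" /></xsl:when>
    <xsl:otherwise>p</xsl:otherwise>
   </xsl:choose>
  </xsl:variable>
  <xsl:element name="{$elName}">
   <!-- Same class rule as Listing 5, written as a single test. -->
   <xsl:if test="$elName = 'p' and $paraStyle and $paraStyle != 'normal' and $paraStyle != 'BodyText'">
    <xsl:attribute name="class"><xsl:value-of select="$paraStyle" /></xsl:attribute>
   </xsl:if>
   <xsl:apply-templates />
  </xsl:element>
 </xsl:template>

</xsl:stylesheet>
```

With this variant, a paragraph whose style has an outline level of 6 or more is emitted as a <P> tag carrying the style name as its class, so the distinction is not lost and can still be styled in CSS.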