Rewriting HTML Attributes
The following is an example:
<A TARGET="content" HREF="iim.jnlp" NAME="CHAT" onMouseOver=document.images.src="images/chat2.gif" onMouseOut=document.images.src="images/chat.gif";> <IMG ALIGN="MIDDLE" SRC="images/chat.gif" BORDER="0" ALT=" Chat"></A>
In this anchor example there are two tags, A and IMG. Their attributes are TARGET, HREF, NAME, onMouseOver, onMouseOut, ALIGN, SRC, BORDER, and ALT. Only HREF, onMouseOver, onMouseOut, and SRC can contain URLs and need to be considered. The HREF and SRC attributes have already been added to the Rewrite HTML Attributes section of the gateway profile. Their values are both raw URLs, so they will be rewritten correctly.
HTML BASE Tag
It is important to understand the role that the BASE tag plays in how documents are rewritten, and what to expect in content that contains a BASE tag. The browser uses the BASE tag for address completion of relative links. Instead of rewriting the BASE HREF attribute value and leaving the relative URLs alone, the rewriter comments out the BASE tag entirely and rewrites the relative URLs throughout the document, using the translated value of the BASE tag for address completion. The rewriter does this because multiple scraped channels can be displayed on the Portal Server desktop, and a single uncommented BASE tag would affect any other Portal Server desktop content that contains its own relative URLs.
Because the Portal Server desktop is essentially an HTML table after it is rendered, there is no way to have multiple BASE tags and have the relative URLs resolved correctly. Similarly, scraped pages that contain CSS content can adversely affect the entire Portal Server desktop if the CSS content contains generalized style definitions for basic HTML elements such as the BODY and TABLE tags.
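The BASE tag handling described above can be illustrated with a minimal Python sketch. It is not the rewriter's actual implementation. The gateway address (GATEWAY), the shape of the rewritten URL (gateway address followed by the absolute URL), the function name, and the regular expressions are all assumptions made for illustration, and only double-quoted HREF and SRC attributes are handled.

```python
import re
from urllib.parse import urljoin

# Hypothetical gateway address; a real deployment has its own.
GATEWAY = "https://gateway.example.com/"

# Simplified patterns (assumption): double-quoted values only.
BASE_RE = re.compile(r'<BASE\s+HREF="([^"]*)"\s*>', re.IGNORECASE)
URL_ATTR_RE = re.compile(r'\b(HREF|SRC)="([^"]*)"', re.IGNORECASE)


def rewrite_with_base(html, page_url):
    """Comment out the BASE tag and complete relative URLs against its value."""
    match = BASE_RE.search(html)
    # Without a BASE tag, relative URLs resolve against the page's own URL.
    base = match.group(1) if match else page_url

    def complete(m):
        # Address completion first, then the assumed gateway prefix.
        return f'{m.group(1)}="{GATEWAY}{urljoin(base, m.group(2))}"'

    if match is None:
        return URL_ATTR_RE.sub(complete, html)

    # Rewrite the text on either side of the BASE tag, and leave the tag
    # itself untouched inside an HTML comment.
    before, after = html[:match.start()], html[match.end():]
    return (URL_ATTR_RE.sub(complete, before)
            + "<!-- " + match.group(0) + " -->"
            + URL_ATTR_RE.sub(complete, after))


if __name__ == "__main__":
    page = ('<HEAD><BASE HREF="http://www.iplanet.com/docs/"></HEAD>'
            '<A HREF="guide/index.html">Guide</A>')
    print(rewrite_with_base(page, "http://www.iplanet.com/other/page.html"))
```

Because every relative URL is completed individually, the result no longer depends on a document-wide BASE tag, which is what allows several scraped channels to share one desktop page.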
One other limitation applies when content contains a BASE tag together with an APPLET or OBJECT tag that has no CODEBASE attribute. When the BASE tag is commented out, the browser can no longer find the APPLET or OBJECT code or data, because no prepended path information is supplied. Always use a CODEBASE attribute for these and similar tags when a BASE tag is also used within the same document. The SP4 Hot Patch 1 release handles this case by inserting a CODEBASE attribute, if one does not already exist, when a BASE tag is present in the document HEAD element. Although the BASE HREF value can be a fully qualified URL that includes a resource name, it is recommended that the HREF value end with a directory name and a trailing slash.
The following is an example:
<BASE HREF="http://www.iplanet.com/docs/index.html">
<BASE HREF="http://www.iplanet.com/docs/">
The first instance is a valid BASE tag. The second instance is the recommended form, because it ensures that relative URLs throughout the remainder of the document are resolved correctly. The SP4 Hot Patch 1 release addresses cases in which the BASE tag contains only the host and port information, but no path information, as in the following example:
Best Practices: HTML Programming for Use Through the Gateway
You should use the following best practices:
Always use CODEBASE attributes for tags that support them, as in the following example:
<APPLET CODEBASE="http://www.iplanet.com/java/" CODE="helloWorld.class">
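Content can be checked for this practice before it is deployed behind the gateway. The following is a minimal Python sketch, assuming a hypothetical helper named tags_missing_codebase that is not part of any gateway tooling. It reports APPLET and OBJECT tags that lack a CODEBASE attribute in a document that also contains a BASE tag.

```python
from html.parser import HTMLParser


class CodebaseChecker(HTMLParser):
    """Record APPLET/OBJECT tags without CODEBASE and whether BASE is present."""

    def __init__(self):
        super().__init__()
        self.has_base = False
        self.missing = []  # (tag name, line number) pairs

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        names = {name for name, _ in attrs}
        if tag == "base" and "href" in names:
            self.has_base = True
        elif tag in ("applet", "object") and "codebase" not in names:
            self.missing.append((tag, self.getpos()[0]))


def tags_missing_codebase(html):
    """Return the offending tags, or an empty list if there is no BASE tag."""
    checker = CodebaseChecker()
    checker.feed(html)
    checker.close()
    return checker.missing if checker.has_base else []


if __name__ == "__main__":
    page = ('<HEAD><BASE HREF="http://www.iplanet.com/docs/"></HEAD>\n'
            '<APPLET CODE="helloWorld.class"></APPLET>')
    for tag, line in tags_missing_codebase(page):
        print(f"line {line}: <{tag.upper()}> has no CODEBASE attribute")
```

The check only matters when a BASE tag is present, so documents without one produce no findings.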
End BASE HREF attribute URLs with a directory name or a directory name and a following slash, as in the following example:
Avoid fractured HTML where attribute values or tag bodies might be defined on multiple lines, as in the following example:
document.write("<A HREF=\"\n");
document.write("http://www.iplanet.com\n");
document.write("\">link</A>\n");
Keep HTML well formed: make sure that quotes match up and are of the same type.
Avoid nested quotes where possible, and be consistent across tag definitions, as in the following example:
document.write("<IMG SRC='" + theSrc + "' HEIGHT=80 WIDTH='80'>");
Here the gateway will blindly rewrite the SRC attribute without knowing the value of theSrc variable. There may be a fix for this by the time you read this guide, so check with Sun ONE support if you experience this problem and are unable to code around it.
Specify URLs with prepended path information whenever possible.
Prepended path information makes it easier for the gateway to perform address completion. The following is an example:
Do not use upper case or mixed case protocol identifiers in your URLs, as in the following:
Do not attempt to mimic the rewriter behavior by adding the gateway name to the URL prior to passing the content through the gateway.
Avoid setting an attribute to a null (empty) value if the attribute name has been added to the Rewrite HTML Attributes list. Prior to SP3 Hot Patch 3, a value of "" would still be rewritten.
The following is an example of what to avoid prior to SP3 Hot Patch 3:
<FRAMESET cols="20%, 80%">
<FRAMESET rows="100, 200">
<FRAME src="">
<FRAME src="test-txt2.html">
</FRAMESET>
<FRAME src="test-txt3.html">
</FRAMESET>
Avoid using the STYLE attribute with a background URL in HTML tags, as in the following example:
<BODY STYLE="background-image:url(../../img/background.jpg); background-repeat:repeat;width:770px">
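Several of the practices above lend themselves to an automated check. The following Python sketch is a hypothetical lint, not a Sun ONE tool. It flags empty values and non-lowercase protocol identifiers in rewritten attributes, and URLs inside STYLE attributes. The REWRITTEN_ATTRIBUTES set is an assumed stand-in for the Rewrite HTML Attributes list of the gateway profile, and the message texts are illustrative.

```python
import re
from html.parser import HTMLParser

# Assumed stand-in for the gateway profile's Rewrite HTML Attributes list.
REWRITTEN_ATTRIBUTES = {"href", "src"}
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://")
CSS_URL_RE = re.compile(r"url\(", re.IGNORECASE)


class GatewayLint(HTMLParser):
    """Hypothetical checker that records (line, tag, message) findings."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def _note(self, tag, message):
        self.findings.append((self.getpos()[0], tag, message))

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in REWRITTEN_ATTRIBUTES:
                scheme = SCHEME_RE.match(value or "")
                if not value:
                    # Covers both src="" and a bare attribute with no value.
                    self._note(tag, f"empty {name} value")
                elif scheme and scheme.group(1) != scheme.group(1).lower():
                    self._note(tag, f"non-lowercase protocol in {name}")
            elif name == "style" and value and CSS_URL_RE.search(value):
                self._note(tag, "URL inside STYLE attribute")


def lint(html):
    """Return all findings for one HTML document."""
    checker = GatewayLint()
    checker.feed(html)
    checker.close()
    return checker.findings


if __name__ == "__main__":
    sample = ('<FRAME src="">\n'
              '<BODY STYLE="background-image:url(../../img/background.jpg)">')
    for line, tag, message in lint(sample):
        print(f"line {line}: <{tag.upper()}> {message}")
```

A parser-based check is used rather than a plain text search so that attribute names are matched regardless of case and quoting style.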
Avoid nesting tags of the same type that may contain content requiring translation, as in the following example:
<SPAN STYLE="color:blue; font-weight:bold; font-style:italic">
<SPAN>
Inside SPAN tag: <BR CLEAR="ALL">
<A HREF="../../img/after.jpg"> <IMG SRC="../../img/after.jpg"> </A>
</SPAN>
</SPAN>
<BR CLEAR="ALL">
Outside SPAN tag:<BR>
<A HREF="../../img/after.jpg"> <IMG SRC="../../img/after.jpg"> </A>
Prior to SP3 Hot Patch 3, the rewriter would ignore the content between nested SPAN tags.
Do not pass gzipped HTML through the gateway to be displayed by the client.
This HTML could contain URLs that will not be rewritten because the content is in a compressed format when it passes through the gateway.
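To confirm that a response body is not compressed before it reaches the gateway, a test script can inspect the body itself. The following Python sketch uses hypothetical helper names and relies only on the two-byte signature that begins every gzip stream. It is a check an origin application or test harness might run, not something the gateway itself provides.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(body: bytes) -> bool:
    """Return True if the body starts with the gzip signature."""
    return body.startswith(GZIP_MAGIC)


def plain_html(body: bytes) -> bytes:
    """Return the body in uncompressed form, so its URLs are visible as text."""
    return gzip.decompress(body) if is_gzipped(body) else body


if __name__ == "__main__":
    page = b'<A HREF="http://www.iplanet.com/docs/">docs</A>'
    packed = gzip.compress(page)
    print(is_gzipped(packed), is_gzipped(page))
    print(plain_html(packed) == page)
```

Serving the uncompressed form to the gateway keeps the URLs visible as plain text, which the rewriter needs in order to translate them.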