
Converting Files to the New StarOffice File Formats

Opening files saved in a foreign file format enables you to access the information stored in those files. If you want to work with the information, taking full advantage of the StarOffice features, you have to complete the file conversion process by saving the imported information in the new native StarOffice XML file format.

Introducing the New StarOffice XML File Format

In response to customer demands for content stability, performance, and the flexibility to create, manage, and access complex documents and Web pages, StarOffice engineering has replaced the previous binary file format with a new, XML-based file format. XML provides a platform- and application-independent environment for defining document markup that enables you to output and exchange content of StarOffice documents for years to come.

The new StarOffice XML-based file format saves the content, layout, and formatting information of each StarOffice document as a set of XML streams or subdocuments. To make it easier for users to manage and share files, these XML streams—alongside binary data for embedded bitmap graphics and objects, if any—are saved in one compressed package using the popular zip format. The default file extensions for the documents, however, are different for each document type. Table 3.2 provides an overview of the new file extensions for native StarOffice documents and templates.

Table 3.2 StarOffice File Extension by Document Type

Document Type      Application   Document Extension   Template Extension
Text               Writer        .sxw                 .stw
Spreadsheet        Calc          .sxc                 .stc
Drawing            Draw          .sxd                 .std
Presentation       Impress       .sxi                 .sti
Formula            Math          .sxm                 n.a.
Master document    Writer        .sxg                 n.a.


Using a zip archive utility (such as PKZip or WinZip), you can easily view and unpack the streams that make up the full StarOffice document, as shown in Figure 3.7.
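The same inspection can also be scripted. The following is a minimal sketch in Python that assumes only the standard zipfile module. The helper names list_streams and build_sample_package are hypothetical, the sample package holds placeholder data rather than a real StarOffice document, and the META-INF directory name used for the manifest is an assumption.

```python
import io
import zipfile


def list_streams(package):
    """Return the entry names stored in a zip-based document package.

    `package` may be a file path (for example "report.sxw") or a
    file-like object opened in binary mode.
    """
    with zipfile.ZipFile(package) as archive:
        return archive.namelist()


def build_sample_package():
    """Build an in-memory stand-in for a package (placeholder data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in (
            "content.xml",
            "styles.xml",
            "meta.xml",
            "settings.xml",
            "META-INF/manifest.xml",  # directory name is an assumption
        ):
            archive.writestr(name, "<placeholder/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for stream in list_streams(build_sample_package()):
        print(stream)
```

Because zipfile reads the archive structure rather than the file name, a package should not need to be renamed to .zip before it is opened this way.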

NOTE

Some zip utilities (StuffIt, for example) that identify archives by file extension rather than by the archive's entry header will not recognize a StarOffice XML file as a compressed archive. In this case, you must change the file extension to .zip before you can unpack the XML subdocuments. Also, the document type definition (DTD) files you need to open the XML files are part of the product and are located in the <StarOffice>\shared\office60\share\dtd\officedocument\1_0 directory. (A DTD file is a specification that accompanies a document and identifies the markup that separates paragraphs, topic headings, and so forth, and how each is to be processed.)

Figure 3.7 The new StarOffice XML file format enables you to work with files in new ways.

A typical StarOffice document that does not contain macros, pictures, or embedded objects consists of five streams:

  • content.xml, as the name suggests, stores the main content of the document, including text, tables, and graphical elements. Embedded bitmap graphics and objects, if any, are stored in the Pictures and Objects directories, respectively (refer to Figure 3.7). Depending on the type, embedded graphics are stored in the Portable Network Graphics (PNG) format or their original binary format. StarOffice objects are saved as XML representations, with each object having its own directory; all other objects are stored in their native binary format. Storing embedded bitmap graphics and objects in their own directories allows for easy searching and extracting of the files. The content.xml stream contains only references to these files.

  • styles.xml stores the properties and attributes of all character, paragraph, page, object, and numbering styles that have been used to provide a consistent look to the contents of the current document. For example, the attribute-value pairs of a paragraph style definition in the styles.xml stream of a text document map to the attributes defined on the Organizer tab page of the Paragraph Style: Text Body dialog box, shown in Figure 3.8. The name attribute maps to the Name text box; the family attribute indicates that this style is a paragraph style (as opposed to a page style, for example); the parent-style-name attribute maps to the Linked With text box; and the auto-update attribute maps to the AutoUpdate check box.

Figure 3.8 The properties for the Text body paragraph style.

  • meta.xml stores general information about the current document, including title, type, location, user, time of last save, and more. The contents of this file map to the information defined in the File, Properties dialog box.

  • settings.xml stores application-specific document and view settings for the document, such as selected printer properties and print options, zoom level, and window size.

  • manifest.xml provides additional information about the XML files such as MIME type and encryption method. Like graphic files and objects, the manifest.xml stream is stored in its own directory.
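The attribute mapping described for styles.xml can be illustrated with a short Python sketch. The style element below is an illustrative stand-in written for this example, not text taken from a real document; the namespace URI, the attribute values, and the names SAMPLE_STYLE, FIELD_FOR_ATTRIBUTE, and describe_style are all assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for style elements (illustrative only).
STYLE_NS = "http://openoffice.org/2000/style"

# Hypothetical paragraph style definition, modeled on the description above.
SAMPLE_STYLE = (
    '<style:style xmlns:style="' + STYLE_NS + '" '
    'style:name="Text body" style:family="paragraph" '
    'style:parent-style-name="Standard" style:auto-update="false"/>'
)

# Which dialog field each attribute corresponds to, per the text above.
FIELD_FOR_ATTRIBUTE = {
    "name": "Name text box",
    "family": "Style family",
    "parent-style-name": "Linked With text box",
    "auto-update": "AutoUpdate check box",
}


def describe_style(xml_text):
    """Map the attributes of one style element to dialog field labels."""
    element = ET.fromstring(xml_text)
    fields = {}
    for qualified_name, value in element.attrib.items():
        # ElementTree reports names as "{namespace}local"; keep the local part.
        local_name = qualified_name.split("}", 1)[-1]
        if local_name in FIELD_FOR_ATTRIBUTE:
            fields[FIELD_FOR_ATTRIBUTE[local_name]] = value
    return fields


if __name__ == "__main__":
    for field, value in describe_style(SAMPLE_STYLE).items():
        print(field, "=", value)
```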

If the document contains macros, the compressed package will contain additional XML streams and directories. For example, StarOffice Basic macros are stored as separate XML streams in the Basic directory. In Figure 3.7, you also see a version stream, which indicates that another version of the same file is stored with this document.
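As a rough illustration, the presence of these extra directories can be checked from a list of package entry names. This Python sketch assumes that the directories appear as the name prefixes Basic/ and Pictures/ inside the package; the function name describe_package and the sample entry names are hypothetical.

```python
def describe_package(entry_names):
    """Report which optional directories appear among the package entries."""
    names = list(entry_names)
    return {
        # Assumed prefix for StarOffice Basic macro streams.
        "has_basic_macros": any(name.startswith("Basic/") for name in names),
        # Assumed prefix for embedded bitmap graphics.
        "has_pictures": any(name.startswith("Pictures/") for name in names),
    }


if __name__ == "__main__":
    # Hypothetical entry names, for illustration only.
    sample = ["content.xml", "styles.xml", "Basic/Standard/Module1.xml"]
    print(describe_package(sample))
```

The entry names could come from any zip tool that lists the contents of the package.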

The advantages of the new StarOffice XML file format over binary file formats are threefold: It ensures better long-term compatibility because user data is stored in a human-readable format, independently of the source application that created it; it facilitates open information publishing because of better indexing and hyperlinking support and the option to apply templates during publication rather than document creation; and it encourages third-party development because developers can use widely available tools to open, modify, and share StarOffice content. All this will become critical as enterprises move their data and information from networks and hard disks to Web-based content stores (such as Microsoft Exchange or WebDAV-enabled Web servers) and as users begin to publish documents to these online content stores rather than distributing them as email attachments.

NOTE

Developers who want to build applications that can exchange documents with StarOffice can find the file specifications on the OpenOffice.org Web site. For more information about the StarOffice XML format, go to http://xml.openoffice.org/. For more information about the zip file format, go to http://xml.openoffice.org/package.html.

Converting Files Individually

Converting individual files is as easy as opening the file in question in StarOffice and then saving it in the StarOffice format. Follow these steps:

  1. Select File, Open; locate and select the document you want to convert in the Open dialog box and then choose Open. Based on the file's extension, StarOffice opens the file using the application that has the appropriate conversion filter.

  2. Select File, Save As to display the Save As dialog box, and then select the new StarOffice 6.0 document description in the File Type drop-down list (for example, StarOffice 6.0 Text Document).

  3. Remove the document extension (for example, .doc) from the file's name in the File Name box and click Save to save the document with the same name but a different extension.
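The renaming in step 3 amounts to swapping one extension for another. The sketch below shows that idea in Python; the mapping from Microsoft Office extensions to the native extensions of Table 3.2 is an assumption made for illustration, and the names NATIVE_EXTENSION and target_name are hypothetical. StarOffice itself chooses the extension from the file type you select in the Save As dialog box.

```python
from pathlib import PurePath

# Assumed pairing of Microsoft Office extensions with the native
# StarOffice extensions listed in Table 3.2 (documents and templates).
NATIVE_EXTENSION = {
    ".doc": ".sxw",
    ".dot": ".stw",
    ".xls": ".sxc",
    ".xlt": ".stc",
    ".ppt": ".sxi",
    ".pot": ".sti",
}


def target_name(source_name):
    """Return the file name with its extension replaced by the native one."""
    source = PurePath(source_name)
    suffix = source.suffix.lower()
    if suffix not in NATIVE_EXTENSION:
        raise ValueError("no native extension known for " + repr(source.suffix))
    return source.with_suffix(NATIVE_EXTENSION[suffix]).name


if __name__ == "__main__":
    print(target_name("Quarterly Report.doc"))
```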

What Makes Documents Simple or Complex?

In general, the current StarOffice conversion filters handle basic documents quite well. In the case of complex documents, however, some layout features and formatting attributes implemented in Microsoft Office 97/2000/XP remain unsupported or are handled differently in StarOffice 6.0. In particular, complex document features that are proprietary implementations of the application in question cannot be expected to convert with 100% accuracy. So what exactly are simple or complex documents?

Simple documents do not contain macros, proprietary graphics (such as Microsoft WordArt), vector graphics, complex formatting, or advanced elements such as footnotes, end notes, tables, or indexes. You can typically convert simple documents in batches with the built-in StarOffice conversion utility (File, AutoPilot, Document Converter) or by opening the original file in StarOffice and then saving it in the StarOffice format. However, you may still be required to evaluate and clean up the converted documents manually, depending on the content and formatting of the source.

Complex documents contain macros, shared components, proprietary or vector graphics, multiple links or cross-references, OLE objects, frames, text boxes, footnotes, end notes, active content, form fields, form controls, formulas, tables, or a wealth of character, paragraph, or page formatting. Some of these elements may not convert easily because equivalent functions have not yet been implemented in the existing StarOffice conversion filters or because a feature is either handled differently or not supported in StarOffice. In general, complex documents do not convert as easily as simple documents. They typically require post-conversion formatting or layout cleanup. In some cases (such as document-based macros or custom solutions), complex documents may even have to be reengineered to provide the same functionality and look as the original document.

Simple templates consist of generic text and formatting that serve as a starting point or rough draft for new documents. Good examples of simple templates include boilerplate text for form letters, basic reports, memos, proposals, or fax cover sheets. In this case, you have the same conversion options as with simple documents.

Complex templates contain form fields and automation features that may not convert easily and may have to be re-created in the appropriate StarOffice module, or as in the case of complex document-based scripting solutions, reengineered by an experienced StarOffice developer.

Converting Files in Batches

Needless to say, opening and saving each file you want to convert to the new StarOffice file format individually gets old fast if you are stuck with a batch of files that needs converting. If this batch of files consists of documents and templates that have been created in StarOffice 5.2 or earlier, or of Microsoft Office documents and templates created in Word, Excel, or PowerPoint, you can rely on the built-in StarOffice conversion utility to convert these files for you.

NOTE

Although convenient, using the StarOffice conversion utility does have its drawbacks. Because of the number of processes the program has to run to compare and convert the existing content and structure of files, the time it takes to convert documents and templates depends on processing power and grows quickly with the number (and complexity) of files you are trying to convert. Using Document Converter also interferes with your productivity, because the process taxes valuable processor resources. For these reasons, you should convert no more than 50 documents and templates at once, and preferably fewer. For larger conversion jobs, you should plan to start the conversion process after hours or at a time when you don't have to work on your computer. Depending on the number of files you want to convert, it can take hours. Also, because you are creating copies of all templates and documents you want to convert, be sure that you have enough free space on the disk or partition where you want to save your files. You can safely assume that the converted files together will take up about as much space as the source documents.
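The two rules of thumb in this note (batches of at most 50 files, and free space roughly equal to the size of the sources) can be checked before you start. The following Python sketch is one way to do that planning outside StarOffice; the names MAX_BATCH, plan_batches, and has_room_for_copies are hypothetical, and the size estimate simply reuses the assumption stated in the note.

```python
import shutil
from pathlib import Path

MAX_BATCH = 50  # upper limit per conversion run suggested in the note


def plan_batches(files, batch_size=MAX_BATCH):
    """Split a sequence of files into consecutive batches of limited size."""
    files = list(files)
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]


def has_room_for_copies(files, target_dir):
    """Check whether target_dir has free space for copies of all files.

    Assumes the converted files need about as much space as the sources.
    """
    needed = sum(Path(name).stat().st_size for name in files)
    return needed <= shutil.disk_usage(target_dir).free


if __name__ == "__main__":
    batches = plan_batches(["file%03d.doc" % n for n in range(120)])
    print([len(batch) for batch in batches])
```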

To convert your files in batches, follow these steps:

  1. Place the source documents you want to convert in one location. (The documents can be located in the same folder or in separate subfolders within the same parent folder.)

  2. Choose File, AutoPilot, Document Converter to open the first pane of the StarOffice conversion utility (see Figure 3.9).

Figure 3.9 Use the Document Converter to batch-convert binary StarOffice 5.2 or Microsoft Office documents and templates to the StarOffice 6.0 XML file format.

  3. Select the document types you want to convert. By default, the program assumes you want to convert binary StarOffice documents. If you want to convert Microsoft Office documents, you must first select the Microsoft Office option and then select the document types you want to convert. (Note that you can select multiple document types.) The program also gives you the option to generate a log file of the entire conversion process. The finished log consists of a two-column table that lists the name of each source file on the left and the target file on the right. The names are text-based hyperlinks, which give you one-click access to your files. To generate this log, select the Create Log File check box, and then choose Next to advance to the second window.

  4. For each document type you selected in step 3, you must specify, in consecutive windows, the location of the source templates and documents as well as the location of the converted files (see Figure 3.10).

    • By default, StarOffice saves templates in the <StarOffice6.0>\user\template\ (StarOffice documents) and <StarOffice6.0>\user\template\Imported_Templates (Microsoft Office documents) directories. Although you can specify a different path for your templates, if you accept the default setting for templates, StarOffice automatically registers the converted templates with its template-management system, so you can access the templates via the Templates and Documents dialog box without having to import them first.

    • Converted documents by default are saved to the work directory, but you can save them anywhere you like. When specifying paths, you don't have to type the new path information in the respective boxes; you can click the push buttons to the right of each path box and then navigate to and select the appropriate parent folder in the Select Path dialog box that opens (see Figure 3.10).

    • Also by default, StarOffice earmarks files located in subfolders of the currently specified Import path for conversion. If you want to convert only those templates and files located in the current parent folder, clear the Include Subdirectories check box.

    • When you're all set, choose Next to specify the path information for the next document type you selected in step 3 and so on. After you've finished setting up the import information for all selected document types and choose Next, StarOffice provides you with a list that summarizes your selections.

Figure 3.10 Specify the location of the StarOffice 5.2 and Microsoft Office files.

  5. Review the summary list to verify that you've specified the proper paths. (At this point, you can still choose Back at the bottom of the dialog box to return to a previous window and make any necessary changes.) When everything is set, choose Convert to start the process. This may take a while, depending on the number and complexity of the documents and templates the program has to convert.

    • If you selected the Create Log File option in step 3, the program creates a new text document called Logfile.sxw and inserts a two-column table. During conversion, the file doubles as a progress indicator. When the conversion of a file has been completed, the program inserts a new row for the source file and target file.

    • If you didn't select the Create Log File option, you can trace your progress by the numbers on the final window of the Document Converter. When the process is completed, choose Finished to exit the Document Converter.
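Before running the steps above, it can help to preview which files the Import path will pick up, with and without the Include Subdirectories option. The Python sketch below only mimics that selection on the file system; it does not call StarOffice, and the function name find_sources and its parameters are hypothetical.

```python
from pathlib import Path


def find_sources(import_path, extensions, include_subdirectories=True):
    """List files under import_path whose extension is in `extensions`.

    With include_subdirectories=False only the parent folder is searched,
    which mirrors clearing the Include Subdirectories check box.
    """
    root = Path(import_path)
    candidates = root.rglob("*") if include_subdirectories else root.glob("*")
    wanted = {extension.lower() for extension in extensions}
    return sorted(
        path for path in candidates
        if path.is_file() and path.suffix.lower() in wanted
    )


if __name__ == "__main__":
    for source in find_sources(".", [".doc", ".dot"]):
        print(source)
```

Comparing the two listings is a quick way to confirm that no stray files in subfolders will be converted unintentionally.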

TIP

Want to see just what the Document Converter AutoPilot did? Open the URL Locator history list on the function bar immediately after the AutoPilot completes its work. You can see a list of the last 100 templates and/or documents that were imported.

Customizing Your Microsoft Office Conversion Options

All necessary Microsoft Office 97/2000/XP import and export filters are automatically installed during StarOffice setup, regardless of the setup method (Standard, Custom, Minimum) you choose, so no additional action is required on your part. In addition to conversion filters, however, StarOffice 6.0 provides a number of Microsoft Office compatibility settings that give you a certain degree of control over the import and export of files in the Microsoft Office formats. To access these features, select Tools, Options from within any StarOffice document window. The options you may want to set can be accessed through the Load/Save, Text Document, and Presentation portions of the Options dialog box.

Load/Save

This is where you define general settings for opening and saving documents in external formats. Using the following options, you can control the behavior of macros or OLE objects in Microsoft Office documents (as well as define settings for HTML documents):

  • General. Select various default settings for saving documents as well as the default file format.

  • VBA Properties. Specify the general properties for loading and saving Microsoft Office documents that contain macros. For each document type (text, spreadsheet, and presentation), your options include

    • Load Basic Code To Edit. Use this if you plan to convert macros from a Microsoft Office to a StarOffice environment. Check this box to load and save the source code of the document-based Visual Basic macro as a special StarOffice Basic module with the document. Using the StarOffice Basic IDE, you can then edit the source code. When you save the document to the StarOffice format, the source code is saved as well. When you save to another format, however, the source code from the StarOffice Basic IDE is lost.

    • Save Original Basic Code Again. This option is recommended in co-existence scenarios, when exchanging documents with Microsoft Office users. Select this option if you want to protect the source code of document-based Microsoft Visual Basic macros. With this option selected, the source code is placed in a special internal memory location until the user decides to save the document.

      • When you save the document in a Microsoft Office 97/2000/XP format, the source code of the macro is saved as well, unchanged, so it can still be used by the Microsoft Office user.

      • When you save the document in any other format, the source code is lost. To prevent users from accidental losses, you get an alert informing you that the existing Microsoft Visual Basic code will not be saved.

NOTE

The Save Original Basic Code Again option takes precedence over the Load Basic Code To Edit option. If both boxes are marked and you edit the macro source code in the StarOffice Basic IDE, the original Microsoft Basic code is saved when you save in the Microsoft format. You see a message to that effect when you save the document.

Figure 3.11 StarOffice enables you to coexist with Microsoft Office users.

TIP

To remove any possible Visual Basic macro viruses from the Microsoft document, deselect the Save Original Basic Code Again check box and save the document in Microsoft format. The document is saved without the macro.

  • Microsoft Office. Here you can specify the settings for importing and exporting Microsoft Office OLE objects. In the list box, you find entries for the pairs of OLE objects that can be converted when you load a Microsoft Office document into StarOffice or save a StarOffice document in a Microsoft Office format. For example, if a Microsoft Word document contains a Microsoft Excel table as an embedded OLE object, a StarOffice user who loads this text document can edit the spreadsheet table in StarOffice Calc if the respective option is selected.

    • Mark the box in the [L] column in front of the entry if a Microsoft OLE object is to be converted into the specified StarOffice OLE object each time a Microsoft document is imported into StarOffice.

    • Mark the box in the [S] column in front of the entry if a StarOffice OLE object is to be converted into the specified Microsoft OLE object when a document is saved in a Microsoft file format.

NOTE

These settings are also valid when no Microsoft OLE server exists (in a Unix environment, for example).
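The interplay of the two VBA Properties check boxes described earlier in this section can be summarized as a small decision function. The Python sketch below is one reading of that description, not StarOffice code; the function name macro_outcome, the format labels, and the result strings are hypothetical.

```python
def macro_outcome(load_to_edit, save_original, target_format):
    """Describe what happens to macro code when a document is saved.

    load_to_edit   -- state of the Load Basic Code To Edit check box
    save_original  -- state of the Save Original Basic Code Again check box
    target_format  -- "microsoft", "staroffice", or "other"
    """
    if target_format not in ("microsoft", "staroffice", "other"):
        raise ValueError("unknown target format: " + repr(target_format))
    if target_format == "microsoft":
        # The preserved original takes precedence over edits made in the IDE.
        if save_original:
            return "original Visual Basic code saved unchanged"
        return "saved without macro code"
    if target_format == "staroffice" and load_to_edit:
        return "code saved as StarOffice Basic module"
    # Any other combination: the code is not written to the saved file.
    return "macro code lost"


if __name__ == "__main__":
    print(macro_outcome(True, True, "microsoft"))
    print(macro_outcome(True, False, "microsoft"))
```

The second call corresponds to the tip about removing possible macro viruses: with Save Original Basic Code Again cleared, saving in a Microsoft format drops the macro.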

Specifying Compatibility Settings (Writer and Impress)

Formatting definitions are not always the same in all word-processing and presentation programs. Fortunately, Writer and Impress provide Microsoft Office compatibility settings that you can turn on or off in the Compatibility group of the Options Text Document General and Options Presentation General dialog boxes. Using these options enables you to mimic certain behaviors of Microsoft Word and PowerPoint documents in the current StarOffice Writer or Impress document:

  • Add Spacing Between Paragraphs and Tables in the Current Document. Writer and Impress documents use different definitions for paragraph spacing than Microsoft Word and Microsoft PowerPoint documents. For example, if you have defined spacing both above and below each paragraph, the actual spacing between two paragraphs in a Word document is the sum of the spacing below the first paragraph and the spacing above the second. By contrast, Writer uses only the larger of the two values. If you want Writer and Impress to mimic the Microsoft Word and PowerPoint behaviors, respectively, then select this check box.

  • Add Paragraph and Table Spacing to Start of Pages. If this box is checked in the Text Documents General settings, the spacing above a paragraph is also applied at the beginning of a page or column when the paragraph is positioned on the first page of the document. The same applies after a page break. When you import a Word document, the spaces are automatically added during the conversion.

  • Aligning Tab Positions. If this check box is selected, centered and right-aligned paragraphs containing tabs are formatted as a whole in the center or aligned to the right. If this field is not marked, only the text to the right of the last tab, for example, is aligned to the right, while the text to the left remains where it is.
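The difference between the two paragraph-spacing rules described in the first option is easy to see with numbers. This Python sketch assumes spacing values in points; the function name gap_between_paragraphs and its parameters are hypothetical.

```python
def gap_between_paragraphs(space_below_first, space_above_second, word_compatible):
    """Return the effective gap between two consecutive paragraphs.

    word_compatible=True  -> the two spacings are added (Word behavior)
    word_compatible=False -> only the larger spacing is used (Writer default)
    """
    if word_compatible:
        return space_below_first + space_above_second
    return max(space_below_first, space_above_second)


if __name__ == "__main__":
    # Illustrative values: 6 pt below the first paragraph, 12 pt above the second.
    print(gap_between_paragraphs(6, 12, word_compatible=True))
    print(gap_between_paragraphs(6, 12, word_compatible=False))
```

With these sample values the gap is 18 points under the Word rule and 12 points under the Writer rule, which is why converted documents can appear tighter or looser than the original until the option is set.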

Reviewing Converted Documents

As a rule, review the converted documents carefully to verify document fidelity. More specifically, keep an eye on the following elements:

  • Character size

  • Margins, tabs, and indentations

  • Line length (how much text fits on a line)

  • Line spacing (space between lines within a paragraph)

  • Paragraph spacing (space between paragraphs)

  • Tables

  • Headers and footers

  • Lists

  • Graphics

To verify the proper appearance of elements and document fidelity, do the following:

  1. Review documents onscreen to ensure the converted document looks and functions the same onscreen as the original document. For example, you can open the original document in the source application and the converted document in StarOffice 6.0, arranging the windows side by side or one on top of the other and then scrolling through the documents.

  2. Print and compare documents to ensure formatting and layout are correct. If you notice any strange formatting or layout changes in text documents, turn on the Non-printing Characters feature and look for tab stops, extra returns, or spaces. Also compare the styles in the original document to the styles in the converted document.
