Exporting Word to MadCap Flare

By Robert Delwood


Flare can convert Microsoft Word documents into Flare topics, complete with style sheets. The mechanical act of conversion is mostly reliable. The problems arise mainly from how the Word document is styled. It begins with Microsoft saying that Word is style based, meaning that all formatting should be applied as named styles. They even have an obscure option to enforce this. On the Developer tab (which, for starters, isn't visible by default), click Restrict Editing and check Limit formatting to a selection of styles. You can even enforce this with a password. Yet they also make local formatting absurdly easy, if not outright condoned, by putting those buttons on the Home tab, which is the default view.

I always thought they said this as a matter of best practice: you can later change the formatting for all occurrences just by changing the style. That's true, and it is a best practice. But there is more to it. Local formatting introduces internal problems. Word is notorious for not cleaning up its own internal code. It has gotten better since the introduction of Office XML in 2003. However, those local formatting anomalies still exist. Most users never see them, nor have a reason to. Converting the documents to Flare exposes the problem. In a big way.

Users notice this problem after converting documents because it shows up as superfluous span tags, seemingly at random, in the topics. They occur almost anywhere: splitting words, or between words or sentences. They can be extensive, too. This is typical of a document that has been around for a long time or has been edited by many users, as is common in a corporate environment. The spans tend not to be critical. That is, they're not where formatting should be, such as marking a bolded word. They're just in the middle of things. You should be able to get rid of them without risking the loss of intended information.

The Flare way of deleting them is to put the cursor inside the spanned text and select Home|Remove Break. If you have more than one, perhaps thousands of pairs over multiple topics, you need a better way, and that's regular expressions. Regular expressions (or regex; grep is a common variation) are a find and replace method that uses text patterns instead of the typical exact text. For example, Jeffery is a name with many variations. Suppose you have a document and are not sure how the name is spelled, or whether it appears in several forms. The conventional way is to search for each variation (Jeff, Jeffry, Jeffy, Jeffery, Geoff, Geoffrey, Jeffeory, Geffrey), a game you may very well lose, considering I didn't include all the variations. With regular expressions you can craft a single expression to catch them all.
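
To make the name example concrete, here is a minimal sketch in Python. The pattern and the sample sentence are my own illustrations, not anything Flare or Word provides, and the pattern covers only the spellings listed above. It treats the name as a J or G, an optional "o", a double "f", and one of a few optional endings.

```python
import re

# Hypothetical pattern covering only the spellings listed above.
# Add more endings to the alternation if your document uses others.
NAME_PATTERN = re.compile(r"\b[JG]eo?ff(?:eory|ery|rey|ry|y)?\b")

# Made-up sample sentence for illustration.
sample = (
    "Jeff met Geoffrey and Jeffery. Jefferson and Jeffeory were absent, "
    "but Geffrey, Jeffry, Jeffy, and Geoff came."
)

matches = NAME_PATTERN.findall(sample)
print(matches)
```

The word boundaries (\b) keep the pattern from matching inside longer words, so Jefferson is skipped. A looser ending such as \w* would catch spellings you didn't anticipate, at the cost of also catching names like Jefferson.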

Regex is not a product that you can buy per se; it's more like a guideline, and each vendor implements its own version. You will need a product that supports regular expressions. Flare supports regular expressions for finding but not replacing. There are plenty of good free tools that can do both. I use Notepad++, an excellent text editor if you've not tried it yet. Word has its own variation called wildcards. Windows Notepad does not support regular expressions.

For an example, you can use the following marked-up public domain text. The span tags are typical of how they appear in converted text.

We had everything before us, we h<span class="Span_3">ad nothi</span>ng bef<span class="Span_3">ore us, </span>we were 
all going direct to Hea<span class="Span_3">ven, we were</span> all going direct the other way in short, the period was so far 
like the present period, that some of its noisiest authorities <span class="Span_3">insisted </span>on its being received, 
for good or for evil, in the superlative degree of comparison only.
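
Before removing anything, it can help to see which span classes a topic contains, so you don't strip a class that carries real formatting. The following is a rough sketch in Python, assuming the spans look like the ones in the sample above; the variable names are mine.

```python
import re
from collections import Counter

# A shortened copy of the sample text above.
sample = (
    'We had everything before us, we h<span class="Span_3">ad nothi</span>ng '
    'bef<span class="Span_3">ore us, </span>we were all going direct to '
    'Hea<span class="Span_3">ven, we were</span> all going direct the other way'
)

# Capture the class name of every opening span tag and count each one.
class_counts = Counter(re.findall(r'<span class="([^"]+)">', sample))
print(class_counts)
```

A class that shows up hundreds of times in odd places is probably conversion debris. One that appears only around a few deliberate phrases deserves a closer look first.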

Using Word

  1. Open the file, or paste the code into a Word document. Word works on only one file at a time.
  2. Open Find and Replace (Ctrl-H).
  3. Enter Find what: \<span class="Span_[0-9]"\>(?@)\</span\> and Replace with: \1
  4. Click More to show the advanced features.
  5. Check Use wildcards.
  6. Click Find Next and then Replace. The span tags are removed, leaving the enclosed text.

Using NotePad++

  1. Open the file, or paste the code into a new document.
  2. Open Find (Ctrl-F), click Replace.
  3. Click Search Mode|Regular expression.
  4. Enter Find what: <span class="Span_[0-9]">(.*?)</span> and Replace with: \1
  5. Click Replace All. If you want to replace them in all files, select the Find in Files tab.
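
If you would rather script the cleanup, the same find and replace can be sketched in Python. This is an illustration under assumptions, not a Flare feature: the function names are hypothetical, the topics are assumed to be UTF-8 .htm files, and the pattern uses [0-9]+ so that multi-digit classes such as Span_12 are also caught. Work on a copy of your project first.

```python
import re
from pathlib import Path

# Same idea as the Notepad++ expression. [0-9]+ also covers multi-digit
# class names, and DOTALL lets a span cross a line break.
SPAN_PATTERN = re.compile(r'<span class="Span_[0-9]+">(.*?)</span>', re.DOTALL)


def strip_spans(text: str) -> str:
    """Remove matching span tags and keep the text they enclose."""
    return SPAN_PATTERN.sub(r"\1", text)


def strip_spans_in_folder(folder: str, pattern: str = "*.htm") -> int:
    """Rewrite every matching file under folder; return how many changed."""
    changed = 0
    for path in Path(folder).rglob(pattern):
        original = path.read_text(encoding="utf-8")
        cleaned = strip_spans(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed


# A fragment of the sample text above.
sample = 'we h<span class="Span_3">ad nothi</span>ng bef<span class="Span_3">ore us, </span>we were'
print(strip_spans(sample))
```

Like the Notepad++ expression, this pairs each opening tag with the nearest closing tag, so it does not handle spans nested inside other spans. Spans with other class names are left alone.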

If you haven't learned regular expressions, they're worth adding to your repertoire. Because Flare is HTML and XML based, regular expressions are especially useful. The great thing about tools like this is that once you're aware of them, you'll be surprised how often you can use them in places you never noticed before. Of course, taking care to edit the Word document beforehand is still important. It's easier to fix the styles, tables, text, and formatting in Word.