Microsoft Word’s Wildcard Replacements

Can Word Solve Its Own HTML Problem? You Betcha!

By Robert Delwood

A Lead API Documentation Writer


Introduction

Search, along with find and replace, is a ubiquitous tool that no one thinks to talk about often. After all, we know how to use it, right? However, over the years these features have become powerful text editing tools. Technical writers and programmer-writers work in a market where we cannot afford to overlook the few tools we have. In this case, those tools are Microsoft Word’s wildcard search and wildcard replace features.

This article is not a tutorial about wildcards. A short article like this one couldn’t do justice to the topic. Instead, it is a demonstration of what wildcards can do. If they look promising, you are encouraged to investigate further. There are suggestions at the end of the article.

Wildcard search allows you to find patterns rather than exact text. We’re all used to the conventional search that specifies exact text, perhaps the word properties in a document. However, there are cases when you do not know exactly what you’re looking for. This could be color/colour, or any form of Jeffery/Jeffeory/Geffrey. Yes, you can find every variant in such a set with a single search. Wildcard replace allows the found pattern, or part of the pattern, to be used in the replacement. The real power comes from using this on code, such as HTML or XML, because writers are increasingly expected to fix code.
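For readers who think in code, the same idea can be sketched outside Word. The snippet below uses Python's standard re module, which is a regular expression engine and not Word's wildcard syntax, so treat the pattern as an analogy only. The sample sentence and the name spelling_pattern are made up for illustration.

```python
import re

# Hypothetical sample text containing both spellings.
text = "The color picker sets the colour of each heading."

# 'u?' makes the letter u optional, so one pattern covers both spellings.
# This is Python regex syntax, not Word wildcard syntax.
spelling_pattern = re.compile(r"colou?r")

matches = spelling_pattern.findall(text)
print(matches)
```

One pattern finds both spellings in a single pass, which is the point of searching for patterns rather than exact text.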

Example

Word can save a document as HTML but produces some of the worst HTML code. It is so bad, in fact, that some companies discourage Word-generated HTML files. Ironically, you can use Word to clean up its own HTML.

First, prepare a document.

  1. Create a new Word document and add some text to it. A quick way to add text is to type =rand(5,3) and press Enter.
  2. Save it within Word as an HTML file, preferably as Web Page, Filtered, confirming the format change with Yes.
  3. Open the saved file with Notepad. In File Explorer, right-click the file and select Open with, then Notepad.
  4. Copy all of that text and paste it into a new Word document. This may seem redundant, but now the HTML is editable as plain text and will later be pasted back.

Start cleaning.

  1. Delete everything between the Style tags. Open the Find and Replace dialog (Ctrl+H), select the Replace tab, click More, and check Use wildcards. In Find what, enter \<style\>*\</style\> and leave Replace with empty. In every operation that replaces text with nothing, make sure the Replace with edit box is truly empty, with no stray spaces. Click Replace All.
  2. Get rid of line breaks for the moment, since they interfere with these find operations. Find what ^013 and Replace with *. If your document contains asterisks, choose any other character that is not used. Click Replace All. The document looks messy for the moment.
  3. Remove unneeded tags. It’ll probably take three sets of Replace All.
    1. Delete all the close Span tags. Find what \</span*\> and Replace with nothing. Click Replace All.
    2. Delete all the open Span tags. Find what \<span*\> and Replace with nothing. Click Replace All.
    3. Single tags (those that need no matching bookend) can be removed the same way, although only one step is needed for each tag. Find what \ and Replace with nothing.
  4. Add back line breaks. Find what \*\* and Replace with ^p. Click Replace All. Two placeholder characters in a row mark where a blank line used to be.
  5. Finally, some editors (such as Notepad) terminate each line with a paragraph mark, which we don’t always want. Find what \* and Replace with a single space character. Click Replace All.
  6. Select the entire document, copy it, and paste it back into Notepad. Save with a name like HTML_Test.html. Double-click the file to open it.
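The cleaning steps above can also be sketched as a short script. This is a rough Python equivalent using the standard re module, not something Word runs. The function name clean_word_html, the PLACEHOLDER constant, and the sample HTML are all hypothetical, and the sketch assumes the placeholder character does not occur in the body of the HTML, just as step 2 does.

```python
import re

PLACEHOLDER = "*"  # assumed not to occur in the HTML body, as in step 2


def clean_word_html(html: str) -> str:
    """Mirror the manual cleaning steps on a string of Word-generated HTML."""
    # Step 1: delete the style block, tags included (shortest match).
    html = re.sub(r"<style>.*?</style>", "", html, flags=re.DOTALL)
    # Step 2: swap every line break for the placeholder character.
    html = html.replace("\r\n", "\n").replace("\n", PLACEHOLDER)
    # Step 3: delete the close span tags, then the open span tags.
    html = re.sub(r"</span.*?>", "", html)
    html = re.sub(r"<span.*?>", "", html)
    # Step 4: a doubled placeholder marks a former blank line; restore a break.
    html = html.replace(PLACEHOLDER * 2, "\n")
    # Step 5: any remaining single placeholder was a wrapped line; use a space.
    html = html.replace(PLACEHOLDER, " ")
    return html


# Made-up sample resembling filtered Word output.
sample = (
    "<html>\n<head>\n<style>\np {margin:0}\n</style>\n</head>\n<body>\n\n"
    "<p><span lang=EN-US>Hello\nworld</span></p>\n\n</body>\n</html>\n"
)
cleaned = clean_word_html(sample)
print(cleaned)
```

The order matters in the same way it does in Word: the style block goes first so its contents never interfere with the placeholder, and the doubled placeholder is restored before the single one is turned into a space.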

The bottom line is that the code can be as clean as you need it to be, and wildcard replacement means it won’t take long.

How it’s done

Making these changes relies on a few rules:

Letters and numbers entered in the Find what text box are still treated as literals. For example, entering properties finds that exact word. It’s the symbols that add the power.

The asterisk (*) is a wildcard for any number of characters. In fact, it matches so freely that you have to tell it where to start and where to stop. Brackets are a strange case. In wildcard mode they are used to create expressions. However, we need to use them as the literal brackets that form the style and span tags. For that, a backslash tells Word to treat the immediately following character as a literal.

With these notations, the expression \<span*\> comes to mean a literal open bracket, the literal text span, and any number of characters up to the first literal close bracket.

The wildcard asterisk picks up all the elements inside the span tag. In other words, we don’t care what other elements there are inside. We’ll select them all just to then delete them.
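To make the notation concrete, here is a minimal Python sketch that translates the small subset of Word wildcard notation described here (literals, the backslash escape, and the asterisk) into a Python regular expression. The function word_wildcard_to_regex is hypothetical and ignores every other Word wildcard feature. It also assumes the asterisk stops at the first possible match, as the explanation above describes.

```python
import re


def word_wildcard_to_regex(pattern: str) -> str:
    """Translate literals, backslash escapes, and * into Python regex syntax."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Backslash: treat the next character as a literal.
            out.append(re.escape(pattern[i + 1]))
            i += 2
        elif ch == "*":
            # Asterisk: any number of characters, stopping at the first
            # point where the rest of the pattern can match.
            out.append(".*?")
            i += 1
        else:
            # Letters and numbers are literals.
            out.append(re.escape(ch))
            i += 1
    return "".join(out)


regex = word_wildcard_to_regex(r"\<span*\>")

# Made-up sample line of Word-style HTML.
html = "<p><span lang=EN-US>Hello</span> world</p>"

found = re.findall(regex, html)   # preview the matches before replacing
stripped = re.sub(regex, "", html)
print(found)
print(stripped)
```

Listing the matches before replacing is the scripted counterpart of stepping through them with Find Next. Note that the close tag is left alone, which is why the manual procedure needs a separate pass for it.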

Click Find Next to test the expressions to make sure they’re finding what you want.

Text replacement substitutes the value in the Replace with text box for whatever is found. The example uses either nothing (the box is left empty) or a single space.
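The example above only replaces matches with nothing or a space, but the introduction noted that part of the found pattern can be reused in the replacement. As a hedged illustration of that idea in Python's re module (regular expression syntax, not Word's wildcard syntax), the sketch below keeps each tag name and drops its attributes. The sample HTML and variable names are made up.

```python
import re

# Made-up sample of Word-style HTML with attributes on the open tags.
html = "<p class=MsoNormal>Hello</p><h1 style='margin:0'>Title</h1>"

# The parentheses capture the tag name; [^>]* consumes the attributes.
# In the replacement, \1 stands for whatever the parentheses captured.
attribute_pattern = re.compile(r"<(\w+) [^>]*>")

result = attribute_pattern.sub(r"<\1>", html)
print(result)
```

The close tags are untouched because they have no space-separated attributes to match.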

What’s Next

This is only an example of what Word’s wildcards can do. Use the following links to learn more, and continue to search the Internet for other sites and examples.