Microsoft Word’s Wildcard Replacements

Can Word Solve Its Own HTML Problem? You Betcha!

By Robert Delwood

A Lead API Documentation Writer


Introduction

Search, and find and replace, are ubiquitous tools that few people think to talk about. After all, we know how to use them, right? Over the years, however, they have become powerful text-editing tools. In today's market, technical writers and programmer-writers cannot afford to overlook any of the few tools we have. In this case, those tools are Microsoft Word’s wildcard search and wildcard replace features.

This article is not a tutorial about wildcards. A short article like this one couldn’t do justice to the topic. Instead, it is a demonstration of what wildcards can do. If these look promising, you are encouraged to investigate further. There are suggestions at the end of the article.

Wildcard search allows you to find patterns rather than exact text. We’re all used to conventional search, which specifies exact text, such as the word properties in a document. However, there are cases when you do not know exactly what you’re looking for. This could be color/colour, or any form of Jeffery/Jeffeory/Geffrey. Yes, you can find all of those in a single search. Wildcard replace allows the found pattern, or part of the pattern, to be used in the replacement. The real power comes from using this with code, such as HTML or XML. Writers are increasingly expected to fix code.
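For readers who already know regular expressions, the same idea can be sketched in Python. This is an analogy, not Word’s wildcard syntax: Python marks an optional character with ?, while in Word’s wildcard mode ? means any single character. The name spellings are simply the examples above.

```python
import re

# Python regex, not Word wildcard syntax.
# "u?" makes the "u" optional, so one pattern finds both spellings.
COLOR = re.compile(r"\bcolou?r\b")
# The alternation lists the three name spellings from the text.
NAME = re.compile(r"\b[JG]eff(?:ery|eory|rey)\b")

text = "Jeffery chose a colour; Geffrey and Jeffeory preferred another color."

print(COLOR.findall(text))
print(NAME.findall(text))
```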

Example

Word can save a document as HTML, but it produces some of the worst HTML code around. It is so bad, in fact, that some companies discourage Word-generated HTML files. Ironically, you can use Word to clean up its own HTML.

First, prepare a document.

  1. Create a new Word document and add some text to it. A quick way to add text is to type =rand(5,3) and press Enter.
  2. In Word, save it as an HTML file, preferably as Web Page, Filtered, and click Yes to confirm the format change.
  3. Open the saved file in Notepad: in File Explorer, right-click it and select Open with, then Notepad.
  4. Copy all of the text and paste it into a new Word document. This may seem redundant, but now the HTML source is editable in Word, and it will later be pasted back.

Start cleaning.

  1. Delete the style block: the Style tags and everything between them. Press Ctrl+H to open the Find and Replace dialog on the Replace tab, click More, and select Use wildcards. In Find what, enter \<style\>*\</style\>, and leave Replace with empty. In all operations that replace text with nothing, make sure the Replace with box is truly empty, with no stray spaces. Click Replace All.
  2. Temporarily replace line breaks, since they interfere with these find operations. Find what ^013 and Replace with *. If your document already contains asterisks, choose another character that it does not use, and substitute that character in steps 4 and 5. Click Replace All. The document looks messy for the moment.
  3. Remove unneeded tags. This will probably take three Replace All operations.
    1. Delete all the closing span tags. Find what \</span*\> and Replace with nothing. Click Replace All.
    2. Delete all the opening span tags. Find what \<span*\> and Replace with nothing. Click Replace All.
    3. Single tags (those without a matching closing tag) can be removed the same way, and each needs only one Replace All. Find what \ and Replace with nothing.
  4. Add back line breaks. Find what \*\* and Replace with ^p. Click Replace All.
  5. Finally, the remaining single asterisks mark hard line breaks that split long lines (you can see them when the file is open in Notepad). We don’t want those breaks inside a paragraph, so Find what \* and Replace with a single space character. Click Replace All.
  6. Select the entire document, copy it, and paste it back into Notepad, replacing the original contents. Save it with a name such as HTML_Test.html, then double-click the file to open it.
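If you would rather script the cleanup, the steps above can be approximated in Python. This is a minimal sketch, not a Word feature: clean_word_html is a hypothetical helper, it assumes Word’s filtered HTML as input, it handles only the style and span tags from the walkthrough, and it assumes the placeholder asterisk does not otherwise appear in the file. The lazy .*? and the [^>]* patterns stand in for Word’s * wildcard, which stops at the first occurrence of the text that follows it.

```python
import re


def clean_word_html(html: str) -> str:
    """Approximate the Find and Replace steps with Python regular expressions."""
    # Step 1: delete the <style> element and everything inside it.
    # DOTALL lets "." cross line breaks; ".*?" stops at the first </style>.
    html = re.sub(r"<style>.*?</style>", "", html, flags=re.DOTALL)
    # Step 2: turn every line break into a placeholder asterisk.
    html = html.replace("\r\n", "\n").replace("\n", "*")
    # Step 3: delete closing, then opening, span tags, attributes included.
    html = re.sub(r"</span[^>]*>", "", html)
    html = re.sub(r"<span[^>]*>", "", html)
    # Step 4: two placeholders in a row mark a blank line; restore a break.
    html = html.replace("**", "\n")
    # Step 5: remaining single placeholders were line wraps; use a space.
    html = html.replace("*", " ")
    return html


sample = "<style>\np {margin:0}\n</style>\n\n<p><span lang=EN-US>Hello</span>\nworld</p>\n"
print(repr(clean_word_html(sample)))
```

The order mirrors the walkthrough: the style block goes first because it is the largest chunk, and the line breaks are parked as placeholders before the tag patterns run.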

The bottom line is that the code can be as clean as you need it, and wildcard replacement makes the job quick.

How it’s done

These changes rely on a few wildcard conventions:

Letters and numbers entered in the Find what text box are still treated as literals. For example, entering properties finds that exact word. It’s the symbols that add the power.

The asterisk (*) is a wildcard for any number of characters. In fact, you have to tell it where to start and stop, using the literal text on either side of it. Angle brackets are a strange case. In wildcard mode, < and > are special characters that mark the start and end of a word. However, we need them as the literal brackets that form HTML tags such as the style tag. For that, the backslash tells Word to treat the immediately following character as a literal.

With these notations, the expression \<span*\> comes to mean a literal open bracket, the literal text span, any number of characters, and then the first literal close bracket.

The wildcard asterisk picks up all the attributes inside the span tag. In other words, we don’t care which attributes are there; we select them all only to delete them.
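The “up to the first close bracket” behavior is what regex users call a lazy match. The Python sketch below contrasts it with a greedy match on a sample line; it is an analogy, not Word’s engine. Python needs no backslash before < because angle brackets are not special in Python regex, whereas Word needs one because < and > mark word boundaries there.

```python
import re

line = '<p><span lang=EN-US>Hello</span> <span style="color:red">world</span></p>'

# Greedy: ".*" runs to the LAST ">" on the line, swallowing text and tags.
greedy = re.findall(r"<span.*>", line)
# Lazy: ".*?" stops at the FIRST ">", matching one opening tag at a time.
lazy = re.findall(r"<span.*?>", line)

print(greedy)
print(lazy)
```

Note that neither pattern touches the closing </span> tags, because the slash sits between < and span; that is why the walkthrough removes closing tags with a separate pattern.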

Before clicking Replace All, click Find Next to test each expression and make sure it finds what you want.
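The same test-first habit carries over to scripts. In this Python sketch (an analogy to Find Next, not a Word API), re.finditer steps through each match and its position so you can inspect it, and re.subn performs the replacement and reports how many changes it made. The sample HTML is illustrative.

```python
import re

html = '<p><span lang=EN-US>Hi</span> <span>there</span></p>'
pattern = r"<span[^>]*>"

# Step through the matches one at a time, as Find Next does, and check each.
for match in re.finditer(pattern, html):
    print(match.start(), match.group(0))

# Only then replace; subn also returns the number of replacements made.
cleaned, count = re.subn(pattern, "", html)
print(cleaned, count)
```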

The contents of the Replace with box replace each match. The example uses either nothing (an empty box) or a single space.
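The replacement side can be sketched in Python as well. The first two calls mirror the empty and single-space replacements used in the example. The third shows the introduction’s idea of reusing part of the found pattern: parentheses capture the tag name and \1 writes it back. Word’s wildcard mode offers a comparable \1 backreference, but the patterns here are Python syntax, and the attribute-stripping rule is a hypothetical illustration, not a step from the walkthrough.

```python
import re

html = '<p class=MsoNormal>One</p>\n<h1 style="margin:0">Two</h1>'

# Empty replacement: the matched line break is simply deleted.
no_breaks = re.sub(r"\n", "", html)
# Single space: each matched line break becomes one space character.
spaced = re.sub(r"\n", " ", html)
# Part of the pattern: group 1 captures the tag name, and \1 writes it back
# without the attributes that followed it. Closing tags start with "</",
# so they do not match and are left alone.
bare_tags = re.sub(r"<([a-z0-9]+) [^>]*>", r"<\1>", html)

print(no_breaks)
print(spaced)
print(bare_tags)
```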

What’s Next

This is only an example of what Word’s wildcards can do. Use the following links to learn more, and continue to search the Internet for other sites and examples.

About the photo

This is Mission Control at Johnson Space Center in Houston, Texas, made famous during the Apollo missions of the 1960s and 1970s. It was the one with the big sine wave through the world map. There are two similar rooms. During Apollo, they served as backups to each other. Afterward, they were called Red and Black. Red was for secret missions, usually for the Department of Defense, and Black was for all other missions. The room now called Black had the visitors’ gallery, a glassed-in area where visitors could sit in on operations. The gallery was made famous by all the presidential guests and other VIPs.

The consoles seen here are the flight controllers’ stations. They look high-tech in a Star Trek kind of way, with all the flashing lights. The rectangular lights next to the monitors are what interest me. Because they are colored, their labels were made with a photographic process, and getting the text onto them was troublesome. As a junior programmer, I devised a new way to print those labels using the then-new PostScript printers. One printer replaced two Linotype machines that dated back to the late 1950s.