XPF Universal Crossword Puzzle Data Format

A proposed open, portable, extensible data format for crossword puzzles

This page is for programmers.

XPF Format Version 1.0

What's New in 1.0?

This is the first official release of XPF. Here is the change list:

Many people contributed suggestions for this spec. I thank them all.


XPF

The current problem

There is no standard data format for formally describing a crossword puzzle. Across Lite text format comes close but it does not cover some common features like shaded squares and it is not extensible. It cannot even describe all possible Across Lite puzzles — for example, those with more complex combinations of rebus entries and circles.

The dream

Imagine working on a puzzle in Crossword Compiler on your PC, then sending it to your collaborator so she can tweak it on her Mac, and then submitting it to your publisher who uses UNIX, all using the same data file that anyone can read, create or edit. Imagine being able to give access to your puzzle to any website regardless of whether it's an HTML5 or Flash or Silverlight or Java site or a personal crossword blog, and know it will work.

XPF alone doesn't make any of that happen but having a standard like XPF is the first step in making it possible.

What does XPF look like?

XPF is based on standard XML so it is easy to read or write using any software language or operating system, and it is readable by humans.

Like most XML, it is self-describing. It can be easily extended if new capabilities need to be added, without affecting existing parsers. It can be read directly into databases. For example, if you are using Internet Explorer and have Microsoft Office installed, you can right-click on an XPF file and export it directly to Excel. (Try it below.) You can also save XPF files (as XML) to your computer from within your browser.

Here is what the most recent NYT puzzle looks like in XPF.

Look at this example of XPF which includes rebus entries, circles, and a notepad, and see its XWord Info view here.

Important browser note

Internet Explorer (recommended for XML) and Firefox do a reasonable job of displaying XML. Users of Safari or Chrome may have to "view source" from that page.

XPF is XML

XPF files validate as XML and can be read, modified, or created by any XML editor. Each XPF file starts with a standard XML declaration.

There is one required top-level node called <Puzzles>. It may have an optional "Version" attribute specifying the version of XPF these puzzles adhere to. Versions are expected to be backwards compatible but some crossword software may find this useful.

Inside, there can be any number of <Puzzle> nodes containing entries described on this page.

<?xml version="1.0" encoding="utf-8" ?>
<!-- Conforms to Universal Crossword Puzzle Format XPF Version 1.0 -->
<Puzzles Version="1.0">
<Puzzle>
...
</Puzzle>
<Puzzle>
...
</Puzzle>
</Puzzles>
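A minimal sketch of reading this top-level structure, assuming Python and its standard xml.etree.ElementTree module. The helper name load_puzzles and the sample text are illustrative assumptions, not part of XPF.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-puzzle file, trimmed to the bare structure.
SAMPLE = """<?xml version="1.0" encoding="utf-8" ?>
<Puzzles Version="1.0">
  <Puzzle><Title>First</Title></Puzzle>
  <Puzzle><Title>Second</Title></Puzzle>
</Puzzles>"""


def load_puzzles(xml_text):
    """Return (version, list of <Puzzle> elements) from XPF text."""
    # Parse from bytes so the encoding named in the declaration is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != "Puzzles":
        raise ValueError("top-level node must be <Puzzles>")
    # Version is optional, so this may be None.
    version = root.get("Version")
    return version, root.findall("Puzzle")


version, puzzles = load_puzzles(SAMPLE)
titles = [p.findtext("Title") for p in puzzles]
```

Because unknown nodes are simply ignored by findall, a reader written this way keeps working when a file carries extensions.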

Top section

The top of each <Puzzle> section contains the basic puzzle information. These are similar to Across Lite fields except that they must be well-formed XML and there are some defined extensions.

<Type> is optional and means normal puzzle if omitted. The one important value is "diagramless" because it provides important information to the rendering software. Other types can be used for your own purposes. Suggested values are cryptic, diagramless, and vowelless. I also use panda for Puns and Anagrams crosswords.

<Title> and <Author> are optional. I use the strings from the NYT Across Lite files here.

<Copyright> is optional. Any text is allowed. Display programs may assume the copyright symbol © can precede this string.

<Publisher> and <Date> are also optional. Date should be the date associated with the puzzle itself and not, for instance, file creation time. American m/d/y format is used. Year should be 4 digits.

The mandatory Size section must specify the number of rows and columns. This is the size of the bounding rectangle for non-rectangular grids. Column is consistently abbreviated to Col in XPF.

<Type>normal</Type>
<Title>NY Times, Thu, Sep 11, 2008</Title>
<Author>Caleb Madison</Author>
<Editor>Will Shortz</Editor>
<Copyright>2008, The New York Times</Copyright>
<Publisher>The New York Times</Publisher>
<Date>9/11/2008</Date>
<Size>
<Rows>15</Rows>
<Cols>15</Cols>
</Size>
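The header fields could be read along these lines. This is a sketch under stated assumptions: the function name read_header and the returned dictionary layout are hypothetical, and the only rules taken from the text above are that Type defaults to a normal puzzle, Date is m/d/y with a 4-digit year, and Size is mandatory.

```python
import xml.etree.ElementTree as ET
from datetime import date

PUZZLE = ET.fromstring("""<Puzzle>
<Title>NY Times, Thu, Sep 11, 2008</Title>
<Date>9/11/2008</Date>
<Size><Rows>15</Rows><Cols>15</Cols></Size>
</Puzzle>""")


def read_header(puzzle):
    """Collect the basic fields of one <Puzzle> element."""
    # Type is optional; a missing Type means a normal puzzle.
    ptype = puzzle.findtext("Type", default="normal")

    # Date is optional and uses American m/d/y order.
    puzzle_date = None
    raw_date = puzzle.findtext("Date")
    if raw_date:
        month, day, year = (int(part) for part in raw_date.strip().split("/"))
        puzzle_date = date(year, month, day)

    # Size is mandatory and gives the bounding rectangle.
    size = puzzle.find("Size")
    if size is None:
        raise ValueError("<Size> is mandatory")
    rows = int(size.findtext("Rows"))
    cols = int(size.findtext("Cols"))
    return {"type": ptype, "date": puzzle_date, "rows": rows, "cols": cols}


header = read_header(PUZZLE)
```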

Defining the grid

The mandatory <Grid> node defines the shape of the grid and shows the answer letters in place. Blocks are indicated with a period. Answer letters are capitalized. Each row is represented by a single line.

Note that, unlike in Across Lite text format, there is no indication of either circles or rebus entries in this section. One benefit of XPF is that circles, shades, and rebus squares are all independently defined. There can be as many of each as desired and they can overlap however you want.

Rebus squares should have the capital letter of the short form of the rebus. This would typically be the single letter that would also be accepted as correct. For example, the rebus entry "HEART" would have just "H" here. A symbol would have the first letter of the most common English word used to describe it. For example, "&" would have A for ampersand.

It may be useful to store unfinished grids in XPF, for example, to share an incomplete grid with a co-constructor. Use a blank character for squares that are still to be filled in. A blank can also be part of a finished grid in the odd case where the solution requires not filling in each empty square.

<Grid>
<Row>GREW.ANAIS.LABS</Row>
<Row>LARA.NOSCH.EBON</Row>
<Row>ANIL.ESTEE.ADUE</Row>
<Row>RACKETEERS.DOLE</Row>
<Row>ETHOS.......MER</Row>
<Row>...FAM.RAP.LIV.</Row>
<Row>CHEF.IRENE.ONAN</Row>
<Row>COLA.LOSTS.SARS</Row>
<Row>SLAM.ATEST.ALDA</Row>
<Row>.LIE.NEA.OWN...</Row>
<Row>AYN.......AGATE</Row>
<Row>TWEE.VAMPIREBAT</Row>
<Row>TOMA.EDUCT.LASH</Row>
<Row>HOAR.RESTS.ETTA</Row>
<Row>EDYS.ANDSO.SEEN</Row>
</Grid>

Odd-shaped puzzles can be specified by using "~" to indicate a completely missing square. Here's how the 1/21/2010 puzzle by Elizabeth C. Gorski would look. It has missing corners.

I realize this may significantly complicate rendering. Software programs may choose to display "~" as a normal black square as an acceptable but not ideal solution.

<Grid>
<Row>~PERSONAL.COST~</Row>
<Row>FIVETOONE.DCCAB</Row>
...
</Grid>
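A sketch of loading the grid into a list of rows, assuming Python's xml.etree.ElementTree. The names read_grid and is_open are hypothetical. The one detail worth noting is that row text must not be stripped, since a blank is a meaningful unfilled square.

```python
import xml.etree.ElementTree as ET

BLOCK, MISSING, UNFILLED = ".", "~", " "


def read_grid(puzzle, rows, cols):
    """Return the grid as a list of lists of single characters."""
    grid_node = puzzle.find("Grid")
    if grid_node is None:
        raise ValueError("<Grid> is mandatory")
    # Keep row text exactly as written: blanks mark unfilled squares.
    grid = [list(row.text or "") for row in grid_node.findall("Row")]
    if len(grid) != rows or any(len(r) != cols for r in grid):
        raise ValueError("grid does not match <Size>")
    return grid


def is_open(cell):
    """True for squares that can hold a letter (including unfilled ones)."""
    return cell not in (BLOCK, MISSING)


# Tiny made-up grid showing a missing square, a block, and a blank.
sample = ET.fromstring(
    "<Puzzle><Grid><Row>~AB.</Row><Row>CD E</Row></Grid></Puzzle>"
)
grid = read_grid(sample, rows=2, cols=4)
```

A renderer that cannot draw missing squares can treat MISSING the same as BLOCK, as suggested above.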

Circles, rebus entries, and shaded squares

These next three sections can appear in any order but all of each type, for example all circles, must be together.

Circles

Circle locations are defined using XML attributes. Note that rows and columns are numbered starting with 1 in the top left. Programmers will have to make adjustments in their code for zero-based calculations. Human readability is given priority over machine convenience. A future XPF version might add a value here to describe the type of circle.

<Circles>
<Circle Row="1" Col="8" />
...
</Circles>
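The zero-based adjustment mentioned above might look like this in Python. The function name read_circles is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET


def read_circles(puzzle):
    """Return a set of zero-based (row, col) tuples for circled squares."""
    circles = set()
    for node in puzzle.findall("Circles/Circle"):
        # XPF counts from 1 at the top left; subtract 1 for list indexing.
        circles.add((int(node.get("Row")) - 1, int(node.get("Col")) - 1))
    return circles


sample = ET.fromstring(
    '<Puzzle><Circles><Circle Row="1" Col="8" />'
    '<Circle Row="3" Col="2" /></Circles></Puzzle>'
)
circles = read_circles(sample)
```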

Rebus

Rebus works similarly except with two added pieces of information. The Short attribute gives the single letter that will also be accepted as correct by the puzzle. The value of each entry is the fully expanded rebus string.

Here is the XPF for the first NYT puzzle of the Will Shortz era. It has several rebus entries.

<RebusEntries>
<Rebus Row="1" Col="2" Short="Y">YELLOW</Rebus>
...
</RebusEntries>
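A sketch of how a solving program might use the rebus data, assuming Python. The names read_rebus and accepted_entries are hypothetical, and accepting exactly the short letter and the full string is one reading of the description above, not a rule stated by XPF.

```python
import xml.etree.ElementTree as ET


def read_rebus(puzzle):
    """Map zero-based (row, col) to a (short, full) pair."""
    rebus = {}
    for node in puzzle.findall("RebusEntries/Rebus"):
        key = (int(node.get("Row")) - 1, int(node.get("Col")) - 1)
        rebus[key] = (node.get("Short"), node.text or "")
    return rebus


def accepted_entries(grid, rebus, row, col):
    """Return the entries a solver could type in one square."""
    if (row, col) in rebus:
        short, full = rebus[(row, col)]
        return {short, full}
    return {grid[row][col]}


sample = ET.fromstring(
    '<Puzzle><RebusEntries>'
    '<Rebus Row="1" Col="2" Short="Y">YELLOW</Rebus>'
    '</RebusEntries></Puzzle>'
)
grid = [list("AYB")]  # made-up one-row grid holding the short letter
rebus = read_rebus(sample)
```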

Shaded squares

Some puzzles have shaded squares. Here is an example with several: normal view, XPF view. XPF format follows a pattern similar to Rebus entries. The specified color must be either "gray" or an RRGGBB #hex value. (Previous XPF specs allowed any HTML color name but this tighter specification makes it easier for client applications.)

<Shades>
<Shade Row="2" Col="8">gray</Shade>
<Shade Row="6" Col="12">#ffffe0</Shade>
...
</Shades>
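The color rule can be checked while reading the shades. This is a sketch in Python with a hypothetical function name, read_shades. Treating "gray" as case-sensitive and accepting upper- or lowercase hex digits are assumptions, since the text above does not say.

```python
import re
import xml.etree.ElementTree as ET

# "#" followed by exactly six hex digits (RRGGBB).
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def read_shades(puzzle):
    """Map zero-based (row, col) to a validated color string."""
    shades = {}
    for node in puzzle.findall("Shades/Shade"):
        color = (node.text or "").strip()
        if color != "gray" and not HEX_COLOR.fullmatch(color):
            raise ValueError(f"unsupported shade color: {color!r}")
        key = (int(node.get("Row")) - 1, int(node.get("Col")) - 1)
        shades[key] = color
    return shades


sample = ET.fromstring(
    '<Puzzle><Shades>'
    '<Shade Row="2" Col="8">gray</Shade>'
    '<Shade Row="6" Col="12">#ffffe0</Shade>'
    '</Shades></Puzzle>'
)
shades = read_shades(sample)
```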

Here's an example with both circles and shaded squares: normal view, XPF view.

Clues

Unlike Across Lite, clues in XPF are all in one section. Instead of determining grid numbers and which clues go with which answer by examining the grid, these attributes may be specified with each clue. This gives maximum flexibility. You can have unchecked squares, or missing grid numbers, or clues that start in the middle of words.

The following XML attributes for each <Clue> are available, as shown in the example below: Row, Col, Num, Dir, and Ans.

Row, Col, Num, and Dir are optional. For each <Puzzle> they should be either all included (recommended) or all omitted. If they are missing, those values are determined algorithmically, the same way Across Lite determines them. If they are included, they take precedence over the automatic calculations. This makes any grid numbering system possible. There is no requirement that each answer be clued.

<Clues>
<Clue Row="1" Col="1" Num="1" Dir="Across" Ans="GREW">Waxed</Clue>
<Clue Row="1" Col="6" Num="5" Dir="Across" Ans="ANAIS">First name in erotica</Clue>
...
<Clue Row="1" Col="1" Num="1" Dir="Down" Ans="GLARE">Ruiner of many a photo</Clue>
<Clue Row="1" Col="2" Num="2" Dir="Down" Ans="RANAT">Charged</Clue>
...
</Clues>
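When Row, Col, Num, and Dir are omitted, a reader has to number the grid itself. The sketch below uses the conventional crossword rule (a square gets a number if it starts an Across or Down entry of at least two squares), which is assumed here to match what Across Lite does. The function name number_grid is hypothetical, and treating "~" like a block for numbering is also an assumption.

```python
def number_grid(grid):
    """Return {(row, col): number} using zero-based coordinates."""

    def is_open(r, c):
        # Squares off the edge, blocks, and missing squares are all closed.
        return (
            0 <= r < len(grid)
            and 0 <= c < len(grid[r])
            and grid[r][c] not in (".", "~")
        )

    numbers = {}
    next_num = 1
    # Scan left to right, top to bottom, as grid numbers are assigned.
    for r, row in enumerate(grid):
        for c in range(len(row)):
            if not is_open(r, c):
                continue
            starts_across = not is_open(r, c - 1) and is_open(r, c + 1)
            starts_down = not is_open(r - 1, c) and is_open(r + 1, c)
            if starts_across or starts_down:
                numbers[(r, c)] = next_num
                next_num += 1
    return numbers


# First three rows of the sample grid shown earlier on this page.
rows = ["GREW.ANAIS.LABS", "LARA.NOSCH.EBON", "ANIL.ESTEE.ADUE"]
numbers = number_grid([list(r) for r in rows])
```

On these rows the square at Row 1, Col 6 comes out as number 5, which agrees with the ANAIS clue in the example above.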

Notepad

Notepad can be any text so it is inside CDATA, meaning XML programs won't try to parse it and embedded HTML is allowed. Note that XPF can be extended with similar kinds of entries. For example, <JNotes> is not part of defined XPF but extensions like this do not affect standard XPF parsers.

<Notepad>
<![CDATA[ TEEN PUZZLEMAKER WEEK <br /> All the daily crosswords this week... ]]>
</Notepad>
<JNotes>
<![CDATA[ In my opinion... ]]>
</JNotes>
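A sketch of reading the notepad and spotting extension nodes, assuming Python's xml.etree.ElementTree, which returns CDATA content as ordinary text. The set of "known" tags here is deliberately tiny and purely illustrative.

```python
import xml.etree.ElementTree as ET

sample = ET.fromstring(
    "<Puzzle>"
    "<Notepad><![CDATA[ TEEN PUZZLEMAKER WEEK <br /> All the daily crosswords ]]></Notepad>"
    "<JNotes><![CDATA[ In my opinion... ]]></JNotes>"
    "</Puzzle>"
)

# The embedded HTML survives as plain text; it is not parsed as XML.
notepad = (sample.findtext("Notepad") or "").strip()

# Anything a parser does not recognize can be skipped or passed through.
known = {"Notepad"}  # illustrative subset of the defined XPF nodes
extensions = [child.tag for child in sample if child.tag not in known]
```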

Multiple puzzles in a single XPF file

The <Puzzle> section can be repeated as many times as desired.

Here are all the puzzles from October, 2009 in a single file.

Special characters

Author, Title, Clue, and Answer strings should conform to XML rules. Two special characters, & and <, must be replaced by their respective XML escape sequences, &amp; and &lt;. (Your XML editor or generator may do this for you already. I use the .NET XmlTextWriter class which takes care of this automatically.)

All other special characters including quote, apostrophe, and foreign characters can be entered directly.

<Author>Tony Orbach &amp; Amy Reynaldo / Will Shortz</Author>
<Clue> ... Num="5" Dir="Across" Ans="MAY">&lt;-- What this is, on a calendar</Clue>
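For those generating XPF by hand in Python, the standard library function xml.sax.saxutils.escape handles these characters. This is a sketch, not a recommendation over a real XML writer. Note that escape also converts ">" to "&gt;", which is harmless.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

author = "Tony Orbach & Amy Reynaldo / Will Shortz"
clue = "<-- What this is, on a calendar"

# escape() replaces &, < and > in element text.
author_xml = f"<Author>{escape(author)}</Author>"
clue_xml = f'<Clue Num="5" Dir="Across" Ans="MAY">{escape(clue)}</Clue>'

# Parsing the result gives back the original, unescaped string.
round_trip = ET.fromstring(clue_xml).text
```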

Right to Left languages (Hebrew and Arabic)

When the Puzzle node has the DIR=RTL attribute, the crossword should be displayed from right to left. The data itself does not differ; only the display does. This requires two changes in display:

  1. The first column is the column on the right, and the last column is on the left. The first letter in the Grid/Row node should appear on the far right. Col=1 in the Clue node means the first column on the right.
  2. Clue numbers should appear on the top right corner of the square instead of the top left corner.

The RTL attribute has no effect on rows.
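The two display changes can be sketched as a pair of small helpers. This assumes Python and a renderer that positions squares by zero-based screen column counted from the left. The names display_col and number_corner are hypothetical, and how the RTL flag is read from the file is left out.

```python
def display_col(col, cols, rtl=False):
    """Map a 1-based XPF column to a 0-based screen column from the left."""
    # In RTL puzzles, Col=1 is drawn in the rightmost screen column.
    return cols - col if rtl else col - 1


def number_corner(rtl=False):
    """Corner of the square where the clue number is drawn."""
    return "top-right" if rtl else "top-left"


# Rows are unaffected, so only the column needs converting.
ltr_first = display_col(1, 15)
rtl_first = display_col(1, 15, rtl=True)
rtl_last = display_col(15, 15, rtl=True)
```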

Special thanks to Yariv Habot for this section of the XPF specification.

Licensing

XPF is free to use, even for commercial applications. As is usual for such things, I maintain ownership of the specification and would appreciate attribution, but you can use it as you see fit. Have your lawyer click the logo below for details if she's concerned.

XPF Puzzle specification by Jim Horne is licensed under a Creative Commons Attribution-No Derivative Works 3.0 Unported License.

Now what?

Any open standard is only useful if others adopt it. In an ideal world, Crossword Compiler, Across Lite, and every other editing and viewing program would add support for a common, open format, and every crossword publisher would accept submissions in this format as well.

I am starting the ball rolling by recommending XPF as that universal format, and by making recent puzzles in my database available in XPF.

XPF vs JSON

XPF is the recommended format for reading, saving, and submitting crosswords but sometimes it would be convenient to have the data in JSON for client-side web programming. See my JSON proposal here.

Open issues

Comments

XPF is still open for comments. Please contact me with your questions or your suggestions for improvement. Thank you.