This page is for programmers.
This is the first official release of XPF. Here is the change list:
Many people contributed suggestions for this spec. I thank them all.
There is no standard data format for formally describing a crossword puzzle. Across Lite text format comes close but it does not cover some common features like shaded squares and it is not extensible. It cannot even describe all possible Across Lite puzzles — for example, those with more complex combinations of rebus entries and circles.
Imagine working on a puzzle in Crossword Compiler on your PC, then sending it to your collaborator so she can tweak it on her Mac, and then submitting it to your publisher who uses UNIX, all using the same data file that anyone can read, create or edit. Imagine being able to give access to your puzzle to any website regardless of whether it's an HTML5 or Flash or Silverlight or Java site or a personal crossword blog, and know it will work.
XPF alone doesn't make any of that happen but having a standard like XPF is the first step in making it possible.
XPF is based on standard XML so it is easy to read or write using any software language or operating system, and it is readable by humans.
Like most XML, it is self-describing. It can be easily extended if new capabilities need to be added, without affecting existing parsers. It can be read directly into databases. For example, if you are using Internet Explorer and have Microsoft Office installed, you can right-click on an XPF file and export it directly to Excel. (Try it below.) You can also save XPF files (as XML) to your computer from within your browser.
Here is what the most recent NYT puzzle looks like in XPF.
Most modern browsers work just fine but some older ones have trouble displaying XML. You may have to "view source" from that page to see the XML properly formatted.
XPF files validate as XML and can be read, modified, or created by any XML editor. Each XPF file starts with a standard XML declaration.
There is one required top-level node called <Puzzles>. It may have an optional "Version" attribute specifying the version of XPF these puzzles adhere to. Versions are expected to be backwards compatible but some crossword software may find this useful.
Inside, there can be any number of <Puzzle> nodes containing entries described on this page.
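As a rough illustration of that outer structure, here is a minimal Python sketch that reads the top-level node and its <Puzzle> children using the standard library. The sample document, its version number, and its titles are made up for this example, and the helper name load_puzzles is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample: only <Puzzles>, Version, <Puzzle>, and <Title> come from
# the spec text. The version number and titles are placeholders.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<Puzzles Version="1.0">
  <Puzzle><Title>First</Title></Puzzle>
  <Puzzle><Title>Second</Title></Puzzle>
</Puzzles>"""


def load_puzzles(xml_text):
    """Return (version, list of <Puzzle> elements) from an XPF string."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != "Puzzles":
        raise ValueError("expected <Puzzles> as the top-level node")
    # Version is optional, so this may be None.
    version = root.get("Version")
    return version, root.findall("Puzzle")


version, puzzles = load_puzzles(SAMPLE)
titles = [p.findtext("Title") for p in puzzles]
```

Because Version is optional, a reader should be prepared for it to be missing rather than treating its absence as an error.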
The top of each <Puzzle> section contains the basic puzzle information. These are similar to Across Lite fields except that they must be well-formed XML and there are some defined extensions.
<Type> is optional and means normal puzzle if omitted. The one important value is "diagramless" because it provides important information to the rendering software. Other types can be used for your own purposes. Suggested values are cryptic, diagramless, and vowelless. I also use panda for Puns and Anagrams crosswords.
<Title> and <Author> are optional. I use the strings from the NYT Across Lite files here.
<Copyright> is optional. Any text is allowed. Display programs may assume the copyright symbol © can precede this string.
<Publisher> and <Date> are also optional. Date should be the date associated with the puzzle itself and not, for instance, file creation time. American m/d/y format is used. Year should be 4 digits.
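The header fields above could be read along these lines. This is only a sketch: the function name read_header, the dictionary keys, and the "normal" placeholder for an omitted <Type> are choices made for this example, not part of XPF.

```python
from datetime import datetime
import xml.etree.ElementTree as ET


def read_header(puzzle):
    """Collect the optional header fields of one <Puzzle> element."""

    def text(tag):
        value = puzzle.findtext(tag)
        return value.strip() if value and value.strip() else None

    raw_date = text("Date")
    return {
        # An omitted <Type> means a normal puzzle; "normal" is our own label.
        "type": text("Type") or "normal",
        "title": text("Title"),
        "author": text("Author"),
        "copyright": text("Copyright"),
        "publisher": text("Publisher"),
        # American m/d/y with a 4-digit year, as described above.
        "date": datetime.strptime(raw_date, "%m/%d/%Y").date() if raw_date else None,
    }


# Made-up sample values for illustration.
sample = ET.fromstring(
    "<Puzzle><Title>Demo</Title><Date>1/21/2010</Date></Puzzle>"
)
header = read_header(sample)
```

Every field other than the date is passed through as plain text, since the spec allows any string there.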
The mandatory Size section must specify the number of rows and columns. This is the size of the bounding rectangle for non-rectangular grids. Column is consistently abbreviated to Col in XPF.
The mandatory <Grid> node defines the shape of the grid and shows the answer letters in place. Blocks are indicated with a period. Answer letters are capitalized. Each row is represented by a single line.
Note that, unlike in Across Lite text format, there is no indication of either circles or rebus entries in this section. One benefit of XPF is that circles, shades, and rebus squares are all independently defined. There can be as many of each as desired and they can overlap however you want.
Rebus squares should have the capital letter of the short form of the rebus. This would typically be the single letter that would also be accepted as correct. For example, the rebus entry "HEART" would have just "H" here. A symbol would have the first letter of the most common English word used to describe it. For example, "&" would have A for ampersand.
It may be useful to store unfinished grids in XPF, for example, to share an incomplete grid with a co-constructor. Use a blank character for squares that are still to be filled in. A blank can also be part of a finished grid in the odd case where the solution requires not filling in each empty square.
Odd-shaped puzzles can be specified by using "~" to indicate a completely missing square. Here's how the 1/21/2010 puzzle by Elizabeth C. Gorski would look. It has missing corners.
I realize this may significantly complicate rendering. Software programs may choose to display "~" as a normal black square as an acceptable but not ideal solution.
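The grid conventions described above (period for a block, blank for an unfilled square, "~" for a missing square) can be checked with a short Python sketch. It works on a plain list of row strings, so it makes no assumption about how the rows are wrapped in XML. The names parse_grid and is_fillable are hypothetical, and the sample rows are made up.

```python
BLOCK = "."    # black square
MISSING = "~"  # square that is not part of the grid at all
EMPTY = " "    # square that is still to be filled in


def parse_grid(rows, n_rows, n_cols):
    """Validate row strings against the declared size and return a 2D list."""
    if len(rows) != n_rows:
        raise ValueError(f"expected {n_rows} rows, found {len(rows)}")
    grid = []
    for r, line in enumerate(rows, start=1):
        if len(line) != n_cols:
            raise ValueError(f"row {r} has {len(line)} squares, expected {n_cols}")
        grid.append(list(line))
    return grid


def is_fillable(cell):
    """True for any square a solver can write in, including blank ones."""
    return cell not in (BLOCK, MISSING)


# Made-up 3x3 grid with a missing corner, a block, and an unfilled square.
grid = parse_grid(["~AB", "C.D", "EF "], n_rows=3, n_cols=3)
fillable_count = sum(is_fillable(cell) for row in grid for cell in row)
```

One caution when reading real files: blanks are significant here, so row text should not be whitespace-stripped before its length is checked.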
These next three sections can appear in any order but all of each type, for example all circles, must be together.
Circle locations are defined using XML attributes. Note that rows and columns are numbered starting with 1 in the top left. Programmers will have to make adjustments in their code for zero-based calculations. Human readability is given priority over machine convenience. A future XPF version might add a value here to describe the type of circle.
Rebus works similarly except with two added pieces of information. The Short attribute gives the single letter that will also be accepted as correct by the puzzle. The value of each entry is the fully expanded rebus string.
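The adjustment from 1-based positions to zero-based indexes might look like the following Python sketch. The element names <Circle> and <Rebus>, their container names, and the use of Row and Col as the position attributes are assumptions made for this example. Only the Short attribute and the expanded string as the element value are stated above.

```python
import xml.etree.ElementTree as ET

# Assumed element and container names, for illustration only.
SAMPLE = """<Puzzle>
  <Circles><Circle Row="1" Col="3"/></Circles>
  <RebusEntries><Rebus Row="2" Col="1" Short="H">HEART</Rebus></RebusEntries>
</Puzzle>"""


def zero_based(elem):
    """Convert 1-based Row/Col attributes to a zero-based (row, col) tuple."""
    return int(elem.get("Row")) - 1, int(elem.get("Col")) - 1


def read_overlays(puzzle):
    """Return (set of circled cells, dict of cell -> (short, expanded rebus))."""
    circles = {zero_based(c) for c in puzzle.iter("Circle")}
    rebus = {
        zero_based(r): (r.get("Short"), r.text or "")
        for r in puzzle.iter("Rebus")
    }
    return circles, rebus


circles, rebus = read_overlays(ET.fromstring(SAMPLE))
```

Keeping circles and rebus entries in separate structures mirrors the point that they are independently defined and may overlap on the same square.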
Here is the XPF for the first NYT puzzle of the Will Shortz Era. It has several rebus entries.
Some puzzles have shaded squares. Here is an example with several: normal view, XPF view. XPF format follows a pattern similar to Rebus entries. The specified color must be either "gray" or an RRGGBB #hex value. (Previous XPF specs allowed any HTML color name but this tighter specification makes it easier for client applications.)
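A client application could check a shade color with a few lines of Python. This sketch accepts "gray" or six hex digits with or without a leading "#", because the text above does not say whether the "#" is required. That leniency, the case-insensitive match on "gray", and the function name is_valid_shade are all assumptions.

```python
import re

# Six hex digits, with an optional leading "#" (assumed to be optional).
_HEX_COLOR = re.compile(r"#?[0-9A-Fa-f]{6}")


def is_valid_shade(value):
    """True if value is "gray" or an RRGGBB hex color."""
    value = value.strip()
    return value.lower() == "gray" or _HEX_COLOR.fullmatch(value) is not None


checks = {color: is_valid_shade(color) for color in ["gray", "#FF00AA", "red"]}
```

Under this rule an HTML color name such as "red", which earlier XPF specs allowed, is rejected.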
Unlike Across Lite, clues in XPF are all in one section. Instead of determining grid numbers and which clues go with which answer by examining the grid, these attributes may be specified with each clue. This gives maximum flexibility. You can have unchecked squares, or missing grid numbers, or clues that start in the middle of words.
The following XML attributes for each <Clue> are available.
Row, Col, Num, and Dir are optional. For each <Puzzle> they should be either all included (recommended) or all omitted. If they are missing, those values are determined algorithmically, the same way Across Lite determines them. If they are included, they take precedence over the automatic calculations. This makes any grid numbering system possible. There is no requirement that each answer be clued.
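When Row, Col, Num, and Dir are omitted, a reader has to derive them from the grid. The Python sketch below uses the conventional crossword numbering rule, which is assumed here to match what Across Lite does. It scans left to right and top to bottom, and a square starts an Across entry when it has no open square to its left and an open square to its right; Down entries follow the same rule with the squares above and below. The function name number_grid and the direction strings "Across" and "Down" are placeholders for this example.

```python
def number_grid(rows):
    """Return (num, direction, row, col) tuples with 1-based row and col."""

    def is_open(r, c):
        # Off-grid positions, blocks ".", and missing squares "~" are closed.
        return (
            0 <= r < len(rows)
            and 0 <= c < len(rows[r])
            and rows[r][c] not in ".~"
        )

    entries = []
    num = 0
    for r in range(len(rows)):
        for c in range(len(rows[r])):
            if not is_open(r, c):
                continue
            starts_across = not is_open(r, c - 1) and is_open(r, c + 1)
            starts_down = not is_open(r - 1, c) and is_open(r + 1, c)
            if starts_across or starts_down:
                # One number is shared by both entries starting in a square.
                num += 1
                if starts_across:
                    entries.append((num, "Across", r + 1, c + 1))
                if starts_down:
                    entries.append((num, "Down", r + 1, c + 1))
    return entries


# Made-up 3x3 grid for illustration.
entries = number_grid(["AB.", "CDE", ".FG"])
```

Explicit attributes on the clues, when present, would replace this calculation entirely rather than being merged with it.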
Notepad can be any text so it is inside CDATA, meaning XML programs won't try to parse it and embedded HTML is allowed. Note that XPF can be extended with similar kinds of entries. For example, <JNotes> is not part of defined XPF but extensions like this do not affect standard XPF parsers.
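To illustrate both points, the Python sketch below reads a CDATA notepad and skips an element it does not recognize. The sample text is made up, and KNOWN_TAGS is a deliberately small, hypothetical subset of the defined XPF elements.

```python
import xml.etree.ElementTree as ET

# Made-up sample. <JNotes> stands in for any extension element.
SAMPLE = """<Puzzle>
  <Title>Demo</Title>
  <Notepad><![CDATA[See the <b>circled</b> squares.]]></Notepad>
  <JNotes>Private extension data</JNotes>
</Puzzle>"""

# Hypothetical subset of tags this reader understands.
KNOWN_TAGS = {"Title", "Notepad"}


def read_known(puzzle):
    """Return tag -> text for recognized children, ignoring everything else."""
    return {child.tag: child.text for child in puzzle if child.tag in KNOWN_TAGS}


fields = read_known(ET.fromstring(SAMPLE))
```

The XML parser hands back the CDATA content as ordinary text, so the embedded HTML arrives intact and it is up to the display program to decide whether to render it.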
The <Puzzle> section can be repeated as many times as desired.
Here are all the puzzles from October, 2009 in a single file.
Author, Title, Clue, and Answer strings should conform to XML rules. The two special characters & and < must be replaced by their respective XML escape sequences, &amp; and &lt;. (Your XML editor or generator may do this for you already. I use the .NET XmlTextWriter class which takes care of this automatically.)
All other special characters including quote, apostrophe, and foreign characters can be entered directly.
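For anyone generating XPF by hand rather than through an XML writer, the escaping can be done with the Python standard library. The two characters that XML does not allow unescaped in element text are & and <. This is a sketch only: the helper name clue_xml is hypothetical and the clue text is made up. Note that xml.sax.saxutils.escape also converts > to &gt;, which XML permits but does not require.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def clue_xml(text):
    """Wrap clue text in a <Clue> element, escaping & and < (and >)."""
    return "<Clue>" + escape(text) + "</Clue>"


original = 'Rock & roll < "jazz"'
fragment = clue_xml(original)

# Parsing the fragment again recovers the original text unchanged.
round_trip = ET.fromstring(fragment).text
```

Quotes and apostrophes pass through untouched in element text, consistent with the rule above.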
When the Puzzle node has the DIR=RTL attribute, the crossword should be displayed from right-to-left. There is no difference in the data; the difference should only be in display. This requires two changes in display:
The RTL attribute has no effect on rows.
Special thanks to Yariv Habot for this section of the XPF specification.
XPF is free to use, even for commercial applications. As is usual for such things, I maintain ownership of the specification and would appreciate attribution, but you can use it as you see fit. Have your lawyer click the logo below for details if she's concerned.
Any open standard is only useful if others adopt it. In an ideal world, Crossword Compiler, Across Lite, and every other editing and viewing program would add support for a common, open format, and every crossword publisher would accept submissions in this format as well.
I am starting the ball rolling by recommending XPF as that universal format, and by making recent puzzles in my database available in XPF.
XPF is the recommended format for reading, saving, and submitting crosswords but sometimes it would be convenient to have the data in JSON for client-side web programming. See my JSON proposal here.
XPF is still open for comments. Please contact me with your questions or your suggestions for improvement. Thank you.