specification

1. About

RFC8259 describes the JSON interchange format, which is widely used in application-level protocols including RESTful APIs. It is common for applications to request resources via the HTTP POST method with JSON entities because, per RFC7231 Section 4.3.1:

A payload within a GET request message has no defined semantics;
sending a payload body on a GET request might cause some existing
implementations to reject the request.

However, POST is suboptimal for requests which do not modify a resource’s state because it is not idempotent and limits or prevents client caching.

While one could simply percent-encode JSON text so that it is suitable for inclusion in an HTTP GET request, anything other than a trivial payload would be difficult for a human to read or modify, which is presumably why it is rarely done in practice. Alternatively, one could use something other than JSON, but there is good reason to choose JSON: it defines a simple but powerful data model, and it is very well known.
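To make the readability problem concrete, the following sketch percent-encodes a small JSON text using Python's standard library. The sample payload and the variable names are illustrative assumptions and are not part of this specification.

```python
import json
from urllib.parse import quote

# Hypothetical sample payload, chosen only for illustration.
payload = {"a": [1, 2]}

# Compact JSON text: {"a":[1,2]}
json_text = json.dumps(payload, separators=(",", ":"))

# Percent-encode every character that is not unreserved so the text can be
# carried in a query string.
encoded = quote(json_text, safe="")

print(encoded)  # %7B%22a%22%3A%5B1%2C2%5D%7D
```

Under the grammar defined in section 2, the same data could be written as (a:(1,2)).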

JSON→URL defines a text format for the JSON data model suitable for use within a URL/URI.

1.1 Terminology

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC2119 when, and only when, they appear in all capitals, as shown here.

The terms “JSON text”, “value”, “object”, “array”, “number”, “string”, “name”, and “member” in this document are to be interpreted as described in RFC8259.

1.2 Requirements Notation

This document uses the Augmented Backus-Naur Form (ABNF) notation described in RFC5234.

1.3 Latest Version

The latest version of this document may be found here: https://github.com/jsonurl/specification/

2. Grammar

RFC8259 describes the JSON data model, which includes objects, arrays, and value literals. This document defines a new grammar for the JSON data model called JSON→URL. It borrows heavily from RFC8259, so it is recommended that you read that document first.

JSON→URL text is a sequence of characters as defined by RFC3986, section 2. Percent-encoded octets MUST represent UTF-8 encoded Unicode code points. JSON→URL text complies with RFC3986, section 3.4 and is suitable for use as a URI query string.

Like JSON text, JSON→URL text is also a sequence of tokens. The set of tokens includes structural characters, strings, numbers, and three literal names.

JSON-URL-text = value

These are the four structural characters:

begin-composite    = %x28  ; ( left paren
end-composite      = %x29  ; ) right paren
name-separator     = %x3A  ; : colon
value-separator    = %x2C  ; , comma

Note that while RFC8259 defines one set of tokens for arrays and another set for objects, JSON→URL defines only a single set for both. This change is due to the limitations imposed by the query production defined in RFC3986. With one exception, arrays and objects are semantically no different here than in RFC8259; only the grammar differs. A single lookahead token allows a JSON→URL parser to distinguish between arrays and objects in all cases except when the object or array has no members.

Whitespace MUST NOT be used.

2.1 Values

A JSON→URL value MUST be a composite, number, string, or one of three literal names. The three literals and the number production are defined exactly as in RFC8259. Object, array, and string are defined in the following sections of this document.

value = false / null / true / composite / number / string

2.2 Composites

A composite value is an array, object, or the empty-composite value which represents an array/object with no members. In JSON→URL, an empty array and empty object are indistinguishable. This constitutes the only difference between the data model defined in this document and RFC8259.

composite       = empty-composite / object / array
empty-composite = begin-composite end-composite ; ()

2.3 Objects

An object structure is represented as a pair of parentheses surrounding one or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name.

object = begin-composite member *( value-separator member ) end-composite
member = string name-separator value

2.4 Arrays

An array structure is represented as a pair of parentheses surrounding one or more values. Multiple values are separated by commas.

array = begin-composite value *( value-separator value ) end-composite

As in RFC8259, there is no requirement that the values in an array be of the same type.
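Because objects and arrays share the same delimiters, a parser has to look past the first value to tell them apart. The sketch below illustrates that lookahead: it scans until the first top-level colon, comma, or closing parenthesis. The function name classify_composite and its return labels are hypothetical, and the function only classifies the text; it does not validate it.

```python
def classify_composite(text: str) -> str:
    """Return "empty", "object", or "array" for JSON->URL composite text."""
    if not text.startswith("("):
        raise ValueError("a composite must begin with '('")
    if text.startswith("()"):
        return "empty"

    depth = 0            # nesting depth below the outermost composite
    value_start = True   # True when the next character begins a value
    i = 1
    while i < len(text):
        ch = text[i]
        if value_start and ch == "'":
            # Quoted string: structural characters inside it are literal.
            close = text.find("'", i + 1)
            if close == -1:
                raise ValueError("unterminated quoted string")
            i = close + 1
            value_start = False
            continue
        if ch == "(":
            depth += 1
            value_start = True
        elif ch == ")":
            if depth == 0:
                return "array"   # a single value, no separator seen
            depth -= 1
            value_start = False
        elif depth == 0 and ch == ":":
            return "object"      # first token was a member name
        elif depth == 0 and ch == ",":
            return "array"       # first token was a value
        else:
            value_start = ch in ",:"
        i += 1
    raise ValueError("unterminated composite")


print(classify_composite("(key:value)"))   # object
print(classify_composite("(a,b,c)"))       # array
print(classify_composite("()"))            # empty
```

Only the empty composite remains undecidable, which matches the single data model difference described in section 2.2.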

2.5 Strings

Though semantically equivalent to a string as defined in RFC8259, the grammar for a JSON→URL string is quite different. A JSON→URL string MAY be surrounded by single quotes (a.k.a. apostrophes); however, it is only necessary to do so when the value would otherwise be ambiguous. In practice, this means single quotes MUST be used to represent a string whose text would otherwise be read as true, false, null, or a number. The empty string also has to be written as '' because an unquoted string requires at least one character. Otherwise, their use is OPTIONAL. Object keys are always assumed to be strings and need not be quoted even if they would otherwise be interpreted as a number, Boolean, or null.

When used in a string literal, a plus character (U+002B) represents a single space character (U+0020), just as it does in the x-www-form-urlencoded type.

All text must comply with the query production defined in RFC3986, section 3.4. Therefore, any characters outside the set of allowed literal characters MUST be percent encoded.

There is a meaningful difference between a structural character and a percent-encoded structural character. A parser MUST interpret an encoded structural character as part of a string literal. When not encoded, the character retains its structural meaning, except inside a quoted string, where structural characters need not be encoded.

string        = uchar *(uchar / apos) ; unquoted string
              / apos *qchar apos      ; quoted string

uchar         = unencoded / pct-encoded / space-encoded

qchar         = uchar / struct-char

unencoded     = digit
              / %x41-5A            ; A-Z  uppercase letters
              / %x61-7A            ; a-z  lowercase letters
              / %x2D               ; -    dash
              / %x2E               ; .    period
              / %x5F               ; _    underscore
              / %x7E               ; ~    tilde
              / %x21               ; !    exclamation point
              / %x24               ; $    dollar sign
              / %x2A               ; *    asterisk
              / %x2F               ; /    solidus
              / %x3B               ; ;    semicolon
              / %x3F               ; ?    question mark
              / %x40               ; @    at sign
              
apos          = %x27               ; '    single quote/apostrophe

struct-char   = %x28               ; (    open paren
              / %x29               ; )    close paren
              / %x2C               ; ,    comma
              / %x3A               ; :    colon

pct-encoded   = %x25 hexdig hexdig ; %XX  percent encoded

space-encoded = %x2B               ; +    plus sign

hexdig        = digit / %x41-46    ; A-F  hexadecimal digits
digit         = %x30-39            ; 0-9  digits
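The string rules above can be sketched as a pair of helper functions built on Python's urllib.parse. The names encode_string and decode_string are hypothetical, and the encoder makes one particular choice that the grammar permits but does not require: it percent-encodes everything outside the unencoded set and only adds quotes when the result would otherwise be ambiguous.

```python
import re
from urllib.parse import quote_plus, unquote_plus

# Characters from the "unencoded" production that quote_plus would otherwise
# escape; letters, digits, "-", ".", "_" and "~" are left as-is by default.
SAFE = "!$*/;?@"

# RFC 8259 number syntax, used to spot strings that would be read as numbers.
NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def encode_string(value: str) -> str:
    """Encode a Python string as a JSON->URL string literal."""
    # Spaces become "+", a literal "+" becomes %2B, and every other character
    # outside the unencoded set is percent-encoded as UTF-8.
    text = quote_plus(value, safe=SAFE)
    # Quote only when the bare text would be empty or read as another type.
    if text in ("", "true", "false", "null") or NUMBER.fullmatch(text):
        return "'" + text + "'"
    return text


def decode_string(text: str) -> str:
    """Decode a JSON->URL string literal (quoted or unquoted)."""
    if text.startswith("'"):
        if len(text) < 2 or not text.endswith("'"):
            raise ValueError("unterminated quoted string")
        text = text[1:-1]
    # "+" becomes a space and %XX sequences are decoded as UTF-8.
    return unquote_plus(text, errors="strict")


print(encode_string("Hello, World!"))     # Hello%2C+World!
print(encode_string("true"))              # 'true'
print(decode_string("'Hello,+World!'"))   # Hello, World!
```

Note that decode_string assumes the caller already knows the token is a string; deciding between a string, a number, and a literal name is a separate step.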

2.6 Numbers

Numbers are represented exactly as defined in RFC8259, Section 6. Note that when used in a number the plus character (U+002B) is literal and MUST NOT be interpreted by a JSON→URL parser as a space character (as it would be in a string literal).
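One way to apply this rule is to test a token against the RFC8259 number syntax before any string decoding takes place, so that a plus sign in an exponent is never turned into a space. The following is a minimal sketch under that assumption; parse_atom is a hypothetical name, and the token is assumed to have already been isolated from any surrounding structural characters.

```python
import json
import re
from urllib.parse import unquote_plus

# RFC 8259 number syntax; note the literal "+" permitted in the exponent.
NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")
LITERALS = {"true": True, "false": False, "null": None}


def parse_atom(token: str):
    """Interpret a non-composite token as a literal, number, or string."""
    if token in LITERALS:
        return LITERALS[token]
    if NUMBER.fullmatch(token):
        # The token is a number, so "+" is kept as-is; json.loads accepts
        # exactly this syntax.
        return json.loads(token)
    # Anything else is a string: drop optional quotes, then decode "+" as a
    # space and %XX sequences as UTF-8.
    if len(token) >= 2 and token.startswith("'") and token.endswith("'"):
        token = token[1:-1]
    return unquote_plus(token)


print(parse_atom("1e+2"))   # 100.0
print(parse_atom("a+2"))    # a 2
print(parse_atom("'42'"))   # 42 (as a string)
```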

2.7 Whitespace

The grammar defined in RFC8259 allows for “insignificant whitespace” as it can make it easier for the human eye to parse JSON text. However, unescaped whitespace is not allowed in a URL and escaped whitespace would likely make it more difficult for the human eye to parse. Therefore, unescaped whitespace MUST NOT be present in JSON→URL text. Escaped whitespace MAY be present; however, it is always considered significant.

2.8 x-www-form-urlencoded

JSON→URL text is designed to play well with x-www-form-urlencoded data. JSON→URL text MUST percent-encode literal & and = characters. This allows one or more traditional HTML form variables to be standalone JSON→URL text.

2.9 Optional Syntaxes

A JSON→URL parser implementation MAY support additional syntax options. Implementations SHOULD default to the grammar described above and only allow an alternate syntax when explicitly enabled. Each section defines a syntax option as a modification of previously defined grammar productions and/or definition of new ones.

2.9.1 Implied Arrays

If both a sender and its receiver agree a priori that the top-level value is an array then a parser MAY accept JSON→URL text that omits the first begin-composite and last end-composite characters.

implied-array = [value] *( value-separator value )

Note that, unlike an array, an implied-array may be empty (i.e. contain zero values). There is no ambiguity in this case and the parse result MAY be an array rather than the empty-composite.

implied-array is OPTIONAL. A JSON→URL parser is not required to support it.
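A parser that supports implied arrays needs to find the top-level commas without being misled by commas inside nested composites or quoted strings. The sketch below shows one way to do that; split_implied_array is a hypothetical name, and the function returns the raw text of each value so that a separate value parser can interpret it.

```python
def split_implied_array(text: str) -> list:
    """Split implied-array text into the raw text of its top-level values."""
    if text == "":
        return []            # an implied array may contain zero values

    values = []
    depth = 0                # nesting depth of parentheses
    start = 0                # index where the current value begins
    value_start = True       # True when the next character begins a value
    i = 0
    while i < len(text):
        ch = text[i]
        if value_start and ch == "'":
            # Quoted string: skip to the closing apostrophe.
            close = text.find("'", i + 1)
            if close == -1:
                raise ValueError("unterminated quoted string")
            i = close + 1
            value_start = False
            continue
        if ch == "(":
            depth += 1
            value_start = True
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
            value_start = False
        elif ch == "," and depth == 0:
            values.append(text[start:i])
            start = i + 1
            value_start = True
        else:
            value_start = ch in ",:"
        i += 1
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    values.append(text[start:])
    return values


print(split_implied_array("a,b,(nested,array)"))  # ['a', 'b', '(nested,array)']
```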

2.9.2 Implied Objects

If both a sender and its receiver agree a priori that the top-level value is an object then a parser MAY accept JSON→URL text that omits the first begin-composite and last end-composite characters.

implied-object = [member] *( value-separator member )

Note that, unlike an object, an implied-object may be empty (i.e. contain zero members). There is no ambiguity in this case and the parse result MAY be an object rather than the empty-composite.

implied-object is OPTIONAL. A JSON→URL parser is not required to support it.

2.9.3 x-www-form-urlencoded Arrays and Objects

A parser MAY accept x-www-form-urlencoded style separators as structural characters for a top-level array or object.

wfu-name-separator      = %x3D  ; = equal 
wfu-value-separator     = %x26  ; & ampersand

wfu-composite           = empty-composite / wfu-object / wfu-array

wfu-object = begin-composite wfu-member *( wfu-value-separator wfu-member ) end-composite
wfu-member = string wfu-name-separator value

wfu-array  = begin-composite value *( wfu-value-separator value ) end-composite

This may be combined with an implied array or object to form a URL query string that is a syntactically valid, traditional HTML form data query string.

wfu-implied-composite = wfu-implied-object / wfu-implied-array

wfu-implied-object    = [wfu-member] *( wfu-value-separator wfu-member )

wfu-implied-array     = [value] *( wfu-value-separator value )

Note that wfu-composite and wfu-implied-composite indirectly reference qchar, which allows unencoded literal , and : characters but requires literal & and = characters to be encoded. This is intentional, and allows JSON→URL text to meet the goal outlined in section 2.8.

wfu-composite and wfu-implied-composite are OPTIONAL. A JSON→URL parser is not required to support them.

2.9.4 Implied Object Missing Values

If a parser supports implied-object and/or wfu-implied-object then it MAY accept JSON→URL text where values (and their respective name-separators) are omitted for the top-level implied object.

mv-implied-object = [mv-member] *( value-separator mv-member )
mv-member = string [name-separator value]

mv-wfu-implied-object = [mv-wfu-member] *( wfu-value-separator mv-wfu-member )
mv-wfu-member = string [wfu-name-separator value]

mv-implied-object and mv-wfu-implied-object are OPTIONAL. A JSON→URL parser is not required to support them. A JSON→URL parser implementation that does support them SHOULD provide a mechanism that allows the caller to supply the default value assigned to members whose value is omitted.
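As an illustration of mv-wfu-implied-object, the sketch below splits a query string into members and substitutes a caller-supplied default for any member that has no value. Because literal & and = characters must be percent-encoded everywhere else in JSON→URL text, plain string splitting is sufficient at the top level. The function name, the default parameter, and the decision to return raw (unparsed) value text are all assumptions made for this example.

```python
from urllib.parse import unquote_plus


def parse_mv_wfu_implied_object(text: str, default=True) -> dict:
    """Map member names to raw value text, or to `default` when omitted."""
    result = {}
    if text == "":
        return result        # an implied object may contain zero members

    for field in text.split("&"):
        name, separator, raw_value = field.partition("=")
        if name == "":
            raise ValueError("member name must not be empty")
        # Names are always strings: drop optional quotes, then decode.
        if len(name) >= 2 and name.startswith("'") and name.endswith("'"):
            name = name[1:-1]
        key = unquote_plus(name)
        # A member without "=" has no value, so the default is used instead.
        result[key] = raw_value if separator else default
    return result


print(parse_mv_wfu_implied_object("key=value&marker&nested=(key:value)"))
# {'key': 'value', 'marker': True, 'nested': '(key:value)'}
```

The raw value text would then be handed to a value parser; that step is omitted here.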

2.9.5 Empty Objects and Arrays

Section 2.2 defines the empty composite and states that there is no distinction between an empty object and an empty array. A parser MAY distinguish between the two by defining composite as follows.

composite       = empty-object / empty-array / object / array
empty-array     = begin-composite end-composite ; ()
empty-object    = begin-composite name-separator end-composite ; (:)

2.9.6 Address Bar Query String Friendly

A web browser address bar will often percent encode characters even when it’s not necessary to do so. In particular, apostrophes are encoded despite the fact that they are identified by RFC3986 as sub-delims. This interferes with the JSON→URL string encoding strategy outlined above. There can’t be a meaningful difference between ' and %27 if the browser changes one to the other at its own discretion.

The address bar query string friendly (AQF) syntax allows JSON→URL to work in this context. It relies on escaping rather than encoding (or quoting) to distinguish between characters that are part of a string and characters that are structural or form one of the other literal values. Except for ampersand (%x26), equals (%x3D), and plus (%x2B), a parser MUST decode each Unicode code point before it is evaluated, which ensures the browser’s encoding preferences do not affect how a JSON→URL parser interprets the character sequence.

The AQF syntax modifies the string production as follows.

string    = 1*char

char      = uchar / apos / esc-seq

esc-seq   = escape eval

escape    = %x21               ; !

eval      = struct-char
          / digit
          / %x2B               ; +    plus
          / %x2D               ; -    dash
          / %x21               ; !    exclamation point
          / %x65               ; e    the empty string
          / %x66               ; f    lowercase f
          / %x6E               ; n    lowercase n
          / %x74               ; t    lowercase t

An escape sequence begins with an exclamation point and MUST be followed by another character, the escape value. The set of valid escape values is defined by eval.

Note that, like any other octet, the exclamation point escape character itself and/or eval may be percent encoded.

Because browsers do recognize a meaningful difference between literal and encoded ampersand and equals characters in a query string, the AQF syntax retains that distinction. This allows it to continue to work with x-www-form-urlencoded arrays and objects.

The AQF syntax is OPTIONAL. A JSON→URL parser is not required to support it.
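The following sketch shows how an AQF string token might be decoded once it has been isolated from the surrounding structure. It is an interpretation of the rules above, not a reference implementation. The name decode_aqf_string is hypothetical, and the sketch assumes that !e denotes the empty string only when it is the entire token.

```python
import re
from urllib.parse import unquote

# Encoded ampersand, equals, and plus are not decoded before evaluation.
PROTECTED = re.compile(r"(%26|%3[Dd]|%2[Bb])")
# Characters allowed after the "!" escape character (the eval production).
EVAL = set("(),:0123456789+-!efnt")


def decode_aqf_string(token: str) -> str:
    """Decode one AQF string token into the string it represents."""
    # Build (character, protected) units. Everything except the protected
    # sequences is percent-decoded first, so "%21" behaves exactly like "!".
    units = []
    for index, part in enumerate(PROTECTED.split(token)):
        if index % 2:   # odd indexes hold the protected %XX sequences
            units.append((unquote(part), True))
        else:
            units.extend((ch, False) for ch in unquote(part, errors="strict"))

    # Assumption: "!e" stands for the empty string only as a whole token.
    if [ch for ch, _ in units] == ["!", "e"]:
        return ""

    out = []
    i = 0
    while i < len(units):
        ch, protected = units[i]
        if ch == "!" and not protected:
            # Escape sequence: the next character is taken literally.
            if i + 1 >= len(units) or units[i + 1][0] not in EVAL:
                raise ValueError("invalid escape sequence")
            out.append(units[i + 1][0])
            i += 2
        elif ch == "+" and not protected:
            out.append(" ")   # a literal plus encodes a space
            i += 1
        else:
            out.append(ch)    # includes "+", "&", "=" that arrived encoded
            i += 1
    return "".join(out)


print(decode_aqf_string("Hello!,+World!!"))  # Hello, World!
print(decode_aqf_string("!true"))            # true
```

Splitting AQF text into tokens also has to honor escape sequences, since an escaped structural character does not end a token; that step is not shown here.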

3. Examples

Here are a few examples.

3.1 String Literal

Here are some string literals:

word
two+words
Hello%2C+World!
'Hello,+World!'
'true'
'42'

3.2 Number Literal

Here are some number literals:

0
1.0
1e2
-3e4
42

3.3 Object

Here are some objects:

(key:value)
(Hello:World!)
(key:value,nested:(key:value))

3.4 Array

Here are some arrays:

(1)
(1,2,3)
(a,b,c)
(a,b,(nested,array))
(array,of,objects,(object:1),(object:2))

3.5 Implied Array

Here are some implied arrays:

1
1,2,3
a,b,c
a,b,(nested,array)
array,with,objects,(object:1),(object:2)

3.6 Implied Object

Here are some implied objects:

key:value
Hello:World!
key:value,nested:(key:value)

3.7 x-www-form-urlencoded Implied Array

Here are some implied arrays that make use of x-www-form-urlencoded style separators:

1
1&2&3
a&b&c
a&b&(nested,array)
array&with&objects&(object:1)&(object:2)

3.8 x-www-form-urlencoded Implied Object

Here are some implied objects that make use of x-www-form-urlencoded style separators:

key=value
Hello=World!
key=value&nested=(key:value)

3.9 Implied Object With Missing Values

Here are some implied objects with missing values:

key
key,Hello=World!
key=value&marker&nested=(key:value)

3.10 Address Bar Query String Friendly

Here are some AQF values:

(Hello:World!!)
(key:value,strings:(a,!true,c,!3.14,!-5))
(1,2,3,Hello!,+World!!)
(a,!e,c)