RDL

Resource Description Language


Overview

RDL is a machine-readable schema language that describes data types and the resources that use those types. A schema can describe HTTP web services, serve as the source of truth for data encoding mechanisms such as Protocol Buffers and Avro, and augment JSON and other encoding schemes with data validation.

Types are defined by deriving from an already defined type, so every type derives (perhaps indirectly) from a primitive base type. Each base type may offer options that further restrict the derived type.
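As an illustration of the derivation chain described above, the following Python sketch follows parent links until it reaches a primitive base type. The type names `Degrees` and `Latitude` and the `min`/`max` option names are hypothetical, chosen only for this example; they are not taken from the RDL specification.

```python
# Primitive base types, as listed in this document.
PRIMITIVES = {
    "Null", "Bool", "Int8", "Int16", "Int32", "Int64", "Float32", "Float64",
    "Bytes", "String", "Symbol", "UUID", "Timestamp",
    "Array", "Map", "Enum", "Union", "Struct",
}

# Hypothetical derived types: name -> (parent type, restricting options).
DERIVED = {
    "Degrees": ("Float64", {}),
    "Latitude": ("Degrees", {"min": -90.0, "max": 90.0}),
}


def resolve_base(name):
    """Return (primitive base type, merged options) for a type name."""
    options = {}
    seen = set()
    while name not in PRIMITIVES:
        if name in seen or name not in DERIVED:
            raise ValueError("unknown or cyclic type: " + name)
        seen.add(name)
        parent, opts = DERIVED[name]
        # Options set on the more derived type take precedence.
        for key, value in opts.items():
            options.setdefault(key, value)
        name = parent
    return name, options
```

Here `resolve_base("Latitude")` walks `Latitude` to `Degrees` to `Float64` and collects the restricting options along the way.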

For more information and source code, see the GitHub repository.

Syntax

RDL's syntax is similar to that of C and Java and should look familiar to most programmers. For example:

type Point Struct {
    Int32 x;
    Int32 y;
}
            
The syntax is defined by an EBNF grammar, which has been used to generate a visual railroad diagram.

Primitive Types

Name       Description
Null       No value
Bool       Either `true` or `false`
Int8       An 8-bit signed integer
Int16      A 16-bit signed integer
Int32      A 32-bit signed integer
Int64      A 64-bit signed integer
Float32    A single-precision (32-bit) IEEE 754 floating-point number
Float64    A double-precision (64-bit) IEEE 754 floating-point number
Bytes      A sequence of 8-bit bytes
String     A sequence of Unicode characters, encoded as UTF-8
Symbol     A simple identifier, like a string but restricted in the characters accepted, generally following what most languages would consider a valid variable name
UUID       A universally unique identifier, as defined by RFC 4122 [UUID]
Timestamp  An instant in time, expressed as a floating-point number of seconds since 1970. May also be represented as a string in UTC as described in RFC 3339 [Timestamp]
Array      An ordered collection of other values
Map        An unordered mapping of keys to values
Enum       An enumerated set of symbolic identifiers
Union      A tagged union of other types
Struct     An ordered collection of named fields, describable by a schema

 

Note: all type names in RDL are case-insensitive. Capitalized types are used in this document.
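The two Timestamp representations described above (floating-point seconds since 1970, and an RFC 3339 string in UTC) can be converted into one another. The following Python sketch shows one way to do so; the function names are illustrative, and the choice of millisecond precision in the string form is an assumption of this example, not something the RDL documentation specifies.

```python
from datetime import datetime, timezone


def timestamp_to_rfc3339(seconds):
    """Format seconds since 1970-01-01T00:00:00Z as an RFC 3339 UTC string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # isoformat() emits "+00:00" for UTC; RFC 3339 also allows the shorter "Z".
    return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def rfc3339_to_timestamp(text):
    """Parse an RFC 3339 UTC string ending in "Z" into seconds since 1970."""
    moment = datetime.fromisoformat(text.replace("Z", "+00:00"))
    return moment.timestamp()
```

Because the string form in this sketch keeps only milliseconds, a round trip through it may drop finer-grained fractions of a second.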

Representation

A type definition like this is compiled to a Schema, a data structure that describes the type definitions. Although schemas could be written directly as data, e.g. in JSON or YAML, the RDL source is designed to be more expressive, less noisy, and easier to diff in a source control system. The Point type defined above would be expressed as the following schema, shown here in JSON:

{
    "types": [
        {
            "StructTypeDef": {
                "type": "Struct",
                "name": "Point",
                "fields": [
                    {
                        "name": "x",
                        "type": "Int32"
                    },
                    {
                        "name": "y",
                        "type": "Int32"
                    }
                ]
            }
        }
    ]
}
          

Each element of the types array is of type Type, which is a Union of a variety of type definition structures. The format of the Schema data structure is itself defined in RDL; see rdl.rdl for this definition.
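To show how such a schema can drive data validation, here is a minimal Python sketch that loads the Point schema shown above and checks a decoded value against it. It is not the project's own validator: the function names are hypothetical, it handles only integer fields, and it assumes every field is required.

```python
import json

SCHEMA = json.loads("""
{
    "types": [
        {
            "StructTypeDef": {
                "type": "Struct",
                "name": "Point",
                "fields": [
                    {"name": "x", "type": "Int32"},
                    {"name": "y", "type": "Int32"}
                ]
            }
        }
    ]
}
""")

# Bit widths of the signed integer primitives.
INT_BITS = {"Int8": 8, "Int16": 16, "Int32": 32, "Int64": 64}


def find_struct(schema, name):
    """Return the struct definition with the given name."""
    for entry in schema["types"]:
        # Each entry is a union: its single key names the variant.
        typedef = entry.get("StructTypeDef")
        if typedef is not None and typedef["name"] == name:
            return typedef
    raise KeyError(name)


def validate_struct(schema, type_name, value):
    """Check that value has every field, each an integer within range."""
    for field in find_struct(schema, type_name)["fields"]:
        bits = INT_BITS.get(field["type"])
        if bits is None:
            raise NotImplementedError(field["type"])
        item = value.get(field["name"])
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(item, bool) or not isinstance(item, int):
            return False
        if not -(2 ** (bits - 1)) <= item < 2 ** (bits - 1):
            return False
    return True
```

For example, `validate_struct(SCHEMA, "Point", {"x": 1, "y": 2})` succeeds, while a value with a missing `y` or an out-of-range `x` is rejected.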

Type Mappings

The following table maps RDL types to several other common type systems; notes follow the table:

RDL        JSON           Protobuf                 Avro         Hive           XSD
Null       null           - [1]                    null         - [1]          - [1]
Bool       true or false  bool                     boolean      boolean        boolean
Int8       number [2]     sint32 [3]               int [4]      tinyint        byte
Int16      number [2]     sint32 [3]               int [4]      smallint       short
Int32      number [2]     sint32 [5]               int [6]      int [5]        integer
Int64      number [2]     sint64 [5]               long [6]     bigint [5]     long
Float32    number [2]     float [5]                float [6]    float [5]      float
Float64    number [2]     double [5]               double [6]   double [5]     double
Bytes      string [7]     bytes                    bytes        binary         hexBinary
String     string [8]     string [8]               string [9]   string [8]     string
Symbol     string [8]     string [8]               string [9]   string [8]     string
UUID       string [10]    string [10]              string [11]  string [10]    string [12,13]
Timestamp  string [2]     double [2]               double [2]   double [2]     dateTime [2]
Array      array          repeated <V> [14]        array [14]   array [14,8]   sequence
Map        object         repeated T<K,V> [14,15]  map [14,13]  map [14,8]     all
Struct     object         message                  record       struct         all
Enum       string [2]     enum                     enum         string [2]     string [2]
Union      value [2]      message optional [2]     union        union [8]      union
 
Notes:
[1] null is not supported in this representation
[2] type information is lost
[3] mapped to larger size number, original type is lost
[4] mapped to larger size number, original type becomes an annotation
[5] mapped to the number, subtype info is lost
[6] mapped to the number, subtype info becomes an annotation
[7] base64 url-friendly encoding, subtype info is lost
[8] subtype information is lost
[9] subtype is preserved as an annotation
[10] RFC 4122 string, type information is lost
[11] fixed[16], type is preserved as an annotation
[12] RFC 4122 format URN, e.g. "urn:uuid:fae891e0-0538-11e3-851b-d875f41b36e4"
[13] keys converted to string, key type lost
[14] item type required
[15] key type required
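
Notes [7], [10], and [12] describe string encodings that the Python standard library can produce directly. The sketch below is illustrative only: the function names are hypothetical, and whether RDL implementations keep or strip the trailing `=` padding of the base64 form is not stated here, so the padding is simply left in place.

```python
import base64
import uuid


def bytes_to_json_string(data):
    """Encode Bytes for JSON using the URL-friendly base64 alphabet (note [7])."""
    return base64.urlsafe_b64encode(data).decode("ascii")


def uuid_to_string(value):
    """Render a UUID in the RFC 4122 string form (note [10])."""
    return str(value)


def uuid_to_urn(value):
    """Render a UUID as an RFC 4122 URN (note [12])."""
    return value.urn


example = uuid.UUID("fae891e0-0538-11e3-851b-d875f41b36e4")
```

With these definitions, `uuid_to_urn(example)` yields the URN quoted in note [12], and the bytes `0xfb 0xff` encode to `-_8=` rather than the `+/8=` of standard base64.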

 

Note that most JSON implementations hold numbers in a `double`, which represents integers exactly only up to 2^53 in magnitude, so the full Int64 range cannot be represented accurately. Most other types can be represented in JSON (usually as strings), but type information is lost. Decoding with a schema can recover this information.
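
The Int64 limitation can be seen by passing integers through a 64-bit float, which is what a `double`-based JSON implementation effectively does. This Python sketch uses a hypothetical helper name and is only a demonstration of the precision loss, not part of RDL:

```python
def survives_double(n):
    """Return True if integer n is unchanged after a round trip through a double."""
    return int(float(n)) == n


# Largest power of two up to which every integer is exactly representable.
MAX_EXACT = 2 ** 53

small_ok = survives_double(MAX_EXACT)          # exactly representable
next_lost = survives_double(MAX_EXACT + 1)     # rounds to a neighbouring value
int64_max_lost = survives_double(2 ** 63 - 1)  # largest Int64 value
```

Here `small_ok` is true, while `next_lost` and `int64_max_lost` are both false, which is why a consumer that needs the full Int64 range cannot rely on a plain JSON number.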

For Protobuf, note that not all types can be derived from. Number, Bool, and String types are encoded as their base type, and other type information is lost.

Avro uses JSON to represent schemas, and a type structure can generally be annotated with additional information, for example the RDL schema object itself. This can be used to preserve type (and subtype) information, but after decoding, post-processing must be done to recover that information.
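
As a sketch of the annotation approach, the Python below builds an Avro type in its JSON object form and attaches the original RDL type name as an extra attribute, which the Avro specification permits as metadata. The attribute name `rdlType` and the helper names are assumptions made for this example; the RDL tooling may use a different convention.

```python
# Avro base types for the RDL numeric types, following the table above.
AVRO_BASE = {
    "Int8": "int",
    "Int16": "int",
    "Int32": "int",
    "Int64": "long",
    "Float32": "float",
    "Float64": "double",
}


def to_avro_type(rdl_type):
    """Map an RDL numeric type to an annotated Avro type object."""
    # "rdlType" is a hypothetical metadata attribute, ignored by Avro encoding.
    return {"type": AVRO_BASE[rdl_type], "rdlType": rdl_type}


def recover_rdl_type(avro_type):
    """Post-processing step: read the annotation back after decoding."""
    return avro_type.get("rdlType")


# A record with one narrowed field; the record and field names are illustrative.
record = {
    "type": "record",
    "name": "Sample",
    "fields": [{"name": "level", "type": to_avro_type("Int8")}],
}
```

An Avro reader sees `level` as a plain `int`; only code that inspects the `rdlType` attribute learns that the field was originally an Int8.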

References