Overview
RDL is a machine-readable description of a schema that defines data types as well as resources that use those types. Such a schema can describe HTTP web services, serve as the source of truth for data encoding mechanisms like Protocol Buffers and Avro, and augment JSON and other encoding schemes by providing data validation.
Types are defined by deriving from an already defined type. Every type is thus derived (perhaps indirectly) from a primitive base type. For each base type, various options may be available to further restrict the type.
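The derivation rule above can be sketched in Python. This is only an illustration, not RDL's implementation: the `IntTypeDef` class, the `min` and `max` restriction options, and the `Percent` and `HighPercent` types are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Inclusive bounds of the signed integer base types.
INT_BOUNDS = {
    "int8": (-2**7, 2**7 - 1),
    "int16": (-2**15, 2**15 - 1),
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}

@dataclass
class IntTypeDef:
    """Hypothetical model of an integer type derived from another type."""
    name: str
    base: str                  # name of the type this one derives from
    min: Optional[int] = None  # assumed restriction option
    max: Optional[int] = None  # assumed restriction option

def validate_int(value, type_name, registry):
    """Follow the derivation chain to a base type, checking each restriction."""
    key = type_name.lower()
    while key not in INT_BOUNDS:
        typedef = registry[key]
        if typedef.min is not None and value < typedef.min:
            return False
        if typedef.max is not None and value > typedef.max:
            return False
        key = typedef.base.lower()
    low, high = INT_BOUNDS[key]
    return low <= value <= high

# Percent derives from Int8; HighPercent derives (indirectly) from Int8 via Percent.
registry = {
    "percent": IntTypeDef("Percent", "Int8", min=0, max=100),
    "highpercent": IntTypeDef("HighPercent", "Percent", min=50),
}
```

A value is accepted only if it satisfies every restriction along the chain, so `HighPercent` inherits the upper bound of `Percent` and the range of `Int8`.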
For more information and source code, see the GitHub repository.
Syntax
RDL's syntax is similar to C and Java and should look familiar to most programmers, e.g.
    type Point Struct { Int32 x; Int32 y; }
The syntax is defined by an EBNF grammar, which has been used to generate a visual railroad diagram.
Primitive Types
Name | Description |
---|---|
Null | No value |
Bool | Either `true` or `false` |
Int8 | An 8-bit signed integer |
Int16 | A 16-bit signed integer |
Int32 | A 32-bit signed integer |
Int64 | A 64-bit signed integer |
Float32 | A single-precision (32-bit) IEEE 754 floating-point number |
Float64 | A double-precision (64-bit) IEEE 754 floating-point number |
Bytes | A sequence of 8-bit bytes |
String | A sequence of Unicode characters encoded as UTF-8 |
Symbol | A simple identifier, like a string but restricted in the characters accepted, generally following what most languages would consider a valid variable name |
UUID | A universally unique identifier, as defined by RFC 4122 [UUID] |
Timestamp | An instant in time, expressed as a floating-point number of seconds since 1970. May also be represented as a string in UTC, as described in RFC 3339 [Timestamp] |
Array | An ordered collection of other values |
Map | An unordered mapping of keys to values |
Enum | An enumerated set of symbolic identifiers |
Union | A tagged union of other types |
Struct | An ordered collection of named fields, describable by a schema |
Note: all type names in RDL are case-insensitive. Capitalized types are used in this document.
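The two Timestamp representations described above (seconds since 1970 as a floating-point number, and an RFC 3339 string in UTC) can be converted into each other. The following is a minimal sketch using only the Python standard library; the function names and the choice of millisecond precision in the string form are assumptions made for the example, not something RDL prescribes.

```python
from datetime import datetime, timezone

def timestamp_to_rfc3339(seconds: float) -> str:
    """Render seconds since 1970 as an RFC 3339 string in UTC."""
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Millisecond precision is an assumption of this sketch.
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")

def rfc3339_to_timestamp(text: str) -> float:
    """Parse an RFC 3339 UTC string back into seconds since 1970."""
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so it is rewritten as an explicit UTC offset first.
    dt = datetime.fromisoformat(text.replace("Z", "+00:00"))
    return dt.timestamp()
```

For example, `timestamp_to_rfc3339(0.0)` yields `1970-01-01T00:00:00.000Z`.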
Representation
Such a structured type definition gets compiled to a `Schema`, a data structure that describes the typedefs. Although schemas could be written directly as data, e.g. in JSON or YAML, the RDL source is designed to be more expressive, less noisy, and easier to diff in a source control system. The `Point` type defined above would be expressed as the following schema, shown here in JSON:
{ "types": [ { "StructTypeDef": { "type": "Struct", "name": "Point", "fields": [ { "name": "x", "type": "Int32" }, { "name": "y", "type": "Int32" } ] } } ] }
Each of the types in the array is of type `Type`, which is a Union of a variety of type definition structures. The format of the Schema data structure is itself defined in RDL; see rdl.rdl for this definition.
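Because the schema is plain data, a program can load it and check values against it. The sketch below does this for the `Point` schema shown above. It is not the RDL library's API: the helper names are hypothetical, only `Int32` fields are handled, and every field is treated as required, which is an assumption of the example.

```python
import json

# The Point schema from this document, as JSON text.
SCHEMA_JSON = """
{ "types": [ { "StructTypeDef": { "type": "Struct", "name": "Point",
  "fields": [ { "name": "x", "type": "Int32" },
              { "name": "y", "type": "Int32" } ] } } ] }
"""

def index_struct_types(schema):
    """Map each struct name (lowercased, names are case-insensitive) to its fields."""
    structs = {}
    for entry in schema["types"]:
        typedef = entry.get("StructTypeDef")
        if typedef is not None:
            structs[typedef["name"].lower()] = typedef["fields"]
    return structs

def is_int32(value):
    """True for integers in the signed 32-bit range (bool is excluded)."""
    return (isinstance(value, int) and not isinstance(value, bool)
            and -2**31 <= value <= 2**31 - 1)

def validate_struct(value, type_name, structs):
    """Check a decoded JSON object against a struct definition."""
    if not isinstance(value, dict):
        return False
    for field in structs[type_name.lower()]:
        if field["type"].lower() != "int32":
            raise NotImplementedError("this sketch only handles Int32 fields")
        # Assumption: every field is required.
        if not is_int32(value.get(field["name"])):
            return False
    return True

structs = index_struct_types(json.loads(SCHEMA_JSON))
```

With this in place, `validate_struct({"x": 1, "y": 2}, "Point", structs)` accepts the value, while an object with a missing or out-of-range field is rejected.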
Type Mappings
Below are mappings of RDL types to some other common type systems. The following table summarizes the relationship, with notes following:
RDL | JSON | Protobuf | Avro | Hive | XSD |
---|---|---|---|---|---|
Null | null | - [1] | null | - [1] | - [1] |
Bool | true or false | bool | boolean | boolean | boolean |
Int8 | number [2] | sint32 [3] | int [4] | tinyint | byte |
Int16 | number [2] | sint32 [3] | int [4] | smallint | short |
Int32 | number [2] | sint32 [5] | int [6] | int [5] | integer |
Int64 | number [2] | sint64 [5] | long [6] | bigint [5] | long |
Float32 | number [2] | float [5] | float [6] | float [5] | float |
Float64 | number [2] | double [5] | double [6] | double [5] | double |
Bytes | string [7] | bytes | bytes | binary | hexBinary |
String | string [8] | string [8] | string [9] | string [8] | string |
Symbol | string [8] | string [8] | string [9] | string [8] | string |
UUID | string [10] | string [10] | string [11] | string [10] | string [12,13] |
Timestamp | string [2] | double [2] | double [2] | double [2] | dateTime [2] |
Array | array | repeated <V> [14] | array [14] | array [14,8] | sequence |
Map | object | repeated T<K,V> [14,15] | map [14,13] | map [14,8] | all |
Struct | object | message | record | struct | all |
Enum | string [2] | enum | enum | string [2] | string [2] |
Union | value [2] | message optional [2] | union | union [8] | union |
Notes:
[1] null is not supported in this representation
[2] type information is lost
[3] mapped to larger size number, original type is lost
[4] mapped to larger size number, original type becomes an annotation
[5] mapped to the number, subtype info is lost
[6] mapped to the number, subtype info becomes an annotation
[7] base64 url-friendly encoding, subtype info is lost
[8] subtype information is lost
[9] subtype is preserved as an annotation
[10] RFC 4122 string, type information is lost
[11] fixed[16], type is preserved as an annotation
[12] RFC 4122 format URN, e.g. "urn:uuid:fae891e0-0538-11e3-851b-d875f41b36e4"
[13] keys converted to string, key type lost
[14] item type required
[15] key type required
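Notes [7], [10], and [12] describe string encodings that the Python standard library can illustrate directly. This is a sketch under assumptions: the function names are hypothetical, and whether RDL keeps or strips the base64 `=` padding is not stated here, so the example simply keeps what `base64.urlsafe_b64encode` produces.

```python
import base64
import uuid

def bytes_to_json_string(data: bytes) -> str:
    """Encode Bytes with the URL-friendly base64 alphabet ('-' and '_')."""
    return base64.urlsafe_b64encode(data).decode("ascii")

def json_string_to_bytes(text: str) -> bytes:
    """Decode a URL-friendly base64 string back to bytes."""
    return base64.urlsafe_b64decode(text)

# The UUID used as the example in note [12].
example = uuid.UUID("fae891e0-0538-11e3-851b-d875f41b36e4")
plain_form = str(example)  # RFC 4122 string form, as in note [10]
urn_form = example.urn     # RFC 4122 URN form, as in note [12]
```

The bytes `b"\xfb\xff"` encode to `-_8=` here, where the standard base64 alphabet would give `+/8=`; the URL-friendly variant avoids characters that need escaping in URLs.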
Note that most JSON implementations use `double` to hold numbers, so Int64 values with a magnitude above 2^53 cannot be represented exactly. Most other types can be represented in JSON (usually as strings), but type information is lost. Decoding with a schema can recover this information.
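The precision loss can be demonstrated in a few lines. Python's own `json` module keeps integers exact, so the sketch below passes `parse_int=float` to imitate a decoder that stores every number as a double; the helper name `survives_double` is hypothetical.

```python
import json

INT64_MAX = 2**63 - 1

def survives_double(n: int) -> bool:
    """True if n is unchanged after a round trip through a 64-bit float."""
    return int(float(n)) == n

# Imitate a double-based JSON decoder: every integer literal becomes a float.
decoded = json.loads("9007199254740993", parse_int=float)  # literal is 2**53 + 1

# The decoded value has silently become 2**53.
lost_precision = decoded == float(2**53)
```

Here `survives_double(2**53)` holds, but `survives_double(2**53 + 1)` and `survives_double(INT64_MAX)` do not, which is why a schema-aware decoder is needed to handle Int64 safely.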
For Protobuf, note that not all types can be derived from. Numbers, Booleans, and String types get encoded as the base type, and other type information is lost.
Avro uses JSON to represent schemas, and a type structure can generally be annotated with additional information, for example the RDL schema object itself. This can be used to preserve type (and subtype) information, but after decoding, post-processing must be done to recover that information.
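As an illustration of the annotation approach, the sketch below turns an RDL struct definition into an Avro record schema and attaches the original RDL type name as an extra attribute, which Avro permits as metadata. The attribute name `rdlType` and both function names are assumptions made for this example; the actual annotation RDL emits may differ.

```python
import json

# RDL base type -> Avro primitive, following the mapping table in this document.
AVRO_PRIMITIVES = {
    "bool": "boolean",
    "int8": "int", "int16": "int", "int32": "int", "int64": "long",
    "float32": "float", "float64": "double",
    "bytes": "bytes", "string": "string", "symbol": "string",
}

def struct_to_avro(struct_def):
    """Build an Avro record schema, annotating each field with its RDL type."""
    fields = []
    for field in struct_def["fields"]:
        avro_type = AVRO_PRIMITIVES[field["type"].lower()]
        fields.append({
            "name": field["name"],
            # "rdlType" is a hypothetical annotation attribute.
            "type": {"type": avro_type, "rdlType": field["type"]},
        })
    return {"type": "record", "name": struct_def["name"], "fields": fields}

def recover_rdl_types(avro_schema):
    """Post-processing step: read the RDL type names back from the annotations."""
    return {f["name"]: f["type"]["rdlType"] for f in avro_schema["fields"]}

point = {"type": "Struct", "name": "Point",
         "fields": [{"name": "x", "type": "Int32"},
                    {"name": "y", "type": "Int32"}]}
avro_schema = struct_to_avro(point)
avro_json = json.dumps(avro_schema)
```

The Avro encoder only sees `int` fields, so the serialized data is unaffected; the RDL types come back only when `recover_rdl_types` is applied to the schema after decoding.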
References
- [UUID] http://tools.ietf.org/html/rfc4122
- [Timestamp] http://tools.ietf.org/html/rfc3339
- [JSON] http://tools.ietf.org/html/rfc4627
- [Protobuf] https://developers.google.com/protocol-buffers/docs/proto
- [Avro] http://avro.apache.org/docs/current/spec.html#schemas
- [Hive] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types