D-Factor: How Strong is your Data Contract?
REST APIs, and APIs in general, converse in structured data. APIs will vary in the degree and type of structure, the format, and the intended use. But they also vary greatly in how the expectations around that data are specified.
As a client developer, I need to know what kind of data I can send to your service, and what kind of data I can expect to get back. How will you, as the API provider, communicate this data contract? Is the data contract machine-readable? Is it technology-specific? Does it only specify a wire format, like XML or JSON, or does it include domain-specific business data types? Are these types private to your system, or standardized at some level?
D-Factor, inspired by Mike Amundsen's H-Factor, is introduced here as a way to characterize your API's data contract across different dimensions. In this post, we will build a small catalog of data contract factors. In a later post, we'll evaluate some specific APIs, using the D-Factor as a data contract scorecard.
Is stronger always better?
Each of the factors described here brings some specific advantage. But there are tradeoffs, and a formal, tightly specified, and machine-readable data contract may not be the most appropriate choice for every API. So we’re not taking a position here (at least not yet) on whether your API should have all of these factors or which factors are most important.
Further, we’re definitely not trying to argue the merits of any particular schema language or API description language. XML Schema and WSDL became such a sore spot in earlier SOA initiatives that some REST API developers took an active stance against statically defined, machine-readable contracts. REST-style APIs have a reputation, good or bad, for being schema-less. But there are other choices now, and machine-readable contracts are neither mandated nor prohibited in REST and Hypermedia API design.
So we don’t want to make this a debate about technology, or even about data contract specification style. Our goal is to create a coherent vocabulary of data contract capabilities and characteristics.
These factors are of interest to most APIs, regardless of API design style.
DS: Domain Specificity
Domain-specificity is a cornerstone of the D-Factor, because most APIs converse in domain-specific data. A domain-neutral message would only conform to a format-level media type, like application/json, or a generic structured media type like application/hal+json. A domain-specific message is expected to have some content relevant to the context and purpose of the service. Social APIs converse in terms of people, interactions and personal connections. Transactional APIs converse in parties, payments and accounts.
So almost every API will score a checkmark in the DS factor. But it serves as a threshold for the other factors: For example, your domain-specific API may be using a standard generic structure, such as application/ld+json. But if the domain vocabulary is non-standard, we couldn’t consider this API to have a standardized data contract.
SC: Standardization Context
An API has a standardization context if it uses data definitions that are standardized in some scope: worldwide, industry, country, organization or ecosystem. So a given API can score a yes/no value for the SC factor, but a yes value can be further qualified by stating the specific standardization context for those definitions.
MR: Machine Readability
The MR factor indicates that the data contract for this API is expressed in some machine-readable form. Usually this means a message schema: JSON-Schema, XML Schema, Avro Schema, etc. But even a flat CSV file with a header row would qualify.
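As a rough sketch of what machine readability buys a client, the snippet below expresses a contract as a JSON Schema-style Python dictionary and checks a message against it. The `PAYMENT_CONTRACT` fields and the `check_message` helper are hypothetical, and the checker only understands `required` and a few primitive `type` values. It is not an implementation of the JSON Schema specification.

```python
# Hypothetical contract for a payment message, written in a JSON Schema-like shape.
PAYMENT_CONTRACT = {
    "type": "object",
    "required": ["payer", "amount", "currency"],
    "properties": {
        "payer": {"type": "string"},
        "amount": {"type": "number"},
        "currency": {"type": "string"},
    },
}

# Maps the schema type names used above to Python types.
_TYPES = {"string": str, "number": (int, float), "object": dict}


def check_message(message, contract):
    """Return a list of contract violations (empty if the message conforms)."""
    if not isinstance(message, _TYPES[contract["type"]]):
        return ["message is not a " + contract["type"]]
    errors = []
    # First pass: every required field must be present.
    for name in contract.get("required", []):
        if name not in message:
            errors.append("missing required field: " + name)
    # Second pass: fields that are present must have the declared type.
    for name, rule in contract.get("properties", {}).items():
        if name in message and not isinstance(message[name], _TYPES[rule["type"]]):
            errors.append("field " + name + " is not a " + rule["type"])
    return errors
```

Because the contract is plain data, the same dictionary could be published alongside the API and consumed by tooling, which is the point of the MR factor.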
FI: Format Independence
FI indicates that the data contract is specified at some level of abstraction above the wire format, such that the contract could be applied to multiple formats. Usually this requires an explicit mapping, to specify how the contract is realized in a specific format.
Most data contracts, whether machine-readable or only specified in human-readable documentation, are format-specific. Format independence might be a desirable property, but it is currently the exception.
Still, there are some ways to express a data contract with a degree of format independence:
- ALPS profiles are technology-neutral, and ALPS specifies mapping guidelines for various media types.
- OWL2 expresses inference-based ontologies, which can be repurposed as schemas to describe linked data. Linked data can be serialized to several different formats, including JSON-LD and several XML-based formats.
- API description languages, including Swagger 2.0 and API Blueprint, are evolving to support a degree of format-independence.
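To make the idea of an explicit mapping concrete, here is a minimal sketch in which one format-neutral contract is realized in both JSON and XML. The `ACCOUNT_CONTRACT` structure and the two mapping functions are assumptions made for illustration. They do not reflect how ALPS, OWL2, or any API description language actually defines its mappings.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical format-neutral contract: a name plus field names and abstract types.
ACCOUNT_CONTRACT = {"name": "account", "fields": {"id": "string", "balance": "decimal"}}


def to_json(contract, values):
    """JSON mapping: each contract field becomes an object property."""
    return json.dumps({f: values[f] for f in contract["fields"]}, sort_keys=True)


def to_xml(contract, values):
    """XML mapping: the contract name is the root element, fields are child elements."""
    root = ET.Element(contract["name"])
    for field in contract["fields"]:
        child = ET.SubElement(root, field)
        child.text = str(values[field])
    return ET.tostring(root, encoding="unicode")


values = {"id": "A-100", "balance": "250.00"}
print(to_json(ACCOUNT_CONTRACT, values))
print(to_xml(ACCOUNT_CONTRACT, values))
```

The contract itself never mentions JSON or XML. Only the mapping functions know about a wire format, so supporting another format would mean adding another mapping rather than rewriting the contract.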
Static Description Factors
The static description factors are of interest to API clients who require design-time knowledge of the data contract. These factors score the API’s suitability for this kind of client development.
SDC: Static Data Contract
APIs that make the data contract for individual message types available at design time, as part of the API documentation or specification, meet the SDC factor criteria. Most APIs will score positive for this factor.
SRA: Static Resource Association
Note that SDC doesn’t require the API to document which message types will be received in response to specific methods invoked on specific resources or link relations.
An SDC API can still be highly dynamic and, by the same token, can require that clients respond appropriately to various kinds of response data, wherever that data might occur. SDC only requires the API to document the full set of possible response data, not the context in which that data is expected.
The SRA factor, Static Resource Association, requires this extra level of design-time specification. API clients know what kind of response data to expect on invocation of a specific method in a specific context. The context may be defined by the URI template of the target resource, the context in which the link to this resource occurred, or a link relation leading to this resource.
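One way to picture static resource association is as a lookup table that a client can consult at design time. The sketch below assumes the context is identified by an HTTP method and a URI template. The table contents and the `expected_message_type` helper are hypothetical.

```python
# Hypothetical association table: (method, URI template) -> response message type.
RESOURCE_ASSOCIATIONS = {
    ("GET", "/accounts/{id}"): "Account",
    ("GET", "/accounts/{id}/payments"): "PaymentList",
    ("POST", "/accounts/{id}/payments"): "Payment",
}


def expected_message_type(method, uri_template):
    """Return the documented response type for this context, or None if undocumented."""
    return RESOURCE_ASSOCIATIONS.get((method.upper(), uri_template))
```

An API that meets SDC but not SRA would publish the set of message types (`Account`, `PaymentList`, `Payment`) without a table like this one, leaving the client to recognize each type wherever it appears.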
Dynamic Description Factors
Dynamic description factors are of interest to API clients who need to discover the data contract at runtime.
MSD: Message Self-Description
REST defines self-describing messages broadly as any message that specifies its media type. In the D-Factor context, we are looking more specifically for messages that make their data contract known to the API client at runtime.
The specification for the media type of a response message, or the specification for the link relation leading to the resource responding with this message, may include or reference a domain-specific data contract. And in this case, the MSD criteria are met just by the use of that media type or link relation.
In other cases, the message would need to include the data contract or a link to the data contract. This could take the form of a profile reference, a schema reference, or a specially designated hyperlink.
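As an illustration of the second case, the sketch below looks for a contract reference carried inside the message itself. It assumes a HAL-style JSON body whose `_links` object includes a `profile` link. The `find_contract_reference` helper and the example.com URIs are hypothetical, and other APIs may carry the reference elsewhere, for example in a response header.

```python
import json


def find_contract_reference(body_text):
    """Return the profile URI carried in a HAL-style message, or None if absent."""
    message = json.loads(body_text)
    if not isinstance(message, dict):
        return None
    profile = message.get("_links", {}).get("profile")
    if isinstance(profile, dict):
        return profile.get("href")
    return None


# Hypothetical response body; the URIs are placeholders.
body = json.dumps({
    "_links": {
        "self": {"href": "https://api.example.com/payments/42"},
        "profile": {"href": "https://api.example.com/profiles/payment"},
    },
    "amount": 12.5,
})
print(find_contract_reference(body))
```

A client that finds a reference can fetch the contract at runtime. A client that gets `None` has to rely on the media type, the link relation, or design-time knowledge instead.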
OWA: Open World Assumption
APIs with the Open World Assumption allow out-of-schema elements, and/or message schema extensions. Taking XML Schema as an example, extensions could include derived types or substitution groups not explicitly defined as part of the API specification.
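The practical difference for a client shows up in how unknown fields are treated. The following sketch uses a hypothetical `unexpected_fields` helper and models only out-of-schema fields. It does not model schema extension mechanisms such as derived types or substitution groups.

```python
def unexpected_fields(message, known_fields, open_world=True):
    """Return the fields that violate the contract's extensibility policy.

    With open_world=True, out-of-schema fields are tolerated and nothing is reported.
    With open_world=False, every field outside known_fields is a violation.
    """
    if open_world:
        return []
    return sorted(set(message) - set(known_fields))
```

Under the open-world setting, a message such as `{"id": 1, "nickname": "x"}` passes a contract that only knows `id`. Under the closed-world setting, `nickname` is reported as a violation.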
These factors, taken together, should help us to understand how a given API defines its data contract. They should give us some specific information about the responsibilities and expectations of the API client. And they should help us to evaluate the suitability of the API for a particular style or pattern of client interaction.
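For readers who want to keep track of the factors, the catalog could be recorded as a simple scorecard structure. The layout below is only one possible sketch: the `DFactorScore` class, its field names, and the `sc_scope` qualifier are assumptions rather than a prescribed format, and the example values describe a made-up API.

```python
from dataclasses import dataclass
from typing import Optional

# Factor abbreviations in the order they appear in this post.
FACTORS = ("ds", "sc", "mr", "fi", "sdc", "sra", "msd", "owa")


@dataclass
class DFactorScore:
    """One yes/no entry per factor, plus an optional qualifier for SC."""
    ds: bool = False   # domain specificity
    sc: bool = False   # standardization context
    mr: bool = False   # machine readability
    fi: bool = False   # format independence
    sdc: bool = False  # static data contract
    sra: bool = False  # static resource association
    msd: bool = False  # message self-description
    owa: bool = False  # open world assumption
    sc_scope: Optional[str] = None  # e.g. "industry"; only meaningful when sc is True

    def summary(self):
        """List the factors this API meets, qualifying SC with its scope if given."""
        met = []
        for name in FACTORS:
            if getattr(self, name):
                label = name.upper()
                if name == "sc" and self.sc_scope:
                    label += "(" + self.sc_scope + ")"
                met.append(label)
        return met


# Scores for a made-up API, not an evaluation of any real service.
score = DFactorScore(ds=True, sc=True, sc_scope="industry", mr=True, sdc=True)
print(score.summary())
```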
In a later post, we’ll apply the D-Factor to some widely used APIs, to get a more concrete sense of how this works.
Meanwhile, D-Factor itself is a work in progress. Is it useful? Is it missing something you think is important? Please post your comments here, and let’s evolve this together.
A note about the author: Ted Epstein is Founder & CEO of ModelSolv, and co-organizes the API-Craft NYC Meetup.