Custom Schemas

Core to the design of The Data Grid (TDG) is the ability to create custom databases. In addition to providing extensive functionality for users who want to query open data, TDG has also been designed for those who want to host their own data. While the upload and organization management guides describe how to use TDG as a tool to manage databases, this guide describes how to use TDG to create the databases themselves.

Overview

Stepping back and looking at the system as a whole, TDG can be simplified into the following components:
  1. A tool to easily create custom databases
  2. A private "portal" for owners of the data to manage it
  3. A public interface for querying the databases
This guide covers the first component: a tool for creating highly normalized, performant PostgreSQL databases by defining a custom schema in our data representation model. In many ways this is similar to an ORM: the data representation model, which is represented internally as a collection of objects, is converted into a relational database. The first part of this guide motivates the data representation model and gives a high-level understanding of it. The second part explains how to use the model to create a custom database.

Data Representation Model

This section is a work in progress

Motivation and Scope

The overarching goal of TDG is to provide a way to represent and manage arbitrary data in a standardized way. This motivates the model: an opinionated format in which unopinionated data can be represented. That is, by specifying your arbitrary data format in the language of a non-arbitrary model, you give your data a bridge that lets it be compared to, and stored with, data unlike it. This document discusses only the model itself and not its implementation. Although PostgreSQL and JavaScript examples are given, the model is meant to generalize across languages and environments. We present the model as a high-level standard for representing arbitrary data, implemented on www.thedatagrid.org.

Introduction

The model consists of three abstract data types: the item, the observation, and the data column. The user creates customized instances of these types, which collectively make up their custom schema. Together, items and observations form an observation-based model: data is recorded by creating new items and by observing existing items. Data columns are attached to items and observations.
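As a rough illustration, the three abstract types could be modeled as follows. This is a minimal Python sketch and not TDG's implementation: the class names, their fields, and the "Fill Level" column are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataColumn:
    """A named, typed attribute attached to an item or an observation."""
    name: str
    sql_type: str


@dataclass
class Item:
    """A kind of thing that can be created and later observed."""
    name: str
    columns: List[DataColumn] = field(default_factory=list)


@dataclass
class Observation:
    """A kind of record made about an existing item."""
    item: Item
    columns: List[DataColumn] = field(default_factory=list)


# Hypothetical custom schema: a waste bin item and an observation of it.
bin_item = Item("Waste Bin", [DataColumn("Bin ID", "TEXT")])
bin_observation = Observation(bin_item, [DataColumn("Fill Level", "NUMERIC")])

# The custom schema is the collection of these customized instances.
schema = {"items": [bin_item], "observations": [bin_observation]}
```

Note that the observation holds a reference to the item it observes, which is what makes the model observation-based in this sketch.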

Abstract Data Types

This section gives the minimum amount of information needed to create a custom schema. To see the extended model definition, read our whitepaper draft.

Putting it all Together

As a final conceptual foundation, the entire system has three levels of abstraction, each level being an instance of the one before it.
  1. The data representation model, which abstractly represents the format of all data
  2. User-created schemas, which are custom instances of the data representation model
  3. User-uploaded data, which are instances of a custom schema
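The three levels can be pictured with a small Python sketch in which each level is checked against the one above it. Everything here is hypothetical: the set of allowed SQL types, the column names, and both checking functions are assumptions chosen to show the layering, not rules taken from TDG.

```python
# Level 1: the model constrains what any schema may contain.
# ASSUMPTION: this set of allowed types is invented for illustration.
MODEL_ALLOWED_SQL_TYPES = {"TEXT", "NUMERIC", "BOOLEAN"}

# Level 2: a user-created schema, an instance of the model.
schema = {"Bin ID": "TEXT", "Is Full": "BOOLEAN"}

# Level 3: user-uploaded data, an instance of the schema.
row = {"Bin ID": "A-102", "Is Full": False}


def schema_conforms_to_model(schema: dict) -> bool:
    """Every column in the schema must use a type the model allows."""
    return all(t in MODEL_ALLOWED_SQL_TYPES for t in schema.values())


def row_conforms_to_schema(row: dict, schema: dict) -> bool:
    """A row must supply exactly the columns the schema defines."""
    return set(row) == set(schema)
```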

Creating a Schema

Now that you have an understanding of how the model works, let's make a schema. Custom schemas are written as JSON, which TDG then converts into PostgreSQL relations. Currently, the only way to create a schema is to write the JSON yourself. We are working on a UI so that schemas can be created on www.thedatagrid.org. Writing the JSON by hand is not complicated, though, and we will walk you through the fields you need to fill out.

Object Format

Column

{
  "itemOrObservation": "Item",
  "name": "Waste Bin",
  "information": "Unique alphanumeric bin identifier",
  "sqlType": "TEXT",
  "referenceType": "item-id",
  "presetValues": null,
  "isNullable": false
}
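Before submitting a column object, it can help to confirm that it parses as JSON and carries every key seen in the example above. The Python sketch below does that, and also shows one way such an object could map onto a SQL column definition. The `column_definition` mapping is an assumption for illustration only and does not reflect the DDL that TDG actually generates.

```python
import json

COLUMN_JSON = """
{
  "itemOrObservation": "Item",
  "name": "Waste Bin",
  "information": "Unique alphanumeric bin identifier",
  "sqlType": "TEXT",
  "referenceType": "item-id",
  "presetValues": null,
  "isNullable": false
}
"""

# Keys taken from the example column object in this guide.
EXPECTED_KEYS = {
    "itemOrObservation", "name", "information", "sqlType",
    "referenceType", "presetValues", "isNullable",
}


def missing_keys(column: dict) -> set:
    """Return the expected keys that the column object does not define."""
    return EXPECTED_KEYS - set(column)


def column_definition(column: dict) -> str:
    """HYPOTHETICAL mapping to a SQL column definition (not TDG's output)."""
    # Quote the identifier, doubling any embedded double quotes.
    quoted = '"' + column["name"].replace('"', '""') + '"'
    null_clause = "" if column["isNullable"] else " NOT NULL"
    return f"{quoted} {column['sqlType']}{null_clause}"


column = json.loads(COLUMN_JSON)
print(missing_keys(column))       # set()
print(column_definition(column))  # "Waste Bin" TEXT NOT NULL
```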

Feature

{
  "frontendName": "Victor Stanley Waste Bin",
  "information": "A Compost, Landfill, or Recycle Waste Bin",
  "observableItem": {
    "requiredItem": [
      {
        "name": "item_entity",
        "isID": true,
        "isNullable": false,
        "frontendName": "Entity of Cluster",
        "information": null
      }
    ],
    "realGeo": {
      "itemName": "item_vs_bin",
      "tableName": "location_point",
      "columnName": "data_point"
    },
    "frontendName": "Victor Stanley Waste Bin",
    "creationPrivilege": 2
  },
  "authorization": {
    "queryPrivilege": "guest",
    "queryRole": null,
    "uploadPrivilege": "user",
    "uploadRole": "auditor"
  }
}
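The nested structure of a feature object is easier to follow when read programmatically. The Python sketch below parses a trimmed copy of the example above, lists the required items flagged with `isID`, and compares a privilege against the `authorization` block. The ordering of privileges ("guest" below "user") and both helper functions are assumptions for illustration. The field definitions have not been published yet, so TDG's actual semantics may differ.

```python
import json

# Trimmed copy of the example feature object from this guide.
FEATURE_JSON = """
{
  "frontendName": "Victor Stanley Waste Bin",
  "observableItem": {
    "requiredItem": [
      {"name": "item_entity", "isID": true, "isNullable": false}
    ],
    "realGeo": {
      "itemName": "item_vs_bin",
      "tableName": "location_point",
      "columnName": "data_point"
    }
  },
  "authorization": {
    "queryPrivilege": "guest",
    "queryRole": null,
    "uploadPrivilege": "user",
    "uploadRole": "auditor"
  }
}
"""

# ASSUMPTION: privileges are ordered, with "guest" ranking below "user".
PRIVILEGE_RANK = {"guest": 0, "user": 1}


def identifying_items(feature: dict) -> list:
    """Names of required items flagged with isID."""
    required = feature["observableItem"]["requiredItem"]
    return [entry["name"] for entry in required if entry["isID"]]


def meets_privilege(feature: dict, action: str, privilege: str) -> bool:
    """Check a privilege against '<action>Privilege' (hypothetical rule)."""
    needed = feature["authorization"][f"{action}Privilege"]
    return PRIVILEGE_RANK[privilege] >= PRIVILEGE_RANK[needed]


feature = json.loads(FEATURE_JSON)
print(identifying_items(feature))                   # ['item_entity']
print(meets_privilege(feature, "query", "guest"))   # True
print(meets_privilege(feature, "upload", "guest"))  # False
```

This sketch ignores `queryRole` and `uploadRole`, since the guide does not yet describe how roles interact with privileges.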

Field Definitions

This is a work in progress. Detailed field definitions will be added soon.

Complete Example

Next Guide: Organization Management