Quickstart

With this API, you can access data and manage your data projects from the language or tool of your choice.

Creating Datasets

To create a dataset, use the POST:/datasets/{owner} endpoint. For example, you can create a PRIVATE dataset called API Sandbox with the following command:

curl --request POST \
  --url "https://api.data.world/v0/datasets/${DW_USERNAME}" \
  --header "Authorization: Bearer ${DW_API_TOKEN}" \
  --header "Content-type: application/json" \
  --data '{"title":"API Sandbox","visibility":"PRIVATE"}'

Replace the following placeholders with your own values:

  • ${DW_USERNAME}: your data.world username
  • ${DW_API_TOKEN}: your API authentication token

As a result, the server response should look like this:

{
  "message": "Dataset has been successfully created.",
  "uri": "https://data.world/[USERNAME]/api-sandbox"
}
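
If you prefer to script this step, the same request can be sketched in Python with only the standard library. This is a minimal illustration that mirrors the cURL command above, not an official client: the names build_create_dataset_request and send are hypothetical, and nothing is sent over the network until send is called.

```python
import json
import urllib.request

API_BASE = "https://api.data.world/v0"


def build_create_dataset_request(owner, token, title, visibility="PRIVATE"):
    """Build (but do not send) the POST:/datasets/{owner} request."""
    body = json.dumps({"title": title, "visibility": visibility}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/datasets/{owner}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send(build_create_dataset_request(username, token, "API Sandbox")) with your own username and token would issue the request shown in the cURL example.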

Once a dataset is created, other endpoints can be used to update it, including its set of files. See PATCH:/datasets/{owner}/{id} for additional details.

Now, let’s see how files can be added to datasets.

Uploading files

Files can be added to datasets by direct upload (POST:/uploads/{owner}/{id}/files), from URLs (POST:/datasets/{owner}/{id}/files) or via streams (POST:/streams/{owner}/{id}/{streamId}).

Direct Uploads

The POST:/uploads/{owner}/{id}/files endpoint allows direct upload of files from the local file system.
It expects requests of type multipart/form-data and you can include one or more parts named file, each containing a different file to be uploaded.

For example, assuming that you want to upload two files in the current directory named file1.csv and file2.csv to the API Sandbox dataset you just created, this is what the cURL command would look like:

curl \
  --header "Authorization: Bearer ${DW_API_TOKEN}" \
  -F "file=@file1.csv" \
  -F "file=@file2.csv" \
  --url "https://api.data.world/v0/uploads/${DW_USERNAME}/api-sandbox/files"
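
To make the multipart/form-data structure concrete, here is a rough Python sketch of the same upload. The body is assembled by hand to keep the example dependency-free; encode_multipart and build_upload_request are hypothetical helper names, the application/octet-stream part type is an assumption, and file names are assumed to be simple (no quotes or line breaks).

```python
import uuid
import urllib.request
from pathlib import Path

API_BASE = "https://api.data.world/v0"


def encode_multipart(files, boundary=None):
    """Encode {filename: bytes} as multipart/form-data, one part named 'file' per entry."""
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    for filename, content in files.items():
        chunks.append(f"--{boundary}\r\n".encode("ascii"))
        disposition = f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        chunks.append(disposition.encode("utf-8"))
        # Assumption: a generic binary type is acceptable for each part.
        chunks.append(b"Content-Type: application/octet-stream\r\n\r\n")
        chunks.append(content)
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_upload_request(owner, dataset_id, token, paths):
    """Build (but do not send) the POST:/uploads/{owner}/{id}/files request."""
    files = {Path(p).name: Path(p).read_bytes() for p in paths}
    body, content_type = encode_multipart(files)
    return urllib.request.Request(
        url=f"{API_BASE}/uploads/{owner}/{dataset_id}/files",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )
```

Passing the result of build_upload_request(username, "api-sandbox", token, ["file1.csv", "file2.csv"]) to urllib.request.urlopen would perform the upload.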

Add from URL

Alternatively, if you have the URL for a file hosted publicly on the web, you can skip the download/upload steps altogether.

For example, you can get a complete list of NYC subway stations from data.cityofnewyork.us in CSV format at https://data.cityofnewyork.us/api/views/kk4q-3rt2/rows.csv?accessType=DOWNLOAD.
You can add that file to our API Sandbox dataset using the POST:/datasets/{owner}/{id}/files endpoint with the following command:

curl --request POST \
  --url "https://api.data.world/v0/datasets/${DW_USERNAME}/api-sandbox/files" \
  --header "Authorization: Bearer ${DW_API_TOKEN}" \
  --header "Content-type: application/json" \
  --data '{"files":[{"name":"nyc-subways.csv","source":{"url":"https://data.cityofnewyork.us/api/views/kk4q-3rt2/rows.csv?accessType=DOWNLOAD"},"description":"List of NYC subway stations","labels":["raw data"]}]}'

For clarity, this is the JSON data included in the above request:

{
  "files": [
    {
      "name": "nyc-subways.csv",
      "source": {
        "url": "https://data.cityofnewyork.us/api/views/kk4q-3rt2/rows.csv?accessType=DOWNLOAD"
      },
      "description": "List of NYC subway stations",
      "labels": [
        "raw data"
      ]
    }
  ]
}

As a result, the server response should look like this:

{
  "message": "Dataset has been successfully updated with new sources. Sync in progress."
}
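
The same payload can also be assembled programmatically, which helps when you add several files from a script. The sketch below is only an illustration: build_add_file_from_url_request is a hypothetical name, and treating description and labels as optional is an assumption rather than documented behavior.

```python
import json
import urllib.request

API_BASE = "https://api.data.world/v0"


def build_add_file_from_url_request(owner, dataset_id, token, name, source_url,
                                    description=None, labels=None):
    """Build (but do not send) the POST:/datasets/{owner}/{id}/files request."""
    file_entry = {"name": name, "source": {"url": source_url}}
    # Assumption: description and labels can be left out when not needed.
    if description is not None:
        file_entry["description"] = description
    if labels:
        file_entry["labels"] = list(labels)
    body = json.dumps({"files": [file_entry]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/datasets/{owner}/{dataset_id}/files",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```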

PRO Tip: The POST:/datasets/{owner}/{id}/files endpoint offers great flexibility for those looking to retrieve data from web APIs. You can specify parameters other than the URL, including HTTP method, request payload and authentication.

Streaming

Data can be streamed to data.world one record at a time using the POST:/streams/{owner}/{id}/{streamId} endpoint. data.world accumulates stream data and processes it periodically (once a day by default), at which point the data becomes available in the associated dataset.
A stream named temperatures, for example, will appear in the associated dataset as temperatures.jsonl once processed.

When needed, you can force stream data to be processed by invoking the POST:/datasets/{owner}/{id}/sync endpoint.

To create a stream, simply append the first row of data to a stream that didn’t exist before. For example:

curl --request POST \
  --url "https://api.data.world/v0/streams/${DW_USERNAME}/api-sandbox/temperatures" \
  --header "Authorization: Bearer ${DW_API_TOKEN}" \
  --header "Content-type: application/json" \
  --data '{"temperature": 34, "location": "New York", "measurement_time": "2017-10-24T01:00:00Z"}'

It is also possible to append multiple records at once, so long as the content type in the request is set to application/json-l (JSON lines). For example:

curl --request POST \
  --url "https://api.data.world/v0/streams/${DW_USERNAME}/api-sandbox/temperatures" \
  --header "Authorization: Bearer ${DW_API_TOKEN}" \
  --header "Content-type: application/json-l" \
  --data '{"temperature": 37, "location": "New York", "measurement_time": "2017-10-24T02:00:00Z"}
  {"temperature": 40, "location": "New York", "measurement_time": "2017-10-24T03:00:00Z"}'
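
The difference between the two content types can be sketched in Python as follows. This is a minimal illustration, assuming the behavior described above: build_stream_append_request is a hypothetical helper that sends a single record as application/json and several records as JSON lines, one object per line.

```python
import json
import urllib.request

API_BASE = "https://api.data.world/v0"


def build_stream_append_request(owner, dataset_id, stream_id, token, records):
    """Build (but do not send) a POST:/streams/{owner}/{id}/{streamId} request."""
    if not records:
        raise ValueError("at least one record is required")
    if len(records) == 1:
        # A single record is sent as a plain JSON object.
        body = json.dumps(records[0])
        content_type = "application/json"
    else:
        # Several records are sent as JSON lines: one object per line.
        body = "\n".join(json.dumps(record) for record in records)
        content_type = "application/json-l"
    return urllib.request.Request(
        url=f"{API_BASE}/streams/{owner}/{dataset_id}/{stream_id}",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
    )
```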

Processing of files is asynchronous and usually finishes quickly, although large files can take a few minutes to process. Once the dataset is in LOADED status, the data is ready to be consumed.

Let’s look at how you can check the status of a dataset.

Retrieving dataset info

To retrieve dataset info, use the GET:/datasets/{owner}/{id} endpoint. For example:

curl --request GET \
  --url "https://api.data.world/v0/datasets/${DW_USERNAME}/api-sandbox" \
  --header "Authorization: Bearer ${DW_API_TOKEN}" \
  --header "Content-type: application/json"

As a result, the server response should look like this:

{
  "owner": "rflprr",
  "id": "api-sandbox",
  "title": "API Sandbox",
  "visibility": "PRIVATE",
  "files": [
    {
      "name": "nyc-subways.csv",
      "sizeInBytes": 64188,
      "source": {
        "url": "https://data.cityofnewyork.us/api/views/kk4q-3rt2/rows.csv?accessType=DOWNLOAD",
        "expandArchive": false,
        "syncStatus": "OK",
        "lastSyncStart": "2017-10-24T21:27:15.406Z",
        "lastSyncSuccess": "2017-10-24T21:27:04.197Z"
      },
      "created": "2017-10-24T21:27:04.618Z",
      "updated": "2017-10-24T21:27:04.618Z",
      "description": "List of NYC subway stations",
      "labels": [
        "raw data"
      ]
    },
    {
      "name": "stream-temperatures.jsonl",
      "sizeInBytes": 178,
      "source": {
        "url": "https://data.world/rflprr/d/api-sandbox/stream-temperatures",
        "syncStatus": "OK",
        "lastSyncStart": "2017-10-24T21:27:15.406Z",
        "lastSyncSuccess": "2017-10-24T21:27:15.424Z"
      },
      "created": "2017-10-24T21:27:15.683Z",
      "updated": "2017-10-24T21:27:15.683Z"
    }
  ],
  "status": "LOADED",
  "created": "2017-10-24T21:26:49.543Z",
  "updated": "2017-10-24T21:27:18.658Z",
  "accessLevel": "ADMIN",
  "isProject": false
}

Note that syncStatus is OK and status is LOADED. That indicates that the file was synchronized correctly and that the dataset is ready to use.
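
Because processing is asynchronous, a script may need to poll until both conditions hold. The sketch below shows one possible approach and is not part of the API: is_ready and wait_until_ready are hypothetical helpers, fetch_info stands for any callable that returns the parsed GET:/datasets/{owner}/{id} response, and the retry count and delay are arbitrary defaults.

```python
import time


def is_ready(dataset_info):
    """Return True when status is LOADED and every file source reports syncStatus OK."""
    if dataset_info.get("status") != "LOADED":
        return False
    for file_info in dataset_info.get("files", []):
        source = file_info.get("source")
        if source is not None and source.get("syncStatus") != "OK":
            return False
    return True


def wait_until_ready(fetch_info, attempts=10, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch_info() until the dataset is ready, or raise TimeoutError."""
    for attempt in range(attempts):
        info = fetch_info()
        if is_ready(info):
            return info
        if attempt < attempts - 1:
            # Pause between checks, but not after the final attempt.
            sleep(delay_seconds)
    raise TimeoutError(f"dataset not ready after {attempts} attempts")
```

The sleep parameter is injectable so that the loop can be exercised without real delays.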

Below is what the same dataset looks like on data.world. Observe how each of the dataset attributes returned previously is presented.

[Image: API Sandbox Dataset]

Querying with SQL

Now that the dataset is ready to be used, you can query it with the POST:/sql/{owner}/{id} endpoint. For example, to discover which NYC subway stations the 7 Express train stops at:

curl --request POST \
  --url "https://api.data.world/v0/sql/${DW_USERNAME}/api-sandbox" \
  --header "Authorization: Bearer ${DW_API_TOKEN}" \
  --header "Accept: text/csv" \
  --data-urlencode 'query=SELECT name, line FROM nyc_subways WHERE line LIKE "%7 Express%"'

As a result, the server response should look like this:

NAME,LINE
Vernon Blvd - Jackson Ave,7-7 Express
Queensboro Plz,7-7 Express-N-W
Times Sq - 42nd St,7-7 Express
Grand Central - 42nd St,7-7 Express
Mets - Willets Point,7-7 Express
Junction Blvd,7-7 Express
Flushing - Main St,7-7 Express
5th Ave - Bryant Pk,7-7 Express
34th St - Hudson Yards,7-7 Express
Woodside - 61st St,7-7 Express
Court Sq,7-7 Express
Hunters Point Ave,7-7 Express
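
For readers working in Python, the query request and the CSV response can be handled along these lines. This is a hedged sketch rather than an official client: build_sql_request and parse_csv_rows are hypothetical names, and the form-encoded body mirrors what cURL's --data-urlencode option produces.

```python
import csv
import io
import urllib.parse
import urllib.request

API_BASE = "https://api.data.world/v0"


def build_sql_request(owner, dataset_id, token, query, accept="text/csv"):
    """Build (but do not send) the POST:/sql/{owner}/{id} request."""
    body = urllib.parse.urlencode({"query": query}).encode("ascii")
    return urllib.request.Request(
        url=f"{API_BASE}/sql/{owner}/{dataset_id}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": accept,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def parse_csv_rows(csv_text):
    """Turn a text/csv response body into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

With the response shown above, each row would be a dict with NAME and LINE keys.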

Additional query options

Query results can be obtained in a variety of different formats, including:

  • text/csv: Comma-separated values
  • text/tab-separated-values: Tab-separated values
  • application/json: JSON array
  • application/x-ndjson and application/json-l: JSON lines

When requesting a JSON format, you can obtain schema information alongside the data. To do that, add includeTableSchema=true to the query string. The schema information will be the first element or row in the response.

Here is what the previous request would look like if modified to produce JSON lines including schema information:

curl --request POST \
  --url "https://api.data.world/v0/sql/${DW_USERNAME}/api-sandbox?includeTableSchema=true" \
  --header "Authorization: Bearer ${DW_API_TOKEN}" \
  --header "Accept: application/json-l" \
  --data-urlencode 'query=SELECT name, line FROM nyc_subways WHERE line LIKE "%7 Express%"'

In this case, this is what the response will look like:

{"fields":[{"name":"name","type":"string","rdfType":"http://www.w3.org/2001/XMLSchema#string"},{"name":"line","type":"string","rdfType":"http://www.w3.org/2001/XMLSchema#string"}]}
{"name":"Vernon Blvd - Jackson Ave","line":"7-7 Express"}
{"name":"Queensboro Plz","line":"7-7 Express-N-W"}
{"name":"Times Sq - 42nd St","line":"7-7 Express"}
{"name":"Grand Central - 42nd St","line":"7-7 Express"}
{"name":"Mets - Willets Point","line":"7-7 Express"}
{"name":"Junction Blvd","line":"7-7 Express"}
{"name":"Flushing - Main St","line":"7-7 Express"}
{"name":"5th Ave - Bryant Pk","line":"7-7 Express"}
{"name":"34th St - Hudson Yards","line":"7-7 Express"}
{"name":"Woodside - 61st St","line":"7-7 Express"}
{"name":"Court Sq","line":"7-7 Express"}
{"name":"Hunters Point Ave","line":"7-7 Express"}

Notice that the first line is the table schema, as expected:

{
   "fields":[
      {
         "name":"name",
         "type":"string",
         "rdfType":"http://www.w3.org/2001/XMLSchema#string"
      },
      {
         "name":"line",
         "type":"string",
         "rdfType":"http://www.w3.org/2001/XMLSchema#string"
      }
   ]
}
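
When consuming such a response in code, the schema line has to be separated from the data rows. A minimal Python sketch, assuming a JSON lines body requested with includeTableSchema=true (split_schema_and_rows is a hypothetical helper name):

```python
import json


def split_schema_and_rows(json_lines_text):
    """Split a JSON lines response into (schema fields, data rows)."""
    lines = [line for line in json_lines_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty response")
    schema = json.loads(lines[0])
    if "fields" not in schema:
        raise ValueError("first line is not a table schema")
    rows = [json.loads(line) for line in lines[1:]]
    return schema["fields"], rows
```

The returned field list can then be used, for example, to read column names and types before processing the rows.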

Next steps

  1. Take some time to familiarize yourself with our API Reference documentation.
  2. Browse some endpoints and use the Send a Test Request feature to interact with them in your browser.
  3. Learn how easily you can generate Client Libraries for the language of your choice.
  4. Check out existing integrations to see if an integration for your favorite tools already exists, and skip using this API altogether. Pro tip: come back often as we are constantly implementing new integrations.
  5. Let us know what cool things you create and make sure to reach out if you need support. We love to hear from you, so connect with us through our help portal or on Slack.