FAQ

What are valid table names to use in SQL queries?

As files are added to datasets, data.world extracts and normalizes all the structured content in them, and derives a schema (tables and columns) that can be used for querying.

To list all the tables in a dataset, you can query the Tables metadata table. Specifically, you can run SELECT * FROM Tables, using the POST:/sql/{owner}/{id} endpoint.

To list all the columns, the process is similar, but uses TableColumns instead of Tables, i.e. SELECT * FROM TableColumns.
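As a rough illustration of the two metadata queries above, the sketch below builds (but does not send) a request for the POST:/sql/{owner}/{id} endpoint. The base URL https://api.data.world/v0, the form field name query, the bearer-token header and the helper name build_sql_request are assumptions for illustration, not something this FAQ specifies; check the API reference before relying on them.

```python
import urllib.parse
import urllib.request

# Assumption: the public API is served from this base URL.
API_BASE = "https://api.data.world/v0"


def build_sql_request(owner, dataset_id, query, token):
    """Build (but do not send) a POST request for the SQL endpoint.

    Hypothetical helper: the 'query' form field and the bearer-token
    header are assumptions, not confirmed by this FAQ.
    """
    url = f"{API_BASE}/sql/{urllib.parse.quote(owner)}/{urllib.parse.quote(dataset_id)}"
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


# List tables, then columns, for the dataset used as an example in this FAQ.
tables_req = build_sql_request(
    "jenka13all", "lara-hotel-reviews", "SELECT * FROM Tables", "YOUR_API_TOKEN"
)
columns_req = build_sql_request(
    "jenka13all", "lara-hotel-reviews", "SELECT * FROM TableColumns", "YOUR_API_TOKEN"
)
```

Passing either request to urllib.request.urlopen would actually send it; that step is left out here so the sketch can be run without a token.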

What is the difference between datasets and projects?

For simplicity, think of datasets as repositories of data. That’s normally where you’ll collect data files, along with other files that belong with them (documentation, scripts, etc.).

Think of data projects as a tool to coordinate and document the use of data from datasets in fulfilling the project’s objective. Data projects start with a task, a question or a hypothesis and end with a conclusion supported by assets and insights collected along the way.

Linked datasets are the primary source of data for data projects. However, every data project also includes a default dataset that you can use to store files and assets that only make sense in the context of the data project and don’t need a separate dataset of their own.

From an API perspective, all endpoints designed to work with datasets work with data projects too (they will act on the project’s default dataset). In addition, project-specific APIs allow you to perform operations that apply to projects only, such as creating insights and linking datasets.

Blog post: Introducing Data Projects

Can I create an insight using an image from a dataset or project?

Yes, you can address images that live in datasets using the following URL pattern: https://data.world/api/{ownerId}/dataset/{datasetId}/file/raw/{urlEncodedFileName}

For example, imagine you wanted to use the following file in an insight: https://data.world/jenka13all/lara-hotel-reviews/workspace/file?filename=languages.png

In this case, the parameters would be:

  • ownerId: jenka13all
  • datasetId: lara-hotel-reviews
  • urlEncodedFileName: languages.png
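The substitution above can be sketched in a few lines of Python. The helper name raw_file_url is hypothetical, and encoding every reserved character in the file name (including any "/") is an assumption about what {urlEncodedFileName} expects.

```python
from urllib.parse import quote


def raw_file_url(owner_id, dataset_id, file_name):
    """Fill in the raw-file URL pattern, percent-encoding each part.

    Hypothetical helper. safe='' also encodes '/', which is an
    assumption about how file names in subfolders should be passed.
    """
    return (
        "https://data.world/api/"
        f"{quote(owner_id, safe='')}/dataset/{quote(dataset_id, safe='')}"
        f"/file/raw/{quote(file_name, safe='')}"
    )


image_url = raw_file_url("jenka13all", "lara-hotel-reviews", "languages.png")
```

For a file name with no reserved characters, such as languages.png, the encoded name is identical to the original; a name containing a space would have it replaced with %20.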

To create the insight, you’d then call POST:/insights/jenka13all/lara-hotel-reviews with a payload that may look like:

{
	"title": "The majority of the reviews were written in English.",
	"description": "That the top languages include some more untypical languages warrants a closer look at the data and a second round of classification...",
	"body": {
		"imageUrl": "https://data.world/api/jenka13all/dataset/lara-hotel-reviews/file/raw/languages.png"
	}
}
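A minimal sketch of assembling that call in Python follows. It only constructs the request; the base URL https://api.data.world/v0, the bearer-token header and the helper name build_insight_request are assumptions for illustration rather than details taken from this FAQ.

```python
import json
import urllib.request

# Assumption: the public API is served from this base URL.
API_BASE = "https://api.data.world/v0"


def build_insight_request(owner, project_id, title, description, image_url, token):
    """Build (but do not send) a POST:/insights/{owner}/{id} request.

    Hypothetical helper; the payload mirrors the JSON shown in this FAQ.
    """
    payload = {
        "title": title,
        "description": description,
        "body": {"imageUrl": image_url},
    }
    return urllib.request.Request(
        f"{API_BASE}/insights/{owner}/{project_id}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


insight_req = build_insight_request(
    "jenka13all",
    "lara-hotel-reviews",
    "The majority of the reviews were written in English.",
    "That the top languages include some more untypical languages warrants a closer look...",
    "https://data.world/api/jenka13all/dataset/lara-hotel-reviews/file/raw/languages.png",
    "YOUR_API_TOKEN",
)
```

Serializing the payload with json.dumps, rather than writing the JSON by hand, avoids unbalanced braces and unescaped quotes in the title or description.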

Why don’t I see the data I have appended to my stream?

For performance reasons, data.world separates the act of appending to a stream from the processing of that data. By default, data is processed daily, but that setting can be changed on a per-dataset basis. In addition, data can be processed on demand. For example, if your data is appended in batches, you can call POST:/datasets/{owner}/{id}/sync to force processing of the data at the end of a given batch.
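As a hedged sketch of the on-demand case, the snippet below builds the sync request you might issue after the last record of a batch has been appended. The base URL https://api.data.world/v0, the bearer-token header, the helper name build_sync_request and the owner and dataset values are all assumptions or placeholders.

```python
import urllib.request

# Assumption: the public API is served from this base URL.
API_BASE = "https://api.data.world/v0"


def build_sync_request(owner, dataset_id, token):
    """Build (but do not send) a POST:/datasets/{owner}/{id}/sync request.

    Hypothetical helper; an empty body is assumed to be sufficient.
    """
    return urllib.request.Request(
        f"{API_BASE}/datasets/{owner}/{dataset_id}/sync",
        data=b"",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder owner and dataset names; call this once per finished batch.
sync_req = build_sync_request("my-org", "my-dataset", "YOUR_API_TOKEN")
```

Sending the request once per batch, rather than after every appended record, keeps the benefit of the append/process separation described above.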