<img src="https://data.4tu.nl/static/images/logo.png" width="300" />

# 🔧 Workshop: Data Access & Publication with the 4TU.ResearchData Web API

**Duration**: 90 minutes

**Audience**: Researchers in engineering and the natural sciences, and research data support staff

**Tools**: [4TU.ResearchData Web API documentation](https://djehuty.4tu.nl/#titlepage), a Unix shell terminal (Git Bash or WSL on Windows), Python

---

## ✅ Learning Objectives

By the end of the workshop, participants will be able to:

1. **Explain what a REST Web API is** and how it facilitates access to and publication of research data.
2. **Use the 4TU.ResearchData API** to fetch datasets and metadata, search for datasets, and upload datasets using `curl` commands.

---

## 🗂️ Workshop Schedule

| Time      | Activity |
|-----------|----------|
| 0–20 min  | **Intro to REST APIs**: concepts and mechanics |
| 20–25 min | **Use case**: why APIs matter for data reuse & automation |
| 25–45 min | **Fetching datasets via the API (reuse)**: _hands-on with curl commands_ |
| 45–60 min | **Uploading datasets via the API (publishing)**: _hands-on with curl commands_ |
| 75–90 min | **Showcase**: interacting with the Web API from Python |

---

# Roll Call

Add your name, position, and research field. Have you ever used APIs in your work? If so, for what purpose?

- ...
- ...
- ...
- ...

---

# 🔧 What is an API?

## 📘 Definition

An **API (Application Programming Interface)** is a set of **rules and tools** that allows software applications to communicate with each other.

- Think of it as a **contract** between two software components.
- It defines **how to request services** and **what response to expect**.
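To make the request/response contract concrete, here is a minimal Python sketch (standard library only) against the `/v2/articles` endpoint used in the hands-on part of this workshop. It assumes the endpoint accepts the `limit` and `published_since` query parameters shown in the `curl` examples and returns a JSON list of records. The function names are illustrative and are not part of any 4TU library.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the 4TU.ResearchData API, as used in the curl examples of this note.
BASE_URL = "https://data.4tu.nl/v2"


def build_articles_url(**params) -> str:
    """Request side of the contract: encode query parameters into a GET URL."""
    query = urllib.parse.urlencode(params)
    return f"{BASE_URL}/articles?{query}" if query else f"{BASE_URL}/articles"


def parse_response(raw: bytes) -> list:
    """Response side of the contract: decode the JSON body into Python objects.

    Assumes the server answers with a JSON list of records.
    """
    return json.loads(raw.decode("utf-8"))


def fetch_articles(**params) -> list:
    """Send the GET request and return the decoded records (needs network access)."""
    with urllib.request.urlopen(build_articles_url(**params), timeout=30) as response:
        return parse_response(response.read())
```

For example, `build_articles_url(limit=2, published_since="2025-05-01")` shows the exact URL that is requested, and `fetch_articles(limit=2, published_since="2025-05-01")` should return the same records as the equivalent `curl` command. Keeping URL building and JSON decoding separate from the network call makes each half of the contract easy to inspect on its own.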
---

## 🧠 How APIs Work

APIs expose certain **functions or data** to external users (or internal systems), allowing them to:

- Retrieve information
- Send or modify data
- Trigger actions or computations

APIs can be:

- **Local** (between software on the same machine)
- **Remote** (between software across the internet, also known as Web APIs)

---

## 🧪 Examples of APIs in Research

| Use Case | Description | API Type |
|----------|-------------|----------|
| Dataset access from repositories | Query and download datasets from 4TU.ResearchData, Zenodo, Dataverse | Web API (REST) |
| Publication automation | Submit and update research outputs programmatically | Web API (REST) |
| Metadata enrichment | Add ORCID info and grant metadata from Crossref or FundRef | Web API (REST) |
| Data analysis environments | Use APIs in R/Python to call scientific packages (e.g., NumPy, SciPy) | Local API |
| Instrument control in laboratories | Programmatically access sensors and machines (via vendor SDKs/APIs) | Local API |
| High-performance computing (HPC) jobs | Submit and monitor jobs through workload managers such as SLURM | Local/Web API |
| Linked data queries | Use SPARQL APIs to extract structured data from semantic repositories | Web API (SPARQL) |

---

# 🔄 API vs Web API: What's the difference?

APIs can be **local or remote**; they are not limited to the web. Some examples of **local** APIs:

- Python libraries (e.g., `os`, `math`)
- Operating system APIs
- Internal research data pipelines

---

## 🌐 Web API (Web-based API)

- **A type of API** that uses **HTTP/S** to communicate over the **internet**.
- Often follows **REST** or **GraphQL** conventions.
- Enables **remote access to data and services**.

---

## 🎓 Research Relevance

- APIs help **structure and automate** research workflows.
- Web APIs enable **integration, publication, and reuse** of research data.
- Works well with tools like Python, R, Jupyter, and shell scripts.
- Data repositories (e.g., **4TU.ResearchData**, **Zenodo**, **Figshare**) expose their published datasets via APIs.
- Reduces manual work and encourages reproducibility.

## Examples

- Uploading large files (including files stored on remote servers)
- Automating regular uploads of sensor datasets with a workflow manager (e.g., Airflow)
- Automating retrieval and integration of datasets, e.g., for a monitoring dashboard
- Examples: [Impact Aware Robotics Database](https://www.impact-aware-robotics-database.tue.nl/); [Dataset collection at 4TU.ResearchData](https://data.4tu.nl/collections/c46f0d20-c62d-407b-8355-6737243d11c9)

# 🌐 Introduction to Web APIs and REST

## 🔍 Types of Web APIs

| Type | Protocol | Format | Best For | Example Use Cases |
|------|----------|--------|----------|-------------------|
| **REST** | HTTP/HTTPS | JSON, XML | Public APIs, data sharing | 4TU.ResearchData, GitHub, Twitter |
| **SOAP** | SOAP over HTTP | XML | Enterprise systems, high security | Banking, healthcare systems |
| **GraphQL** | HTTP/HTTPS | JSON | Custom queries, efficiency | Facebook API, GitHub GraphQL |
| **gRPC** | HTTP/2 | Protobuf | High performance, microservices | Google internal APIs, Kubernetes |

---

## ✅ Why REST is the most popular Web API style

| Reason | Explanation |
|--------|-------------|
| **Simplicity** | Easy to use with standard tools like `curl` or `requests` in Python |
| **Uses HTTP** | Works with the same protocol as the web |
| **Stateless** | Each request is independent, which helps scalability |
| **Human-readable** | Typically uses JSON, which is easy to debug and understand |
| **Language-agnostic** | Works with all major programming languages |
| **Community support** | Huge ecosystem and documentation |

## Understanding REST API methods

![](https://codimd-cdn.rs.tudelft.nl/codimd/uploads/upload_a32f8824102b6764190f61111043287d.jpg)
*Source:* https://www.numpyninja.com/post/rest-api-for-dummies-explained-using-mommies

These are the five HTTP methods you will use most often when making an API request:

| Method | Description |
|--------|-------------|
| `GET` | Retrieve data from the database/server. |
| `POST` | Create a new record. |
| `PUT` | Modify/replace the record. Replaces the entire record. |
| `PATCH` | Modify/update the record. Replaces parts of the record. |
| `DELETE` | Delete the record. |

### Anatomy of a REST API request

![](https://codimd-cdn.rs.tudelft.nl/codimd/uploads/upload_a4993cefd3afd34f2b67044b3c3d2e64.png)
*Source:* https://www.altexsoft.com/blog/rest-api-design/

Besides the HTTP method, an API request has a few other components:

| Component | Description |
|-----------|-------------|
| **HTTP method** | Specifies the action you want to perform (e.g., `GET`, `POST`, `PUT`, `DELETE`). |
| **Endpoint** | A URL that locates the resource on the Internet. Consists of:<br>• **Base URL**: the consistent part of the URL.<br>• **Relative URL**: the specific reference to the resource. |
| **Headers** | Provide information relevant to both the client and the server. Often used for authentication or for describing the body content. [See full list of HTTP headers](https://en.wikipedia.org/wiki/List_of_HTTP_header_fields). |
| **Body** | Contains the data you want to send to the server (mainly used in `POST`, `PUT`, or `PATCH` requests). |

:::info
#### Passing parameters

- **`GET`** request parameters are usually included in the endpoint URL (as query parameters).
- **`POST`**, **`PUT`**, and **`PATCH`** requests usually carry their parameters in the request body.
:::

### Common Status Codes & Errors

Once you send a request to the server, you receive a response with a status code. Here are some codes you might see:

#### ✅ 2xx Success

| **Status Code** | **Description** |
|-----------------|-----------------|
| **200 OK** | ✅ The request has succeeded. |
| **201 Created** | ✅ A new resource has been successfully created. |

#### ⚠️ 4xx Client Errors

| **Status Code** | **Description** |
|-----------------|-----------------|
| **400 Bad Request** | ⚠️ The server couldn't understand the request due to bad syntax. |
| **401 Unauthorized** | ⚠️ Authentication is required or has failed. |
| **403 Forbidden** | ❌ The client does not have access rights to the content. |
| **404 Not Found** | ❌ The server cannot find the requested resource. |

#### 💥 5xx Server Errors

| **Status Code** | **Description** |
|-----------------|-----------------|
| **500 Internal Server Error** | 💥 The server encountered an unexpected condition. |

:::info
#### TIP

- **2xx** codes mean success (although this does not guarantee that the request did what you intended).
- **4xx** errors usually point to a problem with your request (e.g., bad syntax, missing or invalid authentication).
- **5xx** errors mean something went wrong on the server.
:::

See the full list of [HTTP response status codes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#client_error_responses).

### Authentication in REST APIs

The 4TU.ResearchData Web API is public: anyone can use it for basic data retrieval. However, to perform certain actions (such as uploading, editing, or deleting data), you need to authenticate yourself.
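The request components, status codes, and token-based authentication described above can be combined in a short Python sketch (standard library only). It builds, without sending, a `POST` request with a JSON body and an `Authorization: token <your token>` header, reading the token from an `API_TOKEN` environment variable rather than hard-coding it. The header format and the `/account/authors/search` path follow the `curl` examples in the collaborative notes of this workshop; the function names are hypothetical.

```python
import json
import os
import urllib.request
from typing import Optional

# Base URL taken from the workshop's curl examples; the authenticated examples there
# use the test server https://next.data.4tu.nl/v2 instead, which can be passed as `base`.
DEFAULT_BASE = "https://data.4tu.nl/v2"


def build_authenticated_post(path: str, payload: dict, token: Optional[str] = None,
                             base: str = DEFAULT_BASE) -> urllib.request.Request:
    """Build, but do not send, a POST request with a JSON body and a token header."""
    if token is None:
        # Read the secret from the environment instead of hard-coding it.
        token = os.environ.get("API_TOKEN")
    if not token:
        raise RuntimeError("No token found: set the API_TOKEN environment variable.")
    return urllib.request.Request(
        url=base + path,                               # endpoint = base URL + relative URL
        data=json.dumps(payload).encode("utf-8"),      # body
        headers={
            "Authorization": f"token {token}",         # authentication header
            "Content-Type": "application/json",        # describes the body content
        },
        method="POST",                                 # HTTP method
    )


def status_category(code: int) -> str:
    """Map an HTTP status code to the classes described in the TIP box."""
    if 200 <= code < 300:
        return "success"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"
```

To actually send the request, pass it to `urllib.request.urlopen(request)`. For 4xx and 5xx responses, `urlopen` raises `urllib.error.HTTPError`, whose `.code` attribute can be passed to `status_category`.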
As a general rule:

> If you need to log in via the web interface to perform an action, you'll also need an authentication token to do the same through the API.

This is typically required when using methods that alter the database (e.g., `POST`, `PUT`, `DELETE`).

In this workshop, we'll create and use a personal access token to authenticate our requests. Remember, it is a secret key that you should never share with the world!

:::danger
#### ⚠️ Important ⚠️
Never hard-code your token into scripts, especially ones you might share (even accidentally)!
:::

A safer way is to store your token in an environment variable on your system. Alternatively, you can place it in a `.env` file in the root of your project. This file should never be shared or committed to version control.

Example of a `.env` file:

```shell
API_TOKEN=your-secret-token-goes-here
```

To make sure you never accidentally upload your token to GitHub or another repository, add this to your `.gitignore`:

```shell
.env
```

## Binder playground

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/leilaicruz/WebAPI-TUD_ME-workshop/HEAD?urlpath=%2Fdoc%2Ftree%2FLive-coding-notebook.ipynb)

## Collaborative notes

In Binder, open a terminal: folder icon -> **+** -> new terminal. Then try your first request:

```
curl "https://data.4tu.nl/v2/articles" | jq
```

:::success
## API Documentation: `djehuty.4tu.nl`
:::

List 2 datasets published since 1 May 2025:

```
curl "https://data.4tu.nl/v2/articles?limit=2&published_since=2025-05-01" | jq
```

Save the data to a file:

```
curl "https://data.4tu.nl/v2/articles?limit=2&published_since=2025-05-01" > data.json
```

List 10 software items published since 1 January 2025:

```
curl "https://data.4tu.nl/v2/articles?limit=10&published_since=2025-01-01&item_type=9" | jq
```

Search for articles by search term:

```
curl --request POST \
     --header "Content-Type: application/json" \
     --data '{ "search_for": "mechanical engineering" }' \
     https://data.4tu.nl/v2/articles/search | jq
```

Look at the details of a specific dataset (copy the `uuid` value from one of the previous entries):

```
curl https://data.4tu.nl/v2/articles/1f9995d3-2fb3-4548-b889-fd27da3619a8 | jq
```

Store your token in a `.env` file for safe use:

```
echo 'API_TOKEN="ENTER YOUR TOKEN HERE"' > ~/.env
```

Load the variable into your shell:

```
source ~/.env
```

Search for authors by name (requires authentication):

```
curl --request POST \
     --header "Authorization: token $API_TOKEN" \
     --header "Content-Type: application/json" \
     --data '{ "search": "Aleksandra" }' \
     https://next.data.4tu.nl/v2/account/authors/search | jq
```

## Exercises

:::info
### Exercise 1

Get 10 datasets published since 1 January 2025.

**Answer:**

```
curl "https://data.4tu.nl/v2/articles?limit=10&published_since=2025-01-01&item_type=3" | jq
```
:::

# Feedback

### What did you like about the session?

### What could be improved?

### Which follow-up topics would you like in future sessions?