# Project Xanadu

## A Hybrid Execution Environment for Serverless Workloads

Execution environments on FaaS platforms are typically container-oriented, but this creates problems: containers are neither fully secure nor capable of 
high performance. This project explores a hybrid execution environment that caters to different workload needs.

## Buy one for yourself!

Clone using `git clone --recursive https://git.cse.iitb.ac.in/synerg/xanadu`

## Architecture

Xanadu is divided into two very loosely coupled modules: the **Dispatch System (DS)** and the **Resource System (RS)**. The RS handles 
resource provisioning and consolidation at the host level, while the DS handles user requests and executes them at the requisite 
isolation level using resources provided by the RS. A high-level architecture diagram of Xanadu is given below.

![Xanadu Architecture](design_documents/hybrid_serverless.png)

## Inter-component Communication Interface

The Dispatch Manager (DM) sends a request to the Resource Manager (RM), detailing what resources it needs, on the Kafka topic `REQUEST_DM_2_AM`.
```javascript
{
    "resource_id": "unique-transaction-id",
    "memory": 1024, // in MiB
    ... // Any other resources
}
```

The RM finds a list of nodes that satisfy those resource demands and returns it to the DM on the Kafka topic `RESPONSE_RM_2_DM`.
Format:
```javascript
{
    "resource_id": "unique-transaction-id",
//    "port": 2343 --- NOT IMPLEMENTED YET
    "grunts": ["a", "b", ...] // List of machine IDs
}
```
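
Because this exchange is asynchronous, the DM has to match each `RESPONSE_RM_2_DM` message to the request that triggered it. The following is a minimal sketch of one way to do that, keyed on `resource_id`. The `PendingRequests` class and its method names are hypothetical and not taken from the Xanadu source, and the Kafka producer/consumer wiring is deliberately left out.

```javascript
const crypto = require("crypto");

// Hypothetical helper: tracks resource requests that are awaiting an RM reply.
class PendingRequests {
  constructor() {
    this.pending = new Map(); // resource_id -> callback
  }

  // Build the payload for the request topic and remember who asked.
  create(memoryMiB, onNodes) {
    const request = {
      resource_id: crypto.randomBytes(16).toString("hex"),
      memory: memoryMiB, // in MiB
    };
    this.pending.set(request.resource_id, onNodes);
    return request;
  }

  // Handle a message read from RESPONSE_RM_2_DM.
  // Returns false when the resource_id is unknown (e.g. a duplicate reply).
  resolve(response) {
    const onNodes = this.pending.get(response.resource_id);
    if (!onNodes) return false;
    this.pending.delete(response.resource_id);
    onNodes(response.grunts); // field name as shown in the format above
    return true;
  }
}

// Usage
const requests = new PendingRequests();
let chosen = null;
const req = requests.create(1024, (nodes) => { chosen = nodes; });
requests.resolve({ resource_id: req.resource_id, grunts: ["a", "b"] });
console.log(chosen); // the node list from the matching response
```

Keeping the callback in a map means a late or duplicate reply is simply ignored rather than dispatched twice.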

Once the runtime entity has been launched (or the launch has failed), the Executor sends back a status message on the `LOG_COMMON` topic.
```javascript
{
    "node_id": "unique-machine-id",
    "resource_id": "unique-transaction-id",
    "function_id": "onetime unique ID",
    "reason": "deployment", // or "termination"
    "status": true // or false; only valid if reason == "deployment"
}
```

Instrumentation data is also sent on the `LOG_COMMON` topic. This data is sent from whichever part of the pipeline has access to the relevant information, 
and any component that needs the data may read it. Each message must have at least three fields: `node_id`, `resource_id` and `function_id`.
```javascript
{ // Example message from Executor
    "node_id"
    "resource_id"
    "function_id"
    "cpu"
    "memory"
    "network"
}

{ // Example message from reverse proxy
    "node_id"
    "resource_id"
    "function_id"
    "average_fn_time"
}

{ // Example message from dispatch manager
    "node_id"
    "resource_id"
    "function_id"
    "coldstart_time" 
}
```
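
Since every `LOG_COMMON` message must carry the three identifying fields, a consumer may want to reject malformed messages early. A minimal sketch of such a check follows; `missingLogFields` is a hypothetical helper, and the sample value for `average_fn_time` is illustrative only.

```javascript
// Fields every LOG_COMMON message must carry, per the text above.
const REQUIRED_FIELDS = ["node_id", "resource_id", "function_id"];

// Hypothetical helper: returns the required fields missing from a message.
function missingLogFields(message) {
  return REQUIRED_FIELDS.filter(
    (field) => message[field] === undefined || message[field] === null
  );
}

const fromProxy = {
  node_id: "unique-machine-id",
  resource_id: "unique-transaction-id",
  function_id: "onetime unique ID",
  average_fn_time: 12.5, // illustrative value, not from the source
};

console.log(missingLogFields(fromProxy)); // []
console.log(missingLogFields({ node_id: "unique-machine-id" }));
// [ 'resource_id', 'function_id' ]
```
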
## Dispatch System (DS)

The DS is divided into two submodules: the **Dispatch Manager** (also called the Dispatcher) and the **Dispatch Daemon**. The Dispatcher runs on the Master node, 
while a Dispatch Daemon runs on each Worker node. When a request arrives at the Dispatcher, it queries the RM for resources and, on receiving them, asks the 
Dispatch Daemon on the specified Worker node to run the function.

### Directory Structure

```bash
.
├── constants.json
├── dispatch_daemon
│   ├── config.json
│   ├── execute.js
│   ├── index.js
│   ├── isolate.js
│   ├── lib.js
│   ├── local_repository
│   ├── package.json
│   └── package-lock.json
├── dispatcher
│   ├── index.js
│   ├── isolate.js
│   ├── lib.js
│   ├── package.json
│   ├── package-lock.json
│   └── repository
└── package-lock.json
```

### System Requirements

- Node.js (10.x and above)
- g++
- build-essential
- Docker
- Java
- Apache Kafka (Configure to allow auto-delete and auto-registration of topics)

### Starting the server

After Node.js has been installed:

- Install the dependencies: run `npm install` from within the project folder.
- Modify the `constants.json` file as required.
- For Worker nodes, modify `config.json` in `dispatch_daemon` to give each node a unique ID.
- Start the Master and Worker servers with `npm start` or `node index.js`.

### Internal Communication Interfaces

#### Dispatcher

Internally, the Dispatcher and the Dispatch Agents (the Dispatch Daemons) communicate over Apache Kafka, with messages in JSON format.

Every Dispatch Agent listens on a topic named after its own UID (currently its primary IP address); the Dispatcher listens on the topics *"response"* and 
*"heartbeat"*.

- **Request Message:** When a request is received at the Dispatcher, it directs the Dispatch Agent to start a worker environment. A message is sent on the chosen Worker's ID topic. \
Format:

```javascript
{ type: "execute",
  function_id: "onetime unique ID",
  runtime: "isolation runtime",
  functionHash: "hash of the function to be run" }
```

- **Response Message:** In response, the worker executes the function, pulling resources from the central repository as required, and sends back a response. \
Format:

```javascript
{ status: 'success',
  result: 'result of the execution',
  function_id: 'onetime unique ID' }
```

- **Heartbeat Message:** The Dispatch Daemons also publish a periodic Heartbeat message back to the Dispatcher as a liveness test.\
Format:

```javascript
{ address: 'UID of the worker' }
```
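
The heartbeat messages above let the Dispatcher decide which workers are still eligible for requests. A minimal sketch of such liveness tracking is given below; the `WorkerRegistry` class, the 10-second timeout, and the worker names are assumptions made for illustration and do not come from the Xanadu source.

```javascript
// Hypothetical liveness tracker for the Dispatcher; the timeout is an assumption.
class WorkerRegistry {
  constructor(timeoutMs = 10000) {
    this.timeoutMs = timeoutMs;
    this.lastSeen = new Map(); // worker UID -> timestamp (ms) of last heartbeat
  }

  // Call for every message read from the "heartbeat" topic.
  onHeartbeat(message, now = Date.now()) {
    this.lastSeen.set(message.address, now);
  }

  // Workers whose last heartbeat is recent enough to be considered alive.
  liveWorkers(now = Date.now()) {
    const live = [];
    for (const [address, seenAt] of this.lastSeen) {
      if (now - seenAt <= this.timeoutMs) live.push(address);
    }
    return live;
  }
}

const registry = new WorkerRegistry(10000);
registry.onHeartbeat({ address: "worker-1" }, 0);
registry.onHeartbeat({ address: "worker-2" }, 8000);
console.log(registry.liveWorkers(12000)); // [ 'worker-2' ]
```

Passing `now` explicitly keeps the logic testable; in a running Dispatcher the default `Date.now()` would be used.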

### Interaction API

The platform exposes an HTTP API, which is divided into two parts:

- Deploy: The deploy interface is used to upload the function file and store it on the server, and also to set up containers and VM images.\
An example curl command:

```bash
curl -X POST \
  http://localhost:8080/serverless/deploy \
  -H 'cache-control: no-cache' \
  -F runtime=container \
  -F serverless=@/home/nilanjan/Desktop/serverless/hybrid/test/script.js
```

The POST request contains two parameters: 1. *runtime*, which specifies the runtime to use, viz. isolate, process, container or virtual machine, and 2. *serverless*, which sends the serverless function as a file via multipart/form-data.\
On successful deployment the API returns a function key, which is to be used for function execution.

- Execute: To execute the submitted function, we use the Execute API.\
An example curl command:

```bash
curl -X POST \
  http://localhost:8080/serverless/execute/761eec785d64451203293427bea5c7ad \
  -H 'cache-control: no-cache' \
  -F runtime=process
```

The API takes the key returned by the deploy API as a route parameter, along with a *runtime* parameter specifying the runtime to be used.

## Resource System

### Dependencies

1. **clang**: version 9.0
2. **librdkafka**: version 0.11.6

### Internal Messages

Upon being launched, each Resource Daemon (RD) sends a JOIN message to the RM on the Kafka topic `JOIN_RD_2_RM`.

```javascript
{
    "node_id": "unique-machine-id"
}
```

After this, RDs send a heartbeat message to the RM periodically on topic `HEARTBEAT_RD_2_RM`. These messages contain the current state of all the 
resources being tracked by RDs on each machine. This data is cached by the RM.

```javascript
{
    "node_id": "unique-machine-id",
    "memory": 1024, // in MiB
    ... // Any other resources
}
```

The RM, upon receiving the request from the DM, checks its local cache to find suitable machines. If it finds any, it sends a message back to the 
DM on topic `RESPONSE_RM_2_DM`.
```javascript
{
    "resource_id": "unique-transaction-id",
//    "port": 2343 --- NOT IMPLEMENTED YET
    "nodes": ["a", "b", ...] // List of unique machine IDs
}
```
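
The cache lookup described above can be sketched as follows. This assumes that the `memory` field of a heartbeat reports the memory currently available on that node, which the source does not state explicitly; `heartbeatCache`, `onHeartbeat` and `findNodes` are hypothetical names. JavaScript is used only to match the message snippets in this document.

```javascript
// Hypothetical RM-side cache holding the latest heartbeat from each RD.
const heartbeatCache = new Map(); // node_id -> latest heartbeat message

// Store (or overwrite) the cached state for the node that sent the heartbeat.
function onHeartbeat(message) {
  heartbeatCache.set(message.node_id, message);
}

// Nodes whose cached memory covers the request (both values in MiB).
// Assumption: heartbeat "memory" means memory available on the node.
function findNodes(request) {
  const nodes = [];
  for (const [nodeId, state] of heartbeatCache) {
    if (state.memory >= request.memory) nodes.push(nodeId);
  }
  return nodes;
}

onHeartbeat({ node_id: "a", memory: 2048 });
onHeartbeat({ node_id: "b", memory: 512 });

const request = { resource_id: "unique-transaction-id", memory: 1024 };
const response = { resource_id: request.resource_id, nodes: findNodes(request) };
console.log(response.nodes); // [ 'a' ]
```

An empty `nodes` list is the case in which the RM would fall back to querying the RDs directly.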

If, on the other hand, the RM can't find any such machine in its cache, it sends a message to all the RDs requesting their current status. This message 
is posted on the topic `REQUEST_RM_2_RD`.
Format:
```javascript
{
    "resource_id": "unique-transaction-id",
    "memory": 1024, // in MiB
    ... // Any other resources
}
```

The RDs receive this message and report whether or not they satisfy the constraints on topic `RESPONSE_RD_2_RM`.
```javascript
{
    "node_id": "unique-machine-id",
    "resource_id": "unique-transaction-id",
    "success" : 0/1 // 0 = fail, 1 = success
}
```
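
On the RD side, answering such a request amounts to comparing the requested amount with what the node can offer. A minimal sketch, considering memory only, is shown below; `answerRequest` is a hypothetical function, and using `os.freemem()` as the measure of available memory is an assumption made for illustration.

```javascript
const os = require("os");

// Hypothetical RD-side check: can this node satisfy the requested memory?
// freeMiB is injectable so the logic can be exercised without probing the host.
function answerRequest(nodeId, request, freeMiB = os.freemem() / (1024 * 1024)) {
  return {
    node_id: nodeId,
    resource_id: request.resource_id,
    success: freeMiB >= request.memory ? 1 : 0, // 0 = fail, 1 = success
  };
}

const reply = answerRequest(
  "unique-machine-id",
  { resource_id: "unique-transaction-id", memory: 1024 },
  4096 // pretend 4096 MiB are free
);
console.log(reply.success); // 1
```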

The RM waits for a certain amount of time for the RDs; then, it sends a list of however many RDs have replied affirmatively to the DM on topic 
`RESPONSE_RM_2_DM`, as described above.
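
The wait-then-reply step can be sketched as a small collector that gathers affirmative replies for one request until the window closes. `ResponseCollector` and the `txn-*` identifiers are hypothetical, and the length of the wait is not specified in the source, so the timer is left out of the sketch.

```javascript
// Hypothetical collector for RESPONSE_RD_2_RM messages belonging to one request.
class ResponseCollector {
  constructor(resourceId) {
    this.resourceId = resourceId;
    this.accepted = new Set(); // node_ids that replied with success = 1
    this.closed = false;
  }

  // Record one RD reply; replies for other requests or after the deadline are ignored.
  onResponse(message) {
    if (this.closed || message.resource_id !== this.resourceId) return;
    if (message.success === 1) this.accepted.add(message.node_id);
  }

  // Called when the wait window ends; returns the payload for RESPONSE_RM_2_DM.
  close() {
    this.closed = true;
    return { resource_id: this.resourceId, nodes: [...this.accepted] };
  }
}

const collector = new ResponseCollector("txn-1");
collector.onResponse({ node_id: "a", resource_id: "txn-1", success: 1 });
collector.onResponse({ node_id: "b", resource_id: "txn-1", success: 0 });
collector.onResponse({ node_id: "c", resource_id: "txn-2", success: 1 });
const result = collector.close();
collector.onResponse({ node_id: "d", resource_id: "txn-1", success: 1 }); // too late
console.log(result.nodes); // [ 'a' ]
```

In a running RM, `close()` would typically be driven by a timer such as `setTimeout`, with the returned object published to the DM.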
