Tide under the microscope: the secret life of requests
Hi there! Welcome back, it has been some time. This will be a different kind of post, since we will be looking under the hood to learn and understand how tide works. For that purpose, we will examine the life cycle of a request.
Tide is a modular web framework, which means it is built by composing different modules (crates, to be precise) that cooperate to give users the features they expect from a web framework (e.g. listeners, routing, extraction and more).
Setup
Minimal example
So, let's start digging into Tide's design by following a request, and to do that we can create a minimal application.
#[async_std::main]
async fn main() -> tide::Result<()> {
    let mut app = tide::new();
    app.at("/").get(|_req| async {
        Ok("Hi there!")
    });
    app.listen("127.0.0.1:8080").await?;
    Ok(())
}
And check the response
$ curl localhost:8080
Hi there!
Great! We have our minimal application working: a server that is listening for connections on port 8080, accepting HTTP requests and producing responses.
Expanding the main macro
Let's now start to examine the building blocks. First you may notice the #[async_std::main] macro, which allows us to write our main function as async. If we expand the macro we can check how the code looks after expansion:
#![feature(prelude_import)]
#[prelude_import]
use std::prelude::v1::*;
#[macro_use]
extern crate std;
fn main() -> tide::Result<()> {
    async fn main() -> tide::Result<()> {
        {
            let mut app = tide::new();
            app.at("/").get(|_req| async { Ok("Hi there!") });
            app.listen("127.0.0.1:8080").await?;
            Ok(())
        }
    }
    async_std::task::block_on(async { main().await })
}
We can see that our main function is wrapped inside another, non-async main function that runs our code inside an async task, blocking the current thread.
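In other words, we could write roughly the same thing by hand, without the macro:

fn main() -> tide::Result<()> {
    // block_on drives the async code to completion on the current thread.
    async_std::task::block_on(async {
        let mut app = tide::new();
        app.at("/").get(|_req| async { Ok("Hi there!") });
        app.listen("127.0.0.1:8080").await?;
        Ok(())
    })
}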
Creating the app
Back to our code: inside our main fn we are creating a new tide application.
let mut app = tide::new();
We call it app, but the actual type is Server, since new returns a Server.
// lib.rs
#[must_use]
pub fn new() -> server::Server<()> {
    Server::new()
}
And Servers are built up as a combination of state, endpoints and middleware:
// server.rs
pub struct Server<State> {
    router: Arc<Router<State>>,
    state: State,
    (...)
    #[allow(clippy::rc_buffer)]
    middleware: Arc<Vec<Arc<dyn Middleware<State>>>>,
}
Where:

- State is defined by users, and tide makes it available as a shared reference in each request.
- router is the server's routing table, kept behind an Arc.
- middleware allows users to extend the default behavior in both the input (request) and output (response) directions. This field in particular holds a vector behind an Arc.
We will talk about State and middleware in depth in the next post, but later on we will look at how the routing decision is made based on the routing table.
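Just to get a quick feel for State before then, here is a minimal sketch of a server with shared state (a hypothetical visit counter, not part of our example):

use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// State must be Clone + Send + Sync + 'static; the Arc makes the
// counter shared across the cloned copies.
#[derive(Clone)]
struct AppState {
    visits: Arc<AtomicU64>,
}

#[async_std::main]
async fn main() -> tide::Result<()> {
    let mut app = tide::with_state(AppState {
        visits: Arc::new(AtomicU64::new(0)),
    });
    app.at("/").get(|req: tide::Request<AppState>| async move {
        let n = req.state().visits.fetch_add(1, Ordering::Relaxed) + 1;
        Ok(format!("visit #{}", n))
    });
    app.listen("127.0.0.1:8080").await?;
    Ok(())
}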
Adding routes
Our next line does a couple of things by chaining the at and get methods.
app.at("/").get(|_req| async { Ok("Hi there!") });
The at function allows users to add a new route (at a given path) to the router and returns the created Route, allowing chaining.
(You can read the official path and segment definitions in the tide server module documentation.)
The path (e.g. /hello/:name) is composed of zero or more segments; each segment represents a non-empty string separated by / in the path. There are two kinds of segments, concrete and wildcard:

- Concrete: matches exactly the respective part of the path (e.g. /hello).
- Wildcard: extracts and parses the respective part of the path of the incoming request to pass it along to the endpoint as an argument. Wildcard segments come in different flavors:
  - named (e.g. /:name), which creates an endpoint parameter called name (see the sketch below).
  - optional (/*:name), which will match to the end of the given path, no matter how many segments are left, even nothing.
  - unnamed (e.g. /:), where the name of the parameter is omitted to define a path that matches the required structure but whose parameters are not required: a lone : will match a segment, and * will match an entire path.
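For example, a minimal sketch of a named wildcard, read in the endpoint with tide's param accessor:

// ":name" is a named wildcard: its value is extracted from the path
// and exposed to the endpoint as the parameter "name".
app.at("/hello/:name").get(|req: tide::Request<()>| async move {
    let name = req.param("name")?; // "world" for GET /hello/world
    Ok(format!("Hi there, {}!", name))
});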
As we said before, the at method returns a new Route, and if we look at the definition of Route:
// route.rs
pub struct Route<'a, State> {
    router: &'a mut Router<State>,
    path: String,
    middleware: Vec<Arc<dyn Middleware<State>>>,
    prefix: bool,
}
The Route holds a reference to the router, has a path and a vector of middleware to apply. Also, there is a prefix flag used to decide whether strip_prefix should be applied or not.
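That flag matters, for example, when nesting one server under another; if I read the nest API correctly, a sketch looks like this:

// nest mounts a whole inner server under a prefix; the prefix is
// stripped before the inner router sees the path.
let mut app = tide::new();
app.at("/api").nest({
    let mut api = tide::new();
    api.at("/hello").get(|_req| async { Ok("Hi from /api/hello") });
    api
});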
But in our example we use the get method to set the endpoint (in our case, the closure to execute when the request arrives). Let's check that method.
/// Add an endpoint for `GET` requests
pub fn get(&mut self, ep: impl Endpoint<State>) -> &mut Self {
    self.method(http_types::Method::Get, ep);
    self
}
Awesome, tide provides methods for each HTTP verb (e.g. get, post, put, etc.) that internally call the method method with the correct HTTP method type as an argument.
Until now we were always looking at tide's own source code, but these methods use the http-types dependency. This crate provides shared types for common HTTP operations.
Let's also look at how the method function is implemented:
pub fn method(&mut self, method: http_types::Method, ep: impl Endpoint<State>) -> &mut Self {
    if self.prefix {
        let ep = StripPrefixEndpoint::new(ep);
        self.router.add(
            &self.path,
            method,
            MiddlewareEndpoint::wrap_with_middleware(ep.clone(), &self.middleware),
        );
        let wildcard = self.at("*--tide-path-rest");
        wildcard.router.add(
            &wildcard.path,
            method,
            MiddlewareEndpoint::wrap_with_middleware(ep, &wildcard.middleware),
        );
    } else {
        self.router.add(
            &self.path,
            method,
            MiddlewareEndpoint::wrap_with_middleware(ep, &self.middleware),
        );
    }
    self
}
For now let's focus on the else branch, since we don't need to strip any prefix. This function adds the route definition (a path, an HTTP verb and an endpoint) to the router, wrapping the endpoint with the middleware that should be executed. Also, notice that it returns Self (a Route), allowing chaining with other methods.
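Since every verb method returns the route again, several endpoints can be chained on the same path, e.g. for a hypothetical /messages resource:

// Two endpoints on the same path, one per HTTP verb.
app.at("/messages")
    .get(|_req| async { Ok("list messages") })
    .post(|_req| async { Ok("create a message") });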
Great! We have already set up our server (a.k.a. app). At this point we have defined a route that:

- should match the / path and the HTTP GET verb.
- should run the defined endpoint, a closure in our case.
But we are not listening for any connections yet, so let's take a look at how tide allows us to listen.
Listening
The next line in our example app is
app.listen("127.0.0.1:8080").await?;
This line sets the listener and starts listening for incoming connections by awaiting (remember that futures are lazy in Rust). Let's take a look at the listen method.
pub async fn listen<L: ToListener<State>>(self, listener: L) -> io::Result<()> {
    let mut listener = listener.to_listener()?;
    listener.bind(self).await?;
    for info in listener.info().iter() {
        log::info!("Server listening on {}", info);
    }
    listener.accept().await?;
    Ok(())
}
Tide has the concept of a Listener, implemented as an async trait that represents an HTTP transport and built via a to_listener implementation. Out of the box tide provides a TCP listener and a Unix socket listener, but you can create your own.
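For example, to_listener also accepts explicit URL forms; a small sketch, assuming the string formats supported by tide's ToListener implementations:

// Explicit TCP URL form, equivalent to "127.0.0.1:8080":
app.listen("http://127.0.0.1:8080").await?;
// Unix socket form (on Unix platforms):
// app.listen("http+unix:///tmp/app.sock").await?;

The Listener trait itself looks like this: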
#[async_trait]
pub trait Listener<State>: Debug + Display + Send + Sync + 'static
where
    State: Send + Sync + 'static,
{
    async fn bind(&mut self, app: Server<State>) -> io::Result<()>;
    async fn accept(&mut self) -> io::Result<()>;
    fn info(&self) -> Vec<ListenInfo>;
}
The listen fn then calls the bind method of the listener, which starts the listening process by opening the necessary network ports. At this point the ports are open but not yet accepting connections; for that, the listen method calls the accept method of the listener.
Awesome! Now we are running our app and listening for network connections; we can easily check that using the netstat command.
$ netstat -nal| grep 8080
tcp4 0 0 127.0.0.1.8080 *.* LISTEN
Examine
Follow the trace
Now that we have the setup in place and our application running, we can start reviewing the life of a request. Let's start with a simple test:
curl -v localhost:8080/
* Trying ::1...
* TCP_NODELAY set
* Connection failed
* connect to ::1 port 8080 failed: Connection refused
* Trying fe80::1...
* TCP_NODELAY set
* Connection failed
* connect to fe80::1 port 8080 failed: Connection refused
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< content-length: 9
< content-type: text/plain;charset=utf-8
< date: Sun, 07 Mar 2021 15:50:13 GMT
<
* Connection #0 to host localhost left intact
Hi there!
Lots of things happen before we get the Hi there! response, so let's dive in...
First, we want to add the logger and set the log level to Trace:
tide::log::with_level(tide::log::LevelFilter::Trace);
Let's run the app again and make the test request to see the log (leaving the async_io and polling entries out).
tide::log::middleware <-- Request received
method GET
path /
tide::log::middleware --> Response sent
method GET
path /
status 200 - OK
duration 76.45µs
async_h1::server wrote 124 response bytes
async_h1::server discarded 0 unread request body bytes
So, we can see the logs from the middleware and also from async_h1, another dependency crate, used to parse HTTP/1.1. And this is something to note: tide currently supports only HTTP/1.1.
# Forcing HTTP/1.0
curl -v -0 localhost:8080/
* Trying ::1...
* TCP_NODELAY set
* Connection failed
* connect to ::1 port 8080 failed: Connection refused
* Trying fe80::1...
* TCP_NODELAY set
* Connection failed
* connect to fe80::1 port 8080 failed: Connection refused
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.0
> Host: localhost:8080
> User-Agent: curl/7.54.0
> Accept: */*
>
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server
Going deeper
Now it is time to examine how the connection is established and follow the path from the listener to the endpoint.
First, going back to our listener (a TCP listener in our case): remember that we need to call accept in order to start accepting connections, so let's take a look there to see the behavior.
// tcp_listener.rs
async fn accept(&mut self) -> io::Result<()> {
    let server = self
        .server
        .take()
        .expect("`Listener::bind` must be called before `Listener::accept`");
    let listener = self
        .listener
        .take()
        .expect("`Listener::bind` must be called before `Listener::accept`");
    let mut incoming = listener.incoming();
    while let Some(stream) = incoming.next().await {
        match stream {
            Err(ref e) if is_transient_error(e) => continue,
            Err(error) => {
                let delay = std::time::Duration::from_millis(500);
                crate::log::error!("Error: {}. Pausing for {:?}.", error, delay);
                task::sleep(delay).await;
                continue;
            }
            Ok(stream) => {
                handle_tcp(server.clone(), stream);
            }
        };
    }
    Ok(())
}
listener.incoming() returns a stream that we can then loop over, calling next to handle each connection by calling handle_tcp with the server and the stream.
// tcp_listener.rs
fn handle_tcp<State: Clone + Send + Sync + 'static>(app: Server<State>, stream: TcpStream) {
    task::spawn(async move {
        let local_addr = stream.local_addr().ok();
        let peer_addr = stream.peer_addr().ok();
        let fut = async_h1::accept(stream, |mut req| async {
            req.set_local_addr(local_addr);
            req.set_peer_addr(peer_addr);
            app.respond(req).await
        });
        if let Err(error) = fut.await {
            log::error!("async-h1 error", { error: error.to_string() });
        }
    });
}
This spawns a new async task, and inside that task it calls async_h1::accept (the HTTP parser) with the stream to parse and a closure to execute.
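Stripped of the tide specifics, this incoming/spawn pattern boils down to something like the following stand-alone async-std sketch:

use async_std::net::TcpListener;
use async_std::prelude::*; // brings StreamExt::next into scope
use async_std::task;

async fn accept_loop() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:9090").await?;
    let mut incoming = listener.incoming();
    // Iterate over incoming connections, spawning one task per stream.
    while let Some(stream) = incoming.next().await {
        let stream = stream?;
        task::spawn(async move {
            // A real server would parse requests from `stream` and write
            // responses back; here we just drop the connection.
            drop(stream);
        });
    }
    Ok(())
}

So, let's follow this request to see how the parser handles it: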
// async_h1
pub async fn accept<RW, F, Fut>(io: RW, endpoint: F) -> http_types::Result<()>
where
    RW: Read + Write + Clone + Send + Sync + Unpin + 'static,
    F: Fn(Request) -> Fut,
    Fut: Future<Output = http_types::Result<Response>>,
{
    Server::new(io, endpoint).accept().await
}
Internally, async_h1 creates a new instance of its own Server type with the io stream and the endpoint, then calls the accept method of that server and returns the future.
The accept method just loops while the connection is kept alive, calling accept_one.
pub async fn accept(&mut self) -> http_types::Result<()> {
    while ConnectionStatus::KeepAlive == self.accept_one().await? {}
    Ok(())
}
And the accept_one method is the one that decodes the incoming request, reads the body and parses the headers. It then passes the request to the endpoint, and encodes and writes the response.
(...)
let mut res = (self.endpoint)(req).await?;
let bytes_written = io::copy(&mut encoder, &mut self.io).await?;
log::trace!("wrote {} response bytes", bytes_written);
(...)
Nice! We followed the whole path: accepting the connection, decoding, calling the endpoint, encoding and writing the response. We can now go deeper and follow the closure...
One level further
After decoding and parsing the headers, the closure passed to async_h1 is executed:
// tcp_listener.rs
(...)
let fut = async_h1::accept(stream, |mut req| async {
req.set_local_addr(local_addr);
req.set_peer_addr(peer_addr);
app.respond(req).await
})
Now it is time to go deeper into the respond method and see how this request is processed inside tide.
pub async fn respond<Req, Res>(&self, req: Req) -> http_types::Result<Res>
where
    Req: Into<http_types::Request>,
    Res: From<http_types::Response>,
{
    let req = req.into();
    let Self {
        router,
        state,
        middleware,
    } = self.clone();
    let method = req.method().to_owned();
    let Selection { endpoint, params } = router.route(&req.url().path(), method);
    let route_params = vec![params];
    let req = Request::new(state, req, route_params);
    let next = Next {
        endpoint,
        next_middleware: &middleware,
    };
    let res = next.run(req).await;
    let res: http_types::Response = res.into();
    Ok(res.into())
}
respond receives a request; first it needs to figure out which endpoint should be called, based on the path and method of the request.
The router's route method tries different strategies to select the endpoint that should be used, and if none matches the request, a 404 endpoint is called to return NOT FOUND to the client.
pub(crate) fn route(&self, path: &str, method: http_types::Method) -> Selection<'_, State> {
    if let Some(Match { handler, params }) = self
        .method_map
        .get(&method)
        .and_then(|r| r.recognize(path).ok())
    {
        Selection {
            endpoint: &**handler,
            params,
        }
    } else if let Ok(Match { handler, params }) = self.all_method_router.recognize(path) {
        Selection {
            endpoint: &**handler,
            params,
        }
    } else if method == http_types::Method::Head {
        // If it is a HTTP HEAD request then check if there is a callback in the endpoints map
        // if not then fallback to the behavior of HTTP GET else proceed as usual
        self.route(path, http_types::Method::Get)
    } else if self
        .method_map
        .iter()
        .filter(|(k, _)| **k != method)
        .any(|(_, r)| r.recognize(path).is_ok())
    {
        // If this `path` can be handled by a callback registered with a different HTTP method
        // should return 405 Method Not Allowed
        Selection {
            endpoint: &method_not_allowed,
            params: Params::new(),
        }
    } else {
        Selection {
            endpoint: &not_found_endpoint,
            params: Params::new(),
        }
    }
}
Once we have the best matching endpoint, tide uses the Next struct to drive the execution of the middleware chain, including the actual endpoint, and calls run to start processing.
// middleware.rs
impl<State: Clone + Send + Sync + 'static> Next<'_, State> {
    /// Asynchronously execute the remaining middleware chain.
    pub async fn run(mut self, req: Request<State>) -> Response {
        if let Some((current, next)) = self.next_middleware.split_first() {
            self.next_middleware = next;
            match current.handle(req, self).await {
                Ok(request) => request,
                Err(err) => err.into(),
            }
        } else {
            match self.endpoint.call(req).await {
                Ok(request) => request,
                Err(err) => err.into(),
            }
        }
    }
}
Notice that we use handle to execute the middleware and call to run the endpoint. That is because a middleware also receives the next as an argument, so it can either continue by calling the next middleware or break the chain with a response.
#[async_trait]
pub trait Middleware<State>: Send + Sync + 'static {
    /// Asynchronously handle the request, and return a response.
    async fn handle(&self, request: Request<State>, next: Next<'_, State>) -> crate::Result;

    /// Set the middleware's name. By default it uses the type signature.
    fn name(&self) -> &str {
        std::any::type_name::<Self>()
    }
}
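As a quick illustration, here is a minimal sketch of a timing middleware (RequestTimer is a made-up name, not part of tide):

use std::time::Instant;
use tide::{Middleware, Next, Request};

#[derive(Debug)]
struct RequestTimer;

#[tide::utils::async_trait]
impl<State: Clone + Send + Sync + 'static> Middleware<State> for RequestTimer {
    async fn handle(&self, req: Request<State>, next: Next<'_, State>) -> tide::Result {
        let start = Instant::now();
        let res = next.run(req).await; // continue with the rest of the chain
        println!("request took {:?}", start.elapsed());
        Ok(res)
    }
}

It would be registered with app.with(RequestTimer);.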
Awesome! We have followed the request all the way to the call of the endpoint.
// endpoint.rs
#[async_trait]
pub trait Endpoint<State: Clone + Send + Sync + 'static>: Send + Sync + 'static {
    /// Invoke the endpoint within the given context
    async fn call(&self, req: Request<State>) -> crate::Result;
}
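Our closure qualifies as an Endpoint thanks to a blanket implementation for async functions, which (roughly, paraphrasing tide's endpoint.rs) looks like this:

// Any Fn taking a Request and returning a future of a Result whose
// value converts into a Response is an Endpoint.
#[async_trait]
impl<State, F, Fut, Res> Endpoint<State> for F
where
    State: Clone + Send + Sync + 'static,
    F: Send + Sync + 'static + Fn(Request<State>) -> Fut,
    Fut: Future<Output = crate::Result<Res>> + Send + 'static,
    Res: Into<Response>,
{
    async fn call(&self, req: Request<State>) -> crate::Result {
        let fut = (self)(req);
        let res = fut.await?;
        Ok(res.into())
    }
}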
Now the response is sent back to the client!
That's all for today. We followed the code (and crates) that allow tide to accept connections, decode and parse the request, decide the best endpoint (route) to use, execute the middleware chain and call the endpoint. There are still lots of topics to cover, like body parsing, parameter extraction and middleware execution in both directions (input/output). In the next notes we will start covering some of those topics.
As always, I write this as a learning journal, so there could be errors or misunderstandings; any feedback is welcome.
Thanks!