Filename: 174-optimistic-data-server.txt
Title: Optimistic Data for Tor: Server Side
Author: Ian Goldberg
Created: 2-Aug-2010
Status: Closed
Implemented-In: 0.2.3.1-alpha

Overview:

When a SOCKS client opens a TCP connection through Tor (for an HTTP
request, for example), the query latency is about 1.5x higher than it
needs to be.  Put simply, the problem is that data currently flows in
this sequence:

1. The SOCKS client opens a TCP connection to the OP
2. The SOCKS client sends a SOCKS CONNECT command
3. The OP sends a BEGIN cell to the Exit
4. The Exit opens a TCP connection to the Server
5. The Exit returns a CONNECTED cell to the OP
6. The OP returns a SOCKS CONNECTED notification to the SOCKS client
7. The SOCKS client sends some data (the GET request, for example)
8. The OP sends a DATA cell to the Exit
9. The Exit sends the GET to the Server
10. The Server returns the HTTP result to the Exit
11. The Exit sends the DATA cells to the OP
12. The OP returns the HTTP result to the SOCKS client

Note that the Exit node knows that the connection to the Server was
successful at the end of step 4, but is unable to send the HTTP query to
the Server until step 9.

This proposal (as well as its upcoming sibling concerning the client
side) aims to reduce the latency by allowing:
1. SOCKS clients to optimistically send data before they are notified
    that the SOCKS connection has completed successfully
2. OPs to optimistically send DATA cells on streams in the CONNECT_WAIT
    state
3. Exit nodes to accept and queue DATA cells while in the
    EXIT_CONN_STATE_CONNECTING state

This particular proposal deals with #3.

In this way, the flow would be as follows:

1. The SOCKS client opens a TCP connection to the OP
2. The SOCKS client sends a SOCKS CONNECT command, followed immediately
    by data (such as the GET request)
3. The OP sends a BEGIN cell to the Exit, followed immediately by DATA
    cells
4. The Exit opens a TCP connection to the Server
5. The Exit returns a CONNECTED cell to the OP, and sends the queued GET
    request to the Server
6. The OP returns a SOCKS CONNECTED notification to the SOCKS client,
    and the Server returns the HTTP result to the Exit
7. The Exit sends the DATA cells to the OP
8. The OP returns the HTTP result to the SOCKS client

Motivation:

This change will save one OP<->Exit round trip (down to one from two).
There are still two SOCKS Client<->OP round trips (negligible time) and
two Exit<->Server round trips.  Depending on the ratio of the
Exit<->Server (Internet) RTT to the OP<->Exit (Tor) RTT, this will
decrease the latency by 25 to 50 percent.  Experiments validate these
predictions. [Goldberg, PETS 2010 rump session; see
https://thunk.cs.uwaterloo.ca/optimistic-data-pets2010-rump.pdf ]
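
The 25 to 50 percent range can be reproduced with a small model.  This
is only a sketch: the function names are hypothetical, latencies are in
milliseconds, and the SOCKS Client<->OP round trips are treated as zero.
The saving is T / (2T + 2S), where T is the OP<->Exit RTT and S is the
Exit<->Server RTT, so it approaches 50 percent as S becomes negligible
and is 25 percent when S equals T.

```c
#include <assert.h>

/* Hypothetical model, not Tor code.  All values are in milliseconds.
 * tor_rtt: one OP<->Exit round trip.
 * net_rtt: one Exit<->Server round trip. */

/* Current flow: two OP<->Exit and two Exit<->Server round trips. */
unsigned latency_current(unsigned tor_rtt, unsigned net_rtt)
{
  return 2 * tor_rtt + 2 * net_rtt;
}

/* Optimistic flow: one OP<->Exit round trip is saved. */
unsigned latency_optimistic(unsigned tor_rtt, unsigned net_rtt)
{
  return tor_rtt + 2 * net_rtt;
}

/* Percentage of the current latency that is saved, rounded down. */
unsigned percent_saved(unsigned tor_rtt, unsigned net_rtt)
{
  unsigned before = latency_current(tor_rtt, net_rtt);
  unsigned after = latency_optimistic(tor_rtt, net_rtt);
  if (before == 0)
    return 0;
  return 100 * (before - after) / before;
}
```

For example, with an assumed 400 ms Tor RTT and a 100 ms Internet RTT,
the model gives 1000 ms before and 600 ms after, a 40 percent saving.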

Design:

The current code already handles queued data at the Exit correctly: if
there is queued data in an EXIT_CONN_STATE_CONNECTING stream, that data
is sent immediately when the connection succeeds.  If the connection
fails, the data is correctly ignored and freed.  The problem is that the
server code currently drops DATA cells received on streams in the
EXIT_CONN_STATE_CONNECTING state.  In addition, queueing data in the
EXIT_CONN_STATE_RESOLVING state trips existing sanity checks: streams in
that state don't yet have conn->write_event set, so the assumption
behind those checks (any stream with queued data is at least potentially
writable) no longer holds.

The solution is simply not to drop DATA cells received while in the
EXIT_CONN_STATE_CONNECTING state.  Also, do not send SENDME cells for
data received in this state, so that the OP cannot cause more than one
window's worth of data to be queued at the Exit.  Finally, patch the
sanity checks so that streams in the EXIT_CONN_STATE_RESOLVING state
that have buffered data can pass.
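
The intended behaviour can be sketched as a toy state machine.  This is
not Tor's code: every name below (sketch_stream_t, sketch_on_data, and
so on) is hypothetical, the buffer capacity is arbitrary, and "writing
to the Server" is reduced to a byte counter.  It only illustrates the
three rules above: queue DATA while resolving or connecting, consider no
SENDME for that data, and flush or free the queue once the outcome of
the connection is known.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for Tor's exit stream states. */
typedef enum {
  SKETCH_RESOLVING,
  SKETCH_CONNECTING,
  SKETCH_OPEN,
  SKETCH_CLOSED
} sketch_state_t;

#define SKETCH_BUF_CAP 4096  /* illustrative capacity, not a Tor constant */

typedef struct {
  sketch_state_t state;
  unsigned char outbuf[SKETCH_BUF_CAP]; /* data waiting for the Server */
  size_t queued;                         /* bytes held in outbuf */
  size_t written;                        /* bytes handed to the Server */
  unsigned sendmes_considered;           /* times a SENDME was considered */
} sketch_stream_t;

/* Handle one DATA payload.  Returns 0 if accepted, -1 if dropped. */
int sketch_on_data(sketch_stream_t *s, const unsigned char *p, size_t len)
{
  if (s->state == SKETCH_CLOSED || len > SKETCH_BUF_CAP - s->queued)
    return -1;
  if (s->state == SKETCH_OPEN) {
    s->written += len;        /* open stream: pass straight through */
    s->sendmes_considered++;  /* and let flow control advance */
    return 0;
  }
  /* RESOLVING or CONNECTING: queue the data and do not consider a
   * SENDME, so the OP can never have more than one window queued. */
  memcpy(s->outbuf + s->queued, p, len);
  s->queued += len;
  return 0;
}

/* The TCP connection to the Server succeeded: flush what was queued. */
void sketch_on_connected(sketch_stream_t *s)
{
  s->state = SKETCH_OPEN;
  s->written += s->queued;
  s->queued = 0;
}

/* The connection failed: discard the queued data. */
void sketch_on_failed(sketch_stream_t *s)
{
  s->state = SKETCH_CLOSED;
  s->queued = 0;
}
```

A stream that receives "GET /" while connecting holds those 5 bytes with
no SENDME considered, and hands them to the Server as soon as
sketch_on_connected() runs.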

If no clients ever send such optimistic data, the new code will never be
executed, and the behaviour of Tor will not change.  When clients begin
to send optimistic data, the performance of those clients' streams will
improve.

After discussion with nickm, it seems best to just have the server
version number be the indicator of whether a particular Exit supports
optimistic data.  (If a client sends optimistic data to an Exit which
does not support it, the data will be dropped, and the client's request
will fail to complete.)  What do version numbers for hypothetical future
protocol-compatible implementations look like, though?

Security implications:

Servers (certainly the Exit, and possibly others, by watching the
pattern of packets) will be able to tell that a particular client
is using optimistic data.  This will be discussed more in the sibling
proposal.

On the Exit side, servers will be queueing a little extra data, but no
more than one window.  Clients today can cause Exits to queue that much
data anyway, simply by establishing a Tor connection to a slow machine
and sending one window of data.
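
To put a rough number on "one window", the sketch below multiplies out
the bound.  The two constants are assumptions taken from tor-spec's
description of stream-level flow control and the relay cell layout, not
from this proposal, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values from tor-spec, not stated in this proposal. */
#define ASSUMED_STREAM_WINDOW_CELLS 500  /* initial stream-level window */
#define ASSUMED_RELAY_PAYLOAD_BYTES 498  /* max data bytes per DATA cell */

/* Upper bound on the bytes an Exit may hold for n_streams streams that
 * are still resolving or connecting, given that no SENDME is sent. */
size_t max_queued_bytes(size_t n_streams)
{
  return n_streams * ASSUMED_STREAM_WINDOW_CELLS *
         ASSUMED_RELAY_PAYLOAD_BYTES;
}
```

Under those assumptions a single unconnected stream can hold at most
249000 bytes, which lasts only until the connection succeeds or fails.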

Specification:

tor-spec section 6.2 currently says:

    The OP waits for a RELAY_CONNECTED cell before sending any data.
    Once a connection has been established, the OP and exit node
    package stream data in RELAY_DATA cells, and upon receiving such
    cells, echo their contents to the corresponding TCP stream.
    RELAY_DATA cells sent to unrecognized streams are dropped.

It is not clear exactly what an "unrecognized" stream is, but this last
sentence would be changed to say that RELAY_DATA cells received on a
stream that has processed a RELAY_BEGIN cell and has not yet issued a
RELAY_END or a RELAY_CONNECTED cell are queued; that queue is processed
immediately after a RELAY_CONNECTED cell is issued for the stream, or
freed after a RELAY_END cell is issued for the stream.

The earlier part of this section will be addressed in the sibling
proposal.

Compatibility:

There are compatibility issues, as mentioned above.  OPs MUST NOT send
optimistic data to Exit nodes whose version numbers predate (something).
OPs MAY send optimistic data to Exit nodes whose version numbers match
or follow that value.  (But see the question about independent server
reimplementations, above.)
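
On the OP side, the rule above amounts to a version comparison.  The
sketch below is illustrative only: the type and function names are
hypothetical, Tor's real version strings also carry status tags that are
ignored here, and the threshold is passed in by the caller because this
proposal leaves its value open.

```c
#include <assert.h>

/* Hypothetical, simplified four-part version number. */
typedef struct {
  int major, minor, micro, patch;
} sketch_version_t;

/* Returns <0, 0 or >0 as a is older than, equal to or newer than b. */
int sketch_version_cmp(const sketch_version_t *a, const sketch_version_t *b)
{
  if (a->major != b->major) return a->major - b->major;
  if (a->minor != b->minor) return a->minor - b->minor;
  if (a->micro != b->micro) return a->micro - b->micro;
  return a->patch - b->patch;
}

/* Returns 1 if the OP MAY send optimistic data to this Exit, or 0 if it
 * MUST NOT.  min_supported is supplied by the caller. */
int sketch_may_send_optimistic(const sketch_version_t *exit_version,
                               const sketch_version_t *min_supported)
{
  return sketch_version_cmp(exit_version, min_supported) >= 0;
}
```

An independent reimplementation would need some other signal, since a
comparison like this only makes sense within one version numbering
scheme.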

Implementation:

Here is a simple patch.  It seems to work with both regular streams and
hidden services, but there may be other corner cases I'm not aware of.
(Do streams used for directory fetches, hidden services, etc. take a
different code path?)

diff --git a/src/or/connection.c b/src/or/connection.c
index 7b1493b..f80cd6e 100644
--- a/src/or/connection.c
+++ b/src/or/connection.c
@@ -2845,7 +2845,13 @@ _connection_write_to_buf_impl(const char *string, size_t len,
     return;
   }
 
-  connection_start_writing(conn);
+  /* If we receive optimistic data in the EXIT_CONN_STATE_RESOLVING
+   * state, we don't want to try to write it right away, since
+   * conn->write_event won't be set yet.  Otherwise, write data from
+   * this conn as the socket is available. */
+  if (conn->state != EXIT_CONN_STATE_RESOLVING) {
+      connection_start_writing(conn);
+  }
   if (zlib) {
     conn->outbuf_flushlen += buf_datalen(conn->outbuf) - old_datalen;
   } else {
@@ -3382,7 +3388,11 @@ assert_connection_ok(connection_t *conn, time_t now)
     tor_assert(conn->s < 0);
 
   if (conn->outbuf_flushlen > 0) {
-    tor_assert(connection_is_writing(conn) || conn->write_blocked_on_bw ||
+    /* With optimistic data, we may have queued data in
+     * EXIT_CONN_STATE_RESOLVING while the conn is not yet marked to writing.
+     * */
+    tor_assert(conn->state == EXIT_CONN_STATE_RESOLVING ||
+	    connection_is_writing(conn) || conn->write_blocked_on_bw ||
             (CONN_IS_EDGE(conn) && TO_EDGE_CONN(conn)->edge_blocked_on_circ));
   }
 
diff --git a/src/or/relay.c b/src/or/relay.c
index fab2d88..e45ff70 100644
--- a/src/or/relay.c
+++ b/src/or/relay.c
@@ -1019,6 +1019,9 @@ connection_edge_process_relay_cell(cell_t *cell, circuit_t *circ,
   relay_header_t rh;
   unsigned domain = layer_hint?LD_APP:LD_EXIT;
   int reason;
+  int optimistic_data = 0;  /* Set to 1 if we receive data on a stream
+			       that's in the EXIT_CONN_STATE_RESOLVING
+			       or EXIT_CONN_STATE_CONNECTING states.*/
 
   tor_assert(cell);
   tor_assert(circ);
@@ -1038,9 +1041,20 @@ connection_edge_process_relay_cell(cell_t *cell, circuit_t *circ,
   /* either conn is NULL, in which case we've got a control cell, or else
    * conn points to the recognized stream. */
 
-  if (conn && !connection_state_is_open(TO_CONN(conn)))
-    return connection_edge_process_relay_cell_not_open(
-             &rh, cell, circ, conn, layer_hint);
+  if (conn && !connection_state_is_open(TO_CONN(conn))) {
+    if ((conn->_base.state == EXIT_CONN_STATE_CONNECTING ||
+	    conn->_base.state == EXIT_CONN_STATE_RESOLVING) &&
+	rh.command == RELAY_COMMAND_DATA) {
+	/* We're going to allow DATA cells to be delivered to an exit
+	 * node in state EXIT_CONN_STATE_CONNECTING or
+	 * EXIT_CONN_STATE_RESOLVING.  This speeds up HTTP, for example. */
+	log_warn(domain, "Optimistic data received.");
+	optimistic_data = 1;
+    } else {
+	return connection_edge_process_relay_cell_not_open(
+		 &rh, cell, circ, conn, layer_hint);
+    }
+  }
 
   switch (rh.command) {
     case RELAY_COMMAND_DROP:
@@ -1090,7 +1104,9 @@ connection_edge_process_relay_cell(cell_t *cell, circuit_t *circ,
       log_debug(domain,"circ deliver_window now %d.", layer_hint ?
                 layer_hint->deliver_window : circ->deliver_window);
 
-      circuit_consider_sending_sendme(circ, layer_hint);
+      if (!optimistic_data) {
+	  circuit_consider_sending_sendme(circ, layer_hint);
+      }
 
       if (!conn) {
         log_info(domain,"data cell dropped, unknown stream (streamid %d).",
@@ -1107,7 +1123,9 @@ connection_edge_process_relay_cell(cell_t *cell, circuit_t *circ,
       stats_n_data_bytes_received += rh.length;
       connection_write_to_buf(cell->payload + RELAY_HEADER_SIZE,
                               rh.length, TO_CONN(conn));
-      connection_edge_consider_sending_sendme(conn);
+      if (!optimistic_data) {
+	  connection_edge_consider_sending_sendme(conn);
+      }
       return 0;
     case RELAY_COMMAND_END:
       reason = rh.length > 0 ?

Performance and scalability notes:

Exit nodes may use more RAM, as mentioned above, but the extra usage is
transient: it is at most one window of queued data, held only until the
connection succeeds or fails.