Rust’s ownership system makes it easy and safe to create a zero-copy parser that takes a slice of bytes as input and outputs some structure containing references to the original input. Rust ensures that such references exist only while the underlying slice cannot be mutated.

As a concrete example, say we have a &[u8] containing "3foo3bar3baz4quux" and want to parse it into vec!["foo", "bar", "baz", "quux"]. This is easily accomplished by defining a couple of nom parser combinators:


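// Assumes a nom release with the macro-based API (3.x or earlier).
#[macro_use]
extern crate nom;

use nom::{is_digit, rest};
use std::str::{self, FromStr};

// Zero or more length-prefixed strings, each borrowed from the input.
named!(strings<Vec<&str>>,
       many0!(map_res!(length_value!(ascii_num, rest), str::from_utf8))
);

// A run of ASCII digits parsed as a usize.
named!(ascii_num<usize>,
       map_res!(map_res!(take_while!(is_digit), str::from_utf8), usize::from_str)
);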

fn main() {
    let input  = b"3foo3bar3baz4quux";
    let expect = vec!["foo", "bar", "baz", "quux"];
    let output = strings(input).unwrap().1;
    assert_eq!(expect, output);
}

In real-world use, the input slice may contain only partial data, for example "3foo3bar3baz4q", in which case the parser returns IResult::Incomplete. Or it may contain more than one message, e.g. "3foo3bar 3baz4quux", in which case the parser returns the parsed results plus the remaining bytes.
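
To make these outcomes concrete without depending on nom, here is a minimal hand-written sketch of a parser for the same length-prefixed format. The names Parsed and parse_strings are hypothetical and only loosely mirror nom's IResult; nom's own behaviour may differ in edge cases such as a truncated length prefix.

```rust
/// Hypothetical result type, loosely modelled on the outcomes described above.
#[derive(Debug, PartialEq)]
pub enum Parsed<'a> {
    /// Strings borrowed from the input, plus the bytes that were not parsed.
    Done(&'a [u8], Vec<&'a str>),
    /// The input ended in the middle of a length prefix or a string.
    Incomplete,
}

/// Parses as many `<decimal length><bytes>` strings as possible without copying.
pub fn parse_strings<'a>(input: &'a [u8]) -> Parsed<'a> {
    let mut rest = input;
    let mut out = Vec::new();
    loop {
        // The length prefix is a run of ASCII digits.
        let digits = rest.iter().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            // No further prefix: stop and hand back whatever is left.
            return Parsed::Done(rest, out);
        }
        if digits == rest.len() {
            // The prefix runs to the end of the input and may continue
            // in the next read.
            return Parsed::Incomplete;
        }
        let prefix = std::str::from_utf8(&rest[..digits]).ok();
        let len = match prefix.and_then(|s| s.parse::<usize>().ok()) {
            Some(len) => len,
            // A length that does not fit in usize ends the parse here.
            None => return Parsed::Done(rest, out),
        };
        let body = &rest[digits..];
        if body.len() < len {
            return Parsed::Incomplete;
        }
        match std::str::from_utf8(&body[..len]) {
            // `s` points into `input`; nothing is copied.
            Ok(s) => out.push(s),
            // Invalid UTF-8 also ends the parse, leaving this item unparsed.
            Err(_) => return Parsed::Done(rest, out),
        }
        rest = &body[len..];
    }
}
```

With the inputs from the text, this sketch returns Done with an empty remainder for the complete message, Incomplete for the truncated one, and Done with " 3baz4quux" left over for the input containing a space.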

Buffers

If we’re reading data from the network into a fixed-size buffer that is passed to the parser, then we must copy any partial or remaining bytes somewhere else before the next read overwrites them. When more data is received, it can be appended to the existing data and passed to the parser again.

Copying is expensive, so we should parse directly from the input buffer whenever possible and copy only when there is existing data that the input must be appended to. Here is a Buffer type containing a Vec to store these partial or remaining bytes:


pub struct Buffer {
    vec: Vec<u8>
}

impl Buffer {
    pub fn new() -> Buffer {
        Buffer {
            vec: Vec::new(),
        }
    }

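    // Returns a view of all unparsed bytes: the new input alone when nothing
    // is buffered, otherwise the buffered bytes followed by the new input.
    pub fn buf<'a: 'b, 'b>(&'a mut self, more: &'b [u8]) -> Buf<'b> {
        if self.vec.is_empty() {
            // No copy: the Buf borrows `more` directly.
            Buf::Empty(&mut self.vec, more)
        } else {
            // Append to the leftover bytes so the parser sees one slice.
            self.vec.extend_from_slice(more);
            Buf::Some(&mut self.vec)
        }
    }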
}

The buf(..) method is called with a reference to the input buffer and returns a Buf that can be passed to the parser as a &[u8] via the Deref trait. The lifetimes (<'a: 'b, 'b>) are a bit gnarly because the compiler must be told that the returned Buf has the same lifetime as the input buffer, which may be shorter than the lifetime of the Buffer. The bound 'a: 'b states that the borrow of the Buffer lasts at least as long as the borrow of the input.

When no partial or remaining bytes have been buffered, the Buf simply dereferences to the input buffer directly. However, when the internal buffer is not empty, the input is appended to it and the Buf dereferences to that larger buffer.


use std::ops::Deref;

pub enum Buf<'a> {
    Empty(&'a mut Vec<u8>, &'a [u8]),
    Some(&'a mut Vec<u8>),
}


impl<'a> Buf<'a> {
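    // Retains the last `n` bytes of the Buf for the next call.
    pub fn keep(&mut self, n: usize) {
        match *self {
            Buf::Empty(ref mut vec, more) => {
                // Copy only the unconsumed tail of the input.
                let n = more.len() - n;
                vec.extend_from_slice(&more[n..]);
            },
            Buf::Some(ref mut vec) => {
                // Drop the consumed prefix of the internal buffer.
                let n = vec.len() - n;
                vec.drain(..n);
            },
        }
    }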
}

impl<'a> Deref for Buf<'a> {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        match *self {
            Buf::Empty(_, more) => more,
            Buf::Some(ref vec)  => &vec[..],
        }
    }
}

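When parsing is complete, the keep(..) method of Buf is called with the number of trailing bytes that have not been consumed. Those bytes are retained in the internal buffer for the next call. The count must not exceed the length of the Buf, otherwise the subtraction in keep(..) underflows.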
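
The borrow-when-possible, copy-when-necessary behaviour of Buf can also be sketched with the standard library's Cow type. This is a different technique from the Buf above, shown only to illustrate the idea: the hypothetical view function below returns a fresh Vec in the copying case rather than appending to a buffer that persists between reads, and it has no counterpart to keep(..).

```rust
use std::borrow::Cow;

/// Hypothetical helper: a contiguous view of `pending` followed by `more`.
pub fn view<'a>(pending: &[u8], more: &'a [u8]) -> Cow<'a, [u8]> {
    if pending.is_empty() {
        // Nothing buffered: borrow the fresh input directly, no copy.
        Cow::Borrowed(more)
    } else {
        // Leftover bytes exist: both pieces are copied into one buffer
        // so that a parser sees a single contiguous slice.
        let mut joined = Vec::with_capacity(pending.len() + more.len());
        joined.extend_from_slice(pending);
        joined.extend_from_slice(more);
        Cow::Owned(joined)
    }
}
```

Like Buf, a Cow<[u8]> dereferences to [u8], so either variant can be handed to a parser as &view[..].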

Example

Here is an example parse function that uses Buffer:


use nom::IResult;

// Parses whatever is available and buffers the unconsumed bytes for next time.
fn parse(buffer: &mut Buffer, b: &[u8]) -> Option<Vec<String>> {
    let mut buf = buffer.buf(b);
    let mut res = None;
    let mut len = buf.len();

    if let IResult::Done(rest, vec) = strings(&buf[..]) {
        res = Some(vec.into_iter().map(str::to_owned).collect());
        len = rest.len();
    }
    buf.keep(len);

    res
}

#[test]
fn test_partial() {
    let mut buffer = Buffer::new();
    let input  = b"3foo3bar3baz4q";
    let expect = vec!["foo", "bar", "baz", "quux"];

    let res = parse(&mut buffer, input);
    assert_eq!(None, res);

    let res = parse(&mut buffer, b"uux").unwrap();
    assert_eq!(expect, res);
}

Note that parse returns an optional Vec of String, not &str. The return value outlives the Buf, so the parsed strings must be copied. Additionally, the call to buf.keep(..) may shrink the internal buffer, which would invalidate any references into it.
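
The same ordering constraint can be seen in miniature with a plain Vec<u8>. In this sketch, take_prefix is a hypothetical helper and not part of the code above. The text is copied into an owned String first, because the borrow of the buffer has to end before drain(..) is allowed to mutate it.

```rust
/// Hypothetical helper: removes the first `n` bytes of `buf` and returns
/// them as text. Panics if `n` is greater than `buf.len()`.
pub fn take_prefix(buf: &mut Vec<u8>, n: usize) -> String {
    // Copy while the bytes are still in place. Returning a `&str` borrowed
    // from `buf` instead would keep `buf` borrowed, and the `drain` below
    // would be rejected by the borrow checker.
    let owned = String::from_utf8_lossy(&buf[..n]).into_owned();
    // Shift the unconsumed tail to the front, much as `keep(..)` does.
    buf.drain(..n);
    owned
}
```

The copy costs one allocation per call, which is the price of letting the result outlive the buffer's current contents.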