We use a pipe in our program and hit a new problem: it fails when we try to write 16MB of data into the pipe at once. It looks like a pipe has a limited size, but what exactly is that size? After searching the web, I found the answers are inconsistent: some say it’s 16KB and others say it’s 64KB. So I had to read the kernel code myself to find the correct answer.
Since all the servers in my company run ali_kernel, which is based on the 2.6.32 CentOS kernel, I traced the call path in that source:
sys_pipe() --> sys_pipe2() --> do_pipe_flags() --> create_write_pipe():
struct file *create_write_pipe(int flags)
{
    ......
    path.dentry->d_flags &= ~DCACHE_UNHASHED;
    d_instantiate(path.dentry, inode);
    err = -ENFILE;
    f = alloc_file(&path, FMODE_WRITE, &write_pipefifo_fops);
    if (!f)
        goto err_dentry;
    f->f_mapping = inode->i_mapping;
    ......
It looks like all write operations on the pipe are handled by “write_pipefifo_fops”. Let’s look inside:
const struct file_operations write_pipefifo_fops = {
    .llseek = no_llseek,
    .read = bad_pipe_r,
    .write = do_sync_write,
    .aio_write = pipe_write,
    .poll = pipe_poll,
    .unlocked_ioctl = pipe_ioctl,
    .open = pipe_write_open,
    .release = pipe_write_release,
    .fasync = pipe_write_fasync,
};
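A write() call enters through do_sync_write(), which in turn calls the .aio_write handler, so pipe_write() is responsible for the actual writing. Let’s keep going.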
static ssize_t
pipe_write(struct kiocb *iocb, const struct iovec *_iov,
           unsigned long nr_segs, loff_t ppos)
{
    ......
    for (;;) {
        int bufs;
        if (!pipe->readers) {
            send_sig(SIGPIPE, current, 0);
            if (!ret)
                ret = -EPIPE;
            break;
        }
        bufs = pipe->nrbufs;
        if (bufs < PIPE_BUFFERS) {
            int newbuf = (pipe->curbuf + bufs) & (PIPE_BUFFERS-1);
            struct pipe_buffer *buf = pipe->bufs + newbuf;
            struct page *page = pipe->tmp_page;
            char *src;
            int error, atomic = 1;
            if (!page) {
                page = alloc_page(GFP_HIGHUSER);
                if (unlikely(!page)) {
                    ret = ret ? : -ENOMEM;
                    break;
                }
                pipe->tmp_page = page;
            }
            ......
            pipe->nrbufs = ++bufs;
            pipe->tmp_page = NULL;
            total_len -= chars;
            if (!total_len)
                break;
        }
        ......
        pipe_wait(pipe);
        ......
As shown above, when a write arrives and a free buffer slot is available, the kernel allocates a page for it. Each time it adds a page it increments ‘pipe->nrbufs’, and once ‘nrbufs’ reaches PIPE_BUFFERS no more pages can be added: the routine blocks in pipe_wait(), which means the write() system call waits until a reader frees some space. ‘PIPE_BUFFERS’ is set to 16, and a page in the Linux kernel is 4KB (on x86), so a pipe in ali_kernel can hold 64KB (16 * 4KB) of data at one time.
This changed in kernel version 2.6.35, which added a new proc entry, ‘/proc/sys/fs/pipe-max-size’.