Namedpipe Log Writer

From EDURange
Latest revision as of 20:43, 30 May 2025

See https://github.com/edurange/demo-namedpipe-log-writer/tree/main.

This is a demo of using a fifo via the mkfifo utility for concurrent IO between two processes, such as a log producer and log consumer.
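As a rough illustration of the same producer/consumer idea in Python (using `os.mkfifo` rather than the mkfifo utility; the paths and names here are invented for the sketch, not taken from the demo repo):

```python
import os
import tempfile
import threading

# Create a fifo; this has the same effect as the mkfifo(1) utility.
pipe_path = os.path.join(tempfile.mkdtemp(), "log.fifo")
os.mkfifo(pipe_path)

def producer():
    # Opening a fifo for writing blocks until a reader opens the other end.
    with open(pipe_path, "w") as fifo:
        for i in range(3):
            fifo.write(f"log line {i}\n")

t = threading.Thread(target=producer)
t.start()

# The consumer's open() blocks until the producer connects; readlines()
# returns once the producer closes its end of the pipe.
with open(pipe_path, "r") as fifo:
    lines = fifo.readlines()

t.join()
print(lines)  # ['log line 0\n', 'log line 1\n', 'log line 2\n']
```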

Instructions

To run the demo, use `./toy_start_log.sh`, a deliberately reductive mock of start_ttylog.sh. The commands in toy_start_log.sh are also easy to run by hand.

Explanation

Previous versions relied heavily on polling and other busy waiting to coordinate between processes. Code that uses appropriate synchronization mechanisms is more performant, and easier to read and maintain.

"fifos", also known as "named pipes", are one of the core building blocks of inter-process communication on POSIX/UNIX platforms. In line with UNIX "everything is a file" design thinking, they look just like a file. But they have no persistent storage. They are a temporary first-in first-out queue.

Fifos behave differently from files in one other key way. When a reader of a normal file reaches the end of the data, the read call signals end-of-file (EOF). When the file is a fifo, a read past the end of the buffered data simply blocks, as long as some writer still has the pipe open.
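Assuming a POSIX system and only the Python standard library, this sketch contrasts the two behaviors: a regular file reports EOF as soon as the data runs out, while a fifo with a connected writer simply has no data ready yet (a blocking read there would wait rather than see EOF). File names are illustrative.

```python
import os
import select
import tempfile

tmpdir = tempfile.mkdtemp()

# Regular file: reading past the end returns EOF (an empty read) immediately.
regular = os.path.join(tmpdir, "plain.log")
with open(regular, "w") as f:
    f.write("data")
with open(regular, "r") as f:
    f.read()
    eof_result = f.read()  # "" -- EOF; many IO libraries then close the stream

# Fifo: with a writer attached but no data written, a read would block.
fifo = os.path.join(tmpdir, "pipe.fifo")
os.mkfifo(fifo)
# Open both ends non-blocking so this script itself doesn't hang on open().
rfd = os.open(fifo, os.O_RDONLY | os.O_NONBLOCK)
wfd = os.open(fifo, os.O_WRONLY | os.O_NONBLOCK)
# select() reports no readable data: a blocking read here would wait, not EOF.
readable, _, _ = select.select([rfd], [], [], 0)
print(eof_result == "", readable == [])  # True True
os.close(rfd)
os.close(wfd)
```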

This is important because many core programming language libraries which implement basic IO with the operating system respond to EOF by closing the file and releasing the file descriptor. In a live data pipeline, we know that most of the time when we run out of data, more will be ready for us very soon.

Working around this without fifos takes a lot of effort: the log consumer has to monitor the filesystem for changes after it reaches EOF. That effort is unnecessary, because the desired behavior already exists in the form of this very mundane fossil from the command line.

Rather than let the file close in this demo, I use a fifo so that reads always block even when the buffer is empty. Then I wrap this communication in asynchronous reads so that asyncio will respond to the availability of readable data, rather than polling for it.

Due to some technical limitations of the Python interpreter (related to the Global Interpreter Lock), it's necessary to wrap the read operation in a thread to make it awaitable by asyncio. That's not a general endorsement of threading, and such details would be wrapped in a production implementation.
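A minimal sketch of that pattern, assuming `asyncio.to_thread` (Python 3.9+) as the thread wrapper; the names are illustrative and the demo repo's actual code may differ:

```python
import asyncio
import os
import tempfile

# Blocking fifo reads are handed to worker threads via asyncio.to_thread,
# so the event loop awaits the availability of data instead of polling.
pipe_path = os.path.join(tempfile.mkdtemp(), "log.fifo")
os.mkfifo(pipe_path)
received = []

async def consume():
    # open() on a fifo blocks until a writer connects, so it runs in a thread too.
    fifo = await asyncio.to_thread(open, pipe_path, "r")
    try:
        while True:
            # readline() blocks in the worker thread; the event loop stays free.
            line = await asyncio.to_thread(fifo.readline)
            if not line:  # empty read: the writer closed, so the fifo delivers EOF
                break
            received.append(line.strip())
    finally:
        fifo.close()

async def produce():
    fifo = await asyncio.to_thread(open, pipe_path, "w")
    try:
        for i in range(3):
            await asyncio.to_thread(fifo.write, f"event {i}\n")
            await asyncio.to_thread(fifo.flush)
    finally:
        fifo.close()

async def main():
    await asyncio.gather(consume(), produce())

asyncio.run(main())
print(received)  # ['event 0', 'event 1', 'event 2']
```

Because the reader never sees EOF until the writer actually closes its end, no filesystem watching or retry loop is needed.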