Piping in Linux : How to use pipes in shell?

One of the most widely used features of the Linux shell is the pipe (|). A pipe redirects the output of one command to the input of another. The main advantage of piping is that it lets you combine simple commands to achieve a particular purpose. Even if you know only a few basic Linux commands, piping lets you get far more out of them. It also keeps the total number of commands down, because a job that would otherwise need a new command can be done by combining two existing ones. Likewise, a command needs fewer options, because a new option is unnecessary when piping can achieve the same result.

Before moving on to the examples, let’s look at the commands which I am going to use to explain piping.

1. ls :
list directory contents
To list the files in the directory /bin

$ ls /bin
alsacard
alsaunmute
arch
awk
basename
bash
cat
chgrp
chmod
chown
cp
cpio
csh
.....

2. wc:
To print the number of newlines, words and bytes in files

$ wc test.c
44   93 1179 test.c

Here test.c contains 44 newlines, 93 words and 1179 bytes.

To print only the number of newlines, use the -l option

$ wc -l test.c
44 test.c

If no file is given, wc reads what we type on the terminal until ctrl + d is pressed and then prints the count

$ wc -l
After writing
press ctrl + d
this will display the
number of lines
4

Thus wc can take input both from the terminal and from a file

3. grep :
print lines matching a pattern
To print the lines which contain the pattern ‘main’ in test.c

$ grep main test.c
int main (int argc, char *argv[])

To print the line number as well, use the -n option

$ grep -n main test.c
25:int main (int argc, char *argv[])

Just like wc, grep can also take input from the terminal

$ grep -n main
we are writing
some words on the terminal
and using grep to find the
line where we wrote main
4:line where we wrote main(^D)

As soon as grep reads a line containing the pattern ‘main’, it displays that line on the terminal.

4. ps :
report a snapshot of the current processes
To see the processes running in the current shell

$ ps
  PID TTY          TIME CMD
  593 pts/2    00:00:00 bash
11270 pts/2    00:00:00 ps

To see the details of all the processes run by all the users

$ ps aux

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1   2044   512 ?        Ss   Jan29   0:00 init [5]
root         2  0.0  0.0      0     0 ?        S    Jan29   0:00 [migration/0]
root         3  0.0  0.0      0     0 ?        SN   Jan29   0:00 [ksoftirqd/0]
root         4  0.0  0.0      0     0 ?        S    Jan29   0:00 [watchdog/0]
root         5  0.0  0.0      0     0 ?        S<   Jan29   0:00 [events/0]
root         6  0.0  0.0      0     0 ?        S<   Jan29   0:00 [khelper]
...

Let’s see some examples to demonstrate piping

Example 1
To see the number of files or subdirectories starting with the letter ‘a’ in a directory /bin

$ ls /bin/a*
/bin/alsacard
/bin/alsaunmute
/bin/arch
/bin/awk

Clearly there are four files. Here we can count them by hand, but that stops being practical when there are more entries. Let’s use wc instead: wc -l counts newlines, so we can redirect the output of ls to wc. (Recall that wc can take its input from either a file or the terminal.)

$ ls /bin/a* | wc -l
4

Thus piping has achieved our purpose. The output of ls, which would otherwise have been displayed on the terminal, has been redirected as input to wc (which would otherwise have read from the terminal).

Example 2

We can use more than one pipe. Suppose we wish to find the number of commands containing the pattern ‘aw’

$ ls /bin/ | grep aw
awk
gawk
igawk
pgawk

The above only displays the commands containing the pattern ‘aw’. But we need the count. Let’s use wc and one more pipe

$ ls /bin/ | grep aw | wc -l
4

Thus we can have more than one pipe to achieve a purpose

Example 3
Suppose we wish to know the details of all the processes run by the user ‘root’.
We can use ps to list the process details and grep to search for the pattern ‘root’

$ ps aux | grep root

root         1  0.0  0.1   2044   512 ?        Ss   Jan29   0:00 init [5]
root         2  0.0  0.0      0     0 ?        S    Jan29   0:00 [migration/0]
root         3  0.0  0.0      0     0 ?        SN   Jan29   0:00 [ksoftirqd/0]
root         4  0.0  0.0      0     0 ?        S    Jan29   0:00 [watchdog/0]
root         5  0.0  0.0      0     0 ?        S<   Jan29   0:00 [events/0]
root         6  0.0  0.0      0     0 ?        S<   Jan29   0:00 [khelper]
root         7  0.0  0.0      0     0 ?        S<   Jan29   0:00 [kthread]
...
abcd     12369  0.0  0.1   3892   656 pts/2    R+   13:55   0:00 grep root

Notice that the last line belongs to the user ‘abcd’. It is the grep process itself, whose command line ‘grep root’ contains the pattern ‘root’.

Example 4
Suppose we wish to know the details of a process such as ‘firefox’. We can use ps for that, but instead of scanning its long output by hand we can make use of piping. grep searches for a particular pattern, so we redirect the output of ps to grep and let it pick out the command.

$ ps -x | grep firefox

 5100 ?        S      0:00 /bin/sh /usr/lib/firefox-1.5.0.7/firefox -UILocale en-US
 5122 ?        S      0:00 /bin/sh /usr/lib/firefox-1.5.0.7/run-mozilla.sh /usr/lib/firefox-1.5.0.7/firefox-bin -UILocale en-US
 5127 ?        Sl     1:03 /usr/lib/firefox-1.5.0.7/firefox-bin -UILocale en-US
 5284 pts1     S+     0:00 grep firefox

Thus instead of designing a new command for searching a particular running process, we have utilized the concept of piping.

 

The working of Piping in a Linux Shell

We use piping to feed the output of one command to another as input. For example, to count the number of files in a directory (not recursively)

$ ls | wc -l

The question is how piping works.
I would like to use some code to explain it.

Piping is built on the system calls pipe, close and dup.
pipe()

int pipe(int fd[2]);

pipe() creates a pair of file descriptors, pointing to a pipe inode, and places them in the array fd. fd[0] is for reading, fd[1] is for writing.

close()

int close(int fd);

close() closes a file descriptor, so that it no longer refers to any file and may be reused.

dup()

int dup(int fd);

dup() creates a copy of the file descriptor fd, using the lowest-numbered unused descriptor for the copy

Let’s see how using these system calls, the piping works. For this let’s check a piece of code

int fd;
int some_fd=open("abc",O_RDONLY);
close(0);
fd=dup(some_fd);
printf("%d\n",fd);

The output of this code is

0

Why?
As soon as we close file descriptor 0, slot 0 in the file descriptor table becomes vacant. dup always picks the lowest vacant slot, so it puts the duplicate into slot 0. From now on, whatever the program reads from stdin comes from the file ‘abc’.

This idea is what the shell uses for piping. We create a pipe and then create a new process. In one process we close stdout and immediately issue dup on the write end of the pipe; in the other we close stdin and immediately issue dup on the read end. Each process also closes the end of the pipe it does not use: the read end in the writer and the write end in the reader.

Let’s check the program. When run, it expects two commands given as

$ cmd1 | cmd2

The program

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>     //fork, pipe, dup, close, execve
#include <sys/wait.h>   //wait

#define CMD_LEN 1024    //Maximum length of Command
#define MAX_LEN 1024    //Maximum Length of the input
#define MAX_CMD 2       //Maximum number of Commands
#define ARG_LEN 128     //Maximum Length of Argument
#define ARG_COUNT 64    //Maximum number of arguments

int main(int argc, char *argv[], char *envp[])
{
        while(1){
                char *arg[MAX_CMD][ARG_COUNT];
                char cmd[MAX_LEN];

                char *command[MAX_CMD+1];
                char *pos=&cmd[0];
                int cmd_len;  int pid;  int status;

                printf("\n$ ");           //Displaying the prompt
                fflush(stdout);           //Show the prompt before blocking on input
                if(!fgets(cmd,MAX_LEN,stdin)) //Reading user input
                        break;            //End of input (ctrl + d)
                cmd_len=strnlen(cmd,MAX_LEN);

                int i=0;
                do{
                        int len;
                        char a[CMD_LEN];
                        a[0]='\0';             //Stays empty if nothing precedes the '|'
                        sscanf(pos,"%[^|]",a);//Parsing out the commands
                        len=strnlen(a,CMD_LEN);

                        command[i]=malloc(len+1); //One extra byte for the terminating '\0'
                        if(!command[i]){
                                printf("Unable to allocate memory\n");
                                exit(1);
                        }

                        strcpy(command[i++],a);
                        pos+=(len+1);    //Repositioning for next command
                }while(pos<cmd+cmd_len && i<MAX_CMD); //Anything after the second command is ignored

                command[i]=NULL;

                i=0;

                while(command[i]){
                        pos=command[i];
                        int arg_count=0;
                        cmd_len=strnlen(command[i],CMD_LEN);

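                        do{
                                int len;
                                int used=0;
                                char a[ARG_LEN];
                                //The width keeps the token inside a[ARG_LEN];
                                //%n records how many characters were consumed
                                int count=sscanf(pos,"%127s%n",a,&used);

                                //Parsing out the arguments
                                if(count < 1)
                                        break;

                                len=strnlen(a,ARG_LEN);

                                arg[i][arg_count]=malloc(len+1); //One extra byte for the terminating '\0'
                                if(!arg[i][arg_count]){
                                        printf("Unable to allocate memory\n");
                                        exit(1);
                                }

                                strcpy(arg[i][arg_count++],a);
                                pos+=used;  //Repositioning for next argument, whatever the spacing
                        }while(pos<command[i]+cmd_len && arg_count<ARG_COUNT-1);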

                        arg[i][arg_count]=NULL;   i++;
                }

                if(i!=2){
                        printf("Two commands required\n");
                        continue;
                }

                //Obtained the arguments

                pid=fork();                     //Creating a process to execute command 2

                if(pid==0){
                        int fd[2];
                        if(pipe(fd)==-1){       //Creating a Pipe
                                perror("pipe");
                                exit(1);
                        }

                        pid=fork();             //Creating process to execute command 1

                        if(pid==0){
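                                close(fd[0]);    //command 1 never reads from the pipe
                                close(1);        //closing stdout

                                if(dup(fd[1])==-1){      //lowest free slot is 1, so stdout now points to the pipe
                                        perror("dup");
                                        exit(1);
                                }
                                close(fd[1]);    //the copy in slot 1 is enough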

                                //In child process
                                if(execve(arg[0][0],&arg[0][0],envp)==-1){
                                        perror("execve");
                                }

                                exit(1); //Reached if Child failed to execute process
                        }

                        else{
                                close(fd[1]);//close the write end
                                close(0);//closing the stdin

                                if(dup(fd[0])==-1){//lowest free slot is 0, so stdin now points to the pipe
                                        perror("dup");
                                        exit(1);
                                }
                                close(fd[0]);//the copy in slot 0 is enough

                                //In parent process
                                if(execve(arg[1][0],&arg[1][0],envp)==-1){
                                        perror("execve");
                                }
                                exit(1); //Reached if failed to execute process
                        }
                }
                wait(&status);    //Wait for command 2 to terminate, then prompt again
        }
        return 0;
}

Output

$  /bin/ls / | /bin/grep root
root

$

Implementation of a Linux Shell

A shell is used to write commands and execute them. But how does a Linux shell work? Before continuing our discussion, one thing you must understand is that a shell is itself a process (or a task, as it is called in the Linux world).

The working of a basic shell can be explained as follows
1. Wait for user input (command)
2. Parse the input to get the command and the arguments
3. Create a new process to execute the command (Use fork() system call)
4. In the new process, execute the command (Use execve() system call) and exit with the status of the command executed
5. In the parent process, wait for the new process to terminate (Use wait() system call). Get the exit status of the child process (the user may need it)
6. Go to step 1 to continue waiting for the user input.

A simple C program can be used to demonstrate this

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>     /* fork, execve */
#include <sys/wait.h>   /* wait */

#define MAX_LEN         1024    /* Maximum length of the command along with the arguments */
#define ARG_LEN         128     /* Maximum length of an argument */
#define ARG_COUNT       64      /* Maximum number of arguments */

int main(int argc, char *argv[], char *envp[])
{
        while(1){
                char    *arg[ARG_COUNT];
                char    cmd[MAX_LEN];
                char    *pos = &cmd[0];
                int     cmd_len, pid, status;
                int     i=0;

                printf("\n$ ");                 /* Displaying the prompt */
                fflush(stdout);                 /* Show the prompt before blocking on input */
                if (!fgets(cmd, MAX_LEN, stdin))        /* Reading user input */
                        break;                  /* End of input (ctrl + d) */
                cmd_len = strnlen(cmd, MAX_LEN);

                do{
                        int     used=0;
                        char    a[ARG_LEN];

                        /* Parsing out the arguments; %n records the characters consumed */
                        if (sscanf(pos,"%127s%n",a,&used) < 1)
                                break;          /* Only whitespace left */

                        arg[i] = malloc(strlen(a)+1);   /* +1 for the terminating '\0' */
                        if (!arg[i]) {
                                printf("Unable to allocate memory\n");
                                exit(1);
                        }
                        strcpy(arg[i++],a);
                        pos+=used;              /* Repositioning for next argument */
                }while(pos<cmd+cmd_len && i<ARG_COUNT-1);
                arg[i] = NULL;                  /* Obtained the arguments */

                if (i==0)
                        continue;               /* Empty line: nothing to run */

                pid=fork();                     /* Create child process to execute command */
                if(pid==0){                     /* In child process */
                        if (execve(arg[0], arg, envp) == -1){
                                perror("Error");
                        }
                        exit(1);                /* Reached if child failed to execute */
                }
                wait(&status);                  /* Wait for child to terminate */

                while(i>0)                      /* Release this command's arguments */
                        free(arg[--i]);
        }
        return 0;
}

Output

$ /bin/ls /
bin   dev  home  lib64       media  mnt    net  proc  sbin       srv  tmp  var
boot  etc  lib   lost+found  misc   opt    root  selinux    sys  usr

If you run this program, you will see $ at the beginning of the line. This shell is not comparable to bash or any other real shell, though. You cannot use the arrow keys for navigation, and even to execute a simple command like ls you have to specify the complete path (/bin/ls). These are features that remain to be added.

Every shell builds on this basic program, adding more and more features.

Comment if you find a bug or would like to add some features to the basic shell