C: getopt_long_only Example: Accessing command line arguments

getopt_long is useful for working with command line arguments. With getopt_long, however, -- introduces long options and - introduces short options. getopt_long_only accepts both -- and - for long options, so all of the following work:

  1. rectangle -a -l 12 -b 34: will calculate the area of the rectangle
  2. rectangle -p -l 12 -b 34: will calculate the perimeter of the rectangle
  3. rectangle -a -p -l 12 -b 34: will calculate the area and perimeter of the rectangle
  4. rectangle --area --length 12 --breadth 34: will calculate the area of the rectangle
  5. rectangle -perimeter --length 12 --breadth 34: will calculate the perimeter of the rectangle
  6. rectangle -area -perimeter --length 12 --breadth 34: will calculate the area and perimeter of the rectangle

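The program is much like the getopt_long one. Only the parsing call changes, and the short option string is left empty because every option is matched against the long option table.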

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>

/** Program to calculate the area and perimeter of 
 * a rectangle using command line arguments
 */
void print_usage() {
    printf("Usage: rectangle [ap] -l num -b num\n");
}

int main(int argc, char *argv[]) {
    int opt= 0;
    int area = -1, perimeter = -1, breadth = -1, length =-1;

    //Specifying the expected options
    //The two options l and b expect numbers as argument
    static struct option long_options[] = {
        {"area",      no_argument,       0,  'a' },
        {"perimeter", no_argument,       0,  'p' },
        {"length",    required_argument, 0,  'l' },
        {"breadth",   required_argument, 0,  'b' },
        {0,           0,                 0,  0   }
    };

    int long_index =0;
    while ((opt = getopt_long_only(argc, argv,"", 
                   long_options, &long_index )) != -1) {
        switch (opt) {
             case 'a' : area = 0;
                 break;
             case 'p' : perimeter = 0;
                 break;
             case 'l' : length = atoi(optarg); 
                 break;
             case 'b' : breadth = atoi(optarg);
                 break;
             default: print_usage(); 
                 exit(EXIT_FAILURE);
        }
    }
    if (length == -1 || breadth ==-1) {
        print_usage();
        exit(EXIT_FAILURE);
    }

    // Calculate the area
    if (area == 0) {
        area = length * breadth;
        printf("Area: %d\n",area);
    }

    // Calculate the perimeter
    if (perimeter == 0) {
        perimeter = 2 * (length + breadth);
        printf("Perimeter: %d\n",perimeter);
    }
    return 0;
}

C: getopt_long example: Accessing command line arguments

A command can have both long and short options. getopt handles only short options, which are options a single character long. getopt_long supports both short and long options, like

  1. rectangle -a -l 12 -b 34: will calculate the area of the rectangle
  2. rectangle -p -l 12 -b 34: will calculate the perimeter of the rectangle
  3. rectangle -a -p -l 12 -b 34: will calculate the area and perimeter of the rectangle
  4. rectangle --area --length 12 --breadth 34: will calculate the area of the rectangle
  5. rectangle --perimeter --length 12 --breadth 34: will calculate the perimeter of the rectangle
  6. rectangle --area --perimeter --length 12 --breadth 34: will calculate the area and perimeter of the rectangle

The following program shows the use of getopt_long.

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>

/** Program to calculate the area and perimeter of 
 * a rectangle using command line arguments
 */
void print_usage() {
    printf("Usage: rectangle [ap] -l num -b num\n");
}

int main(int argc, char *argv[]) {
    int opt= 0;
    int area = -1, perimeter = -1, breadth = -1, length =-1;

    //Specifying the expected options
    //The two options l and b expect numbers as argument
    static struct option long_options[] = {
        {"area",      no_argument,       0,  'a' },
        {"perimeter", no_argument,       0,  'p' },
        {"length",    required_argument, 0,  'l' },
        {"breadth",   required_argument, 0,  'b' },
        {0,           0,                 0,  0   }
    };

    int long_index =0;
    while ((opt = getopt_long(argc, argv,"apl:b:", 
                   long_options, &long_index )) != -1) {
        switch (opt) {
             case 'a' : area = 0;
                 break;
             case 'p' : perimeter = 0;
                 break;
             case 'l' : length = atoi(optarg); 
                 break;
             case 'b' : breadth = atoi(optarg);
                 break;
             default: print_usage(); 
                 exit(EXIT_FAILURE);
        }
    }
    if (length == -1 || breadth ==-1) {
        print_usage();
        exit(EXIT_FAILURE);
    }

    // Calculate the area
    if (area == 0) {
        area = length * breadth;
        printf("Area: %d\n",area);
    }

    // Calculate the perimeter
    if (perimeter == 0) {
        perimeter = 2 * (length + breadth);
        printf("Perimeter: %d\n",perimeter);
    }
    return 0;
}

C: Working with command line arguments

Almost all of the commands in Linux/Unix have options. An option is a mechanism by which you provide additional parameters to a command to change its behavior. Take, for example, the command ls, which lists the files in a directory. To obtain a detailed listing of the files, the option -l is used. Similarly, the -a option of ls shows all files, including hidden ones (file names starting with a dot). Options remove the need to create multiple commands for closely related purposes.

To access the command line parameters, make sure that the main() function looks something like this

int main(int argc, char *argv[]) {
}

Now you can write a program that accesses every parameter passed to it through argv[0], argv[1], and so on (argv[0] holds the program name). The number of command line arguments passed by the user is available in argc.

Working with command line arguments in this manner is tedious. Linux/Unix provides the following functions to easily work with the command line arguments

  1. getopt()
  2. getopt_long()
  3. getopt_long_only()

Before delving deep into this topic, let's take a look at what is meant by long arguments. Many options have both a long and a short form. For example, to list all files including the hidden ones, one can write ls -a or ls --all. Here -a is the short form and --all is the long form. Some options have no long form: the -l option of ls, for example, has only the short form. With this in mind, let's continue to look at the various functions

  1. getopt()
  2. getopt_long()
  3. getopt_long_only()

C: getopt Example: Accessing command line arguments

The simplest way to work with command line arguments is to use the getopt() function. To understand more about it, first let’s see a command which can calculate the area and perimeter of a rectangle

  1. rectangle -a -l 12 -b 34: will calculate the area of the rectangle
  2. rectangle -p -l 12 -b 34: will calculate the perimeter of the rectangle
  3. rectangle -ap -l 12 -b 34: will calculate the area and perimeter of the rectangle

As we can see, some options take arguments and some do not. Here -a and -p do not take any argument, but -l and -b take a number as argument, for the length and breadth respectively.

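getopt provides a mechanism to distinguish them: in the option string passed to getopt, every option that requires an argument is followed by a : (colon). In "apl:b:", for instance, l and b take arguments while a and p do not.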

The following program shows this

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>

/** Program to calculate the area and perimeter of 
 * a rectangle using command line arguments
 */
void print_usage() {
    printf("Usage: rectangle [ap] -l num -b num\n");
}

int main(int argc, char *argv[]) {
    int option = 0;
    int area = -1, perimeter = -1, breadth = -1, length =-1;

    //Specifying the expected options
    //The two options l and b expect numbers as argument
    while ((option = getopt(argc, argv,"apl:b:")) != -1) {
        switch (option) {
             case 'a' : area = 0;
                 break;
             case 'p' : perimeter = 0;
                 break;
             case 'l' : length = atoi(optarg); 
                 break;
             case 'b' : breadth = atoi(optarg);
                 break;
             default: print_usage(); 
                 exit(EXIT_FAILURE);
        }
    }
    if (length == -1 || breadth ==-1) {
        print_usage();
        exit(EXIT_FAILURE);
    }

    // Calculate the area
    if (area == 0) {
        area = length * breadth;
        printf("Area: %d\n",area);
    }

    // Calculate the perimeter
    if (perimeter == 0) {
        perimeter = 2 * (length + breadth);
        printf("Perimeter: %d\n",perimeter);
    }
    return 0;
}

Error Handling in Programming

Just like separating out the configuration parameters from the program is important, it is also important to separate the error handling mechanism from general running of the program. There are different ways by which errors are handled by the program.

Errors are of various types. There can be computational errors like division by zero, file handling errors like missing files, memory allocation errors, and device read or write errors. Depending on the context, an error can be fatal or ignorable. For example, a missing configuration file may be a fatal error: the program must terminate and report the error to the user. Of course, your program can instead have default values for all the required settings and continue. It is a choice left to the programmers, and some prefer to report the missing file as a warning and carry on. Take another example, division by zero. One can always check before the division whether the divisor is zero. If it is, the error is reported to the user and the program continues to work (probably asking the user for different input).

There are various categories of errors and, depending on the requirement, we decide what action has to be taken. It is difficult to generalize what belongs in a specific category. Fatal errors normally include missing crucial files or libraries and the inability to allocate enough memory for a computation. Other errors can be ignored, perhaps because there is a workaround. Possible examples include spelling mistakes in a text editor. These are errors with respect to the user, but they have no impact on the software, and they are often shown to the user in an unobtrusive way (spelling mistakes underlined in red). Spelling mistakes in the keywords of a programming language, however, are important errors and cannot be ignored, because they lead to compile time errors. A compiler will report critical errors for such mistakes, but a text editor may not.

Once we have identified the errors and their respective categories, the next step is to determine what action must be taken. The program may

  • Exit
  • Warn the user
  • Signal the user for a change
  • Look for a Workaround
  • Ignore
  • Alert or Email the user
  • Log the errors in a file

So while writing programs, we must

  1. Identify (all) the possible errors
  2. Identify the categories of errors
  3. Decide the action to be taken for every category of error

 

Location of Vim Syntax files for different programming languages

If you use Vim for programming, you will find that by default Vim enables syntax highlighting based on the type of the file. That is, if your file name is myfile.c, Vim enables syntax highlighting for the C language. Likewise, it enables syntax highlighting for a large number of well known (programming) languages. If you are wondering where these syntax highlighting files live, they are kept in a syntax directory under the Vim installation.

On my system, it is currently the following directory

/usr/share/vim/vim73/syntax

Let’s check the directory contents

$ ls /usr/share/vim/vim73/syntax
...
config.vim        idl.vim           plp.vim          tf.vim
conf.vim          indent.vim        plsql.vim        tidy.vim
context.vim       inform.vim        pod.vim          tilde.vim
cpp.vim           initex.vim        postscr.vim      tli.vim
crm.vim           initng.vim        po.vim           tpp.vim
crontab.vim       inittab.vim       povini.vim       trasys.vim
....

Thus every programming language has an associated .vim file. For C++, there is cpp.vim. For crontab entries (/etc/cron.d), there is crontab.vim.

Since I am using Vim 7.3, the syntax files above are located in the vim73 directory. For other versions, you can run the following command to get the location

$ find /usr/share/vim/ -iname "*syntax"
/usr/share/vim/vim73/syntax

Separating out the source and the configuration files

One of the first steps before delving deep into the coding is to make a clear distinction between the source and the configuration files. Every piece of software must give its users enough options to configure its behavior. A user may wish to change the interface color, rearrange certain options and reposition certain elements. Giving users this additional power to tweak the software is sometimes seen as risky. This is the prime reason that much configurable software offers a "Restore to the Default values" option. There is a common fear among programmers that naive users will tweak the software in a manner that makes it unusable.

Another possible reason is that the programmers often tend to hard code all the possible configurations in their programs. This is mostly because of the ease in programming. Instead of reading a value from the configuration file every time an action must be performed, it is easier to work with the hard coded values. And of course, there is an associated delay in reading the configuration value. So most programmers tend to work with values hard coded into their programs.

And then when there is a demand for a change (for example, a color), they make the change in the code and release a new version. I agree that there is often no clear distinction between what must be available to the user for configuration and what must not. Personally, I have found very few books and papers on the topic. But it can be concluded that ease of programming and fear of software damage are the prime reasons why programmers do not give enough configurable options to their users.

The cost of this non-separation is that programmers often spend a lot of time tweaking various values. They have to yield to popular demand. Often, it results in the loss of potential customers who are not satisfied with a change and move on to other software available in the market.

Sometimes hard coding is a deliberate choice rather than a shortcut. Take, for example, mission critical systems, where every fraction of a second counts and the values are chosen after careful research.

Making a clear distinction at development time between what must be configurable and what must not is a challenge in itself. On one hand, you can make every possible option configurable, to the extent of giving users the choice between algorithms (for example, bubble sort or quick sort for sorting). Such an approach may seem laughable, but an expert knows which algorithm is best for his or her requirements. On the other hand, you can hard code everything, leaving no option for configuration. If you are designing generic software, there seems to be no perfect answer, but there must be enough options for users to configure their software.

Here is, hypothetically, what I feel every software project should look like

$ ls project
src config

Two separate directories (or some other means of separation) for source code and configuration.

C: Using scanf and wchar_t to read and print UTF-8 strings

We saw how using scanf and char to read UTF-8 strings led us to some strange answers. So now we need to discuss the solution provided by C.

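/** Program to read a single character of different languages
using a wchar_t array and scanf. The program prints back the
string along with its length
*/

#include <stdio.h>
#include <locale.h>
#include <wchar.h>

int main() {

    wchar_t string[100];

    /* Use the locale from the environment (e.g. en_US.UTF-8) */
    setlocale(LC_ALL, "");

    printf ("Enter a single character and press enter: ");
    /* Read at most 99 wide characters, leaving room for the terminator */
    if (scanf("%99ls", string) != 1) {
        return 1;
    }

    /* wcslen returns a size_t, so it is printed with %zu */
    printf("String Entered: %ls: length: %zu\n", string, wcslen(string));

    return 0;
}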

Let’s see the various aspects of this program.

  • Use of wchar_t instead of char. C uses wchar_t to deal with the characters of various locales. Note that there are many character encodings other than UTF-8, but most of them focus on a particular language. wchar_t holds a wide character: it is wider than char (1 byte), so it can represent the characters of a large number of languages
  • To read and print a wide character string, we use the %ls format instead of %s. This directs printf and scanf to convert between the multibyte (here UTF-8) text on the terminal and the wide characters in the array
  • Use of wcslen instead of strlen to get the length of the string. The C library provides wcslen to get the length of wide character strings
  • A locale covers several categories. In some cases only the date or currency representation matters, and a narrower category can be set. Here we call setlocale with LC_ALL to pick up all the locale specific settings from the environment.

Let’s see more. I need to first show that I am using UTF-8
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

On executing the program, I will first enter the English character a
$ ./a.out
Enter a single character and press enter: a
String Entered: a: length: 1

The output is as expected, we entered a single character. So the length is 1

Next, I use the French character é
$ ./a.out
Enter a single character and press enter: é
String Entered: é: length: 1

So we got the length 1 as expected

Let’s try the same experiment with a Chinese letter 诶
$ ./a.out
Enter a single character and press enter: 诶
String Entered: 诶: length: 1

Once again, we got what we were looking for, the length 1.

Thus, we should make sure that our software handles UTF-8 strings, and that it uses the right types and functions to do so.

C: Using scanf and char to read UTF-8 strings

As businesses turn global, software is increasingly written for customers around the world. UTF-8 has become the de facto standard on the web. A C programmer will naturally ask whether C supports UTF-8 and whether a program can read UTF-8 content. In this example, I show how scanf and char are used to read a UTF-8 string. By the end of the post you will understand why char is not a good option for working with UTF-8.

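/** Program to read a single character of different languages
  using a char array and scanf and printing the string
  along with its length
*/

#include <stdio.h>
#include <string.h>

int main() {

    char string[10];

    printf ("Enter a single character and press enter: ");
    /* Read at most 9 bytes, leaving room for the terminating '\0' */
    if (scanf("%9s", string) != 1) {
        return 1;
    }

    /* strlen counts bytes and returns a size_t, printed with %zu */
    printf("String Entered: %s: length: %zu\n", string, strlen(string));

    return 0;
}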

We see that in the program, we declare a char array of length 10, we read a string and then print the string along with its length.

I need to first show that I am using UTF-8
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

On executing the program, I will first enter the English character a
$ ./a.out
Enter a single character and press enter: a
String Entered: a: length: 1

The output is as expected, we entered a single character. So the length is 1

Next, I use the French character é
$ ./a.out
Enter a single character and press enter: é
String Entered: é: length: 2

Here comes the difficult part: even though we entered a single character, the reported length is 2.

Let's try the same experiment with a Chinese letter 诶
$ ./a.out
Enter a single character and press enter: 诶
String Entered: 诶: length: 3

The result looks bizarre: the length is 3.
How can we explain this?

The first thing to recall is that a char is 8 bits, or 1 byte, so it can hold only 256 distinct values. Given the vast number of languages and scripts in the world, a single char cannot represent every character. UTF-8 solves this by encoding a character in one to four bytes: a takes one byte, é takes two and 诶 takes three. strlen counts bytes, not characters, which explains the lengths 2 and 3.

As already shown, my terminal uses UTF-8, so it can handle characters from different languages. My program cannot: it reports the number of bytes rather than the number of characters entered. So we need a better option.

Piping: A simple means of data transfer between commands and programs

When we discussed data processing, we saw that a data processing machine consists of a number of data processors working in unison (in series or in parallel) to generate a meaningful output. To see such data processing in action, we can look at the idea of piping in Linux (or Unix). A pipe reminds me of the cylindrical tubes used to transfer water or oil from one place to another. A pipe doesn't modify the data (or entity) passed through it; it only transfers it. In other words, the output of a pipe is the same as its input.

When we discussed commands, we saw that there was a time when users had to enter the name of a command on the terminal of a computer to get things done. To list the contents of a directory, the command 'ls' is used. To change the current directory, the command 'cd' is used. To count the number of words, characters or lines, the command 'wc' is used. Each of these commands is meant for a specific purpose. But what if I want to get some other work done that is not possible with any single command? Should I program a new command? If I am not a programmer, creating a new command will be really difficult, so there must be a simpler option. Here comes the idea of piping. As we saw earlier, a pipe simply gives out what is fed into it. So what if the output of one command is fed as input to another command? We need a mechanism to transfer the output of one command to the input of the other, and piping serves this purpose in Linux and other Unix variants.

So a pipe transfers the output of one command as an input to the other command.

Say, for example, I want to know the number of entries in a directory. As we saw earlier, 'ls' shows the contents of a directory and 'wc' counts lines. If we combine these two commands using a pipe, we get our work done: we feed the output of ls as input to wc.

In Linux, the symbol | is used for piping.

ls | wc -l

The option -l makes wc print only the number of lines. Thus we are able to get the number of entries in a directory without creating a new command. It is a good way to reuse existing resources to get things done.

Piping means that users need not learn a large number of commands, only a few. With this limited set of commands, one can get done everything required for one's daily work.

Compare this to the present state of software and applications: a large number of programs doing the same thing with different user interfaces. What's the result? A regular user is baffled by the choice and ends up reading multiple review sites to pick one. Sometimes it is good to look at what people did in the past when resources were scarce. Those lessons are helpful in making good design decisions.