Total Pageviews

Saturday, 17 March 2012

how to write programs for lex and program structure of lex

The input program  structure of lex contains three  parts namely 
  1. Declaration section  (or definition section starts with "%{"   and ends with "%}"  )
  2. Rules section           ( starts with "%%" and ends with "%%" )
  3. Subroutine section (or c code section)
a lex file would have the extension ".l"
the ".l " file is given as input to the lex it will generate a file with extension ".c"  named as "lex.yy.c" 

to put  it in a more precise way  Lex source is a table of regular expressions and corresponding program fragments. The table is translated to a program which reads an input stream, copying it to an output stream and partitioning the input into strings which match the given expressions. As each such string is recognized the corresponding program fragment is executed. The recognition of the expressions is performed by a deterministic finite automaton generated by Lex. The program fragments written by the user are executed in the order in which the corresponding regular expressions occur in the input stream.


  1. A simple pattern:  letter(letter|digit)*
  2. Regular expressions are translated by lex to a computer program that mimics an FSA.(Finite State Automata)
  3. This pattern matches a string of characters that begins with a single letter followed by zero or more letters or digits.
 now let us understand how the lex program behaves by considering an program
let us consider a ,
 program to count no of vowels or consonants in the input string 

1.%{ #include<stdio.h> 
2.     int  vow=0,cons=0;
3.%}

4.%%
5.       \n                        {return;}
6.        [  \t]+                   {;}
7.        [aeiouAEIOU]     {vow++ ;}
8.        [a-zA-Z]                {cons++;}
9.         .                               ;
10.%%
11.    main()
12.{    printf(“enter a string \n“);
13.     yylex();
14.     printf(“no of vowels=%d \n”,vow);
15.     printf(“no of consonants=%d”,cons);
16.}
  •         lines 1-3 in the above program indicate "declaration section" 
  •         the symbol "%{" in line 1 indicate starting of declaration section 
  •          the symbol "%}" in line 3 indicate ending of declaration section
  •          in declaration section we declare the preprocessor statements  to be used and data types of the variables which we will be using in the program. here we have used "standard I/O header" (a preprocessor statement) and we will be using  variables "vow,cons " which are of type " int"
  •         lines 4-10 indicate "rules section" of the program 
  •         the symbol "%%" in line 4 indicates starting of rules section
  •         the symbol "%%" in the line 5 indicate ending of rules section
  •         it's in the rules section we make use of regular expressions which are translated to  C functions or program by DFA 
  •         in the above program we hv made use of 5 regular expressions their usage in the program is as explained below
  •           the expression in line 5 means whenever "new line" is encountered return the control
  •         the expression  in line 6 means whenever "blank space"  or "tab" is encountered ignore them 
  •         the expression in line 7 indicates whenever "either 'a' or 'e' or 'i' or 'o' or 'u' or 'A' or 'E' or 'I' or 'O' or 'U' is encountered increment the vowel count(vow)  by one 
  •          the expression in line 8 indicates whenever any alphabet between either "a to z" or "A to Z" is encountered increment consonants count by one 
  •         some of us might get a doubt when we say  whenever any alphabet between "a to z" or "A to Z" is encountered if we increment consonant count ."even vowels are alphabets between "a to z" or "A to Z" then whenever it encounters vowels also it will increment consonants by one " .But this assumption is wrong here whenever a expression in "line 7" is executed the expression in "line 8" wont execute i.e . whenever a preceeding regular expression is satisified the expression which is next to(following)  it wont be executed .
  •          the expression in line 9 indicates any other character other than newline ignore it 
  •        lines 11-16 indicate "subroutine section"  .this section conatins legal " C" code  and inbuilt  function called "yylex()" is called in this section
compilation and working of lex program
       There are two steps in compiling a Lex source program. in the first step following actions take place
  1.          First, the Lex source must be turned into a generated program in the host general purpose    language.  this is achieved by using "lex"  command  the general  syantax of this command is as shown below  SYNTAX: lex program name (with extension .l)   The lex command passes C code, unchanged, to the lexical analyzer in the following circumstances:
  2.            Lines beginning with a blank or tab in the definitions section, or at the start of the rules section before the first rule, are copied into the lexical analyzer. If the entry is in the definitions section, it is copied to the external declaration area of the lex.yy.c file. If the entry is at the start of the rules section, the entry is copied to the local declaration area of the yylex subroutine in the lex.yy.c file.
  3.            Lines that lie between delimiter lines containing only %{  (percent sign, left brace) and %} (percent sign, right brace) either in the definitions section or at the start of the rules section are copied into the lexical analyzer in the same way as lines beginning with a blank or tab.
  4.             Any lines occurring after the second %% (percent sign, percent sign) delimiter are copied to the lexical analyzer without format restrictions.
  5.        when above actions are completed all the expressions specified in rules section are converted to "C" statements and are placed in funtion "yylex()" . 
   the second step in complilation  results in following actions
  1.          then this program must be compiled and loaded, usually with a library of Lex subroutines. The generated program is on a file named lex.yy.c. The I/O library is defined in terms of the C standard library
  2.         The library is accessed by the loader flag -ll. So an appropriate set of commands to  lex source is "cc lex.yy.c -ll"   The resulting program is placed on the usual file a.out for later execution  
    in the next post i will post about how the other inbuilt functions in lex work. i hope the the above article would be helpful in understanding lex

An introduction to lex and yacc

hi, i m posting here about lex and yacc ,as per my knowledge i m posting here whatever i have understood if there are any mistakes please excuse me,

let us begin our understanding with the definition of lex

what is lex?
lex- is a lexical analyzer generator 
 Lex is a program generator designed for lexical processing of character input streams .It accepts a high-level, problem oriented specification for character string matching, and produces a program in a general purpose language which recognizes regular expressions. The regular expressions are specified by the user in the source specifications given to Lex. The Lex written code recognizes these expressions in an input stream and partitions the input stream into strings matching the expressions

to put it in more simpler way we can define it as,

lex is a program generator that generates lexical analyzers.
the main job of lexical analysers is to break up an input stream into more usable elements(tokens)
example:  a=b+c*d;

             compilation sequence of above expression in lex is as shown below 
 


now by looking at above example we can say 
Lexical analysers tokenise input streams
tokens are the terminals of a language,we use regular expressions to define these tokens/terminals

we can notice that as we go on understanding lex we are redifining it's definition ,
now we will give more simpler definition which will suit our context of use in the ss lab

lex is a program (generator) that generates lexical analyzers, (widely used on Unix).
It is mostly used with Yacc parser generator.
Written by Eric Schmidt and Mike Lesk.
It reads the input stream (specifying the lexical analyzer ) and outputs source code implementing the lexical analyzer in the C programming language.
Lex will read patterns (regular expressions); then produces  C code for a lexical analyzer that scans for identifiers

i think we hv already spoken more about lex now move to how to write programs for lexical analyser

before we start writing source programs for lexical analyser we should have some knowledge about regular expressions so first i will give some examples of regular expression and then begin begin with lex

example of regular expression  

the symbols used while writing regular expressions and there meaning are described below
EXPRESSION                                         MEANING
  1. abc*                           "ab" followed by zero or any number of   "c"  ex: ab abc abcc abccc... etc.
  2. abc+                            "ab" followed by atleast one or more no. of "c"  ex:  abc abcc ....  etc
  3. a(bc)+                          "a" followed by atleast one or more "bcex: abc abcbc abcbcbc... etc
  4. a(bc) ?                         "a" followed by zero or one "bc"   ex: a abc (it can accept only these two)
  5. [abc]                             either "a" or "b" or "c" i.e one out of a,b,c 
  6. [a-z]                              any letter between "a-z" ex: a,b,c,d...............,x,y,z
  7. [a\-z]                            either "a" or "-" or "z" i.e. (one of a,-,z)  here we are using backslash                                                because   if we give it as [a-z] it means any letter between "a to z" 
  8. [-az]                              this is another way of defining above expression it means one   outof-,a,z 
  9. [^ab]                           anything except "ab"
  10.   a|b                             either "a" or "b" 
 i hope above examples convey the meanings of regular expressions easily 
i will post how to write programs for lex and how they work in next post
asdsadsaddasdsdsadsadasdsadsadsdsdsadsadsad<strike>cvzxcdsfdsfdf<font color="#444444">sadsad</font></strike>lex and yacc