blog to make students extreme creative

The input program structure of lex contains three parts namely

Declaration section (or definition section starts with "%{" and ends with "%}" )
Rules section ( starts with "%%" and ends with "%%" )
Subroutine section (or c code section)

a lex file would have the extension ".l"

the ".l " file is given as input to the lex it will generate a file with extension ".c" named as "lex.yy.c"

to put it in a more precise way Lex source is a table of regular expressions and corresponding program fragments. The table is translated to a program which reads an input stream, copying it to an output stream and partitioning the input into strings which match the given expressions. As each such string is recognized the corresponding program fragment is executed. The recognition of the expressions is performed by a deterministic finite automaton generated by Lex. The program fragments written by the user are executed in the order in which the corresponding regular expressions occur in the input stream.

A simple pattern: letter(letter|digit)*
Regular expressions are translated by lex to a computer program that mimics an FSA.(Finite State Automata)
This pattern matches a string of characters that begins with a single letter followed by zero or more letters or digits.

now let us understand how the lex program behaves by considering an program

let us consider a ,

program to count no of vowels or consonants in the input string

1.%{ #include<stdio.h>

2. int vow=0,cons=0;

3.%}

4.%%

5. \n {return;}

6. [ \t]+ {;}

7. [aeiouAEIOU] {vow++ ;}

8. [a-zA-Z] {cons++;}

9. . ;

10.%%

11. main()

12.{ printf(“enter a string \n“);

13. yylex();

14. printf(“no of vowels=%d \n”,vow);

15. printf(“no of consonants=%d”,cons);

16.}

lines 1-3 in the above program indicate "declaration section"
the symbol "%{" in line 1 indicate starting of declaration section
the symbol "%}" in line 3 indicate ending of declaration section
in declaration section we declare the preprocessor statements to be used and data types of the variables which we will be using in the program. here we have used "standard I/O header" (a preprocessor statement) and we will be using variables "vow,cons " which are of type " int"
lines 4-10 indicate "rules section" of the program
the symbol "%%" in line 4 indicates starting of rules section
the symbol "%%" in the line 5 indicate ending of rules section
it's in the rules section we make use of regular expressions which are translated to C functions or program by DFA

in the above program we hv made use of 5 regular expressions their usage in the program is as explained below
the expression in line 5 means whenever "new line" is encountered return the control
the expression in line 6 means whenever "blank space" or "tab" is encountered ignore them
the expression in line 7 indicates whenever "either 'a' or 'e' or 'i' or 'o' or 'u' or 'A' or 'E' or 'I' or 'O' or 'U' is encountered increment the vowel count(vow) by one
the expression in line 8 indicates whenever any alphabet between either "a to z" or "A to Z" is encountered increment consonants count by one
some of us might get a doubt when we say whenever any alphabet between "a to z" or "A to Z" is encountered if we increment consonant count ."even vowels are alphabets between "a to z" or "A to Z" then whenever it encounters vowels also it will increment consonants by one " .But this assumption is wrong here whenever a expression in "line 7" is executed the expression in "line 8" wont execute i.e . whenever a preceeding regular expression is satisified the expression which is next to(following) it wont be executed .
the expression in line 9 indicates any other character other than newline ignore it
lines 11-16 indicate "subroutine section" .this section conatins legal " C" code and inbuilt function called "yylex()" is called in this section

compilation and working of lex program

There are two steps in compiling a Lex source program. in the first step following actions take place

First, the Lex source must be turned into a generated program in the host general purpose language. this is achieved by using "lex" command the general syantax of this command is as shown below SYNTAX: lex program name (with extension .l) The lex command passes C code, unchanged, to the lexical analyzer in the following circumstances:
Lines beginning with a blank or tab in the definitions section, or at the start of the rules section before the first rule, are copied into the lexical analyzer. If the entry is in the definitions section, it is copied to the external declaration area of the lex.yy.c file. If the entry is at the start of the rules section, the entry is copied to the local declaration area of the yylex subroutine in the lex.yy.c file.
Lines that lie between delimiter lines containing only %{ (percent sign, left brace) and %} (percent sign, right brace) either in the definitions section or at the start of the rules section are copied into the lexical analyzer in the same way as lines beginning with a blank or tab.
Any lines occurring after the second %% (percent sign, percent sign) delimiter are copied to the lexical analyzer without format restrictions.
when above actions are completed all the expressions specified in rules section are converted to "C" statements and are placed in funtion "yylex()" .

the second step in complilation results in following actions

then this program must be compiled and loaded, usually with a library of Lex subroutines. The generated program is on a file named lex.yy.c. The I/O library is defined in terms of the C standard library
The library is accessed by the loader flag -ll. So an appropriate set of commands to lex source is "cc lex.yy.c -ll" The resulting program is placed on the usual file a.out for later execution

in the next post i will post about how the other inbuilt functions in lex work. i hope the the above article would be helpful in understanding lex

hi, i m posting here about lex and yacc ,as per my knowledge i m posting here whatever i have understood if there are any mistakes please excuse me,

let us begin our understanding with the definition of lex

what is lex?

lex- is a lexical analyzer generator

Lex is a program generator designed for lexical processing of character input streams .It accepts a high-level, problem oriented specification for character string matching, and produces a program in a general purpose language which recognizes regular expressions. The regular expressions are specified by the user in the source specifications given to Lex. The Lex written code recognizes these expressions in an input stream and partitions the input stream into strings matching the expressions

to put it in more simpler way we can define it as,

lex is a program generator that generates lexical analyzers.

the main job of lexical analysers is to break up an input stream into more usable elements(tokens)

example: a=b+c*d;

compilation sequence of above expression in lex is as shown below

now by looking at above example we can say

Lexical analysers tokenise input streams

tokens are the terminals of a language,we use regular expressions to define these tokens/terminals

we can notice that as we go on understanding lex we are redifining it's definition ,

now we will give more simpler definition which will suit our context of use in the ss lab

lex is a program (generator) that generates lexical analyzers, (widely used on Unix).

It is mostly used with Yacc parser generator.

Written by Eric Schmidt and Mike Lesk.

It reads the input stream (specifying the lexical analyzer ) and outputs source code implementing the lexical analyzer in the C programming language.

Lex will read patterns (regular expressions); then produces C code for a lexical analyzer that scans for identifiers

i think we hv already spoken more about lex now move to how to write programs for lexical analyser

before we start writing source programs for lexical analyser we should have some knowledge about regular expressions so first i will give some examples of regular expression and then begin begin with lex

example of regular expression

the symbols used while writing regular expressions and there meaning are described below

EXPRESSION MEANING

abc* "ab" followed by zero or any number of "c" ex: ab abc abcc abccc... etc.
abc+ "ab" followed by atleast one or more no. of "c" ex: abc abcc .... etc
a(bc)+ "a" followed by atleast one or more "bcex: abc abcbc abcbcbc... etc
a(bc) ? "a" followed by zero or one "bc" ex: a abc (it can accept only these two)
[abc] either "a" or "b" or "c" i.e one out of a,b,c
[a-z] any letter between "a-z" ex: a,b,c,d...............,x,y,z
[a\-z] either "a" or "-" or "z" i.e. (one of a,-,z) here we are using backslash because if we give it as [a-z] it means any letter between "a to z"
[-az] this is another way of defining above expression it means one outof-,a,z
[^ab] anything except "ab"
a|b either "a" or "b"

i hope above examples convey the meanings of regular expressions easily

i will post how to write programs for lex and how they work in next post

asdsadsaddasdsdsadsadasdsadsadsdsdsadsadsad<strike>cvzxcdsfdsfdf<font color="#444444">sadsad</font></strike>lex and yacc

blog to make students extreme creative

Total Pageviews

Saturday, 17 March 2012

how to write programs for lex and program structure of lex

An introduction to lex and yacc

Contributors