Question: What is input buffering? Explain technique of buffer pair. OR
Question: Which technique is used for speeding up the lexical analyzer?
There are mainly two techniques for input buffering,
Question: Which technique is used for speeding up the lexical analyzer?
There are mainly two techniques for input buffering,
- Buffer pair
- Sentinels
1. Buffer pair:
- The lexical analysis scans the input string from left to right one character at a time.
- So, specialized buffering techniques have been developed to reduce the amount of overhead required to process a single input character.
- We use a buffer divided into two N-character halves, as shown in figure 2.2. N is the number of character on one disk block.
- We read N input character into each half of the buffer.
- Two pointers to the input are maintained and string between current lexemes.
- Pointer Lexeme Begin, marks the beginning of the current lexeme.
- Pointer Forward, scans ahead until a pattern match is found.
- If forward pointer is at the end of first buffer half then second is filled with N input character.
- If forward pointer is at the end of second buffer half then first is filled with N input character.
code to advance forward pointer is given below,
if forward at end of first half then begin
reload second half;
forward := forward + 1;
end
else if forward at end of second half then begin
reload first half;
move forward to beginning of first half;
end
else forward := forward + 1;
Once the next lexeme is determined, forward is set to character at its right end. Then, after the lexeme is recorded as an attribute value of a token returned to the parser, Lexeme Begin is set to the character immediately after the lexeme just found.
2. Sentinels:
- If we use the scheme of Buffer pairs we must check, each time we move the forward pointer that we have not moved off one of the buffers; if we do, then we must reload the other buffer. Thus, for each character read, we make two tests.
- We can combine the buffer-end test with the test for the current character if we extend each buffer to hold a sentinel character at the end. The sentinel is a special character that cannot be part of the source program, and a natural choice is the character EOF.
- Look ahead code with sentinels is given below:
forward := forward + 1;
if forward = eof then begin
if forward at end of first half then begin
reload second half;
forward := forward + 1;
end
else if forward at the second half then begin
reload first half;
move forward to beginning of first half;
end
else terminate lexical analysis;
end;