Package dns :: Module tokenizer :: Class Tokenizer

Class Tokenizer

object --+
         |
        Tokenizer


A DNS master file format tokenizer.

A token is a (type, value) tuple, where type is an int, and value is a string. The valid types are EOF, EOL, WHITESPACE, IDENTIFIER, QUOTED_STRING, COMMENT, and DELIMITER.
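
For example, a minimal sketch of tokenizing one master-file line. This assumes the type constants listed above are exposed at module level (e.g. dns.tokenizer.IDENTIFIER); newer dnspython releases return Token objects rather than plain tuples.

    import dns.tokenizer

    tok = dns.tokenizer.Tokenizer('example.com. 3600 IN A 192.0.2.1')
    ttype, value = tok.get()          # first field of the line
    assert ttype == dns.tokenizer.IDENTIFIER
    print(value)                      # 'example.com.'
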
Method Summary
  __init__(self, f)
Initialize a tokenizer instance.
  __iter__(self)
  __new__(S, ...)
Return a new object with type S, a subtype of T.
(int, string) tuple get(self, want_leading, want_comment)
Get the next token.
string get_eol(self)
Read the next token and raise an exception if it isn't EOL or EOF.
int get_int(self)
Read the next token and interpret it as an integer.
dns.name.Name object get_name(self, origin)
Read the next token and interpret it as a DNS name.
string get_string(self, origin)
Read the next token and interpret it as a string.
int get_uint16(self)
Read the next token and interpret it as a 16-bit unsigned integer.
int get_uint32(self)
Read the next token and interpret it as a 32-bit unsigned integer.
int get_uint8(self)
Read the next token and interpret it as an 8-bit unsigned integer.
(int, string) next(self)
Return the next item in an iteration.
int skip_whitespace(self)
Consume input until a non-whitespace character is encountered.
  unget(self, token)
Unget a token.

Instance Variable Summary
dict delimiters - The current delimiter dictionary.
bool eof - This variable is true if the tokenizer has encountered EOF.
file file - The file to tokenize.
int multiline - The current multiline level.
bool quoting - This variable is true if the tokenizer is currently reading a quoted string.
string ungotten_char - The most recently ungotten character, or None.
string ungotten_token - The most recently ungotten token, or None.

Method Details

__init__(self, f=sys.stdin)
(Constructor)

Initialize a tokenizer instance.
Parameters:
f - The file to tokenize. The default is sys.stdin. This parameter may also be a string, in which case the tokenizer will take its input from the contents of the string.
           (type=file or string)
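
A brief sketch of the two input forms (the zone file name below is only illustrative):

    import dns.tokenizer

    # From a string -- convenient for tests and small fragments.
    tok = dns.tokenizer.Tokenizer('example.com. 3600 IN A 192.0.2.1')

    # From an open file; with no argument at all, input comes from sys.stdin.
    with open('example.zone') as f:   # 'example.zone' is a hypothetical file
        tok = dns.tokenizer.Tokenizer(f)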

__new__(S, ...)

Returns:
a new object with type S, a subtype of T

get(self, want_leading=0, want_comment=0)

Get the next token.
Parameters:
want_leading - If True, return a WHITESPACE token if the first character read is whitespace. The default is False.
           (type=bool)
want_comment - If True, return a COMMENT token if the first token read is a comment. The default is False.
           (type=bool)
Returns:
(int, string) tuple
Raises:
dns.exception.UnexpectedEnd - input ended prematurely
dns.exception.SyntaxError - input was badly formed
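
A short sketch of the want_comment flag; the token values noted in the comments are expectations, not output captured from this module:

    import dns.tokenizer

    tok = dns.tokenizer.Tokenizer('a ; trailing comment\n')
    print(tok.get())                    # expected: an IDENTIFIER token for 'a'
    print(tok.get(want_comment=True))   # expected: a COMMENT token with the comment text
    print(tok.get())                    # expected: an EOL token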

get_eol(self)

Read the next token and raise an exception if it isn't EOL or EOF.
Returns:
string
Raises:
dns.exception.SyntaxError - the next token is not EOL or EOF

get_int(self)

Read the next token and interpret it as an integer.
Returns:
int
Raises:
dns.exception.SyntaxError - the token is not an integer

get_name(self, origin=None)

Read the next token and interpret it as a DNS name.
Returns:
dns.name.Name object
Raises:
dns.exception.SyntaxError - the token cannot be interpreted as a DNS name

get_string(self, origin=None)

Read the next token and interpret it as a string.
Returns:
string
Raises:
dns.exception.SyntaxError - the token is not an identifier or quoted string

get_uint16(self)

Read the next token and interpret it as a 16-bit unsigned integer.
Returns:
int
Raises:
dns.exception.SyntaxError - the token is not a 16-bit unsigned integer

get_uint32(self)

Read the next token and interpret it as a 32-bit unsigned integer.
Returns:
int
Raises:
dns.exception.SyntaxError - the token is not a 32-bit unsigned integer

get_uint8(self)

Read the next token and interpret it as an 8-bit unsigned integer.
Returns:
int
Raises:
dns.exception.SyntaxError - the token is not an 8-bit unsigned integer
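
The typed getters above are usually combined to read a full record line; a sketch (the record text is made up for illustration):

    import dns.tokenizer

    tok = dns.tokenizer.Tokenizer('mail.example.com. 3600 IN MX 10 mx.example.com.\n')

    owner = tok.get_name()          # dns.name.Name for 'mail.example.com.'
    ttl = tok.get_uint32()          # 3600
    rdclass = tok.get_string()      # 'IN'
    rdtype = tok.get_string()       # 'MX'
    preference = tok.get_uint16()   # 10
    exchange = tok.get_name()       # dns.name.Name for 'mx.example.com.'
    tok.get_eol()                   # anything left on the line is a syntax error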

next(self)

Return the next item in an iteration.
Returns:
(int, string)
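
A sketch of the iterator interface built on next(); this assumes iteration stops once the tokenizer reaches EOF:

    import dns.tokenizer

    for ttype, value in dns.tokenizer.Tokenizer('a b c\n'):
        print(ttype, value)   # each item is a (type, value) tuple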

skip_whitespace(self)

Consume input until a non-whitespace character is encountered.

The non-whitespace character is then ungotten, and the number of whitespace characters consumed is returned.

If the tokenizer is in multiline mode, then newlines are whitespace.
Returns:
int
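
A small sketch of the documented return value; the results noted in the comments are expectations, not captured output:

    import dns.tokenizer

    tok = dns.tokenizer.Tokenizer('   foo')
    print(tok.skip_whitespace())   # expected: 3 (three spaces consumed)
    print(tok.get())               # expected: an IDENTIFIER token for 'foo'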

unget(self, token)

Unget a token.

The unget buffer for tokens is only one token large; it is an error to try to unget a token when the unget buffer is not empty.
Parameters:
token - the token to unget
           (type=(int, string) tuple)
Raises:
UngetBufferFull - there is already an ungotten token
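
A sketch of the one-token look-ahead pattern unget() enables (the digit check is only illustrative):

    import dns.tokenizer

    tok = dns.tokenizer.Tokenizer('3600 IN A 192.0.2.1\n')
    token = tok.get()
    if token[0] == dns.tokenizer.IDENTIFIER and token[1].isdigit():
        tok.unget(token)         # push the token back...
        ttl = tok.get_uint32()   # ...and re-read it with a typed getter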

Instance Variable Details

delimiters

The current delimiter dictionary.
Type:
dict

eof

This variable is true if the tokenizer has encountered EOF.
Type:
bool

file

The file to tokenize.
Type:
file

multiline

The current multiline level. This value is increased by one every time a '(' delimiter is read, and decreased by one every time a ')' delimiter is read.
Type:
int

quoting

This variable is true if the tokenizer is currently reading a quoted string.
Type:
bool

ungotten_char

The most recently ungotten character, or None.
Type:
string

ungotten_token

The most recently ungotten token, or None.
Type:
string
