- class QXmlStreamReader#
The QXmlStreamReader class provides a fast parser for reading well-formed XML via a simple streaming API.
Synopsis#
Methods#
def __init__()
def addData()
def atEnd()
def attributes()
def clear()
def columnNumber()
def device()
def dtdName()
def dtdPublicId()
def dtdSystemId()
def entityResolver()
def error()
def errorString()
def hasError()
def isCDATA()
def isCharacters()
def isComment()
def isDTD()
def isEndDocument()
def isEndElement()
def isStartElement()
def isWhitespace()
def lineNumber()
def name()
def namespaceUri()
def prefix()
def qualifiedName()
def raiseError()
def readNext()
def setDevice()
def text()
def tokenString()
def tokenType()
Note
This documentation may contain snippets that were automatically translated from C++ to Python. We always welcome contributions to the snippet translation. If you see an issue with the translation, you can also let us know by creating a ticket on https://bugreports.qt.io/projects/PYSIDE
Detailed Description#
Warning
This section contains snippets that were automatically translated from C++ to Python and may contain errors.
QXmlStreamReader provides a simple streaming API to parse well-formed XML. It is an alternative to first loading the complete XML into a DOM tree (see QDomDocument). QXmlStreamReader reads data either from a QIODevice (see setDevice()), or from a raw QByteArray (see addData()).
Qt provides QXmlStreamWriter for writing XML.
The basic concept of a stream reader is to report an XML document as a stream of tokens, similar to SAX. The main difference between QXmlStreamReader and SAX is how these XML tokens are reported. With SAX, the application must provide handlers (callback functions) that receive so-called XML events from the parser at the parser’s convenience. With QXmlStreamReader, the application code itself drives the loop and pulls tokens from the reader, one after another, as it needs them. This is done by calling readNext(), where the reader reads from the input stream until it completes the next token, at which point it returns the tokenType(). A set of convenience functions, including isStartElement() and text(), can then be used to examine the token and obtain information about what has been read. The big advantage of this pulling approach is that it lets you build recursive descent parsers, meaning you can easily split your XML parsing code into different methods or classes. This makes it easy to keep track of the application’s own state when parsing XML.
A typical loop with QXmlStreamReader looks like this:
    xml = QXmlStreamReader()
    ...
    while not xml.atEnd():
        xml.readNext()
        ...  # do processing
    if xml.hasError():
        ...  # do error handling
QXmlStreamReader is a well-formed XML 1.0 parser that does not include external parsed entities. As long as no error occurs, the application code can thus be assured that
- the data provided by the stream reader satisfies the W3C’s criteria for well-formed XML,
- tokens are provided in a valid order.
Unless QXmlStreamReader raises an error, it guarantees the following:
- All tags are nested and closed properly.
- References to internal entities have been replaced with the correct replacement text.
- Attributes have been normalized or added according to the internal subset of the DTD.
- Tokens of type StartDocument happen before all others, aside from comments and processing instructions.
- At most one DOCTYPE element (a token of type DTD) is present.
- If present, the DOCTYPE appears before all other elements, aside from StartDocument, comments and processing instructions.
In particular, once any token of type StartElement, EndElement, Characters, EntityReference or EndDocument is seen, no tokens of type StartDocument or DTD will be seen. If one is present in the input stream, out of order, an error is raised.
Note
The token types Comment and ProcessingInstruction may appear anywhere in the stream.
If an error occurs while parsing, atEnd() and hasError() return true, and error() returns the error that occurred. The functions errorString(), lineNumber(), columnNumber(), and characterOffset() are for constructing an appropriate error or warning message. To simplify application code, QXmlStreamReader contains a raiseError() mechanism that lets you raise custom errors that trigger the same error handling described.
The QXmlStream Bookmarks Example illustrates how to use the recursive descent technique to read an XML bookmark file (XBEL) with a stream reader.
Namespaces#
QXmlStream understands and resolves XML namespaces. For example, in the case of a StartElement, namespaceUri() returns the namespace the element is in, and name() returns the element’s local name. The combination of namespaceUri and name uniquely identifies an element. If a namespace prefix was not declared in the XML entities parsed by the reader, the namespaceUri is empty.
If you parse XML data that does not utilize namespaces according to the XML specification or doesn’t use namespaces at all, you can use the element’s qualifiedName() instead. A qualified name is the element’s prefix() followed by a colon followed by the element’s local name() - exactly as the element appears in the raw XML data. Since the mapping from namespaceUri to prefix is neither unique nor universal, qualifiedName() should be avoided for namespace-compliant XML data.
In order to parse standalone documents that do use undeclared namespace prefixes, you can turn off namespace processing completely with the namespaceProcessing property.
Incremental Parsing#
QXmlStreamReader is an incremental parser. It can handle the case where the document can’t be parsed all at once because it arrives in chunks (e.g. from multiple files, or over a network connection). When the reader runs out of data before the complete document has been parsed, it reports a PrematureEndOfDocumentError. When more data arrives, either because of a call to addData() or because more data is available through the network device(), the reader recovers from the PrematureEndOfDocumentError error and continues parsing the new data with the next call to readNext().
For example, if your application reads data from the network using a network access manager, you would issue a network request to the manager and receive a network reply in return. Since a QNetworkReply is a QIODevice, you connect its readyRead() signal to a custom slot, e.g. slotReadyRead() in the code snippet shown in the discussion for QNetworkAccessManager. In this slot, you read all available data with readAll() and pass it to the XML stream reader using addData(). Then you call your custom parsing function that reads the XML events from the reader.
Performance and Memory Consumption#
QXmlStreamReader is memory-conservative by design, since it doesn’t store the entire XML document tree in memory, but only the current token at the time it is reported. In addition, QXmlStreamReader avoids the many small string allocations that it normally takes to map an XML document to a convenient and Qt-ish API. It does this by reporting all string data as QStringView rather than real QString objects. Calling toString() on any of those objects returns an equivalent real QString object.
- class TokenType#
This enum specifies the type of token the reader just read.
Constant
Description
QXmlStreamReader.NoToken
The reader has not yet read anything.
QXmlStreamReader.Invalid
An error has occurred, reported in error() and errorString().
QXmlStreamReader.StartDocument
The reader reports the XML version number in documentVersion(), and the encoding as specified in the XML document in documentEncoding(). If the document is declared standalone, isStandaloneDocument() returns true; otherwise it returns false.
QXmlStreamReader.EndDocument
The reader reports the end of the document.
QXmlStreamReader.StartElement
The reader reports the start of an element with namespaceUri() and name(). Empty elements are also reported as StartElement, followed directly by EndElement. The convenience function readElementText() can be called to concatenate all content until the corresponding EndElement. Attributes are reported in attributes(), namespace declarations in namespaceDeclarations().
QXmlStreamReader.EndElement
The reader reports the end of an element with namespaceUri() and name().
QXmlStreamReader.Characters
The reader reports characters in text(). If the characters are all white-space, isWhitespace() returns true. If the characters stem from a CDATA section, isCDATA() returns true.
QXmlStreamReader.Comment
The reader reports a comment in text().
QXmlStreamReader.DTD
The reader reports a DTD in text(), notation declarations in notationDeclarations(), and entity declarations in entityDeclarations(). Details of the DTD declaration are reported in dtdName(), dtdPublicId(), and dtdSystemId().
QXmlStreamReader.EntityReference
The reader reports an entity reference that could not be resolved. The name of the reference is reported in name(), the replacement text in text().
QXmlStreamReader.ProcessingInstruction
The reader reports a processing instruction in processingInstructionTarget() and processingInstructionData().
- class ReadElementTextBehaviour#
This enum specifies the different behaviours of readElementText().
Constant
Description
QXmlStreamReader.ErrorOnUnexpectedElement
Raise an UnexpectedElementError and return what was read so far when a child element is encountered.
QXmlStreamReader.IncludeChildElements
Recursively include the text from child elements.
QXmlStreamReader.SkipChildElements
Skip child elements.
New in version 4.6.
- class Error#
This enum specifies different error cases.
Constant
Description
QXmlStreamReader.NoError
No error has occurred.
QXmlStreamReader.CustomError
A custom error has been raised with raiseError().
QXmlStreamReader.NotWellFormedError
The parser internally raised an error due to the read XML not being well-formed.
QXmlStreamReader.PrematureEndOfDocumentError
The input stream ended before a well-formed XML document was parsed. Recovery from this error is possible if more XML arrives in the stream, either by calling addData() or by waiting for it to arrive on the device().
QXmlStreamReader.UnexpectedElementError
The parser encountered an element or token that was different to those it expected.
- __init__(data)#
- Parameters:
data – str
Creates a new stream reader that reads from data.
Note
In Qt versions prior to 6.5, this constructor was overloaded for QString and const char*.
- __init__()
Constructs a stream reader.
- __init__(device)
- Parameters:
device – QIODevice
Creates a new stream reader that reads from device.
- addData(data)#
- Parameters:
data – str
Adds more data for the reader to read. This function does nothing if the reader has a device().
Note
In Qt versions prior to 6.5, this function was overloaded for QString and const char*.
- addExtraNamespaceDeclaration(extraNamespaceDeclaraction)#
- Parameters:
extraNamespaceDeclaraction – QXmlStreamNamespaceDeclaration
Adds an extraNamespaceDeclaration. The declaration will be valid for children of the current element, or - should the function be called before any elements are read - for the entire XML document.
- addExtraNamespaceDeclarations(extraNamespaceDeclaractions)#
- Parameters:
extraNamespaceDeclaractions – list of QXmlStreamNamespaceDeclaration
Adds a list of declarations specified by extraNamespaceDeclarations.
- atEnd()#
- Return type:
bool
Returns true if the reader has read until the end of the XML document, or if an error() has occurred and reading has been aborted. Otherwise, it returns false.
When atEnd() and hasError() return true and error() returns PrematureEndOfDocumentError, it means the XML has been well-formed so far, but a complete XML document has not been parsed. The next chunk of XML can be added with addData(), if the XML is being read from a QByteArray, or by waiting for more data to arrive if the XML is being read from a QIODevice. Either way, atEnd() will return false once more data is available.
- attributes()#
- Return type:
QXmlStreamAttributes
Returns the attributes of a StartElement.
- characterOffset()#
- Return type:
int
Returns the current character offset, starting with 0.
- clear()#
Removes any device() or data from the reader and resets its internal state to the initial state.
- columnNumber()#
- Return type:
int
Returns the current column number, starting with 0.
- device()#
- Return type:
QIODevice
Returns the current device associated with the QXmlStreamReader, or None if no device has been assigned.
- documentEncoding()#
- Return type:
str
If the tokenType() is StartDocument, this function returns the encoding string as specified in the XML declaration. Otherwise an empty string is returned.
- documentVersion()#
- Return type:
str
If the tokenType() is StartDocument, this function returns the version string as specified in the XML declaration. Otherwise an empty string is returned.
- dtdName()#
- Return type:
str
If the tokenType() is DTD, this function returns the DTD’s name. Otherwise an empty string is returned.
- dtdPublicId()#
- Return type:
str
If the tokenType() is DTD, this function returns the DTD’s public identifier. Otherwise an empty string is returned.
- dtdSystemId()#
- Return type:
str
If the tokenType() is DTD, this function returns the DTD’s system identifier. Otherwise an empty string is returned.
- entityDeclarations()#
- Return type:
list of QXmlStreamEntityDeclaration
If the tokenType() is DTD, this function returns the DTD’s unparsed (external) entity declarations. Otherwise an empty list is returned.
The QXmlStreamEntityDeclarations class is defined to be a QList of QXmlStreamEntityDeclaration.
- entityExpansionLimit()#
- Return type:
int
Returns the maximum amount of characters a single entity is allowed to expand into. If a single entity expands past the given limit, the document is not considered well formed.
- entityResolver()#
- Return type:
QXmlStreamEntityResolver
Returns the entity resolver, or None if there is no entity resolver.
- error()#
- Return type:
Error
Returns the type of the current error, or NoError if no error occurred.
- errorString()#
- Return type:
str
Returns the error message that was set with raiseError().
- hasError()#
- Return type:
bool
Returns true if an error has occurred, otherwise false.
- hasStandaloneDeclaration()#
- Return type:
bool
Returns true if this document has an explicit standalone declaration (can be ‘yes’ or ‘no’); otherwise returns false.
If no XML declaration has been parsed, this function returns false.
- isCDATA()#
- Return type:
bool
Returns true if the reader reports characters that stem from a CDATA section; otherwise returns false.
- isCharacters()#
- Return type:
bool
Returns true if tokenType() equals Characters; otherwise returns false.
- isComment()#
- Return type:
bool
Returns true if tokenType() equals Comment; otherwise returns false.
- isDTD()#
- Return type:
bool
Returns true if tokenType() equals DTD; otherwise returns false.
- isEndDocument()#
- Return type:
bool
Returns true if tokenType() equals EndDocument; otherwise returns false.
- isEndElement()#
- Return type:
bool
Returns true if tokenType() equals EndElement; otherwise returns false.
- isEntityReference()#
- Return type:
bool
Returns true if tokenType() equals EntityReference; otherwise returns false.
- isProcessingInstruction()#
- Return type:
bool
Returns true if tokenType() equals ProcessingInstruction; otherwise returns false.
- isStandaloneDocument()#
- Return type:
bool
Returns true if this document has been declared standalone in the XML declaration; otherwise returns false.
If no XML declaration has been parsed, this function returns false.
- isStartDocument()#
- Return type:
bool
Returns true if tokenType() equals StartDocument; otherwise returns false.
- isStartElement()#
- Return type:
bool
Returns true if tokenType() equals StartElement; otherwise returns false.
- isWhitespace()#
- Return type:
bool
Returns true if the reader reports characters that only consist of white-space; otherwise returns false.
- lineNumber()#
- Return type:
int
Returns the current line number, starting with 1.
- name()#
- Return type:
str
Returns the local name of a StartElement, EndElement, or an EntityReference.
- namespaceDeclarations()#
- Return type:
list of QXmlStreamNamespaceDeclaration
If the tokenType() is StartElement, this function returns the element’s namespace declarations. Otherwise an empty list is returned.
The QXmlStreamNamespaceDeclarations class is defined to be a QList of QXmlStreamNamespaceDeclaration.
- namespaceProcessing()#
- Return type:
bool
- namespaceUri()#
- Return type:
str
Returns the namespaceUri of a StartElement or EndElement.
- notationDeclarations()#
- Return type:
list of QXmlStreamNotationDeclaration
If the tokenType() is DTD, this function returns the DTD’s notation declarations. Otherwise an empty list is returned.
The QXmlStreamNotationDeclarations class is defined to be a QList of QXmlStreamNotationDeclaration.
- prefix()#
- Return type:
str
Returns the prefix of a StartElement or EndElement.
- processingInstructionData()#
- Return type:
str
Returns the data of a ProcessingInstruction.
- processingInstructionTarget()#
- Return type:
str
Returns the target of a ProcessingInstruction.
- qualifiedName()#
- Return type:
str
Returns the qualified name of a StartElement or EndElement.
A qualified name is the raw name of an element in the XML data. It consists of the namespace prefix, followed by a colon, followed by the element’s local name. Since the namespace prefix is not unique (the same prefix can point to different namespaces and different prefixes can point to the same namespace), you shouldn’t use qualifiedName(), but the resolved namespaceUri() and the element’s local name().
- raiseError([message=""])#
- Parameters:
message – str
Raises a custom error with an optional error message.
- readElementText([behaviour=QXmlStreamReader.ReadElementTextBehaviour.ErrorOnUnexpectedElement])#
- Parameters:
behaviour –
ReadElementTextBehaviour
- Return type:
str
Convenience function to be called in case a StartElement was read. Reads until the corresponding EndElement and returns all text in-between. In case of no error, the current token (see tokenType()) after having called this function is EndElement.
The function concatenates text() when it reads either Characters or EntityReference tokens, but skips ProcessingInstruction and Comment. If the current token is not StartElement, an empty string is returned.
The behaviour defines what happens in case anything else is read before reaching EndElement. The function can include the text from child elements (useful for example for HTML), ignore child elements, or raise an UnexpectedElementError and return what was read so far (default).
- readNext()#
- Return type:
TokenType
Reads the next token and returns its type.
With one exception, once an error() is reported by readNext(), further reading of the XML stream is not possible. Then atEnd() returns true, hasError() returns true, and this function returns Invalid.
The exception is when error() returns PrematureEndOfDocumentError. This error is reported when the end of an otherwise well-formed chunk of XML is reached, but the chunk doesn’t represent a complete XML document. In that case, parsing can be resumed by calling addData() to add the next chunk of XML, when the stream is being read from a QByteArray, or by waiting for more data to arrive when the stream is being read from a device().
- readNextStartElement()#
- Return type:
bool
Reads until the next start element within the current element. Returns true when a start element was reached. When the end element was reached, or when an error occurred, false is returned.
The current element is the element matching the most recently parsed start element of which a matching end element has not yet been reached. When the parser has reached the end element, the current element becomes the parent element.
This is a convenience function for when you’re only concerned with parsing XML elements. The QXmlStream Bookmarks Example makes extensive use of this function.
- setDevice(device)#
- Parameters:
device – QIODevice
Sets the current device to device. Setting the device resets the stream to its initial state.
- setEntityExpansionLimit(limit)#
- Parameters:
limit – int
Sets the maximum number of characters a single entity is allowed to expand into to limit. If a single entity expands past the given limit, the document is not considered well formed.
The limit is there to prevent DoS attacks when loading unknown XML documents where recursive entity expansion could otherwise exhaust all available memory.
The default value for this property is 4096 characters.
- setEntityResolver(resolver)#
- Parameters:
resolver – QXmlStreamEntityResolver
Makes resolver the new entityResolver().
The stream reader does not take ownership of the resolver. It’s the caller’s responsibility to ensure that the resolver is valid during the entire lifetime of the stream reader object, or until another resolver or None is set.
- setNamespaceProcessing(arg__1)#
- Parameters:
arg__1 – bool
- skipCurrentElement()#
Reads until the end of the current element, skipping any child nodes. This function is useful for skipping unknown elements.
The current element is the element matching the most recently parsed start element of which a matching end element has not yet been reached. When the parser has reached the end element, the current element becomes the parent element.
- text()#
- Return type:
str
Returns the text of Characters, Comment, DTD, or EntityReference.
- tokenString()#
- Return type:
str
Returns the reader’s current token as string.
- tokenType()#
- Return type:
TokenType
Returns the type of the current token.
The current token can also be queried with the convenience functions isStartDocument(), isEndDocument(), isStartElement(), isEndElement(), isCharacters(), isComment(), isDTD(), isEntityReference(), and isProcessingInstruction().