Package org.apache.xerces.impl
Class XMLScanner
- java.lang.Object
-
- org.apache.xerces.impl.XMLScanner
-
- All Implemented Interfaces:
org.apache.xerces.xni.parser.XMLComponent
- Direct Known Subclasses:
XMLDocumentFragmentScannerImpl, XMLDTDScannerImpl
public abstract class XMLScanner extends java.lang.Object implements org.apache.xerces.xni.parser.XMLComponent
This class is responsible for holding scanning methods common to scanning the XML document structure and content as well as the DTD structure and content. Both XMLDocumentScanner and XMLDTDScanner inherit from this base class.
This component requires the following features and properties from the component manager that uses it:
- http://xml.org/sax/features/validation
- http://xml.org/sax/features/namespaces
- http://apache.org/xml/features/scanner/notify-char-refs
- http://apache.org/xml/properties/internal/symbol-table
- http://apache.org/xml/properties/internal/error-reporter
- http://apache.org/xml/properties/internal/entity-manager
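The two SAX feature identifiers in the list above are normally set on the parser, which passes them on to its components. The following is a minimal sketch of that, assuming a JAXP parser whose XMLReader recognizes both identifiers (the JDK default does). The class name ScannerFeatureDemo and the method configure are hypothetical, and the Apache-specific identifiers are left out because support for them varies between implementations.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class ScannerFeatureDemo {
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    /** Sets both features on a fresh XMLReader and reads their states back. */
    public static boolean[] configure(boolean validation, boolean namespaces) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(VALIDATION, validation);
            reader.setFeature(NAMESPACES, namespaces);
            return new boolean[] { reader.getFeature(VALIDATION), reader.getFeature(NAMESPACES) };
        } catch (Exception e) {
            // Assumption: any failure here means the parser rejected a feature.
            throw new IllegalStateException("parser rejected a feature", e);
        }
    }

    public static void main(String[] args) {
        boolean[] state = configure(false, true);
        System.out.println("validation=" + state[0] + ", namespaces=" + state[1]);
    }
}
```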
INTERNAL:
- Usage of this class is not supported. It may be altered or removed at any time.
- Version:
- $Id: XMLScanner.java 1499506 2013-07-03 18:29:43Z mrglavas $
- Author:
- Andy Clark, IBM, Arnaud Le Hors, IBM, Eric Ye, IBM
-
-
Field Summary
Modifier and Type | Field | Description
protected static boolean | DEBUG_ATTR_NORMALIZATION | Debug attribute normalization.
protected static java.lang.String | ENTITY_MANAGER | Property identifier: entity manager.
protected static java.lang.String | ERROR_REPORTER | Property identifier: error reporter.
protected static java.lang.String | fAmpSymbol | Symbol: "amp".
protected static java.lang.String | fAposSymbol | Symbol: "apos".
protected java.lang.String | fCharRefLiteral | Literal value of the last character reference scanned.
protected static java.lang.String | fEncodingSymbol | Symbol: "encoding".
protected int | fEntityDepth | Entity depth.
protected XMLEntityManager | fEntityManager | Entity manager.
protected XMLEntityScanner | fEntityScanner | Entity scanner.
protected XMLErrorReporter | fErrorReporter | Error reporter.
protected static java.lang.String | fGtSymbol | Symbol: "gt".
protected static java.lang.String | fLtSymbol | Symbol: "lt".
protected boolean | fNamespaces | Namespaces.
protected boolean | fNotifyCharRefs | Character references notification.
protected boolean | fParserSettings | Internal parser-settings feature.
protected static java.lang.String | fQuotSymbol | Symbol: "quot".
protected boolean | fReportEntity | Report entity boundary.
protected XMLResourceIdentifierImpl | fResourceIdentifier |
protected boolean | fScanningAttribute | Scanning attribute.
protected static java.lang.String | fStandaloneSymbol | Symbol: "standalone".
protected SymbolTable | fSymbolTable | Symbol table.
protected boolean | fValidation | Validation.
protected static java.lang.String | fVersionSymbol | Symbol: "version".
protected static java.lang.String | NAMESPACES | Feature identifier: namespaces.
protected static java.lang.String | NOTIFY_CHAR_REFS | Feature identifier: notify character references.
protected static java.lang.String | PARSER_SETTINGS |
protected static java.lang.String | SYMBOL_TABLE | Property identifier: symbol table.
protected static java.lang.String | VALIDATION | Feature identifier: validation.
-
Constructor Summary
Constructor | Description
XMLScanner() |
-
Method Summary
Modifier and Type | Method | Description
void | endEntity(java.lang.String name, org.apache.xerces.xni.Augmentations augs) | This method notifies of the end of an entity.
boolean | getFeature(java.lang.String featureId) |
protected java.lang.String | getVersionNotSupportedKey() |
protected boolean | isInvalid(int value) |
protected boolean | isInvalidLiteral(int value) |
protected int | isUnchangedByNormalization(org.apache.xerces.xni.XMLString value) | Checks whether this string would be unchanged by normalization.
protected boolean | isValidNameChar(int value) |
protected boolean | isValidNameStartChar(int value) |
protected boolean | isValidNameStartHighSurrogate(int value) |
protected boolean | isValidNCName(int value) |
protected void | normalizeWhitespace(org.apache.xerces.xni.XMLString value) | Normalizes whitespace in an XMLString, converting all whitespace characters to space characters.
protected void | normalizeWhitespace(org.apache.xerces.xni.XMLString value, int fromIndex) | Normalizes whitespace in an XMLString, converting all whitespace characters to space characters.
protected void | reportFatalError(java.lang.String msgId, java.lang.Object[] args) | Convenience function used in all XML scanners.
protected void | reset() |
void | reset(org.apache.xerces.xni.parser.XMLComponentManager componentManager) | Resets the component.
protected boolean | scanAttributeValue(org.apache.xerces.xni.XMLString value, org.apache.xerces.xni.XMLString nonNormalizedValue, java.lang.String atName, boolean checkEntities, java.lang.String eleName) | Scans an attribute value and normalizes whitespace, converting all whitespace characters to space characters.
protected int | scanCharReferenceValue(XMLStringBuffer buf, XMLStringBuffer buf2) | Scans a character reference and appends the corresponding chars to the specified buffer.
protected void | scanComment(XMLStringBuffer text) | Scans a comment.
protected void | scanExternalID(java.lang.String[] identifiers, boolean optionalSystemId) | Scans an external ID and returns the public and system IDs.
protected void | scanPI() | Scans a processing instruction.
protected void | scanPIData(java.lang.String target, org.apache.xerces.xni.XMLString data) | Scans processing instruction data.
java.lang.String | scanPseudoAttribute(boolean scanningTextDecl, org.apache.xerces.xni.XMLString value) | Scans a pseudo attribute.
protected boolean | scanPubidLiteral(org.apache.xerces.xni.XMLString literal) | Scans a public ID literal.
protected boolean | scanSurrogates(XMLStringBuffer buf) | Scans surrogates and appends them to the specified buffer.
protected void | scanXMLDeclOrTextDecl(boolean scanningTextDecl, java.lang.String[] pseudoAttributeValues) | Scans an XML or text declaration.
void | setFeature(java.lang.String featureId, boolean value) | Sets the state of a feature.
void | setProperty(java.lang.String propertyId, java.lang.Object value) | Sets the value of a property during parsing.
void | startEntity(java.lang.String name, org.apache.xerces.xni.XMLResourceIdentifier identifier, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs) | This method notifies of the start of an entity.
protected boolean | versionSupported(java.lang.String version) |
-
-
-
Field Detail
-
VALIDATION
protected static final java.lang.String VALIDATION
Feature identifier: validation.
- See Also:
- Constant Field Values
-
NAMESPACES
protected static final java.lang.String NAMESPACES
Feature identifier: namespaces.
- See Also:
- Constant Field Values
-
NOTIFY_CHAR_REFS
protected static final java.lang.String NOTIFY_CHAR_REFS
Feature identifier: notify character references.
- See Also:
- Constant Field Values
-
PARSER_SETTINGS
protected static final java.lang.String PARSER_SETTINGS
- See Also:
- Constant Field Values
-
SYMBOL_TABLE
protected static final java.lang.String SYMBOL_TABLE
Property identifier: symbol table.
- See Also:
- Constant Field Values
-
ERROR_REPORTER
protected static final java.lang.String ERROR_REPORTER
Property identifier: error reporter.
- See Also:
- Constant Field Values
-
ENTITY_MANAGER
protected static final java.lang.String ENTITY_MANAGER
Property identifier: entity manager.
- See Also:
- Constant Field Values
-
DEBUG_ATTR_NORMALIZATION
protected static final boolean DEBUG_ATTR_NORMALIZATION
Debug attribute normalization.
- See Also:
- Constant Field Values
-
fValidation
protected boolean fValidation
Validation. This feature identifier is: http://xml.org/sax/features/validation
-
fNamespaces
protected boolean fNamespaces
Namespaces.
-
fNotifyCharRefs
protected boolean fNotifyCharRefs
Character references notification.
-
fParserSettings
protected boolean fParserSettings
Internal parser-settings feature.
-
fSymbolTable
protected SymbolTable fSymbolTable
Symbol table.
-
fErrorReporter
protected XMLErrorReporter fErrorReporter
Error reporter.
-
fEntityManager
protected XMLEntityManager fEntityManager
Entity manager.
-
fEntityScanner
protected XMLEntityScanner fEntityScanner
Entity scanner.
-
fEntityDepth
protected int fEntityDepth
Entity depth.
-
fCharRefLiteral
protected java.lang.String fCharRefLiteral
Literal value of the last character reference scanned.
-
fScanningAttribute
protected boolean fScanningAttribute
Scanning attribute.
-
fReportEntity
protected boolean fReportEntity
Report entity boundary.
-
fVersionSymbol
protected static final java.lang.String fVersionSymbol
Symbol: "version".
-
fEncodingSymbol
protected static final java.lang.String fEncodingSymbol
Symbol: "encoding".
-
fStandaloneSymbol
protected static final java.lang.String fStandaloneSymbol
Symbol: "standalone".
-
fAmpSymbol
protected static final java.lang.String fAmpSymbol
Symbol: "amp".
-
fLtSymbol
protected static final java.lang.String fLtSymbol
Symbol: "lt".
-
fGtSymbol
protected static final java.lang.String fGtSymbol
Symbol: "gt".
-
fQuotSymbol
protected static final java.lang.String fQuotSymbol
Symbol: "quot".
-
fAposSymbol
protected static final java.lang.String fAposSymbol
Symbol: "apos".
-
fResourceIdentifier
protected final XMLResourceIdentifierImpl fResourceIdentifier
-
-
Method Detail
-
reset
public void reset(org.apache.xerces.xni.parser.XMLComponentManager componentManager) throws org.apache.xerces.xni.parser.XMLConfigurationException
Description copied from interface: org.apache.xerces.xni.parser.XMLComponent
Resets the component. The component can query the component manager about any features and properties that affect the operation of the component.
- Specified by:
reset in interface org.apache.xerces.xni.parser.XMLComponent
- Parameters:
componentManager - The component manager.
- Throws:
SAXException - Thrown if required features and properties cannot be found.
org.apache.xerces.xni.parser.XMLConfigurationException
-
setProperty
public void setProperty(java.lang.String propertyId, java.lang.Object value) throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets the value of a property during parsing.
- Specified by:
setProperty in interface org.apache.xerces.xni.parser.XMLComponent
- Parameters:
propertyId -
value -
- Throws:
org.apache.xerces.xni.parser.XMLConfigurationException - Thrown for configuration error. In general, components should only throw this exception if it is really a critical error.
-
setFeature
public void setFeature(java.lang.String featureId, boolean value) throws org.apache.xerces.xni.parser.XMLConfigurationException
Description copied from interface: org.apache.xerces.xni.parser.XMLComponent
Sets the state of a feature. This method is called by the component manager any time after reset when a feature changes state.
Note: Components should silently ignore features that do not affect the operation of the component.
- Specified by:
setFeature in interface org.apache.xerces.xni.parser.XMLComponent
- Parameters:
featureId - The feature identifier.
value - The state of the feature.
- Throws:
org.apache.xerces.xni.parser.XMLConfigurationException - Thrown for configuration error. In general, components should only throw this exception if it is really a critical error.
-
getFeature
public boolean getFeature(java.lang.String featureId) throws org.apache.xerces.xni.parser.XMLConfigurationException
- Throws:
org.apache.xerces.xni.parser.XMLConfigurationException
-
reset
protected void reset()
-
scanXMLDeclOrTextDecl
protected void scanXMLDeclOrTextDecl(boolean scanningTextDecl, java.lang.String[] pseudoAttributeValues) throws java.io.IOException, org.apache.xerces.xni.XNIException
Scans an XML or text declaration.
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24] VersionInfo ::= S 'version' Eq (' VersionNum ' | " VersionNum ")
[80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
[81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
[32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))
[77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
- Parameters:
scanningTextDecl - True if a text declaration is to be scanned instead of an XML declaration.
pseudoAttributeValues - An array of size 3 to return the version, encoding and standalone pseudo attribute values (in that order).
Note: This method uses fString; anything in it at the time of calling is lost.
- Throws:
java.io.IOException
org.apache.xerces.xni.XNIException
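The array-of-three convention for pseudoAttributeValues can be illustrated with a small stand-alone sketch. It is not the Xerces implementation: it uses a regular expression instead of the entity scanner, does not enforce attribute order or report errors, and the names DeclSketch and parseDecl are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeclSketch {
    // Matches name = "value" or name = 'value' for the three pseudo attributes.
    private static final Pattern PSEUDO_ATTR = Pattern.compile(
        "(version|encoding|standalone)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    /** Returns {version, encoding, standalone}; entries that are absent stay null. */
    public static String[] parseDecl(String decl) {
        String[] values = new String[3];
        Matcher m = PSEUDO_ATTR.matcher(decl);
        while (m.find()) {
            // Exactly one of the two quoting alternatives captured the value.
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            switch (m.group(1)) {
                case "version":  values[0] = value; break;
                case "encoding": values[1] = value; break;
                default:         values[2] = value; break; // "standalone"
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String[] v = parseDecl("<?xml version=\"1.0\" encoding='UTF-8'?>");
        System.out.println(v[0] + ", " + v[1] + ", " + v[2]);
    }
}
```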
-
scanPseudoAttribute
public java.lang.String scanPseudoAttribute(boolean scanningTextDecl, org.apache.xerces.xni.XMLString value) throws java.io.IOException, org.apache.xerces.xni.XNIException
Scans a pseudo attribute.
- Parameters:
scanningTextDecl - True if scanning this pseudo-attribute for a TextDecl; false if scanning XMLDecl. This flag is needed to report the correct type of error.
value - The string to fill in with the attribute value.
- Returns:
- The name of the attribute.
Note: This method uses fStringBuffer2; anything in it at the time of calling is lost.
- Throws:
java.io.IOException
org.apache.xerces.xni.XNIException
-
scanPI
protected void scanPI() throws java.io.IOException, org.apache.xerces.xni.XNIException
Scans a processing instruction.
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
Note: This method uses fString; anything in it at the time of calling is lost.
- Throws:
java.io.IOException
org.apache.xerces.xni.XNIException
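Production [17] excludes only the three-letter target "xml" in any mix of upper and lower case; longer names that merely begin with those letters remain valid Names. A minimal sketch of that exclusion follows. The names PITargetSketch and isReservedTarget are hypothetical, and checking that the target is a valid Name is left out.

```java
public class PITargetSketch {
    /** True if the target is exactly ('X'|'x')('M'|'m')('L'|'l'). */
    public static boolean isReservedTarget(String target) {
        return target.length() == 3
            && (target.charAt(0) == 'X' || target.charAt(0) == 'x')
            && (target.charAt(1) == 'M' || target.charAt(1) == 'm')
            && (target.charAt(2) == 'L' || target.charAt(2) == 'l');
    }

    public static void main(String[] args) {
        System.out.println(isReservedTarget("XmL"));
        System.out.println(isReservedTarget("xml-stylesheet"));
    }
}
```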
-
scanPIData
protected void scanPIData(java.lang.String target, org.apache.xerces.xni.XMLString data) throws java.io.IOException, org.apache.xerces.xni.XNIException
Scans processing instruction data. This is needed to handle the situation where a document starts with a processing instruction whose target name starts with "xml" (e.g. xmlfoo).
Note: This method uses fStringBuffer; anything in it at the time of calling is lost.
- Parameters:
target - The PI target.
data - The string to fill in with the data.
- Throws:
java.io.IOException
org.apache.xerces.xni.XNIException
-
scanComment
protected void scanComment(XMLStringBuffer text) throws java.io.IOException, org.apache.xerces.xni.XNIException
Scans a comment.
[15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
Note: Called after scanning past '<!--'.
Note: This method uses fString; anything in it at the time of calling is lost.
- Parameters:
text - The buffer to fill in with the text.
- Throws:
java.io.IOException
org.apache.xerces.xni.XNIException
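Read as a constraint on the text between '<!--' and '-->', production [15] says that the body may not contain "--" and may not end with '-'. The following sketch checks only that rule; the names CommentSketch and isValidCommentBody are hypothetical, and validation of individual characters against Char is assumed to happen elsewhere.

```java
public class CommentSketch {
    /** True if the body satisfies the hyphen rules of production [15]. */
    public static boolean isValidCommentBody(String body) {
        // Every '-' must be followed by a non-'-' character, so "--" is
        // forbidden anywhere and a trailing '-' is forbidden as well.
        return !body.contains("--") && !body.endsWith("-");
    }

    public static void main(String[] args) {
        System.out.println(isValidCommentBody(" a - b "));
        System.out.println(isValidCommentBody(" a -- b "));
    }
}
```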
-
scanAttributeValue
protected boolean scanAttributeValue(org.apache.xerces.xni.XMLString value, org.apache.xerces.xni.XMLString nonNormalizedValue, java.lang.String atName, boolean checkEntities, java.lang.String eleName) throws java.io.IOException, org.apache.xerces.xni.XNIException
Scans an attribute value and normalizes whitespace, converting all whitespace characters to space characters.
[10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
- Parameters:
value - The XMLString to fill in with the value.
nonNormalizedValue - The XMLString to fill in with the non-normalized value.
atName - The name of the attribute being parsed (for error messages).
checkEntities - true if undeclared entities should be reported as a VC violation, false if undeclared entities should be reported as a WFC violation.
eleName - The name of the element to which this attribute belongs.
- Returns:
- true if the non-normalized and normalized values are the same.
Note: This method uses fStringBuffer2; anything in it at the time of calling is lost.
- Throws:
java.io.IOException
org.apache.xerces.xni.XNIException
-
scanExternalID
protected void scanExternalID(java.lang.String[] identifiers, boolean optionalSystemId) throws java.io.IOException, org.apache.xerces.xni.XNIException
Scans an external ID and returns the public and system IDs.
- Parameters:
identifiers - An array of size 2 to return the system id and public id (in that order).
optionalSystemId - Specifies whether the system id is optional.
Note: This method uses fString and fStringBuffer; anything in them at the time of calling is lost.
- Throws:
java.io.IOException
org.apache.xerces.xni.XNIException
-
scanPubidLiteral
protected boolean scanPubidLiteral(org.apache.xerces.xni.XMLString literal) throws java.io.IOException, org.apache.xerces.xni.XNIException
Scans a public ID literal.
[12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
The returned string is normalized according to the following rule, from http://www.w3.org/TR/REC-xml#dt-pubid: Before a match is attempted, all strings of white space in the public identifier must be normalized to single space characters (#x20), and leading and trailing white space must be removed.
- Parameters:
literal - The string to fill in with the public ID literal.
- Returns:
- True on success.
Note: This method uses fStringBuffer; anything in it at the time of calling is lost.
- Throws:
java.io.IOException
org.apache.xerces.xni.XNIException
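The normalization rule quoted for public identifiers can be sketched on a plain String as follows. This is an illustration of the rule only, not the scanner's buffer-based code, and the names PubidSketch and normalizePubid are hypothetical. Only #x20, #xD and #xA are treated as white space, since those are the white space characters PubidChar allows.

```java
public class PubidSketch {
    /** Collapses runs of white space to one space and trims both ends. */
    public static String normalizePubid(String literal) {
        StringBuilder out = new StringBuilder(literal.length());
        boolean pendingSpace = false;
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == ' ' || c == '\n' || c == '\r') {
                // Remember the run, but never emit leading white space.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        // A pending space at the end is dropped, which trims trailing white space.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + normalizePubid("  -//W3C//DTD \n XHTML  1.0//EN  ") + "]");
    }
}
```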
-
normalizeWhitespace
protected void normalizeWhitespace(org.apache.xerces.xni.XMLString value)
Normalize whitespace in an XMLString converting all whitespace characters to space characters.
-
normalizeWhitespace
protected void normalizeWhitespace(org.apache.xerces.xni.XMLString value, int fromIndex)
Normalize whitespace in an XMLString converting all whitespace characters to space characters.
-
isUnchangedByNormalization
protected int isUnchangedByNormalization(org.apache.xerces.xni.XMLString value)
Checks whether this string would be unchanged by normalization.
- Returns:
- -1 if the value would be unchanged by normalization, otherwise the index of the first whitespace character which would be transformed.
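The check-then-normalize pattern suggested by isUnchangedByNormalization and normalizeWhitespace(value, fromIndex) can be sketched on a character range. The sketch below is not the Xerces code: it works on a bare char[] with an offset and length, treats tab, line feed and carriage return as the characters to replace, and assumes the returned index is relative to the start of the range. All names are hypothetical.

```java
public class WhitespaceSketch {
    private static boolean isReplaceable(char c) {
        return c == '\t' || c == '\n' || c == '\r';
    }

    /** Returns -1 if nothing would change, else the range-relative index of the first replaceable char. */
    public static int firstChangedIndex(char[] ch, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            if (isReplaceable(ch[i])) {
                return i - offset;
            }
        }
        return -1;
    }

    /** Replaces tab, LF and CR with a space, starting at fromIndex within the range. */
    public static void normalize(char[] ch, int offset, int length, int fromIndex) {
        for (int i = offset + fromIndex; i < offset + length; i++) {
            if (isReplaceable(ch[i])) {
                ch[i] = ' ';
            }
        }
    }

    public static void main(String[] args) {
        char[] value = "a\tb\nc".toCharArray();
        int first = firstChangedIndex(value, 0, value.length);
        if (first != -1) {
            normalize(value, 0, value.length, first); // skip the unchanged prefix
        }
        System.out.println(new String(value));
    }
}
```

Starting the second pass at the index returned by the first avoids rescanning the prefix that is already known to be unchanged.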
-
startEntity
public void startEntity(java.lang.String name, org.apache.xerces.xni.XMLResourceIdentifier identifier, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
This method notifies of the start of an entity. The document entity has the pseudo-name of "[xml]"; the DTD has the pseudo-name of "[dtd]"; parameter entity names start with '%'; and general entities are just specified by their name.
- Parameters:
name - The name of the entity.
identifier - The resource identifier.
encoding - The auto-detected IANA encoding name of the entity stream. This value will be null in those situations where the entity encoding is not auto-detected (e.g. internal entities or a document entity that is parsed from a java.io.Reader).
augs - Additional information that may include infoset augmentations.
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
endEntity
public void endEntity(java.lang.String name, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
This method notifies of the end of an entity. The document entity has the pseudo-name of "[xml]"; the DTD has the pseudo-name of "[dtd]"; parameter entity names start with '%'; and general entities are just specified by their name.
- Parameters:
name - The name of the entity.
augs - Additional information that may include infoset augmentations.
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
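The entity naming convention used by startEntity and endEntity can be expressed as a small classifier. This is a sketch for illustration only; the class EntityNameSketch, the enum Kind and the method classify are hypothetical and not part of Xerces.

```java
public class EntityNameSketch {
    public enum Kind { DOCUMENT, DTD, PARAMETER, GENERAL }

    /** Maps an entity name to its kind using the documented naming convention. */
    public static Kind classify(String name) {
        if ("[xml]".equals(name)) {
            return Kind.DOCUMENT;
        }
        if ("[dtd]".equals(name)) {
            return Kind.DTD;
        }
        if (name.startsWith("%")) {
            return Kind.PARAMETER;
        }
        return Kind.GENERAL; // plain name, no prefix
    }

    public static void main(String[] args) {
        System.out.println(classify("[xml]"));
        System.out.println(classify("%params"));
        System.out.println(classify("copyright"));
    }
}
```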
-
scanCharReferenceValue
protected int scanCharReferenceValue(XMLStringBuffer buf, XMLStringBuffer buf2) throws java.io.IOException, org.apache.xerces.xni.XNIException
Scans a character reference and appends the corresponding chars to the specified buffer.
[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
Note: This method uses fStringBuffer; anything in it at the time of calling is lost.
- Parameters:
buf - the character buffer to append chars to
buf2 - the character buffer to append non-normalized chars to
- Returns:
- the character value or (-1) on conversion failure
- Throws:
java.io.IOException
org.apache.xerces.xni.XNIException
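A character reference in XML is written '&#' followed by decimal digits, or '&#x' followed by hexadecimal digits, and ends with ';'. The sketch below converts a complete reference held in a String and returns -1 on conversion failure, mirroring the documented return convention. It is not the Xerces implementation: it does not read from an entity scanner, it does not check the result against the XML Char production, and the names CharRefSketch, parseCharRef and appendCharRef are hypothetical.

```java
public class CharRefSketch {
    /** Returns the referenced code point, or -1 if the reference is malformed or out of range. */
    public static int parseCharRef(String ref) {
        if (!ref.startsWith("&#") || !ref.endsWith(";")) {
            return -1;
        }
        boolean hex = ref.startsWith("&#x");
        String digits = ref.substring(hex ? 3 : 2, ref.length() - 1);
        if (digits.isEmpty()) {
            return -1;
        }
        int radix = hex ? 16 : 10;
        long value = 0;
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            int d;
            if (c >= '0' && c <= '9') {
                d = c - '0';
            } else if (hex && c >= 'a' && c <= 'f') {
                d = c - 'a' + 10;
            } else if (hex && c >= 'A' && c <= 'F') {
                d = c - 'A' + 10;
            } else {
                return -1; // not a digit in this radix
            }
            value = value * radix + d;
            if (value > 0x10FFFF) {
                return -1; // beyond the Unicode code point range
            }
        }
        return (int) value;
    }

    /** Appends the referenced character(s) to buf and returns the value, or -1 on failure. */
    public static int appendCharRef(StringBuilder buf, String ref) {
        int value = parseCharRef(ref);
        if (value != -1) {
            buf.appendCodePoint(value); // emits a surrogate pair above U+FFFF
        }
        return value;
    }

    public static void main(String[] args) {
        StringBuilder buf = new StringBuilder();
        appendCharRef(buf, "&#65;");
        appendCharRef(buf, "&#x42;");
        System.out.println(buf);
    }
}
```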
-
isInvalid
protected boolean isInvalid(int value)
-
isInvalidLiteral
protected boolean isInvalidLiteral(int value)
-
isValidNameChar
protected boolean isValidNameChar(int value)
-
isValidNameStartChar
protected boolean isValidNameStartChar(int value)
-
isValidNCName
protected boolean isValidNCName(int value)
-
isValidNameStartHighSurrogate
protected boolean isValidNameStartHighSurrogate(int value)
-
versionSupported
protected boolean versionSupported(java.lang.String version)
-
getVersionNotSupportedKey
protected java.lang.String getVersionNotSupportedKey()
-
scanSurrogates
protected boolean scanSurrogates(XMLStringBuffer buf) throws java.io.IOException, org.apache.xerces.xni.XNIException
Scans surrogates and appends them to the specified buffer.
Note: This assumes the current char has already been identified as a high surrogate.
- Parameters:
buf - The StringBuffer to append the read surrogates to.
- Returns:
- True if it succeeded.
- Throws:
java.io.IOException
org.apache.xerces.xni.XNIException
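The pairing rule behind surrogate scanning can be shown with the standard java.lang.Character helpers. This is a sketch under stated assumptions: it takes the two chars as arguments instead of reading them from an entity scanner, it does not check the combined code point against the XML Char production, and the names SurrogateSketch and appendSurrogatePair are hypothetical.

```java
public class SurrogateSketch {
    /** Appends the pair to buf and returns true only if high/low form a valid surrogate pair. */
    public static boolean appendSurrogatePair(StringBuilder buf, char high, char low) {
        if (!Character.isHighSurrogate(high) || !Character.isLowSurrogate(low)) {
            return false; // nothing is appended for an invalid pair
        }
        buf.append(high).append(low);
        return true;
    }

    public static void main(String[] args) {
        StringBuilder buf = new StringBuilder();
        boolean ok = appendSurrogatePair(buf, '\uD83D', '\uDE00');
        System.out.println(ok + " U+" + Integer.toHexString(buf.codePointAt(0)).toUpperCase());
    }
}
```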
-
reportFatalError
protected void reportFatalError(java.lang.String msgId, java.lang.Object[] args) throws org.apache.xerces.xni.XNIException
Convenience function used in all XML scanners.
- Throws:
org.apache.xerces.xni.XNIException
-
-