kmail

EncodingDetector Class Reference

#include <encodingdetector.h>



Detailed Description

Provides encoding detection capabilities. Guesses the encoding of a char array.

Searches for an encoding declaration inside the raw data (meta and XML tags). If it cannot find one, it falls back to heuristics for the specified language.
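The declaration search described above can be pictured with a small stand-alone sketch. It is not the EncodingDetector implementation: the helper names declaredValue() and findDeclaredCharset() are hypothetical, and the parsing is deliberately simplistic (it only looks for `encoding=` in an XML declaration and `charset=` elsewhere).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of EncodingDetector): returns the value that
// follows `key` (e.g. "charset=") in the raw bytes, or "" if `key` is absent.
std::string declaredValue(const std::string &raw, const std::string &key)
{
    // Search in a lower-cased copy so the match is case-insensitive.
    std::string lower(raw);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos)
        return std::string();
    pos += key.size();

    // Skip an opening quote, if present.
    if (pos < raw.size() && (raw[pos] == '"' || raw[pos] == '\''))
        ++pos;

    // The name ends at a quote, whitespace, ';', '>' or '?'.
    std::string::size_type end = pos;
    while (end < raw.size()) {
        const char c = raw[end];
        if (c == '"' || c == '\'' || c == ';' || c == '>' || c == '?' ||
            std::isspace(static_cast<unsigned char>(c)))
            break;
        ++end;
    }
    return raw.substr(pos, end - pos);
}

// Hypothetical helper: tries the XML declaration first, then a meta tag.
std::string findDeclaredCharset(const std::string &raw)
{
    if (raw.compare(0, 5, "<?xml") == 0) {
        const std::string enc = declaredValue(raw, "encoding=");
        if (!enc.empty())
            return enc;
    }
    return declaredValue(raw, "charset=");
}
```

An empty result corresponds to the case where no declaration is found and language heuristics would have to take over.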

If it finds a Unicode BOM, it changes the encoding regardless of what the user has specified.
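As an illustration of what a BOM check involves, here is a minimal sketch in plain C++. The function encodingFromBom() is a hypothetical helper, not a member of EncodingDetector, and it only covers the UTF-8 and UTF-16 marks (UTF-32 marks are ignored for brevity).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of EncodingDetector): returns the encoding
// name implied by a leading byte-order mark, or "" if no BOM is present.
std::string encodingFromBom(const char *data, std::size_t len)
{
    const unsigned char *p = reinterpret_cast<const unsigned char *>(data);

    // UTF-8 BOM: EF BB BF
    if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return "UTF-8";
    // UTF-16 big-endian BOM: FE FF
    if (len >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return "UTF-16BE";
    // UTF-16 little-endian BOM: FF FE
    if (len >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return "UTF-16LE";

    return std::string();
}
```

A caller following the behaviour described above would prefer a non-empty result over any user-chosen encoding.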

Intended lifetime of the object: one instance per document.

Typical use:

 QByteArray data;
 ...
 EncodingDetector detector;
 detector.setAutoDetectLanguage(EncodingDetector::Cyrillic);
 QString out=detector.decode(data);

Do not mix decode() with decodeWithBuffering().

Definition at line 57 of file encodingdetector.h.


Public Types

enum  EncodingChoiceSource {
  DefaultEncoding, AutoDetectedEncoding, BOM, EncodingFromXMLHeader,
  EncodingFromMetaTag, EncodingFromHTTPHeader, UserChosenEncoding
}
enum  AutoDetectScript {
  None, SemiautomaticDetection, Arabic, Baltic,
  CentralEuropean, ChineseSimplified, ChineseTraditional, Cyrillic,
  Greek, Hebrew, Japanese, Korean,
  NorthernSaami, SouthEasternEurope, Thai, Turkish,
  Unicode, WesternEuropean
}

Public Member Functions

 EncodingDetector ()
 EncodingDetector (QTextCodec *codec, EncodingChoiceSource source, AutoDetectScript script=None)
 ~EncodingDetector ()
bool setEncoding (const char *encoding, EncodingChoiceSource type)
const char * encoding () const
bool visuallyOrdered () const
void setAutoDetectLanguage (AutoDetectScript)
AutoDetectScript autoDetectLanguage () const
EncodingChoiceSource encodingChoiceSource () const
bool analyze (const char *data, int len)
bool analyze (const QByteArray &data)

Static Public Member Functions

static AutoDetectScript scriptForName (const QString &lang)
static QString nameForScript (AutoDetectScript)
static AutoDetectScript scriptForLanguageCode (const QString &lang)
static bool hasAutoDetectionForScript (AutoDetectScript)

Protected Member Functions

bool errorsIfUtf8 (const char *data, int length)
QTextDecoder * decoder ()

Constructor & Destructor Documentation

EncodingDetector::EncodingDetector (  ) 

The default codec is Latin-1 (as the HTML specification says), the EncodingChoiceSource is DefaultEncoding, and the AutoDetectScript is SemiautomaticDetection.

Definition at line 877 of file encodingdetector.cpp.

EncodingDetector::EncodingDetector ( QTextCodec *  codec,
EncodingChoiceSource  source,
AutoDetectScript  script = None 
)

Allows setting the default codec, EncodingChoiceSource, and AutoDetectScript.

Definition at line 881 of file encodingdetector.cpp.


Member Function Documentation

bool EncodingDetector::setEncoding ( const char *  encoding,
EncodingChoiceSource  type 
)

Returns:
true if the specified encoding was recognized

Definition at line 926 of file encodingdetector.cpp.

const char * EncodingDetector::encoding (  )  const

Convenience method.

Returns:
MIME name of the detected encoding

Definition at line 905 of file encodingdetector.cpp.

bool EncodingDetector::analyze ( const char *  data,
int  len 
)

Analyze text data.

Returns:
true if there was enough data for accurate detection

Definition at line 986 of file encodingdetector.cpp.

bool EncodingDetector::analyze ( const QByteArray &  data  ) 

Analyze text data.

Returns:
true if there was enough data for accurate detection

Definition at line 981 of file encodingdetector.cpp.

EncodingDetector::AutoDetectScript EncodingDetector::scriptForName ( const QString &  lang  )  [static]

Takes the language name _after_ it has been passed through i18n().

Definition at line 1246 of file encodingdetector.cpp.

bool EncodingDetector::errorsIfUtf8 ( const char *  data,
int  length 
) [protected]

Checks whether the data is really UTF-8.

Taken from Kate.

Returns:
true if the current encoding is UTF-8 and the text cannot be in this encoding

Please somebody read http://de.wikipedia.org/wiki/UTF-8 and check this code...

Definition at line 813 of file encodingdetector.cpp.
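For readers who want to see what such a check can look like, the following is a stand-alone sketch of a strict UTF-8 well-formedness test. It is not the code from encodingdetector.cpp: hasUtf8Errors() is a hypothetical name, and unlike errorsIfUtf8() it does not consult the detector's current encoding. It also assumes the buffer is complete, so a multi-byte sequence cut off at the end counts as an error.

```cpp
#include <cstddef>

// Hypothetical stand-alone check (not the errorsIfUtf8() implementation):
// returns true if the bytes cannot be well-formed UTF-8.
bool hasUtf8Errors(const char *data, std::size_t length)
{
    const unsigned char *p = reinterpret_cast<const unsigned char *>(data);
    std::size_t i = 0;

    while (i < length) {
        const unsigned char lead = p[i];
        if (lead < 0x80) {          // plain ASCII byte
            ++i;
            continue;
        }

        std::size_t trailing;       // continuation bytes expected
        unsigned long minValue;     // smallest code point valid for this length
        unsigned long cp;           // decoded code point
        if ((lead & 0xE0) == 0xC0) {
            trailing = 1; minValue = 0x80; cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            trailing = 2; minValue = 0x800; cp = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            trailing = 3; minValue = 0x10000; cp = lead & 0x07;
        } else {
            return true;            // stray continuation or invalid lead byte
        }

        if (length - i <= trailing)
            return true;            // sequence is cut off by the buffer end

        for (std::size_t k = 1; k <= trailing; ++k) {
            if ((p[i + k] & 0xC0) != 0x80)
                return true;        // not a continuation byte (10xxxxxx)
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }

        if (cp < minValue)
            return true;            // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return true;            // UTF-16 surrogate range
        if (cp > 0x10FFFF)
            return true;            // beyond the Unicode range

        i += trailing + 1;
    }
    return false;
}
```

Latin-1 text with accented letters usually fails this test quickly, because a byte such as 0xE9 announces a three-byte sequence that ordinary text does not continue.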

QTextDecoder * EncodingDetector::decoder (  )  [protected]

Returns:
QTextDecoder for the detected encoding

Definition at line 921 of file encodingdetector.cpp.


The documentation for this class was generated from the following files: