class documentation

Implementation of a cluster in a ZIM file.

A cluster contains the blobs (=content) of content entries. As these are compressed together, it allows for higher compression rates.
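
The effect of compressing blobs together can be illustrated with a small, self-contained sketch. It does not use pyzim; the sample blobs are hypothetical and the standard-library lzma module stands in for whatever compression a real cluster uses.

```python
import lzma

# Hypothetical sample data: 50 small, similar "articles".
blobs = [
    b"<html><body><h1>Article %d</h1><p>Shared boilerplate text.</p></body></html>" % i
    for i in range(50)
]

# Compress every blob on its own and add up the resulting sizes.
separate_size = sum(len(lzma.compress(blob)) for blob in blobs)

# Compress all blobs as a single stream, similar to what a cluster does.
together_size = len(lzma.compress(b"".join(blobs)))

print("separate:", separate_size, "together:", together_size)
```

The joint stream pays the container overhead only once and can reuse redundancy shared between blobs, which is why it ends up smaller.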

A cluster can be extended, which allows it to be larger than 4 GiB at the cost of a larger overhead.
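
The overhead mentioned above can be estimated with a short sketch. It assumes, following the openZIM format description, that a normal cluster stores 4-byte offsets and an extended cluster stores 8-byte offsets, with one offset more than there are blobs. The helper name offset_table_size is hypothetical and not part of pyzim.

```python
import struct

# Assumed pointer formats: little-endian 32-bit and 64-bit unsigned integers.
NORMAL_FORMAT = "<I"
EXTENDED_FORMAT = "<Q"


def offset_table_size(number_of_blobs, is_extended):
    """Return the bytes needed for the offset table of a cluster (hypothetical helper)."""
    pointer_format = EXTENDED_FORMAT if is_extended else NORMAL_FORMAT
    # One offset per blob plus a final offset marking the end of the last blob.
    return (number_of_blobs + 1) * struct.calcsize(pointer_format)


normal_overhead = offset_table_size(1000, is_extended=False)
extended_overhead = offset_table_size(1000, is_extended=True)
print(normal_overhead, extended_overhead)
```

A 32-bit offset cannot address positions of 4 GiB or beyond, which is the reason an extended cluster needs the wider and therefore larger offsets.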

Method __init__ The default constructor.
Method generate_infobyte Generate the infobyte for this cluster.
Method get_blob_size Get the size of a blob.
Method get_content_size Return the content size of this cluster.
Method get_number_of_blobs Return the number of blobs in this cluster.
Method get_number_of_offsets Return the number of offsets in this cluster.
Method get_offset Return the offset with the specified index.
Method get_total_compressed_size Return the total compressed size of the cluster.
Method get_total_decompressed_size Return the total decompressed size of this cluster.
Method get_total_offset_size Return the total size of the offsets.
Method iter_blob_offsets Read the blob offsets, yielding them as an iterator.
Method iter_read_blob Iteratively read the specified blob.
Method parse_infobyte Parse the cluster information byte, setting the attributes of this cluster as necessary.
Method read_blob Read the entirety of the specified range in the specified blob and return the content.
Method read_infobyte Read the cluster information byte, returning it.
Method read_infobyte_if_needed Read and parse the infobyte if this has not yet happened.
Method reset Reset all internal state except the cluster offset, causing said offset to be read again the next time it is required.
Instance Variable compression compression to use, None when unknown
Instance Variable is_extended whether this cluster is extended, None if not set
Instance Variable offset absolute offset of the cluster
Property did_read_infobyte True if the infobyte was already read and parsed.
Method _get_compressor Return a compressor suitable to compress this cluster.
Method _get_decompressing_reader Return a decompressing reader that can be used to decompress the content.
Method _get_decompressor Return a decompressor suitable to decompress this cluster.
Method _seek_if_needed Seek to the specified position (relative to the cluster start) in the file only if it is needed.
Instance Variable _decompressing_reader Undocumented
Property _pointer_format The pointer format.

Inherited from BindableMixIn:

Method bind Bind this object to a ZIM file.
Method unbind Unbind this object. Can be called multiple times.
Property bound Whether this object is bound to a ZIM file or not.
Property zim The bound ZIM archive, if any is bound. Otherwise None.
Instance Variable _zim the bound ZIM archive or None
def __init__(self, zim=None, offset=None):

The default constructor.

Parameters
zim (pyzim.archive.Zim): if specified, bind this ZIM immediately.
offset (int or None): absolute offset of the cluster
Raises
ValueError: if offset was specified but zim was not specified.
def generate_infobyte(self):

Generate the infobyte for this cluster.

Returns
bytes of length 1: the generated infobyte
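
As an illustration of what such a byte may contain, here is a minimal sketch. It assumes the layout from the openZIM format description (compression type in the lowest four bits, extended flag in the fifth bit). The function make_infobyte is hypothetical and is not the pyzim implementation.

```python
EXTENDED_FLAG = 0x10  # assumption: the fifth bit marks an extended cluster


def make_infobyte(compression_type, is_extended):
    """Pack a compression type and the extended flag into one byte (hypothetical helper)."""
    if not 0 <= compression_type <= 0x0F:
        raise ValueError("compression type must fit into 4 bits")
    value = compression_type
    if is_extended:
        value |= EXTENDED_FLAG
    return bytes([value])


infobyte = make_infobyte(5, is_extended=True)
print(infobyte.hex())
```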
def get_blob_size(self, i):

Get the size of a blob.

Parameters
i (int): index of blob to get size for
Returns
int: the size of the uncompressed blob
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
pyzim.exceptions.BlobNotFound: if the specified blob does not exist
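
One plausible way to derive a blob size is from two neighbouring offsets. The sketch below assumes the offsets are already available as a list and uses IndexError as a stand-in for pyzim.exceptions.BlobNotFound; the function blob_size is hypothetical.

```python
def blob_size(offsets, i):
    """Return the size of blob i, given the full offset list (hypothetical helper)."""
    # The last offset only marks the end of the data, so it starts no blob.
    if not 0 <= i < len(offsets) - 1:
        raise IndexError("no blob with index %d" % i)
    return offsets[i + 1] - offsets[i]


# Hypothetical offsets of a cluster with three blobs (the second one is empty).
offsets = [16, 21, 21, 30]
sizes = [blob_size(offsets, i) for i in range(len(offsets) - 1)]
print(sizes)
```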
def get_content_size(self):

Return the content size of this cluster.

This is the uncompressed size of the content of this cluster, not including the offsets and infobyte.

Returns
int: the size of the content of this cluster
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
def get_number_of_blobs(self):

Return the number of blobs in this cluster.

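This value differs from the number of offsets in the cluster: the offset list normally holds one more entry than there are blobs, since the final offset marks the end of the last blob.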

Returns
int: the number of blobs in this cluster
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
def get_number_of_offsets(self):

Return the number of offsets in this cluster.

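This value differs from the number of blobs in the cluster: there is normally one more offset than there are blobs, since the final offset marks the end of the last blob.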

Returns
int: the number of offsets in this cluster.
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
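
A common way to obtain the number of offsets, sketched here under the assumption that the offset table sits at the very start of the decompressed data and that the first offset points directly behind it. The function count_offsets and the sample data are hypothetical; this is not necessarily how pyzim computes the value.

```python
import struct


def count_offsets(decompressed, pointer_format="<I"):
    """Derive the number of offsets from the first offset (hypothetical helper)."""
    pointer_size = struct.calcsize(pointer_format)
    (first_offset,) = struct.unpack_from(pointer_format, decompressed, 0)
    # The first blob starts right after the table, so its offset equals the table size.
    return first_offset // pointer_size


# Build a hypothetical decompressed cluster body with three blobs.
blobs = [b"alpha", b"", b"gamma!"]
offsets = [(len(blobs) + 1) * 4]
for blob in blobs:
    offsets.append(offsets[-1] + len(blob))
body = b"".join(struct.pack("<I", o) for o in offsets) + b"".join(blobs)

number_of_offsets = count_offsets(body)
number_of_blobs = number_of_offsets - 1
print(number_of_offsets, number_of_blobs)
```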
def get_offset(self, i):

Return the offset with the specified index.

Parameters
i (int): index of blob to get offset for
Raises
IndexError: if i < 0 or i >= len(offsets)
pyzim.exceptions.BindRequired: if cluster is unbound
def get_total_compressed_size(self):

Return the total compressed size of the cluster.

This includes the entirety of the cluster, including the infobyte.

NOTE: this method is horribly inefficient, as it requires decompressing the entire cluster.

Returns
int: the size of this cluster
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
def get_total_decompressed_size(self):

Return the total decompressed size of this cluster.

This is the uncompressed size of the content of this cluster, including the offsets but not the infobyte.

Returns
int: the size of the content of this cluster including offsets
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
def get_total_offset_size(self):

Return the total size of the offsets.

Returns
int: the total size of the offsets in bytes
def iter_blob_offsets(self, blob_numbers=None):

Read the blob offsets, yielding them as an iterator.

The order of blob_numbers does not matter; all offsets are always yielded in regular order (offset 1, offset 2, ...).

Parameters
blob_numbers (None or list of int): if specified, load only these offsets
Yields
int: the offset of each blob in the decompressed body, relative to cluster start
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
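
The ordering behaviour described above can be sketched with a generator that walks the offset table once, front to back, and filters on the way. The function iter_offsets and its parameters are hypothetical, and the table layout (little-endian 4-byte offsets, first offset equal to the table size) is an assumption.

```python
import io
import struct


def iter_offsets(stream, wanted=None, pointer_format="<I"):
    """Yield offsets in table order, optionally only the wanted indexes (hypothetical)."""
    pointer_size = struct.calcsize(pointer_format)
    wanted_set = None if wanted is None else set(wanted)
    (first,) = struct.unpack(pointer_format, stream.read(pointer_size))
    count = first // pointer_size  # the first offset reveals the table length
    if wanted_set is None or 0 in wanted_set:
        yield first
    for index in range(1, count):
        raw = stream.read(pointer_size)
        if wanted_set is None or index in wanted_set:
            yield struct.unpack(pointer_format, raw)[0]


table = struct.pack("<4I", 16, 21, 24, 30)
all_offsets = list(iter_offsets(io.BytesIO(table)))
some_offsets = list(iter_offsets(io.BytesIO(table), wanted=[2, 0]))
print(all_offsets, some_offsets)
```

Because the table is read sequentially, passing the indexes as [2, 0] or [0, 2] gives the same result.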
def iter_read_blob(self, i, buffersize=4096, start=None, end=None):

Iteratively read the specified blob.

The parameters 'start' and 'end' can be used to specify a range within the blob to read. In this case, both values are interpreted relative to the actual blob start. Similar to how python slices work, the 'start' value will be inclusive and the 'end' value exclusive. If start >= size of the blob, the return value will be b"". If the end lies outside the blob, read only up until the end of the blob.

Parameters
i (int): index of blob to read
buffersize (int): number of bytes to read at once
start (None or int): if specified, the offset relative to the start of the blob to start reading from
end (None or int): if specified, the offset relative to the start of the blob to stop reading at
Yields
bytes: chunks of the blob content
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
pyzim.exceptions.BlobNotFound: if the blob index is out of range
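
The range semantics can be illustrated on an in-memory blob. This sketch only mirrors the documented rules for non-negative start and end values. The function iter_blob_range is hypothetical, and yielding no chunks for an empty range is an assumption about this sketch, not a statement about pyzim.

```python
def iter_blob_range(blob, buffersize=4096, start=None, end=None):
    """Yield chunks of blob[start:end], at most buffersize bytes each (hypothetical)."""
    size = len(blob)
    begin = 0 if start is None else min(start, size)
    stop = size if end is None else min(end, size)  # clamp an end beyond the blob
    for position in range(begin, stop, buffersize):
        yield blob[position:min(position + buffersize, stop)]


blob = b"0123456789"
chunks = list(iter_blob_range(blob, buffersize=4, start=2, end=9))
past_the_end = b"".join(iter_blob_range(blob, start=20))
print(chunks, past_the_end)
```

Joining the chunks gives the same bytes as the slice blob[2:9], with start inclusive and end exclusive.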
def parse_infobyte(self, infobyte):

Parse the cluster information byte, setting the attributes of this cluster as necessary.

Parameters
infobyte (bytes of length 1): the cluster information byte
Raises
pyzim.exceptions.UnsupportedCompressionType: if the compression type is unknown.
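
A minimal sketch of such parsing, assuming the openZIM layout (lowest four bits hold the compression type, the fifth bit holds the extended flag). The function split_infobyte, the table of known types and the use of ValueError in place of pyzim.exceptions.UnsupportedCompressionType are all assumptions made for illustration.

```python
# Assumed subset of compression type values from the openZIM format description.
KNOWN_COMPRESSIONS = {1: "none", 4: "lzma2", 5: "zstd"}


def split_infobyte(infobyte):
    """Return (compression name, is_extended) for a 1-byte infobyte (hypothetical)."""
    if len(infobyte) != 1:
        raise ValueError("infobyte must be exactly one byte")
    value = infobyte[0]
    compression_type = value & 0x0F  # lowest four bits
    is_extended = bool(value & 0x10)  # fifth bit
    if compression_type not in KNOWN_COMPRESSIONS:
        raise ValueError("unsupported compression type: %d" % compression_type)
    return KNOWN_COMPRESSIONS[compression_type], is_extended


print(split_infobyte(b"\x15"))
```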
def read_blob(self, i, start=None, end=None):

Read the entirety of the specified range in the specified blob and return the content.

The parameters 'start' and 'end' can be used to specify a range within the blob to read. In this case, both values are interpreted relative to the actual blob start. Similar to how python slices work, the 'start' value will be inclusive and the 'end' value exclusive. If start >= size of the blob, the return value will be b"". If the end lies outside the blob, read only up until the end of the blob.

Parameters
i (int): index of blob to read
start (None or int): if specified, the offset relative to the start of the blob to start reading from
end (None or int): if specified, the offset relative to the start of the blob to stop reading at
Returns
bytes: the content of the blob
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
pyzim.exceptions.BlobNotFound: if the blob index is out of range
def read_infobyte(self):

Read the cluster information byte, returning it.

Returns
bytes: the byte containing cluster information
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
def read_infobyte_if_needed(self):

Read and parse the infobyte if this has not yet happened.

def reset(self):

Reset all internal state except the cluster offset, causing said offset to be read again the next time it is required.

compression =

compression to use, None when unknown

is_extended: bool or None =

whether this cluster is extended, None if not set

offset: int or None =

absolute offset of the cluster

did_read_infobyte: bool =

True if the infobyte was already read and parsed.

def _get_compressor(self):

Return a compressor suitable to compress this cluster.

Returns
a compressor-like object (see pyzim.compression.BaseCompressionInterface for more info): a compressor suitable to compress this cluster.
def _get_decompressing_reader(self, offset=0):

Return a decompressing reader that can be used to decompress the content.

If offset is specified, the returned reader will already have read up to that offset. This may reuse an existing decompressor, depending on the implementation and the offset.

Parameters
offset (int): offset, relative to the start of the compressed data (cluster start + 1)
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
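
The reuse idea can be sketched as follows: a forward-only reader is kept while the requested position lies at or ahead of its current position, and it is recreated otherwise. The class ReaderCache is hypothetical, uses the standard-library lzma module, and interprets offset as a position in the decompressed stream, which is an assumption of this sketch.

```python
import io
import lzma


class ReaderCache:
    """Hand out a decompressing reader positioned at a given offset (hypothetical)."""

    def __init__(self, compressed):
        self._compressed = compressed
        self._reader = None
        self.restarts = 0  # counts how often a fresh reader had to be created

    def get_reader(self, offset=0):
        # A forward-only reader cannot go back, so start over if it is already past.
        if self._reader is None or self._reader.tell() > offset:
            self._reader = lzma.open(io.BytesIO(self._compressed), "rb")
            self.restarts += 1
        # Skip forward by reading and discarding data up to the requested offset.
        to_skip = offset - self._reader.tell()
        while to_skip > 0:
            chunk = self._reader.read(min(to_skip, 4096))
            if not chunk:
                break
            to_skip -= len(chunk)
        return self._reader


data = bytes(range(256)) * 10
cache = ReaderCache(lzma.compress(data))
first = cache.get_reader(100).read(5)
second = cache.get_reader(200).read(3)  # reuses the reader, it only moves forward
third = cache.get_reader(50).read(2)  # needs a restart, the reader was already past 50
```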
def _get_decompressor(self):

Return a decompressor suitable to decompress this cluster.

Returns
a decompressor-like object (see pyzim.compression.BaseCompressionInterface for more info): a decompressor suitable to decompress this cluster
def _seek_if_needed(self, f, offset):

Seek to the specified position (relative to the cluster start) in the file only if it is needed.

The cluster needs to be bound.

Parameters
f (file-like object): file to seek
offset (int): offset to seek, relative to the start of the cluster
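
The idea of a conditional seek can be sketched like this. The function seek_if_needed is hypothetical and takes the absolute cluster offset as an explicit argument, whereas the method presumably takes it from the cluster itself.

```python
import io


def seek_if_needed(f, cluster_offset, offset):
    """Seek f to cluster_offset + offset unless it is already there (hypothetical)."""
    target = cluster_offset + offset
    if f.tell() == target:
        return False  # already in position, no seek performed
    f.seek(target)
    return True


f = io.BytesIO(b"x" * 100)
moved_first = seek_if_needed(f, cluster_offset=10, offset=5)
moved_second = seek_if_needed(f, cluster_offset=10, offset=5)
print(moved_first, moved_second, f.tell())
```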
_decompressing_reader =

Undocumented

_pointer_format: str =

The pointer format.

Raises
RuntimeError: if accessed before extension has been specified