class documentation

class ModifiableClusterWrapper(Cluster, ModifiableMixIn):

Constructor: ModifiableClusterWrapper(cluster)

A special type of cluster that wraps another cluster and adds methods for modifying the cluster.

This type of cluster is used when creating a new ZIM file or when modifying an existing one.

Method __init__ The default constructor.
Method after_flush_or_read This method should be called after this object has been read and/or flushed to disk. In other words, it should be called at least once whenever this object matches the state of the object on the disk.
Method append_blob Append a blob to this cluster.
Method bind Bind this object to a ZIM file.
Method compression.setter Set the compression to use.
Method empty_blob Set the content of a blob to empty.
Method flush If this cluster has been modified, write it to the archive.
Method get_blob_size Get the size of a blob.
Method get_content_size Return the content size of this cluster.
Method get_disk_size Calculate the size of this object when written to a file.
Method get_initial_disk_size Return the size of this object on disk as it has been read.
Method get_number_of_offsets Return the number of offsets in this cluster.
Method get_offset Return the offset with the specified index.
Method get_total_decompressed_size Return the total decompressed size of this cluster.
Method get_total_offset_size Return the total size of the offsets.
Method is_extended.setter Force-set the extension status of this cluster.
Method iter_blob_offsets Read the blob offsets, yielding them as an iterator.
Method iter_read_blob Iteratively read the specified blob.
Method iter_write Iteratively serialize this cluster, yielding each chunk that must be written.
Method offset.setter Set the offset.
Method parse_infobyte Parse the cluster information byte, setting the attributes of this cluster as necessary.
Method read_blob Read the entirety of the specified range in the specified blob and return the content.
Method read_infobyte Read the cluster information byte, returning it.
Method remove_blob Remove the specified blob.
Method reset Reset all internal state except the cluster offset, causing said offset to be read again the next time it is required.
Method set_blob Set the blob for the specified index.
Method unbind Unbind this object. Can be called multiple times.
Constant DEFAULT_BLOB_READ how many bytes to read from a blob at once
Property compression Compression to use, None when unknown.
Property did_read_infobyte True if the infobyte was already read and parsed.
Property is_extended Check whether this cluster needs to be an extended cluster.
Property offset Absolute offset of the cluster.
Method _adjust_index Adjust an index to work with deletions.
Method _iter_blob_sizes Iterate over the blob sizes.
Method _iter_write_raw Iteratively serialize the compressed part of this cluster, yielding each chunk that must be compressed.
Instance Variable _added_blobs a list of indexes of newly added blobs
Instance Variable _blobs a dict mapping blob number to the new/modified blobs
Instance Variable _cluster the wrapped cluster. Any binding will be inherited.
Instance Variable _force_compression if not None, use this value for Cluster.compression
Instance Variable _force_extension if not None, use this value for ModifiableClusterWrapper.is_extended
Instance Variable _removed_blobs a sorted list of removed blob numbers
Property _has_modifications Return True if this wrapper has any modifications registered.

Inherited from Cluster:

Method generate_infobyte Generate the infobyte for this cluster.
Method get_number_of_blobs Return the number of blobs in this cluster.
Method get_total_compressed_size Return the total compressed size of the cluster.
Method read_infobyte_if_needed Read and parse the infobyte if this has not yet happened.
Method _get_compressor Return a compressor suitable to compress this cluster.
Method _get_decompressing_reader Return a decompressing reader that can be used to decompress the content.
Method _get_decompressor Return a decompressor suitable to decompress this cluster.
Method _seek_if_needed Seek to the specified position (relative to the cluster start) in the file only if it is needed.
Instance Variable _decompressing_reader Undocumented
Property _pointer_format The pointer format.

Inherited from BindableMixIn (via Cluster):

Property bound Whether this object is bound to a ZIM file or not.
Property zim The bound ZIM archive, if any is bound. Otherwise None.
Instance Variable _zim the bound ZIM archive or None

Inherited from ModifiableMixIn (via Cluster, BindableMixIn):

Method add_submodifiable Add another modifiable object as a child of this one.
Method dirty.setter Setter for ModifiableMixIn.dirty
Method ensure_mutable If this object is non-mutable, raise an Exception.
Method get_unmodified_disk_size Return the size of this object when written to a file before any modifications have been made since the last read/flush.
Method mark_dirty Convenience function to mark this object as dirty.
Method remove_submodifiable Remove a submodifiable from this object.
Instance Variable dirty True if this object or a sub-modifiable has been modified.
Instance Variable mutable if false, prevent modifications of this object.
Instance Variable _dirty a boolean flag that's nonzero if this object has been modified
Instance Variable _old_disk_size the size of this object on disk before any modifications since the last flush/read
Instance Variable _submodifiables a list of child objects whose dirty state will affect this object's dirty state.
def __init__(self, cluster):

The default constructor.

Parameters
cluster (Cluster): cluster to wrap
def after_flush_or_read(self):

This method should be called after this object has been read and/or flushed to disk. In other words, it should be called at least once whenever this object matches the state of the object on the disk.

This method sets the old disk size, which allows us to later free the space allocated to the old object on disk. Thus, this method requires ModifiableMixIn.get_disk_size to work.

In addition, the object will be marked as non-dirty afterwards.
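The bookkeeping described above can be pictured with a small stand-in class. This is only a sketch: TrackedObject, its payload attribute, and modify() are hypothetical names invented for illustration and are not part of pyzim; only the idea of recording the old disk size and clearing the dirty flag is taken from the description.

```python
class TrackedObject:
    """Hypothetical stand-in for an object that tracks its on-disk size."""

    def __init__(self, payload):
        self.payload = payload
        self.dirty = False
        self._old_disk_size = None

    def get_disk_size(self):
        # In this toy model the disk size is simply the payload length.
        return len(self.payload)

    def modify(self, payload):
        # Any modification marks the object as dirty.
        self.payload = payload
        self.dirty = True

    def after_flush_or_read(self):
        # Remember the size that is now on disk and clear the dirty flag.
        self._old_disk_size = self.get_disk_size()
        self.dirty = False


obj = TrackedObject(b"abc")
obj.after_flush_or_read()   # the object now matches the state on disk
obj.modify(b"abcdef")       # in-memory state diverges from the disk state
```

After the modification, the remembered old size still describes the space occupied on disk, which is what makes it possible to free that space once the new version has been written.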

def append_blob(self, blob_source):

Append a blob to this cluster.

Parameters
blob_source (BaseBlobSource): source for the new blob
Returns
int: the index (= blob number) of the new blob
Raises
pyzim.exceptions.NonMutable: if this cluster is set to be immutable.
def bind(self, zim):

Bind this object to a ZIM file.

This method behaves mostly like pyzim.bindable.BindableMixIn.bind, but will also bind the wrapped cluster EXCEPT if it is already bound. This is so that the wrapped cluster and the wrapper can be bound to two different ZIM objects.

Parameters
zim (pyzim.archive.Zim): ZIM archive to bind to
Raises
pyzim.exceptions.AlreadyBound: when already bound to a ZIM archive
def compression(self, value):

Set the compression to use.

Parameters
value (pyzim.compression.CompressionType or int or None): value to set
Raises
TypeError: if value has an unsupported type
pyzim.exceptions.UnsupportedCompressionType: when value is an int not registered in pyzim.compression.CompressionType
def empty_blob(self, i):

Set the content of a blob to empty.

This does not delete the specified blob, but reduces its size to 0 (plus an additional 4/8 bytes for the offset). The advantage of this method over ModifiableClusterWrapper.remove_blob is that any entries linking to subsequent blobs do not need to be modified.

Parameters
i (int): index of blob to empty
Raises
pyzim.exceptions.NonMutable: if this cluster is set to be immutable.
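To illustrate why emptying a blob leaves later blob numbers untouched while removing one shifts them, here is a minimal sketch that uses a plain Python list as a stand-in for the blobs of a cluster. The list and both helper functions are hypothetical and do not use pyzim.

```python
# Hypothetical stand-in: a list of bytes objects plays the role of a cluster's blobs.
blobs = [b"alpha", b"beta", b"gamma"]


def empty_blob(blobs, i):
    """Return a copy in which blob i has zero length but keeps its slot."""
    result = list(blobs)
    result[i] = b""
    return result


def remove_blob(blobs, i):
    """Return a copy in which blob i is gone and later blobs move down by one."""
    return blobs[:i] + blobs[i + 1:]


emptied = empty_blob(blobs, 1)
removed = remove_blob(blobs, 1)

# After emptying, blob number 2 is still b"gamma".
# After removal, b"gamma" has moved to blob number 1, so an entry that
# still points at blob number 2 would now be out of range.
```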
def flush(self):

If this cluster has been modified, write it to the archive.

Raises
pyzim.exceptions.BindRequired: if unbound
def get_blob_size(self, i):

Get the size of a blob.

Parameters
i (int): index of blob to get size for
Returns
int: the size of the uncompressed blob
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
pyzim.exceptions.BlobNotFound: if the specified blob does not exist
def get_content_size(self):

Return the content size of this cluster.

This is the uncompressed size of the content of this cluster, not including the offsets and infobyte.

Returns
int: the size of the content of this cluster
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
def get_disk_size(self):

Calculate the size of this object when written to a file.

NOTE: in this context, size refers to the direct size of the object. If this object contains references to other objects, their sizes will not be included. For example, a pyzim.entry.ContentEntry also links to a blob, but this function will only return the size of the entry itself, excluding the referenced blob.

Returns
int: the size, in bytes
def get_initial_disk_size(self):

Return the size of this object on disk as it has been read.

This differs from ModifiableMixIn.get_disk_size and ModifiableMixIn.get_unmodified_disk_size, as both of those methods return the size this object would have if it were written. This matters because we cannot always guarantee that an object read from disk has the same size it would have if we wrote it back without any further modification. An example is a pyzim.cluster.Cluster, which may have a different size due to a mismatch in configuration parameters even when using the same compression type.

This method should be implemented by subclasses if the previously mentioned behavior is possible. By default, this just returns the same value as ModifiableMixIn.get_disk_size.

Returns
int: the size, in bytes, that the object had on disk when it was read
def get_number_of_offsets(self):

Return the number of offsets in this cluster.

This value differs from the number of blobs in the cluster: the ZIM format stores one additional offset that marks the end of the last blob.

Returns
int: the number of offsets in this cluster.
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
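As a rough illustration of how offsets relate to blobs, the following sketch derives blob sizes from a list of offsets and computes the space taken by the offset list itself. It assumes the usual ZIM layout, in which a final offset marks the end of the last blob and each offset takes 4 bytes (8 in an extended cluster). The function names and the example values are made up for illustration and are not pyzim API.

```python
def blob_sizes_from_offsets(offsets):
    """Each blob spans from its own offset to the next offset in the list."""
    return [end - start for start, end in zip(offsets, offsets[1:])]


def total_offset_size(number_of_offsets, is_extended=False):
    """Offsets take 8 bytes each in an extended cluster, otherwise 4 bytes."""
    return number_of_offsets * (8 if is_extended else 4)


# Made-up example: four offsets describe three blobs.
offsets = [16, 21, 25, 30]
sizes = blob_sizes_from_offsets(offsets)
number_of_blobs = len(offsets) - 1
```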
def get_offset(self, i):

Return the offset with the specified index.

Parameters
i (int): index of blob to get offset for
Raises
IndexError: if i < 0 or i >= len(offsets)
pyzim.exceptions.BindRequired: if cluster is unbound
def get_total_decompressed_size(self):

Return the total decompressed size of this cluster.

This is the uncompressed size of the content of this cluster, including the offsets but not the infobyte.

Returns
int: the size of the content of this cluster including offsets
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
def get_total_offset_size(self):

Return the total size of the offsets.

Returns
int: the total size of the offsets in bytes
def is_extended(self, value):

Force-set the extension status of this cluster.

Set to None to disable force setting.

Parameters
value (bool or None): new value for the extension state, None to disable.
def iter_blob_offsets(self, blob_numbers=None):

Read the blob offsets, yielding them as an iterator.

The order of blob_numbers does not matter; offsets are always yielded in regular order (offset 1, offset 2, ...).

Parameters
blob_numbers (None or list of int): if specified, load only these offsets
Yields
int: the offset of each blob in the decompressed body, relative to cluster start
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
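The ordering guarantee can be sketched with a plain generator over an in-memory list. This is an assumption-laden illustration: iter_selected_offsets is a hypothetical helper, and which offsets the real method yields for a given selection is not confirmed by this page.

```python
def iter_selected_offsets(offsets, blob_numbers=None):
    """Yield offsets in stored order, optionally limited to blob_numbers."""
    wanted = None if blob_numbers is None else set(blob_numbers)
    for number, offset in enumerate(offsets):
        if wanted is None or number in wanted:
            yield offset


# Made-up offsets; the request order [2, 0] does not affect the output order.
offsets = [16, 21, 25, 30]
selected = list(iter_selected_offsets(offsets, blob_numbers=[2, 0]))
everything = list(iter_selected_offsets(offsets))
```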
def iter_read_blob(self, i, buffersize=4096, start=None, end=None):

Iteratively read the specified blob.

The parameters 'start' and 'end' can be used to specify a range within the blob to read. In this case, both values are interpreted relative to the actual blob start. Similar to how python slices work, the 'start' value will be inclusive and the 'end' value exclusive. If start >= size of the blob, the return value will be b"". If the end lies outside the blob, read only up until the end of the blob.

Parameters
i (int): index of blob to read
buffersize (int): number of bytes to read at once
start (None or int): if specified, the offset relative to the start of the blob to start reading from
end (None or int): if specified, the offset relative to the start of the blob to stop reading at
Yields
bytes: chunks of the blob content
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
pyzim.exceptions.BlobNotFound: if the blob index is out of range
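The range semantics described above (inclusive start, exclusive end, clamping at the blob end) can be sketched on an in-memory bytes object. The helper iter_range is hypothetical and assumes non-negative start and end values; it only mirrors the documented behavior and is not how pyzim reads from disk.

```python
def iter_range(blob, buffersize=4096, start=None, end=None):
    """Yield chunks of blob[start:end], each at most buffersize bytes long."""
    size = len(blob)
    start = 0 if start is None else start
    # An end beyond the blob is clamped to the blob size.
    end = size if end is None else min(end, size)
    pos = start
    while pos < end:
        chunk_end = min(pos + buffersize, end)
        yield blob[pos:chunk_end]
        pos = chunk_end


data = b"0123456789"
chunks = list(iter_range(data, buffersize=4, start=2, end=8))
past_the_end = b"".join(iter_range(data, start=20))
clamped = b"".join(iter_range(data, start=7, end=100))
```

Joining the yielded chunks gives the same result as the slice data[start:end], which is the relationship between the iterative and the all-at-once way of reading a range.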
def iter_write(self):

Iteratively serialize this cluster, yielding each chunk that must be written.

Yields
bytes: the data of this cluster as it needs to be written
def offset(self, value):

Set the offset.

Parameters
value (int or None): the new offset
def parse_infobyte(self, infobyte):

Parse the cluster information byte, setting the attributes of this cluster as necessary.

Parameters
infobyte (bytes of length 1): the cluster information byte
Raises
pyzim.exceptions.UnsupportedCompressionType: if the compression type is unknown.
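For orientation, the sketch below decodes a cluster information byte following the layout given in the openZIM file format description: the low four bits hold the compression type and the bit with value 0x10 marks an extended cluster. The function split_infobyte is a hypothetical helper that returns a tuple; the real method sets attributes on the cluster instead and validates the compression type, which this sketch does not do.

```python
def split_infobyte(infobyte):
    """Split a one-byte cluster info field into (compression_id, is_extended)."""
    if len(infobyte) != 1:
        raise ValueError("infobyte must be exactly one byte long")
    value = infobyte[0]
    compression_id = value & 0x0F      # low nibble: compression type
    is_extended = bool(value & 0x10)   # fifth bit: extended cluster flag
    return compression_id, is_extended


plain = split_infobyte(b"\x01")
extended = split_infobyte(b"\x15")
```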
def read_blob(self, i, start=None, end=None):

Read the entirety of the specified range in the specified blob and return the content.

The parameters 'start' and 'end' can be used to specify a range within the blob to read. In this case, both values are interpreted relative to the actual blob start. Similar to how python slices work, the 'start' value will be inclusive and the 'end' value exclusive. If start >= size of the blob, the return value will be b"". If the end lies outside the blob, read only up until the end of the blob.

Parameters
i (int): index of blob to read
start (None or int): if specified, the offset relative to the start of the blob to start reading from
end (None or int): if specified, the offset relative to the start of the blob to stop reading at
Returns
bytes: the content of the blob
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
pyzim.exceptions.BlobNotFound: if the blob index is out of range
def read_infobyte(self):

Read the cluster information byte, returning it.

Returns
bytes: the byte containing cluster information
Raises
pyzim.exceptions.BindRequired: if cluster is unbound
def remove_blob(self, i):

Remove the specified blob.

NOTE: it is not recommended to use this method. Removing a blob also requires changing the blob numbers in entries pointing to blobs after the specified blob. This method does not take care of this. You should use ModifiableClusterWrapper.empty_blob instead.

Parameters
i (int): index of blob to remove
Raises
pyzim.exceptions.NonMutable: if this cluster is set to be immutable.
def reset(self):

Reset all internal state except the cluster offset, causing said offset to be read again the next time it is required.

def set_blob(self, i, blob_source):

Set the blob for the specified index.

Index must be <= number of blobs. If equal, the blob will be appended.

Parameters
i (int): index of blob to set
blob_source (BaseBlobSource): source for the new blob
Raises
pyzim.exceptions.NonMutable: if this cluster is set to be immutable.
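The index rule (replace when the index is in range, append when it equals the number of blobs) can be sketched on a plain list. The helper set_blob_in_list is hypothetical, and raising IndexError for other indexes is an assumption, since this page does not say which exception the real method raises in that case.

```python
def set_blob_in_list(blobs, i, content):
    """Replace blob i, or append when i equals the current number of blobs."""
    if i < 0 or i > len(blobs):
        # Assumption: the behavior for out-of-range indexes is not documented here.
        raise IndexError("index must be between 0 and the number of blobs")
    if i == len(blobs):
        blobs.append(content)
    else:
        blobs[i] = content
    return i


blobs = [b"one", b"two"]
set_blob_in_list(blobs, 1, b"TWO")     # replaces the existing blob 1
set_blob_in_list(blobs, 2, b"three")   # index equals the blob count: appended
```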
def unbind(self):

Unbind this object. Can be called multiple times.

DEFAULT_BLOB_READ: int =

how many bytes to read from a blob at once

Value
8192

compression =

Compression to use, None when unknown.

did_read_infobyte =

True if the infobyte was already read and parsed.

Returns
bool: True if the infobyte was already read and parsed.
is_extended: bool =

Check whether this cluster needs to be an extended cluster.

Note that this function does not return the extension status as set in the cluster information byte, but calculates the required extension status based on the offsets. It may be forcefully set to a specific value.
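One way to picture the calculation: a normal cluster stores 4-byte offsets, so it would need to be extended as soon as any offset no longer fits into 32 bits, unless a forced value overrides the result. The sketch below makes exactly that assumption; needs_extension and its parameters are hypothetical names, and the exact rule pyzim applies is not stated on this page.

```python
MAX_UINT32 = 2 ** 32 - 1


def needs_extension(offsets, force_extension=None):
    """Return the forced value if set, else whether any offset exceeds 32 bits."""
    if force_extension is not None:
        return force_extension
    return max(offsets, default=0) > MAX_UINT32


small = needs_extension([8, 100])
large = needs_extension([8, 2 ** 32])
forced = needs_extension([8, 2 ** 32], force_extension=False)
```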

offset: int or None =

Absolute offset of the cluster.

def _adjust_index(self, i):

Adjust an index to work with deletions.

Basically, if an index is deleted, all further indexes get reduced by one. As deletions only happen virtually in the wrapper and the wrapped cluster still uses the original index system, such indexes need to be adjusted.

Speaking from experience, it's rather easy to get confused about when to use the adjusted index. The simplified answer is: if it is passed to the wrapped cluster, use the adjusted index. Otherwise, use the raw index.

Parameters
i (int): index to adjust
Returns
int: the adjusted index
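A simplified sketch of such an adjustment follows. It assumes that the removed list is sorted and holds blob numbers in the original (wrapped) index system, and it ignores newly added blobs. The function adjust_index is a hypothetical illustration and not the wrapper's actual implementation.

```python
def adjust_index(i, removed):
    """Map an index as seen after deletions back to the original index."""
    for r in removed:  # removed must be sorted in ascending order
        if r <= i:
            # A removed slot lies at or before this position: skip over it.
            i += 1
        else:
            break
    return i


original = ["a", "b", "c", "d", "e"]
removed = [1, 3]  # "b" and "d" have been virtually deleted
visible = [x for n, x in enumerate(original) if n not in removed]
mapped = [original[adjust_index(i, removed)] for i in range(len(visible))]
```

Here mapped and visible contain the same items, which shows that every index in the post-deletion view resolves to the right slot in the original list.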
def _iter_blob_sizes(self):

Iterate over the blob sizes.

This only includes blobs that are not removed.

Yields
int: the size of each blob in bytes
def _iter_write_raw(self):

Iteratively serialize the compressed part of this cluster, yielding each chunk that must be compressed.

Yields
bytes: the data of this cluster as it needs to be compressed and then written
_added_blobs: list of int =

a list of indexes of newly added blobs

_blobs: dict mapping int to pyzim.blob.BaseBlobSource =

a dict mapping blob number to the new/modified blobs

_cluster: Cluster =

the wrapped cluster. Any binding will be inherited.

_force_compression =

if not None, use this value for Cluster.compression

_force_extension: None or bool =

if not None, use this value for ModifiableClusterWrapper.is_extended

_removed_blobs: list of int =

a sorted list of removed blob numbers

_has_modifications: bool =

Return True if this wrapper has any modifications registered.