pyzim.cluster.ModifiableClusterWrapper

class documentation

class ModifiableClusterWrapper(Cluster, ModifiableMixIn):

Constructor: ModifiableClusterWrapper(cluster)

A special type of cluster that wraps another cluster and adds methods for modifying the cluster.

This type of cluster is used when creating a new ZIM file or when modifying an existing one.

Method	`__init__`	The default constructor.
Method	`after_flush_or_read`	This method should be called after this object has been read and/or flushed to disk. In other words, it should be called at least once whenever this object matches the state of the object on the disk.
Method	`append_blob`	Append a blob to this cluster.
Method	`bind`	Bind this object to a ZIM file.
Method	`compression.setter`	Set the compression to use.
Method	`empty_blob`	Set the content of a blob to empty.
Method	`flush`	If this cluster has been modified, write it to the archive.
Method	`get_blob_size`	Get the size of a blob.
Method	`get_content_size`	Return the content size of this cluster.
Method	`get_disk_size`	Calculate the size of this object when written to a file.
Method	`get_initial_disk_size`	Return the size of this object on disk as it has been read.
Method	`get_number_of_offsets`	Return the number of offsets in this cluster.
Method	`get_offset`	Return the offset with the specified index.
Method	`get_total_decompressed_size`	Return the total decompressed size of this cluster.
Method	`get_total_offset_size`	Return the total size of the offsets.
Method	`is_extended.setter`	Force-set the extension status of this cluster.
Method	`iter_blob_offsets`	Read the blob offsets, yielding them as an iterator.
Method	`iter_read_blob`	Iteratively read the specified blob.
Method	`iter_write`	Iteratively serialize this cluster, yielding each chunk that must be written.
Method	`offset.setter`	Set the offset.
Method	`parse_infobyte`	Parse the cluster information byte, setting the attributes of this cluster as necessary.
Method	`read_blob`	Read the entirety of the specified range in the specified blob and return the content.
Method	`read_infobyte`	Read the cluster information byte, returning it.
Method	`remove_blob`	Remove the specified blob.
Method	`reset`	Reset all internal state except the cluster offset, causing said offset to be read again the next time it is required.
Method	`set_blob`	Set the blob for the specified index.
Method	`unbind`	Unbind this object. Can be called multiple times.
Constant	`DEFAULT_BLOB_READ`	how many bytes to read from a blob at once
Property	`compression`	Compression to use, None when unknown.
Property	`did_read_infobyte`	True if the infobyte was already read and parsed.
Property	`is_extended`	Check whether this cluster needs to be an extended cluster.
Property	`offset`	Absolute offset of the cluster.
Method	`_adjust_index`	Adjust an index to work with deletions.
Method	`_iter_blob_sizes`	Iterate over the blob sizes.
Method	`_iter_write_raw`	Iteratively serialize the compressed part of this cluster, yielding each chunk that must be compressed.
Instance Variable	`_added_blobs`	a list of indexes of newly added blobs
Instance Variable	`_blobs`	a dict mapping blob number to the new/modified blobs
Instance Variable	`_cluster`	the wrapped cluster. Any binding will be inherited.
Instance Variable	`_force_compression`	if not None, use this value for `Cluster.compression`
Instance Variable	`_force_extension`	if not None, use this value for `ModifiableClusterWrapper.is_extended`
Instance Variable	`_removed_blobs`	a sorted list of removed blob numbers
Property	`_has_modifications`	Return True if this wrapper has any modifications registered.

Inherited from Cluster:

Method	`generate_infobyte`	Generate the infobyte for this cluster.
Method	`get_number_of_blobs`	Return the number of blobs in this cluster.
Method	`get_total_compressed_size`	Return the total compressed size of the cluster.
Method	`read_infobyte_if_needed`	Read and parse the infobyte if this has not yet happened.
Method	`_get_compressor`	Return a compressor suitable to compress this cluster.
Method	`_get_decompressing_reader`	Return a decompressing reader that can be used to decompress the content.
Method	`_get_decompressor`	Return a decompressor suitable to decompress this cluster.
Method	`_seek_if_needed`	Seek to the specified position (relative to the cluster start) in the file only if it is needed.
Instance Variable	`_decompressing_reader`	Undocumented
Property	`_pointer_format`	The pointer format.

Inherited from BindableMixIn (via Cluster):

Property	`bound`	Whether this object is bound to a ZIM file or not.
Property	`zim`	The bound ZIM archive, if any is bound. Otherwise None.
Instance Variable	`_zim`	the bound ZIM archive or None

Inherited from ModifiableMixIn (via Cluster, BindableMixIn):

Method	`add_submodifiable`	Add another modifiable object as a child of this one.
Method	`dirty.setter`	Setter for `ModifiableMixIn.dirty`
Method	`ensure_mutable`	If this object is non-mutable, raise an Exception.
Method	`get_unmodified_disk_size`	Return the size of this object when written to a file before any modifications have been made since the last read/flush.
Method	`mark_dirty`	Convenience function to mark this object as dirty.
Method	`remove_submodifiable`	Remove a submodifiable from this object.
Instance Variable	`dirty`	True if this object or a sub-modifiable has been modified.
Instance Variable	`mutable`	if false, prevent modifications of this object.
Instance Variable	`_dirty`	a boolean flag that's nonzero if this object has been modified
Instance Variable	`_old_disk_size`	the size of this object on disk before any modifications since the last flush/read
Instance Variable	`_submodifiables`	a list of child objects whose dirty state will affect this object's dirty state.

def __init__(self, cluster): ¶

overrides pyzim.cluster.Cluster.__init__

The default constructor.

Parameters
cluster:`Cluster`	cluster to wrap

def after_flush_or_read(self): ¶

overrides pyzim.modifiable.ModifiableMixIn.after_flush_or_read

This method should be called after this object has been read and/or flushed to disk. In other words, it should be called at least once whenever this object matches the state of the object on the disk.

This method sets the old disk size, which allows us to later free the space allocated to the old object on disk. Thus, this method requires ModifiableMixIn.get_disk_size to work.

In addition, the object will be marked as non-dirty afterwards.
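
The bookkeeping described above can be sketched as follows. This is only an illustration under assumptions, not pyzim's implementation: `DirtyTracker` and its `resize` method are hypothetical stand-ins, and only the attribute names `_old_disk_size` and `_dirty` are taken from this page.

```python
class DirtyTracker:
    """Hypothetical stand-in for a modifiable object; not a pyzim class."""

    def __init__(self, disk_size):
        self._disk_size = disk_size   # size the object would have if written now
        self._old_disk_size = None    # size recorded at the last read/flush
        self._dirty = False

    def get_disk_size(self):
        return self._disk_size

    def resize(self, new_size):
        # Hypothetical modification: changes the would-be size and marks dirty.
        self._disk_size = new_size
        self._dirty = True

    def after_flush_or_read(self):
        # Record the size that is currently on disk so the space can be
        # freed later, then clear the dirty flag.
        self._old_disk_size = self.get_disk_size()
        self._dirty = False


obj = DirtyTracker(disk_size=100)
obj.after_flush_or_read()   # object matches the on-disk state
obj.resize(140)             # modified in memory; the old size is still known
```

After the modification, the recorded old size still describes the space occupied on disk, which is what makes freeing it later possible.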

def append_blob(self, blob_source): ¶

Append a blob to this cluster.

Parameters
blob_source:`BaseBlobSource`	source for the new blob

Returns
`int`	the index (=blob number) of the new blob

Raises
`pyzim.exceptions.NonMutable`	if this cluster is set to be immutable.

def bind(self, zim): ¶

overrides pyzim.bindable.BindableMixIn.bind

Bind this object to a ZIM file.

This method behaves mostly like pyzim.bindable.BindableMixIn.bind, but will also bind the wrapped cluster EXCEPT if it is already bound. This is so that the wrapped cluster and the wrapper can be bound to two different ZIM objects.

Parameters
zim:`pyzim.archive.Zim`	ZIM archive to bind to

Raises
`pyzim.exceptions.AlreadyBound`	when already bound to a zim archive
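
The binding rule described above can be sketched with two small stand-in classes. Everything here is hypothetical (`Bindable`, `Wrapper`, the local `AlreadyBound` exception, and the strings used in place of ZIM archives); the sketch only mirrors the described behavior and is not pyzim's code.

```python
class AlreadyBound(Exception):
    """Hypothetical stand-in for pyzim.exceptions.AlreadyBound."""


class Bindable:
    """Hypothetical stand-in for a bindable object."""

    def __init__(self):
        self.zim = None

    @property
    def bound(self):
        return self.zim is not None

    def bind(self, zim):
        if self.bound:
            raise AlreadyBound("already bound to an archive")
        self.zim = zim


class Wrapper(Bindable):
    """Hypothetical wrapper that passes its binding on to the wrapped object."""

    def __init__(self, wrapped):
        super().__init__()
        self.wrapped = wrapped

    def bind(self, zim):
        super().bind(zim)
        # Bind the wrapped object only if it has no binding of its own.
        if not self.wrapped.bound:
            self.wrapped.bind(zim)


inner = Bindable()
inner.bind("source archive")       # the wrapped object is already bound
wrapper = Wrapper(inner)
wrapper.bind("target archive")     # only the wrapper is bound to the target
```

In this sketch the wrapped object keeps its original binding, so wrapper and wrapped object can refer to two different archives.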

@compression.setter

def compression(self, value): ¶

Set the compression to use.

Parameters
value:`pyzim.compression.CompressionType` or `int` or `None`	value to set

Raises
`TypeError`	if value is of an unsupported type
`pyzim.exceptions.UnsupportedCompressionType`	when value is an int not registered in `pyzim.compression.CompressionType`

def empty_blob(self, i): ¶

Set the content of a blob to empty.

This does not delete the specified blob, but reduces its size to 0 (plus an additional 4/8 bytes for the offset). The advantage of this method over ModifiableClusterWrapper.remove_blob is that any entries linking to subsequent blobs do not need to be modified.

Parameters
i:`int`	index of blob to empty

Raises
`pyzim.exceptions.NonMutable`	if this cluster is set to be immutable.
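
The difference between emptying and removing a blob can be illustrated with a plain list standing in for the blobs of a cluster. This is a conceptual sketch only; it does not use pyzim, and the variable names are made up for the example.

```python
blobs = [b"alpha", b"beta", b"gamma"]
entry_blob_number = 2  # an entry that references b"gamma" by blob number

# Emptying blob 1 keeps its slot, so later blob numbers stay valid.
emptied = list(blobs)
emptied[1] = b""

# Removing blob 1 shifts every later blob down by one, so the entry
# above would now point past the end unless it is rewritten.
removed = blobs[:1] + blobs[2:]
```

With the emptied list, `entry_blob_number` still resolves to `b"gamma"`; with the list where the blob was removed, the same content now lives at index 1.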

def flush(self): ¶

If this cluster has been modified, write it to the archive.

Raises
`pyzim.exceptions.BindRequired`	if unbound

def get_blob_size(self, i): ¶

overrides pyzim.cluster.Cluster.get_blob_size

Get the size of a blob.

Parameters
i:`int`	index of blob to get size for

Returns
`int`	the size of the uncompressed blob

Raises
`pyzim.exceptions.BindRequired`	if cluster is unbound
`pyzim.exceptions.BlobNotFound`	if the specified blob does not exist

def get_content_size(self): ¶

overrides pyzim.cluster.Cluster.get_content_size

Return the content size of this cluster.

This is the uncompressed size of the content of this cluster, not including the offsets and infobyte.

Returns
`int`	the size of the content of this cluster

Raises
`pyzim.exceptions.BindRequired`	if cluster is unbound

def get_disk_size(self): ¶

overrides pyzim.modifiable.ModifiableMixIn.get_disk_size

Calculate the size of this object when written to a file.

NOTE: in this context, size refers to the direct size of the object. If this object contains references to other objects, their sizes will not be included. For example, a pyzim.entry.ContentEntry also links to a blob, but this function will only return the size of the entry itself, excluding the referenced blob.

Returns
`int`	the size, in bytes

def get_initial_disk_size(self): ¶

overrides pyzim.modifiable.ModifiableMixIn.get_initial_disk_size

Return the size of this object on disk as it has been read.

This differs from ModifiableMixIn.get_disk_size and ModifiableMixIn.get_unmodified_disk_size, as both of those methods return the size this object would have if it were written. This is important because we cannot always guarantee that an object read from disk has the same size it would have if we wrote it back without any further modification. An example would be a pyzim.cluster.Cluster, which may have a different size due to a mismatch in configuration parameters even when using the same compression type.

This method should be implemented by subclasses if the previously mentioned behavior is possible. By default, this just returns the same value as ModifiableMixIn.get_disk_size.

Returns
`int`	the size in bytes this object had on disk when it was read

def get_number_of_offsets(self): ¶

overrides pyzim.cluster.Cluster.get_number_of_offsets

Return the number of offsets in this cluster.

This value differs from the number of blobs in the cluster.

Returns
`int`	the number of offsets in this cluster.

Raises
`pyzim.exceptions.BindRequired`	if cluster is unbound
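
Why the number of offsets differs from the number of blobs can be seen in a small sketch. It assumes the openZIM cluster layout, in which the offset list holds one entry per blob plus a final entry marking the end of the last blob, and in which offsets are counted from the start of the offset list. The function `build_offsets` is hypothetical and not part of pyzim.

```python
def build_offsets(blob_sizes, extended=False):
    """Build an offset list from blob sizes (hypothetical helper)."""
    offset_size = 8 if extended else 4      # bytes per offset entry
    n_offsets = len(blob_sizes) + 1         # one extra entry marks the end
    position = n_offsets * offset_size      # first blob follows the offset list
    offsets = [position]
    for size in blob_sizes:
        position += size
        offsets.append(position)
    return offsets


# Three blobs of 5, 0 and 7 bytes, using 4-byte offsets.
offsets = build_offsets([5, 0, 7])
# The size of blob i is the distance between two neighbouring offsets.
sizes = [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]
```

Under these assumptions, the total size of the offsets is the number of offsets times 4 or 8 bytes, and an emptied blob still occupies one offset entry.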

def get_offset(self, i): ¶

overrides pyzim.cluster.Cluster.get_offset

Return the offset with the specified index.

Parameters
i:`int`	index of blob to get offset for

Raises
`IndexError`	if i < 0 or i >= len(offsets)
`pyzim.exceptions.BindRequired`	if cluster is unbound

def get_total_decompressed_size(self): ¶

overrides pyzim.cluster.Cluster.get_total_decompressed_size

Return the total decompressed size of this cluster.

This is the uncompressed size of the content of this cluster, including the offsets but not the infobyte.

Returns
`int`	the size of the content of this cluster including offsets

Raises
`pyzim.exceptions.BindRequired`	if cluster is unbound

def get_total_offset_size(self): ¶

overrides pyzim.cluster.Cluster.get_total_offset_size

Return the total size of the offsets.

Returns
`int`	the total size of the offsets in bytes

@is_extended.setter

def is_extended(self, value): ¶

Force-set the extension status of this cluster.

Set to None to disable force setting.

Parameters
value:`bool` or `None`	new value for the extension state, `None` to disable.

def iter_blob_offsets(self, blob_numbers=None): ¶

overrides pyzim.cluster.Cluster.iter_blob_offsets

Read the blob offsets, yielding them as an iterator.

The order of blob_numbers does not matter; all offsets are always yielded in regular order (offset 1, offset 2, ...).

Parameters
blob_numbers:`None` or `list` of `int`	if specified, load only these offsets

Yields
`int`	the offset of each blob in the decompressed body, relative to cluster start

Raises
`pyzim.exceptions.BindRequired`	if cluster is unbound

def iter_read_blob(self, i, buffersize=4096, start=None, end=None): ¶

overrides pyzim.cluster.Cluster.iter_read_blob

Iteratively read the specified blob.

The parameters 'start' and 'end' can be used to specify a range within the blob to read. In this case, both values are interpreted relative to the actual blob start. Similar to how python slices work, the 'start' value will be inclusive and the 'end' value exclusive. If start >= size of the blob, the return value will be b"". If the end lies outside the blob, read only up until the end of the blob.

Parameters
i:`int`	index of blob to read
buffersize:`int`	number of bytes to read at once
start:`None` or `int`	if specified, the offset relative to the start of the blob to start reading from
end:`None` or `int`	if specified, the offset relative to the start of the blob to stop reading at

Yields
`bytes`	chunks of the blob content

Raises
`pyzim.exceptions.BindRequired`	if cluster is unbound
`pyzim.exceptions.BlobNotFound`	if the blob index is out of range
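
The range semantics of 'start' and 'end' can be sketched on an in-memory `bytes` object. The function `iter_read_range` is a hypothetical stand-in that only mirrors the rules stated above for non-negative values; it is not how pyzim reads blobs from an archive.

```python
def iter_read_range(blob, buffersize=4096, start=None, end=None):
    """Yield chunks of blob[start:end] (hypothetical in-memory stand-in)."""
    size = len(blob)
    start = 0 if start is None else start
    # An end beyond the blob is clamped to the end of the blob.
    end = size if end is None else min(end, size)
    while start < end:
        stop = min(start + buffersize, end)
        yield blob[start:stop]
        start = stop


data = b"0123456789"
chunks = list(iter_read_range(data, buffersize=4, start=2, end=7))
```

Joining the yielded chunks gives the same bytes as the slice `data[2:7]`, and a start at or beyond the blob size yields nothing.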

def iter_write(self): ¶

Iteratively serialize this cluster, yielding each chunk that must be written.

Yields
`bytes`	the data of this cluster as it needs to be written

@offset.setter

def offset(self, value): ¶

Set the offset.

Parameters
value:`int` or `None`	the new offset

def parse_infobyte(self, infobyte): ¶

overrides pyzim.cluster.Cluster.parse_infobyte

Parse the cluster information byte, setting the attributes of this cluster as necessary.

Parameters
infobyte:`bytes` of length 1	the cluster information byte

Raises
`pyzim.exceptions.UnsupportedCompressionType`	if the compression type is unknown.
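
As an illustration of what parsing the information byte involves, the sketch below assumes the openZIM cluster layout, in which the four low bits hold the compression identifier and the fifth bit flags extended (8-byte) offsets. The standalone function is hypothetical, returns the raw integer identifier, and does not map it to `pyzim.compression.CompressionType` or raise `UnsupportedCompressionType`.

```python
def parse_infobyte_sketch(infobyte):
    """Split a cluster information byte into its parts (hypothetical helper)."""
    if len(infobyte) != 1:
        raise ValueError("infobyte must be exactly one byte")
    value = infobyte[0]
    compression_id = value & 0b00001111       # low four bits
    extended = bool(value & 0b00010000)       # fifth bit: 8-byte offsets
    return compression_id, extended


compression_id, extended = parse_infobyte_sketch(b"\x15")
```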

def read_blob(self, i, start=None, end=None): ¶

overrides pyzim.cluster.Cluster.read_blob

Read the entirety of the specified range in the specified blob and return the content.

Parameters
i:`int`	index of blob to read
start:`None` or `int`	if specified, the offset relative to the start of the blob to start reading from
end:`None` or `int`	if specified, the offset relative to the start of the blob to stop reading at

Returns
`bytes`	the content of the blob

Raises
`pyzim.exceptions.BindRequired`	if cluster is unbound
`pyzim.exceptions.BlobNotFound`	if the blob index is out of range

def read_infobyte(self): ¶

overrides pyzim.cluster.Cluster.read_infobyte

Read the cluster information byte, returning it.

Returns
`bytes`	the byte containing cluster information

Raises
`pyzim.exceptions.BindRequired`	if cluster is unbound

def remove_blob(self, i): ¶

Remove the specified blob.

NOTE: it is not recommended to use this method. Removing a blob also requires changing the blob numbers in entries pointing to blobs after the specified blob. This method does not take care of this. You should use ModifiableClusterWrapper.empty_blob instead.

Parameters
i:`int`	index of blob to remove

Raises
`pyzim.exceptions.NonMutable`	if this cluster is set to be immutable.

def reset(self): ¶

overrides pyzim.cluster.Cluster.reset

Reset all internal state except the cluster offset, causing said offset to be read again the next time it is required.

def set_blob(self, i, blob_source): ¶

Set the blob for the specified index.

Index must be <= number of blobs. If equal, the blob will be appended.

Parameters
i:`int`	index of blob to set
blob_source:`BaseBlobSource`	source for the new blob

Raises
`pyzim.exceptions.NonMutable`	if this cluster is set to be immutable.
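
The index rule above (replace when the index refers to an existing blob, append when it equals the number of blobs) can be sketched on a plain list. The function `set_blob_sketch` is hypothetical, and raising `IndexError` for a larger index is an assumption made for this example; this page does not state what pyzim does in that case.

```python
def set_blob_sketch(blobs, i, content):
    """Replace blob i, or append if i equals the number of blobs (hypothetical)."""
    if i > len(blobs):
        # Assumption for this sketch only: reject indexes that would leave a gap.
        raise IndexError("blob index must be <= number of blobs")
    if i == len(blobs):
        blobs.append(content)
    else:
        blobs[i] = content
    return blobs


blobs = [b"a", b"b"]
set_blob_sketch(blobs, 1, b"B")   # replaces the existing blob 1
set_blob_sketch(blobs, 2, b"c")   # index equals the blob count: appended
```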

def unbind(self): ¶

overrides pyzim.bindable.BindableMixIn.unbind

Unbind this object. Can be called multiple times.

DEFAULT_BLOB_READ: int = ¶

how many bytes to read from a blob at once

@property

compression: pyzim.compression.CompressionType or None = ¶

overrides pyzim.cluster.Cluster.compression

Compression to use, None when unknown.

@property

did_read_infobyte = ¶

overrides pyzim.cluster.Cluster.did_read_infobyte

True if the infobyte was already read and parsed.

Returns
`bool`	True if the infobyte was already read and parsed.

@property

is_extended: bool = ¶

overrides pyzim.cluster.Cluster.is_extended

Check whether this cluster needs to be an extended cluster.

Note that this function does not return the extension status as set in the cluster information byte, but calculates the required extension status based on the offsets. It may be forcefully set to a specific value.
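
A minimal sketch of such a check, assuming that a cluster must be extended as soon as one of its offsets no longer fits into an unsigned 32-bit integer. The function `needs_extension` and its `force` argument (mirroring `_force_extension`) are hypothetical and not pyzim's implementation.

```python
MAX_32BIT_OFFSET = 0xFFFFFFFF  # largest value a 4-byte unsigned offset can hold


def needs_extension(offsets, force=None):
    """Decide whether 8-byte offsets are required (hypothetical helper)."""
    if force is not None:
        # A forced value takes precedence over the calculated one.
        return force
    return max(offsets, default=0) > MAX_32BIT_OFFSET


small = needs_extension([16, 21, 28])
large = needs_extension([16, 2**32])
forced = needs_extension([16, 2**32], force=False)
```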

@property

offset: int or None = ¶

overrides pyzim.cluster.Cluster.offset

Absolute offset of the cluster.

def _adjust_index(self, i): ¶

Adjust an index to work with deletions.

Basically, if an index is deleted, all further indexes get reduced by one. As deletions only happen virtually in the wrapper and the wrapped cluster still uses the original index system, such indexes need to be adjusted.

Speaking from experience, it's rather easy to get confused about when to use the adjusted index. The simplified answer is: if it is passed to the wrapped cluster, use the adjusted index. Otherwise, use the raw index.

Parameters
i:`int`	index to adjust

Returns
`int`	the adjusted index
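
One way such an adjustment could work is sketched below. This is an illustration under assumptions, not the library's implementation: the standalone function `adjust_index` is hypothetical, and its `removed` argument is assumed to be a sorted list of removed blob numbers expressed in the wrapped cluster's original numbering.

```python
def adjust_index(i, removed):
    """Map an index as seen by the wrapper to the wrapped cluster's index.

    `removed` is assumed to be sorted in ascending order.
    """
    adjusted = i
    for r in removed:
        if r <= adjusted:
            # A removed blob at or before this position means the original
            # index lies one step further along.
            adjusted += 1
        else:
            break
    return adjusted


# Original blobs 0..5 with blobs 1 and 3 removed: the wrapper sees 0, 2, 4, 5.
removed = [1, 3]
mapping = [adjust_index(i, removed) for i in range(4)]
```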

def _iter_blob_sizes(self): ¶

Iterate over the blob sizes.

This only includes blobs that are not removed.

Yields
`int`	the blob sizes of each blob in bytes

def _iter_write_raw(self): ¶

Iteratively serialize the compressed part of this cluster, yielding each chunk that must be compressed.

Yields
`bytes`	the data of this cluster as it needs to be compressed and then written

_added_blobs: list of int = ¶

a list of indexes of newly added blobs

_blobs: dict mapping int to pyzim.blob.BaseBlobSource = ¶

a dict mapping blob number to the new/modified blobs

_cluster: Cluster = ¶

the wrapped cluster. Any binding will be inherited.

_force_compression: None or pyzim.compression.CompressionType = ¶

if not None, use this value for Cluster.compression

_force_extension: None or bool = ¶

if not None, use this value for ModifiableClusterWrapper.is_extended

_removed_blobs: list of int = ¶

a sorted list of removed blob numbers

@property

_has_modifications: bool = ¶

Return True if this wrapper has any modifications registered.