class ModifiableClusterWrapper(Cluster, ModifiableMixIn):
Constructor: ModifiableClusterWrapper(cluster)
A special type of cluster that wraps another cluster and adds methods for modifying the cluster.
This type of cluster is used when creating a new ZIM file or when modifying an existing one.
| Method | __init__ |
The default constructor. |
| Method | after |
This method should be called after this object has been read and/or flushed to disk. In other words, it should be called at least once whenever this object matches the state of the object on the disk. |
| Method | append |
Append a blob to this cluster. |
| Method | bind |
Bind this object to a ZIM file. |
| Method | compression |
Set the compression to use. |
| Method | empty_blob |
Set the content of a blob to empty. |
| Method | flush |
If this cluster has been modified, write it to the archive. |
| Method | get_blob_size |
Get the size of a blob. |
| Method | get_content_size |
Return the content size of this cluster. |
| Method | get |
Calculate the size of this object when written to a file. |
| Method | get |
Return the size of this object on disk as it has been read. |
| Method | get |
Return the number of offsets in this cluster. |
| Method | get_offset |
Return the offset with the specified index. |
| Method | get |
Return the total decompressed size of this cluster. |
| Method | get |
Return the total size of the offsets. |
| Method | is_extended |
Force-set the extension status of this cluster. |
| Method | iter_blob_offsets |
Read the blob offsets, yielding them as an iterator. |
| Method | iter_read_blob |
Iteratively read the specified blob. |
| Method | iter |
Iteratively serialize this cluster, yielding each chunk that must be written. |
| Method | offset |
Set the offset. |
| Method | parse_infobyte |
Parse the cluster information byte, setting the attributes of this cluster as necessary. |
| Method | read_blob |
Read the entirety of the specified range in the specified blob and return the content. |
| Method | read_infobyte |
Read the cluster information byte, returning it. |
| Method | remove_blob |
Remove the specified blob. |
| Method | reset |
Reset all internal state except the cluster offset, causing said offset to be read again the next time it is required. |
| Method | set |
Set the blob for the specified index. |
| Method | unbind |
Unbind this object. Can be called multiple times. |
| Constant | DEFAULT |
how many bytes to read from a blob at once |
| Property | compression |
Compression to use, None when unknown. |
| Property | did_read_infobyte |
True if the infobyte was already read and parsed. |
| Property | is_extended |
Check whether this cluster needs to be an extended cluster. |
| Property | offset |
Absolute offset of the cluster. |
| Method | _adjust |
Adjust an index to work with deletions. |
| Method | _iter |
Iterate over the blob sizes. |
| Method | _iter |
Iteratively serialize the compressed part of this cluster, yielding each chunk that must be compressed. |
| Instance Variable | _added |
a list of indexes of newly added blobs |
| Instance Variable | _blobs |
a dict mapping blob number to the new/modified blobs |
| Instance Variable | _cluster |
the wrapped cluster. Any binding will be inherited. |
| Instance Variable | _force |
if not None, use this value for Cluster.compression |
| Instance Variable | _force |
if not None, use this value for ModifiableClusterWrapper.is_extended |
| Instance Variable | _removed |
a sorted list of removed blob numbers |
| Property | _has |
Return True if this wrapper has any modifications registered. |
Inherited from Cluster:
| Method | generate |
Generate the infobyte for this cluster. |
| Method | get |
Return the number of blobs in this cluster. |
| Method | get |
Return the total compressed size of the cluster. |
| Method | read |
Read and parse the infobyte if this has not yet happened. |
| Method | _get |
Return a compressor suitable to compress this cluster. |
| Method | _get |
Return a decompressing reader that can be used to decompress the content. |
| Method | _get |
Return a decompressor suitable to decompress this cluster. |
| Method | _seek |
Seek to the specified position (relative to the cluster start) in the file only if it is needed. |
| Instance Variable | _decompressing |
Undocumented |
| Property | _pointer |
The pointer format. |
Inherited from BindableMixIn (via Cluster):
| Property | bound |
Whether this object is bound to a ZIM file or not. |
| Property | zim |
The bound ZIM archive, if any is bound. Otherwise None. |
| Instance Variable | _zim |
the bound ZIM archive or None |
Inherited from ModifiableMixIn (via Cluster, BindableMixIn):
| Method | add |
Add another modifiable object as a child of this one. |
| Method | dirty |
Setter for ModifiableMixIn.dirty |
| Method | ensure |
If this object is non-mutable, raise an Exception. |
| Method | get |
Return the size this object had when written to a file, before any modifications made since the last read/flush. |
| Method | mark |
Convenience function to mark this object as dirty. |
| Method | remove |
Remove a submodifiable from this object. |
| Instance Variable | dirty |
True if this object or a sub-modifiable has been modified. |
| Instance Variable | mutable |
if zero (falsy), modifications of this object are prevented. |
| Instance Variable | _dirty |
a boolean flag that's nonzero if this object has been modified |
| Instance Variable | _old |
the size of this object on disk before any modifications since the last flush/read |
| Instance Variable | _submodifiables |
a list of child objects whose dirty state will affect this object's dirty state. |
overrides pyzim.cluster.Cluster.__init__
The default constructor.
| Parameters | |
cluster:Cluster | cluster to wrap |
This method should be called after this object has been read and/or flushed to disk. In other words, it should be called at least once whenever this object matches the state of the object on the disk.
This method records the old disk size, which allows us to later free the space allocated to the old object on disk. Thus, this method requires ModifiableMixIn.get_disk_size to work.
In addition, the object will be marked as non-dirty afterwards.
Append a blob to this cluster.
| Parameters | |
blobBaseBlobSource | source for the new blob |
| Returns | |
int | the index (=blob number) of the new blob |
| Raises | |
pyzim.exceptions.NonMutable | if this cluster is set to be immutable. |
overrides pyzim.bindable.BindableMixIn.bind
Bind this object to a ZIM file.
This method behaves mostly like pyzim.bindable.BindableMixIn.bind, but will also bind the wrapped cluster EXCEPT if it is already bound. This is so that the wrapped cluster and the wrapper can be bound to two different ZIM objects.
| Parameters | |
zim:pyzim.archive.Zim | ZIM archive to bind to |
| Raises | |
pyzim.exceptions.AlreadyBound | when already bound to a zim archive |
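The binding rule described above can be illustrated with a small stand-alone model. This is only a sketch: `Bindable`, `Wrapper` and the local `AlreadyBound` class are hypothetical stand-ins written for this example, not the real pyzim classes, and plain `object()` instances stand in for ZIM archives.

```python
class AlreadyBound(Exception):
    """Stand-in for pyzim.exceptions.AlreadyBound (hypothetical)."""


class Bindable:
    """Hypothetical minimal bindable object, not the real BindableMixIn."""

    def __init__(self):
        self.zim = None

    @property
    def bound(self):
        return self.zim is not None

    def bind(self, zim):
        if self.bound:
            raise AlreadyBound("already bound to a ZIM archive")
        self.zim = zim


class Wrapper(Bindable):
    """Hypothetical wrapper that forwards binding to the wrapped object."""

    def __init__(self, wrapped):
        super().__init__()
        self.wrapped = wrapped

    def bind(self, zim):
        super().bind(zim)
        # Only bind the wrapped object if it is still unbound, so that
        # wrapper and wrapped object may belong to different archives.
        if not self.wrapped.bound:
            self.wrapped.bind(zim)


source_archive, target_archive = object(), object()
inner = Bindable()
inner.bind(source_archive)      # wrapped cluster already bound elsewhere
outer = Wrapper(inner)
outer.bind(target_archive)      # inner keeps its original binding
```

In this model, binding a wrapper around an unbound object binds both to the same archive, which mirrors the "any binding will be inherited" behaviour noted for the wrapped cluster.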
Set the compression to use.
| Parameters | |
value:pyzim.compression.CompressionType or int or None | value to set |
| Raises | |
TypeError | on type error |
pyzim.exceptions.UnsupportedCompressionType | when value is an int not registered in pyzim.compression.CompressionType |
Set the content of a blob to empty.
This does not delete the specified blob, but reduces its size to 0 (plus an additional 4/8 bytes for the offset). The advantage of this method over ModifiableClusterWrapper.remove_blob is that any entries linking to subsequent blobs do not need to be modified.
| Parameters | |
i:int | index of blob to empty |
| Raises | |
pyzim.exceptions.NonMutable | if this cluster is set to be immutable. |
If this cluster has been modified, write it to the archive.
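To see why emptying a blob is safer than removing it, consider the following sketch. It models a cluster as a plain Python list and entries as a dict from a hypothetical entry name to a blob number; none of these names come from pyzim.

```python
blobs = [b"index page", b"stylesheet", b"image"]
# Entries refer to blobs by blob number (entry names are made up).
entries = {"index.html": 0, "style.css": 1, "logo.png": 2}

# Emptying blob 1 keeps every blob number valid.
emptied = list(blobs)
emptied[1] = b""

# Removing blob 1 shifts all later blobs down by one, so the entry
# for "logo.png" (blob number 2) no longer points at a valid blob and
# "style.css" (blob number 1) now points at the image data.
removed = blobs[:1] + blobs[2:]
```

After emptying, `emptied[entries["logo.png"]]` is still the image data. After removal, every entry pointing past the removed blob would have to be rewritten.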
| Raises | |
pyzim.exceptions.BindRequired | if unbound |
overrides pyzim.cluster.Cluster.get_blob_size
Get the size of a blob.
| Parameters | |
i:int | index of blob to get size for |
| Returns | |
int | the size of the uncompressed blob |
| Raises | |
pyzim.exceptions.BindRequired | if cluster is unbound |
pyzim.exceptions.BlobNotFound | if the specified blob does not exist |
overrides pyzim.cluster.Cluster.get_content_size
Return the content size of this cluster.
This is the uncompressed size of the content of this cluster, not including the offsets and infobyte.
| Returns | |
int | the size of the content of this cluster |
| Raises | |
pyzim.exceptions.BindRequired | if cluster is unbound |
Calculate the size of this object when written to a file.
NOTE: in this context, size refers to the direct size of the object. If this object contains references to other objects, their sizes will not be included. For example, a pyzim.entry.ContentEntry also links to a blob, but this function will only return the size of the entry itself, excluding the referenced blob.
| Returns | |
int | the size, in bytes |
Return the size of this object on disk as it has been read.
This differs from ModifiableMixIn.get_disk_size and ModifiableMixIn.get_unmodified_disk_size, as both of those methods return the size this object would have if it were written. This is important because we cannot always guarantee that an object read from disk has the same size it would have if we wrote it back without any further modification. An example is a pyzim.cluster.Cluster, which may have a different size due to a mismatch in configuration parameters even when using the same compression type.
This method should be implemented by subclasses if the previously mentioned behavior is possible. By default, this just returns the same value as ModifiableMixIn.get_disk_size.
| Returns | |
int | the size, in bytes, that the object had when it was read from disk |
Return the number of offsets in this cluster.
This value differs from the number of blobs in the cluster.
| Returns | |
int | the number of offsets in this cluster. |
| Raises | |
pyzim.exceptions.BindRequired | if cluster is unbound |
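The difference between the number of offsets and the number of blobs can be sketched as follows. This assumes the usual ZIM cluster layout, in which one extra offset marks the end of the last blob and offsets are measured from the start of the offset list; `blob_sizes` is a hypothetical helper, not a pyzim function.

```python
def blob_sizes(offsets):
    """Derive blob sizes from consecutive offsets (illustrative helper)."""
    return [end - start for start, end in zip(offsets, offsets[1:])]


# Four 4-byte offsets occupy 16 bytes, so the first blob starts at 16.
offsets = [16, 26, 26, 31]
sizes = blob_sizes(offsets)  # three blobs, the second one empty
```

Under that assumption, a cluster with `n` blobs carries `n + 1` offsets, and an emptied blob shows up as two equal consecutive offsets.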
overrides pyzim.cluster.Cluster.get_offset
Return the offset with the specified index.
| Parameters | |
i:int | index of blob to get offset for |
| Raises | |
IndexError | if i < 0 or i >= len(offsets) |
pyzim.exceptions.BindRequired | if cluster is unbound |
Return the total decompressed size of this cluster.
This is the uncompressed size of the content of this cluster, including the offsets but not the infobyte.
| Returns | |
int | the size of the content of this cluster including offsets |
| Raises | |
pyzim.exceptions.BindRequired | if cluster is unbound |
overrides pyzim.cluster.Cluster.iter_blob_offsets
Read the blob offsets, yielding them as an iterator.
The order of blob_numbers does not matter; all offsets are always yielded in regular order (offset 1, offset 2, ...).
| Parameters | |
blob_numbers:None or list of int | if specified, load only these offsets |
| Yields | |
int | the offset of each blob in the decompressed body, relative to cluster start |
| Raises | |
pyzim.exceptions.BindRequired | if cluster is unbound |
overrides pyzim.cluster.Cluster.iter_read_blob
Iteratively read the specified blob.
The parameters 'start' and 'end' can be used to specify a range within the blob to read. In this case, both values are interpreted relative to the actual blob start. Similar to how python slices work, the 'start' value will be inclusive and the 'end' value exclusive. If start >= size of the blob, the return value will be b"". If the end lies outside the blob, read only up until the end of the blob.
| Parameters | |
i:int | index of blob to read |
buffersize:int | number of bytes to read at once |
start:None or int | if specified, the offset relative to the start of the blob to start reading from |
end:None or int | if specified, the offset relative to the start of the blob to stop reading at |
| Yields | |
bytes | chunks of the blob content |
| Raises | |
pyzim.exceptions.BindRequired | if cluster is unbound |
pyzim.exceptions.BlobNotFound | if the blob index is out of range |
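The range semantics described above (inclusive start, exclusive end, clamping at the blob end) can be modelled on an in-memory bytes object. This is a sketch only: `iter_range` is a hypothetical function that operates on bytes rather than on a cluster, and it assumes non-negative `start` and `end` values, since the text does not say how negative values are handled.

```python
def iter_range(blob, buffersize=4, start=None, end=None):
    """Yield chunks of blob[start:end], at most buffersize bytes each."""
    size = len(blob)
    start = 0 if start is None else start
    # An end beyond the blob is clamped to the blob size.
    end = size if end is None else min(end, size)
    pos = start
    # If start >= end (for example start >= size), nothing is yielded.
    while pos < end:
        chunk_end = min(pos + buffersize, end)
        yield blob[pos:chunk_end]
        pos = chunk_end


data = b"0123456789"
chunks = list(iter_range(data, buffersize=4, start=2, end=7))
```

Joining the yielded chunks gives the same bytes as the slice `data[2:7]`, which is the behaviour the non-iterative read is described as having.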
Iteratively serialize this cluster, yielding each chunk that must be written.
| Yields | |
bytes | the data of this cluster as it needs to be written |
overrides pyzim.cluster.Cluster.parse_infobyte
Parse the cluster information byte, setting the attributes of this cluster as necessary.
| Parameters | |
infobyte:bytes of length 1 | the cluster information byte |
| Raises | |
pyzim.exceptions.UnsupportedCompressionType | if the compression type is unknown. |
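As an illustration of what parsing the information byte involves, here is a minimal sketch. It assumes the ZIM cluster layout in which the low four bits hold the compression type and bit 0x10 marks an extended cluster; the function, the name table and the use of `ValueError` are choices made for this example and do not reflect how pyzim implements it.

```python
# Assumed compression codes: 1 = none, 4 = LZMA2 (xz), 5 = Zstandard.
COMPRESSION_NAMES = {1: "none", 4: "lzma2", 5: "zstd"}


def parse_infobyte(infobyte):
    """Return (compression name, extended flag) for a 1-byte infobyte."""
    if len(infobyte) != 1:
        raise ValueError("infobyte must be exactly one byte")
    value = infobyte[0]
    compression = value & 0x0F       # low four bits
    extended = bool(value & 0x10)    # fifth bit
    if compression not in COMPRESSION_NAMES:
        # The library raises UnsupportedCompressionType here instead.
        raise ValueError(f"unsupported compression type: {compression}")
    return COMPRESSION_NAMES[compression], extended
```

For example, `parse_infobyte(b"\x14")` describes an extended cluster compressed with LZMA2 under these assumptions.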
overrides pyzim.cluster.Cluster.read_blob
Read the entirety of the specified range in the specified blob and return the content.
The parameters 'start' and 'end' can be used to specify a range within the blob to read. In this case, both values are interpreted relative to the actual blob start. Similar to how python slices work, the 'start' value will be inclusive and the 'end' value exclusive. If start >= size of the blob, the return value will be b"". If the end lies outside the blob, read only up until the end of the blob.
| Parameters | |
i:int | index of blob to read |
start:None or int | if specified, the offset relative to the start of the blob to start reading from |
end:None or int | if specified, the offset relative to the start of the blob to stop reading at |
| Returns | |
bytes | the content of the blob |
| Raises | |
pyzim.exceptions.BindRequired | if cluster is unbound |
pyzim.exceptions.BlobNotFound | if the blob index is out of range |
overrides pyzim.cluster.Cluster.read_infobyte
Read the cluster information byte, returning it.
| Returns | |
bytes | the byte containing cluster information |
| Raises | |
pyzim.exceptions.BindRequired | if cluster is unbound |
Remove the specified blob.
NOTE: it is not recommended to use this method. Removing a blob also requires changing the blob numbers in entries pointing to blobs after the specified blob. This method does not take care of this. You should use ModifiableClusterWrapper.empty_blob instead.
| Parameters | |
i:int | index of blob to remove |
| Raises | |
pyzim.exceptions.NonMutable | if this cluster is set to be immutable. |
overrides pyzim.cluster.Cluster.reset
Reset all internal state except the cluster offset, causing said offset to be read again the next time it is required.
Set the blob for the specified index.
Index must be <= number of blobs. If equal, the blob will be appended.
| Parameters | |
i:int | index of blob to set |
blobBaseBlobSource | source for the new blob |
| Raises | |
pyzim.exceptions.NonMutable | if this cluster is set to be immutable. |
overrides pyzim.cluster.Cluster.did_read_infobyte
True if the infobyte was already read and parsed.
| Returns | |
bool | True if the infobyte was already read and parsed. |
overrides pyzim.cluster.Cluster.is_extended
Check whether this cluster needs to be an extended cluster.
Note that this function does not return the extension status as set in the cluster information byte, but calculates the required extension status based on the offsets. It may be forcefully set to a specific value.
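The calculation described above might look roughly like the sketch below. It assumes that a cluster needs the extended format exactly when some offset no longer fits into 32 bits, so that 8-byte offsets are required instead of 4-byte ones; `needs_extension` and its `force` parameter are hypothetical names for this example.

```python
MAX_32BIT = 0xFFFFFFFF


def needs_extension(offsets, force=None):
    """Decide whether offsets require the extended (8-byte) format."""
    if force is not None:
        # A forced value overrides the calculation.
        return force
    return any(offset > MAX_32BIT for offset in offsets)
```

Here `force` plays the role of the force-set extension status: when it is not None, it takes precedence over the value derived from the offsets.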
Adjust an index to work with deletions.
Basically, if an index is deleted, all subsequent indexes are reduced by one. As deletions only happen virtually in the wrapper and the wrapped cluster still uses the original index system, such indexes need to be adjusted.
Speaking from experience, it's rather easy to get confused about when to use the adjusted index. The simplified answer is: if it is passed to the wrapped cluster, use the adjusted index. Otherwise, use the raw index.
| Parameters | |
i:int | index to adjust |
| Returns | |
int | the adjusted index |
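One way such an adjustment could work is sketched below. This is not the library's implementation: `adjust_index` is a hypothetical function that takes the wrapper-side index and a sorted list of removed blob numbers (expressed in the wrapped cluster's numbering), and it ignores newly added blobs.

```python
def adjust_index(i, removed):
    """Map a wrapper-side index to the wrapped cluster's index.

    removed must be sorted in ascending order.
    """
    adjusted = i
    for r in removed:
        if r <= adjusted:
            # A blob at or before this position was removed, so the
            # wrapped cluster's index is one further along.
            adjusted += 1
        else:
            break
    return adjusted


# Wrapped cluster has blobs 0..4; blobs 1 and 3 were removed virtually.
mapping = [adjust_index(i, [1, 3]) for i in range(3)]
```

With blobs 1 and 3 removed, the wrapper's blobs 0, 1 and 2 correspond to the wrapped cluster's blobs 0, 2 and 4.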
Iterate over the blob sizes.
This only includes blobs that are not removed.
| Yields | |
int | the blob sizes of each blob in bytes |
Iteratively serialize the compressed part of this cluster, yielding each chunk that must be compressed.
| Yields | |
bytes | the data of this cluster as it needs to be compressed and then written |