pyzim.archive.Zim

class Zim(ModifiableMixIn):

Constructor: Zim(f, offset, mode, policy)

A ZIM archive.

This object can be used to read, write and/or modify a ZIM file.

NOTE on modifying ZIM archives: to ensure optimal compression, some modifications will not be written immediately. This also means that reading back a previously modified entry may not reflect the modification yet. You can force-write all outstanding changes by calling pyzim.archive.Zim.flush. This is done automatically when the ZIM archive is closed.

Class Method	`open`	Open the Zim archive at the specified path.
Method	`__enter__`	Called upon entering a with-statement. Provides self as object for the context.
Method	`__exit__`	Called upon exiting a with-statement. Closes self.
Method	`__init__`	The default constructor for opening a ZIM file.
Method	`acquire_file`	A context manager that locks the file access and provides the wrapped file object for the context.
Method	`add_full_url_redirect`	Add a redirect from the source (full) url to the target (full) url.
Method	`add_item`	Add an item to this archive.
Method	`add_redirect`	Add a redirect from the source (non-full) url to the target (non-full) url.
Method	`calculate_checksum`	Calculate the checksum of this ZIM file and return it.
Method	`close`	Close the ZIM file. Can be safely called multiple times.
Method	`entry_at_url_is_article`	Check if the entry at the specified full url is an article.
Method	`flush`	Write all changes to disk.
Method	`get_checksum`	Read the checksum of this ZIM file and return it.
Method	`get_cluster_at`	Return the cluster at the specified location (offset) in the ZIM file.
Method	`get_cluster_by_index`	Return the cluster for the specified index.
Method	`get_cluster_index_by_offset`	Return the cluster index for the cluster at the specified offset.
Method	`get_content_entry_by_url`	Return the entry at the specified (non-full) URL in the "C" namespace.
Method	`get_disk_size`	Calculate the size of this object when written to a file.
Method	`get_entry_at`	Return the entry at the specified location (offset) in the ZIM file.
Method	`get_entry_by_full_url`	Return the entry at the specified full URL.
Method	`get_entry_by_url`	Return the entry at the specified (non-full) URL.
Method	`get_entry_by_url_index`	Return the entry at the specified index in the URL pointer list.
Method	`get_mainpage_entry`	Return the entry for the mainpage.
Method	`get_metadata`	Read a metadata entry, returning its value.
Method	`get_metadata_dict`	Return a dict containing all metadata of this ZIM.
Method	`get_metadata_keys`	Read all metadata keys, returning them as a list.
Method	`get_mimetype_by_index`	Return the mimetype with the specified index.
Method	`get_mimetype_of_entry`	Return the mimetype of the specified entry.
Method	`get_search`	Return an object that can be used to search this ZIM.
Method	`has_entry_for_full_url`	Return True if this ZIM file contains an entry for the specified full URL.
Method	`install_processor`	Install a processor on this archive.
Method	`iter_articles`	Iterate over all article entries in this ZIM.
Method	`iter_clusters`	Iterate over all clusters in this ZIM.
Method	`iter_entries`	Iterate over all entries in this ZIM.
Method	`iter_entries_by_url`	Iterate over all entries in this ZIM, ordered by full URL.
Method	`iter_mimetypes`	Iterate over all mimetypes in this archive.
Method	`new_cluster`	Add a new cluster to this archive.
Method	`remove_cluster_by_index`	Remove the cluster with the specified index.
Method	`remove_entry_by_full_url`	Remove the entry at the specified url.
Method	`set_mainpage_url`	Set the mainpage url.
Method	`set_metadata`	Set metadata of the ZIM archive.
Method	`update_checksum`	Calculate and write the checksum.
Method	`write_cluster`	Update an existing cluster in this zim.
Method	`write_entry`	Write an entry to this archive.
Instance Variable	`cluster_cache`	internal cache for clusters, mapping the full location to each cluster
Instance Variable	`compression_strategy`	compression strategy for assigning new items to clusters
Instance Variable	`entry_cache`	internal cache for entries, mapping the full location to each entry
Instance Variable	`filelock`	a lock to ensure file access works with multiple threads. Acquire it whenever any work is done on the file.
Instance Variable	`header`	header of this ZIM file.
Instance Variable	`mimetypelist`	the mimetype list
Instance Variable	`mutable`	Undocumented
Instance Variable	`policy`	policy to use
Instance Variable	`spaceallocator`	an object responsible for managing storage space within the ZIM file, may be `None` if ZIM is read-only
Instance Variable	`uncompressed_compression_strategy`	compression strategy for assigning new items to clusters that are explicitly uncompressed
Property	`closed`	Return True if this archive has already been closed, False otherwise.
Property	`counter`	Return the counter used for counting mimetype occurrences.
Method	`_check_closed`	Check to ensure this ZIM file has not already been closed.
Method	`_get_full_url_for_entry_at`	Return the full URL for the entry at the specified location.
Method	`_get_namespace_title_for_entry_by_url_index`	Return the namespace+title for the entry at the specified index in the URL pointer list.
Method	`_get_title_for_entry_by_url_index`	Return the title for the entry at the specified index in the URL pointer list.
Method	`_init_caches`	Initializes internal caches according to policy.
Method	`_init_new`	Initiate as a new, empty archive.
Method	`_load_header`	Read the header.
Method	`_load_mimetypelist`	Load the mimetypelist.
Method	`_load_pointerlists`	Load the URL and title pointer lists.
Method	`_new_cluster_num`	Return the number of the next new cluster.
Method	`_on_cluster_cache_leave`	Called when a cluster leaves the cache.
Method	`_on_entry_cache_leave`	Called when an entry leaves the cache.
Method	`_update_url_pointers`	Update references to URL pointers.
Instance Variable	`_article_title_pointer_list`	a pointerlist to article entries ordered by title
Instance Variable	`_base_offset`	base offset of ZIM archive within the underlying file object
Instance Variable	`_closed`	a flag indicating whether this archive has already been closed
Instance Variable	`_cluster_num`	next cluster number to assign
Instance Variable	`_cluster_pointer_list`	a pointer list to the individual clusters
Instance Variable	`_counter`	the counter counting mimetype occurrences
Instance Variable	`_entry_title_pointer_list`	a pointerlist to entries ordered by title
Instance Variable	`_f`	the underlying file object
Instance Variable	`_mode`	the mode this archive has been opened in
Instance Variable	`_operation_buffer`	Undocumented
Instance Variable	`_operationbuffer`	buffer for not-yet-completable operations
Instance Variable	`_processors`	list of processors that have been installed on this zim
Instance Variable	`_url_pointer_list`	a pointer list to entries ordered by URL
Instance Variable	`_writable`	a flag indicating whether this archive can be written to.

Inherited from ModifiableMixIn:

Method	`add_submodifiable`	Add another modifiable object as a child of this one.
Method	`after_flush_or_read`	This method should be called after this object has been read and/or flushed to disk. In other words, it should be called at least once whenever this object matches the state of the object on the disk.
Method	`dirty.setter`	Setter for `ModifiableMixIn.dirty`
Method	`ensure_mutable`	If this object is non-mutable, raise an Exception.
Method	`get_initial_disk_size`	Return the size of this object on disk as it has been read.
Method	`get_unmodified_disk_size`	Return the size of this object when written to a file before any modifications have been made since the last read/flush.
Method	`mark_dirty`	Convenience function to mark this object as dirty.
Method	`remove_submodifiable`	Remove a submodifiable from this object.
Instance Variable	`dirty`	True if this object or a sub-modifiable has been modified.
Instance Variable	`_dirty`	a boolean flag that's True if this object has been modified
Instance Variable	`_old_disk_size`	the size of this object on disk before any modifications since the last flush/read
Instance Variable	`_submodifiables`	a list of child objects, whose dirty state will affect this object's dirty state.

@classmethod

def open(cls, path, mode='r', offset=0, policy=DEFAULT_POLICY):

Open the Zim archive at the specified path.

In addition to the modes listed in the documentation of pyzim.archive.Zim.__init__, the mode "x" is also supported. It behaves like mode "w", but raises an exception should the file already exist.

Parameters
path:`str`	path to open
mode:`str`	mode of the Zim archive (see pyzim.archive.Zim.__init__ for the supported modes; "x" is also accepted)
offset:`int`	offset of the ZIM archive within the file.
policy:`pyzim.policy.Policy`	policy to use, defaults to `pyzim.policy.DEFAULT_POLICY`

Returns
`pyzim.archive.Zim`	the Zim archive opened from the file

Raises
`FileExistsError`	if mode == "x" and path already exists
`ValueError`	on invalid mode

def __enter__(self):

Called upon entering a with-statement. Provides self as object for the context.

def __exit__(self, exc_type, exc_value, exc_traceback):

Called upon exiting a with-statement. Closes self.
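The sketch below shows the pattern these two methods implement, together with `close()` being safe to call more than once. `ToyArchive` is hypothetical and only stands in for pyzim.archive.Zim. It does not flush or touch any file.

```python
class ToyArchive:
    """Hypothetical stand-in showing the with-statement and an idempotent close()."""

    def __init__(self):
        self._closed = False
        self.close_calls = 0  # counts how often the real closing work ran

    def __enter__(self):
        return self  # the archive itself is the context object

    def __exit__(self, exc_type, exc_value, exc_traceback):
        self.close()  # always close, whether or not an exception occurred

    @property
    def closed(self):
        return self._closed

    def close(self):
        if self._closed:
            return  # repeated calls are harmless
        self.close_calls += 1  # a real archive would flush and release the file here
        self._closed = True


with ToyArchive() as archive:
    print(archive.closed)   # False
print(archive.closed)       # True
archive.close()             # second call does nothing
print(archive.close_calls)  # 1
```

Because `__exit__` returns `None`, any exception raised inside the `with` block still propagates after the archive has been closed.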

def __init__(self, f, offset=0, mode='r', policy=DEFAULT_POLICY):

overrides pyzim.modifiable.ModifiableMixIn.__init__

The default constructor for opening a ZIM file.

Multiple modes are supported:

"r": read-only
"w": create a new file for writing, truncating the old file
"u"/"a": modify the existing file

Parameters
f:file-like object	file-like object to read from (NOTE: must support reading)
offset:`int`	offset of the ZIM archive within the file.
mode:`str`	in which mode to open the ZIM file (e.g. "r" for reading)
policy:`pyzim.policy.Policy`	policy to use, defaults to `pyzim.policy.DEFAULT_POLICY`

Raises
`ValueError`	on invalid value for a parameter
`TypeError`	on invalid type for value

@contextlib.contextmanager

def acquire_file(self):

A context manager that locks the file access and provides the wrapped file object for the context.

Raises
`pyzim.exceptions.ZimFileClosed`	when the ZIM file is already closed.
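acquire_file combines a closed check with a lock around the shared file object. The sketch below reproduces that shape with `contextlib.contextmanager` and `threading.Lock` around an `io.BytesIO`. `ToyFileHolder` is hypothetical, and it raises `ValueError` where pyzim raises pyzim.exceptions.ZimFileClosed.

```python
import contextlib
import io
import threading


class ToyFileHolder:
    """Hypothetical holder that serialises access to a shared file object."""

    def __init__(self, f):
        self._f = f
        self._closed = False
        self.filelock = threading.Lock()

    @contextlib.contextmanager
    def acquire_file(self):
        if self._closed:
            raise ValueError("file is already closed")
        with self.filelock:  # only one thread may seek/read/write at a time
            yield self._f

    def close(self):
        self._closed = True


holder = ToyFileHolder(io.BytesIO(b"ZIM data"))
with holder.acquire_file() as f:
    f.seek(4)
    print(f.read())  # b'data'
```

Holding the lock for the whole `with` block matters because a seek followed by a read must not be interleaved with another thread's seek.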

def add_full_url_redirect(self, source, target, title=None):

Add a redirect from the source (full) url to the target (full) url.

This method uses full urls. You'll likely want to use pyzim.archive.Zim.add_redirect if you want to work with non-full urls in the "C" namespace.

Be warned that a redirect that cannot be resolved yet will be buffered. This not only increases memory usage, but may also cause an exception to be raised later on if the redirect still cannot be resolved during the next flush.

Parameters
source:`str`	full url to redirect from
target:`str`	full url to redirect to
title:`str` or `None`	title for the redirect, defaulting to the target entry title

Raises
`TypeError`	on type error
`ValueError`	on invalid value
`pyzim.exceptions.ZimFileClosed`	if archive is already closed
`pyzim.exceptions.NonMutable`	if this zim file is not mutable
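The buffering behaviour described above can be pictured with a toy model. A redirect whose target is unknown is parked in a pending list and retried on flush, and anything still unresolved at that point raises. The class below is hypothetical and only mimics the described behaviour. It is not pyzim's pyzim.operationbuffer.OperationBuffer, and the `KeyError` it raises is an arbitrary choice for illustration.

```python
class ToyRedirectBuffer:
    """Hypothetical stand-in for buffered redirect creation."""

    def __init__(self):
        self.entries = set()   # full urls that currently exist
        self.redirects = {}    # resolved redirects: source -> target
        self.pending = []      # (source, target) pairs waiting for their target

    def add_entry(self, url):
        self.entries.add(url)

    def add_redirect(self, source, target):
        if target in self.entries:
            self.redirects[source] = target
        else:
            self.pending.append((source, target))  # costs memory until flushed

    def flush(self):
        still_pending = []
        for source, target in self.pending:
            if target in self.entries:
                self.redirects[source] = target
            else:
                still_pending.append((source, target))
        self.pending = still_pending
        if still_pending:
            raise KeyError(f"unresolvable redirect targets: {still_pending}")


buf = ToyRedirectBuffer()
buf.add_redirect("old", "new")  # target does not exist yet, so it is buffered
buf.add_entry("new")
buf.flush()                     # the buffered redirect is resolved now
print(buf.redirects)            # {'old': 'new'}
```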

def add_item(self, item, force_uncompressed=False):

Add an item to this archive.

The write may not happen immediately.

Parameters
item:`pyzim.item.Item`	item to write
force_uncompressed:`bool`	if nonzero, add the item to the compression strategy for uncompressed content, regardless of other options

Raises
`TypeError`	on type error
`pyzim.exceptions.ZimFileClosed`	if archive is already closed
`pyzim.exceptions.NonMutable`	if this zim file is not mutable

def add_redirect(self, source, target, title=None):

Add a redirect from the source (non-full) url to the target (non-full) url.

This method uses non-full urls and operates in the "C" namespace. Use pyzim.archive.Zim.add_full_url_redirect to work with full urls.

Parameters
source:`str`	non-full url to redirect from
target:`str`	non-full url to redirect to
title:`str` or `None`	title for the redirect, defaulting to the target entry title

Raises
`TypeError`	on type error
`ValueError`	on invalid value
`pyzim.exceptions.EntryNotFound`	if the target url does not exist yet
`pyzim.exceptions.ZimFileClosed`	if archive is already closed
`pyzim.exceptions.NonMutable`	if this zim file is not mutable

def calculate_checksum(self):

Calculate the checksum of this ZIM file and return it.

NOTE: this reads the entire ZIM file and calculates its checksum. If you want to read the checksum stored in the ZIM file, use pyzim.archive.Zim.get_checksum instead.

Returns
`bytes`	the calculated (md5) checksum of this ZIM
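This sketch illustrates the difference between calculate_checksum and get_checksum. In the openZIM format, the header field checksumPos gives the offset of a 16-byte MD5 digest computed over every byte that precedes it. The code works on an in-memory byte string rather than on a pyzim.archive.Zim. The helper names are hypothetical and not part of pyzim.

```python
import hashlib

CHECKSUM_SIZE = 16  # an MD5 digest is 16 bytes long


def compute_checksum(data: bytes, checksum_pos: int) -> bytes:
    """Hash every byte that precedes the stored checksum (hypothetical helper)."""
    return hashlib.md5(data[:checksum_pos]).digest()


def stored_checksum(data: bytes, checksum_pos: int) -> bytes:
    """Return the 16 bytes stored at checksum_pos without recomputing anything."""
    return data[checksum_pos:checksum_pos + CHECKSUM_SIZE]


# Build a toy "archive": some payload followed by its own MD5 digest.
payload = b"pretend this is a ZIM file"
archive = payload + hashlib.md5(payload).digest()
pos = len(payload)

print(compute_checksum(archive, pos) == stored_checksum(archive, pos))  # True
```

Reading the stored value is cheap. Recomputing it requires hashing the whole file, which is why the two operations are kept separate.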

def close(self):

Close the ZIM file. Can be safely called multiple times.

def entry_at_url_is_article(self, full_url):

Check if the entry at the specified full url is an article.

Articles are always in the C namespace, thus the full url must start with a C.

This method returns False if the entry does not exist at all.

Parameters
full_url:`str`	full url of entry to check

Returns
`bool`	whether the entry is an article or not

Raises
`TypeError`	on type error
`ValueError`	on value error.
`pyzim.exceptions.ZimFileClosed`	if archive is already closed

def flush(self):

Write all changes to disk.

Raises
`pyzim.exceptions.ZimFileClosed`	when the ZIM file is already closed.
`pyzim.exceptions.NonMutable`	if this ZIM file is set to be non-mutable

def get_checksum(self):

Read the checksum of this ZIM file and return it.

NOTE: this reads the checksum from the ZIM file, it does not calculate the actual checksum of the file. If you want to calculate the checksum of the ZIM, use pyzim.archive.Zim.calculate_checksum instead.

Returns
`bytes`	the (md5) checksum of this ZIM

def get_cluster_at(self, location):

Return the cluster at the specified location (offset) in the ZIM file.

If caching is configured, an instance of a previously loaded cluster may be returned. This cluster may already be modified and/or bound.

Parameters
location:`int`	location/offset of the cluster in the ZIM file

Returns
`pyzim.cluster.Cluster`	the cluster at the specified location

def get_cluster_by_index(self, i):

Return the cluster for the specified index.

Parameters
i:`int`	index of cluster to get

def get_cluster_index_by_offset(self, offset):

Return the cluster index for the cluster at the specified offset.

Note that the offset must exactly match the offset of the cluster. It is not the full offset (the base offset must be subtracted manually).

This method is mostly used as a helper by clusters to determine their own index.

Returns
`int`	the index of the cluster at the offset in the cluster pointer list

Raises
`KeyError`	if the offset does not refer to a cluster.
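Here is a minimal sketch of the exact-match lookup. It treats the cluster pointer list as a plain list of offsets relative to the start of the archive. Since the method does not subtract the base offset itself, the caller does that first. The helper name and all numbers are hypothetical.

```python
def get_cluster_index_by_offset(cluster_offsets, offset):
    """Return the index of the cluster starting exactly at offset (hypothetical helper)."""
    index_by_offset = {o: i for i, o in enumerate(cluster_offsets)}
    try:
        return index_by_offset[offset]  # only exact matches count
    except KeyError:
        raise KeyError(f"no cluster starts at offset {offset}") from None


base_offset = 1000                  # where the ZIM archive starts inside the file
cluster_offsets = [80, 4096, 9000]  # offsets relative to the archive start
full_offset = 5096                  # absolute position in the underlying file

print(get_cluster_index_by_offset(cluster_offsets, full_offset - base_offset))  # 1
```

Passing `full_offset` directly, or an offset that falls inside a cluster rather than at its start, raises `KeyError`.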

def get_content_entry_by_url(self, url):

Return the entry at the specified (non-full) URL in the "C" namespace.

NOTE: "content" refers to an entry in the "C" namespace. This function may still return any type of pyzim.entry.BaseEntry and is NOT restricted to pyzim.entry.ContentEntry.

Parameters
url:`str`	url of entry to get

Returns
`pyzim.entry.BaseEntry`	the entry at the specified url

Raises
`pyzim.exceptions.EntryNotFound`	when no entry matches the specified URL

def get_disk_size(self):

overrides pyzim.modifiable.ModifiableMixIn.get_disk_size

Calculate the size of this object when written to a file.

NOTE: in this context, size refers to the direct size of the object. If this object contains references to other objects, their sizes will not be included. For example, a pyzim.entry.ContentEntry also links to a blob, but this function will only return the size of the entry itself, excluding the referenced blob.

Returns
`int`	the size, in bytes

def get_entry_at(self, location, bind=True, allow_cache_replacement=True):

Return the entry at the specified location (offset) in the ZIM file.

If caching is configured, an instance of a previous entry may be returned. This entry may already be modified and/or bound (even if bind=False).

Parameters
location:`int`	location/offset of the entry in the ZIM file
bind:`bool`	if nonzero (default), bind this entry
allow_cache_replacement:`bool`	if nonzero (default), allow cached entries to be replaced

Returns
`pyzim.entry.BaseEntry`	the entry at the specified location

def get_entry_by_full_url(self, full_url):

Return the entry at the specified full URL.

Parameters
full_url:`str`	full URL of entry to get

Returns
`pyzim.entry.BaseEntry`	the entry at the specified URL

Raises
`pyzim.exceptions.EntryNotFound`	when no entry matches the specified URL

def get_entry_by_url(self, namespace, url):

Return the entry at the specified (non-full) URL.

Parameters
namespace:`str` of length 1	namespace of entry to get
url:`str`	url of entry to get

Returns
`pyzim.entry.BaseEntry`	the entry at the specified url

Raises
`pyzim.exceptions.EntryNotFound`	when no entry matches the specified URL

def get_entry_by_url_index(self, i, allow_cache_replacement=True):

Return the entry at the specified index in the URL pointer list.

Parameters
i:`int`	index of entry in URL pointer list
allow_cache_replacement:`bool`	if nonzero (default), allow cached entries to be replaced

Returns
`pyzim.entry.BaseEntry`	the entry at the specified index

Raises
`pyzim.exceptions.EntryNotFound`	when no entry matching the index was found

def get_mainpage_entry(self):

Return the entry for the mainpage.

Returns
`pyzim.entry.BaseEntry`	the entry for the mainpage

Raises
`pyzim.exceptions.EntryNotFound`	when no mainpage exists

def get_metadata(self, key, as_unicode=True):

Read a metadata entry, returning its value.

See https://wiki.openzim.org/wiki/Metadata for metadata keys and values.

By default, this method returns unicode. You can set as_unicode=False to prevent this. If the key is not found, return None.

Parameters
key:`str`	key/URL of metadata
as_unicode:`bool`	whether to decode value or not

Returns
`str` or `bytes` (or `None` if not found)	the metadata value

Raises
`pyzim.exceptions.ZimFileClosed`	if archive is already closed

def get_metadata_dict(self, as_unicode=True):

Return a dict containing all metadata of this ZIM.

NOTE: values of certain metadata keys won't be decoded. This prevents decoding the binary content of images.

Parameters
as_unicode:`bool`	whether to decode strings or not

Returns
`dict` of `str` or `bytes` -> `bytes` or `str`	a dict containing the metadata
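This sketch shows selective decoding, assuming that binary values are recognized by their key. Here, keys starting with `Illustration_` (the image metadata described on the openZIM metadata page) are treated as binary. Which keys pyzim actually leaves undecoded is not stated here, so `BINARY_KEY_PREFIXES` is a hypothetical stand-in.

```python
BINARY_KEY_PREFIXES = (b"Illustration_",)  # hypothetical: keys whose values stay bytes


def decode_metadata(raw, as_unicode=True):
    """Decode a {bytes: bytes} metadata mapping, leaving binary values untouched."""
    if not as_unicode:
        return dict(raw)
    result = {}
    for key, value in raw.items():
        if not key.startswith(BINARY_KEY_PREFIXES):
            value = value.decode("utf-8")  # text metadata such as Title
        result[key.decode("utf-8")] = value  # keys are always decoded
    return result


raw = {b"Title": b"Example", b"Illustration_48x48@1": b"\x89PNG\r\n\x1a\n"}
meta = decode_metadata(raw)
print(meta["Title"])                       # Example
print(type(meta["Illustration_48x48@1"]))  # <class 'bytes'>
```

Decoding the PNG signature as UTF-8 would fail on its first byte, which is the kind of error the selective decoding avoids.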

def get_metadata_keys(self, as_unicode=True):

Read all metadata keys, returning them as a list.

By default, this method returns the keys as unicode strings. You can set as_unicode=False to prevent this.

Parameters
as_unicode:`bool`	whether to decode the keys or not

Returns
`list` of `str` or `bytes`	the metadata keys

def get_mimetype_by_index(self, i):

Return the mimetype with the specified index.

Parameters
i:`int`	index of mimetype to get

Returns
`str`	the mimetype with the specified index

Raises
`IndexError`	when the index is invalid

def get_mimetype_of_entry(self, entry):

Return the mimetype of the specified entry.

If the entry is a redirect, this will be pyzim.constants.MIMETYPE_REDIRECT.

Parameters
entry:`pyzim.entry.BaseEntry`	entry to get mimetype for

Returns
`str`	the mimetype of this entry

def get_search(self):

Return an object that can be used to search this ZIM.

There are various ways to search a ZIM, for which pyzim tries to provide a unified interface. This method returns any available search. That search may, however, be more limited than other search implementations. It is therefore recommended to manually instantiate one of the child classes of pyzim.search.BaseSearch instead, and to use this method only if you don't care which search you get.

Currently, this method will try to provide you with a Xapian fulltext search, falling back to a Xapian title search and finally to a simple titlestart-based search.

Returns
`pyzim.search.BaseSearch`	a search object that can be used to search this ZIM
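The fallback order described above follows a try-in-turn pattern. In the sketch below, each factory is a hypothetical callable that either returns a search object or raises `LookupError` when its index is unavailable. The real search classes and the exceptions pyzim uses internally may differ.

```python
def first_available_search(factories):
    """Return the result of the first factory that does not raise LookupError."""
    errors = []
    for factory in factories:
        try:
            return factory()
        except LookupError as exc:  # this kind of index is missing, try the next one
            errors.append(exc)
    raise LookupError(f"no search available: {errors}")


def fulltext_search():
    raise LookupError("no fulltext index")


def title_search():
    raise LookupError("no title index")


def titlestart_search():
    return "titlestart search"


print(first_available_search([fulltext_search, title_search, titlestart_search]))
# titlestart search
```

The most capable option is listed first, so a caller who does not care which search it gets still receives the best one available.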

def has_entry_for_full_url(self, full_url):

Return True if this ZIM file contains an entry for the specified full URL.

Parameters
full_url:`str`	full URL of entry to check existence of

Returns
`bool`	True if an entry for this full URL exists. It may be a redirect.

def install_processor(self, processor):

Install a processor on this archive.

See pyzim.processor for more details.

Parameters
processor:`pyzim.processor.BaseProcessor`	processor to install

Raises
`TypeError`	on type error

def iter_articles(self, start=None, end=None):

Iterate over all article entries in this ZIM.

If start and end are specified, they reference the indexes of the first (inclusive) and last (exclusive) entry to return. In other words, this behavior matches the l[start:end] syntax.

This function does not guarantee any specific order of the entries it yields; however, they currently *should* be ordered by title.

Parameters
start:`int`	index of first entry to return (inclusive)
end:`int`	index of last entry to return (exclusive)

Yields
`pyzim.entry.BaseEntry`	the entries in the specified range
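Because start and end follow the l[start:end] convention, the same half-open range can be reproduced lazily over any iterable with `itertools.islice`. The sketch below uses plain strings in place of entries. It shows the slicing semantics only and makes no claim about how pyzim implements the iteration. Unlike iter_clusters, which documents an `IndexError`, `islice` silently stops at the end of the input.

```python
from itertools import islice


def iter_range(items, start=None, end=None):
    """Return a lazy iterator over items[start:end] (hypothetical; non-negative indexes only)."""
    return islice(items, start, end)


titles = ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"]
print(list(iter_range(titles, 1, 3)))                 # ['Beta', 'Gamma']
print(list(iter_range(titles, 1, 3)) == titles[1:3])  # True
print(len(list(iter_range(titles))))                  # 5
```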

def iter_clusters(self, start=None, end=None):

Iterate over all clusters in this ZIM.

If start and end are specified, they reference the indexes of the first (inclusive) and last (exclusive) clusters to return. In other words, this behavior matches the l[start:end] syntax.

Parameters
start:`int`	index of first cluster to return (inclusive)
end:`int`	index of last cluster to return (exclusive)

Yields
`pyzim.cluster.Cluster`	the clusters in the specified range

Raises
`IndexError`	on invalid/out of bound indexes

def iter_entries(self, start=None, end=None):

Iterate over all entries in this ZIM.

If start and end are specified, they reference the indexes of the first (inclusive) and last (exclusive) entry to return. In other words, this behavior matches the l[start:end] syntax.

This function does not guarantee any specific order of the entries it yields; however, they currently *should* be ordered by URL.

Previously, this method iterated by title, but this was changed following the removal of the v0 entry title index.

Parameters
start:`int`	index of first entry to return (inclusive)
end:`int`	index of last entry to return (exclusive)

Yields
`pyzim.entry.BaseEntry`	the entries in the specified range

def iter_entries_by_url(self, start=None, end=None):

Iterate over all entries in this ZIM, ordered by full URL.

If start and end are specified, they reference the indexes of the first (inclusive) and last (exclusive) entry to return. In other words, this behavior matches the l[start:end] syntax.

Parameters
start:`int`	index of first entry to return (inclusive)
end:`int`	index of last entry to return (exclusive)

Yields
`pyzim.entry.BaseEntry`	the entries in the specified range

def iter_mimetypes(self, as_unicode=False):

Iterate over all mimetypes in this archive.

Parameters
as_unicode:`bool`	if nonzero, decode mimetypes

Yields
`bytes` or `str` if as_unicode is nonzero	the mimetypes in this mimetype list
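In the openZIM format, the mimetype list is a sequence of zero-terminated strings ending with an empty string. Entries refer to a mimetype by its index in that list. The sketch below parses such a list from raw bytes to show what iter_mimetypes and get_mimetype_by_index operate on. It is a stand-alone illustration, not pyzim's pyzim.mimetypelist.MimeTypeList.

```python
def iter_mimetypes(data: bytes, offset: int = 0, as_unicode: bool = False):
    """Yield mimetypes from a zero-terminated list (hypothetical helper)."""
    while True:
        end = data.index(b"\x00", offset)
        if end == offset:  # an empty string terminates the list
            return
        mimetype = data[offset:end]
        yield mimetype.decode("utf-8") if as_unicode else mimetype
        offset = end + 1


raw = b"text/html\x00image/png\x00\x00"
mimetypes = list(iter_mimetypes(raw, as_unicode=True))
print(mimetypes)     # ['text/html', 'image/png']
print(mimetypes[1])  # image/png
```

Looking up an index past the end of the resulting list raises `IndexError`, matching the behaviour documented for get_mimetype_by_index.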

def new_cluster(self):

Add a new cluster to this archive.

NOTE: the cluster will not be cached until it has been written at least once. Consequently, autoflush will not work for it until then.

Returns
`pyzim.cluster.ModifiableClusterWrapper`	a new cluster

Raises
`pyzim.exceptions.ZimFileClosed`	if archive is already closed
`pyzim.exceptions.NonMutable`	if this zim file is not mutable

def remove_cluster_by_index(self, i):

Remove the cluster with the specified index.

Parameters
i:`int`	index of cluster to remove

def remove_entry_by_full_url(self, full_url, blob='empty'):

Remove the entry at the specified url.

You can specify how the associated blob should be treated using the blob parameter:

"keep": do nothing
"empty": empty the associated blob (see pyzim.cluster.ModifiableClusterWrapper.empty_blob)
"remove": delete the blob. Be warned that this will likely cause issues with other indexes.

If the entry has an associated blob, the cluster will be flushed.

Redirects pointing towards this url will also be removed. Buffered operations may interfere with this behavior, so be sure to call flush() beforehand.

Parameters
full_url:`str`	full url of entry to remove
blob:`str`	how to treat the associated blob

Raises
`TypeError`	on type error
`ValueError`	on value error.
`pyzim.exceptions.EntryNotFound`	if the target entry does not exist
`pyzim.exceptions.ZimFileClosed`	if archive is already closed
`pyzim.exceptions.NonMutable`	if this zim file is not mutable
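The warning about "remove" follows from how entries address blobs. In the ZIM format, a content entry stores a cluster number and a blob number. Deleting a blob shifts the numbers of all later blobs in that cluster, whereas emptying it keeps them stable. The toy model below represents a cluster as a plain list of bytes. It is not pyzim's cluster API.

```python
cluster = [b"first", b"second", b"third"]  # blob number == list index
entry_blob_number = 2                      # some entry points at b"third"

# "empty": replace the blob's content; the numbering is unchanged.
emptied = list(cluster)
emptied[0] = b""
print(emptied[entry_blob_number])  # b'third'

# "remove": delete the blob; later blob numbers shift down by one.
removed = list(cluster)
del removed[0]
print(len(removed) > entry_blob_number)  # False: blob 2 no longer exists
```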

def set_mainpage_url(self, url):

Set the mainpage url.

An entry for the specified url must already exist.

Parameters
url:`str` or `None`	non-full url of the mainpage (the mainpage is always in the `"C"` namespace). Set to `None` to disable.

Raises
`TypeError`	on type error
`pyzim.exceptions.ZimFileClosed`	if archive is already closed
`pyzim.exceptions.NonMutable`	if this zim file is not mutable

def set_metadata(self, key, value, mimetype='text/plain'):

Set metadata of the ZIM archive.

Parameters
key:`str`	key of metadata to set
value:`str` or `bytes`	value of metadata to set
mimetype:`str` or `bytes`	mimetype of the associated blob

Raises
`TypeError`	on type error
`ValueError`	on invalid value
`pyzim.exceptions.ZimFileClosed`	if archive is already closed
`pyzim.exceptions.NonMutable`	if this zim file is not mutable

def update_checksum(self):

Calculate and write the checksum.

NOTE: before calling this method, pyzim.header.Header.checksum_position should already be set to the new position and the header flushed. This method does not take care of this.

def write_cluster(self, cluster, cluster_num=None):

Update an existing cluster in this zim.

The cluster must already be part of this archive. Use pyzim.archive.Zim.new_cluster for creating new clusters.

Parameters
cluster:`ModifiableClusterWrapper`	cluster to write
cluster_num:`int` or `None`	the number/id of the cluster. Providing it speeds up the method.

Returns
`int`	the cluster number

Raises
`TypeError`	on type error
`ValueError`	on invalid values (e.g. negative cluster numbers)
`pyzim.exceptions.ZimFileClosed`	if archive is already closed
`pyzim.exceptions.NonMutable`	if this zim file is not mutable
`pyzim.exceptions.BindingError`	if cluster is not bound to self

def write_entry(self, entry, update_redirects=True, add_to_title_pointer_list=True):

Write an entry to this archive.

Parameters
entry:`pyzim.entry.BaseEntry`	entry to write
update_redirects:`bool`	if nonzero, update redirects to this article if necessary
add_to_title_pointer_list:`bool`	if nonzero (default), add the entry to the title pointer lists

Raises
`TypeError`	on type error
`pyzim.exceptions.ZimFileClosed`	if archive is already closed
`pyzim.exceptions.NonMutable`	if this zim file is not mutable
`pyzim.exceptions.BindingError`	if entry is not bound to self

cluster_cache: pyzim.cache.BaseCache

internal cache for clusters, mapping the full location to each cluster

compression_strategy: pyzim.compressionstrategy.BaseCompressionStrategy or None

compression strategy for assigning new items to clusters

entry_cache: pyzim.cache.BaseCache

internal cache for entries, mapping the full location to each entry

filelock: threading.Lock

a lock to ensure file access works with multiple threads. Acquire it whenever any work is done on the file.

header: pyzim.header.Header

header of this ZIM file.

mimetypelist: pyzim.mimetypelist.MimeTypeList

the mimetype list

mutable

overrides pyzim.modifiable.ModifiableMixIn.mutable

Undocumented

policy: pyzim.policy.Policy

policy to use

spaceallocator: pyzim.spaceallocator.SpaceAllocator or None

an object responsible for managing storage space within the ZIM file, may be None if ZIM is read-only

uncompressed_compression_strategy: pyzim.compressionstrategy.BaseCompressionStrategy or None

compression strategy for assigning new items to clusters that are explicitly uncompressed

@property

closed: bool

Return True if this archive has already been closed, False otherwise.

@property

counter: pyzim.counter.Counter or None

Return the counter used for counting mimetype occurrences.

If no counter is available, return None instead.

def _check_closed(self):

Check to ensure this ZIM file has not already been closed.

Raises
`pyzim.exceptions.ZimFileClosed`	when the ZIM file is already closed.

def _get_full_url_for_entry_at(self, location):

Return the full URL for the entry at the specified location.

This is used as the key function for the URL pointer list.

Parameters
location:`int`	location of the entry in the ZIM file

Returns
`bytes`	the full URL of the specified entry

def _get_namespace_title_for_entry_by_url_index(self, i):

Return the namespace+title for the entry at the specified index in the URL pointer list.

This is used as the key function for the entry title pointer list.

Parameters
i:`int`	index of the entry in the URL pointer list

Returns
`str`	the <namespace><title> of the entry

def _get_title_for_entry_by_url_index(self, i):

Return the title for the entry at the specified index in the URL pointer list.

This is used as the key function for the article pointer list.

Parameters
i:`int`	index of the entry in the URL pointer list

Returns
`str`	the title of the specified entry

def _init_caches(self):

Initializes internal caches according to policy.

def _init_new(self):

Initialize as a new, empty archive.

This instantiates the header, the pointer lists, etc.

TODO: find a better name for this method.

def _load_header(self):

Read the header.
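The openZIM specification defines an 80-byte little-endian header at the start of the archive. The sketch below unpacks it with the standard `struct` module. It illustrates the on-disk layout under that assumption and is not pyzim's pyzim.header.Header implementation. The field names follow the specification rather than pyzim's attributes, and `base_offset` mirrors the offset argument of the constructor.

```python
import struct
from collections import namedtuple

# Header layout as described by the openZIM specification (all little-endian).
HEADER_FORMAT = "<IHH16sIIQQQQIIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 80 bytes
ZIM_MAGIC = 72173914  # 0x044D495A

ZimHeader = namedtuple(
    "ZimHeader",
    "magic major minor uuid entry_count cluster_count path_ptr_pos "
    "title_ptr_pos cluster_ptr_pos mime_list_pos main_page layout_page checksum_pos",
)


def parse_header(data: bytes, base_offset: int = 0) -> ZimHeader:
    """Unpack the header located at base_offset (hypothetical helper)."""
    raw = data[base_offset:base_offset + HEADER_SIZE]
    if len(raw) != HEADER_SIZE:
        raise ValueError("not enough data for a ZIM header")
    header = ZimHeader(*struct.unpack(HEADER_FORMAT, raw))
    if header.magic != ZIM_MAGIC:
        raise ValueError("invalid magic number")
    return header


# A fake archive embedded 10 bytes into a larger file.
fake = b"\x00" * 10 + struct.pack(
    HEADER_FORMAT, ZIM_MAGIC, 6, 1, b"\x00" * 16, 0, 0, 80, 80, 80, 80, 0, 0, 80
)
print(parse_header(fake, base_offset=10).major)  # 6
```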

def _load_mimetypelist(self):

Load the mimetypelist.

def _load_pointerlists(self):

Load the URL and title pointer lists.

def _new_cluster_num(self):

Return the number of the next new cluster.

This also increments the internal counter.

Returns
`int`	the number of the next cluster

def _on_cluster_cache_leave(self, cluster_offset, cluster):

Called when a cluster leaves the cache.

If the archive is writable and autoflush is enabled, write the cluster if it is dirty.

Parameters
cluster_offset:`int`	total offset of cluster
cluster:`pyzim.cluster.Cluster`	the cluster leaving the cache

def _on_entry_cache_leave(self, full_location, entry):

Called when an entry leaves the cache.

If the archive is writable and autoflush is enabled, write the entry if it is dirty.

Parameters
full_location:`int`	the full offset of the entry
entry:`pyzim.entry.BaseEntry`	the entry leaving the cache
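This sketch shows the eviction hook, assuming a simple least-recently-used cache built on `collections.OrderedDict`. When an item is pushed out, the callback writes it only if it is dirty. `ToyLRUCache` and `ToyItem` are hypothetical. The sketch does not model pyzim's pyzim.cache classes, the writable check, or the autoflush policy setting.

```python
from collections import OrderedDict


class ToyLRUCache:
    """Hypothetical LRU cache that reports evicted items to a callback."""

    def __init__(self, capacity, on_leave):
        self.capacity = capacity
        self.on_leave = on_leave
        self._data = OrderedDict()

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            old_key, old_value = self._data.popitem(last=False)  # least recently used
            self.on_leave(old_key, old_value)


class ToyItem:
    def __init__(self, dirty):
        self.dirty = dirty


written = []


def on_leave(offset, item):
    # Mirrors the documented behaviour: only dirty objects are written out.
    if item.dirty:
        written.append(offset)
        item.dirty = False


cache = ToyLRUCache(capacity=2, on_leave=on_leave)
cache.put(100, ToyItem(dirty=True))
cache.put(200, ToyItem(dirty=False))
cache.put(300, ToyItem(dirty=False))  # evicts offset 100, which is dirty
cache.put(400, ToyItem(dirty=False))  # evicts offset 200, which is clean
print(written)  # [100]
```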

def _update_url_pointers(self, start, diff, edit_etpl=True, edit_atpl=True, update_redirects=True, skip=()):

Update references to URL pointers.

Several pointers refer to an entry by its position within the URL pointer list. Because that list is kept sorted, inserting or removing an entry shifts the positions of all subsequent entries, so these pointers will likely end up pointing to the wrong entries. This method updates those references.

Parameters
start:`int`	lowest URL pointer index that needs updating
diff:`int`	integer to update said references by (e.g. `1`)
edit_etpl:`bool`	if nonzero (default), update the entry title pointer list
edit_atpl:`bool`	if nonzero (default), update the article title pointer list
update_redirects:`bool`	if nonzero (default), update redirects
skip:`list` or `tuple` of `str`	list or tuple of full urls not to update recursively
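To see why the update is needed, consider a title pointer list that stores positions in the URL pointer list. Inserting a new URL at position p shifts every entry at or after p by one. Every stored position greater than or equal to start = p must therefore be increased by diff = 1. The sketch below shows only this index arithmetic on plain lists. The helper is hypothetical and ignores redirects and the skip parameter.

```python
import bisect


def insert_url(url_list, title_pointers, new_url):
    """Insert new_url into the sorted url_list and fix up title_pointers (hypothetical)."""
    start = bisect.bisect_left(url_list, new_url)
    url_list.insert(start, new_url)
    diff = 1
    # Every stored position at or after the insertion point now refers to the next slot.
    for i, pos in enumerate(title_pointers):
        if pos >= start:
            title_pointers[i] = pos + diff
    return start


urls = ["apple", "cherry", "date"]
title_pointers = [2, 0, 1]  # some title order; each value is an index into urls
resolved_before = [urls[p] for p in title_pointers]

new_index = insert_url(urls, title_pointers, "banana")
print([urls[p] for p in title_pointers] == resolved_before)  # True
```

Without the adjustment, the pointer that used to reference "cherry" (index 1) would silently resolve to "banana" after the insertion.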

_article_title_pointer_list: pyzim.pointerlist.TitlePointerList

a pointerlist to article entries ordered by title

_base_offset: int

base offset of ZIM archive within the underlying file object

_closed: bool

a flag indicating whether this archive has already been closed

_cluster_num: int

next cluster number to assign

_cluster_pointer_list: pyzim.pointerlist.SimplePointerList

a pointer list to the individual clusters

_counter: pyzim.counter.Counter

the counter counting mimetype occurrences

_entry_title_pointer_list: pyzim.pointerlist.TitlePointerList

a pointerlist to entries ordered by title

_f: file-like object

the underlying file object

_mode: str

the mode this archive has been opened in

_operation_buffer

Undocumented

_operationbuffer: pyzim.operationbuffer.OperationBuffer or None

buffer for not-yet-completable operations

_processors: list of pyzim.processor.BaseProcessor

list of processors that have been installed on this zim

_url_pointer_list: pyzim.pointerlist.OrderedPointerList

a pointer list to entries ordered by URL

_writable: bool

a flag indicating whether this archive can be written to.