class documentation

class BaseProcessor(object):

Known subclasses: pyzim.counter.Counter

Base class for processors.

Each method is called during a specific operation on the ZIM archive. Every method should accept arbitrary keyword arguments (**kwargs), because the set of arguments passed is expected to change over time. Most of the default implementations are no-ops.

Some methods allow you to return a modified value. Beware that more than one BaseProcessor may return modified values, so you cannot be sure that the value you receive is the unmodified original. Ideally, write your processor so that subsequent processors can still work with the value it returns.
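The chaining behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: BaseProcessor here is a local stand-in so the snippet runs without pyzim, the keyword name "entry", the use of plain dicts as entries, and the run_before_entry_write dispatcher are all hypothetical, and treating a None return value as "unchanged" is a choice of this sketch rather than documented pyzim behaviour.

```python
class BaseProcessor(object):
    """Stand-in base class: the hook accepts **kwargs and does nothing."""

    def before_entry_write(self, **kwargs):
        return None


class StripTitle(BaseProcessor):
    """Trims the title of whatever entry it receives."""

    def before_entry_write(self, **kwargs):
        entry = kwargs.get("entry")  # "entry" is an assumed keyword name
        if entry is None:
            return None
        # Return a modified copy instead of assuming we hold the original.
        return dict(entry, title=entry["title"].strip())


class UppercaseTitle(BaseProcessor):
    """Uppercases the title; works on the output of earlier processors."""

    def before_entry_write(self, **kwargs):
        entry = kwargs.get("entry")
        if entry is None:
            return None
        return dict(entry, title=entry["title"].upper())


def run_before_entry_write(processors, entry, **extra):
    """Hypothetical dispatcher: each processor sees the previous result."""
    for processor in processors:
        result = processor.before_entry_write(entry=entry, **extra)
        if result is not None:
            entry = result
    return entry


written = run_before_entry_write(
    [StripTitle(), UppercaseTitle(), BaseProcessor()],
    {"title": "  Main Page "},
    future_argument=42,  # unknown keywords are tolerated via **kwargs
)
```

UppercaseTitle never sees the original title here, only the value StripTitle returned, which is why each processor avoids assumptions about what it receives.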

Method             after_close           Called when the archive has been closed.
Method             after_cluster_get     Called when pyzim.archive.Zim.get_cluster_at was called.
Method             after_cluster_write   Called after a cluster has been written.
Method             after_content_flush   Called during flush, after all content has been flushed.
Method             after_entry_get       Called when pyzim.archive.Zim.get_entry_at was called, before the entry is returned.
Method             after_entry_remove    Called after an entry has been removed.
Method             after_entry_write     Called after an entry was written.
Method             after_flush           Called after the archive has been flushed.
Method             before_close          Called when the archive will be closed.
Method             before_cluster_get    Called when pyzim.archive.Zim.get_cluster_at was called.
Method             before_cluster_write  Called before a cluster will be written.
Method             before_entry_get      Called when pyzim.archive.Zim.get_entry_at was called.
Method             before_entry_remove   Called before an entry will be removed.
Method             before_entry_write    Called before an entry will be written.
Method             before_flush          Called before the archive will be flushed.
Method             on_add_redirect       Called when a redirect will be added.
Method             on_install            Called when this processor is installed to a ZIM file.
Instance Variable  zim                   ZIM archive this processor is bound to
def after_close(self, **kwargs):

Called when the archive has been closed.

Parameters
**kwargs (dict): extra keyword arguments
def after_cluster_get(self, **kwargs):

Called when pyzim.archive.Zim.get_cluster_at was called.

The cluster may have been retrieved from the cache or read from disk.

Keyword arguments:

Parameters
**kwargs (dict): extra keyword arguments
Returns
pyzim.cluster.Cluster: the cluster that should be returned
def after_cluster_write(self, **kwargs):

Called after a cluster has been written.

Keyword arguments:

  • cluster (pyzim.cluster.Cluster): cluster that has been written
  • old_offset (int or None): offset the cluster had before
  • new_offset (int): offset the cluster has been written to
  • cluster_number (int): number of the cluster that has been written
Parameters
**kwargs (dict): extra keyword arguments
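As an illustration of how the keyword arguments listed for after_cluster_write might be consumed, here is a minimal sketch of a processor that records cluster relocations. BaseProcessor is a local stand-in so the snippet runs without pyzim, RelocationLog is a hypothetical name, and reading old_offset=None as "no previous offset" is an interpretation made for this sketch.

```python
class BaseProcessor(object):
    """Stand-in base class with a no-op after_cluster_write hook."""

    def after_cluster_write(self, **kwargs):
        pass


class RelocationLog(BaseProcessor):
    """Remembers every cluster that was written to a different offset."""

    def __init__(self):
        self.moves = []

    def after_cluster_write(self, **kwargs):
        # Read arguments with .get() so new or missing keywords do not break us.
        old_offset = kwargs.get("old_offset")
        new_offset = kwargs.get("new_offset")
        # None is treated here as "no previous offset", so nothing moved.
        if old_offset is not None and old_offset != new_offset:
            self.moves.append(
                (kwargs.get("cluster_number"), old_offset, new_offset)
            )


log = RelocationLog()
# Simulated hook calls; object() stands in for a pyzim.cluster.Cluster.
log.after_cluster_write(cluster=object(), old_offset=None, new_offset=1024, cluster_number=0)
log.after_cluster_write(cluster=object(), old_offset=1024, new_offset=4096, cluster_number=0)
log.after_cluster_write(cluster=object(), old_offset=4096, new_offset=4096, cluster_number=0, extra=True)
```

Only the second call is recorded, because it is the only one where an existing offset changed.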
def after_content_flush(self, **kwargs):
overridden in pyzim.counter.Counter

Called during flush, after all content has been flushed.

At this point, the various pointerlists may not have yet been flushed.

Parameters
**kwargs (dict): extra keyword arguments
def after_entry_get(self, **kwargs):

Called when pyzim.archive.Zim.get_entry_at was called, before the entry is returned.

Keyword arguments:

Parameters
**kwargs (dict): extra keyword arguments
Returns
pyzim.entry.BaseEntry: the entry that should be returned
def after_entry_remove(self, **kwargs):
overridden in pyzim.counter.Counter

Called after an entry has been removed.

Keyword arguments:

Parameters
**kwargs (dict): extra keyword arguments
def after_entry_write(self, **kwargs):
overridden in pyzim.counter.Counter

Called after an entry was written.

Keyword arguments:

Parameters
**kwargs (dict): extra keyword arguments
def after_flush(self, **kwargs):

Called after the archive has been flushed.

Parameters
**kwargs (dict): extra keyword arguments
def before_close(self, **kwargs):

Called when the archive will be closed.

Parameters
**kwargs (dict): extra keyword arguments
def before_cluster_get(self, **kwargs):

Called when pyzim.archive.Zim.get_cluster_at was called.

This is called at the beginning of said method.

Keyword arguments:

  • location (int): location/offset of the cluster to load
Parameters
**kwargs (dict): extra keyword arguments
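Because before_cluster_get runs at the beginning of get_cluster_at, it sees every request, whether the cluster later comes from the cache or from disk. A minimal sketch of a processor that counts requests per location could look like this. BaseProcessor is again a local stand-in so the snippet runs without pyzim, and ClusterRequestStats is a hypothetical name.

```python
from collections import defaultdict


class BaseProcessor(object):
    """Stand-in base class with a no-op before_cluster_get hook."""

    def before_cluster_get(self, **kwargs):
        pass


class ClusterRequestStats(BaseProcessor):
    """Counts how often each cluster location is requested."""

    def __init__(self):
        self.requests = defaultdict(int)

    def before_cluster_get(self, **kwargs):
        location = kwargs.get("location")
        if location is not None:
            self.requests[location] += 1


stats = ClusterRequestStats()
# Simulated hook calls for three lookups, two of them at the same offset.
for location in (2048, 2048, 8192):
    stats.before_cluster_get(location=location)
```

Such counts describe requests, not disk reads, since the hook fires before any cache lookup.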
def before_cluster_write(self, **kwargs):

Called before a cluster will be written.

Keyword arguments:

Parameters
**kwargs (dict): extra keyword arguments
Returns
pyzim.cluster.Cluster: the cluster that should be written
def before_entry_get(self, **kwargs):

Called when pyzim.archive.Zim.get_entry_at was called.

This is called at the beginning of said method.

Keyword arguments:

Parameters
**kwargs (dict): extra keyword arguments
def before_entry_remove(self, **kwargs):

Called before an entry will be removed.

Keyword arguments:

Parameters
**kwargs (dict): extra keyword arguments
def before_entry_write(self, **kwargs):

Called before an entry will be written.

Keyword arguments:

Parameters
**kwargs (dict): extra keyword arguments
Returns
pyzim.entry.BaseEntry: the entry that should be written
def before_flush(self, **kwargs):

Called before the archive will be flushed.

Parameters
**kwargs (dict): extra keyword arguments
def on_add_redirect(self, **kwargs):

Called when a redirect will be added.

Keyword arguments:

Parameters
**kwargs (dict): extra keyword arguments
def on_install(self, zim, **kwargs):
overridden in pyzim.counter.Counter

Called when this processor is installed to a ZIM file.

By default, this sets BaseProcessor.zim.

Parameters
zim (pyzim.archive.Zim): ZIM archive this processor is installed on
**kwargs (dict): extra keyword arguments
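Since the default on_install sets the zim attribute, a subclass that overrides it probably wants to call the parent implementation first. The following sketch shows that pattern. BaseProcessor is a local stand-in that mirrors only the documented default, InstallAware is a hypothetical name, and a plain object() takes the place of a real pyzim.archive.Zim instance.

```python
class BaseProcessor(object):
    """Stand-in mirroring the documented default: on_install stores the archive."""

    zim = None

    def on_install(self, zim, **kwargs):
        self.zim = zim


class InstallAware(BaseProcessor):
    """Runs its own setup after the base class has bound the archive."""

    def __init__(self):
        self.installed = False

    def on_install(self, zim, **kwargs):
        # Delegate first so self.zim is available to the setup code below.
        super(InstallAware, self).on_install(zim, **kwargs)
        self.installed = True


archive = object()  # placeholder for a pyzim.archive.Zim instance
processor = InstallAware()
processor.on_install(archive, some_future_argument=1)
```

Forwarding **kwargs to the parent keeps the override compatible if further keyword arguments are passed later.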
zim =
overridden in pyzim.counter.Counter

ZIM archive this processor is bound to