Base class for processors.
Each method is called during a specific operation on the Zim archive. Every method should accept any number of keyword arguments (**kwargs), because the set of arguments is expected to change over time. Most of the default implementations of these methods are no-ops.
Some methods allow you to return a modified value. Beware that more than one BaseProcessor may return modified values, so you cannot be sure that the value you receive is the unmodified original. Ideally, write your processor so that subsequent processors can also work on the value.
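The chaining behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: `SketchProcessor`, the hook name `before_entry_write`, the `run_chain` helper and the dict-based entry are hypothetical stand-ins, not pyzim's actual classes or dispatch code.

```python
class SketchProcessor:
    """Hypothetical stand-in for the documented base class."""

    def before_entry_write(self, **kwargs):
        # Default behaviour: hand the value back unchanged.
        return kwargs["entry"]


class StripTitle(SketchProcessor):
    def before_entry_write(self, **kwargs):
        # The entry received here may already have been modified
        # by a processor that ran earlier.
        entry = kwargs["entry"]
        return dict(entry, title=entry["title"].strip())


class UppercaseTitle(SketchProcessor):
    def before_entry_write(self, **kwargs):
        entry = kwargs["entry"]
        return dict(entry, title=entry["title"].upper())


def run_chain(processors, entry, **extra):
    """Assumed dispatch: feed each return value into the next processor."""
    for processor in processors:
        entry = processor.before_entry_write(entry=entry, **extra)
    return entry


# The entry is modelled as a plain dict purely for illustration.
result = run_chain([StripTitle(), UppercaseTitle()], {"title": "  home  "})
```

Because every hook reads its input from `kwargs` and returns a value of the same shape, the second processor works on the first one's output without knowing it exists.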
| Method | after | Called when the archive has been closed. |
| Method | after | Called when pyzim.archive.Zim.get_cluster_at was called. |
| Method | after | Called after a cluster has been written. |
| Method | after | Called during flush, after all content has been flushed. |
| Method | after | Called when pyzim.archive.Zim.get_entry_at was called, before the entry is returned. |
| Method | after | Called after an entry has been removed. |
| Method | after | Called after an entry was written. |
| Method | after | Called after the archive has been flushed. |
| Method | before | Called when the archive will be closed. |
| Method | before | Called when pyzim.archive.Zim.get_cluster_at was called. |
| Method | before | Called before a cluster will be written. |
| Method | before | Called when pyzim.archive.Zim.get_entry_at was called. |
| Method | before | Called before an entry will be removed. |
| Method | before | Called before an entry will be written. |
| Method | before | Called before the archive will be flushed. |
| Method | on | Called when a redirect will be added. |
| Method | on | Called when this processor is installed to a ZIM file. |
| Instance Variable | zim | zim archive this processor is bound to |
Called when pyzim.archive.Zim.get_cluster_at was called.
The cluster may have been retrieved from the cache or read from disk.
Keyword arguments:
- cluster (pyzim.cluster.Cluster): cluster that has been loaded
| Parameters | |
| **kwargs: dict | extra keyword arguments |
| Returns | |
| pyzim.cluster.Cluster | the cluster that should be returned |
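As a minimal sketch of a hook that observes a loaded cluster and passes it on unchanged, consider the class below. The class name `ClusterLoadCounter` and the method name `after_cluster_get` are hypothetical; only the `cluster` keyword argument and the "return the cluster" contract come from the description above.

```python
class ClusterLoadCounter:
    """Counts how often a cluster is handed to the hook (illustrative only)."""

    def __init__(self):
        self.loads = 0

    def after_cluster_get(self, **kwargs):
        # Read only the documented argument; ignore any others so the
        # hook keeps working if more keyword arguments are added later.
        cluster = kwargs["cluster"]
        self.loads += 1
        # Return the cluster so that the caller still receives one.
        return cluster
```

Returning the received object unchanged keeps the hook transparent to the caller and to any processor that runs after it.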
Called after a cluster has been written.
Keyword arguments:
- cluster (pyzim.cluster.Cluster): cluster that should be written
- old_offset (int or None): offset the cluster had before
- new_offset (int): offset the cluster has been written to
- cluster_number (int): number of the cluster that has been written
| Parameters | |
| **kwargs: dict | extra keyword arguments |
pyzim.counter.Counter
Called during flush, after all content has been flushed.
At this point, the various pointer lists may not yet have been flushed.
| Parameters | |
| **kwargs: dict | extra keyword arguments |
Called when pyzim.archive.Zim.get_entry_at was called, before the entry is returned.
Keyword arguments:
- location (int): location/offset of the entry to load
- entry (pyzim.entry.BaseEntry): entry that should be returned
- allow_cache_replacement (int): see pyzim.archive.Zim.get_entry_at
| Parameters | |
| **kwargs: dict | extra keyword arguments |
| Returns | |
| pyzim.entry.BaseEntry | the entry that should be returned |
pyzim.counter.Counter
Called after an entry has been removed.
Keyword arguments:
- full_url (str): full url of entry to remove
- blob (str): See pyzim.archive.Zim.remove_entry_by_full_url
- entry (pyzim.entry.BaseEntry): entry that has been removed
- is_article (bool): whether the removed entry was an article or not.
| Parameters | |
| **kwargs: dict | extra keyword arguments |
pyzim.counter.Counter
Called after an entry was written.
Keyword arguments:
- entry (pyzim.entry.BaseEntry): entry that should be written
- old_entry (pyzim.entry.BaseEntry or None): previous, unmodified entry, if any
- old_offset (int or None): offset of old entry, if any
- new_offset (int): offset of new entry
- is_new_entry (bool): whether this is a new entry or not
- add_to_title_pointer_list (bool): See pyzim.archive.Zim.write_entry
- update_redirects (bool): See pyzim.archive.Zim.write_entry
| Parameters | |
| **kwargs: dict | extra keyword arguments |
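The keyword arguments listed above are enough to keep simple write statistics. The following sketch assumes a hypothetical hook name `after_entry_write` and a hypothetical class `WriteStats`; it uses only the documented `is_new_entry`, `old_offset` and `new_offset` arguments and is not pyzim's own counter implementation.

```python
class WriteStats:
    """Tracks new and relocated entries (illustrative only)."""

    def __init__(self):
        self.new_entries = 0
        self.moved_entries = 0

    def after_entry_write(self, **kwargs):
        if kwargs["is_new_entry"]:
            # A new entry has no previous offset to compare against.
            self.new_entries += 1
        elif kwargs["old_offset"] != kwargs["new_offset"]:
            # An existing entry was rewritten at a different offset.
            self.moved_entries += 1
        # Nothing is returned: no return value is documented for this hook.
```

An existing entry rewritten at its old offset changes neither counter.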
Called when pyzim.archive.Zim.get_cluster_at was called.
This is called at the beginning of said method.
Keyword arguments:
- location (int): location/offset of the cluster to load
| Parameters | |
| **kwargs: dict | extra keyword arguments |
Called before a cluster will be written.
Keyword arguments:
- cluster (pyzim.cluster.Cluster): cluster that should be written
| Parameters | |
| **kwargs: dict | extra keyword arguments |
| Returns | |
| pyzim.cluster.Cluster | the cluster that should be written |
Called when pyzim.archive.Zim.get_entry_at was called.
This is called at the beginning of said method.
Keyword arguments:
- location (int): location/offset of the entry to load
- allow_cache_replacement (int): see pyzim.archive.Zim.get_entry_at
| Parameters | |
| **kwargs: dict | extra keyword arguments |
Called before an entry will be removed.
Keyword arguments:
- full_url (str): full url of entry to remove
- blob (str): See pyzim.archive.Zim.remove_entry_by_full_url
| Parameters | |
| **kwargs: dict | extra keyword arguments |
Called before an entry will be written.
Keyword arguments:
- entry (pyzim.entry.BaseEntry): entry that should be written
- add_to_title_pointer_list (bool): See pyzim.archive.Zim.write_entry
- update_redirects (bool): See pyzim.archive.Zim.write_entry
| Parameters | |
| **kwargs: dict | extra keyword arguments |
| Returns | |
| pyzim.entry.BaseEntry | the entry that should be written |
Called when a redirect will be added.
Keyword arguments:
- entry (pyzim.entry.RedirectEntry): redirect entry that will be added
| Parameters | |
| **kwargs: dict | extra keyword arguments |
pyzim.counter.Counter
Called when this processor is installed to a ZIM file.
By default, this sets BaseProcessor.zim.
| Parameters | |
| zim: pyzim.archive.Zim | ZIM archive this processor is installed on |
| **kwargs: dict | extra keyword arguments |
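Since the default implementation sets the `zim` attribute, a subclass that overrides the install hook probably wants to call the parent implementation first. The sketch below uses a hypothetical stand-in base class and a hypothetical hook name `on_install`; only the `zim` parameter and the "sets zim by default" behaviour are taken from the description above.

```python
class SketchProcessor:
    """Hypothetical stand-in for the documented base class."""

    zim = None

    def on_install(self, zim, **kwargs):
        # Mirrors the documented default: remember the archive.
        self.zim = zim


class InstallAware(SketchProcessor):
    def __init__(self):
        self.installed = False

    def on_install(self, zim, **kwargs):
        # Call the parent first so that self.zim is set before use.
        super().on_install(zim, **kwargs)
        self.installed = True


archive = object()  # placeholder for a real archive object
processor = InstallAware()
processor.on_install(archive)
```

Forwarding `**kwargs` to the parent keeps the override working if further keyword arguments are introduced later.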