module documentation

Iteration utilities.

Function iter_by_cluster: Iterate over all entries in an archive, yielding their full URLs grouped by cluster and sorted by blob number.
def iter_by_cluster(zim):

Iterate over all entries in an archive, yielding their full URLs grouped by cluster and sorted by blob number.

This function offers a more efficient way to decompress all data within an archive. When coupled with a bit of caching and the right cluster type (e.g. pyzim.cluster.OffsetRememberingCluster), iterating this way prevents a cluster from being decompressed more than once.
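To illustrate why grouping helps, here is a minimal sketch that does not use pyzim at all. It assumes a hypothetical cache that keeps only the most recently decompressed cluster, and counts how many decompressions a given access order would trigger. The function name and the sample cluster numbers are made up for illustration.

```python
def count_decompressions(cluster_sequence):
    """Count decompressions when only the last-used cluster stays cached.

    Hypothetical model: each item is the cluster number an entry lives in.
    """
    cached = None
    count = 0
    for cluster in cluster_sequence:
        if cluster != cached:
            # Cache miss: the cluster would have to be decompressed again.
            count += 1
            cached = cluster
    return count


# Entries visited in an arbitrary order keep switching clusters...
arbitrary_order = [0, 1, 0, 1, 2, 0]
# ...whereas visiting them grouped by cluster touches each cluster once.
grouped_order = sorted(arbitrary_order)

print(count_decompressions(arbitrary_order))
print(count_decompressions(grouped_order))
```

Under this simplified model, the arbitrary order decompresses a cluster on every access, while the grouped order decompresses each distinct cluster exactly once.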

NOTE: this function reads all entries in a ZIM before it starts iterating. Consequently, it may incur significant I/O overhead and RAM usage.

Redirects are yielded as their own group after the other groups.
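The grouping behavior described above can be sketched as follows. This is not the pyzim implementation: Entry is a hypothetical stand-in for an archive entry, group_urls_by_cluster is an illustrative name, and yielding the clusters in ascending cluster number is an assumption.

```python
from collections import defaultdict
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """Hypothetical stand-in for an archive entry (not a pyzim class)."""
    url: str
    cluster: Optional[int]  # None marks a redirect, which has no blob
    blob: Optional[int]


def group_urls_by_cluster(entries: Iterable[Entry]) -> Iterator[Tuple[str, ...]]:
    """Yield one tuple of URLs per cluster, then one tuple of redirect URLs."""
    clusters = defaultdict(list)
    redirects = []
    # Every entry is collected before anything is yielded.
    for entry in entries:
        if entry.cluster is None:
            redirects.append(entry.url)
        else:
            clusters[entry.cluster].append((entry.blob, entry.url))
    # Assumption: clusters are emitted in ascending cluster number.
    for cluster_number in sorted(clusters):
        # Sorting the (blob, url) pairs orders URLs by blob number.
        yield tuple(url for _, url in sorted(clusters[cluster_number]))
    # Redirects form their own group after all cluster groups.
    if redirects:
        yield tuple(redirects)


sample = [
    Entry("A/b", 1, 0),
    Entry("A/a", 0, 1),
    Entry("A/c", 0, 0),
    Entry("A/r", None, None),
]
groups = list(group_urls_by_cluster(sample))
# groups == [("A/c", "A/a"), ("A/b",), ("A/r",)]
```

Collecting everything into dictionaries before yielding is what makes the up-front read necessary: no group is complete until the last entry has been seen.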

Parameters
zim: pyzim.archive.Zim: ZIM archive to iterate over
Yields
tuple of str: tuple of URLs of entries, one tuple per cluster, with the URLs sorted by blob number
Raises
TypeError: on type error