The following logs are generated for the Open Library production server.
This is a log of every modification made on the Open Library website. It can be used to replay the changes on a different database in order to mirror the production database.
It is available on wiki-beta at /1/pharos/code/production/log.
It is also accessible via rsync at wiki-beta::pharos/log
This is a log of modifications to pages of type /type/edition. It can be used to import changes into the search engine.
It is available on wiki-beta at /1/pharos/code/production/pharos/booklog.
It is also accessible via rsync at wiki-beta::pharos/booklog
The infogami.infobase.logreader module provides utilities to read logs from local and remote machines. It uses rsync to read remote logs.
Create log reader:
from infogami.infobase.logreader import LogReader, LogFile, RsyncLogFile
# local log reader
reader = LogReader(LogFile("log"))
# remote log reader
reader = LogReader(RsyncLogFile("wiki-beta::pharos/log/", "log"))
Skip the log till a timestamp:
import datetime

reader.skip_till(datetime.datetime(2007, 7, 7, 7, 7, 7))
Iterate over log:
for entry in reader:
    print(entry)
Iterate over log infinitely:
import time

while True:
    for entry in reader:
        print(entry)
    # wait before checking for new entries
    time.sleep(10)
Read log in chunks:
while True:
    entries = reader.read_entries(1000)
    if entries:
        do_something(entries)
    else:
        break
Read log in chunks infinitely:
while True:
    entries = reader.read_entries(1000)
    if entries:
        do_something(entries)
    else:
        time.sleep(10)
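The polling loop above can be wrapped in a small helper so that the chunk size, the sleep interval, and a stop condition become parameters. This is only a sketch: `follow_log`, `max_idle_polls`, and the `StubReader` stand-in are hypothetical names and not part of infogami. The one assumption about the real reader is that `read_entries(n)` returns an empty (falsy) result when no new entries are available, as the loops above imply.

```python
import time


def follow_log(reader, handle, chunk_size=1000, poll_interval=10,
               max_idle_polls=None):
    """Feed chunks from reader to handle; return the number of entries seen.

    Stops after max_idle_polls consecutive empty reads, or never if None.
    (Hypothetical helper, not part of infogami.)
    """
    total = 0
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        entries = reader.read_entries(chunk_size)
        if entries:
            handle(entries)
            total += len(entries)
            idle = 0
        else:
            # Nothing new yet: count the empty read and wait before retrying.
            idle += 1
            time.sleep(poll_interval)
    return total


class StubReader:
    """Stand-in for a log reader, used only to exercise follow_log."""

    def __init__(self, items):
        self.items = list(items)

    def read_entries(self, n):
        # Hand out at most n items and keep the rest for the next call.
        chunk, self.items = self.items[:n], self.items[n:]
        return chunk
```

With a real reader, `follow_log(reader, do_something)` would behave like the infinite loop above, while passing `max_idle_polls=1` would give the finite variant that stops at the first empty read.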
Remembering and restoring log position:
# remember
pos = reader.logfile.tell()
# restore
reader.logfile.seek(pos)
For import/replay tasks, it is important to remember the log position at the end of each task, so that the next task can resume from where the previous one stopped. The timestamp of the last entry can also be used instead (for example, by passing it to skip_till).
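One way to carry the position from one task to the next is to keep it in a small state file. The following is a minimal sketch, not part of infogami: `save_position`, `load_position`, and the state file path are hypothetical. It assumes only that the log file object exposes `tell()` and `seek()` as used above, and that the value returned by `tell()` can be serialized as JSON (a number or a string, for instance).

```python
import json
import os


def save_position(logfile, state_path):
    """Write the current log position to state_path (hypothetical helper)."""
    pos = logfile.tell()
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"position": pos}, f)
    # Rename into place so a crash mid-write does not leave a truncated file.
    os.replace(tmp_path, state_path)
    return pos


def load_position(logfile, state_path):
    """Seek logfile to the saved position, if any; return it or None."""
    if not os.path.exists(state_path):
        return None
    with open(state_path) as f:
        pos = json.load(f)["position"]
    logfile.seek(pos)
    return pos
```

A task would call `load_position(reader.logfile, path)` before reading and `save_position(reader.logfile, path)` after the last chunk has been processed successfully.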
Ask Anand if you want to know any more details.