code.pl v1.14 - CGI script to convert HTML pages between Cyrillic charsets on the fly
Many Russian WWW servers run a modified Apache that returns different encodings depending on the server port or subdomain a client connects to. This is convenient for servers in Russia, but it cannot be used abroad for Web sites that run on virtual servers or simply rent space on an Internet provider's server. The following approach solves the problem with a single CGI script and requires no changes to the WWW server software.
code.pl features:
Can translate locally stored files
Can translate remote files, retrieving them via HTTP
Recognizes the source encoding from the <META HTTP-EQUIV="Content-Type" ...> tag inside the page
Adjusts the above tag for the new encoding, or deletes it for buggy browsers.
Charsets supported:
KOI8 - KOI8-R
WIN - WINDOWS-1251
MAC - Macintosh
DOS - DOS, alternative, CP-866
ISO - ISO-8859-5
UTF8 - UTF-8 (Unicode)
VOL - Volapuk (transliteration)
NOCS - KOI8-R, deleting Content-Type META tag, for buggy browsers
Put the script in your cgi-bin directory.
Edit the script to set its parameters for your configuration:
$path="..";                     # <==== path from cgi-bin to the server root
$defcode="WIN";                 # <==== default source encoding
$IndexFileName = 'index.html';  # default.htm or index.html, depending on your server
Refer to the script as: http://www.youserver.here/cgi-bin/code.pl/TAB/URL to be translated.
TAB is one of the encodings listed above.
TAB can also be of the form 'fromcode-tocode' to specify the encoding of the original file explicitly.
URL is either an absolute URL from the server root (don't forget to set $path in code.pl) or a full URL like http://cnn.com.
All relative references from this page to other Web pages will also be translated through the same code table (this is not yet supported for full URLs).
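The addressing scheme above can be illustrated with a short Perl sketch. It is not taken from code.pl: the helper name parse_request is hypothetical, and it assumes the part of the request after code.pl arrives as a PATH_INFO-style string such as "/win-koi8/vist/".

```perl
use strict;
use warnings;

# Split a PATH_INFO-style string into (source code, destination code, target).
# The source code is undef when TAB names only the destination.
# Hypothetical helper for illustration; not part of code.pl.
sub parse_request {
    my ($path_info) = @_;

    # First path component is TAB, the rest is the URL to translate.
    my ($tab, $target) = $path_info =~ m{^/([^/]+)(/.*)?$}
        or return;
    $target = '/' unless defined $target;

    # TAB is either "dst" or "src-dst".
    my ($src, $dst) = $tab =~ /^(\w+)-(\w+)$/
        ? (uc $1, uc $2)
        : (undef, uc $tab);

    # A target that starts with a scheme is a full URL: drop the leading slash.
    $target =~ s{^/(?=https?://)}{}i;

    return ($src, $dst, $target);
}

my ($src, $dst, $target) = parse_request('/win-koi8/vist/');
print "$src -> $dst: $target\n";   # WIN -> KOI8: /vist/
```

The sketch does not check that the codes are among the supported charsets; a real script would reject unknown values of TAB.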
Source encoding is determined by the following algorithm. The first matching rule from this list is selected.
If TAB is specified in src-dst form, src is the source encoding.
If a META tag like <META HTTP-EQUIV="Content-Type" CONTENT="text/plain; charset=win"> is present, its charset is used. The tag is updated during translation by replacing the source encoding with the destination one.
The default encoding is taken from the variable $defcode in code.pl.
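The three rules can be sketched in Perl as follows. This is an illustration under assumptions rather than the code used in code.pl: the function name source_encoding is hypothetical, the regular expression is a simplified matcher for the META tag, and the charset found in the tag is returned as written (uppercased) without being mapped to one of the codes listed earlier.

```perl
use strict;
use warnings;

my $defcode = 'WIN';   # default source encoding, as in the configuration above

# Return the source encoding for a page, trying the rules in order.
#   $src  - source code taken from a src-dst TAB, or undef
#   $html - content of the page
# Hypothetical helper for illustration; not part of code.pl.
sub source_encoding {
    my ($src, $html) = @_;

    # Rule 1: an explicit src-dst TAB wins.
    return uc $src if defined $src;

    # Rule 2: charset from the Content-Type META tag, if there is one.
    if ($html =~ /<META\s+HTTP-EQUIV\s*=\s*["']?Content-Type["']?[^>]*charset\s*=\s*([\w-]+)/i) {
        return uc $1;
    }

    # Rule 3: fall back to the configured default.
    return $defcode;
}

print source_encoding(undef, '<html><body>no meta tag</body></html>'), "\n";   # WIN
```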
It is recommended that you put <META HTTP-EQUIV="Content-Type" ...> on all your pages and choose only the destination encoding in URLs. Do not worry about old buggy browsers that cannot correctly display pages with this META tag: the NOCS encoding converts the page to KOI8 and deletes the tag.
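As a rough illustration of the two ways the tag can be handled (charset replaced, or the tag deleted as with NOCS), here is a minimal Perl sketch. The function name adjust_meta and its regular expressions are assumptions made for this example; code.pl may do this differently.

```perl
use strict;
use warnings;

# Rewrite the charset in the Content-Type META tag, or delete the tag
# altogether when $new_charset is undef (the NOCS case).
# Hypothetical helper for illustration; not part of code.pl.
sub adjust_meta {
    my ($html, $new_charset) = @_;

    if (defined $new_charset) {
        # Keep everything up to "charset=" and swap only the charset name.
        $html =~ s/(<META\s+HTTP-EQUIV\s*=\s*["']?Content-Type["']?[^>]*?charset\s*=\s*)[\w-]+/$1$new_charset/i;
    }
    else {
        # Remove the whole tag and any whitespace that follows it.
        $html =~ s/<META\s+HTTP-EQUIV\s*=\s*["']?Content-Type["']?[^>]*>\s*//i;
    }
    return $html;
}

my $page = '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1251"><p>text</p>';
print adjust_meta($page, 'koi8-r'), "\n";
print adjust_meta($page, undef), "\n";
```

Only the tag is touched here; converting the page body itself is a separate step.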
If you use Apache, you can add lines similar to these to your web server configuration files:
ScriptAlias /koi8 /home/www/neystadt/cgi-bin/code.pl/koi8
ScriptAlias /win /home/www/neystadt/cgi-bin/code.pl/win
ScriptAlias /dos /home/www/neystadt/cgi-bin/code.pl/dos
ScriptAlias /mac /home/www/neystadt/cgi-bin/code.pl/mac
ScriptAlias /iso /home/www/neystadt/cgi-bin/code.pl/iso
ScriptAlias /utf8 /home/www/neystadt/cgi-bin/code.pl/utf8
ScriptAlias /vol /home/www/neystadt/cgi-bin/code.pl/vol
ScriptAlias /lat /home/www/neystadt/cgi-bin/code.pl/vol
ScriptAlias /nocs /home/www/neystadt/cgi-bin/code.pl/nocs
From now on you will be able to translate URLs like http://www.neystadt.org/russia/ simply by prefixing the path with the encoding: http://www.neystadt.org/koi8/russia/ or http://www.neystadt.org/lat/russia/.
Note that code.pl automatically finds index.html if a directory name is given (as in the example above). The index file name can be changed with the $IndexFileName parameter in the script.
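The index file lookup might look roughly like the following Perl sketch. The function name resolve_path is hypothetical, and the sketch assumes the server root is already known as a filesystem path; how code.pl derives it from $path is not shown here.

```perl
use strict;
use warnings;

my $IndexFileName = 'index.html';   # as in the configuration above

# Map a URL path to a file below $root, appending the index file name
# when the path points to a directory.
# Hypothetical helper for illustration; not part of code.pl.
sub resolve_path {
    my ($root, $url_path) = @_;
    my $file = $root . $url_path;

    if (-d $file) {
        $file =~ s{/*$}{/};        # make sure there is exactly one trailing slash
        $file .= $IndexFileName;
    }
    return $file;
}

print resolve_path('.', '/'), "\n";   # ./index.html
```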
To translate http://www.neystadt.org/vist/ from Windows-1251 to KOI8:
http://www.neystadt.org/cgi-bin/code.pl/win-koi8/vist/
To translate output of the script http://www.neystadt.org/cgi-bin/miitqr.pl?abc from its default encoding to KOI8:
http://www.neystadt.org/cgi-bin/code.pl/koi8/http://www.neystadt.org/cgi-bin/miitqr.pl?abc
This script requires the LWP, Convert::Cyrillic and HTTP::Headers::UserAgent modules, available from CPAN or at http://www.neystadt.org/cyrillic/.
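Before installing the script, it may help to check that these modules can be loaded. The following Perl sketch is one way to do that; the helper name missing_modules is hypothetical, and the module names are passed to a string eval, so it should only be called with trusted names like the ones listed here.

```perl
use strict;
use warnings;

# Return those of the given module names that cannot be loaded.
# Hypothetical helper for illustration; not part of code.pl.
sub missing_modules {
    return grep { !eval "require $_; 1" } @_;
}

my @missing = missing_modules('LWP', 'Convert::Cyrillic', 'HTTP::Headers::UserAgent');
print(@missing
    ? "Missing modules: @missing\n"
    : "All required modules are installed.\n");
```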
All UNIXes, Windows NT
CGI/Filter