BioJapan.de | Armin | E-Dictionary | Links |
Front | Technical | Download | FAQ | Examples | Reviews |
We put together the one and only truly useful Japanese-German electronic
dictionary. Other languages can be handled in the same way.
A useful electronic dictionary tool for foreigners in Japan should have the following
functions:
ZOK, short for Zaurus Otaku Kurabu, is a small group of (actually not so otaku) expatriates in Tokyo who did this work: Kurt Fischer is the one who actually cut most of the knots, Uli Plate had plenty of good ideas, while I struggled to keep up with the news somehow and wrote some Perl scripts and this page. ZOK is also a mailing list.
Sharp has been building its "Zaurus" PDA since 1995 or so, for the Japanese market only. Initially, the Zaurus became popular as an electronic organizer, featuring a scheduler, address book, note pad and a small but useful set of dictionaries: Japanese-Japanese, Kanji-Japanese, Japanese-English and English-Japanese. The key to the Zaurus' popularity was presumably its direct kanji entry utility, with automatic handwriting recognition that actually works.
It must have been with the Igeti-series around 1997 that Sharp adopted the compact flash memory card (CF) standard as the Zaurus' extension port. With the growing popularity of digital cameras, CF cards have become a cheap mass product and make it possible to store and handle an amazing amount of data on the Zaurus - such as a "bookshelf full" of dictionaries.
While the electronic organizer functions haven't changed much, new development is heading towards multimedia gadgets. Recent models can not only handle email and web browsing but also play and record music (MP3) and video (MPEG4) straight from your TV or hifi analog output. Java is the Zaurus' new programming platform, which should make porting software and communicating with PCs and other machines increasingly easy. (Note, however, that the software we use still comes from the pre-Java era and runs only on the Zaurus OS.)
From newspaper reports, I gather that Sharp will start selling the Zaurus in the US and Europe in March 2002. The international model will run the Linux operating system, while the Japanese version will retain Sharp's proprietary OS. The international version, so I imagine, will neither include the handwriting recognition nor run our software. Think of it as a totally different machine. Unfortunately, there is NOTHING in terms of English information about the Japanese models of the Zaurus: no English handbooks, and only two or three minor application programs in English.
More information on Zaurus models
The larger a flash card you have, the better: The software will handle up to 12 add-on dictionaries. We have never had a problem with the Zaurus failing to address a memory card properly, either because of its size or for any other reason. We don't know the upper size limit. The MI-110 works with 128MB CF cards; the MI-P10, E1, L1 and E21 manage 256MB for sure. If you have larger memory cards, please stick them in and let us know of your success - my guess is that they will work. The word is that IBM's Microdrive devices can NOT be used. Should you try it out, let us know! We have never tried to use the new and expensive high-speed CF cards.
We first formatted the CF by plugging it into the Zaurus. Then, we plugged
the CF into the PCMCIA port of a PC under Japanese Windows98, using the appropriate adapter.
There should be a "__zaurus" subdirectory with lots of small files. Deposit any files in this directory and then plug it back into the Zaurus.
Note: The CF card also works with Macintosh, Linux and digital cameras. It's a really neat tool to exchange files between any of those machines. Because I don't have my own digital camera, at parties I often borrow a friend's camera. I plug in the dictionary card from the Zaurus, take a few pictures and can readily see them on the Zaurus (in b/w on my model; the software takes some time to open large pictures). At home, then, I plug the CF into my computer to upload the pictures to the web or to my hard disk. (By the way, no digital camera or other device has ever had a problem working with any no-name 256MB CF card I plugged in.)
We have not tried, but technically you should have no problem. Click here for a handy tool by Silas Brown to display Japanese code as graphics on your browser - this will enable you to follow the Japanese links from this page.
Vector has quite a lot of good free software for the Zaurus to download. ZPDVIEW, the dictionary tool we use, is written by Ogasawara Hiroyuki (小笠原博之), and can be downloaded (scroll down about 3/4 of the page to ZPDVIEW). Deposit the downloaded file WOBP144.ZAC in the _Zaurus directory on the CF.
Slide the CF into the Zaurus. Go to "MORE Soft", click "Card" to find a file PDIC144.ZAC. With the "Tenkai" button (top right) you can unpack and install the dictionary search engine. This makes the program "ZPDVIEW v1.44" appear in MORE Soft. Click it and then jikko (top right) to run it. What you can't see in MORE Soft is that the card's _Zaurus directory also contains instructions, which you can read back on your PC, for example. Find a longer explanation in my FAQ list.
Compared to the Zaurus native dictionary, ZPDVIEW lacks the easy-to-read large display, but it does have a history function. You will probably end up using both dictionaries in combination. With good dictionary files that show you the kanji, hiragana and translation at a glance, though, I have found myself sticking to ZPDVIEW more and more.
Now that you have the tool, what remains to be done is to put the actual vocabulary files onto the CF. The files must be in PDIC format and renamed to WOBP0000.DAT to WOBP0011.DAT for the maximum of 12 dictionaries you can install. In a file WOBPMENU.TXT, list the names of the dictionaries you used, one per line. As an example, the contents of my WOBPMENU.TXT file are:
EDICT
和独JT
漢字DIC
広辞苑
名前
KanjiABC
giongo
EDICT-2
[tab character]
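The renaming and the menu file can also be produced by a small program. The following Python sketch is not part of ZPDVIEW or of my script; the function name install_dictionaries is made up for illustration, and it assumes that WOBPMENU.TXT should be Shift-JIS text with Windows line ends, which I have not verified against ZPDVIEW's own documentation.

```python
import shutil
from pathlib import Path

MAX_DICTIONARIES = 12  # WOBP0000.DAT to WOBP0011.DAT


def install_dictionaries(dictionaries, card_dir):
    """Copy PDIC files into card_dir under the WOBPnnnn.DAT names.

    dictionaries: list of (menu_name, path_to_pdic_file) pairs in menu order.
    Returns the list of file names written for the dictionaries.
    """
    if len(dictionaries) > MAX_DICTIONARIES:
        raise ValueError("at most 12 dictionaries can be installed")
    card_dir = Path(card_dir)
    written = []
    menu_lines = []
    for index, (menu_name, source) in enumerate(dictionaries):
        target = card_dir / f"WOBP{index:04d}.DAT"
        shutil.copyfile(source, target)
        written.append(target.name)
        menu_lines.append(menu_name)
    # Assumption: Shift-JIS encoding and CRLF line ends for the menu file.
    menu_text = "".join(name + "\r\n" for name in menu_lines)
    (card_dir / "WOBPMENU.TXT").write_bytes(menu_text.encode("shift_jis"))
    return written
```

Call it with the dictionaries in the order in which they should appear in the menu, and with the card's Zaurus directory as card_dir. The first dictionary becomes WOBP0000.DAT, the second WOBP0001.DAT, and so on.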
Have a look at the Screen shots, too.
New: ZPDVIEW for Windows CE has just been released. Now you can use all our dictionaries on the iPaq, too. That is, on the Japanese iPaq and other JWinCE handhelds. Note that handwriting recognition is not part of ZPDVIEW. I know that such tools exist for WinCE, but don't ask me for details - ask a WinCE person.
Problem: ZPDVIEW for Windows CE on the Toshiba Genio appears to hide the first two lines of every entry, rendering the dictionary quite useless. I suppose the author, Ogasawara, will fix this bug soon.
New: ZPDVIEW1.5 - a new version has just been released. The main features added are larger fonts and tools to save words you looked up to a vocabulary list for review.
Dictionary data needs to be converted in a number of steps. The data formats are:
dictionary -->(1)--> plain text data -->(2)--> properly formatted text data -->(3)--> PDIC (Zaurus usable) format data
The conversion software used is:
(1) dictionary-specific, for example Filemaker (WadokuJT) or DDWIN (Epwing CD-ROM).
(2) This step needs hands-on programming or tweaking. So I wrote a Perl script, which you are welcome to download and use. It is also possible to trick popular software packages such as MS Word and Excel into doing many of the conversions, but the result is a compromise.
(3) PDIC
I can't offer ready-to-use dictionary files for download here, for two reasons: Firstly, it would conflict with the copyright of some of the data, and even where it would not, it is always cleaner to get data directly from its author and owner. Secondly, I do not own enough disk space on this web server. What I can do instead is let you download one dictionary, Jim Breen's KanjiDIC, in all states of processing for demonstration: Here is the original file, here my compiled files. Unpack them and copy the two files starting with "w" to the CF _zaurus directory. Access the kanji dictionary from ZPDVIEW on the Zaurus. The remaining file, kanjidic.txt, is the intermediate one-line text file.
The task is now to convert your dictionary files to PDIC format. Unfortunately, PDIC is not a simple text format but rather a special dictionary format that includes a search index. I don't know how to generate PDIC from scratch, so you have no choice but to use the proper software. Just download and install it on your PC.
Once you have PDIC running, go to Tools - Jisho no Henkan to import your dictionary files and reformat them properly. PDIC will also let you combine several dictionary files into one.
So, we need to convert dictionary files into a format that can be imported into PDIC. The two input file formats for PDIC that we used are "*.csv", for "comma-separated values", and "one-line text" (*.txt). One-line text allows line breaks and thus a nicer display, while CSV, for some reason, turns out to scroll more smoothly in ZPDVIEW. I first used CSV, as described below. Later on, I programmed the above-mentioned Perl script, which can generate one-line format from a variety of dictionaries. This page uses CSV format as an example.
CSV format contains one entry word per line. Its definition in the PDIC case is:
"field1","field2","field3",4,5,6,"field7"
"English","Japanese","Example of use",x,y,z,"Pronunciation"
where x is a number indicating the level of difficulty of the word, while y and z set the "dark" and "practice" flags, respectively, if not set to zero (=default). Fortunately, we only have to worry about the first three fields, because the Zaurus will not display the others anyway - they are only meaningful for the PC version of PDIC. What matters is that the first field contains the keyword which the ZPDVIEW search engine will search for. Transcript and translation go into fields 2 and 3. The Zaurus display won't care much whether you put everything into field 2 or use field 3 as well - as long as you are consistent and do it the same way for all entries. However, how you use fields 2 and 3 does make a difference when merging several dictionaries.
"きちょう [3]","帰朝","Heimkehr nach Japan.",""
"きちょう [4]","記帳","Eintragung; Registrierung.",""
"きちょう [5]","貴重","Kostbarkeit; Hochwertigkeit; Unschaetzbarkeit.",""
"ぎちょう","議長","Vorsitzender; Praesident; Sprecher (einer Versammlung).",""
"きちょうえんぜつ","基調演説","programmatische Erklaerung; Grundsatzrede; Keynote.",""
"きちょうする","帰朝する","nach Japan zurueckkehren.",""
"きちょうする","記帳する","eintragen; buchen.",""
One-line text is the format of the kanjidic.txt file you downloaded above. For illustration, here are a few lines of the combined ("the lot") EDICT file, inverted to the English-Japanese direction with my script:
cleek (golf) /// クリーク
clef(musical) /// 音部記号 - おんぶきごう
cleft grafting /// 枝接ぎ - えだつぎ
clemency /// 助命 - じょめい
clergyman /// 牧師 - ぼくし
clerical desk /// 事務机 - じむづくえ
clerk /// 店員 - クラーク, てんいん \ 書記 - しょき \ 事務員 - じむいん \ 局員 - きょくいん \ 官吏 - かんり
clerk at the information desk /// 案内係 - あんないがかり
Once you convert this file and look at it on the Zaurus, you will see that space-///-space divides the keyword from the explanation, while space-\-space generates line breaks within the explanation. This short example also shows that my script has bugs: The first kanji word for clerk has only one pronunciation, tenin, while the katakana "kuraaku" should really appear on a separate line as an independent translation. You will find many such mistakes, sorry about that - but I can live with them, and I hope you can too.
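To make the two separators of the one-line format concrete, here is a minimal Python sketch that splits an entry into its keyword and display lines, and puts it back together. It is only an illustration of the format as described on this page, not the code PDIC or ZPDVIEW use; the function names are made up.

```python
KEYWORD_SEPARATOR = " /// "  # space-///-space: keyword | explanation
LINE_SEPARATOR = " \\ "      # space-\-space: line break in the explanation


def parse_oneline(line):
    """Split a one-line entry into (keyword, list of display lines)."""
    keyword, found, explanation = line.rstrip("\r\n").partition(KEYWORD_SEPARATOR)
    if not found:
        raise ValueError("no ' /// ' separator in: " + line)
    return keyword, explanation.split(LINE_SEPARATOR)


def format_oneline(keyword, display_lines):
    """Build a one-line entry from a keyword and its display lines."""
    return keyword + KEYWORD_SEPARATOR + LINE_SEPARATOR.join(display_lines)
```

For the "clerk" entry, parse_oneline returns the keyword "clerk" and a list of five display lines, one for each Japanese translation.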
How do you generate a proper .csv file? Here are a few examples. The rest is up to your resourcefulness. If you know where to find good dictionary files in all languages, please let us know! We would like to collect them here.
With the Wa-Doku running on your PC, get the entire dictionary into the search window (leave the search entry blank and press "suche"), then choose File -> Export. You must specify which data fields you want exported and in which order. For German, choose the last field, which has the umlauts converted to ae, oe, ue so that the Zaurus can display them. As file type, choose .csv, of course. I exported the dictionary 3 times: kanji-hiragana-German, hiragana-kanji-German, and German-kanji-hiragana. (That's necessary because the Zaurus searches only the first field of each line.) The three files were processed as above to PDIC format, renamed WOBP0005.DAT, WOBP0006.DAT and WOBP0007.DAT, each about 7-11MByte in size, saved on a CF card and are now in good use on my Zaurus. Searching is faster than I can blink. It often happens that the Zaurus cannot find a German word, though. That is because the dictionary is designed for use in the Japanese-German direction: Many of the German explanations start with such words as "sich ..." or "einen ...", which makes the actual keyword unfindable for the Zaurus.
My above-mentioned Perl script reduces this conversion problem using a few tricks. Far from perfect, the result is quite usable in my opinion. The script also optimizes the Wadoku dictionary for use on the Zaurus. If you use the script, you only need to export WadokuJT from Filemaker once, namely in the kanji-hiragana-German direction, in tab-separated list format. You will notice that the German-Japanese dictionary generated has all nouns in small letters. This is meant to ease use on ZPDVIEW, as it eliminates the need to enter capital letters.
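The idea of deriving all three directions from a single tab-separated export can be sketched in a few lines of Python. This is not my Perl script and it leaves out its tricks for "sich ..." and similar openers; it only shows the reordering, the umlaut replacement and the lower-casing of the German keyword. The function names are made up, the input is assumed to have exactly the columns kanji, hiragana, German, and the empty fourth field simply copies the sample CSV lines shown earlier on this page.

```python
import csv
import io

# ae, oe, ue, ss replacements, since the Zaurus cannot display umlauts
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                         "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"})

# Column order for each search direction (0=kanji, 1=hiragana, 2=German)
ORDERS = {"kanji": (0, 1, 2), "hiragana": (1, 0, 2), "german": (2, 0, 1)}


def transliterate(text):
    """Replace German umlauts and sharp s by plain ASCII letters."""
    return text.translate(UMLAUTS)


def reorder(tsv_text, direction):
    """Turn tab-separated kanji/hiragana/German lines into CSV text
    whose first field is the search keyword for the given direction."""
    order = ORDERS[direction]
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in tsv_text.splitlines():
        columns = line.split("\t")
        if len(columns) != 3:
            continue  # skip blank or malformed lines
        columns[2] = transliterate(columns[2])
        fields = [columns[i] for i in order]
        if direction == "german":
            fields[0] = fields[0].lower()  # no capitals needed when searching
        writer.writerow(fields + [""])
    return out.getvalue()
```

Calling reorder three times, once per direction, gives the three CSV files to feed into PDIC. Whether PDIC accepts the doubled quotation marks that csv.writer produces for text containing quotes is something I have not checked.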
Test WaDokuJT!
サリ、キシィ、ケ [、オ、キ、キ、皃ケ] /to indicate/to show/to point to/
サリ、ケ [、オ、ケ] /to point/to put up umbrella/to play/
サリ、ホタ・[、讀モ、ホ、ユ、キ] /knuckle/
サリーオ [、キ、「、ト] /finger pressure massage or therapy/
You probably notice that the Japanese character display is messed up here: EDICT files are EUC encoded, which is the UNIX way of coding Japanese characters. The Zaurus understands only Shift-JIS encoded characters. At some point (preferably at the end, once you have generated your proper .csv file, before feeding it into PDIC), you need to change the character encoding. There ought to be plenty of conversion utilities around; for example, MS Word can do the job.
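If you have Python at hand, the encoding change takes only a few lines. This sketch is just one possible conversion utility, not the one I used; the function names are made up, and it assumes the input really is EUC-JP. Characters that have no Shift-JIS code are replaced by a question mark rather than stopping the conversion.

```python
def euc_to_sjis(data):
    """Re-encode EUC-JP bytes as Shift-JIS bytes."""
    text = data.decode("euc_jp")
    # Characters without a Shift-JIS code become "?" instead of raising.
    return text.encode("shift_jis", errors="replace")


def convert_file(source_path, target_path):
    """Read an EUC-JP file and write a Shift-JIS copy of it."""
    with open(source_path, "rb") as source:
        converted = euc_to_sjis(source.read())
    with open(target_path, "wb") as target:
        target.write(converted)
```

Run convert_file on the finished .csv or one-line text file, then import the converted copy into PDIC.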
EDICT lends itself well to manipulations such as dictionary conversion because of its concise and logically consistent format. I was surprised how well it works in the English-Japanese direction after conversion with my script. The script has special conversion routines for ENAMDICT and for KanjiDIC, Jim Breen's kanji dictionary file, which is actually in quite a different format from the rest of the EDICT family.
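As an illustration of why the EDICT format is so easy to work with, here is a minimal Python sketch of the inversion to the English-Japanese direction, producing one-line text. It is a simplified stand-in for my Perl script, not a port of it: the name invert_edict is made up, it assumes already decoded lines of the form "word [reading] /gloss/gloss/", and it does nothing about part-of-speech markers or other annotations inside the glosses.

```python
import re
from collections import defaultdict

# word, optional [reading], then the glosses between slashes
EDICT_LINE = re.compile(r"^(\S+)(?: \[([^\]]*)\])? /(.*)/\s*$")


def invert_edict(lines):
    """Return sorted one-line entries keyed by the English glosses."""
    index = defaultdict(list)
    for line in lines:
        match = EDICT_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like EDICT entries
        word, reading, glosses = match.groups()
        japanese = f"{word} - {reading}" if reading else word
        for gloss in glosses.split("/"):
            gloss = gloss.strip()
            if gloss and japanese not in index[gloss]:
                index[gloss].append(japanese)
    return [gloss + " /// " + " \\ ".join(index[gloss])
            for gloss in sorted(index)]
```

Every gloss becomes a keyword of its own, and all Japanese words sharing that gloss are collected under it, separated by the line-break marker.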
Test EDICT!
Webster's Revised Unabridged Dictionary
Version published 1913 by the C. & G. Merriam Co., Springfield, Mass.
Under the direction of Noah Porter, D.D., LL.D.
I recently compiled a Webster for the Zaurus by writing a dedicated Perl program, but I haven't tested the dictionary enough to tell whether or not it contains major bugs.
DDWIN is an EPWING reader. First put your dictionary CD into your drive, then start DDWIN. It will detect the dictionaries on the CD automatically. Click on the proper dictionary, then click on "zenbun" (全文) to search without an entry.
Next, click on the Edit menu (編集), then call the editor (エデイタ起動). In the popup window, click the middle radio button (該当項目すべて) and execute to export the dictionary in raw text format.
Looking at the file, you will realize that there are a lot of special character codes and commands enclosed in parentheses. You will not get around using a program to fix this: First, replace the important codes by appropriate characters - German umlauts by ae, oe, ue, ss, for example. Then throw out the rest of the garbage and put in the commas and quotation marks to construct a nice .csv file for PDIC.
Shimotsuki san has done the hack before and explains about it on his web page. He also has produced a script to do the job, but I made my own version included in the above-mentioned script of mine. Shimotsuki uses jperl, while I use perl with the Jcode module. Use whichever you like.
I appreciate your help in improving this page: Please report errors and ambiguities. If you are smart with graphics (I am not!), please capture some important screens and mail them to me as jpeg attachments so I can illustrate the explanations. Good ideas and suggestions: Please!
Please be considerate of my time: think first, then check the FAQs, then ask your neighbor, and only when you really have a problem, and the problem is not off the subject, ask me.
If you can't manage at all, there is still my ready installation service.
Neither the author nor any members of the ZOK mailing list shall be held liable for damage causally related to information on this page or ensuing email correspondence, nor due to program code downloaded from this site. In particular, we shall not be responsible for the integrity of any of the software recommended, nor for suitability for any particular purpose. Use this information at your own risk.
For comments or questions about this page, please contact the author; for suggestions of more general interest, mail the whole group at zok@egroups.com.
We wish you good luck and a great time in Japan!