Da5h
User
|
Hi there Joe,
I ran into a problem with your script while trying to translate a Japanese site into English.
Your script doesn't seem to replace strings fully if they contain another string from the "search and replace" list.
Example:
'メッセージ': 'Messages',
'メッセージはありません': 'No messages',
'メッセージを送る': 'Send message',
But after I run it on a website I see:
Messages
Messagesはありません
Messagesを送る
In case you can't see the Japanese text above, here is another example of the behavior:
'DOG': 'CAT',
'DOGX1': 'CATY1',
'DOGX2': 'CATY2',
After using your script:
CAT
CATX1
CATX2
Any help?
|
|
|
JoeSimmons
Script's Author
|
Probably due to how the Japanese text is, but I don't really know much about it.
|
|
|
Da5h
User
|
No, it's not Unicode-related.
Put these into your script and try it on my previous post:
'CAT': 'DOG',
'CATX1': 'DOGY1',
'CATX2': 'DOGY2',
This English text:
CAT
CATX1
CATX2
Turns into this:
DOG
DOGX1
DOGX2
But it should be:
DOG
DOGY1
DOGY2
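For anyone wanting to reproduce this outside the page: the output above is consistent with a loop that applies each rule in turn to the result of the previous one. The sketch below is only a guess at that behavior, not the script's actual code; `replaceInOrder` and `catRules` are made-up names.

```javascript
// Hypothetical reconstruction of in-order replacement; NOT the script's real code.
const catRules = {
  'CAT': 'DOG',
  'CATX1': 'DOGY1',
  'CATX2': 'DOGY2',
};

function replaceInOrder(text, table) {
  let out = text;
  // Object.entries keeps insertion order for string keys like these.
  for (const [search, replacement] of Object.entries(table)) {
    // split/join replaces every literal occurrence, no regex escaping needed.
    out = out.split(search).join(replacement);
  }
  return out;
}

// 'CAT' is applied first, so 'CATX1' and 'CATX2' no longer exist
// by the time their own rules are tried.
console.log(replaceInOrder('CAT CATX1 CATX2', catRules)); // DOG DOGX1 DOGX2
```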
|
|
|
Da5h
User
|
Are there any plans on fixing that issue?
p.s.
I ran across this post in this forum:
http://userscripts.org/topics/34144
The Chinese poster there seems to have run into the same problem (since most Chinese/Japanese words are built from other words), and he seems to say that an earlier version of the script didn't have it, so maybe that will help.
p.p.s.
The current script with
'ape': 'MONKEY',
'escape': 'RUN AWAY',
replaces "escapement" with "escMONKEYment".
and
'escape': 'RUN AWAY',
'ape': 'MONKEY',
replaces "escapement" with "RUN AWAYment".
|
|
|
kimatg
User
|
It's an ordering issue,
since the script applies the replacements in the order the strings are listed.
So for this:
'メッセージ': 'Messages',
'メッセージはありません': 'No messages',
'メッセージを送る': 'Send message',
it would first replace メッセージ with Messages, giving Messagesはありません,
and because there isn't any rule defining a replacement for the text 'Messagesはありません', that string is left unchanged.
Fix: always place the shortest string at the bottom, like this:
'メッセージはありません': 'No messages',
'メッセージを送る': 'Send message',
'メッセージ': 'Messages',
In this case the script will first search for 'メッセージはありません' and replace it with 'No messages',
then search for the single word 'メッセージ' and replace it with 'Messages'.
...hope you get what I mean :)
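The longest-first ordering doesn't have to be maintained by hand. A rough sketch, assuming the rules live in a plain object: sort the keys by length and match them in a single pass, so a replacement's output is never scanned again. `replaceLongestFirst`, `escapeRegExp` and `messageRules` are made-up names, not anything from the script.

```javascript
// Rules listed shortest-first on purpose, to show that list order no longer matters.
const messageRules = {
  'メッセージ': 'Messages',
  'メッセージはありません': 'No messages',
  'メッセージを送る': 'Send message',
};

// Escape regex metacharacters so every key is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: one alternation regex, longest key first.
function replaceLongestFirst(text, table) {
  const keys = Object.keys(table).sort((a, b) => b.length - a.length);
  if (keys.length === 0) return text;
  const pattern = new RegExp(keys.map(escapeRegExp).join('|'), 'g');
  // At each position the first alternative that fits wins,
  // so longer keys get tried before their shorter prefixes.
  return text.replace(pattern, (match) => table[match]);
}

console.log(replaceLongestFirst('メッセージ / メッセージはありません / メッセージを送る', messageRules));
// Messages / No messages / Send message
```

Note this only deals with the ordering; it still replaces parts of longer words, so "escapement" would still be affected by an 'escape' rule.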
|
|
|
JoeSimmons
Script's Author
|
Ok try the new version out. It no longer changes parts of words, just the full word/sentence.
|
|
|
kimatg
User
|
Um, sorry, but how exactly does the new version work now?
After I updated the code, the replacement rules I had defined earlier that include symbols, such as 'Users:' or '(10) Masterpiece', don't get applied any more.
IMO that wasn't necessarily something that had to be changed... it would have been better to have an option to exclude specific areas of the page from replacement. :|
|
|
|
Da5h
User
|
@kimatg
The fix was necessary, just look at my post above your first one.
Simply ordering strings from longest to shortest wouldn't fix that issue.
@JoeSimmons
Thank you. I'm gonna test this now & report back here.
Edit:
Hmmm, the new script now works fine with plain text [A-Za-z0-9], but it has completely stopped working for Unicode and some "special" characters (such as &, (, :, etc.)...
'&': 'AND',
':)': 'HAPPY SMILEY',
':|': 'NEUTRAL SMILEY',
Doesn't do anything now.
'メッセージ': 'TEST',
Unicode replacement doesn't work now either.
'plaintextメッセージ': 'TEST',
will be replaced with TESTメッセージ, so it looks like all Unicode/special-character input simply gets ignored now.
|
|
|
JoeSimmons
Script's Author
|
Da5h wrote: Hmmm, the new script now works fine with plain text [A-Za-z0-9], but it has completely stopped working for Unicode and some "special" characters (such as &, (, :, etc.)...
Because I put a word boundary on either side of the regex. I'm trying to come up with a fix but it's harder than it looks.
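To illustrate why: JavaScript's \b is defined in terms of \w, which is just [A-Za-z0-9_], so there is never a "boundary" next to & or a kana character. One possible way around it, sketched below, is to swap \b for lookarounds that only refuse a match when a letter, digit or underscore sits right next to it. This assumes an engine with lookbehind and Unicode property escapes (ES2018), and `replaceWholeUnits` / `escapeLiteral` are made-up names; it is not the fix that ended up in the script.

```javascript
// \b only knows ASCII word characters, so these never match:
console.log(/\b&\b/.test('A & B'));             // false
console.log(/\bメッセージ\b/.test('メッセージ'));  // false

// Escape regex metacharacters so every key is matched literally.
function escapeLiteral(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: match a key only when it is not glued to a
// letter, digit or underscore on either side (Unicode-aware).
function replaceWholeUnits(text, table) {
  const keys = Object.keys(table).sort((a, b) => b.length - a.length);
  if (keys.length === 0) return text;
  const body = keys.map(escapeLiteral).join('|');
  const pattern = new RegExp(
    '(?<![\\p{L}\\p{N}_])(?:' + body + ')(?![\\p{L}\\p{N}_])',
    'gu'
  );
  return text.replace(pattern, (match) => table[match]);
}

const mixedRules = {
  '&': 'AND',
  ':)': 'HAPPY SMILEY',
  'ape': 'MONKEY',
  'メッセージ': 'TEST',
};

console.log(replaceWholeUnits('ape & escapement :)', mixedRules));
// MONKEY AND escapement HAPPY SMILEY
console.log(replaceWholeUnits('メッセージ', mixedRules)); // TEST
```

One catch: kana and kanji count as letters here, so 'メッセージ' inside 'メッセージはありません' is left alone. That matches the "full word/sentence only" behavior, but it means Japanese phrases need their own full-length rules.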
|
|
|
pgr
User
|
I changed the script a bit to use an array of replacement-specifying objects instead of the word association table, like this:
{what:/(swr|sw3)/gi,by:"$1(00)"},
{what:/(3sat)/gi ,by:"$1(06)"},
It is shorter and should be faster this way too.
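Roughly, an array like that could be applied as below. The two rule objects are the ones quoted above; `applyRules` and the sample text are made up for illustration and are not taken from the modified script.

```javascript
// Rule objects as quoted in the post: `what` is a regex, `by` the replacement string.
const channelRules = [
  { what: /(swr|sw3)/gi, by: '$1(00)' },
  { what: /(3sat)/gi, by: '$1(06)' },
];

// Hypothetical helper: run every rule, in array order, over the text.
function applyRules(text, rules) {
  return rules.reduce((out, rule) => out.replace(rule.what, rule.by), text);
}

console.log(applyRules('SWR and 3sat', channelRules)); // SWR(00) and 3sat(06)
```

Since each rule is a full regex, word boundaries or lookarounds can be put on exactly the rules that need them. The rules still run one after another though, so the ordering issue discussed earlier applies here as well.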
|