How should I approach reverse engineering this text encoding?











So I'm trying to hack the translation from the PS4 version of a game into the Vita version. The script files were conveniently uncompressed, and I was able to drop them in and have it working without a hitch - great!



However, various other message files and quest summaries and the like are not so convenient.



Here's a comparison of two files in a hex editor:
[image: hex editor comparison of the two files]



At first I thought it was a simple byte-pair compression, but there's obviously more to it than that, because if you look at the places which correspond to "Let's go!" and "Let's do this!" they don't start with the same string of characters at all.



I uploaded the PS4/Vita versions of a file with rather more readable text:
https://www.dropbox.com/s/bw0nvexyi9ww2be/hm%20vita?dl=0
https://www.dropbox.com/s/sk74zadvndc8v9t/hm%20ps4?dl=0



Going through it and looking for common and recurring words, I found this:



Goblin Thief
´0Ö0ê0ó0·0ü0Õ0
B430 D630 EA30 F330 B730 FC30 D530

Goblin Thief Archer
´0Ö0ê0ó0·0ü0Õ0¢0ü0Á0ã0ü0
B430 D630 EA30 F330 B730 FC30 D530 A230 FC30 C130 E330 FC30

Ancient Grief
¨0ó0·0§0ó0È0°0ê0ü0Õ0
A830 F330 B730 A730 F330 C830 B030 EA30 FC30 D530

Grief Screamer
°0ê0ü0Õ0¹0¯0ê0ü0Þ0ü0
B030 EA30 FC30 D530 B930 AF30 EA30 FC30 DE30 FC30

Grief
EA30 FC30 D530

Thief
B730 FC30 D530


So you can see that FC30 D530 is "ief".
But then I look for more occurrences of "gr":



Deep Grudge
Ç0£0ü0×0°0é0Ã0¸0
C730 A330 FC30 D730 B030 E930 C330 B830


And you don't see the EA30 that starts off "Grief".



I have a feeling FC30 could be some kind of switch byte, either an upper case indication or possibly marking the use of some kind of lookup table? It's also interesting that all the lines which are just objectives/boss names have the --30 structure, but some of the descriptive passages don't seem to.



The additional problem, of course, is that the uncompressed English text from the PS4 version won't be a 100% perfect match for the text from the Vita version -- that's the whole point, after all! So even when I look at the name "Deep Grudge" and notice that I don't see anything which looks like it would correspond to the "Gr" from "Grief", I can't be certain that they didn't change the name in the PS4 version.



Does anyone have any suggestions on how I should be approaching this? If I'm right and there's actually some kind of compressed lookup table business going on, then it might be effectively impossible to reverse, right?










  • I can't help but notice Grief Screamer has an extra B030 at the start that isn't present in the Grief line
    – corsiKa
    12 hours ago















encodings






asked 16 hours ago









Celandine Crane

1 Answer
accepted










The way you can learn this encoding is to study Japanese. :) From the first line of your text file diff, we can see that this on the left:



1131 3210 3130 3330 1021 0000 0000 0000  .12.1030.!......


Translates to this on the right:



1100 3100 3200 1000 3100 3000 3300 3000  ..1.2....1.0.3.0.
1000 01ff 0000 0000 0000 0000 0000 0000 .................


This is a very strong hint that the file on the left is using an 8-bit encoding and that the one on the right is using a 16-bit encoding. The digits translate in a very straightforward way, but the exclamation point ("!") is 0x21 in ASCII or UTF-8, while the bytes 01 ff on the right are U+FF01, the Unicode "fullwidth exclamation mark", encoded as little-endian UTF-16 (which is identical to UCS-2 for characters in the Basic Multilingual Plane, such as these).
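One way to check this reading is to decode both lines and compare them. The following is only a sketch in Python (my choice of language; the helper `from_dump` and the variable names are made up for illustration), using the hex copied from the two dump lines above. The left side is decoded as ASCII purely because every byte in this sample is below 0x80; the file as a whole may well use some other 8-bit encoding.

```python
import unicodedata

def from_dump(hex_words: str) -> bytes:
    """Turn hex-editor text such as '1131 3210' into raw bytes (hypothetical helper)."""
    return bytes.fromhex(hex_words.replace(" ", ""))

# Left-hand dump: one byte per character.
left = from_dump("1131 3210 3130 3330 1021")
# Right-hand dump: two bytes per character, low byte first.
right = from_dump("1100 3100 3200 1000 3100 3000 3300 3000 1000 01ff")

left_text = left.decode("ascii")
right_text = right.decode("utf-16-le")

# The only character that differs is the last one.
last = right_text[-1]
print(f"U+{ord(last):04X}", unicodedata.name(last))

# NFKC folds fullwidth forms onto their ASCII counterparts,
# so after normalizing, the two strings compare equal.
print(unicodedata.normalize("NFKC", right_text) == left_text)
```

If the second line printed is `True`, the two dumps carry the same text and differ only in encoding and in the width of the exclamation mark.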



So with the hint about encoding we can see that the hex you've identified as "Goblin Thief"



b430 d630 ea30 f330 b730 fc30 d530


is rendered as ゴブリンシーフ which is the Katakana representation of the English words "Goblin Thief". It's common for Japanese speakers to render foreign words into phonetic equivalents with Katakana. So taking it syllable-by-syllable we have:



ゴ  go
ブ bu
リ ri
ン n
シ shi
ー (extended vowel)
フ fu


So if we say it aloud, "go bu ri n shi-i fu" sounds like "Goblin Thief" as pronounced by a native Japanese speaker. You can try Google Translate to experiment a bit more and to see and hear what this sounds like.
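The same decoding can be applied to the other names from the question. This is a rough sketch rather than a finished tool: the dictionary `names` and the function `decode_utf16le` are illustrative names of my own, and the hex strings are copied from the question as posted.

```python
import unicodedata

# Hex as shown in the question, keyed by the English name it was matched to.
names = {
    "Goblin Thief": "B430 D630 EA30 F330 B730 FC30 D530",
    "Ancient Grief": "A830 F330 B730 A730 F330 C830 B030 EA30 FC30 D530",
    "Grief Screamer": "B030 EA30 FC30 D530 B930 AF30 EA30 FC30 DE30 FC30",
    "Deep Grudge": "C730 A330 FC30 D730 B030 E930 C330 B830",
}

def decode_utf16le(hex_words: str) -> str:
    """Decode hex-editor text as little-endian UTF-16 (hypothetical helper)."""
    return bytes.fromhex(hex_words.replace(" ", "")).decode("utf-16-le")

decoded = {english: decode_utf16le(hex_words) for english, hex_words in names.items()}

for english, kana in decoded.items():
    print(f"{english:15} {kana}")

# Look up the two byte pairs the question was puzzled by.
for pair in ("FC30", "B030"):
    ch = decode_utf16le(pair)
    print(pair, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Seen this way, FC30 is not a switch byte but the long-vowel mark ー, and the "Gr" of both "Grief" and "Grudge" begins with B030 (グ, "gu"), which is the extra B030 noticed in the comment on the question.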






  • Can you switch around your hex words to the correct endianness 30b4 30d6 ..? That way they form correct Unicode values for Japanese.
    – usr2564301
    14 hours ago










  • As noted in the answer, these are Unicode values that just happen to be encoded as UCS-2, also known as UTF-16LE
    – Edward
    13 hours ago






  • Wait, this is the bloody Japanese file? That's odd. I need to dig through the package and see if I can track down the actual English one then... I know for a fact that some of the files I extracted are the English ones because they have actual plaintext in them, which is why substituting them worked. Plus I'm 99% certain the package I have is from the SCEA version of the game. Hmmm. Thanks ever so much, that explains a lot that was driving me nuts!
    – Celandine Crane
    13 hours ago








  • Follow up note for anyone who was curious: the English text is actually stored in a completely separate file that's unique to the PS4 version of the game and is presumably pulled from to override the Japanese strings which exist in the base files. God only knows if there's any way to convince the Vita build to read from there.
    – Celandine Crane
    8 hours ago











answered 14 hours ago









Edward
