Tuesday, March 17, 2020

An Explanation of Unicode Character Encoding

For a computer to store text and numbers that humans can understand, there needs to be a code that transforms characters into numbers. The Unicode standard defines such a code by using character encoding.

Character encoding matters because it lets every device display the same information. A custom character encoding scheme might work brilliantly on one computer, but problems will occur when you send that same text to someone else. The receiving computer won't know what you're talking about unless it understands the encoding scheme too.

Character Encoding

All character encoding does is assign a number to every character that can be used. You could make a character encoding right now. For example, I could say that the letter A becomes the number 13, a=14, 1=33, #=123, and so on.

This is where industry-wide standards come in. If the whole computer industry uses the same character encoding scheme, every computer can display the same characters.

What Is Unicode?

ASCII (American Standard Code for Information Interchange) became the first widespread encoding scheme. However, it's limited to only 128 character definitions. This is fine for the most common English characters, numbers, and punctuation, but is a bit limiting for the rest of the world.

Naturally, the rest of the world wanted the same kind of encoding scheme for their characters too. However, for a little while, depending on where you were, there might have been a different character displayed for the same ASCII code.

In the end, the other parts of the world began creating their own encoding schemes, and things started to get a little bit confusing. Not only were the encoding schemes of different lengths, but programs also needed to figure out which encoding scheme they were supposed to use.

It became apparent that a new character encoding scheme was needed, which is when the Unicode standard was created.
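The confusion described above is easy to reproduce. The following Java sketch is only an illustration and not part of the original explanation: it decodes the same two bytes with two charsets that every Java runtime is required to support. The class and method names are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the same bytes read under two different encodings.
class EncodingMismatch {

    // Decode raw bytes using the given character encoding.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The letter e with an acute accent (U+00E9) takes two bytes in UTF-8: 0xC3 0xA9.
        byte[] bytes = "\u00E9".getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(bytes, StandardCharsets.UTF_8);
        String asLatin1 = decode(bytes, StandardCharsets.ISO_8859_1);

        // Decoded with the encoding that produced the bytes, we get one character back.
        System.out.println(asUtf8.length());   // 1
        // Decoded with the wrong encoding, each byte becomes its own character.
        System.out.println(asLatin1.length()); // 2
        // The two wrong characters are U+00C3 and U+00A9.
        System.out.println(asLatin1.equals("\u00C3\u00A9")); // true
    }
}
```

The bytes never change; only the assumption about the encoding does, which is why sender and receiver have to agree on one scheme.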
The objective of Unicode is to unify all the different encoding schemes so that the confusion between computers can be limited as much as possible.

These days, the Unicode standard defines values for over 128,000 characters and can be viewed on the Unicode Consortium's website. It has several character encoding forms:

UTF-8: Uses one byte (8 bits) to encode the ASCII characters, which cover basic English text. It uses a sequence of two to four bytes to encode other characters. UTF-8 is widely used in email systems and on the internet.
UTF-16: Uses two bytes (16 bits) to encode the most commonly used characters. If needed, the additional characters can be represented by a pair of 16-bit numbers.
UTF-32: Uses four bytes (32 bits) to encode the characters. As the Unicode standard grew, it became apparent that a 16-bit number is too small to represent all the characters. UTF-32 is capable of representing every Unicode character as one number.

Note: UTF means Unicode Transformation Format.

Code Points

A code point is the value that a character is given in the Unicode standard. The values according to Unicode are written as hexadecimal numbers and have a prefix of U+.

For example, to encode the characters we looked at earlier: A is U+0041, a is U+0061, 1 is U+0031, and # is U+0023.

These code points are split into 17 different sections called planes, identified by numbers 0 through 16. Each plane holds 65,536 code points. The first plane, 0, holds the most commonly used characters and is known as the Basic Multilingual Plane (BMP).

Code Units

The encoding schemes are made up of code units, which are used to provide an index for where a character is positioned on a plane.

Consider UTF-16 as an example. Each 16-bit number is a code unit. The code units can be transformed into code points. For instance, the musical eighth note symbol has a code point of U+1D160 and lives on the second plane of the Unicode standard (plane 1, the Supplementary Multilingual Plane). It would be encoded using the combination of the 16-bit code units 0xD834 and 0xDD60.
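As a rough sketch of how a code point outside the BMP turns into two UTF-16 code units, the Java example below does the arithmetic by hand for U+1D160 and compares the result with what the standard library produces. The class name and helper methods are invented for illustration; only `Character.toChars` and `String.getBytes` come from the Java API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: planes, surrogate pairs, and UTF-8 lengths.
class CodeUnits {

    // Split a supplementary code point (above U+FFFF) into its UTF-16 surrogate pair.
    static char[] toSurrogates(int codePoint) {
        int offset = codePoint - 0x10000;               // leaves a 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    // Each plane holds 65,536 (0x10000) code points, so the plane is the code point shifted right by 16 bits.
    static int plane(int codePoint) {
        return codePoint >> 16;
    }

    // Number of bytes UTF-8 needs for a single code point.
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int cp = 0x1D160;
        char[] pair = toSurrogates(cp);

        System.out.printf("U+%X is on plane %d%n", cp, plane(cp));  // U+1D160 is on plane 1
        System.out.printf("UTF-16 code units: %04X %04X%n", (int) pair[0], (int) pair[1]);  // D834 DD60
        System.out.println(Arrays.equals(pair, Character.toChars(cp)));  // true
        System.out.println(utf8Length('A'));  // 1
        System.out.println(utf8Length(cp));   // 4
    }
}
```

The manual calculation is shown only to make the mapping visible; in real code `Character.toChars` does the same job.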
For the BMP, the values of the code points and code units are identical. This allows a shortcut for UTF-16 that saves a lot of storage space. It only needs to use one 16-bit number to represent those characters.

How Does Java Use Unicode?

Java was created around the time when the Unicode standard had values defined for a much smaller set of characters. Back then, it was felt that 16 bits would be more than enough to encode all the characters that would ever be needed. With that in mind, Java was designed to use UTF-16. The char data type was originally used to represent a 16-bit Unicode code point.

Since Java SE 5.0, a char represents a UTF-16 code unit. It makes little difference for representing characters that are in the Basic Multilingual Plane because the value of the code unit is the same as the code point. However, it does mean that for the characters on the other planes, two chars are needed.

The important thing to remember is that a single char can no longer represent every Unicode character.
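A short Java sketch can make the difference between chars and code points concrete. It builds a string from one BMP character and one character from another plane (U+1D160), then counts both ways. The class and helper names are invented for this example; `String.length`, `String.codePointCount`, and `String.codePoints` are standard Java methods (the last one needs Java 8 or later).

```java
// Hypothetical demo class: counting code units versus code points in a String.
class CharVersusCodePoint {

    // Number of 16-bit chars (UTF-16 code units) in the string.
    static int codeUnitCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "A" is in the BMP; U+1D160 is not, so it needs two chars.
        String text = "A" + new String(Character.toChars(0x1D160));

        System.out.println(codeUnitCount(text));   // 3
        System.out.println(codePointCount(text));  // 2

        // The char at index 1 is only the first half of a surrogate pair.
        System.out.println(Character.isHighSurrogate(text.charAt(1)));  // true

        // Iterating by code point keeps the pair together: prints U+0041, then U+1D160.
        text.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

Code that walks a string char by char works for BMP text but can split a character from another plane in half, so iterating by code point is the safer choice when any Unicode character may appear.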

Sunday, March 1, 2020

The Meaning of -N Desu in Japanese

The phrase –n desu (んです), meaning "it is," is sometimes used at the end of a sentence. It is also commonly used in conversation, though it might be difficult for beginners to learn. The phrase has an explanatory or confirmatory function. The difference between –masu (〜ます), another ending for a verb, and –n desu is very subtle. This makes it very hard to translate. The nominal ending –n desu can be translated as "it is the case that" or "it is for the reason that." However, there is no true English equivalent.

–N Desu Versus –Masu

One of the best ways to understand the subtle, nuanced meaning of –n desu is to compare it to –masu by viewing how two sentences use these endings differently:

Ryokou ni iku n desu ka? (りょこう に いく ん です か。)
Are you going to travel?

Ryokou ni ikimasu ka? (りょこう に いきます か。)
Are you going on a trip?

In the first sentence, which uses –n desu, the speaker assumes that the listener is going on a trip and just wants the listener to confirm it. In the second sentence, which uses –masu, the speaker simply wants to know if the listener is going on a trip or not.

Formal Versus Informal

You also need to use a different form of –n desu when it is attached directly to a plain form of the verb in an informal situation. When the circumstances are informal, use –n da instead of –n desu, as demonstrated in the table. The sentences are written first in romaji, a transliteration into the Latin alphabet. These sentences are then spelled using Japanese characters. An English translation follows on the right side of the table.
Ashita doubutsuen ni ikimasu. 明日動物園に行きます。 (formal) | I am going to the zoo tomorrow. (simple statement)
Ashita doubutsuen ni iku. 明日動物園に行く。 (informal) | I am going to the zoo tomorrow. (simple statement)
Ashita doubutsuen ni iku n desu. 明日動物園に行くんです。 (formal) | I am going to the zoo tomorrow. (explaining his or her plans for tomorrow)
Ashita doubutsuen ni iku n da. 明日動物園に行くんだ。 (informal) | I am going to the zoo tomorrow. (explaining his or her plans for tomorrow)

Note how in Japanese, social context is very important. In English, the social situation, or position of the person you are addressing, would make little or no difference. You would tell a good friend at school or a visiting dignitary at a formal state dinner that you are going to the zoo using the same words.

Yet, in a formal situation in Japan, you would use –n desu, but you would use –n da if the circumstance were less formal. In the case of the first two sentences above, you would use –masu in a formal situation but omit the ending altogether if the setting or circumstances were informal.

Why Questions

In Japanese, why questions are often completed with –n desu because they are asking for a reason or an explanation, as the table demonstrates:

Doushite byouin ni iku n desu ka. Haha ga byouki nan desu. どうして病院に行くんですか。母が病気なんです。 | Why are you going to the hospital? Because my mother is sick.
Doushite tabenai n desu ka. Onaka ga suiteinai n desu. どうして食べないんですか。おなかがすいてないんです。 | Why don't you eat? Because I am not hungry.