$$normalize
Replaces special character forms with their simple equivalents (removing marks by default).
- Allows post-processing of the result of Java's Normalizer algorithm
Post Operations
ROBUST
- Tries to replace as many letters as possible with their Latin equivalents, including:
  - Removing combining diacritical marks (works with NFD/NFKD, which leave the characters decomposed)
  - Replacing stroked letters and others that are not composed (e.g. "ĐŁłŒ" -> "DLlOE")
  - Replacing whitespace characters with a space and trimming the result
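The ROBUST steps above can be sketched in plain Java. This is a minimal illustration built on `java.text.Normalizer`, not the library's actual implementation: the class name, the method name, and the small replacement table are hypothetical, and the table only covers letters that appear in this page's examples.

```java
import java.text.Normalizer;
import java.util.Map;

class RobustNormalizeSketch {
    // Hypothetical table for letters that Unicode does not decompose
    // (assumption: the real post operation covers many more letters).
    private static final Map<Character, String> UNDECOMPOSED = Map.of(
            '\u0110', "D",   // D with stroke
            '\u0111', "d",   // d with stroke
            '\u0141', "L",   // L with stroke
            '\u0142', "l",   // l with stroke
            '\u0152', "OE",  // ligature OE
            '\u0153', "oe",  // ligature oe
            '\u00c6', "AE",  // letter AE
            '\u00e6', "ae"); // letter ae

    static String normalizeRobust(String input) {
        // 1. Decompose: base letters are separated from their combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        // 2. Remove the combining diacritical marks left by the decomposition.
        String unmarked = decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // 3. Replace letters that have no decomposition.
        StringBuilder out = new StringBuilder(unmarked.length());
        for (int i = 0; i < unmarked.length(); i++) {
            char c = unmarked.charAt(i);
            String replacement = UNDECOMPOSED.get(c);
            out.append(replacement != null ? replacement : String.valueOf(c));
        }
        // 4. Collapse whitespace runs into a single space and trim.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalizeRobust("\u0110\u0141\u0142\u0152")); // DLlOE
    }
}
```

Step 2 only has something to remove when the form is a decomposing one, which matches the note above about NFD/NFKD.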
Usage
{
"$$normalize": /* String to normalize */,
"form": "NFKD" /* or NFKC / NFD / NFC */,
"post_operation": "ROBUST" /* or NONE */
}
"$$normalize([form],[post_operation]):{input}"
note
Concrete values in the usage example are default values.
Returns
string
Arguments
Argument | Type | Values | Required / Default Value | Description |
---|---|---|---|---|
Primary | string | | Yes | String to normalize |
form | enum | NFKD / NFKC / NFD / NFC | NFKD | Normalizer form, as described in Java's documentation. The default, NFKD, decomposes for compatibility |
post_operation | enum | ROBUST / NONE | ROBUST | Post operation to run on result to remove/replace more letters |
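To make the difference between the four `form` values concrete, the following sketch applies each `java.text.Normalizer.Form` to a two-character string ("é" followed by "…") and reports the resulting length. It is only an illustration of the underlying Java forms; the class and method names are hypothetical and it does not go through `$$normalize` itself.

```java
import java.text.Normalizer;

class NormalizerFormsSketch {
    // Length (in UTF-16 chars) of the input after applying the named form.
    static int normalizedLength(String input, String formName) {
        Normalizer.Form form = Normalizer.Form.valueOf(formName);
        return Normalizer.normalize(input, form).length();
    }

    public static void main(String[] args) {
        String input = "\u00e9\u2026"; // e with acute, horizontal ellipsis
        for (String name : new String[] {"NFC", "NFD", "NFKC", "NFKD"}) {
            System.out.println(name + " -> " + normalizedLength(input, name) + " chars");
        }
        // NFC  -> 2 chars (nothing changes)
        // NFD  -> 3 chars (the accent becomes a separate combining mark)
        // NFKC -> 4 chars (the ellipsis becomes three dots)
        // NFKD -> 5 chars (both of the above)
    }
}
```

Only the decomposing forms (NFD, NFKD) expose the accent as a separate mark, and only the compatibility forms (NFKC, NFKD) rewrite the ellipsis.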
Examples
Input | Definition | Output |
---|---|---|
"Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC…" | "$$normalize:$" | "This is a funky String abcABC..." |
"Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC…" | "$$normalize(NFKD):$" | "This is a funky String abcABC..." |
"Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC…" | "$$normalize(NFKD,NONE):$" | "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC..." |
"Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC…" | "$$normalize(NFD):$" | "This is a funky String abcABC…" |
"Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC…" | "$$normalize(NFD,NONE):$" | "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC…" |
"Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC…" | "$$normalize(NFKC):$" | "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC..." |
"Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC…" | "$$normalize(NFKC,NONE):$" | "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC..." |
"Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC…" | "$$normalize(NFC):$" | "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC…" |
"Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC…" | "$$normalize(NFC,NONE):$" | "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ abcABC…" |
"!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{\|}~" | "$$normalize:$" | "!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{\|}~" |
"ĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňʼnŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſ" | "$$normalize:$" | "AaAaAaCcCcCcCcDdDdEeEeEeEeEeGgGgGgGgHhHhIiIiIiIiIıIJijJjKkĸLlLlLlLlLlNnNnNnnNnOoOoOoOEoeRrRrRrSsSsSsSsTtTtTtUuUuUuUuUuUuWwYyYZzZzZzs" |
"ƀƁƂƃƄƅƆƇƈƉƊƋƌƍƎƏƐƑƒƓƔƕƖƗƘƙƚƛƜƝƞƟƠơƢƣƤƥƦƧƨƩƪƫƬƭƮƯưƱƲƳƴƵƶƷƸƹƺƻƼƽƾƿǀǁǂǃDŽDždžLJLjljNJNjnjǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟǠǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯǰDZDzdzǴǵǶǷǸǹǺǻǼǽǾǿȀȁȂȃȄȅȆȇȈȉȊȋȌȍȎȏȐȑȒȓȔȕȖȗȘșȚțȜȝȞȟȠȡȢȣȤȥȦȧȨȩȪȫȬȭȮȯȰȱȲȳȴȵȶȷȸȹȺȻȼȽȾȿɀɁɂɃɄɅɆɇɈɉɊɋɌɍɎɏ" | "$$normalize:$" | "bBƂƃbbCCcDDƋƌƍƎƏƐFfGƔhƖIKklƛƜƝƞƟOoƢƣPpRSsƩƪtTtƮUuƱƲYyZzƷƸƹƺƻƼƽƾƿǀǁǂǃDZDzdzLJLjljNJNjnjAaIiOoUuUuUuUuUuǝAaAaAEaeGgGgKkOoOoƷʒjDZDzdzGgHpNnAaAEaeOoAaAaEeEeIiIiOoOoRrRrUuUuSsTtȜȝHhȠdȢȣZzAaEeOoOoOoOoYylntȷȸȹACcLTszɁɂBUAEeJjQqRrYy" |
"ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ" | "$$normalize:$" | "AAAAAAAECEEEEIIIIDNOOOOO×OUUUUYÞßaaaaaaaeceeeeiiiiðnooooo÷ouuuuyþy" |
"ŒœÆæǢǣǼǽ" | "$$normalize:$" | "OEoeAEaeAEaeAEae" |
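Some characters in the outputs above, such as "ß", "Þ", "ĸ" and "ı", come back unchanged. One likely reason is that Unicode defines no decomposition for them, so removing combining marks has nothing to strip. The sketch below checks this with `java.text.Normalizer` alone; it is an illustration under that assumption rather than a description of the transformer's internals, and the class and method names are hypothetical.

```java
import java.text.Normalizer;

class PassThroughSketch {
    // NFKD decomposition followed by removal of combining diacritical marks.
    static String stripMarks(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFKD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // Sharp s, thorn, kra and dotless i have no decomposition: unchanged.
        String untouched = "\u00df\u00de\u0138\u0131";
        System.out.println(stripMarks(untouched).equals(untouched)); // true

        // Long s has a compatibility mapping to "s"; capital I with dot above
        // decomposes to "I" plus a combining dot, which is then removed.
        System.out.println(stripMarks("\u017f\u0130")); // sI
    }
}
```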