It seems that it is messing with the charset! #1

antoniocapuozzo · 2015-02-17T17:00:40Z

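The UTF-8 character problem seems to be solved by adding the lowercase `u` flag (PCRE_UTF8) to the regular expression, as in the `%ixu` modifiers at the end of the pattern below. Note that uppercase `U` is a different flag (PCRE_UNGREEDY) and does not help here.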
http://stackoverflow.com/questions/19629893/does-preg-replace-change-my-character-set

$re = '% # Collapse ws everywhere but in blacklisted elements.
(?> # Match all whitespans other than single space.
[^\S ]\s* # Either one [\t\r\n\f\v] and zero or more ws,
| \s{2,} # or two or more consecutive-any-whitespace.
) # Note: The remaining regex consumes no text at all...
(?= # Ensure we are not in a blacklist tag.
(?: # Begin (unnecessary) group.
(?: # Zero or more of...
[^<]++ # Either one or more non-"<"
| < # or a < starting a non-blacklist tag.
(?!/?(?:textarea|pre)\b)
)*+ # (This could be "unroll-the-loop"ified.)
) # End (unnecessary) group.
(?: # Begin alternation group.
< # Either a blacklist start tag.
(?>textarea|pre)\b
| \z # or end of file.
) # End alternation group.
) # If we made it here, we are not in a blacklist tag.
%ixu';
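
A minimal sketch of how the `u` flag can be checked, assuming a simplified stand-in pattern (`/\s+/u`) rather than the full regex above. The function name `collapse_whitespace` and the sample string are hypothetical. Without `u`, PCRE works on single bytes, and depending on the locale `\s` may match a byte such as 0xA0 that is part of a multibyte UTF-8 character (for example "à" is 0xC3 0xA0). With `u`, the subject is treated as UTF-8 and such characters stay intact.

```php
<?php
// Hypothetical helper: collapse every run of whitespace into a single space.
// The trailing "u" modifier tells PCRE to treat pattern and subject as UTF-8.
function collapse_whitespace($html)
{
    $result = preg_replace('/\s+/u', ' ', $html);

    // With "u", preg_replace() returns null if the subject is not valid UTF-8,
    // so fall back to the untouched input instead of returning null.
    return $result === null ? $html : $result;
}

// "\xC3\xA0" is the UTF-8 encoding of "à"; the second byte is 0xA0.
$html = "<p>Citt\xC3\xA0   di\n\n  Napoli</p>";
$collapsed = collapse_whitespace($html);

echo $collapsed, PHP_EOL; // prints "<p>Città di Napoli</p>"

// An empty pattern with "u" only matches when the subject is valid UTF-8.
var_dump(preg_match('//u', $collapsed) === 1); // bool(true)
```

Because `preg_replace` returns `null` on invalid UTF-8 input when `u` is set, checking the return value is worth doing before the minified output replaces the original markup.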

antoniocapuozzo closed this as completed Feb 17, 2015

antoniocapuozzo reopened this Feb 17, 2015
