Commit da41c19a authored by Luis Javier Merino, committed by Tomaz Canabrava

URI: be more strict with www. URIs

We recognize URIs that start with a scheme and a possibly empty
authority, and URI suffixes that start with "www."

URIs starting with a scheme are of the form:

scheme://[ userinfo "@" ] host ...

while "www." URI suffixes are of the form:

www. <rest of host> ...

where host is actually in reg-name form (not in IPv4address or
IP-literal form).

This commit allows stricter parsing of e.g.

www.example.com:foo@bar.com

as <URI>:<email> instead of as a long <URI>.
parent 64fb6409
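
The effect of the change can be reproduced with a small standalone sketch. It uses std::regex rather than QRegularExpression, and the fragments kWww, kScheme, kUserInfo, kHost and kPort are simplified stand-ins invented for illustration (no lookbehind, no possessive quantifiers, no path), not Konsole's actual patterns. oldPattern() closes the prefix group before the userinfo part, as the removed line in the diff does; newPattern() closes it after, so userinfo is only tried on the scheme:// branch.

```cpp
#include <regex>
#include <string>

// Simplified, hypothetical stand-ins for the fragments in the diff.
// They are assumptions for illustration, not the patterns Konsole uses.
static const std::string kWww      = R"re(www\.)re";
static const std::string kScheme   = R"re([a-z][a-z0-9+.-]*://)re";
static const std::string kUserInfo = R"re((?:[a-z0-9._~%!$&'()*+,;=:-]*@)?)re";
static const std::string kHost     = R"re([a-z0-9._~%-]+)re";
static const std::string kPort     = R"re((?::[0-9]+)?)re";

// Before the commit: (?:www\.|scheme://) userinfo host port
// The optional userinfo applies to both branches of the alternation.
std::string oldPattern()
{
    return "(?:" + kWww + "|" + kScheme + ")" + kUserInfo + kHost + kPort;
}

// After the commit: (?:www\.|scheme:// userinfo) host port
// The optional userinfo is only reachable through the scheme branch.
std::string newPattern()
{
    return "(?:" + kWww + "|" + kScheme + kUserInfo + ")" + kHost + kPort;
}

// Returns the first match of pattern in text, or an empty string if none.
std::string firstMatch(const std::string &pattern, const std::string &text)
{
    const std::regex re(pattern, std::regex::ECMAScript);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

Under these assumptions, firstMatch(oldPattern(), "www.example.com:foo@bar.com") returns the whole string, because "example.com:foo@" is accepted as userinfo, while firstMatch(newPattern(), ...) stops at "www.example.com", which is what the new www_followed_by_colon test row expects. A scheme URI such as "http://user@example.com:8080" still matches in full with either pattern.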
@@ -63,6 +63,9 @@ void HotSpotFilterTest::testUrlFilterRegex_data()
<< "http://example.com" << true;
QTest::newRow("empty_fragment") << "http://example.com/#"
<< "http://example.com" << true;
+QTest::newRow("www_followed_by_colon") << "www.example.com:foo@bar.com"
+<< "www.example.com" << true;
}
void HotSpotFilterTest::testUrlFilterRegex()
......
@@ -37,7 +37,8 @@ using namespace Konsole;
// scheme://
// - Must start with an ASCII letter, preceeded by any non-word character,
// so "http" but not "mhttp"
-static const char scheme_or_www[] = "(?<=^|[\\s\\[\\]()'\"])(?:www\\.|[a-z][a-z0-9+\\-.]*+://)";
+static const char scheme_or_www[] = "(?<=^|[\\s\\[\\]()'\"])(?:www\\.|[a-z][a-z0-9+\\-.]*+://";
+static const char scheme_or_www_end[] = ")";
// unreserved / pct-encoded / sub-delims
#define COMMON_1 "a-z0-9\\-._~%!$&'()*+,;="
@@ -62,6 +63,7 @@ using LS1 = QLatin1String;
const QRegularExpression UrlFilter::FullUrlRegExp(
LS1(scheme_or_www)
+ LS1(userInfo)
+    + LS1(scheme_or_www_end)
+ LS1(host)
+ LS1(port)
+ LS1(path)
......