    Fix and improve LanguageController's MimeTypeCache · 025e4fa9
    Igor Kushnir authored
    Bugs in the old implementation that are fixed in this commit:
      * The regular expression matching code was incorrect, so it never
    matched the only pattern it handled: "CMakeLists.txt". As a result,
    LanguageController::languagesForUrl(".../CMakeLists.txt") returned an
    empty language list in background threads, and in the main thread it
    resorted to extra work culminating in a call to
    LanguageController::languagesForMimetype().
      * The suffix matching optimization matched patterns case-sensitively
    and so didn't match names like "X.CPP". As with the regular expression
    matching bug, this resulted in a wrong return value in background
    threads and extra work in the main thread.
      * The suffix matching optimization assumed that '*' is the only
    wildcard character. While this is actually the case for glob patterns of
    the mime types that can currently end up in mimeTypeCache, it might have
    led to a surprising bug if more complex glob patterns became supported
    in the future.
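    The second and third bugs can be illustrated with a minimal standard
    C++ sketch (not KDevelop's code; the helper names plainSuffix() and
    endsWithIgnoreCase() are hypothetical). It treats a pattern as a plain
    suffix only when '*' is its first character and no other glob
    metacharacter follows, and it compares file names case-insensitively:

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: returns the suffix of a glob pattern such as "*.cpp"
// (".cpp"), or std::nullopt if the pattern is not of the plain "*<literal>"
// form. Any glob metacharacter ('*', '?', '[') after the leading '*'
// disqualifies the pattern instead of being silently treated as a literal.
std::optional<std::string> plainSuffix(std::string_view pattern)
{
    if (pattern.size() < 2 || pattern.front() != '*')
        return std::nullopt;
    const std::string_view rest = pattern.substr(1);
    if (rest.find_first_of("*?[") != std::string_view::npos)
        return std::nullopt;
    return std::string(rest);
}

// Hypothetical helper: ASCII case-insensitive "ends with" comparison,
// so that "X.CPP" matches the suffix ".cpp".
bool endsWithIgnoreCase(std::string_view name, std::string_view suffix)
{
    if (suffix.size() > name.size())
        return false;
    return std::equal(suffix.begin(), suffix.end(), name.end() - suffix.size(),
                      [](unsigned char a, unsigned char b) {
                          return std::tolower(a) == std::tolower(b);
                      });
}
```

    With these helpers, "*.cpp" yields ".cpp" and matches "X.CPP", while a
    pattern such as "*.[ch]" is rejected as a suffix and needs another
    matching strategy.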
    
    TestLanguageController::testLanguagesForUrlWithCache() fails on the
    following data rows without this commit because of these bugs:
      - CMakeLists
      - cmakelists wrong case
      - upper-case
      - mixed-case
    
    Improvements in this commit's MimeTypeCache reimplementation:
      * Literal pattern optimization: "CMakeLists.txt" was the only pattern
    that required regular expression matching in the old implementation,
    because the suffix matching optimization could not handle it. The new
    separate pattern category, m_literalPatterns, now handles this case, so
    the slower regular expression matching never happens in practice.
      * The suffixes, literal patterns and regular expressions are now
    created once and cached rather than constructed in each
    languagesForUrl() call.
      * QRegularExpression is now used instead of the deprecated QRegExp.
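    The three pattern categories and the build-once caching described above
    can be sketched in standard C++. This is a hypothetical, Qt-free
    illustration, not the actual MimeTypeCache: PatternCache, matches() and
    globToRegex() are invented names, std::regex stands in for
    QRegularExpression, and the cache only answers whether a file name
    matches rather than which languages apply:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, Qt-free sketch of a cache with three pattern categories.
// Patterns are lower-cased, classified and (for regexes) compiled once in
// the constructor, so matches() builds nothing per call.
class PatternCache
{
public:
    explicit PatternCache(const std::vector<std::string>& patterns)
    {
        for (const std::string& original : patterns) {
            const std::string pattern = toLower(original);
            const bool hasWildcard = pattern.find_first_of("*?[") != std::string::npos;
            const bool isSuffix = pattern.size() > 1 && pattern.front() == '*'
                && pattern.find_first_of("*?[", 1) == std::string::npos;
            if (isSuffix)
                m_suffixes.push_back(pattern.substr(1)); // "*.cpp" -> ".cpp"
            else if (!hasWildcard)
                m_literals.push_back(pattern); // e.g. "cmakelists.txt"
            else
                m_regexes.emplace_back(globToRegex(pattern)); // everything else
        }
    }

    // fileName is a bare file name without a directory part. Both sides are
    // lower-case, so every comparison below is case-insensitive.
    bool matches(std::string_view fileName) const
    {
        const std::string name = toLower(std::string(fileName));
        for (const std::string& suffix : m_suffixes)
            if (name.size() >= suffix.size()
                && name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0)
                return true;
        for (const std::string& literal : m_literals)
            if (name == literal)
                return true;
        for (const std::regex& re : m_regexes) // slowest category goes last
            if (std::regex_match(name, re))
                return true;
        return false;
    }

private:
    static std::string toLower(std::string s)
    {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return s;
    }

    // Minimal glob-to-ECMAScript translation: '*' -> ".*", '?' -> ".",
    // "[...]" is kept as a character class ("[!...]" becomes "[^...]"),
    // and remaining regex metacharacters are escaped.
    static std::string globToRegex(const std::string& glob)
    {
        std::string re;
        for (std::size_t i = 0; i < glob.size(); ++i) {
            const char c = glob[i];
            if (c == '*') {
                re += ".*";
            } else if (c == '?') {
                re += '.';
            } else if (c == '[' && glob.find(']', i + 1) != std::string::npos) {
                const std::size_t close = glob.find(']', i + 1);
                std::size_t first = i + 1;
                re += '[';
                if (first < close && glob[first] == '!') {
                    re += '^';
                    ++first;
                }
                re += glob.substr(first, close - first);
                re += ']';
                i = close;
            } else if (std::string_view("\\^$.|+()[]{}").find(c) != std::string_view::npos) {
                re += '\\';
                re += c;
            } else {
                re += c;
            }
        }
        return re;
    }

    std::vector<std::string> m_suffixes;
    std::vector<std::string> m_literals;
    std::vector<std::regex> m_regexes;
};
```

    Checking the cheap categories first mirrors the idea in the bullets
    above: with only suffix and literal patterns present, the regular
    expression loop never runs.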
    
    This is the list of glob patterns supported by maintained KDevelop
    plugins (collated from X-KDevelop-SupportedMimeTypes plugin entries and
    /usr/share/mime/globs2, where the ":cs" suffix marks a case-sensitive
    pattern):
    kdevclangsupport
        text/x-chdr
            *.h
        text/x-c++hdr
            *.hh
            *.hpp
            *.hp
            *.h++
            *.hxx
        text/x-csrc
            *.c:cs
        text/x-c++src
            *.c++
            *.cc
            *.cxx
            *.C:cs
            *.cpp
        text/x-opencl-src
            *.cl
        text/vnd.nvidia.cuda.csrc
            *.cu
        text/vnd.nvidia.cuda.chdr
            *.cuh
        text/x-objcsrc
            *.m
    kdevpatchreview
        text/x-patch
            *.patch
            *.diff
    kdevqmljs
        text/x-qml
            *.qml
            *.qmlproject
            *.qmltypes
        application/javascript
            *.jsm
            *.mjs
            *.js
    KDevCMakeManager
        text/x-cmake
            cmakelists.txt
            *.cmake
    KDevCssSupport
        text/css
            *.css
        text/html
            *.html
            *.htm
    KDevPhpSupport
        application/x-php
            *.phps
            *.php
            *.php3
            *.php4
            *.php5
    kdevpythonsupport
        text/x-python
            *.wsgi
            *.py
            *.pyx
        text/x-python3
            *.py3x
            *.py3
            *.py
    KDevRubySupport
        application/x-ruby
            *.rb
    
    Of all supported patterns, only *.c and *.C should be matched
    case-sensitively. But both of these patterns belong to the same plugin,
    kdevclangsupport, so LanguageController can safely match all patterns
    case-insensitively. See also
    https://specifications.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-latest.html
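    The reasoning above can be checked mechanically: fold every pattern to
    lower case and see whether any folded pattern is claimed by more than
    one plugin. A minimal standard C++ sketch; conflictingPatterns() and
    its pattern-to-plugin input are hypothetical, built by hand from the
    list above:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical check: returns every lower-cased pattern that maps to more
// than one plugin. An empty result means matching all patterns
// case-insensitively cannot route a file to the wrong plugin.
std::vector<std::string> conflictingPatterns(
    const std::vector<std::pair<std::string, std::string>>& patternToPlugin)
{
    std::map<std::string, std::set<std::string>> pluginsByFoldedPattern;
    for (const auto& [pattern, plugin] : patternToPlugin) {
        std::string folded = pattern;
        std::transform(folded.begin(), folded.end(), folded.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        pluginsByFoldedPattern[folded].insert(plugin);
    }
    std::vector<std::string> conflicts;
    for (const auto& [folded, plugins] : pluginsByFoldedPattern)
        if (plugins.size() > 1)
            conflicts.push_back(folded);
    return conflicts;
}
```

    Fed the list above, the function returns an empty vector: the only
    patterns that fold together are *.c and *.C, both owned by
    kdevclangsupport, and the duplicate *.py entries both belong to
    kdevpythonsupport.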
    
    Average BenchLanguageController results before and at this commit in
    milliseconds per iteration:
            Data row                    Before      At
    1. benchLanguagesForUrlNoCache()
        CMakeLists                      0.029       0.00046
        cmakelists wrong case           0.029       0.00046
        lower-case                      0.0023      0.00058
        upper-case                      0.029       0.00058
        mixed-case                      0.029       0.00058
        .C                              0.0023      0.00050
        .cl                             0.0023      0.00070
        existent C with extension       0.0022      0.00053
        .cc                             0.0023      0.00058
        .cmake                          0.0016      0.00039
        .diff                           0.00094     0.00037
        .qml                            0.0012      0.00036
        existent C w/o extension        0.16        0.16
        existent patch w/o extension    0.20        0.20
    2. benchLanguagesForUrlFilledCache()
        CMakeLists                      0.032       0.0011
        cmakelists wrong case           0.031       0.0011
        lower-case                      0.0039      0.00083
        upper-case                      0.030       0.00083
        mixed-case                      0.030       0.00085
        .C                              0.0039      0.00072
        .cl                             0.0039      0.00091
        existent C with extension       0.0038      0.00064
        .cc                             0.0039      0.00080
        .cmake                          0.0039      0.00090
        .diff                           0.0039      0.00083
        .qml                            0.0039      0.00093
        existent C w/o extension        0.16        0.16
        existent patch w/o extension    0.20        0.20
    3. benchLanguagesForUrlNoMatchNoCache()
        empty                           0.0021      0.0016
        archive                         0.024       0.023
        OpenDocument Text               0.024       0.023
        existent archive with extension 0.030       0.029
        existent archive w/o extension  0.15        0.15
    4. benchLanguagesForUrlNoMatchFilledCache()
        empty                           0.0054      0.0018
        archive                         0.029       0.024
        OpenDocument Text               0.029       0.024
        existent archive with extension 0.035       0.030
        existent archive w/o extension  0.16        0.15
    
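    Almost every benchmark runs faster now. The data rows that exposed the
    fixed bugs (CMakeLists, cmakelists wrong case, upper-case and
    mixed-case) run roughly 28 to 63 times faster, and the other matching
    rows run roughly 2.5 to 6 times faster. The rows for files without an
    extension are essentially unchanged.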
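    Each speedup factor is simply the Before time divided by the At time.
    A small standard C++ sketch of that arithmetic; BenchRow, speedup() and
    rowsAtLeast() are hypothetical helpers, and the data are the
    benchLanguagesForUrlNoCache() rows copied from the table above:

```cpp
#include <string>
#include <vector>

// One benchmark row: average milliseconds per iteration before and at the commit.
struct BenchRow {
    std::string name;
    double beforeMs;
    double atMs;
};

// How many times faster the row runs at the commit.
double speedup(const BenchRow& row)
{
    return row.beforeMs / row.atMs;
}

// Names of the rows whose speedup is at least `factor`, in table order.
std::vector<std::string> rowsAtLeast(const std::vector<BenchRow>& rows, double factor)
{
    std::vector<std::string> names;
    for (const BenchRow& row : rows)
        if (speedup(row) >= factor)
            names.push_back(row.name);
    return names;
}

// benchLanguagesForUrlNoCache() rows from the table above.
const std::vector<BenchRow> noCacheRows = {
    {"CMakeLists", 0.029, 0.00046},
    {"cmakelists wrong case", 0.029, 0.00046},
    {"lower-case", 0.0023, 0.00058},
    {"upper-case", 0.029, 0.00058},
    {"mixed-case", 0.029, 0.00058},
    {".C", 0.0023, 0.00050},
    {".cl", 0.0023, 0.00070},
    {"existent C with extension", 0.0022, 0.00053},
    {".cc", 0.0023, 0.00058},
    {".cmake", 0.0016, 0.00039},
    {".diff", 0.00094, 0.00037},
    {".qml", 0.0012, 0.00036},
    {"existent C w/o extension", 0.16, 0.16},
    {"existent patch w/o extension", 0.20, 0.20},
};
```

    With these rows, rowsAtLeast(noCacheRows, 10.0) returns exactly the
    four rows affected by the fixed bugs: CMakeLists, cmakelists wrong
    case, upper-case and mixed-case.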