File 0001-documentation-Add-cmake-option-to-build-man-pages.patch of Package vectorscan
From 55bda4505f4355ee893198d2dc542ffecf4daa6e Mon Sep 17 00:00:00 2001
From: Jeremy Linton <jeremy.linton@arm.com>
Date: Thu, 15 Feb 2024 14:39:42 -0600
Subject: [PATCH] documentation: Add cmake option to build man pages

Man pages tend to be preferred in some circles, lets add an option
to build the vectorscan documentation that way.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 doc/dev-reference/CMakeLists.txt      |  14 +++
 doc/dev-reference/chimera.rst         |  22 ++--
 doc/dev-reference/compilation.rst     |  92 +++++++--------
 doc/dev-reference/conf.py.in          |  18 +--
 doc/dev-reference/getting_started.rst | 159 +++++++++++++++++---------
 doc/dev-reference/index.rst           |   2 +-
 doc/dev-reference/intro.rst           |  22 ++--
 doc/dev-reference/performance.rst     |  22 ++--
 doc/dev-reference/preface.rst         |  18 +--
 doc/dev-reference/runtime.rst         |  24 ++--
 doc/dev-reference/serialization.rst   |  20 ++--
 doc/dev-reference/tools.rst           |  44 +++----
 libhs.pc.in                           |   2 +-
 tools/hsbench/engine_hyperscan.cpp    |   2 +-
 14 files changed, 264 insertions(+), 197 deletions(-)

diff --git a/doc/dev-reference/CMakeLists.txt b/doc/dev-reference/CMakeLists.txt
index 449589f..48349c2 100644
--- a/doc/dev-reference/CMakeLists.txt
+++ b/doc/dev-reference/CMakeLists.txt
@@ -19,6 +19,7 @@ else()
 set(SPHINX_BUILD_DIR "${CMAKE_CURRENT_BINARY_DIR}/_build")
 set(SPHINX_CACHE_DIR "${CMAKE_CURRENT_BINARY_DIR}/_doctrees")
 set(SPHINX_HTML_DIR "${CMAKE_CURRENT_BINARY_DIR}/html")
+set(SPHINX_MAN_DIR "${CMAKE_CURRENT_BINARY_DIR}/man")
 
 configure_file("${CMAKE_CURRENT_SOURCE_DIR}/conf.py.in"
                "${CMAKE_CURRENT_BINARY_DIR}/conf.py" @ONLY)
@@ -32,4 +33,17 @@ add_custom_target(dev-reference
     "${SPHINX_HTML_DIR}"
     DEPENDS dev-reference-doxygen
     COMMENT "Building HTML dev reference with Sphinx")
+
+add_custom_target(dev-reference-man ALL
+    ${SPHINX_BUILD}
+    -b man
+    -c "${CMAKE_CURRENT_BINARY_DIR}"
+    -d "${SPHINX_CACHE_DIR}"
+    "${CMAKE_CURRENT_SOURCE_DIR}"
+    "${SPHINX_MAN_DIR}"
+    DEPENDS dev-reference-doxygen
+    COMMENT "Building man page reference with
Sphinx") + +install(FILES ${CMAKE_BINARY_DIR}/doc/dev-reference/man/vectorscan.7 + DESTINATION "${CMAKE_INSTALL_MANDIR}/man7/") endif() diff --git a/doc/dev-reference/chimera.rst b/doc/dev-reference/chimera.rst index d35b116..cb8c84c 100644 --- a/doc/dev-reference/chimera.rst +++ b/doc/dev-reference/chimera.rst @@ -11,10 +11,10 @@ Introduction ************ Chimera is a software regular expression matching engine that is a hybrid of -Hyperscan and PCRE. The design goals of Chimera are to fully support PCRE -syntax as well as to take advantage of the high performance nature of Hyperscan. +Vectorscan and PCRE. The design goals of Chimera are to fully support PCRE +syntax as well as to take advantage of the high performance nature of Vectorscan. -Chimera inherits the design guideline of Hyperscan with C APIs for compilation +Chimera inherits the design guideline of Vectorscan with C APIs for compilation and scanning. The Chimera API itself is composed of two major components: @@ -65,13 +65,13 @@ For a given database, Chimera provides several guarantees: .. note:: Chimera is designed to have the same matching behavior as PCRE, including greedy/ungreedy, capturing, etc. Chimera reports both **start offset** and **end offset** for each match like PCRE. Different - from the fashion of reporting all matches in Hyperscan, Chimera only reports + from the fashion of reporting all matches in Vectorscan, Chimera only reports non-overlapping matches. For example, the pattern :regexp:`/foofoo/` will match ``foofoofoofoo`` at offsets (0, 6) and (6, 12). -.. note:: Since Chimera is a hybrid of Hyperscan and PCRE in order to support +.. note:: Since Chimera is a hybrid of Vectorscan and PCRE in order to support full PCRE syntax, there will be extra performance overhead compared to - Hyperscan-only solution. Please always use Hyperscan for better performance + Vectorscan-only solution. Please always use Vectorscan for better performance unless you must need full PCRE syntax support. 
See :ref:`chruntime` for more details @@ -83,12 +83,12 @@ Requirements The PCRE library (http://pcre.org/) version 8.41 is required for Chimera. .. note:: Since Chimera needs to reference PCRE internal function, please place PCRE source - directory under Hyperscan root directory in order to build Chimera. + directory under Vectorscan root directory in order to build Chimera. -Beside this, both hardware and software requirements of Chimera are the same to Hyperscan. +Beside this, both hardware and software requirements of Chimera are the same to Vectorscan. See :ref:`hardware` and :ref:`software` for more details. -.. note:: Building Hyperscan will automatically generate Chimera library. +.. note:: Building Vectorscan will automatically generate Chimera library. Currently only static library is supported for Chimera, so please use static build type when configure CMake build options. @@ -119,7 +119,7 @@ databases: Compilation allows the Chimera library to analyze the given pattern(s) and pre-determine how to scan for these patterns in an optimized fashion using -Hyperscan and PCRE. +Vectorscan and PCRE. =============== Pattern Support @@ -134,7 +134,7 @@ Semantics ========= Chimera supports the exact same semantics of PCRE library. Moreover, it supports -multiple simultaneous pattern matching like Hyperscan and the multiple matches +multiple simultaneous pattern matching like Vectorscan and the multiple matches will be reported in order by end offset. .. _chruntime: diff --git a/doc/dev-reference/compilation.rst b/doc/dev-reference/compilation.rst index 6f5541e..a0ae8c8 100644 --- a/doc/dev-reference/compilation.rst +++ b/doc/dev-reference/compilation.rst @@ -9,7 +9,7 @@ Compiling Patterns Building a Database ******************* -The Hyperscan compiler API accepts regular expressions and converts them into a +The Vectorscan compiler API accepts regular expressions and converts them into a compiled pattern database that can then be used to scan data. 
The API provides three functions that compile regular expressions into @@ -24,7 +24,7 @@ databases: #. :c:func:`hs_compile_ext_multi`: compiles an array of expressions as above, but allows :ref:`extparam` to be specified for each expression. -Compilation allows the Hyperscan library to analyze the given pattern(s) and +Compilation allows the Vectorscan library to analyze the given pattern(s) and pre-determine how to scan for these patterns in an optimized fashion that would be far too expensive to compute at run-time. @@ -48,10 +48,10 @@ To compile patterns to be used in streaming mode, the ``mode`` parameter of block mode requires the use of :c:member:`HS_MODE_BLOCK` and vectored mode requires the use of :c:member:`HS_MODE_VECTORED`. A pattern database compiled for one mode (streaming, block or vectored) can only be used in that mode. The -version of Hyperscan used to produce a compiled pattern database must match the -version of Hyperscan used to scan with it. +version of Vectorscan used to produce a compiled pattern database must match the +version of Vectorscan used to scan with it. -Hyperscan provides support for targeting a database at a particular CPU +Vectorscan provides support for targeting a database at a particular CPU platform; see :ref:`instr_specialization` for details. ===================== @@ -75,14 +75,14 @@ characters exist in regular grammar like ``[``, ``]``, ``(``, ``)``, ``{``, While in pure literal case, all these meta characters lost extra meanings expect for that they are just common ASCII codes. -Hyperscan is initially designed to process common regular expressions. It is +Vectorscan is initially designed to process common regular expressions. It is hence embedded with a complex parser to do comprehensive regular grammar interpretation. Particularly, the identification of above meta characters is the basic step for the interpretation of far more complex regular grammars. 
However in real cases, patterns may not always be regular expressions. They could just be pure literals. Problem will come if the pure literals contain -regular meta characters. Supposing fed directly into traditional Hyperscan +regular meta characters. Supposing fed directly into traditional Vectorscan compile API, all these meta characters will be interpreted in predefined ways, which is unnecessary and the result is totally out of expectation. To avoid such misunderstanding by traditional API, users have to preprocess these @@ -90,7 +90,7 @@ literal patterns by converting the meta characters into some other formats: either by adding a backslash ``\`` before certain meta characters, or by converting all the characters into a hexadecimal representation. -In ``v5.2.0``, Hyperscan introduces 2 new compile APIs for pure literal patterns: +In ``v5.2.0``, Vectorscan introduces 2 new compile APIs for pure literal patterns: #. :c:func:`hs_compile_lit`: compiles a single pure literal into a pattern database. @@ -106,7 +106,7 @@ content directly into these APIs without worrying about writing regular meta characters in their patterns. No preprocessing work is needed any more. For new APIs, the ``length`` of each literal pattern is a newly added parameter. -Hyperscan needs to locate the end position of the input expression via clearly +Vectorscan needs to locate the end position of the input expression via clearly knowing each literal's length, not by simply identifying character ``\0`` of a string. @@ -127,19 +127,19 @@ Supported flags: :c:member:`HS_FLAG_CASELESS`, :c:member:`HS_FLAG_SINGLEMATCH`, Pattern Support *************** -Hyperscan supports the pattern syntax used by the PCRE library ("libpcre"), +Vectorscan supports the pattern syntax used by the PCRE library ("libpcre"), described at <http://www.pcre.org/>. However, not all constructs available in libpcre are supported. The use of unsupported constructs will result in compilation errors. 
-The version of PCRE used to validate Hyperscan's interpretation of this syntax +The version of PCRE used to validate Vectorscan's interpretation of this syntax is 8.41 or above. ==================== Supported Constructs ==================== -The following regex constructs are supported by Hyperscan: +The following regex constructs are supported by Vectorscan: * Literal characters and strings, with all libpcre quoting and character escapes. @@ -177,7 +177,7 @@ The following regex constructs are supported by Hyperscan: :c:member:`HS_FLAG_SINGLEMATCH` flag is on for that pattern. * Lazy modifiers (:regexp:`?` appended to another quantifier, e.g. - :regexp:`\\w+?`) are supported but ignored (as Hyperscan reports all + :regexp:`\\w+?`) are supported but ignored (as Vectorscan reports all matches). * Parenthesization, including the named and unnamed capturing and @@ -219,15 +219,15 @@ The following regex constructs are supported by Hyperscan: .. note:: At this time, not all patterns can be successfully compiled with the :c:member:`HS_FLAG_SOM_LEFTMOST` flag, which enables per-pattern support for :ref:`som`. The patterns that support this flag are a subset of patterns that - can be successfully compiled with Hyperscan; notably, many bounded repeat - forms that can be compiled with Hyperscan without the Start of Match flag + can be successfully compiled with Vectorscan; notably, many bounded repeat + forms that can be compiled with Vectorscan without the Start of Match flag enabled cannot be compiled with the flag enabled. ====================== Unsupported Constructs ====================== -The following regex constructs are not supported by Hyperscan: +The following regex constructs are not supported by Vectorscan: * Backreferences and capturing sub-expressions. * Arbitrary zero-width assertions. 
@@ -246,32 +246,32 @@ The following regex constructs are not supported by Hyperscan: Semantics ********* -While Hyperscan follows libpcre syntax, it provides different semantics. The +While Vectorscan follows libpcre syntax, it provides different semantics. The major departures from libpcre semantics are motivated by the requirements of streaming and multiple simultaneous pattern matching. The major departures from libpcre semantics are: -#. **Multiple pattern matching**: Hyperscan allows matches to be reported for +#. **Multiple pattern matching**: Vectorscan allows matches to be reported for several patterns simultaneously. This is not equivalent to separating the patterns by :regexp:`|` in libpcre, which evaluates alternations left-to-right. -#. **Lack of ordering**: the multiple matches that Hyperscan produces are not +#. **Lack of ordering**: the multiple matches that Vectorscan produces are not guaranteed to be ordered, although they will always fall within the bounds of the current scan. -#. **End offsets only**: Hyperscan's default behaviour is only to report the end +#. **End offsets only**: Vectorscan's default behaviour is only to report the end offset of a match. Reporting of the start offset can be enabled with per-expression flags at pattern compile time. See :ref:`som` for details. #. **"All matches" reported**: scanning :regexp:`/foo.*bar/` against - ``fooxyzbarbar`` will return two matches from Hyperscan -- at the points + ``fooxyzbarbar`` will return two matches from Vectorscan -- at the points corresponding to the ends of ``fooxyzbar`` and ``fooxyzbarbar``. In contrast, libpcre semantics by default would report only one match at ``fooxyzbarbar`` (greedy semantics) or, if non-greedy semantics were switched on, one match at ``fooxyzbar``. This means that switching between greedy and non-greedy - semantics is a no-op in Hyperscan. + semantics is a no-op in Vectorscan. 
To support libpcre quantifier semantics while accurately reporting streaming matches at the time they occur is impossible. For example, consider the pattern @@ -299,7 +299,7 @@ as in block 3 -- which would constitute a better match for the pattern. Start of Match ============== -In standard operation, Hyperscan will only provide the end offset of a match +In standard operation, Vectorscan will only provide the end offset of a match when the match callback is called. If the :c:member:`HS_FLAG_SOM_LEFTMOST` flag is specified for a particular pattern, then the same set of matches is returned, but each match will also provide the leftmost possible start offset @@ -308,7 +308,7 @@ corresponding to its end offset. Using the SOM flag entails a number of trade-offs and limitations: * Reduced pattern support: For many patterns, tracking SOM is complex and can - result in Hyperscan failing to compile a pattern with a "Pattern too + result in Vectorscan failing to compile a pattern with a "Pattern too large" error, even if the pattern is supported in normal operation. * Increased stream state: At scan time, state space is required to track potential SOM offsets, and this must be stored in persistent stream state in @@ -316,20 +316,20 @@ Using the SOM flag entails a number of trade-offs and limitations: required to match a pattern. * Performance overhead: Similarly, there is generally a performance cost associated with tracking SOM. -* Incompatible features: Some other Hyperscan pattern flags (such as +* Incompatible features: Some other Vectorscan pattern flags (such as :c:member:`HS_FLAG_SINGLEMATCH` and :c:member:`HS_FLAG_PREFILTER`) can not be used in combination with SOM. Specifying them together with :c:member:`HS_FLAG_SOM_LEFTMOST` will result in a compilation error. In streaming mode, the amount of precision delivered by SOM can be controlled -with the SOM horizon flags. These instruct Hyperscan to deliver accurate SOM +with the SOM horizon flags. 
These instruct Vectorscan to deliver accurate SOM information within a certain distance of the end offset, and return a special start offset of :c:member:`HS_OFFSET_PAST_HORIZON` otherwise. Specifying a small or medium SOM horizon will usually reduce the stream state required for a given database. .. note:: In streaming mode, the start offset returned for a match may refer to - a point in the stream *before* the current block being scanned. Hyperscan + a point in the stream *before* the current block being scanned. Vectorscan provides no facility for accessing earlier blocks; if the calling application needs to inspect historical data, then it must store it itself. @@ -341,7 +341,7 @@ Extended Parameters In some circumstances, more control over the matching behaviour of a pattern is required than can be specified easily using regular expression syntax. For -these scenarios, Hyperscan provides the :c:func:`hs_compile_ext_multi` function +these scenarios, Vectorscan provides the :c:func:`hs_compile_ext_multi` function that allows a set of "extended parameters" to be set on a per-pattern basis. Extended parameters are specified using an :c:type:`hs_expr_ext_t` structure, @@ -383,18 +383,18 @@ section. Prefiltering Mode ================= -Hyperscan provides a per-pattern flag, :c:member:`HS_FLAG_PREFILTER`, which can -be used to implement a prefilter for a pattern than Hyperscan would not +Vectorscan provides a per-pattern flag, :c:member:`HS_FLAG_PREFILTER`, which can +be used to implement a prefilter for a pattern than Vectorscan would not ordinarily support. -This flag instructs Hyperscan to compile an "approximate" version of this -pattern for use in a prefiltering application, even if Hyperscan does not +This flag instructs Vectorscan to compile an "approximate" version of this +pattern for use in a prefiltering application, even if Vectorscan does not support the pattern in normal operation. 
The set of matches returned when this flag is used is guaranteed to be a superset of the matches specified by the non-prefiltering expression. -If the pattern contains pattern constructs not supported by Hyperscan (such as +If the pattern contains pattern constructs not supported by Vectorscan (such as zero-width assertions, back-references or conditional references) these constructs will be replaced internally with broader constructs that may match more often. @@ -404,7 +404,7 @@ back-reference :regexp:`\\1`. In prefiltering mode, this pattern might be approximated by having its back-reference replaced with its referent, forming :regexp:`/\\w+ again \\w+/`. -Furthermore, in prefiltering mode Hyperscan may simplify a pattern that would +Furthermore, in prefiltering mode Vectorscan may simplify a pattern that would otherwise return a "Pattern too large" error at compile time, or for performance reasons (subject to the matching guarantee above). @@ -422,22 +422,22 @@ matches for the pattern. Instruction Set Specialization ****************************** -Hyperscan is able to make use of several modern instruction set features found +Vectorscan is able to make use of several modern instruction set features found on x86 processors to provide improvements in scanning performance. Some of these features are selected when the library is built; for example, -Hyperscan will use the native ``POPCNT`` instruction on processors where it is +Vectorscan will use the native ``POPCNT`` instruction on processors where it is available and the library has been optimized for the host architecture. -.. note:: By default, the Hyperscan runtime is built with the ``-march=native`` +.. note:: By default, the Vectorscan runtime is built with the ``-march=native`` compiler flag and (where possible) will make use of all instructions known by the host's C compiler. 
-To use some instruction set features, however, Hyperscan must build a +To use some instruction set features, however, Vectorscan must build a specialized database to support them. This means that the target platform must be specified at pattern compile time. -The Hyperscan compiler API functions all accept an optional +The Vectorscan compiler API functions all accept an optional :c:type:`hs_platform_info_t` argument, which describes the target platform for the database to be built. If this argument is NULL, the database will be targeted at the current host platform. @@ -467,7 +467,7 @@ See :ref:`api_constants` for the full list of CPU tuning and feature flags. Approximate matching ******************** -Hyperscan provides an experimental approximate matching mode, which will match +Vectorscan provides an experimental approximate matching mode, which will match patterns within a given edit distance. The exact matching behavior is defined as follows: @@ -492,7 +492,7 @@ follows: Here are a few examples of approximate matching: -* Pattern :regexp:`/foo/` can match ``foo`` when using regular Hyperscan +* Pattern :regexp:`/foo/` can match ``foo`` when using regular Vectorscan matching behavior. With approximate matching within edit distance 2, the pattern will produce matches when scanned against ``foo``, ``foooo``, ``f00``, ``f``, and anything else that lies within edit distance 2 of matching corpora @@ -513,7 +513,7 @@ matching support. Here they are, in a nutshell: * Reduced pattern support: * For many patterns, approximate matching is complex and can result in - Hyperscan failing to compile a pattern with a "Pattern too large" error, + Vectorscan failing to compile a pattern with a "Pattern too large" error, even if the pattern is supported in normal operation. * Additionally, some patterns cannot be approximately matched because they reduce to so-called "vacuous" patterns (patterns that match everything). 
For @@ -548,7 +548,7 @@ Logical Combinations ******************** For situations when a user requires behaviour that depends on the presence or -absence of matches from groups of patterns, Hyperscan provides support for the +absence of matches from groups of patterns, Vectorscan provides support for the logical combination of patterns in a given pattern set, with three operators: ``NOT``, ``AND`` and ``OR``. @@ -561,7 +561,7 @@ offset is *true* if the expression it refers to is *false* at this offset. For example, ``NOT 101`` means that expression 101 has not yet matched at this offset. -A logical combination is passed to Hyperscan at compile time as an expression. +A logical combination is passed to Vectorscan at compile time as an expression. This combination expression will raise matches at every offset where one of its sub-expressions matches and the logical value of the whole expression is *true*. @@ -603,7 +603,7 @@ In a logical combination expression: * Whitespace is ignored. To use a logical combination expression, it must be passed to one of the -Hyperscan compile functions (:c:func:`hs_compile_multi`, +Vectorscan compile functions (:c:func:`hs_compile_multi`, :c:func:`hs_compile_ext_multi`) along with the :c:member:`HS_FLAG_COMBINATION` flag, which identifies the pattern as a logical combination expression. The patterns referred to in the logical combination expression must be compiled together in @@ -613,7 +613,7 @@ When an expression has the :c:member:`HS_FLAG_COMBINATION` flag set, it ignores all other flags except the :c:member:`HS_FLAG_SINGLEMATCH` flag and the :c:member:`HS_FLAG_QUIET` flag. 
-Hyperscan will accept logical combination expressions at compile time that +Vectorscan will accept logical combination expressions at compile time that evaluate to *true* when no patterns have matched, and report the match for combination at end of data if no patterns have matched; for example: :: diff --git a/doc/dev-reference/conf.py.in b/doc/dev-reference/conf.py.in index d0ef371..36c718e 100644 --- a/doc/dev-reference/conf.py.in +++ b/doc/dev-reference/conf.py.in @@ -1,6 +1,6 @@ # -*- coding: utf-8 -*- # -# Hyperscan documentation build configuration file, created by +# Vectorscan documentation build configuration file, created by # sphinx-quickstart on Tue Sep 29 15:59:19 2015. # # This file is execfile()d with the current directory set to its @@ -43,8 +43,8 @@ source_suffix = '.rst' master_doc = 'index' # General information about the project. -project = u'Hyperscan' -copyright = u'2015-2018, Intel Corporation' +project = u'Vectorscan' +copyright = u'2015-2018, Intel Corporation; 2020-2024, VectorCamp; and other contributors' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the @@ -202,7 +202,7 @@ latex_elements = { # (source start file, target name, title, # author, documentclass [howto, manual, or own class]). latex_documents = [ - ('index', 'Hyperscan.tex', u'Hyperscan Documentation', + ('index', 'Hyperscan.tex', u'Vectorscan Documentation', u'Intel Corporation', 'manual'), ] @@ -232,8 +232,8 @@ latex_documents = [ # One entry per manual page. List of tuples # (source start file, name, description, authors, manual section). man_pages = [ - ('index', 'hyperscan', u'Hyperscan Documentation', - [u'Intel Corporation'], 1) + ('index', 'vectorscan', u'Vectorscan Documentation', + [u'Intel Corporation'], 7) ] # If true, show URL addresses after external links. 
@@ -246,8 +246,8 @@ man_pages = [ # (source start file, target name, title, author, # dir menu entry, description, category) texinfo_documents = [ - ('index', 'Hyperscan', u'Hyperscan Documentation', - u'Intel Corporation', 'Hyperscan', 'High-performance regular expression matcher.', + ('index', 'Vectorscan', u'Vectorscan Documentation', + u'Intel Corporation; VectorCamp', 'Vectorscan', 'High-performance regular expression matcher.', 'Miscellaneous'), ] @@ -272,4 +272,4 @@ breathe_domain_by_extension = {"h" : "c"} # -- Add some customisation ----------------------------------------------- def setup(app): - app.add_stylesheet("hyperscan.css") # Custom stylesheet for e.g. :regex: + app.add_css_file("hyperscan.css") # Custom stylesheet for e.g. :regex: diff --git a/doc/dev-reference/getting_started.rst b/doc/dev-reference/getting_started.rst index aaff15b..57d7821 100644 --- a/doc/dev-reference/getting_started.rst +++ b/doc/dev-reference/getting_started.rst @@ -7,43 +7,41 @@ Getting Started Very Quick Start **************** -#. Clone Hyperscan :: +#. Clone Vectorscan :: - cd <where-you-want-hyperscan-source> - git clone git://github.com/intel/hyperscan + cd <where-you-want-vectorscan-source> + git clone https://github.com/VectorCamp/vectorscan -#. Configure Hyperscan +#. Configure Vectorscan Ensure that you have the correct :ref:`dependencies <software>` present, and then: :: - cd <where-you-want-to-build-hyperscan> + cd <where-you-want-to-build-vectorscan> mkdir <build-dir> cd <build-dir> - cmake [-G <generator>] [options] <hyperscan-source-path> + cmake [-G <generator>] [options] <vectorscan-source-path> Known working generators: * ``Unix Makefiles`` --- make-compatible makefiles (default on Linux/FreeBSD/Mac OS X) * ``Ninja`` --- `Ninja <http://martine.github.io/ninja/>`_ build files. - * ``Visual Studio 15 2017`` --- Visual Studio projects - Generators that might work include: + Unsupported generators that might work include: * ``Xcode`` --- OS X Xcode projects. 
-#. Build Hyperscan +#. Build Vectorscan Depending on the generator used: * ``cmake --build .`` --- will build everything * ``make -j<jobs>`` --- use makefiles in parallel * ``ninja`` --- use Ninja build - * ``MsBuild.exe`` --- use Visual Studio MsBuild * etc. -#. Check Hyperscan +#. Check Vectorscan - Run the Hyperscan unit tests: :: + Run the Vectorscan unit tests: :: bin/unit-hyperscan @@ -55,20 +53,23 @@ Requirements Hardware ======== -Hyperscan will run on x86 processors in 64-bit (Intel\ |reg| 64 Architecture) and -32-bit (IA-32 Architecture) modes. +Vectorscan will run on x86 processors in 64-bit (Intel\ |reg| 64 Architecture) and +32-bit (IA-32 Architecture) modes as well as Arm v8.0+ aarch64, and POWER 8+ ppc64le +machines. Hyperscan is a high performance software library that takes advantage of recent -Intel architecture advances. At a minimum, support for Supplemental Streaming -SIMD Extensions 3 (SSSE3) is required, which should be available on any modern -x86 processor. +architecture advances. -Additionally, Hyperscan can make use of: +Additionally, Vectorscan can make use of: * Intel Streaming SIMD Extensions 4.2 (SSE4.2) * the POPCNT instruction * Bit Manipulation Instructions (BMI, BMI2) * Intel Advanced Vector Extensions 2 (Intel AVX2) + * Arm NEON + * Arm SVE and SVE2 + * Arm SVE2 BITPERM + * IBM Power8/Power9 VSX if present. @@ -79,40 +80,34 @@ These can be determined at library compile time, see :ref:`target_arch`. Software ======== -As a software library, Hyperscan doesn't impose any particular runtime -software requirements, however to build the Hyperscan library we require a -modern C and C++ compiler -- in particular, Hyperscan requires C99 and C++11 +As a software library, Vectorscan doesn't impose any particular runtime +software requirements, however to build the Vectorscan library we require a +modern C and C++ compiler -- in particular, Vectorscan requires C99 and C++17 compiler support. 
The supported compilers are: - * GCC, v4.8.1 or higher - * Clang, v3.4 or higher (with libstdc++ or libc++) - * Intel C++ Compiler v15 or higher - * Visual C++ 2017 Build Tools + * GCC, v9 or higher + * Clang, v5 or higher (with libstdc++ or libc++) -Examples of operating systems that Hyperscan is known to work on include: +Examples of operating systems that Vectorscan is known to work on include: Linux: -* Ubuntu 14.04 LTS or newer +* Ubuntu 20.04 LTS or newer * RedHat/CentOS 7 or newer +* Fedora 38 or newer +* Debian 10 FreeBSD: * 10.0 or newer -Windows: - -* 8 or newer - Mac OS X: * 10.8 or newer, using XCode/Clang -Hyperscan *may* compile and run on other platforms, but there is no guarantee. -We currently have experimental support for Windows using Intel C++ Compiler -or Visual Studio 2017. +Vectorscan *may* compile and run on other platforms, but there is no guarantee. -In addition, the following software is required for compiling the Hyperscan library: +In addition, the following software is required for compiling the Vectorscan library: ======================================================= =========== ====================================== Dependency Version Notes @@ -132,20 +127,20 @@ Ragel, you may use Cygwin to build it from source. Boost Headers ------------- -Compiling Hyperscan depends on a recent version of the Boost C++ header +Compiling Vectorscan depends on a recent version of the Boost C++ header library. If the Boost libraries are installed on the build machine in the usual paths, CMake will find them. If the Boost libraries are not installed, the location of the Boost source tree can be specified during the CMake configuration step using the ``BOOST_ROOT`` variable (described below). Another alternative is to put a copy of (or a symlink to) the boost -subdirectory in ``<hyperscan-source-path>/include/boost``. +subdirectory in ``<vectorscanscan-source-path>/include/boost``. 
For example: for the Boost-1.59.0 release: :: - ln -s boost_1_59_0/boost <hyperscan-source-path>/include/boost + ln -s boost_1_59_0/boost <vectorscan-source-path>/include/boost -As Hyperscan uses the header-only parts of Boost, it is not necessary to +As Vectorscan uses the header-only parts of Boost, it is not necessary to compile the Boost libraries. CMake Configuration @@ -168,11 +163,12 @@ Common options for CMake include: | | Valid options are Debug, Release, RelWithDebInfo, | | | and MinSizeRel. Default is RelWithDebInfo. | +------------------------+----------------------------------------------------+ -| BUILD_SHARED_LIBS | Build Hyperscan as a shared library instead of | +| BUILD_SHARED_LIBS | Build Vectorscan as a shared library instead of | | | the default static library. | +| | Default: Off | +------------------------+----------------------------------------------------+ -| BUILD_STATIC_AND_SHARED| Build both static and shared Hyperscan libs. | -| | Default off. | +| BUILD_STATIC_LIBS | Build Vectorscan as a static library. | +| | Default: On | +------------------------+----------------------------------------------------+ | BOOST_ROOT | Location of Boost source tree. | +------------------------+----------------------------------------------------+ @@ -180,12 +176,64 @@ Common options for CMake include: +------------------------+----------------------------------------------------+ | FAT_RUNTIME | Build the :ref:`fat runtime<fat_runtime>`. Default | | | true on Linux, not available elsewhere. | +| | Default: Off | ++------------------------+----------------------------------------------------+ +| USE_CPU_NATIVE | Native CPU detection is off by default, however it | +| | is possible to build a performance-oriented non-fat| +| | library tuned to your CPU. | +| | Default: Off | ++------------------------+----------------------------------------------------+ +| SANITIZE | Use libasan sanitizer to detect possible bugs. 
| +| | Valid options are address, memory and undefined. | ++------------------------+----------------------------------------------------+ +| SIMDE_BACKEND | Enable SIMDe backend. If this is chosen all native | +| | (SSE/AVX/AVX512/Neon/SVE/VSX) backends will be | +| | disabled and a SIMDe SSE4.2 emulation backend will | +| | be enabled. This will enable Vectorscan to build | +| | and run on architectures without SIMD. | +| | Default: Off | ++------------------------+----------------------------------------------------+ +| SIMDE_NATIVE | Enable SIMDe native emulation of x86 SSE4.2 | +| | intrinsics on the building platform. That is, | +| | SSE4.2 intrinsics will be emulated using Neon on | +| | an Arm platform, or VSX on a Power platform, etc. | +| | Default: Off | ++------------------------+----------------------------------------------------+ + +X86 platform specific options include: + ++------------------------+----------------------------------------------------+ +| Variable | Description | ++========================+====================================================+ +| BUILD_AVX2 | Enable code for AVX2. | ++------------------------+----------------------------------------------------+ +| BUILD_AVX512 | Enable code for AVX512. Implies BUILD_AVX2. | ++------------------------+----------------------------------------------------+ +| BUILD_AVX512VBMI | Enable code for AVX512 with VBMI extension. Implies| +| | BUILD_AVX512. | ++------------------------+----------------------------------------------------+ + +Arm platform specific options include: + ++------------------------+----------------------------------------------------+ +| Variable | Description | ++========================+====================================================+ +| BUILD_SVE | Enable code for SVE, like on AWS Graviton3 CPUs. | +| | Not much code is ported just for SVE , but enabling| +| | SVE code production, does improve code generation, | +| | see Benchmarks. 
| ++------------------------+----------------------------------------------------+ +| BUILD_SVE2 | Enable code for SVE2, implies BUILD_SVE. Most | +| | non-Neon code is written for SVE2. | ++------------------------+----------------------------------------------------+ +| BUILD_SVE2_BITPERM | Enable code for SVE2_BITPERM harwdare feature, | +| | implies BUILD_SVE2. | +------------------------+----------------------------------------------------+ For example, to generate a ``Debug`` build: :: cd <build-dir> - cmake -DCMAKE_BUILD_TYPE=Debug <hyperscan-source-path> + cmake -DCMAKE_BUILD_TYPE=Debug <vectorscan-source-path> @@ -193,7 +241,7 @@ Build Type ---------- CMake determines a number of features for a build based on the Build Type. -Hyperscan defaults to ``RelWithDebInfo``, i.e. "release with debugging +Vectorscan defaults to ``RelWithDebInfo``, i.e. "release with debugging information". This is a performance optimized build without runtime assertions but with debug symbols enabled. @@ -201,7 +249,7 @@ The other types of builds are: * ``Release``: as above, but without debug symbols * ``MinSizeRel``: a stripped release build - * ``Debug``: used when developing Hyperscan. Includes runtime assertions + * ``Debug``: used when developing Vectorscan. Includes runtime assertions (which has a large impact on runtime performance), and will also enable some other build features like building internal unit tests. @@ -211,7 +259,7 @@ The other types of builds are: Target Architecture ------------------- -Unless using the :ref:`fat runtime<fat_runtime>`, by default Hyperscan will be +Unless using the :ref:`fat runtime<fat_runtime>`, by default Vectorscan will be compiled to target the instruction set of the processor of the machine that being used for compilation. This is done via the use of ``-march=native``. 
The result of this means that a library built on one machine may not work on a @@ -223,7 +271,7 @@ CMake, or ``CMAKE_C_FLAGS`` and ``CMAKE_CXX_FLAGS`` on the CMake command line. F example, to set the instruction subsets up to ``SSE4.2`` using GCC 4.8: :: cmake -DCMAKE_C_FLAGS="-march=corei7" \ - -DCMAKE_CXX_FLAGS="-march=corei7" <hyperscan-source-path> + -DCMAKE_CXX_FLAGS="-march=corei7" <vectorscan-source-path> For more information, refer to :ref:`instr_specialization`. @@ -232,17 +280,17 @@ For more information, refer to :ref:`instr_specialization`. Fat Runtime ----------- -A feature introduced in Hyperscan v4.4 is the ability for the Hyperscan +A feature introduced in Hyperscan v4.4 is the ability for the Vectorscan library to dispatch the most appropriate runtime code for the host processor. -This feature is called the "fat runtime", as a single Hyperscan library +This feature is called the "fat runtime", as a single Vectorscan library contains multiple copies of the runtime code for different instruction sets. .. note:: The fat runtime feature is only available on Linux. Release builds of - Hyperscan will default to having the fat runtime enabled where supported. + Vectorscan will default to having the fat runtime enabled where supported. -When building the library with the fat runtime, the Hyperscan runtime code +When building the library with the fat runtime, the Vectorscan runtime code will be compiled multiple times for these different instruction sets, and these compiled objects are combined into one library. There are no changes to how user applications are built against this library. @@ -254,11 +302,11 @@ resolved so that the right version of each API function is used. There is no impact on function call performance, as this check and resolution is performed by the ELF loader once when the binary is loaded. 
-If the Hyperscan library is used on x86 systems without ``SSSE3``, the runtime +If the Vectorscan library is used on x86 systems without ``SSSE4.2``, the runtime API functions will resolve to functions that return :c:member:`HS_ARCH_ERROR` instead of potentially executing illegal instructions. The API function :c:func:`hs_valid_platform` can be used by application writers to determine if -the current platform is supported by Hyperscan. +the current platform is supported by Vectorscan. As of this release, the variants of the runtime that are built, and the CPU capability that is required, are the following: @@ -299,6 +347,11 @@ capability that is required, are the following: cmake -DBUILD_AVX512VBMI=on <...> + Vectorscan add support for Arm processors and SVE, SV2 and SVE2_BITPERM. + example: :: + + cmake -DBUILD_SVE=ON -DBUILD_SVE2=ON -DBUILD_SVE2_BITPERM=ON <...> + As the fat runtime requires compiler, libc, and binutils support, at this time it will only be enabled for Linux builds where the compiler supports the `indirect function "ifunc" function attribute diff --git a/doc/dev-reference/index.rst b/doc/dev-reference/index.rst index b5d6a54..4046a29 100644 --- a/doc/dev-reference/index.rst +++ b/doc/dev-reference/index.rst @@ -1,5 +1,5 @@ ############################################### -Hyperscan |version| Developer's Reference Guide +Vectorscan |version| Developer's Reference Guide ############################################### ------- diff --git a/doc/dev-reference/intro.rst b/doc/dev-reference/intro.rst index 58879ae..71538eb 100644 --- a/doc/dev-reference/intro.rst +++ b/doc/dev-reference/intro.rst @@ -5,11 +5,11 @@ Introduction ############ -Hyperscan is a software regular expression matching engine designed with +Vectorscan is a software regular expression matching engine designed with high performance and flexibility in mind. It is implemented as a library that exposes a straightforward C API. 
-The Hyperscan API itself is composed of two major components: +The Vectorscan API itself is composed of two major components: *********** Compilation @@ -17,7 +17,7 @@ Compilation These functions take a group of regular expressions, along with identifiers and option flags, and compile them into an immutable database that can be used by -the Hyperscan scanning API. This compilation process performs considerable +the Vectorscan scanning API. This compilation process performs considerable analysis and optimization work in order to build a database that will match the given expressions efficiently. @@ -36,8 +36,8 @@ See :ref:`compilation` for more detail. Scanning ******** -Once a Hyperscan database has been created, it can be used to scan data in -memory. Hyperscan provides several scanning modes, depending on whether the +Once a Vectorscan database has been created, it can be used to scan data in +memory. Vectorscan provides several scanning modes, depending on whether the data to be scanned is available as a single contiguous block, whether it is distributed amongst several blocks in memory at the same time, or whether it is to be scanned as a sequence of blocks in a stream. @@ -45,7 +45,7 @@ to be scanned as a sequence of blocks in a stream. Matches are delivered to the application via a user-supplied callback function that is called synchronously for each match. -For a given database, Hyperscan provides several guarantees: +For a given database, Vectorscan provides several guarantees: * No memory allocations occur at runtime with the exception of two fixed-size allocations, both of which should be done ahead of time for @@ -56,7 +56,7 @@ For a given database, Hyperscan provides several guarantees: call. - **Stream state**: in streaming mode only, some state space is required to store data that persists between scan calls for each stream. This allows - Hyperscan to track matches that span multiple blocks of data. 
+ Vectorscan to track matches that span multiple blocks of data. * The sizes of the scratch space and stream state (in streaming mode) required for a given database are fixed and determined at database compile time. This @@ -64,7 +64,7 @@ For a given database, Hyperscan provides several guarantees: time, and these structures can be pre-allocated if required for performance reasons. -* Any pattern that has successfully been compiled by the Hyperscan compiler can +* Any pattern that has successfully been compiled by the Vectorscan compiler can be scanned against any input. There are no internal resource limits or other limitations at runtime that could cause a scan call to return an error. @@ -74,12 +74,12 @@ See :ref:`runtime` for more detail. Tools ***** -Some utilities for testing and benchmarking Hyperscan are included with the +Some utilities for testing and benchmarking Vectorscan are included with the library. See :ref:`tools` for more information. ************ Example Code ************ -Some simple example code demonstrating the use of the Hyperscan API is -available in the ``examples/`` subdirectory of the Hyperscan distribution. +Some simple example code demonstrating the use of the Vectorscan API is +available in the ``examples/`` subdirectory of the Vectorscan distribution. diff --git a/doc/dev-reference/performance.rst b/doc/dev-reference/performance.rst index 23781bd..12074ea 100644 --- a/doc/dev-reference/performance.rst +++ b/doc/dev-reference/performance.rst @@ -4,7 +4,7 @@ Performance Considerations ########################## -Hyperscan supports a wide range of patterns in all three scanning modes. It is +Vectorscan supports a wide range of patterns in all three scanning modes. It is capable of extremely high levels of performance, but certain patterns can reduce performance markedly. 
@@ -25,7 +25,7 @@ For example, caseless matching of :regexp:`/abc/` can be written as: * :regexp:`/(?i)abc(?-i)/` * :regexp:`/abc/i` -Hyperscan is capable of handling all these constructs. Unless there is a +Vectorscan is capable of handling all these constructs. Unless there is a specific reason otherwise, do not rewrite patterns from one form to another. As another example, matching of :regexp:`/foo(bar|baz)(frotz)?/` can be @@ -41,24 +41,24 @@ Library usage .. tip:: Do not hand-optimize library usage. -The Hyperscan library is capable of dealing with small writes, unusually large +The Vectorscan library is capable of dealing with small writes, unusually large and small pattern sets, etc. Unless there is a specific performance problem -with some usage of the library, it is best to use Hyperscan in a simple and +with some usage of the library, it is best to use Vectorscan in a simple and direct fashion. For example, it is unlikely for there to be much benefit in buffering input to the library into larger blocks unless streaming writes are tiny (say, 1-2 bytes at a time). -Unlike many other pattern matching products, Hyperscan will run faster with +Unlike many other pattern matching products, Vectorscan will run faster with small numbers of patterns and slower with large numbers of patterns in a smooth fashion (as opposed to, typically, running at a moderate speed up to some fixed limit then either breaking or running half as fast). -Hyperscan also provides high-throughput matching with a single thread of -control per core; if a database runs at 3.0 Gbps in Hyperscan it means that a +Vectorscan also provides high-throughput matching with a single thread of +control per core; if a database runs at 3.0 Gbps in Vectorscan it means that a 3000-bit block of data will be scanned in 1 microsecond in a single thread of control, not that it is required to scan 22 3000-bit blocks of data in 22 microseconds. 
Thus, it is not usually necessary to buffer data to supply -Hyperscan with available parallelism. +Vectorscan with available parallelism. ******************** Block-based matching @@ -72,7 +72,7 @@ accumulated before processing, it should be scanned in block rather than in streaming mode. Unnecessary use of streaming mode reduces the number of optimizations that can -be applied in Hyperscan and may make some patterns run slower. +be applied in Vectorscan and may make some patterns run slower. If there is a mixture of 'block' and 'streaming' mode patterns, these should be scanned in separate databases except in the case that the streaming patterns @@ -107,7 +107,7 @@ Allocate scratch ahead of time Scratch allocation is not necessarily a cheap operation. Since it is the first time (after compilation or deserialization) that a pattern database is used, -Hyperscan performs some validation checks inside :c:func:`hs_alloc_scratch` and +Vectorscan performs some validation checks inside :c:func:`hs_alloc_scratch` and must also allocate memory. Therefore, it is important to ensure that :c:func:`hs_alloc_scratch` is not @@ -329,7 +329,7 @@ Consequently, :regexp:`/foo.*bar/L` with a check on start of match values after the callback is considerably more expensive and general than :regexp:`/foo.{300}bar/`. -Similarly, the :c:member:`hs_expr_ext::min_length` extended parameter can be +Similarly, the :cpp:member:`hs_expr_ext::min_length` extended parameter can be used to specify a lower bound on the length of the matches for a pattern. Using this facility may be more lightweight in some circumstances than using the SOM flag and post-confirming match length in the calling application. 
diff --git a/doc/dev-reference/preface.rst b/doc/dev-reference/preface.rst index 68373b7..5739690 100644 --- a/doc/dev-reference/preface.rst +++ b/doc/dev-reference/preface.rst @@ -6,35 +6,35 @@ Preface Overview ******** -Hyperscan is a regular expression engine designed to offer high performance, the +Vectorscan is a regular expression engine designed to offer high performance, the ability to match multiple expressions simultaneously and flexibility in scanning operation. Patterns are provided to a compilation interface which generates an immutable pattern database. The scan interface then can be used to scan a target data buffer for the given patterns, returning any matching results from that data -buffer. Hyperscan also provides a streaming mode, in which matches that span +buffer. Vectorscan also provides a streaming mode, in which matches that span several blocks in a stream are detected. -This document is designed to facilitate code-level integration of the Hyperscan +This document is designed to facilitate code-level integration of the Vectorscan library with existing or new applications. -:ref:`intro` is a short overview of the Hyperscan library, with more detail on -the Hyperscan API provided in the subsequent sections: :ref:`compilation` and +:ref:`intro` is a short overview of the Vectorscan library, with more detail on +the Vectorscan API provided in the subsequent sections: :ref:`compilation` and :ref:`runtime`. :ref:`perf` provides details on various factors which may impact the -performance of a Hyperscan integration. +performance of a Vectorscan integration. :ref:`api_constants` and :ref:`api_files` provides a detailed summary of the -Hyperscan Application Programming Interface (API). +Vectorscan Application Programming Interface (API). ******** Audience ******** -This guide is aimed at developers interested in integrating Hyperscan into an -application. 
For information on building the Hyperscan library, see the Quick +This guide is aimed at developers interested in integrating Vectorscan into an +application. For information on building the Vectorscan library, see the Quick Start Guide. *********** diff --git a/doc/dev-reference/runtime.rst b/doc/dev-reference/runtime.rst index 396521c..249fd23 100644 --- a/doc/dev-reference/runtime.rst +++ b/doc/dev-reference/runtime.rst @@ -4,7 +4,7 @@ Scanning for Patterns ##################### -Hyperscan provides three different scanning modes, each with its own scan +Vectorscan provides three different scanning modes, each with its own scan function beginning with ``hs_scan``. In addition, streaming mode has a number of other API functions for managing stream state. @@ -33,8 +33,8 @@ See :c:type:`match_event_handler` for more information. Streaming Mode ************** -The core of the Hyperscan streaming runtime API consists of functions to open, -scan, and close Hyperscan data streams: +The core of the Vectorscan streaming runtime API consists of functions to open, +scan, and close Vectorscan data streams: * :c:func:`hs_open_stream`: allocates and initializes a new stream for scanning. @@ -57,14 +57,14 @@ will return immediately with :c:member:`HS_SCAN_TERMINATED`. The caller must still call :c:func:`hs_close_stream` to complete the clean-up process for that stream. -Streams exist in the Hyperscan library so that pattern matching state can be +Streams exist in the Vectorscan library so that pattern matching state can be maintained across multiple blocks of target data -- without maintaining this state, it would not be possible to detect patterns that span these blocks of data. This, however, does come at the cost of requiring an amount of storage per-stream (the size of this storage is fixed at compile time), and a slight performance penalty in some cases to manage the state. 
-While Hyperscan does always support a strict ordering of multiple matches, +While Vectorscan does always support a strict ordering of multiple matches, streaming matches will not be delivered at offsets before the current stream write, with the exception of zero-width asserts, where constructs such as :regexp:`\\b` and :regexp:`$` can cause a match on the final character of a @@ -76,7 +76,7 @@ Stream Management ================= In addition to :c:func:`hs_open_stream`, :c:func:`hs_scan_stream`, and -:c:func:`hs_close_stream`, the Hyperscan API provides a number of other +:c:func:`hs_close_stream`, the Vectorscan API provides a number of other functions for the management of streams: * :c:func:`hs_reset_stream`: resets a stream to its initial state; this is @@ -98,10 +98,10 @@ A stream object is allocated as a fixed size region of memory which has been sized to ensure that no memory allocations are required during scan operations. When the system is under memory pressure, it may be useful to reduce the memory consumed by streams that are not expected to be used soon. The -Hyperscan API provides calls for translating a stream to and from a compressed +Vectorscan API provides calls for translating a stream to and from a compressed representation for this purpose. The compressed representation differs from the full stream object as it does not reserve space for components which are not -required given the current stream state. The Hyperscan API functions for this +required given the current stream state. The Vectorscan API functions for this functionality are: * :c:func:`hs_compress_stream`: fills the provided buffer with a compressed @@ -157,7 +157,7 @@ scanned in block mode. Scratch Space ************* -While scanning data, Hyperscan needs a small amount of temporary memory to store +While scanning data, Vectorscan needs a small amount of temporary memory to store on-the-fly internal data. 
This amount is unfortunately too large to fit on the stack, particularly for embedded applications, and allocating memory dynamically is too expensive, so a pre-allocated "scratch" space must be provided to the @@ -170,7 +170,7 @@ databases, only a single scratch region is necessary: in this case, calling will ensure that the scratch space is large enough to support scanning against any of the given databases. -While the Hyperscan library is re-entrant, the use of scratch spaces is not. +While the Vectorscan library is re-entrant, the use of scratch spaces is not. For example, if by design it is deemed necessary to run recursive or nested scanning (say, from the match callback function), then an additional scratch space is required for that context. @@ -219,11 +219,11 @@ For example: Custom Allocators ***************** -By default, structures used by Hyperscan at runtime (scratch space, stream +By default, structures used by Vectorscan at runtime (scratch space, stream state, etc) are allocated with the default system allocators, usually ``malloc()`` and ``free()``. -The Hyperscan API provides a facility for changing this behaviour to support +The Vectorscan API provides a facility for changing this behaviour to support applications that use custom memory allocators. These functions are: diff --git a/doc/dev-reference/serialization.rst b/doc/dev-reference/serialization.rst index 4f884c7..5950e60 100644 --- a/doc/dev-reference/serialization.rst +++ b/doc/dev-reference/serialization.rst @@ -4,7 +4,7 @@ Serialization ############# -For some applications, compiling Hyperscan pattern databases immediately prior +For some applications, compiling Vectorscan pattern databases immediately prior to use is not an appropriate design. Some users may wish to: * Compile pattern databases on a different host; @@ -14,9 +14,9 @@ to use is not an appropriate design. Some users may wish to: * Control the region of memory in which the compiled database is located. 
-Hyperscan pattern databases are not completely flat in memory: they contain +Vectorscan pattern databases are not completely flat in memory: they contain pointers and have specific alignment requirements. Therefore, they cannot be -copied (or otherwise relocated) directly. To enable these use cases, Hyperscan +copied (or otherwise relocated) directly. To enable these use cases, Vectorscan provides functionality for serializing and deserializing compiled pattern databases. @@ -40,10 +40,10 @@ The API provides the following functions: returns a string containing information about the database. This call is analogous to :c:func:`hs_database_info`. -.. note:: Hyperscan performs both version and platform compatibility checks +.. note:: Vectorscan performs both version and platform compatibility checks upon deserialization. The :c:func:`hs_deserialize_database` and :c:func:`hs_deserialize_database_at` functions will only permit the - deserialization of databases compiled with (a) the same version of Hyperscan + deserialization of databases compiled with (a) the same version of Vectorscan and (b) platform features supported by the current host platform. See :ref:`instr_specialization` for more information on platform specialization. @@ -51,17 +51,17 @@ The API provides the following functions: The Runtime Library =================== -The main Hyperscan library (``libhs``) contains both the compiler and runtime -portions of the library. This means that in order to support the Hyperscan +The main Vectorscan library (``libhs``) contains both the compiler and runtime +portions of the library. This means that in order to support the Vectorscan compiler, which is written in C++, it requires C++ linkage and has a dependency on the C++ standard library. Many embedded applications require only the scanning ("runtime") portion of the -Hyperscan library. In these cases, pattern compilation generally takes place on +Vectorscan library. 
In these cases, pattern compilation generally takes place on another host, and serialized pattern databases are delivered to the application for use. To support these applications without requiring the C++ dependency, a -runtime-only version of the Hyperscan library, called ``libhs_runtime``, is also +runtime-only version of the Vectorscan library, called ``libhs_runtime``, is also distributed. This library does not depend on the C++ standard library and -provides all Hyperscan functions other that those used to compile databases. +provides all Vectorscan functions other that those used to compile databases. diff --git a/doc/dev-reference/tools.rst b/doc/dev-reference/tools.rst index e0465fc..f6d5151 100644 --- a/doc/dev-reference/tools.rst +++ b/doc/dev-reference/tools.rst @@ -4,14 +4,14 @@ Tools ##### -This section describes the set of utilities included with the Hyperscan library. +This section describes the set of utilities included with the Vectorscan library. ******************** Quick Check: hscheck ******************** -The ``hscheck`` tool allows the user to quickly check whether Hyperscan supports -a group of patterns. If a pattern is rejected by Hyperscan's compiler, the +The ``hscheck`` tool allows the user to quickly check whether Vectorscan supports +a group of patterns. If a pattern is rejected by Vectorscan's compiler, the compile error is provided on standard output. For example, given the following three patterns (the last of which contains a @@ -34,7 +34,7 @@ syntax error) in a file called ``/tmp/test``:: Benchmarker: hsbench ******************** -The ``hsbench`` tool provides an easy way to measure Hyperscan's performance +The ``hsbench`` tool provides an easy way to measure Vectorscan's performance for a particular set of patterns and corpus of data to be scanned. Patterns are supplied in the format described below in @@ -44,7 +44,7 @@ easy control of how a corpus is broken into blocks and streams. .. 
note:: A group of Python scripts for constructing corpora databases from various input types, such as PCAP network traffic captures or text files, can - be found in the Hyperscan source tree in ``tools/hsbench/scripts``. + be found in the Vectorscan source tree in ``tools/hsbench/scripts``. Running hsbench =============== @@ -56,7 +56,7 @@ produce output like this:: $ hsbench -e /tmp/patterns -c /tmp/corpus.db Signatures: /tmp/patterns - Hyperscan info: Version: 4.3.1 Features: AVX2 Mode: STREAM + Vectorscan info: Version: 5.4.11 Features: AVX2 Mode: STREAM Expression count: 200 Bytecode size: 342,540 bytes Database CRC: 0x6cd6b67c @@ -77,7 +77,7 @@ takes to perform all twenty scans. The number of repeats can be changed with the ``-n`` argument, and the results of each scan will be displayed if the ``--per-scan`` argument is specified. -To benchmark Hyperscan on more than one core, you can supply a list of cores +To benchmark Vectorscan on more than one core, you can supply a list of cores with the ``-T`` argument, which will instruct ``hsbench`` to start one benchmark thread per core given and compute the throughput from the time taken to complete all of them. @@ -91,17 +91,17 @@ Correctness Testing: hscollider ******************************* The ``hscollider`` tool, or Pattern Collider, provides a way to verify -Hyperscan's matching behaviour. It does this by compiling and scanning patterns +Vectorscan's matching behaviour. It does this by compiling and scanning patterns (either singly or in groups) against known corpora and comparing the results against another engine (the "ground truth"). Two sources of ground truth for comparison are available: * The PCRE library (http://pcre.org/). - * An NFA simulation run on Hyperscan's compile-time graph representation. This + * An NFA simulation run on Vectorscan's compile-time graph representation. This is used if PCRE cannot support the pattern or if PCRE execution fails due to a resource limit. 
-Much of Hyperscan's testing infrastructure is built on ``hscollider``, and the +Much of Vectorscan's testing infrastructure is built on ``hscollider``, and the tool is designed to take advantage of multiple cores and provide considerable flexibility in controlling the test. These options are described in the help (``hscollider -h``) and include: @@ -116,11 +116,11 @@ flexibility in controlling the test. These options are described in the help Using hscollider to debug a pattern =================================== -One common use-case for ``hscollider`` is to determine whether Hyperscan will +One common use-case for ``hscollider`` is to determine whether Vectorscan will match a pattern in the expected location, and whether this accords with PCRE's behaviour for the same case. -Here is an example. We put our pattern in a file in Hyperscan's pattern +Here is an example. We put our pattern in a file in Vectorscan's pattern format:: $ cat /tmp/pat @@ -172,7 +172,7 @@ individual matches are displayed in the output:: Total elapsed time: 0.00522815 secs. -We can see from this output that both PCRE and Hyperscan find matches ending at +We can see from this output that both PCRE and Vectorscan find matches ending at offset 33 and 45, and so ``hscollider`` considers this test case to have passed. @@ -180,13 +180,13 @@ passed. corpus alignment 0, and ``-T 1`` instructs us to only use one thread.) .. note:: In default operation, PCRE produces only one match for a scan, unlike - Hyperscan's automata semantics. The ``hscollider`` tool uses libpcre's - "callout" functionality to match Hyperscan's semantics. + Vectorscan's automata semantics. The ``hscollider`` tool uses libpcre's + "callout" functionality to match Vectorscan's semantics. 
Running a larger scan test ========================== -A set of patterns for testing purposes are distributed with Hyperscan, and these +A set of patterns for testing purposes are distributed with Vectorscan, and these can be tested via ``hscollider`` on an in-tree build. Two CMake targets are provided to do this easily: @@ -202,10 +202,10 @@ Debugging: hsdump ***************** When built in debug mode (using the CMake directive ``CMAKE_BUILD_TYPE`` set to -``Debug``), Hyperscan includes support for dumping information about its +``Debug``), Vectorscan includes support for dumping information about its internals during pattern compilation with the ``hsdump`` tool. -This information is mostly of use to Hyperscan developers familiar with the +This information is mostly of use to Vectorscan developers familiar with the library's internal structure, but can be used to diagnose issues with patterns and provide more information in bug reports. @@ -215,7 +215,7 @@ and provide more information in bug reports. Pattern Format ************** -All of the Hyperscan tools accept patterns in the same format, read from plain +All of the Vectorscan tools accept patterns in the same format, read from plain text files with one pattern per line. Each line looks like this: * ``<integer id>:/<regex>/<flags>`` @@ -227,12 +227,12 @@ For example:: 3:/^.{10,20}hatstand/m The integer ID is the value that will be reported when a match is found by -Hyperscan and must be unique. +Vectorscan and must be unique. The pattern itself is a regular expression in PCRE syntax; see :ref:`compilation` for more information on supported features. -The flags are single characters that map to Hyperscan flags as follows: +The flags are single characters that map to Vectorscan flags as follows: ========= ================================= =========== Character API Flag Description @@ -256,7 +256,7 @@ between braces, separated by commas. 
For example:: 1:/hatstand.*teakettle/s{min_offset=50,max_offset=100} -All Hyperscan tools will accept a pattern file (or a directory containing +All Vectorscan tools will accept a pattern file (or a directory containing pattern files) with the ``-e`` argument. If no further arguments constraining the pattern set are given, all patterns in those files are used. diff --git a/libhs.pc.in b/libhs.pc.in index 3ad2b90..d1e3ffb 100644 --- a/libhs.pc.in +++ b/libhs.pc.in @@ -4,7 +4,7 @@ libdir=@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@ includedir=@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_INCLUDEDIR@ Name: libhs -Description: Intel(R) Hyperscan Library +Description: A portable fork of the high-performance regular expression matching library Version: @HS_VERSION@ Libs: -L${libdir} -lhs Cflags: -I${includedir}/hs diff --git a/tools/hsbench/engine_hyperscan.cpp b/tools/hsbench/engine_hyperscan.cpp index 95461de..f3de35e 100644 --- a/tools/hsbench/engine_hyperscan.cpp +++ b/tools/hsbench/engine_hyperscan.cpp @@ -248,7 +248,7 @@ void EngineHyperscan::printStats() const { printf("Signature set: %s\n", compile_stats.sigs_name.c_str()); } printf("Signatures: %s\n", compile_stats.signatures.c_str()); - printf("Hyperscan info: %s\n", compile_stats.db_info.c_str()); + printf("Vectorscan info: %s\n", compile_stats.db_info.c_str()); printf("Expression count: %'zu\n", compile_stats.expressionCount); printf("Bytecode size: %'zu bytes\n", compile_stats.compiledSize); printf("Database CRC: 0x%x\n", compile_stats.crc32); -- 2.43.2