Request 1205027 (revoked)

- Update to 2024.4.0
- Summary of major features and improvements  
* More Gen AI coverage and framework integrations to minimize
code changes
+ Support for GLM-4-9B Chat, MiniCPM-1B, Llama 3 and 3.1,
Phi-3-Mini, Phi-3-Medium and YOLOX-s models.
+ Noteworthy notebooks added: Florence-2, NuExtract-tiny
Structure Extraction, Flux.1 Image Generation, PixArt-α:
Photorealistic Text-to-Image Synthesis, and Phi-3-Vision
Visual Language Assistant.
* Broader Large Language Model (LLM) support and more model
compression techniques.
+ OpenVINO™ runtime optimized for Intel® Xe Matrix Extensions
(Intel® XMX) systolic arrays on built-in GPUs for efficient
matrix multiplication, resulting in a significant LLM
performance boost with improved 1st and 2nd token
latency, as well as a smaller memory footprint on
Intel® Core™ Ultra Processors (Series 2).
+ Memory sharing enabled for NPUs on Intel® Core™ Ultra
Processors (Series 2) for efficient pipeline integration
without memory copy overhead.
+ Addition of the PagedAttention feature for discrete GPUs
enables a significant boost in throughput for parallel
inferencing when serving LLMs on Intel® Arc™ Graphics
or Intel® Data Center GPU Flex Series.
* More portability and performance to run AI at the edge,
in the cloud, or locally.
+ OpenVINO™ Model Server now comes with production-quality
support for the OpenAI-compatible API, which enables
significantly higher throughput for parallel inferencing
on Intel® Xeon® processors when serving LLMs to many
concurrent users.
+ Improved performance and memory consumption with prefix
caching, KV cache compression, and other optimizations
for serving LLMs using OpenVINO™ Model Server.
+ Support for Python 3.12.
- Support Change and Deprecation Notices
* Using deprecated features and components is not advised.
They are available to enable a smooth transition to new
solutions and will be discontinued in the future.
To keep using discontinued features, you will have to
revert to the last LTS OpenVINO version supporting them.
For more details, refer to the OpenVINO Legacy Features
and Components page.
* Discontinued in 2024.0:
+ Runtime components:
- Intel® Gaussian & Neural Accelerator (Intel® GNA).
Consider using the Neural Processing Unit (NPU) for
low-powered systems like Intel® Core™ Ultra or
14th generation and beyond.
- OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API
transition guide for reference).
- All ONNX Frontend legacy API (known as
ONNX_IMPORTER_API).
- 'PerformanceMode.UNDEFINED' property as part of the
OpenVINO Python API.
+ Tools:
- Deployment Manager. See installation and deployment
guides for current distribution options.
- Accuracy Checker.
- Post-Training Optimization Tool (POT). Neural Network
Compression Framework (NNCF) should be used instead.
- A Git patch for NNCF integration with huggingface/
transformers. The recommended approach is to use
huggingface/optimum-intel for applying NNCF
optimization on top of models from Hugging Face.
- Support for Apache MXNet, Caffe, and Kaldi model
formats. Conversion to ONNX may be used as a
solution.
* Deprecated and to be removed in the future:
+ The macOS x86_64 debug binaries will no longer be
provided with the OpenVINO toolkit, starting with
OpenVINO 2024.5.
+ Python 3.8 is now considered deprecated, and it will not
be available beyond the 2024.4 OpenVINO version.
+ dKMB support is now considered deprecated and will be
fully removed with OpenVINO 2024.5.
+ Intel® Streaming SIMD Extensions (Intel® SSE) will be
supported in source code form, but not enabled in the
binary package by default, starting with OpenVINO 2025.0.
+ The openvino-nightly PyPI module will soon be discontinued.
End users should use the Simple PyPI nightly repo
instead. See the Release Policy for more information.
+ The OpenVINO™ Development Tools package (pip install
openvino-dev) will be removed from installation options and
distribution channels beginning with OpenVINO 2025.0.
+ Model Optimizer will be discontinued with OpenVINO 2025.0.
Consider using the new conversion methods instead. For more
details, see the model conversion transition guide.
+ OpenVINO property Affinity API will be discontinued with
OpenVINO 2025.0. It will be replaced with CPU binding
configurations (ov::hint::enable_cpu_pinning).
+ OpenVINO Model Server components:
- “auto shape” and “auto batch size” (reshaping a model at
runtime) will be removed in the future. OpenVINO’s dynamic
shape models are recommended instead.
+ A number of notebooks have been deprecated. For an
up-to-date listing of available notebooks, refer to the
OpenVINO™ Notebook index (openvinotoolkit.github.io).
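
The Affinity deprecation noted above affects existing CPU configuration dictionaries. The following is a minimal sketch of how such a dictionary might be migrated ahead of 2025.0. The helper `migrate_affinity` and the value mapping are hypothetical and are not part of OpenVINO; only the property key strings "AFFINITY" and "ENABLE_CPU_PINNING" (the string form of ov::hint::enable_cpu_pinning) are taken from the runtime. Legacy values other than NONE and CORE are dropped here without a replacement, which is an assumption to verify against the CPU plugin documentation.

```python
from typing import Any, Dict

# Assumed mapping from legacy AFFINITY values to the boolean pinning hint.
# This mapping is illustrative only; it is not defined by the release notes.
_AFFINITY_TO_PINNING = {"NONE": False, "CORE": True}


def migrate_affinity(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with AFFINITY replaced by ENABLE_CPU_PINNING.

    Hypothetical helper: the input dictionary is left unmodified, and an
    explicit ENABLE_CPU_PINNING entry already present is never overwritten.
    """
    migrated = dict(config)
    legacy = migrated.pop("AFFINITY", None)
    if legacy is None:
        return migrated
    pinning = _AFFINITY_TO_PINNING.get(str(legacy).upper())
    if pinning is not None and "ENABLE_CPU_PINNING" not in migrated:
        migrated["ENABLE_CPU_PINNING"] = pinning
    return migrated


if __name__ == "__main__":
    old_config = {"AFFINITY": "CORE", "PERFORMANCE_HINT": "LATENCY"}
    print(migrate_affinity(old_config))
```

The resulting dictionary could then be passed as the configuration argument when compiling a model for the CPU device, for example `core.compile_model(model, "CPU", migrate_affinity(old_config))`, assuming `core` is an `openvino.Core` instance.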