Revisions of apache-arrow
Ana Guerrero (anag+factory) accepted request 1218457 from Benjamin Greiner (bnavigator) (revision 15)
Ana Guerrero (anag+factory) accepted request 1201792 from Benjamin Greiner (bnavigator) (revision 14)
Dominique Leuenberger (dimstar_suse) accepted request 1194086 from Benjamin Greiner (bnavigator) (revision 13)
Ana Guerrero (anag+factory) accepted request 1170145 from Benjamin Greiner (bnavigator) (revision 12)
- Update to 16.0.0
  ## Bug Fixes
  * [C++][ORC] Catch all ORC exceptions to avoid crash (#40697)
  * [C++][S3] Handle conventional content-type for directories (#40147)
  * [C++] Strengthen handling of duplicate slashes in S3, GCS (#40371)
  * [C++] Avoid hash_mean overflow (#39349)
  * [C++] Fix spelling (array) (#38963)
  * [C++][Parquet] Fix crash in Modular Encryption (#39623)
  * [C++][Dataset] Fix failures in dataset-scanner-benchmark (#39794)
  * [C++][Device] Fix Importing nested and string types for DeviceArray (#39770)
  * [C++] Use correct (non-CPU) address of buffer in ExportDeviceArray (#39783)
  * [C++] Improve error message for "chunker out of sync" condition (#39892)
  * [C++] Use make -j1 to install bundled bzip2 (#39956)
  * [C++] DatasetWriter avoid creating zero-sized batch when max_rows_per_file enabled (#39995)
  * [C++][CI] Disable debug memory pool for ASAN and Valgrind (#39975)
  * [C++][Gandiva] Make Gandiva's default cache size to be 5000 for object code cache (#40041)
  * [C++][FS][Azure] Fix CreateDir and DeleteDir trailing slash issues on hierarchical namespace accounts (#40054)
  * [C++][FS][Azure] Validate containers in AzureFileSystem::Impl::MovePaths() (#40086)
  * [C++] Decimal types with different precisions and scales bind failed in resolve type when call arithmetic function (#40223)
  * [C++][Docs] Correct the console emitter link (#40146)
  * [C++][Python] Fix test_gdb failures on 32-bit (#40293)
  * [Python][C++] Fix large file handling on 32-bit Python build (#40176)
  * [C++] Support glog 0.7 build (#40230)
  * [C++] Fix cast function bind failed after add an alias name through AddAlias (#40200)
  * [C++] TakeCC: Concatenate only once and delegate to TakeAA instead of TakeCA (#40206)
  * [C++] Fix an abort on asof_join_benchmark run for lost an arg (#40234)
  * [C++] Fix an simple buffer-overflow case in decimal_benchmark (#40277)
  * [C++] Reduce S3Client initialization time (#40299)
  * [C++] Fix a wrong total_bytes to generate StringType's test data in vector_hash_benchmark (#40307)
  * [C++][Gandiva] Add support for compute module's decimal promotion rules (#40434)
  * [C++][Parquet] Add missing config.h include in key_management_test.cc (#40330)
  * [C++][CMake] Add missing glog::glog dependency to arrow_util (#40332)
  * [C++][Gandiva] Add missing OpenSSL dependency to encrypt_utils_test.cc (#40338)
  * [C++] Remove const qualifier from Buffer::mutable_span_as (#40367)
  * [C++] Avoid simplifying expressions which call impure functions (#40396)
  * [C++] Expose protobuf dependency if opentelemetry or ORC are enabled (#40399)
  * [C++][FlightRPC] Add missing expiration_time arguments (#40425)
  * [C++] Move key_hash/key_map/light_array related files to internal for prevent using by users (#40484)
  * [C++] Add missing Threads::Threads dependency to arrow_static (#40433)
  * [C++] Fix static build on Windows (#40446)
  * [C++] Ensure using bundled FlatBuffers (#40519)
  * [C++][CI] Fix TSAN and ASAN/UBSAN crashes (#40559)
  * [C++] Repair FileSystem merge error (#40564)
  * [C++] Fix 3.12 Python support (#40322)
  * [C++] Move mold linker flags to variables (#40603)
  * [C++] Enlarge dest buffer according to dest offset for CopyBitmap benchmark (#40769)
  * [C++][Gandiva] 'ilike' function does not work (#40728)
  * [C++] Fix protobuf package name setting for builds with substrait (#40753)
  * [C++][ORC] Fix std::filesystem related link error with ORC 2.0.0 or later (#41023)
  * [C++] Fix TSAN link error for module library (#40864)
  * [C++][FS][Azure] Don't run TestGetFileInfoGenerator() with Valgrind (#41163)
  * [C++] Fix null count check in BooleanArray.true_count() (#41070)
  * [C++] IO: fixing compiling in gcc 7.5.0 (#41025)
  * [C++][Parquet] Bugfixes and more tests in boolean arrow decoding (#41037)
  * [C++] formatting.h: Make sure space is allocated for the 'Z' when formatting timestamps (#41045)
  * [C++] Ignore ARROW_USE_MOLD/ARROW_USE_LLD with clang < 12 (#41062)
  * [C++] Fix: left anti join filter empty rows. (#41122)
  * [CI][C++] Don't use CMake 3.29.1 with vcpkg (#41151)
  * [CI][C++] Use newer LLVM on Ubuntu 24.04 (#41150)
  * [CI][R][C++] test-r-linux-valgrind has started failing
  * [C++][Python] Sporadic asof_join failures in PyArrow
  * [C++] Fix Valgrind error in string-to-float16 conversion (#41155)
  * [C++] Stop defining ARROW_TEST_MEMCHECK in config.h.cmake (#41177)
  * [C++] Fix mistake in integration test. Explicitly cast std::string to avoid compiler interpreting char* -> bool (#41202)
  ## New Features and Improvements
  * [C++] Filesystem implementation for Azure Blob Storage
  * [C++] Implement cast to/from halffloat (#40067)
  * [C++] Add residual filter support to swiss join (#39487)
  * [C++] Add support for building with Emscripten (#37821)
  * [C++][Python] Add missing methods to RecordBatch (#39506)
  * [C++][Java][Flight RPC] Add Session management messages (#34817)
  * [C++] build filesystems as separate modules (#39067)
  * [C++][Parquet] Rewrite BYTE_STREAM_SPLIT SSE optimizations using xsimd (#40335)
  * [C++] Add support for service-specific endpoint for S3 using AWS_ENDPOINT_URL_S3 (#39160)
  * [C++][FS][Azure] Implement DeleteFile() (#39840)
  * [C++] Implement Azure FileSystem Move() via Azure DataLake Storage Gen 2 API (#39904)
  * [C++] Add ImportChunkedArray and ExportChunkedArray to/from ArrowArrayStream (#39455)
  * [CI][C++][Go] Don't run jobs that use a self-hosted GitHub Actions Runner on fork (#39903)
  * [C++][FS][Azure] Use the generic filesystem tests (#40567)
  * [C++][Compute] Add binary_slice kernel for fixed size binary (#39245)
  * [C++] Avoid creating memory manager instance for every buffer view/copy (#39271)
  * [C++][Parquet] Minor: Style enhancement for parquet::FileMetaData (#39337)
  * [C++] IO: Reuse same buffer in CompressedInputStream (#39807)
  * [C++] Use more permissable return code for rename (#39481)
  * [C++][Parquet] Use std::count in ColumnReader ReadLevels (#39397)
  * [C++] Support cast kernel from large string, (large) binary to dictionary (#40017)
  * [C++] Pass -jN to make in external projects (#39550)
  * [C++][Parquet] Add integration test for BYTE_STREAM_SPLIT (#39570)
  * [C++] Ensure top-level benchmarks present informative metrics (#40091)
  * [C++] Ensure CSV and JSON benchmarks present a bytes/s or items/s metric (#39764)
  * [C++] Ensure dataset benchmarks present a bytes/s or items/s metric (#39766)
  * [C++][Gandiva] Ensure Gandiva benchmarks present a bytes/s or items/s metric (#40435)
  * [C++][Parquet] Benchmark levels decoding (#39705)
  * [C++][FS][Azure] Remove StatusFromErrorResponse as it's not necessary (#39719)
  * [C++][Parquet] Make BYTE_STREAM_SPLIT routines type-agnostic (#39748)
  * [C++][Device] Generic CopyBatchTo/CopyArrayTo memory types (#39772)
  * [C++] Document and micro-optimize ChunkResolver::Resolve() (#39817)
  * [C++] Allow building cpp/src/arrow/**/*.cc without waiting bundled libraries (#39824)
  * [C++][Parquet] Parquet binary length overflow exception should contain the length of binary (#39844)
  * [C++][Parquet] Minor: avoid creating a new Reader object in Decoder::SetData (#39847)
  * [C++] Thirdparty: Bump google benchmark to 1.8.3 (#39878)
  * [C++] DataType::ToString support optionally show metadata (#39888)
  * [C++][Gandiva] Accept LLVM 18 (#39934)
  * [C++] Use Requires instead of Libs for system RE2 in arrow.pc (#39932)
  * [C++] Small CSV reader refactoring (#39963)
  * [C++][Parquet] Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64 (#40094)
  * [C++][FS][Azure] Add support for reading user defined metadata (#40671)
  * [C++][FS][Azure] Add AzureFileSystem support to FileSystemFromUri() (#40325)
  * [C++][FS][Azure] Make attempted reads and writes against directories fail fast (#40119)
  * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor (#40064)
  * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for different data types (#40359)
  * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add option to cast NULL to NaN (#40803)
  * [C++][FS][Azure] Implement DeleteFile() for flat-namespace storage accounts (#40075)
  * [CI][C++] Add a job on ARM64 macOS (#40456)
  * [C++][Parquet] Remove AVX512 variants of BYTE_STREAM_SPLIT encoding (#40127)
  * [C++][Parquet][Tools] Print FIXED_LEN_BYTE_ARRAY length (#40132)
  * [C++] Make S3 narrative test more flexible (#40144)
  * [C++] Remove redundant invocation of BatchesFromTable (#40173)
  * [C++][CMake] Use "RapidJSON" CMake target for RapidJSON (#40210)
  * [C++][CMake] Use arrow/util/config.h.cmake instead of add_definitions() (#40222)
  * [C++] Fix: improve the backpressure handling in the dataset writer (#40722)
  * [C++][CMake] Improve description why we need to initialize AWS C++ SDK in arrow-s3fs-test (#40229)
  * [C++] Add support for system glog 0.7 (#40275)
  * [C++] Specialize ResolvedChunk::Value on value-specific types instead of entire class (#40281)
  * [C++][Docs] Add documentation of array factories (#40373)
  * [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed (#40329)
  * [FlightRPC][C++][Java][Go] Add URI scheme to reuse connection (#40084)
  * [C++] Add benchmark for ToTensor conversions (#40358)
  * [C++] Define ARROW_FORCE_INLINE for non-MSVC builds (#40372)
  * [C++] Add support for mold (#40397)
  * [C++] Add support for LLD (#40927)
  * [C++] Produce better error message when Move is attempted on flat-namespace accounts (#40406)
  * [C++][ORC] Upgrade ORC to 2.0.0 (#40508)
  * [CI][C++] Don't install FlatBuffers (#40541)
  * [C++] Ensure pkg-config flags include -ldl for static builds (#40578)
  * [Dev][C++][Python][R] Use pre-commit for clang-format (#40587)
  * [C++] Rename Function::is_impure() to is_pure() (#40608)
  * [C++] Add missing util/config.h in arrow/io/compressed_test.cc (#40625)
  * [Python][C++] Support conversion of pyarrow.RunEndEncodedArray to numpy/pandas (#40661)
  * [C++] Expand Substrait type support (#40696)
  * [C++] Create registry for Devices to map DeviceType to MemoryManager in C Device Data import (#40699)
  * [C++][Parquet] Minor enhancement code of encryption (#40732)
  * [C++][Parquet] Simplify PageWriter and ColumnWriter creation (#40768)
  * [C++] Re-order loads and stores in MemoryPoolStats update (#40647)
  * [C++] Revert changes from PR #40857 (#40980)
  * [C++] Correctly report asimd/neon in GetRuntimeInfo (#40857)
  * [C++] Thirdparty: bump zstd to 1.5.6 (#40837)
  * [Docs][C++][Python] Add initial documentation for RecordBatch::Tensor conversion (#40842)
  * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for row-major (#40867)
  * [C++][Parquet] Encoding: Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder (#40876)
  * [C++] Suppress shorten-64-to-32 warnings in CUDA/Skyhook codes (#40883)
  * [C++] Fix unused function build error (#40984)
  * [C++][Parquet] RleBooleanDecoder supports DecodeArrow with nulls (#40995)
  * [C++][FS][Azure] Adjust DeleteDir/DeleteDirContents/GetFileInfoSelector behaviors against Azure for generic filesystem tests (#41068)
  * [C++][Parquet] Avoid allocating buffer object in RecordReader's SkipRecords (#39818)
- Drop apache-arrow-pr40230-glog-0.7.patch
- Drop apache-arrow-pr40275-glog-0.7-2.patch
- Belated inclusion of submission without changelog by Shani Hadiyanto <shanipribadi@gmail.com>
  * disable static devel packages by default: The CMake targets require them for all builds, if not disabled
  * Add subpackages for Apache Arrow Flight and Flight SQL
- Update to 16.0.0
  * [Python] construct pandas.DataFrame with public API in to_pandas (#40897)
  * [Python] Fix ORC test segfault in the python wheel windows test (#40609)
  * [Python] Attach Python stacktrace to errors in ConvertPyError (#39380)
  * [Python] Plug reference leaks when creating Arrow array from Python list of dicts (#40412)
  * [Python] Empty slicing an array backwards beyond the start is now empty (#40682)
  * [Python] Slicing an array backwards beyond the start now includes first item. (#39240)
  * [Python] Calling pyarrow.dataset.ParquetFileFormat.make_write_options as a class method results in a segfault (#40976)
  * [Python] Fix parquet import in encryption test (#40505)
  * [Python] fix raising ValueError on _ensure_partitioning (#39593)
  * [Python] Validate max_chunksize in Table.to_batches (#39796)
  * [C++][Python] Fix test_gdb failures on 32-bit (#40293)
  * [Python] Make Tensor.__getbuffer__ work on 32-bit platforms (#40294)
  * [Python] Avoid using np.take in Array.to_numpy() (#40295)
  * [Python][C++] Fix large file handling on 32-bit Python build (#40176)
  * [Python] Update size assumptions for 32-bit platforms (#40165)
  * [Python] Fix OverflowError in foreign_buffer on 32-bit platforms (#40158)
  * [Python] Add Type_FIXED_SIZE_LIST to _NESTED_TYPES set (#40172)
  * [Python] Mark ListView as a nested type (#40265)
  * [Python] only allocate the ScalarMemoTable when used (#40565)
  * [Python] Error compiling Cython files on Windows during release verification
  * [Python] Fix flake8 failures in python/benchmarks/parquet.py (#40440)
  * [Python] Suppress python/examples/minimal_build/Dockerfile.* warnings (#40444)
  * [Python][Docs] Add workaround for autosummary (#40739)
  * [Python] BUG: Empty slicing an array backwards beyond the start should be empty
  * [CI][Python] Activate ARROW_PYTHON_VENV if defined in sdist-test job (#40707)
  * [CI][Python] CI failures on Python builds due to pytest_cython (#40975)
  * [Python] ListView pandas tests should use np.nan instead of None (#41040)
  * [C++][Python] Sporadic asof_join failures in PyArrow
  ## New Features and Improvements
  * [Python][CI] Remove legacy hdfs tests from hdfs and hypothesis setup (#40363)
  * [Python] Remove deprecated pyarrow.filesystem legacy implementations (#39825)
  * [C++][Python] Add missing methods to RecordBatch (#39506)
  * [Python][CI] Support ORC in Windows wheels
  * [Python] Correct test marker for join_asof tests (#40666)
  * [Python] Add join_asof binding (#34234)
  * [Python] Add a function to download and extract timezone database on Windows (#38179)
  * [Python][CI][Packaging] Enable ORC on Windows Appveyor CI and Windows wheels for pyarrow
  * [Python] Add a FixedSizeTensorScalar class (#37533)
  * [Python][CI][Dev][Python] Release and merge script errors (#37819)" (#40150)
  * [Python] Construct pyarrow.Field and ChunkedArray through Arrow PyCapsule Protocol (#40818)
  * [Python] Fix missing byte_width attribute on DataType class (#39592)
  * [Python] Compatibility with NumPy 2.0
  * [Packaging][Python] Enable building pyarrow against numpy 2.0 (#39557)
  * [Python] Basic pyarrow bindings for Binary/StringView classes (#39652)
  * [Python] Expose force_virtual_addressing in PyArrow (#39819)
  * [Python][Parquet] Support hashing for FileMetaData and ParquetSchema (#39781)
  * [Python] Add bindings for ListView and LargeListView (#39813)
  * [Python][Packaging] Build pyarrow wheels with numpy RC instead of nightly (#41097)
  * [Python] Support creating Binary/StringView arrays from python objects (#39853)
  * [Python] ListView support for pa.array() (#40160)
  * [Python][CI] Remove upper pin on pytest (#40487)
  * [Python][FS][Azure] Minimal Python bindings for AzureFileSystem (#40021)
  * [Python] Low-level bindings for exporting/importing the C Device Interface (#39980)
  * [Python] Add ChunkedArray import/export to/from C (#39985)
  * [Python] Use Cast() instead of CastTo (#40116)
  * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor (#40064)
  * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for different data types (#40359)
  * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add option to cast NULL to NaN (#40803)
  * [Python] Support requested_schema in __arrow_c_stream__() (#40070)
  * [Python] Support Binary/StringView conversion to numpy/pandas (#40093)
  * [Python] Allow FileInfo instances to be passed to dataset init (#40143)
  * [Python][CI] Add 32-bit Debian build on Crossbow (#40164)
  * [Python] ListView arrow-to-pandas conversion (#40482)
  * [Python][CI] Disable generating C lines in Cython tracebacks (#40225)
  * [Python] Support construction of Run-End Encoded arrays in pa.array(..) (#40341)
  * [Python] Accept dict in pyarrow.record_batch() function (#40292)
  * [Python] Update for NumPy 2.0 ABI change in PyArray_Descr->elsize (#40418)
  * [Python][CI] Fix install of nightly dask in integration tests (#40378)
  * [Python] Fix byte_width for binary(0) + fix hypothesis tests (#40381)
  * [Python][CI] Fix dataset partition filter tests with pandas nightly (#40429)
  * [Docs][Python] Added JsonFileFormat to docs (#40585)
  * [Dev][C++][Python][R] Use pre-commit for clang-format (#40587)
  * [Python][C++] Support conversion of pyarrow.RunEndEncodedArray to numpy/pandas (#40661)
  * [Python] Simplify and improve perf of creation of the column names in Table.to_pandas (#40721)
  * [Docs][C++][Python] Add initial documentation for RecordBatch::Tensor conversion (#40842)
  * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for row-major (#40867)
  * [CI][Python] check message in test_make_write_options_error for Cython 2 (#41059)
  * [Python] Add copy keyword in Array.__array__ for numpy 2.0+ compatibility (#41071)
  * [Python][Packaging] PyArrow wheel building is failing because of disabled vcpkg install of liblzma
- Drop apache-arrow-pr40230-glog-0.7.patch
- Drop apache-arrow-pr40275-glog-0.7-2.patch
- Add pyarrow-pr41319-numpy2-tests.patch gh#apache/arrow#41319
Ana Guerrero (anag+factory) accepted request 1160967 from Benjamin Greiner (bnavigator) (revision 11)
- Update to 15.0.2
  ## Bug Fixes
  * [C++][Acero] Increase size of Acero TempStack (#40007)
  * [C++][Dataset] Add missing Protobuf static link dependency (#40015)
  * [C++] Possible data race when reading metadata of a parquet file (#40111)
  * [C++] Make span SFINAE standards-conforming to enable compilation with nvcc (#40253)
- Update to 15.0.2
  ## Bug Fixes
  * [Python] Fix except clauses (#40387)
  * [Python][CI] Skip failing test_dateutil_tzinfo_to_string (#40486)
(forwarded request 1160966 from bnavigator)
Dominique Leuenberger (dimstar_suse) accepted request 1152982 from Benjamin Greiner (bnavigator) (revision 10)
Ana Guerrero (anag+factory) accepted request 1150089 from Benjamin Greiner (bnavigator) (revision 9)
Ana Guerrero (anag+factory) accepted request 1139093 from Benjamin Greiner (bnavigator) (revision 8)
Ana Guerrero (anag+factory) accepted request 1138300 from Benjamin Greiner (bnavigator) (revision 7)
Ana Guerrero (anag+factory) accepted request 1109686 from Benjamin Greiner (bnavigator) (revision 5)
Dominique Leuenberger (dimstar_suse) accepted request 1092627 from Benjamin Greiner (bnavigator) (revision 4)
Dominique Leuenberger (dimstar_suse) accepted request 1087840 from Benjamin Greiner (bnavigator) (revision 3)
- Update to 12.0.0
  * Run-End Encoded Arrays have been implemented and are accessible (GH-32104)
  * The FixedShapeTensor Logical value type has been implemented using ExtensionType (GH-15483, GH-34796)
  ## Compute
  * New kernel to convert timestamp with timezone to wall time (GH-33143)
  * Cast kernels are now built into libarrow by default (GH-34388)
  ## Acero
  * Acero has been moved out of libarrow into its own shared library, allowing for smaller builds of the core libarrow (GH-15280)
  * Exec nodes now can have a concept of “ordering” and will reject non-sensible plans (GH-34136)
  * New exec nodes: “pivot_longer” (GH-34266), “order_by” (GH-34248) and “fetch” (GH-34059)
  * Breaking Change: Reorder output fields of “group_by” node so that keys/segment keys come before aggregates (GH-33616)
  ## Substrait
  * Add support for the round function GH-33588
  * Add support for the cast expression element GH-31910
  * Added API reference documentation GH-34011
  * Added an extension relation to support segmented aggregation GH-34626
  * The output of the aggregate relation now conforms to the spec GH-34786
  ## Parquet
  * Added support for DeltaLengthByteArray encoding to the Parquet writer (GH-33024)
(forwarded request 1087839 from bnavigator)
Dominique Leuenberger (dimstar_suse) accepted request 1076956 from Benjamin Greiner (bnavigator) (revision 2)
Dominique Leuenberger (dimstar_suse) accepted request 1075538 from Benjamin Greiner (bnavigator) (revision 1)
Second try: now without jemalloc and without gflags-static.
apache-arrow is being used more and more by Python numeric packages like pandas 2.0 (through pyarrow).