Revisions of apache-arrow
buildservice-autocommit
accepted
request 1170145
from
Benjamin Greiner (bnavigator)
(revision 33)
baserev update by copy to link target
Benjamin Greiner (bnavigator)
committed
(revision 32)
Benjamin Greiner (bnavigator)
committed
(revision 31)
Benjamin Greiner (bnavigator)
accepted
request 1170120
from
Benjamin Greiner (bnavigator)
(revision 30)
- Update to 16.0.0 ## Bug Fixes * [C++][ORC] Catch all ORC exceptions to avoid crash (#40697) * [C++][S3] Handle conventional content-type for directories (#40147) * [C++] Strengthen handling of duplicate slashes in S3, GCS (#40371) * [C++] Avoid hash_mean overflow (#39349) * [C++] Fix spelling (array) (#38963) * [C++][Parquet] Fix crash in Modular Encryption (#39623) * [C++][Dataset] Fix failures in dataset-scanner-benchmark (#39794) * [C++][Device] Fix Importing nested and string types for DeviceArray (#39770) * [C++] Use correct (non-CPU) address of buffer in ExportDeviceArray (#39783) * [C++] Improve error message for "chunker out of sync" condition (#39892) * [C++] Use make -j1 to install bundled bzip2 (#39956) * [C++] DatasetWriter avoid creating zero-sized batch when max_rows_per_file enabled (#39995) * [C++][CI] Disable debug memory pool for ASAN and Valgrind (#39975) * [C++][Gandiva] Make Gandiva's default cache size to be 5000 for object code cache (#40041) * [C++][FS][Azure] Fix CreateDir and DeleteDir trailing slash issues on hierarchical namespace accounts (#40054) * [C++][FS][Azure] Validate containers in AzureFileSystem::Impl::MovePaths() (#40086) * [C++] Decimal types with different precisions and scales bind
Benjamin Greiner (bnavigator)
accepted
request 1163690
from
Shani Hadiyanto (shanipribadi)
(revision 29)
I would like to have the Apache Arrow Flight and Flight SQL libraries built. This request also disables the static build, because the generated CMake targets include the static libraries, so builds against libarrow require not just apache-arrow-devel but also all of the devel-static packages. Note: flight and flight-sql are packaged separately. In the upstream RPM and the Fedora repo, flight-sql is included in libarrow-flight-libs.
buildservice-autocommit
accepted
request 1160967
from
Benjamin Greiner (bnavigator)
(revision 28)
baserev update by copy to link target
Benjamin Greiner (bnavigator)
accepted
request 1160966
from
Benjamin Greiner (bnavigator)
(revision 27)
- Update to 15.0.2 ## Bug Fixes * [C++][Acero] Increase size of Acero TempStack (#40007) * [C++][Dataset] Add missing Protobuf static link dependency (#40015) * [C++] Possible data race when reading metadata of a parquet file (#40111) * [C++] Make span SFINAE standards-conforming to enable compilation with nvcc (#40253) - Update to 15.0.2 ## Bug Fixes * [Python] Fix except clauses (#40387) * [Python][CI] Skip failing test_dateutil_tzinfo_to_string (#40486)
buildservice-autocommit
accepted
request 1152982
from
Benjamin Greiner (bnavigator)
(revision 26)
baserev update by copy to link target
Benjamin Greiner (bnavigator)
accepted
request 1152980
from
Benjamin Greiner (bnavigator)
(revision 25)
- Reenable logging * Add apache-arrow-pr40230-glog-0.7.patch * Add apache-arrow-pr40275-glog-0.7-2.patch * now requires glog devel files to be present for apache-arrow-devel; ArrowConfig.cmake fails otherwise * gh#apache/arrow#40181 * gh#apache/arrow#40230 * gh#apache/arrow#40275 - Move d:l:p:n/python-pyarrow to the science/apache-arrow as multibuild package: Uses the same source and is tightly connected.
buildservice-autocommit
accepted
request 1150089
from
Benjamin Greiner (bnavigator)
(revision 24)
baserev update by copy to link target
Benjamin Greiner (bnavigator)
accepted
request 1150081
from
Benjamin Greiner (bnavigator)
(revision 23)
- Update to 15.0.1 ## Bug Fixes * [C++] "iso_calendar" kernel returns incorrect results for array length > 32 (#39360) * [C++] Explicit error in ExecBatchBuilder when appending var length data exceeds offset limit (int32 max) (#39383) * [C++][Parquet] Pass memory pool to decoders (#39526) * [C++][Parquet] Validate page sizes before truncating to int32 (#39528) * [C++] Fix tail-word access cross buffer boundary in `CompareBinaryColumnToRow` (#39606) * [C++] Fix the issue of ExecBatchBuilder when appending consecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (#39585) * [Release] Update platform tags for macOS wheels to macosx_10_15 (#39657) * [C++][FlightRPC] Fix nullptr dereference in PollInfo (#39711) * [C++] Fix tail-byte access cross buffer boundary in key hash avx2 (#39800) * [C++][Acero] Fix AsOfJoin with differently ordered schemas than the output (#39804) * [C++] Expression ExecuteScalarExpression execute empty args function with a wrong result (#39908) * [C++] Strip extension metadata when importing a registered extension (#39866) * [C#] Restore support for .NET 4.6.2 (#40008) * [C++] Fix out-of-line data size calculation in BinaryViewBuilder::AppendArraySlice (#39994) * [C++][CI][Parquet] Fixing parquet column_writer_test building (#40175) ## New Features and Improvements * [C++] PollFlightInfo does not follow rule of 5 * [C++] Fix filter and take kernel for month_day_nano intervals (#39795) * [C++] Thirdparty: Bump zlib to 1.3.1 (#39877) * [C++] Add missing "#include <algorithm>" (#40010) - Release 15.0.0 ## Bug Fixes * [C++] Bring back case_when tests for union types (#39308) * [C++] Fix the issue of ExecBatchBuilder when appending consecutive tail rows with the same id may exceed buffer boundary (#39234) * [C++][Python] Add a no-op kernel for dictionary_encode(dictionary) (#38349) * [C++] Use the latest tagged version of flatbuffers (#38192) * [C++] Don't use MSVC_VERSION to determin -fms-compatibility-version (#36595) 
* [C++] Optimize hash kernels for Dictionary ChunkedArrays (#38394) * [C++][Gandiva] Avoid registering exported functions multiple times in gandiva (#37752) * [C++][Acero] Fix race condition caused by straggling input in the as-of-join node (#37839) * [C++][Parquet] add more closed file checks for ParquetFileWriter (#38390) * [C++][FlightRPC] Add missing app_metadata arguments (#38231) * [C++][Parquet] Fix Valgrind memory leak in arrow-dataset-file-parquet-encryption-test (#38306) * [C++][Parquet] Don't initialize OpenSSL explicitly with OpenSSL 1.1 (#38379) * [C++] Re-generate flatbuffers C++ for Skyhook (#38405) * [C++] Avoid passing null pointer to LZ4 frame decompressor (#39125) * [C++] Add missing explicit size_t cast for i386 (#38557) * [C++] Fix: add TestingEqualOptions for gtest functions. (#38642) * [C++][Gandiva] Use arrow io util to replace std::filesystem::path in gandiva (#38698) * [C++] Protect against PREALLOCATE preprocessor defined on macOS (#38760) * [C++] Check variadic buffer counts in bounds (#38740) * [C++][FS][Azure] Do nothing for CreateDir("/container", true) (#38783) * Fix TestArrowReaderAdHoc.ReadFloat16Files to use new uncompressed files (#38825) * [C++] S3FileSystem export s3 sdk config "use_virtual_addressing" to arrow::fs::S3Options (#38858) * [C++][Gandiva] Fix Gandiva to_date function's validation for supress errors parameter (#38987) * [C++][Parquet] Fix spelling (#38959) * [C++] Fix spelling (acero) (#38961) * [C++] Fix spelling (compute) (#38965) * [C++] Fix spelling (util) (#38967) * [C++] Fix spelling (dataset) (#38969) * [C++] Fix spelling (filesystem) (#38972) * [C++] Fix spelling (#38978) * [C++] Fix spelling (#38980) * [C++][Acero] union node output batches should be unordered (#39046) * [C++][CI] Fix Valgrind failures (#39127) * [C++] Remove needless system Protobuf dependency with -DARROW_HDFS=ON (#39137) * [C++][Compute] Fix negative duration division (#39158) * [C++] Add missing data copy in StreamDecoder::Consume(data) 
(#39164) * [C++] Remove compiler warnings with -Wconversion -Wno-sign-conversion in public headers (#39186) * [C++][Benchmarking] Remove hardcoded min times (#39307) * [C++] Don't use "if constexpr" in lambda (#39334) * [C++] Disable -Werror=attributes for Azure SDK's identity.hpp (#39448) * [C++] Fix compile warning (#39389) * [CI][JS] Force node 20 on JS build on arm64 to fix build issues (#39499) * [C++] Disable parallelism for jemalloc external project (#39522) * [C++][Parquet] Fix crash in test_parquet_dataset_lazy_filtering (#39632) * [C++] Disable parallelism for all `make`-based externalProjects when CMake >= 3.28 is used ## New Features and Improvements * [C++][JSON] Change the max rows to Unlimited(int_32) (#38582) * [C++][Python] Add "Z" to the end of timestamp print string when tz defined (#39272) * [C++][Python] DLPack implementation for Arrow Arrays (producer) (#38472) * [C++] Diffing of Run-End Encoded arrays (#35003) * [C++][Python][R] Allow users to adjust S3 log level by environment variable (#38267) * [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats (#35345) * [C++] Use Cast() instead of CastTo() for Scalar in test (#39044) * [C++][Python][Parquet] Implement Float16 logical type (#36073) * [C++] Add Utf8View and BinaryView to the c ABI (#38443) * [C++][Parquet] Add api to get RecordReader from RowGroupReader (#37003) * [C++] Expose a span converter for Buffer and ArraySpan (#38027) * [C++] Add A Dictionary Compaction Function For DictionaryArray (#37418) * [C++] Add arrow::ipc::StreamDecoder::Reset() (#37970) * [C++] Implement file reads for Azure filesystem (#38269) * [C++][Integration] Add C++ Utf8View implementation (#37792) * [C++][Gandiva] Add external function registry support (#38116) * [C++][Gandiva] Migrate LLVM JIT engine from MCJIT to ORC v2/LLJIT (#39098) * [C++] Feature: support concatenate recordbatches. 
(#37896) * [C++] Add support for specifying custom Array opening and closing delimiters to arrow::PrettyPrintDelimiters (#38187) * [R] Allow code() to return package name prefix. (#38144) * [C++][Benchmark] Add non-stream Codec Compression/Decompression (#38067) * [C++][Parquet] Change DictEncoder dtor checking to warning log (#38118) * [C++][Parquet] Support reading parquet files with multiple gzip members (#38272) * [C++][Parquet] check the decompressed page size same as size in page header (#38327) * [C++][Azure] Use properties for input stream metadata (#38524) * [C++][FS][Azure] Implement file writes (#38780) * [C++] Implement GetFileInfo for a single file in Azure filesystem (#38505) * [C++][CMake] Use transitive dependency for system GoogleTest (#38340) * [C++][Parquet] Use new encrypted files for page index encryption test (#38347) * Add validation logic for offsets and values to arrow.array.ListArray.fromArrays (#38531) * [C++][Acero] Create a sorted merge node (#38380) * [C++][Benchmark] Adding benchmark for LZ4/Snappy Compression (#38453) * [C++] Support LogicalNullCount for DictionaryArray (#38681) * [C++][Parquet] Faster scalar BYTE_STREAM_SPLIT (#38529) * [C++][Gandiva] Support registering external C functions (#38632) * [C++] Implement GetFileInfo(selector) for Azure filesystem (#39009) * [C++][FS][Azure] Implement CreateDir() (#38708) * [C++][FS][Azure] Implement DeleteDir() (#38793) * [C++][FS][Azure] Implement DeleteDirContents() (#38888) * [C++] : Implement AzureFileSystem::DeleteRootDirContents (#39151) * [C++][FS][Azure] Implement CopyFile() (#39058) * [C++][Go][Parquet] Add tests for reading Float16 files in parquet-testing (#38753) * [C++][FS][Azure] Rename AzurePath to AzureLocation (#38773) * [C++] Implement directory semantics even when the storage account doesn't support HNS (#39361) * [C++][Parquet] Update parquet.thrift to sync with 2.10.0 (#38815) * [C++] Replace "#ifdef ARROW_WITH_GZIP" in dataset test to ARROW_WITH_ZLIB (#38853) * 
[C++][Parquet] Using length to optimize bloom filter read (#38863) * [C++][Parquet] Minor: making parquet TypedComparator operation as const method (#38875) * [C++] DatasetWriter release rows_in_flight_throttle when allocate writing failed (#38885) * [C++][Parquet] Move EstimatedBufferedValueBytes from TypedColumnWriter to ColumnWriter (#39055) * [C++] Stop installing internal bpacking_simd* headers (#38908) * [C++][Gandiva] Refactor function holder to return arrow Result (#38873) * [C++] Use Cast() instead of CastTo() for Dictionary Scalar in test (#39362) * [C++] Use Cast() instead of CastTo() for Timestamp Scalar in test (#39060) * [C++] Use Cast() instead of CastTo() for List Scalar in test (#39353) * [C++][Parquet] Support row group filtering for nested paths for struct fields (#39065) * [C++] Refactor the Azure FS tests and filesystem class instantiation (#39207) * [C++][Parquet] Optimize FLBA record reader (#39124) * Create module info compiler plugin (#39135) * [C++] : Try to make Buffer::device_type_ non-optional (#39150) * [C++][Parquet] Remove deprecated AppendRowGroup(int64_t num_rows) (#39209) * [C++][Parquet] Avoid WriteRecordBatch from produce zero-sized RowGroup (#39211) * [C++] Support binary to fixed_size_binary cast (#39236) * [C++][Azure][FS] Add default credential auth configuration (#39263) * [C++] Don't install bundled Azure SDK for C++ with CMake 3.28+ (#39269) * [C++][FS] : Remove the AzureBackend enum and add more flexible connection options (#39293) * [C++][FS] : Inform caller of container not-existing when checking for HNS support (#39298) * [C++][FS][Azure] Add workload identity auth configuration (#39319) * [C++][FS][Azure] Add managed identity auth configuration (#39321) * [C++] Forward arguments to ExceptionToStatus all the way to Status::FromArgs (#39323) * [C++] Flaky DatasetWriterTestFixture.MaxRowsOneWriteBackpresure test (#39379) * [C++] Add ForceCachedHierarchicalNamespaceSupport to help with testing (#39340) * [C++][FS][Azure] 
Add client secret auth configuration (#39346) * [C++] Reduce function.h includes (#39312) * [C++] Use Cast() instead of CastTo() for Parquet (#39364) * [C++][Parquet] Vectorize decode plain on FLBA (#39414) * [C++][Parquet] Style: Using arrow::Buffer data_as api rather than reinterpret_cast (#39420) * [C++][ORC] Upgrade ORC to 1.9.2 (#39431) * [C++] Use default Azure credentials implicitly and support anonymous credentials explicitly (#39450) * [C++][Parquet] Allow reading dictionary without reading data via ByteArrayDictionaryRecordReader (#39153) - Disable logging until compatibility with glog is restored gh#apache/arrow#40181
buildservice-autocommit
accepted
request 1139093
from
Benjamin Greiner (bnavigator)
(revision 22)
baserev update by copy to link target
Benjamin Greiner (bnavigator)
accepted
request 1139092
from
Benjamin Greiner (bnavigator)
(revision 21)
- Update to 14.0.2 ## New Features and Improvements * GH-38449 - [Release][Go][macOS] Use local test data if possible (#38450) * GH-38591 - [Parquet][C++] Remove redundant open calls in ParquetFileFormat::GetReaderAsync (#38621) ## Bug Fixes * GH-38345 - [Release] Use local test data for verification if possible (#38362) * GH-38438 - [C++] Dataset: Trying to fix the async bug in Parquet dataset (#38466) * GH-38577 - Reading parquet file behavior change from 13.0.0 to 14.0.0 * GH-38618 - [C++] S3FileSystem: fix regression in deleting explicitly created sub-directories (#38845) * GH-38861 - [C++] Add missing “-framework Security” to Libs.private in arrow.pc (#38869) * GH-39072 - [Release][CI] Python3.11-devel is required for the verification job on AlmaLinux 8 (#39073) * GH-39074 - [Release][Packaging] Use UTF-8 explicitly for KEYS (#39082)
buildservice-autocommit
accepted
request 1138300
from
Benjamin Greiner (bnavigator)
(revision 20)
baserev update by copy to link target
Benjamin Greiner (bnavigator)
accepted
request 1138181
from
Petr Gajdos (pgajdos)
(revision 19)
- disable some tests for s390x [bsc#1218592]
buildservice-autocommit
accepted
request 1125775
from
John Vandenberg (jayvdb)
(revision 18)
baserev update by copy to link target
John Vandenberg (jayvdb)
accepted
request 1125774
from
Ondřej Súkup (mimi_vx)
(revision 17)
- Update to 14.0.1 * GH-38431 - [Python][CI] Update fs.type_name checks for s3fs tests * GH-38607 - [Python] Disable PyExtensionType autoload - Update to 14.0.0 * The very long list of changes can be found here: https://arrow.apache.org/release/14.0.0.html
buildservice-autocommit
accepted
request 1109686
from
Benjamin Greiner (bnavigator)
(revision 16)
baserev update by copy to link target
Benjamin Greiner (bnavigator)
accepted
request 1109685
from
Benjamin Greiner (bnavigator)
(revision 15)
- Update to 13.0.0 ## Acero * Handling of unaligned buffers in input nodes can be configured programmatically or by setting the environment variable ACERO_ALIGNMENT_HANDLING. The default behavior is to warn when an unaligned buffer is detected GH-35498. ## Compute * Several new functions have been added: - aggregate functions “first”, “last”, “first_last” GH-34911; - vector functions “cumulative_prod”, “cumulative_min”, “cumulative_max” GH-32190; - vector function “pairwise_diff” GH-35786. * Sorting now works on dictionary arrays, with much better performance than the naive approach of sorting the decoded dictionary GH-29887. Sorting also works on struct arrays, and nested sort keys are supported using FieldRef GH-33206. * The check_overflow option has been removed from CumulativeSumOptions as it was redundant with the availability of two different functions: “cumulative_sum” and “cumulative_sum_checked” GH-35789. * Run-end encoded filters are efficiently supported GH-35749. * Duration types are supported with the “is_in” and “index_in” functions GH-36047. They can be multiplied with all integer types GH-36128. * “is_in” and “index_in” now cast their inputs more flexibly: they first attempt to cast the value set to the input type, then in the other direction if the former fails GH-36203. * Multiple bugs have been fixed in “utf8_slice_codeunits” when the stop option is omitted GH-36311. ## Dataset * A custom schema can now be passed when writing a dataset GH-35730. The custom schema can alter nullability or metadata information, but is not allowed to change the datatypes written. ## Filesystems * The S3 filesystem now writes files in equal-sized chunks, for compatibility with Cloudflare’s “R2” Storage GH-34363. * A long-standing issue where S3 support could crash at shutdown because of resources still being alive after S3 finalization has been fixed GH-36346.
Now, attempts to use S3 resources (such as making filesystem calls) after S3 finalization should result in a clean error. * The GCS filesystem accepts a new option to set the project id GH-36227. ## IPC * Nullability and metadata information for sub-fields of map types is now preserved when deserializing Arrow IPC GH-35297. ## Orc * The Orc adapter now maps Arrow field metadata to Orc type attributes when writing, and vice-versa when reading GH-35304. ## Parquet * It is now possible to write additional metadata while a ParquetFileWriter is open GH-34888. * Writing a page index can be enabled selectively per-column GH-34949. In addition, page header statistics are not written anymore if the page index is enabled for the given column GH-34375, as the information would be redundant and less efficiently accessed. * Parquet writer properties allow specifying the sorting columns GH-35331. The user is responsible for ensuring that the data written to the file actually complies with the given sorting. * CRC computation has been implemented for v2 data pages GH-35171. It was already implemented for v1 data pages. * Writing compliant nested types is now enabled by default GH-29781. This should not have any negative implication. * Attempting to load a subset of an Arrow extension type is now forbidden GH-20385. Previously, if an extension type’s storage is nested (for example a “Point” extension type backed by a struct<x: float64, y: float64>), it was possible to load selectively some of the columns of the storage type. ## Substrait * Support for various functions has been added: “stddev”, “variance”, “first”, “last” (GH-35247, GH-35506). * Deserializing sorts is now supported GH-32763. However, some features, such as clustered sort direction or custom sort functions, are not implemented. ## Miscellaneous * FieldRef sports additional methods to get a flattened version of nested fields GH-14946. 
Compared to their non-flattened counterparts, the methods GetFlattened, GetAllFlattened, GetOneFlattened and GetOneOrNoneFlattened combine a child’s null bitmap with its ancestors’ null bitmaps so as to compute the field’s overall logical validity bitmap. * In other words, given the struct array [null, {'x': null}, {'x': 5}], FieldRef("x")::Get might return [0, null, 5] while FieldRef("x")::GetFlattened will always return [null, null, 5]. * Scalar::hash() has been fixed for sliced nested arrays GH-35360. * A new floating-point to decimal conversion algorithm exhibits much better precision GH-35576. * It is now possible to cast between scalars of different list-like types GH-36309.
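The difference between the plain and flattened FieldRef lookups described in the 13.0.0 notes can be illustrated with a small pure-Python sketch. It is not the Arrow API: the helper names `get_child` and `get_child_flattened`, and the modelling of a struct array as a parent validity list plus a child column, are assumptions made only for illustration.

```python
# Pure-Python sketch (NOT the Arrow API). A struct array is modelled as a
# parent validity list plus one child column with its own values and validity.
# Both helper names below are hypothetical.

def get_child(child_values, child_valid):
    """Non-flattened access: only the child's own validity is applied."""
    return [v if ok else None for v, ok in zip(child_values, child_valid)]


def get_child_flattened(parent_valid, child_values, child_valid):
    """Flattened access: a slot is valid only if parent AND child are valid."""
    return [
        v if (p and c) else None
        for p, v, c in zip(parent_valid, child_values, child_valid)
    ]


# Models the struct array [null, {'x': null}, {'x': 5}].
parent_valid = [False, True, True]   # first struct slot is null
x_values = [0, 0, 5]                 # physical storage of child "x"
x_valid = [True, False, True]        # child's own validity bitmap

plain = get_child(x_values, x_valid)
flat = get_child_flattened(parent_valid, x_values, x_valid)

print(plain)  # [0, None, 5]
print(flat)   # [None, None, 5]
```

The plain lookup exposes the physical value stored under a null parent slot, whereas the flattened lookup masks it by AND-ing the two validity lists, matching the [0, null, 5] versus [null, null, 5] example in the release notes.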
buildservice-autocommit
accepted
request 1092627
from
Benjamin Greiner (bnavigator)
(revision 14)
baserev update by copy to link target