Revisions of python-dask
Ana Guerrero (anag+factory)
accepted
request 1223590
from
Dirk Mueller (dirkmueller)
(revision 82)
- update to 2024.11.1:
  * Zarr-Python 3 compatibility (:pr:`11388`)
  * Avoid exponentially increasing taskgraph in overlap
  * Ensure numba tokenization does not use slow pickle path
Ana Guerrero (anag+factory)
accepted
request 1199615
from
Dirk Mueller (dirkmueller)
(revision 81)
- update to 2024.8.2:
  * Avoid capturing code of xdist @fjetter
  * Reduce memory footprint of culling P2P rechunking
  * Add tests for choosing default rechunking method
  * Increase visibility of GPU CI updates @charlesbluca
  * Bump test_pause_while_idle timeout @fjetter
  * Concatenate small input chunks before P2P rechunking
  * Remove dump cluster from gen_cluster @fjetter
  * Bump `numpy>=1.24` and `pyarrow>=14.0.1` minimum versions
  * Fix PipInstall plugin on Worker @hendrikmakait
  * Remove more Python 3.10 compatibility code @jrbourbeau
  * Use task-based rechunking to prechunk along partial boundaries @hendrikmakait
  * Ensure client_desires_keys does not corrupt Scheduler state @fjetter
  * Bump minimum ``cloudpickle`` to 3 @jrbourbeau
Dominique Leuenberger (dimstar_suse)
accepted
request 1197828
from
Markéta Machová (mcalabkova)
(revision 80)
Ana Guerrero (anag+factory)
accepted
request 1187904
from
Markéta Machová (mcalabkova)
(revision 79)
Ana Guerrero (anag+factory)
accepted
request 1186062
from
Steve Kowalik (StevenK)
(revision 78)
- Update to 2024.6.2:
  * profile._f_lineno: handle next_line being None in Python 3.13
  * Cache global query-planning config
  * Python 3.13 fixes
  * Fix test_map_freq_to_period_start for pandas=3
  * Tokenizing memmap arrays will now avoid materializing the array into memory.
  * Fix test_dt_accessor with query planning disabled
  * Remove deprecated dask.compatibility module
  * Ensure compatibility for xarray.NamedArray
  * Avoid rounding error in test_prometheus_collect_count_total_by_cost_multipliers
  * Log key collision count in update_graph log event
  * Rename safe to expected in Scheduler.remove_worker
  * Eagerly update aggregate statistics for TaskPrefix instead of calculating them on-demand
  * Improve graph submission time for P2P rechunking by avoiding unpack recursion into indices
  * Add safe keyword to remove-worker event
  * Improved errors and reduced logging for P2P RPC calls
  * Adjust P2P tests for dask-expr
  * Iterate over copy of Server.digests_total_since_heartbeat to avoid RuntimeError
  * Add Prometheus gauge for task groups
  * Fix too strict assertion in shuffle code for pandas subclasses
  * Reduce noise from erring tasks that are not supposed to be running
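The "Iterate over copy of Server.digests_total_since_heartbeat to avoid RuntimeError" entry relies on standard Python dict behaviour: adding or removing keys while iterating over a dict raises RuntimeError, whereas iterating over a snapshot such as list(d) does not. The minimal sketch below shows that pattern only; the helper functions and counter names are hypothetical and are not taken from distributed's code.

```python
def unsafe_drain(counters: dict[str, int]) -> int:
    """Sum and remove every entry while iterating the live dict (raises)."""
    total = 0
    for key in counters:
        # Popping changes the dict's size, so the loop's next step raises
        # "RuntimeError: dictionary changed size during iteration".
        total += counters.pop(key)
    return total


def safe_drain(counters: dict[str, int]) -> int:
    """Sum and remove every entry while iterating a copy of the keys."""
    total = 0
    for key in list(counters):  # snapshot of the keys, taken before the loop
        total += counters.pop(key)
    return total


if __name__ == "__main__":
    digests = {"latency": 3, "transfer-bandwidth": 5}  # hypothetical counters
    print(safe_drain(digests), digests)  # prints: 8 {}
    try:
        unsafe_drain({"a": 1, "b": 2})
    except RuntimeError as exc:
        print("unsafe_drain failed:", exc)
```

Copying the keys costs one extra list per iteration, which is usually negligible next to the risk of a crash when the dict can change while it is being walked.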
Dominique Leuenberger (dimstar_suse)
accepted
request 1171090
from
Dirk Mueller (dirkmueller)
(revision 77)
Ana Guerrero (anag+factory)
accepted
request 1146835
from
Matej Cepl (mcepl)
(revision 75)
Forwarded request #1146758 from bnavigator
- Update to 2024.2.0
  * Deprecate Dask DataFrame implementation
  * Improved tokenization
  * https://docs.dask.org/en/stable/changelog.html#v2024-2-0
- Really drop python39 from testing instead of testing it with every other test flavor
Ana Guerrero (anag+factory)
accepted
request 1142781
from
Dirk Mueller (dirkmueller)
(revision 74)
- update to 2024.1.1:
  * This release contains compatibility updates for the latest pandas and scipy releases. See :pr:`10834`, :pr:`10849`, :pr:`10845`, and :pr-distributed:`8474` from `crusaderky`_ for details.
Ana Guerrero (anag+factory)
accepted
request 1140136
from
Dirk Mueller (dirkmueller)
(revision 73)
- update to 2024.1.0:
  * Released on January 12, 2024
  * P2P rechunking now utilizes the relationships between input and output chunks. For situations that do not require all-to-all data transfer, this may significantly reduce the runtime and memory/disk footprint. It also enables task culling.
  * The fastparquet Parquet engine has been deprecated. Users should migrate to the pyarrow engine by installing PyArrow and removing engine="fastparquet" in read_parquet or to_parquet calls.
  * This release improves serialization robustness for arbitrary data. Previously there were some cases where serialization could fail for non-msgpack serializable data. In those cases we now fallback to using pickle.
  * Deprecate shuffle keyword in favour of shuffle_method for DataFrame methods (:pr:`10738`) `Hendrik Makait`_
  * Deprecate automatic argument inference in repartition
  * Deprecate compute parameter in set_index
  * Deprecate inplace in eval
  * Deprecate Series.view
  * Deprecate npartitions="auto" for set_index & sort_values
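The 2024.1.0 entry says P2P rechunking now uses the relationships between input and output chunks, so it can avoid all-to-all transfer when that is not needed. The one-dimensional sketch below only illustrates how such a relationship can be derived from chunk sizes along a single axis; it is not Dask's implementation, and the function names are hypothetical. For an n-dimensional array, the input blocks feeding one output block are the Cartesian product of the per-axis overlaps.

```python
from itertools import accumulate


def chunk_boundaries(chunks: tuple[int, ...]) -> list[tuple[int, int]]:
    """Return the (start, stop) offsets of each chunk along one axis."""
    stops = list(accumulate(chunks))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


def input_chunks_for_outputs(
    old_chunks: tuple[int, ...], new_chunks: tuple[int, ...]
) -> list[list[int]]:
    """For each output chunk, list the indices of the input chunks it overlaps."""
    if sum(old_chunks) != sum(new_chunks):
        raise ValueError("old and new chunks must cover the same length")
    old = chunk_boundaries(old_chunks)
    mapping = []
    for new_start, new_stop in chunk_boundaries(new_chunks):
        # Two half-open intervals overlap when each starts before the other ends.
        mapping.append(
            [
                i
                for i, (old_start, old_stop) in enumerate(old)
                if old_start < new_stop and new_start < old_stop
            ]
        )
    return mapping


if __name__ == "__main__":
    print(input_chunks_for_outputs((4, 4, 4), (6, 6)))  # prints: [[0, 1], [1, 2]]
```

For old chunks (4, 4, 4) and new chunks (6, 6), each output chunk depends on only two of the three input chunks, so no output needs data from every input.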
Ana Guerrero (anag+factory)
accepted
request 1135096
from
Factory Maintainer (factory-maintainer)
(revision 72)
Automatic submission by obs-autosubmit
Dominique Leuenberger (dimstar_suse)
accepted
request 1132242
from
Factory Maintainer (factory-maintainer)
(revision 71)
Automatic submission by obs-autosubmit
Ana Guerrero (anag+factory)
accepted
request 1127184
from
Ondřej Súkup (mimi_vx)
(revision 70)
- Update to 2023.11.0
  * Zero-copy P2P Array Rechunking
  * Deprecating PyArrow <14.0.1
  * Improved PyArrow filesystem for Parquet
  * Improve Type Reconciliation in P2P Shuffling
  * Official support for Python 3.12
  * Reduced memory pressure for multi array reductions
  * Improved P2P shuffling robustness
  * Reduced scheduler CPU load for large graphs
(forwarded request 1127183 from mimi_vx)
Dominique Leuenberger (dimstar_suse)
accepted
request 1092262
from
Dirk Mueller (dirkmueller)
(revision 68)
Dominique Leuenberger (dimstar_suse)
accepted
request 1090990
from
Steve Kowalik (StevenK)
(revision 67)
- Tighten bokeh requirement to match distributed.
- Update to 2023.5.1
  * This release drops support for Python 3.8. As of this release Dask supports Python 3.9, 3.10, and 3.11.
  ## Enhancements
  * Drop Python 3.8 support (GH#10295) Thomas Grainger
  * Change Dask Bag partitioning scheme to improve cluster saturation (GH#10294) Jacob Tomlinson
  * Generalize dd.to_datetime for GPU-backed collections, introduce get_meta_library utility (GH#9881) Charles Blackmon-Luca
  * Add na_action to DataFrame.map (GH#10305) Patrick Hoefler
  * Raise TypeError in DataFrame.nsmallest and DataFrame.nlargest when columns is not given (GH#10301) Patrick Hoefler
  * Improve sizeof for pd.MultiIndex (GH#10230) Patrick Hoefler
  * Support duplicated columns in a bunch of DataFrame methods (GH#10261) Patrick Hoefler
  * Add numeric_only support to DataFrame.idxmin and DataFrame.idxmax (GH#10253) Patrick Hoefler
  * Implement numeric_only support for DataFrame.quantile (GH#10259) Patrick Hoefler
  * Add support for numeric_only=False in DataFrame.std (GH#10251) Patrick Hoefler
  * Implement numeric_only=False for GroupBy.cumprod and GroupBy.cumsum (GH#10262) Patrick Hoefler
  * Implement numeric_only for skew and kurtosis (GH#10258) Patrick Hoefler
  * mask and where should accept a callable (GH#10289) Irina Truong
  * Fix conversion from Categorical to pa.dictionary in read_parquet (GH#10285) Patrick Hoefler