Insights: ray-project/ray
Overview
1 Release published by 1 person
-
Ray-2.47.0 (tag `ray-2.47.0`)
published
Jun 12, 2025
99 Pull requests merged by 47 people
-
[serve] Set route_prefix and docs_path when re-deploying app
#53753 merged
Jun 14, 2025 -
[cherry-pick][dashboard] Fix retrieving IP address from the GPUProfilingManager on the dashboard agent
#53817 merged
Jun 14, 2025 -
Add tpu usage metrics to reporter_agent
#53678 merged
Jun 14, 2025 -
[data] Refactor interface for actor_pool_map_operator
#53752 merged
Jun 13, 2025 -
ray-llm container cu124 -> cu128 update
#53730 merged
Jun 13, 2025 -
[dashboard] Fix retrieving IP address from the `GPUProfilingManager` on the dashboard agent
#53807 merged
Jun 13, 2025 -
[ci/release] Trigger Ray release by running a Bazel binary
#52962 merged
Jun 13, 2025 -
version change for 2.47.1
#53813 merged
Jun 13, 2025 -
cherrypick #53671
#53812 merged
Jun 13, 2025 -
[core] Move dependencies of NodeManager to main.cc for better testability
#53782 merged
Jun 13, 2025 -
[core] Deflake `test_object_spilling.py`
#53803 merged
Jun 13, 2025 -
[core] Deflake `test_state_api.py`
#53804 merged
Jun 13, 2025 -
[tune] update BlockMetadata args in tests
#53791 merged
Jun 13, 2025 -
[serve] Fix autoscaling metrics
#53778 merged
Jun 13, 2025 -
pass route prefix to replica
#53777 merged
Jun 13, 2025 -
[Serve] Call shared long poll client router registration in event loop
#53613 merged
Jun 13, 2025 -
[core] Add timeout to `ray.get` call in `test_update_object_location_batch_failure`
#53805 merged
Jun 13, 2025 -
[RLlib] Fix device check in `Learner`.
#53706 merged
Jun 13, 2025 -
[core] Deflake
test_client_builder.py
#53774 merged
Jun 13, 2025 -
[core] Increase instance sizes for wheel / HA tests
#53783 merged
Jun 13, 2025 -
[serve.llm] Organize spread out utils.py
#53722 merged
Jun 13, 2025 -
[Doc] Added ray-serve llm doc
#52832 merged
Jun 12, 2025 -
Remove Schema From BlockMetadata
#53454 merged
Jun 12, 2025 -
[Core] Exit the Core Worker Early Error Received from Plasma Store
#53679 merged
Jun 12, 2025 -
fix: WandbLogger crashing silently on a FileNotFoundError
#50308 merged
Jun 12, 2025 -
[Serve] feat: make ray.serve.batch concurrent
#53096 merged
Jun 12, 2025 -
[RLlib] Add ability to compute percentiles to MetricsLogger/Stats
#52963 merged
Jun 12, 2025 -
[core][telemetry] move the open telemetry tests into a pytest module
#53751 merged
Jun 12, 2025 -
[core] Speed up `test_actor_advanced.py`
#53738 merged
Jun 12, 2025 -
Replace `miniconda3` with `miniforge`
#53436 merged
Jun 12, 2025 -
[serve] Improve test_metrics
#53747 merged
Jun 12, 2025 -
update codeowners for ray serve
#53717 merged
Jun 11, 2025 -
[Tune][Air] Fix MLflowLoggerCallback to enable its use with PBT (#27783)
#42182 merged
Jun 11, 2025 -
Fix vLLM batch test by changing to Pixtral
#53744 merged
Jun 11, 2025 -
Fix uv tests on macos x86_64
#53741 merged
Jun 11, 2025 -
[core][telemetry/07] support counter metric on worker side
#53418 merged
Jun 11, 2025 -
[docker] Update latest Docker dependencies for 2.47.0 release
#53749 merged
Jun 11, 2025 -
[docker] Update latest Docker dependencies for 2.47.0 release
#53748 merged
Jun 11, 2025 -
[train] Raise error when calling ray.train.report with a gpu tensor
#53725 merged
Jun 11, 2025 -
[serve] Increase httpx timeout to 30s for backpressure test
#53711 merged
Jun 11, 2025 -
[ci] Move `simulate_storage` from `_private/` to `_common/`
#53735 merged
Jun 11, 2025 -
[core] Give io context concurrency hint
#53642 merged
Jun 11, 2025 -
[serve] Remove dependency on `ray._private.ray_constants.py`
#53700 merged
Jun 11, 2025 -
[core] Warning when creating actor with restarts and arguments in plasma
#53713 merged
Jun 11, 2025 -
[Docs] [istio mtls] Add warning on sidecar OOM for mTLS
#53385 merged
Jun 11, 2025 -
[cherry-pick][Docs] Added user-guide for Joins (#52987)
#53712 merged
Jun 11, 2025 -
Add observability for label-selectors
#53423 merged
Jun 11, 2025 -
[Doc] Bind the version of kuberay to v1.3.0 in related docs
#53661 merged
Jun 11, 2025 -
[docs] fix link in gcp-gke-tpu-cluster.md
#53708 merged
Jun 11, 2025 -
Add perf metrics for 2.47.0
#53668 merged
Jun 11, 2025 -
[train][template] pytorch + train + data template uses absolute links
#53718 merged
Jun 11, 2025 -
[train] add trace to WorkerHealthCheckFailedError
#53626 merged
Jun 11, 2025 -
fix CI, wrong import path
#53715 merged
Jun 10, 2025 -
[core][gpu-objects] Fix the performance regression by clearing `object_ref` for small and non-GPU objects
#53692 merged
Jun 10, 2025 -
[Docs] Finalize time-series tutorial, add lockfiles
#53710 merged
Jun 10, 2025 -
[core] remove dead open telemetry code
#53709 merged
Jun 10, 2025 -
E2e rag
#53703 merged
Jun 10, 2025 -
[core] Add single-controller API for ray.util.collective and torch gloo backend
#53319 merged
Jun 10, 2025 -
[core] Migrate ray.private.pydantic_compat from _private to _common
#53686 merged
Jun 10, 2025 -
[core][3/N] Avoid unnecessary deserialization/serialization of ParentTaskId
#53695 merged
Jun 10, 2025 -
[core] Remove deprecated `storage` parameter to `ray.init`
#53669 merged
Jun 10, 2025 -
[serve.llm] delete dead code from prompt format days
#53621 merged
Jun 10, 2025 -
[core] Fix `test_multi_tenancy.py` on Windows
#53699 merged
Jun 10, 2025 -
[core] Remove unused `object_ref_seed` parameter
#53698 merged
Jun 10, 2025 -
[core] early exit spill if spilling config is empty
#53193 merged
Jun 10, 2025 -
[ci] Fix crane auth issue for nightly multi arch tagging
#53483 merged
Jun 10, 2025 -
handle task cancellation error
#53680 merged
Jun 10, 2025 -
Code refactoring in proxy
#53644 merged
Jun 10, 2025 -
[core] Migrate wait_for_condition and async_wait_for_condition from _private to _common
#53652 merged
Jun 10, 2025 -
Convert cluster compute config in release test to Kuberay compute config
#53681 merged
Jun 10, 2025 -
[Core] Vendor setproctitle
#53471 merged
Jun 10, 2025 -
add back run on anyscale button
#53688 merged
Jun 10, 2025 -
[Compiled Graph] Enhance Compiled Graph with Multi-Device Support
#53395 merged
Jun 10, 2025 -
BLD: Remove redundant `manylinux1` related flag in `.bazelrc`
#53549 merged
Jun 10, 2025 -
[train][template] Add Anyscale template for pytorch + train + data
#53220 merged
Jun 10, 2025 -
[core] Remove deprecated `ray start` CLI options
#53675 merged
Jun 10, 2025 -
[core] Speed up & deflake `test_multitenancy.py`
#53674 merged
Jun 10, 2025 -
[ci] change macos intel platform to 12_0
#53671 merged
Jun 10, 2025 -
[Docs] Create lockfiles for various e2e tutorials
#53672 merged
Jun 10, 2025 -
[Docs] Adds second notebook to timeseries tutorial
#53561 merged
Jun 10, 2025 -
[ci] fix misconfig on byod scripts
#53682 merged
Jun 10, 2025 -
Fix map_batches release test back_to_back option
#53664 merged
Jun 9, 2025 -
[ci] Resize some runtime_env tests
#53670 merged
Jun 9, 2025 -
[core] Creating an interface ObjectManager's for GrpcClientManager.
#53656 merged
Jun 9, 2025 -
[core][1/N] Avoid unnecessary deserialization/serialization of TaskId
#53577 merged
Jun 9, 2025 -
make constant for x-request-id
#53667 merged
Jun 9, 2025 -
Remove `ray.workflow` package
#53612 merged
Jun 9, 2025 -
add vale to pre-commit
#53564 merged
Jun 9, 2025 -
Add DataContext + LogicalOp Args to Dataset Export
#53554 merged
Jun 9, 2025 -
[core] Skip test on mac build
#53662 merged
Jun 9, 2025 -
[core] Make preloading Jemalloc configurable for worker
#47243 merged
Jun 9, 2025 -
[llm] bump vllm to 0.9.0.1
#53443 merged
Jun 9, 2025 -
uint8_t* data ptr not used.
#47565 merged
Jun 9, 2025 -
[core][2/N] Avoid unnecessary deserialization/serialization of ObjectId
#53574 merged
Jun 9, 2025 -
[ci] add more docker groups to work with buildkite amis
#53640 merged
Jun 9, 2025 -
[ci] use new docker account for releasing
#53646 merged
Jun 8, 2025 -
[core] Correctly fail worker lease request if a task becomes infeasible after scheduling
#52295 merged
Jun 8, 2025 -
pin flashinfer-python to 0.2.5
#53637 merged
Jun 7, 2025
62 Pull requests opened by 41 people
-
[Air] Add Video FPS Support for `WandbLoggerCallback`
#53638 opened
Jun 7, 2025 -
[Serve] Check multiple FastAPI ingress deployments in a single application
#53647 opened
Jun 8, 2025 -
[core]: Correct podman output parsing for image uri in runtime env
#53653 opened
Jun 9, 2025 -
[core] Use core worker client pool in GCS
#53654 opened
Jun 9, 2025 -
[core] Adding a nightly benchmark for continuous, bidirectional object transfer on two nodes.
#53657 opened
Jun 9, 2025 -
[core] Release resources only after tasks have stopped executing
#53660 opened
Jun 9, 2025 -
Train Tests: Add wrapper to run tests in a loop
#53683 opened
Jun 10, 2025 -
[train] Cleanups for training ingest benchmark
#53684 opened
Jun 10, 2025 -
[refactor] Install uv from test-requirements.txt
#53685 opened
Jun 10, 2025 -
[data] allow max_calls to be a static but not dynamic option
#53687 opened
Jun 10, 2025 -
[WIP] Remove old uv runtime env plugin
#53690 opened
Jun 10, 2025 -
Bump requests from 2.32.3 to 2.32.4 in /python
#53691 opened
Jun 10, 2025 -
[Data] Add reading from Delta Lake tables and from Unity Catalog
#53701 opened
Jun 10, 2025 -
[RLlib; Offline RL] Implement Offline Policy Evaluation (OPE) via Importance Sampling.
#53702 opened
Jun 10, 2025 -
[Serve][LLM] Simplify _prepare_engine_config()
#53704 opened
Jun 10, 2025 -
[HashShuffle] - Add warnings for when there are insufficient resources for Aggregators
#53705 opened
Jun 10, 2025 -
(serve.llm): Refactor/Consolidate LoRA downloading
#53714 opened
Jun 10, 2025 -
(serve.llm) Make _LLMServerBase.__init__ synchronous
#53719 opened
Jun 10, 2025 -
[core][gpu objects] Integrate single-controller collective APIs with GPU objects
#53720 opened
Jun 10, 2025 -
Bump scikit-learn from 1.3.2 to 1.5.1 in /doc/source/ray-overview/examples/e2e-timeseries
#53721 opened
Jun 10, 2025 -
(serve.llm) Remove test leakage from placement bundle logic
#53723 opened
Jun 10, 2025 -
[serve.llm] Add better logging verbosity controls
#53728 opened
Jun 11, 2025 -
Minor Documentation Fixes in Protobuf Files
#53731 opened
Jun 11, 2025 -
[RLlib; docs] Docs do-over (new API stack): `ConnectorV2` documentation.
#53732 opened
Jun 11, 2025 -
[WIP] Remove test cases for `gcs_actor_based_scheduling`
#53733 opened
Jun 11, 2025 -
[core][telemetry/10] support custom gauge+counter+sum metrics
#53734 opened
Jun 11, 2025 -
[core][telemetry/11] support histogram metric on worker side
#53740 opened
Jun 11, 2025 -
[ci] bazelize `get_contributors` script
#53743 opened
Jun 11, 2025 -
[core][rfc] upgrade opentelemetry-sdk
#53745 opened
Jun 11, 2025 -
Test
#53746 opened
Jun 11, 2025 -
Add example gpt2 tuning script
#53750 opened
Jun 11, 2025 -
[ci] add cibase tags for ci base envs
#53755 opened
Jun 12, 2025 -
Fix ray import error when both ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES are set
#53757 opened
Jun 12, 2025 -
[core] Fix race condition in raylet graceful shutdown
#53762 opened
Jun 12, 2025 -
[core] Add switch for the cache of runtime env
#53775 opened
Jun 12, 2025 -
[serve] Add telemetry for users with Pydantic version < 2
#53779 opened
Jun 12, 2025 -
[core] Fix GCS subscribers map race condition
#53781 opened
Jun 12, 2025 -
Try parallelize tests within file
#53784 opened
Jun 12, 2025 -
DepSets CLI tool
#53785 opened
Jun 13, 2025 -
[core] shutdown grpc on worker force exit
#53787 opened
Jun 13, 2025 -
[train] add proper filtering to metrics
#53788 opened
Jun 13, 2025 -
[core] update to protobuf-28.2, absl-20240722, grpc-1.67 and patch for windows
#53789 opened
Jun 13, 2025 -
Add `pin_memory` to `iter_torch_batches`
#53792 opened
Jun 13, 2025 -
add new developer api to get application url
#53796 opened
Jun 13, 2025 -
[core][gpu-objects] Support intra-process communication
#53798 opened
Jun 13, 2025 -
Feature/sac discrete
#53801 opened
Jun 13, 2025 -
[core] Deflake `test_multiprocessing.py`
#53802 opened
Jun 13, 2025 -
[serve] Revert request timeout from serve instance fixtures
#53809 opened
Jun 13, 2025 -
[core] deleting unused code from plasma client
#53814 opened
Jun 13, 2025 -
Fix pickle error with remote code models in vLLM multiprocessing
#53815 opened
Jun 13, 2025 -
[train] TrainStateActor periodically checks controller status and sets aborted
#53818 opened
Jun 13, 2025 -
Bump gitpython from 3.1.40 to 3.1.41 in /python
#53819 opened
Jun 13, 2025 -
Bump tqdm from 4.64.1 to 4.66.3 in /python
#53820 opened
Jun 13, 2025 -
[Serve.llm][P/D] Support separate deployment config for PDProxy in Prefill disagg
#53821 opened
Jun 14, 2025 -
[train][template] Remove ineffective post build script and pip install instead
#53822 opened
Jun 14, 2025 -
Sharing progress with broader team
#53823 opened
Jun 14, 2025 -
Disable TP=2 VLM batch test
#53825 opened
Jun 14, 2025 -
[Doc][KubeRay] remove head pod trailing hash and adjust volcano output
#53826 opened
Jun 14, 2025
54 Issues closed by 15 people
-
CI test windows://python/ray/tests:test_basic_client_mode is flaky
#52117 closed
Jun 13, 2025 -
[Serve] check_health with custom exception does not enter failed state, infinite retries
#53742 closed
Jun 13, 2025 -
CI test windows://python/ray/serve/tests:test_standalone is flaky
#48420 closed
Jun 13, 2025 -
[core][gpu-objects] Object contains multiple tensors and/or mix of CPU data and GPU tensors
#51274 closed
Jun 13, 2025 -
CI test windows://python/ray/serve/tests:test_standalone_with_comp_sche is flaky
#48425 closed
Jun 13, 2025 -
CI test windows://python/ray/tests:test_actor_state_metrics is flaky
#46303 closed
Jun 13, 2025 -
CI test linux://python/ray/tune:test_tuner is consistently_failing
#53786 closed
Jun 13, 2025 -
CI test linux://rllib:examples/algorithms/appo_custom_algorithm_w_shared_data_actor is flaky
#53176 closed
Jun 13, 2025 -
Release test serve_autoscaling_load_test.aws failed
#53760 closed
Jun 13, 2025 -
Release test llm_serve_llama_3dot1_8B_quantized_tp1_2p6d failed
#53769 closed
Jun 13, 2025 -
Release test llm_serve_llama_3dot1_8B_quantized_tp1_1p1d failed
#53768 closed
Jun 13, 2025 -
Release test llm_serve_llama_3dot2_1B_no_accelerator failed
#53765 closed
Jun 13, 2025 -
Release test llm_serve_llama_3dot2_1B_s3 failed
#53767 closed
Jun 13, 2025 -
Release test llm_serve_llama_3dot1_8B_lora failed
#53766 closed
Jun 13, 2025 -
Release test llm_serve_llama_3dot1_8B_quantized_tp_1 failed
#53764 closed
Jun 13, 2025 -
Release test llm_serve_llama_3dot1_8B_tp_2 failed
#53763 closed
Jun 13, 2025 -
Release test serve_scale_replicas.aws failed
#53761 closed
Jun 13, 2025 -
CI test windows://python/ray/serve/tests:test_logging is flaky
#46043 closed
Jun 13, 2025 -
[llm] vllm is throwing RuntimeError("Failed to infer device type")
#51967 closed
Jun 13, 2025 -
CI test windows://python/ray/serve/tests:test_target_capacity is consistently_failing
#48426 closed
Jun 12, 2025 -
CI test windows://python/ray/serve/tests:test_telemetry is flaky
#48427 closed
Jun 12, 2025 -
CI test linux://python/ray/data:test_datasink is consistently_failing
#52098 closed
Jun 12, 2025 -
CI test linux://rllib:examples/learners/ppo_with_torch_lr_schedulers is flaky
#49181 closed
Jun 12, 2025 -
[Serve] concurrency in ray.serve.batch
#53071 closed
Jun 12, 2025 -
Release test batch_inference_hetero failed
#53601 closed
Jun 12, 2025 -
CI test linux://rllib:learning_tests_stateless_cartpole_appo_gpu is flaky
#47295 closed
Jun 12, 2025 -
CI test darwin://python/ray/tests:test_runtime_env_uv_run_client_mode is consistently_failing
#53650 closed
Jun 11, 2025 -
CI test windows://python/ray/serve/tests:test_backpressure is consistently_failing
#50386 closed
Jun 11, 2025 -
[Kuberay] The reference of Kuberay code link should bind to a release version
#53655 closed
Jun 11, 2025 -
[Data] add boundaries or sorted flag to GroupedData.map_groups
#52577 closed
Jun 11, 2025 -
CI test windows://python/ray/tests:test_actor_failures is consistently_failing
#52130 closed
Jun 11, 2025 -
[Dashboard/Core] In KubeRay, resource list in Cluster Dashboard tab
#53641 closed
Jun 11, 2025 -
CI test windows://python/ray/tests:test_object_store_metrics is flaky
#49514 closed
Jun 11, 2025 -
[core][gpu-objects] Performance regression caused by transferring object references for small objects
#53623 closed
Jun 10, 2025 -
[Core] Ray worker fails to register with raylet when using grpcio>=1.71.0
#53631 closed
Jun 10, 2025 -
CI test linux://rllib:learning_tests_multi_agent_cartpole_ppo_multi_gpu is flaky
#46226 closed
Jun 10, 2025 -
[core] support .rayignore
#53648 closed
Jun 10, 2025 -
[core] CUDA VISIBLE DEVICES is not being set for PlacementGroups
#53643 closed
Jun 10, 2025 -
CI test linux://python/ray/tests:test_client_builder is flaky
#43570 closed
Jun 10, 2025 -
Release test map_batches_fixed_size_actors_numpy_False failed
#53658 closed
Jun 10, 2025 -
Release test map_batches_autoscaling_actors_numpy_False failed
#53659 closed
Jun 10, 2025 -
CI test windows://python/ray/tests:test_multi_tenancy is consistently_failing
#51506 closed
Jun 10, 2025 -
[Core] Ray doesn't respect object_store_memory when spilling is disabled
#53086 closed
Jun 10, 2025 -
[Serve] quick request cancellation with model composition leads to unhandled `TaskCancelledError`s
#53639 closed
Jun 10, 2025 -
Migrating from `manylinux1` to `manylinux2014`
#53548 closed
Jun 10, 2025 -
CI test linux://rllib:examples/evaluation/evaluation_parallel_to_training_multi_agent_duration_auto is flaky
#53255 closed
Jun 10, 2025 -
CI test linux://python/ray/data:test_map is consistently_failing
#48164 closed
Jun 9, 2025 -
CI test windows://python/ray/tests:test_output is consistently_failing
#51467 closed
Jun 9, 2025 -
CI test darwin://python/ray/tests:test_runtime_env_conda_and_pip_client_mode is consistently_failing
#53649 closed
Jun 9, 2025 -
[Core] We should make preloading Jemalloc configurable for worker
#47242 closed
Jun 9, 2025 -
CI test darwin://python/ray/tests:test_job is consistently_failing
#45537 closed
Jun 9, 2025 -
Issue with TLS Authentication
#53651 closed
Jun 9, 2025
34 Issues opened by 31 people
-
[bug][serve.llm] AssertionError: failed to get the hash of the compiled graph (VLM, batch, TP=2)
#53824 opened
Jun 14, 2025 -
How to transfer tensors stored in GPU in actor with NCCL?
#53816 opened
Jun 13, 2025 -
[flaky] test_scheduling_2.py::test_demand_report_when_scale_up
#53811 opened
Jun 13, 2025 -
Release test random_shuffle_fixed_size failed
#53806 opened
Jun 13, 2025 -
[Data] Custom Partitioner in Ray Data and Related Implementation Considerations
#53800 opened
Jun 13, 2025 -
[Core] Transient network failure on RPC `WaitForActorRefDeleted` causes actor registration fail
#53797 opened
Jun 13, 2025 -
How to enable tool calling in serve llm?
#53795 opened
Jun 13, 2025 -
[RLlib] Checkpointing fails with CUDA GPU learner using the new API stack
#53793 opened
Jun 13, 2025 -
[<Ray component: Core|RLlib|etc...>] Issue of port allocation
#53790 opened
Jun 13, 2025 -
[RLlib][Unity] unity3d_env_local.py 'NoneType' for action spaces
#53780 opened
Jun 12, 2025 -
Support gymnasium > 1.0.0
#53776 opened
Jun 12, 2025 -
[Dashboard] Support ncu
#53759 opened
Jun 12, 2025 -
[Core] Ray hangs with vllm0.8.5 v1 api for tp8+pp4
#53758 opened
Jun 12, 2025 -
Core: Ray 2.45 causes Google's LIBTPU to be very spammy
#53756 opened
Jun 12, 2025 -
[core] Race condition between raylet graceful shutdown and GCS health checks
#53739 opened
Jun 11, 2025 -
Conflict between ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES environment variables causes Ray import error
#53737 opened
Jun 11, 2025 -
[Announcement] Ray Summit 2025 Call for Proposals Due June 30th
#53729 opened
Jun 11, 2025 -
[core] Actor restarts don't work when an actor creation arg is evicted from plasma
#53727 opened
Jun 10, 2025 -
Release test compiled_graphs failed
#53716 opened
Jun 10, 2025 -
[Core] Custom docker image not scaling out
#53696 opened
Jun 10, 2025 -
[<Ray component: Core|RLlib|etc...>] SAC config error about framework
#53694 opened
Jun 10, 2025 -
[Data] Support for SQL/DataFrame capability
#53693 opened
Jun 10, 2025 -
[Dashboard] Display gpu metrics for AMD/ROCm devices
#53689 opened
Jun 10, 2025 -
[Serve] Serve-native CPU profiling in Replicas is broken
#53677 opened
Jun 9, 2025 -
[Dask-on-Ray,core] Tasks not registering on the jobs and job is subsequently getting stuck
#53666 opened
Jun 9, 2025 -
[Ray Core] Detached actor doesn't finish method after the client disconnects
#53665 opened
Jun 9, 2025 -
[Ray serve] Unable to serve meta-llama/Llama-3.1-8B-Instruct
#53663 opened
Jun 9, 2025 -
[Core] Transient network failure on RPC `MarkJobFinished` causes node crash
#53645 opened
Jun 8, 2025 -
[core][autoscaler] Select different node types when a node type is unavailable
#53636 opened
Jun 7, 2025 -
[core][dashboard]: Package already exists, skipping upload.
#53635 opened
Jun 7, 2025
260 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[ci] First release test on GKE
#53390 commented on
Jun 13, 2025 • 33 new comments -
[train] Driver SIGINT calls controller abort
#53600 commented on
Jun 14, 2025 • 27 new comments -
[Docs] Add ServiceMonitor section and make some step optional in Grafana & Prometheus page
#53474 commented on
Jun 14, 2025 • 19 new comments -
[core] fix detached actor being unexpectedly killed
#53562 commented on
Jun 13, 2025 • 18 new comments -
[core] Fix gcs register actor callback check
#53634 commented on
Jun 13, 2025 • 15 new comments -
[Docs][KubeRay] Add guide for writing KubeRay doctests
#51708 commented on
Jun 9, 2025 • 11 new comments -
[Serve] Set the docs path after app is initialized on the replica
#53463 commented on
Jun 11, 2025 • 5 new comments -
[Core] Add default Ray Node labels at Node init
#53360 commented on
Jun 13, 2025 • 5 new comments -
Update V2 Autoscaler to support scheduling using Node labels and LabelSelector API
#53578 commented on
Jun 13, 2025 • 4 new comments -
[Serve] Prioritize stopping most recently scaled-up replicas during downscaling
#52929 commented on
Jun 12, 2025 • 2 new comments -
[core][rocm] Allow CUDA_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES
#53531 commented on
Jun 12, 2025 • 2 new comments -
[Core] Add Logic to Emit Task Events to Event Aggregator
#53402 commented on
Jun 13, 2025 • 2 new comments -
[core][telemetry/08] record counter metric e2e
#53449 commented on
Jun 11, 2025 • 1 new comment -
[Serve] Immediately terminate unscheduled replicas
#52416 commented on
Jun 8, 2025 • 0 new comments -
Use large instance for OOM test
#52400 commented on
Jun 8, 2025 • 0 new comments -
Debug multinode
#52382 commented on
Jun 12, 2025 • 0 new comments -
Bump http-proxy-middleware from 2.0.6 to 2.0.9 in /python/ray/dashboard/client
#52370 commented on
Jun 8, 2025 • 0 new comments -
[core] Avoid infinite recursion when an ActorHandle fails to unpickle
#52363 commented on
Jun 8, 2025 • 0 new comments -
Bump torch from 2.0.1 to 2.6.0 in /doc/source/templates/05_dreambooth_finetuning/dreambooth
#52351 commented on
Jun 8, 2025 • 0 new comments -
[core] Fix stuck of long-time running RayCluster
#52303 commented on
Jun 12, 2025 • 0 new comments -
[build] warning when username or homedir include @ character
#52274 commented on
Jun 8, 2025 • 0 new comments -
[autoscaler] Refactor locks behavior in local node provider
#52269 commented on
Jun 11, 2025 • 0 new comments -
[Do Not Merge][core/raylet] Split giant ray core C++ targets into small ones
#52250 commented on
Jun 8, 2025 • 0 new comments -
Added warning for serve config when no config exists
#52224 commented on
Jun 8, 2025 • 0 new comments -
[data] Add overwrite options for iceberg sink
#52206 commented on
Jun 8, 2025 • 0 new comments -
[core] Remove object store runner
#51766 commented on
Jun 7, 2025 • 0 new comments -
Adapt Dask on Ray to the new Dask Task class
#52589 commented on
Jun 10, 2025 • 0 new comments -
Bump h11 from 0.14.0 to 0.16.0 in /release
#52582 commented on
Jun 11, 2025 • 0 new comments -
[Dashboard] Add Worker ID column to Worker table in Node detail page
#52581 commented on
Jun 14, 2025 • 0 new comments -
[fix][Doc] Code snippet in `Ray Data Internal` document
#52559 commented on
Jun 9, 2025 • 0 new comments -
[data] Improve Ray Data progress bar summary string
#52526 commented on
Jun 9, 2025 • 0 new comments -
[core][autoscaler] make ready/infeasible/backlog counts clear in the autoscaler status verbose report
#52520 commented on
Jun 9, 2025 • 0 new comments -
fix bootstrap alert inheritance
#52519 commented on
Jun 9, 2025 • 0 new comments -
[core] Static Priority Scheduling (3/N)
#52506 commented on
Jun 9, 2025 • 0 new comments -
[core] Static Priority scheduling (4/N)
#52489 commented on
Jun 9, 2025 • 0 new comments -
[core] Support streaming arbitrary types to Status messages
#52473 commented on
Jun 9, 2025 • 0 new comments -
[core] Static Priority scheduling (2/N)
#52465 commented on
Jun 9, 2025 • 0 new comments -
feat(data): Add on_error and error_handler to map_batches
#52457 commented on
Jun 12, 2025 • 0 new comments -
[Data][doc] Update on ray.data.Dataset.map() type hints
#52455 commented on
Jun 9, 2025 • 0 new comments -
[Not for Merge] [Core] Add Task Event Aggregator Skeleton
#52452 commented on
Jun 12, 2025 • 0 new comments -
[TEST] test branch
#52447 commented on
Jun 9, 2025 • 0 new comments -
[core] Fix cgroup controller check
#52443 commented on
Jun 9, 2025 • 0 new comments -
[core] Static Priority Scheduling (1/N)
#52439 commented on
Jun 8, 2025 • 0 new comments -
[air/data][perf]: Hot-path check to reduce the number of calls of pa.concat_arrays
#51792 commented on
Jun 12, 2025 • 0 new comments -
[core] Get cloud provider with ray on kubernetes
#51793 commented on
Jun 7, 2025 • 0 new comments -
[core][wip] Trying bzlmod
#51834 commented on
Jun 7, 2025 • 0 new comments -
[core] Support `.options` chaining in `actor.options`
#51836 commented on
Jun 10, 2025 • 0 new comments -
[core][cgraph] Refactor get_devices() for compiled graph
#51855 commented on
Jun 8, 2025 • 0 new comments -
[DO NOT MERGE] Increase test timeout to repro in CI
#51863 commented on
Jun 8, 2025 • 0 new comments -
[Data] Fix bug where pandas blocks don't use tensor extension
#51868 commented on
Jun 8, 2025 • 0 new comments -
Solved warning system
#51875 commented on
Jun 11, 2025 • 0 new comments -
[DONOTMERGE] (mostly) Python PoC for GPU objects in Ray
#51902 commented on
Jun 11, 2025 • 0 new comments -
Add new autoscaling parameter `aggregation function`
#51905 commented on
Jun 8, 2025 • 0 new comments -
[WIP]
#51939 commented on
Jun 8, 2025 • 0 new comments -
[RLlib]fix a bug in the example
#51944 commented on
Jun 8, 2025 • 0 new comments -
[build] Fix typo & improve `build-docker.sh`
#51950 commented on
Jun 8, 2025 • 0 new comments -
[core][autoscaler] test with KubeRay 1.4.0
#51958 commented on
Jun 8, 2025 • 0 new comments -
[Fix][Core] Fail fast if the dashboard agent fails to launch the HTTP server
#51960 commented on
Jun 9, 2025 • 0 new comments -
Add return_future to ActorPool.get_next()
#52004 commented on
Jun 11, 2025 • 0 new comments -
test for raycirun
#52012 commented on
Jun 8, 2025 • 0 new comments -
[Chore][Dashboard] Move DataHead to python/ray/data/ folder
#52013 commented on
Jun 9, 2025 • 0 new comments -
[Chore][Dashboard] Move `TrainHead` to `python/ray/train` folder
#52014 commented on
Jun 9, 2025 • 0 new comments -
[Data] Implement proper limit pushdown #51966
#52018 commented on
Jun 8, 2025 • 0 new comments -
[Data] Make `from_items` lineage serializable
#52026 commented on
Jun 8, 2025 • 0 new comments -
Re-arrange integrations docs
#52049 commented on
Jun 8, 2025 • 0 new comments -
[WIP] Ray Data doc updates
#52062 commented on
Jun 8, 2025 • 0 new comments -
[Data] Preserving special types in mapping and filtering
#52067 commented on
Jun 8, 2025 • 0 new comments -
[Data,Train] Add helpful errors when running forbidden methods on sharded datasets
#52079 commented on
Jun 13, 2025 • 0 new comments -
[data] Add Dataset.write_datasink_lazy to support intermediate outputs.
#52094 commented on
Jun 8, 2025 • 0 new comments -
[Dashboard] Add GPU component usage
#52102 commented on
Jun 9, 2025 • 0 new comments -
Bump xgrammar from 0.1.16 to 0.1.18 in /python
#52175 commented on
Jun 8, 2025 • 0 new comments -
upgrade path to python protobuf 4
#52194 commented on
Jun 8, 2025 • 0 new comments -
[train] upgrade tensorflow-datasets
#52195 commented on
Jun 8, 2025 • 0 new comments -
[core][tests] Add chaos tests to verify the interaction between actor restarts, task retries, and lineage reconstruction
#53021 commented on
Jun 13, 2025 • 0 new comments -
[Data] Refactoring udf context sharing
#53026 commented on
Jun 13, 2025 • 0 new comments -
[data] fix lance dataset schema
#53134 commented on
Jun 9, 2025 • 0 new comments -
[docs] updating broken links on rllib torch doc
#53161 commented on
Jun 10, 2025 • 0 new comments -
Add progress bars to hash operators
#53175 commented on
Jun 13, 2025 • 0 new comments -
[WIP] Fix daft test
#53338 commented on
Jun 9, 2025 • 0 new comments -
[Serve] Allow HTTPs Options in Ray Serve
#26814 commented on
Jun 8, 2025 • 0 new comments -
[WIP] [core] Attempting a basic solution to streaming generator not adding errors to plasma
#53393 commented on
Jun 13, 2025 • 0 new comments -
try running things with protobuf 4
#53442 commented on
Jun 9, 2025 • 0 new comments -
feat: Add QPS-based autoscaling policy for Ray Serve
#53445 commented on
Jun 9, 2025 • 0 new comments -
Replace `python setup.py bdist_wheel` with `pip wheel`
#53458 commented on
Jun 13, 2025 • 0 new comments -
[Data] Added distinct function
#53460 commented on
Jun 11, 2025 • 0 new comments -
[core] Add as_completed and map_unordered APIs
#53461 commented on
Jun 12, 2025 • 0 new comments -
[Serve] make various default values of `AutoscalingConfig.max_replicas` consistent and >1
#50222 commented on
Jun 8, 2025 • 0 new comments -
[core][telemetry/09] record sum metric e2e
#53512 commented on
Jun 11, 2025 • 0 new comments -
[RLlib] MetricsLogger: Fix `get/set_state` to handle tensors in `self.values`.
#53514 commented on
Jun 13, 2025 • 0 new comments -
[core] Turn executed task inserted into a RAY_CHECK
#53522 commented on
Jun 8, 2025 • 0 new comments -
[core][telemetry/10] record histogram metric e2e
#53523 commented on
Jun 11, 2025 • 0 new comments -
[serve.llm] Update ray-llm docker
#53532 commented on
Jun 7, 2025 • 0 new comments -
[RLlib] Upgrade RLlink protocol for external env/simulator training.
#53550 commented on
Jun 13, 2025 • 0 new comments -
Script to generate test coverage for doc files
#53556 commented on
Jun 13, 2025 • 0 new comments -
[Serve] Multiple FastAPI ingress deployments in a single application are not disallowed
#53024 commented on
Jun 8, 2025 • 0 new comments -
[Not for Merge] Event Aggregator Perf
#53576 commented on
Jun 13, 2025 • 0 new comments -
[Doc] vale ignores anchors of headers
#53580 commented on
Jun 9, 2025 • 0 new comments -
[core][telemetry/11] record legacy-legacy metrics e2e
#53596 commented on
Jun 11, 2025 • 0 new comments -
[Train] Allow customization of FPS for wandb logger; instead of slow 4 FPS
#50186 commented on
Jun 7, 2025 • 0 new comments -
Fix 53605
#53607 commented on
Jun 9, 2025 • 0 new comments -
[core] Remove experimental `max_cpu_frac_per_node`
#53610 commented on
Jun 10, 2025 • 0 new comments -
[core] Support broadcast collective for compiled graphs
#53625 commented on
Jun 10, 2025 • 0 new comments -
[Serve] Unable to load meta-llama/Llama-3.3-70B-Instruct
#53571 commented on
Jun 7, 2025 • 0 new comments -
[core] [easy] readability improvements for IO Workers
#52590 commented on
Jun 10, 2025 • 0 new comments -
[Dashboard] Allow getting dashboard URL via RuntimeContext
#52676 commented on
Jun 10, 2025 • 0 new comments -
check if ray is installed when using conda env
#52677 commented on
Jun 10, 2025 • 0 new comments -
Bump transformers from 4.30.1 to 4.50.0 in /doc/source/templates/testing/docker/03_serving_stable_diffusion
#52681 commented on
Jun 14, 2025 • 0 new comments -
[core] Minor pull manager cleanup
#52724 commented on
Jun 9, 2025 • 0 new comments -
[Core][Refactor] Create separate RPCs for cancelling prepared PG bundle and removing PG
#52751 commented on
Jun 9, 2025 • 0 new comments -
[core] Remove copy when receiving small object returns
#52777 commented on
Jun 9, 2025 • 0 new comments -
[core] Remove small task output copy on task execution path
#52778 commented on
Jun 9, 2025 • 0 new comments -
[core][refactor] Move `to_resubmit_` from CoreWorker to TaskManager to avoid an abstraction leak
#52779 commented on
Jun 10, 2025 • 0 new comments -
[WIP][Data] replace tensor array with naive pyarrow tensor
#52784 commented on
Jun 10, 2025 • 0 new comments -
[Core] Increase timeout of start_api_server and make it configurable
#52789 commented on
Jun 14, 2025 • 0 new comments -
[ci] try running cicd unit tests in forge env
#52792 commented on
Jun 12, 2025 • 0 new comments -
Bump minimum pyarrow version to 17
#52820 commented on
Jun 14, 2025 • 0 new comments -
[DONT-MERGE] Adding a sleep to start debugging windows tests.
#52822 commented on
Jun 10, 2025 • 0 new comments -
[Data] remove empty lance read tasks
#52831 commented on
Jun 14, 2025 • 0 new comments -
Train Tests: Use map_batches for image_classification
#52837 commented on
Jun 9, 2025 • 0 new comments -
[DNR] remove ensure_liveness
#52913 commented on
Jun 10, 2025 • 0 new comments -
[data] ResourceManager: always reserve min resource requirement
#52914 commented on
Jun 10, 2025 • 0 new comments -
[core] Synchronize locations with pinned_at_raylet_id
#52920 commented on
Jun 10, 2025 • 0 new comments -
[core] Add sync get node info to NodeInfoAccessor
#52928 commented on
Jun 10, 2025 • 0 new comments -
Ray Data Enhancement: Percentiles and Statistical Aggregations (PR #52588)
#52937 commented on
Jun 10, 2025 • 0 new comments -
Add pinned_memory and non_blocking transfer for default collate_fn
#52948 commented on
Jun 10, 2025 • 0 new comments -
[Data] Replace `_MapWorker` name with operator names
#52949 commented on
Jun 10, 2025 • 0 new comments -
[Data] fix write_iceberg error
#52956 commented on
Jun 10, 2025 • 0 new comments -
[deps] upgrade pandas to always use 2+
#52961 commented on
Jun 11, 2025 • 0 new comments -
[Core] Propagate InvalidArgument Status from LabelSelector Data Type
#52964 commented on
Jun 13, 2025 • 0 new comments -
[RLlib; Offline RL] - Use `iter_torch_batches` in learner
#52968 commented on
Jun 13, 2025 • 0 new comments -
[core] Use GetResourceLoadRequest as a substitute liveness check
#52971 commented on
Jun 10, 2025 • 0 new comments -
[Data] Fixing null-safety when converting to `TensorArray`
#52977 commented on
Jun 10, 2025 • 0 new comments -
[Not for Merge] Increase Timeout for asan tests to Repro in CI
#52993 commented on
Jun 10, 2025 • 0 new comments -
[Core] Identify Mac M1/M2 GPUs as valid GPUs
#39136 commented on
Jun 9, 2025 • 0 new comments -
[Data] Adding streaming capability for `ray.data.Dataset.unique`
#51207 commented on
Jun 9, 2025 • 0 new comments -
[distributed debugger] vscode extension does not accept windows path when configuring cluster
#53088 commented on
Jun 9, 2025 • 0 new comments -
[container] Publish multi-architecture container images
#41727 commented on
Jun 9, 2025 • 0 new comments -
[Data] Significant Memory Leak / OOM When Reading Large Parquet Files with RayData
#49158 commented on
Jun 9, 2025 • 0 new comments -
Uv sync with project using Ray fails installing on Python 3.13
#52819 commented on
Jun 9, 2025 • 0 new comments -
[data] ObjectRefs passed to map UDF are not automatically deref'ed
#49207 commented on
Jun 9, 2025 • 0 new comments -
[Serve] Make replica scheduler backoff configurable
#52871 commented on
Jun 10, 2025 • 0 new comments -
[Conda] Ray should raise exception when ray is not installed in conda environment
#52672 commented on
Jun 10, 2025 • 0 new comments -
[Core] Submitted containerized job is stuck in pending mode
#37293 commented on
Jun 10, 2025 • 0 new comments -
[ray.serve.llm] serve.llm with streaming has overhead compared to vllm-v0 for a single replica when concurrency > 32
#52746 commented on
Jun 10, 2025 • 0 new comments -
Ray serve + core streaming is slow at high concurrency
#52745 commented on
Jun 10, 2025 • 0 new comments -
[core] ray.init does not work if run in a node with external ip while the cluster is started internally
#8244 commented on
Jun 10, 2025 • 0 new comments -
ray azure does not work out of the box
#52511 commented on
Jun 10, 2025 • 0 new comments -
[Azure] Ray up for Azure fails
#48976 commented on
Jun 10, 2025 • 0 new comments -
Support Availability Zone Deployment in Azure
#39966 commented on
Jun 10, 2025 • 0 new comments -
[Serve] Autoscaling not working correctly when `max_replica_per_node` is set in Ray Serve
#53582 commented on
Jun 10, 2025 • 0 new comments -
[Serve] `fastapi_app` is still mutable in the deployment constructor after being passed to `@serve.ingress`
#52775 commented on
Jun 10, 2025 • 0 new comments -
[core] ray.init() not possible even while on same network as Ray Cluster.
#53520 commented on
Jun 10, 2025 • 0 new comments -
[core] TPU Visible Chips not set correctly
#53569 commented on
Jun 10, 2025 • 0 new comments -
[Bug] Dashboard can't start with TLS on
#22466 commented on
Jun 10, 2025 • 0 new comments -
[Train] Support for `lightning.pytorch` on the `mps` backend
#49858 commented on
Jun 10, 2025 • 0 new comments -
[Ray Core/Dashboard] - Installing Ray via UV breaks dashboard.
#53608 commented on
Jun 10, 2025 • 0 new comments -
[Serve.llm] Clean up output logs and give option to opt out of different verbosity levels
#53492 commented on
Jun 10, 2025 • 0 new comments -
[llm] Roadmap for Data and Serve LLM APIs
#51313 commented on
Jun 11, 2025 • 0 new comments -
[Data] [optimizer] map/map_batches should output the same number of rows as the input
#36295 commented on
Jun 11, 2025 • 0 new comments -
[Core] Read-only buffer error in some scikit-learn models
#52571 commented on
Jun 11, 2025 • 0 new comments -
[Core] ASSERTION FAILED: queue.num_items() == 0
#53510 commented on
Jun 11, 2025 • 0 new comments -
[Core] Ray dashboard agent high memory usage
#52639 commented on
Jun 11, 2025 • 0 new comments -
[Core][ROCm] Setting CUDA_VISIBLE_DEVICES leads to an assertion
#52701 commented on
Jun 11, 2025 • 0 new comments -
[Core] `ray job submit` doesn't always catch the last lines of the job logs
#48701 commented on
Jun 11, 2025 • 0 new comments -
[doc] Add documentation guide for MPI on Ray.
#41626 commented on
Jun 11, 2025 • 0 new comments -
[Dashboard] Decoupling dashboard and dashboard lifetime from Ray Cluster
#46444 commented on
Jun 9, 2025 • 0 new comments -
[Core] Ray_tasks and ray_memory_manager_worker_eviction_total metrics should emit 0 instead of null for each state at start
#47616 commented on
Jun 9, 2025 • 0 new comments -
[Ray debugger] Unable to use debugger on Ray Cluster on k8s
#45541 commented on
Jun 9, 2025 • 0 new comments -
[Dashboard] Should specify the time range in job detail page for load the cluster status and scale metrics
#41781 commented on
Jun 9, 2025 • 0 new comments -
[Dashboard] Add job retention mechanism
#35700 commented on
Jun 9, 2025 • 0 new comments -
[Ray Dashboard] Set Route Prefix/Base Address
#35269 commented on
Jun 9, 2025 • 0 new comments -
[Core] Expose logs for runtime environment installation process on worker nodes for remote Ray clusters
#34310 commented on
Jun 9, 2025 • 0 new comments -
[State API] ray log truncation message improvements
#32392 commented on
Jun 9, 2025 • 0 new comments -
[Core] Core metrics observed from worker nodes do not propagate to Prometheus
#31675 commented on
Jun 9, 2025 • 0 new comments -
[core][state] list objects show objects if spilled.
#31374 commented on
Jun 9, 2025 • 0 new comments -
Troubleshooting the root cause that cluster_status is undefined
#40076 commented on
Jun 9, 2025 • 0 new comments -
[Event] Support rotation for the event log files
#39591 commented on
Jun 9, 2025 • 0 new comments -
[Dashboard] Support row in the dashboard.
#38024 commented on
Jun 9, 2025 • 0 new comments -
[dashboard][UI] [TaskTable] Rendering task table has an 8s delay
#36656 commented on
Jun 9, 2025 • 0 new comments -
[UI] [Log viewer] Prevent the infinite loading for logs fetch
#36486 commented on
Jun 9, 2025 • 0 new comments -
[Dashboard] Kill a job
#30182 commented on
Jun 9, 2025 • 0 new comments -
[Dashboard] [CI] Add tests for uncaught exceptions
#29809 commented on
Jun 9, 2025 • 0 new comments -
[Core] Setting python log level for ray processes
#29758 commented on
Jun 9, 2025 • 0 new comments -
[Core] timeline doesn't show all infos.
#28320 commented on
Jun 9, 2025 • 0 new comments -
[State Observability] Improve ray list job implementation.
#26832 commented on
Jun 9, 2025 • 0 new comments -
[autoscaler][logs] Improve status logging
#26670 commented on
Jun 9, 2025 • 0 new comments -
[Autoscaler/Core][Code quality] Handle autoscaler event logging through RPC, not logs.
#26186 commented on
Jun 9, 2025 • 0 new comments -
[core] Node IDs not consistent across APIs
#25090 commented on
Jun 9, 2025 • 0 new comments -
[Jobs] Setting `RAY_LOG_TO_STDERR` results in empty job logs
#24886 commented on
Jun 9, 2025 • 0 new comments -
[Core Observability] Include name to actor log prefix + process name
#24876 commented on
Jun 9, 2025 • 0 new comments -
[jobs] [Feature] Support streaming job logs to stdout/stderr
#23564 commented on
Jun 9, 2025 • 0 new comments -
[dashboard] Wonky GPU display
#14664 commented on
Jun 9, 2025 • 0 new comments -
[metrics] ray.util.metrics API should closely mirror the prometheus python API
#14496 commented on
Jun 9, 2025 • 0 new comments -
[metrics] Report general metrics for gRPC
#14368 commented on
Jun 9, 2025 • 0 new comments -
[core] Getting node IP address by object ref
#13630 commented on
Jun 9, 2025 • 0 new comments -
[Logs] Spdlog doesn't rotate raylet.out and gcs_server.out
#13466 commented on
Jun 9, 2025 • 0 new comments -
Ray dashboard_url and prom_discovery.json files not scoped to session dir
#12662 commented on
Jun 9, 2025 • 0 new comments -
verify windows wheels.
#43442 commented on
Jun 8, 2025 • 0 new comments -
[core] Fix max_calls option when used on a worker that is part of a workflow
#43700 commented on
Jun 11, 2025 • 0 new comments -
remove flaky marker from test
#44033 commented on
Jun 12, 2025 • 0 new comments -
RuntimeContext support get actor namespace
#45025 commented on
Jun 11, 2025 • 0 new comments -
[Core] Improve logging during accelerator auto-detection
#45240 commented on
Jun 14, 2025 • 0 new comments -
Enable setting OS disk size in Azure
#45867 commented on
Jun 10, 2025 • 0 new comments -
Fix malformed `temp_dir` path when connecting Windows workers to cluster with Linux head
#45930 commented on
Jun 9, 2025 • 0 new comments -
[URL] Change the absolute path to a relative path to solve the ingres…
#45933 commented on
Jun 10, 2025 • 0 new comments -
python/ray/autoscaler/gcp/*.yaml: change scheduling from dict to list
#46500 commented on
Jun 14, 2025 • 0 new comments -
Fix mlflow artifact logging
#46570 commented on
Jun 10, 2025 • 0 new comments -
[Core] If possible, force flush the trace when the worker ends.
#46654 commented on
Jun 14, 2025 • 0 new comments -
[Core]Support Merge code search path from env variable
#46771 commented on
Jun 11, 2025 • 0 new comments -
[bazel] move python rules up
#47260 commented on
Jun 12, 2025 • 0 new comments -
[RayCluster] Introduce how to run ray remote job with ray client (#47…
#47771 commented on
Jun 11, 2025 • 0 new comments -
Add kuberay operator addon to cmd in gke-gcs-bucket.md
#48268 commented on
Jun 11, 2025 • 0 new comments -
(WIP) [core][compiled graphs] Unify code paths for NCCL P2P and collectives scheduling
#48649 commented on
Jun 12, 2025 • 0 new comments -
[Core]: Fix ConnectionError on Autoscaler CR lookups in K8s clusters …
#48675 commented on
Jun 7, 2025 • 0 new comments -
Update azure.md - Missing azure dependency
#49104 commented on
Jun 11, 2025 • 0 new comments -
[RLlib] Add NPU and HPU support to RLlib
#49535 commented on
Jun 13, 2025 • 0 new comments -
[Core] Add virtual cluster
#49717 commented on
Jun 12, 2025 • 0 new comments -
[DATA]Add custom resources in data autoscaling
#49756 commented on
Jun 9, 2025 • 0 new comments -
[core] Thread-safe gcs node manager
#50024 commented on
Jun 7, 2025 • 0 new comments -
[core] Always create a default executor
#51058 commented on
Jun 8, 2025 • 0 new comments -
[CI] Replace `black` with `ruff format`
#51332 commented on
Jun 8, 2025 • 0 new comments -
[Dashboard] Support reporting AMD GPU usage
#51345 commented on
Jun 10, 2025 • 0 new comments -
[Core] Cover cpplint for ray/src/ray/common
#51551 commented on
Jun 8, 2025 • 0 new comments -
[py_modules] Don't install the wheel package if it's already installed
#51629 commented on
Jun 11, 2025 • 0 new comments -
windows dev setup
#51678 commented on
Jun 8, 2025 • 0 new comments -
pyarrow.lib.ArrowInvalid: Struct child array #5 does not match type field: null vs double
#53529 commented on
Jun 8, 2025 • 0 new comments -
[core] Lazily subscribe to node changes from workers
#51718 commented on
Jun 7, 2025 • 0 new comments -
[Core] Native CPU affinity support for accelerators
#51719 commented on
Jun 10, 2025 • 0 new comments -
[Core] Turn off RayTaskError cause wrapping functionality
#48320 commented on
Jun 11, 2025 • 0 new comments -
[Core] `ray.init()` and `ray start` fails on Windows 11 in ray 2.45+
#52739 commented on
Jun 12, 2025 • 0 new comments -
[Core] ray._raylet.ObjectRef and ray.types.ObjectRef type compatibility
#53591 commented on
Jun 12, 2025 • 0 new comments -
[Serve] Proxy actor not started on worker node when using kuberay
#50349 commented on
Jun 12, 2025 • 0 new comments -
[Serve] Specify different images for each deployment
#52994 commented on
Jun 12, 2025 • 0 new comments -
[RLlib] gym.spaces.Sequence unbatching error
#53293 commented on
Jun 12, 2025 • 0 new comments -
[RLlib] ActionMaskingTorchRLModule can't set up `conv_filters`
#53325 commented on
Jun 12, 2025 • 0 new comments -
[RLlib] Type of `AlgorithmConfig.training(learner_connector)` is wrong
#53368 commented on
Jun 12, 2025 • 0 new comments -
[RLlib] Env runners error out when interacting with Repeated observation spaces
#53327 commented on
Jun 12, 2025 • 0 new comments -
AttributeError: 'NoneType' object has no attribute 'enable_rl_module_and_learner' with highway-env
#53398 commented on
Jun 12, 2025 • 0 new comments -
[RLLIB] EnvContext.vector_index is always 0
#53419 commented on
Jun 12, 2025 • 0 new comments -
[RLlib][PPO new-API] Large discrepancy between Algorithm.evaluate() and manual inference via restored EnvToModule/ModuleToEnv pipelines on CarRacing-v3
#53588 commented on
Jun 12, 2025 • 0 new comments -
[Core | Serve] Compatibility issue with pydantic>=2.10
#52211 commented on
Jun 12, 2025 • 0 new comments -
[Core] Prevent scheduling non-GPU tasks to GPU nodes
#47866 commented on
Jun 12, 2025 • 0 new comments -
[core][compiled graphs] Slow NCCL init on H200 server
#53619 commented on
Jun 12, 2025 • 0 new comments -
[Data] `dataset.write_iceberg` error
#52967 commented on
Jun 12, 2025 • 0 new comments -
[Data] PyArrow 20.0.0 Backward Incompatibility (`unexpected keyword argument 'maps_as_pydicts'`)
#52685 commented on
Jun 12, 2025 • 0 new comments -
[LLM] We need to create a more robust way of handling actor shutdown
#53179 commented on
Jun 13, 2025 • 0 new comments -
CI test windows://python/ray/serve/tests:test_batching is consistently_failing
#46016 commented on
Jun 13, 2025 • 0 new comments -
Global Per-Epoch Shuffling with TorchTrainer
#47460 commented on
Jun 13, 2025 • 0 new comments -
Release test sort_autoscaling failed
#53546 commented on
Jun 13, 2025 • 0 new comments -
CI test windows://python/ray/serve/tests:test_request_timeout is flaky
#48417 commented on
Jun 13, 2025 • 0 new comments -
[Core|Dataset] Ray job stuck with idle actors with no tasks
#45822 commented on
Jun 13, 2025 • 0 new comments -
[core|serve] Migrate shared utilities from `ray._private` to `ray._common`
#53478 commented on
Jun 13, 2025 • 0 new comments -
CI test windows://python/ray/tests:test_object_spilling is consistently_failing
#45961 commented on
Jun 13, 2025 • 0 new comments -
CI test windows://python/ray/tests:test_object_spilling_asan is consistently_failing
#45962 commented on
Jun 13, 2025 • 0 new comments -
CI test windows://python/ray/tests:test_object_spilling_debug_mode is consistently_failing
#43796 commented on
Jun 13, 2025 • 0 new comments -
CI test linux://python/ray/tests:test_runtime_env_container is consistently_failing
#45223 commented on
Jun 14, 2025 • 0 new comments -
[Dashboard] Provide a job dashboard URL link instead of the dashboard link when ray.init is called.
#35427 commented on
Jun 11, 2025 • 0 new comments -
Add Apple silicon GPU(mps) support to ray
#38464 commented on
Jun 12, 2025 • 0 new comments -
updates to setup-dev.py to work around the types.py import issues
#38948 commented on
Jun 14, 2025 • 0 new comments -
[core] Fix a corner case where the RPC never returns
#39801 commented on
Jun 11, 2025 • 0 new comments