Handling large objects with Ray Pool map

I am looking to use Ray's pool mapping to perform a CPU-heavy task on lots of data. My initial attempt looked like this:

import ray
from ray.util.multiprocessing import Pool
from functools import partial

def compute(series1, series2):
    return ***computationally heavy task***

def preprocess(seriesseq, indeces):
    series1 = seriesseq[indeces[0]]
    series2 = seriesseq[indeces[1]]
    return compute(series1, series2)

func = partial(preprocess, datablock)
with Pool() as p:
    p.imap(func, combos, chunksize=10000)

The issue with this particular implementation is that in my case, datablock is 27GB large and combos is 1.2GB large. This caused my machine with 64GB of RAM to run out of memory. Looking at the Ray dashboard, it took around 40 minutes to start the computation, and it rapidly sucked down memory, causing an OOM system lockup. I can only assume it was trying to copy the 27GB dataset to every worker, which caused the issue.
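The copying hypothesis above can be checked on a small scale. The sketch below uses only the standard library, a hypothetical small stand-in for datablock, and operator.getitem as a stand-in for preprocess. It shows that a functools.partial object carries its bound argument when it is pickled, so each serialized copy of func is at least as large as the data itself. Ray uses its own pickle-based serialization, so the exact sizes and the number of copies made in the real run are assumptions, not measurements.

```python
import operator
import pickle
from functools import partial

# Hypothetical small stand-in for the 27GB datablock.
datablock = [list(range(1000)) for _ in range(100)]

# operator.getitem stands in for preprocess: it takes the data first,
# then an index, just like preprocess(seriesseq, indeces).
func = partial(operator.getitem, datablock)

data_size = len(pickle.dumps(datablock))
func_size = len(pickle.dumps(func))

# The pickled partial embeds the bound data, so it cannot be smaller.
print(f"pickled datablock: {data_size} bytes")
print(f"pickled partial:   {func_size} bytes")

# A round trip through pickle yields a working copy with its own data.
restored = pickle.loads(pickle.dumps(func))
print(restored(3) == datablock[3])
```

If the same holds for the real func, every task that ships func also ships the bound data, which would be consistent with the memory growth described above.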

I later modified the function to handle Ray references to get around the memory issue as such:

datablock_ref = ray.put(datablock)

def preprocess_reference(seriesref, indeces):
    seriesseq = ray.get(seriesref)
    series1 = seriesseq[indeces[0]]
    series2 = seriesseq[indeces[1]]
    return compute(series1, series2)

func = partial(preprocess_reference, datablock_ref)
with Pool() as p:
    p.imap(func, combos, chunksize=10000)

This actually worked to keep the memory in check, and I suspect it only copied over the 1.2GB combos object rather than the 27GB dataset. However, compared to smaller datasets that I could run with the first code snippet in memory, this took longer than I expected, far more even than one would expect given the increased data sizes. Strangely, I noticed that my CPU was pinned at 100% the whole time it was calculating, but the computer was completely responsive, and starting other tasks on it actually decreased the reported CPU usage. These counter-intuitive observations indicate to me that the computer is spending a lot of time simply waiting on ray.get() calls, which would be reported as CPU usage but is actually not doing any computation. This obviously is very inefficient and likely the reason the computation took so long.
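In preprocess_reference, the reference is dereferenced once per index pair. One way to test whether that is the bottleneck is to dereference once per chunk instead. The sketch below illustrates the idea without Ray: CountingStore is a hypothetical stand-in for the object store that only counts how often its data is fetched, and compute is a cheap placeholder. Whether a real ray.get() call is expensive depends on the data type, so this shows the call-count difference only, not a measured speedup.

```python
from itertools import combinations, islice


class CountingStore:
    """Hypothetical stand-in for an object store; counts dereferences."""

    def __init__(self, data):
        self._data = data
        self.gets = 0

    def get(self):
        self.gets += 1
        return self._data


def compute(series1, series2):
    # Cheap placeholder for the computationally heavy task.
    return sum(a * b for a, b in zip(series1, series2))


def chunked(iterable, size):
    # Yield successive lists of at most `size` items.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def process_per_item(store, indeces):
    seriesseq = store.get()  # fetched for every index pair
    return compute(seriesseq[indeces[0]], seriesseq[indeces[1]])


def process_chunk(store, chunk):
    seriesseq = store.get()  # fetched once for the whole chunk
    return [compute(seriesseq[i], seriesseq[j]) for i, j in chunk]


datablock = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1]]
combos = list(combinations(range(len(datablock)), 2))

per_item_store = CountingStore(datablock)
per_item = [process_per_item(per_item_store, c) for c in combos]

per_chunk_store = CountingStore(datablock)
per_chunk = [
    result
    for chunk in chunked(combos, 4)
    for result in process_chunk(per_chunk_store, chunk)
]

print("fetches per item: ", per_item_store.gets)
print("fetches per chunk:", per_chunk_store.gets)
```

Both variants produce the same results; only the number of fetches differs (one per pair versus one per chunk).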

My question is whether this approach is the correct one for situations like my own, when Ray is operating on datasets that do not fit into memory, and whether I am correct that this method suffers from lethargy in the ray.get() function. If this is the case, is there some way to asynchronously queue up chunks for the preprocess/compute function to work faster on the data?
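The "asynchronously queue up chunks" idea can be sketched as a bounded submission loop: submit chunks lazily, and once a fixed number are in flight, wait for at least one to finish before submitting the next. The example below uses the standard library's concurrent.futures as a stand-in so it runs without a cluster. Ray offers a comparable primitive in ray.wait, but the names run_bounded, process_chunk, and max_in_flight are hypothetical, and threads are used here only to demonstrate the control flow, not as a recommendation for CPU-bound work.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def process_chunk(chunk):
    # Placeholder for running preprocess/compute over one chunk of pairs.
    return sum(i + j for i, j in chunk)


def run_bounded(chunks, max_in_flight=4):
    """Submit chunks lazily, keeping at most max_in_flight tasks pending."""
    results = {}
    pending = {}  # future -> position of its chunk in the input order
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        for position, chunk in enumerate(chunks):
            if len(pending) >= max_in_flight:
                # Block until at least one task finishes, then collect it.
                done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
                for future in done:
                    results[pending.pop(future)] = future.result()
            pending[pool.submit(process_chunk, chunk)] = position
        # Collect whatever is still running after the last submission.
        for future, position in pending.items():
            results[position] = future.result()
    # Return results in the original chunk order.
    return [results[i] for i in range(len(results))]


chunks = [[(i, i + 1), (i + 1, i + 2)] for i in range(10)]
totals = run_bounded(chunks, max_in_flight=3)
print(totals)
```

Capping the number of pending tasks keeps memory for queued inputs and finished results bounded while still keeping the workers busy.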
