adaptive-scheduler

The Adaptive scheduler solves the following problem: you need to run a few hundred learners and can use more than 1k cores. `ipyparallel` and `dask.distributed` provide very powerful engines for interactive sessions, but they start to struggle when you want to connect to more than 1k cores. In addition, on a shared cluster it is often hard to start an interactive session with enough resources available.

Our approach is to schedule a separate job for each `adaptive.Learner`. The creation and running of these jobs are managed by `adaptive-scheduler`. This means that your calculation will definitely run, even if the cluster happens to be fully occupied at the moment (the jobs simply wait in the queue). Because of this approach, there is almost no limit to how many cores you can use: you can either use 10 nodes for 1 job (`learner`) or 1 core for 1 job (`learner`) while scheduling hundreds of jobs.

Everything is written such that the computation is maximally local. This means that if one of the jobs crashes, there is no problem: a new one is automatically scheduled and continues the calculation where the old one left off (thanks to Adaptive's periodic saving functionality). Even if the central "job manager" dies, the jobs will continue to run (although no new jobs will be scheduled).
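The crash-and-resume behaviour described above can be illustrated with a small, self-contained simulation. This is a sketch under stated assumptions, not adaptive-scheduler's API: `ToyLearner`, `run_job`, `manage` and `JobCrashed` are hypothetical names, the "cluster" is a plain loop in a single process, and checkpoints are pickle files written every `save_every` points (a deterministic stand-in for Adaptive's periodic saving).

```python
import os
import pickle
import tempfile


class ToyLearner:
    """Hypothetical stand-in for an adaptive.Learner: evaluates f on fixed points."""

    def __init__(self, points):
        self.points = list(points)
        self.data = {}  # point -> f(point)

    def done(self):
        return len(self.data) == len(self.points)

    def next_point(self):
        return next(p for p in self.points if p not in self.data)

    def save(self, fname):
        with open(fname, "wb") as fh:
            pickle.dump(self.data, fh)

    def load(self, fname):
        if os.path.exists(fname):
            with open(fname, "rb") as fh:
                self.data = pickle.load(fh)


class JobCrashed(Exception):
    """Simulates a cluster job dying (node failure, walltime limit, ...)."""


def run_job(points, fname, f, save_every=2, crash_after=None):
    """One job = one learner: resume from the checkpoint, save periodically."""
    learner = ToyLearner(points)  # a fresh process knows nothing...
    learner.load(fname)           # ...except what was saved to disk
    n = 0
    while not learner.done():
        x = learner.next_point()
        learner.data[x] = f(x)
        n += 1
        if n % save_every == 0:
            learner.save(fname)
        if crash_after is not None and n == crash_after:
            raise JobCrashed(fname)  # unsaved results since the last save are lost
    learner.save(fname)


def manage(jobs, f, crash_plan=None):
    """Toy job manager: submit one job per learner, resubmit any that crash.

    jobs maps fname -> points; crash_plan maps fname -> number of points after
    which that job crashes on its first attempt (for demonstration only).
    """
    crash_plan = crash_plan or {}
    attempts = {fname: 0 for fname in jobs}
    queue = list(jobs)
    while queue:
        fname = queue.pop(0)
        attempts[fname] += 1
        crash_after = crash_plan.get(fname) if attempts[fname] == 1 else None
        try:
            run_job(jobs[fname], fname, f, crash_after=crash_after)
        except JobCrashed:
            queue.append(fname)  # reschedule; it continues where it left off
    return attempts


calls = []  # every evaluation of f, to show how little work is repeated


def f(x):
    calls.append(x)
    return x ** 2


with tempfile.TemporaryDirectory() as tmp:
    a, b = os.path.join(tmp, "a.pickle"), os.path.join(tmp, "b.pickle")
    jobs = {a: [0, 1, 2, 3, 4], b: [5, 6, 7, 8, 9]}
    attempts = manage(jobs, f, crash_plan={a: 3})  # job "a" dies after 3 points
    results = {}
    for fname, points in jobs.items():
        learner = ToyLearner(points)
        learner.load(fname)
        results[fname] = learner.data
    attempts_by_name = {os.path.basename(k): v for k, v in attempts.items()}
```

In the demo, job `a` dies after evaluating three points while its checkpoint holds only the first two, so the resubmitted job recomputes a single point and then finishes; job `b` is unaffected. The actual package submits each learner as a separate batch job on the cluster; this sketch only mimics that control flow in one process.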
Added by bruceshi on 2021-01-13
More information
CONDA
conda install -c anaconda adaptive-scheduler