DSW: Integrated development platform for AI R&D

Posted May 29, 2020. 4 min read.

Launch event portal: [Product Details](https://www.aliyun.com/activity/bigdata/painewproduct?spm=a2c6h.12873622.0.0.6a5346b9E1pZ6W)

Cloud-native technology, focusing on user experience and improving R&D efficiency

Setting up the environment is an important part of algorithm development. Beyond hardware selection, installing and configuring the software environment, and later upgrading it, often takes a lot of time. Built on cloud-native technologies such as Alibaba Cloud ECS, Docker, and Kubernetes, DSW can set up an environment for users in 2-3 minutes. Users can choose from all the resource specifications that Alibaba Cloud ECS provides, including CPU and heterogeneous GPU computing, according to the needs of their algorithms and cost considerations. Both prepaid and postpaid billing are supported. DSW also ships with more than 10 typical software environment configurations suited to different AI scenarios, including different versions of mainstream training frameworks such as TensorFlow and PyTorch. As a highly open development environment, DSW grants users sudo permissions and supports installing any third-party library.

To serve algorithm developers of different levels and with different development habits, DSW combines visualization, interactive programming, and command-line input into three programming entry points: WebIDE is suited to projects with relatively high engineering requirements; JupyterLab is suited to rapid proof-of-concept (POC) experiments; and the Terminal can be used to quickly execute shell commands, run programs, and make simple edits.

DSW has also developed and pre-installed a variety of JupyterLab and WebIDE plug-ins, such as the visualization tool TensorBoard, which is popular among deep learning developers. There are several ways to use TensorBoard: users can open it through the Launcher or Commands in DSW, or open it directly in a notebook with the %tensorboard magic command. It supports not only local files but also training logs stored in OSS and ODPS. Because algorithm developers mostly use Python, a Python plug-in is installed in DSW's WebIDE, so programs can be debugged online directly in the browser and stepped through. Users can also install any other plug-ins they need.

DSW supports reading from and writing to a variety of data sources, including NAS, OSS, cloud disks, and MaxCompute. In particular, the built-in dswmagic magic command lets users read and write data in MaxCompute tables with SQL statements in ipynb files. The preset SQL editor supports syntax highlighting, smart prompts, auto-completion, and other functions, and can also run SQL scripts with variable substitution. Query results are automatically displayed in a user-friendly graphical form.

To save resource costs, users can use the no-charge-on-shutdown feature to shut down an instance and save its environment when it is not in use, then restore it quickly with one click. In addition, DSW supports user-defined images and can create instances from previously saved or customized environments.

DSW has supported multiple Tianchi competitions with more than 100 teams on the public cloud, and subsequently competitions within Alibaba Group. This experience has shown that DSW is suitable not only for individual and team R&D, but also for large-scale algorithm competitions and for education and training.

Security and stability are among the issues users care about most.
The computing, storage, and network resources used by DSW are all purchased under the user's own account and deployed inside the user's own VPC, which makes it easy to access the user's other data. Users are completely isolated from one another, which provides strong security. Because DSW is built on Alibaba Cloud ECS and Container Service, stability is guaranteed.

Integrating PAI's capability components to accelerate business adoption

As a member of the PAI family, DSW not only provides stand-alone development and training, but also has some of PAI's basic capabilities built in. For example, users can use the PAI vision algorithm package EasyVision directly in an instance to train, evaluate, and run predictions for image classification; use AutoML to tune algorithm hyperparameters automatically; and benefit, without having to be aware of it, from the operator optimizations that TAO, PAI's compilation optimization component, applies during training. Finally, DSW provides a CommonIO component that lets algorithms read MaxCompute table data directly. It supports standard interfaces such as TableRecordDataSet, TableReader, and TableWriter, which makes it easy to submit training programs directly to PAI's distributed training cluster.

It is foreseeable that more PAI algorithm packages will be built into the DSW base image in the near future. DSW will also rely on the PAI SDK to provide users with one-stop services such as pipeline construction, scheduling, and management around the key stages of AI R&D and production, including data reading, data processing, model training, model management, and online services.
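
The article mentions that DSW's SQL editor can run SQL scripts with variable substitution but does not describe the syntax. As a rough illustration of the general idea only, the sketch below fills `${name}` placeholders in a SQL string using Python's standard `string.Template`. The function name `render_sql`, the placeholder syntax, and the table and column names are all assumptions made for this example and are not taken from DSW or dswmagic.

```python
from string import Template


def render_sql(script: str, variables: dict) -> str:
    """Fill ${name} placeholders in a SQL script.

    Hypothetical helper for illustration; DSW's own substitution
    syntax may differ. Raises KeyError if a placeholder has no value.
    """
    return Template(script).substitute(variables)


# Hypothetical table and column names, used only for this example.
script = "SELECT * FROM ${table} WHERE ds = '${ds}' LIMIT ${limit};"
sql = render_sql(script, {"table": "user_events", "ds": "20200529", "limit": 10})
print(sql)
```

Plain text substitution like this does not escape values, so it is only appropriate for trusted inputs such as a date partition chosen by the notebook author.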