Primal-dual subgradient methods for convex problems. MIT Computer Science & Artificial Intelligence Lab, http://lis.csail.mit.edu/new/research.php.

Machine learning has provided high-impact, data-driven technology that has been used in spam filters, recommender systems, object recognition, speech recognition, internet advertisement, demand prediction, market analysis, fault detection, and more.

Understand optimization techniques and their fundamental role in machine learning. Despite the recent great practical success of deep learning, the lack of theoretical understanding of nonconvex optimization in deep learning has been a major challenge to achieving reliability and verifiability of systems with deep learning modules, and to the rigorous study of many methods proposed in deep learning. Representation learning, such as deep learning, often requires us to deal with nonconvex optimization problems. The goal is to advance the theory and methods in machine learning where learning requires nonconvex optimization.

Goal: minimize some loss function.
- For example, if we have some data (x, y), we may want to maximize P(y | x; θ).
- Equivalently, we can minimize -log P(y | x; θ).
- We can also minimize other sorts of loss functions; working with the log can help for numerical reasons.
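
To make the bullets above concrete, here is a minimal sketch, assuming a logistic-regression model on synthetic data (both chosen here purely for illustration), of minimizing the negative log-likelihood by plain gradient descent; np.logaddexp supplies the numerically stable log term:

```python
import numpy as np

# Minimal sketch (assumptions: a logistic-regression model and synthetic data,
# both introduced here for illustration) of minimizing the negative
# log-likelihood -log P(y | x; theta) by plain gradient descent.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                      # features
theta_true = np.array([1.5, -2.0, 0.5])
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float)

def nll(theta, X, y):
    # -log P(y | x; theta); np.logaddexp(0, z) = log(1 + e^z) is the numerically
    # stable way to evaluate the log term.
    z = X @ theta
    return float(np.sum(np.logaddexp(0.0, z) - y * z))

def grad_nll(theta, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))         # P(y = 1 | x; theta)
    return X.T @ (p - y)

theta = np.zeros(3)
lr = 0.01
for _ in range(500):                               # plain gradient descent
    theta -= lr * grad_nll(theta, X, y)

print("estimated theta:", theta, "final NLL:", nll(theta, X, y))
```

Full-batch gradient descent is used here only to keep the sketch short; any other first-order method would work on the same objective.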

If you would like to contact us about our work, please scroll down to the people section and click on one of the group leads' people pages, where you can reach out to them directly. Practice real-world …

Overview of Optimization for Machine Learning. Often in machine learning we are interested in learning the parameters of a model.
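
As a minimal illustration of "learning the parameters of a model", the sketch below (assuming a linear model and synthetic data, introduced only for this example) poses parameter learning as minimizing the average squared error and solves it in closed form:

```python
import numpy as np

# Minimal sketch (assumption: a linear model on synthetic data) of "learning the
# parameters of a model" posed as an optimization problem: choose theta to
# minimize the squared error over the data.

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
theta_true = np.array([2.0, -1.0, 0.3])
y = X @ theta_true + 0.1 * rng.normal(size=100)

# Closed-form least-squares solution: argmin_theta ||X theta - y||^2.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("learned parameters:", theta_hat)
```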

Machine learning, however, is not simply a consumer of optimization technology but a rapidly evolving field that is itself generating new optimization ideas. Mathematical programming puts a premium on accuracy, speed, and robustness.

18.657: Mathematics of Machine Learning. Lecturer: Philippe Rigollet. Lecture 11. Scribe: Kevin Li. Oct. 14, 2015.

The priorities seen from the machine learning and optimization perspectives can be quite different.

There is much more to this topic than will be covered in this class, so you may be interested in the following texts:
- Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, Bernhard Schölkopf and Alexander J. Smola.
- Introduction to Machine Learning, Ethem Alpaydin.
- Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams, the MIT Press, 2006, ISBN …

CONVEX OPTIMIZATION FOR MACHINE LEARNING

Optimization for Machine Learning. Editors: Suvrit Sra (suvrit@gmail.com), Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany; Sebastian Nowozin (nowozin@gmail.com), Microsoft Research, Cambridge, CB3 0FB, United Kingdom; Stephen J. Wright (swright@cs.uwisc.edu), University of Wisconsin, Madison, WI 53706. This is a draft containing only srachapter.tex and an abbreviated front …

As such it has been a fertile ground for new statistical and algorithmic developments. Since generalization is the bottom line in machine learning and training is normally done off-line, accuracy and small speed improvements are of little concern in machine learning.
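
One way this plays out in practice is that training loops monitor held-out loss rather than pushing the training objective to high accuracy. The sketch below (assuming synthetic data and plain gradient descent on a least-squares objective, chosen only for illustration) stops as soon as the held-out loss stalls:

```python
import numpy as np

# Minimal sketch (assumptions: synthetic data, plain gradient descent on a
# least-squares objective) of valuing generalization over optimization accuracy:
# training loss may still be decreasing, but we stop once held-out loss stalls.

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.5 * rng.normal(size=300)     # noisy labels

X_tr, y_tr = X[:200], y[:200]                   # training split
X_va, y_va = X[200:], y[200:]                   # held-out split

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

w = np.zeros(10)
lr, patience = 1e-2, 20
best_va, best_w, bad_steps = np.inf, w.copy(), 0
for step in range(5000):
    w -= lr * 2.0 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    va = mse(w, X_va, y_va)
    if va < best_va - 1e-8:
        best_va, best_w, bad_steps = va, w.copy(), 0
    else:
        bad_steps += 1
        if bad_steps >= patience:               # held-out loss stopped improving
            break

print("train MSE:", mse(best_w, X_tr, y_tr), "validation MSE:", best_va)
```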

Stochastic Optimization for Machine Learning. Shai Shalev-Shwartz, School of CS and Engineering, The Hebrew University of Jerusalem. Nesterov's 60th Birthday, February 2016.

In this lecture, we will cover the basics of convex optimization as it applies to machine learning.

Demystify machine learning through computational engineering principles and applications in this two-course program from MIT ... Assess and respond to cost-accuracy tradeoffs in simulation and optimization, and make decisions about how to deploy computational resources.

As the amount of available data is increasing, we are facing a rise in the importance of machine learning itself, and also in the demand for machine learning methods that are more adaptable to the data at hand, such as representation learning. Whereas traditional machine learning makes use of features designed by human users or experts as a type of prior, representation learning tries to learn features from the data as well.

Broadly speaking, Machine Learning refers to the automated identification of patterns in data.
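
To illustrate why representation learning leads to nonconvex problems, the sketch below (assuming synthetic data and a one-hidden-layer network, both introduced here only for illustration) trains a small network whose hidden layer plays the role of learned features; the squared-error loss is nonconvex in the pair of weight matrices:

```python
import numpy as np

# Minimal sketch (assumptions: synthetic data and a one-hidden-layer network,
# both introduced here for illustration) of the kind of nonconvex optimization
# problem representation learning leads to: the hidden layer acts as learned
# features, and the squared-error loss is nonconvex in (W1, W2).

rng = np.random.default_rng(1)
X = rng.normal(size=(256, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2        # nonlinear target

W1 = rng.normal(scale=0.5, size=(2, 16))        # first layer: learned representation
W2 = rng.normal(scale=0.5, size=(16, 1))        # second layer: linear readout

lr = 0.05
for step in range(2000):
    H = np.tanh(X @ W1)                         # learned features
    err = (H @ W2).ravel() - y
    # Backpropagation: gradients of the (halved) mean squared error.
    gW2 = H.T @ err[:, None] / len(X)
    gW1 = X.T @ ((err[:, None] @ W2.T) * (1.0 - H ** 2)) / len(X)
    W2 -= lr * gW2
    W1 -= lr * gW1

final_mse = float(np.mean(((np.tanh(X @ W1) @ W2).ravel() - y) ** 2))
print("final training MSE:", final_mse)
```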

Happy Birthday Yuri. The single paper that made the largest impact on my PhD thesis. Computer Science & Artificial Intelligence Laboratory.

By analyzing the properties of nonconvex optimization in machine learning, the aim is to propose new machine learning models and optimization methods.