Discover the latest results, algorithms & trends in the AI world
November 20th, 2020
24h NON-STOP
Data Science UA will gather participants from all over the world at the 9th Data Science UA Conference, which will be held online on November 20th, 2020.
The conference will run for 24 hours non-stop across three tracks: Technical track, Workshops track, and Business track.
Speakers from top companies including Amazon, Facebook AI, Airbus, Nvidia, Google, and IBM are going to share their experience and discuss how AI is transforming the world today and what it will look like tomorrow.
There will be 3000+ participants and 70 speakers from the world’s best companies.
At the conference, you will learn algorithms step by step in practical workshops and gain insights you can apply in your own work.
Let’s build the Data Science community together!
Presentation language: English
November 20th, 2020
Start: 8am GMT (10:00 Kyiv time)
Julien Simon
Principal Developer Advocate, AI & Machine Learning at Amazon
End-to-end natural language processing with Amazon SageMaker
Technical track
Bio
As a Global AI & Machine Learning Evangelist, Julien focuses on helping developers and enterprises bring their ideas to life. He frequently speaks at conferences, and also blogs on the AWS Blog and on Medium.
Abstract
In this code-level talk, we will start from a large natural language dataset. Using Python and Jupyter, we will first run a batch job on SageMaker Processing to clean, stem, and tokenize the dataset. Then, we'll use fully managed infrastructure to train topic models with the Latent Dirichlet Allocation and Neural Topic Modeling algorithms, two built-in algorithms in SageMaker. Finally, we'll deploy both models, run predictions, and compare the results.
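As a rough local illustration of the cleaning, stemming, and tokenizing step, the sketch below turns raw documents into bag-of-words count vectors, the kind of document-term input that topic models such as LDA work from. It is not the talk's SageMaker Processing code: the stopword list and the crude suffix-stripping stemmer are assumptions made to keep it dependency-free (a real pipeline would more likely use a proper stemmer such as Porter's).

```python
import re
from collections import Counter
from typing import List, Tuple

# Tiny illustrative stopword list (an assumption; real pipelines use fuller lists).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}

# Deliberately crude suffix stripping, NOT the Porter algorithm; it keeps the
# sketch dependency-free at the cost of mistakes such as "news" -> "new".
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Only strip if at least 3 characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_and_tokenize(text: str) -> List[str]:
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # drop digits and punctuation
    return [crude_stem(tok) for tok in text.split() if tok not in STOPWORDS]


def bag_of_words(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return a sorted vocabulary and one term-count vector per document."""
    tokenized = [clean_and_tokenize(doc) for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for doc in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(doc).items():
            row[index[term]] = count
        vectors.append(row)
    return vocab, vectors


vocab, vectors = bag_of_words(
    ["The models are training on topics.", "Topic modeling of news articles"]
)
print(vocab)
print(vectors)
```

On these two sample documents the crude stemmer reduces "news" to "new" and "articles" to "articl", exactly the kind of error a proper stemmer is designed to avoid.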
Shagun Sodhani
Research Engineer at Facebook AI
A tutorial on Policy Gradients
Technical track
Bio
Hi! I am Shagun, a Research Engineer with Facebook AI Research. Before that, I was an MSc student at Mila (Quebec Artificial Intelligence Institute) with Prof. Yoshua Bengio and Prof. Jian Tang. My research focuses on lifelong reinforcement learning: training AI systems that can interact with and learn from the physical world (reinforcement learning) and consistently improve as they do so without forgetting previous knowledge (lifelong learning). My stack primarily comprises Python (and related ML/DS/visualization toolkits). I love to play with new technology and look forward to meeting new people at Data Science UA. Website: https://shagunsodhani.com Previous Talks: https://shagunsodhani.com/talks/
Abstract
Policy gradient algorithms are a popular and widely applicable family of reinforcement learning algorithms. Several state-of-the-art RL algorithms (PPO, SAC, IMPALA, etc.) are variants of policy gradient algorithms. The main idea behind them is to learn a parametric policy by optimizing the policy directly (instead of optimizing a value function, as value-function-based methods do). This makes them a natural fit for tasks where the learning agent chooses an action from a continuous range (e.g., controlling the steering angle of a car). However, they are also useful for tasks with a discrete action space (like choosing between the accelerator, brake, and clutch). In this talk, we will start with the vanilla policy gradient algorithm. While extremely easy to implement, the basic algorithm suffers from high variance in practice (as we will see during the talk). Then we will talk about some "cheap" yet effective methods for reducing this variance. From there, we will discuss one of the more commonly used algorithms in practice, Soft Actor-Critic (SAC), and walk through a simple SAC implementation. We will conclude with a discussion of IMPALA, a distributed policy gradient algorithm that can be used to scale RL agents' training to real-life tasks while using less data.
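To make the vanilla algorithm and the baseline trick concrete, here is a minimal REINFORCE sketch on a one-step multi-armed bandit rather than a full RL environment. The arm means, learning rate, and unit-variance reward noise are illustrative assumptions, and SAC and IMPALA are not shown.

```python
import math
import random


def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=2000, lr=0.1, use_baseline=True, seed=0):
    """Vanilla policy gradient (REINFORCE) on a one-step bandit.

    The policy is a softmax over per-arm preferences theta. Each step samples an
    action, observes a noisy reward, and moves theta along
    advantage * grad log pi(action), where grad log pi = one_hot(action) - probs.
    With use_baseline=True the advantage is the reward minus a running mean of
    past rewards, a cheap variance-reduction trick.
    """
    rng = random.Random(seed)
    n = len(arm_means)
    theta = [0.0] * n
    baseline = 0.0
    for t in range(1, steps + 1):
        probs = softmax(theta)
        action = rng.choices(range(n), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 1.0)  # assumed unit-variance noise
        advantage = reward - baseline if use_baseline else reward
        for k in range(n):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log_pi
        if use_baseline:
            baseline += (reward - baseline) / t  # incremental mean of rewards
    return softmax(theta)


probs = reinforce_bandit([0.0, 1.0, 3.0])
print(probs)  # most probability mass should end up on the best arm (index 2)
```

Running it with `use_baseline=False` uses the raw reward as the advantage, which is the high-variance variant the abstract refers to.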
Rich Dutton
Head of Machine Learning for Corporate Engineering at Google
How Google Uses AI and ML in the Enterprise
Business track
Bio
Rich Dutton is the Head of Machine Learning for Corporate Engineering at Google, where he leads a team of 15 engineers and data scientists across NYC and Austin. Prior to this role, Rich was a tech lead in Bigtable at Google, following a 15-year career in data and analytics across tech and finance in the US (New York and Seattle), Europe, and Asia. When not working, Rich practices Muay Thai and spends time with his family in Williamsburg, Brooklyn, including his Mini Australian Shepherd, Radia, and his self-driving car, Trinity.
Abstract
This session will outline how Google’s Corporate Engineering team is using AI and machine learning to spur innovation within Google. Rich will also describe his team's work (its structure, example use cases, etc.) and the research driving both that work and the democratization of AI (work on ML fairness, privacy, interpretability, and AutoML technologies).
Dr. Sergei Bobrovskyi
Data Scientist at Airbus
Deep Learning Anomaly Detection
Technical track
Bio
Dr. Sergei Bobrovskyi is a Data Scientist within the AI Platforms team at Airbus. His work focuses on applications of AI for anomaly detection in time series, spanning various use-cases across Airbus.
Abstract
Many modern products, as well as manufacturing systems, produce large amounts of sensor signals that humans cannot analyze, or even capture, in their entirety. In this talk, we focus on automatic anomaly detection for sensor data. We assess the industrial viability of various semi-supervised, deep-learning-based anomaly detection systems for automatically discovering point, contextual, and collective anomalies in large datasets with little prior knowledge.
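A minimal sketch of the semi-supervised setup, assuming a deliberately simple statistical detector in place of the deep models the talk assesses: the profile is fitted only on data assumed to be anomaly-free, and new readings are flagged by z-score. It only catches point anomalies; contextual and collective anomalies need models over windows of the signal, and a deep approach would typically replace the mean/std profile with something like an autoencoder's reconstruction error.

```python
import statistics


def fit_normal_profile(train_signal):
    """Fit mean and standard deviation on data assumed to be anomaly-free."""
    return statistics.mean(train_signal), statistics.pstdev(train_signal)


def detect_point_anomalies(signal, profile, threshold=4.0):
    """Return indices whose z-score against the normal profile exceeds threshold."""
    mu, sigma = profile
    return [i for i, x in enumerate(signal) if abs(x - mu) > threshold * sigma]


normal_readings = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 10.0]
profile = fit_normal_profile(normal_readings)
print(detect_point_anomalies([10.1, 9.9, 25.0, 10.0, -3.0], profile))  # [2, 4]
```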
Oleksandr Maksymets
Research Engineer at Facebook AI
Embodied AI: Agents that can See, Talk, Act and Reason
Technical track
Bio
Oleksandr Maksymets is a research engineer at Facebook AI Research (FAIR) working on embodied agent navigation using deep learning. He is a co-author and maintainer of Habitat, FAIR's open-source embodied AI framework, which brings community benchmarks to the field and supports simulation-to-reality transfer. Oleksandr was one of the organizers of the Embodied AI challenges and workshops at CVPR 2019 and 2020.
Abstract
Imagine walking up to a home robot and asking “Hey robot – can you go check if my laptop is on my desk? And if so, bring it to me.” AI Habitat enables training such embodied AI agents (virtual robots and egocentric assistants) in a highly photorealistic and efficient 3D simulator before transferring the learned skills to reality. We will talk about the state of the art in training intelligent agents with machine learning and how to scale training to 30 years of experience walking through houses.
Parsa Hosseini, Ph.D.
Senior Data Scientist at Tesla
Big Data Analysis with Deep Learning for Epileptic Seizure Prediction
Business track
Bio
Parsa Hosseini is a senior data scientist at Tesla focusing on the machine learning initiative. He works on developing machine learning and deep learning algorithms for innovative applications. He has been an adjunct lecturer and faculty member at several universities since 2009 and is currently with Santa Clara University. His research focuses on machine learning, deep learning, and signal and image processing. He received his Ph.D. in electrical and computer engineering, with research in computer science, from Rutgers University, New Brunswick, NJ, USA, in 2018. He has served on the scientific committees and review boards of several national and international conferences and journals. He is a senior member of IEEE.
Abstract
Developing a Brain-Computer Interface (BCI) for seizure prediction can help epileptic patients have a better quality of life. However, developing such a system as real-life support for patients poses many difficulties and challenges. Because EEG signals are nonstationary, normal and seizure patterns vary across patients, so finding a single set of manually extracted features for the prediction task is not practical. Moreover, recording brain activity with implanted electrodes produces massive amounts of data, which call for safe storage and substantial computational resources for real-time processing. To address these challenges, a cloud-based BCI system for analyzing this big EEG data is presented. First, a dimensionality-reduction technique is developed to increase classification accuracy while decreasing communication bandwidth and computation time. Second, following a deep-learning approach, a stacked autoencoder is trained in two steps for unsupervised feature extraction and classification. Third, a cloud-computing solution is proposed for real-time analysis of big EEG data. Results on a benchmark clinical dataset illustrate the superiority of the proposed patient-specific BCI as an alternative method and its expected usefulness in real-life support of epilepsy patients.
Sandeep Jain
Leader - Data Science at IBM
Impediment to Predictive Maintenance
Business track
Bio
Dr. Sandeep Jain is General Manager, Advanced Analytics and Optimization at IBM. His experience spans supply chain planning and optimization and predictive analytics for aerospace, oil & gas, heavy equipment, energy & utilities, and CPG. He holds a Ph.D. in OR/Management Science from the Indian Institute of Science. Sandeep has published papers in various journals and conferences.
Abstract
Predictive maintenance is generally expected to reduce equipment downtime and improve productivity and utilization, resulting in lower costs and higher revenue. However, data scientists and maintenance and operations teams face challenges due to the quality and volume of data from IoT devices and its storage. Sometimes failure data is so sparse that it is difficult even to train models. There are methods that can be used to minimize the risk that a predictive maintenance project fails in implementation.
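The abstract does not name the methods it has in mind. One common option for sparse failure data, shown here as an assumption, is to weight classes by inverse frequency so that the few failure records carry as much total weight in the training loss as the many healthy ones. The formula below matches the one scikit-learn uses for `class_weight="balanced"`; the labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class).

    Each class then contributes the same total weight to the loss, so a handful
    of failure records is not drowned out by the many healthy ones.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


labels = ["healthy"] * 98 + ["failure"] * 2  # hypothetical, heavily imbalanced
weights = balanced_class_weights(labels)
print(weights)  # {'healthy': 0.5102..., 'failure': 25.0}
```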
Mohamed Rachidi
Lead Machine Learning Engineer at Salesforce
Workshop track, Apache Spark - Optimization
Workshop track
Bio
Mohamed works as a lead data engineer at Salesforce on the Einstein security team, building state-of-the-art anomaly detection pipelines. Prior to this role, he was the lead machine learning engineer at Pixlee, where he focused on deep learning pipelines for a fashion visual tagging engine. An entrepreneur turned machine learning engineer, Mohamed was born and raised in Morocco and spent about 7 years in France earning his master's in artificial intelligence before coming to the US to co-found his first startup.
Abstract
Maximize cluster utilization by correctly shaping partitions and reducing data skew. We will go through a hands-on exercise on shaping partitions and on optimization strategies, including join optimization, multi-threading, and skew reduction.
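The workshop's exact exercises are not described, so the following is only a plain-Python simulation of one widely used skew-reduction technique: key salting with two-stage aggregation. The crc32-based partitioner is a stand-in for a real Spark partitioner, and the salt count and dataset are hypothetical.

```python
import random
import zlib
from collections import Counter, defaultdict


def partition_sizes(keys, num_partitions):
    """Records per partition under a stable hash partitioner (crc32 stand-in)."""
    counts = Counter(zlib.crc32(str(k).encode()) % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(records, num_salts, seed=0):
    """Replace each key with (key, salt) so one hot key spreads over many groups."""
    rng = random.Random(seed)
    return [((key, rng.randrange(num_salts)), value) for key, value in records]


def two_stage_sum(salted_records):
    """Stage 1 sums per (key, salt); stage 2 drops the salt and sums the partials."""
    partial = defaultdict(int)
    for salted_key, value in salted_records:
        partial[salted_key] += value
    totals = defaultdict(int)
    for (key, _salt), value in partial.items():
        totals[key] += value
    return dict(totals)


# A skewed dataset: one "hot" key holds 90% of the rows.
records = [("hot", 1)] * 900 + [("k%d" % i, 1) for i in range(100)]
salted = salt_keys(records, num_salts=16)
print(max(partition_sizes([k for k, _ in records], 8)))  # largest partition before salting
print(max(partition_sizes([k for k, _ in salted], 8)))   # largest partition after salting
print(two_stage_sum(salted)["hot"])                      # totals are unchanged
```

The price of salting is an extra aggregation stage; it pays off only when a few keys dominate the data.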
Meher Anand Kasam
Software developer at NASA Frontier Development Labs
Workshop track, Deep Learning on Mobile
Workshop track
Bio
Meher Anand Kasam, a co-author of O'Reilly's Practical Deep Learning book, is a seasoned software developer with apps used by tens of millions of users every day. Currently an iOS developer at Square, and having previously worked at Microsoft and Amazon, he has shipped features for a range of apps, from Square's Point of Sale to the Bing iPhone app. At Microsoft, he was the mobile development lead for the Seeing AI app, which has received widespread recognition and awards from Mobile World Congress, CES, the FCC, and the American Council of the Blind, to name a few. A hacker at heart with a flair for fast prototyping, he has won several hackathons and turned the projects into features shipped in widely used products. He also serves as a judge of international competitions, including the Global Mobile Awards and the Edison Awards.
Abstract
Over the last few years, convolutional neural networks (CNNs) have risen in popularity, especially in computer vision. Many mobile applications running on smartphones and wearable devices could benefit from the new opportunities enabled by deep learning. However, CNNs are by nature computationally and memory intensive, making them challenging to deploy on a mobile device. This workshop explains how to practically bring the power of convolutional neural networks and deep learning to memory- and power-constrained devices like smartphones. You will learn various strategies to circumvent obstacles and build mobile-friendly shallow CNN architectures that significantly reduce the memory footprint, making the models easier to store on a smartphone. The workshop also dives into how to use a family of model compression techniques to prune the network size for live image processing, enabling you to build a CNN version optimized for inference on mobile devices. Along the way, you will learn practical strategies for preprocessing your data in a way that makes the models more efficient in the real world. Following a step-by-step example of building an iOS deep learning app, we will discuss tips and tricks, speed and accuracy trade-offs, and benchmarks on different hardware to demonstrate how to get started developing your own deep learning application suitable for deployment on storage- and power-constrained mobile devices. You can also apply similar techniques to make deep neural nets more efficient when deploying in a regular cloud-based production environment, thus reducing the number of GPUs required and optimizing cost.
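As a hedged illustration of two common compression primitives (not necessarily the ones the workshop uses), the sketch below applies magnitude pruning and symmetric 8-bit linear quantization to a flat list of weights. Real toolchains work per layer or per channel on tensors; storing int8 values instead of float32 cuts weight storage by roughly 4x.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric linear quantization to [-127, 127] with one scale per tensor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    return [v * scale for v in q]


weights = [0.9, -0.05, 0.3, -1.27, 0.01, 0.6]
pruned = magnitude_prune(weights, sparsity=0.5)  # [0.9, 0.0, 0.0, -1.27, 0.0, 0.6]
q, scale = quantize_int8(pruned)                 # [90, 0, 0, -127, 0, 60]
restored = dequantize(q, scale)                  # each value within scale / 2 of `pruned`
```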
Anirudh Koul
ML Lead at NASA Frontier Development Labs
Workshop track, Deep Learning on Mobile
Workshop track
Bio
Anirudh Koul is a noted AI expert, NASA FDL ML Lead, UN/TEDx speaker, co-author of O'Reilly's Practical Deep Learning book, and a former scientist at Microsoft AI & Research, where he founded Seeing AI, considered the most used technology among the blind community after the iPhone. With features shipped to a billion users, he brings over a decade of production-oriented applied research experience on petabyte-scale datasets. He also coaches a team for Roborace, an autonomous driving championship at 200 mph. His work in the AI for Good field, which IEEE has called 'life-changing', has received awards from CES, the FCC, MIT, Cannes Lions, and the American Council of the Blind, has been showcased at events by the UN, World Economic Forum, White House, House of Lords, Netflix, and National Geographic, and has been lauded by world leaders including Justin Trudeau and Theresa May.
Abstract
Over the last few years, convolutional neural networks (CNNs) have risen in popularity, especially in computer vision. Many mobile applications running on smartphones and wearable devices could benefit from the new opportunities enabled by deep learning. However, CNNs are by nature computationally and memory intensive, making them challenging to deploy on a mobile device. This workshop explains how to practically bring the power of convolutional neural networks and deep learning to memory- and power-constrained devices like smartphones. You will learn various strategies to circumvent obstacles and build mobile-friendly shallow CNN architectures that significantly reduce the memory footprint, making the models easier to store on a smartphone. The workshop also dives into how to use a family of model compression techniques to prune the network size for live image processing, enabling you to build a CNN version optimized for inference on mobile devices. Along the way, you will learn practical strategies for preprocessing your data in a way that makes the models more efficient in the real world. Following a step-by-step example of building an iOS deep learning app, we will discuss tips and tricks, speed and accuracy trade-offs, and benchmarks on different hardware to demonstrate how to get started developing your own deep learning application suitable for deployment on storage- and power-constrained mobile devices. You can also apply similar techniques to make deep neural nets more efficient when deploying in a regular cloud-based production environment, thus reducing the number of GPUs required and optimizing cost.
Meltem Ballan
Professional Data Science Fellow at General Motors
Front Seat of Autonomous Vehicles: Ethics, Acceptable Use and Computer Vision
Business track
Bio
Accomplished technology executive with a unique combination of analytical and leadership expertise developed over 20 years both in industry and academia. A pioneering woman data scientist who has nurtured and mentored hundreds of budding analysts and scientists as a recognized leader and as an advisory board member. Co-founded a technology startup providing a Big-data analytics and ML platform.
Abstract
Day by day we are getting closer to the autonomous vehicle era, and we still have a lot of work to do: how and where to use the data, how much control we should give to AI, which use cases will serve humanity, and how we approach biologically motivated computer vision. What information can be translated from the human brain and vision to the digital world?
Galina Voloshyna
Data & Analytics IT Director at Coca-Cola
Working with POS data in store & the insights to be gained in promotional optimization
Business track
Bio
Galina is a passionate IT leader with 16 years of experience in creating Business Intelligence and Data Analytics solutions to grow FMCG brands at P&G and The Coca-Cola Company. She is a hands-on IT professional who has been managing the heavy lifting of digital transformation across a swathe of markets from Europe to China, delivering tangible business results.
Abstract
A significant portion of the budget of many consumer goods companies and retailers goes to promoting products to consumers. Some estimates put this number as high as 3% of overall revenue. Therefore, understanding which promotions have a positive ROI and what their exact effect is, is of primary concern to many companies. The situation gets even more complicated when we take eCom retailers into account, where the decision to start or stop a price promotion needs to be made in a split second. In this talk we will cover specific questions and sets of algorithms for unravelling promotional effectiveness, specifically:
- Deriving calendar or price promotions for organizations that do not track such information
- Understanding volume flows from one product to another using only point-of-sale information
- Predicting the performance of each promotion and thereby optimizing the overall company investment
We will discuss how a combination of trusted algorithms such as clustering, GBDT, and time series can help bring tangible benefits to the old problem of consumer promotions.
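The talk does not spell out how promotions are derived when an organization does not track them. One simple heuristic, sketched here under assumed parameters, treats the maximum price over a trailing window of weeks as the "regular" price and flags weeks priced at least 10% below it. The window length, discount threshold, and price series are all hypothetical.

```python
def flag_price_promotions(prices, window=8, min_discount=0.10):
    """Flag weeks whose price is at least `min_discount` below a reference price.

    The reference ("regular") price is assumed to be the maximum price over the
    trailing `window` weeks, including the current week.
    """
    flags = []
    for t, price in enumerate(prices):
        regular = max(prices[max(0, t - window + 1): t + 1])
        flags.append(price <= regular * (1 - min_discount))
    return flags


weekly_prices = [2.0, 2.0, 2.0, 1.5, 1.5, 2.0, 1.9, 2.0]
print(flag_price_promotions(weekly_prices))
# [False, False, False, True, True, False, False, False]
```

The 1.9 week is not flagged because a 5% dip falls under the assumed 10% threshold; real data would also need handling for permanent price changes, which this heuristic would misread as promotions.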
Dipanjan (DJ) Sarkar
Data Science Lead, Google Developer Expert at Google
Deep Transfer Learning for Natural Language Processing
Technical track
Bio
Dipanjan (DJ) Sarkar is a Data Science Lead at Applied Materials, leading advanced analytics efforts around computer vision, natural language processing, and deep learning. He is also a Google Developer Expert in Machine Learning. He has consulted and worked with several startups, Fortune 500 companies like Intel, and open-source organizations like Red Hat/IBM.
Abstract
Handling challenging real-world problems in Natural Language Processing (NLP) involves tackling class imbalance, problem complexity, and a lack of enough labeled data for training. Thanks to recent advances in deep transfer learning for NLP, we have made rapid strides not only in tackling these problems but also in leveraging these models for diverse downstream NLP tasks. This session journeys through recent advances in deep transfer learning for NLP by looking at various state-of-the-art models and methodologies, including:
- Pre-trained embeddings for deep learning models (FastText with CNNs/Bi-directional LSTMs + Attention)
- Universal embeddings (Sentence Encoders, NNLMs)
- Transformers
We will also look at the power of some of these models, especially transformers, to solve diverse problems like summarization, entity recognition, question answering, sentiment analysis, and classification, with hands-on examples using Python, TensorFlow, and the popular transformers library from HuggingFace.
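Since transformers sit at the center of this session, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, the core operation of the transformer architecture. It works on plain lists of row vectors for readability; the TensorFlow and HuggingFace code used in the talk is not reproduced.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Two identical keys receive equal attention, so the output averages their values.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [2.0, 4.0]]))
```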
Yehor Morylov
Computer Vision Tech Lead at EverguardAI
Computer Vision for Real-time Anomaly Detection in Steel Manufacturing
Technical track
Bio
Yehor Morylov is a Computer Vision Tech Lead at Everguard, a company that improves workers' safety and prevents accidents before they happen. His previous experience includes more than 5 years in computer vision and deep learning. He holds a Master's in Computer Science and Artificial Intelligence from NTUU "KPI" IASA.
Abstract
Steel manufacturers suffer significant losses due to accidents on the production line. In this presentation, I will describe how we developed an anomaly detection system that monitors hot metal bars and prevents them from colliding. I will cover the approaches we used, from classical computer vision to segmentation networks, data synthesis, and anomaly detection.
Daphne Cheung
Data Scientist at The Walt Disney Company
Data-Driven Storytelling: A Narrative in Numbers
Business track
Bio
Daphne Cheung is currently a Data Scientist at The Walt Disney Company, transforming Enterprise Technology with the use of AIOps solutions, machine learning, and storytelling. Her past experience includes developing business intelligence tools in the aviation industry and advancing entrepreneurial tech initiatives with the U.S. Embassy in Singapore and Hong Kong. Apart from her work at Disney, Daphne promotes data as a catalyst for change in diversity and inclusion programs in the U.S., most notably with the Women@Disney program and the LA-based technology non-profit LA Tech4Good. She has spoken about her experience as a woman in technology, the importance of data storytelling, and intersectionality in data science at DataCon LA, the Analytics Institute of Ireland, and a number of international podcasts to be released this year.
Abstract
Storytelling has quickly become an essential skill in a data science career. We often forget that the objective of data science is to make data-driven decisions and encourage action, and that action is driven by emotion. Numbers, however, are not emotional. Stories are. The human race has been telling stories to articulate concepts and ideas since the beginning of time and continues to do so as a communication mechanism. How do we begin to apply these same concepts to the way we communicate data insights? Data storytelling is more than a few aesthetically pleasing, easily digestible data visualizations. It is an art that requires marrying narrative and data. Storytelling means establishing a seamless story arc from audience identification and business problem definition to action, all backed by data insights. Join me in breaking down the structure of a data story with the help of use cases from my time at Disney, the gold standard of storytelling for the last 100+ years.
Chip Huyen
Machine Learning Engineer & Open Source Lead at Snorkel AI
Design, Data, Development, Deployment: Breaking down the Machine Learning production pipeline
Business track
Bio
Chip Huyen works to bring the best practices to machine learning production. Her experiences include Snorkel AI, Netflix, NVIDIA, Primer, and Stanford, where she taught TensorFlow for Deep Learning Research. She’s also the author of four bestselling Vietnamese books.
Abstract
Machine learning has found increasing use in the real world, and yet how to productionize machine learning systems is still not well understood. This talk outlines the challenges of and approaches to designing, developing, and deploying ML systems. It starts with the gap between ML in research and ML in production, how ML applications differ from traditional software engineering applications, and the rise of MLOps. The next part covers the four main stages in the iterative process of ML systems design. For each stage, it breaks down the steps needed and the tradeoffs of different solutions at each step. The talk ends with a survey of the MLOps landscape, analyzing over 200 available tools, where they fit into the ecosystem, and what is missing from it.
Saeed Reza Kheradpisheh
Data Science and Deep Learning Lecturer at Shahid Beheshti University
Spiking Neural Networks
Technical track
Bio
Saeed is a computational neuroscientist with a Ph.D. in computer science from the University of Tehran. His research mainly focuses on spiking neural networks and computational models of object recognition in the visual cortex.
Abstract
Saeed will tell us about spiking neural nets (SNNs), how they differ from traditional mainstream artificial neural nets, and what advantages they offer. He will take us on a tour of neural coding, neuronal dynamics, neural connectivity, and learning algorithms in SNNs, and will show some examples of SNNs in visual categorization tasks.
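To give a flavor of neuronal dynamics in SNNs, the sketch below simulates a single leaky integrate-and-fire (LIF) neuron with Euler steps. LIF is one common neuron model among several; the time constant, threshold, reset, and input values are assumptions chosen for illustration, not values from the talk.

```python
def simulate_lif(input_current, dt=1.0, tau=10.0, v_rest=0.0,
                 v_threshold=1.0, v_reset=0.0, resistance=1.0):
    """Euler simulation of a leaky integrate-and-fire neuron.

    Membrane dynamics: dv/dt = (-(v - v_rest) + R * I(t)) / tau.
    When v reaches v_threshold the neuron spikes and v is reset to v_reset.
    Returns the time steps at which spikes occurred.
    """
    v = v_rest
    spike_times = []
    for step, current in enumerate(input_current):
        v += dt * (-(v - v_rest) + resistance * current) / tau
        if v >= v_threshold:
            spike_times.append(step)
            v = v_reset
    return spike_times


print(simulate_lif([2.0] * 30))   # [6, 13, 20, 27]: a regular spike train
print(simulate_lif([0.5] * 100))  # []: v settles at 0.5, below threshold
```

A stronger input current makes the membrane cross the threshold sooner, so the neuron fires at a higher rate; this is the basis of rate coding.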
Oleksandr Proskurin
Founder at Machine Factor Technologies
Improving time-series ensemble predictions with Sequential Bootstrapping. E-Mini S&P 500 futures example
Technical track
Bio
Oleksandr Proskurin is a Founder and CIO of Machine Factor Technologies, a company consulting asset managers in financial machine learning applications, and algorithmic trading. His previous experience includes more than 4 years working in the hedge fund industry, researching and implementing volatility and commodity trading strategies using futures, options, and leveraged ETFs.
Abstract
Financial time series predictions suffer from the lack of independence between the samples being predicted. As a result, standard ensemble models perform worse than they do on datasets with independent samples (spam classification, image recognition, etc.). The Sequential Bootstrapping algorithm modifies standard bagging to tackle the problem of dependent and overlapping samples. The lecture covers a detailed explanation of the algorithm, how it improves out-of-sample predictions, and its application to an E-Mini S&P 500 futures example using the open-source mlfinlab package. At the end of the lecture, Oleksandr will present his latest research on predicting the outperformance of the Sequential Bootstrapping model using the histogram of average sample uniqueness.
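The core of sequential bootstrapping can be sketched in plain Python, assuming each label is active over an inclusive range of time bars: the first draw is uniform, and each later draw picks sample i with probability proportional to its average uniqueness given the samples already drawn. This follows the general description of the method rather than the mlfinlab implementation, whose API is not reproduced here; `spans` and the helper names are hypothetical.

```python
import random


def sequential_bootstrap(spans, num_times, sample_size=None, seed=0):
    """Draw sample indices; spans[i] = (start, end) is the inclusive range of
    time bars over which label i is active."""
    rng = random.Random(seed)
    n = len(spans)
    sample_size = n if sample_size is None else sample_size
    concurrency = [0] * num_times  # how many drawn labels cover each bar
    drawn = []
    for _ in range(sample_size):
        avg_uniqueness = []
        for start, end in spans:
            bars = range(start, end + 1)
            # Uniqueness of candidate i at bar t if it were added: 1 / (c_t + 1).
            avg_uniqueness.append(sum(1.0 / (concurrency[t] + 1) for t in bars) / len(bars))
        choice = rng.choices(range(n), weights=avg_uniqueness)[0]
        drawn.append(choice)
        start, end = spans[choice]
        for t in range(start, end + 1):
            concurrency[t] += 1
    return drawn


def mean_average_uniqueness(spans, drawn, num_times):
    """Mean, over drawn samples, of each sample's average uniqueness 1 / c_t."""
    c = [0] * num_times
    for i in drawn:
        start, end = spans[i]
        for t in range(start, end + 1):
            c[t] += 1
    per_draw = [
        sum(1.0 / c[t] for t in range(spans[i][0], spans[i][1] + 1))
        / (spans[i][1] - spans[i][0] + 1)
        for i in drawn
    ]
    return sum(per_draw) / len(per_draw)


spans = [(0, 2), (2, 3), (4, 5)]  # labels 0 and 1 overlap on bar 2
drawn = sequential_bootstrap(spans, num_times=6)
print(drawn, mean_average_uniqueness(spans, drawn, 6))
```

Because overlapping labels lower each other's uniqueness, a sample that is already heavily represented becomes less likely to be drawn again, which is what makes the resulting bags closer to independent draws.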
Alexandr Honchar
Entrepreneur, Advisor, and Author in AI at Neurons Lab
The economy of AI
Business track
Bio
Alex Honchar has worked on industrial and research AI projects for around 7 years. He is currently active as an entrepreneur, applied researcher, and educator. He is a co-founder of the consulting boutique Neurons Lab, publishes blogs on Medium with more than 1M views and academic articles with more than 100 citations, and regularly speaks at conferences and workshops across Europe.
Abstract
As entrepreneurs, we are interested in inventions and innovations that are profitable first and foremost. Of course, at the peak of the hype, you can capitalize even on mentioning “AI” in the product, but you are looking for innovations with deep long-term value. Hence, we need a framework to translate all these “accuracies of these neural networks” into actual business models with a clear breakdown of costs and opportunities. In this talk, we will start with the main innovation patterns of industrial revolutions and how they affect global and micro-economies in terms of productivity, quality, speed, scaling, and spread. Then, we will extend this with cases specific to AI technologies, since they automate cognitive rather than manual human abilities. Within this framework, we will review celebrated AI use cases in retail (price and demand forecasting, real-time engagement), investment management (portfolio management and risk management), and manufacturing (predictive maintenance, quality control). Based on this, we will learn how exactly AI can improve processes, or do the opposite if the calculations are wrong or the technology is misunderstood.
Oleksandr Romanko
Ph.D., Head of Quantitative Research, SS&C Algorithmics
Artificial Intelligence-Driven Financial Risk Analytics and Portfolio Optimization
Business track + Panel discussion
Bio
Oleksandr received a Ph.D. and a Master’s degree in Computer Science from McMaster University (Canada). He is an adjunct professor at the University of Toronto and UCU (Ukrainian Catholic University). He is also the honorary director of the Master of Business and Management in Artificial Intelligence program at Kyiv School of Economics.
Abstract
Simulation and optimization algorithms are used in quantitative finance and risk management to model, evaluate, hedge, and optimally rebalance portfolios of financial assets. The primary goal of simulation is to model uncertainty in asset values over time. Optimization techniques help minimize risk and maximize the performance of financial portfolios. As the performance, numerical stability, and practical applicability of simulation and optimization algorithms remain challenges in financial modeling, we look at machine learning practices to improve the accuracy of financial modeling. Moreover, we investigate how Artificial Intelligence algorithms such as Natural Language Processing and neural nets can enhance the formulation of financial modeling and optimization problems. Natural language understanding algorithms for portfolio stress-testing and for financial optimization problems, such as sentiment analysis and chatbots, will be discussed and demonstrated.
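The simulation idea can be illustrated with a minimal one-asset Monte Carlo Value-at-Risk sketch. It assumes geometric Brownian motion with annualized drift and volatility and 252 trading days per year; these are simplifying assumptions for illustration, not the methodology used at SS&C Algorithmics, and the portfolio parameters are hypothetical.

```python
import math
import random


def monte_carlo_var(value, mu, sigma, horizon_days, confidence=0.99,
                    n_paths=20000, seed=0):
    """One-asset Monte Carlo VaR under an assumed geometric Brownian motion.

    Simulates V_T = V_0 * exp((mu - sigma^2 / 2) * T + sigma * sqrt(T) * Z)
    with annualized mu and sigma and T in years (252 trading days), then returns
    the loss that is exceeded with probability (1 - confidence).
    """
    rng = random.Random(seed)
    t = horizon_days / 252
    losses = []
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        terminal = value * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
        losses.append(value - terminal)
    losses.sort()
    index = min(int(confidence * n_paths), n_paths - 1)
    return losses[index]


# Hypothetical position: 1,000,000 with 5% drift and 20% volatility, 10-day horizon.
var_99 = monte_carlo_var(1_000_000, mu=0.05, sigma=0.2, horizon_days=10)
print(round(var_99))
```

For these inputs the closed-form GBM quantile gives a loss of roughly 87,000, so the simulated figure should land close to that; the value of simulation is that it extends to portfolios and instruments where no closed form exists.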
Saurav Kaushik
Data Scientist at Uber
Workshop track, Building agents for OpenAI virtual environments using Deep Reinforcement Learning
Workshop track
Bio
Saurav is a Data Scientist at Uber with extensive experience in competitive data science, open-source contributions, and research. He has been ranked among the top 500 data scientists in the world on Kaggle, is an acclaimed data science speaker, and is also the author and maintainer of open source libraries on PyPI and CRAN.
Abstract
In this session, we'll begin with the fundamentals of reinforcement learning and work our way up to building an agent that can operate efficiently in OpenAI virtual environments using Deep Q-Learning, a class of Deep Reinforcement Learning algorithms.
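Deep Q-Learning replaces the Q-table with a neural network, but the underlying update rule is the same. A minimal tabular Q-learning sketch, assuming a hypothetical five-state corridor environment written in plain Python rather than an actual OpenAI Gym environment, shows epsilon-greedy exploration and the temporal-difference target reward + gamma * max_a' Q(s', a') that a Deep Q-Network learns to approximate:

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Hypothetical environment step: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection; break ties randomly.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # TD target: no bootstrapping from the terminal state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = q_learning()
# Greedy policy for the non-terminal states 0..3 (1 = move right).
policy = [0 if left > right else 1 for left, right in q[:-1]]
print(policy)
```

After training, the greedy policy should move right in every non-terminal state. A DQN would swap the table for a network and typically add experience replay and a target network.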
Lucas Weynars
Data Scientist - ANZ at Uber
Workshop track, How to solve a regression problem, from business objective to model deployment
Workshop track
Bio
Electrical Engineer turned Data Scientist with a passion to use machine learning to solve real business problems and improve customer experience.
Abstract
In this workshop, you will learn how to tackle a regression problem, covering:
- Exploratory data analysis for regression
- The most widely used regression algorithms
- Feature engineering and parameter tuning
- Which evaluation metrics to use
Level: Beginner/Intermediate
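To make the last two items concrete, here is a minimal sketch assuming a single numeric feature and hypothetical toy data: it fits ordinary least squares in closed form and reports three common regression metrics (MAE, RMSE, and R²). It uses only the standard library; in practice a library such as scikit-learn would typically be used.

```python
import math

def fit_simple_ols(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def regression_metrics(y_true, y_pred):
    """Return mean absolute error, root mean squared error and R^2."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}

# Hypothetical toy data (one feature, one target).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]
a, b = fit_simple_ols(xs, ys)
preds = [a + b * x for x in xs]
print(a, b, regression_metrics(ys, preds))
```

MAE is easy to explain to business stakeholders, RMSE penalizes large errors more heavily, and R² expresses the share of variance explained; reporting more than one metric is usually safer than relying on a single number.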
Dmitry Anoshin
Data Engineer at Amazon
Analytics Engineering and Teams collaboration
Business track
Bio
Analytics and Data Engineering leader with 10+ years of experience in Business Intelligence, Data Warehousing and Data Integration, Big Data, Cloud, and ML across North America and Europe. I lead Data Engineering initiatives, working on a petabyte-scale data platform built with cloud and big data technologies that supports machine learning experiments, data science models, business intelligence reporting, and data exchange with internal and external partners, and I handle privacy-compliance and security-critical data sets. Apart from work, I teach a Cloud Computing course at the University of Victoria, mentor high school students at the CS faculty, and volunteer my time coaching people in analytics engineering skills in the CIS region through a free program that helps you master modern analytics from scratch: www.datalearn.ru. I am also the author of analytics books (https://www.amazon.com/Dmitry-Anoshin/e/B01A5PVT2M) and a speaker at data-related conferences and user groups.
Abstract
For the last 5 years, I have been building data analytics solutions at Amazon scale, working with Alexa AI and the Amazon marketplace. I have seen how the industry changed and how new requirements appeared. Starting from traditional Data Warehouse and Business Intelligence solutions, we gradually moved toward Cloud Data Platforms. Today, I work within the Science team, where we are building an ML platform using the cloud capabilities of AWS. In my talk, I will cover the architecture of modern analytics solutions and cross-team collaboration between software development engineers (SDEs), Data Scientists, and Product Managers.
Sergey Pugachev
Solutions Architect at Amazon Web Services
Workshop track, Training your ML models faster and cheaper with Amazon SageMaker
Workshop track
Bio
Sergey is a Senior Solutions Architect at Amazon Web Services (AWS) in Munich. He previously worked at Booking.com, Google, Microsoft, and Intel. He was a Google Developer Expert (GDE) in web technologies, as well as a Microsoft MVP (Most Valuable Professional).
Abstract
In this workshop you will learn how to use Amazon SageMaker to train your models including: training a model using algorithms provided by Amazon SageMaker, using your custom code (script) to train a model on SageMaker, bringing your own custom algorithms as containers, tracking, evaluating, and organizing training experiments, distributing your training jobs, tuning your models and getting full visibility into the training process through Amazon SageMaker Debugger.
Igor Ivaniuk
Solutions Architect at Amazon Web Services
Workshop track, Training your ML models faster and cheaper with Amazon SageMaker
Workshop track
Bio
Solutions Architect at AWS, working with customers in the CIS region and specializing in banking, fintech, blockchain, and e-commerce. Before AWS, Igor built cloud solutions and led companies through their cloud journeys, transforming their architecture and processes along the way. He speaks at conferences and meetups on a range of topics, including development processes and tools, serverless, analytics, and ML.
Abstract
In this workshop you will learn how to use Amazon SageMaker to train your models including: training a model using algorithms provided by Amazon SageMaker, using your custom code (script) to train a model on SageMaker, bringing your own custom algorithms as containers, tracking, evaluating, and organizing training experiments, distributing your training jobs, tuning your models and getting full visibility into the training process through Amazon SageMaker Debugger.
Yurii Volkov
Machine Learning Engineer at Snap Inc.
Optimizing neural nets for Apple Neural Engine
Business track
Bio
For the last 4 years, Yurii has worked at two successful startups: Insilico Medicine (which signed a $200M deal with CTFH) and AI Factory (acquired by Snap). He conducted research in CV and DL-based drug discovery and has published multiple patents and papers in highly rated scientific journals (including Nature Biotechnology). Yurii now works at Snap’s HQ in Los Angeles.
Abstract
- What ANE is and what is known about its architecture.
- How to run networks on ANE.
- Benchmarking ANE vs GPU vs CPU.
- Architecture restrictions for running on ANE.
Jay Kachhadia
Data Scientist at ViacomCBS
Full Stack Data Science: The Next Gen of Data Scientists Cohort
Business track
Bio
I play with petabytes of data and engineer systems that can see the future with machine learning and help make business decisions for brands like CBS, Comedy Central, Nickelodeon, MTV, Paramount Pictures, and many more. In short, I'm a Data Scientist at ViacomCBS Digital. I am also a Data Science blogger for Towards Data Science, with more than 75,000 views on my blogs. I hold a master's in Data Science from Syracuse University and a bachelor's in Computer Engineering from the National Institute of Technology, Surat. During my undergraduate studies, I was the lead of Google Developers Group NIT Surat, where I delivered talks on chatbot architecture and Data Science in Action.
Abstract
Data Science is a fast-changing industry, and because its demands keep changing, there is no longer a single specialization that gets you into Data Science. Different companies require different skill sets, and I would like to share the know-how of getting into Data Science straight out of school in 2020. I wrote a blog post on this topic that received more than 50,000 views and more than 100 shares on Twitter, including mentions from remarkable AI communities and Data Science companies worldwide. Many companies now need full-stack Data Scientists to build the infrastructure for practicing Data Science or to build data products powered by machine learning. In my current role, I work as a full-stack Data Scientist, and I would like to share what it takes for aspirants to break into this field and how full-stack Data Science is done.
Veronica Tamayo Flores
Head of Consulting at Data Science UA
When you need a data science consultancy and when you don't
Business track
Bio
In 2018, she graduated from IE Business School (Spain) with a specialization in Business Analytics and Big Data. She previously worked in marketing and digital analytics for retail. Veronica manages data science and business intelligence technology projects for companies. Her main expertise is business analysis, business translation (a combination of business and technical skills), conducting analytical projects, and business development.
Abstract
When a company starts its data transformation, it is often hard to know exactly where to begin. How do you select the first business case to focus on? How do you convince the decision-maker to invest effort in a pilot project? Why is infrastructure so critical? A data science consultancy can help answer most of these questions. However, you don't always need consultants to become data-first. During the talk, we will go through these first steps of data transformation and figure out when hiring consultants is a great idea.
Olexiy Oryeshko
Staff Software Engineer at Google Search
Panel discussion
Bio
Olexiy applies Machine Learning to large-scale user-facing products. Olexiy has improved Machine Learning models and systems used in Web search, YouTube, Play Store, and other Google products. Now, Olexiy applies his experience as a tech lead for an interactive platform for data science and machine learning, used by hundreds of Google engineers. Olexiy earned his MS degree in computer science from Kyiv University in 2004.
Stevan Rudinac
Associate Professor of Artificial Intelligence/Machine Learning for Business at University of Amsterdam
How multimedia analytics can help solve complex business problems
Business track
Bio
Stevan Rudinac is an Associate Professor of Artificial Intelligence for Business at the University of Amsterdam Business School and a guest researcher at the Informatics Institute of the UvA. In his research he aims at enabling large-scale multimedia analytics based on the relevance criteria defined at a higher semantic level, by jointly analysing visual content and the heterogeneous information associated with it, ranging from text, automatically generated metadata and open data statistics, to information about users and their social network. What fascinates him is the potential of artificial intelligence in addressing important societal challenges, such as liveability and security.
Abstract
In recent decades, the production of multimedia content has exceeded all expectations. Both the size and the heterogeneity of multimedia collections have increased significantly, and datasets featuring hundreds of millions of images, videos, texts, information about users, and various metadata are becoming commonplace. It is therefore unsurprising that experts from practically all spheres of academia and industry are increasingly using this wealth of information to improve their processes and make better-informed decisions. In this talk, we will showcase our recent work on applying multimedia analytics to problems in urban computing, marketing, and the creative industries.
Himanshu Upreti
Co-Founder & Chief Technology Officer (CTO) at AI Palette
How to use AI in Consumer Food Product Innovation?
Business track
Bio
Himanshu is a highly driven and passionate entrepreneur currently leading the technology vision at Ai Palette as Co-Founder & CTO. Ai Palette is a deep-tech AI startup backed by the Singapore government that is revolutionizing the way new consumer products are created and aims to put a dent in the huge $4 trillion food industry. Before this, Himanshu worked at Visa Inc. right after graduating from IIT Guwahati, building data products on Visa’s Big Data Platform that enabled a seamless and faster payment experience. Himanshu has previously spoken about data science at company and college events, on podcasts, and in the General Assembly Data Science course.
Abstract
90% of new product launches in the CPG (Consumer Packaged Goods) industry fail in their first year. According to an AC Nielsen study, 50% of products fail because they don’t address broader consumer needs. This is surprising given the amount of money and time that CPG companies spend on consumer research. But on digging deeper, one realizes the challenges of the current consumer research process. That is exactly the problem Ai Palette is solving for CPG brands: helping them identify which product to launch next. At Ai Palette, we have built a cloud-based Artificial Intelligence platform that CPG companies can use to create winning consumer products. The platform gathers insights about food from the consumer digital footprint on social media, menus, recipes, retail, blogs, discussion forums, etc., and couples them with internal company data to arrive at the product attributes and features that address end consumers' unmet needs. Ai Palette's patent-pending AI technology is composed of an NLP stack and a Computer Vision stack. In Asia, every region has its own nuances and language complexity, so we have built native-language, food-trained models for more than 10 Asian geographies (including China, Korea, Thailand, India, and Malaysia) to understand local food preferences. Moreover, people these days love to share more through images than text, which is where we leverage Computer Vision models to analyze and identify what you are having along with your McDonald’s burger.
Ali Leylani
AI Architect, Senior Data Scientist at Atea Sverige
The value and importance of explainability, and why striving towards it is critical for every organisation aiming to capitalize on data with machine learning.
Technical track
Bio
Ali Leylani – Lead Data Scientist at Atea and Board Member of Stockholm AI. With a strong background in mathematics and theoretical physics, Ali helps businesses adopt an objective, data-driven philosophy every day.
Abstract
Ali will first give a brief introduction to the field, explaining the difference between predicting and explaining, and then highlight the latest best practices and share lessons learned from real business cases.
Thomas Timmermann
Data Scientist at Codecentric
Workshop track, Named Entity Recognition in Legal Texts with BERT
Workshop track
Bio
Thomas is a Data Scientist focusing on deep learning and natural language processing. Trained as a pure mathematician, he spent his early career in research and teaching before joining codecentric AG, Germany, in 2018.
Abstract
This workshop provides an introduction to natural language processing and shows how to fine-tune a pre-trained BERT transformer model on a custom NER task: extracting references to verdicts, laws, etc. from legal texts. Along the way, we explain tokenization, language models, and more. The hands-on parts show how to use spaCy, Keras, and Hugging Face Transformers.
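NER models such as a fine-tuned BERT usually predict one label per token, often in the BIO scheme (B- begins an entity, I- continues it, O is outside). A minimal post-processing sketch that groups such labels into entity spans is shown below; the tokens, the LAW/VERDICT label names, and the case reference are hypothetical, and this is not the spaCy or Hugging Face API:

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans.

    'B-X' starts an entity of type X, 'I-X' continues it, 'O' is outside.
    An 'I-X' that does not continue an open X entity starts a new one
    (a common lenient convention).
    """
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "B" or label != current_type:
            # Close any open entity and start a new one.
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label, [token]
        else:
            current_tokens.append(token)
    if current_type:
        spans.append((current_type, " ".join(current_tokens)))
    return spans

# Hypothetical model output for a short legal sentence.
tokens = ["See", "§", "823", "BGB", "and", "BGH", "VI", "ZR", "123/45", "."]
tags = ["O", "B-LAW", "I-LAW", "I-LAW", "O",
        "B-VERDICT", "I-VERDICT", "I-VERDICT", "I-VERDICT", "O"]
print(bio_to_spans(tokens, tags))
```

With subword tokenizers such as BERT's WordPiece, labels are typically aligned back to whole words before a step like this.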
Oles` Petriv
Chief Technology Officer at Reface
Future of interactive content in context of light and portable generative networks
Technical track
Bio
For the last seven years, Oles has been actively researching and developing computer vision and natural language processing systems. He is the author of a machine learning course on the Prometheus platform and of an in-depth training course at the ARVI Lab. He has extensive experience in video processing with deep learning methods: detecting objects and actions, predicting image depth maps, semantic segmentation, and generating captions for images and video for Hollywood studios. Oles developed one of the first neural-network-based systems for automatically monitoring the placement of groceries on store shelves. He has led the development of many projects for automated analysis of news in various languages, named entity recognition, concept drift analysis, and representation of language structures using machine learning systems.
Abstract
The rise of generative neural networks has reshaped our expectations of what is and is not possible in media content creation. We will talk about:
- the idea of dividing media processing into two parts (content-dependent and user-dependent) and how it will revolutionize the way we interact with media;
- how model compression techniques allow us to deliver high-quality post-processing on the user's device in real time;
- why neural networks are a core part of media content atomization and of the interactive recombination of concepts as a new "visual language".
Marta Paes Moreira
Developer Advocate at Ververica
Building an End-to-End Analytics Pipeline with PyFlink
Technical track
Bio
Marta is a Developer Advocate at Ververica (formerly data Artisans) and a contributor to Apache Flink. After finding her mojo in open source, she is committed to making sense of Data Engineering through the eyes of those using its by-products.
Abstract
Stream processing has fundamentally changed the way we build and think about data pipelines — but the technologies that unlock its value haven’t always been friendly to non-Java/Scala developers. Flink has recently introduced PyFlink, allowing developers to tap into streaming data in real-time with the flexibility of Python and its wide ecosystem for data analytics and Machine Learning. In this talk, we'll explore the basics of PyFlink and showcase how developers can make use of familiar tools like interactive notebooks to unleash the full power of an advanced stream processor like Flink.
Volodymyr Koshel
Senior Data Scientist, Deep Learning Engineer, Vodafone
Panel discussion
Bio
Volodymyr Koshel has worked in science for more than 15 years and is currently pursuing a Ph.D. in Computer Science. He has five years of experience in data analysis as a Data Scientist, Machine Learning Engineer, and Deep Learning Engineer.
Prithvi Shetty
Data Scientist at SAP
Workshops track, Building deep learning NLP models and deploying them
Workshops track
Bio
Prithvi is currently a Data Scientist at SAP Concur, where for the past two years he has mainly applied deep learning (RNNs and LSTMs) to NLP projects. He graduated from the University of Washington in 2019 with a Data Science specialization. His research work at the University of Washington introduced him to applying deep learning and machine learning, and this experience sparked his passion for Data Science.
Abstract
In this workshop, Prithvi will show how to use state-of-the-art deep learning architectures to build end-to-end NLP models for a range of purposes. He will focus on why text classification matters and how to implement it, from the value of data cleaning and user research before modeling to deploying the model in real-life use cases. As a Data Scientist, Prithvi knows the pain of deploying ML models, so he will walk through the following steps:
1) Research and text cleaning before building an ML model
2) Building the deep learning model (using LSTMs/embeddings)
3) Tuning the model's hyperparameters
4) Deploying the model for real-life use (on AWS with TensorFlow Serving)
Although the workshop is technical and requires moderate Python/ML expertise, it will be kept as simple and brief as possible. We will work with the toxic comments dataset throughout.
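Step 1 (and the input to step 2) usually means cleaning raw text and turning it into fixed-length integer sequences that an embedding + LSTM model can consume. Below is a minimal standard-library sketch under assumed conventions (id 0 for padding, id 1 for out-of-vocabulary words, padding at the front); it mimics what Keras' text preprocessing utilities do but is not their API, and the cleaning rules are illustrative choices:

```python
import re
from collections import Counter

def clean_text(text):
    """Lowercase, drop URLs, digits and punctuation (keep apostrophes), collapse spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z' ]+", " ", text)
    return " ".join(text.split())

def build_vocab(texts, max_words=10_000):
    """Map the most frequent words to ids; 0 = padding, 1 = out-of-vocabulary."""
    counts = Counter(word for t in texts for word in t.split())
    vocab = {"<pad>": 0, "<oov>": 1}
    for word, _ in counts.most_common(max_words):
        vocab[word] = len(vocab)
    return vocab

def encode(text, vocab, max_len=8):
    """Convert cleaned text to a fixed-length id sequence (truncated, then front-padded)."""
    ids = [vocab.get(word, 1) for word in text.split()][:max_len]
    return [0] * (max_len - len(ids)) + ids

# Hypothetical comments in the style of a toxic-comment dataset.
raw = ["You are GREAT!!", "you are an idiot http://x.co", "great point"]
cleaned = [clean_text(t) for t in raw]
vocab = build_vocab(cleaned)
sequences = [encode(t, vocab, max_len=6) for t in cleaned]
print(sequences)
```

The vocabulary must be built on training data only and saved alongside the model, so that the serving side encodes text exactly as training did.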
Kristoffer Gordon Clausen
Data Scientist at 2021.AI
Governance and Explainable AI. How to achieve regulatory excellence
Business track
Bio
- Background in robotics and automation from DTU & CMU, specialized in AI.
- Data scientist at 2021.AI, focusing on supply chain, manufacturing, and pharma.
- Board member at Neural, working on strengthening the Danish AI ecosystem.
Previous speaking engagements:
- June 18, 2020, Danish IT: Applied AI – AI in practice.
- September 24, 2020, Danish IT: Governance and Explainable AI.
- September 30, 2020, Danish Metal IT Conference: Cases from production and supply chain.
Abstract
- Setting the scene
- Who are 2021.AI?
- Introduction to AI
- Explainable and trustworthy AI
- Building trust in AI
- Explaining the black box
- Data and AI Governance
- The AI governance landscape
- Lifecycle management for AI
- What it takes to achieve regulatory excellence
Brian Lucena
Principal at Numeristical
Workshop track, StructureBoost - a new Gradient Boosting Package
Workshop track
Bio
Brian Lucena is Principal at Numeristical and the creator of StructureBoost, ML-Insights, and SplineCalib. His mission is to enhance the understanding and application of modern machine learning and statistical techniques.
Abstract
The values of a categorical variable frequently have a structure that is not ordinal or linear in nature. For example, the months of the year have a circular structure, and the US States have a geographical structure. Standard approaches such as one-hot or numerical encoding are unable to effectively exploit the structural information of such variables. In this tutorial, we will introduce the StructureBoost gradient boosting package, wherein the structure of categorical variables can be represented by a graph and exploited to improve predictive performance. Moreover, StructureBoost can make informed predictions on categorical values for which there is little or no data, by leveraging the knowledge of the structure. We will walk through examples of how to configure and train models using StructureBoost and demonstrate other features of the package.
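The sketch below illustrates the idea of encoding structure as a graph; it is not StructureBoost's own API. Months are modeled as a hypothetical cycle graph, so shortest-path distance places December next to January, whereas a plain numerical encoding puts them 11 apart and one-hot encoding treats every pair of months as equally different:

```python
from collections import deque

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def cycle_graph(nodes):
    """Adjacency sets of a cycle: each node links to its predecessor and successor."""
    n = len(nodes)
    return {nodes[i]: {nodes[(i - 1) % n], nodes[(i + 1) % n]} for i in range(n)}

def graph_distance(graph, start, goal):
    """Breadth-first search: number of edges on the shortest path."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    raise ValueError(f"{goal!r} is not reachable from {start!r}")

g = cycle_graph(MONTHS)
numeric_gap = abs(MONTHS.index("Dec") - MONTHS.index("Jan"))
print(numeric_gap, graph_distance(g, "Dec", "Jan"))
```

The same representation works for geography: a graph whose edges link neighboring US states would let a model borrow strength from adjacent states when a state has little data.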
Borys Pratsiuk
Chief Technology Officer at Scalarr
Merge your data science team with your production processes
Business track
Bio
Borys graduated with honors from the Department of Physical and Biomedical Electronics at KPI in 2007, majoring in Physical and Biomedical Electronics. He is the CTO at Scalarr.
Abstract
Your company operates according to well-established rules, but you have decided to adopt machine learning to optimize it. How will you plan release dates and sprint durations? Who will be responsible for model stability in production? When does the DevOps team become more important than the DS team? In my presentation, I will answer these and many other questions about process optimization and improving cross-team collaboration, based on my own story.
Marta Markiewicz
Head of Data Science at Objectivity
Hack your life with data science
Business track
Bio
Head of Data Science at Objectivity with a background in Mathematical Statistics. For about 9 years, she has been discovering the potential of data in various business domains, including medicine, retail, HR, finance, aviation, and real estate. She deeply believes in the power of data in every area of life.
Abstract
It’s not a mystery that AI has the power to transform the business. But what about everyday reality? In this talk I would like to show, using examples, how to hack your own life with data science — finance, personal development, health, you name it!
Badr Ouali
Head of Data Science at Vertica
VerticaPy: Scalable in-DB Data Science with Python Front-End
Technical track
Bio
Badr Ouali works as a Data Scientist for Vertica worldwide. He can embrace data projects end to end through a clear understanding of the “big picture” as well as attention to details, resulting in achieving great business outcomes — a distinctive differentiator in his role.
Abstract
Nowadays, 'Big Data' is one of the main topics in the data science world, and data scientists are often at the center of any organization. The benefits of becoming more data-driven are undeniable and are often needed to survive in the industry. Vertica was the first real analytic columnar database and is still the fastest on the market. However, SQL alone isn't flexible enough to meet the needs of data scientists. Python has quickly become the most popular tool in this domain, owing much of its flexibility to its high level of abstraction and its impressively large and ever-growing set of libraries. Its accessibility has led to the development of popular and performant APIs, like pandas and scikit-learn, and a dedicated community of data scientists. However, Python only works in memory, in a single-node process. While distributed programming languages have tried to address this challenge, they are still generally in-memory and can never hope to process all of your data, and moving data is expensive. On top of all of this, data scientists must also find convenient ways to deploy their data and models. The whole process is time-consuming. VerticaPy aims to solve all of these problems. The idea is simple: instead of moving data to your tools, VerticaPy brings your tools to the data.
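The "bring your tools to the data" idea can be illustrated without Vertica. In this sketch, Python's built-in sqlite3 module stands in for the database (an assumption, not VerticaPy's API), and the table and column names are hypothetical. Both functions compute the same per-region average, but the first pulls every row into Python while the second pushes the GROUP BY into the engine, so only one row per group leaves the database:

```python
import sqlite3

def make_db(rows):
    """In-memory SQLite table standing in for a large analytical table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return con

def avg_by_region_client_side(con):
    """Move the data to the tool: fetch every row, aggregate in Python."""
    sums, counts = {}, {}
    for region, amount in con.execute("SELECT region, amount FROM sales"):
        sums[region] = sums.get(region, 0.0) + amount
        counts[region] = counts.get(region, 0) + 1
    return {region: sums[region] / counts[region] for region in sums}

def avg_by_region_in_db(con):
    """Move the tool to the data: the engine aggregates, one row per group returns."""
    cur = con.execute("SELECT region, AVG(amount) FROM sales GROUP BY region")
    return dict(cur.fetchall())

rows = [("EU", 10.0), ("EU", 30.0), ("US", 5.0), ("US", 15.0), ("US", 25.0)]
con = make_db(rows)
print(avg_by_region_client_side(con), avg_by_region_in_db(con))
```

On a table with billions of rows, the difference between these two functions is the difference between moving the whole table and moving a handful of summary rows.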
Romain Paulus
Lead Research Scientist at SuSea Inc.
Semi-supervised and unsupervised abstractive summarization
Technical track
Bio
Romain Paulus is a former Lead Research Scientist at Salesforce Research, focusing his work on deep learning for abstractive text summarization and natural language generation. Before that, he was the founding engineer of the California-based startup MetaMind, where he led the full-stack development of a deep-learning-as-a-service platform.
Abstract
Abstractive summarization has attracted a lot of attention in the NLP community as a distinctive unsolved problem. It is a multi-faceted task in which a model not only has to understand the main topic of a document but also has to write a clear and factually correct summary of it. Moreover, little supervised data is available for training summarization models outside of a few specific domains such as news. In this talk, we will explore different ways to train abstractive summarization models with little or no supervised data, and we will discuss how this changes the way we tend to approach complex NLP problems.
Svetlana Vinogradova
Lead Data Scientist at InsideTracker (Segterra)
Blood Biomarkers Data and Data from Wearables: Insights for Personalized Recommendations
Technical track
Bio
Svetlana Vinogradova is a Lead Data Scientist at InsideTracker, leading the Data Science team to integrate blood biomarkers and DNA data with physiological data from activity trackers to improve lifestyle recommendations and discover new patterns and optimal zones in sleep, heart rate, and blood biomarkers.
Admond Lee
Consulting Data Scientist, Admond Lee; Contributing Writer, Towards Data Science; KDnuggets
Panel discussion
Bio
With his degree in Physics, Admond discovered and pursued his passion for data science and has never looked back. A data science communicator at heart, Admond has inspired others with his journey into data science. His story and data science work have been featured by various publications, including KDnuggets, Medium, Tech in Asia, AI Time Journal, and business magazines. Admond has also been invited to speak at various workshops and meetups. He is now on a mission to make data science accessible to everyone, helping companies truly leverage the power of data science to drive business value and guiding students and professionals into the data science field.
Dr. Karol Przystalski
CTO at Codete
Computer vision methods for skin cancer recognition
Business track
Bio
Obtained a Ph.D. degree in Computer Science in 2015 at the Jagiellonian University in Cracow. CTO and founder of Codete. Leading and mentoring teams at Codete. Working with Fortune 500 companies on data science projects. Built a research lab for machine learning methods and big data solutions at Codete. Gives speeches and training in data science with a focus on applied machine learning in German, Polish, and English. Used to be an O’Reilly trainer.
Abstract
Pattern recognition in images is one of the most popular machine learning approaches for supporting medical doctors. We show how to use image processing methods to perform simple image analysis to find skin cancer cases. In the next step, we use neural networks and simple white-box methods to recognize skin cancer patterns in multilevel images.
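As one example of a simple, white-box image-analysis step (an illustrative choice, not necessarily the method used in the talk), Otsu's method picks the global gray-level threshold that maximizes between-class variance, which can separate a darker lesion from the surrounding skin. The sketch assumes an 8-bit grayscale image given as a list of rows and uses a tiny hypothetical image:

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels <= t form one class and pixels > t the other. The variance is left
    unnormalized (w0 * w1 * (mu0 - mu1)^2), which does not change the argmax.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = 0      # number of pixels <= t
    sum0 = 0    # sum of intensities <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t

def lesion_mask(image):
    """Mark pixels at or below the Otsu threshold (lesions are usually darker than skin)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]

# Hypothetical 4x4 grayscale patch: bright skin around a dark 2x2 spot.
image = [
    [200, 198, 202, 201],
    [199,  60,  65, 200],
    [203,  62,  58, 197],
    [201, 200, 199, 202],
]
print(lesion_mask(image))
```

A mask like this can feed interpretable shape and color features (area, border irregularity, color spread) to a simple classifier, which keeps the decision process inspectable.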
Jörg Schad
Head of Engineering and Machine Learning at ArangoDB
Workshops track, Building Open-Source Machine Learning Pipelines
Workshops track
Bio
Jörg Schad is Head of Machine Learning at ArangoDB. Previously, he worked on machine learning pipelines in healthcare and finance, on distributed systems at Mesosphere, and on in-memory databases. He received his Ph.D. for research on distributed databases and data analytics. He is a frequent speaker at meetups, international conferences, and lecture halls.
Abstract
There are many great tutorials for training deep learning models using TensorFlow, Keras, Spark, or one of the many other frameworks. But training is only a small part of the overall deep learning pipeline. This workshop gives an overview of building a complete automated deep learning pipeline, starting with exploratory analysis and moving on to training, model storage, model serving, and monitoring, and answers questions such as:
- How can we enable data scientists to develop models exploratively?
- How can we automate distributed training, model optimization, and serving using CI/CD?
- How can we easily deploy these distributed deep learning frameworks on any public or private infrastructure?
- How can we manage multiple deep learning frameworks on a single cluster, especially considering heterogeneous resources such as GPUs?
- How can we store and serve models at scale?
- What metadata should be stored in a production setup?
- How can we monitor the entire pipeline and track the performance of the deployed models?
The participants will build an end-to-end data analytics pipeline including:
- Pipeline orchestration with TFX, Kubeflow, and Airflow
- Data preparation
- Jupyter Notebooks
- Distributed training with TensorFlow
- Automation and CI/CD using Jenkins and Argo
- Model and metadata storage
- Model serving and monitoring
Tetiana Kodliuk
Chief Science Officer at Dathena Science
Getting Ready for a Fast-Changing World: Drift Detection in Data Security
Technical track
Bio
Tania leads the Data Science team at Dathena and is responsible for innovation and patenting. With a background in mathematics, she is passionate about Natural Language Processing and Deep Learning, and truly believes in the need for Responsible AI, which drives her projects on AI Explainability and Auditability for Data Security and Privacy. She is also building a self-driven platform, which she calls “Dathena's brain”, to manage continuous data analysis through an autonomous decision-making system.
Abstract
If you know that deploying a high-quality model to production is not the fairytale ending "And they lived happily ever after", you might find this talk interesting. Data Security requires continuous data analysis and risk assessment, which can be successfully achieved with Machine Learning solutions. There is one "but", though: when models are deployed in production with data re-scanned every minute, they can lose prediction accuracy over time, because real-life data is rarely stationary. Don't believe it? Check your inbox for recommendations about a Balinese paradise while you are stuck in your country due to COVID-19. Here we face the "Data Drift" challenge, defined as a change in the distribution of the data used in a predictive task. Detecting changes in new incoming data is key to making sure that the predictions obtained are valid. The talk then turns to solutions that can help maintain high-quality data analysis, such as Active Learning.
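One common way to quantify the kind of data drift described above is the Population Stability Index (PSI), which compares how a feature's values fall into bins in a reference window versus in new data. The sketch below is a generic illustration, not Dathena's detector (the abstract does not describe it); the function name `psi`, the equal-frequency binning, and the `eps` floor for empty bins are our own assumptions.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` relative to `reference`.

    Illustrative sketch: bins are equal-frequency bins of the reference
    sample, and empty bins are floored at `eps` to avoid log(0).
    """
    ref_sorted = sorted(reference)
    # Interior bin edges at the reference quantiles (bins - 1 edges).
    edges = [ref_sorted[len(ref_sorted) * k // bins] for k in range(1, bins)]

    def proportions(sample: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in sample:
            # bisect_right maps x to a bin index in [0, bins - 1].
            counts[bisect_right(edges, x)] += 1
        return [max(c / len(sample), eps) for c in counts]

    expected = proportions(reference)
    actual = proportions(current)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected).
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


reference = [i / 1000 for i in range(1000)]  # stand-in for last month's feature values
shifted = [x + 0.5 for x in reference]       # the same feature after a strong shift
print(round(psi(reference, reference), 3))   # 0.0 for identical data
print(psi(reference, shifted) > 0.25)        # True
```

A PSI above roughly 0.25 is often treated as a significant shift, but that is an industry rule of thumb rather than a statistical test; in a monitoring job the check would run per feature on each new batch of re-scanned data.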
Jon McLoone
Director of Technical Services, Communication and Strategy at Wolfram Research Europe Ltd.
Unified data + unified computation = Multi-paradigm data science
Technical track
Bio
As Director of Technical Services, Communication and Strategy at Wolfram Research Europe, Jon McLoone is central to driving the company's technical business strategy and leading the consulting solutions team. Described as “The Computation Company”, the Wolfram group are world leaders in integrated technology for computation, data science, and AI including machine learning. With over 25 years of experience working with Wolfram Technologies, Jon has helped direct software development, system design, technical marketing, corporate policy, business strategies, and much more. Jon makes regular keynote appearances and gives media interviews on topics such as the Future of AI, Enterprise Computation Strategies, and Education Reform, across multiple fields including healthcare, fintech, and data science. He holds a degree in mathematics from the University of Durham. Jon is also Co-founder and Director of Development for computerbasedmath.org, an organization dedicated to fundamental reform of maths education and the introduction of computational thinking. The movement is now a worldwide force in re-engineering the STEM curriculum, with early projects in Estonia, Sweden, and Africa.
Abstract
While greater automation has made machine learning and data science tools accessible to non-experts, that same automation is equally important to the expert user. By breaking down the barriers between different kinds of computation, a truly multi-paradigm approach to data science becomes possible. This talk will demonstrate Wolfram Research's progress towards a fully unified computation platform, including live-coded machine learning, computer vision, and production deployment. Making this all possible is an underlying symbolic representation that unifies data, models, code, and interfaces. The talk will explain how this simplifies high-level concepts and enables their automation. Examples will include neural network surgery, transfer learning, and automated anomaly detection. Jon McLoone is a senior developer with nearly 30 years of experience at Wolfram Research, where he leads the technical services team, developing data science solutions for customers in industries ranging from energy to finance, and education to medicine.
Paweł Zawistowski
Lead Data Scientist at Adform
How good is your model?
Technical track
Bio
Senior Data Scientist working in Adform’s Research, AI & Analytics area and an assistant professor at the faculty of Computer Science at the Warsaw University of Technology. IT wizard specializing in artificial intelligence methods, especially nonlinear regression and classification problems, the subject of his doctorate. He has gained professional experience both in science and research and in commercial projects. He has been analyzing and modeling data since 2008, and since then has participated in various projects, ranging from individual analyses of small data sets, through the development of regression and classification methods in research projects, to the creation of large-scale production systems that query predictive models hundreds of thousands of times per second.
Abstract
In applied data science, you build your model for a specific purpose. Before you are ready to ship it to production or hand it to your customer, the question arises: is it "good enough"? This question is tricky because what it means exactly varies from project to project. Even for a given case, different stakeholders may give different answers, ranging from expected ROI, through good AUC values, to technical aspects like latency and memory requirements. Yet it is crucial to get the answer right if you want your model to thrive. This talk will address model evaluation broadly, touching on subjects like defining acceptance criteria for your model, the importance of baselines, performing evaluation using A/B tests, and other techniques, along with some pitfalls you might encounter. We will approach the subject from a practical perspective: scenarios where it is not obvious how to evaluate your model and where simple comparisons of standard measures like accuracy or MSE do not seem sufficient.
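To make the point about baselines concrete: on imbalanced data, a model's accuracy only means something next to a trivial baseline. The sketch below is our own illustration, not material from the talk; the labels and predictions are made up, and `majority_baseline_accuracy` is a hypothetical helper name (scikit-learn's `DummyClassifier` with `strategy="most_frequent"` provides the same kind of baseline).

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_train: Sequence[Hashable],
                               y_test: Sequence[Hashable]) -> float:
    """Accuracy of always predicting the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [majority] * len(y_test))


# Toy, made-up data: 5% positives in the test set.
y_train = [0] * 90 + [1] * 10
y_test = [0] * 95 + [1] * 5
# A hypothetical model: 3 false positives, 2 true positives, 3 false negatives.
y_pred = [0] * 92 + [1] * 3 + [1] * 2 + [0] * 3

model_acc = accuracy(y_test, y_pred)
baseline_acc = majority_baseline_accuracy(y_train, y_test)
print(model_acc, baseline_acc)  # 0.94 0.95
```

At 94% accuracy the model looks fine in isolation, yet it does worse than always predicting the majority class, which is one reason to state acceptance criteria relative to a baseline rather than as an absolute number.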
Meeta Dash
VP of Product at Verta
Every company can be an AI company
Business track
Bio
Meeta is a passionate, customer-obsessed product leader with a track record of launching innovative products that solve real business problems. As VP of Product at Verta, she is building MLOps tools to help data science teams track, deploy, operate, and monitor models and bring order to AI/ML chaos. Prior to Verta, Meeta held several product leadership roles at Appen, Figure Eight, Cisco Systems, Tokbox/Telefonica, and Computer Associates, with a focus on ML data platforms, voice/conversational AI, and analytics.
Abstract
Artificial Intelligence is set for explosive growth and is impacting the future of every industry and human interaction. With so much hype all around us, here’s the million-dollar question: “Is AI living up to its promise in the enterprise?” The reality is that most companies are still struggling to move from experimentation to production and to justify the business value of AI products. Unsurprisingly, business leaders and technologists have very different views about the current challenges. How do we bring business and technology together and successfully scale AI projects? I will share real-world processes, management techniques, and tools needed for running AI at scale, including but not limited to:
- Taking a business-first approach to AI
- Organizational structure and culture
- Successfully moving from prototype to production
- Techniques and tooling to effectively train, deploy, monitor, and tune machine learning models
- Building the AIOps flywheel to make AI a core part of your business
Olivier Blais
Cofounder, VP Data Science at Moov AI
Validate and Monitor Your AI and Machine Learning Models
Technical track
Bio
Olivier is a data science expert whose expertise and cutting-edge knowledge of AI and machine learning have led him to support many companies’ digital transformations and to implement projects in different industries. Olivier is a laureate of the prestigious “30 under 30” prize. He is co-author of a patent for an advanced algorithm that evaluates the creditworthiness of a borrower.
Abstract
You’ve created a wicked AI or machine learning model that changes the way you do business. Good job. But how do you validate your model and monitor it in the long run? Advanced machine learning and AI models are getting more and more powerful, but they also tend to become more complicated to validate and monitor. This has a major impact on businesses’ adoption of models. Initial validation and monitoring are not only critical to ensure a model’s sound performance; they are also mandatory in some industries, such as banking and insurance. You will learn the best techniques that can be applied manually or automatically to validate and monitor statistical models, and the talk will discuss and demonstrate the techniques used for initial validation as part of a full model validation. Suggested topics for post-talk discussion: model validation, model monitoring, and machine learning use cases in general. The talk is infrastructure-agnostic; examples use Python (mostly TensorFlow or PyTorch). You'll learn a cutting-edge framework that you can't find on Google yet, and we'll show DevOps techniques using open-source packages.
Eugene Khvedchenya
AI/ML Advisor at VITech
Workshops track, Deep learning for satellite image processing
Workshops track
Bio
Eugene is an AI/ML consultant with a strong focus on computer vision. He has over 10 years of experience in the software development industry, with strong technical skills and experience in creating high-load applications. During his career he has worked in a wide spectrum of domains, from cloud computing to edge devices, and from FPGA and C to Python. He is the author of the pytorch-toolbelt library (https://github.com/BloodAxe/pytorch-toolbelt), a member of the core team of the Albumentations library (http://albumentations.ai/), and a contributor to the Catalyst DL framework (https://catalyst-team.github.io/catalyst/). He is a Kaggle Master, ranked in the Kaggle top 100 worldwide, and an author of "Mastering OpenCV with Practical Computer Vision Projects".
Abstract
Satellites generate tremendous amounts of data every day, which helps to spot wildfires, coastline erosion, building damage from natural disasters, and the concentration of nutrients in farm crops from the sky. In this talk I will explain how deep learning can solve these problems. We will go through recent data science competitions on satellite imagery and analyze the know-how behind the top-performing solutions. To get the most out of the session, attendees should have some prior experience with deep learning and image segmentation.
Yann Landrin-Schweitzer
Founder and CEO at Stealth Startup
Mathematically defensible data privacy as a Data Science accelerator.
Business track
Bio
After an early start in mechanical engineering, Yann has been working on Data Science, AI, and Data Engineering at scale since 2003. He has applied these skills in various industrial contexts: in several social media startups, at advertising giants like Yahoo, and at digital content creators and distributors like Netflix.
Abstract
2020 is the time for data privacy. Today, it is complicated for customers to ensure their privacy is maintained, and complicated for companies to use data safely and deliver on privacy expectations. As a result, opportunities to use data for good are missed, at the same time as data misuse is rampant, and data teams struggle to get value out of their data. Data Science teams end up regarding privacy as difficult, obscure and frustrating, an obstacle between them and achieving the goals they are given in their organization. And business leaders end up seeing privacy as purely an exercise in risk mitigation, rather than something that can be a competitive advantage.
Olga Petrova
Machine Learning DevOps Engineer at Scaleway
Active learning: how to reduce the amount of data that needs to be labeled
Technical track
Bio
Olga is a deep learning R&D engineer at Scaleway, the second-largest French cloud provider. Previously, she received her Ph.D. in theoretical physics from Johns Hopkins University and spent several years working as a quantum physicist. Olga’s current interests focus on semi-supervised and active machine learning.
Abstract
Most of the recent advances in the deep learning field come at a high price. The costs involved in developing and training these models are twofold: computing power and training data. Computational resources are becoming increasingly affordable through widespread cloud computing services. Gathering and, especially, manually labeling data, on the other hand, cannot scale in the same way. A common scenario is one in which unlabeled data comes cheap, but the labeling budget is severely limited. Practice shows that not all data is created equal: the choice of which data is prioritized for labeling has a profound effect on the final performance of the resulting model. The task of determining which data samples would be most "informative" when labeled is known as active learning. In this talk, I will present an overview of the active learning approach applied to an image classification problem.
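A common starting point for deciding which samples to label is uncertainty sampling: ask the current model for class probabilities on the unlabeled pool and send the samples it is least sure about to the annotators. The sketch below scores uncertainty with predictive entropy; it is a generic illustration, not necessarily the strategy presented in the talk, and the function names and toy probabilities are our own.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(pool_probs: Sequence[Sequence[float]],
                        budget: int) -> List[int]:
    """Indices of the `budget` unlabeled samples with the highest entropy."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:budget]


# Toy softmax outputs of an image classifier on four unlabeled images.
pool = [
    [0.98, 0.01, 0.01],  # confident
    [0.34, 0.33, 0.33],  # almost uniform: very uncertain
    [0.60, 0.30, 0.10],
    [0.50, 0.50, 0.00],
]
print(select_for_labeling(pool, budget=2))  # [1, 2]
```

In a full active learning loop, the selected images are labeled, added to the training set, and the model is retrained before the next round of selection.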
Colin Gillespie
Data Scientist at Jumping Rivers
Enforcing Standards in a Data Science Workflow
Technical track
Bio
Dr. Colin Gillespie is a Senior Lecturer (Associate Professor) at Newcastle University, UK, and the co-founder of Jumping Rivers. His research interests are high-performance statistical computing and Bayesian statistics. He has given talks at a variety of conferences, including useR, RStudio, the Turing Institute, and ODSC.
Abstract
Many R workflows revolve around packages and git. Typically, they use some form of continuous integration, such as Travis or GitLab CI. The general idea is that R developers are notified if a commit causes the package to fail some checks. This talk will describe the additional, rigorous steps that we add to our checks via the inteRgrate package. Using this package allows us to standardize code style, catch errors more quickly, and produce more readable commits. We will highlight that while imposing these tests can initially slow down progress on a project, overall they lead to a more robust product.
Vladyslava Tyshchenko
Data Analyst at SoftServe
Detecting Biomarkers of Aging using Machine Learning Algorithms
Technical track
Bio
Vladyslava obtained her BS and MS degrees in Software Engineering with honors from Dnipro National University. At SoftServe she works on NLP projects for various industries. She is passionate about computational biology and is involved in research projects on the biology of aging, where she applies machine learning algorithms. She has experience working with biomedical texts and metagenomic and transcriptomic data.
Abstract
Recent advances in the accuracy and diversity of machine learning and deep learning algorithms push researchers around the world to apply them to a variety of fields. One such field is biogerontology, where scientists are trying to answer aging-related questions like "why do we age?" or "can we age more slowly?" from a biological point of view using recent computational methods. In this talk, we will find and analyze potential biomarkers of aging in a gene expression dataset. We will go through important tips for analyzing transcriptomic data, such as data transformations, choosing the right model, stability of feature selection, model explanation, and interpretation of the results.
Ravi Ilango
Senior Data Scientist at a startup in stealth mode
Using Pre-trained NLP Models and Machine Intelligence/AI for automation
Technical track
Bio
Currently working as a founding team member and Senior Data Scientist at a startup in stealth mode. Passionate about developing deployable deep learning solutions.
Abstract
Natural Language Processing (NLP) is one of the fastest-growing segments of deep learning/AI, revolutionizing operational efficiencies in service businesses. NLP focuses on understanding language and uses a variety of techniques and tools, ranging from data engineering (NLP pipelines) and data science to GPUs and pre-trained deep learning models. Top AI companies are using NLP to implement solutions that provide a quantum leap in operational efficiency in service industries. This session will focus on using pre-trained NLP models (BERT, GPT-2) and will include a demo of an NLP pipeline, the PyTorch framework, and spaCy.
Michael Grogan
Data Scientist, TensorFlow and Time Series Specialist (Self-Employed)
Predicting Hotel Cancellations with Machine Learning
Business track
Bio
Michael Grogan is a data scientist with expertise in TensorFlow and time series analysis. He holds a Master’s degree in Economics from University College Cork, Ireland. Much of his work has been in the domain of business intelligence, i.e. using machine learning technologies to develop solutions to a wide range of business problems.
Abstract
Hotel cancellations can cause issues for many businesses in the industry. Not only do cancellations result in lost revenue, but they can also make it difficult to coordinate bookings and adjust revenue management practices. This session will provide a high-level analysis of different feature selection and classification tools, methods for dealing with imbalanced datasets, and interpretable machine learning models. Time series modeling techniques will also be discussed, including models such as ARIMA and LSTM, along with structural time series modeling using the TensorFlow Probability library.
Diego Hueltes
Machine Learning Manager at RavenPack
From the Earth to the Moon: Lessons from the space race to apply in Machine Learning projects
Business track
Bio
I am the Machine Learning Manager at RavenPack in Marbella (Málaga, Spain). I teach in the Big Data & Analytics master's program at ESESA IMF, a degree awarded by Antonio de Nebrija University. I have also taught in the Big Data Executive Program at Escuela de Organización Industrial (EOI), a Spanish business school where I have also been a Big Data mentor.
Abstract
The space race was a competition between the US and the Soviet Union to conquer space. This competition drove the development of space technology at an incredible pace, producing other derivative technologies as a side effect. The race was full of successes on both sides, achieving goals that seemed impossible in record time. From the space race, we can learn lessons to apply to our Machine Learning projects to achieve a higher success rate in a limited amount of time.
Ylan Kazi
VP, Data Science + Machine Learning at UnitedHealth Group
How AI Will Decide Your Fate
Business track
Bio
Ylan Kazi is the Vice President, Data Science and Machine Learning for UnitedHealthcare, based in Minnetonka, Minnesota. He leads a team of high-performing data scientists focusing on improving health outcomes for Medicare patients. Ylan is skilled at leading data science teams that apply machine learning to solve business challenges and deliver business value. He is active in the data science community and serves as an AI/ML advisor to Smart Steward, a company that provides solutions to combat antibiotic resistance and COVID-19. Ylan also writes about how AI will affect humanity at discoveringai.com.
Abstract
My presentation will cover how AI is starting to control how people behave and how it is integrating into our lives. Everything from getting a new job to the legal system to social media is affected by AI, and we are giving AI more and more control over these decisions. I will also show how this can affect us (both positively and negatively) and what people can do about it.
Mark Kurtz
Machine Learning Lead at Neural Magic
Pruning Neural Networks for Success
Technical track
Bio
Experienced Software and Machine Learning Leader with a demonstrated history in the internet industry. Proficient across the full stack for engineering and machine learning. Strong engineering professional with a Master’s Degree focused on Robotics Engineering from Washington University in St. Louis.
Abstract
According to a recent survey, 59% of data scientists are not optimizing their deep learning models for production, despite the performance gains that techniques like quantization and pruning can offer. This is no surprise: model optimizations are hard. But we are here to tell you that we have found a way to make model optimizations easy. In this talk, you will:
- Get an overview of pruning, including its benefits and downsides
- Discover new tools that make pruning easy and successful
- Learn how to prune models for performance in production
- Understand pruning techniques that result in lower deployment costs
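The simplest form of the pruning discussed here is unstructured magnitude pruning: zero out the weights with the smallest absolute values until a target sparsity is reached. The sketch below shows only that masking step on a plain nested list; it is not Neural Magic's tooling, and in practice pruning is usually applied gradually during training or fine-tuning so the network can recover accuracy. The function name and weight values are illustrative.

```python
from typing import List


def magnitude_prune(weights: List[List[float]], sparsity: float) -> List[List[float]]:
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    Unstructured magnitude pruning: every weight whose absolute value is at
    or below the k-th smallest magnitude is set to 0.0, where
    k = int(sparsity * number_of_weights). Ties at the threshold are pruned
    too, so the achieved sparsity can slightly exceed the target.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    magnitudes = sorted(abs(w) for row in weights for w in row)
    k = int(len(magnitudes) * sparsity)
    if k == 0:
        return [row[:] for row in weights]
    threshold = magnitudes[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


layer = [[0.5, -0.1, 0.3],
         [-0.05, 0.9, -0.2]]
print(magnitude_prune(layer, sparsity=0.5))
# [[0.5, 0.0, 0.3], [0.0, 0.9, 0.0]]
```

Zeroed weights typically only reduce latency or deployment cost when the inference runtime can exploit the sparsity; on a dense runtime the pruned model runs at the same speed.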
Andrii Burlutskyi
Head of Marketing at Master of Code Global
Oleg Boguslavskyi
Co-Owner, Data Science UA
Philipp Kofman
Deep Learning researcher
Oksana Kurylo
Director, Women Who Code Kyiv
Access 3 livestreamed tracks: Technical track, Workshops track, and Business track
Network with participants from all over the world
Access to conference recordings (72 hours of content)
Upskill through our workshops
Get answers to your questions from top speakers during Q&A sessions
Listen to experts during the panel discussion
Upon request, assistance in finding a job in the best Ukrainian projects and companies
A few days before the event, we will send you access to the streaming system.
All reports will be available to participants after the end of the Conference. We will send them within a few days of the event.
25% — for students and teachers. In order to get a discount promo code, please send a photo of a student ticket/document to
Buy tickets with your friends — get a group discount!
5% — from 2 tickets
7% — from 3 tickets
10% — from 5 tickets
Data Science UA is a Ukrainian company that was established in 2016 in Kyiv, Ukraine.
Over the years we’ve built an ecosystem around the community of 5000+ professionals in data science and AI, which allows us to provide:
– high-quality recruitment (150+ closed senior-level positions);
– consulting for companies in Ukraine and around the world;
– mentorship programs;
– opening AI R&D Centers.
We’ve organized 8 international Data Science UA conferences with 6000+ attendees and are now launching a new format: the international online 9th Data Science UA Conference.
We provide a 25% ticket discount for students and teachers. To get a discount promo code, send a photo of a student card or a document confirming that you work as a professor to info@data-science.com.ua. We will send you a promo code, which must be applied when buying the ticket. Group discounts: 5% for 2 or more tickets, 7% for 3 or more tickets, 10% for 5 or more tickets. Choose the desired number of tickets on the ticket sales page, and the discount will apply automatically. Please note that discounts do not apply to the ‘First 100’ price.
To purchase tickets with cashless payment, send an e-mail to info@data-science.com.ua with the necessary information:
– The legal name of the company
– Personal info to create a ticket (name, last name, phone, position, mail)
– Bank details (requisites) and number of tickets
Discounts
5% — from 2 tickets, 7% — from 3 tickets, 10%— from 5 tickets.
25% ticket discount for students and teachers.
Send your CV to cv@data-science.com.ua, we might know some projects you will like.
We are always ready to answer your questions, advise you, or point you in the right direction. We love working with people and aim to make every interaction as effective and comfortable as possible.
We set up AI R&D centers in Ukraine and provide full support for their startup and operations. Powered by our DS&AI community, the largest in Ukraine, we can hire the engineering team and get it going in a matter of weeks.
We know almost all of the top engineering talent in Ukraine personally, so we are always the first to know about new job opportunities and job seekers. This allows us to help engineers find interesting projects, while companies can use our recruiting services to find talented professionals within 2 weeks of starting the search.
Many companies, such as Ring, Grammarly, Samsung, DataRobot, and Snap, already have their R&D centers here, as Ukraine is the number one software development destination in Central and Eastern Europe and the 4th-largest exporter of IT products and services in the world.
Please send your inquiry to info@data-science.com.ua and we will get back to you.
We help companies and individuals all over the world to design and implement solutions for data-driven decision making