A Winter of Discontent has shifted over the last few months to a Spring of hope. Many countries (and in the US, most states) are now actively vaccinating their populace against Covid-19, unemployment is dropping dramatically, and people are beginning to plan for the post-epidemic world.
As I write this, Norwescon, a science fiction conference held annually in Seattle, Washington since 1976, has wrapped up its first virtual incarnation. With a focus on science fiction writing and futurists, this convention has long been a favorite of mine, and a chance to talk with professional authors, media producers, and subject matter experts. This year, that whole process was held across Airmeet, which (a few minor glitches aside) performed admirably in bringing that experience to the computer screen.
It is likely that the year 2020 will be seen in retrospect as the Great Reset. While many of the panelists and audience members expressed a burning desire to see the end of the pandemic and the opportunity to meet in person, what surprised me was the fact that many also expressed the desire to continue with virtual conferences as an adjunct to the experience, rather than simply going back to the way things had been.
This theme is one I’ve been hearing echoed repeatedly outside this venue as well. It is almost certain that post-Covid, the business place will change dramatically to a point where work becomes a place to meet periodically, where the workweek of 9-5 days will likely transform into a 24-7 work environment where some times are considered quieter than others, and where the obsession with butts-in-seats gets replaced by a more goal-oriented view where reaching demonstrable objectives becomes more important than attendance.
This conference also laid bare another point – your audience, whether as a writer, a creative, a business, or any other organization, is no longer geographically bound. For the first time, people attended this convention from everywhere in the world, even as people who traditionally hadn’t been able to come locally because of health issues were able to participate this year. The requirements of geo-physicality have long been a subtle but significant barrier for many such people, and the opportunity for this segment of the population to attend this year meant that many more points of view were presented than would have been otherwise. This too illustrates that, for all the previous talk of inclusiveness, the rise of the digital society may mark the first time that such unacknowledged barriers are actively being knocked over.
Finally, a theme that seemed to permeate the convention this year is that increasingly we are living in the future we visualized twenty years ago. Science fiction is not about “predicting” the future, despite the perception to the contrary. It is, instead, a chance to explore what-if scenarios, to look at the very real human stories as they are impacted by changes in technology. During one (virtual) hallway conversation I had, noted Philip K. Dick Award-winning author PJ Manney made the point that ultimately the technology, while important in science fiction, is not the central focus of writers in this genre. Rather, the stories being written explore the hard questions about what it means to be human in a world of dramatic change, something that anyone who works in this space should always keep in the back of their mind.
This is why we run Data Science Central, and why we are expanding its focus to consider the width and breadth of digital transformation in our society. Data Science Central is your community. It is a chance to learn from other practitioners, and a chance to communicate what you know to the data science community overall. I encourage you to submit original articles and to make your name known to the people that are going to be hiring in the coming year. As always let us know what you think.
Part 2 of this short series, available here, focused on fundamental techniques. In this Part 3, you will find several machine learning tricks and recipes, many with a statistical flavor. These are articles that I wrote in the last few years. The whole series will feature articles related to the following aspects of machine learning:
Mathematics, simulations, benchmarking algorithms based on synthetic data (in short, experimental data science)
Opinions, for instance about the value of a PhD in our field, or the use of some techniques
Methods, principles, rules of thumb, recipes, tricks
My articles are always written in simple English and accessible to professionals with typically one year of calculus or statistical training, at the undergraduate level. They are geared towards people who use data but are interested in gaining more practical analytical experience. Managers and decision makers are part of my intended audience. The style is compact, geared towards people who do not have a lot of free time.
Despite these restrictions, state-of-the-art, off-the-beaten-path results as well as machine learning trade secrets and research material are frequently shared. References to more advanced literature (from myself and other authors) are provided for those who want to dig deeper into the topics discussed.
1. Machine Learning Tricks, Recipes and Statistical Models
These articles focus on techniques that have wide applications or that are otherwise fundamental or seminal in nature.
Statistics: New Foundations, Toolbox, and Machine Learning Recipes
Available here. In about 300 pages and 28 chapters it covers many new topics, offering a fresh perspective on the subject, including rules of thumb and recipes that are easy to automate or integrate in black-box systems, as well as new model-free, data-driven foundations to statistical science and predictive analytics. The approach focuses on robust techniques; it is bottom-up (from applications to theory), in contrast to the traditional top-down approach.
The material is accessible to practitioners with a one-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications with numerous illustrations, is aimed at practitioners, researchers, and executives in various quantitative fields.
Applied Stochastic Processes
Available here. Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems (104 pages, 16 chapters.) This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject.
It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.
About the author: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at DataShaping.com, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). You can access Vincent’s articles and books, here.
Limitations on physical interactions throughout the world have reshaped our lives and habits. And while the pandemic has been disrupting the majority of industries, e-commerce has been thriving. This article covers how reinforcement learning for dynamic pricing helps retailers refine their pricing strategies to increase profitability and boost customer engagement and loyalty.
In dynamic pricing, we want an agent to set optimal prices based on market conditions. In terms of RL concepts, the actions are all of the possible prices, and the states are the market conditions, excluding the current price of the product or service.
Usually, it is very difficult to train an agent through interaction with a real-world market. The reason is that an agent needs to gather lots of samples from the environment, which is a very time-consuming process. There is also an exploration-exploitation trade-off: an agent should visit a representative subset of the whole state space, trying out different actions. Consequently, an agent will act sub-optimally while training and could lose a lot of money for a company.
An alternative approach is to use a simulation of the environment. Using a forecasting model, we can compute the reward (for example, income) from the state (market conditions, except the current price) and the action (the current price). So, we only need to model transitions between states. This task strongly depends on the state representation, but it usually requires only a few modelling assumptions. The main drawback of the RL approach is that it is extremely hard to simulate a market accurately.
Sales data
For simplicity, we use simulated sales rather than real ones. Sales data are simulated as a sum of a price-dependent component, a highly seasonal component dependent on time and a noise term. To get a seasonal component, we use the Google Trends data of a highly seasonal product – a swimming pool. Google Trends provides weekly data for over five years. There is a clear one-year seasonality in the data, so it is easy to extract it and use it as the first additive term for sales. Since this term repeats every year, it is a function of the week number, ranging from 0 to 52.
The second term depends on prices from the current timestamp as well as the previous timestamp to model sales. The overall formula looks like this:
Here, f may be any monotonically decreasing function. The intuition is that if all other features are fixed, increasing the company’s price (either current or previous) decreases sales. On the other hand, increasing competitors’ prices leads to increased sales. The random term is sampled from a zero-mean normal distribution.
We set the function f to be linear with a negative coefficient. This allows us to analytically find a greedy policy and compare it with the RL agent’s performance.
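A minimal sketch of such a simulator and of the resulting greedy price is given here. The coefficients, the sinusoidal stand-in for the Google Trends seasonal curve and all function names are hypothetical; only the overall structure (seasonal term plus a linear, decreasing price term plus zero-mean Gaussian noise) comes from the description above.

```python
import math
import random

# Hypothetical coefficients: the text only states that f is linear, decreasing
# in the company's own prices and increasing in competitors' prices.
BASE_DEMAND = 100.0
OWN_PRICE_COEF = -2.0     # effect of the current price
PREV_PRICE_COEF = -0.5    # effect of the previous week's price
COMPETITOR_COEF = 1.0     # effect of the competitor's price
NOISE_STD = 5.0


def seasonal_term(week):
    """Stand-in for the yearly seasonal curve (zero at week 0, peak at week 26)."""
    return 50.0 * (1.0 - math.cos(2.0 * math.pi * week / 52.0)) / 2.0


def expected_sales(week, price, prev_price, competitor_price):
    """Seasonal term plus a linear price-dependent term, floored at zero."""
    linear = (BASE_DEMAND
              + OWN_PRICE_COEF * price
              + PREV_PRICE_COEF * prev_price
              + COMPETITOR_COEF * competitor_price)
    return max(0.0, seasonal_term(week) + linear)


def simulate_sales(week, price, prev_price, competitor_price, rng):
    """Expected sales plus zero-mean Gaussian noise, floored at zero."""
    noise = rng.gauss(0.0, NOISE_STD)
    return max(0.0, expected_sales(week, price, prev_price, competitor_price) + noise)


def greedy_price(week, prev_price, competitor_price):
    """Price that maximises this week's expected income (price * sales).

    With sales = intercept + b * price and b < 0, income is a downward
    parabola in price whose maximum is at -intercept / (2 * b).
    """
    intercept = (seasonal_term(week) + BASE_DEMAND
                 + PREV_PRICE_COEF * prev_price
                 + COMPETITOR_COEF * competitor_price)
    return max(0.0, -intercept / (2.0 * OWN_PRICE_COEF))
```

The greedy price is myopic: it ignores the effect of this week's price on next week's sales through the previous-price term, which is one reason a learning agent may do better.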
Experiments
We treat the dynamic pricing task as an episodic task with a one-year duration, consisting of 52 consecutive steps. We assume that competitors change their prices randomly.
We compare different agents by running 500 simulations and collecting cumulative rewards over 52 weeks. The graph below shows the performance of the random and greedy agents.
Tabular Q-learning
Q-learning is an off-policy temporal difference control algorithm. Its main purpose is to iteratively find the action values of an optimal policy (optimal action values).
Using these action values, we can easily find an optimal policy. It would be any greedy policy with respect to optimal action values. The following update formula is used:
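In standard textbook notation (assumed here), with step size \alpha, discount factor \gamma, and S_t, A_t, R_{t+1} denoting the state, action and reward at step t, the Q-learning update can be written as:

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right]
```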
The estimates converge to the optimal action values, independent of the policy used (usually epsilon-greedy with respect to the current estimates is used). This update formula can also be treated as an iterative way of solving Bellman optimality equations.
This algorithm assumes a discrete state space and action space. Accordingly, before running this algorithm, we should discretise continuous variables into bins. The name “tabular” means that action values are stored in one huge table. Memory usage and training time grow exponentially with the increase in the number of features in the state representation, making it computationally intractable for complex environments (for example Atari games).
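The discretisation and the table can be sketched as follows. This is an illustrative outline, not the implementation used in the experiments; the class name, the bin boundaries and the hyperparameter defaults are assumptions.

```python
import random
from collections import defaultdict


def discretise(value, low, high, n_bins):
    """Map a continuous value to a bin index in [0, n_bins - 1]."""
    if value <= low:
        return 0
    if value >= high:
        return n_bins - 1
    return min(n_bins - 1, int((value - low) / (high - low) * n_bins))


class TabularQ:
    def __init__(self, n_actions, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha        # step size
        self.gamma = gamma        # discount factor
        self.epsilon = epsilon    # exploration probability
        self.rng = random.Random(seed)
        # The "table": one row of action values per visited discrete state.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def act(self, state):
        """Epsilon-greedy action with respect to the current estimates."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return values.index(max(values))

    def update(self, state, action, reward, next_state, done):
        """One Q-learning step towards reward + gamma * max_a Q(next_state, a)."""
        target = reward
        if not done:
            target += self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

A state here would be a tuple of bin indices, for example `(week, discretise(competitor_price, 10.0, 50.0, 8))`. Each extra feature multiplies the number of possible rows by its bin count, which is the exponential growth mentioned above.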
As we can see, this approach outperforms a random agent, but cannot outperform a greedy agent.
Deep Q-network
The deep Q-network (DQN) algorithm is based on the same idea as Tabular Q-learning. The main difference is that DQN uses a parametrised function to approximate optimal action values. More specifically, DQN uses artificial neural networks (ANNs) as approximators. Depending on the state representation, both convolutional neural networks and recurrent neural networks can be used.
The optimisation objective at iteration i looks as follows:
where:
The behaviour distribution p is obtained by acting epsilon-greedy with respect to the previous model’s parameters.
The gradient of the objective is as follows:
The loss function is optimised by stochastic gradient descent. Instead of computing a full expectation, a single-sample approximation can be used, leading to an updated Q-learning formula.
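In the notation of the original DQN paper (Mnih et al., 2013), which this description appears to follow, the objective, its target and its gradient can be written as below. This is an assumed reconstruction in that paper's notation: \rho is the behaviour distribution (p in the text), \gamma the discount factor, and \theta_{i-1} the parameters from the previous iteration, held fixed when differentiating.

```latex
L_i(\theta_i) = \mathbb{E}_{s,a \sim \rho(\cdot)} \left[ \left( y_i - Q(s,a;\theta_i) \right)^2 \right]

y_i = \mathbb{E}_{s'} \left[ r + \gamma \max_{a'} Q(s',a';\theta_{i-1}) \,\middle|\, s,a \right]

\nabla_{\theta_i} L_i(\theta_i) = -2\, \mathbb{E}_{s,a \sim \rho(\cdot);\, s'} \left[ \left( r + \gamma \max_{a'} Q(s',a';\theta_{i-1}) - Q(s,a;\theta_i) \right) \nabla_{\theta_i} Q(s,a;\theta_i) \right]
```

The constant factor is usually dropped and absorbed into the learning rate. Replacing the expectation with a single sampled transition gives a stochastic gradient step of the same form as the tabular Q-learning update.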
Two problems that make the optimisation process harder are correlated input data and the dependence of the target on the model’s parameters. Both problems are tackled by using an experience replay mechanism. At each step, it saves the latest transition into a buffer; then, instead of using a single transition to update the parameters, a full batch sampled from the buffer is used.
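A replay buffer of this kind can be sketched in a few lines. The class and method names are illustrative, and each stored item is assumed to be a single transition (state, action, reward, next state, done flag).

```python
import random
from collections import deque


class ReplayBuffer:
    def __init__(self, capacity, seed=0):
        # A bounded deque drops the oldest transitions once capacity is reached.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        """Store one transition observed while acting in the environment."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Uniformly sample a batch of distinct transitions, returned column-wise."""
        batch = self.rng.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Uniform sampling breaks the temporal correlation between consecutive steps, and each stored transition can contribute to many parameter updates before it is evicted.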
With DQN you can use higher-dimensional state spaces (even images can be used). Also, it tends to be more sample-efficient than the Tabular approach. The reason is that ANNs can generalise to unseen states, even if the agent does not act from those states. On the other hand, the Tabular approach requires the whole state space to be visited. DQN is also sample-efficient because it uses experience replay, which allows a single sample to be used multiple times.
As we can see, DQN outperforms all other agents. Also, it was trained on a smaller number of episodes.
Policy Gradients
The policy gradients algorithm uses an entirely different idea to learn an optimal policy. Instead of learning optimal action values and moving greedily with respect to them, policy gradients directly parametrise and optimise a policy. ANNs are often used to parametrise a policy.
The difficulty here is that an optimisation objective, the state-value of the first state, depends on a dynamics function p, which is unknown.
That is why policy gradients use the fact that the gradient of the objective is an expected value of a random variable, which is approximated while acting in the environment.
We can subtract a baseline, which depends only on the state, from action value estimates to reduce variance. This does not affect the mean but can significantly reduce the variance, thus speeding up the learning process.
Usually, state-value estimates are used as a baseline.
Then stochastic gradient ascent is done using the following formula:
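In textbook notation (REINFORCE with a baseline, assumed here), with G_t the return from step t, b(S_t) the baseline, \pi the policy parametrised by \theta, and \alpha the step size, the update can be written as:

```latex
\theta \leftarrow \theta + \alpha \left( G_t - b(S_t) \right) \nabla_{\theta} \ln \pi(A_t \mid S_t, \theta)
```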
This method requires a lot of interaction with an environment in order to converge.
Policy gradients outperform a greedy agent but do not perform as well as DQN. Likewise, policy gradients require far more episodes than DQN.
Please note that a real-world market environment and the dependencies in it are far more complicated. This article covers a basic simulation showing that the approach works and can be applied to real data.
‘Autonomous vehicle’ is a buzzword that’s been circulating in recent decades. However, the development of such a vehicle has posed a significant challenge for automotive manufacturers. This article describes how deep learning for autonomous driving and navigation can help to turn the concept into a long-awaited reality.
The low-touch economy in a post-pandemic world is driving the introduction of autonomous technologies that can satisfy our need for contactless interactions. Whether it’s self-driving vehicles delivering groceries or medicines or robo-taxis driving us to our desired destinations, there’s never been a bigger demand for autonomy.
Self-driving vehicles have six different levels of autonomy, from drivers being in full control to full automation. According to Statista, the market for autonomous vehicles in levels 4 and 5 will reach $60 billion by 2030. The same research indicates that 73% of the total number of cars on our roads will have at least some level of autonomy before fully autonomous vehicles are introduced.
Countries and automobile companies around the world are working on bringing a higher level of unmanned driving to a wider audience. South Korea has recently announced it is to invest around $1 billion in autonomous vehicle technologies and introduce a level 4 car by 2027.
Machine learning and deep learning are among the technologies that enable more sophisticated autonomous vehicles. Applications of deep learning techniques in self-driving cars include:
Scene classification
Path planning
Scene understanding
Lane and traffic sign recognition
Obstacle and pedestrian detection
Motion control
Deep learning for autonomous navigation
Deep learning methods can help to address the challenges of perception and navigation in autonomous vehicle manufacturing. When a driver navigates between two locations, they drive using their knowledge of the road, what the streets look like, the traffic lights, and so on. It is a simple task for a human driver, but quite a challenge for an autonomous vehicle.
Here at ELEKS, we’ve created a demo model that can help vehicles to navigate the environment as humans already do – using eyesight and previous knowledge. We came up with a solution that offers autonomous navigation without GPS and vehicle telemetry by using modern deep learning methods and other data science possibilities.
We used only an on-dash camera and street view dataset of the city of Lviv, Ukraine; we used no GPS or sensors. Below is an overview of the techniques applied and our key findings.
1. Image segmentation task
We used a Cityscapes dataset with 19 classes, which focuses on buildings, road, signs, etc., and an already trained model from DeepLab. The model we used is based on Xception inference. Other models with different maps/IoUs are also available.
The final layers were the semantic probabilities, with a dimension of roughly classes*output_image_dims (without channels), so they could be filtered and used as the input to the similarity model. It is recommended to transform them into an embedding layer or to find a more suitable layer before the outputs. However, even after the transformation, an object’s position in the frame (higher or lower) and the distance to it may have affected the robustness of the embedding.
2. Gathering additional data and labelling
We then downloaded raw photos of the streets from the web, along with road names and locations (coordinates, etc.), and we also obtained a Street View API key for downloading. We added labels in semi-automated ways based on the names and locations and verified them manually. We created pairs of images for similarity model training.
Finally, we applied image augmentation (also adding photos taken at different times of day and in different seasons) and model-assisted image labelling (for example, adding negative samples that the model recognises as similar but that are not located on the same street according to GPS, street names, etc.). As a result, we created a dataset containing approximately 8-12K augmented images.
3. Similarity models ideation and validation
We tested several street view comparison approaches, from classical descriptor and template matching to modern SOTA DL algorithms like QATM. The most accurate was an inference model that produces a representation of each segmented image in a pair (such as VGG, ResNet or EfficientNet), followed by a binary classifier (xgb or rf). Validated accuracy was approximately 82.5% (whether the right street was found or not), taking into account Lviv’s best-known streets between 2011 and 2019 and with augmentation (changing image shapes, lighting, etc.).
4. Outcome and performance features
We segmented every tenth frame, which allowed near real-time calculation, since the environment does not change much in the space of 10 frames (1/3 s). The DeepLab models showed >70 mIoU (Cityscapes, third semantic mat – buildings). Prediction time with the Xception-based model ranged from 15 s to more than 10 min on CPU and was roughly under 1 s on GPU.
The similarity prediction took about 1 min per 100 pairs (inference on a GPU with 4 GB of VRAM plus the classifier on 6 CPU cores). It can be optimised, after the first estimated positions, by limiting the search to nearby street views only, because the vehicle cannot travel more than 1 km in 10-50 frames.
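The search-space restriction can be sketched as a simple radius filter around the last estimated position. The record layout (dicts with "lat" and "lon" keys) and the function names are assumptions made for illustration, not the structure used in the demo.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_candidates(last_position, street_views, max_km=1.0):
    """Keep only street-view records within max_km of the last estimate."""
    lat, lon = last_position
    return [view for view in street_views
            if haversine_km(lat, lon, view["lat"], view["lon"]) <= max_km]
```

Only the pairs formed with the surviving candidates then need to go through the similarity model, which reduces the per-frame cost roughly in proportion to the share of the city that falls outside the radius.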
Not all of the city’s streets were covered, so we found videos with a drive around the city centre. For the map positioning, we used wiki maps; however, other maps can be used if needed. We got the vehicle coordinates from street image metadata (lat/long, street name).
Some street segments are available in a few different versions (the same location in 2011, 2015 or 2019, photos from different sources, etc.), so the classifier can find any of them. We used mostly weak affine transformations for the street augmentation, with no flipping or strong colour and shape changes.
Some of the estimations may be inaccurate for the following reasons:
Street and road estimation – the static object area is low, street noise is quite high (vehicles, pedestrians) or seasonality changes (trees, snow, rain, etc.)
Vehicle position and speed errors – the same street position and different street step or Euclidean distance for curved streets can be viewed with a different focus (distance to an object), etc.
You can check out a video sample of prerecorded navigation with post-processing here.
Data is more present and more powerful than ever. It can be tapped to tailor products, services and experiences. It contains insights on all manner of things; from shopping and travel habits, to music preferences, to clinical drug trial efficiency. And, critically for businesses, it can improve operational efficiency, customer conversion and brand loyalty. DataOps can help developers facilitate data management to add real value to businesses and customers alike.
Because it can come in many different forms and there’s so much of it, data is a messy mass to handle.
Modern data analytics requires a high level of automation in order to test validity, monitor the performance and behaviour of data pipelines, track data lineage, detect anomalies that might indicate a quality issue and much more besides.
DataOps is a methodology created to tackle the problem of repeated, mundane data processing tasks, thus making analytics easier and faster, while enabling transparency and quality-detection within data pipelines. Medium describes DataOps’ aim as “to reduce the end-to-end cycle time of data analytics, from the origin of ideas to the literal creation of charts, graphs and models that create value”.
So, what are the DataOps principles that can boost your business value?
What’s the DataOps Manifesto?
DataOps relies on much more than just automating parts of the data lifecycle and establishing quality procedures. It’s as much about the innovative people using it, as it is a tool in and of itself. That’s where the DataOps Manifesto comes in. It was devised to help businesses facilitate and improve their data analytics processes. The Manifesto lists 18 core components, which can be summarised as the following:
Enabling end-to-end orchestration and monitoring
Focus on quality
Introducing an Agile way of working
Building a long-lasting data ecosystem that will continuously deliver and improve at a sustainable pace (based on customer feedback and team input)
Developing efficient communication between (and throughout) different teams and their customers
Viewing analytics delivery as ‘lean manufacturing’ which strives for improvements and facilitates the reuse of components and approaches
Choosing simplicity over complexity
DataOps enables businesses to transform their data management and data analytics processes. By implementing intelligent DataOps strategies, it’s possible to deploy massive disposable data environments where it would have been impossible otherwise. Additionally, following this methodology can have huge benefits for companies in terms of regulatory compliance. For example, DataOps combined with migration to a hybrid cloud allows companies to safeguard the compliance of protected and sensitive data, while taking advantage of cloud cost savings for non-sensitive data.
DataOps and the data pipeline
It is common to imagine the data pipeline as a conveyor belt-style manufacturing process, where the raw data enters one end of the pipeline and is processed into usable forms by the time it reaches the other end. Much like a traditional manufacturing line, there are stringent quality and efficiency management processes in place along the way. In fact, because this analogy is so apt, the data pipeline is often referred to as a “data factory”.
This refining process delivers quality data in the form of models and reports, which data analysts can use for the myriad reasons mentioned earlier, and far more beyond those. Without the data pipeline, the raw information remains illegible.
The key benefits of DataOps
The benefits of DataOps are many. It creates a much faster end-to-end analytics process for a start. With the help of Agile development methodologies, the release cycle can occur in a matter of seconds instead of days or weeks. When used within the environment of DataOps, Agile methods allow businesses to flex to changing customer requirements—particularly vital nowadays—and deliver more value, quicker.
A few other important benefits are:
Allows businesses to focus on important issues. With improved data accuracy and less time spent on mundane tasks, analytics teams can focus on more strategic issues.
Enables instant error detection. Tests can be executed to catch data that’s been processed incorrectly, before it’s passed downstream.
Ensures high-quality data. Creating automated, repeatable processes with automatic checks and controlled rollouts reduces the chances that human error will end up being distributed.
Creates a transparent data model. Tracking data lineage, establishing data ownership and sharing the same set of rules for processing different data sources creates a semantic data model that’s easy for all users to understand, so data can be used to its full potential.
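As an illustration of the automated checks mentioned in the list above, a pipeline stage might validate each record before handing it downstream. This is a generic sketch, not part of any particular DataOps tool; the function name, the field names and the example rule are hypothetical.

```python
def validate_rows(rows, required_fields, checks):
    """Split rows into (valid, rejected) before they are passed downstream.

    rows            -- list of dicts, one per record
    required_fields -- field names that must be present and not None
    checks          -- mapping of rule name to a predicate taking a row
    rejected is a list of (row, reason) pairs so failures can be traced.
    """
    valid, rejected = [], []
    for row in rows:
        missing = [f for f in required_fields if row.get(f) is None]
        if missing:
            rejected.append((row, "missing: " + ", ".join(missing)))
            continue
        failed = [name for name, check in checks.items() if not check(row)]
        if failed:
            rejected.append((row, "failed: " + ", ".join(failed)))
        else:
            valid.append(row)
    return valid, rejected


# Example: orders must have an id and a non-negative amount.
orders = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": -5.0},
    {"order_id": None, "amount": 3.0},
]
rules = {"non_negative_amount": lambda r: r["amount"] >= 0}
good, bad = validate_rows(orders, ["order_id", "amount"], rules)
```

Keeping the rejection reason next to each rejected row supports the lineage and transparency goals: it is clear which rule stopped which record, and at which stage.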
So how is DataOps implemented within an organisation? There are four key stages within the DataOps roadmap, illustrated in simple form below.
In these uncertain and unprecedented times due to the COVID-19 outbreak, more and more businesses are witnessing a slow-down in their operations. However, the construction market is continuing to be resilient in spite of the tremendous challenges brought about by COVID-19 pandemic.
When it comes to construction, drive-thru strategies and working from home are not feasible, because firms need to run job sites. Artificial intelligence (AI) is one technology that is helping the construction industry to sustain itself in these trying times. According to a report published by Research Dive, the COVID-19 pandemic has negatively impacted the global AI in construction market.
Transforming the Construction Industry
Artificial intelligence in the realm of the engineering & construction industry is renovating construction into ‘artificial construction.’ In this pandemic situation, AI is leading to real-time VR (Virtual Reality) construction models and reducing errors. The major market players are implementing several business strategies in the AI in construction market to sustain themselves in these trying times. For instance, Vinnie, a construction-trained AI engine by smartvid.io, can see whether workers are close to each other during the COVID-19 pandemic through the introduction of its novel ‘People in Group’ analytics.
Today, AI has become a common tool in the construction industry for carrying out many construction activities. In addition, many big construction companies across the globe are rapidly adopting AI, as it has a multitude of applications. AI can accurately evaluate the cost overrun of a project on the basis of factors such as the type of contract, the project size and the managers’ level of competence, and it can moderate risk via self-driving machinery and equipment. Thus, there’s no reason for AI not to be part of the toolset of any construction firm.
The activities detected by artificial intelligence in construction in every image include:
Excavation
Foundation
Demolition
Trench Work
Concrete Pour
Structural Steel
Mechanical, Electrical, and Plumbing
Finish work
New COVID-19 Tags of AI in Construction
The novel tags for construction activities that AI introduced in response to the coronavirus pandemic provide opportunities for improved and more efficient workflows. AI can automatically assign risk ratings on the basis of the hazards detected in combination with circumstantial tags. In addition, during an ongoing demolition, you can dive more deeply into the images, find any demolition photo with a quick search, and create an observation. The rules to follow during this pandemic situation can also be established within these contexts.
Artificial intelligence has already set a benchmark in the construction industry. Unlike humans, who lose focus and tire after identifying hazards in a large number of images, AI equipment never flags in finding hazards quickly and accurately. Besides, with the new tags for COVID-19, AI can now pick up some of the context that people grasp almost without thinking. The better AI gets at construction activities, the more information it will provide for identifying risks and preventing hazardous incidents.
The new tags of AI in construction for COVID-19 include:
Worker in Cab – These photos show someone operating a machine or driving a vehicle.
Worker at Height – These images show someone on a ladder, in a lift, or near an unprotected edge.
Workers in Groups – This is a way to identify whether workers are standing too close to each other in groups and not practising “social distancing” in the new era of coronavirus.
Today, the vast majority of construction firms big or small are relying more on technology such as artificial intelligence to sustain. The key players of AI in construction industry are advancing AI-trained equipment and engines to effectively run their businesses in the coronavirus chaos. Thus, in reality, this has become a necessity for the firms to innovate technology in order to keep the business afloat amid these uncertain times.
About Us: Research Dive is a market research firm based in Pune, India. Maintaining the integrity and authenticity of its services, the firm provides services that are solely based on its exclusive data model, compelled by the 360-degree research methodology, which guarantees comprehensive and accurate analysis. With unprecedented access to several paid data resources, a team of expert researchers, and a strict work ethic, the firm offers insights that are extremely precise and reliable. Scrutinizing relevant news releases, government publications, decades of trade data, and technical & white papers, Research Dive delivers the required services to its clients well within the required timeframe. Its expertise is focused on examining niche markets, targeting their major driving factors, and spotting threatening hindrances. Complementarily, it also has a seamless collaboration with major industry aficionados, which further gives its research an edge.
The adoption of technology in the Manufacturing Industry has been slow, but steady. Technology adoption has been relatively faster when it has helped improve productivity, boost quality, and reduce costs. Industry CXOs are convinced that IT has a major role to play in manufacturing, but less than 30 percent of manufacturers have adopted Industry 4.0 technologies.[i] Now, with COVID-19 disrupting supply chains, manufacturers are being forced to examine virtualization and automation opportunities for their plants and MES in a bid to make them more resilient.
The problem is that many manufacturing organizations have created home-grown tools around Manufacturing Operations Management (MOM). These solutions cannot withstand the shock of COVID-19 type disruptions. They need to evolve into Smart Manufacturing systems. This is why now is a good time to invest in a full-fledged MES that leads to a connected, transparent, optimized and agile organization.
There are pockets in the manufacturing sector that appreciate the potential of MES. However, most manufacturers are still improving their understanding of MES and of how it can deliver benefits across the supply chain.
At KONE, the Finnish engineering services organization known for its moving walkways and elevators, MES has been used as a starting point for its transformation to a digital factory. “MES is not only about tool implementation,” says Martin Navratil, Director, Manufacturing Network Development, who has been implementing MES at KONE for a few years now. “It is about the commitment of leadership and change management.”[ii] KONE embedded MES into its manufacturing strategy to access real-time data while executing processes in its warehouse or during production on the shop floor. The availability of continuous data (during a shift, day-wise, week-wise, etc.) has improved efficiency and responsiveness to customer needs. Navratil says that MES has made an impact in four areas:
Driving collaborative innovation: MES is the foundation for bringing digital competencies into the organization by synchronizing processes, tools, materials, equipment and people on a global scale.
Enabling a service mindset: MES connects geographically dispersed factories, putting an end to unsustainable and fragmented systems. The flexibility it provides supports a service mindset.
Building customer-centric solutions: MES minimizes cycle time and improves responsiveness. It also provides data to continuously drive a Lean and Six Sigma culture to improve quality.
Fast and smart execution: MES provides maximum transparency to customers regarding deliveries. Real-time data is visually available to everyone, allowing the organization to put the customer at the center of its operations, reduce time to market and create customer trust.
MES places real-time data into the hands of the organization, allowing it to become intelligent and exploit opportunities for faster improvement. Not only can supervisors monitor production during a shift (to achieve targets or improve asset utilization), but they can also transfer the granular data to other parts of the organization, such as the maintenance and engineering teams, for faster service response and to enable continuous improvement. The data also introduces strong traceability, leading to excellence in delivery. “MES has brought many more opportunities to achieve better results,” observes Navratil.
For manufacturing organizations, MES is strategic to changing the way of working and to increasing technological maturity, an essential precondition for the adoption of Industry 4.0 technologies. KONE provides an example of how MES can impact the organization and make it future-ready.
Last week, I saw a nice presentation on Probabilistic Programming from a student in Iran (link below).
I am interested in this subject for my teaching at the University of Oxford. In this post, I provide a brief introduction to probabilistic programming. Probabilistic programming is a programming paradigm designed to implement and solve probabilistic models. It unites probabilistic modeling with traditional general-purpose programming.
Probabilistic programming techniques aim to create systems that help make decisions in the face of uncertainty. There are already a number of existing statistical techniques that handle uncertainty, for example Latent Variable Models and Probabilistic Graphical Models.
There are several tools and libraries for probabilistic programming: PyMC3 (Python, backend: Theano), Pyro (Python, backend: PyTorch), Edward (Python, backend: TensorFlow), Turing (Julia), and TensorFlow Probability.
While probabilistic programming techniques are powerful, they are relatively complex for traditional developers. Because variables are assigned a probability distribution, Bayesian techniques are a key element of probabilistic programming. However, because exact Bayesian inference is often mathematically intractable, we use approximate techniques such as Markov Chain Monte Carlo, Variational Inference, and Expectation Propagation.
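To make the idea of Markov Chain Monte Carlo a little more concrete, here is a minimal sketch in plain Python rather than in any of the libraries listed above. It assumes a toy problem that is not from the original post: estimating the bias of a coin after 14 heads and 6 tails, with a uniform prior. The function names, step size, and burn-in length are all illustrative choices.

```python
import math
import random


def log_posterior(theta, heads, tails):
    """Unnormalised log posterior of a coin's bias under a uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # the prior puts no mass outside (0, 1)
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, n_samples=20000, burn_in=1000, step=0.1, seed=0):
    """Random-walk Metropolis sampler; returns the draws kept after burn-in."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    draws = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            draws.append(theta)
    return draws


draws = metropolis(heads=14, tails=6)
posterior_mean = sum(draws) / len(draws)
print(f"estimated posterior mean of the bias: {posterior_mean:.3f}")
```

For this toy model the exact posterior is a Beta(15, 7) distribution with mean 15/22, so the sampled mean can be checked against a known answer. A probabilistic programming library automates the same loop for models where no closed form exists.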
Deep Smoke Segmentation Inspired by the recent success of fully convolutional networks (FCN) in semantic segmentation, we propose a deep smoke segmentation network to infer high quality segmentation masks from blurry smoke images. To overcome large variations in texture, color and shape of smoke appearance, we divide the proposed network into a coarse path and a fine path. The first path is an encoder-decoder FCN with skip structures, which extracts global context information of smoke and accordingly generates a coarse segmentation mask. To retain fine spatial details of smoke, the second path is also designed as an encoder-decoder FCN with skip structures, but it is shallower than the first path network. Finally, we propose a very small network containing only add, convolution and activation layers to fuse the results of the two paths. Thus, we can easily train the proposed network end to end for simultaneous optimization of network parameters. To avoid the difficulty in manually labelling fuzzy smoke objects, we propose a method to generate synthetic smoke images. According to results of our deep segmentation method, we can easily and accurately perform smoke detection from videos. Experiments on three synthetic smoke datasets and a realistic smoke dataset show that our method achieves much better performance than state-of-the-art segmentation algorithms based on FCNs. Test results of our method on videos are also appealing. …
Proximal Meta-Policy Search (ProMP) Credit assignment in Meta-reinforcement learning (Meta-RL) is still poorly understood. Existing methods either neglect credit assignment to pre-adaptation behavior or implement it naively. This leads to poor sample-efficiency during meta-training as well as ineffective task identification strategies. This paper provides a theoretical analysis of credit assignment in gradient-based Meta-RL. Building on the gained insights we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients. By controlling the statistical distance of both pre-adaptation and adapted policies during meta-policy search, the proposed algorithm endows efficient and stable meta-learning. Our approach leads to superior pre-adaptation policy behavior and consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance. Our code is available at https://…/promp. …
Equivariant Transformer (ET) How can prior knowledge on the transformation invariances of a domain be incorporated into the architecture of a neural network? We propose Equivariant Transformers (ETs), a family of differentiable image-to-image mappings that improve the robustness of models towards pre-defined continuous transformation groups. Through the use of specially-derived canonical coordinate systems, ETs incorporate functions that are equivariant by construction with respect to these transformations. We show empirically that ETs can be flexibly composed to improve model robustness towards more complicated transformation groups in several parameters. On a real-world image classification task, ETs improve the sample efficiency of ResNet classifiers, achieving relative improvements in error rate of up to 15% in the limited data regime while increasing model parameter count by less than 1%. …
Joint and Individual Variation Explained (JIVE) Research in several fields now requires the analysis of datasets in which multiple high-dimensional types of data are available for a common set of objects. In particular, The Cancer Genome Atlas (TCGA) includes data from several diverse genomic technologies on the same cancerous tumor samples. In this paper we introduce Joint and Individual Variation Explained (JIVE), a general decomposition of variation for the integrated analysis of such datasets. The decomposition consists of three terms: a low-rank approximation capturing joint variation across data types, low-rank approximations for structured variation individual to each data type, and residual noise. JIVE quantifies the amount of joint variation between data types, reduces the dimensionality of the data, and provides new directions for the visual exploration of joint and individual structure. The proposed method represents an extension of Principal Component Analysis and has clear advantages over popular two-block methods such as Canonical Correlation Analysis and Partial Least Squares. A JIVE analysis of gene expression and miRNA data on Glioblastoma Multiforme tumor samples reveals gene-miRNA associations and provides better characterization of tumor types. …
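As a reading aid, the three-term decomposition described in the abstract can be written out as follows. The symbols are assumptions chosen here for illustration, since the abstract itself does not fix any notation: X_i is the data matrix for the i-th data type, J_i its share of the joint structure, A_i the structure individual to that type, and E_i the residual noise.

```latex
X_i = J_i + A_i + E_i, \qquad i = 1, \dots, k,
\qquad \text{with } J_i \text{ and } A_i \text{ of low rank.}
```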
What is Distributed Artificial Intelligence (DAI)?
Attempts to find a “bullet-proof” definition have not produced a result: the term seems to be slightly ahead of its time. Still, we can analyze the term semantically and derive that distributed artificial intelligence is the same AI (see our effort to suggest an “applied” definition), but partitioned across several computers that are not clustered together (neither data-wise, nor via applications, nor by being granted access to one another in principle). Ideally, distributed artificial intelligence should be arranged so that none of the participating computers has direct access to the data or applications of another computer: the only option left is to transmit data samples and executable scripts via “transparent” messaging. Any deviation from that ideal yields “partially distributed artificial intelligence”, for example distributed data with a central application server, or the inverse. Either way, the result is a set of “federated” models (i.e., models each trained on their own data sources, or each trained by their own algorithms, or both at once).
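The ideal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node names, the toy data, the one-variable least-squares model, and the row-weighted `combine` step are all hypothetical and are not taken from the article. The point is that each node keeps its rows private and emits nothing but a small message.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ModelMessage:
    """The only thing a node ever sends: fitted parameters, never raw rows."""
    node: str
    slope: float
    intercept: float
    n_rows: int


class Node:
    def __init__(self, name, rows):
        self.name = name
        self._rows = rows  # private (x, y) pairs that never leave the node

    def train(self):
        """Ordinary least squares for y = slope * x + intercept on local data."""
        xs = [x for x, _ in self._rows]
        ys = [y for _, y in self._rows]
        x_bar, y_bar = mean(xs), mean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in self._rows)
        slope = sxy / sxx
        return ModelMessage(self.name, slope, y_bar - slope * x_bar, len(self._rows))


def combine(messages):
    """Row-weighted average of the local models (one simple way to merge them)."""
    total = sum(m.n_rows for m in messages)
    slope = sum(m.slope * m.n_rows for m in messages) / total
    intercept = sum(m.intercept * m.n_rows for m in messages) / total
    return slope, intercept


nodes = [
    Node("plant_a", [(0, 1.0), (1, 3.0), (2, 5.0)]),
    Node("plant_b", [(10, 21.0), (11, 23.0), (12, 25.0), (13, 27.0)]),
]
messages = [node.train() for node in nodes]
slope, intercept = combine(messages)
print(messages, slope, intercept)
```

Replacing the private rows with a shared table, or the per-node `train` with a call to a central server, would move this sketch toward the “partially distributed” variants mentioned in the text.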
Distributed AI scenarios “for the masses”
We will not be discussing edge computations, confidential data operators, scattered mobile searches, or similar scenarios that are fascinating but not yet widely or deliberately applied. We will be much “closer to life” if we consider, for instance, the following scenario (its detailed demo can and should be watched here): a company runs a production-level AI/ML solution whose quality is systematically checked by an external data scientist (i.e., an expert who is not an employee of the company). For a number of reasons, the company cannot grant the data scientist access to the solution, but it can send them a sample of records from a required table, either on a schedule or on a particular event (for example, the completion of a training session for one or several models by the solution). We assume that the data scientist owns some version of the AI/ML mechanisms already integrated into the production-level solution that the company runs, and it is likely that the data scientist is the one who develops, improves, and adapts them to the concrete use cases of that company. Deployment of those mechanisms into the running solution, monitoring of their functioning, and other lifecycle aspects are handled by a data engineer (a company employee).
An example of deploying a production-level AI/ML solution on the InterSystems IRIS platform that works autonomously with a flow of data coming from equipment was provided in this article. The same solution runs in the demo linked in the paragraph above. You can build your own solution prototype on InterSystems IRIS using the content (free, with no time limit) in our repo Convergent Analytics (see the sections Links to Required Downloads and Root Resources).
Which “degree of distribution” of AI do we get in such a scenario? In our opinion, we are rather close to the ideal, because the data scientist is cut off from both the data (only a limited sample is transmitted, although it is a crucial one as of a given point in time) and the algorithms of the company (the data scientist’s own “specimens” are never 100% in sync with the “live” mechanisms deployed and running as part of the real-time production-level solution), and has no access at all to the company IT infrastructure. The data scientist’s role therefore comes down to partially replaying, on local computational resources, an episode of the functioning of the company’s production-level AI/ML solution, estimating the quality of that functioning at an acceptable confidence level, and returning feedback to the company (formulated, in our concrete scenario, as “audit” results plus, possibly, an improved version of one or another AI/ML mechanism involved in the company solution).
Figure 1 Distributed AI scenario formulation
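One way to picture “estimating the quality at an acceptable confidence level” from a limited sample is the sketch below. It is an assumption-laden illustration, not the method used in the demo: the quality metric is plain accuracy, the interval is a Wilson score interval, and the sample records and the threshold model are invented for the example.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% confidence."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = correct / total
    denom = 1.0 + z * z / total
    centre = (p_hat + z * z / (2 * total)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def audit(sample, predict, z=1.96):
    """Replay a model on a transmitted sample of (features, label) records."""
    correct = sum(1 for features, label in sample if predict(features) == label)
    low, high = wilson_interval(correct, len(sample), z)
    return {"accuracy": correct / len(sample), "low": low, "high": high}


# Hypothetical sample: a sensor reading and whether the equipment was faulty.
sample = [(0.2, False), (0.9, True), (0.7, True), (0.4, False), (0.6, False),
          (0.8, True), (0.1, False), (0.3, False), (0.95, True), (0.55, True)]
report = audit(sample, predict=lambda reading: reading > 0.5)
print(report)
```

With only ten records the interval is wide, which is the practical reason an auditor would want samples sent regularly rather than once.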
We know that feedback does not necessarily have to be formulated and transmitted by humans during an AI artifact exchange; this follows from publications about modern instruments and from existing experience with implementations of distributed AI. However, the strength of the InterSystems IRIS platform is that it allows both “hybrid” (a tandem of a human and a machine) and fully automated AI use cases to be developed and launched equally efficiently. We will therefore continue our analysis based on the “hybrid” example above, leaving readers to elaborate on its full automation on their own.
How a concrete distributed AI scenario runs on InterSystems IRIS platform
The intro to our video with the scenario demo mentioned in the previous section gives a general overview of InterSystems IRIS as a real-time AI/ML platform and explains its support for DevOps macromechanisms. The demo does not explicitly cover the “company-side” business process that handles the regular transmission of training datasets to the external data scientist, so we will start with a short description of that business process and its steps.
The main “engine” of the sender business process is a while-loop (implemented using the InterSystems IRIS visual business process composer, which is based on the BPL notation interpreted by the platform) that is responsible for systematically sending training datasets to the external data scientist. The following actions are executed inside that “engine” (see the diagram; the data consistency actions can be skipped):
Figure 2 Main part of the “sender” business process
(a) Load Analyzer – loads the current set of records from the training dataset table into the business process and forms a dataframe from it in the Python session. This call-action triggers an SQL query to the InterSystems IRIS DBMS and a call to the Python interface, which receives the SQL result so that the dataframe can be formed;
(b) Analyzer 2 Azure – another call-action; it triggers a call to the Python interface and passes it a set of Azure ML SDK for Python instructions that build the required infrastructure in Azure and deploy onto that infrastructure the dataframe data formed in the previous action;
As a result of these business process actions, we obtain a stored object (a .csv file) in Azure containing an export of the most recent dataset used for model training by the production-level solution at the company:
Figure 3 “Arrival” of the training dataset to Azure ML
With that, the main part of the sender business process is over, but one more action is needed, because any computation resources that we create in Azure ML are billable (see the diagram; the data consistency actions can be skipped):
Figure 4 Final part of the “sender” business process
(c) Resource Cleanup – triggers a call to the Python interface and passes it a set of Azure ML SDK for Python instructions that remove from Azure the computational infrastructure built in the previous action.
The data required by the data scientist has been transmitted (the dataset is now in Azure), so we can proceed with launching the “external” business process that accesses the dataset, runs at least one alternative model training (algorithmically, an alternative model is distinct from the model running as part of the production-level solution), and returns to the data scientist the resulting model quality metrics, plus visualizations from which “audit findings” about the efficiency of the company’s production-level solution can be formulated.
Let us now take a look at the receiver business process. Unlike its sender counterpart (which runs among the other business processes comprising the autonomous AI/ML solution at the company), it does not require a while-loop. Instead, it contains a sequence of actions that train alternative models in Azure ML and in IntegratedML (the accelerator for using auto-ML frameworks from within InterSystems IRIS) and extract the training results into InterSystems IRIS (the platform is assumed to be installed locally on the data scientist’s side as well):
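The shape of the sender loop, steps (a) to (c), can be sketched in plain Python. This is not the BPL process from the demo and it makes no real Azure ML SDK calls: `FakeCloud`, `run_sender`, `records_to_csv`, the file names, and the sample batches are all hypothetical stand-ins. What the sketch does show is the ordering of the steps and that cleanup runs even when the upload fails.

```python
import csv
import io


def records_to_csv(records):
    """Serialise a list of dict records to CSV text (the exported object)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


class FakeCloud:
    """Stand-in for the cloud side; a real process would call a cloud SDK here."""

    def __init__(self):
        self.stored = []
        self.compute_running = False

    def provision(self):
        self.compute_running = True

    def upload(self, name, payload):
        if not self.compute_running:
            raise RuntimeError("no infrastructure provisioned")
        self.stored.append((name, payload))

    def cleanup(self):
        self.compute_running = False


def run_sender(fetch_batch, cloud, max_cycles):
    """Loop: load records, export them, and always release billable resources."""
    cycle = 0
    while cycle < max_cycles:
        records = fetch_batch(cycle)  # (a) load the current training records
        if records:
            cloud.provision()
            try:
                # (b) build infrastructure and deploy the data as a .csv object
                cloud.upload(f"training_{cycle}.csv", records_to_csv(records))
            finally:
                cloud.cleanup()  # (c) remove the billable infrastructure
        cycle += 1


batches = {0: [{"sensor": "p1", "value": 0.42}], 1: [], 2: [{"sensor": "p2", "value": 0.57}]}
cloud = FakeCloud()
run_sender(lambda cycle: batches.get(cycle, []), cloud, max_cycles=3)
print([name for name, _ in cloud.stored])
```

Putting the cleanup in a `finally` clause mirrors the article's concern about billable resources: nothing stays provisioned after a cycle, whatever happened during the upload.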
Figure 5 “Receiver” business process
(a) Import Python Modules – triggers a call to the Python interface and passes it a set of instructions to import the Python modules required for further actions;
(b) Set AUDITOR Parameters – triggers a call to the Python interface and passes it a set of instructions to assign default values to the variables required for further actions;
(c) Audit with Azure ML – (from here on we omit any further mention of the Python interface call) hands the “audit assignment” to Azure ML;
(d) Interpret Azure ML – brings the data transmitted to Azure ML by the sender business process into the local Python session, together with the Azure ML “audit” results (and also creates a visualization of the “audit” results in the Python session);
(e) Stream to IRIS – extracts the data transmitted to Azure ML by the sender business process, together with the Azure ML “audit” results, from the local Python session into a business process variable in IRIS;
(f) Populate IRIS – writes the data transmitted to Azure ML by the sender business process, together with the Azure ML “audit” results, from the business process variable in IRIS to a table in IRIS;
(g) Audit with IntegratedML – “audits” the data received from Azure ML, together with the Azure ML “audit” results written into IRIS in the previous action, using the IntegratedML accelerator (which in this particular case drives the H2O auto-ML framework);
(h) Query to Python – transfers the data and the IntegratedML “audit” results into the Python session;
(i) Interpret IntegratedML – in the Python session, creates a visualization of the IntegratedML “audit” results;
(j) Resource Cleanup – deletes from Azure the computational infrastructure created in the previous actions.
Figure 6 Visualization of Azure ML “audit” results
Figure 7 Visualization of IntegratedML “audit” results
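Unlike the sender, the receiver is a straight sequence of actions over shared state, ending with cleanup. A minimal Python sketch of that shape follows. Everything in it is a hypothetical stand-in: the two “audits” are trivial mean and median predictors scored by mean absolute error, not Azure ML or IntegratedML, and the step names only loosely echo the list above.

```python
from statistics import mean, median


def run_pipeline(steps, cleanup, context=None):
    """Run named steps in order over a shared context; always run cleanup last."""
    context = {} if context is None else context
    log = []
    try:
        for name, step in steps:
            step(context)
            log.append(name)
    finally:
        cleanup(context)
        log.append("Resource Cleanup")
    return context, log


def set_parameters(ctx):
    ctx["data"] = [3.0, 4.0, 5.0, 6.0, 42.0]  # hypothetical transmitted sample


def audit_with_mean(ctx):
    centre = mean(ctx["data"])
    ctx["mae_mean"] = mean(abs(v - centre) for v in ctx["data"])


def audit_with_median(ctx):
    centre = median(ctx["data"])
    ctx["mae_median"] = mean(abs(v - centre) for v in ctx["data"])


def release_resources(ctx):
    ctx["resources_released"] = True


steps = [
    ("Set Parameters", set_parameters),
    ("First Audit", audit_with_mean),
    ("Second Audit", audit_with_median),
]
context, log = run_pipeline(steps, release_resources)
print(log, context["mae_mean"], context["mae_median"])
```

Running two independent “audits” over the same transmitted data and comparing their metrics is the part of the receiver process that this sketch is meant to mirror.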
How distributed AI is implemented in general on InterSystems IRIS platform
The InterSystems IRIS platform distinguishes three fundamental approaches to distributed AI implementation:
Direct exchange of AI artifacts, with their local and central handling based on rules and algorithms defined by the user
AI artifact handling delegated to specialized frameworks (for example, TensorFlow or PyTorch), with exchange orchestration and various preparatory steps configured by the user on the local and central instances of InterSystems IRIS
Both AI artifact exchange and handling done via cloud providers (Azure, AWS, GCP), with the local and central instances just sending input data to a cloud provider and receiving the end result back from it
Figure 8 Fundamental approaches to distributed AI implementation on InterSystems IRIS platform
These fundamental approaches can be modified or combined: in particular, in the concrete “audit” scenario described in the previous section, the third, “cloud-centric” approach is used, with the “auditor” part split into a cloud portion and a local portion executed on the data scientist’s side (acting as a “central instance”).
The theoretical and applied elements that currently make up the discipline of “distributed artificial intelligence” have not yet taken a “canonical form”, which creates huge potential for innovation in implementation. Our team of experts closely follows the evolution of distributed AI as a discipline and builds accelerators for its implementation on the InterSystems IRIS platform. We would be glad to share our content and to help everyone who finds this domain useful start prototyping distributed AI mechanisms.