One of the more fascinating things about data, especially as you gather more and more of it together, is the extent to which information is connected. Some of this is due to the fact that we share geospatial envelopes (we’re in the same general range and time as the things that are described) but much of it has to do with the fact that we describe and define things as compositions of other things.
Data environments such as Customer 360 make use of this fact – customers are connected to purchases (real or potential) via contracts and interactions to providers, to locations, to interest groups, and so forth, each of which in turn is connected to other things. This network of things and relationships exists in all but the most simplistic databases, forming graphs of information.
Such graphs typically depend upon the consistent identification of keys. A typical SQL database maintains local keys, such as integers that are tied to a given table. These keys by themselves are not globally unique, but can be made unique by qualifying them with the identity of the database and the table to which they belong.
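As a minimal sketch of that qualification step, the snippet below pairs a local integer key with a database name and a table name. The class name, field names, and the "urn:example" scheme are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class GlobalKey(NamedTuple):
    """A local primary key qualified by where it lives (all names hypothetical)."""
    database: str
    table: str
    local_id: int

    def as_uri(self) -> str:
        # One possible textual form; the URI scheme here is an assumption.
        return f"urn:example:{self.database}:{self.table}:{self.local_id}"


# The same integer key in two tables is ambiguous on its own...
customer = GlobalKey("crm", "customer", 42)
purchase = GlobalKey("crm", "purchase", 42)

# ...but the qualified keys are distinct.
print(customer.local_id == purchase.local_id)  # True
print(customer == purchase)                    # False
print(customer.as_uri())                       # urn:example:crm:customer:42
```

Qualifying a key this way makes it unique, but it does not by itself tell you when two different keys refer to the same real-world entity.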
Specialized databases called triple stores work with globally unique keys, but even there, much of what restricts the semantic web from taking off is in identifying when two separate keys in different systems actually refer to the same entity. This becomes especially problematic as the entities being referred to become more abstract.
One area where this is being addressed is the rise of non-fungible tokens, or NFTs. An NFT is a unique token, recorded on a blockchain, that is bound to a piece of intellectual property and identifies that property uniquely – there is in essence only one such object, even digitally. This means that if you create a movie (as an example), then assign one copy of that movie to an NFT, that token serves to identify that resource absolutely. With that concrete example, you can talk about different representations of an object, but for identification purposes, these abstractions and revisions can ultimately be traced back to the NFT. In effect, NFTs become vehicles for establishing provenance.
This connectedness and the ability to uniquely identify virtual products will likely be at the center of the next stage of data – the move from enterprise data to global data. At this stage, capabilities are emerging for the autoclassification of assets, for the determination of key cognates (e.g., master data management), and for protocols that allow that data to be shared and protected safely.
This is why we run Data Science Central, and why we are expanding its focus to consider the width and breadth of digital transformation in our society. Data Science Central is your community. It is a chance to learn from other practitioners, and a chance to communicate what you know to the data science community overall. I encourage you to submit original articles and to make your name known to the people that are going to be hiring in the coming year. As always let us know what you think.
DSC is looking for editorial content specifically in these areas for May, with these topics likely having higher priority than other incoming articles.
GANs and Adversarial Networks
Data-Ops
Non-Fungible Tokens
Post-Covid Work
No Code Computing
Integration of Machine Learning and Knowledge Graphs
This email, and all related content, is published by Data Science Central, a division of TechTarget, Inc.
275 Grove Street, Newton, Massachusetts, 02466 US
copyright 2021 TechTarget, Inc. all rights reserved. Designated trademarks, brands, logos and service marks are the property of their respective owners.
In this article, we will explore Liskov’s substitution principle, one of the SOLID principles, and how to implement it in a Pythonic way. The SOLID principles entail a series of good practices to achieve better-quality software. In case some of you aren’t aware of what SOLID stands for, here it is:
S: Single responsibility principle
O: Open/closed principle
L: Liskov’s substitution principle
I: Interface segregation principle
D: Dependency inversion principle
The goal of this article is to implement proper class hierarchies in object-oriented design, by complying with Liskov’s substitution principle.
Liskov’s substitution principle
Liskov’s substitution principle (LSP) states that there is a series of properties that an object type must hold to preserve the reliability of its design.
The main idea behind LSP is that, for any class, a client should be able to use any of its subtypes indistinguishably, without even noticing, and therefore without compromising the expected behavior at runtime. That means that clients are completely isolated and unaware of changes in the class hierarchy.
More formally, this is the original definition (LISKOV 01) of LSP: if S is a subtype of T, then objects of type T may be replaced by objects of type S, without breaking the program.
This can be understood with the help of a generic diagram such as the following one. Imagine that there is some client class that requires (includes) objects of another type. Generally speaking, we will want this client to interact with objects of some type, namely, it will work through an interface.
Now, this type might as well be just a generic interface definition, an abstract class or an interface, not a class with the behavior itself. There may be several subclasses extending this type (described in Figure 1 with the name Subtype, up to N). The idea behind this principle is that if the hierarchy is correctly implemented, the client class has to be able to work with instances of any of the subclasses without even noticing. These objects should be interchangeable, as Figure 1 shows:
Figure 1: A generic subtypes hierarchy
This is related to other design principles we have already visited, like designing for interfaces. A good class must define a clear and concise interface, and as long as subclasses honor that interface, the program will remain correct.
As a consequence of this, the principle also relates to the ideas behind designing by contract. There is a contract between a given type and a client. By following the rules of LSP, the design will make sure that subclasses respect the contracts as they are defined by parent classes.
Detecting LSP issues with tools
There are some scenarios so notoriously wrong with respect to the LSP that they can be easily identified by tools such as mypy and pylint.
Using mypy to detect incorrect method signatures
By using type annotations throughout our code and configuring mypy, we can quickly detect some basic errors early, and check basic compliance with LSP for free.
If one of the subclasses of the Event class were to override a method in an incompatible fashion, mypy would notice this by inspecting the annotations.
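The listing for this example is not reproduced here. As a minimal sketch of the situation being described, assume a base Event class whose meets_condition method takes a dict, and a LoginEvent subclass that overrides it with a list parameter. The method bodies are placeholders and are assumptions, not the original code.

```python
class Event:
    """Hypothetical base class; only the method under discussion is sketched."""

    def meets_condition(self, event_data: dict) -> bool:
        return False


class LoginEvent(Event):
    # Incompatible override: the parameter type is changed from dict to list,
    # so a caller that passes a dict to any Event may break on this subclass.
    def meets_condition(self, event_data: list) -> bool:
        return bool(event_data)
```

The code runs, because Python does not enforce annotations at runtime; the incompatibility only surfaces when a static checker compares the two signatures.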
When we run mypy on this file, we will get an error message saying the following:
error: Argument 1 of “meets_condition” incompatible with supertype “Event”
The violation of LSP is clear: since the derived class is using a type for the event_data parameter that is different from the one defined on the base class, we cannot expect them to work equally. Remember that, according to this principle, any caller of this hierarchy has to be able to work with Event or LoginEvent transparently, without noticing any difference. Interchanging objects of these two types should not make the application fail. Failure to do so would break the polymorphism on the hierarchy.
The same error would have occurred if the return type was changed to something other than a Boolean value. The rationale is that clients of this code are expecting a Boolean value to work with. If one of the derived classes changes this return type, it would be breaking the contract, and again, we cannot expect the program to continue working normally.
A quick note about types that are not the same but share a common interface: even though this is just a simple example to demonstrate the error, it is still true that both dictionaries and lists have something in common; they are both iterables. This means that in some cases, it might be valid to have a method that expects a dictionary and another one expecting to receive a list, as long as both treat the parameters through the iterable interface. In this case, the problem would not lie in the logic itself (LSP might still apply), but in the definition of the types of the signature, which should read neither list nor dict, but a union of both. Regardless of the case, something has to be modified, whether it is the code of the method, the entire design, or just the type annotations, but in no case should we silence the warning and ignore the error given by mypy.
Note: Do not ignore errors such as this by using # type: ignore or something similar. Refactor or change the code to solve the real problem. The tools are reporting an actual design flaw for a valid reason.
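For the case where both methods really do treat the parameter only through its shared interface, the union annotation mentioned above could look like the following sketch. The class bodies are assumptions made for illustration; only the names Event, LoginEvent, meets_condition and event_data come from the text.

```python
from typing import Union


class Event:
    def meets_condition(self, event_data: Union[list, dict]) -> bool:
        return False


class LoginEvent(Event):
    # Same signature as the parent, so the override stays compatible.
    def meets_condition(self, event_data: Union[list, dict]) -> bool:
        # Only behaviour shared by lists and dicts is used here (len).
        return len(event_data) > 0
```

Because the annotations now match, callers can pass either a list or a dict to any object in the hierarchy.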
This principle also makes sense from an object-oriented design perspective. Remember that subclassing should create more specific types, but each subclass must be what the parent class declares. With the example from the previous section, the system monitor wants to be able to work with any of the event types interchangeably. But each of these event types is an event (a LoginEvent must be an Event, and so must the rest of the subclasses). If any of these objects break the hierarchy by not implementing a message from the base Event class, implementing another public method not declared in this one, or changing the signature of the methods, then the identify_event method might no longer work.
Detecting incompatible signatures with pylint
Another strong violation of LSP is when, instead of varying the types of the parameters on the hierarchy, the signatures of the methods differ completely. This might seem like quite a blunder, but it is not always easy to detect; Python is interpreted, so there is no compiler to detect these types of errors early on, and therefore they will not be caught until runtime. Luckily, we have static code analyzers such as mypy and pylint to catch errors such as this one early on.
While mypy will also catch these types of errors, it is a good idea to also run pylint to gain more insight.
Consider a class that breaks the compatibility defined by the hierarchy, for example by changing the signature of the method, adding an extra parameter, and so on.
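The listing for this case is also missing. A hypothetical class of the kind described, with an extra required parameter added to the overridden method, might look like this; the LogoutEvent name, the override parameter, and the method bodies are assumptions.

```python
class Event:
    def meets_condition(self, event_data: dict) -> bool:
        return False


class LogoutEvent(Event):
    # The extra required parameter changes the method's signature, so code
    # that calls meets_condition(event_data) on any Event fails here.
    def meets_condition(self, event_data: dict, override: bool) -> bool:
        if override:
            return True
        return False
```

A client written against Event would call the method with a single argument, which raises a TypeError at runtime when the object happens to be a LogoutEvent.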
pylint will detect it, printing an informative error:
Parameters differ from overridden ‘meets_condition’ method (arguments-differ)
Once again, like in the previous case, do not suppress these errors. Pay attention to the warnings and errors the tools give and adapt the code accordingly.
Remarks on the LSP
The LSP is fundamental to good object-oriented software design because it emphasizes one of its core traits: polymorphism. It is about creating correct hierarchies so that classes derived from a base one are polymorphic along the parent one, with respect to the methods on their interface.
It is also interesting to notice how this principle relates to the open/closed principle (OCP): if we attempt to extend a class with a new one that is incompatible, it will fail, the contract with the client will be broken, and as a result such an extension will not be possible (or, to make it possible, we would have to break the other end of the principle and modify code in the client that should be closed for modification, which is completely undesirable and unacceptable).
Carefully thinking about new classes in the way that LSP suggests helps us to extend the hierarchy correctly. We could then say that LSP contributes to the OCP.
The SOLID principles are key guidelines for good object-oriented software design. Learn more about SOLID principles and clean coding with the book Clean Code in Python, Second Edition by Mariano Anaya.
Searching only for documents is outdated. Users who have already adopted a question-answering (QA) approach with their personal devices, e.g., those powered by Alexa, Google Assistant, Siri, etc., are also appreciating the advantages of using a “search engine” with the same approach in a business context. Doing so allows them to not only search for documents, but also obtain precise answers to specific questions. QA systems respond to questions that someone can ask in natural language. This technology is already widely adopted and now rapidly gaining importance in the business environment, where the most obvious added value of a conversational AI platform is improving the customer experience.
Another key tangible benefit is the increased operational efficiency gained by reducing call center costs and increasing sales transactions. More recently we have seen a strong developing interest in in-house use cases, e.g., for IT service desk and HR functions. What if you didn’t have to painstakingly sift through your spreadsheets and documents to extract the relevant facts, but instead could just enter your questions into your trusty search field?
This is optimal from the user’s point of view, but transforming business data into knowledge is not trivial. It is a matter of linking and making all the relevant data available in such a way that all employees—not just experts—can quickly find the answers they urgently need within whichever business processes they find themselves.
With the power of knowledge graphs at one’s disposal, enterprise data can be efficiently prepared in such a way that it can be mapped to natural language questions. That might sound like magic, but it’s not. It is actually a well-established method to successfully roll out AI applications like QA systems in numerous industries.
Where do Current Question-Answering Methods Fall Short?
The use of semantic knowledge graphs supports a game-changing methodology to construct working QA engines, especially when domain-specific systems are to be built. Current QA technologies are based on intent detection, i.e., the incoming question must be mapped to some predefined intents. A common example of this is an FAQ scenario, where the incoming question is mapped to one of the frequently asked questions. This works well in some cases, but is not well suited to access large, structured datasets. That is because when accessing structured data, it is necessary to recognize domain-specific named entities and relations.
In these situations, intent detection technology requires a lot of training data and struggles to provide satisfactory results. We are exploiting a different technology based on semantic parsing, i.e., the question is broken down into its fundamental components, e.g., entities, relations, classes, etc., to infer a complete interpretation of the question. This interpretation is then used to retrieve the answer from the knowledge graph. What are the advantages?
You do not need special configuration files for your QA engine—everything is encoded within the data itself, i.e., in the knowledge graph. By doing so you automatically increase the quality of your data, with benefits for your organization and for applications using this data.
Contemporary QA engines frequently struggle with multilingual environments because they are typically optimized for a single language. With knowledge graphs in place, the expansion to additional languages can be established with relatively little effort, since concepts and things are processed in their core instead of simple terms and strings.
This technology scales, so it will not make a difference if you have 100 entities or millions of entities in your knowledge graph.
Lastly, you do not need to create a large training data corpus before setting up your engine. The data itself suffices and you can fine-tune the system as you go with little additional training data!
Building QA engines on knowledge graphs: an example from HR
What follows is a step-by-step outline of a methodology using a typical human resources (HR) use case as a running example.
Step 1: Gather your datasets In this step, business users define the requirements and identify the data sources for the enterprise’s knowledge. After collecting structured, semi-structured and unstructured data in different formats, you will be able to produce a data catalog that will serve as the basis for your enterprise knowledge graph (EKG).
Step 2: Create a semantic model of your data Here your subject matter experts and business analysts will define the semantic objects and design the semantic schemes of the EKG, which will result in a set of ontologies, taxonomies, and vocabularies that precisely describe your domain.
Step 3: Semantify your data Create pipelines to automatically extract and semantify your data, i.e., annotate and extract knowledge from your data sources based on the semantic model that describes your domain. This is performed by data engineers who automate the ingestion and normalization of data from structured sources, as well as automate the analysis of unstructured content using NLP tools in order to populate the EKG using the semantic model provided. The resulting enriched EKG continuously improves as new data is added. The result of this step is the initial version of your EKG.
Step 4: Harmonize and interlink your data After the previous step, your data is represented as things rather than strings. Each object gets a unique URI so that links between entities and datasets can be established. This is facilitated by the use of ontologies and vocabularies, which, in addition to mapping rules, allow interlinking to external sources. During this stage, data engineers establish new relations in the EKG using logical inference, graph analysis or link discovery, altogether enriching and further extending the EKG. The result of this process is an extension of your EKG that is eventually stored in a graph database which provides interfaces for accessing and querying the data.
Step 5: Feed the QA system with data Asking questions on top of an EKG requires that (a) the data is indexed and (b) ML models are available to understand the questions. Both steps are fully automated in QAnswer. The EKG data is automatically indexed, and pretrained ML models are already provided so that you can start asking questions on top of your data right away.
Step 6: Provide feedback to the QA system Improving the quality of the answers is done in the following two steps (6 and 7). The business user and a knowledge engineer are responsible for tuning the system together. The business user expresses common user requests and the knowledge engineer checks if the system returns the expected answers. Depending on the outcome, either the EKG is adapted (following Steps 2-4) or the system is retrained to learn the corresponding type(s) of questions. The user can give feedback on the provided answer either by stating whether it is correct or not or by selecting the right query from a list of suggested SPARQL queries.
Step 7: Train the QA system New ML models are generated automatically based on the training data provided in step 6. The system adapts to the type of data that has been put into the EKG and the type of questions that are important for your business. The provided feedback improves the ML model in order to increase the accuracy of the QA system and the confidence of the provided answers.
Step 8: Gain immediate insight into your knowledge With the HR dataset now at your fingertips, you can ask questions like the following: Who are my employees? What languages do my staff speak? Who knows Javascript? Who has experience as Project Leader? Who can program in Java and knows MySQL? Who speaks English and Chinese? Who knows both Java and SPARQL? What is the salary range of my employees? How many people can code in Java and Javascript? What is the average salary of a C++ programmer? Who is the top paid employee?
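To make the idea of answering such questions from a graph more concrete, here is a toy sketch in which a question like “Who knows both Java and SPARQL?” is interpreted as one triple pattern per skill and the matches are intersected. This is not QAnswer’s implementation: the in-memory triples, the employee names, and the function names are all invented for illustration, and a real system would run SPARQL queries against a graph database.

```python
# Toy knowledge graph as (subject, predicate, object) triples.
# All names and data are invented for illustration.
TRIPLES = {
    ("alice", "knows", "Java"),
    ("alice", "knows", "SPARQL"),
    ("bob", "knows", "Java"),
    ("bob", "speaks", "English"),
    ("carol", "knows", "SPARQL"),
    ("carol", "speaks", "Chinese"),
}


def subjects(predicate: str, obj: str) -> set:
    """Return every subject linked to obj through predicate."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def who_knows_all(*skills: str) -> set:
    """Interpret 'Who knows both X and Y?' as one pattern per skill, intersected."""
    results = [subjects("knows", skill) for skill in skills]
    return set.intersection(*results) if results else set()


print(who_knows_all("Java", "SPARQL"))  # {'alice'}
```

The point of the sketch is only that once entities and relations are identified in the question, the answer is a matter of matching patterns against the graph.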
Looking to the future
In order to have a conversation with your Excel files and the rest of the disparate data that has accumulated over the years, you will need to begin by breaking up the data silos in your organization. While the EKG will help you dismantle the data silos, the Semantic Data Fabric solution allows you to prepare the organization’s data for question answering. This approach combines the advantages of Data Warehouses and Data Lakes and complements them with new components and methodologies based on Semantic Graph Technologies.
A lot of doors will open for your company by combining EKGs and QA technologies, and several domain-specific applications that allow organizations to quickly and intuitively access internal information can also be built on top of our solution.
One of the challenges we address is the difficulty of accessing internal information fast, intuitively and with confidence. People can find and gather useful information as they normally would when asking a human—in natural language. The capabilities of the technology we have presented in this article go well beyond what can be achieved with today’s mainstream voice assistants. This new direction offers organizations a significant opportunity to simplify human-machine interaction and profit from the improved access to the organizations’ knowledge while also offering new, innovative and useful services to their customers.
The future of question-answering systems is in leveraging knowledge graphs to make them smarter.
Last week, I taught a cybersecurity course at the University of Oxford. I created a case study for my class based on an excellent recent paper: Deep Learning-Based Autonomous Driving Systems: A Survey of Attacks and Defences (link below).
This paper is unique because it discusses emerging cybersecurity threats and their mitigation using artificial intelligence in the context of advanced autonomous driving systems (ADSs). I felt that this is significant because the problem domain of AI and cybersecurity is typically an anomaly detection or a signature detection problem. Also, most of the time, cybersecurity professionals use specific tools such as Splunk or Darktrace (which we cover in our course), but these threats and their mitigations are very new. Hence, they need exploring from first principles and research. Thus, we can cover newer threats such as adversarial attacks (making modifications to input data to force machine-learning algorithms to behave in ways they’re not supposed to). By considering a complex and emerging problem domain like ADSs, we can discuss many more emerging problems which we have yet to encounter at scale.
A deep learning-based ADS is normally composed of three functional layers, including a sensing layer, a perception layer and a decision layer, as well as an additional cloud service layer.
The sensing layer comprises heterogeneous sensors such as GPS, camera, LiDAR, radar and ultrasonic sensors, which are used to collect real-time ambient information including the current position and spatial-temporal data (e.g. time series image frames).
The perception layer contains deep learning models to analyze the data collected by the sensing layer and then extract useful environmental information from the raw data for further processing.
The decision layer acts as a decision-making unit to output instructions concerning the change of speed and steering angle based on the extracted information from the perception layer.
The perception layer includes functions like localization, road object detection and semantic segmentation, which use a variety of deep learning algorithms. The cloud service provides compute-intensive services such as pre-route planning and enhances the perception of the surrounding environment. The decision layer includes functions like path planning and object trajectory prediction, vehicle control via deep reinforcement learning, and end-to-end driving.
These are depicted below
Based on this, the paper explores the following:
ATTACKS IN ADSS
Physical attacks on sensors
Jamming attack, Spoofing attack
Cyberattacks on cloud services
Adversarial attacks on deep learning models in perception and decision layers
DEFENCE METHODS
Defence against physical sensor attacks
Defence for cloud services
Defence against adversarial evasion attacks (proactive defences, reactive defences)
Fixed point strategies can approximate infinite depth.
The methods are easy to train/implement.
This essential set of tools can model and analyze a wide range of DS problems.
Fixed point theory, which was first developed about 60 years ago, is directly connected to limits and traditional control and optimization [1]. These methods are ideal for finding solutions to a broad range of phenomena that crop up in large-scale optimization and highly structured data problems. They work for problems formulated as minimization problems, or more general forms like Nash equilibria or nonlinear operator equations.
Compared to traditional models, fixed point methods are in their infancy. However, there’s a lot of research suggesting that these algorithms may be the future of data science.
How do Fixed Point Methods Work?
Fixed point theory works in a similar way to optimization and is related to the idea of a limit in calculus: at some point in the process, you get “close enough” to a solution: one that’s good enough for your purposes. When it’s not possible to find an exact solution, or an exact answer isn’t needed, a fixed-point algorithm can give an approximate answer.
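A minimal sketch of the basic idea: repeatedly apply a function to its own output until successive values are “close enough”. The function name, tolerance, and iteration cap are assumptions chosen for illustration, and the example uses the cosine function simply because its fixed point is well known; it is not taken from the cited research.

```python
import math
from typing import Callable


def fixed_point(f: Callable[[float], float], x0: float,
                tol: float = 1e-10, max_iter: int = 1000) -> float:
    """Iterate x <- f(x) until successive values differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("did not converge within max_iter iterations")


# cos is a contraction near its fixed point, so the iteration settles there.
x_star = fixed_point(math.cos, x0=1.0)
print(round(x_star, 6))  # 0.739085
```

The returned value satisfies cos(x) ≈ x to within the tolerance, which is exactly the “approximate answer” described above: the loop stops when another step would no longer change the result meaningfully.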
As a simple example, your company might want to create a model for the maximum amount of money a U.S. citizen is willing to spend on new household gadgets per year. An exact solution would depend on many factors, including the fickle nature of consumerism, changing tastes and the effect of climate change on purchasing decisions. It would be difficult (if not impossible) to find an exact solution ($561.23? $981.65?). But there’s going to be a cap, or a limit, which the amount spent tends towards: possibly $570 per annum, possibly $1,000.
You could attempt to find the solution to a large-scale optimization problem like this one with traditional methods, if you have the computer resources to take on the challenge. In some cases, even the most powerful computer may not be able to handle the computations, which is where fixed point theory steps in.
Advantages of Fixed-Point Methods
Fixed point methods have several major advantages over traditional methods. They create a more efficient framework for implicit depth without requiring more memory or increasing the computational costs of training [2]. On the algorithmic front, fixed point strategies include powerful convergence principles which simplify the design and analysis of iterative methods. In addition, block-coordinate or block-iterative strategies reduce an iteration’s computational load and memory requirements [3].
Google research scientist Zelda Mariet and MIT professor Suvrit Sra approached the problem of maximum-likelihood estimation by comparing performance of the EM algorithm against a novel fixed-point iteration [4]. When the authors compared performance on both synthetic and real-world data, they found that their fixed-point method gave shorter runtimes when handling large matrices and training sets. The fixed-point approach also ran “remarkably faster” for a range of ground set sizes and number of samples. Not only was it faster than the EM algorithm, but it was also remarkably simple to implement.
The Future of Deep Learning?
One of the major problems with the creation of deep learning models is that the deeper and more expressive a model becomes, the more memory is required. In a practical sense, model depth is limited by the amount of computer memory available. A workaround is implicit depth methods, but these come with the burden of more computational cost to train networks. At a certain point, some problems simply become too complex to be solved using traditional methods. As we move into the future, models are destined to become more complex, which means we must find better ways to arrive at solutions.
When finding an exact solution isn’t possible because of computational limits, many problems can be formulated in terms of fixed-point optimization schemes. These schemes, applied to standard models, guarantee the convergence of a solution to the fixed point limit.
References:
Image: mikemacmarketing / photo on flickr, CC BY 2.0 , via Wikimedia Commons
Bigger is not always better for machine learning. Yet, deep learning models and the datasets on which they’re trained keep expanding, as researchers race to outdo one another while chasing state-of-the-art benchmarks. However groundbreaking they are, the consequences of bigger models are severe for both budgets and the environment. For example, GPT-3, this summer’s massive, buzzworthy model for natural language processing, reportedly cost $12 million to train. What’s worse, UMass Amherst researchers found that the computing power required to train a large AI model can produce over 600,000 pounds of CO2 emissions – that’s five times the amount of the typical car over its lifespan.
At the pace the machine learning industry is moving today, there are no signs of these compute-intensive efforts slowing down. Research from OpenAI showed that between 2012 and 2018, computing power for deep learning models grew a shocking 300,000x, outpacing Moore’s Law. The problem lies not only in training these algorithms, but also running them in production, or the inference phase. For many teams, practical use of deep learning models remains out of reach, due to sheer cost and resource constraints.
Luckily, researchers have found a number of new ways to shrink deep learning models and optimize training datasets via smarter algorithms, so that models can run faster in production with less computing power. There’s even an entire industry summit dedicated to low-power, or tiny machine learning. Pruning, quantization, and transfer learning are three specific techniques that could democratize machine learning for organizations who don’t have millions of dollars to invest in moving models to production. This is especially important for “edge” use cases, where larger, specialized AI hardware is physically impractical.
The first technique, pruning, has become a popular research topic in the past few years. Highly cited papers including Deep Compression and the Lottery Ticket Hypothesis showed that it’s possible to remove some of the unneeded connections among the “neurons” in a neural network without losing accuracy – effectively making the model much smaller and easier to run on a resource-constrained device. Newer papers have further tested and refined earlier techniques to develop smaller models that achieve even greater speeds and accuracy levels. For some models, like ResNet, it’s possible to prune them by approximately 90 percent without impacting accuracy.
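As a rough illustration of what removing “unneeded connections” can mean, the sketch below zeroes out the weights with the smallest absolute values, an approach commonly called magnitude pruning. It is a simplified stand-in and not necessarily the exact procedure of any paper mentioned above; the function name, the flat list of weights, and the sample values are assumptions.

```python
def prune_by_magnitude(weights: list, sparsity: float) -> list:
    """Zero out the given fraction of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
print(prune_by_magnitude(weights, sparsity=0.5))
# [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

In practice, pruning is usually followed by further training so the remaining weights can compensate, and the zeros only save memory or time when the storage format or hardware can exploit them.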
A second optimization technique, quantization, is also gaining popularity. Quantization covers a lot of different techniques that map values from a large set (such as 32-bit floating-point numbers) to a smaller set (such as 8-bit integers). Running a neural network on hardware can result in millions of multiplication and addition operations. Reducing the complexity of these mathematical operations can help to shrink memory requirements and computational costs, resulting in big performance gains.
Finally, while this isn’t a model-shrinking technique, transfer learning can help in situations where there’s limited data on which to train a new model. Transfer learning uses pre-trained models as a starting point. The model’s knowledge can be “transferred” to a new task using a limited dataset, without having to retrain the original model from scratch. This is an important way to reduce the compute power, energy and money required to train new models.
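One simple member of this family is symmetric 8-bit quantization, sketched below: each floating-point weight is stored as a small integer plus one shared scale factor. The function names and sample values are assumptions, and production toolkits use more elaborate schemes (per-channel scales, calibration, and so on).

```python
def quantize_int8(values: list) -> tuple:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized: list, scale: float) -> list:
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
print(codes)  # [50, -127, 0, 100]
```

Each recovered value differs from the original by at most half a quantization step, which is the accuracy traded for the smaller representation.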
The key takeaway is that models can (and should) be optimized whenever possible to operate with less computing power. Finding ways to reduce model size and related computing power – without sacrificing performance or accuracy – will be the next great unlock for machine learning.
When more people can run deep learning models in production at lower cost, we’ll truly be able to see new and innovative applications in the real world. These applications can run anywhere – even on the tiniest of devices – at the speed and accuracy needed to make split-second decisions. Perhaps the best effect of smaller models is that the entire industry can lower its environmental impact, instead of increasing it 300,000 times every six years.
Image recognition technology has transformed the way visual data is pooled and processed. It offers opportunities similar to the ones portrayed in science fiction movies that make the imagination run wild. Faster detection of objects in real-time with assured accuracy, impressive face recognition mechanics, and improved augmented reality—all are made possible with image recognition, powered by machine learning.
Putting it simply, image annotation for machine learning brings unique capabilities to a wide range of businesses, irrespective of the industry verticals they deal in. From startups to MNCs, companies are leveraging image annotation services to decode the true value of image data. Take a look at some of the amazing use cases of image recognition:
1. Product Discoverability with Visual Search
One of the great applications of image recognition is visual search as it empowers the users to search for similar products via a reference image. Online retailers dealing in verticals such as fashion, home décor, furniture, etc. can implement image-based search features in their applications and software systems. This not only results in enhanced product discovery but allows them to deliver a seamless digital shopping experience. It offers product recommendations based on actual similarity, increases the conversion rate, and decreases shopping cart abandonment.
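One way to picture “actual similarity” is as a distance between image embeddings. The sketch below ranks catalog items by cosine similarity to a reference image. The three-number vectors and product names are invented for illustration; in a real system the vectors would come from an image model and the search would use an index built for scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def visual_search(query_vec, catalog, top_k=2):
    """Return the names of the top_k catalog items most similar to the query."""
    ranked = sorted(
        catalog.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical embeddings for three products.
catalog = {
    "red dress": [0.9, 0.1, 0.0],
    "crimson gown": [0.8, 0.2, 0.1],
    "oak table": [0.0, 0.1, 0.9],
}
print(visual_search([1.0, 0.1, 0.0], catalog))  # ['red dress', 'crimson gown']
```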
2. Face Recognition on Social Media
Though face recognition is sensitive ground, it is integrated by platforms such as Facebook, Instagram, and Snapchat to improve the user experience. Objects and scenes in an uploaded photo are recognized well before the user enters a description. Computer vision can differentiate between facial expressions, natural landscapes, sports, and food, among others. Likewise, it is used to identify inappropriate or objectionable content. Photo recognition is also embraced by other image-centric products, including Apple’s photo app cluster and Google Photos, where users can organize their pictures in meaningful series. It is also helpful in translating visual content for blind users, thus enabling companies to achieve enhanced accessibility standards.
3. Stock Imagery Websites
Image recognition speeds up millions of searches on various stock websites daily. Content contributors have to tag large volumes of visual material with proper keywords for indexing; otherwise, it cannot be discovered by buyers. Professional image annotation services thus help stock contributors assign the most appropriate keywords, tags and descriptions to each image. They can also propose relevant keywords after analyzing visual assets, consequently reducing the time needed to process the material.
4. Creative Campaigns and Interactive Marketing
Advertising and marketing agencies are exploring the possibilities of image recognition for interactive and creative campaigns. It opens new prospects for digital marketers to learn more about their potential customers by following their social media conversations and to serve them with impressive content. Extracting useful information from huge volumes of visual content is possible only through machine learning. For example, user data in an image posted by the user can be extracted using OCR.
Beyond this, businesses can also craft engaging content that helps build deeper relationships with brands. For instance, image recognition can identify visual mentions of a brand as well as the emotions expressed towards it and its logo. Based on the information collected after analyzing images, marketers can optimize their campaigns and offer personalized services.
5. Augmented Reality Gaming and Applications
The gaming arena strategically combines augmented reality with image recognition technology to its advantage. Developers use this to create real-life gaming characters and environments. It holds the key to generating new experiences and user interfaces. Besides, the combination of this technology with in-app purchasing and geo-targeting has paved the way for AdWords-sized as well as off-device business opportunities.
Wrapping Up
Image recognition combined with machine learning holds the potential to transform businesses. Engaging professional services enables them to harness the true potential of visual data and make the most of it. They not only gain a competitive edge but can also quickly respond and adapt to changing market environments, a rare win-win case.
The insurance industry is a late bloomer in adopting cutting-edge technologies.
However, the rapid growth of new-age technologies such as AI, ML, blockchain, big data, cloud computing, and IoT is driving a shift in the insurance industry. Insurers are devising strategies to enable the digital transformation of their business.
Nearly 86% of insurers believe that innovation must happen at a rapid pace to retain a competitive edge, as per a recent Accenture report.
The investment in new technologies is making the insurance industry more effective and far-reaching.
Here are the five ways technology is reshaping the insurance industry:
1- More Accurate Underwriting
Cloud computing integrates various data resources, enabling insurance companies to implement intelligent operations in customer marketing, product development, risk pricing, underwriting, and claims.
For underwriting, AI applications help determine and record the authenticity of the information provided by customers, such as documents, recordings, and images.
This helps to speed up operations and mitigate the risk of insurance fraud. Using these technologies helps insurance companies to conduct underwriting processes in real time. It also helps to effectively reject certain high-risk applications and reduce the loss ratio.
For example, car insurance can be transformed by connected technologies such as telematics, which transfer important data for assessing customers’ risk profiles.
It allows insurers to obtain real-time data on their customers’ driving habits, such as abrupt turns or stops made, speed, or location. Thus, these details enable insurance firms to make more informed underwriting decisions and provide policies accordingly.
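As a hedged illustration of how such driving data might feed an underwriting decision, the sketch below turns per-trip event counts into a simple risk score per 100 km. The field names and the weights (2.0, 1.5, 0.05) are invented for this example; real insurers calibrate such models on their own claims data.

```python
from dataclasses import dataclass


@dataclass
class TripSummary:
    distance_km: float
    hard_brakes: int
    sharp_turns: int
    speeding_seconds: int


def risk_score(trips):
    """Weighted count of risky events per 100 km driven (hypothetical weights)."""
    total_km = sum(t.distance_km for t in trips)
    if total_km <= 0:
        return 0.0
    per_100km = 100.0 / total_km
    return per_100km * (
        2.0 * sum(t.hard_brakes for t in trips)
        + 1.5 * sum(t.sharp_turns for t in trips)
        + 0.05 * sum(t.speeding_seconds for t in trips)
    )


trips = [TripSummary(50.0, 1, 2, 20), TripSummary(50.0, 1, 0, 0)]
print(round(risk_score(trips), 2))  # 8.0
```

Normalising by distance keeps the score comparable between customers who drive a lot and those who drive rarely.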
2- Better Customer Experience
Insurers better understand their consumer needs by activating and collecting the right data from IoT. It enables them to offer customized advice, coverage, and tailored pricing.
These technologies help customers compare products, such as car insurance rates, review them, and find plans that match their personal requirements.
For example, usage-based insurance policies incorporate customer data to charge customers as per their specific needs and behaviors. This puts consumers in charge of their own fees.
Such clever, personalized data usage benefits customers as well as insurers.
On the one hand, it improves user satisfaction by providing tailored products. On the other hand, it provides more accurate risk assessment and stable margins to companies.
Another benefit of insurers adopting digital strategies is that customers can fill in and submit claims digitally.
Around 61% of customers prefer to monitor their application status with digital tools.
Besides, insurance companies are adopting APIs and RPA, along with chatbots, mobile technologies, and voice recognition algorithms. These improve customer interactions and boost data-harvesting capabilities.
3- Using Technology to Assess Damage Faster
An insurance policy is a hedge against a variety of issues. These may include a major car accident, loss of property, or a fire in a luxury house. In such cases, insurance companies first investigate the truthfulness of the claim and then credit the claim amount.
This is a time-consuming process, as it involves reviewing the claim, investigating, making subsequent adjustments, and remitting payment or cancelling the claim.
Therefore, deploying AI and ML software can make the claims process simple, faster, and more effective. Machine learning algorithms can calculate damage using satellite images and drones, eliminating the human factor and significantly reducing time and cost.
As per a study by McKinsey, by 2030 AI will overtake all aspects of the insurance industry.
With the rise of intelligent machines, bio-sensors, and deep-learning algorithms in ordinary objects, the insurance sector is making a shift from pay for damage to prevent damage.
4- Identifying and Mitigating Fraud
Fraud is a great calamity for the insurance industry.
According to the Coalition Against Insurance Fraud, US insurers lose at least USD 80 billion annually.
However, with fraud detection software, companies can identify and mitigate fraudulent activities.
Cloud technologies provide real-time information to insurance companies. This information helps the insurer deal with duplicate claims, fake diagnoses, inflated claims, overpayments, or any internal employee scams.
For example, suppose a client tries to claim twice for the same property fire by submitting forged documents with a changed date. In this case, the technology will compare the claim data with the database and identify the fraud.
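The comparison step in that example can be sketched in a few lines. The check below ignores the loss date on purpose and flags earlier claims on the same policy, property, and loss type with a similar amount. The record fields and the 5 percent tolerance are assumptions for illustration, not how any particular fraud product works.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    claim_id: str
    policy_id: str
    property_id: str
    loss_type: str
    loss_date: date
    amount: float


def find_possible_duplicates(new_claim, history, amount_tolerance=0.05):
    """Return IDs of past claims that look like the same incident re-filed."""
    matches = []
    for old in history:
        same_incident = (
            old.policy_id == new_claim.policy_id
            and old.property_id == new_claim.property_id
            and old.loss_type == new_claim.loss_type
        )
        limit = amount_tolerance * max(old.amount, new_claim.amount)
        if same_incident and abs(old.amount - new_claim.amount) <= limit:
            matches.append(old.claim_id)
    return matches


history = [Claim("C-1", "P-9", "house-12", "fire", date(2021, 3, 1), 50000.0)]
refiled = Claim("C-2", "P-9", "house-12", "fire", date(2021, 6, 15), 50500.0)
print(find_possible_duplicates(refiled, history))  # ['C-1']
```

A match is only a signal for a human investigator to review, since a property can legitimately suffer two similar losses.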
5- Improved Cybersecurity
Insurance companies have access to highly sensitive customer information, making them prone to cyber-attacks.
Close to two-thirds of insurers across all regions who participated in a study by Deloitte are looking to increase spending on cybersecurity.
Insurers are considering implementing “zero trust” principles. This means imposing verification requirements on anyone seeking access to data or systems.
The focus is to invest in endpoint protection technologies to exert greater control over end-user devices.
Predictive analytics software is useful for detecting malware and suspicious network behavior. The ML models are built on a large sequence of users’ activities within a network. These activities are labeled as acceptable or normal to give the model a sense of regular activity, so that deviations from it can be flagged.
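The idea of learning a baseline from normal activity can be shown with a deliberately simple statistical stand-in. Instead of a full ML model, this sketch fits a mean and standard deviation to activity labelled as normal and flags values more than three standard deviations away. The metric (daily data transferred), the numbers, and the threshold are all assumptions for illustration.

```python
import statistics


def fit_baseline(normal_values):
    """Summarise normal activity as (mean, sample standard deviation)."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def is_suspicious(value, baseline, threshold=3.0):
    """Flag a value whose z-score against the baseline exceeds the threshold."""
    mean, std = baseline
    if std == 0:
        return value != mean
    return abs(value - mean) / std > threshold


# Hypothetical megabytes transferred per day by one user on normal days.
baseline = fit_baseline([100, 110, 90, 105, 95])
print(is_suspicious(104, baseline))  # False
print(is_suspicious(400, baseline))  # True
```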
Final Thoughts
The insurance industry has a strong demand for investment in technology and innovative processes. AI, IoT, blockchain, APIs, wearables, and telematics are emerging technology trends that help insurers boost operational efficiency and stay ahead of the competition.
These technologies allow insurance companies to offer personalized solutions for customers, prevent risk, and improve fraud detection. They also help companies to track customer behavior and open the path to new business models.
The rise of IoT has offered telecom operators another chance to spring into action. If the telecom industry makes the most of IoT, using the unique selling points that edge computing and 5G offer, it could unlock potential profits of more than $600 billion by 2022, as per Accenture.
Connectivity Decides the Telcos Success Ahead
Telcos can deliver excellent IoT services when these are backed by strong connectivity. The telecommunication industry needs to invest in its strong core – connectivity – to move higher up the IoT value stack. However, it remains to be figured out exactly how the industry will benefit from IoT utilization.
Read on to understand how the telecom industry utilizes IoT to meet its business needs. Also, we’ll see what’s in the future for telecom operators when utilizing IoT efficiently.
IoT in Telecommunications delivers higher safety at remote sites, better equipment monitoring, and more logical business analytics.
Existing Challenges Faced when Implementing IoT in Telecommunications
The Internet of Things is at a tipping point – on one side the progressing technology has made manufacturing smart devices easier than ever before (imperative for the business community). On the other side, there are some unavoidable challenges that businesses have to face when implementing IoT in the telecom industry.
The challenges faced by the industry are related to power supply, progressing architecture, IoT complexities, privacy, and enabling complex sensing circumstances.
These challenges can still be addressed with the help of different wireless technologies, including Wi-Fi, Radio Frequency Identification (RFID), Bluetooth, and Near-field Communication (NFC). Besides, the existing Wi-Fi networks need to be improved to gain larger coverage.
Moreover, it is essential to understand the communication pathways of the Internet of Things in order to understand how information is exchanged within the IoT. The Internet of Things relies on different protocols and techniques to disperse information. In the main, telcos need the support of Device-to-Server, Device-to-Device, and Server-to-Server communication systems to share information within the Internet of Things.
However, even after settling all these challenges, telcos face other major challenges, which are as below:
Improved Performance, Availability, Higher Reliability, Complete Privacy, Absolute Scalability, Interoperability, Compatibility, Extensive Security, Investment, Mobility, and Big IoT Data.
1. AVAILABILITY
Usually, IoT is utilized to make information available anywhere at any time, depending on what the user requests. Availability, therefore, is a critical issue for the IoT: it requires a high-availability guarantee for all the physical devices used. Even IoT apps need to be highly available. The solution to this issue is to maintain programs and hardware devices that are not in use so that they can be used to balance the load when a failure occurs.
Generally, these devices or hardware are redundant, even though redundancy increases the complexity of the entire system. Hence, the best way to achieve availability is to use these redundant devices.
The redundancy is of two types:
Active (redundant components run alongside the primary components at all times)
Passive (Activated when primary components fail; sleeps at other times)
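Passive redundancy can be sketched as a standby device that only receives traffic when the primary fails. The class and method names below are hypothetical, and a real deployment would add health checks, timeouts, and a way to switch back once the primary recovers.

```python
class Device:
    """Toy device that either handles a request or raises when it is down."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class PassiveFailover:
    """Send requests to the primary; wake the standby only on failure."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def handle(self, request):
        try:
            return self.primary.handle(request)
        except RuntimeError:
            return self.standby.handle(request)


pair = PassiveFailover(Device("primary"), Device("standby"))
print(pair.handle("ping"))  # primary handled ping
pair.primary.healthy = False
print(pair.handle("ping"))  # standby handled ping
```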
IoT involves the usage of multiple technologies. Hence, its performance can’t be judged by looking at a single device. IoT’s performance also depends on several other factors:
✅ Huge Amount of Data
✅ Extreme Reliance on Cloud
✅ Network Traffic
Telcos need to figure out how to overcome these challenges of bringing IoT in Telecommunications so that they can deliver the required availability to the users.
2. RELIABILITY
Another important aspect of IoT in Telecommunications, both now and in the future, is to offer adequate reliability to users. This won’t come just from sending reliable information, but from adapting to changing environmental conditions. No matter what aspect of IoT you are dealing with (hardware or software), there should be guaranteed reliability.
3. PRIVACY & SECURITY
Another challenge for IoT in Telecommunications is to maintain the utmost security and privacy. The limited storage capacity of memory cards in IoT devices allows only small amounts of data to be stored locally. The remaining data is stored remotely on other sites. In the latter case, users are not very comfortable with disclosing their information to others. Hence, telcos need to secure this data to maintain the privacy of the users.
Privacy, trusted communication, and digital forgery are rarely addressed issues in the IoT. Since security has become a major concern when it comes to users’ data, the IoT’s lack of common security standards and architecture makes it a big challenge for telcos to keep that data confidential.
4. INTEROPERABILITY
In IoT, different devices are connected. Hence, interoperability is a necessity, irrespective of the device type. IoT in Telecommunications must deliver services equally to all the devices connected. This challenge can be overcome by adhering to standardized protocols. Ambiguous interpretations of the same protocol make it tough to achieve interoperability. However, if these ambiguities are avoided, interoperability can be achieved in IoT.
5. BIG IoT DATA
IoT, indeed, is one of the major sources of big data. The future of telecom involves connecting billions of devices, which eventually leads to the production of extensive volumes of data. Managing this data (accessing, processing, and storing it) needs highly scalable computing platforms that do not degrade the performance of the app. And that is a big challenge for telcos.
6. COMPATIBILITY
The IoT-based smart environment faces another challenge: compatibility. Proper compatibility has to be maintained between different products that are constantly connected. Many devices misbehave and create compatibility concerns, as there is no universal language for these devices. Hence, different industries need to collaborate to deliver the utmost compatibility. Otherwise, the compatibility issues will persist.
Considering these challenges, there is a lot to be done with the implementation of IoT in telecom. However, many of these concerns seem to have been sorted out for the industry.
Here are some distinct ways IoT is leveraged by telcos:
IoT allows telecom to track and trace all data and information provided to consumers. The information is generally regarding products and services offered to the customers. Telcos rely on this information to further identify any issues or glitches in the network and all the hindrances that are not allowing them to deliver the best communication services.
IoT in Telecommunications helps telcos conduct a performance evaluation of products. Once the products are deployed and data is collected using pre-integrated sensors, the performance of the telecommunication service is checked.
Telecommunication industries can combat the challenge of maintaining security by installing IoT-powered beacons and cameras to strengthen security.
Better interfaces can be enjoyed by customers if IoT-integrated telecom services help in creating a stronger connection with customers’ apps used on tablets and phones.
5G in mobiles creates smooth communication between autonomous vehicles via IoT-powered sensor fusion. IoT in Telecommunications is a must for a smoother communication system.
For better machine-to-machine communication, IoT blockchain systems can be highly utilized.
IoT sensors allow telcos to monitor the performance of devices installed at various places, including factories, workstations, and warehouses.
IoT in Telecommunications helps companies to build improved predictive analytics models, which assist in generating analytics that achieve expected results.
With IoT in Telecommunications, companies can deliver improved location-based services by using the proximity sensors in the devices.
IoT brings excellent benefits to telecom operators. But, as the IoT is still in its evolving stage, more issues need to be addressed and sorted out. Seamless communication is built when
IoT in Telecommunications – The Accurate Utilization
Telecom operators deliver a collection of services and products using IoT to bring additional value to their existing networks. 5G is maturing and the 4G to 5G transformation is still in progress. Amidst this progression, the value added by IoT becomes more important for telecom companies. IoT in Telecommunications builds platforms for organizations to develop their own IoT services and products.
The First Phase: Enhancing Connectivity
Currently, telcos are in the best position to transform themselves for the better, thanks to IoT in Telecommunications, which offers reliable and safe connectivity nationwide. The future is about these telco leaders customizing their technology assets to deliver connectivity platforms backed up by IoT. Telcos will need highly reliable computing power to uncover the actual capabilities of modern IoT solutions.
The Second Phase: Building an Effective Ecosystem
With IoT in Telecommunications, the future is about a better ecosystem. The current telecommunication business model is expected to vanish within the coming five years, and the ecosystem is the main reason this change will be needed. However, telecom is amongst those industries that will still drive profits in the future through the ecosystem, combined with other valuable factors including talent, technology, and business vision.
Again, the focus will be on ‘connectivity’ even while building a better ecosystem for telcos. This ecosystem of partners will exist to capture more lucrative opportunities. In short, to be successful businesses in the future, telcos will have to first establish themselves as genuine IoT providers and then partner up with other industry verticals to take advantage of their expertise.
Once they have mastered single verticals, telcos should proceed to co-creating complex use cases that involve multiple verticals. The innovation and new value streams will be witnessed when these multiple verticals are combined.
Eventually, to become reliable industry partners that manage multiple verticals, telcos must master their innate strength, i.e. ‘connectivity.’
The Ultimate Phase: Ruling IoT Readiness
Becoming the ultimate master of IoT agility is the final step. To unlock exponential value, the telecom industry will have to shift from mastering single verticals to co-partnering with multiple industries.
Magnify Core Connectivity – To Win an IoT that is built on Trust
Agility in both work and mindset has to be developed. Stepping into a zone with faster-than-ever innovation cycles will need new and innovative methods of working. So, how do you fix those challenges and skill gaps to achieve steadiness in the business? Well, you will need to invest in modern technologies and quicker connectivity solutions to win the game. Besides, transfer your core expertise into an IoT business.
Build Trust, Serve Single Verticals First
Focus on gaining the trust of one industry at a time. Combine platform players and device manufacturers to build trust with your partners. Allow your partners to deliver enterprise-grade solutions while you deliver a secure and reliable IoT core network. That’s how you build trust amongst partners.
Move to Multiple Verticals
The ultimate step is beyond connectivity. When you have mastered connectivity, become a matchmaker for industry players who want to monetize their data. It could even be a platform where developers and businesses connect.
The Opportunities Companies can Leverage with IoT in Telecommunications
Telecoms can expand their service area with multiple IoT-enabled services, like smart retail, smart homes, vehicle tracking, and much more. Telecommunication companies can build IoT platforms for their respective customers, allowing them to connect centralized control and devices to run essential IoT apps. Hence, with IoT in Telecommunications, these IoT-powered platforms need to be built using a modular architecture that features reliable APIs.
What will this modular architecture featuring reliable APIs deliver?
Backend Solutions
The backend offered by telecommunication companies delivers the ability to store, manage, and process IoT-generated data. Customers can easily integrate these back-end solutions into their existing apps.
Managed Connectivity Services
Telcos support the IoT infrastructure of their customers by offering managed connectivity services that heavily rely on Narrowband IoT. The data generated by IoT devices is saved and managed on the customer’s side.
Data Analytics Services
To deliver value to customers from the data generated by IoT devices, telcos offer predictive, diagnostic, and prescriptive analytics services.
IoT Data Storage Services
Business apps run on the customer side; hence, telcos take care of IoT-generated data by saving, storing, scanning, cleaning, and processing it for their customers.
IoT in the telecommunications industry gives customers complete access to SaaS tools for solving their business problems, and the solutions delivered are focused on problems in a particular space.
The Future of IoT in Telecommunications is Green
As new and innovative concepts are amalgamating with existing technologies, IoT still has the chance to grow. It is still evolving. The Green IoT has the potential to change the future environment, which will be more green, healthy, and economical. We can expect green communication and networking, green IoT services, and green design and implementation in the future.
Although telecom is facing certain challenges at the moment, with advancing IoT these seem likely to vanish soon, allowing telecom to focus clearly on maintaining and building better ‘connectivity’ to impress end-users.
The usage of IoT with a smart environment gives amazing opportunities to telecom:
Real-time information
New and innovative business models
Smart Operations
Flexible, secure, cost-effective cloud-based apps
2. Green IoT
The increasing awareness of environmental concerns has led to the introduction of Green IoT. It includes using technologies that help in building a healthy IoT environment. The aim is to help users collect, store, and access information using the storage and facilities available.
Conclusion:
IoT in Telecommunications is a boon for industries operating in this sector. However, it is high time to beat the existing and ongoing telecoms challenges and concerns. IoT in telecommunication can help expand and reach out for more innovative solutions to gain that competitive advantage in the market.
As Covid-19 continues to shape the global economy, analytics and business intelligence (BI) projects can help organisations prepare and implement strategies to navigate the crisis. According to the Covid-19 Impact Survey by Dresner Advisory Services, most respondents believe that data-driven decision-making is crucial to survive and thrive during the pandemic and beyond. This article provides a step-by-step overview of the typical data science project life cycle, including some best practices and expert advice.
Results of a survey by O’Reilly show that enterprises stabilise their adoption patterns for artificial intelligence (AI) across a wide variety of functional areas.
The same survey shows that 53% of enterprises using AI today recognise unexpected outcomes and predictions as the greatest risk when building and deploying machine learning (ML) models.
As an executive driving and overseeing data science adoption in your organisation, what can you do to achieve a reliable outcome of your data modelling project while getting the best ROI and mitigating security risks at the same time?
The answer lies in thorough project planning and expert execution at every stage of the data science project life cycle. Whether you use your in-house resources or outsource your project to an external team of data scientists, you should:
Define a business need or a problem that can be solved by data modelling
Have an understanding of the scope of work that lies ahead
Here’s our rundown of a data science project life cycle, including the six main steps of the cross-industry standard process for data mining (CRISP-DM) and additional steps from data science solutions that are essential parts of every data science project. This roadmap is based on decades of experience in delivering data modelling and analysis solutions for a range of business domains, including e-commerce, retail, fashion and finance. It will help you avoid critical mistakes from the start and ensure smooth rollout and model deployment down the line.
A typical data science project life cycle step by step
1. Ideation and initial planning
Without a valid idea and a comprehensive plan in place, it is difficult to align your model with your business needs and project goals, or to judge its strengths, its scope and the challenges involved. First, you need to understand what business problems and requirements you have and how they can be addressed with a data science solution.
At this stage, we often recommend that businesses run a feasibility study – exhaustive research that allows you to define your goals for a solution and then build the team best equipped to deliver it. There are usually several other software development life cycle (SDLC) steps that will run in parallel with data modelling, including solution design, software development, testing, DevOps activities and more. The planning stage is to ensure you have all required roles and skills in your team to make the project run smoothly through all of its stages, meet its purpose and achieve its desired progress within the given time limit.
2. Side SDLC activities: design, software development and testing
As you kick off your data analysis and modelling project, several other activities usually run in parallel as parts of the SDLC. These include product design, software development, quality assurance activities and more. Here, team collaboration and alignment are key to project success.
For your model to be deployed as a ready-to-use solution, you need to make sure that your team is aligned through all the software development stages. It’s essential for your data scientists to work closely with other development team members, especially with product designers and DevOps, to ensure your solution has an easy-to-use interface and that all of the features and functionality your data model provides are integrated there in the way that’s most convenient to the user. Your DevOps engineers will also play an important role in deciding how the model will be integrated within your real production environment, as it can be deployed as a microservice, which facilitates scaling, versioning and security.
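To illustrate what deploying a model as a microservice can look like, here is a minimal sketch of a prediction endpoint written as a plain WSGI application using only the Python standard library. The version tag in the URL, the placeholder scoring formula, and the helper `call_app` are assumptions for this example; a production service would typically use a web framework and add authentication, input validation, and logging.

```python
import io
import json

MODEL_VERSION = "v1"  # hypothetical version tag exposed in the URL


def predict(features):
    """Placeholder model: an equally weighted average of two made-up inputs."""
    return 0.5 * features["recency"] + 0.5 * features["frequency"]


def app(environ, start_response):
    """WSGI entry point that serves POST /<version>/predict."""
    headers = [("Content-Type", "application/json")]
    is_predict = environ["PATH_INFO"] == f"/{MODEL_VERSION}/predict"
    if environ["REQUEST_METHOD"] != "POST" or not is_predict:
        start_response("404 Not Found", headers)
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length))
    body = json.dumps({"version": MODEL_VERSION, "score": predict(payload)})
    start_response("200 OK", headers)
    return [body.encode("utf-8")]


def call_app(path, payload):
    """Call the app in-process, without a network server (handy for tests)."""
    raw = json.dumps(payload).encode("utf-8")
    environ = {
        "REQUEST_METHOD": "POST",
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    }
    statuses = []
    chunks = app(environ, lambda status, headers: statuses.append(status))
    return statuses[0], json.loads(b"".join(chunks))


print(call_app("/v1/predict", {"recency": 1.0, "frequency": 3.0}))
```

Putting the version in the path lets a new model be rolled out next to the old one, and because the service is stateless it can be scaled by running more copies behind a load balancer. For local experiments the same `app` can be served with `wsgiref.simple_server.make_server` from the standard library.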
When the product is subject to quality assurance activities, the model gets tested within the team’s staging environment and by the customer.
3. Business understanding: Identifying your problems and business needs, strategy and roadmap creation
The importance of understanding your business needs, and the availability and nature of your data, can’t be overstated. Every data science project should be ‘business first’, so define business problems and objectives from the outset.
And in the initial phase of a data science project, companies should also set the key performance indicators and criteria that will be indicative of project success. After defining your business objectives, you should assess the data you have at your disposal and what industry/market data is available and how usable it is.
Situational analysis. Experienced data scientists should be able to assess your current operational performance, then define any challenges, bottlenecks, priorities and opportunities.
Defining your ultimate goals. Undertake a rigorous analysis of how your business goals match the modelling approach and understand where the gaps in performance and technology are to define the next steps.
Building your data modelling strategy. When defining your strategy, two aspects are essential before you build business cases to kick-start the process: the assets you have available and how well the potential strategy answers your business goals.
Creating a roadmap. After you have a strategy in place, you need to design a roadmap that encompasses programs that will help you reach your goals, what the key objectives are within each program and all necessary project milestones.
The most important task within the business understanding stage is to define whether the problem can be solved by the available or state-of-the-art modelling and analysis approaches. The second most important task is to understand the domain, which allows data scientists to define new model features, initiate model transformations and come up with improvement recommendations.
4. Data understanding: data acquisition and exploratory data analysis
The preceding stages were intended to help you define your criteria for data science project success. Having those available, your data science team will be able to prepare your data for analysis and recommend which data to use and how.
The better the data you use, the better your model is. So, an initial analysis of data should provide some guiding insights that will help set the tone for modelling and further analysis. Based on your business needs, your data scientists should understand how much data you need to build and train the model.
How can you tell good data from bad data? Data quality is imperative, but how are you to know if your information really isn’t up to the required standard? Here are some of the ‘red flags’ to watch out for:
The data has missing values and cannot be normalised to a common basis.
The data has been collected from lots of very different sources. Information from third parties may come under this banner.
The data is not relevant to the subject of the algorithm. It might be useful, but not in this instance.
The data contains contradicting values. This could see the same values for opposing classes or a very broad variation inside one class.
Upon meeting any one of these red flags, there’s a chance that your data will need to be cleaned prior to your implementation of an ML algorithm.
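Two of the red flags above, missing values and contradicting values, can be checked mechanically. The following is a minimal sketch in plain Python, assuming records are held as dictionaries; the function name `find_red_flags` and the sample fields (`age`, `plan`, `churned`) are hypothetical and only serve as illustration.

```python
from collections import defaultdict

def find_red_flags(rows, feature_keys, label_key):
    """Return simple data-quality indicators for a list of dict records."""
    # Red flag: records with missing (None or empty) feature values.
    missing = [
        index for index, row in enumerate(rows)
        if any(row.get(key) in (None, "") for key in feature_keys)
    ]

    # Red flag: identical feature values that carry different labels.
    labels_by_features = defaultdict(set)
    for row in rows:
        features = tuple(row.get(key) for key in feature_keys)
        labels_by_features[features].add(row[label_key])
    contradictions = {
        features: sorted(labels)
        for features, labels in labels_by_features.items()
        if len(labels) > 1
    }

    return {"rows_with_missing_values": missing, "contradicting_labels": contradictions}

# Hypothetical sample records.
records = [
    {"age": 34, "plan": "basic", "churned": "no"},
    {"age": 34, "plan": "basic", "churned": "yes"},
    {"age": None, "plan": "pro", "churned": "no"},
]
report = find_red_flags(records, ["age", "plan"], "churned")
print(report)
```

A check like this only surfaces candidates for cleaning; whether a contradiction is noise or a genuine pattern is still a question for a domain expert.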
Types of data that can be analysed include financial statements, customer and market demand data, supply chain and manufacturing data, text corpora, video and audio, image datasets, as well as time series, logs and signals.
Some types of data are a lot more costly and time-consuming to collect and label properly than others; the process can take even longer than the modelling itself. So, you need to understand how much cost is involved, how much effort is needed and what outcome you can expect, as well as your potential ROI before you make a hefty investment in the project.
5. Data preparation and preprocessing
Once you’ve established your goals, gained a clear understanding of the data needed and acquired the data, you can move on to data preprocessing. The best method for this depends on the nature of the data you have: there are, for example, different time and cost ramifications for text and image data.
It’s a pivotal stage, and your data scientists need to tread carefully when they’re assessing data quality. If there are data values missing and your data scientists use a statistical approach to fill in the gaps, it could ultimately compromise the quality of your modelling results. Your data scientists should be able to evaluate data completeness and accuracy, spot noisy data and ask the right questions to fill any gaps, but it’s essential to consult domain experts as well.
Data acquisition is usually done through an Extract, Transform and Load (ETL) pipeline.
The ETL (Extract, Transform and Load) pipeline
ETL is a process of data integration that includes three steps that combine information from various sources. The ETL approach is usually applied to create a data warehouse. The information is extracted from a source, transformed into a specific format for further analysis and loaded into a data warehouse.
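The three steps can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not a production pipeline: the CSV content, the `orders` table and the in-memory SQLite database standing in for a data warehouse are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from files, APIs or databases.
RAW_CSV = """order_id,amount,currency
1, 19.99 ,usd
2,5.00,USD
3,,usd
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop incomplete rows and normalise types and formats."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip rows without an amount
        cleaned.append((int(row["order_id"]), float(amount), row["currency"].strip().upper()))
    return cleaned

def load(rows, connection):
    """Load: write the cleaned rows into a warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, currency TEXT)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute(
    "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes it easier to swap a source or a target without touching the cleaning rules.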
The main purpose of data preprocessing is to transform information from images, audio, logs and other sources into numerical, normalised and scaled values. Another aim of data preparation is to cleanse the information. Your data may be usable in principle yet not fit the outlined purpose as it stands. In such a case, 70%-80% of total modelling time may be assigned to data cleansing or replacing data samples that are missing or contradictory.
In many situations, you may need additional feature extraction from your data (like calculating the floor area from the room width and length for rent price estimation).
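As a minimal sketch of these two ideas, the snippet below derives a floor-area feature from room width and length and then scales it to the range 0 to 1. The function names and the sample listings are hypothetical, and min-max scaling is just one of several possible scaling choices.

```python
def add_area(listings):
    """Derive a floor-area feature from room width and length (in metres)."""
    return [{**item, "area": item["width"] * item["length"]} for item in listings]

def min_max_scale(values):
    """Scale numbers to the range [0, 1]; a constant column maps to 0.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]

# Hypothetical rental listings.
listings = [
    {"width": 3.0, "length": 4.0},
    {"width": 5.0, "length": 4.0},
    {"width": 4.0, "length": 4.0},
]
with_area = add_area(listings)
scaled_area = min_max_scale([item["area"] for item in with_area])
print(scaled_area)
```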
Proper preparation from kick-off will ensure that your data science project gets off on the right foot, with the right goals in mind. An initial data assessment can outline how to prepare your data for further modelling.
6. Modelling
We advise that you start from proof of concept (PoC) development, where you can validate initial ideas before your team starts pre-testing on your real-world data. After you’ve validated your ideas with a PoC, you can safely proceed to production model creation.
Define the modelling technique
Even though you may have chosen a tool at the business understanding stage, the modelling stage begins with choosing the specific modelling technique you’ll use. At this stage, you generate a number of models that are set up, built and can be trained. ML models (linear regression, KNN, ensembles, random forest, etc.) and deep learning models (RNN, LSTM and GANs) are part of this step.
Come up with a test design
Before building the model, develop a testing method or system to review its quality and validity. Let’s take classification as a data mining task. Error rates can be used as quality measures, so you can separate the dataset into train, validation and test sets. Build the model using the train set and make the final quality assessment on the separate test set (the validation set is used for model/approach selection, not for the final error/accuracy measurement).
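A three-way split of this kind can be sketched as follows. The 60/20/20 proportions, the fixed seed and the function name `split_dataset` are assumptions chosen for illustration; the right proportions depend on how much data you have.

```python
import random

def split_dataset(samples, train=0.6, validation=0.2, seed=42):
    """Shuffle the samples and split them into train, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_validation],
        shuffled[n_train + n_validation:],  # the remainder is the test set
    )

train_set, validation_set, test_set = split_dataset(range(100))
print(len(train_set), len(validation_set), len(test_set))
```

The test set is set aside once and only touched for the final measurement, so that choices made on the validation set do not leak into the reported error.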
Build a model
To develop one or more models, use the modelling tool on the arranged dataset.
Parameter settings — modelling tools usually allow the adjustment of a wide range of parameters. List the parameters and their chosen values, together with the justification for each choice.
Models — the actual models produced by the modelling tool, not a report on them.
Model descriptions — outline the resulting models, report the models’ interpretations and detail any issues with meanings.
7. Model evaluation
Model selection during the prototyping phase
To assess the model, leverage your domain knowledge, criteria of data mining success and desired test design. After evaluating the success of the modelling application, work together with business analysts and domain experts to review the data mining results in the business context.
Include business objectives and business success criteria at this point. Usually, data mining projects implement a technique several times, and data mining results are obtained by many different methods.
Model assessment – sum up task results, evaluate the accuracy of generated models and rank them in relation to each other.
Revised parameter settings – building upon the evaluation of the model, adjust parameter settings for the next run. Keep modifying parameters until you find the best model(s). Make sure to document modifications and assessments.
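The adjust, evaluate and document loop described above can be sketched as follows. This is a deliberately small illustration: the "model" is just a decision threshold applied to a score, and the validation data and candidate values are hypothetical.

```python
def accuracy(threshold, samples):
    """Share of (score, label) pairs classified correctly at this threshold."""
    correct = sum((score >= threshold) == label for score, label in samples)
    return correct / len(samples)

# Hypothetical validation data: (model score, true label).
validation = [
    (0.1, False), (0.35, False), (0.4, True),
    (0.6, False), (0.7, True), (0.9, True),
]

# Try each candidate setting and document the result of every run.
run_log = []
for threshold in (0.3, 0.5, 0.65):
    run_log.append({
        "threshold": threshold,
        "validation_accuracy": accuracy(threshold, validation),
    })

best_run = max(run_log, key=lambda run: run["validation_accuracy"])
print(best_run)
```

Keeping the full `run_log`, rather than only the winning setting, is what makes the later justification of parameter choices possible.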
Here are some methods used by data scientists to check a model’s accuracy:
Lift and gain charts — used for problems in campaign targeting to determine target customers for the campaign. They also estimate the response level you can get from a new target base.
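ROC curve — a plot of the true positive rate against the false positive rate across classification thresholds; the area under the curve (AUC) summarises the model’s performance in a single number.
Gini coefficient — in model evaluation, a measure of how well the model separates the classes, derived from the area under the ROC curve (Gini = 2 × AUC − 1).
Cross-validation — splitting the data into several folds, training on all but one fold and validating on the remaining one, then rotating so that each fold is used for validation once. The averaged score guides model and approach selection, while a separate held-out test set is used for the final model performance measurement.
Confusion matrix — a table that compares predicted classes with actual classes, counting how many instances fall into each combination. It is used to derive the model’s accuracy, true and false positive rates, sensitivity and specificity.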
The confusion matrix
Root mean squared error — the square root of the average squared difference between predicted and actual values. Mostly used in regression techniques; it helps to estimate the typical size of prediction errors.
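Several of the measures listed above can be computed in a few lines. The sketch below covers the confusion matrix counts, sensitivity, specificity, accuracy, the area under the ROC curve (computed here as the share of correctly ordered positive/negative pairs), the Gini coefficient in its common model-evaluation form 2 × AUC − 1, and root mean squared error. All data and function names are hypothetical.

```python
import math

def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
    }

def auc(actual, scores):
    """Area under the ROC curve: share of (positive, negative) pairs ranked correctly."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    # A tie between a positive and a negative score counts as half a correct pair.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def rmse(actual, predicted):
    """Root mean squared error for numeric predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical classification results.
actual = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]

counts = confusion_counts(actual, predicted)
sensitivity = counts["tp"] / (counts["tp"] + counts["fn"])  # true positive rate
specificity = counts["tn"] / (counts["tn"] + counts["fp"])  # true negative rate
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
model_auc = auc(actual, scores)
gini = 2 * model_auc - 1

# Hypothetical regression results.
price_rmse = rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(counts, sensitivity, specificity, accuracy, model_auc, gini, price_rmse)
```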
The assessment method should fit your business objectives. When you turn back to preprocessing to check your approach, you can use different preprocessing techniques, extract some other features and then turn back to the modelling stage. You can also do factor analysis to check how your model reacts to different samples.
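One way to see how a model reacts to different samples is k-fold cross-validation, where each part of the data takes a turn as the validation set. The sketch below only generates the index splits; the function name `k_fold_indices` and the choice of three folds are assumptions for illustration, and the model training itself is left out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print(len(train), validation)
```

In practice the data would usually be shuffled before splitting, and a model would be trained and scored once per fold, with the scores averaged.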
8. Deployment: Real-world integration and model monitoring
When the model has passed the validation stage, and you and your stakeholders are 100% happy with the results, only then can you move on to full-scale development – integrating the model within your real production environment. DevOps, MLOps and database engineers play a very important role at this stage.
The model consists of a set of scripts that process data from databases, data lakes and file systems (CSV, XLS, URLs), using APIs, ports, sockets or other sources. You’ll need some technical expertise to find your way around the models.
Alternatively, you could have a custom user interface built, or have the model integrated with your existing systems for convenience and ease of use. This is easily done via microservices and other methods of integration. Once validation and deployment are complete, your data science team and business leaders need to step back and assess the project’s overall success.
9. Data model monitoring and maintenance
A data science project doesn’t end with the deployment stage; the maintenance step comes next. Data changes from day to day, so a monitoring system is needed to track the model’s performance over time.
Once the model’s performance drops, monitoring systems can indicate whether a failure needs to be handled, whether the model should be retrained, or even whether a new model should be implemented. The main purpose of maintenance is to ensure a system’s full functionality and optimal performance until the end of its working life.
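A very simple form of such monitoring is to track accuracy over the most recent predictions and raise a flag when it drops below an agreed level. The class below is a minimal sketch; the name `AccuracyMonitor`, the window size and the 0.8 threshold are hypothetical, and a real system would also track input data drift, latency and failures.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent predictions and flag degradation."""

    def __init__(self, window_size=100, retrain_below=0.8):
        self.outcomes = deque(maxlen=window_size)  # old outcomes drop out automatically
        self.retrain_below = retrain_below

    def record(self, predicted, actual):
        """Store whether a prediction matched the outcome observed later."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded yet."""
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_retraining(self):
        # Only decide once the window is full, to avoid reacting to a handful of samples.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.retrain_below

# Hypothetical stream of (predicted, actual) pairs.
monitor = AccuracyMonitor(window_size=5, retrain_below=0.8)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1), (1, 1)]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.needs_retraining())
```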
10. Data model disposition
Data disposition is the last stage in the data science project life cycle, consisting of either data or model reuse/repurpose or data/model destruction. Once the data gets reused or repurposed, your data science project life cycle becomes circular. Data reuse means using the same information several times for the same purpose, while data repurpose means using the same data to serve more than one purpose.
Data or model destruction, on the other hand, means complete information removal. To erase the information, you can, among other things, overwrite it or physically destroy the carrier. Data destruction is critical to protect privacy, and failure to delete information may lead to breaches and compliance problems, among other issues.
Conclusion
AI will keep shaping the establishment of new business, financial and operating models in 2021 and beyond. The investments of world-leading companies will affect the global economy and its workforce and are likely to define new winners and losers.
The lack of AI-specific skills remains a primary obstacle on the way to adoption in the majority of organisations. In the survey by O’Reilly, around 58% of respondents typically mentioned the shortage of ML modellers and data scientists, among other skill gaps within their organisations.
Source: AI adoption in the enterprise 2020
Have questions about how your data can be used to boost your business performance? We will be happy to answer them. Drop us a line.