Financial markets are constantly evolving and face some of the biggest challenges in their history: the full digitization of markets, technology-driven disruption, reduced client switching costs, and more. To cope with these challenges and keep moving ahead, the industry has embraced the new opportunities created by technology advancements such as high-speed processing, the convergence of data ubiquity and AI software solutions, and more.
As the industry continues to digitize and transform for new growth and operational efficiencies, firms are proactively focusing on innovating and differentiating by partnering with AI development companies to deliver an enhanced client experience. From AI trading to AI fraud detection, AI solutions are helping organizations redefine their operations more efficiently. By implementing such solutions, firms are able to leverage their own data and deliver bottom-line results.
Artificial Intelligence is being used across industries, and finance is no exception. One of the main advantages of AI solutions is their ability to work with huge datasets, and finance is an industry that can use AI to its full potential. These solutions are already being implemented in areas such as insurance, banking, and asset management.
Applications of AI in finance
AI solutions can be used in different ways. AI-based chatbots can help financial firms communicate with their clients, and the same technology serves as the basis for virtual assistants. With the help of AI solutions, organizations can also enable algorithmic trading based on machine learning, as well as risk management, relationship management, and fraud detection.
AI in finance offers many benefits, but one of the main advantages is that it brings endless automation opportunities for financial organizations. Automation can help organizations increase their productivity and operational efficiency. Moreover, in some situations AI can replace manual effort and help eliminate human biases and errors. AI solutions also enhance data analysis: machine learning models identify patterns, providing valuable insights and supporting better decision making.
Below are some of the ways in which AI solutions are applied in finance:
Automation: Automation enables organizations to enhance productivity and cut operating costs, and time-consuming tasks can be completed much faster. For example, AI can use character recognition to verify data automatically and generate reports based on certain parameters. It helps organizations eliminate human errors and frees employees to focus on more important tasks. Research suggests that AI helps organizations save up to 70% of the costs associated with data entry and other repetitive tasks.
Credit Decisions: AI solutions help banks analyze potential borrowers more accurately. They can quickly evaluate countless factors and parameters that affect a lending decision, and they can use more complex credit scoring approaches than traditional systems. These solutions also offer a higher degree of objectivity, which is essential in the financial sector.
Trading: The trend of data-driven investing has been picking up pace over the last couple of years, and AI and machine learning solutions are being used for algorithmic trading. These systems can analyze huge amounts of structured and unstructured data quickly; the speed at which they process data leads to quicker decisions and transactions, which can translate into more profit within the same time period. The algorithms make predictions based on large amounts of historical data and can back-test different trading strategies, giving traders insight into each of them before a decision is made. They can also weigh long-term and short-term goals to provide recommendations on portfolio decisions.
Risk Management: Risk management is another area where AI solutions help organizations. Their processing power allows risk to be handled more efficiently than by human effort alone. Algorithms can analyze the history of risk cases and detect potential problems in a timely manner. AI-based solutions can analyze various financial activities in real time, regardless of the current market environment. Organizations can select the parameters that matter for their business planning and use them to obtain forecasts and predictions for the future.
Fraud Prevention: AI-based solutions are proving effective in preventing and identifying fraud. Cybercriminals are quick to develop new tactics, but with the help of AI, organizations can quickly identify and adapt to hackers' strategies. These solutions are particularly effective for credit card fraud: AI-driven algorithms can analyze a client's behavior, track their locations, and identify their purchasing patterns, so they can detect any unusual activity associated with an account.
Personalized Banking: AI-based solutions are among the best when it comes to providing a personalized experience. Financial institutions can use AI-based chatbots to offer timely help to their customers while minimizing the workload of customer representatives, and they can adopt voice-controlled virtual assistants for a more personal experience. These solutions are self-learning, i.e. they identify patterns and improve on their own, so they become more effective over time. There are also many solutions that offer personalized financial advice: they track a user's regular expenses, income, and purchasing habits to provide suggestions aligned with the user's financial goals.
The financial market has been strongly influenced by technological advancement. We operate in an environment where speed and convenience are competitive advantages, especially in financial markets. Digital transformation has increased competition like never before, and the industry is becoming increasingly volatile and competitive. To stay relevant, organizations need to keep up with the latest technology and partner with specialists such as AI development companies, which can help them gain a significant advantage and prepare for the new opportunities the technology offers.
Metaprogramming is a collection of programming techniques that focus on the ability of programs to introspect themselves, understand their own code, and modify themselves. Such an approach to programming gives programmers a lot of power and flexibility. Without metaprogramming techniques, we probably wouldn't have modern programming frameworks, or those frameworks would be far less expressive.
This article is an excerpt from the book Expert Python Programming, Fourth Edition by Michał Jaworski and Tarek Ziadé – a book that distills many years of professional experience in building all kinds of applications with Python, from small system scripts done in a couple of hours to very large applications written by dozens of developers over several years.
The metaclass is a Python feature considered by many to be one of the most difficult things to understand in the language, and it is therefore avoided by a great number of developers. In reality, it is not as complicated as it sounds once you understand a few basic concepts. As a reward, knowing how to use metaclasses grants you the ability to do things that are not possible without them.
A metaclass is a type (class) that defines other types (classes). The most important thing to know in order to understand how they work is that classes (types that define object structure and behavior) are objects too. So, if they are objects, then they have an associated class. The basic type of every class definition is simply the built-in type class (see Figure 1).
Figure 1: How classes are typed
In Python, it is possible to substitute the metaclass for a class object with your own type. Usually, the new metaclass is still a subclass of the type class (refer to Figure 2), because not doing so would make the resulting classes highly incompatible with other classes in terms of inheritance:
Figure 2: Usual implementation of custom metaclasses
Let’s take a look at the general syntax for metaclasses in the next section.
The general syntax
A call to the built-in type() class can be used as a dynamic equivalent of the class statement. The following is an example of a class definition created with a type() call:
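Since the original listing is not reproduced in this excerpt, here is a minimal sketch that matches the equivalent class statement shown right after it:

def method(self):
    return 1

MyClass = type('MyClass', (object,), {'method': method})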
This is equivalent to the explicit definition of the class with the class keyword:
class MyClass:
    def method(self):
        return 1
Every class that’s created with the class statement implicitly uses type as its metaclass. This default behavior can be changed by providing the metaclass keyword argument to the class statement, as follows:
class ClassWithAMetaclass(metaclass=type):
    pass
The value that’s provided as the metaclass argument is usually another class object, but it can be any other callable that accepts the same arguments as the type class and is expected to return another class object. The call signature of a metaclass is type(name, bases, namespace), and the meanings of the arguments are as follows:
name: This is the name of the class that will be stored in the __name__ attribute
bases: This is the list of parent classes that will become the __bases__ attribute and will be used to construct the MRO of a newly created class
namespace: This is a namespace (mapping) with definitions for the class body that will become the __dict__ attribute
One way of thinking about metaclasses is that they work like the __new__() method, but at a higher level: the level of class definition.
Despite the fact that functions that explicitly call type() can be used in place of metaclasses, the usual approach is to use a different class that inherits from type for this purpose. The common template for a metaclass is as follows:
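The template itself is not reproduced in this excerpt; a sketch of its usual shape, with the four methods described below simply delegating to type, looks roughly like this:

class Metaclass(type):
    def __new__(mcs, name, bases, namespace):
        # Responsible for the actual creation of the class object
        return super().__new__(mcs, name, bases, namespace)

    @classmethod
    def __prepare__(mcs, name, bases, **kwargs):
        # Creates the (initially empty) namespace for the class body
        return super().__prepare__(name, bases, **kwargs)

    def __init__(cls, name, bases, namespace, **kwargs):
        # Additional initialization of the already created class object
        super().__init__(name, bases, namespace)

    def __call__(cls, *args, **kwargs):
        # Invoked when instances of the new class are created
        return super().__call__(*args, **kwargs)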
The name, bases, and namespace arguments have the same meaning as in the type() call we explained earlier, but each of these four methods is invoked at a different stage of the class lifecycle:
__new__(mcs, name, bases, namespace): This is responsible for the actual creation of the class object, in the same way as it is for ordinary classes. The first positional argument is the metaclass object. In the preceding example, it would simply be Metaclass. Note that mcs is the popular naming convention for this argument.
__prepare__(mcs, name, bases, **kwargs): This creates an empty namespace object. By default, it returns an empty dict instance, but it can be overridden to return any other dict subclass instance. Note that it does not accept namespace as an argument because, before calling it, the namespace does not exist yet. Example usage of that method will be explained later in the Metaclass usage section.
__init__(cls, name, bases, namespace, **kwargs): This is not seen as often in metaclass implementations but has the same meaning as in ordinary classes. It can perform additional class object initialization once the class is created with __new__(). The first positional argument is now named cls by convention to mark that this is already a created class object (a metaclass instance) and not a metaclass object. When __init__() is called, the class has already been constructed, so the __init__() method can do less than the __new__() method. Implementing such a method is very similar to using class decorators, but the main difference is that __init__() will be called for every subclass, while class decorators are not called for subclasses.
__call__(cls, *args, **kwargs): This is called when an instance of a metaclass is called. The instance of a metaclass is a class object (refer to Figure 1); it is invoked when you create new instances of a class. This can be used to override the default way of how class instances are created and initialized.
Each of the preceding methods can accept additional keyword arguments, all of which are represented by **kwargs. These arguments can be passed to the metaclass object using extra keyword arguments in the class definition, as in the following code:
class Klass(metaclass=Metaclass, extra="value"):
    pass
This amount of information can be overwhelming at the beginning without proper examples, so let’s trace the creation of metaclasses, classes, and instances with some print() calls:
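The RevealingMeta definition itself is missing from this excerpt; a reconstruction that is consistent with the output shown below simply adds a print() call to each of the four methods:

class RevealingMeta(type):
    def __new__(mcs, name, bases, namespace):
        print(mcs, "__new__ called")
        return super().__new__(mcs, name, bases, namespace)

    @classmethod
    def __prepare__(mcs, name, bases, **kwargs):
        print(mcs, "__prepare__ called")
        return super().__prepare__(name, bases, **kwargs)

    def __init__(cls, name, bases, namespace, **kwargs):
        print(cls, "__init__ called")
        super().__init__(name, bases, namespace)

    def __call__(cls, *args, **kwargs):
        print(cls, "__call__ called")
        return super().__call__(*args, **kwargs)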
Using RevealingMeta as a metaclass to create a new class definition will give the following output in the Python interactive session:
>>> class RevealingClass(metaclass=RevealingMeta):
...     def __new__(cls):
...         print(cls, "__new__ called")
...         return super().__new__(cls)
...     def __init__(self):
...         print(self, "__init__ called")
...         super().__init__()
...
<class 'RevealingMeta'> __prepare__ called
<class 'RevealingMeta'> __new__ called
<class 'RevealingClass'> __init__ called
And when you create an actual instance of RevealingClass, you get the following output:
>>> instance = RevealingClass()
<class 'RevealingClass'> __call__ called
<class 'RevealingClass'> __new__ called
<RevealingClass object at 0x1032b9fd0> __init__ called
Let’s now take a look at metaclass usage and applications.
Metaclass Usage and Applications
Metaclasses are a great tool for doing unusual and sometimes wonky things. They give a lot of flexibility and power in modifying typical class behaviour. So, it is hard to tell what the common examples of their usage are; it would be easier to say that most usages of metaclasses are pretty uncommon.
For instance, let's take a look at the __prepare__() method of the type class. It is responsible for preparing the namespace of class attributes. The default type for a class namespace is a plain dictionary. For years the canonical example of the __prepare__() method was providing a collections.OrderedDict instance as a class namespace. Preserving the order of attributes in the class namespace allowed for things like repeatable object representation and serialization. But since Python 3.7, dictionaries are guaranteed to preserve key insertion order, so that use case is gone. That doesn't mean, however, that we can't play with namespaces.
Let's imagine the following problem: we have a large Python code base that was developed over dozens of years, and the majority of the code was written before anyone on the team cared about coding standards. We may have, for instance, classes that mix camelCase and snake_case as their method naming convention. If we care about consistency, we would be forced to spend a tremendous amount of effort to refactor the whole code base into one naming convention. Or we could just use a clever metaclass that could be added on top of existing classes and that would allow methods to be called in both ways. We could write new code using the new naming convention (preferably snake_case) while leaving the old code untouched and waiting for a gradual update.
That's an example of a situation where __prepare__() can be used! Let's start by writing a dict subclass that automatically interpolates camelCase names into snake_case keys:
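The subclass itself is not shown in this excerpt; a sketch consistent with the description that follows (it relies on the inflection.underscore() helper from the module mentioned in the note below) could look like this:

import inflection


class CaseInterpolationDict(dict):
    def __setitem__(self, key, value):
        # Store the value under the original key...
        super().__setitem__(key, value)
        # ...and also under the key converted to snake_case.
        super().__setitem__(inflection.underscore(key), value)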
Note: To save some work we use the inflection module, which is not part of the standard library. It is able to convert strings between various "string cases". You can download it from PyPI using pip:
pip install inflection
Our CaseInterpolationDict class works almost like an ordinary dict type, but whenever it stores a new value it saves it under two keys: the original one and one converted to snake_case. Note that we used the dict type as a parent class instead of the recommended collections.UserDict. This is because we will use this class in the metaclass __prepare__() method, and Python requires namespaces to be dict instances.
Now it's time to write the actual metaclass that will override the class namespace type. It will be surprisingly short:
class CaseInterpolatedMeta(type):
    @classmethod
    def __prepare__(mcs, name, bases):
        return CaseInterpolationDict()
Now that we are set up, we can use the CaseInterpolatedMeta metaclass to create a dummy class with a few methods that use the camelCase naming convention:
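The class definition is also missing from this excerpt; a sketch that is consistent with the interactive session shown further below could be:

class User(metaclass=CaseInterpolatedMeta):
    def __init__(self, firstName: str, lastName: str):
        self.firstName = firstName
        self.lastName = lastName

    def getDisplayName(self):
        return f"{self.firstName} {self.lastName}"

    def greetUser(self):
        return f"Hello {self.getDisplayName()}!"

Inspecting the resulting class namespace (for example with vars(User)) shows every method stored under both its camelCase and its snake_case name; the tail of that output looks as follows: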
'__dict__': <attribute '__dict__' of 'User' objects>,
'__weakref__': <attribute '__weakref__' of 'User' objects>,
'__doc__': None
})
The first thing that catches the eye is the fact that the methods got duplicated. That is exactly what we wanted to achieve. The second important thing is that User.__dict__ is of the mappingproxy type. That's because Python always copies the contents of the namespace object to a new dict when creating the final class object. The mapping proxy also allows proxying access to superclasses within the class MRO.
So, let’s see if our solution works by invoking all of its methods:
>>> user = User("John", "Doe")
>>> user.getDisplayName()
'John Doe'
>>> user.get_display_name()
'John Doe'
>>> user.greetUser()
'Hello John Doe!'
>>> user.greet_user()
'Hello John Doe!'
It works! We could call all the snake_case methods even though we haven't defined them. For an unaware developer, that could look almost like magic!
However, this is the kind of magic that should be used very carefully. Remember that what you have just seen is a toy example. Its real purpose was to show what is possible with metaclasses in just a few lines of code. Learn more in the book Expert Python Programming, Fourth Edition by Michał Jaworski and Tarek Ziadé.
Summary
In this article, we were first introduced to metaprogramming and then to the complex world of metaclasses. We explored the general syntax and practical usage of metaclasses. In the book, we delve further into advanced topics such as metaclass pitfalls and the use of the __init_subclass__() method as an alternative to metaclasses.
About the Authors
Michał Jaworski has more than 10 years of professional experience in writing software using various programming languages. Michał has spent most of his career writing high-performance and distributed backend services for web applications. He has served in various roles at multiple companies: from an ordinary software engineer to lead software architect. His beloved language of choice has always been Python.
Tarek Ziadé is a software engineer, located in Burgundy, France. He works at Elastic, building tools for developers. Before Elastic, he worked at Mozilla for 10 years, and he founded a French Python User group, called AFPy. Tarek has also written several articles about Python for various magazines, and a few books in French and English.
Small and medium-sized businesses (SMEs) often find it difficult to balance the day-to-day need for data with the cost of employing data scientists or professional analysts to help with forecasting, analysis, data preparation, and other complex analytical tasks.
A recent Gartner study found that by 2022, data preparation will become a critical capability in more than 60% of data integration, analytics/BI, data science, data engineering, and data lake enablement platforms. If this prediction is correct, SMEs will have to find a way to satisfy the need for advanced analytics and fact-driven decision-making if they are going to grow and compete.
Data is everywhere in modern organizations, and small and medium-sized businesses are no exception. The tasks involved in gathering and preparing data for analysis are just the first steps. To make the best use of that data, the organization must have advanced analytics tools that help it analyze the data, find patterns and trends, and build analytics models. These steps can be labor intensive and, without a suitable self-serve data preparation tool, the organization will have to employ professional data scientists to get the job done.
Data preparation and manipulation include data extraction, transformation, and loading (ETL) as well as shaping, reducing, combining, exploring, cleaning, sampling, and aggregating data. With a targeted self-serve data preparation tool, a midsized business can allow its business users to take on these tasks without SQL, ETL, programming, or data science skills.
Augmented analytics features can help an SME automate and enhance data engineering tasks, abstract data models, and use system guidance to quickly and easily prepare data for analysis, ensuring data quality and accurate manipulation. With the right self-serve data preparation tools, users can explore data, use auto-recommendations to visualize it in a way that suits a particular type of analysis, and leverage natural language processing (NLP) and machine learning to query the data using simple search techniques familiar from Google and other popular search tools.
Because these sophisticated features are built with intuitive guidance and auto-recommendations, the user does not have to guess at how to prepare, visualize or analyze the data so results are accurate, easy to understand and suitable for sharing and reporting purposes. As small and medium sized organizations face the challenges of an ever-changing market and customer expectations, it will be more critical than ever to optimize business and data management and to make data available for strategic and day-to-day decisions. To manage budgets and schedules, SMEs will have to achieve more agility and flexibility and look to the business user community to increase data literacy and embrace business analytics.
Cardiovascular diseases are the primary cause of global deaths.
New model detects coronary heart disease with almost 99% accuracy.
DNN with hidden layers shows more accuracy than other models.
According to the World Health Organization (WHO), cardiovascular diseases (CVDs) are the leading cause of death globally, killing 17.9 million people in 2019 [1]. The WHO risk models identify many different variables as risk factors for CVDs, including the key predictor variables age, blood pressure, body mass index, cholesterol, and tobacco use. Historically, this potpourri of factors made CVDs almost impossible to predict with any meaningful accuracy. A new study by Kondeth Fathima and E. R. Vimina [2], published in Intelligent Sustainable Systems: Proceedings of ICISS 2021, used Deep Neural Networks (DNNs) with four Hidden Layers (HLs) to predict CVDs with an impressive 99% accuracy.
What is a DNN with Hidden Layers?
Neural network models have come to the forefront in recent years, gaining popularity because of their exceptional prediction capabilities. Many different deep learning techniques have been developed, including Convolutional Neural Networks (CNNs), used extensively for object recognition and classification, and Long Short-Term Memory units (LSTMs), widely used to detect anomalies in network traffic. This new study used a Deep Neural Network (DNN), known for its robustness to low and high data variations, its generalizability to a wide range of applications, and its scalability to additional data.
DNNs can be single- or multi-layered and are defined as “an interconnected assembly of processing elements that act upon a function” [2]. The additional computational layers in multi-layered DNNs are called Hidden Layers (HLs); each HL repeats a processing step over many cycles. A neural network with hidden layers can handle increasingly complex information, making it an ideal choice for analyzing data with multiple features, like data on cardiovascular disease. When modeling the intricacies of CVD risk factors, a model with more hidden layers gives better results than one with fewer layers. The goal of the study was to find the DNN with the optimal number of hidden layers, the one giving the best accuracy for predicting cardiovascular disease.
Methodology
The study authors used two datasets from the University of California at Irvine’s machine learning repository [3], Statlog and Cleveland. Both of these data sets are known for their data source reliability. After using Exploratory Data Analysis on the data, the researchers chose the best model based on accuracy performance on the two datasets.
Three different neural network models were studied, each with a different number of layers and neurons. After experimenting with various numbers of hidden layers, the researchers chose a network with one input layer (IL), four HLs, and one output layer (OL).
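As a rough illustration only (the paper's exact layer widths, activations, and training settings are not given in this summary, so the values below are assumptions), such an architecture could be sketched in Keras as follows:

import tensorflow as tf

def build_model(n_features: int = 13) -> tf.keras.Model:
    # One input layer, four hidden layers, one output layer (IL + 4 HLs + OL).
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),             # IL: the 13 features
        tf.keras.layers.Dense(64, activation="relu"),    # HL 1 (width is an assumption)
        tf.keras.layers.Dense(32, activation="relu"),    # HL 2
        tf.keras.layers.Dense(16, activation="relu"),    # HL 3
        tf.keras.layers.Dense(8, activation="relu"),     # HL 4
        tf.keras.layers.Dense(1, activation="sigmoid"),  # OL: probability of disease
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model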
The Synthetic Minority Oversampling Technique (SMOTE) was used to increase and balance the number of cases in the imbalanced dataset, which contained disproportionate numbers of healthy and unhealthy cases. Mean imputation replaced the missing data, and the datasets were divided into a training set (70%) and a testing set (30%) containing equal proportions of healthy and unhealthy cases. The weights on the 13 features were optimized using gradient descent, and the data was scaled using standardization.
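Again as a sketch rather than the authors' code (the library choices and the exact ordering are assumptions based on the description above), the preprocessing could look like this with scikit-learn and imbalanced-learn:

from imblearn.over_sampling import SMOTE
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def prepare_data(X, y, seed: int = 42):
    # Mean imputation for missing values.
    X = SimpleImputer(strategy="mean").fit_transform(X)
    # SMOTE oversampling to balance healthy and unhealthy cases.
    X, y = SMOTE(random_state=seed).fit_resample(X, y)
    # 70/30 split, stratified so both sets keep equal class proportions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # Standardization (fitted on the training set only).
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test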
Results
Different metrics were used for evaluating the models, including accuracy, sensitivity, specificity, F1 score, misclassification, ROC, and AUC. The result was a four-HL DNN that detected coronary heart disease with “promising results”. The selected model gave accuracies of 98.77% on the Statlog dataset and 96.70% on the Cleveland dataset.
References
DNN Image (Top): Adobe Stock / Creative Cloud.
4-HL DNN Model by Author (Based on Kondeth Fathima and E. R. Vimina’s original Fig. 1). Background: Adobe Stock / Creative Cloud.
A cartoon making its way around social media asks the provocative question “Who wants clean data?” (everyone raises their hands) and then asks, “Who wants to CLEAN the data?” (nobody raises their hands). I took the cartoon one step further (apologies for my artistic skills) and asked, “Who wants to PAY for clean data?”, which shows everyone running for the exits (Figure 1).
Figure 1: Today’s Data Management Reality
Why does everyone run for the exits when asked to pay for data quality, data governance, and data management? Because we do a poor job of connecting high-quality, complete, enriched, granular, low-latency data to the sources of business and operational value creation.
Data is considered the world’s most valuable resource, and it is providing compelling financial results to organizations focused on exploiting the economics of data and analytics (Figure 2).
Yet most business executives are still reluctant to embrace the fundamental necessity of data management and fund it accordingly. If data is the catalyst for the economic growth of the 21st century, then it’s time we reframe how we view data management. It’s time to talk about Data Management 2.0.
The Data Management Association (DAMA) has long been the data management champion. DAMA defines data management as “the planning, oversight, and control over the management and use of data and data-related sources”. DAMA has been instrumental in driving the development of data management procedures, practices, policies, and architecture (Figure 3).
The DAMA Data Management Framework is great for organizations seeking to understand how to manage their data. However, if data is “the world’s most valuable resource”, then we must re-invent data management into a business strategy. We must help organizations understand how best to monetize or derive value from the application of data to their business (Figure 4).
Figure 4: Transforming Data Management
Before exploring the Laws of Data Management 2.0, let me define “Data Monetization”:
Data Monetization is the application of data to the business to drive quantifiable financial value.
While some organizations can sell their data, for the majority of organizations data monetization (or insights monetization) is about the application of the data to the organization’s top use cases to drive quantifiable financial value. Or as Doug Laney, author of the seminal book “Infonomics: How to Monetize, Manage, and Measure Information as an …” stated:
“If you are not quantifying the financial value that your organization derives from the use of data, then you are not doing data monetization”
Law #1: Data has no value in and of itself
Data possesses potential value but, in and of itself, provides zero realized value. As I discussed in “Introducing the 4 Stages of Data Monetization”, data in Stage 1 is a cost to be minimized. Data in Stage 1 is burdened with the increasing costs associated with the storage, management, protection, and governance of the data, as well as the potential regulatory and compliance costs, liabilities, and fines associated with not properly managing or protecting one’s data (Figure 5).
Data Management 2.0 provides a more holistic methodology that doesn’t just stop at managing data but enables the application of data to the organization’s most important use cases to drive quantifiable financial value.
Law #2: Not all data is of equal value
Many data management organizations waste precious resources (and business stakeholder street cred) by treating all data the same way. Fact: some data is more important than other data in helping to predict and optimize customer engagement, product performance, and business operations.
To determine which data elements are most important, data scientists can apply analytic techniques like Principal Component Analysis (PCA) and Random Forest to quantify the importance of a particular data element (or feature, something I’ll discuss in my next blog) in optimizing the organization’s key use cases, such as customer attrition, predictive product maintenance, unplanned operational downtime, improved healthcare results, or surviving the sinking of the Titanic (Figure 6).
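As a small illustration (the dataset, feature names, and model settings here are hypothetical, not drawn from any of the use cases above), a Random Forest can rank data elements by importance along these lines; PCA could be applied analogously via sklearn.decomposition.PCA:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def rank_data_elements(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    # Fit a Random Forest on the use case's outcome (e.g., churned yes/no)...
    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    # ...then use feature_importances_ to quantify how much each data element
    # contributes to predicting that outcome; higher means more valuable data.
    return pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)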
Data Management 2.0 operationalizes business stakeholder collaboration to identify, validate, value, and prioritize the use cases that deliver organizational value, and to identify and triage the KPIs and metrics against which value delivery will be measured.
Law #3: One cannot ascertain the value of their data in isolation from the business
To identify which data variables are most important to the business, data management must start by understanding how the organization creates and measures value creation. This conversation starts with an organization’s business and operational intent; that is, what is the organization trying to accomplish from a business and operational perspective over the next 12 to 18 months, and what are the measures or KPIs against which progress and success will be measured.
Data Management 2.0 reframes how organizations approach the application of data to the business: it starts by understanding how organizations create value (and where and how data can help create value) instead of starting with data (and hoping that data finds its way to value). For more on how to do that, check out my book “The Art of Thinking Like a Data Scientist”, which provides an 8-step, collaborative process for engaging business stakeholders in identifying, validating, valuing, and prioritizing the organization’s most important business and operational use cases (Figure 7).
Law #4: Turning everyone into Data Engineers is not practical and not scalable
Finally, asking business stakeholders to manage their own data sources is impractical and dangerous. It opens the door to random, orphaned data management processes that may address tactical data and analytics needs, but at the expense of data and analytics’ strategic, economic value.
Data Management 2.0 empowers the entire organization with the capabilities for building, sharing, and refining the organization’s data and analytics assets, enabling organizations to unleash the business or economic value of their data.
If we believe that data is the new oil – that data will be the catalyst for economic growth in the 21st century – then we need to spend less time and investment trying to manage data and dramatically increase the time and investment devoted to monetizing data. That will require organizations to expand their data management capabilities to support the sharing, re-use, and continuous refinement of data and analytics assets to derive and drive new sources of customer, product, and operational value.
Transformer-based pre-trained language models (T-PTLMs) are a complex and fast-growing area of AI, so I recommend this paper as a good way to understand and navigate the landscape.
We can classify T-PTLMs from four perspectives:
Pretraining Corpus
Model Architecture
Type of SSL (self-supervised learning) and
Extensions
Pretraining Corpus-based models
General pretraining: Models like GPT-1, BERT, etc. are pretrained on a general corpus. For example, GPT-1 is pretrained on the Books corpus, while BERT and UniLM are pretrained on English Wikipedia and the Books corpus. This form of pretraining draws on general text from multiple sources of information.
Social media-based: models can be pretrained on social media text.
Language-based: models can be pretrained on either monolingual or multilingual corpora.
Architecture
T-PTLMs can also be classified based on their architecture. A T-PTLM can be pretrained using a stack of encoders, a stack of decoders, or both.
Hence, as the short sketch after this list illustrates, you could have architectures that are:
Encoder-based
Decoder-based
Encoder-Decoder based
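A minimal sketch of the three families using the Hugging Face transformers library (the model names are just well-known representatives of each family, not examples taken from the paper):

from transformers import AutoModel, AutoModelForCausalLM, AutoModelForSeq2SeqLM

encoder_only = AutoModel.from_pretrained("bert-base-uncased")        # encoder stack (BERT-like)
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")          # decoder stack (GPT-like)
encoder_decoder = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # encoder + decoder (T5-like)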
Self-supervised learning (SSL) is one of the key ingredients in building T-PTLMs.
A T-PTLM can be developed by pretraining using Generative, Contrastive or Adversarial, or Hybrid SSL. Hence, based on SSLs you could have
Generative SSL
Contrastive SSL
Adversarial SSL
Hybrid SSL
Based on extensions, you can classify T-PTLMs into the following categories:
Compact T-PTLMs: aim to reduce the size of the T-PTLMs and make them faster using a variety of model compression techniques like pruning, parameter sharing, knowledge distillation, and quantization.
Character-based T-PTLMs: CharacterBERT uses a CharCNN + highway layer to generate word representations from character embeddings and then applies transformer encoder layers; AlphaBERT is another example.
Green T-PTLMs: focus on environmentally friendly methods
Sentence-based T-PTLMs: extend T-PTLMs like BERT to generate quality sentence embeddings.
Tokenization-free T-PTLMs: avoid the use of explicit tokenizers to split input sequences, to cater for languages such as Chinese or Thai that do not use white space or punctuation as word separators.
Large-scale T-PTLMs: the performance of T-PTLMs is strongly related to their overall scale rather than to the depth or width of the model alone. These models aim to increase the number of model parameters.
Knowledge-enriched T-PTLMs: standard T-PTLMs are developed by pretraining over large volumes of text data, so during pretraining the model learns knowledge only implicitly; knowledge-enriched variants inject external knowledge (for example, from knowledge graphs) into the model.
Long-sequence T-PTLMs: self-attention variants like sparse self-attention and linearized self-attention are proposed to reduce the complexity of self-attention and hence extend T-PTLMs to long input sequences.
Efficient T-PTLMs: e.g., DeBERTa, which improves the BERT model using a disentangled attention mechanism and an enhanced masked decoder.
This is a complex area, and I hope the taxonomy above is useful. The paper I referred to provides more detail and makes a great effort at explaining such a complex landscape.
The post is based on a paper which covers this topic extensively: (also image source from the paper)
The distribution of data describes the way the data is spread out. This article covers some essential concepts of the normal distribution:
How to measure normality
Ways to transform a dataset to fit the normal class distribution
How to use the normal distribution to showcase naturally distributed phenomena and provide statistical insights
Let’s get started!
If you work in statistics, you know how vital data distribution is, because we always sample from a population whose full distribution we cannot observe. As a result, the distribution of our sample might limit the statistical techniques available to us.
The normal distribution is one of the most frequently observed continuous probability distributions.
When a dataset follows the normal distribution, you can employ additional techniques to explore the data, such as:
Knowledge about the percentage of data in each standard deviation
Linear least-squares regression
Inference based on the sample mean
In some cases, it can be beneficial to transform a skewed dataset so that it follows the normal distribution. This is most relevant when your data is approximately normally distributed apart from some distortion.
Here are the basic features of the normal distribution:
Symmetric bell shape
Equal mean and median, located at the center of the distribution
≈68% of the data falls within 1 standard deviation of the mean
≈95% of the data falls within 2 standard deviations of the mean
≈99.7% of the data falls within 3 standard deviations of the mean
Important terms you need to know as a general overview of the normal distribution:
Normal Distribution: It is a symmetric probability distribution frequently used to represent real-valued random variables. Also called the bell curve or Gaussian distribution.
Standard Deviation: It measures the amount of variation or dispersion of a set of values. It is calculated as the square root of the variance.
Variance: It is the average of the squared differences of each data point from the mean.
Ways to Use Normal Distribution
If the dataset you have does not conform to the normal distribution, you can apply these tips:
Collect more data: Even a small or low-quality sample can distort an otherwise normally distributed dataset. As a solution, collecting more data is key.
Reduce sources of variance: Reducing outliers can help move the data toward a normal distribution.
Apply a power transform: For skewed data you can apply the Box-Cox method, a family of power transformations of which the square root and the log are special cases (see the short sketch below).
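A minimal sketch using scipy.stats.boxcox on synthetic right-skewed data (the sample here is purely illustrative):

from scipy import stats

# A right-skewed sample; exponential data is a convenient stand-in.
data = stats.expon.rvs(size=1000, random_state=0)

# boxcox() finds the power parameter (lambda) that best normalizes the data;
# lambda = 0 corresponds to a log transform and lambda = 0.5 to a square root.
transformed, fitted_lambda = stats.boxcox(data)
print(f"fitted lambda: {fitted_lambda:.2f}")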
Let's also review some normality measures and how you would use them in a data science project.
Skewness
It is a measure of asymmetry relative to the mean.
The above graph has negative skewness, meaning that the tail of the distribution is longer on the left side. The counterintuitive part is that most of the data points are clustered on the right side. Be careful not to confuse this with right (positive) skewness, which would be represented by the graph's mirror image.
A Brief on How to Use Skewness
It is a significant factor in model performance. You can use skew from the scipy.stats module to measure skewness.
The skewness measure can point us to potential differences in model performance across the range of feature values. A positively skewed feature, for example, may lead to better performance on lower values, as in the short sketch below.
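A minimal sketch of measuring skewness with scipy (the two arrays are synthetic stand-ins for the symmetric and positively skewed examples discussed above):

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
symmetric = rng.normal(size=10_000)          # roughly symmetric, skew close to 0
right_skewed = rng.exponential(size=10_000)  # long right tail, clearly positive skew

print(skew(symmetric))     # ~0.0
print(skew(right_skewed))  # ~2.0 for exponential data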
Kurtosis
Kurtosis is a measure of the tailedness of a distribution. With Fisher's definition it is measured relative to 0, the kurtosis value of the normal distribution. A positive kurtosis value indicates "fatter" tails.
Figure: The Laplace distribution has kurtosis > 0 (via John D. Cook Consulting).
Understanding kurtosis provides a lens on the presence of outliers in a dataset. To measure kurtosis, you can use kurtosis from the scipy.stats module, as in the sketch below. Negative kurtosis indicates data that is grouped closely around the mean with fewer outliers.
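A minimal sketch (the samples are synthetic and only meant to show the sign of the measure):

import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
normal_data = rng.normal(size=10_000)
laplace_data = rng.laplace(size=10_000)  # heavier tails than the normal

# Fisher's definition (the scipy default): the normal distribution scores ~0,
# while the Laplace distribution's excess kurtosis is about 3.
print(kurtosis(normal_data))
print(kurtosis(laplace_data))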
Various naturally occurring datasets conform to the normal distribution; this claim has been made for everything from IQ to human height. While it is true that the normal distribution is drawn from observations of nature and occurs frequently, we risk oversimplification by applying the assumption too liberally.
Often the normal model won't fit well in the extremes, and it underestimates the probability of rare events.
Calculate the Share of Values within SD
As a dataset gets larger and larger, calculating the standard deviation (SD) and the number of values falling within each section of the bell-shaped curve becomes tedious. To this end, an empirical rule calculator can make the process faster. Such a calculator computes the share of values that fall within a particular number of SDs of the mean, or dataset average. To calculate the percentage of values, we just need the mean and the SD.
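The same calculation is easy to sketch in Python (the sample here is synthetic, standing in for any roughly normal dataset):

import numpy as np

def share_within(values, k):
    # Share of values falling within k standard deviations of the mean.
    mean, sd = values.mean(), values.std()
    return float(np.mean(np.abs(values - mean) <= k * sd))

rng = np.random.default_rng(0)
sample = rng.normal(loc=100, scale=15, size=100_000)
for k in (1, 2, 3):
    print(k, round(share_within(sample, k), 4))  # ~0.68, ~0.95, ~0.997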
Summary
This brief article covered the essentials of the normal distribution: some fundamental concepts, how to measure normality, and how to use the distribution. Make sure not to over-apply the normal distribution, or you risk discounting the chances of outliers. Let us know how it helped you in understanding the concepts.
Today's marketers play the role of advanced, technical matchmakers: their job is to match their target consumers with the products and solutions that best meet their needs or wants. They are also responsible for matching their consumer segments with the content, messaging, creatives, and CTAs that suit them best, across all the platforms and channels their audiences are on. Marketers generally face massive barriers to understanding how customers engage with marketing campaigns and where and how to optimize them. Data visualization, preparation, charts, dashboards, and stats are the top areas where talented and expensive marketing resources are being exhausted and misaligned. Experienced marketing analysts spend their time preparing data rather than analyzing it, which wastes available resources.
Marketers today are looking for a centralized place where all stakeholders can connect to the right information and insights to make smarter decisions across the customer journey. That means connecting a multitude of different data sources into a system that understands how they fit together and that can be flexibly improved over time as marketing requirements change.
Marketers who leverage marketing intelligence successfully tend to follow four core steps in the process.
The combination of marketing's data complexity with the requirement for simplicity of control and adaptability has created the conditions for a smarter approach. The main focus of marketing intelligence is to empower the marketing department with smart insights. Marketing intelligence uses smart technology that can understand and learn from marketing data as it keeps changing. With advances in AI, ML, and embedded marketing expertise, marketers can now take control of their data from end to end. This has helped marketers connect, unify, analyze, and act on all of marketing's data in a way that's easy, and even automatic.
Datorama helps businesses handle the processing and analysis of data sets of different sizes and complexity levels. It is scalable and cloud-based for growth, access, and convenience. Salesforce Datorama, built to provide enterprise-grade intelligence and insights, enables managers to monitor KPIs (key performance indicators), track ROI (return on investment), and keep other documents in a centralized repository.
As a leading marketing intelligence platform, Datorama reporting empowers marketers to connect the entire marketing data ecosystem. Equipped with the industry’s most extensive library of marketing API connectors and AI-powered data integration, Datorama can integrate, prepare, and export your marketing data. This is a valuable resource for Marketers looking for the data needed to measure campaign effectiveness and manage marketing performance through improved dashboards and reporting.
Using Datorama’s built-in data integration engine, marketers and media professionals can gain insight into various cross-channel marketing activities and implement data modeling operations. The application allows users to automate various data-related operations like cleansing, model mapping, file analysis, and update scheduling using machine learning technology. Features of Datorama include data analysis, embedded analytics, an interactive dashboard, data visualization, notifications, and more.
Datorama offers APIs, which allows marketing teams to connect the system with various applications for web analytics, customer relationship management (CRM), email management, and more. It also helps marketing agencies allocate campaign budgets, collect client details, and generate custom reports via a unified platform.
Key differentiators & advantages of Datorama
Single Cohesive Data Warehouse
Quick Insights
End-to-End Platform
Data Accessibility
Cross-Channel Visibility
Actionable Insights
Better Business Decisions
Usage and Industries
Datorama brings in all of your top marketing data sources. Customers can create, distribute, and access powerful marketing analytics apps with speed and ease. As a marketer, you want to see all your performance, outcome, and investment data in one place, and Datorama allows you to do exactly that. You can let Datorama's AI automatically connect and organize any data source with just a few clicks. Easy-to-use real-time dashboards put your goals, insights, and trends at your fingertips.
Why Salesforce Datorama
Datorama allows users to connect and unify all their data and insights into one centralized platform for holistic reporting, measurement, and optimization.
Analyze and report across all channels and campaigns so every stakeholder in the organization has the right information at their fingertips.
Act and collaborate to drive ROI, bringing the organization together toward common goals by using built-in activation and collaboration tools to connect the customer's marketing technology stack and decision-makers.
Compares and contrasts the performance of all your channels and campaigns in one view.
Allows marketers to unify all their data, KPIs, and stakeholders across teams, channels, platforms, etc.
The Datorama Marketplace unlocks limitless opportunities to deliver and access powerful solutions that leverage the full power of Datorama with immediate time to value.
Salesforce Marketing Cloud email offers customers interactive analytics that help benchmark and measure the effectiveness of email marketing campaigns.
If you too want to explore the advanced marketing intelligence platforms for the growth of your business, then you can consider the following consultants:
Nabler
EMS Consulting
Search Discovery
ATCS
Blast Analytics
Resources to learn more about Datorama:
Salesforce Official Youtube Channel
Salesforce Trailhead
Salesforce Community
Some of the interesting articles which I came across:
You can always hire Python programmers who are very proficient in developing web applications for small and medium-sized businesses. Also, avail the outstanding services provided by brilliant Python programmers who have mastered the art of creating apps for enterprises and financial industries. They are best for creating custom apps that can make any small business achieve utmost success. This ensures that your app gets more visibility online to gain maximum revenue. As they are experts in programming, you can be assured of a better user experience and a higher level of app functionality.
Python offers an excellent value proposition that can prove very beneficial for small businesses. It is an easy-to-understand programming language, which helps ensure that all requirements are fulfilled. The best thing about hiring Python developers is that they offer custom web apps according to your specific requirements. They use superior technologies chosen based on extensive client feedback and research. Thus, you can be assured of the best quality work at affordable prices.
The technologies employed by a Python programmer are excellent for creating innovative web apps. These include active Python scripting, an easy-to-use 3D-model-oriented interface, Unicode support, Pygments, custom views, direct comprehension, and others. These technologies make up the advanced functionality a Python programmer brings. Thus, you can hire expert Python full-stack developers to ensure that your website performs well.
When you hire full-time or permanent professional developers, you are ensured that your business benefits from their expertise. The dedicated python developers have years of experience in developing websites of various sizes and complexities. Most of these professionals are also experienced programmers that offer full support to your requirements. Thus, you can rely on their expertise to create and develop the best possible website in a short period of time.
When you hire python developers, you get all of the latest tools and frameworks available. You can be sure of the fact that these developers utilize the latest technologies and frameworks that enhance your website performance and functionality. Developers work closely with you to enhance the design and development of your website so that it meets your desired online goals. You can hire the most talented developers so that your web development team can deliver customized and attractive websites to your target customers. You can have a comprehensive and interactive website with high conversion rates.
In today’s competitive market, every business wants to remain ahead of its competitors. This is one reason why you should hire experienced and professional offshore python programmers. These developers can use the best technology available and develop customized apps to cater to your business needs. Thus, you can focus on other aspects of your business and leave the technicalities to the experts.
The developers can use a scalable and robust open-source framework that works well with both small and large organizations. Therefore, you can be assured that your business applications will run smoothly on any platform. With a scalable and robust open-source framework, developers are able to construct advanced apps with minimal coding and integration. So, you can hire angularjs programmers who can easily handle customized and scalable apps that meet your business needs. They can also deliver custom development and solutions that help you save time and money while you concentrate on other aspects of your business.
Developers can work on a full-time and part-time basis. If you need more assistance, you can hire them to complete your project in a faster and better manner. For instance, if you hire programmers who work on a full-time basis, they will be able to provide you with a consistent and reliable solution that works well with your business. However, if you hire programmers who work on a part-time basis, you will be guaranteed their availability so that you can discuss your requirements as and when necessary.
Azure DevOps is the successor to Team Foundation Server (TFS) and can be considered its advanced version. It is a complete suite of software development tools used on-premises. Azure DevOps Server integrates with existing integrated development environments (IDEs) and helps teams develop and deploy cross-functional software. It provides a set of tools and services with which you can easily manage the planning and development of your software projects through testing and deployment.
Azure DevOps practices enable IT departments to improve quality, decrease cycle times, and optimize the use of resources to improve the way software is built, delivered, and operated. It increases agility, enables better software development, and speeds up delivery by giving you the following capabilities:
Curbing cycle times: DevOps helps organizations improve transparency and collaboration between their development and operations teams, curb cycle times, and enhance the traceability of every release.
Resource optimization: The implementation of Azure DevOps helps organizations to:
Manage environments to provision/de-provision it
Control costs
Utilize the provisioned resources efficiently
Reduce security risks
Improving quality: Azure DevOps facilitates the identification of defects and their root cause early in the development cycle, and helps to test and deploy fixes for those issues quickly.
Let’s take a look at ways Azure DevOps helps your business grow:
Security
Azure DevOps helps you unite your hardware, processes, and workforce. It is designed to adhere to the security, control, and scalability standards of almost every company. It works on-premises, and if you want to keep your company information onsite, Azure DevOps is the best choice. This Microsoft product also gives you complete control over access to your organizational data. With Azure DevOps Server, authentication is done through your organization's Active Directory. Moreover, you can use user groups to update permissions in Azure DevOps Server implementations in bulk.
Bug tracking
The Azure DevOps Server understands your development processes well. It becomes a knowledge base through which you can easily find other bugs of a particular type, and you can also search for bugs that were reported for an application before. The Azure DevOps Server has also supported sending notifications ever since the release of the TFS Power Tools.
Collaboration
Sharing is one of the core functions of Azure DevOps. Every organization benefits from hosting and managing its code centrally. No matter what sort of code your organization uses for arranging accounts or managing servers, you can store it in the Azure DevOps Server, which gives you a central location to manage your code. Apart from storing and sharing, you can also manage your code by versioning it through the Azure DevOps Server.
Work items
The Azure DevOps Server not only helps you manage your code, but also lets you organize the administration of systems with work items. A work item can be a server, a project risk, a system bug, or anything else you want to track. When you create work items from a process template, you can model them on an Agile framework or on Capability Maturity Model Integration (CMMI), as your process requires. Regardless of the arrangement, work items help your team break difficult systems into feasible workloads.
Azure DevOps CI / CD pipeline
The Azure DevOps Server provides a strong platform for deploying solutions in a pipeline, permitting continuous integration and delivery in a software-driven organization. Furthermore, it offers an extensive marketplace of plug-ins and integrations through which you can incorporate infrastructure-as-code into the pipeline, so a system administrator can automate changes from a single location.
Increases agility
The Azure DevOps Server offers fully integrated deployment capabilities and helps organizations achieve faster time to market. It lets you deploy changes as and when required and frees you from a restricted quarterly release cycle.
Continuous updates
With the Azure DevOps Server, the software gets updated regularly, which helps keep it future-proof. Bugs get fixed while it continues to support the latest advancements in technology.
Minimizes outages
Outage reduction delivers big value. By implementing the DevOps approach, organizations can improve their work processes, automation, and deployment, and prevent the disruptions that arise as a result of outages.
Advanced innovation
When you minimize outages and deploy higher-quality code, you can spare more time to improve your working methods. Since you spend less time fixing issues that arise from deployment, you drive greater business outcomes.
Conclusion
By now you will have understood that the Azure DevOps Server has evolved with the arrival of new technologies to cater to the increasing demand for fast processes and superior quality. It provides outstanding features and functionality in one platform. The incorporation of Azure DevOps fosters a strong internal culture and drives expanding growth for your company. Talk to our experts to discuss your business needs and let us help you achieve your goals with our Azure DevOps services.