What’s In a Name?

I wrote on this topic back in 2016, but when a reader recently pointed out that the original article no longer had images (it happens), it seemed like a good opportunity to write about it again.

I am Kurt Cagle, or, according to my birth certificate, Kurt Alan Cagle. My name is Kurt Cagle.

Now, think about that for a bit. The verb "to be" is remarkably slippery, and it is slippery in almost every language on the planet that has such a construct. For instance, consider the following statements:

I am Kurt Cagle.

I am a writer.

These are two of the most fundamental assertions in language. The first statement can be broken down as:

There exists a label associated with the referenced entity that (at least locally) identifies that entity to differentiate it from other entities.

The second statement can also be restated as:

There exists a set associated with the referenced entity that indicates membership of that entity within that set, which in turn has a label.

Makes sense, right? Welcome to the world of ontology!

The shape of names evolved over time. The concept goes way back: the Proto-Indo-European word for name (which had its origins in the Crescent Valley) was nomen (nuh-min), and outside of that family tree, the Chinese root for name is ming, which many linguists would recognize as a cognate of nomen. This suggests that people have been using names for at least six thousand years, and possibly far longer.

The first names were likely given names, and were in essence "gifted" names: names bestowed by others (typically the parents) to signify an aspiration, such as Grace, Hope, or Luke (shining), or a beseechment or dedication to a deity, such as Mark (Mars-like or martial), Michael (gift of God), or Gabriel (strength of God). The suffix "el" in the latter two cases means Lord or Ruler (from the Sumerian and Phoenician Ba'el, reflected in the name Al-lah in Arabic and Muslim cultures).

Women's names were typically diminutives of men's names, where a diminutive was a shortened or "softened" form of a man's name that often stemmed from roots meaning small, such as Gabriella or Marcia, softened forms of Gabriel and Mark respectively. Women were also given names that reflected beauty, such as plant names (e.g., Holly, Ivy, or Lily) or gem names (Ruby, Pearl). Occasionally a male name from a different language would become a feminized variant, such as the French Jean (John, in English) becoming a feminine form of the name in England. In general, there are many more variants of female names than male ones.

Within family groups, this differentiation was sufficient to ensure uniqueness most of the time, though in small groups you might have adjectives that qualify these names – Big John, Tall John, Red John, and so forth. In some cases, especially among rulers, these qualifiers became parts of their name – Charlemagne was, as an example, Charles the Great. The word nickname, by the way, has nothing to do with the devil (Old Nick) but instead started out as ekename in Old English, where eke meant "also" or "alternative". As eke fell out of usage in favor of also, an ekename became a nekename, with the middle syllable eventually lost to yield nickname. Alternative names, synonyms, and aliases tend to be weaker because they generally carry weaker authority (a lesson that ontologists should pay especially close attention to).

Once cultures reached a certain size, given names were no longer adequate to fully differentiate members of that population. One solution to this, seen especially in northern cultures, was to use familial relationships: John, Son of James (John Jameson), was different from John, Son of John (John Johnson). Admittedly, this made more sense in villages where people knew one another's families reasonably well, but it also accounts for the reason that Johnson is one of the most common surnames in regions with strong Nordic roots. In other places (especially in England and Germany) profession names were used to differentiate family lines – Smith, Sawyer (a person who used saws to cut down trees, or a lumberjack), Miller, Tinker (a tin smith), Carpenter, and so forth often uniquely identified a person in that profession, and as family trades were frequently handed down, so too were the differentiating surnames.

Finally, family names also tended to echo prominent place features – Lake, Brook, Craig (a mountain), Fields, etc. – associated with the family. This was especially true of nobles and other officials, who often took the name of a given property or city that they had dominion over, though the use of originating cities or regions as qualifiers also goes way back.

The use of both a given name and a family or surname almost invariably was tied into tax collection. For instance, after the invasion of England by Willelm of Normandy (a.k.a. William the Conqueror) in 1066, one of the first orders of business was to identify the wealthy people and assets in the country, in a survey called the Domesday Book. These tax records served to freeze what had been until that time colloquial names (such as the use of professional names such as Smith or Miller as differentiators), while also formalizing "House Names" such as the Houses of York or Lancaster (lampshaded in George R.R. Martin's Game of Thrones series as House Stark and House Lannister respectively).

It's worth noting that taxonomists and ontologists refer to the given + family (or sur-) names as qualified names; the surname qualifies the given (or local) name. From a more formal standpoint, the qualified name acts as a namespace for the terms (names) within that space, and the qualifier typically denotes set or class membership. Such a system dramatically reduces the likelihood that a name refers to more than one person. As such, it is a mechanism for determining uniqueness in a broader set.
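
In RDF terms, this is exactly what a namespace does: the prefix plays the role of the qualifying surname, while the local name plays the role of the given name. Here is a minimal Python sketch using the rdflib library (the namespace IRI and the two Johns are hypothetical, purely for illustration):

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, FOAF

PEOPLE = Namespace("http://example.org/people/")  # the qualifying "surname"
g = Graph()
g.bind("people", PEOPLE)

# Two different Johns, kept distinct by their qualified names.
g.add((PEOPLE.JohnJameson, RDF.type, FOAF.Person))
g.add((PEOPLE.JohnJameson, FOAF.name, Literal("John Jameson")))
g.add((PEOPLE.JohnJohnson, RDF.type, FOAF.Person))
g.add((PEOPLE.JohnJohnson, FOAF.name, Literal("John Johnson")))

print(g.serialize(format="turtle"))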

Note that beyond the emergence of given names and surnames, there are other qualifiers that can differentiate a name, such as patronymics (senior, junior, the third, elder, younger, etc.), honorifics that ironically also qualify a person by profession or distinction (sir, which is a contraction of senior; doctor; reverend; etc.), and gender identifiers, up to and including the latest fashion of specifying pronouns for address purposes.

Western European styles also reflect a cultural preference for putting the given name first in narrative prose, though in legal contracts and other communications, the reverse order of surname and given name, separated by a comma, is frequently used to facilitate sorting by family name. Asian countries, on the other hand (with notable exceptions including Thailand and the Philippines), typically put the qualifying (sur)name first. As such, it is typical to store a common-usage name in the Western style while also storing given names and surnames separately in order to facilitate sorting under either convention.
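
As a minimal sketch of that storage convention (the field names are hypothetical, not a standard), a record can keep the common-usage form alongside the component names so that either ordering can be derived:

from dataclasses import dataclass

@dataclass
class NameRecord:
    given_name: str
    family_name: str
    display_name: str  # common-usage (Western) form, e.g. "Kurt Cagle"

people = [
    NameRecord("Kurt", "Cagle", "Kurt Cagle"),
    NameRecord("Jane", "Doe", "Jane Doe"),
]

# Sort by family name for legal or administrative listings...
for p in sorted(people, key=lambda p: (p.family_name, p.given_name)):
    print(f"{p.family_name}, {p.given_name}")

# ...or use the stored display form directly in narrative prose.
print([p.display_name for p in people])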

Cardinality and Reification

It is dangerous to assume that there is always a one-to-one correspondence between an individual and a name. Indeed, for fifty percent of the population, it is likely that their name will change at least once in their lifetime. That segment, of course, is women. Until comparatively recently (the 1960s in the United States), if a woman married, she was expected to take the surname of her husband. The feminist movement started changing that, in part as a reflection of shifting expectations about property ownership, taxation, and a weakening of the ecclesiastical view of marriage and divorce. While still a fairly low percentage, more women than ever are choosing to keep their "maiden names" when they marry, or both partners (especially in same-sex relationships) are choosing to create hyphenated surnames that differ from their pre-marriage surnames.

Nonetheless, in modeling individuals, the assumption should be that surnames especially will change over time, and given names may very well change too. Once again, gender plays a role. A person may very well either physically change their sex through surgery or may at least publicly present themselves as the opposite gender, with names reflecting this event.

It's worth noting that there are always political dimensions when it comes to data modeling, and nowhere is that as intense as with identity modeling. Any modeling involves making certain assumptions, assumptions that are often informed by cultural norms and expectations. We are now entering an era where identity is fluid: it changes over time based upon gender intent, relational status, professional appellation (The Artist Formerly Known as Prince), and even social context. For instance, you are increasingly seeing gender pronoun preferences (he/him/his; she/her/hers; ze/zir/zis) in social media.

Yet at the same time this adds to the complexity of the model. From a semantics perspective, this recreates a structure that occurs whenever you have temporal evolution, what I’d call the now-then pattern.

The now part of the pattern is an assertion that, at the time the assertion is made, is true:

Her name is Jane Doe

The then part of the pattern, on the other hand, is a set of assertions that specify a range (possibly open-ended) identifying an event or state:

This is an event.
This event refers to a property called name.
The value of this property is Jane Doe.
This event began on March 16, 1993.
This event ended on June 17, 2021.
This event was reported by Kurt Cagle.

This second structure is known in semantic circles as an example of reification, meaning that the second set of assertions describes a single relationship. The this in this case is in fact the statement Her name is Jane Doe. For those familiar with SQL, reification typically corresponds to a third normal form (3NF) construction.

In more abstract terms, the initial statement can be broken down as:

r = {s->[p]->o}

where s is a reference to a subject entity, p is a reference to a relationship or property, and o is a reference to an object or value relative to that relationship. The reification is then a set of other relationships that refer to the given assertion or statement r:

r is a reification.
r has property p.
r has subject s.
r has object o.
r starts at time t1.
r optionally ends at time t2.
r was reported by m.

The reification is significant because it specifies the time to live of a given relationship between two things. Reifications can also hold other metadata (for instance, specifying a pronoun type indicating preferred gender designation). It's worth noting, however, that while you can hold a great deal of information within a reification, doing so adds significantly to the number of assertions (triples) bound to that reification.

In terms of a graph, a reification is in fact the metadata associated with the information about an edge, when given two objects. For instance, if s is an airport, o is also an airport, and p is an indication that a route exists between s and o, then r = {s->[p]->o} is in fact the route between s and o:

airport:_SEA airport:hasRoute airport:_DEN (Seattle has a route to Denver).

The route is in effect a reification (especially as routes, which are largely ephemeral and abstract entities, change far more quickly than airports do).

The route can assign a mean travel time as a property on the reification. This is, effectively, contextual information, information that belongs not to either airport but rather to the relationship that exists between the two.
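
As a hedged sketch in Python with rdflib (the IRIs and the travel-time value are invented for illustration), the standard rdf:Statement vocabulary expresses exactly this kind of route reification:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

AIRPORT = Namespace("http://example.org/airport/")  # hypothetical namespace
g = Graph()

# The base assertion: Seattle has a route to Denver.
g.add((AIRPORT["_SEA"], AIRPORT.hasRoute, AIRPORT["_DEN"]))

# The reification r carries metadata that belongs to the relationship,
# not to either airport.
r = AIRPORT["_SEA_DEN_Route"]
g.add((r, RDF.type, RDF.Statement))
g.add((r, RDF.subject, AIRPORT["_SEA"]))
g.add((r, RDF.predicate, AIRPORT.hasRoute))
g.add((r, RDF.object, AIRPORT["_DEN"]))
g.add((r, AIRPORT.meanTravelTime, Literal("PT2H35M", datatype=XSD.duration)))

print(g.serialize(format="turtle"))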

With regard to names, this introduces some interesting modeling issues. A personal name goes from being a simple label to being something with a structure, a presence, and a role or type. More on that in a bit, but before digging into the weeds, it's time to emphasize an important point here:

Reifications are almost invariably trade-offs between the need to deal with transients and the complexity of combinatorics. In the case of names, for instance, a given individual may have multiple names, though some may be birth names, some nicknames, some professional names, and some due to change in marital status or presentation status. A person may even have multiple names simultaneously. Names are, of course, not necessarily unique, but they still serve as one of the most commonly used identifiers for people, and for this reason as much as any other, this kind of reification makes sense.

Modeling Names (and a Sneak Peek of Templeton)

Given all of this, what would the best model for names look like? The now-then pattern suggests a two-pronged approach: first, model what a Personal Name should look like; then, from the set of all such names for the individual, choose the primary name, the name currently used to best represent that individual.

The following example is in what I’m calling Templeton (short for RDF Template Notation).

?PersonalName a Class:_PersonalName;
      PersonalName:hasType ?PersonalNameType;
      PersonalName:hasFullName ?fullName;
      PersonalName:hasGivenName ?givenName; #+
      PersonalName:hasSecondaryName ?secondaryName; #*
      PersonalName:hasSignatoryName ?signatoryName; #? ## Name used on a legal document
      PersonalName:hasFamilyName ?familyName; #*
      PersonalName:hasFamilySortName ?sortName; #? ## For convenient sorting
      PersonalName:hasHonorific ?honorific; #* ## Mr., Ms., Dr., etc.
      PersonalName:hasPatronymic ?patronymic; #* ## Sr, Jr, III
      PersonalName:hasDistinction ?distinction; #* ## PhD, JD
      PersonalName:hasNominativePronoun ?nominativePronoun; #? ## he, she, ze
      PersonalName:hasPossessivePronoun ?possessivePronoun; #? ## his, hers, zes
      PersonalName:hasObjectivePronoun ?objectivePronoun; #? ## him, her, zem
      PersonalName:hasStartDate ?startDate; #? xsd:date
      PersonalName:hasEndDate ?endDate; #? xsd:date
      PersonalName:hasLanguage ?language; #? ## indicates the language code of the name (en, de, es, cn, etc.)
      .

PersonalName:hasFullName a Class:_Property;
      rdfs:subPropertyOf rdfs:label
      .

?Person a Class:_Person;
      Person:hasPrimaryPersonalName ?PersonalName;
      Person:hasPersonalName ?PersonalName; #+
      Person:hasPrimaryNameString ?fullName;
      .

?PersonalNameType a Class:_PersonalNameType.

%[
PersonalNameType:_BirthName,
PersonalNameType:_AdoptedName,
PersonalNameType:_LegalChangedName,
PersonalNameType:_ProfessionalName,
PersonalNameType:_MarriedName,
PersonalNameType:_LegalAlias,
PersonalNameType:_IllegalAlias,
PersonalNameType:_NickName
]% a Class:_PersonalNameType.

First, a few words about the notation. The core of it (just as with SPARQL) is Turtle, as a way of describing assertions (triples here). Variable names (beginning with a question mark) provide a label and, in some cases (such as ?fullName), a value used in multiple assertion templates. If a line is indented (and the preceding line ends with a semicolon), then the un-indented first term remains in force. For instance,

?PersonalName a Class:_PersonalName;
      PersonalName:hasType ?PersonalNameType;
      .

is short for

?PersonalName a Class:_PersonalName.
?PersonalName PersonalName:hasType ?PersonalNameType.

The hash mark (#) is a comment, but in the template it's used to signal cardinality. Thus #* indicates that the previous assertion may be repeated zero or more times, #+ indicates a one-or-more repetition, and #? indicates an optional assertion. If a variable starts with an uppercase letter, it indicates an IRI (or reference pointer); if it starts with a lowercase letter, then the value is an atomic value, defaulting to a string. Thus,

?PersonalName personalName:hasStartDate ?startDate; #? xsd:date

indicates that ?startDate in this particular case is a date.

The notation

%[a,b,c,…]% a Class:_PersonalNameType.

indicates that each item in the list is a subject of the associated predicate and object, which is very useful for specifying type enumerations. Finally, the single a is shorthand for rdf:type.

Note: Templeton is a shorthand templating notation I’ve been developing as a way of creating schemas that can be expanded to OWL, SHACL, XML Schema, or JSON-Schema. I’m working on a parser for it now.
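
Until that parser exists, here is a minimal sketch of my own reading of the notation (not the actual Templeton implementation) showing how the %[…]% enumeration could expand into individual assertions:

def expand_enumeration(subjects, predicate, obj):
    """Expand %[s1, s2, ...]% predicate object into one triple per subject."""
    return [(s, predicate, obj) for s in subjects]

for s, p, o in expand_enumeration(
        ["PersonalNameType:_BirthName", "PersonalNameType:_MarriedName"],
        "rdf:type",  # the single 'a' shorthand
        "Class:_PersonalNameType"):
    print(f"{s} {p} {o} .")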

Of Compositions, Associations, and the Now/Then Pattern

The modeling of PersonalName should seem straightforward, with a few caveats. First, it has been my observation working with dozens of ontologies over the years that almost every time you define a class, some kind of intent indicator is needed. Such indicators do not materially change the definition of the class, but they do provide a level of context about what a particular instance is intended to do. For instance, PersonalNameType identifies whether something is a birth name, a married name, an alias, or a professional name (among others). These are differentiated from subclasses because they do not change any other properties.

The second caveat has to do with modeling. UML differentiates between a composition and an association. An association typically describes a relationship between two disparate entities, and in semantic parlance could be considered the same as a reification (or a third normal form construction in SQL). A composition, on the other hand, occurs when there is an existential dependency between the subject and object. For instance, even if you have two people who have the same personal name, these two instances are distinctive (having different start and end dates, for instance). Should a person be deleted from the database, all of the names associated with that person would also need to be deleted (which is not true for associations).

In my own modeling, compositions should always belong to the reference subject, or, put another way, the relationship points from the subject to the object semantically. Associations, on the other hand, generally are reifications – there is a reifying object, such as the route in our airport example, that binds two entities together. If you delete the reification (the route, here), you don't in this case delete the associated entities (the airports).

There are some objects that seem to skirt the boundaries. An address is a good example. If a person has an associated address, a naive modeling would make an address a composition. However, it’s not. Multiple people can live at the same address. If one person moves away, that does not cause the address itself to “disappear”. This also means that the association of a person with an address should be seen as being a reification. I use the term Habitation as the class for that reification, one that points to both a person and an address:

?Habitation a Class:_Habitation;
     Habitation:hasType ?HabitationType;
     Habitation:hasTenant ?Person;
     Habitation:hasAddress ?Address;
     Habitation:hasStartDate ?startDate;
     Habitation:hasEndDate ?endDate; #?
     .

Regardless of whether something is a composition or an association, there are times where you just want to know what a person's current primary name is, without having to build complex queries to find it. This is where inferred triples come into play. An inferred triple is typically generated either through a SPARQL UPDATE query or as part of a CONSTRUCT query (these are more or less the same, depending upon how inferred triples are persisted).

For instance, the following SPARQL update will change the primary name for a person to the specified value:

# Update Primary Name
delete {
    ?Person Person:hasPrimaryPersonalName ?oldPrimaryName;
            Person:hasPrimaryNameString ?oldFullName.
    }
insert {
    ?Person Person:hasPrimaryPersonalName ?newPrimaryName;
            Person:hasPrimaryNameString ?newFullName.
    }
where {
    values (?Person ?newPrimaryName) {(Person:_JaneDoe PersonalName:_JaneDoeBirthName)}
    ?Person Person:hasPrimaryPersonalName ?oldPrimaryName.
    ?Person Person:hasPrimaryNameString ?oldFullName.
    ?newPrimaryName PersonalName:hasFullName ?newFullName.
    }

Inferred triples are frequently transitory assertions – they reflect the default value from a set of objects, but that can change, and frequently they provide a way of short-circuiting complex queries. For instance, Person:hasPrimaryNameString is the string representation of the default personal name. This can be made even more powerful by making that particular property a subproperty of something like skos:prefLabel (assuming a basic inference engine), so that a naive query, such as:

select ?s ?name where {
    ?s skos:prefLabel ?name.
    filter (contains(?name, "Jane Doe"))
}

will return a list of all entities whose primary label contains "Jane Doe". Note that this isn't a terribly efficient query, but it can be handy nonetheless.

So when you’re thinking about the design of your models, identify those properties that you’d intuitively want to see for the classes in question that can be inferred or derived, and in effect pre-generate or update these properties as the state of the object changes so that your users don’t have to build complex queries. Remember, a triple store is an index, and such actions can be thought of as optimizing that index.

Summary

Modeling, when it comes right down to it, is the process of questioning your assumptions and optimizations. A big issue that arises with most traditional SQL systems is that many database modelers optimize away complexity by reducing the number of database tables and joins, but this also reduces the contextual metadata that is increasingly a requirement in today's data-rich world.


An Overview of Logistic Regression Analysis
An Intuitive study of Logistic Regression Analysis

Logistic regression is a statistical technique for finding the association between a categorical dependent (response) variable and one or more categorical or continuous independent (explanatory) variables.

We can define the regression model as:

G(probability of event) = β0 + β1x1 + β2x2 + … + βkxk

We determine G using a link function as follows:

Y = 1 if β0 + β1x1 + ϵ > 0
Y = 0 otherwise

There are three types of link function:

  • Logit
  • Normit (probit)
  • Gompit
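
As a quick, hedged sketch (synthetic data, not from the article), the logit and normit (probit) links can be fitted and compared with the statsmodels library in Python:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
# Latent-variable form of the model: Y = 1 when β0 + β1*x + ϵ > 0.
y = (0.5 + 1.2 * x + rng.logistic(size=500) > 0).astype(int)

X = sm.add_constant(x)
logit_fit = sm.Logit(y, X).fit(disp=False)    # logit link
probit_fit = sm.Probit(y, X).fit(disp=False)  # normit (probit) link
print(logit_fit.params)
print(probit_fit.params)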

          

Why do we use logistic regression?

We use it when there is:

  • One categorical response variable,
  • One or more explanatory variables,
  • No linear relationship between the dependent and independent variables.

Assumptions of Logistic Regression

  • The dependent variable should be categorical (binary, ordinal, nominal, or count occurrences).
  • The predictor or independent variables should be continuous or categorical.
  • The correlation among the predictors (multi-collinearity) should not be severe, but the independent variables should be linearly related to the log odds.
  • The data should be a representative sample of the population, recorded in the order it was collected.
  • The model should provide a good fit to the data.

Logistic regression vs Linear regression

  • In the case of linear regression, the outcome is continuous, while in the case of logistic regression the outcome is discrete (not continuous).
  • To perform linear regression, we require a linear relationship between the dependent and independent variables. To perform Logit, we do not.
  • Linear regression is about fitting a straight line to the data, while Logit is about fitting a curve to the data.
  • Linear regression is a regression algorithm for machine learning, while Logit is a classification algorithm for machine learning.
  • Linear regression assumes a Gaussian (or normal) distribution of the dependent variable. Logit assumes a binomial distribution of the dependent variable.

*Logit=logistic regression

Types

There are four types of logistic regression:

  • Binary logistic: When the dependent variable has two categories and the characteristics are at two levels, such as yes or no, pass or fail, high or low, the regression is called binary logistic regression.
  • Ordinal logistic: When the dependent variable has three or more categories with a natural ordering of the levels, such as survey results (disagree, neutral, agree), the regression is called ordinal logistic regression.
  • Nominal logistic: When the dependent variable has three or more categories with no natural ordering of the levels, such as colors (red, blue, green), the regression is called nominal logistic regression.
  • Poisson logistic: When the dependent variable counts the number of times an event occurs (0, 1, 2, 3, …), the regression is called Poisson logistic regression.



Defining “Value” – the Key to AI Success

I recently conducted a 3-day, remote “Data Monetization: Thinking Like a Data Scientist” workshop for a transportation agency in the Middle East.  Doing this training remotely is a personal challenge as I miss the face-to-face interaction in ideating, validating, and prioritizing the business areas that can benefit from data and analytics.  However, conducting the workshop remotely did provide some valuable learnings for me.

One learning was my “Thinking Like a Data Scientist” visual was outdated (Figure 1).

Figure 1: Original “Thinking Like a Data Scientist” (TLADS) Visual

Figure 1 portrayed the "Thinking Like a Data Scientist" (TLADS) process as a linear process, where you would complete one step and then cleanly move on to the next step.  But in reality, the process is highly iterative, where it is common for learnings from one step to impact an earlier step, such as refining the KPIs against which the targeted business initiative's progress and success will be measured.  So, I created an updated "Thinking Like a Data Scientist" visual in Figure 2 to reflect the highly iterative nature of the TLADS process.

Figure 2: Updated “Thinking Like a Data Scientist” Visual

One other learning from the workshop was the need to spend more time thoroughly understanding and defining the “value” that the business initiative sought to create.  And that’s where things get tricky.  Too many organizations limit how they define and measure “value”.  And defining a robust and diverse set of KPIs and metrics against which to measure business initiative progress and success becomes critically important as we apply AI to continuously-optimize the business initiative.

To understand the AI “value” quandary, one must first understand how an AI model works:

  1. The AI model interacts with its environment to gain feedback in order to continuously learn and adapt its performance (using backpropagation and stochastic gradient descent).
  2. The AI model's continuous learning and adaptation is guided by the AI Utility Function, the metrics and KPIs that define AI model progress and success.
  3. The AI model seeks to continuously make the "right" decisions, as framed by the AI Utility Function, as it continuously interacts with its environment.
  4. In order to create an AI model that makes the "right" decisions, the AI Utility Function must be composed of a holistic and robust definition of "value", including financial, economic, operational, customer, societal, environmental, and maybe even spiritual dimensions.

Bottom-line: the AI model determines “right and wrong” actions based upon the definition of “value” as articulated in the AI Utility Function (Figure 3).

Figure 3: AI Rational Agent Makes the “Right” Decisions based on AI Utility Function
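
As a toy sketch (the dimensions and weights are illustrative, not from the workshop), an AI Utility Function that scores candidate decisions against several value dimensions might look like:

# Hypothetical weights across value dimensions; in practice these come out of
# the diverse stakeholder ideation that the TLADS process drives.
VALUE_WEIGHTS = {
    "financial": 0.30,
    "customer": 0.25,
    "operational": 0.20,
    "environmental": 0.15,
    "societal": 0.10,
}

def utility(kpi_scores):
    """Weighted blend of normalized KPI scores (each between 0.0 and 1.0)."""
    return sum(VALUE_WEIGHTS[dim] * kpi_scores.get(dim, 0.0) for dim in VALUE_WEIGHTS)

# A decision that looks great financially but poor environmentally scores
# lower than its financial KPI alone would suggest.
print(utility({"financial": 0.9, "customer": 0.8, "operational": 0.7,
               "environmental": 0.2, "societal": 0.5}))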

Consequently, we must invest the effort across a diverse set of stakeholders to thoroughly explore and validate a diverse set of metrics and KPIs against which the AI model is seeking to optimize.  And that’s exactly one of the key objectives of the TLADS process.

If we want AI to work for us humans (versus us humans working for AI), then we must thoroughly define “value” before we start building our AI ML models. Consequently, I expanded the TLADS process to drive a more thorough exploration and definition of “value” (Figure 4).

Figure 4: How Does Your Organization Define “Value”

A more holistic suite of "value" dimensions that today's organizations need to consider is represented in red in Figure 4.  To support the ideation around the exploration of these expanded value dimensions, I updated TLADS Step #1 and Template #1 (Figure 5) to include:

  • What is the targeted Business Initiative? A clear statement about what the business initiative is trying to accomplish.
  • What are the KPIs or metrics against which business initiative progress and success will be measured? There should be at least 6 – 8 KPIs and metrics against which the organization is measuring the progress and success of the targeted business initiative.
  • What are the Ideal Outcomes from this business initiative? This is a “future visioning” exercise to envision what successful execution of the business initiative looks like.
  • What are the Benefits of the business initiative from the value perspectives of financial, customer, product, operational, environmental, and societal? There should be at least 6 to 8 benefits across the broader dimensions that define value.
  • What are the Potential Impediments to successful execution of the business initiative? There should be at least 6 to 8 potential impediments across technology, data, skills, personnel, competitive, market, and organizational factors.
  • What are the Ramifications of the Failure of this business initiative? This one is the most fun because it gives everyone a chance to envision and explore all the different ways where things can go wrong.

Note: capturing a robust set of KPIs, benefits and impediments should not be difficult if 1) you have a diverse group of stakeholders participating in the brainstorming process and 2) you ensure that everyone has an equal voice in the ideation process.

See Figure 5 for an updated Template #1 using my traditional Chipotle example, where the items in red are related to the expanded definition of “value” for Chipotle.

Figure 5:  Updated Template 1 of the “Thinking Like a Data Scientist” methodology

By the way, how your organization defines “value” probably says more about your organization than whatever your charter and mission statement states.  Or said another way:

You are what you measure, and you measure what you reward

Yea, you may say that your organization’s charter is such and such, but your organization’s charter is actually defined by the metrics and KPIs against which you measure (and reward) business success.  Period.

Value definition is critical from an AI execution perspective.  Unfortunately, in a rush to get to the fun part of the AI job and start playing with the AI algorithms, organizations sometimes shortchange the upfront work in thoroughly and holistically defining the value (metrics and KPIs) against which business initiative progress and success will be measured.

Organizations must be thoughtful and thorough in how they define the values against which the operations of the business will be measured.  Getting those "values" wrong can lead to unintended, biased, and disastrous consequences in your AI models (check out Terminators, VIKI, and ARIIA… your homework assignment for my next blog).


What Skills Does an IT Business Analyst Need?

The success of an IT project largely depends on the Business Analyst – the intermediary between IT processes and the business. Thanks to the Business Analyst, products of the required quality prosper on the market. We'll tell you what skills this specialist should have to make that happen.

The Business Analyst’s mission on a project 

The Business Analyst analyzes future products to figure out what needs to be improved so that the development is as useful to consumers and as profitable to the customer as possible. The Business Analyst performs the following tasks at different stages of the software development life cycle (SDLC):

  • studies the market and competitors to improve the product’s functionality if possible, 
  • communicates with customers to collect and document product requirements, 
  • approves the requirements with stakeholders,
  • advises the teams on the product, and more.

To summarize, the Business Analyst provides the teams with high-quality requirements, strives to avoid the development of useless features, and maximizes business value.

What skills the Business Analyst needs 

The following six skills help the Business Analyst ensure that the project is completed to a high quality, on time, and within budget.

  • Technical skills.

IT Business Analysts should:

  • understand complex technologies and terms and know how to use different tools,
  • have a good understanding of the Big Data concept and ways to obtain information to analyze a business,
  • have a grasp of software architecture, understand the basics of testing and programming, and know fundamental SQL queries.

This knowledge allows Business Analysts to elaborate development plans and strategies for improving the product at all stages of SDLC.

  • Research skills.

Every project begins with a request from a customer. The Business Analyst must research the customer's business, identify problems or opportunities, and recommend a solution. In the course of their work, Business Analysts study the market and competitors, estimate possible benefits for the business, and suggest the best way to reach the customer's goals.

  • Analytical skills.

The Business Analyst has to study lots of information: statistics, requirements, documentation, market conditions, and so on. The wealth of information that Business Analysts obtain after the research is completed needs to be analyzed. This allows them to estimate risks, forecast success, and choose the best solution for the business. 

  • Communication skills.

Since Business Analysts are intermediaries between customers and development teams, they are in constant, extensive communication with both parties.

The Business Analyst receives requirements forming the basis of a project through communication with the customer. The documentation that Business Analysts create must be clear, consistent, and without any ambiguity, as the product development is based on it.

As they know all the nuances of the project, Business Analysts also advise other employees. They receive feedback in the course of development and modify the product creation plan.

  • Leadership skills.

Business Analysts' work is tied to management because they eliminate problems such as project delays and are responsible for the project results. All SDLC participants go to the Business Analyst with development-related issues, as they are an authoritative source of knowledge. After all, it is hard to negotiate with a customer if you don't have leadership skills.

  • Negotiation skills. 

Negotiation and persuasion skills differ from the ability to simply communicate with teams. The Business Analyst interacts with managers of different levels and convinces them that some or other decision is correct. If the customer wants to have certain features in the app but the Business Analysts see a better option, they must prove their point and strike a balance between customer desires and business needs.

To sum up, we can say that competent Business Analysts balance expertise with interpersonal skills. These specialists combine technical and non-technical competence to ensure the competitive edge of products, which is needed in a world of rapid business development.


Fine-Tuning Transformer Model for Invoice Recognition
A step-by-step guide from annotation to training


Introduction

Building on my recent tutorial on how to annotate PDFs and scanned images for NLP applications, we will fine-tune Microsoft's recently released LayoutLM model on an annotated custom dataset that includes French and English invoices. While the previous tutorials focused on using the publicly available FUNSD dataset to fine-tune the model, here we will show the entire process, starting from annotation and pre-processing to training and inference.

LayoutLM Model

The LayoutLM model is based on the BERT architecture, with two additional types of input embeddings. The first is a 2-D position embedding that denotes the relative position of a token within a document, and the second is an image embedding for scanned token images within a document. The model achieved new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24), and document image classification (from 93.07 to 94.42). For more information, refer to the original article.
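
To make the 2-D position input concrete, here is a small sketch (my own illustration, not code from the paper): LayoutLM expects each token's bounding box scaled to a 0-1000 grid relative to the page dimensions:

def normalize_box(box, page_width, page_height):
    """Scale pixel coordinates (x0, y0, x1, y1) to LayoutLM's 0-1000 grid."""
    x0, y0, x1, y1 = box
    return [int(1000 * x0 / page_width),
            int(1000 * y0 / page_height),
            int(1000 * x1 / page_width),
            int(1000 * y1 / page_height)]

# A hypothetical pixel box on a 1653 x 2339 page (the page size that appears
# in the exported data below).
print(normalize_box((330, 1140, 460, 1165), 1653, 2339))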

Thankfully, the model was open-sourced and made available in the Hugging Face library. Thanks, Microsoft!

For this tutorial, we will clone the model directly from the huggingface library and fine-tune it on our own dataset. But first, we need to create the training data.

Invoice Annotation

Using the UBIAI text annotation tool, I have annotated around 50 personal invoices. I am interested in extracting both the keys and values of the entities; for example, in the text "Date: 06/12/2021", we would annotate "Date" as DATE_ID and "06/12/2021" as DATE. Extracting both the keys and values will help us correlate the numerical values with their attributes. Here are all the entities that have been annotated:

DATE_ID, DATE, INVOICE_ID, INVOICE_NUMBER,SELLER_ID, SELLER, MONTANT_HT_ID, MONTANT_HT, TVA_ID, TVA, TTC_ID, TTC

Here are a few entity definitions:

MONTANT_HT: Total price pre-tax
TTC: Total price with tax
TVA: Tax amount

Below is an example of an annotated invoice using UBIAI:


                                                             Image by author: Annotated invoice

After annotation, we export the train and test files from UBIAI directly in the correct format, without any pre-processing step. The export includes three files for each of the training and test datasets, plus one text file containing all the labels, named labels.txt:

Train/Test.txt

2018 O
Sous-total O
en O
EUR O
3,20 O
€ O
TVA S-TVA_ID
(0%) O
0,00 € S-TVA
Total B-TTC_ID
en I-TTC_ID
EUR E-TTC_ID
3,20 S-TTC
€ O
Services O
soumis O
au O
mécanisme O
d'autoliquidation O
- O

Train/Test_box.txt (contains the bounding box for each token):

€ 912 457 920 466
Services 80 486 133 495
soumis 136 487 182 495
au 185 488 200 495
mécanisme 204 486 276 495
d'autoliquidation 279 486 381 497
- 383 490 388 492

Train/Test_image.txt (contains the bounding box, document size, and name):

€ 912 425 920 434 1653 2339 image1.jpg
TVA 500 441 526 449 1653 2339 image1.jpg
(0%) 529 441 557 451 1653 2339 image1.jpg
0,00 € 882 441 920 451 1653 2339 image1.jpg
Total 500 457 531 466 1653 2339 image1.jpg
en 534 459 549 466 1653 2339 image1.jpg
EUR 553 457 578 466 1653 2339 image1.jpg
3,20 882 457 911 467 1653 2339 image1.jpg
€ 912 457 920 466 1653 2339 image1.jpg
Services 80 486 133 495 1653 2339 image1.jpg
soumis 136 487 182 495 1653 2339 image1.jpg
au 185 488 200 495 1653 2339 image1.jpg
mécanisme 204 486 276 495 1653 2339 image1.jpg
d'autoliquidation 279 486 381 497 1653 2339 image1.jpg
- 383 490 388 492 1653 2339 image1.jpg

labels.txt:

B-DATE_ID
B-INVOICE_ID
B-INVOICE_NUMBER
B-MONTANT_HT
B-MONTANT_HT_ID
B-SELLER
B-TTC
B-DATE
B-TTC_ID
B-TVA
B-TVA_ID
E-DATE_ID
E-DATE
E-INVOICE_ID
E-INVOICE_NUMBER
E-MONTANT_HT
E-MONTANT_HT_ID
E-SELLER
E-TTC
E-TTC_ID
E-TVA
E-TVA_ID
I-DATE_ID
I-DATE
I-SELLER
I-INVOICE_ID
I-MONTANT_HT_ID
I-TTC
I-TTC_ID
I-TVA_ID
O
S-DATE_ID
S-DATE
S-INVOICE_ID
S-INVOICE_NUMBER
S-MONTANT_HT_ID
S-MONTANT_HT
S-SELLER
S-TTC
S-TTC_ID
S-TVA
S-TVA_ID

Fine-Tuning LayoutLM Model:

Here, we use Google Colab with a GPU to fine-tune the model. The code below is based on the original LayoutLM paper and this tutorial.

First, install the LayoutLM package…

! rm -r unilm
! git clone -b remove_torch_save https://github.com/NielsRogge/unilm.git
! cd unilm/layoutlm
! pip install unilm/layoutlm

…as well as the transformers package, from which the model will be downloaded:

! rm -r transformers
! git clone https://github.com/huggingface/transformers.git
! cd transformers
! pip install ./transformers

Next, create a list containing the unique labels from labels.txt:

from torch.nn import CrossEntropyLoss

def get_labels(path):
    with open(path, "r") as f:
        labels = f.read().splitlines()
    if "O" not in labels:
        labels = ["O"] + labels
    return labels

labels = get_labels("./labels.txt")
num_labels = len(labels)
label_map = {i: label for i, label in enumerate(labels)}
pad_token_label_id = CrossEntropyLoss().ignore_index

Then, create a PyTorch dataset and dataloader:

from transformers import LayoutLMTokenizer
from layoutlm.data.funsd import FunsdDataset, InputFeatures
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

args = {'local_rank': -1,
        'overwrite_cache': True,
        'data_dir': '/content/data',
        'model_name_or_path': 'microsoft/layoutlm-base-uncased',
        'max_seq_length': 512,
        'model_type': 'layoutlm'}

# class to turn the keys of a dict into attributes
class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

args = AttrDict(args)

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")

# the LayoutLM authors already defined a specific FunsdDataset, so we are going to use this here
train_dataset = FunsdDataset(args, tokenizer, labels, pad_token_label_id, mode="train")
train_sampler = RandomSampler(train_dataset)
train_dataloader = DataLoader(train_dataset,
                              sampler=train_sampler,
                              batch_size=2)

eval_dataset = FunsdDataset(args, tokenizer, labels, pad_token_label_id, mode="test")
eval_sampler = SequentialSampler(eval_dataset)
eval_dataloader = DataLoader(eval_dataset,
                             sampler=eval_sampler,
                             batch_size=2)

batch = next(iter(train_dataloader))
input_ids = batch[0][0]
tokenizer.decode(input_ids)

Load the model from Hugging Face. This will be fine-tuned on the dataset.

from transformers import LayoutLMForTokenClassification
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LayoutLMForTokenClassification.from_pretrained("microsoft/layoutlm-base-uncased", num_labels=num_labels)
model.to(device)

Finally, start the training:

from transformers import AdamW
from tqdm import tqdm

optimizer = AdamW(model.parameters(), lr=5e-5)
global_step = 0
num_train_epochs = 50
t_total = len(train_dataloader) * num_train_epochs  # total number of training steps

# put the model in training mode
model.train()
for epoch in range(num_train_epochs):
    for batch in tqdm(train_dataloader, desc="Training"):
        input_ids = batch[0].to(device)
        bbox = batch[4].to(device)
        attention_mask = batch[1].to(device)
        token_type_ids = batch[2].to(device)
        labels = batch[3].to(device)
        # forward pass
        outputs = model(input_ids=input_ids, bbox=bbox, attention_mask=attention_mask,
                        token_type_ids=token_type_ids, labels=labels)
        loss = outputs.loss
        # print loss every 100 steps
        if global_step % 100 == 0:
            print(f"Loss after {global_step} steps: {loss.item()}")
        # backward pass to get the gradients
        loss.backward()
        # update
        optimizer.step()
        optimizer.zero_grad()
        global_step += 1

You should be able to see the training progress and the loss getting updated.


                                                 Image by author: Layout LM training in progress

After training, evaluate the model performance with the following function:


import numpy as np
from seqeval.metrics import (
    classification_report,
    f1_score,
    precision_score,
    recall_score,
)

eval_loss = 0.0
nb_eval_steps = 0
preds = None
out_label_ids = None

# put model in evaluation mode
model.eval()
for batch in tqdm(eval_dataloader, desc="Evaluating"):
    with torch.no_grad():
        input_ids = batch[0].to(device)
        bbox = batch[4].to(device)
        attention_mask = batch[1].to(device)
        token_type_ids = batch[2].to(device)
        labels = batch[3].to(device)
        # forward pass
        outputs = model(input_ids=input_ids, bbox=bbox, attention_mask=attention_mask,
                        token_type_ids=token_type_ids, labels=labels)
        # get the loss and logits
        tmp_eval_loss = outputs.loss
        logits = outputs.logits
        eval_loss += tmp_eval_loss.item()
        nb_eval_steps += 1
        # compute the predictions
        if preds is None:
            preds = logits.detach().cpu().numpy()
            out_label_ids = labels.detach().cpu().numpy()
        else:
            preds = np.append(preds, logits.detach().cpu().numpy(), axis=0)
            out_label_ids = np.append(out_label_ids, labels.detach().cpu().numpy(), axis=0)

# compute average evaluation loss
eval_loss = eval_loss / nb_eval_steps
preds = np.argmax(preds, axis=2)
out_label_list = [[] for _ in range(out_label_ids.shape[0])]
preds_list = [[] for _ in range(out_label_ids.shape[0])]
for i in range(out_label_ids.shape[0]):
    for j in range(out_label_ids.shape[1]):
        if out_label_ids[i, j] != pad_token_label_id:
            out_label_list[i].append(label_map[out_label_ids[i][j]])
            preds_list[i].append(label_map[preds[i][j]])

results = {
    "loss": eval_loss,
    "precision": precision_score(out_label_list, preds_list),
    "recall": recall_score(out_label_list, preds_list),
    "f1": f1_score(out_label_list, preds_list),
}

With only 50 documents, we get the following scores:


                                         Image by author: Evaluation score after training

With more annotations, we should certainly get higher scores.

Finally, save the model for future prediction:

PATH='./drive/MyDrive/trained_layoutlm/layoutlm_UBIAI.pt'
torch.save(model.state_dict(), PATH)

Inference:

Now comes the fun part: let's upload an invoice, OCR it, and extract the relevant entities. For this test, we are using an invoice that was not in the training or test dataset. To parse the text from the invoice, we use the open-source Tesseract package. Let's install the package:

!sudo apt install tesseract-ocr
!pip install pytesseract

Before running predictions, we need to parse the text from the image and pre-process the tokens and bounding boxes into features. To do so, I have created a Python preprocessing file, layoutlm_preprocess.py, that makes it easier to preprocess the image:

import sys
sys.path.insert(1, './drive/MyDrive/UBIAI_layoutlm')
from layoutlm_preprocess import *
image_path='./content/invoice_test.jpg'
image, words, boxes, actual_boxes = preprocess(image_path)

Next, load the model and get word predictions with their bounding boxes:

model_path='./drive/MyDrive/trained_layoutlm/layoutlm_UBIAI.pt'
model=model_load(model_path,num_labels)
word_level_predictions, final_boxes=convert_to_features(image, words, boxes, actual_boxes, model)

Finally, display the image with the predicted entities and bounding boxes:

from PIL import ImageDraw, ImageFont

draw = ImageDraw.Draw(image)
font = ImageFont.load_default()

def iob_to_label(label):
    if label != 'O':
        return label[2:]
    else:
        return ""

label2color = {'data_id': 'green', 'date': 'green', 'invoice_id': 'blue', 'invoice_number': 'blue',
               'montant_ht_id': 'black', 'montant_ht': 'black', 'seller_id': 'red', 'seller': 'red',
               'ttc_id': 'grey', 'ttc': 'grey', '': 'violet', 'tva_id': 'orange', 'tva': 'orange'}

for prediction, box in zip(word_level_predictions, final_boxes):
    predicted_label = iob_to_label(label_map[prediction]).lower()
    draw.rectangle(box, outline=label2color[predicted_label])
    draw.text((box[0] + 10, box[1] - 10), text=predicted_label, fill=label2color[predicted_label], font=font)

image

Et voila:


                                               Image by author: Predictions on a test invoice

While the model made a few mistakes, such as assigning the TTC label to a purchased item or not identifying some IDs, it was able to extract the seller, invoice number, date, and TTC correctly. The results are impressive and very promising given the low number of annotated documents (only 50)! With more annotated invoices, we will be able to reach higher F-scores and more accurate predictions.

Conclusion:

Overall, the results from the LayoutLM model are very promising and demonstrate the usefulness of transformers in analyzing semi-structured text. The model can be fine-tuned on any other semi-structured documents, such as driver's licenses, contracts, government documents, financial documents, etc.

If you have any questions, don't hesitate to ask below or send us an email at [email protected]

If you liked this article, please like and share!


A Must-Have Tool to Analyse the Latest AI Research Papers Fast

Each entry below lists the paper link, title, a short description, and the full abstract.

https://papers.labml.ai/paper/2105.04026

The Modern Mathematics of Deep Learning

We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. We present an overview of modern approaches that yield partial answers to these questions.

We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly successful optimization performance despite the non-convexity of the problem, understanding what features are learned, why deep architectures perform exceptionally well in physical problems, and which fine aspects of an architecture affect the behavior of a learning task in which way. We present an overview of modern approaches that yield partial answers to these questions. For selected approaches, we describe the main ideas in more detail.

https://papers.labml.ai/paper/2106.04554

A Survey of Transformers

Transformers have achieved great success in many artificial intelligence fields. Up to the present, a great variety of Transformer variants (a.k.a. X-formers) have been proposed. A systematic literature review on these Transformer variants is still missing.

Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing. Therefore, it is natural to attract lots of interest from academic and industry researchers. Up to the present, a great variety of Transformer variants (a.k.a. X-formers) have been proposed, however, a systematic and comprehensive literature review on these Transformer variants is still missing. In this survey, we provide a comprehensive review of various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers. Next, we introduce the various X-formers from three perspectives: architectural modification, pre-training, and applications. Finally, we outline some potential directions for future research.

https://papers.labml.ai/paper/2106.06561

GANs N’ Roses: Stable, Controllable, Diverse Image to Image Translation (works for videos too!)

A map that takes a content code, derived from a face, and a randomly chosen style code to an anime image. The map is not just diverse, but also correctly represents the probability of an anime, conditioned on an input face.

We show how to learn a map that takes a content code, derived from a face image, and a randomly chosen style code to an anime image. We derive an adversarial loss from our simple and effective definitions of style and content. This adversarial loss guarantees the map is diverse — a very wide range of anime can be produced from a single content code. Under plausible assumptions, the map is not just diverse, but also correctly represents the probability of an anime, conditioned on an input face. In contrast, current multimodal generation procedures cannot capture the complex styles that appear in anime. Extensive quantitative experiments support the idea the map is correct. Extensive qualitative results show that the method can generate a much more diverse range of styles than SOTA comparisons. Finally, we show that our formalization of content and style allows us to perform video to video translation without ever training on videos.

https://papers.labml.ai/paper/2106.03253

Tabular Data: Deep Learning is Not All You Need

Several deep learning models for tabular data have been proposed. They claim to outperform XGBoost for some use-cases. We show that an ensemble of the deep models and XGBoost performs better on these datasets.

A key element of AutoML systems is setting the types of models that will be used for each type of task. For classification and regression problems with tabular data, the use of tree ensemble models (like XGBoost) is usually recommended. However, several deep learning models for tabular data have recently been proposed, claiming to outperform XGBoost for some use-cases. In this paper, we explore whether these deep models should be a recommended option for tabular data, by rigorously comparing the new deep models to XGBoost on a variety of datasets. In addition to systematically comparing their accuracy, we consider the tuning and computation they require. Our study shows that XGBoost outperforms these deep models across the datasets, including datasets used in the papers that proposed the deep models. We also demonstrate that XGBoost requires much less tuning. On the positive side, we show that an ensemble of the deep models and XGBoost performs better on these datasets than XGBoost alone.

https://papers.labml.ai/paper/2009.05673

Applications of Deep Neural Networks

Deep learning is a group of exciting new technologies for neural networks. It is now possible to create neural networks that can handle tabular data, images, text, and audio as both input and output. Readers will use the Python programming language to implement deep learning using Google TensorFlow and Keras.

Deep learning is a group of exciting new technologies for neural networks. Through a combination of advanced training techniques and neural network architectural components, it is now possible to create neural networks that can handle tabular data, images, text, and audio as both input and output. Deep learning allows a neural network to learn hierarchies of information in a way that is like the function of the human brain. This course will introduce the student to classic neural network structures, Convolution Neural Networks (CNN), Long Short-Term Memory (LSTM), Gated Recurrent Neural Networks (GRU), General Adversarial Networks (GAN), and reinforcement learning. Application of these architectures to computer vision, time series, security, natural language processing (NLP), and data generation will be covered. High-Performance Computing (HPC) aspects will demonstrate how deep learning can be leveraged both on graphical processing units (GPUs), as well as grids. Focus is primarily upon the application of deep learning to problems, with some introduction to mathematical foundations. Readers will use the Python programming language to implement deep learning using Google TensorFlow and Keras. It is not necessary to know Python prior to this book; however, familiarity with at least one programming language is assumed.

https://papers.labml.ai/paper/2104.13478

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

Geometric deep learning seeks to unify the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers, through common geometric principles, and to provide a principled way to build future architectures.

The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach — such as computer vision, playing Go, or protein folding — are in fact feasible with appropriate computational scale. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapted, often hierarchical, features capture the appropriate notion of regularity for each task, and second, learning by local gradient-descent type methods, typically implemented as backpropagation. While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not generic, and come with essential pre-defined regularities arising from the underlying low-dimensionality and structure of the physical world. This text is concerned with exposing these regularities through unified geometric principles that can be applied throughout a wide spectrum of applications. Such a ‘geometric unification’ endeavour, in the spirit of Felix Klein’s Erlangen Program, serves a dual purpose: on one hand, it provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers. On the other hand, it gives a constructive procedure to incorporate prior physical knowledge into neural architectures and provide principled way to build future architectures yet to be invented.

https://papers.labml.ai/paper/2106.08962

Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval and more. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support.

Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval and more. However, with the progressive improvements in deep learning models, their number of parameters, latency, resources required to train, etc. have all increased significantly. Consequently, it has become important to pay attention to these footprint metrics of a model as well, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work there. We also present an experiment-based guide along with code, for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. Our hope is that this survey would provide the reader with the mental model and the necessary understanding of the field to apply generic efficiency techniques to immediately get significant improvements, and also equip them with ideas for further research and experimentation to achieve additional gains.
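
To make one of the surveyed modeling techniques concrete, here is a minimal sketch of magnitude pruning, a common compression method the efficiency literature covers; the function is illustrative and is not taken from the survey's companion code.

    import numpy as np

    def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
        """Zero out the `sparsity` fraction of smallest-magnitude weights."""
        k = int(weights.size * sparsity)
        if k == 0:
            return weights.copy()
        # The k-th smallest absolute value becomes the pruning threshold.
        threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
        pruned = weights.copy()
        pruned[np.abs(weights) <= threshold] = 0.0
        return pruned

    w = np.random.randn(4, 4)
    print(magnitude_prune(w, 0.5))  # roughly half the entries are now zero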

https://papers.labml.ai/paper/2007.01547

Descending through a Crowded Valley — Benchmarking Deep Learning Optimizers

Analyze over 50,000 runs with different optimizers. Optimizer performance varies across tasks. Adam remains strong.

Choosing the optimizer is considered to be among the most crucial design decisions in deep learning, and it is not an easy one. The growing literature now lists hundreds of optimization methods. In the absence of clear theoretical guidance and conclusive empirical evidence, the decision is often made based on anecdotes. In this work, we aim to replace these anecdotes, if not with a conclusive ranking, then at least with evidence-backed heuristics. To do so, we perform an extensive, standardized benchmark of fifteen particularly popular deep learning optimizers while giving a concise overview of the wide range of possible choices. Analyzing more than 50,000 individual runs, we contribute the following three points: (i) Optimizer performance varies greatly across tasks. (ii) We observe that evaluating multiple optimizers with default parameters works approximately as well as tuning the hyperparameters of a single, fixed optimizer. (iii) While we cannot discern an optimization method clearly dominating across all tested tasks, we identify a significantly reduced subset of specific optimizers and parameter choices that generally lead to competitive results in our experiments: Adam remains a strong contender, with newer methods failing to significantly and consistently outperform it. Our open-sourced results are available as challenging and well-tuned baselines for more meaningful evaluations of novel optimization methods without requiring any further computational efforts.
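
Finding (ii) suggests a cheap practical recipe: instead of tuning one optimizer, try several at their defaults and keep the best. A minimal Keras sketch of that recipe, assuming a user-supplied build_model factory; the optimizer names and epoch count are illustrative choices, not the paper's protocol.

    from tensorflow import keras

    def best_default_optimizer(build_model, x, y,
                               names=("adam", "sgd", "rmsprop", "adagrad")):
        """Fit one freshly built model per optimizer, all at default
        settings, and return the name with the lowest validation loss."""
        scores = {}
        for name in names:
            model = build_model()
            model.compile(optimizer=name, loss="mse")
            history = model.fit(x, y, epochs=5, validation_split=0.2, verbose=0)
            scores[name] = min(history.history["val_loss"])
        return min(scores, key=scores.get), scores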

https://papers.labml.ai/paper/2106.14843

CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders

CLIPDraw synthesizes novel drawings based on natural language input. The algorithm does not require any training. It operates over vector strokes rather than pixel images.

This work presents CLIPDraw, an algorithm that synthesizes novel drawings based on natural language input. CLIPDraw does not require any training; rather a pre-trained CLIP language-image encoder is used as a metric for maximizing similarity between the given description and a generated drawing. Crucially, CLIPDraw operates over vector strokes rather than pixel images, a constraint that biases drawings towards simpler human-recognizable shapes. Results compare between CLIPDraw and other synthesis-through-optimization methods, as well as highlight various interesting behaviors of CLIPDraw, such as satisfying ambiguous text in multiple ways, reliably producing drawings in diverse artistic styles, and scaling from simple to complex visual representations as stroke count is increased. Code for experimenting with the method is available at: https://colab.research.google.com/github/kvfrans/clipdraw/blob/main…
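
The synthesis-through-optimization loop is easy to sketch. The version below is conceptual: render_strokes (a differentiable rasterizer such as diffvg) and clip_similarity (the CLIP image-text score) are hypothetical stand-ins, not the paper's actual function names, and the parameter counts are arbitrary.

    import torch

    def clipdraw_loop(text, render_strokes, clip_similarity,
                      n_strokes=64, steps=250, lr=0.1):
        # Random initial stroke parameters (control points, widths, colors);
        # 10 parameters per stroke is an illustrative choice.
        params = torch.randn(n_strokes, 10, requires_grad=True)
        opt = torch.optim.Adam([params], lr=lr)
        for _ in range(steps):
            image = render_strokes(params)         # differentiable rasterization
            loss = -clip_similarity(image, text)   # ascend CLIP similarity
            opt.zero_grad()
            loss.backward()
            opt.step()
        return params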

https://papers.labml.ai/paper/2106.06981

Thinking Like Transformers

Where recurrent neural networks have direct parallels in finite state machines, Transformers have no such familiar parallel. We propose a computational model for the transformer-encoder in the form of a programming language, RASP, and show how it can be used to program solutions to tasks that could conceivably be learned by a Transformer.

What is the computational model behind a Transformer? Where recurrent neural networks have direct parallels in finite state machines, allowing clear discussion and thought around architecture variants or trained models, Transformers have no such familiar parallel. In this paper we aim to change that, proposing a computational model for the transformer-encoder in the form of a programming language. We map the basic components of a transformer-encoder — attention and feed-forward computation — into simple primitives, around which we form a programming language: the Restricted Access Sequence Processing Language (RASP). We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer, and how a Transformer can be trained to mimic a RASP solution. In particular, we provide RASP programs for histograms, sorting, and Dyck-languages. We further use our model to relate their difficulty in terms of the number of required layers and attention heads: analyzing a RASP program implies a maximum number of heads and layers necessary to encode a task in a transformer. Finally, we see how insights gained from our abstraction might be used to explain phenomena seen in recent works.
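
To give a flavor of the two primitives, here is a toy Python rendering of select/aggregate and the histogram task; this is an informal imitation, not official RASP syntax.

    def select(keys, queries, predicate):
        """Attention-like boolean selection matrix, one row per query."""
        return [[predicate(k, q) for k in keys] for q in queries]

    def aggregate(selector, values):
        """Average the values each row selects (0 if nothing is selected)."""
        return [
            sum(v for v, s in zip(values, row) if s) / max(sum(row), 1)
            for row in selector
        ]

    tokens = list("hello")
    same = select(tokens, tokens, lambda k, q: k == q)
    hist = [sum(row) for row in same]       # histogram: [1, 1, 2, 2, 1]
    positions = list(range(len(tokens)))
    mean_pos = aggregate(same, positions)   # mean position of each token's matches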

https://papers.labml.ai/paper/2106.02584

Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

We challenge a common assumption underlying most supervised deep learning. Our approach uses self-attention to reason about relationships between datapoints explicitly. Empirically, our models solve cross-datapoint lookup and complex reasoning tasks unsolvable by traditional deep learning models.

We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time. Our approach uses self-attention to reason about relationships between datapoints explicitly, which can be seen as realizing non-parametric models using parametric attention mechanisms. However, unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction. Empirically, our models solve cross-datapoint lookup and complex reasoning tasks unsolvable by traditional deep learning models. We show highly competitive results on tabular data, early results on CIFAR-10, and give insight into how the model makes use of the interactions between points.
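
Mechanically, the core idea can be imitated in a few lines: treat the whole dataset as a single sequence so that attention runs between rows rather than within one input. The shapes below are illustrative, and this toy is far simpler than the paper's actual architecture.

    import torch
    import torch.nn as nn

    attn = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
    dataset = torch.randn(1, 100, 16)   # one "batch" = an entire 100-row dataset
    out, weights = attn(dataset, dataset, dataset)
    print(out.shape, weights.shape)     # (1, 100, 16) and (1, 100, 100):
                                        # each row attends to every other row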

https://papers.labml.ai/paper/2106.10207

Distributed Deep Learning in Open Collaborations

Large corporations and institutions use dedicated High-Performance Computing clusters. Grid- or volunteer computing has seen successful applications in scientific areas. Using this approach for machine learning is difficult due to high latency, asymmetric bandwidth, and other challenges.

Modern deep learning applications require increasingly more compute to train state-of-the-art models. To address this demand, large corporations and institutions use dedicated High-Performance Computing clusters, whose construction and maintenance are both environmentally costly and well beyond the budget of most organizations. As a result, some research directions become the exclusive domain of a few large industrial and even fewer academic actors. To alleviate this disparity, smaller groups may pool their computational resources and run collaborative experiments that benefit all participants. This paradigm, known as grid- or volunteer computing, has seen successful applications in numerous scientific areas. However, using this approach for machine learning is difficult due to high latency, asymmetric bandwidth, and several challenges unique to volunteer computing. In this work, we carefully analyze these constraints and propose a novel algorithmic framework designed specifically for collaborative training. We demonstrate the effectiveness of our approach for SwAV and ALBERT pretraining in realistic conditions and achieve performance comparable to traditional setups at a fraction of the cost. Finally, we provide a detailed report of successful collaborative language model pretraining with 40 participants.

https://papers.labml.ai/paper/2106.12627

Provably efficient machine learning for quantum many-body problems

Classical machine learning (ML) provides a potentially powerful approach to solving challenging quantum many-body problems. We prove that classical ML algorithms can efficiently predict ground state properties of gapped Hamiltonians in finite spatial dimensions.

Classical machine learning (ML) provides a potentially powerful approach to solving challenging quantum many-body problems in physics and chemistry. However, the advantages of ML over more traditional methods have not been firmly established. In this work, we prove that classical ML algorithms can efficiently predict ground state properties of gapped Hamiltonians in finite spatial dimensions, after learning from data obtained by measuring other Hamiltonians in the same quantum phase of matter. In contrast, under widely accepted complexity theory assumptions, classical algorithms that do not learn from data cannot achieve the same guarantee. We also prove that classical ML algorithms can efficiently classify a wide range of quantum phases of matter. Our arguments are based on the concept of a classical shadow, a succinct classical description of a many-body quantum state that can be constructed in feasible quantum experiments and be used to predict many properties of the state. Extensive numerical experiments corroborate our theoretical results in a variety of scenarios, including Rydberg atom systems, 2D random Heisenberg models, symmetry-protected topological phases, and topologically ordered phases.

https://papers.labml.ai/paper/2106.10745

Calliar: An Online Handwritten Dataset for Arabic Calligraphy

Calligraphy is an essential part of the Arabic heritage and culture. It has been used in the past for the decoration of houses and mosques. In the past few years, there has been a considerable effort to digitize this type of art.

Calligraphy is an essential part of the Arabic heritage and culture. It has been used in the past for the decoration of houses and mosques. Usually, such calligraphy is designed manually by experts with aesthetic insights. In the past few years, there has been a considerable effort to digitize this type of art by either taking a photo of decorated buildings or drawing them using digital devices. The latter is considered an online form, where the drawing is tracked by recording the movement of the apparatus, an electronic pen for instance, on a screen. In the literature, there are many offline datasets collected with a diversity of Arabic styles for calligraphy. However, there is no available online dataset for Arabic calligraphy. In this paper, we illustrate our approach for the collection and annotation of an online dataset for Arabic calligraphy called Calliar that consists of 2,500 sentences. Calliar is annotated for stroke-, character-, word- and sentence-level prediction.

https://papers.labml.ai/paper/2106.11189

Regularization is all you Need: Simple Neural Nets can Excel on Tabular Data

Tabular datasets are the last “unconquered castle” for deep learning. Traditional ML methods like Gradient-Boosted Decision Trees still perform strongly against specialized neural architectures. We propose regularizing plain MLPs by searching for the optimal combination of 13 regularization techniques for each dataset.

Tabular datasets are the last “unconquered castle” for deep learning, with traditional ML methods like Gradient-Boosted Decision Trees still performing strongly even against recent specialized neural architectures. In this paper, we hypothesize that the key to boosting the performance of neural networks lies in rethinking the joint and simultaneous application of a large set of modern regularization techniques. As a result, we propose regularizing plain Multilayer Perceptron (MLP) networks by searching for the optimal combination/cocktail of 13 regularization techniques for each dataset using a joint optimization over the decision on which regularizers to apply and their subsidiary hyperparameters. We empirically assess the impact of these regularization cocktails for MLPs on a large-scale empirical study comprising 40 tabular datasets and demonstrate that (i) well-regularized plain MLPs significantly outperform recent state-of-the-art specialized neural network architectures, and (ii) they even outperform strong traditional ML methods, such as XGBoost.
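
A minimal sketch of the search idea, assuming Keras: each trial randomly switches regularizers on or off and draws their hyperparameters. Three regularizers stand in for the paper's thirteen, and all ranges, widths, and depths here are illustrative.

    import random
    from tensorflow import keras

    def sample_cocktail():
        """One random draw over which regularizers to apply and how strongly."""
        return {
            "l2": 10 ** random.uniform(-6, -2) if random.random() < 0.5 else 0.0,
            "dropout": random.uniform(0.0, 0.5) if random.random() < 0.5 else 0.0,
            "smoothing": random.uniform(0.0, 0.2) if random.random() < 0.5 else 0.0,
        }

    def build_mlp(n_features, n_classes, c):
        reg = keras.regularizers.l2(c["l2"]) if c["l2"] else None
        model = keras.Sequential([keras.Input(shape=(n_features,))])
        for _ in range(2):
            model.add(keras.layers.Dense(64, activation="relu",
                                         kernel_regularizer=reg))
            if c["dropout"]:
                model.add(keras.layers.Dropout(c["dropout"]))
        model.add(keras.layers.Dense(n_classes, activation="softmax"))
        model.compile(
            optimizer="adam",
            loss=keras.losses.CategoricalCrossentropy(
                label_smoothing=c["smoothing"]),
        )
        return model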

https://papers.labml.ai/paper/2106.11959

Revisiting Deep Learning Models for Tabular Data

The choice between GBDT and DL models highly depends on data and there is still no universally superior solution. We demonstrate that a simple ResNet-like architecture is a surprisingly effective baseline, which outperforms most of the sophisticated models.

The necessity of deep learning for tabular data is still an unanswered question addressed by a large number of research efforts. The recent literature on tabular DL proposes several deep architectures reported to be superior to traditional “shallow” models like Gradient Boosted Decision Trees. However, since existing works often use different benchmarks and tuning protocols, it is unclear if the proposed models universally outperform GBDT. Moreover, the models are often not compared to each other, therefore, it is challenging to identify the best deep model for practitioners. In this work, we start from a thorough review of the main families of DL models recently developed for tabular data. We carefully tune and evaluate them on a wide range of datasets and reveal two significant findings. First, we show that the choice between GBDT and DL models highly depends on data and there is still no universally superior solution. Second, we demonstrate that a simple ResNet-like architecture is a surprisingly effective baseline, which outperforms most of the sophisticated models from the DL literature. Finally, we design a simple adaptation of the Transformer architecture for tabular data that becomes a new strong DL baseline and reduces the gap between GBDT and DL models on datasets where GBDT dominates.
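
For reference, a "ResNet-like" baseline for tabular data is just dense blocks with skip connections. Here is a minimal Keras sketch; the widths, depth, and dropout rate are illustrative guesses, not the paper's exact configuration.

    from tensorflow import keras

    def resnet_block(x, width=128, dropout=0.1):
        h = keras.layers.BatchNormalization()(x)
        h = keras.layers.Dense(width, activation="relu")(h)
        h = keras.layers.Dropout(dropout)(h)
        h = keras.layers.Dense(x.shape[-1])(h)   # project back to input width
        return keras.layers.Add()([x, h])        # skip connection

    inp = keras.Input(shape=(32,))               # 32 tabular features
    x = keras.layers.Dense(64)(inp)
    for _ in range(3):
        x = resnet_block(x)
    out = keras.layers.Dense(1)(x)               # regression head
    model = keras.Model(inp, out)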

Augmented Reality Trends: Check to Make a Smart Choice for Your Business!

Technology trends help businesses stand out from the competition while serving customer requirements well. Augmented reality has passed its initial prototyping phase, and it is now time to put it to practical use. With a projected 1 billion users worldwide putting augmented reality to daily use by 2020, new applications and richer user experiences are all but guaranteed as the technology matures.

Augmented reality has been in the market for years, and its current market value has reached 3.5 billion dollars. The journey started in 1965 with the first head-mounted display systems, and the technology is still evolving alongside the industries that use it, gradually gaining a foothold in every significant industry you can name. According to Gearbrain's research, nearly half of U.S. citizens use augmented reality without realizing that they are using it.

Moreover, building an AR (Augmented Reality) project with experts significantly reduces development time and yields better output. The trends below show how augmented reality is driving change across industries.

Teaching and Training Exercises

The education sector has been benefitting from augmented reality, which allows knowledge to be transferred in real time. As a result, 70% of learners believe that augmented reality can help them learn and develop new personal and professional skills more easily. Looking at the programs augmented reality supports, this rings true: AR has introduced many turnkey programs and solutions for the education industry that emphasize developing skills practically rather than delivering theory alone.

Augmented reality is often deployed in combination with other technology trends that enhance the user experience and improve usability. In many industries, augmented reality and artificial intelligence work in parallel to deliver excellent results.

Automobile Industry

Self-driving cars are still in their initial phase. However, certain areas of the automobile industry combine AR (augmented reality) and AI (artificial intelligence) and put them to extreme tests, where they still deliver strong results. For example, augmented reality is currently being used experimentally to provide complete camera footage of the car's outer surroundings, reducing accident rates and improving safety.

There are also various ways augmented reality can work well when paired with artificial intelligence. The experiments involving augmented reality focus primarily on better navigation, improved driver safety, greater passenger convenience, and more. The strategies include scenarios where drivers can perform multiple tasks while keeping their focus on the road, and where the safety of the vehicle is ensured while parking or driving through tight spots.

AR with VR: Extended Usability Explained

Virtual reality (VR) and augmented reality have always been discussed together, and when implemented together the two deliver exceptional results. While augmented reality connects people, virtual reality bonds them into a shared visual social space. The biggest example of this is conference calls, where people can see and interact with each other.

Sales of VR and AR headsets have been booming, promising a real-world experience that connects people across great distances, with AR headset sales projected to reach 22.8 million units by 2022. Mobile apps incorporating AR and VR concepts are being developed to guarantee users an excellent smartphone experience.

Augmented Reality with Mobile Apps

Mobile applications are the current trend of the software industry. The number of smartphone users has climbed to 5.11 billion, and mobile applications are developed to offer an excellent user experience. Augmented reality raises that experience to a new level. Many businesses choose to create an app that delivers such an experience, but most augmented reality apps to date are games.

Pokemon Go played a major role in introducing mass audiences to augmented reality, and its popularity grew by leaps and bounds as many other gaming apps followed the same path. However, the technology can deliver more than gaming: augmented reality can be applied in many other categories, where it promises excellent results and can open up untouched markets.

Artificial Intelligence in Music Sector

Artificial Intelligence (AI) has become broadly popular in the music industry in recent years, driven primarily by the evolution of the streaming sector and by core music streaming app development. Most artists and streaming companies are investing in streaming apps like Pandora, Spotify, and many others. AI helps them analyze listeners' preferences and tailor their work accordingly; an AI-based recommendation engine can study a listener's existing history and recommend new songs as well.

Event Management by AI-based Tools

AI-based tools save both money and time, so integrating AI technology can help event planners manage everything efficiently. An AI-based open-source PHP ticket system, for instance, can help event planners plan and manage their next live event systematically and without hassle. Most event planners and organizers are already using AI technology to streamline and enhance their event management processes. AI-based tools and apps help event planners to:

  • Sort vast amounts of data in no time
  • Discover an excellent place for venues
  • Locate perfect vendor as per their event needs
  • Develop efficiencies with quicker decision-making

Augmented Reality for Advertising

Advertising and digital marketing approaches are evolving with the changing demands of clients and the market. While augmented reality connects people through natural, lifelike experiences, advertising uses it to make emotional connections with audiences. By providing a true-to-life experience, customers connect more easily with the brand, and the approach is also cost-effective. By increasing sales and reducing overall spend, augmented reality is updating digital marketing approaches and delivering excellent results.

Though augmented reality performs well in every field and sector where it is put to use, there are still steps to take to improve efficiency and increase productivity. The tools and gear used for AR have not yet reached the stage where manufacturers and scientists can claim they are error-free and reliable enough to deliver an excellent experience. A gap of a few milliseconds is enough to degrade the user experience, and experts are working to minimize it.

Investing your time and funds in a valuable, future-promising technology is advisable. The journey of augmented reality has only just begun; there is still much to discover and overcome in the near future, and the fully fledged use of this technology points to a world where every application can drive productive, effective results.

Is Facebook’s “Prophet” the Time-Series Messiah, or Just a Very Naughty Boy?

A debate rages on page one of Hacker News about the merits of the world’s most downloaded time-series library. Facebook’s Prophet package aims to provide a simple, automated approach to the prediction of a large number of different time series. The package employs an easily interpreted, three-component additive model whose Bayesian posterior is sampled using STAN. In contrast to some other approaches, the user of Prophet might hope for good performance without tweaking a lot of parameters. Instead, hyper-parameters control how likely those parameters are a priori, and the Bayesian sampling tries to sort things out when data arrives.

The funny thing is, though, that if you poke around a little you'll quickly come to the conclusion that few people who have taken the trouble to assess Prophet's accuracy are gushing about its performance. The article by Hideaki Hayashi is somewhat typical, insofar as it tries to say nice things but struggles. Hayashi notes that out-of-the-box, "Prophet is showing a reasonable seasonal trend unlike auto.arima, even though the absolute values are kind of off from the actual 2007 data." However, in the same breath, the author observes that telling ARIMA to include a yearly cycle turns the tables. With that hint, ARIMA easily beats Prophet in accuracy, at least on the one example he looked at.

I began writing this post because I was working on integrating Prophet into a Python package I call time machines, which is my attempt to remove some ceremony from the use of forecasting packages. These power some bots on the prediction network (explained at www.microprediction.com if you are interested). How could I not include the most popular time series package? Using Prophet boils down to a few steps:

  • We call m.fit(df) after each and every data point arrives, where m is a freshly instantiated Prophet model. There is no alternative, as there is no notion of “advancing” a Prophet model without a refit.
  • We make a “future dataframe,” call it forecast, that has k extra rows holding the times when we want predictions to be made, along with any known-in-advance exogenous variables.
  • We call m.predict(forecast) to populate the term structure of predictions and confidence intervals.
  • We call m.plot(forecast) and voila!
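
Put together, the loop looks roughly like this; a minimal sketch assuming the prophet package (fbprophet in older releases) and a pandas DataFrame df with Prophet's required ds/y columns:

    import pandas as pd
    from prophet import Prophet  # `fbprophet` in older releases

    def prophet_k_step(df: pd.DataFrame, k: int):
        # A Prophet object can only be fit once, so we instantiate a fresh
        # model on every new data point; there is no way to "advance" one.
        m = Prophet()
        m.fit(df)
        # The "future dataframe": k extra rows holding the times at which
        # predictions are wanted (exogenous columns would be added here too).
        future = m.make_future_dataframe(periods=k)
        forecast = m.predict(future)   # term structure of predictions + intervals
        m.plot(forecast)               # and voila!
        return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(k)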

Perhaps we start by looking at some of Prophet's bolder predictions.

Now, having shown you in-sample data, let’s look at some examples with the truth revealed. You’ll see that some of those wagers made by Prophet do pay out. For example, here’s Prophet predicting the daily cycle of activity in bike sharing stations close to New York City hospitals. It does a nice job of anticipating the dropoff, don’t you think?

One simple guard against Prophet's wilder wagers is the following clipping rule (sketched in code after the list):

  1. Construct an upper bound by adding m standard deviations to the highest data point, plus a constant. Similarly for a lower bound.
  2. If Prophet’s prediction is outside these bounds, use an average of the last three data points instead.
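
A minimal sketch of that two-step rule, where history is the observed series and m_std and the constant c are free parameters of the heuristic:

    import numpy as np

    def guarded_prediction(history, prophet_pred, m_std=3.0, c=1.0):
        x = np.asarray(history, dtype=float)
        upper = x.max() + m_std * x.std() + c   # step 1: upper bound...
        lower = x.min() - m_std * x.std() - c   # ...and a lower bound, similarly
        if prophet_pred > upper or prophet_pred < lower:
            return x[-3:].mean()                # step 2: fall back to recent mean
        return prophet_pred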

I have begun a more systematic assessment of Prophet, as well as tweaks to the same. As with this post, I'm using a number of different real world time series and analyzing different forecast horizons. The Elo ratings seem to be indicative of Prophet's poor performance — though I'll give them more time to bake. However, unless things change, my conclusions are:

  • In keeping with some of the cited work, I find that Prophet is beaten by exponential moving averages at every horizon thus far (ranging from 1 step ahead to 34 steps ahead when trained on 400 historical data points). More worrying for Prophet, the moving average models weren't even calibrated: I simply hard-wired two choices of parameter (a sketch follows below).
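
For concreteness, the moving-average baseline is as simple as it sounds; the two alpha values below are illustrative stand-ins for the hard-wired choices:

    def ema_forecast(history, alpha):
        """Exponentially weighted average, used as a flat
        forecast at every horizon."""
        s = history[0]
        for y in history[1:]:
            s = alpha * y + (1 - alpha) * s
        return s

    series = [10.0, 11.0, 9.5, 10.4, 10.1]
    preds = {a: ema_forecast(series, a) for a in (0.05, 0.25)}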

What is Data Literacy and How is it Playing a Vital Role in Today’s World?

What literacy was for the past century is what data literacy is for the twenty-first century. Most employers now prefer people with demonstrated data abilities over those with higher education, even data science degrees. According to one report, only 21% of businesses in the United States consider a degree when hiring for any position, compared to 64% who look for applicants who can demonstrate their data skills. When data is viewed as a company's backbone, it's critical that corporations help their staff use data properly.

What is Data Literacy?
The capacity to understand, work with, analyze, and communicate with data is known as data literacy. It's a skill that requires workers at all levels to ask the right questions of data and machines, create knowledge, make decisions, and communicate meaning to others. It isn't only about comprehending data. To be data literate, you must also have the confidence to challenge data that isn't behaving as it should. Literacy aids the analysis process by allowing for the human element of critique. Organizations are looking for data literacy not only in data and analytics professions but in all occupations. Companies that rigorously invest in data literacy programs will outdo those that don't.

Why is it Important?
There are various components to achieving data literacy. Tools and technology are important, but employees must also learn how to think about data to understand when it is valuable and when it is not. When employees interact with data, they should be able to view it, manipulate it, and share the results with their colleagues. Many people turn to Excel because it is a familiar tool, but confining data to a desktop application is restrictive and leads to inconsistencies: information becomes outdated, and employees get conflicting results even though they are looking at the same statistics. It is beneficial to have a single platform for viewing, analyzing, and sharing data. It provides a single source of truth, ensuring that everyone has access to the most up-to-date information. When data is kept and managed centrally, it is also much easier to implement security and governance regulations. Another vital aspect of data culture is having excellent analytical, statistical, and data visualization capabilities. Data visualization can make complex data simple, letting even non-specialists drill through data to find answers to their questions.

Should Everyone be Data Literate?
A prevalent misconception regarding data literacy is that only data scientists should devote time to it; in fact, these skills should be developed by all employees. According to a Gartner Annual Chief Data Officer (CDO) Survey, poor data literacy is one of the main roadblocks to the CDO's success and a company's ability to grow. To combat this, Gartner predicted that 80% of organizations would have specific initiatives to overcome their employees' data deficiencies by 2020. Companies with teams that are literate in data and its methodologies can keep up with new trends and technologies, stay relevant, and leverage this skill as a competitive advantage, in addition to reaping financial benefits.

How to Build Data Literacy
1. Determine your company’s existing data literacy level.
Start by assessing your organization's current data literacy. Can your managers propose new projects based on data? How many individuals today genuinely make decisions based on data?

2. Identify fluent data speakers and data gaps.
You’ll need “translators” who can bridge the gap and mediate between data analysts and business groups, in addition to data analysts who can speak naturally about data. Identify any communication barriers that are preventing data from being used to its full potential in the business.

3. Explain why data literacy is so important.
Those who grasp the “why” behind efforts are more willing to support the necessary data literacy training. Make sure to explain why data literacy is so important to your company’s success.

4. Ensure data accessibility.
It’s critical to have a system in place that allows everyone to access, manipulate, analyze, and exchange data. This stage may entail locating technology, such as a data visualization or management dashboard, that will make this process easier.

5. Begin small when developing a data literacy program.
Don’t go overboard by conducting a data literacy program for everyone at the same time. Begin with one business unit at a time, using data to identify “lost opportunities.” What you learn from your pilot program can be used to improve the program in the future. Make your data literacy workshop enjoyable and engaging. Also, don’t forget that data training doesn’t have to be tedious!

6. Set a good example.
Leaders in your organization should make data insights a priority in their own work to demonstrate to the rest of the organization how important it is for your team to use data to make decisions and support everyday operations. Insist that any new product or service proposals be accompanied by relevant data and analytics to back up their claims. This reliance on data will eventually result in a data-first culture.

So, how is your organization approaching data literacy? Is it one of the strategic priorities? Is there a plan to get a Chief Data Officer? Feel free to share your thoughts in the comments section below.

How AI Benefits EHR Systems

As AI continues to make waves across the medical ecosystem, its foray into the world of EHR has been interesting, owing to the countless benefits the two systems offer together. Now, imagine you use a basic EHR for patients. One patient is administered an MRI contrast agent before a scan. What you may not know is that the patient is prone to an allergy or condition that could cause the dye to affect them adversely. Perhaps the data was in the patient's EHR but was buried so deep that no one would have found it without looking for it specifically.

An AI-enabled EMR, on the other hand, would have been able to analyze all records and determine if there was a possibility of any conditions that may render the patient susceptible to adverse reactions and alert the lab before any such dyes are administered.

Here are other benefits of AI-based EHR to help you understand how they contribute to the sector.

  1. Better diagnosis: Maintaining extensive records is extremely helpful for making a better, more informed diagnosis. With AI in the mix, the solution can identify even the subtlest changes in health stats to help doctors confirm or rule out a diagnosis. Furthermore, such systems can alert doctors to any anomalies and link them straight to reports and conclusions submitted by doctors, ER staff, and others.
  2. Predictive analytics: One of the most important benefits of AI-enabled EHRs is that they can analyze health conditions, flag risk factors, and automatically schedule appointments. Such solutions also help doctors corroborate and correlate test results and set up treatment plans or further medical investigations, delivering more robust conclusions about a patient's well-being.
  3. Condition mapping: Countless pre-existing conditions may render medical diagnosis and procedures challenging or even dangerous. AI-enabled EHRs can easily tend to this by helping doctors rule out any such possibilities based on factual information.

Now, let’s look at some of its challenges.

  1. Real-time access: For data to be accessible to AI, the vast amounts of data a hospital generates daily must be stored in proper data centers and be available in real time.
  2. Data sharing: Of course, the entire point of EHRs is to make data accessible. Unfortunately, that isn't possible until you have taken care of storage and ensured the data is in the requisite formats. Unprocessed data is not impossible for AI to sift through, but doing so is a separate task, one that takes a toll on the time available for AI's other, more important objectives in this context.
  3. Interoperability of data: It is not enough to just be able to store data; said data must also be readable across a variety of devices and formats.

Artificial intelligence has a lot to offer when it comes to electronic health records and the healthcare sector in general. If you too want to put this technology to work for you, we recommend looking up a trusted custom EHR system development service provider right away and getting started on the development project as soon as possible.
