Here are some off-the-beaten-path options to consider when looking for a first job, a new job, or extra income by leveraging your machine learning experience. Many were offers that came to my mailbox at some point in the last 10 years, mostly from people looking at my LinkedIn profile. Hence the importance of growing your network and visibility: write blogs, and show the world some of your portfolio and accomplishments (code that you posted on GitHub, etc.). If you do it right, after a while you will never have to apply for a job again: hiring managers and other opportunities will come to you, rather than the other way around.
1. For beginners
Participate in Kaggle and other competitions. Become a teacher for one of the many online teaching companies or data camps, such as Coursera. Write, self-publish, and sell your own books: an example is Jason Brownlee (see here), who found his niche by selling tutorials explaining data science in simple words to software engineers. I am moving in the same direction as well, see here. Another option is to develop an API, for instance to offer trading signals (buy / sell) to investors who pay a fee to subscribe to your service – one thing I did in the past, and it earned me a bit of income, more than I had expected. I also created a website where recruiters can post data science job ads for a fee: it still exists (see here), though it was acquired; you need to aggregate jobs from multiple websites, build a large mailing list of data scientists, and charge a fee only for featured jobs. Many of these ideas require that you promote your services for free, using social media: this is the hard part. A starting point is to create and grow your own groups on social networks. All this can be done while having a full-time job at the same time.
You can also become a contributor or writer for various news outlets, though initially you may have to do it for free. But as you gain experience and notoriety, it can become a full-time, lucrative job. Finally, you can raise money with a partner to start your own company.
2. For mid-career and seasoned professionals
You can offer consulting services, especially to your former employers to begin with. Here are some unusual opportunities I was offered. I did not accept all of them, but I was still able to maintain a full time job while getting decent side income.
Expert witness – get paid by big law firms to show up in court and help them win big money for their clients (and for themselves, and you along the way.) Or you can work for a company specializing in statistical litigation, such as this one.
Become a part-time, independent recruiter. Some machine learning recruiters are former machine learning experts. You can still keep your full-time job.
Get involved in patent reviews (pertaining to machine learning problems that you know very well.)
Help Venture Capital companies do their due diligence on startups they could potentially fund, or help them find new startups worthy to invest in. The last VC firm that contacted me offered $1,000 per report, each requiring 2-3 hours of work.
I was once contacted to be the data scientist for an Indian Tribe. Other unusual job offers came from the adult industry (fighting advertising fraud on their websites, they needed an expert) and even working for the casino industry. I eventually created my own very unique lottery system, see here. I plan to either sell the intellectual property or work with some existing lottery companies (governments or casinos) to make it happen and monetize it. If you own some IP (intellectual property) think about monetizing it if you can.
There are of course plenty of other opportunities, such as working for a consulting firm or for governments to uncover tax fraudsters via data mining techniques, just to give an example. Another idea is to obtain a realtor certification if you own properties, to save a lot of money by selling them yourself without using a third party, and to use your analytic acumen to buy low and sell high at the right times. Working from home in (say) Nevada for an employer in the Bay Area can also save you a lot of money.
About the author: Vincent Granville is a data science pioneer, mathematician, book author (Wiley), patent owner, former post-doc at Cambridge University, former VC-funded executive, with 20+ years of corporate experience including CNET, NBC, Visa, Wells Fargo, Microsoft, eBay. Vincent is also self-publisher at DataShaping.com, and founded and co-founded a few start-ups, including one with a successful exit (Data Science Central acquired by Tech Target). You can access Vincent’s articles and books, here.
The field of information technology loves to prove that the only constant is change. What intensifies the truth of this axiom is that we are living in the greatest period of technological change that humans have experienced so far. Any one of the several megatrends that are occurring simultaneously could be viewed independently as one of the biggest changes in history.
Together, they are radically changing the way people communicate – and the way networks are built to support them. These trends include 5G, cloud and edge computing, and the Internet of Things (IoT). While each of these is unique in its own way, one thing they share is the need for a new class of network for the communications infrastructure that underpins the vision.
The rise of 5G
5G is widely considered the fastest-growing mobile technology in history. According to Omdia, telecom operators added 225 million 5G subscribers between Q3 2019 and Q3 2020. As of December 2020, there were 229 million 5G subscriptions globally – a 66% increase over the prior quarter.
With the goal of unlocking unimaginable new services based on greater bandwidth and lower latency, 5G telecommunications operators are building new datacenters and expanding the number of access points to a massive scale.
Cloud and edge computing
Industry experts expect increased investments in edge capacity to reduce latency and support personalized content delivery and custom security policies. Cloud and edge computing datacenter operators continue to expand to hyperscale, delivering more intelligent, localized, and autonomous computing resources for an increasing set of users and applications.
IoT
The IoT continues to add more “things” as innovators discover its possibilities. This includes the invention of devices and sensors spanning personal use, entertainment and quality of life, to factory automation, safety, connected cities, smart power grids, new applications of machine learning and artificial intelligence, and so much more.
New network requirements
The possibilities that these trends engender necessitate new network designs – and there’s too much at stake for failure to be an option. Datacenter designers and operators have emerged with a clear set of networking requirements that cannot be compromised:
Security: A new distributed architecture is essential in light of so many users, devices and applications able to enter the network in many configurations.
Performance: Line-rate networking with ultra-low latency that scales from 25 gigabit Ethernet in the servers to 100 gigabit Ethernet and beyond in the network links that connect them is paramount.
Agility: In a world of software-defined-everything, the network must provide hardware performance at the speed of software innovation. This equates to programmability that allows the hardware to evolve as fast as new networking protocols and standards do, as well as the threat landscape driven by bad actors.
Orchestration: New methods for automation, configuration and control are needed to orchestrate and manage so many elements at this scale.
Economics: A fundamental requirement for minimal costs shifts the design of networks towards open, standard and commodity off-the-shelf products. Cost at this incredible scale cannot be forgotten.
Efficiency: The central processing units (CPUs), the most valuable and costly resources within datacenter servers, should spend most of their time supporting the applications, services, and revenue they were intended for, with burdensome workloads such as networking, data, and security processing offloaded elsewhere.
Sustainability: This last piece ensures that the network operates as required, but in an environmentally friendly size and power configuration.
How programmable SmartNICs fit in
Due to this new set of non-negotiable requirements, networks are moving away from large, proprietary, expensive, monolithic, vertically integrated systems – and a new method of software-defined networking has emerged.
To address the problems created by the decline of Moore’s Law, a new heterogeneous processing architecture exists where expensive and burdensome workloads are offloaded from the CPUs. This model has proven to be successful in the past, with GPUs offloading video and graphics processing from the CPU. This same model is now being applied for data, network, and security processing. However, the applications underpinned by 5G, cloud, edge computing and IoT often fail on standard servers with basic NICs.
As Moore’s Law ends and other megatrends skyrocket, organizations prefer SmartNICs as the method to avoid these problems and help advance datacenter computing. A SmartNIC’s primary function is to operate as a co-processor inside of the server, offloading the CPU from the burdensome tasks of network and security processing, while simultaneously accelerating applications on multiple dimensions. It uniquely meets the network design requirements for performance, agility, efficiency, security, economics, orchestration and sustainability.
Equipped for the future
5G, cloud and edge computing, and the IoT are radically changing the way people communicate. And that means that these megatrends are changing the way networks are built to support them. Organizations need a new class of network with stringent requirements for their communications infrastructure. SmartNICs enable this network architecture, resulting in high performance at a reduced cost – along with meeting all the other needed requirements. They form a critical element of the future-forward infrastructure organizations need to capitalize on today’s massive technological shifts.
It’s no secret that when you seek trustworthy information on which to base your business and investment strategy, traditional and internal data sources, such as quarterly earnings reports, are no longer enough. This is especially the case when it comes to the financial services world. These stalwarts of a previous era can’t keep up with the current 24/7 business cycle that demands real-time actionable insights. This is where alternative or external data sources come into play. They allow companies to utilize online data from a variety of sources that are only limited by one single factor: the organization’s creativity to integrate them.
Alternative data is not quite living up to its name anymore – it has hit the mainstream. New survey data we’ve commissioned with Vanson Bourne supports this, with 24 percent of financial services professionals using alternative data every day. These organizations have recognized that external information sources are too vital to neglect when building an agile, comprehensive, and winning business or investment strategy. The certainty they provide is simply too valuable to ignore.
It’s not all smooth sailing
Using alternative data sources is always going to be a subjective endeavor tailored to the specific needs of that organization or business; there is no right or wrong path. Whether it’s looking at a prospective investment, measuring growth by monitoring employee counts on LinkedIn, or tracking trade shipments to predict which way revenue is heading, relying on online data or any external data is no longer a nice-to-have element but a business or market necessity.
These data sources can help lead you to many potential investment routes or to verify whether your current routes will actually produce the desired results. The past year has made this growing realization a well-known fact. The aforementioned study illustrates it with 80 percent of those surveyed expressing that they require more competitive insights from alternative data, and 79 percent hoping to get information on customer behavior from the data.
When the need is so obviously stated, what are the challenges stopping alternative data from being widely incorporated into all organizations? Let’s start with the fact that there’s no single playbook on how to gain actionable insights from these data sources in the most effective way. With 61 percent of financial services professionals citing data analysis as a major bottleneck for integrating the data into operations, there is clearly an operational challenge to overcome.
This challenge is also felt across related industries. Three-quarters of banking professionals surveyed felt that data analysis was the biggest challenge they faced when it came to alternative data. Meanwhile, hedge funds and insurance company employees found it less challenging on average and are gaining the required insights on the go.
What creates this gap between the need for alternative data and its actual integration and utilization? The reasons lie in an organization’s readiness, as well as the professional resources assigned to look after the entire data sourcing process. In addition, the source of the challenge often goes back to strict compliance processes and demands that inhibit the inclusion of alternative data.
When it comes to insurance companies and hedge funds, the business cases are clearly defined. In the insurance industry, this could mean integrating alternative data – such as crime reports in a target area – into the calculations used to set rates. For hedge funds, it could be using the real-time data to make on-the-spot trading decisions on a frequent basis.
Banks and banking services are known for their more traditional approach and security concerns. However, there are multiple ways alternative data can directly and successfully serve banks’ growing business cases. For example, banks could use alternative data to determine how to market products in different regions or to decide what products to develop to begin with. However, these require a bit more ingenuity when it comes to seeking out data sources and gaining meaningful insights from them.
Traditional banks also tend to be much slower at adopting new technology than some of their financial services counterparts; they require a more “sensitive touch” to their overall compliance needs – needs that can be met with the right kind of ethical data provider. It’s therefore likely that their process for integrating the technology and talent needed to get the best insights is a more intensive and less agile process than it would be at a hedge fund.
What all of this means is that there’s still a massive opportunity for financial services firms to leverage alternative data more effectively. And those that put the resources behind doing it well, will undoubtedly gain a more profound competitive advantage.
It all comes down to strategy and sensitive understanding
Alternative data works for the benefit of all financial markets – this much has been established. It often removes the guesswork from immediate decision-making, which should be the goal of any data initiative. Relying on outdated or incorrect data is frankly a recipe for disaster. This is what has led 64 percent of organizations that use alternative data to leverage it for investing decisions and 59 percent to use it to inform customer service strategies.
In the end, it all boils down to whether you have the following: the right, trustworthy operation that is sensitive to your security requirements, the professional talent to support your competitive strategy with data-driven creativity, and, last but not least, the ability to translate that huge amount of data into winning results! Whether you are a small organization starting out on your data-driven path or a large global bank, your data must set you apart from – and ahead of – all others.
For regular readers of the (lately somewhat irregularly published) The Cagle Report, I’ve finally managed to get my feet underneath me at Data Science Central, and am gearing up with a number of new initiatives, including a video interview program that I’m getting underway as soon as I can get the last of the physical infrastructure (primarily some lighting and a decent green screen) in place. If you are interested in being interviewed for The Cagle Report on DSC, please let me know by dropping a line to editor@techtarget.com.
I recently purchased a new laptop, one with enough speed and space to let me do any number of projects that my nearly four-year-old workhorse was just not equipped to handle. One of those projects was to start going through the dominant triple stores and explore them in greater depth as part of a general evaluation I hope to complete later in the year. The latest Ontotext GraphDB (9.7.0) had been on my list for a while, and I was pleasantly surprised by what I found there, especially as I’d worked with older versions of GraphDB and found them useful but not quite there.
There were four features in particular that I was overjoyed to see:
the use of RDF* and SPARQL* (also known as RDF-star, for easier searchability),
the implementation of the SHACL specification for validation,
the implementation of the SERVICE protocol for federation,
and the adoption of a GraphQL interface.
These four items have become what I consider essential technologies for a W3C stack triple store to fully implement. I’d additionally like to see some consensus around a property path-based equivalent to Gremlin (RDF* and SPARQL* is a starting point), but right now the industry is still divided with respect to that.
There are a couple of other features that I find quite useful in GraphDB. One of the first is something I’ve been advocating for a long time – if you load triples in Turtle, you are also providing the prefix-to-namespace associations to the system, but all too often these get thrown out or mapped to some other, less useful term (typically p1: to pN:). Namespaces do have some intrinsic meaning, and prefixes can, within controlled circumstances, do a lot of the heavy lifting of that meaning. When I make a SPARQL query, I don’t want to have to re-declare these prefixes. GraphDB retains and applies them for you, transparently.
The other aspect of Ontotext that I’ve always liked is that they realized early on that visualizations were important – they communicate information about seemingly complex spaces that tables simply can’t, they make it easier to visually spot anomalies, and when you’re talking to a client you don’t want to spend a lot of time building up yet another visualization layer. Ontotext has worked closely with Linkurious to develop Ogma.js, a visualization library that is used extensively within their own product but is also available as an open-source JavaScript library. I still rely fairly heavily on vizjs.org and d3.js, as well as working with the DOT and related libraries out of GraphViz, but having something that’s available “out of the box” is a big advantage when trying to explain complex graph math to non-ontologists.
I’ll be doing a series exploring these points as part of this newsletter, with Ontotext being my testbed for putting together examples. In this particular issue, I’d like to talk about RDF* and SPARQL*, and why these will help bridge the gap between knowledge graphs and property graphs.
Is an Edge a Node?
This seemingly simple question is what differentiates a property graph from an RDF graph. In RDF, an edge is considered to be an abstract connection between two concepts. For instance, consider the marriage between two people: person:_Jane and person:_Mark. In a (very) simple ontology, this distinction can be summarized in one statement:
# Turtle
person:_Jane person:isMarriedTo person:_Mark.
If you are describing the state of the world as it currently exists, this is perfectly sufficient. The edge `person:isMarriedTo` is abstract, and can be thought of as a function. Indeed, Manchester notation, used in early OWL documentation, made this very clear by turning the edge into a function:
person:isMarriedTo(person:_Jane, person:_Mark)
What’s most interesting about this is that while `person:isMarriedTo` itself is abstract (you can think of it as a function with two parameters), the resulting object is NOT in fact a person, but a marriage instance. If the same predicate is asserted for two different couples, those are two separate marriages, two separate instances, and they are clearly not the same marriage.
In an RDF graph, this kind of association typically needs to be described by what amounts to a third-normal-form style of decomposition:
marriage:_JaneToMark
a class:_Marriage;
marriage:hasSpouse person:_Jane;
marriage:hasSpouse person:_Mark;
.
There are advantages to this approach: first, you have a clear declaration of type (Marriage), and you essentially invert the marriage function. Additionally, you can add more assertions that help qualify this node:
marriage:_JaneToMark
marriage:hasStartDate "2012-05-15"^^xsd:date;
marriage:hasEndDate "2018-03-02"^^xsd:date;
marriage:officiatedBy person:_Deacon_Deacon;
.
You can also describe other relationships by referencing the created object:
person:_Child1 person:hasParents marriage:_JaneToMark . # person:_Child1 stands in for any child resource
which makes it possible to ask questions like Who were the children of Jane?
# Sparql
select ?child where {
?child person:hasParents ?marriage.
?marriage marriage:hasSpouse ?parent.
values ?parent {person:_Jane}
}
If you are building a genealogy database, this level of complexity and abstraction is useful, but it does require that you explicitly model these entities, and you should expect that there will be a large number of them (this is where combinatorics begins to kick in, creating rapid growth in the database).
A property graph short-circuits this by specifying that an edge is a concrete node that can take attributes but not relationships. In effect, the property edge carries both the abstract relationship description (`person:isMarriedTo`) and the concrete connection between the nodes bound by this assertion or function. However, this benefit comes at a price: property graph edges are, in effect, nodes that other nodes cannot reference. This tends to make property graphs more optimized for performance, at the cost of making them less versatile for modeling or inferencing.
In some cases it makes sense to explicitly label these edge relational models, but sometimes you just don’t have that luxury: for instance, what happens when you get data in the form
person:_Jane person:isMarriedTo person:_Mark.
You can take a shortcut here by creating a reification, a very fancy word that means creating a reference to a triple as a single assertion. Reification was actually in place way back in the first RDF proposal, and looks something like this:
marriage:_JaneToMark
rdf:subject person:_Jane;
rdf:predicate person:isMarriedTo;
rdf:object person:_Mark;
.
Introducing RDF* Statements
The problem with this is that it is fairly unwieldy. RDF* and SPARQL* were proposals that would provide a syntax that would do much the same thing in Turtle and SPARQL respectively but without the verbosity. The syntax was to wrap double angle brackets around a given triple to indicate that it should be treated as a reification:
<<person:_Jane person:isMarriedTo person:_Mark>>
a class:_Marriage;
marriage:hasStartDate "2012-05-15"^^xsd:date;
marriage:hasEndDate "2018-03-02"^^xsd:date;
marriage:officiatedBy person:_Deacon_Deacon;
.
In effect, this creates a uniquely identifiable URI that represents a triple. It’s worth noting that this doesn’t affect any of the components of that triple. For instance, I can still say:
person:_Jane
a class:_Person;
person:hasFullName "Jane Elizabeth Doe"^^xsd:string;
person:hasBirthDate "1989-11-21"^^xsd:date;
.
Reification in RDF (in either guise) can somewhat simplify the generation of RDF, but the more powerful benefits come when you provide support in SPARQL for the same concept.
For instance, SPARQL* supports three functions: rdf:subject, rdf:predicate and rdf:object. This comes in handy for doing things like finding out whom Jane Doe was married to on January 1, 2015.
select ?spouse where {
   VALUES (?personOfInterest ?targetDate) {
      (person:_Jane "2015-01-01"^^xsd:date)
   }
   ?marriage a class:_Marriage.
   bind(rdf:subject(?marriage) as ?spouse1)
   bind(rdf:object(?marriage) as ?spouse2)
   filter(?personOfInterest IN (?spouse1, ?spouse2))
   ?marriage marriage:hasStartDate ?startDate.
   bind(if(sameTerm(?spouse1, ?personOfInterest), ?spouse2, ?spouse1) as ?spouse)
   OPTIONAL {?marriage marriage:hasEndDate ?endDate}
   filter(?targetDate >= ?startDate && (!bound(?endDate) || ?endDate > ?targetDate))
}
This example is more complicated than it should be only because person:isMarriedTo is symmetric. What’s worth noting, however, is that, from SPARQL’s standpoint, the reified value in ?marriage is a node just like any other node. If the predicate hadn’t been symmetric, the expression:
bind(rdf:subject(?marriage) as ?spouse1)
bind(rdf:object(?marriage) as ?spouse2)
filter(?personOfInterest IN (?spouse1, ?spouse2))
?marriage marriage:hasStartDate ?startDate.
could have been replaced with
?marriage rdf:subject ?personOfInterest.
?marriage rdf:object ?spouse.
There are two additional functions that SPARQL* supports: rdf:isTriple() and rdf:Statement(). The first takes a node and tests whether it has the rdf:subject, rdf:predicate and rdf:object predicates defined for the entity, while the second creates a triple statement from the corresponding node URIs.
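As a minimal sketch, using the function names as given here and the entities from the earlier examples (an actual implementation may expose these functions under slightly different names):
# Sparql
select ?statement where {
   bind(rdf:Statement(person:_Jane, person:isMarriedTo, person:_Mark) as ?statement)
   filter(rdf:isTriple(?statement))
}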
It’s also worth noting that if any of the three components are null values, then the rdf:Statement() function will also return a null value and the isTriple() function will return false().
RDF* Duplications
So when would you use reification? Surprisingly, not as often as you’d think. One seemingly obvious use case comes when you want to annotate a relationship without creating a formal class to support the annotation. For instance, let’s say that you have a particular price associated with a given product, and that price changes regularly even if the rest of the product does not. You can annotate that price change over time with an RDF* statement.
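As a sketch of such an annotation (the product URI, the hasPrice predicate, and the date and approver values are illustrative assumptions; the annotation predicates are the ones discussed below):
# Turtle
<<product:_Widget product:hasPrice "21.95"^^xsd:decimal>>
   product:hasLastUpdate "2021-04-01T00:00:00Z"^^xsd:dateTime;
   product:hasApprovalBy person:_ProductManager;
   product:isActive true;
   .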
This annotational approach lets you track the price evolution of a given product over time, and also provides a way of indicating whether the current assertion is active. Of course, if the price changes away from "21.95" and then changes back, you suddenly end up with multiple hasLastUpdate, hasApprovalBy and isActive assertions – unless the new reification has a different URI than the old one does.
This can lead to some unexpected (though consistent) results. For instance, if you write a SPARQL query that assumes each annotation property has a single value per reified triple, it’s entirely possible that you will get multiple values returned.
This actually gets to a deeper issue in RDF: reifications can create a large number of seemingly duplicate triples that aren’t actually duplicates, especially if reification were an automatic by-product of creating triples in the first place. In effect, it requires another field in what is already becoming a crowded tuple: "triples" now have one additional slot for graphs, a second slot for types in objects, and now a third slot (for a total of six) for managing the unique URI that acts as the reified identifier for the entry. When you’re talking about potentially billions of triples, each of these slots has a cumulative effect on both performance and memory requirements.
RDF Graphs Are Property Graphs Even Without RDF*
This raises a question – do you need RDF*? From the annotational standpoint, quite possibly, as it provides a means of tracking volatile property and relationship value changes over time. From a modeling perspective, however, perhaps not. For instance, the marriage example given earlier in this article can actually be resolved quite handily by simply creating a marriage class that points to both (or potentially more than two) spouses with a `marriage:hasSpouse` property, rather than attempting to create a poorly considered `person:isMarriedTo` relationship.
Put another way – RDF* should not be used to solve modeling deficiencies. The marriage example without RDF* is actually easier to articulate and understand, as the query below illustrates.
select ?spouse where {
   VALUES (?personOfInterest ?targetDate) {
      (person:_Jane "2015-01-01"^^xsd:date)
   }
   ?marriage a class:_Marriage.
   ?marriage marriage:hasSpouse ?personOfInterest.
   ?marriage marriage:hasSpouse ?spouse.
   filter(!sameTerm(?personOfInterest, ?spouse))
   ?marriage marriage:hasStartDate ?startDate.
   OPTIONAL {?marriage marriage:hasEndDate ?endDate}
   filter(?targetDate >= ?startDate && (!bound(?endDate) || ?endDate > ?targetDate))
}
The primary illustration of this comes in the area of paths. Property graphs are used frequently in path analysis, where the goal is to either minimize or maximize the aggregate values of a particular property across a path. A good example of this would be airline maps, where an airline flies certain routes, and the goal is to minimize the distance travelled to get from one airport to another. Again, this is where modeling can actually prove more effective than trying to emulate property-graph-like behavior.
For instance, you can define an airline route as the path taken between airports to get to a final destination. While you COULD use RDF* for this, you’re probably better off putting the time into modeling this correctly.
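A minimal Turtle sketch of such a model might look like the following, with the route’s legs stored as an RDF list; the specific airports, distances, and predicate names here are illustrative assumptions:
# Turtle
route:_SEA_to_BOS
   a class:_Route;
   route:hasLegs (leg:_SEA_DEN leg:_DEN_ORD leg:_ORD_BOS);
   .
leg:_SEA_DEN
   a class:_Leg;
   route:hasOrigin airport:_SEA;
   route:hasDestination airport:_DEN;
   route:hasDistance 1643;
   .
# The other legs follow the same pattern.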
In this particular case, calculating flight distances for different routes, while not trivial, is doable. The trick is to understand that sequences in RDF are represented by rdf:first/rdf:rest chains, where rdf:first points to a given item in the sequence, and rdf:rest points to the next pointer in the chain:
select ?route (sum(?flightDistance) as ?totalDistance) where {
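   # Illustrative reconstruction of the query body, using the predicates from the sketch above:
   ?route a class:_Route.
   ?route route:hasLegs ?legList.
   ?legList rdf:rest*/rdf:first ?leg.
   ?leg route:hasDistance ?flightDistance.
} group by ?route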
It’s always worth remembering that RDF graphs generally work best when they capture derived information. Thus, once such route distances are calculated, they can be stored back into the graph to minimize the overall computational cost. In other words, with knowledge graphs, the more information that you can index, the more intelligent the overall system becomes:
# Sparql
insert {?route route:hasTotalDistance ?totalDistance}
where {
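   # Illustrative reconstruction: compute each route's total distance in a sub-select, then write it back into the graph.
   select ?route (sum(?flightDistance) as ?totalDistance) where {
      ?route a class:_Route.
      ?route route:hasLegs/rdf:rest*/rdf:first ?leg.
      ?leg route:hasDistance ?flightDistance.
   } group by ?route
}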
Reification plays a big part in managing annotations, and a lesser role in operational logic such as per property permissions, and for that, RDF* and SPARQL* provide some powerful tools. However, in general, intelligent model design is all that is really needed to make RDF graphs both as efficient and as flexible as labeled property graphs. That RDF is not used as much tends to come down to the fact that most developers prefer to model their domain as little as possible, even though the long-term benefits of intelligent modeling make such solutions useful for far longer than plug-and-play labeled property graphs.
Another area worth exploring is the ability to extend SPARQL through various tools. I’ll explore this in more detail in my next post.
2020 demonstrated how helpless humankind could be when facing an epidemic. It is clear now that healthcare systems in many countries need to be transformed. And this transformation is impossible without leading-edge technology, as the challenges are enormous. Like many other spheres, healthcare is undergoing a digital revolution. Data Science, Machine Learning, and Artificial Intelligence will be dominating the healthcare software development sphere in the decades to come. They will help create unprecedented products and systems that will save and improve the lives of people globally: software for hospitals, such as digital workplaces for healthcare professionals, drug prescription assistance, medical staff training, care coordination and health information exchange; for patients – healthcare chatbots and virtual medicine apps; for device manufacturers – medical apps for users and cloud solutions for data storage and management; for pharmaceutical companies – systems for drug testing and medication guidance, etc. Overall, healthcare technology pursues the following ambitious goals:
Building sustainable healthcare systems
Improvement of patient-doctor interactions
Prevention of epidemics
Making a breakthrough in curing cancer, AIDS, and other diseases
Increase in life expectancy and quality
Let’s have a closer look at technology trends in the medical industry, which promise a healthier future for us all, with some looking truly mind-blowing even in the hi-tech age.
Healthcare technology – industry trends and solutions
1. Telemedicine and personal medical devices
The Covid-19 pandemic created a pressing need to reduce contact between patients and healthcare workers and caused an upsurge in telehealth services’ popularity. Smart wearables are a crucial component of telemedicine, as they give physicians access to real-time patient data, such as blood pressure, oxygen saturation, and heart rate. Several companies are working to create a multifunctional device that will measure all key vital parameters, so the doctor can conduct an exam remotely, track changes in the patient’s condition, and adjust the treatment. One such company is MedWand, which offers cloud-based systems for smooth virtual healthcare sessions.
The new generation of telemedicine software will ensure a remarkably high security level for electronic health records thanks to Blockchain and cloud data storage. WebRTC is among the key technologies that underpin the success of telehealth apps. Google’s open-source project enables API-based real-time interaction between mobile apps and web browsers with different data types, such as audio and video. App integration with smartphone health trackers, like Apple HealthKit, is also a must. Smartphones will turn into a mini-lab, equipped with microscopes and sensors to analyze swab samples and detect abnormalities.
Despite fraud concerns, telemedicine popularity is forecast to grow after the pandemic ends, as virtual appointments are less stressful and time-consuming than conventional hospital visits. From the healthcare practitioner’s perspective, they make it possible to serve more patients daily.
2. Artificial Intelligence
Artificial Intelligence is omnipresent and disrupts medicine like many other domains. Here are just a few medical objectives that can be reached with the help of AI:
Development of personalized treatment plans with AI-driven analytics
Accelerated design of new effective drugs. (e.g., Machine Learning enabled the development of the Covid-19 vaccine by identifying viral components responsive to the immune system.)
Considerable improvement of early diagnostics, automated image classification, and description (e.g., Google’s DeepMind created an AI for more accurate detection of breast cancer)
Collection, processing, and storage of medical records.
Automation of monotonous jobs and eliminating paperwork for the hospital staff.
Epidemic prevention and control (e.g., analysis of thermal screening, facial recognition of masked people)
3. Virtual and Augmented Reality
Virtual and Augmented Reality tools are in wide use for educational and entertainment purposes across numerous industries. In medicine, the range of applications includes simulations for training health professionals, planning complex surgeries, diagnostics, anxiety and pain therapy, and rehabilitation (for instance, dealing with motor deficiencies or memory loss). Companies like ImmersiveTouch and Osso VR provide virtual platforms for surgeons and hospital staff. Interestingly, VR headsets have proven effective for alleviating pain through sound and color therapy. Augmented Reality screens help surgeons make better decisions during emergencies. AR also streamlines robotic surgeries.
4. Nanotechnology
Nanomedicine is only emerging and will probably see slow adoption by patients. Although having invisible robots performing surgery or delivering medicine to specific organs or cells is not for the weak-hearted, there is a vast range of non-daunting applications. Miniature devices like PillCam serve non-invasive diagnostics purposes. Another direction for nanotechnology is smart patches with biosensors. The Medical Futurist Journal features a fascinating overview of the latest products and solutions presented at the Consumer Electronics Show (CES) in 2020. Among the wow gadgets, there is a patch for continuous wound monitoring from a French company, Grapheal. In October 2021, the NanoMedicine International Conference and Exhibition will take place in Milan, Italy. The list of themes for discussion shows the immense potential of nanotechnology.
5. Robotics
Robotics is booming and has a wide range of healthcare applications. Exoskeletons with wireless brain-machine interfaces, robotic limbs, surgical robots, robotic assistants, and companions for disabled people – the future has come! Importantly, disinfectant and sanitary robots may play an essential role in preventing epidemics, as they cannot contract a virus while taking care of infected patients. There is only one downside – robotic medicine is expensive. For instance, the famous da Vinci Surgical System costs over 1 million US dollars.
6. 3D printing
Now virtually anything can be printed – human tissues and organs, models and prosthetics, medical devices and pills. This field requires highly sophisticated software solutions for processing medical images, segmentation, mesh editing, and 3D modeling.
7. In silico medicine trials
The creation of virtual organs (organs-on-a-chip) for simulated clinical trials is taking off. Computer mathematical models of human anatomy and physiology will allow testing new drugs on thousands of virtual patients in very short timeframes. The ethical aspect is also essential – this futuristic technology will help decrease or eliminate tests conducted on animals and human volunteers.
Conclusion
With all the amazing medical technology trends in mind, we can expect many breakthroughs in the upcoming decades. However, what already exists is not available or affordable for the developing countries’ population, who are often deprived even of basic healthcare. Therefore, like in education and finance, accessibility is among the main challenges technology faces today.
SaaS is web-based software accessible through the internet. Since SaaS adopts cloud computing technology, there’s no need for installing desktop applications — users simply subscribe to a service hosted on a remote server. For example, Netflix is a B2C SaaS platform that offers licensed videos on-demand and follows a subscription model.
The global SaaS market is expanding rapidly and will hit $436.9 billion in 2025. While COVID-19 is causing dramatic shifts in business due to telecommuting and social distancing, more companies rely on the SaaS model. Let’s take a closer look at the benefits of SaaS for business.
Benefits of B2B SaaS solutions
The SaaS model benefits software providers and their customers. For developers, SaaS allows a recurring revenue stream and faster deployment compared to traditional on-premise software. For companies, SaaS offers the chance to reach a wider audience and to save on software development and maintenance costs.
Here are the top advantages of choosing SaaS for business:
Cost-effective: A software vendor holds all maintenance and infrastructure costs.
Accessible: SaaS products can be accessed from anywhere via a web browser.
Scalable: Customers can change their usage plans anytime without hassle.
Easy to integrate: SaaS solutions support multiple integrations with other platforms.
Secure: The decentralized nature of cloud-based technology protects user data from breaches and loss.
Come with free service: SaaS provides automated backups, free updates, and swift customer support.
Easy to use: A friendly interface and simple user flow make SaaS easy to use for everyone, regardless of their technical skills.
Offer free trials: Most SaaS vendors allow you to try it before you buy it.
The best part is that SaaS fits all industries and company sizes, so both large and small businesses can benefit from it equally. In the next chapter, we will explore different categories along with the most prominent examples of SaaS software.
The most common types of SaaS for business
As we mentioned earlier, the SaaS market is huge and highly segmented. The cloud service model has reached all business niches and generates billions of dollars in revenue for SaaS companies. For example, Salesforce — the world’s largest SaaS provider — made $17.1 billion in 2020 alone.
While the complete list of SaaS categories can be extensive, we will cover the most widely used ones.
Data Curation can be defined in different ways. Roughly put, data curation entails managing an organization’s data throughout its life cycle.
Data Curation is a way of managing data that makes it more useful for users engaging in data discovery and analysis. It can also be termed as the end-to-end process of creating good data by identifying and forming resources with long-term value. The main goal of data curation is to make data easily retrievable for future use.
Role of data curation in data management
Acts as a bridge
Data Curation facilitates collecting and controlling data so that everyone can make use of it in their own way. Without Data Curation, it would be very hard to gather, process, and validate data in a given organization. Because it acts as this bridge, there is an increasing emphasis on leveraging the power of Data Curation.
Organizes the data
Data Curation arranges the data that keeps piling up every moment. No matter how huge the datasets may be, Data Curation can help us manage them systematically so that analysts and scientists can approach them in the format that suits them best. Once the data is organized in a way convenient for data scientists, they can use it to fetch insights that the business can rely on. However, it all pivots on how well you organize the data.
Manages data quality
Data Curation helps you keep watch over the quality of your data. You can make sure that good data remains with you, and let go of data that is not applicable. Data analysts and data scientists will know that Data Curation has taken care of quality and will be able to trust the data provided to them. In the age of big data, one can get lost entirely without Data Curation on one’s side. Therefore, there is growing recognition in the data industry of the need to capitalize on Data Curation and ensure quality control.
Makes ML more effective
Machine Learning algorithms have made big strides towards understanding the consumer space. AI systems built on neural networks can use Deep Learning to recognize patterns. However, humans need to intervene, at least initially, to direct algorithmic behavior towards practical learning. Curation is where humans add their knowledge to what the machine has automated. This prepares the ground for intelligent self-service processes and sets organizations up for insights.
Speeds up innovation
Organizations are looking to identify ways to manage data most effectively while establishing a collaborative ecosystem to enable this efficiency. Data Curation enhances collaboration by opening and socializing how data is used.
What is the future of data curation?
Organizations and businesses continue to work with and understand the concept of big data. Data has proven how important it is in opening up previously unknown fronts in how organizations run and achieve results.
As data continues to pile up, businesses will increasingly invest in data curation for better processing and analysis to improve operations and drive better results.
Data curation will soon become a distinguishing feature between organizations and businesses: those that effectively harness its power are set to become the most successful and will leap ahead of their counterparts in the market.
Capitalizing on data curation helps organizations distill their stockpiles of data and see its worth. Leveraging smart data curation platforms ensures that a business is powered by clean, useful data, giving it a competitive advantage and a leading position in the market.
Welcome to the second installment of our ModelOps blog series, where we dive deep into the next step in the ModelOps pipeline, Model Training. During Model Training, we feed large volumes of data to our model so it can learn to perform a certain task very well. This blog follows the first post in our series, where we covered everything you need to know about Data Acquisition and Preparation and discussed how foundational it is for successful Artificial Intelligence (AI) investments. If you missed it, check it out as a great lead-in to this post.
Training a model to learn how to execute a task is just like any other action that requires learning. Training a dog to “sit” requires countless hours of practice and repetition until the dog obeys consistently. Training to run a marathon requires months of long-distance running. Learning how to ride a bike takes time with gradual improvement until it clicks. In each of these repetitive-learning scenarios, we invest a nontrivial amount of time and resources to master a given task. As you develop your model training process, it is important to consider the time, resources, and corresponding costs of each.
Three Categories of Model Training Considerations
1. Model Approach Considerations
What is the objective for this model and how do we measure success?
Is our objective comparable to an existing state of the art (SoA) solution on the market or completely novel?
How will our data (structured or unstructured) impact the modeling approach we take?
Do we need to build a solution entirely from scratch or can we use open source architectures as a starting point?
2. Tools & Frameworks Considerations
Which machine learning framework and programming language are we going to use?
Are we going to build custom training code, use a tool that can help us, or use a combination of both?
3. Experimental Considerations
How long will it take to train our model and what hardware resources do we need? What is the overall investment to successfully complete this process?
Does our experiment require hyperparameter optimization?
How are we going to monitor the training experiments and implement early stopping if needed?
When building out a model training process, the first thing to consider is the business application and the goal of the project. Odds are high you have already defined your goal before you arrived at this step in your AI journey. However, revisiting the objective and defining how you are going to measure its success is critical when deciding on modeling approaches. The quickest way to achieve your goal while saving time and money is to start by exploring which models might be available within the open source community. Many SoA architectures exist in open source form and can serve as the foundation for your data scientists to build upon. Rather than waste investment dollars reinventing the wheel, your team can use one of these existing architectures that other AI researchers spent months developing (e.g., ResNet variants for image classification, RCNN for object detection, or BERT for Natural Language Processing). Selecting an already available model that aligns to your objective as your model base will save your teams a considerable amount of development time and costs.
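As a hedged illustration of starting from an existing open source architecture rather than training from scratch, here is a minimal PyTorch/torchvision sketch; the class count is a placeholder for your own task, and the weights string assumes torchvision 0.13+ (older versions use pretrained=True):
# Python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet-50 as the base architecture.
base_model = models.resnet50(weights="IMAGENET1K_V1")

# Replace the final classification layer for a hypothetical 5-class business problem.
num_classes = 5
base_model.fc = nn.Linear(base_model.fc.in_features, num_classes)

# From here, only fine-tuning on your own data is required, rather than full training from scratch.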
Once you have defined your objective and chosen a solid base model to build upon, the next decision to consider involves the tools and frameworks you will use to build your model. Right off the bat, you will want to decide whether using a training platform is the route to take or if you instead prefer to build your pipeline by hand. Low- and no-code commercial and open source training platforms (e.g., AWS SageMaker, RunwayML) can in some cases automate most of this build process for organizations. While thinking about the tradeoffs that come with this approach, it is important to make this decision based on the maturity of your data science team and ultimately settle on a solution that will allow your team to optimize its build process.
In the case that you decide to build your training pipeline from scratch, most common programming languages (e.g., Python, R, Java) provide access to user-friendly libraries that make ML feasible to implement. Moreover, many common ML frameworks, including Tensorflow, PyTorch, scikit-learn, and others, make common AI architectures and other ML techniques easily accessible. With many of these open source languages and frameworks at your fingertips, your team should ultimately select the tools with which they are most comfortable. Tool familiarity will reduce, and hopefully eliminate, any learning curve expenses. Additionally, by using familiar tools, your team can efficiently set up all model training code and kick off their experiments without any unnecessary delays.
With a plan in place and tools identified, there are several implementation and execution factors to consider, starting with hardware. Training an ML model is computationally expensive and takes time. To minimize this training time, developers commonly leverage GPUs because they can process multiple computations in parallel. But this hardware comes at a cost and requires teams to balance their objectives against the amount of GPU usage they need, the number of training experiments required, and the budget. Once the hardware decision is confirmed, your development team should consider a set of experimental conditions, including:
Will your dataset require data augmentation to increase the variation of samples your model processes? If so, your team will need to add a preprocessing step during training that might randomly perform flips, translations, rotations, scales, crops, or noise additions to the data (a minimal sketch follows this list).
Would your model benefit from hyperparameter optimization? In other words, should you set up your experiment to test several combinations of loss functions, optimizers, learning rates, batch size, and number of layers? Free products like Ray Tune and MLflow make it easy for teams to set up these experiments and constantly monitor results during training.
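As a minimal sketch of the augmentation step described above, using torchvision transforms; the specific transforms and parameters are illustrative choices, not prescriptions:
# Python
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),          # random flips
    transforms.RandomRotation(degrees=15),      # small random rotations
    transforms.RandomResizedCrop(224),          # random scale and crop
    transforms.ColorJitter(brightness=0.2),     # mild color variation, a noise-like perturbation
    transforms.ToTensor(),
])
# Pass train_transforms to your Dataset/DataLoader so each sample is randomly augmented every epoch.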
Building a successful model training process requires several key considerations. As a result, this step in the ModelOps process can be the lengthiest and most computationally expensive direct cost. To minimize budgetary impact, it is important to select the correct combination of model approach, tools, frameworks, and experimental factors that best suit your use case and technical team maturity.
At the end of the day, remember that every AI investment is unique and will look different. For those reasons, it is important to always make these decisions based on what makes the most sense for your organization.
Make sure to check out our next blog in the ModelOps series: Model Code Versioning: Reduce Friction. Create Stability. Automate. To learn more, visit modzy.com.
Globally, a wealth of data is collected and stored each day. Currently, more than 2.5 quintillion bytes of data are created each day. By 2025, it is estimated that the world will create 463 exabytes of data each day.
It can bring transformative benefits to businesses and societies around the world if interpreted correctly with the help of data science.
As per Gartner, data science holds the key to unveiling better solutions to old problems.
Also, according to the International Data Corporation, data science is key for industries to provide analysis and shed light on best practices for avoiding data breaches and attacks.
In fact, two-thirds of companies with formal customer programs are already leveraging data science to help them make sense of their data.
Let’s go through some of the best ways to leverage data science for customer management:
1- Customer Segmentation
Customer segmentation is a powerful method for businesses to identify unsatisfied customer needs. It involves arranging customers into homogeneous groups based on factors such as demographics, preferences, or purchase history.
Machine Learning, one of the branches of Data Science, uses clustering algorithms to facilitate customer segmentation. It splits your customer base into groups by common interests.
Here are the steps to segment customers using ML:
Build a business case: Know the purpose of using ML and AI. For example, your business case can be to find the most profitable customer group.
Create data: Determine how many customers you have; the more data, the better for customer segmentation. Also, set different measurable attributes based on the best metrics for your business, for example average lifetime value, retention rate, or client satisfaction. Tools such as pandas are helpful for data preparation.
Apply K-means clustering: K-means clustering is an unsupervised ML algorithm. Unsupervised algorithms do not have labeled data to assess their performance. K-means clustering helps arrange data into similar clusters (a minimal sketch follows these steps).
Choose optimal hyperparameters: Hyperparameters are the properties that govern the training process. Hyperparameter optimization helps find the most rewarding customer group based on past work.
Visualization and interpretation: Visualize and interpret the findings once you have profitable customer profiles using the above steps. This helps you improve your marketing campaigns, target potential customers, and build a product map. You can use Plotly for Python to make interactive graphs and charts.
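A minimal Python sketch of these steps; the column names, metric values, and cluster count are illustrative assumptions rather than real customer data:
# Python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy customer metrics; in practice these come from your CRM or data warehouse.
customers = pd.DataFrame({
    "avg_lifetime_value": [120, 950, 300, 80, 1100, 450],
    "retention_rate":     [0.2, 0.9, 0.5, 0.1, 0.95, 0.6],
})

# Scale the attributes so each metric contributes equally to the distance measure.
features = StandardScaler().fit_transform(customers)

# Cluster customers into three segments.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
customers["segment"] = kmeans.fit_predict(features)

# Inspect the average profile of each segment, e.g. to find the most profitable group.
print(customers.groupby("segment").mean())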
Segmentation thus makes it easy to cross-sell and upsell business products. When customers receive content relevant to their needs, they are more likely to make a purchase.
Moreover, it builds customer loyalty to your brand as your business adds to their lives.
2- Use Predictive Analytics
The best way to find deep, real-time insights and predict user behavior and patterns is by using predictive analytics tools. The top predictive analytics software and service providers include Acxiom, IBM, Information Builders, Microsoft, SAP, SAS Institute, Tableau Software, Teradata, and TIBCO Software.
Once you have selected the software, use an appropriate predictive analytical model to turn past and current data into actionable insights.
Predictive models using business data generate informed predictions about future outcomes and revamp business decision-making processes. The business data can include user profiles, transaction information, marketing metrics, customer feedback, etc.
The typical business model for customer management that you can use is the Customer Lifetime Value model. It identifies the customers who are most likely to invest more in your products and services.
Now, you have to choose a predictive modeling technique. Model users have access to countless predictive modeling techniques. The most widely supported technique across predictive analytics platforms for customer management is the decision tree. It determines a course of action and shows the statistical probability of each possible outcome (a small sketch follows).
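For illustration, here is a small decision-tree sketch for a churn-style prediction; the feature names and data values are invented placeholders, not real customer records:
# Python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy historical data: spend, support tickets, and whether the customer eventually churned.
history = pd.DataFrame({
    "monthly_spend":   [20, 90, 35, 15, 120, 60],
    "support_tickets": [5, 0, 2, 7, 1, 1],
    "churned":         [1, 0, 0, 1, 0, 0],
})

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(history[["monthly_spend", "support_tickets"]], history["churned"])

# Probability of churn for a new customer profile (spend 40, 3 tickets).
new_customer = pd.DataFrame({"monthly_spend": [40], "support_tickets": [3]})
print(model.predict_proba(new_customer))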
In the long term, predictive analytics is more cost-effective than losing a customer.
3- Provide Better Personalized Services
Providing personalized services is a great way to build relationships with your customers. It helps increase sales by offering them products and services as per their interest.
One of the smartest ways to provide personalized services is through artificial intelligence. AI software is efficient, spends less time searching for solutions, and works with multiple customers at the same time.
The AI-powered chatbots and virtual assistants initiate the conversation with a customer, help with routing, engagement, and interaction. The chatbots trained with natural language processing easily answer questions and collect critical customer insights.
Software such as Zendesk live chat offers you the flexibility to reach customers in real time and build amazing conversational experiences.
4- Analyze the Trends and Sentiments on Social Media
Social media is one of the most crucial sources of data. Data science and big data take advantage of these large volumes of spontaneous and unstructured data.
Social media sentiment analysis allows businesses to learn more about their customers. It helps them understand how their customers feel about their brand or product. ML automatically detects the emotion of online conversations, classifying them as positive, negative, or neutral.
There are three best ways to use data science or ML for effective social media customer management:
Social media monitoring: It allows you to monitor and moderate social media content, which is necessary for better customer service. Platforms such as Instagram and Twitter have built-in analytics tools; these measure the success of past posts through likes, clicks, comments, or views.
Sentiment analysis: It judges the sentiment of a text using NLP, helping you classify social media conversations as positive, negative, or neutral. You can use sentiment analysis for customer support and for collecting feedback on new products (see the sketch after this list).
Image recognition: Computer vision recognizes brand logos and product images even without accompanying text. This is helpful when customers upload pictures of your products without directly mentioning or tagging the brand; image recognition lets you spot such posts and send a targeted promotion.
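As a minimal sketch of the sentiment-analysis step (the example posts are invented), NLTK's VADER analyzer, which is tuned for short social media text, scores each post and lets you bucket it as positive, negative, or neutral:

```python
# A minimal sentiment-analysis sketch using NLTK's VADER lexicon,
# which is designed for short social media text. Example posts are invented.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
analyzer = SentimentIntensityAnalyzer()

posts = [
    "Absolutely love the new release, great job!",
    "Worst customer service I have ever experienced.",
    "Package arrived today.",
]

for post in posts:
    scores = analyzer.polarity_scores(post)
    # The compound score ranges from -1 (most negative) to +1 (most positive).
    label = ("positive" if scores["compound"] > 0.05
             else "negative" if scores["compound"] < -0.05
             else "neutral")
    print(f"{label:8s} {scores['compound']:+.2f}  {post}")
```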
Tools such as Microsoft Power Automate and Power BI services help you track feedback for your company. These tools have built-in algorithms that treat words such as good, awesome, or happy as positive sentiment, and words such as horrible or worst as negative sentiment. Moreover, there are social media engagement tools to measure the engagement levels of your content.
You can leverage these tools to:
Decide on the times when your company’s product launches will perform best.
Predict the best times and audience types for your campaigns.
Access and analyze competitor data to improve processes in your company.
Final Thoughts
The world of data analysis is evolving. In the coming years, you will see business disruptions in almost every sector powered by data. It will result in an increasing demand for data science.
Data science and its branches, including AI and ML, allow businesses to track and understand customer data. It helps them communicate with their consumers to increase revenue and make better decisions. The key to leveraging data science for maximum returns is being able to visualize and take action on the large volumes of data.
Power BI is an excellent tool for data analysis and for uncovering meaningful insights. But let us go into more detail about the advantages and disadvantages of Power BI, so you have some basis for comparing it with other tools.
To start with, let’s quickly review what Power BI is.
What is Power BI?
Power BI is a cloud-based business intelligence suite from Microsoft. It is used to convert raw data into meaningful information through intuitive visualizations and tables, so you can easily analyze the data and make sound business decisions based on it. Power BI is a collection of business intelligence and data visualization tools, such as software services, apps, and data connectors, that work together.
We can use the datasets imported into Power BI for data visualization and analysis by creating shareable reports, dashboards, and apps. Power BI is an easy-to-use tool offering convenient drag-and-drop features and self-service capabilities, and it can be deployed both on-premises and in the cloud.
The diagram below shows the process flow in Power BI.
[Figure: Power BI Process Flow]
Pros of Power BI
Let us examine some of the most important advantages of Power BI, the ones that play a vital part in making it a successful tool.
1. Affordability
A major advantage of using Power BI for data analysis and visualization is that it is affordable and relatively inexpensive. The Power BI Desktop version is free of charge: you can download it and start using it to build reports and dashboards on your PC. However, if you want to use more Power BI services and publish your reports to the cloud, you can get the Power BI cloud service for $9.99 per user per month. Hence, Power BI is reasonably priced compared to other BI tools.
2. Custom Visualizations
Power BI offers a wide range of custom visualizations, i.e., visuals created by developers for a specific use. Custom visuals are available on the Microsoft marketplace. In addition to the general set of visualizations available, you can use Power BI custom visuals in your reports and dashboards. The range of custom visualizations includes KPIs, maps, charts, graphs, R script visuals, and so on.
3. Excel Integration
In Power BI, you also have the option to upload and view your data in Excel. You can select, filter, or slice data in a Power BI report or dashboard and place it in Excel; you can then open Excel and view the same data in tabular form in a spreadsheet. In other words, Power BI’s Excel integration helps users view and work with the raw data behind a Power BI visualization.
4. Data Connectivity
Another significant advantage of using Power BI as your data analysis tool is that you can import data from a wide range of data sources. It offers connectivity to data files (such as XML and JSON), Microsoft Excel, SQL Server databases, Azure sources, other cloud-based sources, and online services such as Google Analytics and Facebook. In addition, Power BI can also access Big Data sources directly. In short, you get a broad range of data sources to connect to and pull data from for analysis and reporting.
Cons of Power BI
After the advantages, it is time to shed light on the weaknesses of Power BI.
1. Table Relationships
Power BI is good at handling simple relationships between the tables in a data model. However, if there are complex relationships between tables, that is, if tables have more than one relationship between them, Power BI may not handle them well. You need to build the data model carefully, with enough unique fields, so that Power BI does not get the relationships confused.
2. Configuration of Visuals
In general, you may not need to configure and fine-tune visuals in Power BI. But even when you do, Power BI does not provide many options to configure your visualizations to your requirements. As a result, users have limited choices for what they can change in visuals.
3. Crowded User Interface
Users often find the user interface of Power BI crowded and bulky: many icons and options block the view of the dashboard or report. Most users wish the UI, or the report canvas, were cleaner, with fewer icons and options. Additionally, creating scrolling dashboards is not natively supported.
4. Inflexible Formulas
As we know, the expression language used to work with data in Power BI is DAX. While you can perform plenty of operations using DAX formulas, it is still not the easiest language to work with: sometimes the formulas you write work well in Power BI, and sometimes they don’t. For instance, you can concatenate only two elements at a time, so linking more than two requires nesting expressions.
Summary
This concludes our discussion of the advantages and disadvantages of Power BI. Even after going through some of its general shortcomings, we are confident that Power BI is a great tool for data visualization and data analysis. Moreover, Microsoft is continually working on improvements, so we can expect even better versions to come.