Computer Vision (CV) is everywhere: in the military, in medicine, in academia, in logistics… CV touches
virtually all segments of the economy and society. (Military) Drones in Ukraine play a vital role in detecting
and neutralising potential threats and identifying objects of interest - they rely on CV. (Logistics) Automated
drone delivery leverages computer perception to manoeuvre in remote areas or affluent suburbs - it relies
on CV. (Social) Startups help you find clothing items in retailers’ catalogues by matching pieces detected in
your pictures - they rely on CV. (Medical) Doctors increasingly depend on computer-assisted diagnostics and
VR to identify suspected cancerous cells and operate on remote brain areas - they rely on CV. Beyond CV’s
business impact, individuals should be able to leverage computer perception to serve their social projects,
concepts, and interests, beyond the shy attempts of Apple or Google. CV has a clear impact on growth, revenues,
profits and society.
All industries are trying to build sophisticated CV products that depend heavily on what we call Vanilla CV
(VCV), namely classification, detection and segmentation. While essential, the implementation of VCV in
industry has been plagued by high attrition rates, due for the most part to outsourced labelling, over-reliance
on big data, poor data quality, the [absence of a clear internal data
strategy](https://sloanreview.mit.edu/projects/reshaping-business-with-artificial-intelligence/) within
companies and misalignment between ML engineers and subject matter experts. We believe that at the root of
it all is the lack of standardised, efficient ML tools for humans to teach machines. At l`école we are
setting CV back on the right path by implementing Machine Teaching: infusing human expertise, distribution
frequency and business consequences into the selection of training data so that the most relevant outputs /
predictions are produced.
Let’s backtrack to clarify why we think that standardisation is key. Imagine a world where each construction
company has to manufacture its own nails in addition to building structures. Results would vary from one
construction company to another as the hired talent and the knowledge of nail manufacturing differ,
resulting in buildings with unreliable safety standards, delayed completion, and increased costs. The
building industry has standardised and streamlined nail production so that it can focus on the practice of
construction with reliable and consistently well-built nails. Companies dealing with large datasets face that
very issue with CV, a tool as fundamental as nails. They allocate too many resources, for meagre results, to
building CV tools internally, getting access to third-party tools or outsourcing it altogether. By the time
they get to every company’s purpose - building revenue-generating products or services - getting the necessary
CV has at best diminished their ROI. At worst the company has wasted some of its precious resources and failed.
Said plainly, companies keep overpaying for underperforming AI.
L`école is building the first standardised and consolidated ML platform implementing the concept of Machine
Teaching, that is, ML done right, cheaper and better by the subject matter expert to achieve business-impacting
results. That means that we are developing business-oriented and reliable ML products built for
companies that depend on ML. Our vision does not stop at businesses. As strong supporters of open source and AI
for good, we are building the first fun, accessible, non-threatening AI platform for everyone’s needs through
a gamified interface that is libre for non-profit projects. Our straightforward value proposition materialises
as a consolidated Machine Teaching platform for visually enhanced human-in-the-loop data evaluation, continuous
ground truthing, clear calls to action and actionable performance reporting. Our product is built upon two
interfaces - data and user. We’ll get back to that but first, we would like to take a short detour and
discuss why 9 out of 10 ML projects fail. If you are in a hurry or if you are in the know, please jump to
the UX section below.
Since their beginnings, organisms have tried to capture a finer picture of their environment’s statistical regularities - labels - with increasing sophistication. Humans, first with the emergence of their associative cortex, then with the development of mathematics and computation, have been able to abstract their environment with ever higher and finer predictive power. Until recently, and with the notable exception of parts of physics and maths, our predictive power was limited to ad hoc recipes combining linear relations between variables selected by intuition and experience, sometimes reflecting biased assumptions. Following developments in neuroscience in the 60s, researchers formalised the principles that would break the limitations of legacy mathematics, capturing non-linear relationships between observations and giving rise to the Convolutional Neural Network (CNN). Combined with the exponential growth of computation, CNNs allowed the development of prediction algorithms associating variables non-linearly. Humanity had finally overcome its longstanding hard limit on making inferences from large datasets, and ML was within reach. Directly stemming from CNNs, Deep Learning is a true step toward general machine intelligence, enabling a unique type of statistical learning: representation learning, AKA learning what is stationary in a distribution of observations. Applied to image classification, representation learning does not classify images per se; it learns features that compress images from millions of pixels to vectors of hundreds of values, which are then analysed with classical maths approaches to predict classes. Unsupervised and supervised ML using Deep Learning learns the non-linear relations and interdependencies between features that humans cannot grasp, no more, no less.
The most widely used and operational form of supervised ML, and the core of our business focus today,
is representation-learning-based computer perception, commonly known as Computer Vision Artificial
Intelligence (CV). While its mathematical and computational aspects are challenging, CV’s working principle
is simple: show your machine the objects of the world you wish to predict (e.g. cats) and it will extract
features and their underlying relationships to produce labels and detect objects accurately. Beyond this
seemingly simple concept, companies using CV face data challenges and make systematic errors that have
wasted capital at best, and produced human tragedies at worst. These errors are common, leading to
catastrophes even within the best tech companies and being responsible for the >87% [attrition
rates](https://breakdowndata.com/top-10-reasons-why-87-of-machine-learning-projects-fail/) observed in CV
development. As a team of ML pioneers, we classify common CV mistakes into three categories -
Evaluation, Production and Curation.
Evaluation is key. Businesses’ hard problem with CV is not producing predictions but rigorously
evaluating their risks and accuracy in light of the harm they could do to the business and its customers.
Your CV must be evaluated within its data constraints (e.g. geographical, cultural, technical
compatibility…), and the calibration threshold selected in accordance with your business specificities. Where
is your threshold when detecting cancerous cells in your patient’s bloodstream? Is your threshold low
enough to boost your consumer reach while still avoiding customer frustration? Having a clear view of
the implications of your predictions’ accuracy and of the different outcomes when building your model has
become CV’s sinews of war. We believe that putting the subject matter expert, be it the product developer
or the oncologist, in charge of building and testing their CV model is the only way to clear that risk. An
experienced oncologist evaluating their model’s predictions will never be replaced by an ML engineer.
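To make the threshold question concrete, here is a minimal sketch of choosing a decision threshold from business costs rather than from accuracy alone. The scores, the labels and the two cost ratios are invented for illustration, and the helper names `expected_cost` and `best_threshold` are hypothetical; this is not l`école's implementation, only one simple way an expert's view of consequences could be turned into a number.

```python
def expected_cost(scores, labels, threshold, cost_fn, cost_fp):
    """Total cost of the errors made when predicting positive at score >= threshold."""
    cost = 0.0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1 and not predicted_positive:
            cost += cost_fn  # missed positive (false negative)
        elif label == 0 and predicted_positive:
            cost += cost_fp  # false alarm (false positive)
    return cost


def best_threshold(scores, labels, cost_fn, cost_fp):
    """Return the candidate threshold with the lowest total cost.

    Candidates are the observed scores plus one value above all of them
    (meaning "never predict positive"). Ties go to the lowest threshold.
    """
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    return min(
        candidates,
        key=lambda t: expected_cost(scores, labels, t, cost_fn, cost_fp),
    )


# Hypothetical model scores and ground-truth labels (1 = positive).
scores = [0.05, 0.20, 0.35, 0.40, 0.60, 0.70, 0.85, 0.95]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

# Screening-like setting: a miss is assumed to cost 50x a false alarm.
screening = best_threshold(scores, labels, cost_fn=50.0, cost_fp=1.0)
# Retail-like setting: a false alarm is assumed to cost 5x a miss.
retail = best_threshold(scores, labels, cost_fn=1.0, cost_fp=5.0)

print(screening)  # prints 0.35
print(retail)     # prints 0.7
```

The same model and the same data yield two different thresholds: the screening setting accepts false alarms to avoid missing a positive, while the retail setting does the opposite. The cost ratios are exactly the kind of input that only the subject matter expert can supply.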
The right production in the right hands. Today’s CV model building is akin to discrete manufacturing:
produce one model every few months, update it every few months and lose touch with your customers /
patients / fleet in the meantime. We believe CV should be akin to continuous process manufacturing, where
the model’s accuracy and predictions are updated in sync with your business. We see your data stream as a
resource to be tapped continuously, not discretely. Model discontinuity is compounded by misalignment and
siloed visions between your engineering and business teams. While the former focuses mainly on
accuracy and technical metrics, the latter is more interested in financial benefits or
business insights.
Curate your data through the eyes of the expert. Today, nobody looks into the data, and large, uncurated,
sometimes irrelevant datasets are handled by engineers who most likely know neither the nature of the data
they handle nor the keys to their company’s business. This lack of oversight results in poor coverage of the
target data distribution and poor predictive power when your model is confronted with the rest of the world. If
you only showed your machine images of cats and images of dogs and you submit a picture of a car, it will
predict that car to be a cat or a dog. If you train a model to distinguish vehicles in the US and you
submit a tuk-tuk, it will be wrong. Because you cannot show it all the existing and upcoming objects of the
world, you need to adapt to your target distribution with proper data sampling. The alternative,
increasing the sample size and outsourcing labelling, will cost you capital while never entirely de-risking
your model’s predictions. Only through the eyes of the expert can we ensure the use of business-oriented,
business-impactful, representative data. An experienced oncologist looking at something as idiosyncratic
as cell biopsies will never be replaced by a data scientist.
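The cat/dog/car example follows from how a closed-set classifier produces its output. The sketch below assumes a two-class softmax output layer; the logits for the car picture are made up, and the `predict` helper and its `reject_below` parameter are hypothetical names used only for illustration.

```python
import math

CLASSES = ["cat", "dog"]


def softmax(logits):
    """Turn raw scores into probabilities that always sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, reject_below=None):
    """Return (label, confidence); optionally answer "unknown" at low confidence."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if reject_below is not None and probs[best] < reject_below:
        return "unknown", probs[best]
    return CLASSES[best], probs[best]


# Hypothetical logits produced for a picture of a car.
car_logits = [0.3, -0.1]

# Without a reject option the model has to pick one of the classes it knows.
print(predict(car_logits)[0])                    # prints cat
# A confidence floor is one crude mitigation, assumed here at 0.9.
print(predict(car_logits, reject_below=0.9)[0])  # prints unknown
```

Because the probabilities are forced to sum to 1 over the known classes, "neither" is never an available answer. A confidence floor helps only when the model happens to be unsure; networks can be confidently wrong on inputs far from their training data, which is why sampling that matches the target distribution still matters.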
ML issues are akin to sending your child to Wikipedia’s library expecting that they will come back with
Voltaire’s critical mind. Rather, you would want your child to be taught the necessary, albeit limited,
knowledge **by the best teacher** to be adaptable to their likely future environment. The very same
principles apply to machines. Machines could predict. Machines can learn. It is time for machines to be
taught the best of human expertise. Welcome to l`école, the first Machine Teaching platform.