Make no mistake – we are totally on board with automating the more mundane tasks on your to-do list, whether they're at work or at home. But when it comes to folding laundry, technology is nowhere near ready to take that chore off your hands.
That isn’t stopping researchers from trying to speed up the process, though. NPR reported this week that a team of researchers at UC Berkeley will soon present a paper on a robotics method called SpeedFolding, which they say folds laundry at record speed.
They even have video to prove it. But unfortunately for those hoping to cross laundry off their list of chores, it isn’t yet practical.
Why is it so hard to fold laundry?
It turns out that teaching a robot to fold laundry could be a robotics field of its own. One researcher, Professor Pieter Abbeel at UC Berkeley, devoted seven years to teaching a robot to successfully fold a towel.
Even then, it took the robot 20 minutes to do so, though the professor was able to cut that time to a minute and a half. Why is it so complex to fold laundry?
Professor Abbeel explained that it’s extremely difficult for a robot to analyze disparate piles of laundry. He and his team had to train the technology to be able to analyze the items in the pile as 3D shapes, find where the corners are, and determine how best to fold each of them individually.
Put simply, robots struggle in environments where they have to make unique decisions. We touched on how Google is solving this problem in a previous blog post.
So how does SpeedFolding solve the problem?
Previous iterations of laundry folding bots generally had one arm. The latest robot from the team at UC Berkeley has two.
Robots of the past also relied on complex algorithms to help them decipher the unique ways in which laundry might be situated. The SpeedFolding bot instead uses a camera to scan the garment, then calculates the optimal movement to lay it smoothly on the surface in front of it.
That analysis is repeated at each step, so the garment keeps being folded correctly throughout the process. The team’s video shows it from start to finish.
The result: the SpeedFolding method can successfully fold 30 to 40 disorganized pieces of laundry in an hour. That’s nearly 10 times better than previous robots, with SpeedFolding achieving an overall success rate of 93%.
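For the technically curious, the loop described above follows a classic perceive-plan-act pattern. Here is a minimal sketch of that pattern in Python; every name in it is hypothetical, and none of it is taken from the SpeedFolding paper or codebase:

```python
# Purely illustrative perceive-plan-act loop based on the description
# above; all names are hypothetical, not SpeedFolding's actual API.
def fold_garment(robot, camera, vision_model, planner, max_steps=10):
    for _ in range(max_steps):
        image = camera.capture()              # scan the garment
        state = vision_model.analyze(image)   # estimate its shape and corner locations
        if state.is_folded:
            return True                       # done: the garment is folded
        motion = planner.next_motion(state)   # best two-arm smoothing or folding move
        robot.execute(motion)                 # act, then re-scan on the next pass
    return False                              # give up after max_steps attempts
```

The key idea the sketch captures is that perception is rerun after every motion, so the robot never has to plan the whole folding sequence from a single observation.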
Why the future isn’t exactly nigh
Well, there are two reasons.
Robots are great at some things, while humans are better at others. When it comes to business processes, Invisible combines both people and tech to do them better, faster, and cheaper.
Could your team use a strategic partner? Get in touch.
| LLM Task | Benchmark Dataset/Corpus | Common Metric | Dataset available at |
| --- | --- | --- | --- |
| Sentiment Analysis | SST-1/SST-2 | Accuracy | https://huggingface.co/datasets/sst2 |
| Natural Language Inference / Recognizing Textual Entailment | Stanford Natural Language Inference Corpus (SNLI) | Accuracy | https://nlp.stanford.edu/projects/snli/ |
| Named Entity Recognition | CoNLL-2003 | F1 Score | https://huggingface.co/datasets/conll2003 |
| Question Answering | SQuAD | F1 Score, Exact Match, ROUGE | https://rajpurkar.github.io/SQuAD-explorer/ |
| Machine Translation | WMT | BLEU, METEOR | https://machinetranslate.org/wmt |
| Text Summarization | CNN/Daily Mail Dataset | ROUGE | https://www.tensorflow.org/datasets/catalog/cnn_dailymail |
| Text Generation | WikiText | BLEU, ROUGE | — |
| Paraphrasing | MRPC | ROUGE, BLEU | https://www.microsoft.com/en-us/download/details.aspx?id=52398 |
| Language Modelling | Penn Tree Bank | Perplexity | https://zenodo.org/record/3910021#.ZB3qdHbP23A |
| Bias Detection | StereoSet | Bias Score, Differential Performance | — |
Table 1 - Examples of LLM tasks with common benchmark datasets and their respective metrics. Note that many of these tasks have multiple benchmark datasets, only some of which are mentioned here.
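As a minimal illustration of how one of these benchmarks connects to its metric, the sketch below loads SST-2 with the Hugging Face `datasets` library and computes accuracy for a placeholder predictor. The always-positive "model" is an assumption purely for illustration; a real evaluation would substitute actual model predictions:

```python
# Minimal sketch: load SST-2 (see Table 1) and score predictions with its
# common metric, accuracy. Assumes `pip install datasets`; the always-positive
# "predictor" below is a placeholder, not a real sentiment model.
from datasets import load_dataset

ds = load_dataset("sst2", split="validation")

def predict(sentence: str) -> int:
    return 1  # placeholder: always predict the positive class

correct = sum(predict(ex["sentence"]) == ex["label"] for ex in ds)
print(f"Accuracy: {correct / len(ds):.3f}")  # proportion of correct predictions
```

The same pattern applies to the other rows: swap in the corresponding dataset and replace the accuracy computation with the metric listed in the table.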
| Metric | Usage | Pros | Cons |
| --- | --- | --- | --- |
| Accuracy | Measures the proportion of correct predictions made by the model compared to the total number of predictions. | Simple interpretability. Provides an overall measure of model performance. | Sensitive to dataset imbalances, which can make it uninformative. Does not take into account false positives and false negatives. |
| Precision | Measures the proportion of true positives out of all positive predictions. | Useful when the cost of false positives is high. Measures the accuracy of positive predictions. | Does not take into account false negatives. Depends on other metrics to be informative (cannot be used alone). Sensitive to dataset imbalances. |
| Recall | Measures the proportion of true positives out of all actual positive instances. | Useful when the cost of false negatives is high. | Does not take into account false positives. Depends on other metrics to be informative (cannot be used alone). Sensitive to dataset imbalances. |
| F1 Score | Measures the harmonic mean of precision and recall. | Robust to imbalanced datasets. | Assumes equal importance of precision and recall. May not be suitable for multi-class classification problems with different class distributions. |
| Perplexity | Measures the model's uncertainty in predicting the next token (common in text generation tasks). | Interpretable as it provides a single value for model performance. | May not directly correlate with human judgment. |
| BLEU | Measures the similarity between machine-generated text and reference text. | Correlates well with human judgment. Easily interpretable for measuring translation quality. | Does not directly explain performance on certain tasks (though it correlates with human judgment). Lacks sensitivity to word order and semantic meaning. |
| ROUGE | Measures the similarity between machine-generated and human-generated text. | Has multiple variants to capture different aspects of similarity. | May not capture semantic similarity beyond n-grams or LCS. Limited to measuring surface-level overlap. |
| METEOR | Measures the similarity between machine-generated translations and reference translations. | Addresses some limitations of BLEU, such as recall and synonyms. | May have higher computational complexity compared to BLEU or ROUGE. Requires linguistic resources for matching, which may not be available for all languages. |
Table 2 - Common LLM metrics, their usage as a measurement tool, and their pros and cons. Note that some of these metrics exist in several versions. For example, versions of ROUGE include ROUGE-N, ROUGE-L, and ROUGE-W. For context, ROUGE-N measures the overlap of n-word sequences between the reference text and the model-generated text. ROUGE-L measures the longest common subsequence of tokens shared by the reference and generated text; the tokens must appear in the same relative order but need not be contiguous. ROUGE-W, on the other hand, assigns weights (relative importances) to longer consecutive runs of common tokens (similar to ROUGE-L but with added weights). A combination of the most relevant variants of a metric like ROUGE is typically selected for a comprehensive evaluation.
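To make the ROUGE variants described above concrete, here is a simplified from-scratch sketch of ROUGE-1 and ROUGE-L recall. It assumes whitespace tokenization, lowercase matching, and no stemming; a real evaluation would use a maintained implementation (for example, the `rouge-score` package) rather than this illustration:

```python
# Simplified, illustrative implementations of two ROUGE variants.
from collections import Counter

def rouge_n_recall(reference: str, generated: str, n: int = 1) -> float:
    """ROUGE-N (recall): n-gram overlap between reference and generated text."""
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    ref, gen = ngrams(reference), ngrams(generated)
    overlap = sum((ref & gen).values())         # clipped matched n-gram counts
    return overlap / max(sum(ref.values()), 1)  # fraction of reference n-grams covered

def rouge_l_recall(reference: str, generated: str) -> float:
    """ROUGE-L (recall): longest common subsequence (in-order, not contiguous)."""
    a, b = reference.lower().split(), generated.lower().split()
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]  # classic DP table for LCS
    for i, tok_a in enumerate(a):
        for j, tok_b in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if tok_a == tok_b else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)] / max(len(a), 1)

ref = "the robot folds the towel quickly"
gen = "the robot quickly folds a towel"
print(rouge_n_recall(ref, gen, n=1))  # 5 of 6 reference unigrams matched -> ~0.833
print(rouge_l_recall(ref, gen))       # LCS "the robot folds ... towel" -> ~0.667
```

On this toy pair the two variants disagree: ROUGE-1 rewards every shared word regardless of position, while ROUGE-L credits only the longest in-order shared sequence, which is why word-order changes lower it.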