Shared Experiences Get Better ROI Than Shared Walls

In The Innovators, Walter Isaacson argues that collaboration is the key that unlocked all great innovations. When people have conversations, things move. 

At Invisible, we bring our innovative spirit and our vision of the future of work to everything we do. As we usher in the next evolution of how work is outsourced and automated, we’re also looking forward to a future where work isn’t constrained to cubicles and conference rooms. 

That’s why we’re leading by example. 

While influential business leaders are touting a return to corporate offices, we’re a remote-only company with partners and agents in 36 countries. We have no central headquarters, no fancy office buildings, no high-tech conference rooms. 

Instead of shared walls to boost team performance, we’ve invested in memorable experiences amongst partners to foster the spirit of collaboration. In fact, we recently gathered the partnership in Costa Rica for 8 days. 

Our goal when we invest in experiences is to improve our teamwork. We think you should try it, too. 

To test our theory, we surveyed 665 employed Americans to get their take on shared experiences with their co-workers. Almost 71% of those we surveyed said they had attended a group experience hosted by their employer. Bravo!  

Here’s what the experts say: 

First, Harvard Professor Amy Edmondson argues that teaming is a function of information sharing, perspective taking, and turn-taking, habits that are critical to building trust. 

Why is trust so important? Stephen M. R. Covey writes in The Speed of Trust, “Teams and organizations that operate with high trust significantly outperform those who don’t.” 

Using Professor Edmondson’s framework, one way to share information and promote trust through a group experience is storytelling. Our CEO Francis Pedraza did so at our Costa Rica retreat by sharing business performance and goals through a play he wrote. Every partner had a part. 

Our survey results support Professor Edmondson’s argument: 70% of workers report trusting their co-workers more after a group experience hosted by their employer.

Second, group experiences help people make work friends. Workplace friendships are statistically correlated with performance and retention, and they form through informal sharing, close physical proximity, and shared memories. 

Invisible partner Olivia Chiong wrote about the company’s off-site in Costa Rica in a recent blog. Olivia said about the trip, “I gained so much in this wonderful week that even a month after it, I’m still buzzing from the post event high. I learned some things about myself and so much about my co-workers.” 

“The connections made during the off-site are already bearing fruit,” Olivia noted.

We learned in our survey that having strong relationships with co-workers matters to over 68% of working Americans. Most reported feeling more engaged in their work when they have strong relationships with co-workers. 

Third, group experiences create a context that normalizes small acts of vulnerability and builds psychological safety. That doesn’t mean sharing your darkest secrets; it means sharing the details of a challenging company decision, or even what gives you personal meaning. 

Olivia concluded that aside from the fun activities like kayaking and animal sanctuary tours, “[there] were many real conversations that detailed our plans and vision for the company and how we can work together to achieve our moonshot.” 

Investment in shared experiences over shared walls goes beyond improved teamwork – it impacts the bottom line. 

Let’s do the math. 

We referenced in a previous blog post a study detailing the cost of keeping employees at the office, which can be up to $11,000 a year for every employee. On the other hand, an impactful yearly corporate retreat would cost your company between $2,000 and $7,000 per employee. 
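As a rough sketch of that comparison, using the per-employee figures above (the head count here is a hypothetical example, not a figure from our survey):

```python
# Back-of-the-envelope comparison: annual office cost vs. a yearly retreat,
# using the per-employee figures cited above. Head count is illustrative.
office_cost_per_employee = 11_000                   # upper-bound annual office cost (USD)
retreat_cost_low, retreat_cost_high = 2_000, 7_000  # per-employee retreat cost (USD)

employees = 100  # hypothetical head count

office_total = office_cost_per_employee * employees
retreat_range = (retreat_cost_low * employees, retreat_cost_high * employees)
savings_range = (office_total - retreat_range[1], office_total - retreat_range[0])

print(f"Office: ${office_total:,} per year")                                  # $1,100,000
print(f"Retreat: ${retreat_range[0]:,} to ${retreat_range[1]:,} per year")    # $200,000 to $700,000
print(f"Potential savings: ${savings_range[0]:,} to ${savings_range[1]:,}")   # $400,000 to $900,000
```

Even at the high end of the retreat range, the shared experience costs well under the office budget, before counting any of the teamwork benefits above.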

With workers returning with a renewed sense of team, we’d argue that the ROI for the shared experience is much higher. 

Another cost is turnover. 

We learned in our survey that a staggering 76% of workers felt more engaged in their work after a group experience hosted by their employer. Engaged employees are more likely to stay with your organization, which can save you thousands in turnover costs. 

How do I get started with Invisible? 

If you’re looking for the perfect blend of a flexible workforce and automation expertise, Invisible is ready to help. Check out our case study explorer, where you can find real examples of Invisible leveling up a company. 

Get started by creating a free account, and someone on our dedicated accounts team will get in touch with you. 

Work smart, move fast, and focus on what matters most - with Invisible.


Andrew Hull


Overview


| LLM Task | Benchmark Dataset/Corpus | Common Metric | Dataset available at |
|---|---|---|---|
| Sentiment Analysis | SST-1/SST-2 | Accuracy | https://huggingface.co/datasets/sst2 |
| Natural Language Inference / Recognizing Textual Entailment | Stanford Natural Language Inference Corpus (SNLI) | Accuracy | https://nlp.stanford.edu/projects/snli/ |
| Named Entity Recognition | CoNLL-2003 | F1 Score | https://huggingface.co/datasets/conll2003 |
| Question Answering | SQuAD | F1 Score, Exact Match, ROUGE | https://rajpurkar.github.io/SQuAD-explorer/ |
| Machine Translation | WMT | BLEU, METEOR | https://machinetranslate.org/wmt |
| Text Summarization | CNN/Daily Mail Dataset | ROUGE | https://www.tensorflow.org/datasets/catalog/cnn_dailymail |
| Text Generation | WikiText | BLEU, ROUGE | https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/ |
| Paraphrasing | MRPC | ROUGE, BLEU | https://www.microsoft.com/en-us/download/details.aspx?id=52398 |
| Language Modelling | Penn Tree Bank | Perplexity | https://zenodo.org/record/3910021#.ZB3qdHbP23A |
| Bias Detection | StereoSet | Bias Score, Differential Performance | https://huggingface.co/datasets/stereoset |
Table 1 - Examples of LLM tasks with common benchmark datasets and their respective metrics. Note that for many of these tasks there are multiple benchmark datasets, not all of which are mentioned here.
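To make the metrics column concrete, here is a minimal sketch of two of the question-answering metrics listed for SQuAD: Exact Match and token-level F1. The normalization here (lowercasing and whitespace tokenization) is deliberately simplified; the official SQuAD evaluation script also strips punctuation and articles.

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the Eiffel Tower", "The Eiffel Tower"))   # 1.0
print(token_f1("the tall Eiffel Tower", "the Eiffel Tower")) # partial credit for overlap
```

Exact Match rewards only verbatim answers, while token F1 gives partial credit when a prediction overlaps the reference, which is why SQuAD leaderboards report both.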

Metric Selection


| Metric | Usage | Pros | Cons |
|---|---|---|---|
| Accuracy | Measures the proportion of correct predictions out of the total number of predictions. | Simple to interpret. Provides an overall measure of model performance. | Sensitive to dataset imbalances, which can make it uninformative. Does not account for false positives and false negatives. |
| Precision | Measures the proportion of true positives out of all positive predictions. | Useful when the cost of false positives is high. Measures the accuracy of positive predictions. | Does not account for false negatives. Cannot be used alone; depends on other metrics to be informative. Sensitive to dataset imbalances. |
| Recall | Measures the proportion of true positives out of all actual positive instances. | Useful when the cost of false negatives is high. | Does not account for false positives. Cannot be used alone; depends on other metrics to be informative. Sensitive to dataset imbalances. |
| F1 Score | Measures the harmonic mean of precision and recall. | Robust to imbalanced datasets. | Assumes equal importance of precision and recall. May not be suitable for multi-class classification problems with different class distributions. |
| Perplexity | Measures the model's uncertainty in predicting the next token (common in text generation tasks). | Interpretable as a single value for model performance. | May not directly correlate with human judgment. |
| BLEU | Measures the similarity between machine-generated text and reference text. | Correlates well with human judgment. Easily interpretable for measuring translation quality. | Does not directly explain performance on certain tasks (though it correlates with human judgment). Lacks sensitivity to word order and semantic meaning. |
| ROUGE | Measures the similarity between machine-generated and human-generated text. | Has multiple variants that capture different aspects of similarity. | May not capture semantic similarity beyond n-grams or longest common subsequences. Limited to measuring surface-level overlap. |
| METEOR | Measures the similarity between machine-generated translations and reference translations. | Addresses some limitations of BLEU, such as recall and synonyms. | Higher computational complexity than BLEU or ROUGE. Requires linguistic resources for matching, which may not be available for all languages. |

Table 2 - Common LLM metrics, their usage as measurement tools, and their pros and cons. Note that some of these metrics have multiple versions. For example, versions of ROUGE include ROUGE-N, ROUGE-L, and ROUGE-W. ROUGE-N measures the overlap of n-word sequences between the reference text and the model-generated text. ROUGE-L measures the overlap between the longest common subsequence of tokens in the reference and generated text, regardless of order. ROUGE-W assigns weights (relative importances) to longer common subsequences of tokens (similar to ROUGE-L, but with added weights). A combination of the most relevant variants of a metric, like ROUGE, is selected for comprehensive evaluation.
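As an illustration of how the ROUGE-N variant described above works, here is a minimal sketch that counts clipped n-gram overlap between a reference and a generated text and reports n-gram precision, recall, and their F1. The function names are our own, not taken from any ROUGE library, and production evaluations normally use an official implementation with proper tokenization and stemming.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(reference: str, generated: str, n: int = 2) -> dict:
    """ROUGE-N as clipped n-gram overlap: precision, recall, and their F1."""
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    gen_counts = Counter(ngrams(generated.lower().split(), n))
    overlap = sum((ref_counts & gen_counts).values())  # clipped matches
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(gen_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge_n("the cat sat on the mat", "the cat sat on a mat", n=2)
print(scores)  # shared bigrams: (the, cat), (cat, sat), (sat, on)
```

ROUGE papers and leaderboards typically report the recall component (overlap divided by reference n-grams), which is why ROUGE is described above as recall-oriented; the F1 form is common when generated and reference lengths differ.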


Schedule a call to learn more about how Invisible might help your business grow while navigating uncertainty.

Schedule a Call
Request a Demo