AutoWS-Bench-101: Benchmarking Automated Weak Supervision on Diverse Tasks
Published:
Authors: Tzu-Heng Huang
Large pretrained models like GPT-4, Gemini, and Claude 3 are fantastic at labeling data, whether it’s spam detection in YouTube comments or classifying topics in medical documents. But there’s a drawback: querying these models for every single data point via API calls gets expensive fast.
Published:
Authors: Changho Shin
Exploring how overlap density drives weak-to-strong generalization and its applications in data source selection.
Published:
Authors: Sonia Cromp
While impressive examples of AI-generated art and dialogue have captured the public’s attention in recent years, one of the most fundamental data formats, tabular data, still lacks specialized, high-performing models. Tables are ubiquitous in modern life, but are not modeled well by off-the-shelf models intended for other data types. Given the central role of tabular data in everything from global economic forecasts and astronomical observations to classroom gradebooks and household budgets, the lack of deep learning methods tailored for tables is quite surprising. To address the table synthesis gap, we introduce Tabby: a foundation model designed specifically for tabular data. Tabby introduces the inductive biases necessary to represent tabular data into a pre-trained large language model, avoiding the costly process of training a foundation model from scratch. Read on to discover how Tabby generates synthetic data that is nearly indistinguishable from real-world datasets!
Published:
Authors: Dyah Adila
Efficient LLM alignment without the data and compute expense of traditional methods.
Published:
Authors: Changho Shin
OTTER offers a tuning-free, inference-time label distribution adaptation of zero-shot models by leveraging optimal transport.
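For readers curious what "label distribution adaptation by optimal transport" might look like in practice, here is a minimal sketch of the general idea: treat the zero-shot probabilities as assignment costs and solve an entropic optimal-transport problem (via Sinkhorn iterations) whose column marginals match a target label prior. The function name, parameters, and toy data below are illustrative assumptions, not OTTER's actual API.

# A minimal sketch of inference-time label distribution adaptation via
# optimal transport, under assumed names and defaults (not OTTER's API).
import numpy as np

def ot_label_adaptation(probs, target_prior, reg=1.0, n_iters=200):
    """probs: (n, k) zero-shot class probabilities for n examples.
    target_prior: (k,) desired label distribution (sums to 1).
    Returns adapted hard labels of shape (n,)."""
    n, k = probs.shape
    cost = -np.log(probs + 1e-12)        # cost of assigning example i to class j
    K = np.exp(-cost / reg)              # Gibbs kernel for entropic OT
    r = np.full(n, 1.0 / n)              # row marginals: each example has equal mass
    c = np.asarray(target_prior, dtype=float)  # column marginals: target prior
    u = np.ones(n)
    for _ in range(n_iters):             # Sinkhorn iterations
        v = c / (K.T @ u)
        u = r / (K @ v)
    plan = u[:, None] * K * v[None, :]   # transport plan; columns sum to the prior
    return plan.argmax(axis=1)           # assign each example its highest-mass class

# Toy usage: a model biased toward class 0, corrected to a 50/50 prior;
# the two least-confident class-0 predictions flip to class 1.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.55, 0.45]])
print(ot_label_adaptation(probs, target_prior=[0.5, 0.5]))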
Published:
Authors: Dyah Adila
Effortlessly robustify CLIP-based models to handle spurious correlations: no extra data, no extra training!
Published:
Authors: Harit Vishwakarma