For high-quality content, information from trusted sources is more valuable than information from random sources.
There will be no need to train data if its quality is taken for granted. Any data coming from a trusted source will automatically be trusted. If inconsistencies are found, that would mean the source needs to be amended, and it will be the responsibility of the information owner to correct it.
Derek mentions that one implication is that licensing/permission agreements could replace the ad revenue model for online human-generated content. AI giants (or a future middleman in the form of content curation companies) might work with humans and trusted sources to bundle and license “trusted” data for use in models. Since the entire ad revenue model is teetering now that AI answers are used in lieu of googling it yourself, this may be the next evolution of the internet.
Because of the overall political context and the interest that authoritarians have in disseminating garbage, the ability of companies to rely on clean content would become a differentiating factor. In other words, the open Internet is set to become a place for generalized garbage, free of charge, whereas valuable information is going to live on curated information servers, available through licensing agreements or behind paywalls. This already exists: traditional newspapers continue to operate on the Internet, available to paying subscribers and delivering vetted content.