Climate Data: NLP-based Textual Analysis, Part 2

The dataset consisted of 66 files with a total of 97,099 words. We know this dataset is small given how important the topic is, but this is what we could collect with our resources by crawling important websites. It can be treated as a reasonable sample of the much larger body of data available on the web, and the same analysis can be performed on a much larger dataset with better processors.

The aim of this article is an initial analysis and understanding of the data.

The WordCloud for words in this dataset is as follows:

The top 10 keywords, with their frequencies, are:

{‘climate’: 1032, ‘change’: 471, ‘emissions’: 450, ‘zero’: 446, ‘net’: 434, ‘energy’: 295, ‘action’: 288, ‘global’: 287, ‘carbon’: 237, ‘transition’: 217}
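Counts like these can be produced with a simple word-frequency pass over the files. The following is a minimal sketch and not the exact code used for this article: the tiny stop-word list, the tokenizer and the names tokenize and top_keywords are assumptions made for illustration, and the three sample sentences stand in for the 66 files.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are",
              "for", "on", "we", "that", "this", "with", "by"}

def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def top_keywords(texts, n=10):
    """Count non-stop-word tokens across all texts and return the n most frequent."""
    counts = Counter()
    for text in texts:
        counts.update(tok for tok in tokenize(text) if tok not in STOP_WORDS)
    return dict(counts.most_common(n))

# Stand-in documents; the real input would be the text of the crawled files.
sample_docs = [
    "Climate change is driven by emissions.",
    "Net zero emissions targets need climate action.",
    "Climate action and the energy transition.",
]

top3 = top_keywords(sample_docs, n=3)
print(top3)
```

Because the ranking is computed once and then cut off at n, the top 10, top 20 and top 25 lists are all prefixes of the same ordering, which is why the lists in this article repeat each other.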

The top 20 words, in order of importance (here measured by frequency), are:

The top N value pairs are [‘climate’, ‘change’, ‘emissions’, ‘zero’, ‘net’, ‘energy’, ‘action’, ‘global’, ‘carbon’, ‘transition’, ‘countries’, ‘people’, ‘world’, ‘need’, ‘smes’, ‘also’, ‘banks’, ‘warming’, ‘support’, ‘company’]

And with frequencies these are:

{‘climate’: 1032, ‘change’: 471, ‘emissions’: 450, ‘zero’: 446, ‘net’: 434, ‘energy’: 295, ‘action’: 288, ‘global’: 287, ‘carbon’: 237, ‘transition’: 217, ‘countries’: 215, ‘people’: 213, ‘world’: 206, ‘need’: 202, ‘smes’: 187, ‘also’: 181, ‘banks’: 176, ‘warming’: 166, ‘support’: 153, ‘company’: 149}

The top 25 words in order of importance are:

The top 25 value pairs are [‘climate’, ‘change’, ‘emissions’, ‘zero’, ‘net’, ‘energy’, ‘action’, ‘global’, ‘carbon’, ‘transition’, ‘countries’, ‘people’, ‘world’, ‘need’, ‘smes’, ‘also’, ‘banks’, ‘warming’, ‘support’, ‘company’, ‘group’, ‘business’, ‘targets’, ‘greenhouse’, ‘businesses’]

And with frequencies these are:

{‘climate’: 1032, ‘change’: 471, ‘emissions’: 450, ‘zero’: 446, ‘net’: 434, ‘energy’: 295, ‘action’: 288, ‘global’: 287, ‘carbon’: 237, ‘transition’: 217, ‘countries’: 215, ‘people’: 213, ‘world’: 206, ‘need’: 202, ‘smes’: 187, ‘also’: 181, ‘banks’: 176, ‘warming’: 166, ‘support’: 153, ‘company’: 149, ‘group’: 144, ‘business’: 143, ‘targets’: 138, ‘greenhouse’: 137, ‘businesses’: 134}
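To get a quick feel for how steeply these counts fall off, the top 10 counts listed above can be drawn as a plain-text bar chart. This is only a sketch: the function name text_bar_chart and the bar width are assumptions for illustration, and this is not the plotting code used for the figures in this article.

```python
# Top 10 counts copied from the results above.
top10 = {
    "climate": 1032, "change": 471, "emissions": 450, "zero": 446,
    "net": 434, "energy": 295, "action": 288, "global": 287,
    "carbon": 237, "transition": 217,
}

def text_bar_chart(freqs, width=40):
    """Return one line per word, with a bar scaled to the largest count."""
    peak = max(freqs.values())
    lines = []
    for word, count in freqs.items():
        # Scale the bar so the most frequent word spans the full width.
        bar = "#" * max(1, round(count / peak * width))
        lines.append(f"{word:<12}{bar} {count}")
    return "\n".join(lines)

chart = text_bar_chart(top10)
print(chart)
```

Even in this rough form, it is visible that 'climate' occurs more than twice as often as the next word, after which the counts decline gradually.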

Here is the frequency distribution:

The top 100 words in order of importance, with their frequencies, are:

{‘climate’: 1032, ‘change’: 471, ‘emissions’: 450, ‘zero’: 446, ‘net’: 434, ‘energy’: 295, ‘action’: 288, ‘global’: 287, ‘carbon’: 237, ‘transition’: 217, ‘countries’: 215, ‘people’: 213, ‘world’: 206, ‘need’: 202, ‘smes’: 187, ‘also’: 181, ‘banks’: 176, ‘warming’: 166, ‘support’: 153, ‘company’: 149, ‘group’: 144, ‘business’: 143, ‘targets’: 138, ‘greenhouse’: 137, ‘businesses’: 134, ‘sme’: 133, ‘gas’: 130, ‘buyers’: 125, ‘new’: 120, ‘solutions’: 119, ‘companies’: 119, ‘developing’: 118, ‘financial’: 115, ‘innovation’: 109, ‘must’: 108, ‘impacts’: 104, ‘development’: 104, ‘data’: 103, ‘many’: 102, ‘loss’: 101, ‘make’: 100, ‘finance’: 100, ‘including’: 100, ‘one’: 100, ‘future’: 97, ‘nature’: 96, ‘co2’: 96, ‘like’: 96, ‘earth’: 91, ‘actors’: 91, ‘us’: 89, ‘years’: 89, ‘reduce’: 88, ‘commitments’: 88, ‘role’: 87, ‘plans’: 87, ‘use’: 86, ‘changes’: 85, ‘temperature’: 85, ‘governments’: 85, ‘across’: 83, ‘sustainable’: 81, ‘resources’: 81, ‘would’: 81, ‘2030’: 80, ‘adaptation’: 80, ‘could’: 79, ‘report’: 79, ‘land’: 78, ‘help’: 78, ‘year’: 78, ‘2’: 76, ‘water’: 75, ‘progress’: 74, ‘heat’: 74, ‘increase’: 74, ‘time’: 73, ‘human’: 71, ‘see’: 71, ‘market’: 71, ‘un’: 70, ‘around’: 70, ‘renewable’: 70, ‘power’: 69, ‘work’: 68, ‘ocean’: 67, ‘efforts’: 67, ‘communities’: 67, ‘regions’: 66, ‘agreement’: 66, ‘set’: 66, ‘services’: 66, ‘value’: 66, ‘take’: 65, ‘international’: 65, ‘corporate’: 65, ‘small’: 65, ‘1’: 65, ‘part’: 64, ‘rise’: 63}

The frequency distribution of the above top 100 words is:

The top 15 words are:

{‘climate’: 1032, ‘change’: 471, ‘emissions’: 450, ‘zero’: 446, ‘net’: 434, ‘energy’: 295, ‘action’: 288, ‘global’: 287, ‘carbon’: 237, ‘transition’: 217, ‘countries’: 215, ‘people’: 213, ‘world’: 206, ‘need’: 202, ‘smes’: 187}

The frequency graph for these 15 words is:

Noun Phrase Analysis

Here, only the noun phrases in the complete text are considered.

The top 25 noun phrases in these texts, along with their frequencies, are as follows:

{‘net_zero’: 123, ‘climate_action’: 103, , ‘climate_crisis’: 47, ‘paris_agreement’: 43, ‘non-state_actors’: 41, ‘renewable_energy’: 41, ‘greenhouse_gas_emissions’: 35, ‘young_people’: 31, ‘greenhouse_gases’: 30, ‘sme_net_zero_transition’: 30, ‘corporate_minds’: 26, ‘non-state_entities’: 25, ‘high-level_expert_group’: 24, ‘financial_institutions’: 24, ‘carbon_credits’: 22, ‘net_zero_emissions_commitments’: 21, ‘voluntary_carbon_market’: 20, ‘innovation_sprints’: 20, ‘clean_energy’: 19, ‘world’_s’: 18, ‘global_emissions’: 18, ‘business_coalition’: 17, ‘small_island’: 17, ‘value_chain’: 16, ‘total_number’: 16}
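One common way to obtain underscore-joined phrases like these is to tag each token with its part of speech and then collect runs of adjectives and nouns that end in a noun. The sketch below illustrates that idea only; it is not necessarily the tool used for this article. The tokens are tagged by hand with Penn Treebank-style tags (in practice a POS tagger would supply them), and the function name noun_phrases is an assumption.

```python
from collections import Counter

# Hand-tagged sample tokens (assumption); a POS tagger would normally produce these.
tagged = [
    ("net", "JJ"), ("zero", "NN"), ("pledges", "NNS"), ("need", "VBP"),
    ("renewable", "JJ"), ("energy", "NN"), ("and", "CC"),
    ("climate", "NN"), ("action", "NN"), (".", "."),
    ("climate", "NN"), ("action", "NN"), ("supports", "VBZ"),
    ("net", "JJ"), ("zero", "NN"), (".", "."),
]

def noun_phrases(tagged_tokens):
    """Yield multi-word runs of adjectives/nouns that end in a noun, joined by '_'."""
    run = []
    # The sentinel token forces the final run to be flushed.
    for word, tag in list(tagged_tokens) + [("", "END")]:
        if tag.startswith("JJ") or tag.startswith("NN"):
            run.append((word, tag))
            continue
        # Trim trailing adjectives so the phrase ends on a noun.
        while run and not run[-1][1].startswith("NN"):
            run.pop()
        if len(run) >= 2:
            yield "_".join(w for w, _ in run)
        run = []

phrase_counts = Counter(noun_phrases(tagged))
print(phrase_counts.most_common(3))
```

Tokenization leftovers such as stray apostrophes or bullet characters would pass through a simple chunker like this unchanged, so cleaning the tokens first generally gives tidier phrase lists.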

The frequency distribution is as follows:

The top 100 phrases with frequencies are as follows:

{‘net_zero’: 123, ‘climate_action’: 103, , ‘climate_crisis’: 47, ‘paris_agreement’: 43, ‘non-state_actors’: 41, ‘renewable_energy’: 41, ‘greenhouse_gas_emissions’: 35, ‘young_people’: 31, ‘greenhouse_gases’: 30, ‘sme_net_zero_transition’: 30, ‘corporate_minds’: 26, ‘non-state_entities’: 25, ‘high-level_expert_group’: 24, ‘financial_institutions’: 24, ‘carbon_credits’: 22, ‘net_zero_emissions_commitments’: 21, ‘voluntary_carbon_market’: 20, ‘innovation_sprints’: 20, ‘clean_energy’: 19, ‘world_’_s’: 18, ‘global_emissions’: 18, ‘business_coalition’: 17, ‘small_island’: 17, ‘value_chain’: 16, ‘total_number’: 16, ‘net_zero_pledges’: 15, ‘human_activities’: 14, ‘climate_impacts’: 14, ‘private_sector’: 14, ‘don_’_t’: 14, ‘’_ve’: 14, ‘sme_climate_hub’: 14, ‘”_“‘: 13, ‘carbon_dioxide’: 13, ‘business_leaders’: 13, ‘•_non-state_actors’: 13, ‘intergovernmental_panel’: 12, ‘quote_card’: 12, ‘indigenous_peoples’: 12, ‘transition_plans’: 12, ‘expert_group’: 12, ‘global_temperature_rise’: 11, “earth_’s_climate”: 11, ‘negative_impacts’: 11, ‘emission_reductions’: 11, ‘net_zero_targets’: 11, ‘net_zero_commitments’: 11, ‘water_vapour’: 11, ‘sustainable_development’: 10, ‘food_security’: 10, ‘indigenous_communities’: 10, ‘adelle_thomas’: 10, ‘co2_emissions’: 10, ‘net_zero_transition’: 10, ‘new_types’: 10, ‘buyers_innovation_sprints’: 10, ‘net-zero_commitments’: 9, ‘targets_initiative’: 9, ‘sustainable_development_goals’: 9, ‘local_communities’: 9, ‘’_re’: 9, ‘carbon_emissions’: 9, ‘greenhouse_effect’: 8, ‘international_community’: 8, ‘vulnerable_countries’: 8, ‘ocean_acidification’: 8, ‘address_loss’: 8, ‘small_islands’: 8, ‘sea_level_rise’: 8, ‘energy_transition’: 8, ‘energy_efficiency’: 8, ‘green_hydrogen’: 8, ‘small_businesses’: 8, ‘greenhouse_gas’: 8, ‘net_zero_co2_emissions’: 8, ‘non‑state_actors’: 8, ‘climate_system’: 8, ‘net-zero_emissions_commitments’: 7, ‘sharm_el-sheikh’: 7, ‘wide_range’: 7, ‘net-zero_emissions’: 7, ‘un_secretary-general’: 7, ‘human_rights’: 7, 
‘sustainable_agriculture’: 7, ‘collective_action’: 7, ‘adaptation_options’: 7, ‘coral_reefs’: 7, ‘multilateral_development_banks’: 7, ‘energy_poverty’: 7, ‘ice_sheets’: 7, ‘climate_goals’: 7, ‘extreme_weather_events’: 7, ‘air_pollution’: 7, ‘various_posts’: 7, ‘climate_targets’: 7, ‘interim_targets’: 7, ‘financial_incentives’: 7}

The graph of phrases versus frequencies is as follows:

The WordCloud here is as follows:

Published by Nidhika
