
Commit f8a3d65

feat: check for invalid elements in <head> (#86)
* feat: invalid html elements in head check (and tests)
* replace vormkracht10 with backstage
* phpstan fix
* Fix styling
* Trigger workflow

---------

Co-authored-by: Baspa <[email protected]>
1 parent a5df948 commit f8a3d65

32 files changed: +443 -217 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -118,6 +118,7 @@ These checks are available in the package. You can add or remove checks in the c
 ✅ The lang attribute is set on the html tag.<br>
 ✅ The title contains one or more keywords.<br>
 ✅ One or more keywords are present in the first paragraph.<br>
+✅ The page does not contain invalid HTML elements in the head section.<br>

 ### Performance

resources/lang/de.json

Lines changed: 4 additions & 0 deletions
@@ -17,6 +17,8 @@
     "The meta description is used by search engines to show a description of the page in the search results.": "Die Meta-Beschreibung wird von Suchmaschinen verwendet, um eine Beschreibung der Seite in den Suchergebnissen anzuzeigen.",
     "The page response should return a 200 status code because this means the page is available.": "Der Antwortcode der Seite sollte ein Code 200 sein, um anzuzeigen, dass die Seite funktionsfähig ist.",
     "The page should have an Open Graph image because this is the image that will be used when the page is shared on social media.": "Die Seite sollte ein Open Graph-Bild haben, da dies das Bild ist, das verwendet wird, wenn die Seite in sozialen Netzwerken geteilt wird.",
+    "The page should have only one H1 tag because there should be only one main heading on the page. The H1 tag should be used to describe the main topic of the page. The H1 tag is also used by search engines to determine the topic of the page.": "Die Seite sollte nur einen H1-Tag haben, da es nur eine Hauptüberschrift auf der Seite geben sollte. Der H1-Tag sollte verwendet werden, um das Hauptthema der Seite zu beschreiben. Der H1-Tag wird auch von Suchmaschinen verwendet, um das Thema der Seite zu bestimmen.",
+    "The page does not contain invalid HTML elements in the head section": "Die Seite enthält keine ungültigen HTML-Elemente im Head-Bereich",
     "The page should not contain any broken images because it is bad for the user experience.": "Die Seite sollte keine zerbrochenen Bilder enthalten, da dies die Benutzererfahrung beeinträchtigt.",
     "The page should not contain any broken links because it is bad for the user experience.": "Die Seite sollte keine gebrochenen Links enthalten, da dies die Benutzererfahrung beeinträchtigt.",
     "The robots.txt file should allow indexing of the page.": "Die Datei robots.txt sollte die Indexierung der Seite zulassen.",
@@ -71,6 +73,8 @@
     "failed.meta.open_graph_image": "Die Seite enthält kein Open Graph-Bild, obwohl sie eines enthalten sollte.",
     "failed.meta.open_graph_image.broken": "Die Seite enthält ein gebrochenes Open Graph-Bild. Dieses Bild wurde gefunden: :actualValue.",
     "failed.meta.title": "Der Titel der Seite enthält :actualValue, obwohl er es nicht sollte.",
+    "failed.meta.invalid_head_elements.found": "Die Seite enthält ungültige HTML-Elemente im Head-Bereich: :actualValue. Diese Elemente können dazu führen, dass Google das Parsen des Head-Bereichs vorzeitig beendet.",
+    "failed.meta.invalid_head_elements.no_head": "Die Seite enthält keine Elemente im Head-Bereich, was für eine ordnungsgemäße SEO erforderlich ist.",
     "failed.meta.title.no_content": "Der Seitentitel ist leer, obwohl er es nicht sein sollte.",
     "failed.performance.compression": "Die Seite ist nicht komprimiert (mithilfe von gzip oder deflate), obwohl sie es sein sollte.",
     "failed.performance.css_size": "Die Seite enthält CSS-Dateien, die zu groß sind (max :expectedValue). Diese Dateien wurden gefunden: :actualValue.",

resources/lang/en.json

Lines changed: 3 additions & 0 deletions
@@ -17,6 +17,7 @@
     "The page response should return a 200 status code because this means the page is available.": "The page response should return a 200 status code because this means the page is available.",
     "The page should have an Open Graph image because this is the image that will be used when the page is shared on social media.": "The page should have an Open Graph image because this is the image that will be used when the page is shared on social media.",
     "The page should have only one H1 tag because there should be only one main heading on the page. The H1 tag should be used to describe the main topic of the page. The H1 tag is also used by search engines to determine the topic of the page.": "The page should have only one H1 tag because there should be only one main heading on the page. The H1 tag should be used to describe the main topic of the page. The H1 tag is also used by search engines to determine the topic of the page.",
+    "The page does not contain invalid HTML elements in the head section": "The page does not contain invalid HTML elements in the head section",
     "The page should not contain any broken images because it is bad for the user experience.": "The page should not contain any broken images because it is bad for the user experience.",
     "The page should not contain any broken links because it is bad for the user experience.": "The page should not contain any broken links because it is bad for the user experience.",
     "The robots.txt file should allow indexing of the page.": "The robots.txt file should allow indexing of the page.",
@@ -71,6 +72,8 @@
     "failed.meta.open_graph_image": "The page does not contain an open graph image, while it should.",
     "failed.meta.open_graph_image.broken": "The page contains a broken open graph image. This image was found: :actualValue.",
     "failed.meta.title": "The page title contains :actualValue in the title, while it should not.",
+    "failed.meta.invalid_head_elements.found": "The page contains invalid HTML elements in the head section: :actualValue. These elements can cause Google to stop parsing the head prematurely.",
+    "failed.meta.invalid_head_elements.no_head": "The page does not contain any elements in the head section, which is required for proper SEO.",
     "failed.meta.title.no_content": "The page title is empty, while it should not be.",
     "failed.performance.compression": "The page is not compressed (using either gzip or deflate), while it should be.",
     "failed.performance.css_size": "The page contains CSS files that are too large (max :expectedValue). These files were found: :actualValue.",
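
Of the two new failure messages, only `failed.meta.invalid_head_elements.found` carries an `:actualValue` placeholder. The check fills it by passing `['actualValue' => ...]` to `__()`. The sketch below illustrates that substitution in plain PHP. `fillPlaceholders` is a hypothetical stand-in written for this example, not Laravel's translator, and it skips features such as case-variant placeholders.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the translator: replaces each ":name"
 * placeholder in $line with the matching value from $replace.
 */
function fillPlaceholders(string $line, array $replace): string
{
    $pairs = [];
    foreach ($replace as $name => $value) {
        // Build the ":name" => "value" map that strtr() expects.
        $pairs[':' . $name] = (string) $value;
    }

    return strtr($line, $pairs);
}

// Message text copied from the en.json entry added in this commit.
$line = 'The page contains invalid HTML elements in the head section: :actualValue. '
    . 'These elements can cause Google to stop parsing the head prematurely.';

// The check joins the offending tag names with ", " before passing them in.
$message = fillPlaceholders($line, ['actualValue' => implode(', ', ['div', 'img'])]);

echo $message, PHP_EOL;
```

The `no_head` message has no placeholder, so it would pass through such a substitution unchanged.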

resources/lang/fr.json

Lines changed: 3 additions & 0 deletions
@@ -14,6 +14,7 @@
     "The content should contain at least 30% transition words.": "Le contenu devrait contenir au moins 30% de mots de liaison.",
     "The content should not contain sentences with more than 20 words.": "Le contenu ne devrait pas contenir de phrases de plus de 20 mots.",
     "The page should have only one H1 tag because there should be only one main heading on the page. The H1 tag should be used to describe the main topic of the page. The H1 tag is also used by search engines to determine the topic of the page.": "La page ne devrait avoir qu'une seule balise H1. Il ne peut y avoir qu'un seul titre principal sur la page. La balise H1 devrait être utilisée pour décrire le sujet principal de la page. La balise H1 est également utilisée par les moteurs de recherche pour déterminer le sujet de la page.",
+    "The page does not contain invalid HTML elements in the head section": "La page ne contient pas d'éléments HTML invalides dans la section head",
     "All links on the page should redirect to an url using HTTPS instead of HTTP because this is more secure.": "Tous les liens présents sur la page devraient être avec une URL en HTTPS au lieu de HTTP car c'est plus sécurisé.",
     "The focus keyword should be in the title of the page because the visitor will see this in the search results.": "Le mot-clé principal devrait être dans le titre de la page pour que le visiteur le voit dans les résultats de recherche.",
     "The focus keyword should be in the first paragraph of the content because this is the most important part of the content.": "Le mot-clé principal devrait être dans le premier paragraphe du contenu car c'est la partie la plus importante du contenu.",
@@ -71,6 +72,8 @@
     "failed.meta.open_graph_image": "La page ne contient pas d'image Open Graph, alors qu'elle devrait en contenir.",
     "failed.meta.open_graph_image.broken": "La page contient une image Open Graph brisée. Cette image a été trouvée : :actualValue.",
     "failed.meta.title": "Le titre de la page contient :actualValue, alors qu'il ne devrait pas.",
+    "failed.meta.invalid_head_elements.found": "La page contient des éléments HTML invalides dans la section head : :actualValue. Ces éléments peuvent amener Google à arrêter prématurément l'analyse de la section head.",
+    "failed.meta.invalid_head_elements.no_head": "La page ne contient aucun élément dans la section head, ce qui est requis pour un bon référencement.",
     "failed.meta.title.no_content": "Le titre de la page est vide, alors qu'il ne devrait pas l'être.",
     "failed.performance.compression": "La page n'est pas compressée (à l'aide de gzip ou deflate), alors qu'elle devrait l'être.",
     "failed.performance.css_size": "La page contient des fichiers CSS trop volumineux (max :expectedValue). Ces fichiers ont été trouvés : :actualValue.",

resources/lang/nl.json

Lines changed: 3 additions & 0 deletions
@@ -17,6 +17,7 @@
     "The page response should return a 200 status code because this means the page is available.": "De pagina moet een 200 statuscode retourneren, omdat dit betekent dat de pagina beschikbaar is.",
     "The page should have an Open Graph image because this is the image that will be used when the page is shared on social media.": "De pagina moet een Open Graph-afbeelding hebben omdat dit de afbeelding is die wordt gebruikt wanneer de pagina wordt gedeeld op sociale media.",
     "The page should have only one H1 tag because there should be only one main heading on the page. The H1 tag should be used to describe the main topic of the page. The H1 tag is also used by search engines to determine the topic of the page.": "De pagina moet slechts één H1-tag hebben omdat er slechts één hoofdonderwerp op de pagina moet zijn. De H1-tag moet worden gebruikt om het hoofdonderwerp van de pagina te beschrijven. De H1-tag wordt ook door zoekmachines gebruikt om het onderwerp van de pagina te bepalen.",
+    "The page does not contain invalid HTML elements in the head section": "De pagina bevat geen ongeldige HTML-elementen in de head-sectie",
     "The page should not contain any broken images because it is bad for the user experience.": "De pagina mag geen kapotte afbeeldingen bevatten omdat dit slecht is voor de gebruikerservaring.",
     "The page should not contain any broken links because it is bad for the user experience.": "De pagina mag geen kapotte links bevatten omdat dit slecht is voor de gebruikerservaring.",
     "The robots.txt file should allow indexing of the page.": "Het robots.txt-bestand moet indexering van de pagina toestaan.",
@@ -71,6 +72,8 @@
     "failed.meta.open_graph_image": "De pagina bevat geen open graph-afbeelding, terwijl dat wel zou moeten.",
     "failed.meta.open_graph_image.broken": "De pagina bevat een kapotte open graph-afbeelding. Deze afbeelding is gevonden: :actualValue.",
     "failed.meta.title": "De paginatitel bevat :actualValue in de titel, terwijl dat niet zou moeten.",
+    "failed.meta.invalid_head_elements.found": "De pagina bevat ongeldige HTML-elementen in de head-sectie: :actualValue. Deze elementen kunnen ervoor zorgen dat Google stopt met het lezen van de head-sectie.",
+    "failed.meta.invalid_head_elements.no_head": "De pagina bevat geen elementen in de head-sectie, wat vereist is voor goede SEO.",
     "failed.meta.title.no_content": "De paginatitel is leeg, terwijl dat niet zou moeten.",
     "failed.performance.compression": "De pagina is niet gecomprimeerd (met gzip of deflate), terwijl dat wel zou moeten.",
     "failed.performance.css_size": "De pagina bevat CSS-bestanden die te groot zijn (maximaal :expectedValue). Deze bestanden zijn gevonden: :actualValue.",

resources/lang/pt_BR.json

Lines changed: 3 additions & 0 deletions
@@ -17,6 +17,7 @@
     "The page response should return a 200 status code because this means the page is available.": "A resposta da página deve retornar um código de status 200 para inferir que a página está disponível.",
     "The page should have an Open Graph image because this is the image that will be used when the page is shared on social media.": "A página deve conter uma imagem Open-Graph, pois é essa imagem que será utilizada quando a página for compartilhada nas redes sociais.",
     "The page should have only one H1 tag because there should be only one main heading on the page. The H1 tag should be used to describe the main topic of the page. The H1 tag is also used by search engines to determine the topic of the page.": "A página deve conter apenas uma tag H1, pois esse será o título principal da página. A tag H1 deve ser usada para descrever o tópico principal do conteúdo da página. A tag H1 também é usada pelos motores de busca para determinar o tema da página.",
+    "The page does not contain invalid HTML elements in the head section": "A página não contém elementos HTML inválidos na seção head",
     "The page should not contain any broken images because it is bad for the user experience.": "A página não deve conter imagens quebradas, pois isso prejudica a experiência do usuário.",
     "The page should not contain any broken links because it is bad for the user experience.": "A página não deve conter links quebrados, pois isso prejudica a experiência do usuário.",
     "The robots.txt file should allow indexing of the page.": "O arquivo robots.txt deve permitir a indexação da página pelos motores de busca.",
@@ -71,6 +72,8 @@
     "failed.meta.open_graph_image": "A página não contém uma imagem open-graph, embora deveria.",
     "failed.meta.open_graph_image.broken": "A página contém uma imagem open-graph quebrada. Imagem encontrada: :actualValue.",
     "failed.meta.title": "O título da página contém :actualValue no título, mas não deveria.",
+    "failed.meta.invalid_head_elements.found": "A página contém elementos HTML inválidos na seção head: :actualValue. Esses elementos podem fazer com que o Google pare de analisar a seção head prematuramente.",
+    "failed.meta.invalid_head_elements.no_head": "A página não contém elementos na seção head, o que é necessário para um bom SEO.",
     "failed.meta.title.no_content": "O título da página está vazio, mas não deveria estar.",
     "failed.performance.compression": "A página não está compactada (usando gzip ou deflate), mas deveria estar.",
     "failed.performance.css_size": "A página contém arquivos CSS muito grandes (máximo de :expectedValue). Arquivos encontrados: :actualValue.",
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
+<?php
+
+namespace Backstage\Seo\Checks\Meta;
+
+use Backstage\Seo\Interfaces\Check;
+use Backstage\Seo\Traits\PerformCheck;
+use Backstage\Seo\Traits\Translatable;
+use Illuminate\Http\Client\Response;
+use Symfony\Component\DomCrawler\Crawler;
+
+class InvalidHeadElementsCheck implements Check
+{
+    use PerformCheck,
+        Translatable;
+
+    public string $title = 'The page does not contain invalid HTML elements in the head section';
+
+    public string $description = 'The head section should not contain invalid HTML elements. According to Google\'s documentation, once Google detects an invalid element in the head, it assumes the end of the head element and stops reading any further elements. This can cause important meta tags to be missed.';
+
+    public string $priority = 'high';
+
+    public int $timeToFix = 2;
+
+    public int $scoreWeight = 8;
+
+    public bool $continueAfterFailure = true;
+
+    public ?string $failureReason;
+
+    public mixed $actualValue = null;
+
+    public mixed $expectedValue = null;
+
+    /**
+     * Valid HTML elements that are allowed in the head section
+     * Based on HTML5 specification and Google's documentation
+     */
+    private array $validHeadElements = [
+        'title',
+        'base',
+        'link',
+        'meta',
+        'style',
+        'script',
+        'noscript',
+        'template',
+    ];
+
+    public function check(Response $response, Crawler $crawler): bool
+    {
+        if (! $this->validateContent($response)) {
+            return false;
+        }
+
+        return true;
+    }
+
+    public function validateContent(Response $response): bool
+    {
+        // Get the raw HTML content from the response
+        $html = $response->body();
+
+        // Extract the head section using regex
+        if (preg_match('/<head[^>]*>(.*?)<\/head>/is', $html, $matches)) {
+            $headContent = $matches[1];
+        } else {
+            // No head section found
+            $this->failureReason = __('failed.meta.invalid_head_elements.no_head');
+            $this->actualValue = 'No head section found';
+
+            return false;
+        }
+
+        // Extract all HTML tags from the head content, but exclude tags inside template elements
+        $headTags = [];
+
+        // First, remove template content to avoid detecting nested elements
+        $headContentWithoutTemplates = preg_replace('/<template[^>]*>.*?<\/template>/is', '', $headContent);
+
+        // Extract tags from the cleaned content
+        preg_match_all('/<([a-zA-Z][a-zA-Z0-9]*)[^>]*>/i', $headContentWithoutTemplates, $matches);
+        $headTags = $matches[1];
+
+        if (empty($headTags)) {
+            // No elements in head section
+            $this->failureReason = __('failed.meta.invalid_head_elements.no_head');
+            $this->actualValue = 'No head elements found';
+
+            return false;
+        }
+
+        $invalidElements = [];
+
+        foreach ($headTags as $tagName) {
+            $tagName = strtolower($tagName);
+
+            // Check if the element is valid for the head section
+            if (! in_array($tagName, $this->validHeadElements)) {
+                $invalidElements[] = $tagName;
+            }
+        }
+
+        if (! empty($invalidElements)) {
+            $this->failureReason = __('failed.meta.invalid_head_elements.found', [
+                'actualValue' => implode(', ', array_unique($invalidElements)),
+            ]);
+            $this->actualValue = $invalidElements;
+
+            return false;
+        }
+
+        return true;
+    }
+}
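
The detection logic in `validateContent()` can be tried outside the framework. The sketch below reuses the same three regular expressions and the same allow-list in a standalone function. `findInvalidHeadElements` and the sample HTML strings are hypothetical names and data introduced for this example. Unlike the check, it returns `null` when no `<head>` is found and an empty array when nothing is invalid, rather than setting `failureReason`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): returns the distinct tag
 * names inside <head> that are not on the allow-list, or null when the
 * document has no <head> section.
 */
function findInvalidHeadElements(string $html): ?array
{
    // Same allow-list as the check's $validHeadElements property.
    $allowed = ['title', 'base', 'link', 'meta', 'style', 'script', 'noscript', 'template'];

    // Capture everything between <head ...> and </head>.
    if (! preg_match('/<head[^>]*>(.*?)<\/head>/is', $html, $matches)) {
        return null;
    }

    // Drop <template> blocks so their nested markup is not inspected.
    $head = preg_replace('/<template[^>]*>.*?<\/template>/is', '', $matches[1]) ?? $matches[1];

    // Collect the names of all opening tags that remain; closing tags
    // start with "</" and therefore do not match.
    preg_match_all('/<([a-zA-Z][a-zA-Z0-9]*)[^>]*>/', $head, $tags);

    $invalid = [];
    foreach ($tags[1] as $tag) {
        $tag = strtolower($tag);
        if (! in_array($tag, $allowed, true)) {
            $invalid[] = $tag;
        }
    }

    return array_values(array_unique($invalid));
}

$clean = '<html><head><title>Hi</title><meta charset="utf-8"></head><body></body></html>';
$broken = '<html><head><title>Hi</title><div>oops</div><IMG src="a.png"></head></html>';

var_dump(findInvalidHeadElements($clean));   // empty array
var_dump(findInvalidHeadElements($broken));  // ['div', 'img']
```

Because the tag scan is a plain regular expression over the raw HTML, markup that appears inside an inline `<script>` string or an HTML comment in the head is reported as well.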

tests/Checks/Configuration/NoFollowCheckTest.php

Lines changed: 9 additions & 9 deletions
@@ -9,36 +9,36 @@
     $crawler = new Crawler;

     Http::fake([
-        'vormkracht10.nl' => Http::response('', 200, ['X-Robots-Tag' => 'nofollow']),
+        'backstagephp.com' => Http::response('', 200, ['X-Robots-Tag' => 'nofollow']),
     ]);

-    $crawler->addHtmlContent(Http::get('vormkracht10.nl')->body());
+    $crawler->addHtmlContent(Http::get('backstagephp.com')->body());

-    $this->assertFalse($check->check(Http::get('vormkracht10.nl'), $crawler));
+    $this->assertFalse($check->check(Http::get('backstagephp.com'), $crawler));
 });

 it('can perform the nofollow check with robots metatag', function () {
     $check = new NoFollowCheck;
     $crawler = new Crawler;

     Http::fake([
-        'vormkracht10.nl' => Http::response('<html><head><meta name="robots" content="nofollow"></head></html>', 200),
+        'backstagephp.com' => Http::response('<html><head><meta name="robots" content="nofollow"></head></html>', 200),
     ]);

-    $crawler->addHtmlContent(Http::get('vormkracht10.nl')->body());
+    $crawler->addHtmlContent(Http::get('backstagephp.com')->body());

-    $this->assertFalse($check->check(Http::get('vormkracht10.nl'), $crawler));
+    $this->assertFalse($check->check(Http::get('backstagephp.com'), $crawler));
 });

 it('can perform the nofollow check with googlebot metatag', function () {
     $check = new NoFollowCheck;
     $crawler = new Crawler;

     Http::fake([
-        'vormkracht10.nl' => Http::response('<html><head><meta name="googlebot" content="nofollow"></head></html>', 200),
+        'backstagephp.com' => Http::response('<html><head><meta name="googlebot" content="nofollow"></head></html>', 200),
     ]);

-    $crawler->addHtmlContent(Http::get('vormkracht10.nl')->body());
+    $crawler->addHtmlContent(Http::get('backstagephp.com')->body());

-    $this->assertFalse($check->check(Http::get('vormkracht10.nl'), $crawler));
+    $this->assertFalse($check->check(Http::get('backstagephp.com'), $crawler));
 });
