Lien de la note Hackmd
Section 1: AWS Well-Archtitected Framework
Architecture: designing and building
Architecture is the art and science of designing and building large structure.
- manage size and complexity
- identify business goals
- capabilities needing improvement
- alignement between tech deliverables of a solution and business goals
- work with delivery team
What is the AWS Well-Architected Framework ?
- A guide for designing infrastructure that are
- Secure
- High-performing
- Resilient
- Efficient
- A consistent approach to evaluating and implementing cloud architectures
- A way to provide best practices that were developed through lessons learned by reviewing customer architectures
Pillars of the AWS Well-Architected Framework
Pillar organization
Best practice area
- Identity and Access Management
Question text
- SEC 1: How do you manage credentials ad authentication ?
Question context
- Credential and authentication mechanisms include password, tokens and keys granting access directly or inderectly in your workload. Protect credentials with appropriate mechanisms to help reduce the risk of accidental or malicious use.
Best practices
- Define requirements of identity and access management
- Secure AWS account root user
- Enforce use of multi-factor authentication
- Automate enforcement of access controls
- Integrate with centralized federation provider
- Enforce password requirements
- Rotate credentials regularly
- Audit credentials periodically
Section 2: Operational Excellence Pillar
- Focus
- Run and monitor systems to deliver business value, and to continually improve supporting processes and procedures
- Key topics
- Managing and automating changes
- Responding to events
- Defining standards to successfully manage daily operations
Operational excellence design principles
- Perform operations as code
- Define entire workload (app and infra)
- Limit human error
- Consistent responses to event
- Annotate documentation
- automating the creation of annoted doc
- input to your operations as code
- Make frequent, small, reversible changes
- components updated regularly
- Refine operations procedures frequently
- opportunities to improve procedures
- Anticipate failure
- potential sources of failure
- Learn from all operational events failures
- share what is learned
Operational excellence questions
- Prepare
- How do your determine what your priorties are ?
- How do you design your workload so that you can understand its state ?
- How do you reduce defects, ease, remediation and improve flow into production ?
- How do you mitigate deployement risks ?
- How do you know that you are ready to support a workload ?
- Operate
- How do you understand the health of workload ?
- How do you manage workload and operation events ?
- Evolve
- How do you evolve operations ?
Section 3: Security Pillar
- Focus
- Protecte info, systemes, and assets while delivering business value through risk assessments and mitigation strategies
- Key topics
- Identifying and managing who can do what
- Establishing controls to detect security events
- Protecting systems and services
- Protecting confientiality and integrity of data
Security design principles
- Implement a strong identity foundation
- principle of least privileges
- separation of duties
- Enable traceability
- monitor, alert and audit actions
- integrate logs and metrics to automatically respond and take action
- Apply security to all layers
- defense-in-depth
- security controls to all layers of your architecture
- Automate security best practices
- improve ability to securely scale more rapidly and cost effectively
- Protect data in transit and at rest
- classify data into sensitivity levels
- use mechanisms such as encryption, tokenization and access control
- Keep people away from data
- reduce risk of loss or modif of sensitive data due to human error
- create tools to reduce manual processing of data
- Prepare for security events
- incident management process
Security questions
- Identity and access management
- How do you manage credentials and authentication ?
- How od you control human access ?
- Ho do you control programmatic access ?
- Detective controls
- How do you detect and investigate security events ?
- How do you defend against emerging security threats ?
- Infrastructure protection
- How do you protect your networks ?
- How do you protect your compute resources ?
- Data protection
- How do you classify your data ?
- How do you protect your data at rest ?
- How do you protect your data in transit ?
- Incident response
- How od you respond to an incident ?
Section 4: Reliabality Pillar
- Focus
- Prevent and quickly recover from failures to meet business and customer demand
- Key topics
- Setting up
- Cross-project requirements
- Recovery planning
- Handling change
Reliability design principles
- Test recovery procedures
- test how you system fails
- validate recovery procedures
- expore failure pathways
- Automatically recover from failure
- monitor systems for key performance indicator
- configure system to trigger an automated recovery when a threshold is breached
- Scale horizontally to increase aggregate system availability
- replace one large resources with multiple smaller one
- reduce impact of a single point of failure
- Stop guessing capacity
- monitor demand and system usage
- automate the addition or removal of resources to maintain the optimal level
- Manage change in automation
- use automation to make changes to infra
- manage changes to automation
Reliability question
- Foundations
- How do you manage service limits ?
- How do you manage your network topology ?
- Change management
- How does your system adapt to changes in demand ?
- How do you monitor your resources ?
- How do you implement change ?
- Failure management
- How do you back up data ?
- How does your system withstand component failure ?
- How do you test resilience ?
- How do you plan for disaster recovery ?
Section 5: Performance Efficiency Pillar
- Focus
- Use IT and computing resources efficiently to meet system requirements and to maintain that efficiency as demand changes and technologies evolve
- Key topics
- Selecing thr right resource types and sizes based on workload requirements
- Monitoring performances
- Making informed decisions to maintain efficiency as business needs evolve
Performance efficiency design principles
- Democratize advanced technologies
- consume tech as a service
- focus on product dev
- Go global in minutes
- deploy systems in multiple AZ
- Use serverless architectures
- remove operational burden of maintaining servers
- reduce costs
- Experiment more often
- comparative testing
- Have mechanical sympathy
- use tech approach aligning best with what you want to achieve
Performance efficiency question
- Selection
- How do you select the best performing architecture ?
- How do you select your compute solution ?
- How do you select your storage solution ?
- How do you select your db solution ?
- How do you select your networking solution ?
- Review
- How do you evolve your workload to rake advantageof new releases ?
- Monitoring
- How do your monitor your resources to ensure they are performing as expected ?
- Tradeoffs
- How do you use tradeoffs to improve performance ?
Section 6: Cost Optimization Pillar
- Focus
- Run systems to deliver business value at the lowest price point
- Key topics
- Understanding and controlling when money is being spent
- Selecting the most appropriate and right number of resource types
- Analyzing spending over time
- Scaling to meeting business needs without overspending
Cost optimization design principles
- Adopt a consumption mode
- pay only for the computing resources required
- Measure overall efficiency
- measure business output
- cost associated
- measure the gain you are making
- Stop spending money on data centers operations
- AWS does the heavy-lifting
- Analyze and attribute expenditure
- accuratly identify system usage and cost
- Use managed and application-level services to reduce cost of ownership
- reduce operational burden of maintaining servers for tasks
- lower cost per transaction
Cost optimization questions
- Expenditure awareness
- How do you govern usage ?
- How do you monitor usage and cost ?
- How od you decommission resources ?
- Cost-effective resources
- How do you evaluate cost when you select services ?
- How do you meet cost target when you select resource type and size ?
- How do you you use pricing models to reduce cost ?
- How do you plan for data transfer changes ?
- Matching supply and demand
- How do you match supply of resources with demand ?
- Optimizing over time
- How do you evaluate new services ?
Section 7: Reliability and Availability
“Everything fails, all the time” - Werner Vogels, CTO, Amazon.com
Reliability
- A measure of your system’s ability to provide functionality when desired by the user
- System includes: hardware, software, firmware
- Probability that your entire system will function as intended for a specified period
- Mean time between failures (MTBF) = total time in serviceumber of failures
Understanding reliability metrics
Availability
- Normal operation time / total time
- A percentage of uptime (ex: 99.9%) over time (ex: 1y)
- Number of 9s - 5 9s means 99.999% availability
High availability
- System can withstand some measure of degradation while still remaining available
- Downtime is minimized
- Minimal human intervention is required
Availability tiers
Factors that influenc availability
- Fault tolerance
- The built-in redundancy of an app compononents and its ability to remain operational
- Scalability
- The ability of an app to accomodate increases in capacity needs without changing design
- Recoverability
- The process, policies, and procedures that are related to restoring service after a catastrophic event
Section 8: AWS Trusted Advisor
- Online tool that provides real-time guidnace to help you provision your resources following AWS best practices
- Looks at your entire AWS enivronment and gives you a real time recommendations in
- Cost Optimization
- eliminating unused resources
- commitment to reserve capacity
- Performance
- checking service limit
- service throughput
- Security
- improve security of the app by identifying gaps
- permissions
- increase availability and redundancy
- Fault Tolerance
- service usage more than 80% of the service limit
- values based on snapshot
- Cost Optimization
Wrap-up
A SysOps engineer working at a company wants to protect their data in transit and at rest. What service could they use to protect their data ?
- Elastic Load Balancibg
- Amazon Elastic Block Stor (Amazon EBS)
- Amazon Simple Storage Service (Amazon S3)
- All of the above
Answer
Keywords:
- protect their data in transit and at rest
Answer: 4.